leesf created HUDI-292:
--------------------------

             Summary: Consume more entries from kafka than specified 
sourceLimit.
                 Key: HUDI-292
                 URL: https://issues.apache.org/jira/browse/HUDI-292
             Project: Apache Hudi (incubating)
          Issue Type: Improvement
          Components: Utilities
            Reporter: leesf
            Assignee: leesf
             Fix For: 0.5.1


When _CheckpointUtils#computeOffsetRanges_ for consuming kafka messges. 

Given 
topic = "test",
fromOffsets(partition -> offset pair) = (0 -> 0), (1 -> 0), (2 -> 0), (3 -> 0), 
(4 -> 0),
toOffsets = (0, 100), (1, 1000), (2, 1000), (3, 1000), (4, 1000),
numEvents = 1001.

The output of _CheckpointUtils#computesOffsetRanges_ is  

OffsetRange(topic: 'test', partition: 0, range: [0 -> 100])
OffsetRange(topic: 'test', partition: 1, range: [0 -> 226])
OffsetRange(topic: 'test', partition: 2, range: [0 -> 226])
OffsetRange(topic: 'test', partition: 3, range: [0 -> 226])
OffsetRange(topic: 'test', partition: 4, range: [0 -> 226])

Total count is 1004(100 + 266 * 4), more than 1001, and thus consume more 
entries from kafka  than specified 1001.

CC [~vinoth]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

Reply via email to