gaborgsomogyi opened a new pull request #29131:
URL: https://github.com/apache/spark/pull/29131


   ### What changes were proposed in this pull request?
   [KAFKA-7703](https://issues.apache.org/jira/browse/KAFKA-7703) was 
discovered earlier and a workaround for it was added in SPARK-26267. At that 
time Spark used the 2.0.0 `kafka-clients` library, and the issue was resolved 
in 2.3.0. Spark now uses the 2.5.0 `kafka-clients` library, which allows us to 
remove the workaround. SPARK-26267 added two things to address the issue:
   * Called `position` right after `poll`; the call blocks until the result of 
`poll` is available.
   * Introduced safety logic to make sure the positions seen in the previous 
batch are smaller than or equal to the offsets newly acquired from Kafka. If 
this assumption does not hold, a retry happens.
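   
   The safety logic in the second bullet can be sketched roughly as follows. 
This is a minimal, dependency-free illustration and not the actual Spark 
implementation: the function names, the `max_attempts` and `retry_interval_s` 
parameters, the string partition keys, and the choice to return the last 
fetched result once the attempts are exhausted are all assumptions made for 
the example.

```python
import time
from typing import Callable, Dict, Tuple

# Hypothetical model: partition name -> offset.
Offsets = Dict[str, int]


def find_incorrect_offsets(known: Offsets, fetched: Offsets) -> Dict[str, Tuple[int, int]]:
    """Return partitions whose newly fetched offset is behind the known one."""
    return {
        tp: (known_offset, fetched[tp])
        for tp, known_offset in known.items()
        if tp in fetched and fetched[tp] < known_offset
    }


def fetch_latest_offsets(
    fetch: Callable[[], Offsets],
    known: Offsets,
    max_attempts: int = 3,
    retry_interval_s: float = 0.0,
) -> Offsets:
    """Fetch offsets, retrying while any of them is behind a known position."""
    attempt = 0
    while True:
        attempt += 1
        fetched = fetch()
        incorrect = find_incorrect_offsets(known, fetched)
        # Assumption: after the last attempt, give up and return what we have.
        if not incorrect or attempt >= max_attempts:
            return fetched
        time.sleep(retry_interval_s)


# Usage: a fake fetcher that reports a stale offset once, then a correct one.
_responses = iter([{"topic-0": 5}, {"topic-0": 12}])
latest = fetch_latest_offsets(lambda: next(_responses), known={"topic-0": 10})
print(latest)
```

   In this sketch the first response is rejected because offset 5 is behind 
the known position 10, so the fetch is retried.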
   
   To be on the safe side and to detect bugs similar to 
[KAFKA-7703](https://issues.apache.org/jira/browse/KAFKA-7703), only the first 
change is removed in this PR. This means that if a similar issue appears, a 
retry still happens to correct the latest offsets.
   
   ### Why are the changes needed?
   [KAFKA-7703](https://issues.apache.org/jira/browse/KAFKA-7703) has been 
resolved, so the workaround is no longer needed.
   
   ### Does this PR introduce _any_ user-facing change?
   No.
   
   ### How was this patch tested?
   * Existing unit tests.
   * Crafted a special test with sleeps 
[here](https://github.com/gaborgsomogyi/kafka/commit/466cbbf9cc09c39390409b7675bc1e48bc265e5c).
   This is the 2.5.0 adaptation of @zsxwing's reproduction code, created 
[here](https://github.com/zsxwing/kafka/commit/4e1aa11bfa99a38ac1e2cb0872c055db56b33246).
   





---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to