Github user tdas commented on a diff in the pull request:
https://github.com/apache/spark/pull/22042#discussion_r209475048
--- Diff: external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaDataConsumer.scala ---
@@ -251,32 +274,53 @@ private[kafka010] case class InternalKafkaConsumer(
       untilOffset: Long,
       pollTimeoutMs: Long,
       failOnDataLoss: Boolean): ConsumerRecord[Array[Byte], Array[Byte]] = {
-    if (offset != nextOffsetInFetchedData || !fetchedData.hasNext()) {
-      // This is the first fetch, or the last pre-fetched data has been drained.
+    if (offset != nextOffsetInFetchedData) {
+      // This is the first fetch, or the fetched data has been reset.
       // Seek to the offset because we may call seekToBeginning or seekToEnd before this.
       seek(offset)
       poll(pollTimeoutMs)
+    } else if (!fetchedData.hasNext) {
+      // The last pre-fetched data has been drained.
+      if (offset < offsetAfterPoll) {
--- End diff --
It's hard to understand this condition because it's unclear what
offsetAfterPoll means. Does it refer to the offset that will be fetched next by
the KafkaConsumer?
---
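[Editor's note] For context, one plausible reading of the condition (an
assumption, not confirmed by the diff above) is that offsetAfterPoll records
the consumer's position immediately after the last poll, i.e. exactly the
offset the KafkaConsumer would fetch next. A minimal Scala sketch of that
bookkeeping follows; the class name PollTracker and its shape are hypothetical
and only illustrate the interpretation being asked about:

    import java.{util => ju}
    import org.apache.kafka.clients.consumer.{ConsumerRecord, KafkaConsumer}
    import org.apache.kafka.common.TopicPartition

    // Sketch only: illustrates one possible meaning of `offsetAfterPoll`,
    // namely the offset the KafkaConsumer would fetch next after a poll.
    // `PollTracker` is a hypothetical name, not taken from the PR.
    class PollTracker(
        consumer: KafkaConsumer[Array[Byte], Array[Byte]],
        topicPartition: TopicPartition) {

      private var fetchedData: ju.Iterator[ConsumerRecord[Array[Byte], Array[Byte]]] =
        ju.Collections.emptyIterator()

      // Offset the consumer will fetch next, recorded right after each poll.
      private var offsetAfterPoll: Long = -1L

      private def poll(pollTimeoutMs: Long): Unit = {
        val records = consumer.poll(pollTimeoutMs).records(topicPartition)
        fetchedData = records.iterator()
        // `position` is the offset of the next record the consumer will
        // fetch; under this reading it can be ahead of the last record just
        // returned, e.g. when offsets are missing from the partition.
        offsetAfterPoll = consumer.position(topicPartition)
      }
    }

Under that assumption, hitting `offset < offsetAfterPoll` after fetchedData is
drained would mean the requested offset falls in a gap the broker skipped
over, rather than simply not having been polled yet.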