Github user tdas commented on a diff in the pull request:
https://github.com/apache/spark/pull/22042#discussion_r209479417
--- Diff: external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaDataConsumer.scala ---
@@ -251,32 +274,53 @@ private[kafka010] case class InternalKafkaConsumer(
       untilOffset: Long,
       pollTimeoutMs: Long,
       failOnDataLoss: Boolean): ConsumerRecord[Array[Byte], Array[Byte]] = {
 -    if (offset != nextOffsetInFetchedData || !fetchedData.hasNext()) {
 -      // This is the first fetch, or the last pre-fetched data has been drained.
 +    if (offset != nextOffsetInFetchedData) {
 +      // This is the first fetch, or the fetched data has been reset.
        // Seek to the offset because we may call seekToBeginning or seekToEnd before this.
        seek(offset)
        poll(pollTimeoutMs)
 +    } else if (!fetchedData.hasNext) {
 +      // The last pre-fetched data has been drained.
 +      if (offset < offsetAfterPoll) {
 +        // Offsets in [offset, offsetAfterPoll) are missing. We should skip them.
 +        resetFetchedData()
 +        throw new MissingOffsetException(offset, offsetAfterPoll)
--- End diff ---
So `MissingOffsetException` is only used to signal that some offsets may be missing due to control messages and nothing else, and the higher-level function (i.e. `get`) just handles it by resetting the fetched offsets. Why not let this `fetchData` method handle the situation itself instead of creating a new exception just for this?
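
For concreteness, here is a tiny, self-contained toy model of the two designs being compared: variant A signals the gap with an exception that the caller (standing in for `get`) catches and retries past, while variant B skips the missing range inside the fetch method itself. Everything in it (the `log` map, `nextPresent`, the variant names) is invented for illustration and abstracts away the real `seek`/`poll`/`offsetAfterPoll` machinery; it is not the PR's code.

```scala
object MissingOffsetSketch {

  // Simulated log: offsets 5 and 6 are absent, standing in for control messages.
  private val log: Map[Long, String] =
    Map(3L -> "c", 4L -> "d", 7L -> "g", 8L -> "h")

  private val maxOffset = 9L // exclusive upper bound, playing the role of offsetAfterPoll

  private def nextPresent(offset: Long): Long =
    (offset until maxOffset).find(log.contains).getOrElse(maxOffset)

  // Variant A (as in the diff above): signal the gap with an exception for the caller to handle.
  final case class MissingOffsetException(from: Long, until: Long)
    extends Exception(s"offsets [$from, $until) are missing")

  def fetchDataA(offset: Long): String =
    log.getOrElse(offset, throw MissingOffsetException(offset, nextPresent(offset)))

  def getA(offset: Long): String =
    try fetchDataA(offset)
    catch { case MissingOffsetException(_, until) => fetchDataA(until) }

  // Variant B (what this comment suggests): the fetch method skips the missing range itself.
  def fetchDataB(offset: Long): String =
    log(if (log.contains(offset)) offset else nextPresent(offset))

  def main(args: Array[String]): Unit = {
    assert(getA(5L) == "g")       // exception-based path lands on offset 7
    assert(fetchDataB(5L) == "g") // internal-skip path lands on offset 7
    println("both variants skip the missing range [5, 7)")
  }
}
```

The trade-off, roughly: the exception keeps the fetch path a straight-line read and centralizes the skip decision in `get`, while handling it inside `fetchData` avoids an exception for control flow but moves that decision into the fetch path.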