jose-torres commented on a change in pull request #23749: [SPARK-26841][SQL] Kafka timestamp pushdown
URL: https://github.com/apache/spark/pull/23749#discussion_r266096410
##########
File path:
external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaRelation.scala
##########
@@ -90,10 +94,12 @@ private[kafka010] class KafkaRelation(
// Calculate offset ranges
val offsetRanges = untilPartitionOffsets.keySet.map { tp =>
val fromOffset = fromPartitionOffsets.getOrElse(tp,
- // This should not happen since topicPartitions contains all partitions not in
- // fromPartitionOffsets
- throw new IllegalStateException(s"$tp doesn't have a from offset"))
- val untilOffset = untilPartitionOffsets(tp)
+ // This should not happen since topicPartitions contains all partitions not in
+ // fromPartitionOffsets
+ throw new IllegalStateException(s"$tp doesn't have a from offset")
+ }
+ var untilOffset = untilPartitionOffsets(tp)
+ untilOffset = if (areOffsetsInLine(fromOffset, untilOffset)) untilOffset else fromOffset
Review comment:
This doesn't seem safe. We should avoid generating nonsensical ranges in the
first place, rather than generating them and then silently clamping them down.
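To illustrate the alternative, here is a minimal sketch of validating offsets up front so that an inverted range is reported rather than clamped. Everything in it is hypothetical and for illustration only: `TopicPartition` and `OffsetRange` are simplified stand-ins rather than the Kafka or Spark classes, `OffsetRanges.build` is not an existing method, and it assumes both maps already hold concrete, resolved offsets.

```scala
// Hypothetical stand-ins for illustration; not the actual Kafka/Spark classes.
final case class TopicPartition(topic: String, partition: Int)
final case class OffsetRange(tp: TopicPartition, fromOffset: Long, untilOffset: Long)

object OffsetRanges {
  // Builds one range per partition in `until`. A missing from offset or an
  // inverted pair (from > until) is reported as an error instead of being
  // silently clamped to an empty range.
  def build(
      from: Map[TopicPartition, Long],
      until: Map[TopicPartition, Long]): Either[String, Seq[OffsetRange]] = {
    val results = until.toSeq.map { case (tp, untilOffset) =>
      from.get(tp) match {
        case None =>
          Left(s"$tp doesn't have a from offset")
        case Some(fromOffset) if fromOffset > untilOffset =>
          Left(s"$tp has from offset $fromOffset greater than until offset $untilOffset")
        case Some(fromOffset) =>
          Right(OffsetRange(tp, fromOffset, untilOffset))
      }
    }
    // Collect every problem so the caller sees all bad partitions at once.
    val errors = results.collect { case Left(e) => e }
    if (errors.nonEmpty) Left(errors.mkString("; "))
    else Right(results.collect { case Right(r) => r })
  }
}
```

With this shape, the caller decides what to do with a `Left` (fail the query, log and skip, and so on), so the decision is explicit instead of hidden inside the range calculation.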