jose-torres commented on a change in pull request #23749: [SPARK-26841][SQL]
Kafka timestamp pushdown
URL: https://github.com/apache/spark/pull/23749#discussion_r266593220
##########
File path:
external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaRelation.scala
##########
@@ -90,10 +94,12 @@ private[kafka010] class KafkaRelation(
// Calculate offset ranges
val offsetRanges = untilPartitionOffsets.keySet.map { tp =>
val fromOffset = fromPartitionOffsets.getOrElse(tp,
- // This should not happen since topicPartitions contains all partitions not in
- // fromPartitionOffsets
- throw new IllegalStateException(s"$tp doesn't have a from offset"))
- val untilOffset = untilPartitionOffsets(tp)
+ // This should not happen since topicPartitions contains all partitions not in
+ // fromPartitionOffsets
+ throw new IllegalStateException(s"$tp doesn't have a from offset")
+ }
+ var untilOffset = untilPartitionOffsets(tp)
+ untilOffset = if (areOffsetsInLine(fromOffset, untilOffset)) untilOffset else fromOffset
Review comment:
I suppose option 2 really is the only good choice here. But let's add a
warning log for this case, stating what the original range was and which
user predicates made us clamp it to an empty range.
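
Roughly what I have in mind, as a standalone sketch rather than the actual patch. The `clampUntilOffset` helper, the `predicates` parameter, and the message wording are hypothetical, and I'm assuming `areOffsetsInLine` simply means `fromOffset <= untilOffset`. In `KafkaRelation` the message would go through `logWarning` instead of being returned.

```scala
object ClampSketch {
  // Hypothetical stand-in for the PR's areOffsetsInLine.
  // Assumption: a range is "in line" when it does not run backwards.
  def areOffsetsInLine(fromOffset: Long, untilOffset: Long): Boolean =
    fromOffset <= untilOffset

  // Returns the until offset to use, plus a warning message when the
  // range had to be clamped to empty. `predicates` is a hypothetical
  // description of the user filters that produced the offsets.
  def clampUntilOffset(
      tp: String,
      fromOffset: Long,
      untilOffset: Long,
      predicates: Seq[String]): (Long, Option[String]) = {
    if (areOffsetsInLine(fromOffset, untilOffset)) {
      (untilOffset, None)
    } else {
      val msg = s"Offset range for $tp was [$fromOffset, $untilOffset), " +
        s"which is not a valid range; clamping it to the empty range " +
        s"[$fromOffset, $fromOffset). User predicates: " +
        predicates.mkString("[", ", ", "]")
      (fromOffset, Some(msg))
    }
  }
}
```

Returning the message keeps the sketch testable on its own; in the relation itself the `else` branch would just log and yield `fromOffset`.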