Github user tdas commented on a diff in the pull request:
https://github.com/apache/spark/pull/22042#discussion_r211801968
--- Diff: external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaDataConsumer.scala ---
@@ -250,33 +341,39 @@ private[kafka010] case class InternalKafkaConsumer(
       offset: Long,
       untilOffset: Long,
       pollTimeoutMs: Long,
-      failOnDataLoss: Boolean): ConsumerRecord[Array[Byte], Array[Byte]] = {
-    if (offset != nextOffsetInFetchedData || !fetchedData.hasNext()) {
-      // This is the first fetch, or the last pre-fetched data has been drained.
+      failOnDataLoss: Boolean): FetchedRecord = {
+    if (offset != fetchedData.nextOffsetInFetchedData) {
+      // This is the first fetch, or the fetched data has been reset.
       // Seek to the offset because we may call seekToBeginning or seekToEnd before this.
-      seek(offset)
-      poll(pollTimeoutMs)
-    }
-
-    if (!fetchedData.hasNext()) {
-      // We cannot fetch anything after `poll`. Two possible cases:
-      // - `offset` is out of range so that Kafka returns nothing. Just throw
-      //   `OffsetOutOfRangeException` to let the caller handle it.
-      // - Cannot fetch any data before timeout. TimeoutException will be thrown.
-      val range = getAvailableOffsetRange()
-      if (offset < range.earliest || offset >= range.latest) {
-        throw new OffsetOutOfRangeException(
-          Map(topicPartition -> java.lang.Long.valueOf(offset)).asJava)
+      poll(offset, pollTimeoutMs)
+    } else if (!fetchedData.hasNext) {
+      // The last pre-fetched data has been drained.
--- End diff ---
nit: I was confused about whether the above comment refers to the `else if` above it or to the `if` below it. Maybe inline it with the `else if`, or leave a blank line after it, before the `if` below (something like the sketch after this comment).
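
For illustration only, a minimal, self-contained Scala sketch of the two placements suggested above. The object, method, and boolean parameters here are hypothetical placeholders, not the actual fields of `KafkaDataConsumer`; only the comment placement is the point.

```scala
object CommentPlacementSketch {
  // Option 1: inline the comment with the `else if` line it describes.
  def inlined(offsetMatches: Boolean, hasNext: Boolean, followUp: Boolean): Unit = {
    if (!offsetMatches) {
      // This is the first fetch, or the fetched data has been reset.
    } else if (!hasNext) { // The last pre-fetched data has been drained.
      if (followUp) {
        // ...
      }
    }
  }

  // Option 2: keep the comment on its own line, but leave a blank line before
  // the nested `if` so the comment clearly describes the `else if` branch
  // rather than the `if` below it.
  def separated(offsetMatches: Boolean, hasNext: Boolean, followUp: Boolean): Unit = {
    if (!offsetMatches) {
      // This is the first fetch, or the fetched data has been reset.
    } else if (!hasNext) {
      // The last pre-fetched data has been drained.

      if (followUp) {
        // ...
      }
    }
  }
}
```

Either placement makes it unambiguous which statement the comment annotates.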
---