Github user tdas commented on a diff in the pull request:
https://github.com/apache/spark/pull/22042#discussion_r211801632
--- Diff:
external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaDataConsumer.scala
---
@@ -346,11 +437,40 @@ private[kafka010] case class InternalKafkaConsumer(
consumer.seek(topicPartition, offset)
}
- private def poll(pollTimeoutMs: Long): Unit = {
+ /**
+ * Poll messages from Kafka starting from `offset` and set `fetchedData`
and `offsetAfterPoll`.
+ * `fetchedData` may be empty if the Kafka fetches some messages but all
of them are not visible
+ * messages (either transaction messages, or aborted messages when
`isolation.level` is
+ * `read_committed`).
+ *
+ * @throws OffsetOutOfRangeException if `offset` is out of range.
+ * @throws TimeoutException if the consumer position is not changed
after polling. It means the
+ * consumer polls nothing before timeout.
+ */
+ private def poll(offset: Long, pollTimeoutMs: Long): Unit = {
--- End diff --
Maybe rename this method to be consistent with that it does .... fetch
data.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]