viirya commented on a change in pull request #29729:
URL: https://github.com/apache/spark/pull/29729#discussion_r499168473
##########
File path:
external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaOffsetReader.scala
##########
@@ -46,39 +49,40 @@ import org.apache.spark.util.{UninterruptibleThread, UninterruptibleThreadRunner}
private[kafka010] class KafkaOffsetReader(
consumerStrategy: ConsumerStrategy,
val driverKafkaParams: ju.Map[String, Object],
- readerOptions: CaseInsensitiveMap[String],
- driverGroupIdPrefix: String) extends Logging {
+ readerOptions: CaseInsensitiveMap[String]) extends Logging {
/**
- * [[UninterruptibleThreadRunner]] ensures that all [[KafkaConsumer]] communication called in an
+ * [[UninterruptibleThreadRunner]] ensures that all Kafka communication called in an
 * [[UninterruptibleThread]]. In the case of streaming queries, we are already running in an
 * [[UninterruptibleThread]], however for batch mode this is not the case.
 */
val uninterruptibleThreadRunner = new UninterruptibleThreadRunner("Kafka Offset Reader")
- /**
- * Place [[groupId]] and [[nextId]] here so that they are initialized before any consumer is
- * created -- see SPARK-19564.
- */
- private var groupId: String = null
- private var nextId = 0
-
- /**
- * A KafkaConsumer used in the driver to query the latest Kafka offsets. This only queries the
- * offsets and never commits them.
- */
- @volatile protected var _consumer: Consumer[Array[Byte], Array[Byte]] = null
+ @volatile protected var _admin: Admin = null
- protected def consumer: Consumer[Array[Byte], Array[Byte]] = synchronized {
+ protected def admin: Admin = synchronized {
assert(Thread.currentThread().isInstanceOf[UninterruptibleThread])
- if (_consumer == null) {
- val newKafkaParams = new ju.HashMap[String, Object](driverKafkaParams)
- if (driverKafkaParams.get(ConsumerConfig.GROUP_ID_CONFIG) == null) {
- newKafkaParams.put(ConsumerConfig.GROUP_ID_CONFIG, nextGroupId())
Review comment:
Hmm, does this mean we can't use a group id at all? What impact could this have on our end users? Is this a breaking change?
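For background on why the group id disappears here (a sketch of general Kafka client behavior; the property values below are illustrative, not the PR's exact strings): a `KafkaConsumer` joins a consumer group, so the old Consumer-based driver reader needed a `group.id` (auto-generated from `driverGroupIdPrefix` when the user supplied none), while an `Admin` client queries topic metadata and offsets without ever joining a consumer group, so `group.id` has no effect on the driver-side reader:

```properties
# Before: Consumer-based driver reader required a group id
# (auto-generated via nextGroupId() when the user supplied none)
bootstrap.servers=broker1:9092        # illustrative
group.id=<driverGroupIdPrefix>-0      # illustrative generated id

# After: Admin-based reader never joins a consumer group,
# so no group.id is needed for driver-side offset queries
bootstrap.servers=broker1:9092        # illustrative
```

One way this could affect end users: Kafka ACLs can be granted per consumer group, so a deployment that authorized only a specific group id may behave differently once the driver stops presenting one.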
##########
File path:
external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaOffsetReader.scala
##########
@@ -563,12 +526,6 @@ private[kafka010] class KafkaOffsetReader(
&& !Thread.currentThread().isInterrupted) {
Thread.currentThread match {
case ut: UninterruptibleThread =>
- // "KafkaConsumer.poll" may hang forever if the thread is interrupted (E.g., the query
- // is stopped)(KAFKA-1894). Hence, we just make sure we don't interrupt it.
- //
- // If the broker addresses are wrong, or Kafka cluster is down, "KafkaConsumer.poll" may
- // hang forever as well. This cannot be resolved in KafkaSource until Kafka fixes the
- // issue.
Review comment:
Hmm, should we leave a comment for this too? At least the existing comment explains why we need the uninterruptible behavior here. Removing it will make the reason hard to understand in the future.
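For instance, a replacement comment along these lines (a sketch only — it assumes the Admin-based calls can still block indefinitely when brokers are unreachable) would preserve the rationale for future readers:

```scala
// All Kafka calls from the driver run in an UninterruptibleThread. Historically,
// "KafkaConsumer.poll" could hang forever if its thread was interrupted (e.g. when
// the query was stopped; see KAFKA-1894), and calls may also block indefinitely
// when the broker addresses are wrong or the Kafka cluster is down, so we make
// sure the thread is never interrupted mid-call.
```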