gaborgsomogyi commented on a change in pull request #29729:
URL: https://github.com/apache/spark/pull/29729#discussion_r500102479



##########
File path: external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaOffsetReader.scala
##########
@@ -46,39 +49,40 @@ import org.apache.spark.util.{UninterruptibleThread, UninterruptibleThreadRunner
 private[kafka010] class KafkaOffsetReader(
     consumerStrategy: ConsumerStrategy,
     val driverKafkaParams: ju.Map[String, Object],
-    readerOptions: CaseInsensitiveMap[String],
-    driverGroupIdPrefix: String) extends Logging {
+    readerOptions: CaseInsensitiveMap[String]) extends Logging {
 
   /**
-   * [[UninterruptibleThreadRunner]] ensures that all [[KafkaConsumer]] communication called in an
+   * [[UninterruptibleThreadRunner]] ensures that all Kafka communication called in an
    * [[UninterruptibleThread]]. In the case of streaming queries, we are already running in an
    * [[UninterruptibleThread]], however for batch mode this is not the case.
    */
  val uninterruptibleThreadRunner = new UninterruptibleThreadRunner("Kafka Offset Reader")
 
-  /**
-   * Place [[groupId]] and [[nextId]] here so that they are initialized before any consumer is
-   * created -- see SPARK-19564.
-   */
-  private var groupId: String = null
-  private var nextId = 0
-
-  /**
-   * A KafkaConsumer used in the driver to query the latest Kafka offsets. This only queries the
-   * offsets and never commits them.
-   */
-  @volatile protected var _consumer: Consumer[Array[Byte], Array[Byte]] = null
+  @volatile protected var _admin: Admin = null
 
-  protected def consumer: Consumer[Array[Byte], Array[Byte]] = synchronized {
+  protected def admin: Admin = synchronized {
     assert(Thread.currentThread().isInstanceOf[UninterruptibleThread])
-    if (_consumer == null) {
-      val newKafkaParams = new ju.HashMap[String, Object](driverKafkaParams)
-      if (driverKafkaParams.get(ConsumerConfig.GROUP_ID_CONFIG) == null) {
-        newKafkaParams.put(ConsumerConfig.GROUP_ID_CONFIG, nextGroupId())

Review comment:
       > Is it easy to switch from group.id authorization to topic based authorization?
   
   It's easy: one needs to use topic based ACLs on the broker.
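
For illustration, a topic based ACL can be created programmatically through the Kafka `Admin` API as well as via the CLI. The sketch below is not part of this PR; the bootstrap server, topic name, and `User:spark` principal are placeholders, and it assumes a broker with an authorizer configured:

```scala
import java.util.{Collections, Properties}

import org.apache.kafka.clients.admin.{Admin, AdminClientConfig}
import org.apache.kafka.common.acl.{AccessControlEntry, AclBinding, AclOperation, AclPermissionType}
import org.apache.kafka.common.resource.{PatternType, ResourcePattern, ResourceType}

object TopicAclSketch {
  def main(args: Array[String]): Unit = {
    val props = new Properties()
    // Placeholder broker address.
    props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092")
    val admin = Admin.create(props)
    try {
      // Allow a hypothetical principal to Read the topic directly,
      // instead of authorizing it through a consumer group resource.
      val binding = new AclBinding(
        new ResourcePattern(ResourceType.TOPIC, "my-topic", PatternType.LITERAL),
        new AccessControlEntry("User:spark", "*", AclOperation.READ, AclPermissionType.ALLOW))
      admin.createAcls(Collections.singleton(binding)).all().get()
    } finally admin.close()
  }
}
```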
   
   > Is there possible use case that topic based authorization cannot fully replace group.id authorization?
   
   I can't really come up with such a use case, but that doesn't mean none exists. One thing is for sure: with the current Kafka API, after trying many other possibilities, this one is the best (I spent a net month or so on this). The price is that `group.id` based authorization must be converted to topic based authorization.
   




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]


