[GitHub] [kafka] hachikuji commented on a diff in pull request #11969: KAFKA-13649: Implement early.start.listeners and fix StandardAuthorizer loading [WIP]

GitBox Wed, 04 May 2022 16:23:17 -0700


hachikuji commented on code in PR #11969:
URL: https://github.com/apache/kafka/pull/11969#discussion_r865430367



##########
core/src/main/scala/kafka/network/SocketServer.scala:
##########
@@ -1864,6 +1780,18 @@ class ConnectionQuotas(config: KafkaConfig, time: Time, metrics: Metrics) extend
       sensor
     }
   }
+
+  /**
+   * Close `channel` and decrement the connection count.
+   */
+  def closeChannel(listenerName: ListenerName, channel: SocketChannel): Unit = {
+    if (channel != null) {
+      debug(s"Closing connection from ${channel.socket.getRemoteSocketAddress}")
+      dec(listenerName, channel.socket.getInetAddress)
+      closeSocket(channel, this)

Review Comment:
   It's surprising to find this in `ListenerConnectionQuota`, also that we end 
up with a different logger. Is there any way we can pull it back to where it 
was? For example, maybe we could generalize `closeSocket`:
   
   ```scala
     def closeSocket(
       listenerName: ListenerName,
       channel: SocketChannel,
       connectionQuota: ConnectionQuotas,
       logging: Logging
     ): Unit
   ```
   Or maybe we can stick it into a trait so that we can get rid of the 
`logging` parameter.



##########
core/src/main/scala/kafka/network/SocketServer.scala:
##########
@@ -681,24 +580,27 @@ private[kafka] abstract class Acceptor(val socketServer: SocketServer,
   private val blockedPercentMeter = newMeter(blockedPercentMeterMetricName,"blocked time", TimeUnit.NANOSECONDS)
   private var currentProcessorIndex = 0
   private[network] val throttledSockets = new mutable.PriorityQueue[DelayedCloseSocket]()
+  private var started = false
+  private[network] val startFuture = new CompletableFuture[Void]()
 
-  private[network] case class DelayedCloseSocket(socket: SocketChannel, endThrottleTimeMs: Long) extends Ordered[DelayedCloseSocket] {
-    override def compare(that: DelayedCloseSocket): Int = endThrottleTimeMs compare that.endThrottleTimeMs
-  }
+  val thread = KafkaThread.nonDaemon(
+    s"${threadPrefix()}-kafka-socket-acceptor-${endPoint.listenerName}-${endPoint.securityProtocol}-${endPoint.port}",
+    this)
 
-  private[network] def startProcessors(): Unit = synchronized {
-    if (!processorsStarted.getAndSet(true)) {
-      startProcessors(processors)
+  startFuture.thenRun(() => synchronized {
+    if (!shouldRun.get()) {
+      debug(s"Ignoring start future for ${endPoint.listenerName} since it has already been shut down.")

Review Comment:
   nit: "it" -> "the acceptor"?



##########
metadata/src/main/java/org/apache/kafka/controller/QuorumController.java:
##########
@@ -906,12 +906,35 @@ private void appendRaftEvent(String name, Runnable runnable) {
                 if (this != metaLogListener) {
                     log.debug("Ignoring {} raft event from an old registration", name);
                 } else {
-                    runnable.run();
+                    try {
+                        runnable.run();
+                    } finally {
+                        maybeCompleteAuthorizerInitialLoad();
+                    }
                 }
             });
         }
     }
 
+    private void maybeCompleteAuthorizerInitialLoad() {
+        if (!needToCompleteAuthorizerLoad) return;
+        OptionalLong highWatermark = raftClient.highWatermark();
+        if (highWatermark.isPresent()) {
+            if (lastCommittedOffset + 1 >= highWatermark.getAsLong()) {

Review Comment:
   I guess the only issue with this is that the high watermark is a moving 
target. It probably works ok since writes to the metadata log should be 
infrequent. Not sure I have any better ideas. Maybe we could refresh the high 
watermark value only once every second or something like that.
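The "refresh only once every second" idea could look roughly like the sketch below. This is a hypothetical helper, not code from the PR: the class name `CachedHighWatermark`, the injectable clock, and the refresh interval are all assumptions made for illustration. The only thing taken from the diff is the `lastCommittedOffset + 1 >= highWatermark` check and the `OptionalLong` return type of `raftClient.highWatermark()`.

```java
import java.util.OptionalLong;
import java.util.function.LongSupplier;
import java.util.function.Supplier;

/**
 * Hypothetical helper (not part of Kafka): caches a high watermark reading and
 * re-reads it from the source at most once per refresh interval.
 * Assumes single-threaded access, e.g. from one event-queue thread.
 */
final class CachedHighWatermark {
    private final Supplier<OptionalLong> source; // e.g. raftClient::highWatermark
    private final LongSupplier clockMs;          // injectable clock, eases testing
    private final long refreshIntervalMs;

    private OptionalLong cached = OptionalLong.empty();
    private long lastRefreshMs = 0L;
    private boolean everRefreshed = false;

    CachedHighWatermark(Supplier<OptionalLong> source,
                        LongSupplier clockMs,
                        long refreshIntervalMs) {
        this.source = source;
        this.clockMs = clockMs;
        this.refreshIntervalMs = refreshIntervalMs;
    }

    /** Returns the cached value, refreshing it if the interval has elapsed. */
    OptionalLong get() {
        long now = clockMs.getAsLong();
        if (!everRefreshed || now - lastRefreshMs >= refreshIntervalMs) {
            cached = source.get();
            lastRefreshMs = now;
            everRefreshed = true;
        }
        return cached;
    }

    /** Mirrors the check in the diff: lastCommittedOffset + 1 >= highWatermark. */
    boolean caughtUp(long lastCommittedOffset) {
        OptionalLong hw = get();
        return hw.isPresent() && lastCommittedOffset + 1 >= hw.getAsLong();
    }
}
```

With a fixed target between refreshes, the load check compares against a value that stays still for up to one interval, at the cost of possibly completing against a slightly stale high watermark. Whether that trade-off is acceptable here is a judgment call for the PR author.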



##########
core/src/main/scala/kafka/network/SocketServer.scala:
##########
@@ -104,184 +103,141 @@ class SocketServer(val config: KafkaConfig,
 
   private[this] val nextProcessorId: AtomicInteger = new AtomicInteger(0)
   val connectionQuotas = new ConnectionQuotas(config, time, metrics)
-  private var startedProcessingRequests = false
-  private var stoppedProcessingRequests = false
 
-  // Processors are now created by each Acceptor. However to preserve 
compatibility, we need to number the processors
-  // globally, so we keep the nextProcessorId counter in SocketServer
-  def nextProcessorId(): Int = {
-    nextProcessorId.getAndIncrement()
-  }
+  /**
+   * A future which is completed once all the authorizer futures are complete.
+   */
+  private val allAuthorizerFuturesComplete = new CompletableFuture[Void]
 
   /**
-   * Starts the socket server and creates all the Acceptors and the 
Processors. The Acceptors
-   * start listening at this stage so that the bound port is known when this 
method completes
-   * even when ephemeral ports are used. Acceptors and Processors are started 
if `startProcessingRequests`
-   * is true. If not, acceptors and processors are only started when 
[[kafka.network.SocketServer#startProcessingRequests()]]
-   * is invoked. Delayed starting of acceptors and processors is used to delay 
processing client
-   * connections until server is fully initialized, e.g. to ensure that all 
credentials have been
-   * loaded before authentications are performed. Incoming connections on this 
server are processed
-   * when processors start up and invoke 
[[org.apache.kafka.common.network.Selector#poll]].
-   *
-   * @param startProcessingRequests Flag indicating whether `Processor`s must 
be started.
-   * @param controlPlaneListener    The control plane listener, or None if 
there is none.
-   * @param dataPlaneListeners      The data plane listeners.
+   * True if the SocketServer is stopped. Must be accessed under the 
SocketServer lock.
    */
-  def startup(startProcessingRequests: Boolean = true,
-              controlPlaneListener: Option[EndPoint] = 
config.controlPlaneListener,
-              dataPlaneListeners: Seq[EndPoint] = config.dataPlaneListeners): 
Unit = {
-    this.synchronized {
-      createControlPlaneAcceptorAndProcessor(controlPlaneListener)
-      createDataPlaneAcceptorsAndProcessors(dataPlaneListeners)
-      if (startProcessingRequests) {
-        this.startProcessingRequests()
-      }
-    }
+  private var stopped = false
 
+  // Socket server metrics
+  newGauge(s"${DataPlaneAcceptor.MetricPrefix}NetworkProcessorAvgIdlePercent", () => SocketServer.this.synchronized {
     val dataPlaneProcessors = dataPlaneAcceptors.asScala.values.flatMap(a => 
a.processors)
-    val controlPlaneProcessorOpt = controlPlaneAcceptorOpt.map(a => 
a.processors(0))
-    
newGauge(s"${DataPlaneAcceptor.MetricPrefix}NetworkProcessorAvgIdlePercent", () 
=> SocketServer.this.synchronized {
-      val ioWaitRatioMetricNames = dataPlaneProcessors.map { p =>
-        metrics.metricName("io-wait-ratio", MetricsGroup, p.metricTags)
-      }
+    val ioWaitRatioMetricNames = dataPlaneProcessors.map { p =>
+      metrics.metricName("io-wait-ratio", MetricsGroup, p.metricTags)
+    }
+    if (dataPlaneProcessors.isEmpty) {
+      1.0
+    } else {
       ioWaitRatioMetricNames.map { metricName =>
         Option(metrics.metric(metricName)).fold(0.0)(m => 
Math.min(m.metricValue.asInstanceOf[Double], 1.0))
       }.sum / dataPlaneProcessors.size
-    })
-    
newGauge(s"${ControlPlaneAcceptor.MetricPrefix}NetworkProcessorAvgIdlePercent", 
() => SocketServer.this.synchronized {
-      val ioWaitRatioMetricName = controlPlaneProcessorOpt.map { p =>
-        metrics.metricName("io-wait-ratio", MetricsGroup, p.metricTags)
-      }
-      ioWaitRatioMetricName.map { metricName =>
-        Option(metrics.metric(metricName)).fold(0.0)(m => 
Math.min(m.metricValue.asInstanceOf[Double], 1.0))
-      }.getOrElse(Double.NaN)
-    })
-    newGauge("MemoryPoolAvailable", () => memoryPool.availableMemory)
-    newGauge("MemoryPoolUsed", () => memoryPool.size() - 
memoryPool.availableMemory)
-    
newGauge(s"${DataPlaneAcceptor.MetricPrefix}ExpiredConnectionsKilledCount", () 
=> SocketServer.this.synchronized {
-      val expiredConnectionsKilledCountMetricNames = dataPlaneProcessors.map { 
p =>
-        metrics.metricName("expired-connections-killed-count", MetricsGroup, 
p.metricTags)
-      }
-      expiredConnectionsKilledCountMetricNames.map { metricName =>
-        Option(metrics.metric(metricName)).fold(0.0)(m => 
m.metricValue.asInstanceOf[Double])
-      }.sum
-    })
-    
newGauge(s"${ControlPlaneAcceptor.MetricPrefix}ExpiredConnectionsKilledCount", 
() => SocketServer.this.synchronized {
-      val expiredConnectionsKilledCountMetricNames = 
controlPlaneProcessorOpt.map { p =>
-        metrics.metricName("expired-connections-killed-count", MetricsGroup, 
p.metricTags)
-      }
-      expiredConnectionsKilledCountMetricNames.map { metricName =>
-        Option(metrics.metric(metricName)).fold(0.0)(m => 
m.metricValue.asInstanceOf[Double])
-      }.getOrElse(0.0)
-    })
-  }
-
-  /**
-   * Start processing requests and new connections. This method is used for 
delayed starting of
-   * all the acceptors and processors if 
[[kafka.network.SocketServer#startup]] was invoked with
-   * `startProcessingRequests=false`.
-   *
-   * Before starting processors for each endpoint, we ensure that authorizer 
has all the metadata
-   * to authorize requests on that endpoint by waiting on the provided future. 
We start inter-broker
-   * listener before other listeners. This allows authorization metadata for 
other listeners to be
-   * stored in Kafka topics in this cluster.
-   *
-   * @param authorizerFutures Future per [[EndPoint]] used to wait before 
starting the processor
-   *                          corresponding to the [[EndPoint]]
-   */
-  def startProcessingRequests(authorizerFutures: Map[Endpoint, 
CompletableFuture[Void]] = Map.empty): Unit = {
-    info("Starting socket server acceptors and processors")
-    this.synchronized {
-      if (!startedProcessingRequests) {
-        startControlPlaneProcessorAndAcceptor(authorizerFutures)
-        startDataPlaneProcessorsAndAcceptors(authorizerFutures)
-        startedProcessingRequests = true
-      } else {
-        info("Socket server acceptors and processors already started")
-      }
     }
-    info("Started socket server acceptors and processors")
+  })
+  newGauge(s"${ControlPlaneAcceptor.MetricPrefix}NetworkProcessorAvgIdlePercent", () => SocketServer.this.synchronized {

Review Comment:
   Do we register this metric even in kraft?



##########
core/src/main/scala/kafka/server/BrokerServer.scala:
##########
@@ -356,10 +359,13 @@ class BrokerServer(
           endpoints.asScala.map(ep => ep.listenerName().orElse("(none)")).mkString(", "))
       }
       val authorizerInfo = ServerInfo(new ClusterResource(clusterId),
-        config.nodeId, endpoints, interBrokerListener)
+        config.nodeId,
+        endpoints,
+        interBrokerListener,
+        config.earlyStartListeners.map(_.value()).asJava)
 
-      /* Get the authorizer and initialize it if one is specified.*/
-      authorizer = config.authorizer
+      // Create and intiialize an authorizer if one is configured.

Review Comment:
   nit: typo "intiialize"



##########
core/src/main/scala/kafka/network/SocketServer.scala:
##########
@@ -1392,15 +1299,24 @@ private[kafka] class Processor(val id: Int,
   private[network] def channel(connectionId: String): Option[KafkaChannel] =
     Option(selector.channel(connectionId))
 
+  def start(): Unit = thread.start()
+
   /**
    * Wakeup the thread for selection.
    */
-  override def wakeup(): Unit = selector.wakeup()
+  def wakeup(): Unit = selector.wakeup()
+
+  def beginShutdown(): Unit = {
+    if (shouldRun.getAndSet(false)) {
+      wakeup()
+      removeMetric("IdlePercent", Map("networkProcessor" -> id.toString))
+      metrics.removeMetric(expiredConnectionsKilledCountMetricName)

Review Comment:
   I know the code already does this, but metric removal seems like something 
to do after shutdown is complete. Maybe we can do it in `close`?
   
   ```scala
     def close(): Unit = {
       try {
         beginShutdown()
         thread.join()
       } finally {
         removeMetric("IdlePercent", Map("networkProcessor" -> id.toString))
         metrics.removeMetric(expiredConnectionsKilledCountMetricName)
       }
     }
   ```



##########
core/src/main/scala/kafka/network/SocketServer.scala:
##########
@@ -104,184 +103,141 @@ class SocketServer(val config: KafkaConfig,
 
   private[this] val nextProcessorId: AtomicInteger = new AtomicInteger(0)
   val connectionQuotas = new ConnectionQuotas(config, time, metrics)
-  private var startedProcessingRequests = false
-  private var stoppedProcessingRequests = false
 
-  // Processors are now created by each Acceptor. However to preserve 
compatibility, we need to number the processors
-  // globally, so we keep the nextProcessorId counter in SocketServer
-  def nextProcessorId(): Int = {
-    nextProcessorId.getAndIncrement()
-  }
+  /**
+   * A future which is completed once all the authorizer futures are complete.
+   */
+  private val allAuthorizerFuturesComplete = new CompletableFuture[Void]
 
   /**
-   * Starts the socket server and creates all the Acceptors and the 
Processors. The Acceptors
-   * start listening at this stage so that the bound port is known when this 
method completes
-   * even when ephemeral ports are used. Acceptors and Processors are started 
if `startProcessingRequests`
-   * is true. If not, acceptors and processors are only started when 
[[kafka.network.SocketServer#startProcessingRequests()]]
-   * is invoked. Delayed starting of acceptors and processors is used to delay 
processing client
-   * connections until server is fully initialized, e.g. to ensure that all 
credentials have been
-   * loaded before authentications are performed. Incoming connections on this 
server are processed
-   * when processors start up and invoke 
[[org.apache.kafka.common.network.Selector#poll]].
-   *
-   * @param startProcessingRequests Flag indicating whether `Processor`s must 
be started.
-   * @param controlPlaneListener    The control plane listener, or None if 
there is none.
-   * @param dataPlaneListeners      The data plane listeners.
+   * True if the SocketServer is stopped. Must be accessed under the 
SocketServer lock.
    */
-  def startup(startProcessingRequests: Boolean = true,
-              controlPlaneListener: Option[EndPoint] = 
config.controlPlaneListener,
-              dataPlaneListeners: Seq[EndPoint] = config.dataPlaneListeners): 
Unit = {
-    this.synchronized {
-      createControlPlaneAcceptorAndProcessor(controlPlaneListener)
-      createDataPlaneAcceptorsAndProcessors(dataPlaneListeners)
-      if (startProcessingRequests) {
-        this.startProcessingRequests()
-      }
-    }
+  private var stopped = false
 
+  // Socket server metrics
+  newGauge(s"${DataPlaneAcceptor.MetricPrefix}NetworkProcessorAvgIdlePercent", () => SocketServer.this.synchronized {

Review Comment:
   The synchronization in these gauges seems less than ideal. Is it worth 
filing a jira to come up with a better approach?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
