Re: [PR] [SPARK-46686][PYTHON][CONNECT] Basic support of SparkSession based Python UDF profiler [spark]

via GitHub Wed, 17 Jan 2024 14:57:43 -0800


HyukjinKwon commented on code in PR #44697:
URL: https://github.com/apache/spark/pull/44697#discussion_r1456590439



##########
connector/connect/server/src/main/scala/org/apache/spark/sql/connect/service/SessionHolder.scala:
##########
@@ -371,6 +373,14 @@ case class SessionHolder(userId: String, sessionId: 
String, session: SparkSessio
   private[connect] def listListenerIds(): Seq[String] = {
     listenerCache.keySet().asScala.toSeq
   }
+
+  /**
+   * An accumulator for Python executors.
+   *
+   * The accumulated results will be sent to the Python client via 
observed_metrics message.
+   */
+  private[connect] val pythonAccumulator: Option[PythonAccumulator] =
+    Try(session.sparkContext.collectionAccumulator[Array[Byte]]).toOption

Review Comment:
   My concern is that regisgerting too many acumulators because calling this 
will create and register accumator for each session. Especially for Spark 
Connent, there could be a lot of Spark sessions



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Re: [PR] [SPARK-46686][PYTHON][CONNECT] Basic support of SparkSession based Python UDF profiler [spark]

Reply via email to