Re: [PR] [SPARK-46686][PYTHON][CONNECT] Basic support of SparkSession based Python UDF profiler [spark]

via GitHub Wed, 17 Jan 2024 12:43:13 -0800


ueshin commented on code in PR #44697:
URL: https://github.com/apache/spark/pull/44697#discussion_r1456459431



##########
connector/connect/server/src/main/scala/org/apache/spark/sql/connect/service/SessionHolder.scala:
##########
@@ -371,6 +373,14 @@ case class SessionHolder(userId: String, sessionId: String, session: SparkSessio
   private[connect] def listListenerIds(): Seq[String] = {
     listenerCache.keySet().asScala.toSeq
   }
+
+  /**
+   * An accumulator for Python executors.
+   *
+   * The accumulated results will be sent to the Python client via observed_metrics message.
+   */
+  private[connect] val pythonAccumulator: Option[PythonAccumulator] =
+    Try(session.sparkContext.collectionAccumulator[Array[Byte]]).toOption

Review Comment:
   > if the profile is disabled, we shouldn't probably create this accumulator to avoid performance issue.
   
   It always needs the accumulator because:
   - it can't know whether or when the profiler will be enabled
   - it must support registered UDFs
   
   What kind of performance issue are you concerned about?
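   The reasoning above can be sketched in plain Scala. This is a minimal model under stated assumptions, not Spark's implementation: `ByteCollectionAccumulator`, `SessionState`, `profilerEnabled`, and `runUdf` are hypothetical names, and the sketch assumes profiling is gated by a flag read when a UDF executes rather than when it is registered. It does not model how accumulator updates reach the driver or the client via `observed_metrics`.

```scala
import java.util.concurrent.atomic.AtomicBoolean
import scala.collection.mutable.ArrayBuffer
import scala.util.Try

// Hypothetical stand-in for a collection accumulator (not Spark's class):
// a thread-safe, append-only list of serialized profile results.
final class ByteCollectionAccumulator {
  private val buf = ArrayBuffer.empty[Array[Byte]]
  def add(v: Array[Byte]): Unit = synchronized { buf += v; () }
  def value: List[Array[Byte]] = synchronized { buf.toList }
  def isZero: Boolean = synchronized { buf.isEmpty }
}

// Hypothetical session state. The accumulator is created once, eagerly,
// because the profiler may be switched on at any later point, and a UDF
// registered earlier may run after that. As with Try(...).toOption in the
// diff, a failure while creating it yields None instead of an exception.
final class SessionState(createAccumulator: () => ByteCollectionAccumulator) {
  val profilerEnabled = new AtomicBoolean(false)

  val pythonAccumulator: Option[ByteCollectionAccumulator] =
    Try(createAccumulator()).toOption

  // Stand-in for executing a UDF: profile data is only produced and added
  // when the flag is set at execution time, so an idle accumulator just
  // holds an empty buffer.
  def runUdf(input: Int): Int = {
    val result = input * 2
    if (profilerEnabled.get()) {
      pythonAccumulator.foreach(_.add(s"profile:$input".getBytes("UTF-8")))
    }
    result
  }
}
```

   In this model, flipping `profilerEnabled` after the session is created is enough for later UDF runs to start recording, which is why the accumulator cannot be created lazily on the profiler's say-so. Whether an always-present but empty accumulator has a measurable cost in Spark itself is the open question in the thread, and this sketch does not answer it.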



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
