[GitHub] [spark] LucaCanali commented on issue #26953: [SPARK-30306][CORE][PYTHON] Instrument Python UDF execution time and throughput metrics using Spark Metrics system

GitBox Tue, 21 Apr 2020 12:34:06 -0700


LucaCanali commented on issue #26953:
URL: https://github.com/apache/spark/pull/26953#issuecomment-617369628



   I would not worry much about the performance impact of this additional 
instrumentation, as it hooks into something that is already not very fast, 
namely the JVM-Python serialization/deserialization. Moreover, the 
instrumentation mostly just takes timing values, and does so once per batch of 
serialized rows, so its impact on total throughput is expected to be even 
smaller. So far I have only tested this manually and did not observe any 
noticeable impact. If we have a Python UDF benchmark, I could test further 
with that.
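
As a rough illustration of why per-batch timing should be cheap, here is a minimal sketch in plain Python. It is not the code in this PR: `BatchTimer` and `apply_udf_in_batches` are hypothetical names, and the list slicing only stands in for the batches of serialized rows exchanged between the JVM and Python.

```python
import time


class BatchTimer:
    """Accumulates timing per batch, not per row (hypothetical helper)."""

    def __init__(self):
        self.batches = 0
        self.rows = 0
        self.elapsed_ns = 0

    def run(self, func, batch):
        # Two clock reads per batch, regardless of how many rows it holds.
        start = time.perf_counter_ns()
        result = [func(row) for row in batch]
        self.elapsed_ns += time.perf_counter_ns() - start
        self.batches += 1
        self.rows += len(batch)
        return result


def apply_udf_in_batches(func, rows, batch_size, timer):
    """Apply func to rows in fixed-size batches, timing each batch."""
    out = []
    for i in range(0, len(rows), batch_size):
        out.extend(timer.run(func, rows[i:i + batch_size]))
    return out


timer = BatchTimer()
result = apply_udf_in_batches(lambda x: x + 1, list(range(10)), 4, timer)
print(timer.batches, timer.rows)
```

With a batch size of N rows, the timing cost is paid once per N rows, so it shrinks relative to the work done as batches grow.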


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]



