[GitHub] [spark] LucaCanali commented on a change in pull request #33559: [SPARK-34265][PYTHON][SQL] Instrument Python UDFs using SQL metrics

GitBox Thu, 10 Feb 2022 02:52:36 -0800


LucaCanali commented on a change in pull request #33559:
URL: https://github.com/apache/spark/pull/33559#discussion_r803543721




##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/python/FlatMapGroupsInPandasExec.scala
##########
@@ -89,7 +89,8 @@ case class FlatMapGroupsInPandasExec(
         Array(argOffsets),

Review comment:
       Thanks @HyukjinKwon for looking into this.
   Unfortunately, the proposed solution/test of using `val localPythonMetrics = 
pythonMetrics` does not appear to work.
   Using a lazy val for the metrics appears to break many Python tests. In 
particular, in that case, when using pyspark and "going through rdd" as in 
`df_with_udf.rdd.collect()`, we get a java.lang.NullPointerException.
   I would not propose skipping the failing test in 
postgreSQL/udf-aggregates_part3.sql, but rather moving it to a Python test: see 
test_pandas_udf_nested in test_pandas_udf.py.
   However, if we can understand more clearly where this issue comes from, all 
the better.
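   
   For reference, a minimal sketch of the "going through rdd" path mentioned 
above. It assumes a local SparkSession and a trivial UDF; the names 
`collect_via_rdd` and `plus_one` are illustrative only and are not taken from 
the PR or its tests.

```python
def collect_via_rdd():
    """Run a Python UDF and collect its results through the RDD API."""
    # Imported lazily so the sketch can be loaded without a Spark install.
    from pyspark.sql import SparkSession
    from pyspark.sql.functions import udf
    from pyspark.sql.types import LongType

    spark = (
        SparkSession.builder
        .master("local[1]")
        .appName("udf-rdd-sketch")  # hypothetical app name
        .getOrCreate()
    )
    try:
        # Hypothetical UDF; any Python UDF should exercise the same code path.
        plus_one = udf(lambda x: x + 1, LongType())
        df_with_udf = spark.range(3).select(plus_one("id").alias("v"))
        # "Going through rdd": convert to an RDD of Rows, then collect.
        return [row.v for row in df_with_udf.rdd.collect()]
    finally:
        spark.stop()
```

   Calling `collect_via_rdd()` on a build with the lazy val change is the kind 
of call where the NullPointerException was seen; this sketch is not a verified 
reproduction of that failure.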
   




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


