LucaCanali commented on a change in pull request #33559:
URL: https://github.com/apache/spark/pull/33559#discussion_r803543721
##########
File path:
sql/core/src/main/scala/org/apache/spark/sql/execution/python/FlatMapGroupsInPandasExec.scala
##########
@@ -89,7 +89,8 @@ case class FlatMapGroupsInPandasExec(
Array(argOffsets),
Review comment:
Thanks @HyukjinKwon for looking into this.
Unfortunately, the proposed solution/test of using
`val localPythonMetrics = pythonMetrics` does not appear to work.
Using a `lazy val` for the metrics appears to break many Python tests. In
particular, in that case, when using PySpark and "going through rdd", as in
`df_with_udf.rdd.collect()`, we get a `java.lang.NullPointerException`.
I would not propose skipping the failing test in
postgreSQL/udf-aggregates_part3.sql, but rather moving it to a Python test: see
`test_pandas_udf_nested` in test_pandas_udf.py.
However, if we can understand more clearly where this issue comes from, all
the better.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]