dtenedor commented on code in PR #44678:
URL: https://github.com/apache/spark/pull/44678#discussion_r1450991957
##########
python/pyspark/sql/udtf.py:
##########
@@ -133,12 +133,28 @@ class AnalyzeResult:
        If non-empty, this is a sequence of expressions that the UDTF is
        specifying for Catalyst to sort the input TABLE argument by. Note
        that the 'partitionBy' list must also be non-empty in this case.
+    acquireExecutionMemoryMbRequested: long
Review Comment:
Good question; this should just indicate the maximum memory that the Python UDTF
could consume, regardless of the Spark cluster size. The subsequent call to
`TaskMemoryManager.acquireExecutionMemory` will then fail if the cluster does not
have enough memory (possibly after asking existing operators to shrink their
own memory usage, e.g. by spilling to disk).

Best practices: in general, the Spark cluster owner should endeavor to
provide enough executor memory to accommodate requests of this size so as to
support these functions. By the same token, UDTF developers should aim to keep
requested (and actual) memory usage low enough to fit in common Spark cluster
configurations. In most cases this means at most a few hundred megabytes for a
single function.
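
To make this concrete, here is a minimal sketch of how a UDTF author might request execution memory up front, assuming the `acquireExecutionMemoryMbRequested` field lands on `AnalyzeResult` as proposed in this diff (the `BufferingUdtf` class, its output schema, and the 256 MB figure are illustrative only, not part of this PR):

```python
from pyspark.sql.functions import udtf
from pyspark.sql.types import Row, StructType
from pyspark.sql.udtf import AnalyzeArgument, AnalyzeResult


@udtf
class BufferingUdtf:  # hypothetical example UDTF
    @staticmethod
    def analyze(table_arg: AnalyzeArgument) -> AnalyzeResult:
        # Request 256 MB up front: the maximum this UDTF could consume,
        # independent of the size of the Spark cluster it runs on.
        return AnalyzeResult(
            schema=StructType().add("result", "string"),
            acquireExecutionMemoryMbRequested=256,  # field proposed in this PR
        )

    def eval(self, row: Row):
        # Placeholder logic standing in for work that buffers rows in memory.
        yield (str(row),)
```

Keeping the requested figure in the low hundreds of megabytes, per the guidance above, gives the function the best chance of fitting on common executor configurations.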