dtenedor commented on code in PR #44678:
URL: https://github.com/apache/spark/pull/44678#discussion_r1450991957
##########
python/pyspark/sql/udtf.py:
##########
@@ -133,12 +133,28 @@ class AnalyzeResult:
        If non-empty, this is a sequence of expressions that the UDTF is
        specifying for Catalyst to sort the input TABLE argument by. Note
        that the 'partitionBy' list must also be non-empty in this case.
+    acquireExecutionMemoryMbRequested: long
Review Comment:
Good question; this should just indicate the maximum memory that the Python UDTF
could consume, regardless of the Spark cluster size. The subsequent call to
`TaskMemoryManager.acquireExecutionMemory` will then fail if the cluster does not
have enough memory (possibly after asking existing operators to shrink their
own memory usage, e.g. by spilling to disk).

Best practices: in general, the Spark cluster owner should endeavor to
provide enough executor memory to accommodate requests of this size so as to
support these functions. By the same token, UDTF developers should aim to keep
requested (and actual) memory usage low enough to fit in common Spark cluster
configurations. In most cases this means at most a few hundred megabytes for a
single function.
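
To make this concrete, here is a minimal sketch of how a UDTF author might request execution memory up front, assuming the `acquireExecutionMemoryMbRequested` field lands on `AnalyzeResult` as proposed in this diff (the `BufferingUdtf` class, its output schema, and the 256 MB figure are illustrative only, not part of this PR):

```python
from pyspark.sql.functions import udtf
from pyspark.sql.types import Row, StructType
from pyspark.sql.udtf import AnalyzeArgument, AnalyzeResult


@udtf
class BufferingUdtf:  # hypothetical example UDTF
    @staticmethod
    def analyze(table_arg: AnalyzeArgument) -> AnalyzeResult:
        # Request 256 MB up front: the maximum this UDTF could consume,
        # independent of the size of the Spark cluster it runs on.
        return AnalyzeResult(
            schema=StructType().add("result", "string"),
            acquireExecutionMemoryMbRequested=256,  # field proposed in this PR
        )

    def eval(self, row: Row):
        # Placeholder logic standing in for work that buffers rows in memory.
        yield (str(row),)
```

Keeping the requested figure in the low hundreds of megabytes, per the guidance above, gives the function the best chance of fitting on common executor configurations.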