dvogelbacher opened a new pull request #25602: [SPARK-28613][SQL] Add config 
option for limiting uncompressed result size in SQL
URL: https://github.com/apache/spark/pull/25602
 
 
   ### What changes were proposed in this pull request?
This PR adds a new config option 
`spark.sql.driver.maxUncompressedResultSize` (defaulting to empty, which 
preserves the behavior from before this PR).
   If this config option is set, the size of the uncompressed, decoded 
result of SQL actions (e.g., `collect`) is limited to its value.
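   A minimal sketch of how a user might set this option (config sketch only; the key comes from this PR, while the `"2g"` size-string value and the exact enforcement behavior are assumptions based on Spark's usual conventions):

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical usage: cap the uncompressed, decoded result at 2 GiB.
// The "2g" byte-size syntax is assumed to follow Spark's standard
// size-string format; adjust to whatever the final PR accepts.
val spark = SparkSession.builder()
  .appName("max-uncompressed-result-demo")
  .config("spark.sql.driver.maxUncompressedResultSize", "2g")
  .getOrCreate()

// A collect() whose decoded result exceeds the limit would then fail
// with an error instead of risking a driver OOM.
val rows = spark.range(0, 1000).toDF("id").collect()
```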
   
   ### Why are the changes needed?
The main problem with the existing `spark.driver.maxResultSize` is that it 
only enforces a limit on the size of the compressed data (the compressed byte 
RDD). The actual uncompressed size can be much larger. Thus, 
`spark.driver.maxResultSize` is not a good mechanism for protecting the driver 
against OOMs when using Spark SQL.
   
   Adding this new config option provides an additional and better way to 
protect the driver against OOMs during collects.
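   The gap between compressed and uncompressed size can be illustrated with plain JDK zlib (an illustrative sketch, not Spark code): highly repetitive data compresses extremely well, so a limit checked against the compressed bytes says little about the memory the decoded result will need on the driver.

```scala
import java.util.zip.Deflater

// 1 MiB of identical bytes -- the kind of repetitive data that makes
// the compressed size a poor proxy for the uncompressed size.
val uncompressed = Array.fill[Byte](1 << 20)(0)

val deflater = new Deflater()
deflater.setInput(uncompressed)
deflater.finish()
val buf = new Array[Byte](1 << 20)
val compressedLen = deflater.deflate(buf) // bytes actually written
deflater.end()

// The compressed form is orders of magnitude smaller than 1 MiB, so a
// cap enforced on it (as spark.driver.maxResultSize does) can wildly
// underestimate the driver memory needed after decoding.
println(s"uncompressed: ${uncompressed.length} bytes, compressed: $compressedLen bytes")
```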
   
   ### Does this PR introduce any user-facing change?
   No. 
   
   
   ### How was this patch tested?
   I added a new unit test in `SparkPlanSuite.scala`.
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]