[GitHub] [spark] xinrong-meng opened a new pull request, #42079: [WIP][SPARK-44486][PYTHON][CONNECT] Implement PyArrow `self_destruct` feature for `toPandas`

via GitHub Wed, 19 Jul 2023 17:10:10 -0700


xinrong-meng opened a new pull request, #42079:
URL: https://github.com/apache/spark/pull/42079


   ### What changes were proposed in this pull request?
   Implement PyArrow's `self_destruct` option for `toPandas` in Spark Connect to reduce memory usage.
   
   Setting the Spark configuration `spark.sql.execution.arrow.pyspark.selfDestruct.enabled` now enables PyArrow's `self_destruct` feature in Spark Connect. This can reduce peak memory when creating a pandas DataFrame via `toPandas`, because Arrow-allocated memory is freed while the pandas DataFrame is being built.
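
A minimal sketch of how the configuration could map onto PyArrow conversion options. It is not the code in this PR. The helper name `to_pandas_options` and the plain `dict` standing in for the Spark conf are hypothetical. The three keyword arguments are real parameters of `pyarrow.Table.to_pandas`; pairing `self_destruct` with `split_blocks=True` and `use_threads=False` is assumed here because that combination is what lets columns be released one at a time.

```python
# Configuration key named in this PR description.
SELF_DESTRUCT_CONF = "spark.sql.execution.arrow.pyspark.selfDestruct.enabled"


def to_pandas_options(conf: dict) -> dict:
    """Hypothetical helper: turn the conf value into to_pandas keyword arguments."""
    # Spark conf values are strings, so compare case-insensitively against "true".
    enabled = str(conf.get(SELF_DESTRUCT_CONF, "false")).lower() == "true"
    if not enabled:
        # Default conversion: the Arrow table stays intact after the call.
        return {}
    return {
        "self_destruct": True,   # release Arrow buffers during conversion
        "split_blocks": True,    # one pandas block per column, avoids consolidation copies
        "use_threads": False,    # convert columns sequentially
    }


# Usage sketch (needs pyarrow and pandas; `batches` is an assumed list of RecordBatch):
#   table = pyarrow.Table.from_batches(batches)
#   pdf = table.to_pandas(**to_pandas_options(conf))
#   del table  # the table must not be touched after a self-destructing conversion
```

With the option off, the helper returns an empty dict and the conversion behaves as before. PyArrow documents `self_destruct` as experimental and notes that memory savings are not guaranteed in every case.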
   
   ### Why are the changes needed?
   Reach parity with vanilla PySpark.
   
   ### Does this PR introduce _any_ user-facing change?
   No.
   
   ### How was this patch tested?
   TBD


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


