Re: [PR] [SPARK-48493][PYTHON] Enhance Python Datasource Reader with direct Arrow Batch support for improved performance [spark]

via GitHub Thu, 29 Aug 2024 11:30:07 -0700


allisonwang-db commented on code in PR #46826:
URL: https://github.com/apache/spark/pull/46826#discussion_r1736892224



##########
python/docs/source/user_guide/sql/python_data_source.rst:
##########
@@ -452,3 +452,67 @@ We can also use the same data source in streaming reader and writer
 .. code-block:: python
 
     query = spark.readStream.format("fake").load().writeStream.format("fake").start("/output_path")
+
+Python Datasource Reader with direct Arrow Batch support for improved performance
+-----------------------------------------------------------------------------------------
+The Python Datasource Reader now supports direct yielding of Arrow Batches, which can significantly improve data processing performance. By using the efficient Arrow format,

Review Comment:
   🎉 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

