[GitHub] [spark] linar-jether commented on pull request #29719: [SPARK-32846][SQL][PYTHON] Support createDataFrame from an RDD of pd.DataFrames

GitBox Thu, 21 Oct 2021 08:18:58 -0700


linar-jether commented on pull request #29719:
URL: https://github.com/apache/spark/pull/29719#issuecomment-948721250



   @HyukjinKwon What do you mean by pseudocode? My initial snippet for the pandas <-> Arrow <-> Spark conversions is here (based on Spark 2.x):
   https://gist.github.com/linar-jether/7dd61ed6fa89098ab9c58a1ab428b2b5
   
   And this gist comment shows how to convert directly from Arrow RecordBatches without going through pandas (works with Spark 3.x):
   https://gist.github.com/linar-jether/7dd61ed6fa89098ab9c58a1ab428b2b5#gistcomment-3452086
   
   Basically, all of the logic for creating a DataFrame from Arrow `RecordBatch`/`Table` objects already exists in `PythonSQLUtils.toDataFrame`; this PR only integrates it into the main API and helps a bit with schemas and type conversions.


