linar-jether commented on pull request #29719:
URL: https://github.com/apache/spark/pull/29719#issuecomment-948721250
@HyukjinKwon What do you mean by pseudo codes?

My initial snippet for pandas <-> Arrow <-> Spark conversions is in this gist (based on Spark 2.x): https://gist.github.com/linar-jether/7dd61ed6fa89098ab9c58a1ab428b2b5

This comment on the same gist converts directly from Arrow RecordBatches without going through pandas (works with Spark 3.x): https://gist.github.com/linar-jether/7dd61ed6fa89098ab9c58a1ab428b2b5#gistcomment-3452086

Basically, all of the logic for creating a DataFrame from Arrow `RecordBatches/Table` objects already exists in `PythonSQLUtils.toDataFrame`. This PR only integrates it into the main API and helps a bit with schemas and type conversions.
