[GitHub] [spark] holdenk commented on a change in pull request #29719: [SPARK-32846][SQL][PYTHON] Support createDataFrame from an RDD of pd.DataFrames

GitBox Wed, 28 Jul 2021 10:56:10 -0700


holdenk commented on a change in pull request #29719:
URL: https://github.com/apache/spark/pull/29719#discussion_r678531129




##########
File path: python/pyspark/sql/pandas/conversion.py
##########
@@ -297,8 +297,11 @@ class SparkConversionMixin(object):
     """
     Min-in for the conversion from pandas to Spark. Currently, only 
:class:`SparkSession`
     can use this class.
+    pandasRDD=True creates a DataFrame from an RDD of pandas dataframes
+    (currently only supported using arrow)

Review comment:
       Let's say the user specifies a schema. In that case, inside 
_createFromRDD we can just look at the type of each element that we're 
processing, see whether it's a DataFrame, a Row, or a dictionary, and dispatch 
the logic there. What do you think? Or is there a reason I'm missing why we 
couldn't do the dispatch inside _createFromRDD based on type?
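
A rough sketch of the type-based dispatch suggested above, only to illustrate 
the idea. The helper name `element_kind` and the stand-in `Row` class are 
hypothetical and not part of the PR; the real `pyspark.sql.Row` is also a 
tuple subclass, and the stand-in is used so the snippet runs without Spark.

```python
import pandas as pd


class Row(tuple):
    """Stand-in for pyspark.sql.Row (assumption: used only for this sketch)."""


def element_kind(element):
    """Name the conversion path a single RDD element would take.

    Hypothetical helper: the actual _createFromRDD may be structured differently.
    """
    if isinstance(element, pd.DataFrame):
        return "pandas"  # would go through the Arrow-based pandas path
    if isinstance(element, Row):
        return "row"     # existing Row handling
    if isinstance(element, dict):
        return "dict"    # existing dictionary handling
    raise TypeError("unsupported element type: %s" % type(element).__name__)


# Inspect a few sample elements, as one might inspect the first element of an RDD.
sample = [pd.DataFrame({"a": [1, 2]}), Row((1, "x")), {"a": 1}]
kinds = [element_kind(e) for e in sample]
```

With a check like this, the caller would not need a separate pandasRDD flag; 
the element type alone selects the path, and an unsupported type fails early 
with a TypeError.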






