[GitHub] [spark] linar-jether commented on a change in pull request #29719: [SPARK-32846][SQL][PYTHON] Support createDataFrame from an RDD of pd.DataFrames

GitBox Mon, 12 Jul 2021 13:04:22 -0700


linar-jether commented on a change in pull request #29719:
URL: https://github.com/apache/spark/pull/29719#discussion_r667841596




##########
File path: python/pyspark/sql/pandas/conversion.py
##########
@@ -297,8 +297,11 @@ class SparkConversionMixin(object):
     """
     Min-in for the conversion from pandas to Spark. Currently, only :class:`SparkSession`
     can use this class.
+    pandasRDD=True creates a DataFrame from an RDD of pandas dataframes
+    (currently only supported using arrow)

Review comment:
       I agree that this seems to fit well into `_inferSchema` and
`_createFromRDD`, although we would still need some way to distinguish an RDD
of pandas DataFrames from other element types when the user provides a schema
(and we don't want to peek at the first item).
   
   Do you think it would be better to move the pandas flag into `_createFromRDD`?
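
   To make the question concrete, here is a minimal sketch of why an explicit flag helps. It is not PySpark code: `choose_conversion_path`, the `pandas_rdd` keyword (standing in for the `pandasRDD` flag from this PR), and the returned labels are all hypothetical. The point is only that the branch can be chosen from the arguments alone, so no element of the RDD has to be inspected when a schema is given.

```python
from typing import Optional, Sequence


def choose_conversion_path(schema: Optional[Sequence[str]] = None,
                           pandas_rdd: bool = False) -> str:
    """Pick a conversion path without looking at any RDD element.

    Hypothetical helper, not PySpark API. `pandas_rdd` stands in for the
    `pandasRDD` flag discussed in this PR; the returned labels are made up.
    """
    if pandas_rdd:
        # The caller states that every element is a pandas DataFrame, so the
        # Arrow path can be taken whether or not a schema was supplied.
        return "pandas-arrow"
    if schema is None:
        # Row-like elements and no schema: inference has to sample the data.
        return "rows-infer-schema"
    # Row-like elements with a user schema: nothing needs to be inspected.
    return "rows-with-schema"
```

   With a flag like this, `choose_conversion_path(schema=["a", "b"], pandas_rdd=True)` selects the pandas path purely from the arguments, which is the case that type sniffing cannot handle without peeking at the first item.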




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


