linar-jether commented on a change in pull request #29719:
URL: https://github.com/apache/spark/pull/29719#discussion_r664480865



##########
File path: python/pyspark/sql/pandas/conversion.py
##########
@@ -297,8 +297,11 @@ class SparkConversionMixin(object):
     """
     Min-in for the conversion from pandas to Spark. Currently, only 
:class:`SparkSession`
     can use this class.
+    pandasRDD=True creates a DataFrame from an RDD of pandas dataframes
+    (currently only supported using arrow)

Review comment:
      Can we somehow define/get the element type of an RDD of Python objects 
without evaluating its first element?
   If not, then the RDD might contain any type of object, so the _pandasRDD_ 
option is used as a way to differentiate between initialization from a plain 
RDD and an RDD of pd.DataFrames.
   
   Thank you for reviewing! Please let me know if there's anything else I can 
do to get this merged.
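   To illustrate the point being discussed: RDDs are lazily evaluated and 
carry no element-type information, so the only way to learn what they contain 
is to compute at least one element (e.g. `rdd.first()` / `rdd.take(1)`, which 
triggers a Spark job). The following is a minimal sketch using a plain Python 
generator as a stand-in for an RDD of pandas DataFrames (no Spark required; 
the `lazy_partitions` name is purely illustrative, not part of the PR):

```python
import pandas as pd

def lazy_partitions():
    # Stand-in for an RDD of pandas DataFrames: like an RDD, nothing
    # here is computed (and no type is observable) until iteration.
    yield pd.DataFrame({"a": [1, 2]})
    yield pd.DataFrame({"a": [3, 4]})

rdd_like = lazy_partitions()

# The element type is unknowable without evaluating an element --
# the analogue of calling rdd.first(), which triggers a job.
first = next(rdd_like)
assert isinstance(first, pd.DataFrame)
```

This is why an explicit flag such as `pandasRDD=True` is one way to tell the 
builder what the RDD holds without forcing any evaluation up front.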




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


