ianmcook commented on code in PR #46529:
URL: https://github.com/apache/spark/pull/46529#discussion_r1610529343
##########
examples/src/main/python/sql/arrow.py:
##########
@@ -33,20 +33,23 @@
require_minimum_pyarrow_version()
-def dataframe_to_arrow_table_example(spark: SparkSession) -> None:
-    import pyarrow as pa  # noqa: F401
-    from pyspark.sql.functions import rand
+def dataframe_to_from_arrow_table_example(spark: SparkSession) -> None:
+    import pyarrow as pa
+    import numpy as np
+
+    # Create a PyArrow Table
+    table = pa.table([pa.array(np.random.rand(100)) for i in range(3)],
+                     names=["a", "b", "c"])
-    # Create a Spark DataFrame
-    df = spark.range(100).drop("id").withColumns({"0": rand(), "1": rand(), "2": rand()})
+    # Create a Spark DataFrame from the PyArrow Table
+    df = spark.createDataFrame(table)
     # Convert the Spark DataFrame to a PyArrow Table
-    table = df.select("*").toArrow()
+    result_table = df.select("*").toArrow()
Review Comment:
I followed the pandas example (see below on line 69 of this same file). I
was wondering this too, but I kept it just to match the pandas example. I'm
happy to remove both if that would be better.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]