BryanCutler commented on a change in pull request #24095: [SPARK-27163][PYTHON] Cleanup and consolidate Pandas UDF functionality
URL: https://github.com/apache/spark/pull/24095#discussion_r266689897
 
 

 ##########
 File path: python/pyspark/sql/session.py
 ##########
 @@ -530,15 +530,24 @@ def _create_from_pandas_with_arrow(self, pdf, schema, timezone):
         to Arrow data, then sending to the JVM to parallelize. If a schema is passed in, the
         data types will be used to coerce the data in Pandas to Arrow conversion.
         """
-        from pyspark.serializers import ArrowStreamSerializer, _create_batch
-        from pyspark.sql.types import from_arrow_schema, to_arrow_type, TimestampType
+        from pyspark.serializers import ArrowStreamPandasSerializer
+        from pyspark.sql.types import from_arrow_type, to_arrow_type, TimestampType
         from pyspark.sql.utils import require_minimum_pandas_version, \
             require_minimum_pyarrow_version
 
         require_minimum_pandas_version()
         require_minimum_pyarrow_version()
 
         from pandas.api.types import is_datetime64_dtype, is_datetime64tz_dtype
+        import pyarrow as pa
+
+        # Create the Spark schema from list of names passed in with Arrow types
+        if isinstance(schema, (list, tuple)):
+            arrow_schema = pa.Schema.from_pandas(pdf, preserve_index=False)
 
 Review comment:
   `pa.Schema.from_pandas` is only available since pyarrow 0.12.0. I can look into a workaround, although it might be a good time to bump the minimum pyarrow version.
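   For reference, a minimal sketch of one possible fallback for pre-0.12.0 pyarrow, taking the schema off a converted `pa.Table` (which predates 0.12.0) instead of calling `pa.Schema.from_pandas` directly. The `pdf` DataFrame here is illustrative, and this is a sketch of a workaround, not the PR's final fix:

   ```python
   import pandas as pd
   import pyarrow as pa

   # Illustrative input; in session.py this is the user-supplied pandas DataFrame
   pdf = pd.DataFrame({"a": [1, 2], "b": ["x", "y"]})

   if hasattr(pa.Schema, "from_pandas"):
       # pyarrow >= 0.12.0: infer the Arrow schema directly from the DataFrame
       arrow_schema = pa.Schema.from_pandas(pdf, preserve_index=False)
   else:
       # Older pyarrow: build a Table and take its schema instead
       arrow_schema = pa.Table.from_pandas(pdf, preserve_index=False).schema

   print(arrow_schema)
   ```

   The `hasattr` check avoids hard-coding a version comparison, at the cost of converting the full DataFrame on old releases just to obtain the schema.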
