[GitHub] BryanCutler commented on a change in pull request #22807: [SPARK-25811][PySpark] Raise a proper error when unsafe cast is detected by PyArrow

GitBox Wed, 09 Jan 2019 10:06:38 -0800

BryanCutler commented on a change in pull request #22807: 
[SPARK-25811][PySpark] Raise a proper error when unsafe cast is detected by 
PyArrow
URL: https://github.com/apache/spark/pull/22807#discussion_r246484375


 ##########
 File path: python/pyspark/sql/tests/test_pandas_udf_scalar.py
 ##########
 @@ -138,36 +138,44 @@ def test_vectorized_udf_null_boolean(self):
         self.assertEquals(df.collect(), res.collect())
 
     def test_vectorized_udf_null_byte(self):
-        data = [(None,), (2,), (3,), (4,)]
-        schema = StructType().add("byte", ByteType())
-        df = self.spark.createDataFrame(data, schema)
-        byte_f = pandas_udf(lambda x: x, ByteType())
-        res = df.select(byte_f(col('byte')))
-        self.assertEquals(df.collect(), res.collect())
+        with self.sql_conf({
+                "spark.sql.execution.pandas.arrowSafeTypeConversion": False}):
+            data = [(None,), (2,), (3,), (4,)]
+            schema = StructType().add("byte", ByteType())
+            df = self.spark.createDataFrame(data, schema)
+            byte_f = pandas_udf(lambda x: x, ByteType())
+            res = df.select(byte_f(col('byte')))
+            self.assertEquals(df.collect(), res.collect())
 
     def test_vectorized_udf_null_short(self):
-        data = [(None,), (2,), (3,), (4,)]
-        schema = StructType().add("short", ShortType())
-        df = self.spark.createDataFrame(data, schema)
-        short_f = pandas_udf(lambda x: x, ShortType())
-        res = df.select(short_f(col('short')))
-        self.assertEquals(df.collect(), res.collect())
+        with self.sql_conf({
+                "spark.sql.execution.pandas.arrowSafeTypeConversion": False}):
+            data = [(None,), (2,), (3,), (4,)]
+            schema = StructType().add("short", ShortType())
+            df = self.spark.createDataFrame(data, schema)
+            short_f = pandas_udf(lambda x: x, ShortType())
+            res = df.select(short_f(col('short')))
+            self.assertEquals(df.collect(), res.collect())
 
     def test_vectorized_udf_null_int(self):
-        data = [(None,), (2,), (3,), (4,)]
-        schema = StructType().add("int", IntegerType())
-        df = self.spark.createDataFrame(data, schema)
-        int_f = pandas_udf(lambda x: x, IntegerType())
-        res = df.select(int_f(col('int')))
-        self.assertEquals(df.collect(), res.collect())
+        with self.sql_conf({
+                "spark.sql.execution.pandas.arrowSafeTypeConversion": False}):
 
 Review comment:
   I see, it's because of the NULL values
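
One plausible reading of the comment, sketched as a toy model and not as the actual PySpark or PyArrow code path: each test column contains a `None`, and pandas with default dtypes stores an integer column that has missing values as `float64` with `NaN`. A cast that insists on exact integer values would then reject the `NaN`, which would explain why these tests turn `arrowSafeTypeConversion` off. The helper names below (`to_float_column`, `safe_cast_to_int`, `masked_cast_to_int`) are hypothetical and exist only for illustration.

```python
import math


def to_float_column(values):
    """Model of a nullable integer column after conversion to pandas:
    None becomes NaN, so every value is promoted to float."""
    return [math.nan if v is None else float(v) for v in values]


def safe_cast_to_int(values):
    """Hypothetical stand-in for a 'safe' cast: refuse any value that
    an integer cannot represent exactly (NaN, infinity, fractions)."""
    out = []
    for v in values:
        if not math.isfinite(v) or v != int(v):
            raise ValueError(f"unsafe cast: {v!r} is not an exact integer")
        out.append(int(v))
    return out


def masked_cast_to_int(values, null_mask):
    """Hypothetical stand-in for the relaxed path: masked slots stay
    null and are never inspected, so NaN does not trigger an error."""
    return [None if is_null else int(v)
            for v, is_null in zip(values, null_mask)]


# Same shape as the test data in the diff: one null followed by 2, 3, 4.
data = [None, 2, 3, 4]
column = to_float_column(data)
null_mask = [v is None for v in data]

try:
    safe_cast_to_int(column)
    safe_cast_failed = False
except ValueError:
    safe_cast_failed = True

round_tripped = masked_cast_to_int(column, null_mask)
```

In this model the strict cast fails on the first element while the masked cast returns the original values with the null preserved, which is the round trip the tests assert with `df.collect()` and `res.collect()`.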
