HyukjinKwon commented on a change in pull request #22807: 
[SPARK-25811][PySpark] Raise a proper error when unsafe cast is detected by 
PyArrow
URL: https://github.com/apache/spark/pull/22807#discussion_r249627386
 
 

 ##########
 File path: docs/sql-migration-guide-upgrade.md
 ##########
 @@ -41,6 +41,54 @@ displayTitle: Spark SQL Upgrading Guide
 
   - Since Spark 3.0, JSON datasource and JSON function `schema_of_json` infer 
TimestampType from string values if they match to the pattern defined by the 
JSON option `timestampFormat`. Set JSON option `inferTimestamp` to `false` to 
disable such type inferring.
 
+  - In PySpark, when Arrow optimization is enabled, if Arrow version is higher 
than 0.11.0, Arrow can perform safe type conversion when converting 
Pandas.Series to Arrow array during serialization. Arrow will raise errors when 
detecting unsafe type conversion like overflow. Setting 
`spark.sql.execution.pandas.arrowSafeTypeConversion` to true can enable it. The 
default setting is false. PySpark's behavior for Arrow versions is illustrated 
in the table below:
+  <table class="table">
+        <tr>
+          <th>
+            <b>PyArrow version</b>
+          </th>
+          <th>
+            <b>Integer Overflow</b>
+          </th>
+          <th>
+            <b>Floating Point Truncation</b>
+          </th>
+        </tr>
+        <tr>
+          <th>
+            <b>version < 0.11.0</b>
+          </th>
+          <th>
+            <b>Raise error</b>
+          </th>
+          <th>
+            <b>Silently allows</b>
+          </th>
+        </tr>
+        <tr>
+          <th>
+            <b>version > 0.11.0, arrowSafeTypeConversion=false</b>
+          </th>
+          <th>
+            <b>Silent overflow</b>
+          </th>
+          <th>
+            <b>Silently allows</b>
+          </th>
+        </tr>
+        <tr>
+          <th>
+            <b>version > 0.11.0, arrowSafeTypeConversion=true</b>
 
 Review comment:
   Hmhmm .. yea .. Good point about consistency with the regular UDFs. I was 
thinking of 1. 
    removing such configurations out eventually (personally I don't like the 
bunch of configurations we have currently ..), 2. making all UDFs to expect the 
exact types (<- not sure yet it needs discussion). Yes, we can talk later ..

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to