BryanCutler edited a comment on issue #22807: [SPARK-25811][PySpark] Raise a proper error when unsafe cast is detected by PyArrow
URL: https://github.com/apache/spark/pull/22807#issuecomment-454123632

Here is the behaviour of PySpark for integer overflow and floating point truncation:

_ | Integer Overflow
---|---
**PySpark Non-Arrow** | silent overflow
**with pyarrow version < 0.11.1** | raise "ArrowInvalid: Integer value out of bounds"
**version >= 0.11.1 safe=False** | silent overflow
**version >= 0.11.1 safe=True** | raise "ArrowInvalid: Integer value out of bounds"

_ | Floating Point Truncation
---|---
**PySpark Non-Arrow** | silent
**with pyarrow version < 0.11.1** | silent
**version >= 0.11.1 safe=False** | silent
**version >= 0.11.1 safe=True** | raise "ArrowInvalid: Floating point value truncated"

NOTE - pyarrow 0.11.1 incorrectly raises a floating point truncation error when a column has integers and NULL values, see https://issues.apache.org/jira/browse/ARROW-4258. This looks like it is going to be fixed in v0.12.0.
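To make the two `safe` rows concrete, here is a minimal pure-Python sketch that emulates the behaviour in the tables for a 32-bit integer target. It is not pyarrow code: `cast_to_int32` and `UnsafeCastError` are hypothetical names made up for illustration, with `UnsafeCastError` standing in for `ArrowInvalid`, and the 32-bit target type is an assumption.

```python
import struct

INT32_MIN, INT32_MAX = -2**31, 2**31 - 1


class UnsafeCastError(ValueError):
    """Hypothetical stand-in for pyarrow's ArrowInvalid."""


def cast_to_int32(value, safe=True):
    """Emulate a safe or unsafe cast of one Python number to int32."""
    if isinstance(value, float):
        # Dropping a fractional part is a truncation.
        if safe and value != int(value):
            raise UnsafeCastError("Floating point value truncated")
        value = int(value)  # unsafe path: truncate toward zero, silently

    if not INT32_MIN <= value <= INT32_MAX:
        if safe:
            raise UnsafeCastError("Integer value out of bounds")
        # Unsafe path: keep the low 32 bits and reinterpret them as signed,
        # which is the silent two's-complement wraparound.
        value = struct.unpack("<i", struct.pack("<I", value & 0xFFFFFFFF))[0]

    return value
```

With `safe=False` both problems pass silently (`2**31` wraps to a negative number, `1.5` becomes `1`); with `safe=True` each one raises, matching the last two rows of each table.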
