Github user cloud-fan commented on a diff in the pull request:
https://github.com/apache/spark/pull/22408#discussion_r218295350
--- Diff: docs/sql-programming-guide.md ---
@@ -1879,6 +1879,80 @@ working with timestamps in `pandas_udf`s to get the
best performance, see
## Upgrading From Spark SQL 2.3 to 2.4
+ - In Spark version 2.3 and earlier, the second parameter to the
`array_contains` function is implicitly promoted to the element type of the first
array-type parameter. This type promotion can be lossy and may cause the
`array_contains` function to return a wrong result. This problem has been
addressed in 2.4 by employing a safer type promotion mechanism. This can cause
some changes in behavior, which are illustrated in the table below.
+ <table class="table">
+ <tr>
+ <th>
+ <b>Query</b>
+ </th>
+ <th>
+ <b>Result in Spark 2.3 or Earlier</b>
+ </th>
+ <th>
+ <b>Result in Spark 2.4</b>
+ </th>
+ <th>
+ <b>Remarks</b>
+ </th>
+ </tr>
+ <tr>
+ <th>
+ <b>SELECT <br> array_contains(array(1), 1.34D);</b>
+ </th>
+ <th>
+ <b>true</b>
+ </th>
+ <th>
+ <b>false</b>
+ </th>
+ <th>
+ <b>In Spark 2.4, the left and right parameters are promoted
to array(double) and double type, respectively.</b>
+ </th>
+ </tr>
+ <tr>
+ <th>
+ <b>SELECT <br> array_contains(array(1), 1.34);</b>
+ </th>
+ <th>
+ <b>true</b>
+ </th>
+ <th>
+ <b>AnalysisException is thrown since integer type cannot be
promoted to decimal type in a lossless manner.</b>
--- End diff --
I left a few comments. Please send a PR, thanks!
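For reference, the "true" results in the 2.3 column can be illustrated with a toy model. This is plain Python, not Spark code, and the function names `contains_lossy` and `contains_widened` are hypothetical. The model assumes two things. First, the pre-2.4 behavior amounts to casting the search value to the array's integer element type, which truncates the fraction. Second, the 2.4 behavior for a double argument amounts to widening both sides to double, as the diff describes. It does not model the AnalysisException that the diff lists for the decimal case.

```python
from decimal import Decimal
from typing import List, Union

Number = Union[int, float, Decimal]


def contains_lossy(arr: List[int], needle: Number) -> bool:
    """Hypothetical model of the 2.3 behavior: cast the needle to the
    array's element type (int here), silently dropping any fraction."""
    return int(needle) in arr


def contains_widened(arr: List[int], needle: float) -> bool:
    """Hypothetical model of the 2.4 behavior for a double needle: widen
    both the elements and the needle to double before comparing."""
    return float(needle) in [float(x) for x in arr]


print(contains_lossy([1], 1.34))             # True: int(1.34) == 1
print(contains_lossy([1], Decimal("1.34")))  # True: int(Decimal("1.34")) == 1
print(contains_widened([1], 1.34))           # False: 1.0 != 1.34
```

The lossy cast is what makes `array_contains(array(1), 1.34D)` report a match that isn't there. Widening to a common type that can represent both values keeps the comparison exact.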