Github user cloud-fan commented on a diff in the pull request:
https://github.com/apache/spark/pull/18853#discussion_r150227062
--- Diff: docs/sql-programming-guide.md ---
@@ -1460,6 +1460,13 @@ that these options will be deprecated in future
release as more optimizations ar
Configures the number of partitions to use when shuffling data for
joins or aggregations.
</td>
</tr>
+ <tr>
+ <td><code>spark.sql.typeCoercion.mode</code></td>
+ <td><code>legacy</code></td>
+ <td>
+ The <code>legacy</code> type coercion mode was used in Spark prior
to 2.3, and it remains the default to avoid breaking existing behavior.
However, it has logical inconsistencies. The <code>hive</code> mode is
preferred for most new applications, though it may require additional manual
casting.
--- End diff --
I don't agree that Hive's type coercion rules are the most reasonable. One
example is casting both sides to double when comparing a string and a long,
which may lead to wrong results because of precision loss.
I'd like to stay neutral here and just say users can choose different type
coercion modes, like hive, mysql, etc. By default it's spark.
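The precision hazard can be sketched outside Spark with plain Python, since it is a property of IEEE 754 doubles rather than of any particular engine. A double has 53 bits of significand, so longs above 2^53 are not all representable, and two distinct values can compare equal once both sides are coerced to double (the value `9007199254740993` below is just an illustrative pick):

```python
# A string/long comparison under double coercion, as in the Hive-style rule
# the comment criticizes. 2**53 + 1 is not exactly representable as a double.
s = "9007199254740993"          # string side of the comparison (2**53 + 1)
n = 9007199254740992            # long side of the comparison (2**53)

# Exact integer comparison: the values differ.
print(int(s) == n)              # False

# Coerce both sides to double first: both round to 9007199254740992.0,
# so two different values spuriously compare equal.
print(float(s) == float(n))     # True
```

The second comparison returning `True` is the "wrong result because of precision loss" referred to above.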
---