Github user squito commented on a diff in the pull request:
https://github.com/apache/spark/pull/18853#discussion_r140799432
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercion.scala
---
@@ -115,21 +115,46 @@ object TypeCoercion {
* is a String and the other is not. It also handles when one op is a
Date and the
* other is a Timestamp by making the target type to be String.
*/
- val findCommonTypeForBinaryComparison: (DataType, DataType) =>
Option[DataType] = {
- // We should cast all relative timestamp/date/string comparison into
string comparisons
- // This behaves as a user would expect because timestamp strings sort
lexicographically.
- // i.e. TimeStamp(2013-01-01 00:00 ...) < "2014" = true
- case (StringType, DateType) => Some(StringType)
- case (DateType, StringType) => Some(StringType)
- case (StringType, TimestampType) => Some(StringType)
- case (TimestampType, StringType) => Some(StringType)
- case (TimestampType, DateType) => Some(StringType)
- case (DateType, TimestampType) => Some(StringType)
- case (StringType, NullType) => Some(StringType)
- case (NullType, StringType) => Some(StringType)
- case (l: StringType, r: AtomicType) if r != StringType => Some(r)
- case (l: AtomicType, r: StringType) if (l != StringType) => Some(l)
- case (l, r) => None
+ private def findCommonTypeForBinaryComparison(
+ plan: LogicalPlan,
+ l: DataType,
+ r: DataType): Option[DataType] =
+ if (!plan.conf.isHiveTypeCoercionMode) {
+ (l, r) match {
+ // We should cast all relative timestamp/date/string comparison
into string comparisons
+ // This behaves as a user would expect because timestamp strings
sort lexicographically.
+ // i.e. TimeStamp(2013-01-01 00:00 ...) < "2014" = true
--- End diff --
I think this comment should be updated given the latest investigation,
explaining both the original motivation, and how it is flawed. Eg.
"Originally spark cast all relative timestamp/date/string comparison into
string comparisons. The motivation was that this would lead to natural
comparisons on simple string inputs for times, eg. TimeStamp(2013-01-01 00:00
...) < "2014" = true. However, this leads to other logical inconsistencies,
eg. TimeStamp(2013-01-01 00:00 ...) < "5" = true. Also, equals is not
consistent with other binary comparisions. Futhermore, comparing to a string
that does not look like a time at all will still compare, just with an
unexpected result."
---
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]