liukuijian8040 opened a new pull request, #41162:
URL: https://github.com/apache/spark/pull/41162

   ### What changes were proposed in this pull request?
   When all elements in the list of an In expression have the same data type and the new switch is enabled, the In expression is type-coerced the same way as a BinaryComparison such as EqualTo.
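   A minimal sketch of the coercion choice described above, written as a plain-Java model rather than Spark's analyzer code. Only BIGINT and STRING are modeled, and the class, enum, and method names are hypothetical. The two type rules are read off the Before/After plans in this PR (the old In coercion casts both sides to string; EqualTo casts the string literal to bigint). Falling back to the old rule when the list element types differ is an assumption.

   ```java
   import java.util.List;

   // Plain-Java model of the In coercion choice; NOT Spark's analyzer code.
   // Class, enum, and method names are hypothetical.
   final class InCoercionSketch {
       // Only the two types from the example are modeled.
       enum SqlType { BIGINT, STRING }

       // Old In rule, as seen in the "before" plan: BIGINT mixed with STRING
       // widens to STRING, so both sides are cast to string.
       static SqlType widerCommonType(SqlType a, SqlType b) {
           return a == b ? a : SqlType.STRING;
       }

       // EqualTo rule, as seen in the "after" plan: BIGINT compared with STRING
       // casts the string side to BIGINT.
       static SqlType binaryComparisonType(SqlType a, SqlType b) {
           return a == b ? a : SqlType.BIGINT;
       }

       // Type that both `value` and every list element are cast to in
       // `value IN (list...)`.
       static SqlType inTargetType(SqlType value, List<SqlType> list,
                                   boolean compatibleWithEqualTo) {
           boolean listTypesSame = list.stream().distinct().count() == 1;
           if (compatibleWithEqualTo && listTypesSame) {
               return binaryComparisonType(value, list.get(0));
           }
           // Switch off, or (assumed) list element types differ: old rule.
           SqlType result = value;
           for (SqlType t : list) {
               result = widerCommonType(result, t);
           }
           return result;
       }

       public static void main(String[] args) {
           List<SqlType> list = List.of(SqlType.STRING); // age IN ('00')
           System.out.println(inTargetType(SqlType.BIGINT, list, false)); // STRING
           System.out.println(inTargetType(SqlType.BIGINT, list, true));  // BIGINT
       }
   }
   ```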
   ```
   // test data: test.json
   // {"name":"Michael","age":0}

   // test SQL
   spark.read().json("examples/src/main/resources/test.json").createOrReplaceTempView("t");
   spark.sql("select * from t where age in ('00')").explain(true);

   // Before change:
   == Parsed Logical Plan ==
   'Project [*]
   +- 'Filter 'age IN (00)
      +- 'UnresolvedRelation [t], [], false

   == Analyzed Logical Plan ==
   age: bigint, name: string
   Project [age#7L, name#8]
   +- Filter cast(age#7L as string) IN (cast(00 as string))
      +- SubqueryAlias t
         +- Relation[age#7L,name#8] json

   == Optimized Logical Plan ==
   Filter (isnotnull(age#7L) AND (cast(age#7L as string) = 00))
   +- Relation[age#7L,name#8] json

   == Physical Plan ==
   *(1) Filter (isnotnull(age#7L) AND (cast(age#7L as string) = 00))
   +- FileScan json [age#7L,name#8] Batched: false, DataFilters: [isnotnull(age#7L), (cast(age#7L as string) = 00)], Format: JSON, Location: InMemoryFileIndex[file:/D:/code/spark/examples/src/main/resources/test.json], PartitionFilters: [], PushedFilters: [IsNotNull(age)], ReadSchema: struct<age:bigint,name:string>

   +---+----+
   |age|name|
   +---+----+
   +---+----+

   // After change (spark.sql.legacy.inExpressionCompatibleWithEqualTo.enabled=true):
   spark.sql("select * from t where age in ('00')").explain(true);
   == Parsed Logical Plan ==
   'Project [*]
   +- 'Filter 'age IN (00)
      +- 'UnresolvedRelation [t], [], false

   == Analyzed Logical Plan ==
   age: bigint, name: string
   Project [age#7L, name#8]
   +- Filter cast(age#7L as bigint) IN (cast(00 as bigint))
      +- SubqueryAlias t
         +- Relation[age#7L,name#8] json

   == Optimized Logical Plan ==
   Filter (isnotnull(age#7L) AND (age#7L = 0))
   +- Relation[age#7L,name#8] json

   == Physical Plan ==
   *(1) Filter (isnotnull(age#7L) AND (age#7L = 0))
   +- FileScan json [age#7L,name#8] Batched: false, DataFilters: [isnotnull(age#7L), (age#7L = 0)], Format: JSON, Location: InMemoryFileIndex[file:/D:/code/spark/examples/src/main/resources/test.json], PartitionFilters: [], PushedFilters: [IsNotNull(age), EqualTo(age,0)], ReadSchema: struct<age:bigint,name:string>

   +---+-------+
   |age|   name|
   +---+-------+
   |  0|Michael|
   +---+-------+
   ```
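   To see why the two plans return different rows for the test record `{"name":"Michael","age":0}`, the sketch below evaluates both filter predicates in plain Java. `String.valueOf` and `Long.parseLong` stand in for Spark's casts; this is an assumption, and one known difference is that `Long.parseLong` throws on an unparsable string, whereas a non-ANSI Spark cast returns NULL. The class name is hypothetical.

   ```java
   // Evaluates the test row {"name":"Michael","age":0} against the two filter
   // predicates from the plans. String.valueOf and Long.parseLong stand in for
   // Spark's casts (assumption: Long.parseLong throws on bad input, whereas a
   // non-ANSI Spark cast returns NULL). Class name is hypothetical.
   final class PlanPredicateSketch {
       // Before: cast(age as string) = '00'
       static boolean stringComparison(long age, String literal) {
           return String.valueOf(age).equals(literal);
       }

       // After: age = cast('00' as bigint)
       static boolean numericComparison(long age, String literal) {
           return age == Long.parseLong(literal);
       }

       public static void main(String[] args) {
           long age = 0L;
           System.out.println(stringComparison(age, "00"));  // false: "0" vs "00"
           System.out.println(numericComparison(age, "00")); // true: 0 vs 0
       }
   }
   ```

   Comparing as strings fails because `0` renders as `"0"`, not `"00"`; comparing as numbers succeeds because `"00"` parses to `0`.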
   
   ### Why are the changes needed?
   Spark SQL and Hive SQL return inconsistent results for the same SQL. Spark SQL evaluates `0 in ('00')` as false, which differs from its result for `0 = '00'`, whereas Hive evaluates it as true. Hive 3.1.0 handles the IN keyword compatibly with `=`, but Spark SQL does not.
   For example, the following two queries should return the same result, but they do not:
   ```
   scala> spark.sql("select 1 as test where 0 in ('00')").show;
   +----+
   |test|
   +----+
   +----+

   scala> spark.sql("select 1 as test where 0 = '00'").show;
   +----+
   |test|
   +----+
   |   1|
   +----+
   ```
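   The expectation behind these two queries is that `x IN (e1, e2, ...)` means `x = e1 OR x = e2 OR ...`. A hypothetical sketch of that reading, with every equality coerced the way EqualTo coerces BIGINT against STRING (string side cast to bigint); SQL NULL semantics are deliberately left out, and the class and method names are illustrative only.

   ```java
   import java.util.List;

   // Reads `value IN (e1, e2, ...)` as `value = e1 OR value = e2 OR ...`, with
   // each equality coerced like EqualTo (STRING literal cast to BIGINT).
   // Hypothetical helper; SQL NULL semantics are intentionally ignored.
   final class InAsEqualitiesSketch {
       // Models `value = 'literal'` with the string side cast to bigint.
       static boolean equalTo(long value, String literal) {
           return value == Long.parseLong(literal);
       }

       // anyMatch stops at the first true equality, like a chain of ORs.
       static boolean inList(long value, List<String> literals) {
           return literals.stream().anyMatch(lit -> equalTo(value, lit));
       }

       public static void main(String[] args) {
           System.out.println(equalTo(0L, "00"));             // 0 = '00'       -> true
           System.out.println(inList(0L, List.of("00")));     // 0 IN ('00')    -> true
           System.out.println(inList(0L, List.of("1", "2"))); // 0 IN ('1','2') -> false
       }
   }
   ```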
   
   ### Does this PR introduce _any_ user-facing change?
   This PR adds a switch, `spark.sql.legacy.inExpressionCompatibleWithEqualTo.enabled`, that makes the In expression's type coercion compatible with the EqualTo expression. It defaults to false, so the default behavior of Spark SQL does not change.
   
   ### How was this patch tested?
   By setting `spark.sql.legacy.inExpressionCompatibleWithEqualTo.enabled` to true and to false and checking whether the analyzed logical plan casts the expressions as expected. With true, In generates the same Cast logical plan as EqualTo; with false, it keeps the old Cast logical plan.

