Hi,
I am using the *where* method of DataFrame to filter data.
I am comparing an Integer field with a String literal, and this comparison
returns the full table data.
I tested the same scenario with Hive and MySQL, and there the comparison
does not return any rows.
*Scenario:*
val sqlDf = df.where("f1 = 'abc'")
Here f1 is of type Integer.
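To make the scenario easier to reproduce, here is a minimal sketch. It assumes a local SparkSession and writes the three input values to a temporary CSV file; the object name `MismatchRepro`, the temporary directory, and the file name `data.csv` are illustrative only and are not taken from my project.

```scala
import java.nio.file.Files

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.types.{IntegerType, StructField, StructType}

object MismatchRepro {
  def main(args: Array[String]): Unit = {
    // Assumption: local mode is enough to show the behaviour.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("MismatchRepro")
      .getOrCreate()

    // Write the sample input (14, 15, 16) to a temporary CSV file.
    val dir = Files.createTempDirectory("int-input")
    Files.write(dir.resolve("data.csv"), "14\n15\n16\n".getBytes("UTF-8"))

    // Read the file with f1 declared as an Integer column.
    val schema = StructType(Seq(StructField("f1", IntegerType, nullable = true)))
    val df = spark.read.schema(schema).csv(dir.toString)

    // Compare the Integer column with a String literal.
    val sqlDf = df.where("f1 = 'abc'")

    sqlDf.explain(true) // prints the parsed, analyzed, optimized and physical plans
    sqlDf.show()

    spark.stop()
  }
}
```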
*Input:*
14
15
16
*Output:*
14
15
16
*Logical and Physical Plan:*
== Parsed Logical Plan ==
'Filter ('f1 = abc)
+- Relation[f1#0] csv
== Analyzed Logical Plan ==
f1: int
Filter (cast(f1#0 as double) = cast(abc as double))
+- Relation[f1#0] csv
== Optimized Logical Plan ==
Filter (isnotnull(f1#0) && null)
+- Relation[f1#0] csv
== Physical Plan ==
*Project [f1#0]
+- *Filter isnotnull(f1#0)
+- *Scan csv [f1#0] Format: CSV, InputPaths:
file:/C:/Users/santlalg/IdeaProjects/SparkTestPoc/Int, PartitionFilters:
[null], PushedFilters: [IsNotNull(f1)], ReadSchema: struct
In the *Optimized Logical Plan*, why is *cast(f1#0 as double) = cast(abc as
double)* from the *Analyzed Logical Plan* replaced with /null/?
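The null itself may be explainable: as far as I understand, on Spark 2.x casting a non-numeric string to double yields null instead of raising an error, and an equality with a null operand is itself null, so the optimizer can fold the whole predicate to a null literal. The sketch below is one way to check this, assuming a local SparkSession (the object name `CastCheck` is illustrative). I would expect both columns to show null.

```scala
import org.apache.spark.sql.SparkSession

object CastCheck {
  def main(args: Array[String]): Unit = {
    // Assumption: local mode, default SQL configuration.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("CastCheck")
      .getOrCreate()

    // First column: the cast of a non-numeric string to double.
    // Second column: an integer compared with that cast result.
    spark.sql(
      "SELECT cast('abc' AS double) AS casted, " +
        "14 = cast('abc' AS double) AS compared"
    ).show()

    spark.stop()
  }
}
```

This would account for the null in the optimized plan, but not for why the null condition is missing from the Filter in the physical plan and appears under PartitionFilters instead.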
I am using the following dependency versions:
Spark-core : 2.0.2
Spark-sql : 2.0.2
In my scenario the comparison should evaluate to false, so the DataFrame
should not return any rows.
Can someone help me achieve this?
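One possible workaround, not verified against 2.0.2, is to cast the column to string explicitly so that both sides are compared as strings and no implicit cast to double takes place. The helper name `whereEqualsAsString` and the object `StringCompare` are hypothetical; `col`, `cast`, `===` and `where` are the standard DataFrame API.

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.col

object StringCompare {
  // Hypothetical helper: compares the given column with `value` as strings,
  // so that a non-numeric literal is never cast to double.
  def whereEqualsAsString(df: DataFrame, column: String, value: String): DataFrame =
    df.where(col(column).cast("string") === value)
}
```

Usage would be `StringCompare.whereEqualsAsString(df, "f1", "abc")`, which for the input above should match no rows because none of "14", "15" or "16" equals "abc". The same idea as a SQL expression string would be `df.where("cast(f1 as string) = 'abc'")`.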
Thanks
Santlal