Re: spark null values calculation
Sorry, I have found the reason: a null value cannot be compared directly with === or =!=. I have written a note about this: https://bigcount.xyz/how-spark-handles-null-and-abnormal-values.html

Thanks.

wilson wrote:
do you know why the select results below are inconsistent?

- To unsubscribe e-mail: user-unsubscr...@spark.apache.org
spark null values calculation
My dataset has NULL values in its columns. Do you know why the select results below are inconsistent?

scala> dfs.select("cand_status").count()
val res37: Long = 881793

scala> dfs.select("cand_status").where($"cand_status" =!= "NULL").count()
val res38: Long = 383717

scala> dfs.select("cand_status").where($"cand_status" === "NULL").count()
val res39: Long = 86402

scala> dfs.select("cand_status").where($"cand_status" === "NULL").where($"cand_status" =!= "NULL").count()
val res40: Long = 0

As you can see, 383717 + 86402 != 881793, but I expected the two sides to be equal.

Thanks.
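
For anyone who finds this thread later: the counts above leave 881793 - (383717 + 86402) = 411674 rows that match neither filter. That is consistent with those rows holding a true null rather than the string "NULL", because in Spark both === and =!= evaluate to null when one side is null, and where() keeps only rows whose predicate is true. The thread does not confirm that figure, so treat it as an inference. Below is a minimal sketch of the behavior on a tiny made-up dataset. The object name NullCountSketch, the Counts case class and the sample values are hypothetical; only the column name cand_status comes from the original post.

```scala
import org.apache.spark.sql.SparkSession

object NullCountSketch {
  // Hypothetical container for the counts computed below.
  final case class Counts(
      total: Long,
      notEqual: Long,
      equal: Long,
      trueNulls: Long,
      nullSafeNotEqual: Long)

  def compute(spark: SparkSession): Counts = {
    import spark.implicits._

    // Made-up stand-in for dfs: two ordinary values, one literal
    // string "NULL", and two true nulls.
    val dfs = Seq[String]("C", "N", "NULL", null, null).toDF("cand_status")

    Counts(
      total = dfs.count(),
      // =!= yields null for null input, so the null rows are dropped.
      notEqual = dfs.where($"cand_status" =!= "NULL").count(),
      // === also yields null for null input, so the null rows are dropped again.
      equal = dfs.where($"cand_status" === "NULL").count(),
      // isNull is the direct way to select the true nulls.
      trueNulls = dfs.where($"cand_status".isNull).count(),
      // <=> is null-safe equality; negating it keeps the null rows.
      nullSafeNotEqual = dfs.where(!($"cand_status" <=> "NULL")).count()
    )
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("null-count-sketch")
      .getOrCreate()
    try {
      val c = compute(spark)
      println(c)
      // The three disjoint groups add up to the total once nulls are counted.
      assert(c.notEqual + c.equal + c.trueNulls == c.total)
    } finally {
      spark.stop()
    }
  }
}
```

On the real data, running dfs.where($"cand_status".isNull).count() should show whether the missing rows are in fact nulls.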