NULLs are excluded by spark.sql("SELECT count(distinct col) FROM Table").show()
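I think this is ANSI SQL behaviour: COUNT(col), including COUNT(DISTINCT col), ignores NULLs, while SELECT DISTINCT keeps NULL as a row of its own. So if col contains a NULL, counting the distinct rows gives one more than COUNT(DISTINCT col).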
scala> spark.sql("select distinct count(null)").show(false)
+-----------+
|count(NULL)|
+-----------+
|0          |
+-----------+
scala> spark.sql("select distinct null").count
res1: Long = 1
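To make the off-by-one concrete, here is a minimal plain-Scala model of the two semantics. It assumes Option[String] as a stand-in for a nullable column (None plays NULL). It is not Spark's implementation, and the object and method names are hypothetical. In Spark the corresponding calls would be COUNT(DISTINCT col) versus df.select("col").distinct().count().

```scala
// Plain-Scala model of the two counting semantics (not Spark's implementation).
// Option[String] stands in for a nullable column: None plays the role of NULL.
// NullCountModel and its methods are hypothetical names for illustration.
object NullCountModel {

  // Like COUNT(DISTINCT col): NULLs are dropped before the distinct count.
  def countDistinctNonNull(col: Seq[Option[String]]): Long =
    col.flatten.distinct.size.toLong

  // Like df.select("col").distinct().count(): NULL survives as one distinct row.
  def distinctRowCount(col: Seq[Option[String]]): Long =
    col.distinct.size.toLong

  def main(args: Array[String]): Unit = {
    // Hypothetical sample data containing NULLs among the values.
    val col = Seq(Some("a"), Some("b"), None, Some("a"), None)
    println(countDistinctNonNull(col)) // 2
    println(distinctRowCount(col))     // 3
  }
}
```

The same model mirrors the transcript above: a lone NULL gives 0 for the aggregate count and 1 for the distinct row count.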
Regards,
Hemanth
From: Mohamed Nadjib Mami
Date: Thursday, 6 April 2017 at 20.29
Subject: df.count() returns one more count than SELECT COUNT()
spark.sql("SELECT count(distinct col) FROM Table").show()