df.count() returns one more count than SELECT COUNT()

Mohamed Nadjib Mami Thu, 06 Apr 2017 10:29:43 -0700

I pasted this straight from the Spark shell (Spark 2.1.0):


scala> spark.sql("SELECT count(distinct col) FROM Table").show()
+-------------------+
|count(DISTINCT col)|
+-------------------+
|               4697|
+-------------------+

scala> spark.sql("SELECT distinct col FROM Table").count()
res8: Long = 4698

That is, `dataframe.count()` on the distinct rows returns one more than the in-query `COUNT()` function.
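
One thing that could be checked here is whether `col` contains NULL values: in SQL, `COUNT(expr)` skips rows where the expression is NULL, while `SELECT DISTINCT` keeps NULL as a row of its own, which would account for a difference of exactly one. The following is only a sketch of that check, assuming the same `Table` and `col` as in the queries above and a spark-shell session where `spark` is the SparkSession; it has not been run against this data.

```scala
// Assumption: run inside spark-shell, with `Table` registered as above.

// How many rows have a NULL in col? COUNT(col) ignores these rows.
spark.sql("SELECT count(*) FROM Table WHERE col IS NULL").show()

// Distinct values with NULL filtered out. If a NULL is what causes the gap,
// this count should equal the result of count(DISTINCT col).
spark.sql("SELECT DISTINCT col FROM Table WHERE col IS NOT NULL").count()
```

If the first query reports zero rows, NULL handling is not the cause and the difference comes from somewhere else.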


Any explanations?

Cheers,
Mohamed
