Can you attach the result of eventDF.filter($"entityType" === "user").select("entityId").distinct.explain(true)?
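In case it helps, here is a minimal sketch of one way to capture that output as text so it can be pasted into a reply. It assumes the Spark 1.5.x DataFrame API; the object name `ExplainSketch`, the helper `capturePlans`, and the sample rows are hypothetical stand-ins and are not taken from the original data. In a spark-shell session where `eventDF` already exists, only the `explain(true)` call (or `queryExecution.toString`) should be needed.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.{DataFrame, SQLContext}

object ExplainSketch {
  // Hypothetical helper: returns the plan text that explain(true) prints
  // (parsed, analyzed, optimized, and physical plans).
  def capturePlans(df: DataFrame): String = df.queryExecution.toString

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[2]").setAppName("explain-sketch")
    val sc = new SparkContext(conf)
    val sqlContext = new SQLContext(sc)
    import sqlContext.implicits._

    // Hypothetical sample rows standing in for the real eventDF.
    val eventDF = Seq(
      ("u1", "user"),
      ("u2", "user"),
      ("i1", "ib_user")
    ).toDF("entityId", "entityType")

    // Same query shape as the one in question.
    val query = eventDF.filter($"entityType" === "user").select("entityId").distinct

    // Prints the extended plans to stdout.
    query.explain(true)

    // Keeps the same information as a String, e.g. for attaching to an email.
    val plans = capturePlans(query)
    println(plans)

    sc.stop()
  }
}
```

Running the same steps on both 1.4.0 and 1.5.1 and comparing the two plan dumps would show where the equality filter is handled differently.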
Thanks,
Yin

On Thu, Nov 5, 2015 at 1:12 AM, 千成徳 <s.c...@opt.ne.jp> wrote:

> Hi All,
>
> I have a DataFrame like this.
>
> The equality expression does not work in 1.5.1, but it works as expected
> in 1.4.0. What is the difference?
>
> scala> eventDF.printSchema()
> root
>  |-- id: string (nullable = true)
>  |-- event: string (nullable = true)
>  |-- entityType: string (nullable = true)
>  |-- entityId: string (nullable = true)
>  |-- targetEntityType: string (nullable = true)
>  |-- targetEntityId: string (nullable = true)
>  |-- properties: string (nullable = true)
>
> scala> eventDF.groupBy("entityType").agg(countDistinct("entityId")).show
> +----------+------------------------+
> |entityType|COUNT(DISTINCT entityId)|
> +----------+------------------------+
> |   ib_user|                    4751|
> |      user|                    2091|
> +----------+------------------------+
>
>
> ----- does not work (bug?)
> scala> eventDF.filter($"entityType" === "user").select("entityId").distinct.count
> res151: Long = 1219
>
> scala> eventDF.filter(eventDF("entityType") === "user").select("entityId").distinct.count
> res153: Long = 1219
>
> scala> eventDF.filter($"entityType" equalTo "user").select("entityId").distinct.count
> res149: Long = 1219
>
> ----- works as expected
> scala> eventDF.map{ e => (e.getAs[String]("entityId"), e.getAs[String]("entityType")) }.filter(x => x._2 == "user").map(_._1).distinct.count
> res150: Long = 2091
>
> scala> eventDF.filter($"entityType" in "user").select("entityId").distinct.count
> warning: there were 1 deprecation warning(s); re-run with -deprecation for details
> res155: Long = 2091
>
> scala> eventDF.filter($"entityType" !== "ib_user").select("entityId").distinct.count
> res152: Long = 2091
>
>
> However, all of the above code works in 1.4.0.
>
> Thanks.