Hi Sathish,

The current implementation of countByKey uses reduceByKey:
https://github.com/apache/spark/blob/v1.2.1/core/src/main/scala/org/apache/spark/rdd/PairRDDFunctions.scala#L332
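As a rough illustration of that relationship, here is a minimal sketch in plain Python (no Spark needed) of a per-key count expressed as "map every value to 1, then reduce by key". The helpers reduce_by_key and count_by_key are hypothetical local stand-ins, not the Spark API. One difference that does hold in Spark itself: reduceByKey is a transformation that returns another RDD, while countByKey is an action that returns a local map to the driver.

```python
from typing import Callable, Dict, Hashable, Iterable, List, Tuple, TypeVar

K = TypeVar("K", bound=Hashable)
V = TypeVar("V")


def reduce_by_key(pairs: Iterable[Tuple[K, V]],
                  func: Callable[[V, V], V]) -> List[Tuple[K, V]]:
    """Merge the values of each key with func.

    Hypothetical local stand-in for RDD.reduceByKey; it runs on one
    machine and does no partitioning or shuffling.
    """
    merged: Dict[K, V] = {}
    for key, value in pairs:
        # First value seen for a key is kept as-is; later ones are merged in.
        merged[key] = func(merged[key], value) if key in merged else value
    return list(merged.items())


def count_by_key(pairs: Iterable[Tuple[K, V]]) -> Dict[K, int]:
    """Count records per key by reducing over ones.

    Hypothetical stand-in for countByKey: the original values are
    discarded and the result is a plain dict (a local map).
    """
    ones = [(key, 1) for key, _ in pairs]
    return dict(reduce_by_key(ones, lambda a, b: a + b))


pairs = [("a", 10), ("b", 20), ("a", 30)]
print(count_by_key(pairs))  # {'a': 2, 'b': 1}
```

Because the count is collected into a local map, this shape only makes sense when the number of distinct keys is small enough to fit in memory on the driver.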
It seems that countByKey is mostly deprecated:
https://issues.apache.org/jira/browse/SPARK-3994

-Jey

On Tue, Feb 24, 2015 at 3:53 PM, Sathish Kumaran Vairavelu <vsathishkuma...@gmail.com> wrote:
> Hello,
>
> Quick question. I am trying to understand the difference between reduceByKey
> and countByKey. Which one gives better performance, reduceByKey or countByKey?
> Since we can perform the same count operation using reduceByKey, why do we need
> countByKey/countByValue?
>
> Thanks
>
> Sathish