Hi Robin,
Yes, the piece of code below works fine in the Spark shell, but when the same code is placed in a script file and executed with -i <file name>, it produces an empty RDD.
scala> val pairs = sc.makeRDD(Seq((0,1),(0,2),(1,20),(1,30),(2,40)))
pairs: org.apache.spark.rdd.RDD[(Int, Int)] = ParallelCollectionRDD[77] at makeRDD at <console>:28

scala> pairs.reduceByKey((x,y) => x + y).collect
res43: Array[(Int, Int)] = Array((0,3), (1,50), (2,40))

Command:
dse spark --master local --jars postgresql-9.4-1201.jar -i <ScriptFile>

I understand I am missing something here, because of which my final RDD does not have the required output.

Regards,
Satish Chandra

On Thu, Aug 20, 2015 at 8:23 PM, Robin East <robin.e...@xense.co.uk> wrote:
> This works for me:
>
> scala> val pairs = sc.makeRDD(Seq((0,1),(0,2),(1,20),(1,30),(2,40)))
> pairs: org.apache.spark.rdd.RDD[(Int, Int)] = ParallelCollectionRDD[77] at
> makeRDD at <console>:28
>
> scala> pairs.reduceByKey((x,y) => x + y).collect
> res43: Array[(Int, Int)] = Array((0,3), (1,50), (2,40))
>
> On 20 Aug 2015, at 11:05, satish chandra j <jsatishchan...@gmail.com> wrote:
>
> Hi All,
> I have data in an RDD as mentioned below:
>
> RDD: Array[(Int, Int)] = Array((0,1), (0,2), (1,20), (1,30), (2,40))
>
> I am expecting output of Array((0,3), (1,50), (2,40)), i.e. just a sum over the values for each key.
>
> Code:
> RDD.reduceByKey((x,y) => x+y)
> RDD.take(3)
>
> Result in console:
> RDD: org.apache.spark.rdd.RDD[(Int,Int)] = ShuffledRDD[1] at reduceByKey at <console>:73
> res: Array[(Int,Int)] = Array()
>
> Command as mentioned:
>
> dse spark --master local --jars postgresql-9.4-1201.jar -i <ScriptFile>
>
> Please let me know what is missing in my code, as my resultant Array is empty.
>
> Regards,
> Satish
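For reference, a minimal self-contained sketch of what such a script file could contain when loaded with -i is shown below. The file name sumByKey.scala and the val name summed are illustrative only, not taken from the actual script being discussed; the SparkContext sc is assumed to be the one the shell already provides.

// Hypothetical contents of sumByKey.scala, loaded with:
//   dse spark --master local --jars postgresql-9.4-1201.jar -i sumByKey.scala
// sc is the SparkContext provided by the shell, so no extra setup is needed.

val pairs = sc.makeRDD(Seq((0, 1), (0, 2), (1, 20), (1, 30), (2, 40)))

// reduceByKey is a transformation that returns a new RDD; pairs itself is
// never modified, so the summed result must be captured in its own val.
val summed = pairs.reduceByKey((x, y) => x + y)

// collect is an action that forces evaluation; printing each element makes
// the result visible without relying on the REPL echoing intermediate values.
summed.collect().foreach(println)

// Expected output: (0,3), (1,50), (2,40)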