Hi Satish,

I don't see where Spark supports "-i", so I suspect it is provided by DSE. In that case, it might be a bug in DSE.
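If it helps, here is a minimal standalone sketch of the same per-key sum, written as an application that can be packaged into a jar and run with spark-submit instead of relying on a -i script file. The object name SumByKey and the local master setting are illustrative assumptions, not something from the thread:

import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical standalone application reproducing the per-key sum from the thread.
// Submitting a compiled jar with spark-submit avoids depending on the -i option.
object SumByKey {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("SumByKey").setMaster("local")
    val sc = new SparkContext(conf)

    val pairs = sc.makeRDD(Seq((0, 1), (0, 2), (1, 20), (1, 30), (2, 40)))

    // Sum the values for each key; expected result: (0,3), (1,50), (2,40)
    val summed = pairs.reduceByKey((x, y) => x + y)
    summed.collect().foreach(println)

    sc.stop()
  }
}

When run, it should print (0,3), (1,50) and (2,40), matching the result you see in the Spark shell.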
On Fri, Aug 21, 2015 at 6:02 PM, satish chandra j <jsatishchan...@gmail.com> wrote:

> Hi Robin,
> Yes, it is DSE, but the issue is related to Spark only.
>
> Regards,
> Satish Chandra
>
> On Fri, Aug 21, 2015 at 3:06 PM, Robin East <robin.e...@xense.co.uk> wrote:
>
>> Not sure, never used dse - it’s part of DataStax Enterprise right?
>>
>> On 21 Aug 2015, at 10:07, satish chandra j <jsatishchan...@gmail.com> wrote:
>>
>> Hi Robin,
>> Yes, the piece of code mentioned below works fine in the Spark shell, but when the same is placed in a script file and executed with -i <file name> it creates an empty RDD.
>>
>> scala> val pairs = sc.makeRDD(Seq((0,1),(0,2),(1,20),(1,30),(2,40)))
>> pairs: org.apache.spark.rdd.RDD[(Int, Int)] = ParallelCollectionRDD[77] at makeRDD at <console>:28
>>
>> scala> pairs.reduceByKey((x,y) => x + y).collect
>> res43: Array[(Int, Int)] = Array((0,3), (1,50), (2,40))
>>
>> Command:
>>
>> dse spark --master local --jars postgresql-9.4-1201.jar -i <ScriptFile>
>>
>> I understand I am missing something here, due to which my final RDD does not have the required output.
>>
>> Regards,
>> Satish Chandra
>>
>> On Thu, Aug 20, 2015 at 8:23 PM, Robin East <robin.e...@xense.co.uk> wrote:
>>
>>> This works for me:
>>>
>>> scala> val pairs = sc.makeRDD(Seq((0,1),(0,2),(1,20),(1,30),(2,40)))
>>> pairs: org.apache.spark.rdd.RDD[(Int, Int)] = ParallelCollectionRDD[77] at makeRDD at <console>:28
>>>
>>> scala> pairs.reduceByKey((x,y) => x + y).collect
>>> res43: Array[(Int, Int)] = Array((0,3), (1,50), (2,40))
>>>
>>> On 20 Aug 2015, at 11:05, satish chandra j <jsatishchan...@gmail.com> wrote:
>>>
>>> Hi All,
>>> I have data in an RDD as mentioned below:
>>>
>>> RDD: Array[(Int, Int)] = Array((0,1), (0,2), (1,20), (1,30), (2,40))
>>>
>>> I am expecting the output Array((0,3), (1,50), (2,40)), i.e. just a sum function on the values for each key.
>>>
>>> Code:
>>> RDD.reduceByKey((x,y) => x+y)
>>> RDD.take(3)
>>>
>>> Result in console:
>>> RDD: org.apache.spark.rdd.RDD[(Int,Int)] = ShuffledRDD[1] at reduceByKey at <console>:73
>>> res: Array[(Int,Int)] = Array()
>>>
>>> Command as mentioned:
>>>
>>> dse spark --master local --jars postgresql-9.4-1201.jar -i <ScriptFile>
>>>
>>> Please let me know what is missing in my code, as my resultant Array is empty.
>>>
>>> Regards,
>>> Satish
>>>
>>
>

--
Best Regards

Jeff Zhang