Hi Robin,

You are a star! Thank you for the explanation and example. I converted your code into Java without any hassle, and it is working as I expected. I carried out the final calculation (dividing the 5th element by the 6th) using mapValues, and that is working nicely too. But I was wondering: is there a better way to do it than using mapValues?
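For reference, this is roughly what that mapValues step looks like on my side (a sketch rather than a verbatim paste; `folded` stands for the result of my Java version of your foldByKey, which I have pasted below the quoted message, with the running sum in field 4 and the count in field 5 as in your example):

import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.function.Function;

// folded: JavaPairRDD<String, String> produced by the foldByKey step,
// where field 4 holds the running sum of the 5th value and field 5 the count.
JavaPairRDD<String, String> averaged = folded.mapValues(new Function<String, String>() {
    @Override
    public String call(String value) {
        String[] parts = value.split(",");
        int sumFifth = Integer.parseInt(parts[4]);
        int count = Integer.parseInt(parts[5]);
        double avgFifth = (double) sumFifth / count;
        // Keep the summed fields, replace the 5th with its average, drop the count.
        return parts[0] + "," + parts[1] + "," + parts[2] + "," + parts[3] + "," + avgFifth;
    }
});

It works, but carrying the count around in the string and then stripping it out again feels a little clunky, hence my question.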
Cheers,
Raj

On 16 September 2015 at 20:13, Robin East <robin.e...@xense.co.uk> wrote:

> One way is to use foldByKey, which is similar to reduceByKey but you supply
> a 'zero' value for the start of the computation. The idea is to add an
> extra element to the returned string to represent the count of the 5th
> element. You can then use the 5th and 6th elements to calculate the mean.
> The 'zero' value you supply to foldByKey is the all-zeros string
> "0,0,0,0,0,0".
>
> Below is some example Scala code that implements this idea - I'm sure
> Spark Java experts on the forum could turn this into the equivalent Java.
>
> initial.foldByKey("0,0,0,0,0,0")( (a, b) => {
>   val iFirst = a.split(",")(0).toInt
>   val iFirstB = b.split(",")(0).toInt
>   val iFifth = a.split(",")(4).toInt
>   val iFifthB = b.split(",")(4).toInt
>   val countA = if (a.split(",").size > 5) a.split(",")(5).toInt else 1
>   val countB = if (b.split(",").size > 5) b.split(",")(5).toInt else 1
>   s"${iFirst + iFirstB},0,0,0,${iFifth + iFifthB},${countA + countB}"
> }).collect
>
> This returns a collection of keys and 6-element strings where the 5th
> element is the sum of all the fifth entries and the 6th element is the
> running count of entries.
>
> -------------------------------------------------------------------------------
> Robin East
> *Spark GraphX in Action* Michael Malak and Robin East
> Manning Publications Co.
> http://www.manning.com/books/spark-graphx-in-action
>
>
> On 16 Sep 2015, at 15:46, diplomatic Guru <diplomaticg...@gmail.com> wrote:
>
> I have a mapper that emits key/value pairs (composite keys and composite
> values separated by commas), e.g.
>
> *key:* a,b,c,d *Value:* 1,2,3,4,5
>
> *key:* a1,b1,c1,d1 *Value:* 5,4,3,2,1
>
> ...
>
> *key:* a,b,c,d *Value:* 5,4,3,2,1
>
> I could easily SUM these values using reduceByKey, e.g.
>
> reduceByKey(new Function2<String, String, String>() {
>
>     @Override
>     public String call(String value1, String value2) {
>         String[] oldValue = value1.split(",");
>         String[] newValue = value2.split(",");
>
>         int iFirst = Integer.parseInt(oldValue[0]) + Integer.parseInt(newValue[0]);
>         int iSecond = Integer.parseInt(oldValue[1]) + Integer.parseInt(newValue[1]);
>         int iThird = Integer.parseInt(oldValue[2]) + Integer.parseInt(newValue[2]);
>         int iFourth = Integer.parseInt(oldValue[3]) + Integer.parseInt(newValue[3]);
>         int iFifth = Integer.parseInt(oldValue[4]) + Integer.parseInt(newValue[4]);
>
>         return iFirst + "," + iSecond + "," + iThird + "," + iFourth + "," + iFifth;
>     }
> });
>
> But the problem is: how do I find the average of just one of these values?
> Let's assume I want to SUM iFirst, iSecond, iThird and iFourth, but I want
> to find the average of iFifth. How do I do it? With simple key/value pairs
> I could use the mapValues function, but I am not sure how I could do it
> with my example. Please advise.
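P.S. In case it helps anyone else searching the archives, this is roughly how my Java conversion of the foldByKey step came out (a sketch along the lines of Robin's Scala above, not a drop-in implementation; `initial` is the JavaPairRDD<String, String> of raw comma-separated values, as in his example):

import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.function.Function2;

// Fold each key's values into a 6-field string: like Robin's example it only
// sums the 1st and 5th fields (the others could be added the same way), and
// field 5 carries the running count of entries seen for that key.
JavaPairRDD<String, String> folded = initial.foldByKey("0,0,0,0,0,0",
        new Function2<String, String, String>() {
            @Override
            public String call(String a, String b) {
                String[] av = a.split(",");
                String[] bv = b.split(",");
                int first = Integer.parseInt(av[0]) + Integer.parseInt(bv[0]);
                int fifth = Integer.parseInt(av[4]) + Integer.parseInt(bv[4]);
                // A raw 5-field value counts as one entry; the zero value and
                // partial results already carry their count in field 5.
                int countA = av.length > 5 ? Integer.parseInt(av[5]) : 1;
                int countB = bv.length > 5 ? Integer.parseInt(bv[5]) : 1;
                return first + ",0,0,0," + fifth + "," + (countA + countB);
            }
        });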