Hi Robin,

You are a star! Thank you for the explanation and example. I converted your
code into Java without any hassle, and it is working as I expected. I carried
out the final calculation (dividing the 5th element by the 6th) using
mapValues and it is working nicely. But I was wondering whether there is a
better way to do it than using mapValues?

Cheers,

Raj


On 16 September 2015 at 20:13, Robin East <robin.e...@xense.co.uk> wrote:

> One way is to use foldByKey, which is similar to reduceByKey but lets you
> supply a ‘zero’ value for the start of the computation. The idea is to add
> an extra element to the returned string that holds a running count of the
> records combined so far. You can then use the 5th and 6th elements to
> calculate the mean. The ‘zero’ value you supply to foldByKey is the
> all-zeros string “0,0,0,0,0,0”.
>
> Below is some example Scala code that implements this idea - I’m sure
> Spark Java experts on the forum could turn this into the equivalent Java.
>
> initial.foldByKey("0,0,0,0,0,0")( (a, b) => {
>         // split each value once: raw records have 5 fields, while the
>         // zero value and any partially folded result have 6
>         val aParts = a.split(",")
>         val bParts = b.split(",")
>         val iFirst = aParts(0).toInt + bParts(0).toInt
>         val iFifth = aParts(4).toInt + bParts(4).toInt
>         // the 6th field is the running count; a plain 5-field record counts as 1
>         val countA = if (aParts.size > 5) aParts(5).toInt else 1
>         val countB = if (bParts.size > 5) bParts(5).toInt else 1
>         s"$iFirst,0,0,0,$iFifth,${countA + countB}"
>       }).collect
>
>
> This returns a collection of key/value pairs where each value is a
> 6-element string: the 5th element is the sum of all the fifth entries for
> that key and the 6th element is the running count of entries, so dividing
> the 5th by the 6th gives the mean. For example, with the two records for
> key a,b,c,d from your mail (1,2,3,4,5 and 5,4,3,2,1) the folded value
> would be 6,0,0,0,6,2, and the mean of the fifth field is 6 / 2 = 3.
>
> -------------------------------------------------------------------------------
> Robin East
> *Spark GraphX in Action* Michael Malak and Robin East
> Manning Publications Co.
> http://www.manning.com/books/spark-graphx-in-action
>
>
>
>
>
> On 16 Sep 2015, at 15:46, diplomatic Guru <diplomaticg...@gmail.com>
> wrote:
>
> I have a mapper that emits key/value pairs (composite keys and composite
> values separated by commas).
>
> e.g.
>
> *key:* a,b,c,d *Value:* 1,2,3,4,5
>
> *key:* a1,b1,c1,d1 *Value:* 5,4,3,2,1
>
> ...
>
> ...
>
> *key:* a,b,c,d *Value:* 5,4,3,2,1
>
>
> I could easily SUM these values using reduceByKey.
>
> e.g.
>
> reduceByKey(new Function2<String, String, String>() {
>
>         @Override
>         public String call(String value1, String value2) {
>             // element-wise sum of the five comma-separated fields
>             String[] oldValue = value1.split(",");
>             String[] newValue = value2.split(",");
>
>             int iFirst = Integer.parseInt(oldValue[0]) + Integer.parseInt(newValue[0]);
>             int iSecond = Integer.parseInt(oldValue[1]) + Integer.parseInt(newValue[1]);
>             int iThird = Integer.parseInt(oldValue[2]) + Integer.parseInt(newValue[2]);
>             int iFourth = Integer.parseInt(oldValue[3]) + Integer.parseInt(newValue[3]);
>             int iFifth = Integer.parseInt(oldValue[4]) + Integer.parseInt(newValue[4]);
>
>             return iFirst + "," + iSecond + "," + iThird + "," + iFourth + "," + iFifth;
>         }
>     });
>
> But the problem is how do I find the average of just one of these values?
> Let's assume I want to SUM iFirst, iSecond, iThird and iFourth, but I want
> to find the average of iFifth. How do I do it? With simple key/value pairs
> I could use the mapValues function, but I am not sure how I could do it
> with my example. Please advise.
>
>
>
