Hi,

You can use "groupByKey + mapValues", e.g.:

    JavaPairRDD<String, Tuple2<Integer, List<List<String>>>> callCount = byCaller
        .groupByKey()
        .mapValues(
            new Function<List<Tuple2<Integer, List<String>>>,
                         Tuple2<Integer, List<List<String>>>>() {
              @Override
              public Tuple2<Integer, List<List<String>>> call(
                  List<Tuple2<Integer, List<String>>> values) throws Exception {
                // Sum the integers and collect the lists for one key.
                int count = 0;
                List<List<String>> l = new ArrayList<List<String>>();
                for (Tuple2<Integer, List<String>> value : values) {
                  count += value._1;
                  l.add(value._2);
                }
                return new Tuple2<Integer, List<List<String>>>(count, l);
              }
            });

Or use "combineByKey", which often performs better because it can merge
values within each partition before the shuffle.

Best Regards,
Shixiong Zhu

2014-03-14 0:56 GMT+08:00 goi cto <goi....@gmail.com>:

> Hi,
>
> I have an RDD with <S,Tuple2<I,List>> which I want to reduceByKey and get
> I+I and List of List
> (add the integers and build a list of the lists.
>
> BUT reduce by key requires that the return value is of the same type of
> the input
> so I can combine the lists.
>
> JavaPairRDD<String,Tuple2<Integer,List<List<String>>>> callCount =
> byCaller.reduceByKey(
> new
> Function2<Tuple2<Integer,List<String>>,Tuple2<Integer,List<String>>,Tuple2<Integer,List<List<String>>>>(){
> public Tuple2<Integer,List<List<String>>>
> call(Tuple2<Integer,List<String>> i1,Tuple2<Integer,List<String>> i2){
> Integer count = i1._1+i2._1;
> List<List<String>> combinedList = new ArrayList<List<String>>(2);
> combinedList.add(i1._2);
> combinedList.add(i2._2);
> return new Tuple2(count,combinedList);
> }
>
>
> any solution for that?
>
> --
> Eran | CTO
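
P.S. For the "combineByKey" route mentioned above, here is a rough sketch of the three functions it expects (createCombiner, mergeValue, mergeCombiners). To keep it runnable without a Spark dependency, I am assuming hypothetical Call and Acc classes in place of Tuple2<Integer, List<String>> and Tuple2<Integer, List<List<String>>>, made-up keys ("alice", "bob"), and a local loop that only imitates per-partition combining. It is not the actual Spark call.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class CombineSketch {

    // Hypothetical stand-in for the input value type Tuple2<Integer, List<String>>.
    static final class Call {
        final int count;
        final List<String> items;
        Call(int count, List<String> items) { this.count = count; this.items = items; }
    }

    // Hypothetical stand-in for the result type Tuple2<Integer, List<List<String>>>.
    static final class Acc {
        final int count;
        final List<List<String>> lists;
        Acc(int count, List<List<String>> lists) { this.count = count; this.lists = lists; }
    }

    // createCombiner: turn the first value seen for a key into an accumulator.
    static Acc createCombiner(Call v) {
        List<List<String>> lists = new ArrayList<List<String>>();
        lists.add(v.items);
        return new Acc(v.count, lists);
    }

    // mergeValue: fold one more value into an existing accumulator.
    static Acc mergeValue(Acc acc, Call v) {
        List<List<String>> lists = new ArrayList<List<String>>(acc.lists);
        lists.add(v.items);
        return new Acc(acc.count + v.count, lists);
    }

    // mergeCombiners: merge two accumulators that were built separately.
    static Acc mergeCombiners(Acc a, Acc b) {
        List<List<String>> lists = new ArrayList<List<String>>(a.lists);
        lists.addAll(b.lists);
        return new Acc(a.count + b.count, lists);
    }

    // Local imitation of combining the records of a single partition.
    static Map<String, Acc> combinePartition(String[] keys, Call[] values) {
        Map<String, Acc> out = new HashMap<String, Acc>();
        for (int i = 0; i < keys.length; i++) {
            Acc acc = out.get(keys[i]);
            out.put(keys[i],
                acc == null ? createCombiner(values[i]) : mergeValue(acc, values[i]));
        }
        return out;
    }

    // Combine two made-up partitions, then merge their per-key accumulators.
    static Map<String, Acc> run() {
        Map<String, Acc> p1 = combinePartition(
            new String[] {"alice", "bob", "alice"},
            new Call[] {
                new Call(1, Arrays.asList("a")),
                new Call(2, Arrays.asList("b")),
                new Call(3, Arrays.asList("c"))});
        Map<String, Acc> p2 = combinePartition(
            new String[] {"alice"},
            new Call[] {new Call(4, Arrays.asList("d"))});

        Map<String, Acc> merged = new HashMap<String, Acc>(p1);
        for (Map.Entry<String, Acc> e : p2.entrySet()) {
            Acc existing = merged.get(e.getKey());
            merged.put(e.getKey(),
                existing == null ? e.getValue() : mergeCombiners(existing, e.getValue()));
        }
        return merged;
    }

    public static void main(String[] args) {
        for (Map.Entry<String, Acc> e : run().entrySet()) {
            System.out.println(
                e.getKey() + " -> " + e.getValue().count + " " + e.getValue().lists);
        }
    }
}
```

In a real job, each of the three static methods would be wrapped in a Function or Function2 and passed to byCaller.combineByKey(...). Unlike reduceByKey, the result type is then free to differ from the input value type.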