Hi all, I have the code below, in which distinct is taking a long time.

blockingRdd is a pair RDD of <Long, String> and has about 400K
records:
JavaPairRDD<Long, Integer> completeDataToprocess = blockingRdd.flatMapValues(
    new Function<String, Iterable<Integer>>() {
        @Override
        public Iterable<Integer> call(String v1) throws Exception {
            return ckdao.getSingelkeyresult(v1);
        }
    }).distinct(32);
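
For reference, distinct(32) on a pair RDD is a shuffle: each (key, value) pair is hash-partitioned into 32 partitions and duplicates are dropped within each partition (Spark builds it on reduceByKey over the whole record). The following is only a simplified, single-JVM sketch of those semantics in plain Java, not Spark's implementation; the class DistinctSketch and its method are hypothetical names chosen for illustration.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class DistinctSketch {

    // Hypothetical model of pairRdd.distinct(numPartitions): every (key, value)
    // pair is routed to a bucket by its hash, then duplicates are removed
    // inside each bucket. Single JVM only; no Spark involved.
    public static List<Map.Entry<Long, Integer>> distinct(
            List<Map.Entry<Long, Integer>> records, int numPartitions) {
        List<Set<Map.Entry<Long, Integer>>> partitions = new ArrayList<>();
        for (int i = 0; i < numPartitions; i++) {
            partitions.add(new HashSet<>());
        }
        for (Map.Entry<Long, Integer> r : records) {
            // Non-negative modulo of the whole record's hash, similar in
            // spirit to Spark's HashPartitioner.
            int p = Math.floorMod(r.hashCode(), numPartitions);
            partitions.get(p).add(r); // the HashSet drops duplicate pairs
        }
        List<Map.Entry<Long, Integer>> result = new ArrayList<>();
        for (Set<Map.Entry<Long, Integer>> part : partitions) {
            result.addAll(part);
        }
        return result;
    }

    public static void main(String[] args) {
        List<Map.Entry<Long, Integer>> records = Arrays.asList(
            new SimpleEntry<>(1L, 10), new SimpleEntry<>(1L, 10),
            new SimpleEntry<>(1L, 11), new SimpleEntry<>(2L, 10));
        // (1, 10) appears twice, so three distinct pairs remain.
        System.out.println(distinct(records, 32).size()); // prints 3
    }
}
```

Because equal pairs always hash to the same bucket, each bucket can be deduplicated independently, which is why the real operation needs a full shuffle of all records across the 32 partitions.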

I am running distinct on 800K records and it's taking 2 hours on 16 cores
and 20 GB of RAM.
