Hi All,

I have the code below, in which `distinct` is taking a long time. `blockingRdd` is a `JavaPairRDD<Long, String>` and has about 400K records.

```java
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.function.Function;

JavaPairRDD<Long, Integer> completeDataToprocess = blockingRdd.flatMapValues(
        new Function<String, Iterable<Integer>>() {
            @Override
            public Iterable<Integer> call(String v1) throws Exception {
                // Look up the integer results for this blocking key
                return ckdao.getSingelkeyresult(v1);
            }
        }).distinct(32);
```

I am running `distinct` on 800K records, and it is taking 2 hours on 16 cores with 20 GB of RAM.
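For reference, the following plain-Java sketch (no Spark involved) models what `flatMapValues(...).distinct()` computes on pair data. Each `(key, value)` pair expands into zero or more `(key, result)` pairs, and `distinct` then drops duplicate whole pairs. The `lookup` method is a hypothetical stand-in for `ckdao.getSingelkeyresult`, and the sample data is made up purely for illustration.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;

class FlatMapDistinctSketch {

    // Hypothetical stand-in for ckdao.getSingelkeyresult(String): maps each
    // character of the key to its code point, so repeated characters
    // (or anagram keys) produce duplicate results.
    static Iterable<Integer> lookup(String key) {
        List<Integer> out = new ArrayList<>();
        for (char c : key.toCharArray()) {
            out.add((int) c);
        }
        return out;
    }

    // Models flatMapValues(f).distinct(): every (k, v) becomes zero or more
    // (k, u) pairs, and duplicate (k, u) pairs are kept only once.
    static <K, V, U> Set<Map.Entry<K, U>> flatMapValuesDistinct(
            List<Map.Entry<K, V>> input, Function<V, Iterable<U>> f) {
        Set<Map.Entry<K, U>> result = new LinkedHashSet<>();
        for (Map.Entry<K, V> e : input) {
            for (U u : f.apply(e.getValue())) {
                result.add(new SimpleEntry<K, U>(e.getKey(), u));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Map.Entry<Long, String>> blocking = new ArrayList<>();
        blocking.add(new SimpleEntry<Long, String>(1L, "ab"));
        blocking.add(new SimpleEntry<Long, String>(1L, "ba"));
        blocking.add(new SimpleEntry<Long, String>(2L, "aa"));

        // 6 pairs are produced before deduplication; 3 remain afterwards
        Set<Map.Entry<Long, Integer>> out =
                flatMapValuesDistinct(blocking, FlatMapDistinctSketch::lookup);
        System.out.println(out.size()); // prints 3
        System.out.println(out);        // prints [1=97, 1=98, 2=97]
    }
}
```

In Spark the same deduplication requires a shuffle across the `32` partitions passed to `distinct`, whereas this sketch does it in a single in-memory `LinkedHashSet`.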