Hello everyone, I am porting a clustering algorithm to the Spark platform, and I have run into a problem that has been confusing me for a long time. Can someone help me?
I have a PairRDD<Integer, Integer> named patternRDD, in which the key represents a number and the value stores some information about that key. For every pair of entries, I want to use the two VALUEs to compute a Kendall correlation, and if that number is greater than 0.6, output the two KEYs.

I tried transforming the PairRDD into an RDD<Tuple2<Integer, Integer>>, mapping every entry to a common key 0, and joining the result with itself, which gives a PairRDD<0, Iterable<Tuple2<Tuple2<key1, value1>, Tuple2<key2, value2>>>>. I then tried to use the values() method and map out the keys, but this gives me an "out of memory" error. My RDD has only a few entries, so I do not understand where the "out of memory" error comes from, and I have no idea how to solve it. A sketch of what I tried is below.

Can you help me?
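This is a minimal sketch of my attempt, assuming patternRDD is the JavaPairRDD<Integer, Integer> described above; kendall() is only a placeholder for my own Kendall correlation function, not a real Spark API:

import org.apache.spark.api.java.JavaPairRDD;
import scala.Tuple2;

// Put every entry under the common key 0.
JavaPairRDD<Integer, Tuple2<Integer, Integer>> keyedByZero =
    patternRDD.mapToPair(kv -> new Tuple2<>(0, kv));

// Self-join: every entry is paired with every other entry, and because
// they all share the single key 0, all n^2 pairs land in one partition.
JavaPairRDD<Integer, Tuple2<Tuple2<Integer, Integer>, Tuple2<Integer, Integer>>> allPairs =
    keyedByZero.join(keyedByZero);

// Keep the strongly correlated pairs and map out the two keys.
// kendall() is a placeholder for my correlation function.
JavaPairRDD<Integer, Integer> strongPairs = allPairs
    .values()
    .filter(pair -> kendall(pair._1()._2(), pair._2()._2()) > 0.6)
    .mapToPair(pair -> new Tuple2<>(pair._1()._1(), pair._2()._1()));

Regards,
Gefei Li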