Maybe you can use the sliding method on RDDs, but right now it is a private
MLlib method.
Look at org.apache.spark.mllib.rdd.RDDFunctions.
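
Note that sliding only pairs neighboring elements, while the question below
needs every pair of keys compared. That all-pairs shape can be expressed with
cartesian plus a filter instead of joining everything under one key, which is
what forces the whole dataset into a single group. Here is a minimal local
sketch of the pairing logic; the score function is a hypothetical stand-in for
the Kendall computation (the email does not show how the value encodes the
key's information), and in Spark the same structure would be roughly
patternRDD.cartesian(patternRDD).filter(...) with k1 < k2 to keep each
unordered pair once.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiFunction;

// Local sketch only: in Spark this corresponds to
// patternRDD.cartesian(patternRDD) followed by a filter on (k1 < k2),
// which distributes the pairwise work instead of collecting every
// entry under a single common key.
public class PairwiseFilter {
    // Returns all unordered key pairs whose score exceeds the threshold.
    // Each entry is {key, value}; `score` is a hypothetical stand-in for
    // the Kendall computation on two values.
    public static List<int[]> highScoringPairs(List<int[]> entries,
                                               BiFunction<Integer, Integer, Double> score,
                                               double threshold) {
        List<int[]> result = new ArrayList<>();
        for (int i = 0; i < entries.size(); i++) {
            for (int j = i + 1; j < entries.size(); j++) {
                int[] a = entries.get(i);
                int[] b = entries.get(j);
                // Compare the two VALUEs; emit the two KEYs if above threshold.
                if (score.apply(a[1], b[1]) > threshold) {
                    result.add(new int[]{a[0], b[0]});
                }
            }
        }
        return result;
    }
}
```

With N entries this is still O(N^2) comparisons, so on a real cluster the
cartesian result should be filtered before any grouping or collecting.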


2014-08-26 12:59 GMT+08:00 Vida Ha <v...@databricks.com>:

> Can you paste the code?  It's unclear to me how/when the out of memory is
> occurring without seeing the code.
>
>
>
>
> On Sun, Aug 24, 2014 at 11:37 PM, Gefei Li <gefeili.2...@gmail.com> wrote:
>
>> Hello everyone,
>>     I am porting a clustering algorithm to the Spark platform, and I
>> have run into a problem that has confused me for a long time. Can someone
>> help me?
>>
>>     I have a PairRDD<Integer, Integer> named patternRDD, in which the key
>> represents a number and the value stores information about the key. I
>> want to use two of the VALUEs to calculate a Kendall coefficient, and if
>> the coefficient is greater than 0.6, output the two KEYs.
>>
>>     I have tried transforming the PairRDD into an RDD<Tuple2<Integer,
>> Integer>>, adding a common key zero to every entry, and joining the RDD
>> with itself to get a PairRDD<0, Iterable<Tuple2<Tuple2<key1, value1>,
>> Tuple2<key2, value2>>>>, then using the values() method and mapping the
>> keys out, but this gives me an "out of memory" error. I suspect the error
>> is caused by joining all the entries under a single key, but I have no
>> idea how to solve it.
>>
>>      Can you help me?
>>
>> Regards,
>> Gefei Li
>>
>
>
