Thanks Imran, I'll try your suggestions.
I eventually got this to run by 'checkpointing' the joined RDD (according
to Akhil's suggestion) before performing the reduceBy, and then
checkpointing it again afterward. i.e.
> val rdd2 = rdd.join(rdd, numPartitions=1000)
> .map(fp=>((fp._2._1, fp._2._2
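For anyone reading this later, here is a rough, self-contained sketch of what the checkpoint-before-and-after-the-reduce approach can look like. The quoted snippet above is cut off, so the body of the .map, the checkpoint directory, and the final reduceByKey are my assumptions, not necessarily Tom's exact code.

import org.apache.spark.{SparkConf, SparkContext}

object CheckpointSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("checkpoint-sketch"))
    sc.setCheckpointDir("/tmp/spark-checkpoints") // hypothetical path; use HDFS on a real cluster

    // rdd is assumed to be a pair RDD of (Int, Int), as Tom clarifies later in the thread
    val rdd = sc.parallelize(Seq((1, 2), (1, 3), (4, 3)))

    // Self-join, then re-key on the pair of joined values (a guess at the truncated .map)
    val joined = rdd.join(rdd, numPartitions = 1000)
      .map { case (_, (a, b)) => ((a, b), 1) }

    // Checkpoint before the reduce so the join's long lineage is cut
    joined.checkpoint()
    joined.count() // an action is needed to actually materialise the checkpoint

    val reduced = joined.reduceByKey(_ + _)

    // Checkpoint again after the reduce, as described above
    reduced.checkpoint()
    reduced.count()

    sc.stop()
  }
}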
Hi Tom,
there are a couple of things you can do here to make this more efficient.
First, I think you can replace your self-join with a groupByKey. On your
example data set, this would give you
(1, Iterable(2,3))
(4, Iterable(3))
This reduces the amount of data that needs to be shuffled, and tha
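To make that concrete, a minimal sketch of the groupByKey version, assuming an existing SparkContext named sc and inferring the input pairs from the Iterable output above:

// Input inferred from the output shown above: (1,2), (1,3), (4,3)
val rdd = sc.parallelize(Seq((1, 2), (1, 3), (4, 3)))

val grouped = rdd.groupByKey()

// Each key's values are shuffled to a single task once, instead of being
// paired against themselves as in rdd.join(rdd)
grouped.collect().foreach(println)
// prints something like (1,CompactBuffer(2, 3)) and (4,CompactBuffer(3))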
Thanks for the reply, I'll try your suggestions.
Apologies, in my previous post I was mistaken. rdd is actually a PairRDD
of (Int, Int). I'm doing the self-join so I can count two things. First, I
can count the number of times a value appears in the data set. Second, I can
count the number of times va
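For what it's worth, the first of those counts doesn't need a join at all. A sketch, assuming rdd is the (Int, Int) PairRDD and leaving the second, cut-off count aside:

// Count how many times each value appears across the data set
val valueCounts = rdd
  .map { case (_, v) => (v, 1) }
  .reduceByKey(_ + _)
// On (1,2), (1,3), (4,3) this yields (2,1) and (3,2)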
Why are you joining the rdd with itself?
You can try these things:
- Change the StorageLevel of both rdds to MEMORY_AND_DISK_2 or
MEMORY_AND_DISK_SER, so that it doesn't need to keep everything in memory.
- Set your default serializer to Kryo (.set("spark.serializer",
"org.apache.spark.serializer.KryoSerializer"))
Hi All,
I'm a new Spark (and Hadoop) user and I want to find out if the cluster
resources I am using are feasible for my use-case. The following is a
snippet of code that is causing an OOM exception in the executor after about
125/1000 tasks during the map stage.
> val rdd2 = rdd.join(rdd, numPartitions=1000)
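As a side note (my own illustration, not from the thread): the reason a self-join like this can run out of memory mid-stage is that a key appearing n times produces n*n output rows, so even one hot key can dominate the shuffle.

// One key with 10,000 values...
val hot = sc.parallelize((1 to 10000).map(i => (1, i)))
// ...self-joins into 10,000 * 10,000 = 100,000,000 rows for that key
val selfJoined = hot.join(hot)
println(selfJoined.count()) // 100000000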