I am still waiting for an answer to my questions on Bagel, and I have another set of questions on iterative processing.
Two RDDs are serialized (and compressed) and cached in memory before the start of a loop. Together, they occupy just 20% of the memory fraction reserved for RDDs. (These RDDs are read from object files just before caching.) After each iteration, I store results to an object file, and the same file is read back at the start of the next iteration.

In such a scenario, I expect the memory to keep holding my previously cached RDDs. But in reality, some of the partitions get evicted during each iteration even though there is nothing to replace them. Performance takes a major hit the moment Spark starts recomputing the evicted RDDs. It is really hard to understand why cached blocks are getting evicted. I am sure people have faced such issues, and I would really appreciate any advice on dealing with them.

Thanks and regards,
~Mayuresh

On Sat, Nov 30, 2013 at 7:14 PM, Mayuresh Kunjir <[email protected]> wrote:

> One more question: Referring to the Bagel code, is the following step a
> possible culprit? Just making a guess from the code comment.
>
> // Force evaluation of processed RDD for accurate performance measurements
> processed.foreach(x => {})
>
> On Sat, Nov 30, 2013 at 6:58 PM, Mayuresh Kunjir <[email protected]> wrote:
>
>> I tried passing the DISK_ONLY storage level to Bagel's run method. It is
>> running without any error (so far) but is too slow. I am attaching details
>> for a stage corresponding to the second iteration of my algorithm (foreach
>> at Bagel.scala:237
>> <http://ec2-54-234-176-171.compute-1.amazonaws.com:4040/stages/stage?id=23>).
>> It has been running for more than 35 minutes, and I am noticing very high
>> GC time for some tasks. Listing the setup parameters below:
>>
>> #nodes = 16
>> SPARK_WORKER_MEMORY = 13G
>> SPARK_MEM = 13G
>> RDD storage fraction = 0.5
>> degree of parallelism = 192 (16 nodes * 4 cores each * 3)
>> Serializer = Kryo
>> Vertex data size after serialization = ~12G (probably too high, but it's
>> the bare minimum required for the algorithm.)
>>
>> I would be grateful if you could suggest some further optimizations, or
>> point out reasons why Bagel might not be suitable for this data size. I
>> need to scale my cluster further and am not feeling confident at all
>> looking at this.
>>
>> Thanks and regards,
>> ~Mayuresh
>>
>> On Sat, Nov 30, 2013 at 3:07 PM, Mayuresh Kunjir <[email protected]> wrote:
>>
>>> Hi Spark users,
>>>
>>> I am running a pagerank-style algorithm on Bagel and bumping into
>>> "out of memory" issues with it.
>>>
>>> Referring to the following table, rdd_120 is the RDD of vertices,
>>> serialized and compressed in memory. On each iteration, Bagel
>>> deserializes the compressed RDD; e.g., rdd_126 is the uncompressed
>>> version of rdd_120, persisted in memory and on disk. As iterations keep
>>> piling on, the cached partitions start getting evicted. The moment a
>>> rdd_120 partition gets evicted, it necessitates a recomputation, and
>>> performance goes for a toss. Although we don't need the uncompressed
>>> RDDs from previous iterations, they are the last ones to get evicted
>>> thanks to the LRU policy.
>>>
>>> Should I make Bagel use DISK_ONLY persistence? How much of a
>>> performance hit would that be? Or maybe there is a better solution here.
>>>
>>> Storage:
>>>
>>> RDD Name  Storage Level                     Cached Partitions  Fraction Cached  Size in Memory  Size on Disk
>>> rdd_83    Memory Serialized 1x Replicated   23                 12%              83.7 MB         0.0 B
>>> rdd_95    Memory Serialized 1x Replicated   23                 12%              2.5 MB          0.0 B
>>> rdd_120   Memory Serialized 1x Replicated   25                 13%              761.1 MB        0.0 B
>>> rdd_126   Disk Memory Deserialized 1x Repl. 192                100%             77.9 GB         1016.5 MB
>>> rdd_134   Disk Memory Deserialized 1x Repl. 185                96%              60.8 GB         475.4 MB
>>>
>>> (RDD links:
>>> rdd_83 <http://ec2-54-234-176-171.compute-1.amazonaws.com:4040/storage/rdd?id=83>,
>>> rdd_95 <http://ec2-54-234-176-171.compute-1.amazonaws.com:4040/storage/rdd?id=95>,
>>> rdd_120 <http://ec2-54-234-176-171.compute-1.amazonaws.com:4040/storage/rdd?id=120>,
>>> rdd_126 <http://ec2-54-234-176-171.compute-1.amazonaws.com:4040/storage/rdd?id=126>,
>>> rdd_134 <http://ec2-54-234-176-171.compute-1.amazonaws.com:4040/storage/rdd?id=134>)
>>>
>>> Thanks and regards,
>>> ~Mayuresh
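
P.S. To make the iterative setup at the top of this mail concrete, it amounts to roughly the sketch below. This is a simplified, hypothetical reconstruction: `step`, the HDFS paths, the element types, and `numIterations` are placeholders, not the actual job.

```scala
import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._
import org.apache.spark.storage.StorageLevel

// Hypothetical sketch of the loop described above. The two long-lived RDDs
// are read from object files and cached serialized in memory; per-iteration
// results go through object files, so nothing new should be competing for
// the RDD cache between iterations.
val vertices = sc.objectFile[(Long, Double)]("hdfs:///data/vertices")
                 .persist(StorageLevel.MEMORY_ONLY_SER)
val edges    = sc.objectFile[(Long, Long)]("hdfs:///data/edges")
                 .persist(StorageLevel.MEMORY_ONLY_SER)

for (i <- 1 to numIterations) {
  // step() stands in for one iteration of the actual algorithm.
  val result = step(vertices, edges,
                    sc.objectFile[(Long, Double)]("hdfs:///data/iter-" + (i - 1)))
  result.saveAsObjectFile("hdfs:///data/iter-" + i)
}
```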

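For clarity, the DISK_ONLY experiment mentioned in the thread amounts to something like the following. This is a hypothetical sketch: the exact `Bagel.run` overload and type parameters depend on the Spark version in use, and `verts`, `msgs`, `combiner`, `numPartitions`, and `compute` are placeholders.

```scala
import org.apache.spark.bagel.Bagel
import org.apache.spark.storage.StorageLevel

// Hypothetical sketch: pass an explicit storage level to Bagel.run so the
// per-superstep RDDs are kept on disk instead of being cached deserialized
// in memory (if I recall correctly, Bagel's default is MEMORY_AND_DISK).
val result = Bagel.run(sc, verts, msgs, combiner, numPartitions,
                       StorageLevel.DISK_ONLY)(compute)
```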