Hi, maybe you need to check these nodes; they are very slow:
Task  Status   Locality Level  Node                           Launch Time          Duration  GC Time  Shuffle Read
3487  SUCCESS  PROCESS_LOCAL   ip-10-60-150-111.ec2.internal  2013/12/01 02:11:38  17.7 m    16.3 m   23.3 MB
3447  SUCCESS  PROCESS_LOCAL   ip-10-12-54-63.ec2.internal    2013/12/01 02:11:26  20.1 m    13.9 m   50.9 MB

> On Dec 1, 2013, at 10:59 AM, "Mayuresh Kunjir" <mayuresh.kun...@gmail.com> wrote:
>
> I tried passing the DISK_ONLY storage level to Bagel's run method. It's
> running without any error (so far) but is too slow. I am attaching details
> for a stage corresponding to the second iteration of my algorithm (foreach
> at Bagel.scala:237). It has been running for more than 35 minutes, and I am
> noticing very high GC time for some tasks. The setup parameters are listed
> below.
>
> #nodes = 16
> SPARK_WORKER_MEMORY = 13G
> SPARK_MEM = 13G
> RDD storage fraction = 0.5
> degree of parallelism = 192 (16 nodes * 4 cores each * 3)
> Serializer = Kryo
> Vertex data size after serialization = ~12G (probably too high, but it's
> the bare minimum required for the algorithm.)
>
> I would be grateful if you could suggest further optimizations or point out
> reasons why/if Bagel is not suitable for this data size. I need to scale my
> cluster further, and I am not feeling confident at all looking at this.
>
> Thanks and regards,
> ~Mayuresh
>
>> On Sat, Nov 30, 2013 at 3:07 PM, Mayuresh Kunjir <mayuresh.kun...@gmail.com>
>> wrote:
>> Hi Spark users,
>>
>> I am running a PageRank-style algorithm on Bagel and bumping into
>> out-of-memory issues with it.
>>
>> Referring to the following table, rdd_120 is the RDD of vertices,
>> serialized and compressed in memory. On each iteration, Bagel deserializes
>> the compressed RDD; e.g., rdd_126 is the uncompressed version of rdd_120,
>> persisted in memory and on disk. As iterations pile up, the cached
>> partitions start getting evicted. The moment a rdd_120 partition is
>> evicted, it necessitates a recomputation and performance goes for a toss.
>> Although we no longer need the uncompressed RDDs from previous iterations,
>> they are the last ones to get evicted thanks to the LRU policy.
>>
>> Should I make Bagel use DISK_ONLY persistence? How much of a performance
>> hit would that be? Or maybe there is a better solution here.
>>
>> Storage
>> RDD Name  Storage Level                           Cached Partitions  Fraction Cached  Size in Memory  Size on Disk
>> rdd_83    Memory Serialized 1x Replicated         23                 12%              83.7 MB         0.0 B
>> rdd_95    Memory Serialized 1x Replicated         23                 12%              2.5 MB          0.0 B
>> rdd_120   Memory Serialized 1x Replicated         25                 13%              761.1 MB        0.0 B
>> rdd_126   Disk Memory Deserialized 1x Replicated  192                100%             77.9 GB         1016.5 MB
>> rdd_134   Disk Memory Deserialized 1x Replicated  185                96%              60.8 GB         475.4 MB
>>
>> Thanks and regards,
>> ~Mayuresh

<BigFrame - Details for Stage 23.htm>
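The eviction pattern described in the thread, where the still-needed serialized vertex RDD is evicted while stale uncompressed RDDs from past iterations survive, is the expected behavior of a least-recently-used policy: blocks written most recently look "hot" regardless of whether they will ever be read again. A minimal sketch of that dynamic, using a plain access-ordered `LinkedHashMap` rather than Spark's actual block manager (the block names are illustrative):

```scala
// Minimal LRU cache sketch built on java.util.LinkedHashMap in access order.
// Not Spark's BlockManager -- just an illustration of why blocks written on
// recent iterations survive while older-but-still-needed blocks are evicted.
import java.util.{LinkedHashMap => JLinkedHashMap}
import java.util.Map.Entry

class LruCache[K, V](capacity: Int)
    extends JLinkedHashMap[K, V](16, 0.75f, /* accessOrder = */ true) {
  // Evict the least-recently-used entry once we exceed capacity.
  override def removeEldestEntry(eldest: Entry[K, V]): Boolean =
    size() > capacity
}

object LruDemo {
  def main(args: Array[String]): Unit = {
    val cache = new LruCache[String, String](3)
    cache.put("rdd_120_p0", "serialized vertices")      // needed every iteration
    cache.put("rdd_126_p0", "uncompressed, iteration 1") // stale after iter 1
    cache.put("rdd_134_p0", "uncompressed, iteration 2")
    // The next iteration writes a new uncompressed block; the eldest entry,
    // rdd_120_p0, is evicted even though it is the one we still need,
    // while the stale rdd_126_p0 survives.
    cache.put("rdd_142_p0", "uncompressed, iteration 3")
    println(s"rdd_120_p0 cached: ${cache.containsKey("rdd_120_p0")}") // false
    println(s"rdd_126_p0 cached: ${cache.containsKey("rdd_126_p0")}") // true
  }
}
```

This is why forcing DISK_ONLY for the intermediate RDDs (as tried above) sidesteps the eviction problem at the cost of I/O: the uncompressed blocks never compete with the serialized vertex RDD for memory.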
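For reference, the setup parameters listed in the thread roughly correspond to the following standalone-mode environment entries. This is a hedged restatement, not the poster's actual config: the file placement (spark-env.sh) is assumed, and reading "RDD storage fraction = 0.5" as the spark.storage.memoryFraction property is an assumption.

```shell
# Approximate spark-env.sh for the reported setup (assumed Spark 0.8-era
# standalone mode). Values come from the email; property names are assumptions.
export SPARK_WORKER_MEMORY=13g   # memory a worker may hand to executors
export SPARK_MEM=13g             # per-application executor heap
# "RDD storage fraction = 0.5" -- assumed to mean spark.storage.memoryFraction;
# Kryo serializer as stated in the email.
export SPARK_JAVA_OPTS="-Dspark.storage.memoryFraction=0.5 \
  -Dspark.serializer=org.apache.spark.serializer.KryoSerializer"
```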