One more question: referring to the Bagel code, could the following step be
the culprit? I am just guessing from the code comment.
// Force evaluation of processed RDD for accurate performance measurements
processed.foreach(x => {})
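(For context: RDD transformations are lazy, so an empty foreach is simply a
no-op action that forces the RDD to materialize at that point. A minimal
plain-Scala sketch of the same pattern, using a LazyList in place of an RDD
so it runs without Spark:)

```scala
object ForceEvalSketch extends App {
  var evaluated = 0

  // Like an RDD, a LazyList builds a description of the computation;
  // the map body does not run when this line executes.
  val processed = LazyList.from(1).take(5).map { x => evaluated += 1; x * x }
  assert(evaluated == 0) // lazy: nothing computed yet

  // The empty foreach is a no-op action that forces evaluation,
  // the same trick as `processed.foreach(x => {})` in Bagel.
  processed.foreach(_ => ())
  assert(evaluated == 5) // all five elements were computed
  println(evaluated)
}
```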
On Sat, Nov 30, 2013 at 6:58 PM, Mayuresh Kunjir
<[email protected]> wrote:
> I tried passing the DISK_ONLY storage level to Bagel's run method. It is
> running without any error (so far) but is too slow. I am attaching details
> for a stage corresponding to the second iteration of my algorithm (foreach
> at
> Bagel.scala:237<http://ec2-54-234-176-171.compute-1.amazonaws.com:4040/stages/stage?id=23>).
> It has been running for more than 35 minutes, and I am noticing very high
> GC time for some tasks. The setup parameters are listed below.
>
> #nodes = 16
> SPARK_WORKER_MEMORY = 13G
> SPARK_MEM = 13G
> RDD storage fraction = 0.5
> degree of parallelism = 192 (16 nodes * 4 cores each * 3)
> Serializer = Kryo
> Vertex data size after serialization = ~12G (probably too high, but it's
> the bare minimum required for the algorithm.)
>
> I would be grateful if you could suggest some further optimizations, or
> point out reasons why Bagel might not be suitable for this data size. I
> need to scale my cluster further, and these numbers do not leave me
> feeling confident.
>
> Thanks and regards,
> ~Mayuresh
>
>
> On Sat, Nov 30, 2013 at 3:07 PM, Mayuresh Kunjir <
> [email protected]> wrote:
>
>> Hi Spark users,
>>
>> I am running a pagerank-style algorithm on Bagel and am hitting "out of
>> memory" errors.
>>
>> Referring to the following table, rdd_120 is the RDD of vertices,
>> serialized and compressed in memory. On each iteration, Bagel deserializes
>> the compressed RDD; e.g. rdd_126 is the uncompressed version of rdd_120
>> persisted in memory and disk. As iterations keep piling on, the cached
>> partitions start getting evicted. The moment an rdd_120 partition gets
>> evicted, it forces a recomputation and the performance goes for a toss.
>> Although we don't need the uncompressed RDDs from previous iterations,
>> they are the last ones to get evicted thanks to the LRU policy.
>>
>> Should I make Bagel use DISK_ONLY persistence? How much of a performance
>> hit would that be? Or maybe there is a better solution here.
>>
>> Storage:
>>
>> RDD Name | Storage Level | Cached Partitions | Fraction Cached | Size in Memory | Size on Disk
>> rdd_83<http://ec2-54-234-176-171.compute-1.amazonaws.com:4040/storage/rdd?id=83> | Memory Serialized 1x Replicated | 23 | 12% | 83.7 MB | 0.0 B
>> rdd_95<http://ec2-54-234-176-171.compute-1.amazonaws.com:4040/storage/rdd?id=95> | Memory Serialized 1x Replicated | 23 | 12% | 2.5 MB | 0.0 B
>> rdd_120<http://ec2-54-234-176-171.compute-1.amazonaws.com:4040/storage/rdd?id=120> | Memory Serialized 1x Replicated | 25 | 13% | 761.1 MB | 0.0 B
>> rdd_126<http://ec2-54-234-176-171.compute-1.amazonaws.com:4040/storage/rdd?id=126> | Disk Memory Deserialized 1x Replicated | 192 | 100% | 77.9 GB | 1016.5 MB
>> rdd_134<http://ec2-54-234-176-171.compute-1.amazonaws.com:4040/storage/rdd?id=134> | Disk Memory Deserialized 1x Replicated | 185 | 96% | 60.8 GB | 475.4 MB
>> Thanks and regards,
>> ~Mayuresh
>>
>
>