[ https://issues.apache.org/jira/browse/SPARK-5715?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sean Owen resolved SPARK-5715.
------------------------------
    Resolution: Duplicate

Can you move your comment to the existing JIRA, SPARK-5081?

> Shuffle size increase, performance loss from Spark 1.1.0 to Spark 1.2.0 (and 1.2.1)
> ------------------------------------------------------------------------------------
>
>                 Key: SPARK-5715
>                 URL: https://issues.apache.org/jira/browse/SPARK-5715
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 1.2.0, 1.2.1
>         Environment: Running with local[*] as master (experienced during pre-integration testing, not in a production cluster); 100 GB of memory assigned, 16-core machine.
>            Reporter: Dr. Christian Betz
>              Labels: performance
>
> I see a *factor-four performance loss* in my Spark jobs when migrating from
> Spark 1.1.0 to Spark 1.2.0 or 1.2.1.
> I also see an *increase in the size of shuffle writes* (likewise reported by
> Kevin Jung on the mailing list:
> http://apache-spark-user-list.1001560.n3.nabble.com/Shuffle-write-increases-in-spark-1-2-tt20894.html).
> Together with this I experience a *huge number of disk spills*.
> I'm experiencing these symptoms with my job under the following circumstances
> (the shuffle implementation is selected via spark.shuffle.manager; see the
> sketch after the list):
> * Spark 1.2.0 with sort-based shuffle
> * Spark 1.2.0 with hash-based shuffle
> * Spark 1.2.1 with sort-based shuffle
> All three combinations show the same behavior, which contrasts with Spark
> 1.1.0.
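> As a hedged sketch (not my exact build code), this is roughly how the
> combinations above are configured; the keys are standard Spark 1.2.x
> settings, and the values shown are the 1.2.x defaults rather than my tuning:
>
>     import org.apache.spark.SparkConf
>
>     // Pick the shuffle implementation: "sort" is the Spark 1.2.x default,
>     // "hash" restores the pre-1.2 behavior.
>     val conf = new SparkConf()
>       .set("spark.shuffle.manager", "sort")        // or "hash"
>       .set("spark.shuffle.spill", "true")          // let in-memory maps spill to disk
>       .set("spark.shuffle.memoryFraction", "0.2")  // memory budget before spilling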
> In Spark 1.1.0 my job runs for about an hour; in Spark 1.2.x it runs for
> almost four hours. The configuration is otherwise identical - I only added
> org.apache.spark.scheduler.CompressedMapStatus to the Kryo registrator for
> Spark 1.2.0 to cope with https://issues.apache.org/jira/browse/SPARK-5102
> (see the sketch below).
> As a consequence (I think, but the causality might be different) I see lots
> and lots of disk spills.
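> For completeness, a sketch of that registrator change; the class name
> MyKryoRegistrator is illustrative, and only the CompressedMapStatus
> registration is the actual addition. The class is private to Spark, hence
> the registration by name:
>
>     import com.esotericsoftware.kryo.Kryo
>     import org.apache.spark.serializer.KryoRegistrator
>
>     // Hooked into the job via the spark.kryo.registrator setting.
>     class MyKryoRegistrator extends KryoRegistrator {
>       override def registerClasses(kryo: Kryo): Unit = {
>         // Needed on Spark 1.2.x so Kryo can serialize map-output statuses
>         // when registration is required; see SPARK-5102.
>         kryo.register(Class.forName("org.apache.spark.scheduler.CompressedMapStatus"))
>       }
>     }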
> I cannot provide a small test case, but maybe the log entries for a single
> worker thread can help someone investigate this. (See below.)
> I will also open up an issue if nobody stops me by providing an answer ;)
> Any help will be greatly appreciated, because otherwise I'm stuck with Spark
> 1.1.0, as quadrupling the runtime is not an option.
> Sincerely,
> Chris
> 2015-02-09T14:06:06.328+01:00 INFO org.apache.spark.executor.Executor Running task 9.0 in stage 18.0 (TID 300) Executor task launch worker-18
> 2015-02-09T14:06:06.351+01:00 INFO org.apache.spark.CacheManager Partition rdd_35_9 not found, computing it Executor task launch worker-18
> 2015-02-09T14:06:06.351+01:00 INFO org.apache.spark.storage.ShuffleBlockFetcherIterator Getting 10 non-empty blocks out of 10 blocks Executor task launch worker-18
> 2015-02-09T14:06:06.351+01:00 INFO org.apache.spark.storage.ShuffleBlockFetcherIterator Started 0 remote fetches in 0 ms Executor task launch worker-18
> 2015-02-09T14:06:07.396+01:00 INFO org.apache.spark.storage.MemoryStore ensureFreeSpace(2582904) called with curMem=300174944, maxMe... Executor task launch worker-18
> 2015-02-09T14:06:07.397+01:00 INFO org.apache.spark.storage.MemoryStore Block rdd_35_9 stored as bytes in memory (estimated size 2.5... Executor task launch worker-18
> 2015-02-09T14:06:07.398+01:00 INFO org.apache.spark.storage.BlockManagerMaster Updated info of block rdd_35_9 Executor task launch worker-18
> 2015-02-09T14:06:07.399+01:00 INFO org.apache.spark.CacheManager Partition rdd_38_9 not found, computing it Executor task launch worker-18
> 2015-02-09T14:06:07.399+01:00 INFO org.apache.spark.storage.ShuffleBlockFetcherIterator Getting 10 non-empty blocks out of 10 blocks Executor task launch worker-18
> 2015-02-09T14:06:07.400+01:00 INFO org.apache.spark.storage.ShuffleBlockFetcherIterator Started 0 remote fetches in 0 ms Executor task launch worker-18
> 2015-02-09T14:06:07.567+01:00 INFO org.apache.spark.storage.MemoryStore ensureFreeSpace(944848) called with curMem=302757848, maxMem... Executor task launch worker-18
> 2015-02-09T14:06:07.568+01:00 INFO org.apache.spark.storage.MemoryStore Block rdd_38_9 stored as values in memory (estimated size 92... Executor task launch worker-18
> 2015-02-09T14:06:07.569+01:00 INFO org.apache.spark.storage.BlockManagerMaster Updated info of block rdd_38_9 Executor task launch worker-18
> 2015-02-09T14:06:07.573+01:00 INFO org.apache.spark.storage.ShuffleBlockFetcherIterator Getting 34 non-empty blocks out of 50 blocks Executor task launch worker-18
> 2015-02-09T14:06:07.573+01:00 INFO org.apache.spark.storage.ShuffleBlockFetcherIterator Started 0 remote fetches in 1 ms Executor task launch worker-18
> 2015-02-09T14:06:38.931+01:00 INFO org.apache.spark.CacheManager Partition rdd_41_9 not found, computing it Executor task launch worker-18
> 2015-02-09T14:06:38.931+01:00 INFO org.apache.spark.storage.ShuffleBlockFetcherIterator Getting 3 non-empty blocks out of 10 blocks Executor task launch worker-18
> 2015-02-09T14:06:38.931+01:00 INFO org.apache.spark.storage.ShuffleBlockFetcherIterator Started 0 remote fetches in 0 ms Executor task launch worker-18
> 2015-02-09T14:06:38.945+01:00 INFO org.apache.spark.storage.MemoryStore ensureFreeSpace(0) called with curMem=307529127, maxMem=9261... Executor task launch worker-18
> 2015-02-09T14:06:38.945+01:00 INFO org.apache.spark.storage.MemoryStore Block rdd_41_9 stored as bytes in memory (estimated size 0.0... Executor task launch worker-18
> 2015-02-09T14:06:38.946+01:00 INFO org.apache.spark.storage.BlockManagerMaster Updated info of block rdd_41_9 Executor task launch worker-18
> 2015-02-09T14:06:38.946+01:00 WARN org.apache.spark.storage.BlockManager Block rdd_41_9 replicated to only 0 peer(s) instead of 1 pee... Executor task launch worker-18
> 2015-02-09T14:06:39.088+01:00 INFO org.apache.spark.storage.BlockManager Found block rdd_3_9 locally Executor task launch worker-18
> 2015-02-09T14:06:41.389+01:00 INFO org.apache.spark.CacheManager Partition rdd_7_9 not found, computing it Executor task launch worker-18
> 2015-02-09T14:06:41.389+01:00 INFO org.apache.spark.storage.ShuffleBlockFetcherIterator Getting 1 non-empty blocks out of 1 blocks Executor task launch worker-18
> 2015-02-09T14:06:41.389+01:00 INFO org.apache.spark.storage.ShuffleBlockFetcherIterator Started 0 remote fetches in 0 ms Executor task launch worker-18
> 2015-02-09T14:06:41.402+01:00 INFO org.apache.spark.storage.MemoryStore ensureFreeSpace(38144) called with curMem=307529151, maxMem=... Executor task launch worker-18
> 2015-02-09T14:06:41.402+01:00 INFO org.apache.spark.storage.MemoryStore Block rdd_7_9 stored as values in memory (estimated size 37.... Executor task launch worker-18
> 2015-02-09T14:06:41.404+01:00 INFO org.apache.spark.storage.BlockManagerMaster Updated info of block rdd_7_9 Executor task launch worker-18
> 2015-02-09T14:07:00.019+01:00 INFO org.apache.spark.CacheManager Partition rdd_73_9 not found, computing it Executor task launch worker-18
> 2015-02-09T14:07:00.019+01:00 INFO org.apache.spark.storage.ShuffleBlockFetcherIterator Getting 1 non-empty blocks out of 1 blocks Executor task launch worker-18
> 2015-02-09T14:07:00.019+01:00 INFO org.apache.spark.storage.ShuffleBlockFetcherIterator Started 0 remote fetches in 0 ms Executor task launch worker-18
> 2015-02-09T14:07:00.185+01:00 INFO org.apache.spark.storage.MemoryStore ensureFreeSpace(826250) called with curMem=307567295, maxMem... Executor task launch worker-18
> 2015-02-09T14:07:00.185+01:00 INFO org.apache.spark.storage.MemoryStore Block rdd_73_9 stored as values in memory (estimated size 80... Executor task launch worker-18
> 2015-02-09T14:07:00.186+01:00 INFO org.apache.spark.storage.BlockManagerMaster Updated info of block rdd_73_9 Executor task launch worker-18
> 2015-02-09T14:07:00.190+01:00 INFO org.apache.spark.storage.BlockManager Found block rdd_35_9 locally Executor task launch worker-18
> 2015-02-09T14:07:00.190+01:00 INFO org.apache.spark.storage.BlockManager Found block rdd_38_9 locally Executor task launch worker-18
> 2015-02-09T14:07:00.194+01:00 INFO org.apache.spark.storage.ShuffleBlockFetcherIterator Getting 34 non-empty blocks out of 50 blocks Executor task launch worker-18
> 2015-02-09T14:07:00.194+01:00 INFO org.apache.spark.storage.ShuffleBlockFetcherIterator Started 0 remote fetches in 1 ms Executor task launch worker-18
> 2015-02-09T14:07:17.967+01:00 INFO org.apache.spark.util.collection.ExternalAppendOnlyMap Thread 117 spilling in-memory map of 670.2 MB to disk (1 tim... Executor task launch worker-18
> 2015-02-09T14:07:46.716+01:00 INFO org.apache.spark.storage.BlockManager Found block rdd_41_9 locally Executor task launch worker-18
> 2015-02-09T14:07:47.603+01:00 INFO org.apache.spark.storage.BlockManager Found block rdd_3_9 locally Executor task launch worker-18
> 2015-02-09T14:07:47.897+01:00 INFO org.apache.spark.util.collection.ExternalAppendOnlyMap Thread 117 spilling in-memory map of 5.0 MB to disk (1 time ... Executor task launch worker-18
> 2015-02-09T14:07:48.270+01:00 INFO org.apache.spark.util.collection.ExternalAppendOnlyMap Thread 117 spilling in-memory map of 5.0 MB to disk (2 times... Executor task launch worker-18
> 2015-02-09T14:07:48.727+01:00 INFO org.apache.spark.util.collection.ExternalAppendOnlyMap Thread 117 spilling in-memory map of 5.0 MB to disk (3 times... Executor task launch worker-18
> 2015-02-09T14:07:49.021+01:00 INFO org.apache.spark.util.collection.ExternalAppendOnlyMap Thread 117 spilling in-memory map of 5.0 MB to disk (4 times... Executor task launch worker-18


