[ https://issues.apache.org/jira/browse/SPARK-5715?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sean Owen resolved SPARK-5715.
------------------------------
    Resolution: Duplicate

Can you move your comment to the existing JIRA, SPARK-5081?

> Shuffle size increase, performance loss from Spark 1.1.0 to Spark 1.2.0 (and 1.2.1)
> -----------------------------------------------------------------------------------
>
>                 Key: SPARK-5715
>                 URL: https://issues.apache.org/jira/browse/SPARK-5715
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 1.2.0, 1.2.1
>        Environment: Running with local[*] as master (experienced during pre-integration test, not in a production cluster), 100 GByte of memory assigned, 16-core machine.
>           Reporter: Dr. Christian Betz
>             Labels: performance
>
> I see a *factor-four performance loss* in my Spark jobs when migrating from Spark 1.1.0 to Spark 1.2.0 or 1.2.1.
> I also see an *increase in the size of shuffle writes* (likewise reported by Kevin Jung on the mailing list: http://apache-spark-user-list.1001560.n3.nabble.com/Shuffle-write-increases-in-spark-1-2-tt20894.html).
> Together with this I experience a *huge number of disk spills*.
> I observe this with my job under the following circumstances:
> * Spark 1.2.0 with sort-based shuffle
> * Spark 1.2.0 with hash-based shuffle
> * Spark 1.2.1 with sort-based shuffle
> All three combinations show the same behavior, which contrasts with Spark 1.1.0.
> In Spark 1.1.0 my job runs for about an hour; in Spark 1.2.x it runs for almost four hours. The configuration is otherwise identical - I only added org.apache.spark.scheduler.CompressedMapStatus to the Kryo registrator for Spark 1.2.0 to cope with https://issues.apache.org/jira/browse/SPARK-5102.
> As a consequence (I think, though the causality might run the other way) I see lots and lots of disk spills.
> I cannot provide a small test case, but maybe the log entries for a single worker thread can help someone investigate. (See below.)
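[Editorial note: the Kryo registration and shuffle-manager setup described above might look roughly like the following Scala sketch against the Spark 1.2.x API. The class name MyRegistrator and the exact wiring are illustrative assumptions, not taken from the reporter's job; CompressedMapStatus is private[spark], so it is registered via Class.forName rather than a classOf reference.]

```scala
import com.esotericsoftware.kryo.Kryo
import org.apache.spark.SparkConf
import org.apache.spark.serializer.KryoRegistrator

// Hypothetical registrator: registers the class that SPARK-5102 reports
// as unregistered when Kryo serialization is enabled.
class MyRegistrator extends KryoRegistrator {
  override def registerClasses(kryo: Kryo): Unit = {
    kryo.register(Class.forName("org.apache.spark.scheduler.CompressedMapStatus"))
  }
}

// Sketch of a configuration covering the combinations listed above.
val conf = new SparkConf()
  .setMaster("local[*]")
  .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
  .set("spark.kryo.registrator", "MyRegistrator")
  .set("spark.shuffle.manager", "sort") // or "hash" to compare the two shuffle paths
```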
> I will also open up an issue, if nobody stops me by providing an answer ;)
> Any help will be greatly appreciated, because otherwise I'm stuck with Spark 1.1.0, as quadrupling the runtime is not an option.
> Sincerely,
> Chris
>
> 2015-02-09T14:06:06.328+01:00 INFO org.apache.spark.executor.Executor Running task 9.0 in stage 18.0 (TID 300) Executor task launch worker-18
> 2015-02-09T14:06:06.351+01:00 INFO org.apache.spark.CacheManager Partition rdd_35_9 not found, computing it Executor task launch worker-18
> 2015-02-09T14:06:06.351+01:00 INFO org.apache.spark.storage.ShuffleBlockFetcherIterator Getting 10 non-empty blocks out of 10 blocks Executor task launch worker-18
> 2015-02-09T14:06:06.351+01:00 INFO org.apache.spark.storage.ShuffleBlockFetcherIterator Started 0 remote fetches in 0 ms Executor task launch worker-18
> 2015-02-09T14:06:07.396+01:00 INFO org.apache.spark.storage.MemoryStore ensureFreeSpace(2582904) called with curMem=300174944, maxMe... Executor task launch worker-18
> 2015-02-09T14:06:07.397+01:00 INFO org.apache.spark.storage.MemoryStore Block rdd_35_9 stored as bytes in memory (estimated size 2.5... Executor task launch worker-18
> 2015-02-09T14:06:07.398+01:00 INFO org.apache.spark.storage.BlockManagerMaster Updated info of block rdd_35_9 Executor task launch worker-18
> 2015-02-09T14:06:07.399+01:00 INFO org.apache.spark.CacheManager Partition rdd_38_9 not found, computing it Executor task launch worker-18
> 2015-02-09T14:06:07.399+01:00 INFO org.apache.spark.storage.ShuffleBlockFetcherIterator Getting 10 non-empty blocks out of 10 blocks Executor task launch worker-18
> 2015-02-09T14:06:07.400+01:00 INFO org.apache.spark.storage.ShuffleBlockFetcherIterator Started 0 remote fetches in 0 ms Executor task launch worker-18
> 2015-02-09T14:06:07.567+01:00 INFO org.apache.spark.storage.MemoryStore ensureFreeSpace(944848) called with curMem=302757848, maxMem... Executor task launch worker-18
> 2015-02-09T14:06:07.568+01:00 INFO org.apache.spark.storage.MemoryStore Block rdd_38_9 stored as values in memory (estimated size 92... Executor task launch worker-18
> 2015-02-09T14:06:07.569+01:00 INFO org.apache.spark.storage.BlockManagerMaster Updated info of block rdd_38_9 Executor task launch worker-18
> 2015-02-09T14:06:07.573+01:00 INFO org.apache.spark.storage.ShuffleBlockFetcherIterator Getting 34 non-empty blocks out of 50 blocks Executor task launch worker-18
> 2015-02-09T14:06:07.573+01:00 INFO org.apache.spark.storage.ShuffleBlockFetcherIterator Started 0 remote fetches in 1 ms Executor task launch worker-18
> 2015-02-09T14:06:38.931+01:00 INFO org.apache.spark.CacheManager Partition rdd_41_9 not found, computing it Executor task launch worker-18
> 2015-02-09T14:06:38.931+01:00 INFO org.apache.spark.storage.ShuffleBlockFetcherIterator Getting 3 non-empty blocks out of 10 blocks Executor task launch worker-18
> 2015-02-09T14:06:38.931+01:00 INFO org.apache.spark.storage.ShuffleBlockFetcherIterator Started 0 remote fetches in 0 ms Executor task launch worker-18
> 2015-02-09T14:06:38.945+01:00 INFO org.apache.spark.storage.MemoryStore ensureFreeSpace(0) called with curMem=307529127, maxMem=9261... Executor task launch worker-18
> 2015-02-09T14:06:38.945+01:00 INFO org.apache.spark.storage.MemoryStore Block rdd_41_9 stored as bytes in memory (estimated size 0.0... Executor task launch worker-18
> 2015-02-09T14:06:38.946+01:00 INFO org.apache.spark.storage.BlockManagerMaster Updated info of block rdd_41_9 Executor task launch worker-18
> 2015-02-09T14:06:38.946+01:00 WARN org.apache.spark.storage.BlockManager Block rdd_41_9 replicated to only 0 peer(s) instead of 1 pee... Executor task launch worker-18
> 2015-02-09T14:06:39.088+01:00 INFO org.apache.spark.storage.BlockManager Found block rdd_3_9 locally Executor task launch worker-18
> 2015-02-09T14:06:41.389+01:00 INFO org.apache.spark.CacheManager Partition rdd_7_9 not found, computing it Executor task launch worker-18
> 2015-02-09T14:06:41.389+01:00 INFO org.apache.spark.storage.ShuffleBlockFetcherIterator Getting 1 non-empty blocks out of 1 blocks Executor task launch worker-18
> 2015-02-09T14:06:41.389+01:00 INFO org.apache.spark.storage.ShuffleBlockFetcherIterator Started 0 remote fetches in 0 ms Executor task launch worker-18
> 2015-02-09T14:06:41.402+01:00 INFO org.apache.spark.storage.MemoryStore ensureFreeSpace(38144) called with curMem=307529151, maxMem=... Executor task launch worker-18
> 2015-02-09T14:06:41.402+01:00 INFO org.apache.spark.storage.MemoryStore Block rdd_7_9 stored as values in memory (estimated size 37.... Executor task launch worker-18
> 2015-02-09T14:06:41.404+01:00 INFO org.apache.spark.storage.BlockManagerMaster Updated info of block rdd_7_9 Executor task launch worker-18
> 2015-02-09T14:07:00.019+01:00 INFO org.apache.spark.CacheManager Partition rdd_73_9 not found, computing it Executor task launch worker-18
> 2015-02-09T14:07:00.019+01:00 INFO org.apache.spark.storage.ShuffleBlockFetcherIterator Getting 1 non-empty blocks out of 1 blocks Executor task launch worker-18
> 2015-02-09T14:07:00.019+01:00 INFO org.apache.spark.storage.ShuffleBlockFetcherIterator Started 0 remote fetches in 0 ms Executor task launch worker-18
> 2015-02-09T14:07:00.185+01:00 INFO org.apache.spark.storage.MemoryStore ensureFreeSpace(826250) called with curMem=307567295, maxMem... Executor task launch worker-18
> 2015-02-09T14:07:00.185+01:00 INFO org.apache.spark.storage.MemoryStore Block rdd_73_9 stored as values in memory (estimated size 80... Executor task launch worker-18
> 2015-02-09T14:07:00.186+01:00 INFO org.apache.spark.storage.BlockManagerMaster Updated info of block rdd_73_9 Executor task launch worker-18
> 2015-02-09T14:07:00.190+01:00 INFO org.apache.spark.storage.BlockManager Found block rdd_35_9 locally Executor task launch worker-18
> 2015-02-09T14:07:00.190+01:00 INFO org.apache.spark.storage.BlockManager Found block rdd_38_9 locally Executor task launch worker-18
> 2015-02-09T14:07:00.194+01:00 INFO org.apache.spark.storage.ShuffleBlockFetcherIterator Getting 34 non-empty blocks out of 50 blocks Executor task launch worker-18
> 2015-02-09T14:07:00.194+01:00 INFO org.apache.spark.storage.ShuffleBlockFetcherIterator Started 0 remote fetches in 1 ms Executor task launch worker-18
> 2015-02-09T14:07:17.967+01:00 INFO org.apache.spark.util.collection.ExternalAppendOnlyMap Thread 117 spilling in-memory map of 670.2 MB to disk (1 tim... Executor task launch worker-18
> 2015-02-09T14:07:46.716+01:00 INFO org.apache.spark.storage.BlockManager Found block rdd_41_9 locally Executor task launch worker-18
> 2015-02-09T14:07:47.603+01:00 INFO org.apache.spark.storage.BlockManager Found block rdd_3_9 locally Executor task launch worker-18
> 2015-02-09T14:07:47.897+01:00 INFO org.apache.spark.util.collection.ExternalAppendOnlyMap Thread 117 spilling in-memory map of 5.0 MB to disk (1 time ... Executor task launch worker-18
> 2015-02-09T14:07:48.270+01:00 INFO org.apache.spark.util.collection.ExternalAppendOnlyMap Thread 117 spilling in-memory map of 5.0 MB to disk (2 times... Executor task launch worker-18
> 2015-02-09T14:07:48.727+01:00 INFO org.apache.spark.util.collection.ExternalAppendOnlyMap Thread 117 spilling in-memory map of 5.0 MB to disk (3 times... Executor task launch worker-18
> 2015-02-09T14:07:49.021+01:00 INFO org.apache.spark.util.collection.ExternalAppendOnlyMap Thread 117 spilling in-memory map of 5.0 MB to disk (4 times... Executor task launch worker-18

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org