Hi Andrew, We were running master after SPARK-3613. We'll give it another shot against the current master now that Josh has fixed a couple of issues in the shuffle code. Thanks.
Sincerely, DB Tsai ------------------------------------------------------- My Blog: https://www.dbtsai.com LinkedIn: https://www.linkedin.com/in/dbtsai On Thu, Oct 23, 2014 at 10:17 AM, Andrew Or <and...@databricks.com> wrote: > To add to Aaron's response, `spark.shuffle.consolidateFiles` only applies to > hash-based shuffle, so you shouldn't have to set it for sort-based shuffle. > And yes, since you changed neither `spark.shuffle.compress` nor > `spark.shuffle.spill.compress` you can't possibly have run into what #2890 > fixes. > > I'm assuming you're running master? Was it before or after this commit: > https://github.com/apache/spark/commit/6b79bfb42580b6bd4c4cd99fb521534a94150693? > > -Andrew > > 2014-10-22 16:37 GMT-07:00 Aaron Davidson <ilike...@gmail.com>: > >> You may be running into this issue: >> https://issues.apache.org/jira/browse/SPARK-4019 >> >> You could check by using 2000 or fewer reduce partitions. >> >> On Wed, Oct 22, 2014 at 1:48 PM, DB Tsai <dbt...@dbtsai.com> wrote: >>> >>> PS, sorry for spamming the mailing list. Based on my knowledge, both >>> spark.shuffle.spill.compress and spark.shuffle.compress default to >>> true, so in theory we should not run into this issue if we don't >>> change any settings. Is there another bug we are running into? >>> >>> Thanks. >>> >>> Sincerely, >>> >>> DB Tsai >>> ------------------------------------------------------- >>> My Blog: https://www.dbtsai.com >>> LinkedIn: https://www.linkedin.com/in/dbtsai >>> >>> >>> On Wed, Oct 22, 2014 at 1:37 PM, DB Tsai <dbt...@dbtsai.com> wrote: >>> > Or can it be worked around by setting both of the following settings to true >>> > for now? 
>>> > >>> > spark.shuffle.spill.compress true >>> > spark.shuffle.compress true >>> > >>> > Sincerely, >>> > >>> > DB Tsai >>> > ------------------------------------------------------- >>> > My Blog: https://www.dbtsai.com >>> > LinkedIn: https://www.linkedin.com/in/dbtsai >>> > >>> > >>> > On Wed, Oct 22, 2014 at 1:34 PM, DB Tsai <dbt...@dbtsai.com> wrote: >>> >> It seems that this issue should be addressed by >>> >> https://github.com/apache/spark/pull/2890 ? Am I right? >>> >> >>> >> Sincerely, >>> >> >>> >> DB Tsai >>> >> ------------------------------------------------------- >>> >> My Blog: https://www.dbtsai.com >>> >> LinkedIn: https://www.linkedin.com/in/dbtsai >>> >> >>> >> >>> >> On Wed, Oct 22, 2014 at 11:54 AM, DB Tsai <dbt...@dbtsai.com> wrote: >>> >>> Hi all, >>> >>> >>> >>> With SPARK-3948, the Snappy PARSING_ERROR exception is gone, but >>> >>> I'm hitting another exception now. I have no clue what's going on; has >>> >>> anyone run into a similar issue? Thanks. >>> >>> >>> >>> This is the configuration I use. 
>>> >>> spark.rdd.compress true >>> >>> spark.shuffle.consolidateFiles true >>> >>> spark.shuffle.manager SORT >>> >>> spark.akka.frameSize 128 >>> >>> spark.akka.timeout 600 >>> >>> spark.core.connection.ack.wait.timeout 600 >>> >>> spark.core.connection.auth.wait.timeout 300 >>> >>> >>> >>> >>> >>> java.io.ObjectInputStream$PeekInputStream.readFully(ObjectInputStream.java:2325) >>> >>> >>> >>> java.io.ObjectInputStream$BlockDataInputStream.readShort(ObjectInputStream.java:2794) >>> >>> >>> >>> java.io.ObjectInputStream.readStreamHeader(ObjectInputStream.java:801) >>> >>> java.io.ObjectInputStream.<init>(ObjectInputStream.java:299) >>> >>> >>> >>> org.apache.spark.serializer.JavaDeserializationStream$$anon$1.<init>(JavaSerializer.scala:57) >>> >>> >>> >>> org.apache.spark.serializer.JavaDeserializationStream.<init>(JavaSerializer.scala:57) >>> >>> >>> >>> org.apache.spark.serializer.JavaSerializerInstance.deserializeStream(JavaSerializer.scala:95) >>> >>> >>> >>> org.apache.spark.storage.BlockManager.getLocalShuffleFromDisk(BlockManager.scala:351) >>> >>> >>> >>> org.apache.spark.storage.ShuffleBlockFetcherIterator$$anonfun$fetchLocalBlocks$1$$anonfun$apply$4.apply(ShuffleBlockFetcherIterator.scala:196) >>> >>> >>> >>> org.apache.spark.storage.ShuffleBlockFetcherIterator$$anonfun$fetchLocalBlocks$1$$anonfun$apply$4.apply(ShuffleBlockFetcherIterator.scala:196) >>> >>> >>> >>> org.apache.spark.storage.ShuffleBlockFetcherIterator.next(ShuffleBlockFetcherIterator.scala:243) >>> >>> >>> >>> org.apache.spark.storage.ShuffleBlockFetcherIterator.next(ShuffleBlockFetcherIterator.scala:52) >>> >>> >>> >>> scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371) >>> >>> >>> >>> org.apache.spark.util.CompletionIterator.hasNext(CompletionIterator.scala:30) >>> >>> >>> >>> org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:39) >>> >>> >>> >>> org.apache.spark.Aggregator.combineCombinersByKey(Aggregator.scala:89) >>> >>> >>> >>> 
org.apache.spark.shuffle.hash.HashShuffleReader.read(HashShuffleReader.scala:44) >>> >>> >>> >>> org.apache.spark.rdd.ShuffledRDD.compute(ShuffledRDD.scala:92) >>> >>> >>> >>> org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262) >>> >>> org.apache.spark.rdd.RDD.iterator(RDD.scala:229) >>> >>> org.apache.spark.rdd.MappedRDD.compute(MappedRDD.scala:31) >>> >>> >>> >>> org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262) >>> >>> org.apache.spark.rdd.RDD.iterator(RDD.scala:229) >>> >>> >>> >>> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35) >>> >>> >>> >>> org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262) >>> >>> org.apache.spark.rdd.RDD.iterator(RDD.scala:229) >>> >>> >>> >>> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68) >>> >>> >>> >>> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41) >>> >>> org.apache.spark.scheduler.Task.run(Task.scala:56) >>> >>> >>> >>> org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:181) >>> >>> >>> >>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) >>> >>> >>> >>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) >>> >>> java.lang.Thread.run(Thread.java:744) >>> >>> >>> >>> >>> >>> Sincerely, >>> >>> >>> >>> DB Tsai >>> >>> ------------------------------------------------------- >>> >>> My Blog: https://www.dbtsai.com >>> >>> LinkedIn: https://www.linkedin.com/in/dbtsai >>> >>> --------------------------------------------------------------------- >>> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org >>> For additional commands, e-mail: user-h...@spark.apache.org >>> >> >
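[Archive editor's note] For future readers: the settings discussed in this thread, gathered into conf/spark-defaults.conf form. This is a sketch combining only the values quoted above; spark.shuffle.compress and spark.shuffle.spill.compress are written out explicitly at their defaults (true) since the thread turns on whether they were changed, and per Andrew's reply spark.shuffle.consolidateFiles has no effect under sort-based shuffle.

    spark.shuffle.manager             SORT
    spark.rdd.compress                true
    spark.shuffle.compress            true
    spark.shuffle.spill.compress      true
    spark.akka.frameSize              128
    spark.akka.timeout                600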