Re: [SHUFFLE]FAILED_TO_UNCOMPRESS(5) errors when fetching shuffle data with sort-based shuffle
We haven't seen many of these, but we have seen it a couple of times. There is ongoing work under SPARK-26089 to address the issue we know about, namely that we don't detect corruption in large shuffle blocks.

Do you believe the cases you have match that? Does it appear to be corruption in large shuffle blocks? Or do you not have compression or encryption enabled? Both the prior solution and the work under SPARK-26089 only work if either one of those is enabled.

On Tue, Mar 12, 2019 at 9:36 AM Vadim Semenov wrote:
> We have seen this error before on 1.6, but ever since we upgraded to 2.1
> two years ago we haven't seen it.
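The condition in the reply above (corruption detection only applies when compression or encryption is enabled) can be checked against a job's settings. The following is a minimal sketch, assuming the two settings meant are `spark.shuffle.compress` (documented default `true`) and `spark.io.encryption.enabled` (documented default `false`). `ShuffleCorruptionCheck` and `detectionApplies` are hypothetical names used for illustration, not part of Spark.

```scala
// Hypothetical helper, not part of Spark: decides from a plain settings map
// whether either of the two features mentioned in the reply is switched on.
object ShuffleCorruptionCheck {
  // Assumed defaults, taken from the Spark configuration documentation:
  // shuffle compression is on, I/O encryption is off.
  private val Defaults: Map[String, String] = Map(
    "spark.shuffle.compress" -> "true",
    "spark.io.encryption.enabled" -> "false"
  )

  // Reads a boolean setting, falling back to the assumed default.
  private def flag(settings: Map[String, String], key: String): Boolean =
    settings.getOrElse(key, Defaults(key)).trim.toLowerCase == "true"

  // True when at least one of compression or encryption is enabled.
  def detectionApplies(settings: Map[String, String]): Boolean =
    flag(settings, "spark.shuffle.compress") ||
      flag(settings, "spark.io.encryption.enabled")
}
```

Inside a running application the map could be built from the live configuration, for example `ShuffleCorruptionCheck.detectionApplies(sc.getConf.getAll.toMap)`. A result of `false` would mean that neither the earlier fix nor the SPARK-26089 work can catch a corrupted block for that job.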
Re: [SHUFFLE]FAILED_TO_UNCOMPRESS(5) errors when fetching shuffle data with sort-based shuffle
We have seen this error before on 1.6, but ever since we upgraded to 2.1 two years ago we haven't seen it.

On Tue, Mar 12, 2019 at 2:19 AM wangfei wrote:
> Hi all,
> Non-deterministic FAILED_TO_UNCOMPRESS(5) or 'Stream is corrupted' errors
> may occur during shuffle read, as described in this JIRA
> (https://issues.apache.org/jira/browse/SPARK-4105).
> There has been no new comment on this JIRA for a long time. Has anyone
> seen these errors in a recent version, such as Spark 2.3?
> Can anyone provide a reproducible case or analyze the cause of these
> errors?
> Thanks.
[SHUFFLE]FAILED_TO_UNCOMPRESS(5) errors when fetching shuffle data with sort-based shuffle
Hi all,

Non-deterministic FAILED_TO_UNCOMPRESS(5) or 'Stream is corrupted' errors may occur during shuffle read, as described in this JIRA (https://issues.apache.org/jira/browse/SPARK-4105).

There has been no new comment on this JIRA for a long time. Has anyone seen these errors in a recent version, such as Spark 2.3?

Can anyone provide a reproducible case or analyze the cause of these errors?

Thanks.
Re: FAILED_TO_UNCOMPRESS(5) errors when fetching shuffle data with sort-based shuffle
I am seeing the same issue with Spark 1.3.1.

I see it when reading a file stored in SequenceFile format. The file header reads:

    SEQ org.apache.hadoop.io.Text org.apache.hadoop.io.Text org.apache.hadoop.io.compress.GzipCodec

All I do is:

    sc.sequenceFile(dwTable, classOf[Text], classOf[Text])
      .partitionBy(new org.apache.spark.HashPartitioner(2053))

The SparkConf settings are:

    .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
    .set("spark.kryoserializer.buffer.mb", arguments.get("buffersize").get)
    .set("spark.kryoserializer.buffer.max.mb", arguments.get("maxbuffersize").get)
    .set("spark.driver.maxResultSize", arguments.get("maxResultSize").get)
    .set("spark.yarn.maxAppAttempts", "0")
    //.set("spark.akka.askTimeout", arguments.get("askTimeout").get)
    //.set("spark.akka.timeout", arguments.get("akkaTimeout").get)
    //.set("spark.worker.timeout", arguments.get("workerTimeout").get)
    .registerKryoClasses(Array(classOf[com.ebay.ep.poc.spark.reporting.process.model.dw.SpsLevelMetricSum]))

and the values are:

    buffersize=128
    maxbuffersize=1068
    maxResultSize=200G

On Thu, May 7, 2015 at 8:04 AM, Jianshi Huang wrote:
> I'm using the default settings.
>
> Jianshi

--
Deepak
Re: FAILED_TO_UNCOMPRESS(5) errors when fetching shuffle data with sort-based shuffle
I'm using the default settings.

Jianshi

On Wed, May 6, 2015 at 7:05 PM, twinkle sachdeva wrote:
> Hi,
>
> Can you please share the compression and related settings you are using?
>
> Thanks,
> Twinkle
Re: FAILED_TO_UNCOMPRESS(5) errors when fetching shuffle data with sort-based shuffle
Hi,

Can you please share the compression and related settings you are using?

Thanks,
Twinkle

On Wed, May 6, 2015 at 4:15 PM, Jianshi Huang wrote:
> I'm facing this error in Spark 1.3.1:
>
> https://issues.apache.org/jira/browse/SPARK-4105
>
> Does anyone know a workaround? Would changing the compression codec for
> shuffle output help?
FAILED_TO_UNCOMPRESS(5) errors when fetching shuffle data with sort-based shuffle
I'm facing this error in Spark 1.3.1:

https://issues.apache.org/jira/browse/SPARK-4105

Does anyone know a workaround? Would changing the compression codec for shuffle output help?

--
Jianshi Huang

LinkedIn: jianshi
Twitter: @jshuang
Github & Blog: http://huangjs.github.com/
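For reference, the codec change asked about above would look roughly like the sketch below. It assumes the relevant setting is `spark.io.compression.codec` and picks `lzf` purely as an illustration. The application name is a placeholder. Nothing in this thread establishes that switching codecs avoids the error, so treat it as an experiment and not a confirmed workaround.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Sketch only: selects a different codec for Spark's internal compression,
// which includes shuffle output. "lzf" is an arbitrary illustrative choice.
val conf = new SparkConf()
  .setAppName("codec-experiment") // placeholder name
  .set("spark.io.compression.codec", "lzf")

val sc = new SparkContext(conf)
```

The same setting can also be passed at submit time with `--conf spark.io.compression.codec=lzf`, which avoids changing application code while testing.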