Re: [SHUFFLE]FAILED_TO_UNCOMPRESS(5) errors when fetching shuffle data with sort-based shuffle

2019-03-12 Thread Imran Rashid
We haven't seen many of these, but we have seen them a couple of times.
There is ongoing work under SPARK-26089 to address the issue we know about,
namely that we don't detect corruption in large shuffle blocks.

Do you believe the cases you have match that, i.e. does it appear to be
corruption in large shuffle blocks?
Or do you have neither compression nor encryption enabled? Both the prior
solution and the work under SPARK-26089 only work if at least one of those
is enabled.
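A simplified, stdlib-only Scala model of the condition described above may help when checking whether a given job is covered. It assumes the Spark 2.2-2.4 shuffle reader behaves as I understand it: a fetched block is decompressed up front to detect corruption only when spark.shuffle.detectCorrupt is on (default true), the stream is compressed (spark.shuffle.compress, default true) or encrypted (spark.io.encryption.enabled, default false), and the block is smaller than spark.reducer.maxSizeInFlight / 3 (16 MB with the default 48m). The object and method names are hypothetical; this is not a Spark API.

```scala
// Simplified model, NOT a Spark API: approximates when the Spark 2.2-2.4
// shuffle reader eagerly checks a fetched block for corruption.
object ShuffleCorruptionCheck {
  // Assumed defaults for the settings involved.
  private val Defaults: Map[String, String] = Map(
    "spark.shuffle.compress" -> "true",
    "spark.io.encryption.enabled" -> "false",
    "spark.shuffle.detectCorrupt" -> "true"
  )

  // Default spark.reducer.maxSizeInFlight (48m), in bytes.
  val DefaultMaxSizeInFlight: Long = 48L * 1024 * 1024

  private def flag(conf: Map[String, String], key: String): Boolean =
    conf.getOrElse(key, Defaults(key)).trim.toBoolean

  /** True if a fetched block of blockSizeBytes would be checked up front. */
  def checkedEagerly(
      conf: Map[String, String],
      blockSizeBytes: Long,
      maxSizeInFlight: Long = DefaultMaxSizeInFlight): Boolean = {
    // The check only applies when compression or encryption wraps the stream.
    val wrapped =
      flag(conf, "spark.shuffle.compress") || flag(conf, "spark.io.encryption.enabled")
    // Blocks of maxSizeInFlight / 3 or more are skipped: the large-block gap.
    flag(conf, "spark.shuffle.detectCorrupt") && wrapped &&
      blockSizeBytes < maxSizeInFlight / 3
  }
}
```

With the defaults, a 1 KB block is checked but a 32 MB block is not, which is the large-block gap mentioned above; turning shuffle compression off without enabling encryption disables the check entirely.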

On Tue, Mar 12, 2019 at 9:36 AM Vadim Semenov  wrote:

> We saw this error on 1.6, but we haven't seen it since we upgraded to 2.1
> two years ago.
>
> On Tue, Mar 12, 2019 at 2:19 AM wangfei  wrote:
>
>> Hi all,
>> Non-deterministic FAILED_TO_UNCOMPRESS(5) or 'Stream is corrupted' errors
>> may occur during shuffle read, as described in this JIRA
>> (https://issues.apache.org/jira/browse/SPARK-4105).
>> There have been no new comments on this JIRA for a long time, so has
>> anyone seen these errors in a recent version, such as Spark 2.3?
>> Can anyone provide a reproducible case or analyze the cause of these
>> errors?
>> Thanks.
>>
>>
>


Re: [SHUFFLE]FAILED_TO_UNCOMPRESS(5) errors when fetching shuffle data with sort-based shuffle

2019-03-12 Thread Vadim Semenov
We saw this error on 1.6, but we haven't seen it since we upgraded to 2.1
two years ago.

On Tue, Mar 12, 2019 at 2:19 AM wangfei  wrote:

> Hi all,
> Non-deterministic FAILED_TO_UNCOMPRESS(5) or 'Stream is corrupted' errors
> may occur during shuffle read, as described in this JIRA
> (https://issues.apache.org/jira/browse/SPARK-4105).
> There have been no new comments on this JIRA for a long time, so has
> anyone seen these errors in a recent version, such as Spark 2.3?
> Can anyone provide a reproducible case or analyze the cause of these
> errors?
> Thanks.
>




[SHUFFLE]FAILED_TO_UNCOMPRESS(5) errors when fetching shuffle data with sort-based shuffle

2019-03-11 Thread wangfei
Hi all,
Non-deterministic FAILED_TO_UNCOMPRESS(5) or 'Stream is corrupted' errors
may occur during shuffle read, as described in this JIRA
(https://issues.apache.org/jira/browse/SPARK-4105).
There have been no new comments on this JIRA for a long time, so has
anyone seen these errors in a recent version, such as Spark 2.3?
Can anyone provide a reproducible case or analyze the cause of these
errors?
Thanks.

Re: FAILED_TO_UNCOMPRESS(5) errors when fetching shuffle data with sort-based shuffle

2015-06-01 Thread ๏̯͡๏
I am seeing the same issue with Spark 1.3.1.

I see this issue when reading a file stored in SequenceFile format (the
header shows key class org.apache.hadoop.io.Text, value class
org.apache.hadoop.io.Text and codec org.apache.hadoop.io.compress.GzipCodec).

All I do is:

import org.apache.hadoop.io.Text
import org.apache.spark.{HashPartitioner, SparkConf}

val conf = new SparkConf()
  .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
  .set("spark.kryoserializer.buffer.mb", arguments.get("buffersize").get)
  .set("spark.kryoserializer.buffer.max.mb", arguments.get("maxbuffersize").get)
  .set("spark.driver.maxResultSize", arguments.get("maxResultSize").get)
  .set("spark.yarn.maxAppAttempts", "0")
  //.set("spark.akka.askTimeout", arguments.get("askTimeout").get)
  //.set("spark.akka.timeout", arguments.get("akkaTimeout").get)
  //.set("spark.worker.timeout", arguments.get("workerTimeout").get)
  .registerKryoClasses(Array(classOf[com.ebay.ep.poc.spark.reporting.process.model.dw.SpsLevelMetricSum]))

// Assumed: sc is the SparkContext built from conf, dwTable is the input path.
sc.sequenceFile(dwTable, classOf[Text], classOf[Text])
  .partitionBy(new HashPartitioner(2053))


and the values are:
buffersize=128, maxbuffersize=1068, maxResultSize=200G
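The partitionBy step above routes every record to one of 2053 reduce partitions, so each map task's sort-based shuffle output holds a segment per partition that reducers later fetch, which is the stage where this error surfaces. A minimal stdlib-only Scala sketch of the routing rule, assuming it mirrors Spark's HashPartitioner (a non-negative key.hashCode modulo the partition count, with null keys going to partition 0); the object name is hypothetical and it uses String keys rather than Hadoop Text.

```scala
// Hypothetical sketch of hash-based routing, NOT Spark's own class.
object HashRouting {
  /** Modulo that never returns a negative index, even for negative hash codes. */
  def nonNegativeMod(x: Int, mod: Int): Int = {
    val rawMod = x % mod
    rawMod + (if (rawMod < 0) mod else 0)
  }

  /** Reduce partition for key: null goes to 0, otherwise hashCode mod numPartitions. */
  def partitionFor(key: Any, numPartitions: Int): Int = key match {
    case null => 0
    case _    => nonNegativeMod(key.hashCode, numPartitions)
  }
}
```

For example, the key "a" (hashCode 97) lands in partition 97 of 2053, and a negative hash code is shifted into range rather than producing a negative index.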

On Thu, May 7, 2015 at 8:04 AM, Jianshi Huang 
wrote:

> I'm using the default settings.
>
> Jianshi
>
> On Wed, May 6, 2015 at 7:05 PM, twinkle sachdeva <
> twinkle.sachd...@gmail.com> wrote:
>
>> Hi,
>>
>> Can you please share the compression and other settings you are using?
>>
>> Thanks,
>> Twinkle
>>
>> On Wed, May 6, 2015 at 4:15 PM, Jianshi Huang 
>> wrote:
>>
>>> I'm facing this error in Spark 1.3.1
>>>
>>>   https://issues.apache.org/jira/browse/SPARK-4105
>>>
>>> Does anyone know a workaround? Should I change the compression codec
>>> for shuffle output?
>>>
>>>
>>
>>
>
>
>



-- 
Deepak


Re: FAILED_TO_UNCOMPRESS(5) errors when fetching shuffle data with sort-based shuffle

2015-05-06 Thread Jianshi Huang
I'm using the default settings.
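For context, with default settings on Spark 1.3 the shuffle output is compressed with snappy, the codec that reports FAILED_TO_UNCOMPRESS(5). A stdlib-only Scala sketch of the compression-related defaults as I recall them from the Spark 1.3 configuration docs (verify against your version), plus a hypothetical helper that reports only the values a job overrides, which is one way to answer a request to share your compression settings.

```scala
// Hypothetical helper, NOT a Spark API. Defaults as recalled for Spark 1.3.x.
object CompressionSettings {
  val Spark13Defaults: Map[String, String] = Map(
    "spark.io.compression.codec" -> "snappy",
    "spark.shuffle.compress" -> "true",
    "spark.shuffle.spill.compress" -> "true",
    "spark.broadcast.compress" -> "true",
    "spark.rdd.compress" -> "false"
  )

  /** Effective compression settings: explicit values override the defaults. */
  def effective(explicit: Map[String, String]): Map[String, String] =
    Spark13Defaults ++ explicit.filter { case (k, _) => Spark13Defaults.contains(k) }

  /** Only the compression settings whose effective value differs from the default. */
  def nonDefault(explicit: Map[String, String]): Map[String, String] =
    effective(explicit).filter { case (k, v) => Spark13Defaults(k) != v }
}
```

For a job on pure defaults, nonDefault returns an empty map; keys outside this list, such as spark.serializer, are ignored.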

Jianshi

On Wed, May 6, 2015 at 7:05 PM, twinkle sachdeva  wrote:

> Hi,
>
> Can you please share the compression and other settings you are using?
>
> Thanks,
> Twinkle
>
> On Wed, May 6, 2015 at 4:15 PM, Jianshi Huang 
> wrote:
>
>> I'm facing this error in Spark 1.3.1
>>
>>   https://issues.apache.org/jira/browse/SPARK-4105
>>
>> Does anyone know a workaround? Should I change the compression codec
>> for shuffle output?
>>
>>
>
>


-- 
Jianshi Huang

LinkedIn: jianshi
Twitter: @jshuang
Github & Blog: http://huangjs.github.com/


Re: FAILED_TO_UNCOMPRESS(5) errors when fetching shuffle data with sort-based shuffle

2015-05-06 Thread twinkle sachdeva
Hi,

Can you please share the compression and other settings you are using?

Thanks,
Twinkle

On Wed, May 6, 2015 at 4:15 PM, Jianshi Huang 
wrote:

> I'm facing this error in Spark 1.3.1
>
>   https://issues.apache.org/jira/browse/SPARK-4105
>
> Does anyone know a workaround? Should I change the compression codec
> for shuffle output?
>
>


FAILED_TO_UNCOMPRESS(5) errors when fetching shuffle data with sort-based shuffle

2015-05-06 Thread Jianshi Huang
I'm facing this error in Spark 1.3.1

  https://issues.apache.org/jira/browse/SPARK-4105

Does anyone know a workaround? Should I change the compression codec for
shuffle output?
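If you want to try the codec switch, the setting involved is spark.io.compression.codec, which in Spark 1.x defaults to snappy (the codec that reports FAILED_TO_UNCOMPRESS(5)) and also accepts the short names lz4 and lzf. A minimal stdlib-only Scala sketch that overrides it and renders the result as spark-submit --conf arguments; the object and method names are hypothetical, and whether this avoids the error is untested here.

```scala
// Hypothetical helper, NOT a Spark API: builds the codec override as plain data.
object CodecWorkaround {
  // Short names accepted by spark.io.compression.codec in Spark 1.x.
  val ShortNames: Set[String] = Set("lz4", "lzf", "snappy")

  /** Returns conf with spark.io.compression.codec set to codec (lz4 by default). */
  def switchCodec(conf: Map[String, String], codec: String = "lz4"): Map[String, String] = {
    require(ShortNames.contains(codec), s"unknown codec short name: $codec")
    conf.updated("spark.io.compression.codec", codec)
  }

  /** Renders settings as spark-submit --conf arguments, sorted by key. */
  def toSubmitArgs(conf: Map[String, String]): Seq[String] =
    conf.toSeq.sortBy(_._1).flatMap { case (k, v) => Seq("--conf", s"$k=$v") }
}
```

As far as I know, 'Stream is corrupted' is the error the LZ4 codec raises, so if the underlying shuffle data is corrupted, switching codecs may only change the symptom rather than remove it.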

-- 
Jianshi Huang

LinkedIn: jianshi
Twitter: @jshuang
Github & Blog: http://huangjs.github.com/