We haven't seen many of these, but we have seen them a couple of times. There is ongoing work under SPARK-26089 to address the issue we know about, namely that we don't detect corruption in large shuffle blocks.
Do you believe the cases you have match that, i.e. does it appear to be corruption in large shuffle blocks? Or do you not have compression or encryption enabled? Both the prior solution and the work under SPARK-26089 only apply if one of those is enabled (a quick sketch of the relevant configs is at the bottom, below the quoted thread).

On Tue, Mar 12, 2019 at 9:36 AM Vadim Semenov <va...@datadoghq.com> wrote:

> I/We have seen this error before on 1.6, but ever since we upgraded to 2.1
> two years ago we haven't seen it.
>
> On Tue, Mar 12, 2019 at 2:19 AM wangfei <hzfeiw...@163.com> wrote:
>
>> Hi all,
>>     Non-deterministic FAILED_TO_UNCOMPRESS(5) or 'Stream is corrupted'
>> errors may occur during shuffle read, as described in this JIRA:
>> https://issues.apache.org/jira/browse/SPARK-4105
>>     There have been no new comments on that JIRA for a long time. So, has
>> anyone seen these errors in a recent version, such as Spark 2.3?
>>     Can anyone provide a reproducible case or analyze the cause of
>> these errors?
>>     Thanks.
>>
>
> --
> Sent from my iPhone
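
For reference, here is a minimal sketch (mine, not from the JIRA; app name is made up and the values shown are the usual defaults) of the settings this hinges on, plus one way to see what a running application actually has set:

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("shuffle-corruption-check")
  // Shuffle block compression (default: true). As I understand it, the
  // corruption checks ride on decompression failing when the bytes are bad.
  .config("spark.shuffle.compress", "true")
  // Compression of data spilled during shuffles (default: true).
  .config("spark.shuffle.spill.compress", "true")
  // IO encryption (default: false). Either this or compression has to be
  // enabled for the detection discussed above to apply.
  .config("spark.io.encryption.enabled", "false")
  .getOrCreate()

// Print whatever compression/encryption settings are explicitly set:
spark.sparkContext.getConf.getAll
  .filter { case (k, _) => k.contains("compress") || k.contains("encryption") }
  .foreach { case (k, v) => println(s"$k = $v") }

So with spark.shuffle.compress left at its default of true you should be covered; it is only when both compression and IO encryption are turned off that corrupted blocks would pass through without either the prior fix or SPARK-26089 catching them.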