Re: Data correctness issue with Repartition + FetchFailure

2022-03-14 Thread Jason Xu
:47 PM Reynold Xin wrote: > This is why RoundRobinPartitioning shouldn't be used ... > > > On Sat, Mar 12, 2022 at 12:08 PM, Jason Xu > wrote: > >> Hi Spark community, >> >> I reported a data correctness issue in >> https://issues.apache.org/jira/browse/S

Re: Data correctness issue with Repartition + FetchFailure

2022-03-15 Thread Jason Xu
r 15, 2022 at 1:50 AM Jason Xu wrote: > >> Hi Reynold, do you suggest removing RoundRobinPartitioning in >> repartition(numPartitions: Int) API implementation? If that's the direction >> we're considering, before we have a new implementation, should we suggest >> users avoi

Data correctness issue with Repartition + FetchFailure

2022-03-12 Thread Jason Xu
in the ticket. I am reporting it here to bring more attention to it. Could you help confirm that it's a bug and worth the effort to investigate further and fix? Thank you in advance for your help! Thanks, Jason Xu

Re: Spark on Yarn with Java 17

2023-12-10 Thread Jason Xu
m/LucaCanali/Miscellaneous/blob/master/Spark_Notes/Spark_Set_Java_Home_Howto.md > > > > Best, > > Luca > > > > *From:* Dongjoon Hyun > *Sent:* Saturday, December 9, 2023 09:39 > *To:* Jason Xu > *Cc:* dev@spark.apache.org > *Subject:* Re: Spark on Yarn with Jav

Spark on Yarn with Java 17

2023-12-08 Thread Jason Xu
ark 4 depend on the availability of Java 17 support in Hadoop? Additionally, do we have a rough estimate for the release of Spark 4? Thanks! Cheers, Jason Xu

Re: Spark on Yarn with Java 17

2023-12-08 Thread Jason Xu
(4.0.0, NEW) > > Thanks, > Dongjoon. > > On 2023/12/08 23:50:15 Jason Xu wrote: > > Hi Spark devs, > > > > According to the Spark 3.5 release notes, Spark 4 will no longer support > > Java 8 and 11 (link > > < > https://spark.apache.org/releases/spark

Re: When Spark job shows FetchFailedException it creates few duplicate data and we see few data also missing , please explain why

2024-03-04 Thread Jason Xu
Hi Prem, From the symptoms (a shuffle fetch failure, plus a few duplicate records and a few missing records), I think you might have run into this correctness bug: https://issues.apache.org/jira/browse/SPARK-38388. Node/shuffle failures are hard to avoid. I wonder if you have non-deterministic logic and calling
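
To illustrate the kind of failure discussed in these threads, here is a toy model in plain Python. It is not Spark code and it does not reproduce Spark's actual shuffle or retry logic. The function names (`round_robin`, `hash_partition`, `mixed_read`) and the sample rows are hypothetical, chosen only for this sketch. It assumes that a retried map task emits the same rows in a different order, and that one reducer read the first attempt's output while another reads the retried output.

```python
import zlib


def round_robin(rows, num_partitions):
    """Assign each row to a partition by its position in the input."""
    parts = [[] for _ in range(num_partitions)]
    for i, row in enumerate(rows):
        parts[i % num_partitions].append(row)
    return parts


def hash_partition(rows, num_partitions):
    """Assign each row to a partition by a stable hash of its value."""
    parts = [[] for _ in range(num_partitions)]
    for row in rows:
        parts[zlib.crc32(row.encode()) % num_partitions].append(row)
    return parts


# Assumption for this sketch: the retried task yields the same rows,
# but in a different order (non-deterministic upstream logic).
first_attempt = ["a", "b", "c", "d"]
retry_attempt = ["b", "a", "d", "c"]


def mixed_read(partitioner):
    """Reducer 0 read the first attempt; reducer 1 reads the retried output."""
    from_first = partitioner(first_attempt, 2)
    from_retry = partitioner(retry_attempt, 2)
    return sorted(from_first[0] + from_retry[1])


print(mixed_read(round_robin))     # some rows twice, others missing
print(mixed_read(hash_partition))  # every row exactly once
```

In this model the position-based assignment depends on input order, so mixing outputs from two attempts duplicates some rows and drops others, while the value-based assignment gives the same result for any input order. In Spark terms, `repartition(numPartitions)` is the round-robin variant named in the thread, and `repartition(numPartitions, cols)` partitions by a hash of the given columns. Whether switching helps in a given job is something to verify against the ticket.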