>>> speculatively execute tasks on different executors to improve
>>> performance. If a task fails due to the *FetchFailedException*, a
>>> speculative task might be launched on another executor. This is where the
>>> fun and games start. If the unavailable node recovers before the speculative
>>> task finishes, both the original and the speculative task might complete
>>> successfully, *resulting in duplicates*. With regard to missing
>>> data, if the data node reboot leads to data corruption ...
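The duplicate-commit race described above can be sketched in plain Python (no Spark involved; all names and data are made up for illustration): if the output of both the original attempt and the speculative attempt survives for the same partition, every row in that partition is counted twice downstream.

```python
# Plain-Python sketch of the race described above: both the original
# and the speculative attempt of a task finish, and both outputs are
# kept. Real Spark guards commits via its output committer; this only
# illustrates what happens when that guard is not effective.

def run_task(partition_rows):
    # Both attempts compute the same result for the same input partition.
    return list(partition_rows)

partition = [("k1", 1), ("k2", 2)]

committed = []
committed.extend(run_task(partition))  # original attempt finishes late
committed.extend(run_task(partition))  # speculative attempt also finishes

# Both outputs kept -> every row of the partition is duplicated.
print(len(committed))  # 4 rows for a 2-row partition
```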
>>> I think when a task fails in between and a retry task starts
>>> and completes, it may create duplicates, as the failed task has some data
>>> plus the retry task has the full data. But my question is why Spark keeps
>>> the delta data, or, according to you ...
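The "failed task has some data + retry task has full data" scenario above can be illustrated with a small sketch (plain Python, hypothetical row values): the failed attempt's partial output plus the retry's full output overlap, and the overlap is exactly the duplicate data.

```python
# Sketch of the overlap arithmetic described above. Row values are
# hypothetical; only the counting matters.

full_output = [1, 2, 3, 4]          # what the task should emit

failed_attempt = full_output[:2]    # task died after emitting 1 and 2
retry_attempt = full_output         # retry re-emits everything

# If nothing cleans up the failed attempt's partial output, both are visible.
visible = failed_attempt + retry_attempt
duplicated = [x for x in full_output if visible.count(x) > 1]
print(duplicated)  # [1, 2]: the rows the failed attempt already wrote
```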
>>>> ... some features to mitigate these
>>>> issues, but it might not guarantee complete elimination of duplicates or
>>>> data loss. You can adjust parameters like *spark.shuffle.retry.wait*
>>>> and *spark.speculation* to control retry attempts and speculative
>>>> execution ...
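For reference, the knobs mentioned above could be collected like this (a sketch only; the mail's *spark.shuffle.retry.wait* appears to correspond to *spark.shuffle.io.retryWait* in the Spark configuration docs, and the values below are arbitrary examples, not recommendations):

```python
# Example settings for the parameters discussed above. Check the exact
# property names against the configuration docs for your Spark version;
# the values here are illustrative only.
conf = {
    # Turn speculative execution off to avoid the duplicate-commit race.
    "spark.speculation": "false",
    # Retry failed shuffle fetches more times, and wait longer between
    # retries, so a rebooting node has a chance to come back.
    "spark.shuffle.io.maxRetries": "8",
    "spark.shuffle.io.retryWait": "15s",
}

# These would typically be applied when building the session, e.g.:
#   builder = SparkSession.builder
#   for key, value in conf.items():
#       builder = builder.config(key, value)
```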
>>>> On Thu, Feb 29, 2024 at 9:50 PM Dongjoon Hyun
>>>> wrote:
>>>>
>>>>> Please use the url as the full string including '()' part.
>>>>>
>>>>> Or you can search directly at ASF Jira with 'Spark' project and three ...
> ... exception due to the data node reboot, then Spark should handle it
> gracefully, shouldn't it?
> Or how to handle it?
>
> On Fri, Mar 1, 2024 at 5:35 PM Mich Talebzadeh
> wrote:
>
>> Hi,
>>
>> Your point -> "When Spark job shows FetchFailedException ...
Hi,

Your point -> "When Spark job shows FetchFailedException it creates few
duplicate data and we see few data also missing, please explain why. We
have a scenario when the Spark job complains *FetchFailedException* as one
of the data nodes got rebooted in the middle of the job running."

As ...
Hello All,

In the list of JIRAs I didn't find anything related to FetchFailedException.

As mentioned above:
"When Spark job shows FetchFailedException it creates few duplicate data
and we see few data also missing, please explain why. We have a scenario
when the Spark job complains ...
>> ... It would help if you can report any correctness issues with Apache
>> Spark 3.5.1.
>>
>> Thanks,
>> Dongjoon.
>>
>> On 2024/02/29 15:04:41 Prem Sahoo wrote:
>> > When Spark job shows FetchFailedException it creates few duplicate data
>> > and we see few data also missing ...
When Spark job shows FetchFailedException it creates few duplicate data and
we see few data also missing, please explain why. We have a scenario when
the Spark job complains FetchFailedException as one of the data nodes got
rebooted in the middle of the job running.
Now due to this we have few duplicate data ...