Thank you so much for this information, Sean. One more question: when Spark
re-runs the failed partition, where does the retry run? On the same node or
on some other node?
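For what it's worth, the retry limit discussed later in this thread is the `spark.task.maxFailures` setting (default 4), and as far as I understand the retry is not pinned to the original node: the scheduler may place the new attempt on any executor with a free slot. A toy, stdlib-only sketch of that retry loop (an illustration of the idea, not Spark's actual scheduler code):

```python
import random

# Toy model of task retry scheduling -- NOT Spark internals.
# Spark retries a failed task up to spark.task.maxFailures (default 4)
# times; the retry may land on any executor with a free slot, not
# necessarily the node that ran the first attempt.

MAX_FAILURES = 4  # mirrors the spark.task.maxFailures default

def run_task(partition_id, nodes, attempts_that_fail):
    """Attempt a task up to MAX_FAILURES times, picking a node each time."""
    for attempt in range(MAX_FAILURES):
        node = random.choice(nodes)  # any free executor may be chosen
        if attempt >= attempts_that_fail:
            return node, attempt + 1  # attempt succeeded
    raise RuntimeError(
        f"Task for partition {partition_id} failed {MAX_FAILURES} times; "
        "the whole job is aborted"
    )

random.seed(0)
# First two attempts "fail"; the third succeeds, possibly on another node.
node, attempts = run_task(0, ["node-a", "node-b", "node-c"],
                          attempts_that_fail=2)
print(f"partition 0 succeeded on {node} after {attempts} attempts")
```

If all four attempts fail, the exception models the whole-job failure mentioned below.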


On Fri, 21 Jan 2022, 23:41 Sean Owen, <sro...@gmail.com> wrote:

> The Spark program already knows the partitions of the data and where they
> exist; that's just defined by the data layout. It doesn't care what data is
> inside. It knows partition 1 needs to be processed, and if the task
> processing it fails, it needs to be run again. I'm not sure where you're
> seeing data loss here: the data is already stored to begin with, not
> somehow consumed and deleted.
>
> On Fri, Jan 21, 2022 at 12:07 PM Siddhesh Kalgaonkar <
> kalgaonkarsiddh...@gmail.com> wrote:
>
>> Okay, so suppose I have 10 records distributed across 5 nodes and the
>> partition on the first node, holding 2 records, failed. I understand that
>> it will re-process this partition, but how will it come to know which
>> partition was holding which data, so that it picks up only those records
>> and reprocesses them? In case of a partition failure, is there any data
>> loss, or is the data stored somewhere?
>>
>> Maybe my question is very naive but I am trying to understand it in a
>> better way.
>>
>> On Fri, Jan 21, 2022 at 11:32 PM Sean Owen <sro...@gmail.com> wrote:
>>
>>> In that case, the file exists in parts across machines. No, tasks won't
>>> re-read the whole file; no task does or can do that. Failed partitions
>>> are reprocessed, and a retry processes the same partition as the first
>>> pass did.
>>>
>>> On Fri, Jan 21, 2022 at 12:00 PM Siddhesh Kalgaonkar <
>>> kalgaonkarsiddh...@gmail.com> wrote:
>>>
>>>> Hello team,
>>>>
>>>> I am aware that in case of memory issues, when a task fails it will be
>>>> retried up to 4 times (the default value of spark.task.maxFailures),
>>>> and if it still fails, the entire job fails.
>>>>
>>>> But suppose I am reading a file that is distributed across nodes in
>>>> partitions. What will happen if a task processing a partition that
>>>> holds some data fails? Will Spark re-read the entire file and extract
>>>> that specific subset of data, since the driver has the complete
>>>> information? Or will it copy the data to other worker nodes or tasks
>>>> and try to run it there?
>>>>
>>>

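The point Sean makes above, that a partition is defined by the data layout, so a retried task simply re-reads the same fixed slice of the stored input, can be sketched with a toy, stdlib-only model (an illustration of the idea, not Spark's actual code; the record names and partition count are made up):

```python
# Toy illustration (not Spark internals): a "partition" is just a fixed,
# deterministic slice of the stored input, so re-running the task for
# partition i re-reads exactly the same records -- nothing is consumed
# or deleted when a task fails.

records = [f"record-{i}" for i in range(10)]  # the stored input file
NUM_PARTITIONS = 5

def read_partition(i):
    """Deterministically map a partition id to the same slice every time."""
    size = len(records) // NUM_PARTITIONS
    return records[i * size:(i + 1) * size]

first_attempt = read_partition(0)  # task reads its slice, then "fails"
retry_attempt = read_partition(0)  # the retry re-reads the identical slice
print(first_attempt == retry_attempt)  # same partition, same data
```

Because the mapping from partition id to input slice is fixed, there is no data to lose on failure and nothing to copy elsewhere; the retry just reads the source again.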