In that case, the file is stored in parts across machines. No, a retried task won't re-read the whole file; no task ever does or can do that. A failed partition is reprocessed, but just as in the first pass, the retry reads and processes only that same partition.
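
To make that concrete, here is a toy sketch in plain Python. It is not Spark's API and not how the scheduler is implemented; `run_job`, `flaky_sum` and `file_splits` are made-up names for illustration. The only Spark detail borrowed is the limit of 4, which mirrors the default of `spark.task.maxFailures`. The loop is sequential for simplicity, whereas real tasks run in parallel and a retry may land on a different executor.

```python
# Toy model of per-partition task retry; plain Python, not Spark's API.
MAX_FAILURES = 4  # mirrors the default of spark.task.maxFailures


def run_job(partitions, process, max_failures=MAX_FAILURES):
    """Run process() once per partition, retrying only the partition that failed."""
    results = []
    reads = []  # (partition index, attempt) for every read, to show what is re-read
    for index, split in enumerate(partitions):
        for attempt in range(1, max_failures + 1):
            reads.append((index, attempt))
            try:
                results.append(process(index, attempt, split))
                break  # this partition is done; move on to the next one
            except RuntimeError:
                if attempt == max_failures:
                    # Same task failed max_failures times: the whole job fails.
                    raise RuntimeError(
                        f"partition {index} failed {max_failures} times; job aborted"
                    )
    return results, reads


def flaky_sum(index, attempt, split):
    # Hypothetical task: partition 1 fails on its first two attempts.
    if index == 1 and attempt <= 2:
        raise RuntimeError("simulated executor failure")
    return sum(split)


file_splits = [[1, 2], [3, 4], [5, 6]]  # stands in for a file stored as three blocks
results, reads = run_job(file_splits, flaky_sum)
print(results)
print(reads)
```

In the `reads` log, partitions 0 and 2 are read once each and only partition 1 is read again, which is the point: a retry touches just the split that failed, never the entire file.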

On Fri, Jan 21, 2022 at 12:00 PM Siddhesh Kalgaonkar <kalgaonkarsiddh...@gmail.com> wrote:

> Hello team,
>
> I am aware that in case of memory issues when a task fails, it will try to
> restart 4 times since it is a default number and if it still fails then it
> will cause the entire job to fail.
>
> But suppose if I am reading a file that is distributed across nodes in
> partitions. So, what will happen if a partition fails that holds some data?
> Will it re-read the entire file and get that specific subset of data since
> the driver has the complete information? or will it copy the data to the
> other working nodes or tasks and try to run it?
>