Hi,

In some of our data science use cases, such as predictions, we are using Spark. Most of the time we face data skewness issues, and we have redistributed the skewed keys using Murmur hashing or round-robin assignment, which fixed the skewness across partitions/tasks.
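For context, here is a minimal sketch (in Scala) of the kind of redistribution we mean; the column names, paths and number of salt buckets are placeholders, not our actual code:

// Minimal sketch of key salting to spread one hot key across many tasks.
// Column names ("key", "value"), paths and saltBuckets are placeholders.
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object SaltingSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder.appName("salting-sketch").getOrCreate()

    val df = spark.read.parquet("/path/to/input")   // placeholder input
    val saltBuckets = 32                            // placeholder bucket count

    // Attach a salt column so a hot key is spread over saltBuckets shuffle
    // partitions instead of landing in a single task. repartition() on
    // columns uses Spark's Murmur3-based hash partitioning under the hood.
    val salted = df
      .withColumn("salt", (rand() * saltBuckets).cast("int"))
      .repartition(col("key"), col("salt"))

    // Aggregate per (key, salt) first, then merge the partial results per key.
    val partial = salted.groupBy("key", "salt").agg(sum("value").as("partial_sum"))
    val result  = partial.groupBy("key").agg(sum("partial_sum").as("total"))

    result.write.mode("overwrite").parquet("/path/to/output")  // placeholder output
    spark.stop()
  }
}

Splitting the hot key into (key, salt) buckets and merging the partial aggregates afterwards is what spreads the skewed key across tasks.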
But still, some of the tasks take a very long time because of their logical flow, which depends on the nature of the data for a particular key. For our use cases, we are OK with omitting a few tasks if they cannot complete within a certain amount of time. That's why we have implemented task-level timeouts (a rough sketch is at the end of this mail): the job is still successful even if some of the tasks do not complete within the defined time, and with this we are able to define SLAs for our Spark applications.

Is there any mechanism in the Spark framework to define task-level timeouts and mark the job as successful even if only x% of the tasks succeed (where x can be configured)? Has anyone else faced such issues?

Thanks & Regards,
B Anil Kumar.
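Sketch of the per-task deadline workaround mentioned above (in Scala); the time budget, paths and expensiveTransform are placeholders, not our actual implementation:

// Minimal sketch of a user-level, per-task deadline inside mapPartitions.
// taskBudgetMs, paths and expensiveTransform are placeholders.
import org.apache.spark.sql.SparkSession

object TaskDeadlineSketch {

  // Placeholder for the per-record work that is slow for some keys.
  def expensiveTransform(line: String): String = line.toUpperCase

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder.appName("task-deadline-sketch").getOrCreate()
    val sc = spark.sparkContext

    val taskBudgetMs = 10 * 60 * 1000L              // per-task time budget (placeholder)
    val input = sc.textFile("/path/to/input")       // placeholder input

    val processed = input.mapPartitions { records =>
      val deadline = System.currentTimeMillis() + taskBudgetMs
      // Each task processes records only until its deadline; the rest of the
      // partition is skipped, so the task (and hence the job) still finishes.
      records
        .takeWhile(_ => System.currentTimeMillis() < deadline)
        .map(expensiveTransform)
    }

    processed.saveAsTextFile("/path/to/output")     // placeholder output
    spark.stop()
  }
}

The deadline is checked inside mapPartitions, so a task simply stops consuming its partition once its budget is spent and the stage still completes; the framework itself never sees a timeout.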