Hello, we run our Spark workloads on spot instances and would like to quantify the impact of spot interruptions on them. We are proposing the following metric, but would like your opinions on it.
We are leveraging Spark's event listener and computing the following, where T ranges over tasks:

T1 = sum(T.execution-time) for all T where T.status = failed and T.stage-attempt-number = 0
T2 = sum(T.execution-time) for all T where T.stage-attempt-number > 0
Tall = sum(T.execution-time) for all T
Retry% = (T1 + T2) / Tall

The assumptions are:
T1 – if a stage is executing for the first time, only the tasks that failed were wasted work
T2 – every task executed for a stage with stage-attempt-number > 0 is a retry, since the stage was already executed previously
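As a sketch of just the arithmetic (not the listener wiring), assuming each task record carries an execution time, a status, and the stage attempt number — hypothetical field names, not Spark's actual event schema — the metric could be computed like this:

```python
from dataclasses import dataclass

@dataclass
class TaskRecord:
    # Hypothetical record assembled from task-end events; field names
    # are illustrative, not Spark's actual listener API.
    execution_time: float      # task run time in seconds
    status: str                # "success" or "failed"
    stage_attempt_number: int  # 0 for the first attempt of the stage

def retry_pct(tasks):
    # T1: failed tasks on a stage's first attempt are wasted work.
    t1 = sum(t.execution_time for t in tasks
             if t.status == "failed" and t.stage_attempt_number == 0)
    # T2: any task on a re-attempted stage is a retry.
    t2 = sum(t.execution_time for t in tasks if t.stage_attempt_number > 0)
    t_all = sum(t.execution_time for t in tasks)
    return (t1 + t2) / t_all if t_all else 0.0

tasks = [
    TaskRecord(10.0, "success", 0),
    TaskRecord(4.0, "failed", 0),   # counts toward T1
    TaskRecord(6.0, "success", 1),  # counts toward T2
]
print(retry_pct(tasks))  # (4 + 6) / 20 = 0.5
```

The guard on an empty task list avoids a division by zero for jobs that emit no task events.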