virrrat commented on PR #47516: URL: https://github.com/apache/spark/pull/47516#issuecomment-2260312096
> Do we have a test case to demonstrate the issue? AFAIK Spark filters out 0-value accumulators at the executor side.

`SQLMetrics.stringValue` uses different logic for different metric types to filter out invalid accumulator values:

1. For `AVERAGE_METRIC`, zero and negative values are filtered out [here](https://github.com/apache/spark/blob/9f22fa4d2acfcbc42d6d76a28778885cbdad733d/sql/core/src/main/scala/org/apache/spark/sql/execution/metric/SQLMetrics.scala#L225).
2. For `SIZE`, `TIMING` and `NS_TIMING`, only negative values are filtered out [here](https://github.com/apache/spark/blob/9f22fa4d2acfcbc42d6d76a28778885cbdad733d/sql/core/src/main/scala/org/apache/spark/sql/execution/metric/SQLMetrics.scala#L251); since zero is a valid metric value for these types, zeros are kept.

So the change made in #39311 effectively converts the invalid `-1` accumulator value into a valid `0` value that is no longer filtered out, which produces wrong `min` and `median` values for size and time metrics, while the `max` and cumulative values still match. I think there is no existing test for this, which is why it was never caught. Let me see if I can add one to demonstrate the issue.
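To make the effect concrete, here is a minimal, self-contained Scala sketch (not the actual `SQLMetrics` implementation) of the SIZE/TIMING-style filtering described above; the `MetricFilterSketch` object, the `stats` helper, and the sample values are illustrative assumptions:

```scala
// Minimal sketch (not the real SQLMetrics code) of SIZE/TIMING-style filtering,
// assuming -1 is the "unset" sentinel for per-task metric values.
object MetricFilterSketch {
  // Compute (min, median, max) over the values that survive the filter.
  def stats(taskValues: Seq[Long]): (Long, Long, Long) = {
    val valid = taskValues.filter(_ >= 0).sorted // negatives dropped, zero kept
    (valid.head, valid(valid.length / 2), valid.last)
  }

  def main(args: Array[String]): Unit = {
    val withSentinels = Seq(-1L, -1L, 10L, 20L, 30L)
    // With -1 sentinels intact, they are filtered out -> (10, 20, 30)
    println(stats(withSentinels))
    // After a -1 -> 0 conversion like the one described above, the zeros survive
    // the filter and skew min and median -> (0, 10, 30); max is unaffected.
    println(stats(withSentinels.map(v => math.max(v, 0L))))
  }
}
```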

