Re: [PR] [SPARK-49038][SQL] Fix regression in Spark UI SQL operator metrics calculation to filter out invalid accumulator values correctly [spark]

via GitHub Thu, 01 Aug 2024 00:58:51 -0700


virrrat commented on PR #47516:
URL: https://github.com/apache/spark/pull/47516#issuecomment-2262298606

> Are you sure it may contain -1?

You can run the tpc-ds benchmark q1 and check the SQL tab DAG views in
driver hosted Spark UI. The minimum and median values are not correct for
multiple operator metrics and they don't match with history server values
either (as pointed out by @abmodi in the JIRA
[ticket](https://issues.apache.org/jira/browse/SPARK-49038)). For incorrect
metric cases, the minimum value is always zero while the median value is always
less than the actual median value. The reason for this is that there are extra
zeros in the `values` parameter of `SQLMetrics.stringValue` method which were
actually `-1` and should have filtered out.

I'm not a spark expert but given `SQLMetrics.stringValue` method has
[this](https://github.com/apache/spark/blob/9f22fa4d2acfcbc42d6d76a28778885cbdad733d/sql/core/src/main/scala/org/apache/spark/sql/execution/metric/SQLMetrics.scala#L251)
logic to fetch `validValues` itself indicates that the `values` parameter may
contain `-1`.

> People can access SQLMetric instances in physical plan programmatically.

Agree. If users are consuming values directly from listener, they may use
`SQLMetric.isZero` before consuming it. We can't compromise the integrity of
the `SQLMetric.value` just for this given it is causing regression in the data
shown in Spark UI. Either we need to expose two different variables, one to be
used for Spark UI metric data calculation and other to be used by users
programmatically but not sure if it's a good idea.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Re: [PR] [SPARK-49038][SQL] Fix regression in Spark UI SQL operator metrics calculation to filter out invalid accumulator values correctly [spark]

Reply via email to