virrrat commented on PR #47516:
URL: https://github.com/apache/spark/pull/47516#issuecomment-2262298606

   > Are you sure it may contain -1?
   
   You can run the tpc-ds benchmark q1 and check the SQL tab DAG views in 
driver hosted Spark UI. The minimum and median values are not correct for 
multiple operator metrics and they don't match with history server values 
either (as pointed out by @abmodi in the JIRA 
[ticket](https://issues.apache.org/jira/browse/SPARK-49038)). For incorrect 
metric cases, the minimum value is always zero while the median value is always 
less than the actual median value. The reason for this is that there are extra 
zeros in the `values` parameter of `SQLMetrics.stringValue` method which were 
actually `-1` and should have filtered out.
   
   I'm not a spark expert but given `SQLMetrics.stringValue` method has 
[this](https://github.com/apache/spark/blob/9f22fa4d2acfcbc42d6d76a28778885cbdad733d/sql/core/src/main/scala/org/apache/spark/sql/execution/metric/SQLMetrics.scala#L251)
 logic to fetch `validValues` itself indicates that the `values` parameter may 
contain `-1`.
   
   > People can access SQLMetric instances in physical plan programmatically.
   
   Agree. If users are consuming values directly from listener,  they may use 
`SQLMetric.isZero` before consuming it. We can't compromise the integrity of 
the `SQLMetric.value` just for this given it is causing regression in the data 
shown in Spark UI. Either we need to expose two different variables, one to be 
used for Spark UI metric data calculation and other to be used by users 
programmatically but not sure if it's a good idea.
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to