WangGuangxin commented on issue #26899: [SPARK-28332][SQL] Reserve init value -1 only when do min max statistics in SQLMetrics
URL: https://github.com/apache/spark/pull/26899#issuecomment-566549640

> Can we make the solution clear first? TBH I don't quite understand the changes here.
>
> For size metrics, it has the following properties:
>
> 1. initial size is -1
> 2. usually tasks accumulate values at executor side, and these values get merged at driver side
> 3. sometimes, the SQLMetrics may be serialized and sent to executors unexpectedly. Then the accumulator gets no update at executor side, and -1 is sent back to driver side and gets merged.
>
> This has several problems:
>
> 1. the actual value is 1 byte smaller, because the initial value is -1
> 2. when merging values at driver side, -1 can mess things up.
>
> There are 2 places merging the values:
>
> 1. SQL web UI, which filters out -1 values, so it's fine.
> 2. The accumulator framework, which calls `SQLMetrics.merge`.
>
> I think what we need to fix are:
>
> 1. `SQLMetrics.add` should set `_value` to 0 if it's -1. This can avoid making the actual value 1 byte smaller.
> 2. `SQLMetrics.merge` should ignore -1 values. This can fix the negative size metrics bug.

ok, I'll update soon
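
The two fixes proposed in the quoted comment can be sketched with a toy model. This is a minimal illustration in Python, not Spark's actual Scala code: the class `SizeMetric`, the constant `UNSET`, and the handling of a receiver that is itself still at -1 are assumptions made for the example.

```python
class SizeMetric:
    """Toy stand-in for a size metric whose initial value is -1.

    Hypothetical class for illustration only; not Spark's SQLMetric.
    """

    UNSET = -1  # sentinel meaning "no update recorded yet"

    def __init__(self):
        self.value = self.UNSET

    def add(self, v):
        # Proposed fix 1: reset the sentinel to 0 before accumulating,
        # so the total does not come out 1 byte too small.
        if self.value == self.UNSET:
            self.value = 0
        self.value += v

    def merge(self, other):
        # Proposed fix 2: ignore metrics that never received an update,
        # so a stray -1 cannot drag the merged size down.
        if other.value == self.UNSET:
            return
        # Assumption: a receiver still at the sentinel starts from 0.
        if self.value == self.UNSET:
            self.value = 0
        self.value += other.value


# Driver-side merge of three task-side metrics, one of which was
# serialized to an executor but never updated.
driver = SizeMetric()
updated_a = SizeMetric()
updated_a.add(100)
never_updated = SizeMetric()
updated_b = SizeMetric()
updated_b.add(50)

for task_metric in (updated_a, never_updated, updated_b):
    driver.merge(task_metric)

print(driver.value)  # 150
```

Without the check in `add`, the first update of 100 would yield 99; without the check in `merge`, the untouched metric would subtract 1 from the driver-side total.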
