Thanks Dave, great to hear, keep us posted on improvements as they land :)

On Fri, Feb 23, 2018 at 9:06 AM Dave Tu <d...@chromium.org> wrote:
> We definitely want to get to a place where we can give you a breakdown
> of all improvements and regressions caused by every commit. Right now
> it's looking like sometime in 2018, because there's still a lot of work
> to do, but we have a bunch of projects in flight to get there.
>
> One of the barriers is perf sheriff workload. We're already asking a
> lot of perf sheriffs, because there are >100 perf regression alerts per
> day, which can balloon to several hundred if a regression affects many
> platforms or benchmarks, and many of them are false positives. So we're
> not ready to ask them to take on double the workload until we can
> automate more of it. To that end, we're making a bunch of improvements
> to capacity and reliability:
>
> - HistogramSet's Related Histograms is going to contain structured
>   information about how metrics are related to each other, so we can
>   be smarter about deduplicating alerts.
>
> - We're standardizing the process for picking perf device
>   configurations so that we can obtain bulk hardware for each of them,
>   and we can update the configs as manufacturers start and stop
>   producing different devices, while still ensuring coverage of each
>   attribute (all supported OS versions, low memory, high DPI, etc.).
>   This will decrease our pipeline latency, increase the number of data
>   points we have available for analysis, and allow us to bisect every
>   regression and improvement.
>
> - The maintenance mutex will help reduce noise in perf test runs by
>   stopping perf tests during OS software updates.
>
> - Chrome benchmarking is pruning benchmarks and metrics that produce
>   unbisectable or unreproducible results, and we're adding better
>   monitoring to be able to provide more hard data as feedback to
>   benchmark owners on the quality of their numbers.
>
> On Tue, Feb 20, 2018 at 9:08 AM, Gabriel Charette <g...@chromium.org> wrote:
>
>> Actually, we probably don't need to bisect all the green alerts.
>> But when a CL is expected to improve something, it'd be nice to be
>> able to explicitly request a breakdown of which improvements and
>> regressions are pinpointed to it.
>>
>> i.e. have a "Bisect Filter" button on
>> https://chromeperf.appspot.com/group_report?rev=533152 to request an
>> explicit bisect on all bots where something was noticed in its range,
>> and filter out any graph that isn't attributed to that particular CL.
>>
>> On Tue, Feb 20, 2018 at 5:58 PM Simon Hatch <simonha...@chromium.org> wrote:
>>
>>> On Tuesday, February 20, 2018 at 11:46:18 AM UTC-5, Gabriel Charette wrote:
>>>>
>>>> (sigh @chromium.org.. again!)
>>>>
>>>> On Tue, Feb 20, 2018 at 5:44 PM Gabriel Charette <g...@google.com> wrote:
>>>>
>>>>> Oops, we crossed in writing :)
>>>>>
>>>>> On Tue, Feb 20, 2018 at 5:37 PM <simonha...@chromium.org> wrote:
>>>>>
>>>>>> Completely agree that we want to get to a place where you can
>>>>>> clearly see what impact a CL had. The data and tooling are getting
>>>>>> there; there's been considerable effort in the last year to improve
>>>>>> things on that front. We can't enable automatic Pinpoint for
>>>>>> improvements, although we do have plans for a much more automated
>>>>>> sheriffing flow in the not-too-distant future (we're requesting a
>>>>>> lot more hardware, and I believe the benchmarking team is narrowing
>>>>>> the # of configurations).
>>>>>>
>>>>>> For this specific case, only regressions get triaged, thus you only
>>>>>> see regressions filed against you.
>>>>>
>>>>> What would it take to triage both? Seems like a single-bit change
>>>>> for a big gain?
>>>
>>> I'm actually not sure; Annie could probably speak more to this. A
>>> quick look at the alerts page suggests it would more than double the
>>> perf sheriff workload.
>>> Would some sort of "super" perf-tryjob, allowing you to run against
>>> many benchmarks/configurations with a nice report of the overall
>>> impact, be a reasonable alternative to sheriffing all improvements?
>>>
>>>>> Also, re. "you can file a bug on them": that's at least blocked on
>>>>> go/catabug/4225 <https://goto.google.com/catabug/4225>.
>>>
>>> In the short term I might be able to take this on if you're willing
>>> to do the tedious bisect on all the green alerts.
>>>
>>>>>> On Tuesday, February 20, 2018 at 11:06:53 AM UTC-5, Annie Sullivan wrote:
>>>>>>
>>>>>>> bcc: benchmarking-dev
>>>>>>> cc: speed-services-dev, dtu, simonhatch
>>>>>>>
>>>>>>> Pinpoint does identify both regressions and improvements; you can
>>>>>>> see more details here
>>>>>>> <https://docs.google.com/document/d/19gig8pv8ei2y45mVDp5awZGj39bZoFjNDn65uOfyUUY/edit>.
>>>>>>> Dave, Simon, can one of you look into the specifics of this case?
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Annie
>>>>>>>
>>>>>>> On Mon, Feb 19, 2018 at 7:33 AM, Gabriel Charette <g...@google.com> wrote:
>>>>>>>
>>>>>>>> Hello Benchmarking Dev!
>>>>>>>>
>>>>>>>> First, thanks for your hard work on building and maintaining
>>>>>>>> reliable benchmarks with automatic single-CL blaming tools.
>>>>>>>>
>>>>>>>> I've recently been digging into low-level scheduling primitives
>>>>>>>> on the V8 side, and it tends to move benchmarks in interesting
>>>>>>>> ways.
>>>>>>>>
>>>>>>>> *The problem:* Pinpoint only identifies and then narrows down to
>>>>>>>> a single CL for *regressions*.
>>>>>>>> *The request:* Pinpoint should identify and narrow down to a
>>>>>>>> single CL for *improvements as well*.
>>>>>>>>
>>>>>>>> CLs in the scheduling space tend to move *many* benchmarks, and
>>>>>>>> it's hard to tell what they actually improved/regressed.
>>>>>>>> For example, did r534414 overall improve or regress things
>>>>>>>> <https://chromeperf.appspot.com/group_report?rev=534414>? It
>>>>>>>> looks like it went both ways, but only the regressions were
>>>>>>>> pinpointed down to my CL. What about the other dozens of green
>>>>>>>> graphs: are they coincidental, or also caused by my CL? I could
>>>>>>>> launch bisects on all the green dots, but not only is that
>>>>>>>> tedious, it will also result in pinging a bunch of people
>>>>>>>> (go/catabug/4225).
>>>>>>>>
>>>>>>>> After fixing <http://crbug.com/809961> the pinpointed
>>>>>>>> regressions, I'm left to wonder whether this was a no-op or an
>>>>>>>> overall improvement.
>>>>>>>>
>>>>>>>> Not knowing what is an overall improvement is an engineering
>>>>>>>> problem, as it denies us data that could otherwise serve as a
>>>>>>>> hint to paradigms that should be encouraged and reproduced for
>>>>>>>> further gains.
>>>>>>>>
>>>>>>>> If a CL regresses one thing and improves ten, the only way to
>>>>>>>> specifically know that it improved those ten things is to have
>>>>>>>> it reverted for that one regression and then have the revert
>>>>>>>> cause its own set of "regressions" (i.e. unimprovements)...
>>>>>>>>
>>>>>>>> It feels like all the data and tooling are there; can we just
>>>>>>>> enable automatic Pinpoint for improvements? I understand that
>>>>>>>> filing bugs for improvements is weird, but as a first pass I'm
>>>>>>>> sure the vast majority of engineers would be glad to be told
>>>>>>>> that they're making things better, regardless of the medium
>>>>>>>> through which the news is delivered!
>>>>>>>>
>>>>>>>> Thanks!
>>>>>>>> Gab

--
v8-dev mailing list
v8-dev@googlegroups.com
http://groups.google.com/group/v8-dev
---
You received this message because you are subscribed to the Google Groups "v8-dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email to v8-dev+unsubscr...@googlegroups.com. For more options, visit https://groups.google.com/d/optout.
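[Editor's note] The thread's core request, attributing improvements as well as regressions to a single culprit CL, amounts to a bisect whose stopping condition is any significant metric change, not just a change for the worse. A minimal hypothetical sketch of that idea follows; the commit names, `measure` callback, noise threshold, and single-step-change assumption are all invented for illustration, and this is not real Pinpoint or chromeperf code:

```python
NOISE = 0.5  # ignore deltas smaller than measurement noise (invented value)

def bisect_change(commits, measure):
    """Find the first commit whose metric differs from the range start.

    Assumes at most one step change within the range (as a real bisect
    does per job). Returns (commit, delta) or None if the range is flat
    within noise.
    """
    baseline = measure(commits[0])
    if abs(measure(commits[-1]) - baseline) <= NOISE:
        return None  # nothing to attribute
    lo, hi = 0, len(commits) - 1
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if abs(measure(commits[mid]) - baseline) <= NOISE:
            lo = mid  # still at baseline: change is after mid
        else:
            hi = mid  # changed already: culprit is at or before mid
    return commits[hi], measure(commits[hi]) - baseline

def classify(delta, lower_is_better=True):
    # The "single bit" discussed in the thread: instead of only acting
    # when the delta is in the "worse" direction, label either direction.
    worse = delta > 0 if lower_is_better else delta < 0
    return "regression" if worse else "improvement"

# Simulated metric history: a commit named r534414 drops the metric
# (e.g. a timing, where lower is better) by 10 units.
history = {f"r{534410 + i}": 100.0 for i in range(10)}
for i in range(4, 10):
    history[f"r{534410 + i}"] = 90.0

commits = sorted(history)
culprit, delta = bisect_change(commits, history.__getitem__)
print(culprit, classify(delta))  # attributes the improvement, not just regressions
```

Under this framing, auto-bisecting green alerts is the same machinery as today's regression bisects; the cost Dave and Simon describe is not in the algorithm but in hardware capacity and sheriff triage volume.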