We definitely want to get to a place where we can give you a breakdown of
all improvements and regressions caused by every commit. Right now it's
looking like sometime in 2018, because there's still a lot of work to do,
but we have a bunch of projects in flight to get there.

One of the barriers is perf sheriff workload. We're already asking a lot of
perf sheriffs, because there are >100 perf regression alerts per day, which
can balloon to several hundred if there's a regression affecting many
platforms or benchmarks. And many of them are false positives. So we're not
ready to ask them to take on double the workload until we can automate more
of it. To that end, we're making a bunch of improvements to capacity and
reliability:

   - HistogramSet's Related Histograms is going to contain structured
   information about how metrics are related to each other, so we can be
   smarter about deduplicating alerts (see the first sketch after this
   list).


   - We're standardizing the process for picking perf device configurations
   so that we can obtain hardware in bulk for each configuration and update
   the configs as manufacturers start and stop producing devices, while
   still ensuring coverage of each attribute (all supported OS versions, low
   memory, high DPI, etc.). This will decrease our pipeline latency, increase
   the number of data points we have available for analysis, and allow us to
   bisect every regression and improvement.


   - The maintenance mutex will help reduce noise in perf test runs by
   stopping perf tests during OS software updates (see the second sketch
   after this list).


   - Chrome benchmarking is pruning benchmarks and metrics that produce
   unbisectable or unreproducible results, and we're adding better
   monitoring to be able to give benchmark owners hard data as feedback on
   the quality of their numbers (see the third sketch after this list).
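
To make the deduplication idea concrete, here's a rough sketch of how the
structured relationships could drive alert grouping. None of these names
are the real HistogramSet API; 'related' is a hypothetical map from a
metric to the metric it summarizes or derives from:

    # Hypothetical sketch; 'related' maps a metric name to the metric it
    # summarizes or derives from (not the actual HistogramSet API).
    def dedupe_alerts(alerts, related):
        """Group alerts whose metrics share a root in the relationship map."""
        def root(metric):
            seen = set()
            while metric in related and metric not in seen:
                seen.add(metric)
                metric = related[metric]
            return metric

        groups = {}
        for alert in alerts:
            groups.setdefault(root(alert['metric']), []).append(alert)
        # Surface one representative alert per group of related metrics.
        return [group[0] for group in groups.values()]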

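The maintenance mutex could be as simple as an OS-level file lock that both
the updater and the test runner acquire. This is my sketch of the idea, not
the actual implementation; the lock path and commands are illustrative:

    import fcntl
    import subprocess

    LOCK_PATH = '/var/lock/perf-maintenance.lock'  # illustrative path

    def run_exclusively(cmd):
        # Both the OS-update job and the perf test runner go through this,
        # so tests block until any in-progress update releases the lock.
        with open(LOCK_PATH, 'w') as lock_file:
            fcntl.flock(lock_file, fcntl.LOCK_EX)
            try:
                subprocess.check_call(cmd)
            finally:
                fcntl.flock(lock_file, fcntl.LOCK_UN)

    # run_exclusively(['apt-get', 'upgrade', '-y'])        # maintenance side
    # run_exclusively(['./run_benchmark', 'speedometer'])  # perf test side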

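And for the monitoring in the last bullet, one plausible signal (again my
sketch, not the metric we actually compute) is the coefficient of variation
of a metric's recent samples; metrics that stay above some threshold are
effectively unbisectable:

    import statistics

    def noise_score(samples):
        """Coefficient of variation (stddev / mean) of recent samples."""
        if len(samples) < 2:
            return float('inf')  # too little data to trust the metric
        mean = statistics.mean(samples)
        return statistics.stdev(samples) / mean if mean else float('inf')

    def flag_noisy_metrics(history, threshold=0.05):
        # history: {metric_name: [recent sample values]}. A metric whose
        # run-to-run variation exceeds the threshold can't be bisected
        # reliably, so it gets reported back to the benchmark owner.
        return sorted(metric for metric, samples in history.items()
                      if noise_score(samples) > threshold)
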
On Tue, Feb 20, 2018 at 9:08 AM, Gabriel Charette <g...@chromium.org> wrote:

> Actually, we probably don't need to bisect all the green alerts. But when
> a CL is expected to improve something, it'd be nice to explicitly request a
> breakdown of which improvements/regressions it's pinpointed to.
>
> i.e. have a "Bisect Filter" button on
> https://chromeperf.appspot.com/group_report?rev=533152 to request an
> explicit bisect on all bots where something was noticed in its range and
> filter out any graph that isn't attributed to that particular CL.
>
> On Tue, Feb 20, 2018 at 5:58 PM Simon Hatch <simonha...@chromium.org>
> wrote:
>
>>
>>
>> On Tuesday, February 20, 2018 at 11:46:18 AM UTC-5, Gabriel Charette
>> wrote:
>>>
>>> (sigh @chromium.org.. again!)
>>>
>>> On Tue, Feb 20, 2018 at 5:44 PM Gabriel Charette <g...@google.com> wrote:
>>>
>>>> Oops, we crossed in writing :)
>>>>
>>>> On Tue, Feb 20, 2018 at 5:37 PM <simonha...@chromium.org> wrote:
>>>>
>>>>> Completely agree that we want to get to a place where you can clearly
>>>>> see what impact a CL had. The data and tooling are getting there;
>>>>> there's been considerable effort in the last year to improve things on
>>>>> that front. We can't enable automatic Pinpoint for improvements,
>>>>> although we do have plans for a much more automated sheriffing flow in
>>>>> the not-too-distant future (we're requesting a lot more hardware, and I
>>>>> believe the benchmarking team is narrowing the # of configurations).
>>>>>
>>>>> For this specific case, only regressions get triaged, so only
>>>>> regressions get filed to you.
>>>>>
>>>>
>>>> What would it take to triage both? Seems like a single-bit change for a
>>>> big gain?
>>>>
>>>>
>> I'm actually not sure; Annie could probably speak more to this. A quick
>> look at the alerts page suggests it would more than double the perf
>> sheriff workload.
>>
>> Would some sort of "super" perf-tryjob allowing you to run against many
>> benchmarks/configurations, with a nice report of the overall impact, be a
>> reasonable alternative to sheriffing all improvements?
>>
>>
>>
>>> Also, re. "you can file a bug on them". That's at least blocked on
>>>> go/catabug/4225 <https://goto.google.com/catabug/4225>.
>>>>
>>>>
>>>
>> In the short term I might be able to take this on if you're willing to do
>> the tedious bisects on all the green alerts.
>>
>>
>>>
>>>>>
>>>>> On Tuesday, February 20, 2018 at 11:06:53 AM UTC-5, Annie Sullivan
>>>>> wrote:
>>>>>
>>>>>> bcc: benchmarking-dev
>>>>>> cc: speed-services-dev, dtu, simonhatch
>>>>>>
>>>>>> Pinpoint does identify both regressions and improvements; you can see
>>>>>> more details here
>>>>>> <https://docs.google.com/document/d/19gig8pv8ei2y45mVDp5awZGj39bZoFjNDn65uOfyUUY/edit>.
>>>>>> Dave, Simon, can one of you look into the specifics of this case?
>>>>>>
>>>>>> Thanks,
>>>>>> Annie
>>>>>>
>>>>>> On Mon, Feb 19, 2018 at 7:33 AM, Gabriel Charette <g...@google.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Hello Benchmarking Dev!
>>>>>>>
>>>>>>> First, thanks for your hard work on building and maintaining
>>>>>>> reliable benchmarks with automatic single-CL blaming tools.
>>>>>>>
>>>>>>> I've recently been digging more into low-level scheduling
>>>>>>> primitives on the v8 side, and it tends to move benchmarks in
>>>>>>> interesting ways.
>>>>>>>
>>>>>>> *The problem:* Pinpoint only identifies and then narrows down to a
>>>>>>> single CL for *regressions*.
>>>>>>> *The request:* Pinpoint should identify and narrow down to a single
>>>>>>> CL for *improvements as well*.
>>>>>>>
>>>>>>> CLs in the scheduling space tend to move *many* benchmarks and it's
>>>>>>> hard to tell what it actually improved/regressed.
>>>>>>>
>>>>>>> For example, did r534414 overall improve or regress things
>>>>>>> <https://chromeperf.appspot.com/group_report?rev=534414>? It looks
>>>>>>> like it went both ways, but only the regressions were pinpointed down
>>>>>>> to my CL. What about the other dozens of green graphs: are they
>>>>>>> coincidental, or also caused by my CL? I could launch bisects on all
>>>>>>> the green dots, but not only is that tedious, it would also result in
>>>>>>> pinging a bunch of people (go/catabug/4225).
>>>>>>>
>>>>>>> After fixing the pinpointed regressions <http://crbug.com/809961>,
>>>>>>> I'm left to wonder whether this was a no-op or an overall
>>>>>>> improvement.
>>>>>>>
>>>>>>> Not knowing whether a change is an overall improvement is an
>>>>>>> engineering problem: it denies us data that could otherwise hint at
>>>>>>> paradigms that should be encouraged and reproduced for further gains.
>>>>>>>
>>>>>>> If a CL regresses one thing and improves ten, the only way to know
>>>>>>> specifically that it improved those ten things is to have it reverted
>>>>>>> for that one regression and then have the revert cause its own set of
>>>>>>> "regressions" (i.e. unimprovements)...
>>>>>>>
>>>>>>> It feels like all the data and tooling are there; can we just enable
>>>>>>> automatic Pinpoint for improvements? I understand that filing bugs
>>>>>>> for improvements is weird, but as a first pass I'm sure the vast
>>>>>>> majority of engineers would be glad to be told that they're making
>>>>>>> things better, regardless of the medium through which the news is
>>>>>>> delivered!
>>>>>>>
>>>>>>> Thanks!
>>>>>>> Gab
>>>>>>>
>>>>>>
>>>>>>
