Thanks Dave, great to hear, keep us posted on improvements as they land :)

On Fri, Feb 23, 2018 at 9:06 AM Dave Tu <d...@chromium.org> wrote:
> We definitely want to get to a place where we can give you a breakdown
> of all improvements and regressions caused by every commit. Right now
> it's looking like sometime in 2018, because there's still a lot of work
> to do, but we have a bunch of projects in flight to get there.
>
> One of the barriers is perf sheriff workload. We're already asking a
> lot of perf sheriffs, because there are >100 perf regression alerts per
> day, which can balloon to several hundred if a regression affects many
> platforms or benchmarks, and many of them are false positives. So we're
> not ready to ask them to take on double the workload until we can
> automate more of it. To that end, we're making a bunch of improvements
> to capacity and reliability:
>
> - HistogramSet's Related Histograms is going to contain structured
>   information about how metrics are related to each other, so we can
>   be smarter about deduplicating alerts.
>
> - We're standardizing the process for picking perf device
>   configurations so that we can obtain bulk hardware for each of them,
>   and we can update the configs as manufacturers start and stop
>   producing different devices, while still ensuring coverage of each
>   attribute (all supported OS versions, low memory, high DPI, etc.).
>   This will decrease our pipeline latency, increase the number of data
>   points we have available for analysis, and allow us to bisect every
>   regression and improvement.
>
> - The maintenance mutex will help reduce noise in perf test runs by
>   stopping perf tests during OS software updates.
>
> - Chrome benchmarking is pruning benchmarks and metrics that produce
>   unbisectable or unreproducible results, and we're adding better
>   monitoring to be able to provide more hard data as feedback to
>   benchmark owners on the quality of their numbers.
>
> On Tue, Feb 20, 2018 at 9:08 AM, Gabriel Charette <g...@chromium.org> wrote:
>
>> Actually, we probably don't need to bisect all the green alerts.
>> But when a CL is expected to improve something, it'd be nice to be
>> able to explicitly request a breakdown of which improvements and
>> regressions are pinpointed to it.
>>
>> i.e. have a "Bisect Filter" button on
>> https://chromeperf.appspot.com/group_report?rev=533152 to request an
>> explicit bisect on all bots where something was noticed in its range,
>> and filter out any graph that isn't attributed to that particular CL.
>>
>> On Tue, Feb 20, 2018 at 5:58 PM Simon Hatch <simonha...@chromium.org> wrote:
>>
>>> On Tuesday, February 20, 2018 at 11:46:18 AM UTC-5, Gabriel Charette wrote:
>>>>
>>>> (sigh @chromium.org.. again!)
>>>>
>>>> On Tue, Feb 20, 2018 at 5:44 PM Gabriel Charette <g...@google.com> wrote:
>>>>
>>>>> Oops, we crossed in writing :)
>>>>>
>>>>> On Tue, Feb 20, 2018 at 5:37 PM <simonha...@chromium.org> wrote:
>>>>>
>>>>>> Completely agree that we want to get to a place where you can
>>>>>> clearly see what impact a CL had. The data and tooling are getting
>>>>>> there; there's been considerable effort in the last year to improve
>>>>>> things on that front. We can't enable automatic Pinpoint for
>>>>>> improvements, although we do have plans for a much more automated
>>>>>> sheriffing flow in the not-too-distant future (we're requesting a
>>>>>> lot more hardware, and I believe the benchmarking team is narrowing
>>>>>> the # of configurations).
>>>>>>
>>>>>> For this specific case, only regressions get triaged, thus you only
>>>>>> see regressions filed against you.
>>>>>
>>>>> What would it take to triage both? Seems like a single-bit change
>>>>> for a big gain?
>>>
>>> I'm actually not sure; Annie could probably speak more to this. A
>>> quick look at the alerts page suggests it would more than double the
>>> perf sheriff workload.
>>> Would some sort of "super" perf-tryjob, allowing you to run against
>>> many benchmarks/configurations with a nice report of the overall
>>> impact, be a reasonable alternative to sheriffing all improvements?
>>>
>>>>> Also, re. "you can file a bug on them": that's at least blocked on
>>>>> go/catabug/4225 <https://goto.google.com/catabug/4225>.
>>>
>>> In the short term I might be able to take this on if you're willing
>>> to do the tedious bisect on all the green alerts.
>>>
>>>>>> On Tuesday, February 20, 2018 at 11:06:53 AM UTC-5, Annie Sullivan wrote:
>>>>>>
>>>>>>> bcc: benchmarking-dev
>>>>>>> cc: speed-services-dev, dtu, simonhatch
>>>>>>>
>>>>>>> Pinpoint does identify both regressions and improvements; you can
>>>>>>> see more details here
>>>>>>> <https://docs.google.com/document/d/19gig8pv8ei2y45mVDp5awZGj39bZoFjNDn65uOfyUUY/edit>.
>>>>>>> Dave, Simon, can one of you look into the specifics of this case?
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Annie
>>>>>>>
>>>>>>> On Mon, Feb 19, 2018 at 7:33 AM, Gabriel Charette <g...@google.com> wrote:
>>>>>>>
>>>>>>>> Hello Benchmarking Dev!
>>>>>>>>
>>>>>>>> First, thanks for your hard work on building and maintaining
>>>>>>>> reliable benchmarks with automatic single-CL blaming tools.
>>>>>>>>
>>>>>>>> I've recently been digging into low-level scheduling primitives
>>>>>>>> on the V8 side, and it tends to move benchmarks in interesting
>>>>>>>> ways.
>>>>>>>>
>>>>>>>> *The problem:* Pinpoint only identifies and then narrows down to
>>>>>>>> a single CL for *regressions*.
>>>>>>>> *The request:* Pinpoint should identify and narrow down to a
>>>>>>>> single CL for *improvements as well*.
>>>>>>>>
>>>>>>>> CLs in the scheduling space tend to move *many* benchmarks, and
>>>>>>>> it's hard to tell what they actually improved/regressed.
>>>>>>>> For example, did r534414 overall improve or regress things
>>>>>>>> <https://chromeperf.appspot.com/group_report?rev=534414>? It
>>>>>>>> looks like it went both ways, but only the regressions were
>>>>>>>> pinpointed down to my CL. What about the other dozens of green
>>>>>>>> graphs: are they coincidental, or also caused by my CL? I could
>>>>>>>> launch bisects on all the green dots, but not only is that
>>>>>>>> tedious, it will also result in pinging a bunch of people
>>>>>>>> (go/catabug/4225).
>>>>>>>>
>>>>>>>> After fixing <http://crbug.com/809961> the pinpointed
>>>>>>>> regressions, I'm left to wonder whether this was a no-op or an
>>>>>>>> overall improvement.
>>>>>>>>
>>>>>>>> Not knowing what is an overall improvement is an engineering
>>>>>>>> problem, as it denies us data that could otherwise serve as a
>>>>>>>> hint to paradigms that should be encouraged and reproduced for
>>>>>>>> further gains.
>>>>>>>>
>>>>>>>> If a CL regresses one thing and improves ten, the only way to
>>>>>>>> specifically know that it improved those ten things is to have
>>>>>>>> it reverted for that one regression and then have the revert
>>>>>>>> cause its own set of "regressions" (i.e. unimprovements)...
>>>>>>>>
>>>>>>>> It feels like all the data and tooling are there; can we just
>>>>>>>> enable automatic Pinpoint for improvements? I understand that
>>>>>>>> filing bugs for improvements is weird, but as a first pass I'm
>>>>>>>> sure the vast majority of engineers would be glad to be told
>>>>>>>> that they're making things better, regardless of the medium
>>>>>>>> through which the news is delivered!
>>>>>>>>
>>>>>>>> Thanks!
>>>>>>>> Gab

--
v8-dev mailing list
v8-dev@googlegroups.com
http://groups.google.com/group/v8-dev
---
You received this message because you are subscribed to the Google Groups "v8-dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email to v8-dev+unsubscr...@googlegroups.com. For more options, visit https://groups.google.com/d/optout.
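[Editor's note] The thread's core request, attributing improvements as well as regressions to a single culprit CL, amounts to a bisect whose stopping condition is any significant metric change, not just a change for the worse. A minimal hypothetical sketch of that idea follows; the commit names, `measure` callback, noise threshold, and single-step-change assumption are all invented for illustration, and this is not real Pinpoint or chromeperf code:

```python
NOISE = 0.5  # ignore deltas smaller than measurement noise (invented value)

def bisect_change(commits, measure):
    """Find the first commit whose metric differs from the range start.

    Assumes at most one step change within the range (as a real bisect
    does per job). Returns (commit, delta) or None if the range is flat
    within noise.
    """
    baseline = measure(commits[0])
    if abs(measure(commits[-1]) - baseline) <= NOISE:
        return None  # nothing to attribute
    lo, hi = 0, len(commits) - 1
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if abs(measure(commits[mid]) - baseline) <= NOISE:
            lo = mid  # still at baseline: change is after mid
        else:
            hi = mid  # changed already: culprit is at or before mid
    return commits[hi], measure(commits[hi]) - baseline

def classify(delta, lower_is_better=True):
    # The "single bit" discussed in the thread: instead of only acting
    # when the delta is in the "worse" direction, label either direction.
    worse = delta > 0 if lower_is_better else delta < 0
    return "regression" if worse else "improvement"

# Simulated metric history: a commit named r534414 drops the metric
# (e.g. a timing, where lower is better) by 10 units.
history = {f"r{534410 + i}": 100.0 for i in range(10)}
for i in range(4, 10):
    history[f"r{534410 + i}"] = 90.0

commits = sorted(history)
culprit, delta = bisect_change(commits, history.__getitem__)
print(culprit, classify(delta))  # attributes the improvement, not just regressions
```

Under this framing, auto-bisecting green alerts is the same machinery as today's regression bisects; the cost Dave and Simon describe is not in the algorithm but in hardware capacity and sheriff triage volume.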