Re: any concerns with dropping the talos test v8 and using AWFY Octane instead?
> >> It seems like another alternative might be to run Octane in Talos,
> >> instead of v8_7.
> >>
> >> It seems like Talos has two advantages over AWFY (correct me if I'm wrong):
> >>
> >> 1. Easy for developers to schedule jobs via try (maybe less of a concern
> >> with a benchmark like this, where I suspect results are more
> >> reproducible locally?)
> >
> > I believe there was talk of adding try support for AWFY (there already is
> > for AWSY). Of course that's not actually done yet, I just want to point out
> > it's not particularly hard and AWSY's version could be adapted rather
> > easily.

Running Octane in Talos would be useful, but we would be duplicating effort. That is not necessarily a bad thing, but will we get value from it? We do get value from self-serve support with try and from tools like mozci for backfilling and retriggering. The issue in the bug also points out that the developers who care about Octane use AWFY and already detect these regressions before Talos does (or at least a sheriff finds it and files a bug). The specific topic of upgrading V8 to Octane for Talos should be discussed in bug 1174671.

> Talos already runs on non-virtualized hardware. I don't see any inherent
> reason we couldn't rework AWSY as a Talos test. In general it feels to
> me like we should be running performance tests on relops-supported
> infrastructure where possible, as opposed to adhoc systems.

I would agree that the more we can run on managed systems the better. While all Talos jobs are run on non-virtualized hardware today, that hardware is a pool shared with the unit tests. One difference between AWFY and Talos is that AWFY's numbers are much more stable, even in the browser version (AWFY runs a JS shell as well as a browser). I believe this can be attributed to the type of hardware, the environment, or the fact that a specific test is run on a specific machine.

> > In general it would be great if we could consolidate the various perf tests
> > (AWFY, AWSY, Talos, Raptor, etc) under one umbrella (at least from an end
> > user perspective). So you could go to trychooser and choose a "Perf" option
> > that would have various subsets like: "JS Engine", "Memory Usage", "Layout
> > Latency", "Mobile Launch Time", etc.

This is a worthwhile goal. Simplifying the interface to the tools over the next few quarters, to allow for common sheriffing and self-serve use, will be a big step forward.

> 1. It assumes that all test machines of a particular class will be
> uniform, at least per test. For example, Autophone tracks the
> performance of something like 9 different Android devices separately
> (see: http://phonedash.mozilla.org/) -- that's not something Perfherder
> was designed to do.

As mentioned earlier in this message, AWFY runs the same test on the same machine, and its numbers are more reliable. There is no further evidence that this is the cause of the noise in Talos, but I suspect it is a factor.

___
dev-platform mailing list
dev-platform@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-platform

Re: any concerns with dropping the talos test v8 and using AWFY Octane instead?
On 2016-01-11 4:12 PM, Eric Rahm wrote:
> On Monday, January 11, 2016 at 8:42:11 AM UTC-8, William Lachance wrote:
> > It seems like another alternative might be to run Octane in Talos,
> > instead of v8_7.
> >
> > It seems like Talos has two advantages over AWFY (correct me if I'm wrong):
> >
> > 1. Easy for developers to schedule jobs via try (maybe less of a concern
> > with a benchmark like this, where I suspect results are more
> > reproducible locally?)
>
> I believe there was talk of adding try support for AWFY (there already is
> for AWSY). Of course that's not actually done yet, I just want to point out
> it's not particularly hard and AWSY's version could be adapted rather
> easily.
>
> > 2. More hardware available, so can get results faster.
>
> I would guess we want to run on dedicated non-virtualized hardware for
> these tests. Is that an option w/ Talos? FWIW if that's an option I'd be
> more than happy to move AWSY over to the platform as well :)

Talos already runs on non-virtualized hardware. I don't see any inherent reason we couldn't rework AWSY as a Talos test. In general it feels to me like we should be running performance tests on relops-supported infrastructure where possible, as opposed to adhoc systems.

> > Thoughts? Incidentally one of my deliverables for this quarter is to try
> > to figure out how Perfherder, Talos, and AWFY should co-exist, so I'm very
> > interested in knowing if my assumptions above are correct.
>
> Regardless of whether we use Talos to run the tests or not, it would
> definitely be nice to have the data reported in Perfherder.
>
> A digression, maybe worth followup in a separate thread:
>
> In general it would be great if we could consolidate the various perf tests
> (AWFY, AWSY, Talos, Raptor, etc) under one umbrella (at least from an end
> user perspective). So you could go to trychooser and choose a "Perf" option
> that would have various subsets like: "JS Engine", "Memory Usage", "Layout
> Latency", "Mobile Launch Time", etc.
>
> If all of these systems reported their data to Perfherder (and optionally
> elsewhere) we'd now have one centralized location where you can track perf
> regressions. As an end-user this is pretty great: The graphs look the same
> across systems, I only have to learn how to use one tool, I only have to
> learn how to interpret regressions in one system.

Yes, this seems like a good long-term goal to me. There are a few constraints that Perfherder has that make it unsuitable for some use cases:

1. It assumes that all test machines of a particular class will be uniform, at least per test. For example, Autophone tracks the performance of something like 9 different Android devices separately (see: http://phonedash.mozilla.org/) -- that's not something Perfherder was designed to do.

2. It is designed to track performance changes in one product per repository, not compare one against another. It's not designed to facilitate comparisons between Firefox and Chrome.

That shouldn't stop us from using Perfherder as at least an optional submission target for many systems though -- we've had good luck with Autophone so far as a potential replacement for Android Talos on a single device class, and it looks like AWSY should work as well.

AWFY is a bit of a different animal as it has its own regression detection/reporting system. In principle it makes sense to unify that with Perfherder, but I'm not yet 100% sure what would be involved in making that happen -- AWFY supports some things that aren't on Perfherder's near-term roadmap (e.g. reporting a regression manually on an arbitrary revision), so we need to figure that out.

Will

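The uniformity assumption in point 1 can be made concrete with a small sketch. It is only an illustration: the record layout, the device names, and the timing values are invented for the example and do not come from Perfherder, Autophone, or AWFY. It compares the relative spread (coefficient of variation) of one test's results when all machines are pooled against the spread within each machine.

```python
from collections import defaultdict
from statistics import mean, pstdev

# Hypothetical results for a single test: (machine, milliseconds).
# Names and values are made up purely for illustration.
RESULTS = [
    ("device-a", 100.0), ("device-a", 102.0), ("device-a", 98.0),
    ("device-b", 130.0), ("device-b", 128.0), ("device-b", 132.0),
]


def coefficient_of_variation(values):
    """Population standard deviation divided by the mean."""
    return pstdev(values) / mean(values)


def spread_by_machine(results):
    """Return {machine: coefficient of variation} for each machine."""
    grouped = defaultdict(list)
    for machine, value in results:
        grouped[machine].append(value)
    return {m: coefficient_of_variation(v) for m, v in grouped.items()}


# Spread when every machine is treated as interchangeable.
pooled = coefficient_of_variation([value for _, value in RESULTS])
# Spread when each machine is tracked on its own.
per_machine = spread_by_machine(RESULTS)
```

With these made-up numbers the pooled spread is several times larger than the spread on either device alone, because the difference between the two devices is counted as noise. Tracking each device separately, as Autophone is described as doing, avoids that.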
Re: any concerns with dropping the talos test v8 and using AWFY Octane instead?
On Mon, Jan 11, 2016 at 3:08 PM,wrote:
> Currently we run a very outdated version of V8 (version 7) in Talos. This
> has since been replaced with Octane in the world of benchmarks.
>
> AWFY (arewefastyet.com) has been running Octane and catching regressions
> faster than Talos. There is missing coverage in AWFY, specifically e10s,
> pgo, and aurora/beta. There are plans to add coverage for this in Q1.
>
> A main reason for pushing to turn off V8 is that the benchmark is
> outdated, and chasing a regression might not be the most useful use of
> developers' time if it is seen only on V8 and not on Octane. While
> this does point out that we are leaning towards building performance for a
> specific benchmark and ignoring other tests, we could argue that is what we
> should be doing.

As one of the people pushing for this change, let me clarify that this is not about focusing on Octane and ignoring less important benchmarks so much as it is about ignoring a specific, buggy benchmark. v8_7 is gameable because it calls builtin functions and then doesn't use the results in any way. In some cases, it's valid to optimize for content code not using results, but in others it's just benchmark gaming. Which is fine[1] as long as it doesn't cost too much time or prevent other optimizations or even correctness fixes. In bug 1174671 various developers spent non-trivial effort on analyzing just such a case[2]. That is actually the better of two possible bad outcomes. The worse would've been to back out a patch that fixes correctness issues, improves performance in other tests, and paves the way for further improvements.

> The reason I am posting here is to find out if there are reasons we should
> keep v8 running in Talos. We still plan to turn it off once AWFY coverage
> matches the coverage of Talos V8.
>
> You can reference bug 1174671 for some history.

[1] Even required sometimes if the benchmark is high-profile and not optimizing it causes us to lose in benchmark comparisons. This used to be such a case, but nowadays nobody cares about v8_7.

[2] To be fair, there was another regression that would've required analysis anyway, but the point stands: we, IMO, wasted time on analyzing a regression nobody should ever have looked at.

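The gameability concern described in this message can be sketched as follows. This is not code from v8_7 or Octane; the function names and the workload are hypothetical, and the example is written in Python only to show the structure (CPython itself does not eliminate the unused calls). The first loop discards every result, so an optimizing engine that can prove the call has no side effects would be free to skip the work. The second folds each result into a checksum that the caller can inspect, so the work has to happen.

```python
def work(i):
    """Hypothetical stand-in for a builtin call made by a benchmark."""
    return (i * i) % 7


def discard_results(iterations):
    """Pattern attributed to v8_7 above: compute results, then ignore them."""
    for i in range(iterations):
        work(i)  # result unused; a side-effect-free call could be optimized away


def consume_results(iterations):
    """Alternative: fold every result into a checksum that is returned."""
    checksum = 0
    for i in range(iterations):
        checksum = (checksum * 31 + work(i)) % 1_000_003
    return checksum
```

A harness built like `consume_results` can also compare the checksum against a known value, which ties the measured time to work that was actually done and checks correctness at the same time.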
any concerns with dropping the talos test v8 and using AWFY Octane instead?
Currently we run a very outdated version of V8 (version 7) in Talos. This has since been replaced with Octane in the world of benchmarks.

AWFY (arewefastyet.com) has been running Octane and catching regressions faster than Talos. There is missing coverage in AWFY, specifically e10s, pgo, and aurora/beta. There are plans to add coverage for this in Q1.

A main reason for pushing to turn off V8 is that the benchmark is outdated, and chasing a regression might not be the most useful use of developers' time if it is seen only on V8 and not on Octane. While this does point out that we are leaning towards building performance for a specific benchmark and ignoring other tests, we could argue that is what we should be doing.

The reason I am posting here is to find out if there are reasons we should keep v8 running in Talos. We still plan to turn it off once AWFY coverage matches the coverage of Talos V8.

You can reference bug 1174671 for some history.

Re: any concerns with dropping the talos test v8 and using AWFY Octane instead?
On 2016-01-11 9:08 AM, jma...@mozilla.com wrote:
> Currently we run a very outdated version of V8 (version 7) in Talos. This
> has since been replaced with Octane in the world of benchmarks.
>
> AWFY (arewefastyet.com) has been running Octane and catching regressions
> faster than Talos. There is missing coverage in AWFY, specifically e10s,
> pgo, and aurora/beta. There are plans to add coverage for this in Q1.
>
> A main reason for pushing to turn off V8 is that the benchmark is
> outdated, and chasing a regression might not be the most useful use of
> developers' time if it is seen only on V8 and not on Octane. While
> this does point out that we are leaning towards building performance for a
> specific benchmark and ignoring other tests, we could argue that is what we
> should be doing.
>
> The reason I am posting here is to find out if there are reasons we should
> keep v8 running in Talos. We still plan to turn it off once AWFY coverage
> matches the coverage of Talos V8.
>
> You can reference bug 1174671 for some history.

It seems like another alternative might be to run Octane in Talos, instead of v8_7.

It seems like Talos has two advantages over AWFY (correct me if I'm wrong):

1. Easy for developers to schedule jobs via try (maybe less of a concern with a benchmark like this, where I suspect results are more reproducible locally?)

2. More hardware available, so can get results faster.

Thoughts? Incidentally one of my deliverables for this quarter is to try to figure out how Perfherder, Talos, and AWFY should co-exist, so I'm very interested in knowing if my assumptions above are correct.

Will