Re: any concerns with dropping the talos test v8 and using AWFY Octane instead?

2016-01-19 Thread jmaher
> >> It seems like another alternative might be to run Octane in Talos,
> >> instead of v8_7.
> >>
> >> It seems like Talos has two advantages over AWFY (correct me if I'm wrong):
> >>
> >> 1. Easy for developers to schedule jobs via try (maybe less of a concern
> >> with a benchmark like this, where I suspect results are more
> >> reproducible locally?)
> >
> > I believe there was talk of adding try support for AWFY (there already is 
> > for AWSY). Of course that's not actually done yet, I just want to point out 
> > it's not particularly hard and AWSY's version could be adapted rather 
> > easily.
> >

Running Octane in Talos would be useful, but we would be duplicating effort.  
That is not necessarily a bad thing, but would we get value from it?  We do get 
value from self-serve support with try and from tools like mozci for 
backfilling and retriggering.  The bug also points out that the developers who 
care about Octane use AWFY and already detect these regressions before Talos 
does (or at least before a sheriff finds it and files a bug).  The specific 
topic of upgrading V8 to Octane in Talos should be discussed in bug 1174671.


> Talos already runs on non-virtualized hardware. I don't see any inherent 
> reason we couldn't rework AWSY as a Talos test. In general it feels to 
> me like we should be running performance tests on relops-supported 
> infrastructure where possible, as opposed to adhoc systems.

I would agree that the more we can run on managed systems the better.  While 
all Talos jobs are run on non-virtualized hardware today, we do run on a pool 
of hardware shared with the unittests.  One difference between AWFY and Talos 
is that the AWFY numbers are so much more stable, even in the browser version 
(AWFY runs a js shell as well as a browser).  I believe this can be attributed 
to the type of hardware, the environment, or the fact that a specific test is 
always run on a specific machine.
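
To make the last point concrete, here is a minimal sketch in Python.  All 
numbers are made up for illustration (a hypothetical base score of 100, 
per-machine offsets of -3, 0 and 4, and run-to-run noise with a standard 
deviation of 1); nothing here is measured from Talos or AWFY.  It only shows 
that drawing each run from a pool of slightly different machines widens the 
spread compared with always using one machine.

```python
import random
import statistics
from typing import List, Sequence


def simulate(runs: int, machine_offsets: Sequence[float],
             run_noise: float, seed: int = 0) -> List[float]:
    """Return hypothetical benchmark scores for `runs` test runs.

    Each run lands on a machine picked at random from the pool; every
    machine shifts the score by a fixed offset (illustrative values only).
    """
    rng = random.Random(seed)
    base_score = 100.0  # made-up score, not a real Talos/AWFY number
    scores = []
    for _ in range(runs):
        offset = rng.choice(machine_offsets)      # which machine ran the job
        noise = rng.gauss(0.0, run_noise)         # ordinary run-to-run noise
        scores.append(base_score + offset + noise)
    return scores


# One dedicated machine: only run-to-run noise remains.
dedicated = simulate(200, [0.0], run_noise=1.0)
# A shared pool of three slightly different machines.
pooled = simulate(200, [-3.0, 0.0, 4.0], run_noise=1.0)

print("dedicated stdev:", statistics.stdev(dedicated))
print("pooled stdev:   ", statistics.stdev(pooled))
```

The pooled series has the larger standard deviation, which is the kind of 
extra noise that makes small regressions harder to see.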

> > In general it would be great if we could consolidate the various perf tests 
> > (AWFY, AWSY, Talos, Raptor, etc) under one umbrella (at least from an end 
> > user perspective). So you could go to trychooser and choose a "Perf" option 
> > that would have various subsets like: "JS Engine", "Memory Usage", "Layout 
> > Latency", "Mobile Launch Time", etc.
> >

This is a worthwhile goal.  Simplifying the interface to the tools over the 
next few quarters, to allow for common sheriffing and self-serve use, will make 
big strides.

> 
> 1. It assumes that all test machines of a particular class will be 
> uniform, at least per test. For example, Autophone tracks the 
> performance of something like 9 different Android devices separately 
> (see: http://phonedash.mozilla.org/) -- that's not something Perfherder 
> was designed to do.

As mentioned earlier in this message, AWFY runs the same test on the same 
machine, and its numbers are more reliable.  There is no further evidence that 
the shared machine pool is the cause of the noise in Talos, but I suspect it is 
a factor.

___
dev-platform mailing list
dev-platform@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-platform


Re: any concerns with dropping the talos test v8 and using AWFY Octane instead?

2016-01-12 Thread William Lachance

On 2016-01-11 4:12 PM, Eric Rahm wrote:

> On Monday, January 11, 2016 at 8:42:11 AM UTC-8, William Lachance wrote:
> > It seems like another alternative might be to run Octane in Talos,
> > instead of v8_7.
> >
> > It seems like Talos has two advantages over AWFY (correct me if I'm wrong):
> >
> > 1. Easy for developers to schedule jobs via try (maybe less of a concern
> > with a benchmark like this, where I suspect results are more
> > reproducible locally?)
>
> I believe there was talk of adding try support for AWFY (there already is for
> AWSY). Of course that's not actually done yet, I just want to point out it's
> not particularly hard and AWSY's version could be adapted rather easily.
>
> > 2. More hardware available, so can get results faster.
>
> I would guess we want to run on dedicated non-virtualized hardware for these
> tests. Is that an option w/ Talos? FWIW if that's an option I'd be more than
> happy to move AWSY over to the platform as well :)


Talos already runs on non-virtualized hardware. I don't see any inherent 
reason we couldn't rework AWSY as a Talos test. In general it feels to 
me like we should be running performance tests on relops-supported 
infrastructure where possible, as opposed to adhoc systems.



> > Thoughts? Incidentally one of my deliverables for this quarter is to try
> > to figure out how Perfherder, Talos, and AWFY should co-exist, so I'm
> > very interested in knowing if my assumptions above are correct.
>
> Regardless of whether we use Talos to run the tests or not, it would
> definitely be nice to have the data reported in perfherder.
>
> A digression, maybe worth followup in a separate thread:
>
> In general it would be great if we could consolidate the various perf tests
> (AWFY, AWSY, Talos, Raptor, etc) under one umbrella (at least from an end
> user perspective). So you could go to trychooser and choose a "Perf" option
> that would have various subsets like: "JS Engine", "Memory Usage", "Layout
> Latency", "Mobile Launch Time", etc.
>
> If all of these systems reported their data to perfherder (and optionally
> elsewhere) we'd now have one centralized location where you can track perf
> regressions. As an end-user this is pretty great: The graphs look the same
> across systems, I only have to learn how to use one tool, I only have to learn
> how to interpret regressions in one system.


Yes, this seems like a good long-term goal to me. There are a few 
constraints that Perfherder has that make it unsuitable for some use cases:


1. It assumes that all test machines of a particular class will be 
uniform, at least per test. For example, Autophone tracks the 
performance of something like 9 different Android devices separately 
(see: http://phonedash.mozilla.org/) -- that's not something Perfherder 
was designed to do.
2. It is designed to track performance changes in one product per 
repository, not compare one against another. It's not designed to 
facilitate comparisons between Firefox and Chrome.


That shouldn't stop us from using Perfherder as at least an optional 
submission target for many systems though -- we've had good luck with 
autophone so far as a potential replacement for Android Talos on a 
single device class, and it looks like AWSY should work as well.


AWFY is a bit of a different animal, as it has its own regression 
detection/reporting system. In principle it makes sense to unify that 
with Perfherder, but I'm not yet 100% sure what would be involved in 
making that happen -- AWFY supports some things that aren't on 
Perfherder's near-term roadmap (e.g. reporting a regression manually on 
an arbitrary revision), so we need to figure that out.


Will


Re: any concerns with dropping the talos test v8 and using AWFY Octane instead?

2016-01-11 Thread Till Schneidereit
On Mon, Jan 11, 2016 at 3:08 PM,  wrote:

> Currently we run a very outdated version of V8 (version 7) in Talos.  This
> has since been replaced with Octane in the world of benchmarks.
>
> AWFY (arewefastyet.com) has been running Octane and catching regressions
> faster than Talos.  There is missing coverage in AWFY, specifically e10s,
> pgo, aurora/beta.  There are plans to add coverage for this in Q1.
>
> A main reason for pushing to turn off V8 is that the benchmark is
> outdated, and chasing a regression seen only on V8 and not on Octane may
> not be the most useful use of developers' time.  While this does point out
> that we are leaning towards building performance for a specific benchmark
> and ignoring other tests, we could argue that is what we should be doing.
>

As one of the people pushing for this change, let me clarify that this is
not about focusing on Octane and ignoring less important benchmarks so much
as it is about ignoring a specific, buggy, benchmark.

v8_7 is gameable because it calls builtin functions and then doesn't use
the results in any way. In some cases, it's valid to optimize for content
code not using results, but in others it's just benchmark gaming. Which is
fine[1] as long as it doesn't cost too much time or prevent other
optimizations or even correctness fixes.
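
As a minimal sketch of the pattern (in Python rather than the JavaScript the
benchmark is actually written in, and with hypothetical function names that
are not taken from v8_7), compare a loop that discards the result of a
builtin with one that consumes it:

```python
import math


def bench_discarding(n: int) -> None:
    """Call a builtin and throw the result away.

    Because nothing observes the result, an optimizing engine could
    legally skip the call, which is what makes such a loop gameable.
    """
    for i in range(n):
        math.sqrt(i)  # result unused


def bench_consuming(n: int) -> float:
    """Call the same builtin but fold every result into a checksum.

    The returned checksum depends on each call, so the work cannot be
    dropped without changing observable behaviour.
    """
    checksum = 0.0
    for i in range(n):
        checksum += math.sqrt(i)
    return checksum


bench_discarding(1000)
print(bench_consuming(1000))
```

A score for the first loop can improve without any real work getting faster;
a score for the second can only improve if computing the results does.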

In bug 1174671 various developers spent non-trivial effort on analyzing
just such a case[2]. That is actually the better of two possible bad
outcomes. The worse would've been to back out a patch that fixes
correctness issues, improves performance in other tests, and paves the way
for further improvements.


> The reason I am posting here is to find out if there are reasons we should
> keep v8 running in Talos.  We still plan to turn it off once AWFY coverage
> matches the coverage of Talos V8.
>
> You can reference bug 1174671 for some history.
>

[1] Even required sometimes if the benchmark is high-profile and not
optimizing it causes us to lose in benchmark comparisons. This used to be
such a case, but nowadays nobody cares about v8_7.
[2] To be fair, there was another regression that would've required
analysis anyway, but the point stands: we, IMO, wasted time on analyzing a
regression nobody should ever have looked at.


any concerns with dropping the talos test v8 and using AWFY Octane instead?

2016-01-11 Thread jmaher
Currently we run a very outdated version of V8 (version 7) in Talos.  This has 
since been replaced with Octane in the world of benchmarks.

AWFY (arewefastyet.com) has been running Octane and catching regressions 
faster than Talos.  There is missing coverage in AWFY, specifically e10s, pgo, 
aurora/beta.  There are plans to add coverage for this in Q1.

A main reason for pushing to turn off V8 is that the benchmark is outdated, and 
chasing a regression seen only on V8 and not on Octane may not be the most 
useful use of developers' time.  While this does point out that we are leaning 
towards building performance for a specific benchmark and ignoring other tests, 
we could argue that is what we should be doing.

The reason I am posting here is to find out if there are reasons we should keep 
v8 running in Talos.  We still plan to turn it off once AWFY coverage matches 
the coverage of Talos V8.

You can reference bug 1174671 for some history.


Re: any concerns with dropping the talos test v8 and using AWFY Octane instead?

2016-01-11 Thread William Lachance

On 2016-01-11 9:08 AM, jma...@mozilla.com wrote:

> Currently we run a very outdated version of V8 (version 7) in Talos.  This
> has since been replaced with Octane in the world of benchmarks.
>
> AWFY (arewefastyet.com) has been running Octane and catching regressions
> faster than Talos.  There is missing coverage in AWFY, specifically e10s,
> pgo, aurora/beta.  There are plans to add coverage for this in Q1.
>
> A main reason for pushing to turn off V8 is that the benchmark is
> outdated, and chasing a regression seen only on V8 and not on Octane may
> not be the most useful use of developers' time.  While this does point out
> that we are leaning towards building performance for a specific benchmark
> and ignoring other tests, we could argue that is what we should be doing.
>
> The reason I am posting here is to find out if there are reasons we should
> keep v8 running in Talos.  We still plan to turn it off once AWFY coverage
> matches the coverage of Talos V8.
>
> You can reference bug 1174671 for some history.


It seems like another alternative might be to run Octane in Talos, 
instead of v8_7.


It seems like Talos has two advantages over AWFY (correct me if I'm wrong):

1. Easy for developers to schedule jobs via try (maybe less of a concern 
with a benchmark like this, where I suspect results are more 
reproducible locally?)

2. More hardware available, so can get results faster.

Thoughts? Incidentally one of my deliverables for this quarter is to try 
to figure out how Perfherder, Talos, and AWFY should co-exist, so I'm 
very interested in knowing if my assumptions above are correct.


Will