proposal: replace talos with inline tests

2013-03-04 Thread Jim Mathies
For metrofx we’ve been working on getting omtc and apzc running in the browser. 
One of the things we need to be able to do is run performance tests that tell 
us whether or not the work we’re doing is having a positive effect on perf. We 
currently don’t have automated tests up and running for metrofx and talos is 
even farther off.

So to work around this I’ve been putting together some basic perf tests, built 
on the mochitest framework, that I can use to measure performance. I’m wondering 
if this might be a useful long-term answer to our perf test problems. 

Putting together talos tests is a real pain. You have to write a new test using 
the talos framework (which is a separate repo from mc), test the test to be 
sure it’s working, then file releng bugs to get it integrated into talos test 
runs, populated on graph server, and verified via staging to be sure everything 
is working right. Overall the overhead here seems way too high.

Maybe we should consider changing this system so devs can write performance 
tests that suit their needs and are integrated into our main repo? Basically:

1) rework graph server to be open-ended, so that it can accept data from test 
runs within our normal test frameworks.
2) develop a test module that can be included in tests and allows test 
writers to post performance data to graph server.
3) come up with a good way to manage the life cycle of active perf tests so 
graph server doesn’t become polluted.
4) port existing talos tests over to the mochitest framework.
5) drop talos.
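To make the shape of point 2 concrete, here is a rough sketch of what such a test-side helper might look like. Everything here is hypothetical: `PerfReporter` is not an existing mochitest module, and the output format is invented.

```javascript
// Hypothetical sketch only: "PerfReporter" is not a real mochitest API.
// A test measures something, then hands the numbers to a shared reporter
// that would post them to graph server in automation (here it just
// collects and logs them).

function PerfReporter(testName) {
  this.testName = testName;
  this.samples = [];
}

// Time one synchronous operation and record the result.
PerfReporter.prototype.measure = function (label, fn) {
  const start = Date.now();
  fn();
  this.samples.push({ label: label, ms: Date.now() - start });
};

// Emit everything recorded; in automation this is where a POST would go.
PerfReporter.prototype.report = function () {
  this.samples.forEach(function (s) {
    console.log("PERF-RESULT | " + s.label + " = " + s.ms + "ms");
  });
  return this.samples;
};

// Example "test": time a busy loop standing in for a scroll operation.
const reporter = new PerfReporter("scroll-perf");
reporter.measure("scroll-1000-lines", function () {
  let total = 0;
  for (let i = 0; i < 1000000; i++) total += i;
});
const results = reporter.report();
```

The point is that the test author writes only the `measure` call; the posting machinery lives in one shared module.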

Curious what people think of this idea. 

Jim

___
dev-platform mailing list
dev-platform@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-platform


Re: proposal: replace talos with inline tests

2013-03-04 Thread Ed Morley

(CCing auto-to...@mozilla.com)

jmaher and jhammel will be able to comment more on the talos specifics, 
but a few thoughts off the top of my head:


It seems like we're conflating multiple issues here:
 1) [talos] is a separate repo from mc
 2) [it's a hassle to] test the test to be sure it’s working
 3) [it's a hassle to get results] populated in graph server
 4) [we need to] come up with a good way to manage the life cycle of 
active perf tests so graph server doesn’t become polluted


Switching from the talos harness to mochitest doesn't fix #2 (we still 
have to test, and I don't see how it magically becomes any easier 
without extra work - that could have been applied to talos instead) or 
#3/#4 (orthogonal problem). It also seems like a brute force way of 
fixing #1 (we could just check talos into mozilla-central).


Instead, I think we should be asking:
1) Is the best test framework for performance testing: [a] talos (with 
improvements), [b] mochitest (with a significant amount of work to make 
it compatible), or [c] a brand new framework?
2) Regardless of framework used, would checking it into mozilla-central 
improve dev workflow enough to outweigh the downsides (see bug 787200 
for history on that discussion)?
3) Regardless of framework used, how can we make the 
development/testing/staging cycle less painful?
4) Regardless of framework used, who should be responsible for ensuring 
we actively prune performance tests that are no longer relevant?


Note also that graphs.mozilla.org will be deprecated soon, in favour 
of datazilla - which afaik is less painful for adding new test suites 
(e.g. doesn't need manual database changes); jeads can say more on that 
front.


Best wishes,

Ed


Re: proposal: replace talos with inline tests

2013-03-04 Thread Joel Maher
Some thoughts on the subject:

I would argue against running performance tests inside of mochitest.  The main 
reason is that mochitest has a lot of profile setup for testing, as well as many 
other tests bundled inside the same browser session.  For a standalone 
metric unrelated to a user scenario, we could consider adding performance-style 
tests to mochitest.

In the process of creating Datazilla, we have found endless little quirks in 
the end-to-end performance system.  As time goes on we have continued 
to push forward with the goal of making a performance system that can detect 
regressions automatically when the test finishes.

For the last few months we have had data going both to Datazilla and graph 
server and have been refining our assumptions and tools along the way.  When 
graph server is deprecated in the near future, it will be REALLY EASY to add 
new tests to the collection and reporting system.  That doesn't solve the 
problem of making it easy to add or adjust a test in the test runners (buildbot 
scripts), but it solves half the problem.

Many of the talos tests are old and outdated, and while we have tried to find 
owners for the tests, it has been a failing effort.  To that end, we have 
disabled some Talos tests which nobody had interest in anymore.  If there are 
tests which people feel are not useful, we should disable them ASAP to reduce 
the load on our infrastructure and work on creating tests that people care 
about.

-Joel


Re: proposal: replace talos with inline tests

2013-03-04 Thread Jim Mathies

Good points, comments below.

Ed Morley emor...@mozilla.com wrote in message 
news:mailman.1992.1362404580.24452.dev-platf...@lists.mozilla.org...

(CCing auto-to...@mozilla.com)

jmaher and jhammel will be able to comment more on the talos specifics, 
but a few thoughts off the top of my head:


It seems like we're conflating multiple issues here:
  1) [talos] is a separate repo from mc
  2) [it's a hassle to] test the test to be sure it’s working
  3) [it's a hassle to get results] populated in graph server
  4) [we need to] come up with a good way to manage the life cycle of 
active perf tests so graph server doesn’t become polluted


Switching from the talos harness to mochitest doesn't fix #2 (we still 
have to test, and I don't see how it magically becomes any easier without 
extra work - that could have been applied to talos instead)


I disagree here; very few devs are familiar with the talos framework and 
what it takes to get a new test written. Everyone is very familiar with 
mochitest and the other related test frameworks on mc. I can write a mochitest 
to test perf in something simple like scrolling in about an hour; putting 
together a talos scroll test would take much longer. If Talos were on mc it 
would help, but integrating into the existing test frameworks we already use on 
a regular basis seems like the simplest approach with the least overhead.



Instead, I think we should be asking:
1) Is the best test framework for performance testing: [a] talos (with 
improvements), [b] mochitest (with a significant amount of work to make it 
compatible), or [c] a brand new framework?


On [b] there might be a significant amount of work in getting infra pieces 
to work (like graph server, or whatever we plan to replace it with), but 
not in writing an import module that devs would use to post data.


2) Regardless of framework used, would checking it into mozilla-central 
improve dev workflow enough to outweigh the downsides (see bug 787200 for 
history on that discussion)?


We might want to keep talos around for big, important tests. But I 
think devs need a way to run perf tests on a smaller scale that doesn't 
involve infra changes. I think having this ability would be a big win for 
us.


Jim




Re: proposal: replace talos with inline tests

2013-03-04 Thread Boris Zbarsky

On 3/4/13 8:15 AM, Jim Mathies wrote:

So to work around this I’ve been putting together some basic perf tests I can 
use to measure performance using the mochitest framework.


How are you dealing with the fact that mochitest runs on heterogeneous 
hardware (including VMs and the like last I checked, which could have 
arbitrarily bad (or good!) performance characteristics depending on what 
else is happening with the host system)?



Maybe we should consider changing this system so devs can write performance 
tests that suit their needs that are integrated into our main repo? Basically:

1) rework graphs server to be open ended so that it can accept data from test 
runs within our normal test frameworks.
2) develop of test module that can be included in tests that allows test 
writers to post performance data to graph server.
3) come up with a good way to manage the life cycle of active perf tests so 
graph server doesn’t become polluted.
4) port existing talos tests over to the mochitest framework.
5) drop talos.


This sounds plausible, modulo the inability to port Tp in its current 
state to a setup that involves the tests living in m-c, as long as the 
problem above is kept in mind.  Basically, reusing something 
mochitest-like for developer familiarity may make sense, but it would 
need to be a separate test suite run on completely separate test slaves 
that are actually set up with performance testing in mind.  A separate 
test suite which is like mochitest is not a problem per se (we have the 
ipcplugins, chrome, browserchrome, a11y tests already).


So the main win would be making it easier to add new tests in terms of 
number of actions to be taken (something it seems like we could improve 
with the current Talos setup too) and easier for developers to add tests 
because the framework is already similar, right?


-Boris


Re: proposal: replace talos with inline tests

2013-03-04 Thread Gregory Szorc


Generally speaking, I think we should have a generic framework for 
declaring tests; i.e. test files for xpcshell, mochitest, Talos, etc. 
would all look very similar from a JS perspective. I've been wanting to 
unify the in-test code for a while, and over the weekend I put together a 
very rough draft of what I think this should look like [1]. Please 
criticize it.


If all your tests are declared the same way, then presumably the test 
running code would be similar, and capturing performance data would 
require a single implementation affecting all test suites instead of N 
one-off solutions.
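As a purely illustrative sketch (none of these names exist in any current harness), a unified declaration plus a runner that captures timing as a side effect might look like this:

```javascript
// Hypothetical sketch: one way to declare tests so that any runner
// (xpcshell, mochitest, a perf harness) could execute them, with timing
// capture implemented once in the runner rather than once per suite.

const declaredTests = [];

function declareTest(name, fn) {
  declaredTests.push({ name: name, fn: fn });
}

// A runner that records pass/fail and wall time for every declared test.
function runAll() {
  return declaredTests.map(function (t) {
    const start = Date.now();
    let passed = true;
    try {
      t.fn();
    } catch (e) {
      passed = false;
    }
    return { name: t.name, passed: passed, ms: Date.now() - start };
  });
}

declareTest("passes", function () {});
declareTest("fails", function () { throw new Error("expected failure"); });
const summary = runAll();
```

The declaration side stays identical across suites; only the runner would differ per execution context.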


I'm of the opinion that we should generally collect tons of data from 
all of our testing frameworks and then sort out the meaning of that data 
later (e.g. ignore data from tests running on non-homogeneous or 
unreliable hardware). Maybe we don't care about things like rev X-Y 
comparisons of CPU cycles on an individual mochitest. But we'd certainly 
be interested if we saw an individual mochitest's CPU cycle count or 
wall time double over the span of a month! You can't even raise eyebrows 
unless you have data. We don't have this data today. Even if we did, it 
would require separate implementations for each testing flavor 
(xpcshell, mochitest, etc.).


We should unify our test running code as much as possible. Then, we 
should make decisions on whether it makes sense to collect and/or assess 
performance data in each execution context/test flavor.


[1] https://gist.github.com/indygreg/5073810


Re: proposal: replace talos with inline tests

2013-03-04 Thread Jim Mathies
Boris Zbarsky bzbar...@mit.edu wrote in message 
news:o7ydnyp6n66okqnmnz2dnuvz_uwdn...@mozilla.org...
 On 3/4/13 8:15 AM, Jim Mathies wrote:
  So to work around this I’ve been putting together some basic perf tests I 
  can use to measure performance using the mochitest framework.
 
 How are you dealing with the fact that mochitest runs on heterogeneous 
 hardware (including VMs and the like last I checked, which could have 
 arbitrarily bad (or good!) performance characteristics depending on what 
 else is happening with the host system)?

That sounds like a releng problem that could be solved. I don't know 
enough about our test slaves to say for sure. 

 This sounds plausible, modulo the inability to port Tp in its current 
 state to a setup that involves the tests living in m-c, as long as the 
 problem above is kept in mind.  Basically, reusing something 
 mochitest-like for developer familiarity may make sense, but it would 
 need to be a separate test suite run on completely separate test slaves 
 that are actually set up with performance testing in mind.  A separate 
 test suite which is like mochitest is not a problem per se (we have the 
 ipcplugins, chrome, browserchrome, a11y tests already).

That's fine; I'm not married to mochitest, but something with similar run 
characteristics would be best.

 So the main win would be making it easier to add new tests in terms of 
 number of actions to be taken (something it seems like we could improve 
 with the current Talos setup too) and easier for developers to add tests 
 because the framework is already similar, right?
 
 -Boris

Yes, basically:

1) something checked into mc that anyone can easily author or run (for tracking 
down regressions) without having to check out a separate repo or set up and run 
a custom perf test framework. 
2) performance tests that generate data that goes to the console on local 
runs, or can be posted to a graph server in automation.
3) no releng overhead for setting up new perf tests; something that is built 
into the test framework / infrastructure we set up.
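A reporter meeting points 2 and 3 could be very small. Here is a hypothetical sketch; the MOZ_AUTOMATION environment check and the upload queue are invented stand-ins, not a real harness API:

```javascript
// Hypothetical sketch: route a perf result either to the console (local
// run) or to an upload queue that automation would later POST to a graph
// server. MOZ_AUTOMATION is an invented stand-in for whatever signal the
// harness actually provides.

const pendingUploads = [];

function reportPerfData(testName, value, unit) {
  const record = { test: testName, value: value, unit: unit };
  if (process.env.MOZ_AUTOMATION) {
    pendingUploads.push(record); // automation: batch for upload later
  } else {
    console.log("PERF | " + testName + ": " + value + " " + unit);
  }
  return record;
}

const result = reportPerfData("tscroll", 123.4, "ms");
```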

Jim


Re: proposal: replace talos with inline tests

2013-03-04 Thread Justin Lebar
 1) something checked into mc that anyone can easily author or run (for tracking 
 down regressions) without having to check out a separate repo or set up and 
 run a custom perf test framework.

I don't oppose the gist of what you're suggesting here, but please
keep in mind that small perf changes are often very difficult to track
down locally.  Small changes in system and toolchain configuration can
have large effects on average build speed and its variance.  For
example, I've found observable performance differences between Try and
m-c/m-i builds in the past (bug 653961), despite their build configs
being nearly identical.

In my experience, we spend the majority of our time trying to track
down small perf changes, so a change which makes it easier to track
down the source of large perf changes might not have an outsize
effect.

 3) no releng overhead for setting up new perf tests; something that is built 
 into the test framework / infrastructure we set up.

If we did this, we'd need to figure out how and when to promote
benchmarks to "we care about them" status.

We already don't back out changes for regressing a benchmark the way we
back them out for regressing tests.  I think this is at least
partially because of a general sentiment that not all of our benchmarks
correlate strongly with what they're trying to measure.

I suspect if anyone could check in a benchmark, the average quality of
benchmarks would likely stay roughly the same, but the number of
benchmarks would increase.  In that case we'd have even more
benchmarks with spurious regressions to deal with.

-Justin


Re: proposal: replace talos with inline tests

2013-03-04 Thread Justin Dolske

On 3/4/13 9:36 AM, Gregory Szorc wrote:


If all your tests are declared the same way, then presumably the test
running code would be similar and capturing performance data would
require a single implementation affecting all test suites instead of N
one-off solutions.


We've talked about this before (perhaps in this very newsgroup) as a 
cheap (?) way to get extra perf measurements beyond our current limited 
set of tests, and to avoid having to add a new test suite/framework 
whenever someone wants a metric... e.g. measure the run time of each 
existing test, use scripts to figure out which ones are fairly stable 
over time, then watch for regressions. A chance to begin again in an 
orange land of opportunity and adventure!
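The "figure out which ones are fairly stable" step could literally be a small script over historical run times. A sketch, with invented data and an arbitrary 10% cutoff:

```javascript
// Sketch: treat a test as "stable" when its coefficient of variation
// (stddev / mean) over recent runs is below a threshold. The history
// data and the 10% cutoff are invented for illustration.

function coefficientOfVariation(samples) {
  const mean = samples.reduce(function (a, b) { return a + b; }, 0) / samples.length;
  const variance = samples.reduce(function (acc, x) {
    return acc + (x - mean) * (x - mean);
  }, 0) / samples.length;
  return Math.sqrt(variance) / mean;
}

function stableTests(history, maxCv) {
  return Object.keys(history).filter(function (name) {
    return coefficientOfVariation(history[name]) <= maxCv;
  });
}

// Invented run-time history (ms) across recent pushes.
const history = {
  test_layout: [100, 102, 98, 101, 99],   // tight spread: worth watching
  test_startup: [100, 180, 60, 140, 90],  // noisy: not worth alerting on
};
const watchlist = stableTests(history, 0.1);
```

Only the tests that make the watchlist would feed the regression alerts; the rest stay correctness-only.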


But I'd also take the general ability to add a new test as a microbenchmark.


We should unify our test running code as much as possible.


Oh god yes please.

Justin



Re: proposal: replace talos with inline tests

2013-03-04 Thread Dave Mandelin
On Monday, March 4, 2013 5:15:56 AM UTC-8, Jim Mathies wrote:
 For metrofx we’ve been working on getting omtc and apzc running in the 
 browser. One of the things we need to be able to do is run performance tests 
 that tell us whether or not the work we’re doing is having a positive effect 
 on perf. We currently don’t have automated tests up and running for metrofx 
 and talos is even farther off.
 
 So to work around this I’ve been putting together some basic perf tests I can 
 use to measure performance using the mochitest framework. I’m wondering if 
 this might be a useful answer to our perf tests problems long term. 

I think this is an incredibly interesting proposal, and I'd love to see 
something like it go forward. Detailed reactions below.

 Putting together talos tests is a real pain. You have to write a new test 
 using the talos framework (which is a separate repo from mc), test the test 
 to be sure it’s working, file rel eng bugs on getting it integrated into 
 talos test runs, populated in graph server, and tested via staging to be sure 
 everything is working right. Overall the overhead here seems way too high.

Yup. And that's a big problem. Not only does this make your life harder, it 
makes people not do as much performance testing as they otherwise might. The JS 
team has had the experience that making it incredibly easy to add new 
correctness tests (with *zero* overhead in the common case) really helped get 
more tests written and used. So I think it would be great to make it a lot 
easier to write perf tests.

 Maybe we should consider changing this system so devs can write performance 
 tests that suit their needs that are integrated into our main repo? Basically:
 
 1) rework graphs server to be open ended so that it can accept data from test 
 runs within our normal test frameworks.

IIUC, something like this is a key requirement: letting any perf test feed into 
the reporting system. People have pointed out that the Talos tests run on 
selected machines, so the perf tests should probably run on them as well, 
rather than on the correctness test machines. But that's only a small change to 
the basic idea, right?

 2) develop a test module that can be included in tests and allows test 
 writers to post performance data to graph server.

Does that mean a mochitest module? This part seems optional, although certainly 
useful. Some tests will require non-mochitest frameworks.

I believe jmaher did some work to get in-browser standard JS benchmarks running 
automatically and reporting to graph-server. I'm curious how that would fit in 
with this idea--would the test module help at all, or could there be some other 
kind of more general module maybe, so that even things like standard benchmarks 
can be self-serve?

 3) come up with a good way to manage the life cycle of active perf tests so 
 graph server doesn’t become polluted.

:-) How about getting an owner optionally listed for new tests, and then tests 
will be removed if no one is looking at them (according to web server logs) and 
there is no owner of record or the owner doesn't say the tests are still needed?

 4) port existing talos tests over to the mochitest framework.
 
 5) drop talos.

This seems like it's in the line of "fix Talos". I'm not sure if this 
particular 4+5 is the right way to go, but the idea has some merit.

To the extent that people don't pay attention to Talos, it seems we really 
don't need to do anything with it. If people are paying attention to and taking 
care of performance in their area, then we're covered. To take the example I 
happen to know best, the JS team uses AWFY to track JS performance on standard 
benchmarks and additional tests they've decided are useful. So Talos is not 
needed to track JS performance. Having all the features of the new graph server 
does sound pretty cool, though.

It appears that there are a few areas that are only covered by Talos for now, 
though. I think in that category we have warm startup time via Ts, and basic 
layout performance via Tp. I'm not sure about memory, because we do seem to 
detect increases via Talos, but we also have AWSY, and I don't know whether 
AWSY obviates the Talos memory measurements or not.

For that kind of thing, I'm thinking maybe we should go with the same "teams 
take care of their own perf tests" idea. Performance is a natural owner for Ts. 
I'm not entirely sure about Tp, but it's probably layout or DOM. Then those 
teams could decide if they wanted to switch from Talos to a different 
framework. If everything's working properly, and the difficulty of reproducing 
Talos results locally caused enough problems to warrant it, the owning teams 
would notice and switch.

Dave


Re: proposal: replace talos with inline tests

2013-03-04 Thread Robert O'Callahan
Writing a lot of performance tests creates the problem that those tests
will take a long time to run. The nature of performance tests is that each
test must run for a relatively long time to get meaningful results.
Therefore I doubt writing lots of different performance tests can scale.
(Maybe we can find ways to eliminate noise in very short tests, but that
might be research.)

One other thing to keep in mind if we're going to start doing performance
tests differently is https://bugzilla.mozilla.org/show_bug.cgi?id=846166.
Basically Chris suggests using eideticker for performance tests a lot more.

Rob
-- 
Wrfhf pnyyrq gurz gbtrgure naq fnvq, “Lbh xabj gung gur ehyref bs gur
Tragvyrf ybeq vg bire gurz, naq gurve uvtu bssvpvnyf rkrepvfr nhgubevgl
bire gurz. Abg fb jvgu lbh. Vafgrnq, jubrire jnagf gb orpbzr terng nzbat
lbh zhfg or lbhe freinag, naq jubrire jnagf gb or svefg zhfg or lbhe fynir
— whfg nf gur Fba bs Zna qvq abg pbzr gb or freirq, ohg gb freir, naq gb
tvir uvf yvsr nf n enafbz sbe znal.” [Znggurj 20:25-28]


Re: proposal: replace talos with inline tests

2013-03-04 Thread Jeff Hammel
I'll point out (and really this is about all I have to say on this thread) 
that while perf testing (that is, recording results) may be... well, not 
easy, but not too awful... rigorous analysis of what the data means, and 
whether there is a regression, is often hard, since, as evidenced by Talos, 
the distributions are often non-normal and may be multi-modal. While I have 
no love of Talos, despite/because of sinking a year's worth of effort into 
it, I fear that any replacement will be done with a loss of all the wisdom 
harvested from legacy, which will then have to be relearned.  If 
each team is responsible for its own perf testing, without a common basis and 
understanding of the statistical analysis problem, I fear this will just 
multiply the problem. Frankly, one of the problems I've seen time and 
time again is duplication of effort around a problem (which isn't a 
bad thing except...) and a lack of consolidation toward a 
(moz-)universal solution.
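To make the multi-modal point concrete, here is a tiny illustration with invented numbers: a test that sometimes takes a slow path produces a mean that describes no real run, which is one reason naive averaging misfires.

```javascript
// Sketch with invented numbers: five fast runs (~100ms) and three slow
// ones (~300ms). The mean lands between the modes, a duration no run
// ever had; the median at least lands near the dominant mode.

function mean(xs) {
  return xs.reduce(function (a, b) { return a + b; }, 0) / xs.length;
}

function median(xs) {
  const sorted = xs.slice().sort(function (a, b) { return a - b; });
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
}

const runtimes = [99, 100, 100, 101, 102, 300, 300, 301];
const avg = mean(runtimes);   // 175.375 -- between the modes
const med = median(runtimes); // 101.5 -- near the fast mode
```

A per-team reimplementation of regression detection that reaches for the mean would flag (or miss) changes in such a distribution in confusing ways; that is the kind of hard-won statistical lesson that would have to be relearned.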


On 03/04/2013 04:47 PM, Dave Mandelin wrote:

On Monday, March 4, 2013 5:15:56 AM UTC-8, Jim Mathies wrote:

For metrofx we’ve been working on getting omtc and apzc running in the browser. 
One of the things we need to be able to do is run performance tests that tell 
us whether or not the work we’re doing is having a positive effect on perf. We 
currently don’t have automated tests up and running for metrofx and talos is 
even farther off.

So to work around this I’ve been putting together some basic perf tests I can 
use to measure performance using the mochitest framework. I’m wondering if this 
might be a useful answer to our perf tests problems long term.

I think this is an incredibly interesting proposal, and I'd love to see 
something like it go forward. Detailed reactions below.


Putting together talos tests is a real pain. You have to write a new test using 
the talos framework (which is a separate repo from mc), test the test to be 
sure it’s working, file rel eng bugs on getting it integrated into talos test 
runs, populated in graph server, and tested via staging to be sure everything 
is working right. Overall the overhead here seems way too high.

Yup. And that's a big problem. Not only does this make your life harder, it 
discourages people from doing as much performance testing as they otherwise 
might. The JS team has found that making correctness tests incredibly easy to 
create (with *zero* overhead in the common case) really helped get more tests 
written and used. So I think it would be great to make it a lot easier to 
write perf tests.


Maybe we should consider changing this system so devs can write performance 
tests that suit their needs that are integrated into our main repo? Basically:

1) rework graph server to be open-ended so that it can accept data from test 
runs within our normal test frameworks.

IIUC, something like this is a key requirement: letting any perf test feed into 
the reporting system. People have pointed out that the Talos tests run on 
selected machines, so the perf tests should probably run on them as well, 
rather than on the correctness test machines. But that's only a small change to 
the basic idea, right?


2) develop a test module that can be included in tests and allows test 
writers to post performance data to graph server.

Does that mean a mochitest module? This part seems optional, although certainly 
useful. Some tests will require non-mochitest frameworks.

I believe jmaher did some work to get in-browser standard JS benchmarks running 
automatically and reporting to graph-server. I'm curious how that would fit in 
with this idea--would the test module help at all, or could there be some 
other, more general kind of module, so that even things like standard 
benchmarks can be self-serve?


3) come up with a good way to manage the life cycle of active perf tests so 
graph server doesn’t become polluted.

:-) How about optionally listing an owner for new tests, and then removing a 
test if no one is looking at it (according to web server logs) and either 
there is no owner of record or the owner doesn't say the test is still needed?
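As a sketch of the kind of policy graph-server could apply (the field names 
and the 90-day threshold are invented for illustration, not an existing 
feature):

```javascript
// Hypothetical server-side pruning policy for perf test series, per the
// owner-of-record idea above. Field names and the 90-day window are
// invented for illustration.
const NINETY_DAYS_MS = 90 * 24 * 60 * 60 * 1000;

function shouldPrune(series, now = Date.now()) {
  const recentlyViewed =
    series.lastViewed && now - series.lastViewed < NINETY_DAYS_MS;
  if (recentlyViewed) return false;        // someone is watching it
  if (!series.owner) return true;          // orphaned and unwatched
  return series.ownerSaysKeep !== true;    // otherwise, ask the owner
}
```

A periodic job over all series with a policy like this would keep the graphs 
from silting up without anyone having to curate tests by hand.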


4) port existing talos tests over to the mochitest framework.

5) drop talos.

This seems like it's in the line of "fix Talos". I'm not sure if this 
particular 4+5 is the right way to go, but the idea has some merit.

To the extent that people don't pay attention to Talos, it seems we really 
don't need to do anything with it. If people are paying attention to and taking 
care of performance in their area, then we're covered. To take the example I 
happen to know best, the JS team uses AWFY to track JS performance on standard 
benchmarks and additional tests they've decided are useful. So Talos is not 
needed to track JS performance. Having all the features of the new graph server 
does sound pretty cool, though.

It appears that there are a few areas that are only covered by Talos for now, 
though. I think in that category we have warm startup time 

Re: proposal: replace talos with inline tests

2013-03-04 Thread Dave Mandelin
On Monday, March 4, 2013 5:42:39 AM UTC-8, Ed Morley wrote:
 (CCing auto-to...@mozilla.com)
 
 jmaher and jhammel will be able to comment more on the talos specifics, 
 but few thoughts off the top of my head:
 
 It seems like we're conflating multiple issues here:
   1) [talos] is a separate repo from mc

And also

1a) Talos itself is a big pain for developers to use and debug regressions 
in, not to mention add tests to, which they basically don't.

It seems that some of this may have changed recently, especially around using 
the new framework--I haven't used it in a while. I think Talos still falls 
short for creating new tests, though, because lots of things just don't fit 
its assumptions.

   2) [it's a hassle to] test the test to be sure it’s working
   3) [it's a hassle to get results] populated in graph server
   4) [we need to] come up with a good way to manage the life cycle of 
 active perf tests so graph server doesn’t become polluted

 Switching from the talos harness to mochitest doesn't fix #2 (we still 
 have to test, and I don't see how it magically becomes any easier 
 without extra work - that could have been applied to talos instead) or 
 #3/#4 (orthogonal problem). It also seems like a brute force way of 
 fixing #1 (we could just check talos into mozilla-central).

I think that part was mostly supposed to address (1a).

 Instead, I think we should be asking:
 
 1) Is the best test framework for performance testing: [a] talos (with 
 improvements), [b] mochitest (with a significant amount of work to make 
 it compatible), or [c] a brand new framework?

I think that question doesn't have one answer. For JS, it's clearly something 
else, but it's not even really a framework--it's just running standard 
benchmarks. 

For other areas, there are likely different answers. That's why I was so 
excited about the self-serve idea. (Interestingly, I got schooled on this 
subject in a similar vein recently on bug tracking. :-) )

 2) Regardless of framework used, would checking it into mozilla-central 
 improve dev workflow enough to outweigh the downsides (see bug 787200 
 for history on that discussion)?

Thanks for the bug link. It seems like putting Talos itself into m-c has 
significant disadvantages. I'm not sure what to do with other/new perf tests.

 3) Regardless of framework used, how can we make the 
 development/testing/staging cycle less painful?

I liked the original proposal a lot for this.

 4) Regardless of framework used, who should be responsible for ensuring 
 we actively prune performance tests that are no longer relevant?

I gave an idea for how to do this in my reply to the original proposal. I 
didn't say who would do it, but I was assuming the maintainers/operators of 
graph-server, with the notion that they would be highly empowered to remove 
anything that no one asked them to keep or that didn't otherwise have a 
well-documented, easily understood rationale.

Dave
___
dev-platform mailing list
dev-platform@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-platform


Re: proposal: replace talos with inline tests

2013-03-04 Thread Dave Mandelin
On Monday, March 4, 2013 5:17:29 PM UTC-8, Gregory Szorc wrote:
 On 3/4/13 5:09 PM, Dave Mandelin wrote:
 
  We already don't back out changes for regressing a benchmark like
  we back them out for regressing tests. I think this is at least
  partially because of a general sentiment that not all of our benchmarks
  correlate strongly with what they're trying to measure.
 
  I know this has been a hot topic lately. I think getting more clarity on 
  this would be great, *if* of course we could have an answer that was both 
  operationally beneficial and clear, which seems to be incredibly difficult.
 
  But this thread gives me a new idea. If each test run in automation had an 
  owner (as I suggested elsewhere), how about also making the owners 
  responsible for informing the sheriffs about what to do in case of 
  regression? If the owners know the test is reliable and measures something 
  important, they can ask for monitoring and presumptive backout. If not, 
  they can ask sheriffs to ignore the test, inform and coordinate with the 
  owning team, inform the landing person only, or some other action.
 
 This should be annotated in the tests themselves, IMO. We could even 
 have said annotation influence the color on TBPL. 

I like it. We would need to make sure the annotations reflect active 
consideration by the test owners, but I suppose failures are likely to 
self-correct.
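As a sketch of what an in-test annotation might look like (the fields, the 
policy values, and the TBPL mapping are all invented here--neither mochitest 
nor TBPL supports anything like this today):

```javascript
// Invented sketch of a per-test regression-policy annotation; neither
// mochitest nor TBPL currently supports this. The owner declares up
// front what sheriffs should do when the test regresses.
const PERF_POLICY = {
  owner: "owner@example.com",
  onRegression: "backout",   // or "notify-owner", or "ignore"
  reviewed: "2013-03-04",    // when the owner last confirmed this policy
};

// A sheriff-facing tool could map the declared policy to a TBPL color,
// so only tests whose owners asked for backout show up as red.
function tbplColor(policy) {
  switch (policy.onRegression) {
    case "backout":      return "red";
    case "notify-owner": return "orange";
    default:             return "grey";
  }
}
```

The reviewed date is one way to force the "active consideration" part: a 
policy that hasn't been reconfirmed in a long time could fall back to the 
ignore behavior.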

 IMO we should be focusing on lessening the burden on the 
 sheriffs and leaving them to focus on real problems.

Absolutely.

Dave