Re: Changing reftest required resolution

2012-08-29 Thread jmaher
On Wednesday, August 29, 2012 11:35:52 AM UTC-4, Andrew Halberstadt wrote:
> On 08/29/2012 09:56 AM, Andrew Halberstadt wrote:
>> On 08/28/2012 02:17 PM, L. David Baron wrote:
>>> On Tuesday 2012-08-28 12:52 -0400, Andrew Halberstadt wrote:
>>> I also don't think we should go quite as small as 400x400 -- and we
>>> want to come up with a common value with other browser vendors that
>>> are also using reftest.  We don't want to be running our reftests at
>>> a size smaller than the accepted max size for reftests at W3C.
>>
>> Joel mentioned he was thinking of 600x400. I don't know that there's a
>> magic size where things all of a sudden improve significantly. I'll ask
>> around about panda/apc.io default resolutions.
>
> I'm told they do 1280x720, so we'd at least need to bump the height down.

That is 720 pixels of height, and reftest uses a subset of that (I believe 672)
to account for the Android task bar and the URL bar.  A maximum height of 600
will work; likewise 480x640 or something like that.


Planning to turn off Talos tdhtml

2012-10-16 Thread jmaher
In our continuing effort on the Automation and Tools team to make Talos tests
useful and to make sure we know and care about what we are measuring, we have
realized that the tdhtml tests are not providing us any value.  We run these as
chrome/nochrome.

Here are the tests:
http://hg.mozilla.org/build/talos/file/8960d6a0b2c2/talos/page_load_test/dhtml

We would be happy to leave some or all of these tests running.  Our current
plan is to disable them on November 1st.

Please speak up if you find value in these tests.


Re: ManifestDestiny

2013-03-19 Thread jmaher
On Tuesday, March 19, 2013 5:37:22 AM UTC-4, Robert O'Callahan wrote:
 Cool name. However, are you actually proposing replacing, say, the reftest 
 manifest format with this format? It looks like it would be a lot more
 verbose.

My understanding is there is no need or plan to replace the reftest manifest 
format with ManifestDestiny.  Right now this is used for xpcshell, and we are 
working towards making it work for mochitests.  

Trying to make a one-size-fits-all solution doesn't really make sense for all
of the different harnesses.  The reftest .list format is too clunky for just
listing out tests to run; likewise, the ManifestDestiny .ini format is too
awkward for listing A-vs-B comparisons along with complex conditions.
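For readers who haven't seen the two formats side by side, here is a rough
illustration of the difference (snippets are illustrative only, not copied
from any particular manifest in the tree).

A reftest .list line describes an A-vs-B comparison, optionally with conditions:

== green-box.html green-box-ref.html
fails-if(Android) != overlap.html overlap-ref.html

A ManifestDestiny .ini manifest simply lists tests with per-test metadata:

[DEFAULT]
head = head.js

[test_example.js]
skip-if = os == "android"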

-Joel


Re: Proposal for using a multi-headed tree instead of inbound

2013-04-04 Thread jmaher
The 325 jobs per push come from manually counting jobs on TBPL (ignoring PGO);
remember to use showall=1.  The total stats from gps include try, which has far
fewer test jobs per push, and inbound coalescing.



Re: Proposal for using a multi-headed tree instead of inbound (updated)

2013-04-05 Thread jmaher
My thought on why the average build time is shorter on try vs. inbound is that
inbound includes PGO builds and debug builds, which have additional steps.  The
try server builds are not usually doing PGO.


Re: reorganizing some test directories

2013-04-11 Thread jmaher
On Thursday, April 11, 2013 10:26:25 AM UTC-4, Scott Johnson wrote:
> Thus Spoke jmaher:
>> There are a couple common directory structures used for storing tests in
>> the tree:
>> 1) component/tests
>> 2) component/tests/harness
>>
>> I have a series of patches which will move most of the directory structures
>> from #1 to a format of #2.  This means we would see:
>> component/tests/mochitest
>> component/tests/browser
>> component/tests/chrome
>
> Will this also affect reftests? Specifically, if we had the following
> structure:
>
> component/tests/mochitest
> component/reftests/
>
> Will the latter be affected and placed in component/tests/reftests?
>
> ~Scott

Great question- the main goal is to have each test type in a directory for that 
specific harness.  Currently reftest/crashtest files are in subfolders to 
indicate that.  The existing patches on the referenced bug have a couple 
instances of moving crashtests to the tests/ subfolder, but in general they are 
staying put. 


Re: Improving Mac OS X 10.6 test wait times by reducing 10.7 load

2013-04-26 Thread jmaher
On Thursday, April 25, 2013 4:12:16 PM UTC-4, Ed Morley wrote:
> On 25 April 2013 20:14:10, Justin Lebar wrote:
>>> Is this what you're saying?
>>> * 10.6 opt tests - per-checkin (no change)
>>> * 10.6 debug tests - reduced
>>> * 10.7 opt tests - reduced
>>> * 10.7 debug tests - reduced
>>>
>>> * reduced -- m-c, m-a, m-b, m-r, esr17
>>
>> Yes.
>>
>> Now that I think about this more, maybe we should go big or go home:
>> change 10.6 opt tests to reduced as well, and see how it goes.  We can
>> always change it back.
>>
>> If it goes well, we can try to do the same thing with the Windows tests.
>>
>> We should get the sheriffs to sign off.
>
> Worth a shot, we can always revert :-) Only thing I might add, is that
> we'll need a way to opt into 10.6 test jobs on Try, in case someone has
> to debug issues found on mozilla-central (eg using sfink's undocumented
> OS version specific syntax).
>
> Ed

I had to revert a Talos change on inbound due to 10.6-only failures just this
Wednesday.  This was due to a different version of Python on 10.6 :(

-Joel


Re: Improving Mac OS X 10.6 test wait times by reducing 10.7 load

2013-04-26 Thread jmaher
On Friday, April 26, 2013 9:49:18 AM UTC-4, Armen Zambrano G. wrote:
> Maybe we can keep one of the talos jobs around? (until releng fixes the
> various python versions' story)
> IIUC this was more of an infra issue rather than a Firefox testing issue.

It was infra related, but it was specific to the 10.6 platform.  Even knowing 
that, I fully support the proposed plan.  We could have easily determined the 
root cause of the 10.6 specific failure a day later on a different branch.


Re: android.json

2013-05-03 Thread jmaher
On Friday, May 3, 2013 12:07:27 PM UTC-4, Ehsan Akhgari wrote:
> Can somebody explain to me what android.json is, why it exists, and
> what's different between disabling a mochitest on Android there versus
> excluding it from MOCHITEST_FILES in the Makefile.in?
>
> Thanks!
> Ehsan

Android.json is a way to enable/disable tests quickly.  It allows us to specify
which tests we want to run (the original use, and the b2g case) as well as
which test cases we want to ignore.  Logic in Makefile.in is not optimal and,
for large-scale changes, requires a lot of additional work.  We can now look in
one place and find all the test cases which are not able to run, instead of
looking through 50 or so Makefile.in files.
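For illustration, the file is a small JSON manifest roughly along these lines
(the key names are from memory and the test path and bug number are made up,
so treat this as an approximation of the shape rather than the exact schema):

{
  "runtests": {},
  "excludetests": {
    "dom/tests/mochitest/geolocation/test_example.html": "bug 1234567 - intermittent timeout on Android"
  }
}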

When we do officially switch to manifests for mochitest, there will be less of 
a need for this.


new prefs for talos

2013-05-22 Thread jmaher
Last week, while investigating a crash on Android which only happened in the
reftest and Talos harnesses (not in the mochitest harness), we compared the
preferences.  We found that in mochitest we set a bunch of preferences to
disable background network access (intentionally designed to 404 for mochitest):
// Point the url-classifier to the local testing server for fast failures
user_pref("browser.safebrowsing.gethashURL",
          "http://127.0.0.1:/safebrowsing-dummy/gethash");
user_pref("browser.safebrowsing.keyURL",
          "http://127.0.0.1:/safebrowsing-dummy/newkey");
user_pref("browser.safebrowsing.updateURL",
          "http://127.0.0.1:/safebrowsing-dummy/update");
// Point update checks to the local testing server for fast failures
user_pref("extensions.update.url",
          "http://127.0.0.1:/extensions-dummy/updateURL");
user_pref("extensions.update.background.url",
          "http://127.0.0.1:/extensions-dummy/updateBackgroundURL");
user_pref("extensions.blocklist.url",
          "http://127.0.0.1:/extensions-dummy/blocklistURL");
user_pref("extensions.hotfix.url",
          "http://127.0.0.1:/extensions-dummy/hotfixURL");
// Turn off extension updates so they don't bother tests
user_pref("extensions.update.enabled", false);
// Make sure opening about:addons won't hit the network
user_pref("extensions.webservice.discoverURL",
          "http://127.0.0.1:/extensions-dummy/discoveryURL");
// Make sure AddonRepository won't hit the network
user_pref("extensions.getAddons.maxResults", 0);
user_pref("extensions.getAddons.get.url",
          "http://127.0.0.1:/extensions-dummy/repositoryGetURL");
user_pref("extensions.getAddons.getWithPerformance.url",
          "http://127.0.0.1:/extensions-dummy/repositoryGetWithPerformanceURL");
user_pref("extensions.getAddons.search.browseURL",
          "http://127.0.0.1:/extensions-dummy/repositoryBrowseURL");
user_pref("extensions.getAddons.search.url",
          "http://127.0.0.1:/extensions-dummy/repositorySearchURL");
// Make sure that opening the plugins check page won't hit the network
user_pref("plugins.update.url",
          "http://127.0.0.1:/plugins-dummy/updateCheckURL");

We landed these prefs for Android-based reftests and the crash has gone away;
we will be doing the same for Talos.  In thinking about it more, it makes sense
to add these to desktop Talos as well.

Is there any reason why we shouldn't be setting these preferences for desktop 
Talos?  

Should any of those be adjusted?

Thanks!


Windows XP virtual memory peak appears to be causing talos timeout errors for dromaeo_css

2013-07-10 Thread jmaher
We have a top Orange Factor failure which is a Talos timeout that only happens
on Windows XP, and predominantly on the dromaeo_css test.  What happens is we
appear to complete the test just fine, but the poller we have on the process
used to manage Firefox never indicates we have finished.  After doing some
screenshots and looking at the process list, I haven't found much except that
in the failing cases the _Total value for Virtual Bytes Peak is 2GB, and for
all the passing instances it is ~1.25GB.
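For context, the poller in question conceptually does something like the
following (a simplified Python sketch, not the actual harness code; the
command line and paths are made up): it polls the managed process for an exit
code and gives up after a timeout, which is the failure mode described above.

import subprocess
import time

def wait_for_exit(proc, timeout=3600, interval=5):
    # poll the managed process until it exits or we give up
    start = time.time()
    while time.time() - start < timeout:
        if proc.poll() is not None:   # process has exited
            return proc.returncode
        time.sleep(interval)
    return None  # timed out: the harness reports a talos timeout

proc = subprocess.Popen(["firefox", "-profile", "/tmp/talos-profile"])
print(wait_for_exit(proc))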

Are there other things I should look for, or things I could change to fix this 
problem?


running tests in HiDPI mode on the build machines

2013-07-10 Thread jmaher
Can you explain what would need to be done for Android to get into this mode?  
It might be difficult to make this work with our current solution for automated 
tests.


Re: How is Talos on Android deployed?

2013-09-10 Thread jmaher
Please see the talos wiki for detailed instructions:
https://wiki.mozilla.org/Buildbot/Talos/Misc#Steps_to_add_a_test_to_production

Currently desktop tests run on mozharness and Android tests run the old way via
raw buildbot.  There is a difference, and there will be a difference for the
next year until we have all mobile platforms running on mozharness.


Re: Tegra build backlog is too big!

2013-09-11 Thread jmaher
Quite possibly we don't need all those jobs running on Tegras.  I don't know of
a bug in the product that has broken on either the Tegra or Panda platform but
not the other.

Joel


Re: Migrating to Win64 rev2 machines

2013-10-10 Thread jmaher
It takes a lot of work to get green tests on a new platform.  I spent the
better part of December to March getting tests green on Ubuntu.

If these are in VMs, we wouldn't have the graphics cards mentioned above.  In
fact, we might not have the ability to run the WebGL tests.


Talos ts_paint numbers will take a small adjustment

2013-10-25 Thread jmaher
In bug 923770 we switched Talos to use mozprocess [1] internally.  This is a
great win for managing the Firefox process and reducing our timeouts on OS X
and Windows test runs.  The side effect of running under mozprocess is about a
500ms delay in launching the process.  I have looked at all the other numbers
and right now the only test affected is ts_paint; this is the case for all
platforms.

I will land this on mozilla-inbound Monday morning, please speak up this 
weekend if you have other concerns or questions.

[1] - mozprocess is found here: 
https://github.com/mozilla/mozbase/tree/master/mozprocess.  It is also used to 
launch the browser for mochitests.
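For reference, launching a process under mozprocess looks roughly like this (a
minimal sketch; the command and profile path are made up for illustration):

from mozprocess import ProcessHandler

def print_line(line):
    # mozprocess calls this for every line of output from the process
    print("browser: %s" % line)

proc = ProcessHandler(["firefox", "-profile", "/tmp/talos-profile"],
                      processOutputLine=[print_line])
proc.run(timeout=3600)   # start the process with an overall timeout
exit_code = proc.wait()  # block until the process exits
print("exit code: %s" % exit_code)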


Re: Pushes to Backouts on Mozilla Inbound

2013-11-05 Thread jmaher
Using https://treestatus.mozilla.org/mozilla-inbound, I looked at the reasons
for tree closures (usually associated with backouts).  Going back 50 status
messages, I found:
38 test issues
14 build issues
9  infrastructure issues
2  other issues

Note: some of these closures had more than one issue documented.

Questions to ask would be:
* How many of these build issues would have been resolved by a developer 
building locally?
* How many of these test issues would have been resolved by a developer running 
a simple smoketest (i.e. 5 minutes)?

I agree that running on try server over and over again is just pushing the 
resource problem from a managed tree to an unmanaged tree.


Re: Measuring power usage

2013-11-05 Thread jmaher
I am working on using intel power gadget to measure the power usage.  Currently 
this is on windows with an idle test.  Our test slaves have older CPUs which do 
not support the intel power gadget.


Re: Spring cleaning: Reducing Number Footprint of HG Repos

2014-03-27 Thread jmaher
For talos development we allow pointing at a user specific repo instead of the 
master one.  This has greatly reduced the time to bring up new tests.  This 
could easily be hosted elsewhere, but we chose to restrict it to user repos for 
a security measure.  You have to have cleared some form of basic authentication 
with user repos and now if someone wants to see how their talos modifications 
run on talos they can do that without checking them in.

A change like this will require us to either remove this functionality, make it 
less secure, or create busy work whenever someone new wants to point to a 
custom talos repository.

-Joel


Policy for disabling tests which run on TBPL

2014-04-04 Thread jmaher
As the sheriffs know, it is frustrating to deal with hundreds of tests that
fail on a daily basis but are intermittent.

When a single test case is identified to be leaking or failing at least 10% of 
the time, it is time to escalate.

Escalation path:
1) Ensure we have a bug on file, with the test author, reviewer, module owner, 
and any other interested parties, links to logs, etc.
2) We need to needinfo? the responsible party and expect a response within 2
business days; this should be made clear in a comment.
3) In the case we don't get a response, request a needinfo? from the module 
owner
with the expectation of 2 days for a response and getting someone to take 
action.
4) In the case we go another 2 days with no response from a module owner, we 
will disable the test.

Ideally we will work with the test author to either get the test fixed or 
disabled depending on available time or difficulty in fixing the test.

This is intended to respect the time of the original test authors by not 
throwing emergencies in their lap, but also strike a balance with keeping the 
trees manageable. 

Two exceptions:
1) If a test is failing at least 50% of the time, we will file a bug and 
disable the test first
2) When we are bringing a new platform online (Android 2.3, b2g, etc.) many 
tests will need to be disabled prior to getting the tests on tbpl.

Please comment if this doesn't make sense.

-Joel


Re: Policy for disabling tests which run on TBPL

2014-04-04 Thread jmaher
 
>> 4) In the case we go another 2 days with no response from a module owner,
>> we will disable the test.
>
> Are you talking about newly-added tests, or tests that have been
> passing for a long time and recently started failing?
>
> In the latter case, the burden should fall on the regressing patch,
> and the regressing patch should get backed out instead of disabling
> the test.

I had overlooked a new test- I agree that backing it out is the right thing.


 
 If this plan is applied to existing tests, then it will lead to
 style system mochitests being turned off due to other regressions
 because I'm the person who wrote them and the module owner, and I
 don't always have time to deal with regressions in other parts of
 code (e.g., the JS engine) leading to these tests failing
 intermittently.
 
 If that happens, we won't have the test coverage we need to add new
 CSS properties or values.

Interesting point.  Are these tests failing often?  Can we invest some minimal 
time into these to make them more reliable from a test case or test harness 
perspective?

As long as there is a dialog in the bugs filed, I would find it hard to believe
we would just disable a test and have it come as a surprise.

 
 More generally, it places a much heavier burden on contributors who
 have been part of the project longer, who are also likely to be
 overburdened in other ways (e.g., reviews).  That's why the burden
 needs to be placed on the regressing change rather than the original
 author of the test.

I am open to ideas to help figure out the offending changes.  My understanding
is that many of the test failures are due to small adjustments to the system or
even the order the tests are run in, such that the test fails intermittently.

I know there are a lot of new faces in the Mozilla community every month; could
we offload some (not all) of this work to Mozillians with less on their plate?
 
 
 These 10% and 50% numbers don't feel right to me; I think the
 thresholds should probably be substantially lower.  But I think it's
 easier to think about these numbers in failures/day, at least for
 me.

Great feedback on this.  Maybe we pick the top 10 from Orange Factor
(http://brasstacks.mozilla.com/orangefactor/index.html), or we cut the numbers
in half.  10% and 50% were sort of last-resort numbers I came up with; ideally
there would have already been a conversation/bug about the problem.

  2) When we are bringing a new platform online (Android 2.3, b2g, etc.) many 
  tests will need to be disabled prior to getting the tests on tbpl.
 
 That's reasonable as long as work is done to try to get the tests 
 enabled (at a minimum, actually enabling all the tests that are
 passing reliably, rather than stopping after enabling the passing
 tests in only some directories).

One thing I have heard is coming online is a way to track the number of tests
available/disabled per platform; that would really help with ensuring we are
not ignoring thousands of tests on a new platform.


Re: Policy for disabling tests which run on TBPL

2014-04-15 Thread jmaher
I want to express my thanks to everyone who contributed to this thread.  We 
have a lot of passionate and smart people who care about this topic- thanks 
again for weighing in so far.

Below is a slightly updated policy from the original, and following that is an 
attempt to summarize the thread and turn what makes sense into actionable items.

= Policy for handling intermittent oranges = 

This policy will define an escalation path for when a single test case is 
identified to be leaking or failing and is causing enough disruption on the 
trees. Disruption is defined as:
1) Test case is on the list of top 20 intermittent failures on Orange Factor 
(http://brasstacks.mozilla.com/orangefactor/index.html)
2) It is causing oranges >=8% of the time
3) We have 100 instances of this failure in the bug in the last 30 days

Escalation is a responsibility of all developers, although the majority will 
fall on the sheriffs.

Escalation path:
1) Ensure we have a bug on file, with the test author, reviewer, module owner, 
and any other interested parties, links to logs, etc.
2) We need to needinfo? the responsible party and expect a response within 2
business days; this should be made clear in a comment.
3) In the case we don't get a response, request a needinfo? from the module 
owner
with the expectation of 2 days for a response and getting someone to take 
action.
4) In the case we go another 2 days with no response from a module owner, we 
will disable the test.

Ideally we will work with the test author to either get the test fixed or 
disabled depending on available time or difficulty in fixing the test.  If a 
bug has activity and work is being done to address the issue, it is reasonable 
to expect the test will not be disabled.  Inactivity in the bug is the main 
cause for escalation.

This is intended to respect the time of the original test authors by not 
throwing emergencies in their lap, but also strike a balance with keeping the 
trees manageable.

Exceptions:
1) If this test has landed (or been modified) in the last 48 hours, we will 
most likely back out the patch with the test
2) If a test is failing at least 30% of the time, we will file a bug and 
disable the test first
3) When we are bringing a new platform online (Android 2.3, b2g, etc.) many 
tests will need to be disabled prior to getting the tests on tbpl.
4) In the rare case we are disabling the majority of the tests (either at once 
or slowly over time) for a given feature, we need to get the module owner to 
sign off on the current state of the tests.


= Documentation =
We have thousands of tests disabled; many are disabled for different build
configurations or platforms. This can be dangerous as we slowly reduce our
coverage. By running a daily report (bug 996183) to outline the total tests
available vs. each configuration (b2g, debug, osx, e10s, etc.), we can bring
visibility to the state of each platform and to whether we are disabling more
than we fix.

We need to have a clear guide on how to run the tests, how to write a test, how 
to debug a test, and use metadata to indicate if we have looked at this test 
and when.

When an intermittent bug is filed, we need to clearly outline what information 
will aid the most in reproducing and fixing this bug.  Without a documented 
process for fixing oranges, this falls on the shoulders of the original test 
authors and a few determined hackers.


= General Policy =
I have adjusted the above policy to mention backing out new tests which are not 
stable, working to identify a regression in the code or tests, and adding 
protection so we do not disable coverage for a specific feature completely. In 
addition, I added a clearer definition of what is a disruptive test and 
clarified the expectations around communicating in the bug vs escalating.

What is more important is the culture we have around committing patches to
Mozilla repositories. We need to decide as an organization if we care about
zero oranges (or insert an acceptable percentage). We also need to decide what
are acceptable coverage levels and what our general policy is for test reviews
(at checkin time and in the future). These need to be answered outside of this
policy - but the sooner we answer these questions, the better we can all move
forward towards the same goal.


= Tools =
Much of the discussion was around tools. As a member of the Automation and
Tools team, I should be advocating for more tools; in this case I am leaning
more towards fewer tools and better process.

One common problem is dealing with the noise around infrastructure and changing
environments and test harnesses. Is this documented?  How can we filter that
out? Having our tools support ways to detect this and annotate changes
unrelated to tests or builds will go a long way.  Related is updating our
harnesses and the way we run tests so they are more repeatable.  I have filed
bug 996504 to track work on this.

Another problem we can look at with tooling is annotating the expected outcome 

Re: Policy for disabling tests which run on TBPL

2014-04-15 Thread jmaher
On Tuesday, April 15, 2014 9:42:25 AM UTC-4, Kyle Huey wrote:
> On Tue, Apr 15, 2014 at 6:21 AM, jmaher joel.ma...@gmail.com wrote:
>> This policy will define an escalation path for when a single test case is
>> identified to be leaking or failing and is causing enough disruption on the
>> trees. Disruption is defined as:
>> 1) Test case is on the list of top 20 intermittent failures on Orange
>> Factor (http://brasstacks.mozilla.com/orangefactor/index.html)
>> 2) It is causing oranges >=8% of the time
>> 3) We have 100 instances of this failure in the bug in the last 30 days
>
> Are these conditions joined by 'and' or by 'or'?  If 'or', there will
> always be at least 20 tests meeting this set of criteria ...
>
> - Kyle

Great question, Kyle.

The top 20 doesn't always include specific tests - sometimes it is related to
infrastructure, hardware/VMs, the test harness, mozharness, etc. If a test
meets any of the above criteria and is escalated, then we should expect to
follow some basic criteria about either working on fixing it or disabling it as
spelled out in the escalation path.

For the large majority of the cases, bugs filed for specific test cases will 
meet all 3 conditions.  We have had some cases where we have thousands of stars 
over years, but it isn't on the top 20 list all the time.  Likewise when we 
have 10 infra bugs, a frequent orange on the trees won't be in the top 20.

-Joel


Re: OMTC on Windows

2014-05-28 Thread jmaher

https://bugzilla.mozilla.org/show_bug.cgi?id=1013262 tracks all the Talos 
performance adjustments 


Re: Is it time for mochitest-chrome on Android and B2G

2014-06-18 Thread jmaher
Could you give some examples of what tests we could run on mobile in chrome? 


Upcoming performance tooling work in the next few months

2014-06-27 Thread jmaher
The A*Team has a lot of performance tool related work in Q3 (goals: 
https://wiki.mozilla.org/Auto-tools/Goals/2014Q3#Performance)

For folks interested in performance automation and tools here are some of the 
key areas we plan on working on:
* Tree Herder Talos UI - view all performance data from Talos in the Tree 
Herder UI, replace all the functionality of Datazilla to allow us to turn 
Datazilla off for Talos data collection/viewing in Q4.  We are expecting a 
delivery of this by the end of August.

* High resolution alerts - https://wiki.mozilla.org/Auto-tools/Projects/Alerts, 
deliver useful alerts per test from Talos (not per test suite as we already 
do), will be in beta mode as our raw data source will be changing from 
Datazilla to Tree Herder.
** We are working on defining initial parameters and verifying reliability of 
the data
** We currently depend on Datazilla API for retrieving the data, this will 
switch in August to Tree Herder
** Once the Tree Herder UI is in place and we have proven these are useful we 
will integrate this (eta September)

* Graph server - still the standard for sending automated alerts. Minor 
adjustments to resolve display of retriggers, stop dropping highest value in 
calculating test suite value and switch to geometric mean instead of average.  
** no dependencies
** Will discuss deprecation of this when Tree Herder and high resolution alerts 
are deployed post Q3.

* New Talos tests: tp5o_scroll, media_tests (WebRTC), mainthread IO in tp5,
WebGL, and cold startup

* Eideticker - add automated alerts, increase frequency of b2g runs, and get 
android 2.3 and 4.x runs
working again with increased frequency

* Add weekly/nightly test runs on windows for cross browser Javascript 
benchmarking and power profiling.

Priorities change every few weeks as new projects become critical, but look for
progress on, if not completion of, the above items.


Disabling a few mochitests this week, and upcoming changes next week to browser-chrome

2014-08-19 Thread jmaher
It has been 3+ weeks since Vaibhav and I found the remaining issues with 
--run-by-dir (https://bugzilla.mozilla.org/show_bug.cgi?id=992911) for 
browser-chrome.  Since then a few issues have been fixed, and many have been 
ignored.

Going with our test disabling policy 
(https://wiki.mozilla.org/Sheriffing/Test_Disabling_Policy), I am going to get 
ready and disable tests on Friday.  This weekend I can sort out any remaining 
issues (there are always new issues that show up as the product changes) and 
turn on --run-by-dir next week.

You can see all the bugs that are blocking bug 992911
(https://bugzilla.mozilla.org/showdependencytree.cgi?id=992911&hide_resolved=1)

Happy hacking,
-jmaher


Re: Disabling a few mochitests this week, and upcoming changes next week to browser-chrome

2014-08-19 Thread jmaher
On Tuesday, August 19, 2014 11:46:08 AM UTC-4, Gavin Sharp wrote:
>> Going with our test disabling policy
>> (https://wiki.mozilla.org/Sheriffing/Test_Disabling_Policy), I am going to
>> get ready and disable tests on Friday.
>
> A few issues here:
> - This particular case (we're making broad changes to how the tests
> run that causes many new failures) was not what that policy was meant
> to cover. We need some leeway to handle this situation differently.
> - This policy should probably clarify that any test disabling patches
> still require module owner/peer review.
>
>> You can see all the bugs that are blocking bug 992911
>> (https://bugzilla.mozilla.org/showdependencytree.cgi?id=992911&hide_resolved=1)
>
> We cannot disable all of the tests in that list wholesale. At least
> bug 1041569, bug 1041583, bug 1017187, bug 1001820, and bug 963075

Bug 1041569 - has been needinfo'd for 4 weeks; it is only disabling the test on
Windows debug.  I don't know how to reduce that any further, especially if
nobody wants to look at it.

Bug 1041583 - no activity in 4+ weeks as well.  We are only disabling it on
Windows debug.

Bug 1017187 - no reply in 4+ weeks; this is disabling the test on all debug
branches.

Bug 1001820 - idle for 3 weeks until last week, when it was recommended to
disable the test (in fact it appears to be disabled).

Bug 963075 - no activity for 4 weeks; we need to disable 2 pdf tests on opt
builds only.

All in all we are changing 12 tests in the manifests.

It is obvious, based on the lack of response in the bugs, that fixing these
bugs is not a priority.  Changing the way we run these tests will only give us
more reliable results and less churn when we need to adjust chunking or jobs.


How do you run unit tests against Firefox?

2014-11-12 Thread jmaher
On the A*Team we do many projects to bring new platforms online or make sure 
our test harnesses work in new situations (like e10s).  We hear great ideas 
from folks we interact with (sometimes we make up our own ideas), but one thing 
we don't know is how people run unit tests and what type of workflow is normal.

To answer this we created a survey:
http://goo.gl/p45Iwo

As different teams have different workflows, we would really appreciate it if
you fill out the team you work on (or do the most work for) - that is question
1 of 13 in the survey!

Now is your chance to ask for automated sandwich making!

-jmaher


Re: How do you run unit tests against Firefox?

2014-11-13 Thread jmaher
On Wednesday, November 12, 2014 5:51:40 PM UTC-5, Benjamin Smedberg wrote:
 On 11/12/2014 5:49 PM, Nicholas Nethercote wrote:
  What exactly do you mean by unit tests?
 I presumed that this meant all of our pass/fail test suites (not the 
 performance tests): xpcshell, the various flavors of mochitest, 
 reftests, etc.
 
 --BDS

Thanks for the question, Nicholas.  I do in fact mean any type of testing,
specifically testing that is done by our automation.  So, as Benjamin pointed
out: xpcshell, mochitests, reftests.  It would also include the .cpp tests.


It is time to solve making a push to Try server show a performance regression or validate a fix

2014-12-12 Thread jmaher
In the history of running Talos, there has never been an easy way to determine
if your change has fixed a regression or created a new one.  We have compare.py
and compare-talos, which are actually quite useful, but they require you to run
yet another tool - in short, you have to break your normal workflow.

Over the last year of looking at Talos regressions, it usually is pretty
obvious within 3 data points if you have a real sustained regression.  There
are many reasons for needing more than one data point; no need to go into that
here.  When you land a change to an integration tree, the automated alerts will
wait until 12 future data points are available and then determine if there is a
sustained regression or improvement.  Can we do this in a more streamlined way?

I believe so, the easy way is to abuse the 'orange' color for a job on 
treeherder.  The way I see it working is like this:
* push to try, include some talos jobs in your patch
* at the end of your job after data is uploaded, we query graph server for the 
try data and compare it to the expected range of data in mozilla-central for 
the last 7 days.
* If your data point is outside of the range we turn the job orange with a 
message to retrigger this job a couple more times
* When there are at least 3 data points (read: most likely 3 orange talos jobs
on your try push) the message will indicate you probably have a sustained
regression (a rough sketch of this comparison follows below).
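To make the comparison step concrete, here is a rough sketch of the check
described above (made-up numbers, thresholds, and names; this is not the graph
server API or the actual implementation):

def outside_baseline(try_value, central_values, tolerance=0.05):
    # flag a try data point that falls outside the range seen on
    # mozilla-central over the last 7 days, with a small tolerance
    low, high = min(central_values), max(central_values)
    return try_value < low * (1 - tolerance) or try_value > high * (1 + tolerance)

central_last_7_days = [612.0, 598.5, 605.2, 610.8, 601.4]  # hypothetical ts_paint values
try_value = 648.3

if outside_baseline(try_value, central_last_7_days):
    print("TALOS-WARNING: outside of the mozilla-central range, "
          "please retrigger this job a couple more times")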

This pattern makes a few assumptions:
1) That you will run Talos on try server
2) That you are fine with orange Talos jobs and manually retriggering
3) That we turn improvements into oranges as well

I would like to know if this, as a hack, would be useful to many.  Quite
possibly there are other ways to solve this problem that aren't so hacky;
please let us know.  I believe in the longer term (2-6 months) we could have a
view on treeherder that does a lot of this for us.

Thanks for reading- lets make things more useful!


Updating the policy for Talos performance regression in 2015

2014-12-18 Thread jmaher
Mozilla - 2015 Talos performance regression policy

Over the last year and a half the Talos tests have been rewritten to be more
useful and meaningful.  This means we need to take them seriously and cannot
just ignore real issues when we don't have time.  This does not mean we need to
fix or back out every changeset that caused a regression.

Starting in 2015, when a regression is identified to be related to a specific
changeset, the patch author will be asked for information via the needinfo
flag.  We expect a response and reasonable dialog within 72 hours (3 business
days) of requesting information.  If no response is given, we will back out the
patch(es) in question and the patch author can investigate when they have time
and reland.

Some requirements before requesting needinfo:
* On integration branches (higher volume), a talos sheriff will have verified 
the root cause within 1 week of the patch landing
* a patch or set of patches from a bug must be identified as the root cause.
This can take place through retriggers on the tree or, in the case of many
patches landing at once, through a push to try backing out the suspected
patch(es)
* links in the bug to document the regression (and any related 
regressions/improvements)
* if we are confident this is the root cause and it meets a 3% regression 
threshold, then the needinfo request will mention that this policy will be 
enforced

Acceptable outcomes:
* A promise to attempt a fix at the bug is agreed upon, the bug is assigned to 
someone and put in a queue.
* The bug will contain enough details and evidence to support accepting this 
regression, we will mark it as wontfix
* It is agreed that this should be backed out

Other scenarios:
* A bug related to the alert is not filed within 1 week of the patch landing.  
This removes the urgency and required action.
* We only caught a regression at uplift time.  There is a chance this isn't
easily determined; this will be documented, and the identified patch authors
will use their judgement to fix the bug
* Regression is unrelated to code (say pgo issue) - this should be documented 
in the bug and closed as wontfix.
* When we uplift to Aurora or Beta, all regressions filed before the uplift 
that show up on the upstream branch will have a needinfo flag set and require 
action to be taken.


Please take a moment to look over this and outline any concerns you might have.

Thanks,
Joel


Re: Updating the policy for Talos performance regression in 2015

2014-12-19 Thread jmaher
Great questions folks.

:bsmedberg has answered the questions quite well, let me elaborate:
Before a bug can be marked as resolved:fixed we need to verify the regression 
is actually fixed.  In many cases we will fix a large portion of the regression 
and accept the small remainder.

We do keep track of all the bugs filed per version (firefox 36 example: 
https://bugzilla.mozilla.org/show_bug.cgi?id=1084461)

these get looked at more specifically during each uplift.

I will update the verbiage next week to call out how these will be followed up
and post it to:
https://www.mozilla.org/hacking/regression-policy.html

Do speak up if this should be posted elsewhere or linked from a specific 
location.


Re: Talos on e10s is broken. Do we care?

2015-02-19 Thread jmaher
Right now we are running jobs on Linux/Windows/OS X and they are all timing
out.  That means we are eating up machine time for roughly an hour per job when
most jobs take 20 minutes.

By the end of the month we should either take a larger hit on our 
infrastructure and get these running on integration branches (which requires 
fixing them), or turn them off until we can get the resources to fix these.


reducing the jobs we run by default on fx-team and mozilla-inbound

2015-04-06 Thread jmaher
A couple months ago I had posted about a project I worked on called SETA:
https://elvis314.wordpress.com/2015/02/06/seta-search-for-extraneous-test-automation/

This has been rolling now for a few months and it helps show us the minimum set 
of jobs to run to detect all the failures we find.

This week we will go live with a snapshot of this data to adjust our default
builders on mozilla-inbound and fx-team.  This means we won't see the full set
of jobs, but a reduced set.  The other jobs will be scheduled, but we will
force them to be coalesced (as we already do for many debug jobs and other
platforms).  In fact, this replaces the previous forced-coalescing bits and
uses data-driven information to decide what to coalesce.
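The idea behind the data-driven selection is essentially a set-cover problem:
keep only enough jobs that every historical failure would still have been
caught by at least one of them.  A toy sketch of that idea (made-up data; this
is not the actual SETA implementation):

failures_caught_by = {
    # hypothetical mapping of job name -> historical failure ids it detected
    "linux64 debug mochitest-1": {1, 2, 5},
    "win7 opt reftest-2": {2, 3},
    "osx debug xpcshell": {4, 5},
    "android opt crashtest-1": {3, 4},
}

def pick_jobs(failures_caught_by):
    remaining = set().union(*failures_caught_by.values())
    selected = []
    while remaining:
        # greedily take the job that covers the most uncovered failures
        job = max(failures_caught_by,
                  key=lambda j: len(failures_caught_by[j] & remaining))
        selected.append(job)
        remaining -= failures_caught_by[job]
    return selected

print(pick_jobs(failures_caught_by))  # the jobs to keep running by default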

In the near future we will have this updated more automatically in our configs 
and consider doing this for android and b2g jobs as well.


Re: what is new in talos, what is coming up

2015-06-03 Thread jmaher
Things are still changing with Talos- many things are becoming easier, while 
others still have kinks to work out- here are a few things which have changed 
recently:

1) Android talos has been reduced to what is useful- it should run faster and 
sets us up for migrating to autophone next quarter
2) compare-talos is in perfherder 
(https://treeherder.mozilla.org/perf.html#/comparechooser), other instances of 
compare-talos have a warning message at the top indicating you should use 
perfherder.  We will deprecate those instances of compare-talos next quarter 
completely.
3) datazilla no longer collects talos data.  This has been stopped on all 
branches we care about and we will be turning datazilla off completely next 
month!
4) talos counters are now streamlined a bit and showing up in perfherder 
(https://treeherder.mozilla.org/perf.html#/graphs)
5) compare view in perfherder has a more realistic comparison algorithm to 
point out regressions/improvements.

upcoming work:
1) finish evaluating talos counters - collect only what is useful
2) continue polishing perfherder graphs, compare-view
3) start generating alerts from perfherder (in parallel to graph server)
4) document and work on a smoother method for getting enough data and comparing 
data points on try pushes, regressions on the tree, and running tests locally.

A lot of good ideas come in from various folks (usually folks who are 
investigating a regression or worried about a change they are making).  While 
in Whistler, we would love to show folks how these tools currently work, walk 
through the end game and ideal use cases, and brainstorm on things we are 
overlooking.

If you are interested, do look for a performance discussion on the schedule.  
It has yet to be scheduled.


Re: what is new in talos, what is coming up

2015-06-05 Thread jmaher

 
>> can we keep the
>> snarkfest version running please until this is resolved?
>
> My main concern was because I inferred from the previous post that
> deprecation of snarkfest was scheduled on a timeline basis.
>
> Can we instead schedule on a when-its-ready basis, please?

Yes - we are not doing this on a time basis; datazilla deprecation is on a time
schedule, the rest is not.  Most likely this is an August/September thing -
ideally within the next 4-6 weeks you will be using Perfherder happily for
everything!

Thanks for bringing up suggestions and using the new tools.  


Re: New policy: 48-hour backouts for major Talos regressions

2015-08-15 Thread jmaher
I did see the ts, paint regression.  This happened on 4 different platforms and 
was backed out for telemetry issues about 5 pushes later:
http://hg.mozilla.org/integration/mozilla-inbound/rev/1190bc7b862d

and the backout:
http://hg.mozilla.org/integration/mozilla-inbound/rev/59ad2812d3c7

By the time we get the alert (about 2 hours later), we would have seen the 
backout and looking at the raw data it would have been clear it was related to 
the patch which was backed out.  In this case, there would be no need to file a 
bug and pester folks.

I guess if you have questions about a specific email or alert, ask in
#developers or #perf.

We get over 1000 alerts/month these days; it is unrealistic to comment on every
alert, but it is reasonable to sanity-check them and ensure we are filing bugs
for the real regressions.


Re: what is new in talos, what is coming up

2015-07-27 Thread jmaher
It has been a while since we posted an update on Talos

Here are some new things:
* bug 1166132 - new tps test - tab switching
* e10s on all platforms, only runs on mozilla-central for pgo builds, broken 
tests, big regressions are tracked in bug 1144120
* perfherder is easier to use, with some polish on test selection and the
compare view; most importantly, we have found a few odd bugs that have caused
duplicate data to show up.  Check it out:
https://treeherder.mozilla.org/perf.html#/graphs

Here is what is upcoming:
* moving talos source code in-tree (bug 787200)
* starting to move android talos to autophone (bug 1170685)
* perfherder: easier to find it when pushing to try and more polish on 
selecting which revisions to compare against.
* automatic 5 retriggers for talos jobs on try server

As always if you have issues you can file bugs:
* talos:
https://bugzilla.mozilla.org/enter_bug.cgi?product=Testing&component=Talos
* perfherder:
https://bugzilla.mozilla.org/enter_bug.cgi?product=Tree%20Management&component=Perfherder

Thanks for responding to regressions when pinged!  Expect another update 
sometime in late August or early September.


what is new with Talos - September 2015 edition

2015-09-10 Thread jmaher
The last update was in late July:
https://groups.google.com/forum/#!topic/mozilla.dev.platform/PaJFBtvc3Vg

While we have no new tests, I would like to highlight a few changes:
* talos now lives in mozilla-central: testing/talos/.  Thanks to :parkouss our 
fearless contributor who tackled this large project.
* e10s talos is now run on all pushes
* you can now run e10s talos from try and select it logically from 
http://trychooser.pub.build.mozilla.org/
* A lot of updates are on the talos wiki: 
https://wiki.mozilla.org/Buildbot/Talos/Tests (and more upcoming)
* Perfherder compare view doesn't show osx 10.10.  We are starting to look into 
why that platform is an expensive random number generator in bug 1191019
* dromaeo_dom is turned off everywhere (but not for long)

Upcoming work:
* continue to edit the wiki: https://wiki.mozilla.org/Buildbot/Talos/Tests
* investigate noisy tests in bug 1201230 and osx specific in bug 1191019
* turn dromaeo_dom on for linux in bug 1191952
* use a python webserver instead of apache for production talos: bug 1195288
* look into making |mach talos| friendlier now that we have talos in tree
* alerts generated inside of perfherder (currently planned to be in parallel to 
graph server)
* take advantage of other shared code now that we live in tree
* start scheduling Android talos tests on Autophone (different system and 
reporting as Tier 2 or 3 in treeherder- i.e. not default view)


A lot of big hurdles are behind us but there are many more big steps to take.
A previous topic discussed the 48-hour backout policy; a lot of work has been
done to validate the data, and we are still double-checking things and will
post before making it 100% live.

As always, feedback is welcome!



Considering dropping Talos support for Linux32

2015-09-24 Thread jmaher
Our infrastructure at Mozilla does a great job of supporting all the jobs we
need to run for the thousands of pushes we do each month.  Currently Talos runs
on real hardware (and we have no plans to change that), but it does mean that
we have a limited pool of available machines.  Right now this isn't a problem
for Linux 32 or 64 since we don't run any other jobs on those platforms.

The problem we do have is on OSX and Windows, and in the last 2 weeks we have 
had a big problem with backlog on Windows.  The main reason we have a problem 
on OSX and Windows is because we run all the unit tests on there as well.  
Granted we have a larger pool of machines, but we run a considerably larger 
volume of tests on there.

Trying to be smart about what we are doing, bug 1204920 [1] was filed to look 
into what would happen if we stopped running Talos on Linux32.  We would still 
have OSX, Windows, and Linux64 support and from looking at all the data for the 
last 90 days, there are very few minor differences between 32 and 64 when it 
comes to catching regressions.

After looking into this, we realized that we could reimage the linux32 machines 
as windows machines- this would then solve our backlog and give us some 
breathing room on capacity until we find other ways to reduce the load or make 
a more formal decision to increase the machine pool.

Sadly there are really no plans to formally add back Linux32 support.  What
does Linux32 give us that Linux64 doesn't when it comes to Talos results?  I am
looking at this through a narrow lens; quite possibly someone else has ideas of
what might be more useful.  Overall we are serious about doing this, but want
to do it knowing the cost.

Thanks for reading and fixing Performance regressions when they show up in your 
patches!


[1] https://bugzilla.mozilla.org/show_bug.cgi?id=1204920


any concerns with dropping the talos test v8 and using AWFY Octane instead?

2016-01-11 Thread jmaher
Currently we run a very outdated version of V8 (version 7) in Talos.  This has 
since been replaced with Octane in the world of benchmarks.

AWFY (arewefastyet.com) has been running Octane and catching regressions
faster than Talos.  There is missing coverage in AWFY, specifically e10s, pgo,
and aurora/beta.  There are plans to add this coverage in Q1.

A main reason for pushing to turn off V8 is that the benchmark is outdated, and
investigating a regression that is only seen on V8 and not on Octane may not be
the most useful use of developers' time.  While this does point out that we are
leaning towards building performance for a specific benchmark and ignoring
other tests, we could argue that is what we should be doing.

The reason I am posting here is to find out if there are reasons we should keep 
v8 running in Talos.  We still plan to turn it off once AWFY coverage matches 
the coverage of Talos V8.

You can reference bug 1174671 for some history.


Re: any concerns with dropping the talos test v8 and using AWFY Octane instead?

2016-01-19 Thread jmaher
> >> It seems like another alternative might be to run Octane in Talos,
> >> instead of v8_7.
> >>
> >> It seems like Talos has two advantages over AWFY (correct me if I'm wrong):
> >>
> >> 1. Easy for developers to schedule jobs via try (maybe less of a concern
> >> with a benchmark like this, where I suspect results are more
> >> reproducible locally?)
> >
> > I believe there was talk of adding try support for AWFY (there already is 
> > for AWSY). Of course that's not actually done yet, I just want to point out 
> > it's not particularly hard and AWSY's version could be adapted rather 
> > easily.
> >

Running Octane in Talos would be useful, but we would be duplicating efforts.
While that is not a bad thing, the question is whether we would get value from
it.  We do get value from self-serve support on try and tools like mozci for
backfilling and retriggering.  The issue in the bug also points out that the
developers who care about Octane use AWFY and already detect these regressions
before Talos does (or at least a sheriff finds it and files a bug).  This
specific topic of upgrading V8 to Octane for Talos should be discussed in bug
1174671.


> Talos already runs on non-virtualized hardware. I don't see any inherent 
> reason we couldn't rework AWSY as a Talos test. In general it feels to 
> me like we should be running performance tests on relops-supported 
> infrastructure where possible, as opposed to adhoc systems.

I would agree that the more we can run on managed systems the better.  While
all Talos jobs are run on non-virtualized hardware today, we do run on a shared
pool of hardware with the unittests.  One difference between AWFY and Talos is
that AWFY's numbers are much more stable, even in the browser version (AWFY
runs a JS shell as well as a browser).  I believe this is attributable to the
type of hardware, the environment, or the fact that a specific test is run on a
specific machine.

> > In general it would be great if we could consolidate the various perf tests 
> > (AWFY, AWSY, Talos, Raptor, etc) under one umbrella (at least from an end 
> > user perspective). So you could go to trychooser and choose a "Perf" option 
> > that would have various subsets like: "JS Engine", "Memory Usage", "Layout 
> > Latency", "Mobile Launch Time", etc.
> >

This is a worthwhile goal - simplifying the interface to the tools over the
next few quarters to allow for common sheriffing and self-serve will let us
make big strides.

> 
> 1. It assumes that all test machines of a particular class will be 
> uniform, at least per test. For example, Autophone tracks the 
> performance of something like 9 different Android devices seperately 
> (see: http://phonedash.mozilla.org/) -- that's not something Perfherder 
> was designed to do.

As mentioned earlier in this comment, AWFY runs the same test on the same
machine, so the numbers are more reliable; there is no further evidence that
this is the cause of the noise in Talos, but I suspect it is a factor.



Re: Moving FirefoxOS into Tier 3 support

2016-01-26 Thread jmaher
> Same here. I'm now unable to land the gecko dependencies for bug 1227980
> [1] and that's annoying to say the least considering it took a month to
> write and test that thing (not even mentioning the time spent by the
> many reviewers and QA people involved). What are we supposed to do with
> those kind of patches which are b2g-only? Shall we land them ourselves
> on mozilla-central?

I would assume all gecko patches would get landed on either mozilla-inbound or 
fx-team.  If you use mozreview and take advantage of the autoland feature, then 
these specific patches will land on mozilla-inbound.


Re: Proposed changes to Talos (performance) alerting

2016-01-19 Thread jmaher
> 
> This is just a raw idea, but maybe this would make more sense to provide a 
> diff of profiles, and show what decreased / increased.  At least this would 
> make these benchmarks less obscure.
> 
Pushing a before/after patch to try with profiling (note the numbers are not 
useful) can be done and there is a simple checkbox on trychooser.  I am not 
familiar with any diff tools and would wonder how much noise would show up.  It 
does seem worthwhile.


b2g builds on trunk- perma failing for weeks?

2016-03-21 Thread jmaher
I have noticed that since March 4th, the b2g builds on m-c are perma-failing.
Doing some basic searching, we have about 15 hours of machine time going to
failed builds on every push (which is ~1350 builds, or 20,250 machine hours).

These show up in mozreview as failed jobs when you autoland a push.  I assume
we plan to get all of these builds and tests green; otherwise we wouldn't keep
them running on every push for inbound/fx-team/central.

Do we need to keep these running on inbound and fx-team, or can they run only on mozilla-central?  I assume somebody is working on getting the builds to green up; could we be made aware of the work being done here (maybe a tracking bug) so it doesn't seem that we are just letting builds run because we can?
___
dev-platform mailing list
dev-platform@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-platform


planning to turn off talos posting to graphs.mozilla.org

2016-03-01 Thread jmaher
Historically talos has posted to graphs.mozilla.org.  In fact, 
graphs.mozilla.org collects the summarized data from Talos and generates alerts 
which are posted to mozilla.dev.tree-alerts.

Over the last 6 months we have been collecting all the summarized data and subtest data inside of Perfherder (https://treeherder.mozilla.org/perf.html#).
Last quarter we started generating alerts from there and managing them inside a 
dashboard for perfherder: https://treeherder.mozilla.org/perf.html#/alerts.

As we now have confidence in Perfherder catching regressions from Talos, we 
would like to turn off alert generation and data uploading from Talos to 
graphserver.  If there are other tools using graph server for data, then we 
need to find those and either update them or remove them.

The one big advantage in doing this now, is that we don't have to add custom 
database entries to the graph server database anymore when we change a test or 
add a new one.

Please chime in with any concerns, I would like to make this change this week 
so we can ride the trains to Aurora.
___
dev-platform mailing list
dev-platform@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-platform


Re: Tier-1 for Fennec api-15 Debug builds in TaskCluster at end of week (July 1st)

2016-06-29 Thread jmaher
On Wednesday, June 29, 2016 at 7:37:51 PM UTC+3, jl...@mozilla.com wrote:
> On Tuesday, June 28, 2016 at 2:35:22 PM UTC-7, David Baron wrote:
> 
> > 
> > Why is it vital to opt jobs but less so for debug jobs?
> 
> you're right. I worded this poorly. I mean more that opt builds are part of 
> the critical path: they are the actual builds that we promote to releases and 
> therefore more important.
> 
> it sounds like you personally don't mind triggering new jobs missing on 
> fennec debug for a few weeks. My hope is that others feel the same.

I would prefer to push this out a week.  We just got builds running on 
inbound/fx-team this week, and there are downstream consumers that depend on 
buildbot builds which need to be updated (and couldn't until this week at the 
earliest).

I would also like to make sure that failures from builds and tests are 
understandable to the sheriffs.

July 11th seems like a better candidate to switch so we can find other issues 
which might be related.
___
dev-platform mailing list
dev-platform@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-platform


Project Stockwell - February 2017 update

2017-02-07 Thread jmaher
This is the second update of project stockwell (first update: 
https://goo.gl/1X31t8).

This month we will be recommending and asking that intermittent failures that 
occur >=30 times/week be resolved within 2 weeks of triaging them.

Yesterday we had these stats:
Orange Factor: 10.75 (https://goo.gl/qvFbeB)
count(high_frequency_bugs): 61

Last month we had these stats:
Orange Factor: 13.76 (https://goo.gl/o5XOof)
count(high_frequency_bugs): 42
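
For anyone unfamiliar with the metric: Orange Factor is, roughly, the average number of intermittent ("orange") test failures per push over a window of time.  A minimal sketch of that calculation in Python (the numbers and function are illustrative only, not the actual OrangeFactor implementation):

# Rough illustration of an Orange Factor style metric: classified
# intermittent ("orange") failures per push over a window of time.
# The real OrangeFactor tool queries its own database of starred failures.
def orange_factor(num_failures, num_pushes):
    if num_pushes == 0:
        return 0.0
    return num_failures / float(num_pushes)

# e.g. 5375 classified failures across 500 pushes gives a factor of 10.75
print(orange_factor(5375, 500))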

For more details of the bugs and what we are working on, you can read more on 
this recent blog post:
https://elvis314.wordpress.com/2017/02/07/project-stockwell-february-2017/

Thanks for helping out with intermittent test failures when we ping you about 
them!
___
dev-platform mailing list
dev-platform@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-platform


upcoming meeting announcement - intermittent orange hacking

2016-10-03 Thread jmaher
Every 2 weeks on Friday at 9 PDT [1] we will be hosting a meeting to discuss 
intermittent oranges.  The format gives people with previously raised topics a chance to report status and findings; the rest of the meeting is used to discuss and surface ideas worth investigating further.

Our topics for the October 7th meeting are:
* [jmaher] defining intermittent oranges (are there logical categories we can 
break failures into?)
* [gbrown] disabling unreliable tests (sheriffs point of view vs developers 
point of view, and what makes sense to ship a quality product)

Please consider joining us or reaching out to us to give input, share ideas and 
your perspective.

I do not plan to post ahead of every meeting, but I will mention in November 
where we are as well as options to learn more at an upcoming work week.

[1] https://wiki.mozilla.org/Auto-tools/Projects/Stockwell/Meetings
___
dev-platform mailing list
dev-platform@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-platform


Upcoming intermittent orange hacking meeting Tuesday November 8th

2016-11-04 Thread jmaher
On Tuesday, November 8th we will be holding another intermittent orange hacking 
meeting at 08:30 PDT:
https://wiki.mozilla.org/Auto-tools/Projects/Stockwell/Meetings

This week we will discuss:
* Triaging intermittents via OrangeFactor
* What makes up a good or a bad test case?

The wiki with full notes is here:
https://wiki.mozilla.org/EngineeringProductivity/Projects/Stockwell/Meetings/2016-11-08#Discussion_Topics

Please feel free to attend the meeting or provide feedback/questions in this 
post, the above mentioned wiki, on irc, or email me directly.

___
dev-platform mailing list
dev-platform@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-platform


Project Stockwell January update

2017-01-10 Thread jmaher
Every month I will be posting here to dev.platform with a summary of how we are 
doing with intermittents.

Yesterday we had:
* Orange Factor of 13.76
* count(high_frequency_bugs): 42

I posted some notes on what some of the common intermittents are, as well as which projects are active:
https://elvis314.wordpress.com/2017/01/09/project-stockwell-january-2016/

Feel free to join our next fortnightly meeting next Tuesday (January 17th) 
@08:30 PDT in the jmaher vidyo room.
___
dev-platform mailing list
dev-platform@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-platform


Re: unowned module: Firefox::New Tab Page, help me find an owner

2017-03-22 Thread jmaher
On Wednesday, March 22, 2017 at 9:35:35 AM UTC-4, Ben Kelly wrote:
> On Wed, Mar 22, 2017 at 9:22 AM,  wrote:
> 
> > I have not been able to find an owner for the Firefox::New Tab Page
> > bugzilla component (bug 1346908).  There are 35 tests in the tree and
> > without anyone to assume responsibility for them when they are intermittent
> > (bug 1338848), I plan to delete them all if I cannot get an owner by the
> > end of the month including someone who will sign up to be the triage owner
> > for the bugzilla component.
> >
> 
> You plan to delete all the tests?  This seems somewhat extreme for a
> shipped feature.  Why not disable just the tests that are intermittent?

I agree that does sound extreme, but if nobody cares about the tests or will 
accept responsibility if we have failures, then they have no real value.  I 
assume they have value and this is why I am asking for help to find someone who 
cares about the code and will take responsibility for the feature and related 
tests.

___
dev-platform mailing list
dev-platform@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-platform


unowned module: Firefox::New Tab Page, help me find an owner

2017-03-22 Thread jmaher
I have not been able to find an owner for the Firefox::New Tab Page bugzilla 
component (bug 1346908).  There are 35 tests in the tree and without anyone to 
assume responsibility for them when they are intermittent (bug 1338848), I plan 
to delete them all if I cannot get an owner by the end of the month including 
someone who will sign up to be the triage owner for the bugzilla component.

Thanks for helping me find owners for tests!
___
dev-platform mailing list
dev-platform@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-platform


Project Stockwell (reducing intermittents) - April 2017 update

2017-04-11 Thread jmaher
I wanted to give an update on our work to reduce intermittents.  Last month's post generated a lot of discussion and concern around disabling tests, as well as around some of the terminology and process.

I have outlined many pieces of data about the progress and rates of bugs and 
disabling tests in my blog post earlier today:
https://elvis314.wordpress.com/2017/04/11/project-stockwell-reduce-intermittents-april-2017/

Given the push for disabling more tests, we ended up disabling 30 tests but fixing >70, so a much higher percentage of tests is being fixed than in previous months!

Overall, the Orange Factor is hanging out around 10.0 even while we are adding a lot of new tests and test jobs, so this is great news.  While we would have liked to end Q1 with something closer to 8.0, we remain optimistic that we can still make a real difference going forward and continue to drive this down.  Our focus will be on proving out our process and working towards making this easier for dev teams and other sheriffs to self-manage.

Thanks for fixing so many intermittent failures!
___
dev-platform mailing list
dev-platform@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-platform


Re: Project Stockwell (reducing intermittents) - March 2017 update

2017-03-09 Thread jmaher
A lot of great discussion here, thanks everyone for taking some time out of 
your day to weigh in on this subject.  There is a difference between a bug simply being filed and a bug being actively worked on once it crosses our threshold of 30 failures/week; here I want to discuss the point where we have looked at the bug and tried to add context and value, including a ni? request.

Let me comment on a few items here:
1) BUG_COMPONENT: I have been working this quarter to get this completed in-tree (https://bugzilla.mozilla.org/show_bug.cgi?id=1328351).  Ideally the sheriffs and the Bug Filer tools will use this metadata; we can work to make that happen.  Part of this is ensuring there is an active triager responsible for those components, which is mostly done: https://bugzilla.mozilla.org/page.cgi?id=triage_owners.html.
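
For reference, the in-tree annotation is a BUG_COMPONENT entry in a directory's moz.build file; a minimal sketch (the product/component pair below is a placeholder, not a recommendation):

# moz.build -- minimal sketch of the in-tree bug component annotation.
# The ("Product", "Component") pair is a placeholder; real moz.build
# files point at the Bugzilla product/component that owns the files.
with Files("**"):
    BUG_COMPONENT = ("Testing", "General")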

2) How do we get the right people to see the bugs?  We will always ni? the triage owner unless we know a better person to send the ni? request to.  When we determine that a specific patch caused the regression, we will also ni? the patch author and cc the reviewer on the bug.  Please watch your components in Bugzilla and keep your Bugzilla handle updated when you are on PTO.

3) To the point of not clearing the ni? on a bug where we disable the test case: that is easy to do, so let's assume it is standard protocol whenever we disable a test (or hack up the test case).

4) More granular whiteboard tags, and ones that don't use "stockwell": we will figure out the right naming; right now it will most likely be extra tags to track when we fix a previously disabled test, as well as to differentiate between test fixes and product fixes.

5) When we triage a bug (the initial investigation after it crosses 30 failures/week), we will include a brief report of the configuration affected the most, along with the number of failures, the number of runs, and the failure rate.  This will be retrieved using |mach test-info | (see bug 1345572 for more info) and will look similar to this:
Total: 307 failures in 4313 runs or 0.071 failures/run
Worst rate on linux32/debug-e10s: 73 failures in 119 runs or 0.613 failures/run
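
A minimal sketch of how that summary could be computed (the counts are the ones from the example above; real data would come from the test-info tooling rather than being hard-coded):

# Sketch of the failure-rate summary shown above.
counts = {
    "all configurations": (307, 4313),   # (failures, runs)
    "linux32/debug-e10s": (73, 119),
}

def rate(failures, runs):
    return failures / float(runs)

total_f, total_r = counts["all configurations"]
print("Total: %d failures in %d runs or %.3f failures/run"
      % (total_f, total_r, rate(total_f, total_r)))

worst = max((cfg for cfg in counts if cfg != "all configurations"),
            key=lambda cfg: rate(*counts[cfg]))
worst_f, worst_r = counts[worst]
print("Worst rate on %s: %d failures in %d runs or %.3f failures/run"
      % (worst, worst_f, worst_r, rate(worst_f, worst_r)))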

6) Using a different metric/threshold for investigating a bug: we looked at 6 months of data from 2016 to come up with this number.  Even assuming we fixed all of the high-frequency bugs, the Orange Factor would still be 4.78 (as of Monday), which is still unacceptable; we are only interested in investigating tests that have the highest chance of getting fixed or that cause the most pain, not just whatever is in the top 10 or relatively high.  My goal is to adjust the threshold down to 20 in the future, though that might not be as realistic as I would hope in the short term.

Keep in mind that sheriffs are human; they make mistakes (filing bugs wrong, ni?ing the wrong person, etc.), but they are also flexible and will work with you to gather more information, manage a larger volume of failures, or allow extra time if you are actively debugging the problem.

Thanks for the many encouraging comments in this thread and suggestions of how 
to work out the quirks with this new process.
___
dev-platform mailing list
dev-platform@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-platform


Re: Project Stockwell (reducing intermittents) - March 2017 update

2017-03-07 Thread jmaher
On Tuesday, March 7, 2017 at 2:57:21 PM UTC-5, Steve Fink wrote:
> On 03/07/2017 11:34 AM, Joel Maher wrote:
> > Good suggestion here- I have seen so many cases where a simple
> > fix/disabled/unknown/needswork just do not describe it.  Let me work on a
> > few new tags given that we have 248 bugs to date.
> >
> > I am thinking maybe [stockwell turnedoff] - where the job is turned off- we
> > could also ensure one of the last comments indicates this.
> >
> > also [stockwell fix] -> [stockwell testfix], [stockwell bandaid] (for those
> > requestLongerTimeouts(), etc.), [stockwell productfix], and [stockwell
> > reneabled].
> 
> Forgive the bikeshedding, but my kneejerk reaction to these is to wonder 
> whether it's a good idea to use the "stockwell" jargon. It would be a 
> lot easier for people unfamiliar with the stockwell project if these 
> were [intermittent turnedoff], [intermittent fix], etc. Perhaps it's too 
> late, but is that a possibility?

I think that is valid, thanks for bringing that up!
___
dev-platform mailing list
dev-platform@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-platform


Re: Sheriff Highlights and Summary in February 2017

2017-03-11 Thread jmaher
On Friday, March 10, 2017 at 3:46:14 PM UTC-5, Kris Maglione wrote:
> On Fri, Mar 10, 2017 at 01:55:40PM +, David Burns wrote:
> >I went back and did some checks with autoland to servo and the results are
> >negligible. So from 01 February 2017 to 10 March 2017 (as of sending this
> >email). I have removed merge commits from the numbers.
> >
> >Autoland:
> >Total Servo Sync Pushes: 152
> >Total Pushes: 1823
> >Total Backouts: 144
> >Percentage of backouts: 7.8990674712
> >Percentage of backouts without Servo: 8.61759425494
> >
> >Mozilla-Inbound:
> >Total Pushes: 1472
> >Total Backouts: 166
> >Percentage of backouts: 11.277173913
> 
> Is there any way you can get these numbers in terms of patches, rather than 
> pushes? Or, ideally, in terms of bugs landed and backed out? Pushes to 
> inbound 
> still often have patches for more than one bug, so if 4 bugs bets pushed to 
> inbound in one push, and 4 land on autoland as separate pushes, and one gets 
> backed out from each branch, the comparison isn't very useful.

I have been asking the same question, and from some initial data it looks like we would have 2075 bugs changed with 166 backouts, or an 8.0% backout rate.  I think David is working on validating this data, but to me it shows that code quality is the same between branches; we are just landing more bugs on inbound.

There is more guesswork for the sheriffs when it comes to backing things out of multi-bug pushes; I would be curious how many times (in the last few months) we have had a backout on inbound that didn't fix the problem due to a multi-bug push.
___
dev-platform mailing list
dev-platform@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-platform


Project Stockwell (reducing intermittents) - March 2017 update

2017-03-07 Thread jmaher
In recent months we have been triaging high frequency (>=30 times/week) 
failures in automated tests.  We find that we are fixing 35% of the bugs and 
disabling 23% of them.

The great news is we are fixing many of the issues.  The sad news is we are 
disabling tests, but usually only after giving a bug 2+ weeks of time to get 
fixed.

In March, we want to find a way to disable the tests that are causing the most pain or are least likely to be fixed, without unduly jeopardizing the chance that these bugs will be fixed.  We propose:
1) all high frequency (>=30/week) intermittent failure bugs will have 2 weeks 
from initial triage to get fixed, otherwise we will disable the test case.
2) all very high frequency bugs (>=75/week) will have 1 week from initial 
triage to get fixed, otherwise we will disable the test case.

We still plan to only pester once/week.  If a test has fallen out of our 
definition of high frequency, we are happy to make a note in the bug and adjust 
expectations.

Since we are changing this, we expect a few more disabled tests, but we do not expect it to shift the balance of fixed vs. disabled.  We also expect our Orange Factor to be <7.0 by the end of the month.

Thanks to everyone for their on-going efforts to fix frequent intermittent test 
failures; together we can make test results more reliable and less confusing 
for everyone.

Here is a blog post with more data and information about the project:
https://elvis314.wordpress.com/2017/03/07/project-stockwell-reduce-intermittents-march-2017/
___
dev-platform mailing list
dev-platform@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-platform


Re: Project Stockwell (reducing intermittents) - March 2017 update

2017-03-08 Thread jmaher
On Tuesday, March 7, 2017 at 11:45:38 PM UTC-5, Chris Pearce wrote:
> I recommend that instead of classifying intermittents as tests which fail > 
> 30 times per week, to instead classify tests that fail more than some 
> threshold percent as intermittent. Otherwise on a week with lots of checkins, 
> a test which isn't actually a problem could clear the threshold and cause 
> unnecessary work for orange triage people and developers alike.
> 
> The currently published threshold is 8%:
> 
> https://wiki.mozilla.org/Sheriffing/Test_Disabling_Policy#Identifying_problematic_tests
> 
> 8% seems reasonable to me.
> 
> Also, whenever a test is disabled, not only should a bug be filed, but please 
> _please_ need-info the test owner or at least someone on the affected team.
> 
> If a test for a feature is disabled without the maintainer of that feature 
> knowing, then we are flying blind and we are putting the quality of our 
> product at risk.
> 
> 
> cpearce.
>

Thanks cpearce for the concern here.  Regarding disabling tests: every test we have disabled as part of the stockwell project started out with a triage where we ni? the responsible party, and the bug is filed in the component the test is associated with.  I assume that if the bug is filed in the right component, others from the team will be made aware of it.  Right now I assume the triage owner of a component owns the tests and can proxy the request to the correct person on the team (many times the original author is on PTO, busy with a project, has left the team, etc.).  Please let me know if this is a false assumption and what we could do to better get bugs in front of the right people.

I agree 8% is a good number; the sheriff policy has other criteria as well (top 20 on Orange Factor, 100 times/month).  We picked 30 times/week because that is the point where bugs become frequent enough to reproduce easily (locally or on try) and it is reasonable to expect a fix.  There is ambiguity when using a percentage: on a low-volume week (as most of December was) we see <500 pushes/week, and the percentage doesn't reflect how many times the test was actually run, which is affected by SETA (reducing the tests run on 4 of 5 commits to save on load) and by people doing retriggers/backfills.  If last week the test was at 8% and it is at 7% this week, do we ignore it?

Picking a single number like 30 times per 7 days removes that ambiguity and lets us stay focused without worrying about recalculations.  It is true that on lower-volume weeks 30 times per 7 days doesn't happen as often, yet we have always had plenty of bugs to work on with that threshold.
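
To make the percentage-threshold ambiguity concrete, here is a small sketch (the push and run counts are made up):

# Sketch: why a fixed failures/week threshold is easier to act on than a
# percentage.  The same failure count clears an 8% bar or not depending on
# how many times the test actually ran that week (SETA, retriggers and
# low-volume weeks all change the denominator).
FIXED_THRESHOLD = 30      # failures per week
PERCENT_THRESHOLD = 0.08  # fraction of runs that fail

def is_high_frequency_fixed(failures):
    return failures >= FIXED_THRESHOLD

def is_high_frequency_percent(failures, runs):
    return runs > 0 and failures / float(runs) >= PERCENT_THRESHOLD

# 35 failures is high frequency either way on a quiet week (400 runs)...
print(is_high_frequency_fixed(35), is_high_frequency_percent(35, 400))
# ...but falls below the 8% bar on a busy week (1000 runs), even though
# the absolute pain on the tree is the same.
print(is_high_frequency_fixed(35), is_high_frequency_percent(35, 1000))
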
___
dev-platform mailing list
dev-platform@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-platform


Re: Project Stockwell (reducing intermittents) - March 2017 update

2017-03-07 Thread jmaher
On Tuesday, March 7, 2017 at 1:53:48 PM UTC-5, Marco Bonardo wrote:
> On Tue, Mar 7, 2017 at 6:42 PM, Joel Maher  wrote:
> 
> > Thank for pointing that out.  In some cases we have fixed tests that are
> > just timing out, in a few cases we disable because the test typically runs
> > much faster (i.e. <15 seconds) and is hanging/timing out.  In other cases
> > extending the timeout doesn't help (i.e. a hang/timeout).
> >
> 
> Any failure like "This test exceeded the timeout threshold. It should be
> rewritten or split up. If that's not possible, use requestLongerTimeout(N),
> but only as a last resort" is not a failure nor a timeout.
> For these cases extending the timeout will 100% solve the failure, but it
> can't be considered a long term fix since we should not have single tests
> so complex to take minutes to run.

Thanks for checking up on this.  There are 6 specific bugs with this signature in the disabled set; in this case they are all linux32-debug devtools tests.  We disabled devtools on linux32-debug because the runtime in many cases exceeded 90 seconds for a test (even after adding requestLongerTimeout), where on linux64-debug it was <30 seconds.  There were almost no users of devtools on linux32-debug, so we found it easier to disable the entire suite to save runtime, sheriff time, etc.  This was done in bug 1328915.

I would encourage you to look through the [stockwell fixed] whiteboard tag and 
see many examples of great fixes by many developers.
___
dev-platform mailing list
dev-platform@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-platform


Re: Project Stockwell (reducing intermittents) - March 2017 update

2017-03-07 Thread jmaher
On Tuesday, March 7, 2017 at 1:59:14 PM UTC-5, Steve Fink wrote:
> Is there a mechanism in place to detect when disabled intermittent tests 
> have been fixed?
> 
> eg, every so often you could rerun disabled tests individually a bunch 
> of times. Or if you can distinguish which tests are failing, run them 
> all a bunch of times and pick apart the wreckage to see which ones are 
> now consistently passing. I'm not suggesting those, just using them as 
> example solutions to illustrate what I mean.
> 
> On 03/07/2017 10:33 AM, Honza Bambas wrote:
> > I presume that when a test is disabled a bug is filed and triaged 
> > within the responsible team as any regular bug.  Only that way we 
> > don't forget and push on fixing it and returning back to the wheel.
> >
> > Are there also some data or stats how often tests having a strong 
> > orange factor catch actual regressions?  I.e. fail a different way 
> > than the filed "intermittent" one and uncover a real bug leading to a 
> > patch back out or filing of a regular functionality regression bug.  
> > If that number is found to be high(ish) for a test, the priority of 
> > fixing it after its disabling should be raised.
> >
> > -hb-
> >
> >

I am happy to see the discussion here.  Overall, we do not have data to indicate whether we are fixing a bug in the product or just patching a test.  I agree we should track that and I will try to do so going forward.  I recall 1 case of that happening this quarter; I suspect there are others.

Most of the disabled tests are tracked in bugs marked leave-open, with the relevant developers already on the bug; what value would a new bug bring?  If it would be better, I am happy to create a new bug.

I have seen 1 bug get fixed after being disabled, but that is it for this quarter.  Possibly there are others, but it is hard to know.  If we followed the tree rules for visibility, many of the jobs would be hidden and we would get no value from them.

I think running the disabled tests on try once in a while seems useful; one could argue that is the role of the people who own the test, but possibly we could make it easier to do.  I could see adding a |tag = disabled| annotation and running the disabled tests x20 in something like a nightly M(d) job to indicate the disabled status (a rough sketch of the idea is below).  If we were to do that, who would look at the results, and how would we get that information to all of the teams who care about the tests?
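
A rough sketch of what that could look like (the tag name, manifest paths, and x20 run are all hypothetical; the real harnesses use manifestparser rather than a plain ini parser):

# Hypothetical sketch: find tests carrying a "disabled" tag in ini-style
# manifests so they could be re-run periodically (e.g. x20 in a nightly job).
# Real manifests are read with manifestparser; ConfigParser is used here only
# to keep the sketch self-contained.
import configparser
import glob

def tests_tagged(tag, manifest_glob="**/mochitest.ini"):
    tagged = []
    for path in glob.glob(manifest_glob, recursive=True):
        parser = configparser.ConfigParser(allow_no_value=True)
        parser.read(path)
        for section in parser.sections():   # one section per test file
            tags = parser.get(section, "tags", fallback="")
            if tag in tags.split():
                tagged.append((path, section))
    return tagged

if __name__ == "__main__":
    for manifest, test in tests_tagged("disabled"):
        print("would re-run %s from %s x20" % (test, manifest))
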
___
dev-platform mailing list
dev-platform@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-platform


Re: disabled non-e10s tests on trunk

2017-08-15 Thread jmaher
Thanks everyone for commenting on this thread.  As a note, we run many tests in 
non-e10s mode:
* android mochitest, reftest, crashtest, marionette,
* mochitest-chrome
* xpcshell
* gtest/cpptest/jittest
* mochitest-a11y
* mochitest-jetpack (very few tests remain)

While there are many tests which individually are disabled or lacking coverage, 
these test suites have no non-e10s coverage:
* web-platform-tests
* browser-chrome
* devtools
* jsreftests
* mochitest-webgl, mochitest-gpu, mochitest-media
* reftest un-accel

I would propose running the above suites on windows7-opt in non-e10s mode (we don't have these running on Windows 10 yet, although we are close), limited to the specific tests which are not run in e10s mode, and turning them off on December 29th, 2017.

Keep in mind that we have had many years to get all of our tests running in e10s mode, and we have known since last year that Firefox 57 would be the first release where we ship e10s by default to our users.  My proposal is a 4-month temporary measure to let test owners finish ensuring their tests run in e10s mode.
___
dev-platform mailing list
dev-platform@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-platform


disabled non-e10s tests on trunk

2017-08-08 Thread jmaher
As Firefox 57 is on trunk, we are shipping e10s by default.  This means that 
our primary support is for e10s.  As part of this, there is little to no need 
to run duplicated tests in non-e10s and e10s mode.  

In bug 1386689, we have turned them off.  There was some surprise in doing this 
and some valid concerns expressed in comments in the bug.  Given that, I 
thought we should bring this information to a wider audience on dev.platform so 
more developers are aware of the change.

While we get some advantages from not running duplicated tests (faster try results, smaller backlogs, fewer intermittent failures), there might be compelling reasons to keep running some tests in non-e10s mode for specific coverage.  With that said, any tests we turn back on as non-e10s must have a clearly marked end date; this is only a temporary measure while we schedule the work to gain this coverage fully with e10s tests.

Also keep in mind that most if not all of the CI/scheduling/admin benefits of 
turning off the tests are already accounted for with the new stylo tests we are 
running in parallel on osx and windows.
___
dev-platform mailing list
dev-platform@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-platform


Re: disabled non-e10s tests on trunk

2017-08-18 Thread jmaher
Yesterday I landed bug 1391371 which enabled non-e10s unittests on windows 7 
debug.  A few tests had to be disabled in order to do this.  To keep track of what we did and which test suites still need to be evaluated, I have filed bug 1391350.

As a note we now have the following non-e10s tests running:
windows7-debug: all trunk branches
android opt/debug: all trunk branches (existing media, plain, reftest)
linux64-jsdcov: mozilla-central (mochitest-plain/browser-chrome/devtools)
** this is a linux64 opt build and we use the jsdebugger to collect code 
coverage metrics- but we specifically run this in non-e10s mode.

Please let me know if there are large areas that we have overlooked.

Thanks!
___
dev-platform mailing list
dev-platform@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-platform


Re: disabled non-e10s tests on trunk

2017-08-16 Thread jmaher
On Wednesday, August 16, 2017 at 4:03:20 PM UTC-4, Nils Ohlmeier wrote:
> > On Aug 16, 2017, at 07:23, James Graham  wrote:
> > 
> > On 16/08/17 01:26, Nils Ohlmeier wrote:
> >> I guess not a lot of people are aware of it, but for WebRTC we still have 
> >> two distinct implementations for the networking code.
> >> So if I understand the impact here right we just lost test coverage for 
> >> probably a couple of thousand lines of code.
> > […]
> > 
> >> I’m not sure how others do it, but our low level C++ unit tests don’t have 
> >> an e10s mode at all.
> >> Therefore we can’t simply delete the non-e10s WebRTC networking code 
> >> either (without loosing a ton of test coverage).
> > 
> > If the networking code is only covered by C++ unit tests, there is separate 
> > code for non-e10s vs e10s,  and the unit tests don't work in e10s mode 
> > doesn't that mean we currently don't have any test coverage for our 
> > shipping configuration on desktop? What am I missing?
> 
> So we have mochitest-media which works as kind of integration test on a 
> higher level. They execute high level JS API tests, but also try to ensure 
> that the lower level networking pieces (the once which are exposed through 
> JS) match the expectations.
> The mochitest-media got executed for e10s and non-e10s and therefore covered 
> both implementations.
> 
> And then we have C++ unit tests, which cover a lot more corner cases of 
> different scenarios for networking. And yes these only work with non-e10s 
> right now. It would be a lot of work to create the same amount of tests with 
> a higher level test suite like mochitest to get the e10s coverage. Plus these 
> tests would probably take a lot execution time.
> 
> Technically that leaves us with a somewhat blind spot for the coverage of 
> networking corner cases under e10s. I guess if there is a high demand for 
> turning off all non-e10s tests we need to look at how to get our C++ unit 
> tests working with something like e10s.
> But until we can get to that I think we really should keep running 
> mochitest-media with e10s and without it.
> 
> Best
>   Nils Ohlmeier

As a note, the C++ tests (cppunit, gtest) will continue to run in non-e10s mode and there are no plans to run those as e10s; this change mostly refers to mochitest*, web-platform-test*, reftest*, and marionette*.
___
dev-platform mailing list
dev-platform@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-platform


Re: Improving visibility of compiler warnings

2017-05-19 Thread jmaher

It is great to see a good use for compiler warnings and alerts.  We have added a lot of data to Perfherder, but the build metrics are not covered by any sheriffs by default.  For these, if it is clear who introduced a regression we will leave a note in the bug, but we do not intend to file bugs for regressions here.

Joel
___
dev-platform mailing list
dev-platform@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-platform


Re: Linux builds now default to -O2 instead of -Os

2017-06-06 Thread jmaher
On Tuesday, June 6, 2017 at 3:17:20 PM UTC-4, Ben Kelly wrote:
> On Tue, Jun 6, 2017 at 3:07 PM, Chris Peterson 
> wrote:
> 
> > On 6/6/17 10:33 AM, Boris Zbarsky wrote:
> >
> >> On 6/1/17 9:04 PM, Mike Hommey wrote:
> >>
> >>> Ah, forgot to mention that. No, it doesn't affect *our* shipped builds
> >>> (because PGO uses a different set of optimization flags).
> >>>
> >>> But it does affect downstream builds that don't PGO.
> >>>
> >>
> >> Based on the jump I see on June 2 at https://treeherder.mozilla.org
> >> /perf.html#/graphs?timerange=2592000=%5Bmozilla-
> >> central,80984697abf1f1ff2b058e2d9f0b351fd9d12ad9,1,1%5D&
> >> series=%5Bmozilla-central,ae68c64ef8bfa104fded89971f1c2c6c90
> >> 926dca,1,1%5D=%5Bmozilla-central,dd55da63ebce86ee3867
> >> aa3b39975c2a90869ce2,1,1%5D it affects some of our talos tests too (the
> >> ones running on non-pgo).
> >>
> >
> > We stopped Talos testing of non-e10s builds on May 14, but it looks like
> > we also stopped testing Linux PGO builds on May 15. Is that expected?
> >
> 
> Why did we stop talos testing non-e10s?  Firefox for android is a tier 1
> platform (right?) and uses non-e10s.  Do we have separate fennec talos
> tests somewhere?

We disabled non-e10s Talos because we needed bandwidth on our physical hardware to stand up new tests (web extensions, quantum pageload with https/mitmproxy).

Android is done on Autophone, that is 100% separate and we log the data to 
perfherder (2 regressions were triaged today!)

As for linux64-pgo not having Talos data, that was an accident and I filed https://bugzilla.mozilla.org/show_bug.cgi?id=1370663.  Hopefully we can get that resolved.  Our focus has been on Windows, so Linux/OSX has not received the same fine-tooth-comb treatment.  Thanks for bringing this up.
___
dev-platform mailing list
dev-platform@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-platform


Intermittent oranges and when to disable the related test case - a simplified policy

2017-09-06 Thread jmaher
Over the last 9 months a few of us have really watched intermittent test 
failures almost daily and done a lot to pester people as well as fix many.  
While there are over 420 bugs that have been fixed since the beginning of the 
year, there are half that many (211+) which have been disabled in some form 
(including turning off the jobs).

We don't like disabling tests and have been pretty relaxed in recommending it.  Overall we have tried to adhere to a policy of:
* >=30 failures/week: ask the owner to look at the failure and fix it; if this persists for a few weeks with no real traction, we would go ahead and recommend disabling it.
* >=75 failures/week: ask for a fix in a shorter time frame and recommend disabling the test in a week or so.
* >=150 failures/week: often just disable the test.

This is confusing and hard to manage.  Since then we have started adjusting the triage queries, and some teams are doing their own triage, so we skip those bugs (they are being prioritized properly already).

What we are looking to start doing this month is adopting a simpler policy:
* any bug that has >=200 instances in the last 30 days will be disabled
** this will be a manual process, so it will happen a couple times/week

We expect the outcome of this to be a similar amount of disabling, just with an easier method for doing so.  It is very possible we might recommend disabling a test before it hits the threshold; keep in mind that a disabled test is easy to re-enable (so feel free to disable on just the one affected platform until you have time to look at fixing it).

To be clear we (and some component owners) will continue triaging bugs and 
trying to get fixes in place as often as possible and prefer a fix, not a 
disabled test!

Please raise any concerns, otherwise we will move forward with this in the 
coming weeks.
___
dev-platform mailing list
dev-platform@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-platform


talos + performance tooling update + elective session to meet us in person

2017-11-06 Thread jmaher
I haven't posted in a while, but I wanted to let everyone know what is new with 
Talos and Performance Sheriffing.

I have a few blog posts outlining more details: in total we track 1127 different perf metrics per commit, and for Firefox 55, 56, and 57 there were >=50 regressions filed per release [1].
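
As background on how an alert falls out of those per-commit metrics, here is a deliberately simplified sketch of windowed regression detection; Perfherder's real analysis is more involved, so treat this as illustration only:

# Simplified sketch of detecting a regression in a per-commit metric series:
# compare the mean of a window of values before each revision with the mean
# after it and flag the spot when the change exceeds a threshold.  This is
# an illustration, not Perfherder's actual algorithm.
def detect_regressions(values, window=12, threshold_pct=3.0):
    """values: one metric value per commit, ordered oldest -> newest."""
    alerts = []
    for i in range(window, len(values) - window):
        before = sum(values[i - window:i]) / float(window)
        after = sum(values[i:i + window]) / float(window)
        change = (after - before) / before * 100.0
        if change > threshold_pct:   # higher == worse for most Talos tests
            alerts.append((i, round(change, 1)))
    return alerts

# A step from ~200ms to ~210ms flags the commits around the step at index 20.
series = [200.0] * 20 + [210.0] * 20
print(detect_regressions(series))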

Most of this work is done by Ionut [2] who started as a full time Performance 
Sheriff and Tool hacker earlier this year.

Also in the last 6 months we have added or significantly edited 10+ Talos tests [3], along with making many updates to collect new measurements or change how tests are run.

Going forward we have plans for some upcoming work:
1) new hardware by end of Q1 (linux/windows - testing linux hardware this 
week).  This will be new machines in a new datacenter with Intel graphics cards
2) more changes in tests and updated benchmarks 
3) cleaning up and sheriffing AWFY alerts (we have a lot, we just ignore them)
4) a pass at making running and debugging Firefox easier within Talos
5) some proposals and prototypes of making AWFY easier to maintain and run via 
Try (custom hardware test coverage)
6) revisiting webextensions, heavy profiles, mitmproxy tools, tp6 pageset and 
actions, and checking in on collecting the hero element.

And as a bonus, we have an elective session at the upcoming Mozilla All Hands in Austin on hacking on Talos, which is a great time to come learn what tools exist and how to add a new test, and to ask those nagging questions about why things are named funny or how to interpret results.

Thanks to many people who have contributed to improving our performance tests 
and tooling over the last 6 months.


[1] 
https://elvis314.wordpress.com/2017/11/01/keeping-an-eye-on-performance-alerts/
[2] 
https://elvis314.wordpress.com/2017/10/18/a-formal-introduction-to-ionut-goldan-mozillas-new-performance-sheriff-and-tool-hacker/
[3]
___
dev-platform mailing list
dev-platform@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-platform


stockwell (intermittent failures) policy change - recommend disabling tests when 150 failures are seen in 21 days

2018-01-09 Thread jmaher
Happy new year from the stockwell team!  We have been busy triaging bugs and fine-tuning the test-verify job for all platforms and test suites.

If you want to read about what we plan to do in the near future, you can follow 
along in this tracking bug:
https://bugzilla.mozilla.org/show_bug.cgi?id=1428828

One change I wanted to make everyone aware of is the threshold we use to set the whiteboard tag [stockwell disable-recommended].  For the last 4 months that has been 200 failures in 30 days.  We look at this whiteboard tag twice a week and focus on those bugs; basically, if there is no sign of an upcoming patch we disable the test to reduce the pain on the trees.

What I would like to change on January 15th is the threshold, moving it to 150 failures over 21 days.  Analyzing data from the last 4 months, there would have been only 2 bugs that we would have disabled which ended up getting fixed a week later; the rest of the bugs had active development taking place and would have been fixed or resolved as they normally were.

The advantage of doing this is that we would disable tests 9 days faster, reducing failures on the tree and, more importantly, on your try pushes, with almost exactly the same end result.  A minimal sketch of the threshold check is below.
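
A minimal sketch of the threshold check itself (the failure timestamps would really come from OrangeFactor/Bugzilla, and the tag is still applied by a person a couple of times per week):

# Sketch of the [stockwell disable-recommended] threshold: flag a bug when it
# has accumulated too many failures inside a trailing window of days.
from datetime import datetime, timedelta

def disable_recommended(failure_dates, now=None,
                        threshold=150, window_days=21):
    """failure_dates: datetimes of classified failures for one bug."""
    now = now or datetime.utcnow()
    cutoff = now - timedelta(days=window_days)
    recent = [d for d in failure_dates if d >= cutoff]
    return len(recent) >= threshold

# The current policy is the same check with threshold=200, window_days=30.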

Please reply if you have concerns or other ideas we should consider.
___
dev-platform mailing list
dev-platform@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-platform


Re: Proposal to adjust testing to run on PGO builds only and not test on OPT builds

2019-01-17 Thread jmaher
Following up on this: thanks to Chris we have fast artifact builds for PGO, so the time to develop and use the try server is on par with the current opt solutions for many cases (front-end development, most bisection cases).

I have also looked in depth at what the impact on the integration branches would be.  In the data set from July-December (H2 2018) there were 11 instances of regressions caught by tests that we originally scheduled only in the OPT config, where we didn't have PGO or Debug test jobs to point out the regression (this is due to scheduling choices).  The worst-case scenario is finding the regression on PGO up to 1 hour later, 11 times, or roughly 2x/month.  Backfilling to find the offending patch, as we do now 24% of the time, would take a similar amount of time.  In fact, running those jobs on Debug instead would result in the same detection time for all 11 instances (due to more chunks on debug and similar runtimes).  In short, little to no impact.

Lastly, there was a pending question about Talos.  There is an edge case where we can see a Talos regression on PGO that is unrelated to the code and is just a side effect of how PGO works.  I looked into that in https://bugzilla.mozilla.org/show_bug.cgi?id=1514829.  I found that if we had not had the opt alerts, we would not have missed any regressions.  Furthermore, for the (very rare) PGO-only regressions, there were many other regressions at the same time (say a build change or test change) and usually these were accepted changes, backed out, or investigated on a different test or platform.  In the past, when we have determined a regression is a PGO artifact, we have resolved it as WONTFIX and moved on.

Given this summary, I feel that most concerns around removing testing for OPT 
are addressed.  I would also like to extend the proposal to remove the OPT 
builds since no unit or perf tests would run on there.

As my original timeline is not realistic, I would like to collect comments until next Wednesday, January 23rd; then I can follow up on remaining issues or work towards making this happen and settling on the right timeline.
___
dev-platform mailing list
dev-platform@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-platform


Proposal to adjust testing to run on PGO builds only and not test on OPT builds

2019-01-03 Thread jmaher
I would like to propose that we do not run tests on linux64-opt, windows7-opt, 
and windows10-opt.

Why am I proposing this:
1) Test regressions found on trunk are caught mostly on debug, and in fewer cases on PGO.  There were no unique regressions in the last 6 months (all the data I looked at) that were exclusive to OPT builds.
2) On mozilla-beta, mozilla-release, and ESR, we only build/test PGO builds; we do not run tests on plain OPT builds
3) This will reduce the jobs we run by about 16%, which in turn reduces CPU time, money spent, turnaround time, intermittent failures, and the complexity of the taskgraph.
4) PGO builds are very similar to OPT builds: we add flags to generate profile data along with small adjustments to the build scripts behind the in-tree MOZ_PGO flag, then we launch the browser, collect data, and repack our binaries for faster performance.
5) We ship PGO builds, not OPT builds

What are the risks associated with this?
1) try server build times will increase as we will be testing on PGO instead of 
OPT
2) we could miss a regression that only shows up on OPT; but since we only ship PGO, and once we leave central we no longer build OPT at all, this is a very low risk.

I would like to hear any concerns you might have on this or other areas which I 
have overlooked.  Assuming there are no risks which block this, I would like to 
have a decision by January 11th, and make the adjustments on January 28th when 
Firefox 67 is on trunk.
___
dev-platform mailing list
dev-platform@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-platform


Re: Proposal to adjust testing to run on PGO builds only and not test on OPT builds

2019-01-04 Thread jmaher
Thanks everyone for your comments on this.  It sounds like, from a practical standpoint, this is not a desirable change until we can get the runtimes of PGO builds on try and in integration below debug build times.

A few common responses:
* artifact opt builds on try are fast for quick iterations, a must have
* can we do artifact builds for PGO? (thanks :nalexander for bug 1517533 and 
bug 1517532)
* what about talos?  We need to investigate this more; I have always argued against PGO-only for talos, but maybe we can revisit that (bug 1514829)
* do we turn off builds as well?  I had proposed just the tests; if we decide to turn off talos, it would make sense to turn off the builds too.

Thanks all for the quick feedback.  When the bugs in this thread are further along, or if I see another, simpler solution for reducing the duplication, I will follow up.
___
dev-platform mailing list
dev-platform@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-platform


Intent to deprecate - linux32 tests starting with Firefox 69

2019-04-05 Thread jmaher
Currently linux32 makes up about 0.77% of our total users.  This is still 1M+ users on any given week.

There is not a lot of support in the industry for 32-bit Linux: all the major vendors are only distributing 64-bit versions now, and other browser vendors only officially distribute 64-bit versions.

Currently we run a full suite of unit tests on linux32 per commit (integration, 
mozilla-central, try, mozilla-beta) and overall this is about 11% of our total 
CPU usage (which is about $416/day).

Linux32 has a similar number of intermittent failures (2.06% of tasks fail) and regressions as other platforms (such as Android, Windows, OSX, Linux64), indicating that it functions as an equal in many ways.  Removing linux32 would cut roughly 10% of intermittent failures, which is a savings for everyone looking at the results of a push, allowing focus on the remaining failures and faster decision making.

I have looked at regressions that appear only on linux32; looking deeper into the changes, there have been 3 regressions since July which caused a backout and a fix to the product (bug 571074, bug 1497580, bug 1499426).  For reference, in 2019 osx, windows7, and android 4.3 each have ~5 unique regressions found, and all other configs have 0 or 1.  The larger volume of regressions we find are seen on multiple platforms or didn't result in fixing the browser.

Linux32 runs many tests in both e10s and non-e10s mode (see https://bugzilla.mozilla.org/show_bug.cgi?id=1433276 ), for a few reasons:
1) fennec ships non-e10s, so our code base should have coverage there
2) not all tests run on fennec
3) users can disable e10s locally

I looked at Firefox non-e10s users: among users who have upgraded the browser in the last 6 months, only 0.2% run non-e10s.  I focused on that group because if we are supporting modern versions and those users never see non-e10s, then our efforts to build and develop non-e10s are having little effect.  In fact, 96.4% of the users who run non-e10s are on Windows, so if we determine there is a need for testing this, I would encourage us to test on Windows instead of Linux.  As for running non-e10s tests as stand-in coverage for Android, we have enabled more tests on Android emulators (primarily due to web-platform-tests), which has reduced the need to rely on linux32 as a means of getting close-enough coverage for Android.

Earlier I mentioned 3 regressions in the browser that we fixed; 2 of those were for non-e10s-specific cases, and one was for our default e10s configuration.  While many other regressions have been found on linux32, they were either fixed by hacking or disabling the test, or they were seen on other platforms and would have been caught in the absence of linux32 tests.

As our next ESR is upcoming, I would like to turn off linux32 tests starting with Firefox 69 and let that change ride the trains, while linux32 stays on 68 ESR.  This will allow those builds/tests to be supported with security updates into 2021.

___
dev-platform mailing list
dev-platform@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-platform


today we updated the windows 10 test machines to version 1803

2019-02-11 Thread jmaher
I wanted to make everyone aware of an upgrade we made today to our Windows 10 test machines.  For the last 1.5 years they have been running version 1703; as of today we have updated them to 1803.  If you are curious about which tests had issues, you can see the bugs that depend on:
https://bugzilla.mozilla.org/show_bug.cgi?id=1522900

As a note, this change is for virtual machines only, not hardware workers that 
run performance tests.

If you push to try with an older base revision, there is a chance that many wpt 
and a few reftest/xpcshell/browser-chrome tests will fail.
___
dev-platform mailing list
dev-platform@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-platform