On Wednesday, August 29, 2012 11:35:52 AM UTC-4, Andrew Halberstadt wrote:
On 08/29/2012 09:56 AM, Andrew Halberstadt wrote:
On 08/28/2012 02:17 PM, L. David Baron wrote:
On Tuesday 2012-08-28 12:52 -0400, Andrew Halberstadt wrote:
I also don't think we should go quite as small as
In our continuing effort on the Automation and Tools team to make Talos tests
useful and make sure we know and care about what we are measuring, we have
realized that the tdhtml tests are not providing us any value. We run these as
chrome/nochrome.
Here are the tests:
On Tuesday, March 19, 2013 5:37:22 AM UTC-4, Robert O'Callahan wrote:
Cool name. However, are you actually proposing replacing, say, the reftest
manifest format with this format? It looks like it would be a lot more
verbose.
My understanding is there is no need or plan to replace the reftest
The 325 jobs per push come from manually counting jobs on tbpl (ignoring pgo);
remember to use showall=1. The total stats from gps include try, which has far
fewer test jobs per push, and inbound coalescing.
My thoughts on why the average build time is shorter on try vs inbound is
inbound includes pgo builds and debug builds which have other steps. The try
server builds are not usually doing pgo.
On Thursday, April 11, 2013 10:26:25 AM UTC-4, Scott Johnson wrote:
Thus Spoke jmaher:
There are a couple common directory structures used for storing tests in
the tree:
1) component/tests
2) component/tests/harness
I have a series of patches which will move most
On Thursday, April 25, 2013 4:12:16 PM UTC-4, Ed Morley wrote:
On 25 April 2013 20:14:10, Justin Lebar wrote:
Is this what you're saying?
* 10.6 opt tests - per-checkin (no change)
* 10.6 debug tests- reduced
* 10.7 opt tests - reduced
* 10.7 debug tests
On Friday, April 26, 2013 9:49:18 AM UTC-4, Armen Zambrano G. wrote:
Maybe we can keep one of the talos jobs around? (until releng fixes the
various python versions' story)
IIUC this was more of an infra issue rather than a Firefox testing issue.
It was infra related, but it was
On Friday, May 3, 2013 12:07:27 PM UTC-4, Ehsan Akhgari wrote:
Can somebody explain to me what android.json is, why it exists, and
what's different between disabling a mochitest on Android there versus
excluding it from MOCHITEST_FILES in the Makefile.in?
Thanks!
Ehsan
Last week while investigating a crash on android which only happened in the
reftest and talos harness (not in the mochitest harness), we compared the
preferences. We found in mochitest we set a bunch of preferences to disable
background network access (intentionally designed to 404 for
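For illustration only (not the exact mochitest pref set), here is a rough sketch of how a harness can bake such preferences into a profile, assuming mozprofile's Profile accepts a preferences dict; the pref names and URL below are made-up examples of the kind of background-network prefs involved.

    # Hypothetical sketch: building a profile whose prefs keep Firefox from
    # phoning home during a test run. Pref names/values are illustrative,
    # not the actual list mochitest sets.
    from mozprofile import Profile

    quiet_network_prefs = {
        "app.update.enabled": False,         # no background app updates
        "extensions.update.enabled": False,  # no add-on update pings
        "browser.safebrowsing.updateURL": "http://127.0.0.1:8888/404",  # local 404
    }

    profile = Profile(preferences=quiet_network_prefs)
    print(profile.profile)  # path of the generated profile directory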
We have a top orange factor failure which is a talos timeout that only happens
on Windows XP and predominantly on the dromaeo_css test. What happens is we
appear to complete the test just fine, but the poller we have on the process
used to manage firefox never indicates we have finished.
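As a rough sketch (not the actual Talos code), the poller described above amounts to something like the loop below, which is where the Windows XP runs get stuck even after the test output says it finished:

    import subprocess
    import time

    def wait_for_browser(cmd, timeout=3600, interval=1.0):
        # launch the browser and poll until it exits or we hit the timeout
        proc = subprocess.Popen(cmd)
        deadline = time.time() + timeout
        while time.time() < deadline:
            if proc.poll() is not None:  # exited: the poller noticed
                return proc.returncode
            time.sleep(interval)
        # the dromaeo_css symptom: we reach this point even though the test
        # completed, because poll() never reports the process as finished
        proc.kill()
        raise RuntimeError("browser did not exit within %ds" % timeout)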
Can you explain what would need to be done for Android to get into this mode?
It might be difficult to make this work with our current solution for automated
tests.
Please see the talos wiki for detailed instructions:
https://wiki.mozilla.org/Buildbot/Talos/Misc#Steps_to_add_a_test_to_production
Currently desktop tests run on mozharness and android tests run the old way via
raw buildbot. There is a difference and there will be a difference for the
next
quite possibly we don't need all those jobs running on tegras. I don't know of
a bug in the product that has broken on either the tegra or panda platform but
not the other.
Joel
It takes a lot of work to get green tests on a new platform. I spent the
better part of December to March getting tests green on Ubuntu.
If these are in vms, we wouldn't have the graphics cards mentioned above. In
fact we might not have the ability to run the webgl tests.
In bug 923770 we switched talos to use mozprocess [1] internally. This is a
great win for managing the internal process of Firefox and reducing our
timeouts on OSX and Windows test runs. The side effect of running in
mozprocess is about a 500ms delay in launching the process. I have looked
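For context, here is a minimal sketch of what launching the browser under mozprocess looks like, assuming the ProcessHandler API (run/wait plus per-line output callbacks); the command line is illustrative:

    from mozprocess import ProcessHandler

    def on_output_line(line):
        # every line of browser output arrives here, so the harness can watch
        # for its end-of-test marker instead of blindly polling the process
        print("browser: %s" % line)

    proc = ProcessHandler(["firefox", "-profile", "/tmp/talos-profile"],
                          processOutputLine=[on_output_line])
    proc.run(timeout=3600)   # overall timeout for the run
    exit_code = proc.wait()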
Using https://treestatus.mozilla.org/mozilla-inbound, I looked at the reasons
for tree closures (usually associated with backouts). Going back to 50 status
messages, I found:
38 test issues
14 build issues
9 infrastructure issues
2 other issues
Note, some of these closures had 1 issue
I am working on using intel power gadget to measure the power usage. Currently
this is on windows with an idle test. Our test slaves have older CPUs which do
not support the intel power gadget.
For talos development we allow pointing at a user specific repo instead of the
master one. This has greatly reduced the time to bring up new tests. This
could easily be hosted elsewhere, but we chose to restrict it to user repos as
a security measure. You have to have cleared some form of
As the sheriffs know, it is frustrating to deal with hundreds of tests that
fail on a daily basis but are intermittent.
When a single test case is identified to be leaking or failing at least 10% of
the time, it is time to escalate.
Escalation path:
1) Ensure we have a bug on file, with the
4) In the case we go another 2 days with no response from a module owner,
we will disable the test.
Are you talking about newly-added tests, or tests that have been
passing for a long time and recently started failing?
In the latter case, the burden should fall on the
I want to express my thanks to everyone who contributed to this thread. We
have a lot of passionate and smart people who care about this topic- thanks
again for weighing in so far.
Below is a slightly updated policy from the original, and following that is an
attempt to summarize the thread
On Tuesday, April 15, 2014 9:42:25 AM UTC-4, Kyle Huey wrote:
On Tue, Apr 15, 2014 at 6:21 AM, jmaher joel.ma...@gmail.com wrote:
This policy will define an escalation path for when a single test case is
identified to be leaking or failing and is causing enough disruption on the
trees
https://bugzilla.mozilla.org/show_bug.cgi?id=1013262 tracks all the Talos
performance adjustments
Could you give some examples of what tests we could run on mobile in chrome?
The A*Team has a lot of performance tool related work in Q3 (goals:
https://wiki.mozilla.org/Auto-tools/Goals/2014Q3#Performance)
For folks interested in performance automation and tools here are some of the
key areas we plan on working on:
* Tree Herder Talos UI - view all performance data
(https://bugzilla.mozilla.org/showdependencytree.cgi?id=992911&hide_resolved=1)
Happy hacking,
-jmaher
On Tuesday, August 19, 2014 11:46:08 AM UTC-4, Gavin Sharp wrote:
Going with our test disabling policy
(https://wiki.mozilla.org/Sheriffing/Test_Disabling_Policy), I am going to
get ready and disable tests on Friday.
A few issues here:
- This particular case (we're making broad
sandwich making!
-jmaher
On Wednesday, November 12, 2014 5:51:40 PM UTC-5, Benjamin Smedberg wrote:
On 11/12/2014 5:49 PM, Nicholas Nethercote wrote:
What exactly do you mean by unit tests?
I presumed that this meant all of our pass/fail test suites (not the
performance tests): xpcshell, the various flavors of
In the history of running Talos, there has never been an easy way to determine
if your change has fixed a regression or created a new one. We have compare.py
and compare-talos, which are actually quite useful, but they require you to run
yet another tool - in short, you have to break your normal
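At its core the comparison these tools perform boils down to a percent change between summarized replicates; a rough, illustrative sketch (the function and numbers are made up, not compare.py's actual code):

    def percent_change(base_replicates, new_replicates):
        # average the replicates on each side and report the relative delta
        base = sum(base_replicates) / float(len(base_replicates))
        new = sum(new_replicates) / float(len(new_replicates))
        return (new - base) / base * 100.0

    before = [311.2, 309.8, 312.5, 310.1]   # replicates from the base revision
    after = [324.0, 322.7, 325.3, 323.9]    # replicates with the patch applied
    print("%.1f%% change (positive = slower for timing tests)"
          % percent_change(before, after))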
Mozilla - 2015 Talos performance regression policy
Over the last year and a half the Talos tests have been rewritten to be more
useful and meaningful. This means we need to take them seriously and cannot
just ignore real issues when we don't have time. This does not mean we need to
fix or
Great questions folks.
:bsmedberg has answered the questions quite well, let me elaborate:
Before a bug can be marked as resolved:fixed we need to verify the regression
is actually fixed. In many cases we will fix a large portion of the regression
and accept the small remainder.
We do keep
Right now we are running jobs on linux/windows/osx and they are all timing out.
That means we are eating up machine time for roughly an hour a job when most
jobs take 20 minutes.
By the end of the month we should either take a larger hit on our
infrastructure and get these running on
A couple months ago I had posted about a project I worked on called SETA:
https://elvis314.wordpress.com/2015/02/06/seta-search-for-extraneous-test-automation/
This has been rolling now for a few months and it helps show us the minimum set
of jobs to run to detect all the failures we find.
This
Things are still changing with Talos- many things are becoming easier, while
others still have kinks to work out- here are a few things which have changed
recently:
1) Android talos has been reduced to what is useful- it should run faster and
sets us up for migrating to autophone next quarter
can we keep the
snarkfest version running please until this is resolved?
My main concern was because I inferred from the previous post that
deprecation of snarkfest was scheduled on a timeline basis.
Can we instead schedule on a when-its-ready basis, please?
yes- we are not doing
I did see the ts, paint regression. This happened on 4 different platforms and
was backed out for telemetry issues about 5 pushes later:
http://hg.mozilla.org/integration/mozilla-inbound/rev/1190bc7b862d
and the backout:
http://hg.mozilla.org/integration/mozilla-inbound/rev/59ad2812d3c7
By the
It has been a while since we posted an update on Talos.
Here are some new things:
* bug 1166132 - new tps test - tab switching
* e10s on all platforms, only runs on mozilla-central for pgo builds, broken
tests, big regressions are tracked in bug 1144120
* perfherder is easier to use, some polish
The last update was in late July:
https://groups.google.com/forum/#!topic/mozilla.dev.platform/PaJFBtvc3Vg
While we have no new tests, I would like to highlight a few changes:
* talos now lives in mozilla-central: testing/talos/. Thanks to :parkouss our
fearless contributor who tackled this
Our infrastructure at Mozilla does a great job of supporting all the jobs we
need to run for the thousands of pushes we do each month. Currently Talos
runs on real hardware (and we have no plans to change that), but it does mean
that we have a limited pool of available machines. Right
Currently we run a very outdated version of V8 (version 7) in Talos. This has
since been replaced with Octane in the world of benchmarks.
AWFY (arewefastyet.com) has been running Octane and catching regressions
faster than Talos. There is missing coverage in AWFY, specifically e10s, pgo,
> >> It seems like another alternative might be to run Octane in Talos,
> >> instead of v8_7.
> >>
> >> It seems like Talos has two advantages over AWFY (correct me if I'm wrong):
> >>
> >> 1. Easy for developers to schedule jobs via try (maybe less of a concern
> >> with a benchmark like this,
> Same here. I'm now unable to land the gecko dependencies for bug 1227980
> [1] and that's annoying to say the least considering it took a month to
> write and test that thing (not even mentioning the time spent by the
> many reviewers and QA people involved). What are we supposed to do with
>
>
> This is just a raw idea, but maybe this would make more sense to provide a
> diff of profiles, and show what decreased / increased. At least this would
> make these benchmarks less obscure.
>
Pushing a before/after patch to try with profiling (note the numbers are not
useful) can be done
I have noticed that since March 4th, the b2g builds on m-c are perma-failing.
Doing some basic searching, we have about 15 hours of machine time going to
failed builds on every push (which is ~1350 builds or 20,250 machine hours).
These show up in mozreview as failed jobs when you autoland a
Historically talos has posted to graphs.mozilla.org. In fact,
graphs.mozilla.org collects the summarized data from Talos and generates alerts
which are posted to mozilla.dev.tree-alerts.
Over the last 6 months we have been collecting all the summarized data and
subtest data inside of
On Wednesday, June 29, 2016 at 7:37:51 PM UTC+3, jl...@mozilla.com wrote:
> On Tuesday, June 28, 2016 at 2:35:22 PM UTC-7, David Baron wrote:
>
> >
> > Why is it vital to opt jobs but less so for debug jobs?
>
> you're right. I worded this poorly. I mean more that opt builds are part of
> the
This is the second update of project stockwell (first update:
https://goo.gl/1X31t8).
This month we will be recommending and asking that intermittent failures that
occur >=30 times/week be resolved within 2 weeks of triaging them.
Yesterday we had these stats:
Orange Factor: 10.75
for the October 7th meeting are:
* [jmaher] defining intermittent oranges (are there logical categories we can
break failures into?)
* [gbrown] disabling unreliable tests (sheriffs point of view vs developers
point of view, and what makes sense to ship a quality product)
Please consider joining us
On Tuesday, November 8th we will be holding another intermittent orange hacking
meeting at 08:30 PDT:
https://wiki.mozilla.org/Auto-tools/Projects/Stockwell/Meetings
This week we will discuss:
* Triaging intermittents via OrangeFactor
* What makes up a good or a bad test case?
The wiki with
https://elvis314.wordpress.com/2017/01/09/project-stockwell-january-2016/
Feel free to join our next fortnightly meeting next Tuesday (January 17th)
@08:30 PDT in the jmaher vidyo room.
On Wednesday, March 22, 2017 at 9:35:35 AM UTC-4, Ben Kelly wrote:
> On Wed, Mar 22, 2017 at 9:22 AM, wrote:
>
> > I have not been able to find an owner for the Firefox::New Tab Page
> > bugzilla component (bug 1346908). There are 35 tests in the tree and
> > without anyone
I have not been able to find an owner for the Firefox::New Tab Page bugzilla
component (bug 1346908). There are 35 tests in the tree and without anyone to
assume responsibility for them when they are intermittent (bug 1338848), I plan
to delete them all if I cannot get an owner by the end of
I wanted to give an update on our work for reducing intermittents. Last month
when we posted, there was a lot of discussion and concern around disabling
tests, as well as around some of the terminology and process.
I have outlined many pieces of data about the progress and rates of bugs and
disabling
A lot of great discussion here, thanks everyone for taking some time out of
your day to weigh in on this subject. There are slight differences between a
bug being filed and actively working on the bug once it crosses our threshold
of 30 failures/week- I want to discuss when we have looked at
On Tuesday, March 7, 2017 at 2:57:21 PM UTC-5, Steve Fink wrote:
> On 03/07/2017 11:34 AM, Joel Maher wrote:
> > Good suggestion here- I have seen so many cases where a simple
> > fix/disabled/unknown/needswork just do not describe it. Let me work on a
> > few new tags given that we have 248 bugs
On Friday, March 10, 2017 at 3:46:14 PM UTC-5, Kris Maglione wrote:
> On Fri, Mar 10, 2017 at 01:55:40PM +, David Burns wrote:
> >I went back and did some checks with autoland to servo and the results are
> >negligible. So from 01 February 2017 to 10 March 2017 (as of sending this
> >email). I
In recent months we have been triaging high frequency (>=30 times/week)
failures in automated tests. We find that we are fixing 35% of the bugs and
disabling 23% of them.
The great news is we are fixing many of the issues. The sad news is we are
disabling tests, but usually only after giving
On Tuesday, March 7, 2017 at 11:45:38 PM UTC-5, Chris Pearce wrote:
> I recommend that instead of classifying intermittents as tests which fail >
> 30 times per week, to instead classify tests that fail more than some
> threshold percent as intermittent. Otherwise on a week with lots of
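To make the suggestion above concrete, here is a small illustrative sketch (thresholds and numbers are made up) of classifying by failure rate rather than by raw weekly count:

    def is_intermittent(failures, runs, rate_threshold=0.02):
        # flag a test only if it fails in at least rate_threshold of its runs
        return runs > 0 and (failures / float(runs)) >= rate_threshold

    # 30 failures sounds high, but on a busy week with 3000 runs that is only
    # a 1% failure rate and stays under a 2% threshold...
    print(is_intermittent(failures=30, runs=3000))   # False
    # ...while the same 30 failures out of 600 runs (5%) would qualify.
    print(is_intermittent(failures=30, runs=600))    # True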
On Tuesday, March 7, 2017 at 1:53:48 PM UTC-5, Marco Bonardo wrote:
> On Tue, Mar 7, 2017 at 6:42 PM, Joel Maher wrote:
>
> > Thanks for pointing that out. In some cases we have fixed tests that are
> > just timing out, in a few cases we disable because the test typically
On Tuesday, March 7, 2017 at 1:59:14 PM UTC-5, Steve Fink wrote:
> Is there a mechanism in place to detect when disabled intermittent tests
> have been fixed?
>
> eg, every so often you could rerun disabled tests individually a bunch
> of times. Or if you can distinguish which tests are
Thanks everyone for commenting on this thread. As a note, we run many tests in
non-e10s mode:
* android mochitest, reftest, crashtest, marionette,
* mochitest-chrome
* xpcshell
* gtest/cpptest/jittest
* mochitest-a11y
* mochitest-jetpack (very few tests remain)
While there are many tests which
As Firefox 57 is on trunk, we are shipping e10s by default. This means that
our primary support is for e10s. As part of this, there is little to no need
to run duplicated tests in non-e10s and e10s mode.
In bug 1386689, we have turned them off. There was some surprise in doing this
and
Yesterday I landed bug 1391371 which enabled non-e10s unittests on windows 7
debug. A few tests had to be disabled in order to do this. To keep track of
what we did and each of the test suites to evaluate, I have filed bug 1391350.
As a note we now have the following non-e10s tests running:
On Wednesday, August 16, 2017 at 4:03:20 PM UTC-4, Nils Ohlmeier wrote:
> > On Aug 16, 2017, at 07:23, James Graham wrote:
> >
> > On 16/08/17 01:26, Nils Ohlmeier wrote:
> >> I guess not a lot of people are aware of it, but for WebRTC we still have
> >> two distinct
It is great to see a good use for compiler warnings and alerts. We have added
a lot of data to perfherder and the build metrics are not covered by any
sheriffs by default. For these, if it is clear who introduced them, we will
comment in the bug as a note, but there is no intention to file
On Tuesday, June 6, 2017 at 3:17:20 PM UTC-4, Ben Kelly wrote:
> On Tue, Jun 6, 2017 at 3:07 PM, Chris Peterson
> wrote:
>
> > On 6/6/17 10:33 AM, Boris Zbarsky wrote:
> >
> >> On 6/1/17 9:04 PM, Mike Hommey wrote:
> >>
> >>> Ah, forgot to mention that. No, it doesn't
Over the last 9 months a few of us have really watched intermittent test
failures almost daily and done a lot to pester people as well as fix many.
While there are over 420 bugs that have been fixed since the beginning of the
year, there are half that many (211+) which have been disabled in
I haven't posted in a while, but I wanted to let everyone know what is new with
Talos and Performance Sheriffing.
I have a few blog posts outlining more details- in total we have 1127 different
perf metrics we track per commit, and for Firefox 55, 56, and 57 there were
>=50 regressions filed per release [1]
Happy new year from the stockwell team! We have been busy triaging bugs and
getting the test-verify job to be fine tuned for all platforms and test suites.
If you want to read about what we plan to do in the near future, you can follow
along in this tracking bug:
Following up on this, thanks to Chris we have fast artifact builds for PGO, so
the time to develop and use try server is at parity with current opt solutions
for many cases (front end development, most bisection cases).
I have also looked in depth at what the impact on the integration branches
I would like to propose that we do not run tests on linux64-opt, windows7-opt,
and windows10-opt.
Why am I proposing this:
1) Most test regressions found on trunk are on debug, and in fewer cases on
PGO. There are no unique regressions found in the last 6 months
(all the data
Thanks everyone for your comments on this. It sounds like, from a practical
standpoint, until we can get the runtimes of PGO builds on try and in
integration to be less than debug build times, this is not a desirable change.
A few common responses:
* artifact opt builds on try are fast for quick
Currently linux32 makes up about 0.77% of our total users. This is still 1M+
users on any given week.
There is not a lot of support in the industry for 32-bit Linux; all the major
vendors are only distributing 64-bit versions now, and other browser vendors
only distribute 64-bit versions
I wanted to give everyone a chance to be aware of an upgrade we made today for
our windows 10 testers. For the last 1.5 years they have been running on
version 1703 and as of today we have updated them to be on 1803. If you are
curious about what tests had issues, you can see the bugs that