Android tests - do not forget logcat!
Lately it feels like there is more activity around running, investigating, and fixing Android automated tests -- that's great to see! When looking at automated test logs, be aware that there are differences between desktop and Android. Some messages dumped to standard output appear in the desktop test log but not in the Android test log: on Android, they likely appear in the logcat instead. Logcat also provides additional diagnostic logging from geckoview, as well as logging from the OS and other apps running on the test device. It is a complicated, comprehensive account of what is happening on the device -- often essential to understanding test failures. All Android test tasks have the full logcat attached as a separate artifact, visible in the treeherder Job Details pane and also in the Log Viewer. Look for "*artifact uploaded:* logcat-.log". ___ dev-platform mailing list dev-platform@lists.mozilla.org https://lists.mozilla.org/listinfo/dev-platform
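Because the logcat artifact mixes geckoview output with logging from the OS and other apps, it can help to filter it locally after downloading it from treeherder. Below is a minimal, hypothetical Python helper, not part of any Mozilla tool. It assumes the artifact uses Android's "threadtime" layout (date, time, pid, tid, level, tag, message) and that the interesting lines have tags starting with "Gecko"; adjust both assumptions to the log you actually have.

```python
import re
from typing import Iterable, Iterator, Sequence, Tuple

# Assumed "threadtime" layout, e.g.:
# 01-07 10:15:32.123  1234  1256 I GeckoConsole: some message
LOGCAT_LINE = re.compile(
    r"^(?P<date>\d\d-\d\d) (?P<time>\d\d:\d\d:\d\d\.\d+)\s+"
    r"(?P<pid>\d+)\s+(?P<tid>\d+)\s+(?P<level>[VDIWEF])\s+"
    r"(?P<tag>[^:]*?)\s*: (?P<msg>.*)$"
)
LEVELS = "VDIWEF"  # verbose < debug < info < warn < error < fatal


def filter_logcat(lines: Iterable[str],
                  tag_prefixes: Sequence[str] = ("Gecko",),
                  min_level: str = "I") -> Iterator[Tuple[str, str]]:
    """Yield (tag, message) for lines at or above min_level whose tag matches."""
    threshold = LEVELS.index(min_level)
    for line in lines:
        match = LOGCAT_LINE.match(line.rstrip("\n"))
        if not match:
            continue  # e.g. "--------- beginning of main" separators
        if LEVELS.index(match["level"]) < threshold:
            continue
        if match["tag"].startswith(tuple(tag_prefixes)):
            yield match["tag"], match["msg"]
```

For example, `with open("path/to/downloaded-logcat.log") as f: print(list(filter_logcat(f, min_level="W")))` would list only Gecko warnings and errors; the path is a placeholder.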
Re: Visibility of disabled tests
Thanks Johann. I agree it is important that we try to fix tests that have been disabled. I think the sheriffs usually needinfo the triage owner before/when disabling a test; I'm disappointed to hear that isn't happening consistently. However, I'd prefer not to change the review process for the disabling patch. Currently sheriffs normally request review from #intermittent-reviewers and that has been working well:
- we strive for very low latency so that frequently failing tests can be addressed right away
- we watch for common errors in test manifests
- we can help ensure consistency in the test disabling procedure.
Keep in mind that sheriffs also needinfo someone (typically the triage owner) when a test is identified as "needswork": failing frequently, but not yet at the disabling threshold. Often those needinfo requests go unanswered or fail to resolve the issue (no shaming here: we are all busy and have priorities). I think that requesting review from the test author/triage owner/component peer risks adding another 2-day delay to the overall process -- more time during which those tests are failing. Instead of changing the reviewers, how about:
- we remind the sheriffs to needinfo
- #intermittent-reviewers check that a needinfo is in place when reviewing disabling patches.
It might be helpful if we explicitly consider some special cases. If the sheriffs have needinfo'd for "needswork" and that needinfo has been cleared, do we want to set needinfo again when disabling? Always? If the triage owner has a huge needinfo queue, should we still needinfo? ... Regarding regression finding, as I understand it, sheriffs currently look for regression ranges for bugs where:
- the test is failing frequently, since pass/fail is then easier to verify on any push
- the test was running reliably in the recent past.
In my experience, a comment on the bug requesting a regression range can be effective. I don't know if the sheriffs have much time for additional regression searches. 
- Geoff On Tue, Jan 7, 2020 at 6:29 AM Johann Hofmann wrote: > Hi folks, > > in the past I and other triage owners have experienced some frequently > failing tests being disabled without a clear notice to the triage owner, > component owner or test author. I've seen this specific pattern a few times: > > - An intermittent test starts failing very frequently very suddenly. > - The Stockwell team reacts quickly (which is good) and disables the test, > getting review from another sheriff or member of their team. > - No analysis is done on the possible cause or regressing bug > - The intermittent bug is left open without needinfo to anyone who could > fix the test (some even with a P5 priority). > > This is problematic, since a) we're losing test coverage that way and b) > these tests might be failing frequently because there's actually something > wrong with the feature, not just a test issue. > > In most cases these get discovered sooner or later so I don't want to make > this issue bigger than it is, but it's still suboptimal for some of us. It > seems like we could easily remedy this by introducing a policy like: > > *For disabling tests, review from the test author, triage owner or a > component peer is required. If they do not respond within 2? business days > or if the frequency is higher than x, the test may be disabled without > their consent, but the triage owner *must* be needinfo'd on such a bug in > this case.* > > It would also be extremely helpful if Sheriffs could post a possible > regression range for the frequent intermittent when disabling, where > possible (because I assume that's also the best time to do a regression > range). > > Any thoughts? > > Cheers. > > Johann >
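The point that frequently failing tests are easier to find regression ranges for can be made concrete. If a test fails with probability p per run and a push is retriggered k times, a regressing push is missed with probability (1 - p)^k: about 0.1% for p = 0.5 and k = 10, but about 60% for p = 0.05 and k = 10. The following minimal Python sketch of a bisecting regression-range search builds on that idea; it is an illustration, not the sheriffs' actual procedure. `fails_on` is a hypothetical callback standing in for "run the test on this push and report whether it failed", and the default of 10 retriggers is an arbitrary choice.

```python
from typing import Callable, Sequence, TypeVar

Push = TypeVar("Push")


def miss_probability(failure_rate: float, retriggers: int) -> float:
    """Chance that a push which really is bad shows no failure in `retriggers` runs."""
    return (1.0 - failure_rate) ** retriggers


def find_regressing_push(pushes: Sequence[Push],
                         fails_on: Callable[[Push], bool],
                         retriggers: int = 10) -> int:
    """Bisect for the index of the first push that shows the failure.

    Assumes pushes[0] is known good, pushes[-1] is known bad, and that once
    the failure is introduced it persists on every later push.
    """
    def looks_bad(index: int) -> bool:
        # A push counts as bad if any of its retriggered runs fails.
        return any(fails_on(pushes[index]) for _ in range(retriggers))

    good, bad = 0, len(pushes) - 1
    while bad - good > 1:
        mid = (good + bad) // 2
        if looks_bad(mid):
            bad = mid
        else:
            good = mid
    return bad
```

With a low failure rate, a bad push can be misclassified as good and the bisection will then converge on the wrong push, which is exactly why ranges are easier to trust for high-frequency failures.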
PSA: Android test environments
The Android test environments used for continuous integration have been through many changes over the last year or two; here's a review of what we have today [1]. Most of our Android tests run on emulators; some run on hardware: real phones. Our Android hardware tests run on physical devices -- currently Motorola G5 and Pixel 2 phones -- which are physically managed by Bitbar, a device farm provider. These test platforms appear on treeherder as "Android 7.0 MotoG5" and "Android 8.0 Pixel2". Running tests on hardware is relatively expensive, so we make deliberate choices about which tests run on hardware: all of our performance (raptor) tests, architecture-sensitive tests like jittest and jsreftest, and select tests requiring special capabilities. All other tests -- web-platform tests, most mochitests, reftests, xpcshell tests, etc. -- run on emulators. The emulator test platform appears on treeherder as "Android 7.0 x86-64". These tests run in the Android x86_64 emulator on a Linux host. Unlike previous generations of our emulator test environment, today's emulator tests are fast: thanks to hardware acceleration, tests run at about the same rate as they do on our desktop platforms. The Android tests running on trunk today are testing geckoview apps (Fennec tests continue to run on the esr68 branch). Most raptor tests run in the geckoview_example app; additional raptor tests run in Fenix and the Reference Browser; most other tests run in the geckoview test app. Both emulator and hardware tests have a fixed pool of instances: regardless of load, we can only run N emulator tasks or M hardware tasks at a time. Release Engineering Operations monitors the backlog for both pools, but temporary backlogs are expected and tolerated. Since our hardware testing capacity is particularly limited, you must use the --full option with 'mach try fuzzy' to run Android hardware tests on try [2].
For instance, you can see the available tests with 'mach try fuzzy --no-push --full --query "android-hw"' and you could run android-hw mochitest-media tests with 'mach try fuzzy --full --query "android-hw mochitest-media"'. There are no similar restrictions on try runs for emulator tests -- but please use them responsibly!
[1] https://treeherder.mozilla.org/#/jobs?repo=mozilla-central&searchStr=android%2Ctest
[2] https://firefox-source-docs.mozilla.org/tools/try/selectors/fuzzy.html
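If you script your try pushes, the two 'mach try fuzzy' invocations above can be built programmatically. This is a minimal, hypothetical Python wrapper: it assumes it runs from the root of a source checkout where `./mach` exists, and the check that rejects an "android-hw" query without --full is a local convenience mirroring the rule described above, not something mach itself is claimed to do.

```python
import shlex
import subprocess
from typing import List


def try_fuzzy_command(query: str, full: bool = False, no_push: bool = False,
                      mach: str = "./mach") -> List[str]:
    """Build a 'mach try fuzzy' command line from the flags used in the post."""
    if "android-hw" in query and not full:
        # Local guard: hardware tasks are only available on try with --full.
        raise ValueError("Android hardware tests on try require --full")
    cmd = [mach, "try", "fuzzy"]
    if no_push:
        cmd.append("--no-push")
    if full:
        cmd.append("--full")
    cmd += ["--query", query]
    return cmd


def run_try_fuzzy(query: str, **kwargs) -> subprocess.CompletedProcess:
    """Print and run the command; raises CalledProcessError on failure."""
    cmd = try_fuzzy_command(query, **kwargs)
    print("Running:", shlex.join(cmd))
    return subprocess.run(cmd, check=True)
```

For example, `run_try_fuzzy("android-hw mochitest-media", full=True)` corresponds to the second command in the post.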
Announcing new test platform "Android 7.0 x86"
This week some familiar tier 1 test suites began running on a new test platform labelled "Android 7.0 x86" on treeherder. Only a few test suites are running so far; more are planned. Like the existing "Android 4.2" and "Android 4.3" test platforms, these tests run in an Android emulator running in a docker container (the same Ubuntu-based image used for linux64 tests). The new platform runs an x86 emulator using KVM acceleration, enabling tests to run much, much faster than on the older platforms. As a bonus, the new platform uses Android 7.0 ("Nougat", API 24): more modern, more relevant. This test platform was added to support geckoview testing. Tests run in the geckoview-based TestRunnerActivity (not Firefox for Android). To reproduce the main elements of this test environment locally:
- build for Android x86 (mozconfig with --target=i686-linux-android)
- run 'mach android-emulator' or, explicitly, 'mach android-emulator --version x86-7.0'
- install the geckoview androidTest apk
- run your test command using --app to specify the geckoview test app, something like 'mach mochitest ... --app=org.mozilla.geckoview.test'
Many thanks to everyone who helped enable this test platform, especially :wcosta for help with taskcluster and :jchen for investigating test failures.
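The local reproduction steps above can be written out as a short, hypothetical Python checklist. It only builds and prints the commands rather than running them, since the emulator may keep running while you continue. The apk path and test path are placeholders for your own objdir and tree; the `--enable-application=mobile/android` mozconfig line and installing the apk with `adb install -r` are assumptions about a typical setup, not details from the post.

```python
import shlex
from typing import List

# Mozconfig for an Android x86 build. The --target line is from the post;
# the --enable-application line is an assumed, typical Android setting.
MOZCONFIG_ANDROID_X86 = (
    "ac_add_options --enable-application=mobile/android\n"
    "ac_add_options --target=i686-linux-android\n"
)

GECKOVIEW_TEST_APP = "org.mozilla.geckoview.test"


def local_x86_steps(androidtest_apk: str, test_path: str,
                    mach: str = "./mach") -> List[List[str]]:
    """Return the reproduction steps, in order, as argument lists."""
    return [
        [mach, "build"],  # with MOZCONFIG_ANDROID_X86 as your mozconfig
        [mach, "android-emulator", "--version", "x86-7.0"],
        ["adb", "install", "-r", androidtest_apk],
        [mach, "mochitest", test_path, f"--app={GECKOVIEW_TEST_APP}"],
    ]


def print_steps(steps: List[List[str]]) -> None:
    for number, step in enumerate(steps, start=1):
        print(f"{number}. {shlex.join(step)}")
```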
New Android-only test suite on treeherder: geckoview-junit
With bug 1445716, there is a new Android-only, tier-1 test suite on treeherder: geckoview-junit (gv-junit). These are on-device Android JUnit tests written in support of geckoview, running in our standard Android emulator environment on aws instances. You can run these tests on a local emulator or Android device with 'mach geckoview-junit'; be sure to install org.mozilla.geckoview.test from geckoview-androidTest.apk. Technical and support details are at https://developer.mozilla.org/en-US/docs/Mozilla/Geckoview-Junit_Tests
I am glad to answer questions/concerns/enhancement requests. - Geoff
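Before running 'mach geckoview-junit' locally, it can be handy to confirm that the test app is actually installed. A minimal Python sketch, assuming `adb` is on your PATH and exactly one device or emulator is connected; the helper names are hypothetical. Because `pm list packages <filter>` matches substrings, the sketch compares exact package names.

```python
import subprocess
from typing import Set

TEST_PACKAGE = "org.mozilla.geckoview.test"


def installed_packages(pm_output: str) -> Set[str]:
    """Parse `adb shell pm list packages` output ("package:<name>" per line)."""
    prefix = "package:"
    return {
        line.strip()[len(prefix):]
        for line in pm_output.splitlines()
        if line.strip().startswith(prefix)
    }


def geckoview_test_app_installed(adb: str = "adb") -> bool:
    out = subprocess.run(
        [adb, "shell", "pm", "list", "packages", TEST_PACKAGE],
        capture_output=True, text=True, check=True,
    ).stdout
    # Exact match: the filter could also return longer package names.
    return TEST_PACKAGE in installed_packages(out)
```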
skip-if(verify)
It is now possible to skip tests in test-verify. Simply annotate the manifest for your test:

    [test]
    skip-if = verify

or, for reftests:

    skip-if(verify) ...

and the test-verify (TV) test task will not try to verify the annotated test. Please don't abuse this feature! Most TV failures indicate a weakness in the test. As always, you can read more about test-verify at https://developer.mozilla.org/en-US/docs/Mozilla/QA/Test_Verification.
Better triage for intermittent leaks in tests?
Some of our most troublesome intermittent test failures are leak bugs ("Intermittent LeakSanitizer | leak at ..." or "Intermittent leakcheck | default process: bytes leaked ..."). Even when they fail frequently, these bugs often seem to remain unresolved for many weeks. Leaks are sometimes not strongly associated with a particular test, making it difficult to assign them to a useful bugzilla component, or to find a motivated triage owner or assignee. I feel like these bugs are not being connected to the "right" people effectively. Could we do better? For instance, could we assign all leak bugs to a specific bugzilla component, with a "leak guru" as triage owner? Volunteers?? - Geoff
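If leak bugs were routed to a dedicated component as proposed, the first step would be recognizing them by summary. A minimal, hypothetical Python sketch: it matches the two summary prefixes quoted above with regular expressions. The bug dicts with "id" and "summary" keys are an assumed shape, and any real re-triage would happen in Bugzilla, not in this script.

```python
import re
from typing import Dict, List, Tuple

# Summary prefixes quoted in the post; the trailing details vary per bug.
LEAK_PATTERNS = (
    re.compile(r"^Intermittent LeakSanitizer \| leak at "),
    re.compile(r"^Intermittent leakcheck \| .*bytes leaked"),
)


def is_leak_bug(summary: str) -> bool:
    return any(pattern.search(summary) for pattern in LEAK_PATTERNS)


def split_leak_bugs(bugs: List[Dict]) -> Tuple[List[Dict], List[Dict]]:
    """Partition bugs into (leak bugs, everything else) by summary."""
    leaks = [bug for bug in bugs if is_leak_bug(bug["summary"])]
    others = [bug for bug in bugs if not is_leak_bug(bug["summary"])]
    return leaks, others
```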
Re: --verify option added to mochitest, reftest, xpcshell test harnesses
On Tue, Oct 3, 2017 at 1:05 PM, Andrew Halberstadt wrote: > This is really great Geoff! Hopefully it can cut down the number of new > intermittents we introduce to the tree. Do you know if orangefactor or > ActiveData can track the rate of new incoming intermittents? Would be neat > to see how much of an impact this tool has on that. Thanks Andrew. I think that data is probably available in ActiveData, but querying for "new" might be tricky; I wouldn't know where to begin. Bugzilla might be the way to go: count bugs with keyword intermittent-failure created today vs. yesterday vs. the day before...? I suspect those numbers would be dominated by low-frequency intermittent failures, which might not change. In fact, I'm hoping to catch only mid- to high-frequency failures with TV, so that it doesn't end up wasting people's time fixing intermittents that we would hardly notice otherwise. - Geoff
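The Bugzilla idea above (count intermittent-failure bugs created per day) could start from something like the following. This is a hypothetical Python sketch: it assumes the Bugzilla REST bug search accepts `keywords`, `creation_time` and `include_fields` parameters, and it only builds the search URL and tallies already-fetched results, without making any network calls.

```python
from collections import Counter
from typing import Dict, Iterable
from urllib.parse import urlencode

BUGZILLA_REST = "https://bugzilla.mozilla.org/rest/bug"


def intermittent_search_url(since_date: str) -> str:
    """Search URL for intermittent-failure bugs created since `since_date`."""
    params = {
        "keywords": "intermittent-failure",
        "creation_time": since_date,          # assumed: "created on or after"
        "include_fields": "id,creation_time",
    }
    return BUGZILLA_REST + "?" + urlencode(params)


def bugs_per_day(bugs: Iterable[Dict]) -> Dict[str, int]:
    """Count bugs per creation date; creation_time is ISO 8601 (YYYY-MM-DDT...)."""
    counts = Counter(bug["creation_time"][:10] for bug in bugs)
    return dict(sorted(counts.items()))
```

As noted in the post, these counts would likely be dominated by low-frequency failures, so they may not show TV's effect on its own.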
test-verify now running as tier 2
Today the test-verify test task will start running as a tier 2 job. Look for the "TV" symbol on treeherder, on linux-64 test platforms. TV is intended as an "early warning system" for identifying the introduction of intermittent test failures. When a mochitest, reftest, or xpcshell test file is modified on a push, TV runs that particular test over and over until it fails (orange job, standard failure messages), until the maximum number of iterations is reached (green job, all's well), or until TV runs out of time (green job, maybe all's well?). As a consequence, when a new test is added or a test is modified and an intermittent failure is introduced, TV will usually be the first job to fail, and it will fail on the push that modified the test, making it (usually) simple to identify where the intermittent was introduced. In the future, I hope to run TV on more platforms, apply it to more test suites, and refine the --verify implementation to find intermittent failures more efficiently. Because TV is a tier 2 task, TV failures will be starred but will not cause backouts. I hope to move TV to tier 1 once it has proven effective. More info at [1]. Bug reports and enhancement requests are welcome: please file bugs blocking bug 1357513.
[1] https://developer.mozilla.org/en-US/docs/Test_Verification
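The three ways a TV run can end (a failure, the iteration cap, or the time limit) can be sketched as a small loop. This is a minimal, hypothetical Python model, not the TV implementation: `run_once` stands in for one run of the modified test, and the default iteration cap and time budget are arbitrary placeholders.

```python
import time
from typing import Callable, Tuple


def verify_test(run_once: Callable[[], bool],
                max_iterations: int = 20,
                time_budget_s: float = 3600.0,
                clock: Callable[[], float] = time.monotonic) -> Tuple[str, int]:
    """Repeat a test until it fails, hits max_iterations, or runs out of time.

    Returns (outcome, iterations run), where outcome is "orange" on the first
    failure, "green" after max_iterations passes, or "green-timeout" when the
    time budget ran out first (the "maybe all's well?" case).
    """
    start = clock()
    for iteration in range(1, max_iterations + 1):
        if not run_once():
            return "orange", iteration
        if clock() - start >= time_budget_s:
            return "green-timeout", iteration
    return "green", max_iterations
```

The `clock` parameter is injectable so the timeout path can be exercised without waiting.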
--verify option added to mochitest, reftest, xpcshell test harnesses
The mochitest, reftest, and xpcshell test harnesses now support a --verify option. For example:

    mach mochitest docshell/test/test_anchor_scroll_after_document_open.html --verify

In verify mode, the requested test is run multiple times, in various "modes", in hopes of quickly finding any intermittent failures. Once tests are complete, a summary of results is printed:

    :::
    ::: Test verification summary for:
    :::
    ::: docshell/test/test_anchor_scroll_after_document_open.html
    :::
    ::: 1. Run each test 10 times in one browser. : Pass
    ::: 2. Run each test 5 times in a new browser each time. : Pass
    ::: 3. Run each test 10 times in one browser, in chaos mode. : Pass
    ::: 4. Run each test 5 times in a new browser each time, in chaos mode. : Pass
    :::
    ::: Test verification PASSED
    :::

There's no flexibility in the number of times the test is run in each mode, and that's by design: I wanted a simple, standard way of checking "Is this test likely to produce a frequent intermittent failure?" Verify mode was developed for the test-verify task (announcement coming soon!) but it may also be a convenient addition to your local testing. More info at [1]. Bug reports and enhancement requests are welcome: please file bugs blocking bug 1357513.
[1] https://developer.mozilla.org/en-US/docs/Test_Verification
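The fixed set of verify modes in the summary can be modeled as data. This is a minimal, hypothetical Python sketch that reproduces the summary's shape; `run_mode` is a stand-in for the harness doing the real work, this sketch runs every mode even after a failure (the real harness may not), and the "Fail"/"FAILED" wording is an assumption.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass(frozen=True)
class VerifyMode:
    description: str
    repeats: int
    restart_each_time: bool  # new browser for every run
    chaos: bool              # run in chaos mode


MODES = (
    VerifyMode("Run each test 10 times in one browser.", 10, False, False),
    VerifyMode("Run each test 5 times in a new browser each time.", 5, True, False),
    VerifyMode("Run each test 10 times in one browser, in chaos mode.", 10, False, True),
    VerifyMode("Run each test 5 times in a new browser each time, in chaos mode.", 5, True, True),
)


def verify(test_path: str,
           run_mode: Callable[[str, VerifyMode], bool]) -> Tuple[bool, str]:
    """Run every mode via the hypothetical run_mode hook; return (passed, summary)."""
    lines = [":::", "::: Test verification summary for:", ":::",
             f"::: {test_path}", ":::"]
    all_passed = True
    for number, mode in enumerate(MODES, start=1):
        passed = run_mode(test_path, mode)
        all_passed = all_passed and passed
        lines.append(f"::: {number}. {mode.description} : {'Pass' if passed else 'Fail'}")
    verdict = "PASSED" if all_passed else "FAILED"
    lines += [":::", f"::: Test verification {verdict}", ":::"]
    return all_passed, "\n".join(lines)
```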
Re: Reminder on Try usage and infrastructure resources
Masayuki, your try push had trouble because you requested "mochitest-2" instead of "mochitest-e10s-2". Non-e10s mochitests only run on Android and Windows now. You probably wanted something like: https://treeherder.mozilla.org/#/jobs?repo=try&revision=d68382f17d63f0674c62acc7242a9e406793895f This is a good example of how a small deviation from "correct" try syntax can have unexpected and frustrating consequences. - Geoff On Thu, Sep 14, 2017 at 7:15 PM, Masayuki Nakano wrote: > I tried to say different point. See the treehearder log, mochitests didn't > run except on Win7 Debug, Android 4.3 API16+ Opt/Debug. So, try syntax > parser or something is really broken. I often meet this kind of bug. > > > On 9/15/2017 10:07 AM, Kris Maglione wrote: >> >> Your best bet is probably to use `mach try` with a specific set of test >> directories. It will generate a set of --try-test-paths flags to restrict >> tests to those paths, and only run the first chunk of any group. Without >> that, groups shift around too much to be reliable. >> >> On Fri, Sep 15, 2017 at 10:03:00AM +0900, Masayuki Nakano wrote: >>> >>> Even when I got the chunk numbers, specifying chunk numbers of mochitests >>> wouldn't work, see this log: >>> >>> https://treeherder.mozilla.org/#/jobs?repo=try&revision=c09c7046ed0664e89f7224e1de5219c39c94c948 >>> After that, I needed to rerun mochitests with |-u mochitests|. IIRC, I >>> tried to kick the specific chunks with "Add new jobs", but didn't work. >>> And also, when I try to investigate random oranges which are not >>> reproducible on my environments, I want an option like |--run-until-failure| >>> and |--repeat REPEAT| in the try syntax. Because of no such options, I need >>> to trigger a lot of jobs manually and that may/might cause too many oranges. >>> >>> On 9/15/2017 1:21 AM, Kyle Lahnakoski wrote: You can try ActiveData, which stores all test results from the past few weeks. 
Here is an example query that shows the chunk number for each run/build combo in the past day. ActiveData is sometimes more than a day behind https://activedata.allizom.org/tools/query.html#query_id=4HHuBgDu { "from":"unittest", "select":[ {"aggregate":"count"}, {"value":"action.start_time","aggregate":"max"} ], "groupby":[ "run.suite", "run.chunk", "result.test", "build.platform", "build.type", "run.type" ], "where":{"and":[ {"eq":{"build.branch":"mozilla-inbound"}}, {"prefix":{"run.suite":"moch"}}, {"gt":{"action.start_time":{"date":"today-day"}}}, {"regex":{"result.test":".*browser_623779.js.*"}} ]}, "limit":1000 } On 2017-09-14 11:49, Michael de Boer wrote: >> >> On 14 Sep 2017, at 17:48, Marco Bonardo wrote: >> >> When I need to retrigger a mochitest-browser test multiple times (to >> investigate an intermittent), often I end up running all the >> mochitest-browser tests, looking at every log until I find the chunk >> where the test is, and retrigger just that chunk. The chunk number >> changes based on the platform and debug/opt, so it's painful. >> Is there a way to trigger only the chunk that will contain a given >> test, so I can save running all of the other chunks? > > This! This! This! I’d love to be able to do this - would making testing > possible test failure fixes sooo much easier. > > Cheers, > > Mike. > >>> >>> >>> -- >>> Masayuki Nakano >>> Software Engineer, Mozilla >> >> > > > -- > Masayuki Nakano > Software Engineer, Mozilla
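Marco's question (which chunk contains a given test on each platform and build type) can be answered from the rows a query like Kyle's returns. This hypothetical Python helper assumes the result rows have already been fetched and flattened into dicts keyed by the query's groupby names ("run.suite", "run.chunk", "result.test", "build.platform", "build.type"); it does not talk to ActiveData itself.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Key = Tuple[str, str, str]  # (build.platform, build.type, run.suite)


def chunks_for_test(rows: Iterable[Dict], test_substring: str) -> Dict[Key, List[int]]:
    """Map (platform, build type, suite) to the chunk numbers that ran the test."""
    found: Dict[Key, set] = defaultdict(set)
    for row in rows:
        if test_substring in row["result.test"]:
            key = (row["build.platform"], row["build.type"], row["run.suite"])
            found[key].add(row["run.chunk"])
    return {key: sorted(chunks) for key, chunks in found.items()}
```

Since chunk assignments shift as tests are added, a result like this is only a snapshot for the queried time window.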
Re: Have you run 'mach bootstrap' lately?
I'm not sure. I always just answer the prompts and am happy with that. There is a --settings option, which sounds like it might be helpful, but I don't have any experience with that. - Geoff On Fri, May 12, 2017 at 9:00 AM, Ethan Glasser-Camp < eglasserc...@mozilla.com> wrote: > Is there a way to run it without having to reanswer the configuration > questions? > > Ethan > >
Re: Have you run 'mach bootstrap' lately?
Good idea - I filed bug 1364480. On Fri, May 12, 2017 at 8:45 AM, Sylvestre Ledru wrote: > > > On 12/05/2017 at 05:08, Geoffrey Brown wrote: > > If you set up your build environment with 'mach bootstrap' but haven't > run > > it recently, consider taking a few minutes now to run it again. Running > > 'mach bootstrap' from time to time will keep your environment up to date > > and (more-or-less) in sync with your colleagues'. > > > > This seems to be especially important for Android test environments: The > > Android SDK and associated tools are always being updated and if you > don't > > stay up to date, there's a good chance something will eventually break. > > > Would it be possible to add a check like: > "You haven't updated your local configuration since XX days, please > consider running > mach bootstrap ?" > > Thanks, > Sylvestre >
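Sylvestre's suggested reminder could be as simple as comparing a timestamp against a threshold. A minimal, hypothetical Python sketch: it assumes bootstrap would touch some stamp file when it finishes (the file, its location, and the 30-day threshold are all invented for illustration, not existing mach behavior).

```python
import time
from pathlib import Path
from typing import Optional

SECONDS_PER_DAY = 86400


def days_since_bootstrap(stamp: Path, now: Optional[float] = None) -> Optional[float]:
    """Age of the hypothetical stamp file in days, or None if it doesn't exist."""
    if not stamp.exists():
        return None
    now = time.time() if now is None else now
    return (now - stamp.stat().st_mtime) / SECONDS_PER_DAY


def bootstrap_reminder(stamp: Path, max_age_days: float = 30,
                       now: Optional[float] = None) -> Optional[str]:
    """Return a reminder message if bootstrap looks stale, else None."""
    age = days_since_bootstrap(stamp, now)
    if age is None or age > max_age_days:
        return "You haven't run 'mach bootstrap' recently; please consider running it."
    return None
```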
Have you run 'mach bootstrap' lately?
If you set up your build environment with 'mach bootstrap' but haven't run it recently, consider taking a few minutes now to run it again. Running 'mach bootstrap' from time to time will keep your environment up to date and (more-or-less) in sync with your colleagues'. This seems to be especially important for Android test environments: The Android SDK and associated tools are always being updated and if you don't stay up to date, there's a good chance something will eventually break.
Changes to OrangeFactor Robot comments in intermittent test failure bugs
"OrangeFactor Robot" comments in bugs for intermittent test failures now have additional information: 1. Daily and weekly comments now include the push count and failure rate. For example: 7 failures in 606 pushes (0.012 failures/push) were associated with this bug yesterday. Previously, only the failure count ("7 failures") was reported, sometimes leading to misconceptions about changes to failure rate (a test failed 3 times yesterday, but 30 times today -- did the test change today...or were the trees closed yesterday!?). The push count is still only a very rough approximation of the number of times the test was run. Sometimes tests are skipped on certain platforms, skipped on a push to reduce load, or retried/repeated on a single push. In particular, since SETA regularly skips some test jobs, the number of test runs per push will vary across test jobs, so the failure rate of one test relative to another test may not be meaningful. Use the failure rate to compare changes in the rate of occurrence of one bug over time; do not use it to compare one bug to another. 2. The weekly comments for the most frequent failures now include the bug's rank -- its position in that week's top 50 most frequent failures tracked by OrangeFactor. For example: This is the #12 most frequent failure this week. This is an indication of the relative frequency of this bug compared to other intermittent test failure bugs: The #12 bug had more tracked failures than the #13 bug. This may be an effective guide for prioritizing work on intermittent test failure bugs. 3. The weekly comments for bugs with more than 50 failures in the week now include: ** This failure happened more than 50 times this week! Resolving this bug is a high priority. ** These high-frequency failures consistently account for a large percentage of test failures. 
In the last 7 days, just 36 high-frequency bugs (less than 4% of the 987 bugs tracked by OrangeFactor) accounted for 4111 failures (about 46% of the 8902 failures tracked by OrangeFactor). If you can contribute to the resolution of one of these bugs, please make it a priority: quick resolution of these bugs can reduce overall test failure counts dramatically, helping everyone watching treeherder.
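The numbers in the new comments are simple ratios and orderings. The following minimal Python sketch reproduces them under stated assumptions: the rate is rounded to three decimals as in the "0.012 failures/push" example, ties in the ranking keep input order, and the function names are hypothetical, not OrangeFactor code.

```python
from typing import Dict


def failure_rate(failures: int, pushes: int) -> float:
    """Failures per push, rounded like the example comment (7 / 606 -> 0.012)."""
    return round(failures / pushes, 3)


def rank_bugs(weekly_failures: Dict[int, int], top: int = 50) -> Dict[int, int]:
    """Map bug id to its rank (1 = most failures) among the top N bugs."""
    ordered = sorted(weekly_failures.items(), key=lambda item: item[1], reverse=True)
    return {bug: rank for rank, (bug, _) in enumerate(ordered[:top], start=1)}


def is_high_frequency(weekly_failures: int, threshold: int = 50) -> bool:
    """True for bugs that get the "more than 50 times this week" notice."""
    return weekly_failures > threshold
```

Remember the caveat from the post: the rate is per push, not per test run, so compare a bug's rate with its own history rather than with other bugs.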
Better mach support for Firefox for Android
In recent months, many improvements have been made to mach commands to support running, testing, and debugging Firefox for Android:
- More test commands for Android. These mach test commands now support Firefox for Android explicitly:
    mach mochitest
    mach robocop
    mach reftest
    mach crashtest
    mach jstestbrowser
    mach xpcshell-test
    mach cppunittest
- Emulator support. 'mach android-emulator' launches the Android emulator, using the same Android image used to run tests seen on treeherder; select an image type with the --version option.
- All of the test, run, and debug commands offer to start the Android emulator if no Android device is connected (when run from an Android context).
    $ ./mach mochitest testing/mochitest/tests/Harness_sanity
    No Android devices connected. Start an emulator? (Y/n)
- All test, run, and debug commands offer to install Firefox on the connected device or emulator if Firefox is not already installed.
    $ ./mach mochitest testing/mochitest/tests/Harness_sanity
    It looks like Firefox is not installed on this device. Install Firefox? (Y/n)
- Test commands requiring host xpcshell offer to install "host utilities" if none have been configured.
- Firefox can be run on an Android device or emulator with 'mach run'.
- JimDB, a GDB fork explicitly supporting debugging for Firefox for Android, can be installed, configured, and run with 'mach run --debug'.
- Emulator images, host utilities, and JimDB are automatically downloaded, cached, and installed as needed.
- Firefox for Android wiki pages have been updated:
    - Build info at https://wiki.mozilla.org/Mobile/Fennec/Android
    - Testing info at https://wiki.mozilla.org/Mobile/Fennec/Android/Testing
    - Debugging with GDB at https://wiki.mozilla.org/Mobile/Fennec/Android/GDB.
- Screencasts demonstrate common tasks at https://people.mozilla.org/~gbrown/android-demos/.
Running, testing, and debugging Firefox will always be more complicated on Android than on desktop, but now these tasks look just as easy on Android, and can be performed with the same mach commands as on desktop. If you have had trouble in the past running, testing, or debugging your own Firefox for Android build, this is a great time to try again. All you need to get started is a Firefox for Android build on a Linux or OSX computer. Something not working for you? Have more ideas for improvements? Let me know.
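The "No Android devices connected. Start an emulator?" prompt depends on knowing whether any device is ready. This is a minimal Python sketch of that check based on `adb devices` output, assuming `adb` is on your PATH; it illustrates the idea and is not mach's actual implementation. Devices listed as "unauthorized" or "offline" are not counted as ready.

```python
import subprocess
from typing import List


def parse_adb_devices(output: str) -> List[str]:
    """Return serials whose state is "device" (ready) in `adb devices` output."""
    serials = []
    for line in output.splitlines():
        parts = line.split()
        # Header and daemon-status lines never have "device" as second field.
        if len(parts) >= 2 and parts[1] == "device":
            serials.append(parts[0])
    return serials


def connected_devices(adb: str = "adb") -> List[str]:
    out = subprocess.run([adb, "devices"], capture_output=True,
                         text=True, check=True).stdout
    return parse_adb_devices(out)
```

For example, an empty result from `connected_devices()` is the situation in which the mach commands above offer to start an emulator.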
Re: B2G emulator issues
On 4/7/2014 3:16 PM, Randell Jesup wrote: > The B2G emulator design is causing all sorts of problems. We just fixed That sounds very similar to some of the failures seen on the Android 2.3 emulator. Many media-related mochitests intermittently time out on the Android 2.3 emulator when run on aws. These are reported in bug 981889, bug 981886, bug 981881, and bug 981898, but have not been investigated.