Re: New e10s tests on tinderbox
Not yet, because M-e10s is only running on Linux opt, and these test_ipc tests run everywhere in opt and debug.

- Kyle

On Apr 8, 2014 6:58 PM, "Shih-Chiang Chien" wrote:
> Hi Bill,
>
> Many thanks for working on the M-e10s. Does it mean we can remove all
> these "test_ipc.html" mochitests? AFAIK these test cases are manually
> emulating an e10s environment with some hacks.
>
> Here is the list of test_ipc.html:
>
> http://dxr.mozilla.org/mozilla-central/source/content/media/webspeech/synth/ipc/test/test_ipc.html
> http://dxr.mozilla.org/mozilla-central/source/dom/devicestorage/ipc/test_ipc.html
> http://dxr.mozilla.org/mozilla-central/source/dom/indexedDB/ipc/test_ipc.html
> http://dxr.mozilla.org/mozilla-central/source/dom/media/tests/ipc/test_ipc.html
>
> Best Regards,
> Shih-Chiang Chien
> Mozilla Taiwan
>
> On Apr 9, 2014, at 5:28 AM, Bill McCloskey wrote:
> > [...]
Re: New e10s tests on tinderbox
Hi Bill,

Many thanks for working on the M-e10s. Does it mean we can remove all these "test_ipc.html" mochitests? AFAIK these test cases are manually emulating an e10s environment with some hacks.

Here is the list of test_ipc.html:

http://dxr.mozilla.org/mozilla-central/source/content/media/webspeech/synth/ipc/test/test_ipc.html
http://dxr.mozilla.org/mozilla-central/source/dom/devicestorage/ipc/test_ipc.html
http://dxr.mozilla.org/mozilla-central/source/dom/indexedDB/ipc/test_ipc.html
http://dxr.mozilla.org/mozilla-central/source/dom/media/tests/ipc/test_ipc.html

Best Regards,
Shih-Chiang Chien
Mozilla Taiwan

On Apr 9, 2014, at 5:28 AM, Bill McCloskey wrote:
> [...]
Re: Removing 'jit-tests' from make check: 15% speedup
Thanks Dan. This looks to be contributing roughly half of our 30-45% build speedup on Windows this month.

Daniel Minor wrote:

Hello,

Just a heads up that very soon we'll be removing jit-tests from the "make check" target [1]. The tests have been split out into a separate test job on TBPL [2] (labelled Jit), have been running on Cedar for several months, and have recently been turned on for other trees. We've added a mach command, "mach jittest", that runs the tests with the same arguments that "make check" currently does.

Along with the cpp unit tests that were removed back in January, the jit-tests are a substantial portion of "make check" execution time. Their removal will speed up build time and allow them to be re-triggered independently in the event of failures.

If you encounter any issues please feel free to file a bug.

Regards,
Dan

[1] https://bugzilla.mozilla.org/show_bug.cgi?id=988532
[2] https://bugzilla.mozilla.org/show_bug.cgi?id=858621
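For local use, the new command is a drop-in for what "make check" used to run; a minimal sketch (the filter argument is an assumption, not something Dan's note confirms):

    # Run the full jit-test suite with the same arguments "make check" used:
    ./mach jittest

    # Hypothetically, narrow the run to tests matching a substring:
    ./mach jittest basic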
Re: New e10s tests on tinderbox
Bill McCloskey wrote:
> Starting today, we have new mochitests that show up as M-e10s (1 2 3 4 5).
> These are mochitests-plain running inside an e10s content process. Aside from
> being in a separate process, they work pretty much the same as normal. Some
> tests have been disabled for e10s. If you add a new test and it doesn't work
> in e10s mode, you can disable it with the following mochitest.ini gunk:

This is great! Thanks for driving this!
--
Blake Kaplan
Re: Policy for disabling tests which run on TBPL
Aryeh Gregor writes:
> On Tue, Apr 8, 2014 at 2:41 AM, Ehsan Akhgari wrote:
>> What you're saying above is true *if* someone investigates the
>> intermittent test failure and determines that the bug is not
>> important. But in my experience, that's not what happens at all.
>> [...]
>
> The same is true for many bugs. The reported symptom might
> indicate a much more extensive underlying problem. The fact is,
> though, thoroughly investigating every bug would take a ton of
> resources, and is almost certainly not the best use of our
> manpower. There are many bugs that are *known* to affect many
> users that don't get fixed in a timely fashion. Things that
> probably won't affect a single user ever at all, and which are
> likely to be a pain to track down (because they're
> intermittent), should be prioritized relatively low.

New intermittent failures are different from many user-reported bugs because they are known to be a regression, and there is some kind of indication of the regression window. Regressions should be high priority. People are getting by without many new features, but people have begun to depend on existing features, so regressions break real sites and cause confusion for many people. The time to address regressions is ASAP, so that responsibility can be handed over to the person causing the regression. Waiting too long means that backing out the cause of the regression is likely to cause another regression.

I wonder whether the real problem here is that we have too many bad tests that report false negatives, and these bad tests are reducing the value of our test suite in general. Tests also need to be well documented so that people can understand what a negative report really means. This is probably what is leading to assumptions that disabling a test is the solution to a new failure.

Getting bugs on file and seen by the right people is an important part of dealing with this. The tricky part is working out how to prioritize and cope with these bugs.
Re: New e10s tests on tinderbox
- Original Message -
> From: "Bobby Holley"
> To: "Bill McCloskey"
> Cc: "dev-platform"
> Sent: Tuesday, April 8, 2014 2:35:26 PM
> Subject: Re: New e10s tests on tinderbox
>
> Can you elaborate on the kinds of things that make tests fail on e10s? I
> have some idea in my head of what they might be, but I don't know how
> accurate it is with all the Black Magic we do these days.

There isn't really any black magic in mochitests-plain. That's why I started with this suite first :-). The most common causes of failures that I've seen:

1. Sometimes code just wasn't designed for e10s and it will assert if we try to use it. Bug 989139 is an example. There, we assert any time we encounter the tag.

2. Opening a new window will open a new tab in e10s. This is bug 989501. It's usually harmless, but sometimes it causes tests to get unexpected values for the window size or position.

3. Tests that use plugins don't work because e10s has no way to find the test plugin. That will eventually be fixed by bug 874016.

Also, one thing I forgot to point out is that testing on e10s is fairly close to testing on b2g, so I think b2g will benefit from the work we do in trying to fix some of these issues. For example, the issue happens on b2g too (although admittedly it's not the most important issue).

-Bill
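For failures like these, the mochitest.ini annotation from Bill's original post (below) is the escape hatch. A sketch, with a hypothetical test name tied to the window-open behavior in item 2:

    [test_window_open.html]
    skip-if = e10s # bug 989501 - window.open opens a tab under e10s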
Re: New e10s tests on tinderbox
> Most of the work for this was done by Ted, Armen, Aki, and Mark Hammond.
> Thanks guys!

And RyanVM! I knew I'd forget someone. Sorry.

-Bill
Re: B2G emulator issues
Randell Jesup writes:
> 1) running on TBPL (AWS) the internal timings reported show the specific
>    test going from 30 seconds to 450 seconds with the patch.
> 2) on my local system, the test self-reports ~10 seconds, with or
>    without the patch.
>
> Note: the timer in question is nsITimer::TYPE_REPEATING_PRECISE with
> 10ms timing. And changing it to 100ms makes the tests reliably green.

Do you know how many simultaneous hardware threads are emulated? Is it possible that the thread using TYPE_REPEATING_PRECISE has a high priority, and so it would occupy the single hardware thread when there is no spare time available for anything else? The time taken for the test run might depend on the "anything else" running.
Re: New e10s tests on tinderbox
This is awesome! Great job getting us this far.

On Tue, Apr 8, 2014 at 2:28 PM, Bill McCloskey wrote:
> We have about 85% of mochitests-plain running right now.

Can you elaborate on the kinds of things that make tests fail on e10s? I have some idea in my head of what they might be, but I don't know how accurate it is with all the Black Magic we do these days.

bholley
Re: New e10s tests on tinderbox
On Tue, Apr 08, 2014 at 02:28:02PM -0700, Bill McCloskey wrote:
> Hi everyone,
>
> [...]
>
> At that point you can run gdb as follows:
>
> gdb $OBJDIR/dist/bin/plugin-container
>
> Then you can set breakpoints in the child and resume it with "continue".

Or, if you know ahead of time that you want to debug the child, you can set follow-fork-mode to child. You may also be able to use the attach command to attach once the child is running, from the debugger the parent is running in; but if I've tried either of those recently, I don't remember it.

Trev
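For reference, the two approaches Trev mentions would look roughly like this in gdb (unverified, per his own caveat; the PID is illustrative):

    # Approach 1: before the child launches, in the gdb session
    # that is running the parent:
    (gdb) set follow-fork-mode child

    # Approach 2: from the same gdb session, attach to an
    # already-running child using the PID it printed:
    (gdb) attach 12345
    (gdb) continue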
New e10s tests on tinderbox
Hi everyone,

Starting today, we have new mochitests that show up as M-e10s (1 2 3 4 5). These are mochitests-plain running inside an e10s content process. Aside from being in a separate process, they work pretty much the same as normal. Some tests have been disabled for e10s. If you add a new test and it doesn't work in e10s mode, you can disable it with the following mochitest.ini gunk:

[your_test.html]
skip-if = e10s

We have about 85% of mochitests-plain running right now. I'm hoping to make a big push to get this number up to 100%, but there are still some prerequisite bugs that I want to fix first. In the meantime, we can at least identify regressions in the tests that run.

Right now, these tests are running on inbound, central, try, fx-team, and b2g-inbound. In a few days, they'll be running on all trunk trees. If you do a try push, e10s tests will run iff mochitests-plain run. We don't have a specific trychooser syntax for them yet.

The tests are restricted to Linux and Linux64 opt builds right now. Eventually we'll expand them to debug builds and maybe to other platforms. We also want to get other test suites running in e10s. As testing ramps up, we're going to have more and more test suites running e10s side-by-side with non-e10s. The eventual goal is of course to disable non-e10s tests once we've shipped an e10s browser. Until then, we'll have to balance resource usage with test coverage.

If you want to run in e10s mode locally, it's pretty simple:

mach mochitest-plain --e10s

As usual, you can pass in specific tests or directories as well as chunking options. Debugging in e10s is a little harder. Passing the --debugger=gdb option will only attach the debugger to the parent process. If you want to debug the content process, set the environment variable MOZ_DEBUG_CHILD_PROCESS=1. When the child starts up, it will go to sleep after printing its PID:

CHILDCHILDCHILDCHILD
debug me @

At that point you can run gdb as follows:

gdb $OBJDIR/dist/bin/plugin-container

Then you can set breakpoints in the child and resume it with "continue".

Most of the work for this was done by Ted, Armen, Aki, and Mark Hammond. Thanks guys!

-Bill
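Putting Bill's pieces together, an end-to-end child-debugging session might look like the following sketch (the test directory, PID, and breakpoint symbol are illustrative assumptions, not part of the announcement):

    # Terminal 1: run a directory of tests; the child waits for a debugger
    MOZ_DEBUG_CHILD_PROCESS=1 mach mochitest-plain --e10s dom/tests/mochitest
    # ...the child prints "CHILDCHILDCHILDCHILD / debug me @ <PID>" and sleeps

    # Terminal 2: attach gdb to that PID
    gdb $OBJDIR/dist/bin/plugin-container 12345
    (gdb) break nsGlobalWindow::Alert    # hypothetical child-side breakpoint
    (gdb) continue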
Re: Policy for disabling tests which run on TBPL
On 2014-04-08, 3:15 PM, Chris Peterson wrote:
> On 4/8/14, 11:41 AM, Gavin Sharp wrote:
>> Separately from all of that, we could definitely invest in better tools
>> for dealing with intermittent failures in general. Anecdotally, I know
>> chromium has some nice ways of dealing with them, for example. But I see
>> that as a separate discussion not really related to the goals above.
>
> Is fixing the known intermittent failures part of the plan? :) Many of
> the known failures are test timeouts, which suggests some low-hanging
> fruit in fixing test or network infrastructure problems:
>
> http://brasstacks.mozilla.com/orangefactor/

Fixing intermittent timeouts is neither easier nor harder than any other kind of intermittent failure.

Cheers,
Ehsan
Re: Policy for disabling tests which run on TBPL
On 4/8/14, 11:41 AM, Gavin Sharp wrote:
> Separately from all of that, we could definitely invest in better tools
> for dealing with intermittent failures in general. Anecdotally, I know
> chromium has some nice ways of dealing with them, for example. But I see
> that as a separate discussion not really related to the goals above.

Is fixing the known intermittent failures part of the plan? :)

Many of the known failures are test timeouts, which suggests some low-hanging fruit in fixing test or network infrastructure problems:

http://brasstacks.mozilla.com/orangefactor/

chris
Re: B2G emulator issues
On 4/8/2014 1:05 AM, Thomas Zimmermann wrote:
> There are tests that instruct the emulator to trigger certain HW events.
> We can't run them on actual phones.
>
> To me, the idea of switching to an x86-based emulator seems to be the
> most promising solution. What would be necessary?
>
> Best regards
> Thomas

We'd need these things:

1 - a consensus that we want to move to x86-based emulators, which presumes that architecture-specific problems aren't likely or important enough to warrant continued use of ARM-based emulators
2 - RelEng would need to stand up x86-based KitKat emulator builds
3 - The A*Team would need to get all of the tests running against these builds
4 - The A*Team and developers would have to work on fixing the inevitable test failures that occur when standing up any new platform

I'll bring this topic up at the next B2G Engineering Meeting.

Jonathan
Re: Policy for disabling tests which run on TBPL
On Tuesday 2014-04-08 11:41 -0700, Gavin Sharp wrote:
> I see only two real goals for the proposed policy:
> - ensure that module owners/peers have the opportunity to object to
>   any "disable test" decisions before they take effect
> - set an expectation that intermittent orange failures are dealt with
>   promptly ("dealt with" first involves investigation, usually by a
>   developer familiar with the code, and can then lead to either them
>   being fixed, disabled, or ignored)

I'm fine with the initial policy proposed at the top of the thread; this part of the subthread seemed to be about a proposal to auto-retry failing tests and report them as passing if they intermittently pass; that's the bit I'm not comfortable with.

-David

--
L. David Baron                         http://dbaron.org/
Mozilla                                https://www.mozilla.org/

Before I built a wall I'd ask to know
What I was walling in or walling out,
And to whom I was like to give offense.
- Robert Frost, Mending Wall (1914)
Re: Policy for disabling tests which run on TBPL
I see only two real goals for the proposed policy:
- ensure that module owners/peers have the opportunity to object to any "disable test" decisions before they take effect
- set an expectation that intermittent orange failures are dealt with promptly ("dealt with" first involves investigation, usually by a developer familiar with the code, and can then lead to either them being fixed, disabled, or ignored)

Neither of those happen reliably today. Sheriffs are failing to get the help they need to investigate failures, which leads to loss of (sometimes quite important) test coverage when they decide to unilaterally disable the relevant tests. Sheriffs should not be disabling tests unilaterally; developers should not be ignoring sheriff requests to investigate failures. The policy is not intended to suggest that any particular outcome (i.e. test disabling) is required.

Separately from all of that, we could definitely invest in better tools for dealing with intermittent failures in general. Anecdotally, I know chromium has some nice ways of dealing with them, for example. But I see that as a separate discussion not really related to the goals above.

Gavin

On Tue, Apr 8, 2014 at 10:20 AM, L. David Baron wrote:
> [...]
Re: Intent to implement requestAutocomplete
On 2014-04-08, at 11:40, Anne van Kesteren wrote:
> Related to this, https://www.w3.org/Bugs/Public/show_bug.cgi?id=25235
> is awaiting our input I'm told. Background:
> http://lists.w3.org/Archives/Public/public-whatwg-archive/2014Apr/0010.html

In the spirit of ocean boiling (i.e., attempting to create a complete breakdown of address components that is internationally portable), I refer you to RFC 5139. That doesn't use address lines though, and those are commonplace.
Re: Intent to implement requestAutocomplete
On Tue, Apr 8, 2014 at 11:24 AM, Brian Nicholson wrote:
> There is currently no formal standard. A link to Chrome's
> implementation:
> http://www.chromium.org/developers/using-requestautocomplete. Some
> discussion of the feature here:
> https://groups.google.com/a/chromium.org/forum/#!forum/requestautocomplete.

Related to this, https://www.w3.org/Bugs/Public/show_bug.cgi?id=25235 is awaiting our input, I'm told. Background: http://lists.w3.org/Archives/Public/public-whatwg-archive/2014Apr/0010.html

--
http://annevankesteren.nl/
Intent to implement requestAutocomplete
For the past few weeks, we've been working on requestAutocomplete, a proposed standard for HTML forms that streamlines the checkout flow for websites. Common payment and address form fields are shown in a popup UI native to the browser, so all sites using the API will share a common checkout experience, and previously submitted data can be reused across sites with no autofill guesswork.

The main bug tracking this feature is bug 939351. There is currently no formal standard. A link to Chrome's implementation: http://www.chromium.org/developers/using-requestautocomplete. Some discussion of the feature here: https://groups.google.com/a/chromium.org/forum/#!forum/requestautocomplete.

The plan is for bug 939351 to implement the backend components (form submission, validation, and storage). Each platform will require a separate component to implement its form UI. Right now, Android is the only platform with a WIP UI component (bug 946022).

For the platform components, I expect this feature to land by Fx32 behind the dom.requestAutocomplete.enabled pref. On Android, the ETA (with UI) is Fx33.

Thanks,
Brian
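For readers unfamiliar with the API, here is a page-side sketch based on the Chrome documentation linked above; element IDs and field names are illustrative, and this says nothing about the eventual Gecko implementation:

    <form id="checkout">
      <input autocomplete="cc-name">
      <input autocomplete="cc-number">
      <button type="button" id="buy">Buy</button>
    </form>
    <script>
      var form = document.getElementById("checkout");
      // Must be triggered by a user gesture, e.g. a click:
      document.getElementById("buy").onclick = function () {
        form.requestAutocomplete();
      };
      // Fired after the user confirms the browser's native UI:
      form.addEventListener("autocomplete", function () {
        form.submit();
      });
      // Fired if the request is refused or cancelled:
      form.addEventListener("autocompleteerror", function (e) {
        console.log(e.reason); // "cancel", "disabled", or "invalid" in Chrome
      });
    </script>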
Re: B2G emulator issues
> Hi,
>
> Thanks for bringing up this issue.
>
>> One option (very, very painful, and even slower) would be a proper
>> device simulator which simulates both the CPU and the system hardware
>> (of *some* B2G phone). This would produce the most realistic result
>> with an emulator.
>
> That is what the emulator is already doing. If we start emulating HW
> down to individual CPU cycles, it'll only get slower. :(

I think this is wrong in some way. Otherwise I wouldn't see this:

1) running on TBPL (AWS) the internal timings reported show the specific test going from 30 seconds to 450 seconds with the patch.
2) on my local system, the test self-reports ~10 seconds, with or without the patch.

The only way I can see that happening is if the simulator in some way exposes the underlying platform performance (in specific timers). Note: the timer in question is nsITimer::TYPE_REPEATING_PRECISE with 10ms timing. And changing it to 100ms makes the tests reliably green.

>> Another option (likely not simple) would be to find a way to "slow down
>> time" for the emulator, such as intercepting system calls and increasing
>> any time constants (multiplying timer values, timeout values to socket
>> calls, etc, etc). This may not be simple. For devices (audio, etc),
>> frequencies may need modifying or other adjustments made.
>
> If we do that, writing and debugging tests will take even longer.

It shouldn't, if the system self-adapted (per below). That should give a much more predictable (and closer-to-similar to a real device) result. BTW, I presume we're simulating a single-core ARM, so again not entirely representative anymore.

>> We could require that the emulator needs X Bogomips to run, or to run a
>> specific test suite.
>>
>> We could segment out tests that require higher performance and run them
>> on faster VMs/etc.
>
> Do we already know which tests are slow and why? Maybe there are ways to
> optimize the emulator. For example, if we execute lots of driver code
> within the guest, maybe we can move some of that into the emulator's
> binary, where it runs on the native machine.

Dunno. But it's REALLY slow. Native Linux on TBPL for a specific test: 1s. Local emulator (fast 2-year-old desktop): 10s. TBPL before the patch: 30-40s. After: 350-450s, and we're lucky it finishes at all. So compared to AWS Linux native it's ~30-40x slower without the patch, 300+x slower with. (Again, this speaks to realtime stuff leaving no CPU for test running on TBPL.) Others can speak to overall speed.

>> We could turn off certain tests on tbpl and run them on separate
>> dedicated test machines (a bit similar to PGO). There are downsides to
>> this of course.
>>
>> Lastly, we could put in a bank of HW running B2G to run the tests like
>> the Android test boards/phones.
>
> There are tests that instruct the emulator to trigger certain HW events.
> We can't run them on actual phones.

Sure. Most don't do that, I presume (very few).

> To me, the idea of switching to an x86-based emulator seems to be the
> most promising solution. What would be necessary?

Dunno.

--
Randell Jesup, Mozilla Corp
remove "news" for personal email
Re: Policy for disabling tests which run on TBPL
On Tuesday 2014-04-08 14:51 +0100, James Graham wrote:
> So, what's the minimum level of infrastructure that you think would
> be needed to go ahead with this plan? To me it seems like the
> current system already isn't working very well, so the bar for
> moving forward with a plan that would increase the amount of data we
> had available to diagnose problems with intermittents, and reduce
> the amount of manual labour needed in marking them, should be quite
> low.

Not sure what plan you're talking about, but:

The first step I'd like to see is having better tools for finding where known intermittent failures regressed. In particular, we should have:
* the ability to retrigger a partial test run (not the whole suite) on our testing infrastructure. This doesn't always help, since some failures will happen only in the context of the whole suite, but I think it's likely to help most of the time.
* auto-bisection tools for intermittent failures that use the above ability when they can

I think we're pretty good about backing out changesets that cause new intermittent failures that happen at ~20% or more failure rates. We need to get better about backing out for new intermittent failures that are intermittent at lower rates, and being able to do that is best done with better tools.

(One piece of context I'm coming from: there have been multiple times that the tests that I consider necessary to have enabled to allow people to add new CSS properties or values have failed intermittently at a reasonably high rate for a few months; I think both the start and end of these periods of failures has, in the cases where we found it, correlated with major or minor changes to the JavaScript JIT. I think those JIT bugs, if they shipped, were likely causing real problems for users, and we should be detecting those bugs rather than disabling our CSS testing coverage and putting us in a state where we can't add new CSS features.)

I also don't think that moving the failure threshold is a long-term solution. There will always be tests that hover on the edge of whatever the failure threshold is and give us trouble as a result; I think moving the threshold will only give temporary relief due to the history of writing tests to a stricter standard. For example, if we retry intermittent failures up to 10 times to see if they pass, we'll end up with tests that fail 75% of the time and thus fail all 10 retries intermittently (5.6% of the time).

-David

--
L. David Baron                         http://dbaron.org/
Mozilla                                https://www.mozilla.org/

Before I built a wall I'd ask to know
What I was walling in or walling out,
And to whom I was like to give offense.
- Robert Frost, Mending Wall (1914)
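The 5.6% figure follows directly from the failure rate: a test that fails 75% of the time fails all 10 independent retries with probability 0.75^10 ≈ 0.056. A one-liner to check, purely for illustration:

    python -c "print(0.75 ** 10)"   # 0.0563: the chance all 10 retries fail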
Re: B2G emulator issues
On 14-04-07 08:49 PM, Ehsan Akhgari wrote:
> On 2014-04-07, 8:03 PM, Robert O'Callahan wrote:
>> When you say "debug", you mean the emulator is running a FirefoxOS
>> debug build, not that the emulator itself is built debug --- right?
>> Given that, is it a correct summary to say that the problem is that
>> the emulator is just too slow? Applying time dilation might make tests
>> green but we'd be left with the problem of the tests still taking a
>> long time to run. Maybe we should identify a subset of the tests that
>> are more likely to suffer B2G-specific breaking and only run those?
>
> Do we disable all compiler optimizations for those debug builds? Can we
> turn them on, let's say, build with --enable-optimize and --enable-debug
> which gives us a -O2 optimized debug build?

In my experience running tests locally, a single mochitest run on the ARM emulator (hardware: Thinkpad X220, 16GB RAM, SSD) where everything was built with 'B2G_DEBUG=0 B2G_NOOPT=0' will run in 2 to 3 minutes. The same test, run with 'B2G_DEBUG=1 B2G_NOOPT=0', will take 7 to 10 minutes.

--m.
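Concretely, the two configurations Mike compares would be produced along these lines, assuming the standard B2G build script and an emulator tree already set up with config.sh (the exact invocation is an assumption):

    # after: ./config.sh emulator
    # Optimized, non-debug Gecko (the ~2-3 minute case):
    B2G_DEBUG=0 B2G_NOOPT=0 ./build.sh

    # Optimized debug Gecko (the ~7-10 minute case):
    B2G_DEBUG=1 B2G_NOOPT=0 ./build.sh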
Re: Linux testing on single-core VMs nowadays
We do talos testing on in-house machinery (iX machines with 4 cores). Not sure if that would trigger some of the issues you are hoping will be caught.

In the future, we should be able to have some jobs run on different EC2 instance types. See https://bugzilla.mozilla.org/show_bug.cgi?id=985650. It will require lots of work, but it is possible.

cheers,
Armen

On 14-04-08 03:45 AM, ishikawa wrote:
> [...]
Re: Policy for disabling tests which run on TBPL
On 08/04/14 15:06, Ehsan Akhgari wrote:
> On 2014-04-08, 9:51 AM, James Graham wrote:
>> So, what's the minimum level of infrastructure that you think would
>> be needed to go ahead with this plan? [...]
>
> dbaron raised the point that there are tests which are supposed to fail
> intermittently if they detect a bug. With that in mind, the tests cannot
> be marked as intermittently failing by the sheriffs, less so in an
> automated way (see the discussion in bug 918921).

Such tests are problematic indeed, but it seems like they're problematic in the current infrastructure too. For example, if a test goes from always passing to failing 1 time in 10 when it regresses, the first time we see the regression is likely to be around 10 test runs after the problem is introduced. That presumably makes it rather hard to track down when things went wrong. Or are we running such tests N times, where N is some high enough number that we are confident the test has a 95% (or whatever) chance of failing if there is actually a regression? If not, maybe we should be. Or perhaps the idea of independent test runs isn't useful in the face of all the state we have. In any case, this kind of test could be explicitly excluded from the reruns, which would make the situation the same as it is today.

> But to answer your question, I think this is something which can be done
> in the test harness itself so we don't need any special infra support
> for it. Note that I don't think that automatically marking such tests is
> a good idea either way.

The infra support I had in mind was something like "automatically (doing something like) starring tests that only passed after being rerun", or "listing all tests that needed a rerun", or "having a tool to find the first build in which the test became intermittent". The goal of this extra infrastructure would be to get the new information about reruns out of the test harness, and to address the concern that doing automated reruns would mean people paying even less attention to intermittents than they do today.
Re: Policy for disabling tests which run on TBPL
On 2014-04-08, 8:15 AM, Aryeh Gregor wrote:
> On Tue, Apr 8, 2014 at 2:41 AM, Ehsan Akhgari wrote:
>> What you're saying above is true *if* someone investigates the
>> intermittent test failure and determines that the bug is not
>> important. [...]
>
> The same is true for many bugs. The reported symptom might indicate a
> much more extensive underlying problem. The fact is, though, thoroughly
> investigating every bug would take a ton of resources, and is almost
> certainly not the best use of our manpower. There are many bugs that are
> *known* to affect many users that don't get fixed in a timely fashion.
> Things that probably won't affect a single user ever at all, and which
> are likely to be a pain to track down (because they're intermittent),
> should be prioritized relatively low.

I don't think that an analogy with normal bugs is accurate here. These intermittent failure bugs are categorically treated differently than all other incoming bugs, in my experience. The thing that really makes me care about these intermittent failures a lot is that ultimately they make us have to trade disabling a whole bunch of tests against being unable to manage our tree. As more and more tests get disabled, we lose more and more test coverage, and that can have a much more severe impact on the health of our products than every individual intermittent test failure.

> I think you hit the nail on the head, but I think there's a third
> solution: automatically ignore known intermittent failures, in as
> fine-grained a way as possible. [...]

I agree that automatically ignoring known intermittent failures that have been marked as such by a human is a good idea. But let's also not forget that it won't be a one-size-fits-all solution. There are test failure scenarios such as timeouts and crashes which we can't easily retry (for timeouts, because the test may leave its environment in a non-clean state). There will also be cases where reloading the same test will actually test different things (because, for example, things have been cached, etc.).

Cheers,
Ehsan
Re: Policy for disabling tests which run on TBPL
On 2014-04-08, 9:51 AM, James Graham wrote:
> On 08/04/14 14:43, Andrew Halberstadt wrote:
>> [...]
>
> So, what's the minimum level of infrastructure that you think would be
> needed to go ahead with this plan? To me it seems like the current
> system already isn't working very well, so the bar for moving forward
> with a plan that would increase the amount of data we had available to
> diagnose problems with intermittents, and reduce the amount of manual
> labour needed in marking them, should be quite low.

dbaron raised the point that there are tests which are supposed to fail intermittently if they detect a bug. With that in mind, those tests cannot be marked as intermittently failing by the sheriffs, less so in an automated way (see the discussion in bug 918921). I'm still not convinced that this idea will be worse than the status quo, but the fact that dbaron doesn't agree makes me hesitate.

But to answer your question, I think this is something which can be done in the test harness itself, so we don't need any special infra support for it. Note that I don't think that automatically marking such tests is a good idea either way.

Cheers,
Ehsan
Re: Policy for disabling tests which run on TBPL
On 08/04/14 14:43, Andrew Halberstadt wrote:
> On 07/04/14 11:49 AM, Aryeh Gregor wrote:
>> [...]
>
> I think this proposal would make more sense if the state of our
> infrastructure and tooling was able to handle it properly. Right now,
> automatically marking known intermittents would cause the test to lose
> *all* value. It's sad, but the only data we have about intermittents
> comes from the sheriffs manually starring them. There is also currently
> no way to mark a test KNOWN-RANDOM and automatically detect if it
> starts failing permanently. This means the failures can't be starred
> and become nearly impossible to discover, let alone diagnose.

So, what's the minimum level of infrastructure that you think would be needed to go ahead with this plan? To me it seems like the current system already isn't working very well, so the bar for moving forward with a plan that would increase the amount of data we had available to diagnose problems with intermittents, and reduce the amount of manual labour needed in marking them, should be quite low.
Re: Policy for disabling tests which run on TBPL
On 07/04/14 11:49 AM, Aryeh Gregor wrote:
> On Mon, Apr 7, 2014 at 6:12 PM, Ted Mielczarek wrote:
>> If a bug is causing a test to fail intermittently, then that test loses
>> value. It still has some value in that it can catch regressions that
>> cause it to fail permanently, but we would not be able to catch a
>> regression that causes it to fail intermittently.
>
> To some degree, yes, marking a test as expected intermittent causes it
> to lose value. If the developers who work on the relevant component
> think the lost value is important enough to track down the cause of the
> intermittent failure, they can do so. That should be their decision, not
> something forced on them by infrastructure issues ("everyone else will
> suffer if you don't find the cause for this failure in your test").
> Making known intermittent failures not turn the tree orange doesn't stop
> anyone from fixing intermittent failures, it just removes pressure from
> them if they decide they don't want to. If most developers think they
> have more important bugs to fix, then I don't see a problem with that.

I think this proposal would make more sense if the state of our infrastructure and tooling was able to handle it properly. Right now, automatically marking known intermittents would cause the test to lose *all* value. It's sad, but the only data we have about intermittents comes from the sheriffs manually starring them. There is also currently no way to mark a test KNOWN-RANDOM and automatically detect if it starts failing permanently. This means the failures can't be starred and become nearly impossible to discover, let alone diagnose.

As I mentioned in another post in this thread, we need better data and easier ways to drill through it. All I'm saying here is that I think things are probably worse than you picture them, and I think there is a lot of groundwork needed before it even makes sense to consider this.

-Andrew
Re: Policy for disabling tests which run on TBPL
On Tue, Apr 8, 2014 at 2:41 AM, Ehsan Akhgari wrote:
> What you're saying above is true *if* someone investigates the
> intermittent test failure and determines that the bug is not important.
> But in my experience, that's not what happens at all. I think many
> people treat intermittent test failures as a category of unimportant
> problems, and therefore some bugs are never investigated. The fact of
> the matter is that most of these bugs are bugs in our tests, which of
> course will not impact our users directly, but I have occasionally come
> across bugs in our code which are exposed as intermittent failures. The
> real issue is that the work of identifying the root of the problem is
> often the majority of the work needed to fix the intermittent test
> failure, so unless someone is willing to investigate the bug we cannot
> say whether or not it impacts our users.

The same is true for many bugs. The reported symptom might indicate a much more extensive underlying problem. The fact is, though, thoroughly investigating every bug would take a ton of resources, and is almost certainly not the best use of our manpower. There are many bugs that are *known* to affect many users that don't get fixed in a timely fashion. Things that probably won't affect a single user ever at all, and which are likely to be a pain to track down (because they're intermittent), should be prioritized relatively low.

> The thing that really makes me care about these intermittent failures a
> lot is that ultimately they make us have to trade disabling a whole
> bunch of tests against being unable to manage our tree. As more and
> more tests get disabled, we lose more and more test coverage, and that
> can have a much more severe impact on the health of our products than
> every individual intermittent test failure.

I think you hit the nail on the head, but I think there's a third solution: automatically ignore known intermittent failures, in as fine-grained a way as possible. This means the test is still almost as useful -- I think the vast majority of our tests will fail consistently if the thing they're testing breaks, not fail intermittently. But it doesn't get in the way of managing the tree. Yes, it reduces some tests' value slightly relative to fixing them, but it's not a good use of our resources to try tracking down most intermittent failures. The status quo reduces those tests' value just as much as automatic ignoring (because people will star known failure patterns consistently), but imposes a large manual labor cost.
Re: B2G emulator issues
Hi,

Thanks for bringing up this issue.

> One option (very, very painful, and even slower) would be a proper
> device simulator which simulates both the CPU and the system hardware
> (of *some* B2G phone). This would produce the most realistic result
> with an emulator.

That is what the emulator is already doing. If we start emulating HW down to individual CPU cycles, it'll only get slower. :(

> Another option (likely not simple) would be to find a way to "slow down
> time" for the emulator, such as intercepting system calls and increasing
> any time constants (multiplying timer values, timeout values to socket
> calls, etc, etc). This may not be simple. For devices (audio, etc),
> frequencies may need modifying or other adjustments made.

If we do that, writing and debugging tests will take even longer.

> We could require that the emulator needs X Bogomips to run, or to run a
> specific test suite.
>
> We could segment out tests that require higher performance and run them
> on faster VMs/etc.

Do we already know which tests are slow and why? Maybe there are ways to optimize the emulator. For example, if we execute lots of driver code within the guest, maybe we can move some of that into the emulator's binary, where it runs on the native machine.

> We could turn off certain tests on tbpl and run them on separate
> dedicated test machines (a bit similar to PGO). There are downsides to
> this of course.
>
> Lastly, we could put in a bank of HW running B2G to run the tests like
> the Android test boards/phones.

There are tests that instruct the emulator to trigger certain HW events. We can't run them on actual phones.

To me, the idea of switching to an x86-based emulator seems to be the most promising solution. What would be necessary?

Best regards
Thomas
Re: Linux testing on single-core VMs nowadays
On (2014-04-08 15:20), Gabriele Svelto wrote:
> On 07/04/2014 23:13, Dave Hylands wrote:
>> Personally, I think that the more ways we can test for threading issues
>> the better. It seems to me that we should do some amount of testing on
>> single core and multi-core.
>>
>> Then I suppose the question becomes how many cores? 2? 4? 8?
>>
>> Maybe we can cycle through some different number of cores so that we
>> get coverage without duplicating everything?
>
> One configuration that is particularly good at catching threading errors
> (especially narrow races) is constraining the software to run on two
> hardware threads on the same SMT-enabled core. This effectively forces
> the threads to share the L1 D$, which in turn can reveal some otherwise
> very-hard-to-find data synchronization issues.
>
> I don't know if we have that level of control on our testing hardware,
> but if we do then that's a scenario we might want to include.
>
> Gabriele

I run Thunderbird under valgrind from time to time.

Valgrind slows down CPU execution by a very large factor, and it seems to open many windows for thread races. (Sometimes a very short window is prolonged enough that events caused by, say, I/O can fall inside this usually-short window.)

During valgrind execution, I have seen errors that were not reported anywhere, and many have happened only once :-(

If a VM (such as VirtualBox, VMware Player, or something similar) can artificially change the execution speed of the CPU, or even of different cores slightly (maybe 1/2, 1/3, 1/4), I am sure many thread-race issues will be caught.

I agree that this is a brute-force approach, but please recall that the first space shuttle launch needed to be aborted due to a software glitch. It was a timing issue, and according to the analysis of the time, it could happen once in 72 (or was it 74?) cases. Even NASA, with a large pocket of money, and its subcontractor could not catch it before launch.

I am afraid that the situation has not changed much (unless we use a computer language well suited to avoiding these thread-race issues). We need all the help we can get to track down visible and dormant thread races. If artificial CPU-execution tweaking (by changing the number of cores, or with more advanced tweaking methods if available) can help, it is worth a try. Maybe not always, if such work costs extra money, but a prolonged (say, a week) test run from time to time (each quarter or half a year, or maybe just prior to testing the beta of a major release?).

TIA
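For anyone wanting to reproduce this kind of run, a typical invocation looks something like the sketch below; the flags are illustrative, and --smc-check is there because Gecko's JITs generate code at runtime:

    # Run a local build under valgrind's memcheck; the large slowdown is
    # what widens the race windows described above.
    valgrind --smc-check=all-non-file --trace-children=yes \
        $OBJDIR/dist/bin/thunderbird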