Re: New e10s tests on tinderbox
Not yet, because M-e10s is only running on Linux opt, and these test_ipc tests run everywhere in opt and debug.

- Kyle

On Apr 8, 2014 6:58 PM, "Shih-Chiang Chien" wrote:
> Hi Bill,
>
> Many thanks for working on the M-e10s. Does it mean we can remove all
> these "test_ipc.html" mochitests? AFAIK these test cases are manually
> emulating an e10s environment with some hacks.
>
> Here is the list of test_ipc.html:
>
> http://dxr.mozilla.org/mozilla-central/source/content/media/webspeech/synth/ipc/test/test_ipc.html
> http://dxr.mozilla.org/mozilla-central/source/dom/devicestorage/ipc/test_ipc.html
> http://dxr.mozilla.org/mozilla-central/source/dom/indexedDB/ipc/test_ipc.html
> http://dxr.mozilla.org/mozilla-central/source/dom/media/tests/ipc/test_ipc.html
>
> Best Regards,
> Shih-Chiang Chien
> Mozilla Taiwan
>
> On Apr 9, 2014, at 5:28 AM, Bill McCloskey wrote:
> > [...]
Re: New e10s tests on tinderbox
Hi Bill,

Many thanks for working on the M-e10s. Does it mean we can remove all these "test_ipc.html" mochitests? AFAIK these test cases are manually emulating an e10s environment with some hacks.

Here is the list of test_ipc.html:

http://dxr.mozilla.org/mozilla-central/source/content/media/webspeech/synth/ipc/test/test_ipc.html
http://dxr.mozilla.org/mozilla-central/source/dom/devicestorage/ipc/test_ipc.html
http://dxr.mozilla.org/mozilla-central/source/dom/indexedDB/ipc/test_ipc.html
http://dxr.mozilla.org/mozilla-central/source/dom/media/tests/ipc/test_ipc.html

Best Regards,
Shih-Chiang Chien
Mozilla Taiwan

On Apr 9, 2014, at 5:28 AM, Bill McCloskey wrote:
> [...]
Re: Removing 'jit-tests' from make check: 15% speedup
Thanks Dan. This looks to be contributing roughly half of our 30-45% build speedup on Windows this month.

Daniel Minor wrote:

Hello,

Just a heads up that very soon we'll be removing jit-tests from the "make check" target [1]. The tests have been split out into a separate test job on TBPL [2] (labelled Jit), have been running on Cedar for several months, and have recently been turned on for other trees. We've added a mach command, "mach jittest", that runs the tests with the same arguments that "make check" currently does.

Along with the cpp unit tests that were removed back in January, the jit-tests are a substantial portion of "make check" execution time. Their removal will speed up build time and allow them to be re-triggered independently in the event of failures.

If you encounter any issues please feel free to file a bug.

Regards,
Dan

[1] https://bugzilla.mozilla.org/show_bug.cgi?id=988532
[2] https://bugzilla.mozilla.org/show_bug.cgi?id=858621
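For local use, the new command is a drop-in for what "make check" used to run; a minimal sketch (the filter argument is an assumption, not something Dan's note confirms):

    # Run the full jit-test suite with the same arguments "make check" used:
    ./mach jittest

    # Hypothetically, narrow the run to tests matching a substring:
    ./mach jittest basic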
Re: New e10s tests on tinderbox
Bill McCloskey wrote:
> Starting today, we have new mochitests that show up as M-e10s (1 2 3 4 5).
> These are mochitests-plain running inside an e10s content process. Aside from
> being in a separate process, they work pretty much the same as normal. Some
> tests have been disabled for e10s. If you add a new test and it doesn't work
> in e10s mode, you can disable it with the following mochitest.ini gunk:

This is great! Thanks for driving this!
--
Blake Kaplan
Re: Policy for disabling tests which run on TBPL
Aryeh Gregor writes:
> On Tue, Apr 8, 2014 at 2:41 AM, Ehsan Akhgari wrote:
>> What you're saying above is true *if* someone investigates the
>> intermittent test failure and determines that the bug is not
>> important. But in my experience, that's not what happens at all.
>> [...]
>
> The same is true for many bugs. The reported symptom might
> indicate a much more extensive underlying problem. The fact is,
> though, thoroughly investigating every bug would take a ton of
> resources, and is almost certainly not the best use of our
> manpower. There are many bugs that are *known* to affect many
> users that don't get fixed in a timely fashion. Things that
> probably won't affect a single user ever at all, and which are
> likely to be a pain to track down (because they're
> intermittent), should be prioritized relatively low.

New intermittent failures are different from many user-reported bugs because they are known to be a regression, and there is some kind of indication of the regression window. Regressions should be high priority. People are getting by without many new features, but people have begun to depend on existing features, so regressions break real sites and cause confusion for many people. The time to address regressions is ASAP, so that responsibility can be handed over to the person causing the regression. Waiting too long means that backing out the cause of the regression is likely to cause another regression.

I wonder whether the real problem here is that we have too many bad tests that report false negatives, and these bad tests are reducing the value of our test suite in general. Tests also need to be well documented so that people can understand what a negative report really means. This is probably what is leading to assumptions that disabling a test is the solution to a new failure.

Getting bugs on file and seen by the right people is an important part of dealing with this. The tricky part is working out how to prioritize and cope with these bugs.
Re: New e10s tests on tinderbox
- Original Message -
> From: "Bobby Holley"
> To: "Bill McCloskey"
> Cc: "dev-platform"
> Sent: Tuesday, April 8, 2014 2:35:26 PM
> Subject: Re: New e10s tests on tinderbox
>
> Can you elaborate on the kinds of things that make tests fail on e10s? I
> have some idea in my head of what they might be, but I don't know how
> accurate it is with all the Black Magic we do these days.

There isn't really any black magic in mochitests-plain. That's why I started with this suite first :-). The most common causes of failures that I've seen:

1. Sometimes code just wasn't designed for e10s and it will assert if we try to use it. Bug 989139 is an example. There, we assert any time we encounter the tag.

2. Opening a new window will open a new tab in e10s. This is bug 989501. It's usually harmless, but sometimes it causes tests to get unexpected values for the window size or position.

3. Tests that use plugins don't work because e10s has no way to find the test plugin. That will eventually be fixed by bug 874016.

Also, one thing I forgot to point out is that testing on e10s is fairly close to testing on b2g, so I think b2g will benefit from the work we do in trying to fix some of these issues. For example, the issue happens on b2g too (although admittedly it's not the most important issue).

-Bill
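For failures like these, the mochitest.ini annotation from Bill's original post (below) is the escape hatch. A sketch, with a hypothetical test name tied to the window-open behavior in item 2:

    [test_window_open.html]
    skip-if = e10s # bug 989501 - window.open opens a tab under e10s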
Re: New e10s tests on tinderbox
> Most of the work for this was done by Ted, Armen, Aki, and Mark Hammond.
> Thanks guys!

And RyanVM! I knew I'd forget someone. Sorry.

-Bill
Re: B2G emulator issues
Randell Jesup writes:
> 1) running on TBPL (AWS) the internal timings reported show the specific
>    test going from 30 seconds to 450 seconds with the patch.
> 2) on my local system, the test self-reports ~10 seconds, with or
>    without the patch.
>
> Note: the timer in question is nsITimer::TYPE_REPEATING_PRECISE with
> 10ms timing. And changing it to 100ms makes the tests reliably green.

Do you know how many simultaneous hardware threads are emulated? Is it possible that the thread using TYPE_REPEATING_PRECISE has a high priority, and so it would occupy the single hardware thread when there is no spare time available for anything else? The time taken for the test run might depend on the "anything else" running.
Re: New e10s tests on tinderbox
This is awesome! Great job getting us this far.

On Tue, Apr 8, 2014 at 2:28 PM, Bill McCloskey wrote:
> We have about 85% of mochitests-plain running right now.

Can you elaborate on the kinds of things that make tests fail on e10s? I have some idea in my head of what they might be, but I don't know how accurate it is with all the Black Magic we do these days.

bholley
Re: New e10s tests on tinderbox
On Tue, Apr 08, 2014 at 02:28:02PM -0700, Bill McCloskey wrote:
> Hi everyone,
>
> [...]
>
> At that point you can run gdb as follows:
>
> gdb $OBJDIR/dist/bin/plugin-container
>
> Then you can set breakpoints in the child and resume it with "continue".

Or, if you know ahead of time that you want to debug the child, you can set follow-fork-mode to child. You may also be able to use the attach command to attach once the child is running, from the debugger the parent is running in; but if I've tried either of those recently, I don't remember it.

Trev
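For reference, the two approaches Trev mentions would look roughly like this in gdb (unverified, per his own caveat; the PID is illustrative):

    # Approach 1: before the child launches, in the gdb session
    # that is running the parent:
    (gdb) set follow-fork-mode child

    # Approach 2: from the same gdb session, attach to an
    # already-running child using the PID it printed:
    (gdb) attach 12345
    (gdb) continue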
New e10s tests on tinderbox
Hi everyone,

Starting today, we have new mochitests that show up as M-e10s (1 2 3 4 5). These are mochitests-plain running inside an e10s content process. Aside from being in a separate process, they work pretty much the same as normal. Some tests have been disabled for e10s. If you add a new test and it doesn't work in e10s mode, you can disable it with the following mochitest.ini gunk:

[your_test.html]
skip-if = e10s

We have about 85% of mochitests-plain running right now. I'm hoping to make a big push to get this number up to 100%, but there are still some prerequisite bugs that I want to fix first. In the meantime, we can at least identify regressions in the tests that run.

Right now, these tests are running on inbound, central, try, fx-team, and b2g-inbound. In a few days, they'll be running on all trunk trees. If you do a try push, e10s tests will run iff mochitests-plain run. We don't have a specific trychooser syntax for them yet.

The tests are restricted to Linux and Linux64 opt builds right now. Eventually we'll expand them to debug builds and maybe to other platforms. We also want to get other test suites running in e10s. As testing ramps up, we're going to have more and more test suites running e10s side-by-side with non-e10s. The eventual goal is of course to disable non-e10s tests once we've shipped an e10s browser. Until then, we'll have to balance resource usage with test coverage.

If you want to run in e10s mode locally, it's pretty simple:

mach mochitest-plain --e10s

As usual, you can pass in specific tests or directories as well as chunking options. Debugging in e10s is a little harder. Passing the --debugger=gdb option will only attach the debugger to the parent process. If you want to debug the content process, set the environment variable MOZ_DEBUG_CHILD_PROCESS=1. When the child starts up, it will go to sleep after printing its PID:

CHILDCHILDCHILDCHILD
debug me @

At that point you can run gdb as follows:

gdb $OBJDIR/dist/bin/plugin-container

Then you can set breakpoints in the child and resume it with "continue".

Most of the work for this was done by Ted, Armen, Aki, and Mark Hammond. Thanks guys!

-Bill
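Putting Bill's pieces together, an end-to-end child-debugging session might look like the following sketch (the test directory, PID, and breakpoint symbol are illustrative assumptions, not part of the announcement):

    # Terminal 1: run a directory of tests; the child waits for a debugger
    MOZ_DEBUG_CHILD_PROCESS=1 mach mochitest-plain --e10s dom/tests/mochitest
    # ...the child prints "CHILDCHILDCHILDCHILD / debug me @ <PID>" and sleeps

    # Terminal 2: attach gdb to that PID
    gdb $OBJDIR/dist/bin/plugin-container 12345
    (gdb) break nsGlobalWindow::Alert    # hypothetical child-side breakpoint
    (gdb) continue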
Re: Policy for disabling tests which run on TBPL
On 2014-04-08, 3:15 PM, Chris Peterson wrote:
> On 4/8/14, 11:41 AM, Gavin Sharp wrote:
>> Separately from all of that, we could definitely invest in better tools
>> for dealing with intermittent failures in general. Anecdotally, I know
>> chromium has some nice ways of dealing with them, for example. But I see
>> that as a separate discussion not really related to the goals above.
>
> Is fixing the known intermittent failures part of the plan? :) Many of
> the known failures are test timeouts, which suggests some low-hanging
> fruit in fixing test or network infrastructure problems:
>
> http://brasstacks.mozilla.com/orangefactor/

Fixing intermittent timeouts is neither easier nor harder than any other kind of intermittent failure.

Cheers,
Ehsan
Re: Policy for disabling tests which run on TBPL
On 4/8/14, 11:41 AM, Gavin Sharp wrote:
> Separately from all of that, we could definitely invest in better tools
> for dealing with intermittent failures in general. Anecdotally, I know
> chromium has some nice ways of dealing with them, for example. But I see
> that as a separate discussion not really related to the goals above.

Is fixing the known intermittent failures part of the plan? :)

Many of the known failures are test timeouts, which suggests some low-hanging fruit in fixing test or network infrastructure problems:

http://brasstacks.mozilla.com/orangefactor/

chris
Re: B2G emulator issues
On 4/8/2014 1:05 AM, Thomas Zimmermann wrote:
> There are tests that instruct the emulator to trigger certain HW events.
> We can't run them on actual phones.
>
> To me, the idea of switching to an x86-based emulator seems to be the
> most promising solution. What would be necessary?
>
> Best regards
> Thomas

We'd need these things:

1 - a consensus that we want to move to x86-based emulators, which presumes that architecture-specific problems aren't likely or important enough to warrant continued use of ARM-based emulators
2 - RelEng would need to stand up x86-based KitKat emulator builds
3 - The A*Team would need to get all of the tests running against these builds
4 - The A*Team and developers would have to work on fixing the inevitable test failures that occur when standing up any new platform

I'll bring this topic up at the next B2G Engineering Meeting.

Jonathan
Re: Policy for disabling tests which run on TBPL
On Tuesday 2014-04-08 11:41 -0700, Gavin Sharp wrote:
> I see only two real goals for the proposed policy:
> - ensure that module owners/peers have the opportunity to object to
>   any "disable test" decisions before they take effect
> - set an expectation that intermittent orange failures are dealt with
>   promptly ("dealt with" first involves investigation, usually by a
>   developer familiar with the code, and can then lead to either them
>   being fixed, disabled, or ignored)

I'm fine with the initial policy proposed at the top of the thread; this part of the subthread seemed to be about a proposal to auto-retry failing tests and report them as passing if they intermittently pass; that's the bit I'm not comfortable with.

-David

--
L. David Baron                         http://dbaron.org/
Mozilla                                https://www.mozilla.org/

Before I built a wall I'd ask to know
What I was walling in or walling out,
And to whom I was like to give offense.
- Robert Frost, Mending Wall (1914)
Re: Policy for disabling tests which run on TBPL
I see only two real goals for the proposed policy:
- ensure that module owners/peers have the opportunity to object to any "disable test" decisions before they take effect
- set an expectation that intermittent orange failures are dealt with promptly ("dealt with" first involves investigation, usually by a developer familiar with the code, and can then lead to either them being fixed, disabled, or ignored)

Neither of those happen reliably today. Sheriffs are failing to get the help they need to investigate failures, which leads to loss of (sometimes quite important) test coverage when they decide to unilaterally disable the relevant tests. Sheriffs should not be disabling tests unilaterally; developers should not be ignoring sheriff requests to investigate failures. The policy is not intended to suggest that any particular outcome (i.e. test disabling) is required.

Separately from all of that, we could definitely invest in better tools for dealing with intermittent failures in general. Anecdotally, I know chromium has some nice ways of dealing with them, for example. But I see that as a separate discussion not really related to the goals above.

Gavin

On Tue, Apr 8, 2014 at 10:20 AM, L. David Baron wrote:
> [...]
Re: Intent to implement requestAutocomplete
On 2014-04-08, at 11:40, Anne van Kesteren wrote:
> Related to this, https://www.w3.org/Bugs/Public/show_bug.cgi?id=25235
> is awaiting our input I'm told. Background:
> http://lists.w3.org/Archives/Public/public-whatwg-archive/2014Apr/0010.html

In the spirit of ocean boiling (i.e., attempting to create a complete breakdown of address components that is internationally portable), I refer you to RFC 5139. That doesn't use address lines though, and those are commonplace.
Re: Intent to implement requestAutocomplete
On Tue, Apr 8, 2014 at 11:24 AM, Brian Nicholson wrote:
> There is currently no formal standard. A link to Chrome's
> implementation:
> http://www.chromium.org/developers/using-requestautocomplete. Some
> discussion of the feature here:
> https://groups.google.com/a/chromium.org/forum/#!forum/requestautocomplete.

Related to this, https://www.w3.org/Bugs/Public/show_bug.cgi?id=25235 is awaiting our input, I'm told. Background: http://lists.w3.org/Archives/Public/public-whatwg-archive/2014Apr/0010.html

--
http://annevankesteren.nl/
Intent to implement requestAutocomplete
For the past few weeks, we've been working on requestAutocomplete, a proposed standard for HTML forms that streamlines the checkout flow for websites. Common payment and address form fields are shown in a popup UI native to the browser, so all sites using the API will share a common checkout experience, and previously submitted data can be reused across sites with no autofill guesswork.

The main bug tracking this feature is bug 939351. There is currently no formal standard. A link to Chrome's implementation: http://www.chromium.org/developers/using-requestautocomplete. Some discussion of the feature here: https://groups.google.com/a/chromium.org/forum/#!forum/requestautocomplete.

The plan is for bug 939351 to implement the backend components (form submission, validation, and storage). Each platform will require a separate component to implement its form UI. Right now, Android is the only platform with a WIP UI component (bug 946022).

For the platform components, I expect this feature to land by Fx32 behind the dom.requestAutocomplete.enabled pref. On Android, the ETA (with UI) is Fx33.

Thanks,
Brian
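For readers unfamiliar with the API, here is a page-side sketch based on the Chrome documentation linked above; element IDs and field names are illustrative, and this says nothing about the eventual Gecko implementation:

    <form id="checkout">
      <input autocomplete="cc-name">
      <input autocomplete="cc-number">
      <button type="button" id="buy">Buy</button>
    </form>
    <script>
      var form = document.getElementById("checkout");
      // Must be triggered by a user gesture, e.g. a click:
      document.getElementById("buy").onclick = function () {
        form.requestAutocomplete();
      };
      // Fired after the user confirms the browser's native UI:
      form.addEventListener("autocomplete", function () {
        form.submit();
      });
      // Fired if the request is refused or cancelled:
      form.addEventListener("autocompleteerror", function (e) {
        console.log(e.reason); // "cancel", "disabled", or "invalid" in Chrome
      });
    </script>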
Re: B2G emulator issues
> Hi,
>
> Thanks for bringing up this issue.
>
>> One option (very, very painful, and even slower) would be a proper
>> device simulator which simulates both the CPU and the system hardware
>> (of *some* B2G phone). This would produce the most realistic result
>> with an emulator.
>
> That is what the emulator is already doing. If we start emulating HW
> down to individual CPU cycles, it'll only get slower. :(

I think this is wrong in some way. Otherwise I wouldn't see this:

1) running on TBPL (AWS) the internal timings reported show the specific test going from 30 seconds to 450 seconds with the patch.
2) on my local system, the test self-reports ~10 seconds, with or without the patch.

The only way I can see that happening is if the simulator in some way exposes the underlying platform performance (in specific timers). Note: the timer in question is nsITimer::TYPE_REPEATING_PRECISE with 10ms timing. And changing it to 100ms makes the tests reliably green.

>> Another option (likely not simple) would be to find a way to "slow down
>> time" for the emulator, such as intercepting system calls and increasing
>> any time constants (multiplying timer values, timeout values to socket
>> calls, etc, etc). This may not be simple. For devices (audio, etc),
>> frequencies may need modifying or other adjustments made.
>
> If we do that, writing and debugging tests will take even longer.

It shouldn't, if the system self-adapted (per below). That should give a much more predictable (and closer-to-similar to a real device) result. BTW, I presume we're simulating a single-core ARM, so again not entirely representative anymore.

>> We could require that the emulator needs X Bogomips to run, or to run a
>> specific test suite.
>>
>> We could segment out tests that require higher performance and run them
>> on faster VMs/etc.
>
> Do we already know which tests are slow and why? Maybe there are ways to
> optimize the emulator. For example, if we execute lots of driver code
> within the guest, maybe we can move some of that into the emulator's
> binary, where it runs on the native machine.

Dunno. But it's REALLY slow. Native Linux on TBPL for a specific test: 1s. Local emulator (fast 2-year-old desktop): 10s. TBPL before the patch: 30-40s. After: 350-450s, and we're lucky it finishes at all. So compared to AWS Linux native it's ~30-40x slower without the patch, 300+x slower with. (Again, this speaks to realtime stuff leaving no CPU for test running on TBPL.) Others can speak to overall speed.

>> We could turn off certain tests on tbpl and run them on separate
>> dedicated test machines (a bit similar to PGO). There are downsides to
>> this of course.
>>
>> Lastly, we could put in a bank of HW running B2G to run the tests like
>> the Android test boards/phones.
>
> There are tests that instruct the emulator to trigger certain HW events.
> We can't run them on actual phones.

Sure. Most don't do that, I presume (very few).

> To me, the idea of switching to an x86-based emulator seems to be the
> most promising solution. What would be necessary?

Dunno.

--
Randell Jesup, Mozilla Corp
remove "news" for personal email
Re: Policy for disabling tests which run on TBPL
On Tuesday 2014-04-08 14:51 +0100, James Graham wrote:
> So, what's the minimum level of infrastructure that you think would
> be needed to go ahead with this plan? To me it seems like the
> current system already isn't working very well, so the bar for
> moving forward with a plan that would increase the amount of data we
> had available to diagnose problems with intermittents, and reduce
> the amount of manual labour needed in marking them, should be quite
> low.

Not sure what plan you're talking about, but:

The first step I'd like to see is having better tools for finding where known intermittent failures regressed. In particular, we should have:
* the ability to retrigger a partial test run (not the whole suite) on our testing infrastructure. This doesn't always help, since some failures will happen only in the context of the whole suite, but I think it's likely to help most of the time.
* auto-bisection tools for intermittent failures that use the above ability when they can

I think we're pretty good about backing out changesets that cause new intermittent failures that happen at ~20% or more failure rates. We need to get better about backing out for new intermittent failures that are intermittent at lower rates, and being able to do that is best done with better tools.

(One piece of context I'm coming from: there have been multiple times that the tests that I consider necessary to have enabled to allow people to add new CSS properties or values have failed intermittently at a reasonably high rate for a few months; I think both the start and end of these periods of failures has, in the cases where we found it, correlated with major or minor changes to the JavaScript JIT. I think those JIT bugs, if they shipped, were likely causing real problems for users, and we should be detecting those bugs rather than disabling our CSS testing coverage and putting us in a state where we can't add new CSS features.)

I also don't think that moving the failure threshold is a long-term solution. There will always be tests that hover on the edge of whatever the failure threshold is and give us trouble as a result; I think moving the threshold will only give temporary relief due to the history of writing tests to a stricter standard. For example, if we retry intermittent failures up to 10 times to see if they pass, we'll end up with tests that fail 75% of the time and thus fail all 10 retries intermittently (5.6% of the time).

-David

--
L. David Baron                         http://dbaron.org/
Mozilla                                https://www.mozilla.org/

Before I built a wall I'd ask to know
What I was walling in or walling out,
And to whom I was like to give offense.
- Robert Frost, Mending Wall (1914)
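The 5.6% figure follows directly from the failure rate: a test that fails 75% of the time fails all 10 independent retries with probability 0.75^10 ≈ 0.056. A one-liner to check, purely for illustration:

    python -c "print(0.75 ** 10)"   # 0.0563: the chance all 10 retries fail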
Re: B2G emulator issues
On 14-04-07 08:49 PM, Ehsan Akhgari wrote:
> On 2014-04-07, 8:03 PM, Robert O'Callahan wrote:
>> When you say "debug", you mean the emulator is running a FirefoxOS
>> debug build, not that the emulator itself is built debug --- right?
>> Given that, is it a correct summary to say that the problem is that
>> the emulator is just too slow? Applying time dilation might make tests
>> green but we'd be left with the problem of the tests still taking a
>> long time to run. Maybe we should identify a subset of the tests that
>> are more likely to suffer B2G-specific breaking and only run those?
>
> Do we disable all compiler optimizations for those debug builds? Can we
> turn them on, let's say, build with --enable-optimize and --enable-debug
> which gives us a -O2 optimized debug build?

In my experience running tests locally, a single mochitest run on the ARM emulator (hardware: Thinkpad X220, 16GB RAM, SSD) where everything was built with 'B2G_DEBUG=0 B2G_NOOPT=0' will run in 2 to 3 minutes. The same test, run with 'B2G_DEBUG=1 B2G_NOOPT=0', will take 7 to 10 minutes.

--m.
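Concretely, the two configurations Mike compares would be produced along these lines, assuming the standard B2G build script and an emulator tree already set up with config.sh (the exact invocation is an assumption):

    # after: ./config.sh emulator
    # Optimized, non-debug Gecko (the ~2-3 minute case):
    B2G_DEBUG=0 B2G_NOOPT=0 ./build.sh

    # Optimized debug Gecko (the ~7-10 minute case):
    B2G_DEBUG=1 B2G_NOOPT=0 ./build.sh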
Re: Linux testing on single-core VMs nowadays
We do talos testing on in-house machinery (iX machines with 4 cores). Not sure if that would trigger some of the issues you are hoping will be caught.

In the future, we should be able to have some jobs run on different EC2 instance types. See https://bugzilla.mozilla.org/show_bug.cgi?id=985650. It will require lots of work, but it is possible.

cheers,
Armen

On 14-04-08 03:45 AM, ishikawa wrote:
> [...]
Re: Policy for disabling tests which run on TBPL
On 08/04/14 15:06, Ehsan Akhgari wrote:
> On 2014-04-08, 9:51 AM, James Graham wrote:
>> So, what's the minimum level of infrastructure that you think would
>> be needed to go ahead with this plan? [...]
>
> dbaron raised the point that there are tests which are supposed to fail
> intermittently if they detect a bug. With that in mind, the tests cannot
> be marked as intermittently failing by the sheriffs, less so in an
> automated way (see the discussion in bug 918921).

Such tests are problematic indeed, but it seems like they're problematic in the current infrastructure too. For example, if a test goes from always passing to failing 1 time in 10 when it regresses, the first time we see the regression is likely to be around 10 test runs after the problem is introduced. That presumably makes it rather hard to track down when things went wrong. Or are we running such tests N times, where N is some high enough number that we are confident the test has a 95% (or whatever) chance of failing if there is actually a regression? If not, maybe we should be. Or perhaps the idea of independent test runs isn't useful in the face of all the state we have. In any case, this kind of test could be explicitly excluded from the reruns, which would make the situation the same as it is today.

> But to answer your question, I think this is something which can be done
> in the test harness itself so we don't need any special infra support
> for it. Note that I don't think that automatically marking such tests is
> a good idea either way.

The infra support I had in mind was something like "automatically (doing something like) starring tests that only passed after being rerun", or "listing all tests that needed a rerun", or "having a tool to find the first build in which the test became intermittent". The goal of this extra infrastructure would be to get the new information about reruns out of the test harness, and to address the concern that doing automated reruns would mean people paying even less attention to intermittents than they do today.
Re: Policy for disabling tests which run on TBPL
On 2014-04-08, 8:15 AM, Aryeh Gregor wrote:
> On Tue, Apr 8, 2014 at 2:41 AM, Ehsan Akhgari wrote:
>> What you're saying above is true *if* someone investigates the
>> intermittent test failure and determines that the bug is not
>> important. [...]
>
> The same is true for many bugs. The reported symptom might indicate a
> much more extensive underlying problem. The fact is, though, thoroughly
> investigating every bug would take a ton of resources, and is almost
> certainly not the best use of our manpower. There are many bugs that are
> *known* to affect many users that don't get fixed in a timely fashion.
> Things that probably won't affect a single user ever at all, and which
> are likely to be a pain to track down (because they're intermittent),
> should be prioritized relatively low.

I don't think that an analogy with normal bugs is accurate here. These intermittent failure bugs are categorically treated differently than all other incoming bugs, in my experience. The thing that really makes me care about these intermittent failures a lot is that ultimately they make us have to trade disabling a whole bunch of tests against being unable to manage our tree. As more and more tests get disabled, we lose more and more test coverage, and that can have a much more severe impact on the health of our products than every individual intermittent test failure.

> I think you hit the nail on the head, but I think there's a third
> solution: automatically ignore known intermittent failures, in as
> fine-grained a way as possible. [...]

I agree that automatically ignoring known intermittent failures that have been marked as such by a human is a good idea. But let's also not forget that it won't be a one-size-fits-all solution. There are test failure scenarios such as timeouts and crashes which we can't easily retry (for timeouts, because the test may leave its environment in a non-clean state). There will also be cases where reloading the same test will actually test different things (because, for example, things have been cached, etc.).

Cheers,
Ehsan
Re: Policy for disabling tests which run on TBPL
On 2014-04-08, 9:51 AM, James Graham wrote:
> On 08/04/14 14:43, Andrew Halberstadt wrote:
>> [...]
>
> So, what's the minimum level of infrastructure that you think would be
> needed to go ahead with this plan? To me it seems like the current
> system already isn't working very well, so the bar for moving forward
> with a plan that would increase the amount of data we had available to
> diagnose problems with intermittents, and reduce the amount of manual
> labour needed in marking them, should be quite low.

dbaron raised the point that there are tests which are supposed to fail intermittently if they detect a bug. With that in mind, those tests cannot be marked as intermittently failing by the sheriffs, less so in an automated way (see the discussion in bug 918921). I'm still not convinced that this idea will be worse than the status quo, but the fact that dbaron doesn't agree makes me hesitate.

But to answer your question, I think this is something which can be done in the test harness itself, so we don't need any special infra support for it. Note that I don't think that automatically marking such tests is a good idea either way.

Cheers,
Ehsan
Re: Policy for disabling tests which run on TBPL
On 08/04/14 14:43, Andrew Halberstadt wrote:
> On 07/04/14 11:49 AM, Aryeh Gregor wrote:
>> [...]
>
> I think this proposal would make more sense if the state of our
> infrastructure and tooling was able to handle it properly. Right now,
> automatically marking known intermittents would cause the test to lose
> *all* value. It's sad, but the only data we have about intermittents
> comes from the sheriffs manually starring them. There is also currently
> no way to mark a test KNOWN-RANDOM and automatically detect if it
> starts failing permanently. This means the failures can't be starred
> and become nearly impossible to discover, let alone diagnose.

So, what's the minimum level of infrastructure that you think would be needed to go ahead with this plan? To me it seems like the current system already isn't working very well, so the bar for moving forward with a plan that would increase the amount of data we had available to diagnose problems with intermittents, and reduce the amount of manual labour needed in marking them, should be quite low.
Re: Policy for disabling tests which run on TBPL
On 07/04/14 11:49 AM, Aryeh Gregor wrote:
> On Mon, Apr 7, 2014 at 6:12 PM, Ted Mielczarek wrote:
>> If a bug is causing a test to fail intermittently, then that test loses
>> value. It still has some value in that it can catch regressions that
>> cause it to fail permanently, but we would not be able to catch a
>> regression that causes it to fail intermittently.
>
> To some degree, yes, marking a test as expected intermittent causes it
> to lose value. If the developers who work on the relevant component
> think the lost value is important enough to track down the cause of the
> intermittent failure, they can do so. That should be their decision, not
> something forced on them by infrastructure issues ("everyone else will
> suffer if you don't find the cause for this failure in your test").
> Making known intermittent failures not turn the tree orange doesn't stop
> anyone from fixing intermittent failures, it just removes pressure from
> them if they decide they don't want to. If most developers think they
> have more important bugs to fix, then I don't see a problem with that.

I think this proposal would make more sense if the state of our infrastructure and tooling was able to handle it properly. Right now, automatically marking known intermittents would cause the test to lose *all* value. It's sad, but the only data we have about intermittents comes from the sheriffs manually starring them. There is also currently no way to mark a test KNOWN-RANDOM and automatically detect if it starts failing permanently. This means the failures can't be starred and become nearly impossible to discover, let alone diagnose.

As I mentioned in another post in this thread, we need better data and easier ways to drill through it. All I'm saying here is that I think things are probably worse than you picture them, and I think there is a lot of groundwork needed before it even makes sense to consider this.

-Andrew
Re: Policy for disabling tests which run on TBPL
On Tue, Apr 8, 2014 at 2:41 AM, Ehsan Akhgari wrote:
> What you're saying above is true *if* someone investigates the
> intermittent test failure and determines that the bug is not important.
> But in my experience, that's not what happens at all. I think many
> people treat intermittent test failures as a category of unimportant
> problems, and therefore some bugs are never investigated. The fact of
> the matter is that most of these bugs are bugs in our tests, which of
> course will not impact our users directly, but I have occasionally come
> across bugs in our code which are exposed as intermittent failures. The
> real issue is that the work of identifying the root of the problem is
> often the majority of the work needed to fix the intermittent test
> failure, so unless someone is willing to investigate the bug we cannot
> say whether or not it impacts our users.

The same is true for many bugs. The reported symptom might indicate a much more extensive underlying problem. The fact is, though, thoroughly investigating every bug would take a ton of resources, and is almost certainly not the best use of our manpower. There are many bugs that are *known* to affect many users that don't get fixed in a timely fashion. Things that probably won't affect a single user ever at all, and which are likely to be a pain to track down (because they're intermittent), should be prioritized relatively low.

> The thing that really makes me care about these intermittent failures a
> lot is that ultimately they make us have to trade disabling a whole
> bunch of tests against being unable to manage our tree. As more and
> more tests get disabled, we lose more and more test coverage, and that
> can have a much more severe impact on the health of our products than
> every individual intermittent test failure.

I think you hit the nail on the head, but I think there's a third solution: automatically ignore known intermittent failures, in as fine-grained a way as possible. This means the test is still almost as useful -- I think the vast majority of our tests will fail consistently if the thing they're testing breaks, not fail intermittently. But it doesn't get in the way of managing the tree. Yes, it reduces some tests' value slightly relative to fixing them, but it's not a good use of our resources to try tracking down most intermittent failures. The status quo reduces those tests' value just as much as automatic ignoring (because people will star known failure patterns consistently), but imposes a large manual labor cost.
Re: B2G emulator issues
Hi,

Thanks for bringing up this issue.

> One option (very, very painful, and even slower) would be a proper
> device simulator which simulates both the CPU and the system hardware
> (of *some* B2G phone). This would produce the most realistic result
> with an emulator.

That is what the emulator is already doing. If we start emulating HW down to individual CPU cycles, it'll only get slower. :(

> Another option (likely not simple) would be to find a way to "slow down
> time" for the emulator, such as intercepting system calls and increasing
> any time constants (multiplying timer values, timeout values to socket
> calls, etc, etc). This may not be simple. For devices (audio, etc),
> frequencies may need modifying or other adjustments made.

If we do that, writing and debugging tests will take even longer.

> We could require that the emulator needs X Bogomips to run, or to run a
> specific test suite.
>
> We could segment out tests that require higher performance and run them
> on faster VMs/etc.

Do we already know which tests are slow and why? Maybe there are ways to optimize the emulator. For example, if we execute lots of driver code within the guest, maybe we can move some of that into the emulator's binary, where it runs on the native machine.

> We could turn off certain tests on tbpl and run them on separate
> dedicated test machines (a bit similar to PGO). There are downsides to
> this of course.
>
> Lastly, we could put in a bank of HW running B2G to run the tests like
> the Android test boards/phones.

There are tests that instruct the emulator to trigger certain HW events. We can't run them on actual phones.

To me, the idea of switching to an x86-based emulator seems to be the most promising solution. What would be necessary?

Best regards
Thomas
Re: Linux testing on single-core VMs nowadays
On (2014-04-08 15:20), Gabriele Svelto wrote:
> On 07/04/2014 23:13, Dave Hylands wrote:
>> Personally, I think that the more ways we can test for threading issues
>> the better. It seems to me that we should do some amount of testing on
>> single core and multi-core.
>>
>> Then I suppose the question becomes how many cores? 2? 4? 8?
>>
>> Maybe we can cycle through some different number of cores so that we
>> get coverage without duplicating everything?
>
> One configuration that is particularly good at catching threading errors
> (especially narrow races) is constraining the software to run on two
> hardware threads on the same SMT-enabled core. This effectively forces
> the threads to share the L1 D$, which in turn can reveal some otherwise
> very-hard-to-find data synchronization issues.
>
> I don't know if we have that level of control on our testing hardware,
> but if we do then that's a scenario we might want to include.
>
> Gabriele

I run Thunderbird under valgrind from time to time.

Valgrind slows down CPU execution by a very large factor, and it seems to open many windows for thread races. (Sometimes a very short window is prolonged enough that events caused by, say, I/O can fall inside this usually-short window.)

During valgrind execution, I have seen errors that were not reported anywhere, and many have happened only once :-(

If a VM (such as VirtualBox, VMware Player, or something similar) can artificially change the execution speed of the CPU, or even of different cores slightly (maybe 1/2, 1/3, 1/4), I am sure many thread-race issues will be caught.

I agree that this is a brute-force approach, but please recall that the first space shuttle launch needed to be aborted due to a software glitch. It was a timing issue, and according to the analysis of the time, it could happen once in 72 (or was it 74?) cases. Even NASA, with a large pocket of money, and its subcontractor could not catch it before launch.

I am afraid that the situation has not changed much (unless we use a computer language well suited to avoiding these thread-race issues). We need all the help we can get to track down visible and dormant thread races. If artificial CPU-execution tweaking (by changing the number of cores, or with more advanced tweaking methods if available) can help, it is worth a try. Maybe not always, if such work costs extra money, but a prolonged (say, a week) test run from time to time (each quarter or half a year, or maybe just prior to testing the beta of a major release?).

TIA
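For anyone wanting to reproduce this kind of run, a typical invocation looks something like the sketch below; the flags are illustrative, and --smc-check is there because Gecko's JITs generate code at runtime:

    # Run a local build under valgrind's memcheck; the large slowdown is
    # what widens the race windows described above.
    valgrind --smc-check=all-non-file --trace-children=yes \
        $OBJDIR/dist/bin/thunderbird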