Re: rr chaos mode update

2016-02-14 Thread Robert O'Callahan
On Mon, Feb 15, 2016 at 6:26 PM, Kyle Huey  wrote:

>
> FWIW, every failure that I've debugged to completion so far has been a bug
> in the test (although I have two fatal assertion bugs I'm working through
> that will obviously be flaws in Gecko).  I think one of the things we
> really want to get a feeling for is how often we find actual bugs in the
> product.
>

Yes. So far I've found three Gecko bugs, but we'll find many bugs in tests.

Rob


Re: rr chaos mode update

2016-02-14 Thread Benoit Girard
I've got rr working under DigitalOcean and it works great there.

We've built a harness for generating replays. Once a replay is generated, I
match it with the bug and comment in the bug asking for developers to
investigate. When they respond, they can investigate by SSHing in. Example:
https://bugzilla.mozilla.org/show_bug.cgi?id=1223249#c12

If we can, we should prefer to have an SSH endpoint running rather than
ship a large VM image. It's also my understanding that while rr works
inside a VM, the trace will not replay if the VM has moved to a different
host.

However, right now we've decided that this is overkill for the time
being. Producing interesting rr replays is trivial at the moment;
finding enough engineers to analyze them is not.

On Mon, Feb 15, 2016 at 1:21 AM, Mike Hommey  wrote:

> On Sun, Feb 14, 2016 at 09:25:58PM -0800, Bobby Holley wrote:
> > This is so. Damn. Exciting. Thank you roc for having the vision and
> > persistence to bring this dream to reality.
> >
> > How far are we from being able to use cloud (rather than local) machine
> > time to produce a trace of an intermittently-failing bug? Some one-click
> > procedure to produce a trace from a failure on treeherder seems like it
> > would lower the activation energy significantly.
>
> One limiting factor is the CPU features required, which are not
> virtualized on AWS (they are on DigitalOcean, and that's about the only
> cloud provider where they are, TTBOMK).
>
> Relatedly, roc, is it possible to replay, on a different host, with
> possibly a different CPU, a recording that was taken in the
> cloud? Does using a VM make it possible? If so, having "the cloud" (or
> a set of developers) try to reproduce intermittents, and then having
> developers download the recordings and the corresponding VM, would be
> very useful. If not, we'd need a system like the one we have for
> build/test slave loaners.
>
> Mike


Re: rr chaos mode update

2016-02-14 Thread Mike Hommey
On Sun, Feb 14, 2016 at 09:25:58PM -0800, Bobby Holley wrote:
> This is so. Damn. Exciting. Thank you roc for having the vision and
> persistence to bring this dream to reality.
> 
> How far are we from being able to use cloud (rather than local) machine
> time to produce a trace of an intermittently-failing bug? Some one-click
> procedure to produce a trace from a failure on treeherder seems like it
> would lower the activation energy significantly.

One limiting factor is the CPU features required, which are not
virtualized on AWS (they are on DigitalOcean, and that's about the only
cloud provider where they are, TTBOMK).

Relatedly, roc, is it possible to replay, on a different host, with
possibly a different CPU, a recording that was taken in the
cloud? Does using a VM make it possible? If so, having "the cloud" (or
a set of developers) try to reproduce intermittents, and then having
developers download the recordings and the corresponding VM, would be
very useful. If not, we'd need a system like the one we have for
build/test slave loaners.

Mike


Re: rr chaos mode update

2016-02-14 Thread Kyle Huey
On Sun, Feb 14, 2016 at 9:37 PM, L. David Baron  wrote:

> On Sunday 2016-02-14 21:26 -0800, Kyle Huey wrote:
> > On Sun, Feb 14, 2016 at 9:16 PM, Robert O'Callahan 
> > wrote:
> > > Over the last few days we have had a lot of positive experiences
> > > reproducing bugs with rr chaos mode. Kyle tells me that, in fact, he's
> been
> > > able to reproduce every single bug he tried with enough machine time
> thrown
> > > at it.
> >
> > Of five or so, but yes.
>
> How many of those were intermittents that were never actually
> reported on Linux on our test infrastructure (i.e., reported only on
> other platforms), but that you were able to reproduce in rr's chaos
> mode on Linux?
>

At least one, bug 1150737, had only appeared in any great quantity on OS X
10.6, and may never have appeared in non-Mac test runs in automation.  Chaos
mode reproduced it in a minute or two on Linux.

- Kyle


Re: rr chaos mode update

2016-02-14 Thread L. David Baron
On Sunday 2016-02-14 21:26 -0800, Kyle Huey wrote:
> On Sun, Feb 14, 2016 at 9:16 PM, Robert O'Callahan  
> wrote:
> > Over the last few days we have had a lot of positive experiences
> > reproducing bugs with rr chaos mode. Kyle tells me that, in fact, he's been
> > able to reproduce every single bug he tried with enough machine time thrown
> > at it.
> 
> Of five or so, but yes.

How many of those were intermittents that were never actually
reported on Linux on our test infrastructure (i.e., reported only on
other platforms), but that you were able to reproduce in rr's chaos
mode on Linux?

-David

-- 
𝄞   L. David Baron                         http://dbaron.org/   𝄂
𝄢   Mozilla                          https://www.mozilla.org/   𝄂
 Before I built a wall I'd ask to know
 What I was walling in or walling out,
 And to whom I was like to give offense.
   - Robert Frost, Mending Wall (1914)




Re: rr chaos mode update

2016-02-14 Thread Bobby Holley
This is so. Damn. Exciting. Thank you roc for having the vision and
persistence to bring this dream to reality.

How far are we from being able to use cloud (rather than local) machine
time to produce a trace of an intermittently-failing bug? Some one-click
procedure to produce a trace from a failure on treeherder seems like it
would lower the activation energy significantly.

On Sun, Feb 14, 2016 at 9:16 PM, Robert O'Callahan 
wrote:

> Over the last few days we have had a lot of positive experiences
> reproducing bugs with rr chaos mode. Kyle tells me that, in fact, he's been
> able to reproduce every single bug he tried with enough machine time thrown
> at it.
>
> At this point the limiting factor is getting developers to actually debug
> and fix recorded test failures. Anyone should be able to set up a VM on
> their local machine, build Firefox, record some failures and fix them. For
> best results, run just one test that's known intermittent, or possibly the
> whole directory of tests if there might be inter-test dependencies. Use
> --shuffle and --run-until-failure. The most convenient way to run rr with
> chaos mode is probably to create a script rr-chaos that prepends the
> --chaos option, and use --debugger rr-chaos.
>
> Lots of tests have been disabled for intermittency over the years. Now that
> we have the ability to fix (at least some of) them without much pain, it
> may be worth revisiting them, though I don't know how to prioritize that.
>
> We might want to revisit our workflow. If we had the ability to mark tests
> as disabled-for-intermittency explicitly, maybe we could automatically
> disable intermittent tests as they show up and dedicate a pool of machines
> to reproducing them with rr.
>
> Rob


Re: rr chaos mode update

2016-02-14 Thread Kyle Huey
On Sun, Feb 14, 2016 at 9:16 PM, Robert O'Callahan 
wrote:

> Over the last few days we have had a lot of positive experiences
> reproducing bugs with rr chaos mode. Kyle tells me that, in fact, he's been
> able to reproduce every single bug he tried with enough machine time thrown
> at it.
>

Of five or so, but yes.

> At this point the limiting factor is getting developers to actually debug
> and fix recorded test failures. Anyone should be able to set up a VM on
> their local machine, build Firefox, record some failures and fix them. For
> best results, run just one test that's known intermittent, or possibly the
> whole directory of tests if there might be inter-test dependencies. Use
> --shuffle and --run-until-failure. The most convenient way to run rr with
> chaos mode is probably to create a script rr-chaos that prepends the
> --chaos option, and use --debugger rr-chaos.
>

You can generally pass --debugger=/path/to/rr --debugger-args="record -h"
to mach to get things working.
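For example, with a hypothetical mochitest invocation (the test path is
made up; substitute your intermittent test):

  ./mach mochitest --debugger=/path/to/rr --debugger-args="record -h" \
      path/to/test_intermittent.html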

> Lots of tests have been disabled for intermittency over the years. Now that
> we have the ability to fix (at least some of) them without much pain, it
> may be worth revisiting them, though I don't know how to prioritize that.
>

FWIW, every failure that I've debugged to completion so far has been a bug
in the test (although I have two fatal assertion bugs I'm working through
that will obviously be flaws in Gecko).  I think one of the things we
really want to get a feeling for is how often we find actual bugs in the
product.

- Kyle


rr chaos mode update

2016-02-14 Thread Robert O'Callahan
Over the last few days we have had a lot of positive experiences
reproducing bugs with rr chaos mode. Kyle tells me that, in fact, he's been
able to reproduce every single bug he tried with enough machine time thrown
at it.

At this point the limiting factor is getting developers to actually debug
and fix recorded test failures. Anyone should be able to set up a VM on
their local machine, build Firefox, record some failures and fix them. For
best results, run just one test that's known intermittent, or possibly the
whole directory of tests if there might be inter-test dependencies. Use
--shuffle and --run-until-failure. The most convenient way to run rr with
chaos mode is probably to create a script rr-chaos that prepends the
--chaos option, and use --debugger rr-chaos.
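
A minimal sketch of such a wrapper (my reading of the above, untested; it
assumes mach invokes the named debugger with the program and its arguments
appended):

  #!/bin/sh
  # rr-chaos: record the given command under rr with chaos mode enabled.
  exec rr record --chaos "$@"

Make it executable, put it somewhere on your PATH, and pass
--debugger rr-chaos to the mach test command.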

Lots of tests have been disabled for intermittency over the years. Now that
we have the ability to fix (at least some of) them without much pain, it may
be worth revisiting them, though I don't know how to prioritize that.

We might want to revisit our workflow. If we had the ability to mark tests
as disabled-for-intermittency explicitly, maybe we could automatically
disable intermittent tests as they show up and dedicate a pool of machines
to reproducing them with rr.

Rob


Re: Presto: Comparing Firefox performance with other browsers (and e10s with non-e10s)

2016-02-14 Thread Martin Thomson
On Mon, Feb 15, 2016 at 1:12 PM, Martin Thomson  wrote:
> On Mon, Feb 15, 2016 at 12:30 PM, Valentin Gosu  wrote:
> wrote:
>>> Thumbnails, or columns on the right for each selected browser with
>>> median (or mean), with the best (for that site) in green, the worst in
>>> red would allow eyeballing the results and finding interesting
>>> differences without clicking on 100 links...  (please!)  Or to avoid
>>> overloading the page, one page with graphs like today, another with the
>>> columns I indicated (where clicking on the row takes you to the graph
>>> page for that site).
>>>
>>
>> What I noticed is that pages with lots of elements, and elements that come
>> from different sources, seem to have higher variability: pages such as
>> Flickr, with lots of images of various sizes, or pages that load various
>> ads.
>
> You currently graph every test result, sorted.  This can be reduced to
> a single measurement.  Here I think that you can take the 5th, 50th
> and 95th percentiles (the mean isn't particularly interesting, and you
> want to avoid extreme outliers).  The x-axis can then be used for
> something else.  The obvious choice is to turn this into a bar
> graph with browsers on that x-axis.  You could probably remove the
> browser selector then.

Oh, to be a little less obtuse, I think that means that you get a
column graph with error bars on each column.  Your x-axis is by
browser (and version) with two columns for each (first view and
refresh).


Re: Presto: Comparing Firefox performance with other browsers (and e10s with non-e10s)

2016-02-14 Thread Martin Thomson
On Mon, Feb 15, 2016 at 12:30 PM, Valentin Gosu  wrote:
>> Thumbnails, or columns on the right for each selected browser with
>> median (or mean), with the best (for that site) in green, the worst in
>> red would allow eyeballing the results and finding interesting
>> differences without clicking on 100 links...  (please!)  Or to avoid
>> overloading the page, one page with graphs like today, another with the
>> columns I indicated (where clicking on the row takes you to the graph
>> page for that site).
>>
>
> What I noticed is that pages with lots of elements, and elements that come
> from different sources, seem to have higher variability: pages such as
> Flickr, with lots of images of various sizes, or pages that load various
> ads.

You currently graph every test result, sorted.  This can be reduced to
a single measurement.  Here I think that you can take the 5th, 50th
and 95th percentiles (the mean isn't particularly interesting, and you
want to avoid extreme outliers).  The x-axis can then be used for
something else.  The obvious choice is to turn this into a bar
graph with browsers on that x-axis.  You could probably remove the
browser selector then.
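
A sketch of that summarization in JS (untested; `results` is a
hypothetical array of load times for one browser/view combination):

  // Nearest-rank percentile over an ascending-sorted array.
  function percentile(sorted, p) {
    var idx = Math.min(sorted.length - 1,
                       Math.floor((p / 100) * sorted.length));
    return sorted[idx];
  }

  var sorted = results.slice().sort(function (a, b) { return a - b; });
  var p5  = percentile(sorted, 5);   // bottom of the error bar
  var p50 = percentile(sorted, 50);  // column height (median)
  var p95 = percentile(sorted, 95);  // top of the error bar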


Re: Presto: Comparing Firefox performance with other browsers (and e10s with non-e10s)

2016-02-14 Thread Valentin Gosu
On 12 February 2016 at 17:11, Randell Jesup  wrote:

> >- You can click on each individual point to go to the WPT run and view
> >the results in greater detail
>
> What does "run index" mean in the graphs?  The values appear to be
> sorted from best to worst; so it's comparing best to best, next-best to
> next-best, etc?  Ah, and I see "sorted" undoes the sorting.
>

In the sorted version, it is the index of that run in the sorted array,
and mostly meaningless. In the unsorted version, it is the index of the
run in chronological order, though the firstRun and repeatRun will share
the same index.


> I'd think displaying mean/median and std-deviation (or a bell-curve-ish
> plot) might be easier to understand.  But I'm no statistician. :-)  It's
> also likely easier to read when the numbers of samples don't match (or
> you'd need to stretch them all to the same "width"); using a bell-curve
> plot of median/mean/std-dev avoids that problem.
>

I also think it would be quite useful, but my knowledge of statistics is
pretty basic.
I tried to get the number of samples to match, but for a few domains that
didn't work. I assume there's a bug in my scripts.


>
> Thumbnails, or columns on the right for each selected browser with
> median (or mean), with the best (for that site) in green, the worst in
> red would allow eyeballing the results and finding interesting
> differences without clicking on 100 links...  (please!)  Or to avoid
> overloading the page, one page with graphs like today, another with the
> columns I indicated (where clicking on the row takes you to the graph
> page for that site).
>

What I noticed is that pages with lots of elements, and elements that come
from different sources, seem to have higher variability: pages such as
Flickr, with lots of images of various sizes, or pages that load various
ads.


>
> >--
> >Error sources:
> >
> >Websites may return different content depending on the UA string. While
> >this optimization makes sense for a lot of websites, in this situation it
> >is difficult to determine if the browser's performance or the website's
> >optimizations have more impact on the page load.
>
> Might be interesting to force our UA for a series of tests to match
> theirs, or vice-versa, just to check which sites appear to care (and
> then mark them).
>

I've tried it for a few domains and it didn't make much of a difference.
I'll try it for all the domains to see if there is a pattern we could make
out.


>
> Great!  It'll be interesting to track how these change over time as well
> (or as versions get added to the list).  Again, medians/means/etc. may
> help with evaluating and tracking this (or automating notices, a la Talos).
>

I thought about doing this, but Talos always uses static content on a
local connection, whereas this goes over a real network whose performance
may vary, and loads real websites which may change content or optimize
for different situations. I expect it's useful for confirming certain
properties, such as whether page loads are faster on Fx, Chrome or Nightly,
and by how much, but it probably can't produce results that stay meaningful
over a longer period of time.


Re: Returning dictionaries in WebIDL

2016-02-14 Thread Cameron McCormack
Martin Thomson:
> I know that this is not good practice, but is there something written
> down somewhere explaining why?

I don’t know if it’s written down somewhere.  Using dictionary types for
IDL attributes is forbidden by the spec, because it would mean that a
new copy of the object would need to be returned each time the property
were accessed.  This is the case for sequence types too, where you
can much more obviously encourage wasteful object creation:

  interface A {
    attribute sequence<double> values;
  };

  for (var i = 0; i < myA.values.length; i++) {
…
  }

That would create a new JS Array each time around the loop.  With
dictionaries you might access a couple of properties from the return
value of the getter property and have similar issues:

  dictionary D {
    double x;
    double y;
  };

  interface A {
attribute D d;
  };

  Math.sqrt(myA.d.x * myA.d.x + myA.d.y * myA.d.y);

This would create four copies of the JS object for the dictionary.

Another point is that these sequence and dictionary objects can’t be
monitored for changes, so for example you couldn’t write a spec that
required the browser to do something when you assign to myA.d.x since
that’s just a plain data property on the object returned from d.  So for
APIs where you do want to notice property value changes like this,
you’ll need to use interfaces, and for array-ish things we’ve now got
FrozenArray<T>, which is an array reference type (as opposed to
sequence’s (and dictionaries’) pass-by-value behaviour).
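
As an illustrative sketch (my example, not from the spec), a frozen array
attribute would look like:

  interface A {
    readonly attribute FrozenArray<double> values;
  };

Here myA.values === myA.values is true: the same frozen object is returned
on each get until the browser replaces it wholesale, so the loop above no
longer allocates a new array per iteration.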

We don’t currently have a type that means “reference to an object that
has a particular shape defined by a dictionary”.  So for now if you
really want an API that allows

  myA.d = { x: 1, y: 2 };

where the A object either immediately, or later, inspects the values on
the object, then you have to use “object” as the type and invoke the
type conversions or do the JS property getting in the spec yourself.

-- 
Cameron McCormack ≝ http://mcc.id.au/


Re: Returning dictionaries in WebIDL

2016-02-14 Thread Martin Thomson
Yep, thanks for that.  I somehow missed the bit where it says, very
clearly, that attributes MUST NOT be dictionaries.

I was looking for something to give someone else who doesn't like
explanations and prefers reading from the scripture.  Your explanation
matches my understanding.

BTW, there are some people who consider the fact that a dictionary
passes outside the control of the browser to be a feature.  That is
partly why WebRTC is the way it is.

On Mon, Feb 15, 2016 at 10:11 AM, Cameron McCormack  wrote:
> Martin Thomson:
>> I know that this is not good practice, but is there something written
>> down somewhere explaining why?
>
> I don’t know if it’s written down somewhere.  Using dictionary types for
> IDL attributes is forbidden by the spec, because it would mean that a
> new copy of the object would need to be returned each time the property
> were accessed.  This is the case for sequence types too, where you
> can much more obviously encourage wasteful object creation:
>
>   interface A {
>     attribute sequence<double> values;
>   };
>
>   for (var i = 0; i < myA.values.length; i++) {
> …
>   }
>
> That would create a new JS Array each time around the loop.  With
> dictionaries you might access a couple of properties from the return
> value of the getter property and have similar issues:
>
>   dictionary D {
>     double x;
>     double y;
>   };
>
>   interface A {
> attribute D d;
>   };
>
>   Math.sqrt(myA.d.x * myA.d.x + myA.d.y * myA.d.y);
>
> This would create four copies of the JS object for the dictionary.
>
> Another point is that these sequence and dictionary objects can’t be
> monitored for changes, so for example you couldn’t write a spec that
> required the browser to do something when you assign to myA.d.x since
> that’s just a plain data property on the object returned from d.  So for
> APIs where you do want to notice property value changes like this,
> you’ll need to use interfaces, and for array-ish things we’ve now got
> FrozenArray<T>, which is an array reference type (as opposed to
> sequence’s (and dictionaries’) pass-by-value behaviour).
>
> We don’t currently have a type that means “reference to an object that
> has a particular shape defined by a dictionary”.  So for now if you
> really want an API that allows
>
>   myA.d = { x: 1, y: 2 };
>
> where the A object either immediately, or later, inspects the values on
> the object, then you have to use “object” as the type and invoke the
> type conversions or do the JS property getting in the spec yourself.
>
> --
> Cameron McCormack ≝ http://mcc.id.au/


Re: Returning dictionaries in WebIDL

2016-02-14 Thread Boris Zbarsky

On 2/14/16 5:50 PM, Martin Thomson wrote:

> I know that this is not good practice


It really depends.  There are some situations in which returning a 
dictionary is probably fine; ideally when what's meant is "a copy of the 
initialization data for this object".  This is the situation with 
WebGLRenderingContext.getContextAttributes().
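
A quick JS sketch of those copy semantics (untested; assumes WebGL is
available in the page):

  var gl = document.createElement("canvas").getContext("webgl");
  var a = gl.getContextAttributes();
  var b = gl.getContextAttributes();
  console.log(a === b);   // false: each call returns a fresh plain object
  a.alpha = !a.alpha;     // mutating the copy doesn't affect the context

Each call hands back a new snapshot of the context's creation parameters,
which is exactly the "copy of the initialization data" case.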


One thing worth keeping in mind in general, though, is the argument 
Allen makes in 
https://lists.w3.org/Archives/Public/public-script-coord/2014JanMar/0201.html


-Boris


Returning dictionaries in WebIDL

2016-02-14 Thread Martin Thomson
I know that this is not good practice, but is there something written
down somewhere explaining why?