Re: wptview [Re: e10s]

2016-01-09 Thread James Graham

On 09/01/16 15:43, Benjamin Smedberg wrote:

On 1/8/2016 6:02 PM, James Graham wrote:

On 08/01/16 22:41, Robert O'Callahan wrote:

On Sat, Jan 9, 2016 at 10:27 AM, Benjamin Smedberg wrote:


What are the implications of this?

The web-platform tests are pass/fail, right? So is it a bug if they pass
but have different behaviors in e10s and non-e10s mode?



Yeah, I'm confused.

If a wpt test passes but with different output, then either there is no
problem or the test is incomplete and should be changed.


Maybe I should clarify.

web-platform-tests are slightly different to most tests in that we run
both tests we currently pass and tests that we currently don't pass.
On treeherder all we check is that we got the same result in this run
as we expected on the basis of previous runs.


Is this "same as previous run" behavior automatic, or manually
annotated? Running tests which don't pass is supported and common on
many other test suites: fail-if and random-if are used to mark tests as
a known fail but still run them.
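(For reference, annotations of that kind look roughly like the following;
the test names and conditions here are only illustrative:)

# manifestparser-style entry, e.g. in a mochitest.ini
[test_example.html]
fail-if = e10s

# reftest-manifest-style entry, e.g. in a reftest.list
random-if(Android) == test_example.html test_example-ref.html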


It is semi-automated. There are explicit annotations in separate 
metadata files for each test. These have to be updated by hand (or using 
output from running the affected tests) when a change introduces 
different test results (e.g. by fixing tests or adding new non-passing 
ones), but they are regenerated automatically, using a try run on an 
otherwise known-good build, when we update to a new version of the 
testsuite from upstream.



Is this a temporary state, where the end goal is to have the
web-platform tests use similar manifests to all of our other tests? Can
you provide some context about why web-platform tests are using
different infrastructure from everywhere else?


I will first note that "similar manifests to our other tests" isn't very 
specific; we already use multiple manifest formats. I will assume you 
mean manifestparser manifests as used for mochitest, but note that 
web-platform-tests contain a mix of js-based tests and reftests, so a 
single existing format would be insufficient. Below I will mainly 
concentrate on the tests that could be well described by 
manifestparser-style manifests, although much applies to both cases.


Unfortunately the web-platform-tests have some constraints, rather 
different from those of other testsuites, that make the manifestparser 
format insufficient.


web-platform-tests js tests can contain multiple tests per file 
(sometimes many thousands), so purely per-file metadata is inadequate. 
As I understand it, for other test types we supplement this with in-file 
annotations. In order for us to bidirectionally sync web-platform-tests 
it is essential that we never make local modifications to the test files 
other than intentional bugfixes or additions that are suitable to be 
upstreamed. This means that we must be able to set the expected result 
for each subtest (i.e. individual testcase within a file) in a separate 
local-only file. This is not supported in manifestparser files, nor did 
it seem easy to add.


The restriction on not modifying tests also means that things like prefs 
cannot be set in the tests themselves; it is convenient to use the same 
expectation data files to store this additional information. Rather more 
trivially, web-platform-tests may be expected to CRASH or ERROR in 
production, which other test types cannot. Support for this would 
obviously be easier to add to manifestparser, but its absence avoids 
confusion for the multiple test types where such statuses wouldn't make 
sense.
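(As a rough illustration, a local expectation file for a hypothetical test 
might look something like the following, assuming the wptrunner metadata 
syntax; the test, subtest and pref names here are made up:)

[example-test.html]
  prefs: [dom.example.enabled:true]
  [first subtest name]
    expected: FAIL
  [second subtest name]
    expected: TIMEOUT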


At this point I don't see any real advantages to trying to move to 
manifestparser for all web-platform-tests and many drawbacks, so I don't 
think it will happen. I am also not convinced that it's very relevant to 
the problem at hand; I don't see how the different manifest format is 
causing any issues. Indeed, now that most of our testsuites produce 
structured log output, you don't actually need to look at the input 
manifests at all.


The right thing to do is look at the log files produced from a test run. 
This is what wptview provides a GUI for, and what the test-informant 
tool ahal mentioned elsewhere does on the backend, but anyone with a 
little bit of time and a basic knowledge of the mozlog format (and 
treeherder API, perhaps) could have a go at making a one-off tool to 
answer this specific question efficiently. To do this one would consume 
all the structured logs for the e10s and non-e10s jobs on a push, and 
look for cases where the result is different for the same test in the 
two run types (this would also cover disabled tests that are recorded as 
SKIP).
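(For concreteness, a minimal sketch of such a one-off tool, assuming the 
logs are the newline-delimited-JSON "raw" mozlog files and using made-up 
local file names, might look like this:)

import json

def read_results(path):
    """Map (test, subtest) -> status from a mozlog raw (JSON-lines) log."""
    results = {}
    with open(path) as f:
        for line in f:
            if not line.strip():
                continue
            entry = json.loads(line)
            if entry.get("action") == "test_status":
                # Result of an individual subtest within a test file.
                results[(entry["test"], entry["subtest"])] = entry["status"]
            elif entry.get("action") == "test_end":
                # Harness-level result for the whole file (OK/ERROR/CRASH/SKIP...).
                results[(entry["test"], None)] = entry["status"]
    return results

non_e10s = read_results("wpt_raw_w2.log")        # made-up local filenames
e10s = read_results("wpt_raw_w_e10s_2.log")

for key in sorted(set(non_e10s) | set(e10s),
                  key=lambda k: (k[0], k[1] or "")):
    old, new = non_e10s.get(key), e10s.get(key)
    if old != new:
        test, subtest = key
        print("%s | %s | non-e10s: %s | e10s: %s"
              % (test, subtest or "-", old, new))

Anything that appears in only one of the two runs (e.g. a test disabled in 
one configuration) shows up as a difference against None.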



The effect of all of this is that in order to understand what's
actually needed to bring e10s builds up to par with non-e10s builds
you need to look at the actual test results rather than just the list
of disabled tests. I believe that there are both instances of tests
that pass in non-e10s but not in e10s builds, and the reverse. wptview
gives you the ability to do that using data directly from treeherder. 
The actual action to take on the basis of this data is obviously 
something for the people working on e10s to determine.

Re: wptview [Re: e10s]

2016-01-09 Thread Benjamin Smedberg

On 1/8/2016 6:02 PM, James Graham wrote:

On 08/01/16 22:41, Robert O'Callahan wrote:
On Sat, Jan 9, 2016 at 10:27 AM, Benjamin Smedberg wrote:


What are the implications of this?

The web-platform tests are pass/fail, right? So is it a bug if they pass
but have different behaviors in e10s and non-e10s mode?



Yeah, I'm confused.

If a wpt test passes but with different output, then either there is no
problem or the test is incomplete and should be changed.


Maybe I should clarify.

web-platform-tests are slightly different to most tests in that we run 
both tests we currently pass and tests that we currently don't pass. 
On treeherder all we check is that we got the same result in this run 
as we expected on the basis of previous runs.


Is this "same as previous run" behavior automatic, or manually 
annotated? Running tests which don't pass is supported and common on 
many other test suites: fail-if and random-if are used to mark tests as 
a known fail but still run them.


Is this a temporary state, where the end goal is to have the 
web-platform tests use similar manifests to all of our other tests? Can 
you provide some context about why web-platform tests are using 
different infrastructure from everywhere else?


The effect of all of this is that in order to understand what's 
actually needed to bring e10s builds up to par with non-e10s builds 
you need to look at the actual test results rather than just the list 
of disabled tests. I believe that there are both instances of tests 
that pass in non-e10s but not in e10s builds, and the reverse. wptview 
gives you the ability to do that using data directly from treeherder. 
The actual action to take on the basis of this data is obviously 
something for the people working on e10s to determine.


This is not the responsibility of the e10s team; this is an all-hands 
effort as we switch to making e10s the default configuration and soon 
the only configuration for Firefox. If having different results for e10s 
and non-e10s is not expected, who is the module owner or person 
responsible for the web platform tests who can create a list of the 
problem results and find owners to get each one fixed?


--BDS



wptview [Re: e10s]

2016-01-08 Thread Kalpesh Krishna
For web-platform-tests there is an additional issue; tests may be enabled
but give a different result in e10s builds compared to non-e10s builds.
Therefore compiling a list of disabled web-platform-tests in e10s is
insufficient to spot all the differences in this case.

Instead, I recommend using the wptview tool which allows comparing the
results of different web-platform-tests runs (or, indeed, any testsuite
that uses structured logging). For people who are not interested in e10s
specifically, this tool may still be useful when trying to examine the
results of web-platform-tests runs.
The tool aims to provide filtering of structured log results based on
status, path and test type. It takes one or more testsuite_raw.log files (e.g.
wpt_raw.log, mochitest_raw.log, obtained from Treeherder) and allows us to
compare results (e.g. all tests where the result in the e10s build differs
from the result in the non-e10s build). It reads all tests, including
disabled tests.
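(For the curious: each line of such a *_raw.log file is a JSON object.
Trimmed to the fields relevant here, and with a made-up test path, two
entries might look roughly like:)

{"action": "test_status", "test": "/example/a.html", "subtest": "first check", "status": "FAIL", "expected": "PASS"}
{"action": "test_end", "test": "/example/a.html", "status": "OK"}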

For example, to compare the W-2 and W-e10s-2 results from a recent run:

* Load http://jgraham.github.io/wptview/

* From treeherder download the wpt_raw.log artifacts for a W-2 and W-e10s-2
run on a build you are interested in. For the rest of this example I used
[1] and [2] (loading from the log URL is on the roadmap).

* In the wptview UI click on Import, select the log for the W-2 run and give
the run a name like "W-2". Do the same for the W-e10s-2 log file. Note that
importing files is still rather slow.
http://ibin.co/2Sjznpoms8BW

* Once the two logs have been imported, we can get a list of all the tests
disabled in the e10s build, along with their results in the non-e10s build,
by using a "By Result" filter saying "w-e10s-2 is SKIP". A number of test
rows are rendered, each showing the test file, the subtest (if any) and the
expected / actual status for each of the two runs.
http://ibin.co/2Sk1vaguPyQ8

* If we wish to see all tests having a different status in w-2 and
w-e10s-2, we add a filter like "w-2 is not results from w-e10s-2". Once
again, a number of rows containing the filtered output are rendered.
http://ibin.co/2Sk2R7ngwVWF

wptview is flexible and allows us to combine a number of different
filters. As a demonstration, to obtain tests under /XMLHttpRequest/ where
e10s and non-e10s have the same result, and that result is ERROR or
TIMEOUT, we need three filters: a path filter saying "Path starts with
XMLHttpRequest", and two status filters saying "w-2 is results from
w-e10s-2" and "w-2 is ERROR or TIMEOUT" (the + symbol adds the OR operator).
http://ibin.co/2SlVPLNSvfnW

wptview also addresses several other use cases that have hitherto required
examining the logs by hand, such as looking for web-platform-tests that
CRASH, looking for tests for a specific feature that don't pass, and
comparing results from Servo and Gecko. Results are stored locally in
IndexedDB, so one may resume filtering previously imported logs. Test
results can also be cleared using the "CLEAR ALL" button to start afresh.

For feature requests and bug reports, please use the github issue tracker
at https://github.com/jgraham/wptview/. If you have any questions, please
ask me, martianwars, or jgraham, on #ateam.

I hope you found this interesting! :)

Cheers!

[1] - W-2 wpt_raw.log


[2] - W-e10s-2 wpt_raw.log


-- 
Kalpesh Krishna,
Sophomore Undergraduate,
Electrical Engineering,
IIT Bombay


Re: wptview [Re: e10s]

2016-01-08 Thread Benjamin Smedberg



On 1/8/2016 3:16 PM, Kalpesh Krishna wrote:

For web-platform-tests there is an additional issue; tests may be enabled
but give a different result in e10s builds compared to non-e10s builds.
Therefore compiling a list of disabled web-platform-tests in e10s is
insufficient to spot all the differences in this case.

What are the implications of this?

The web-platform tests are pass/fail, right? So is it a bug if they pass 
but have different behaviors in e10s and non-e10s mode?


If it is a problem (or a potential problem), who is responsible for 
understanding the problem and fixing it? Should it turn something on 
treeherder red?


--BDS


Re: wptview [Re: e10s]

2016-01-08 Thread James Graham

On 08/01/16 22:41, Robert O'Callahan wrote:

On Sat, Jan 9, 2016 at 10:27 AM, Benjamin Smedberg wrote:


What are the implications of this?

The web-platform tests are pass/fail, right? So is it a bug if they pass
but have different behaviors in e10s and non-e10s mode?



Yeah, I'm confused.

If a wpt test passes but with different output, then either there is no
problem or the test is incomplete and should be changed.


Maybe I should clarify.

web-platform-tests are slightly different to most tests in that we run 
both tests we currently pass and tests that we currently don't pass. On 
treeherder all we check is that we got the same result in this run as we 
expected on the basis of previous runs. That result might be pass but 
might also be FAIL, ERROR, TIMEOUT, or even CRASH. So they are pass/fail 
from the point of view of "did we meet the expectation value", but the 
expectation value itself might not be a PASS (e.g. expected FAIL got 
PASS would turn treeherder orange, as would expected CRASH got ERROR).


For e10s runs we have the ability to set different expectation values 
than for non-e10s runs. This means that we can continue to run tests 
that behave differently in e10s and only disable unstable ones. This has 
the advantage that we will catch some additional types of regression, 
e.g. one that causes a test that PASSes in non-e10s, and previously 
FAILed in e10s, to start CRASHing in e10s whilst still PASSing in 
non-e10s. These would be missed if we just disabled all tests with 
differing behaviour.
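(Concretely, such an e10s-specific expectation is expressed in the local 
metadata as a conditional value; for a hypothetical test, and assuming the 
wptrunner conditional syntax, this might look something like:)

[example-test.html]
  [some subtest name]
    expected:
      if e10s: TIMEOUT
      FAIL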


The effect of all of this is that in order to understand what's actually 
needed to bring e10s builds up to par with non-e10s builds you need to 
look at the actual test results rather than just the list of disabled 
tests. I believe that there are both instances of tests that pass in 
non-e10s but not in e10s builds, and the reverse. wptview gives you the 
ability to do that using data directly from treeherder. The actual 
action to take on the basis of this data is obviously something for the 
people working on e10s to determine.


I hope that clarifies things somewhat?

Whilst I am here, it's always worth calling out contributions; wptview 
is Kalpesh's ateam "Quarter of Contribution" project and he has done 
great work.


P.S. I am currently on leave and will remain so until 18th Jan, so don't 
be surprised if I am unresponsive to follow-ups until then. Ms2ger is a 
good person to ask web-platform-tests questions to in the interim.




Re: wptview [Re: e10s]

2016-01-08 Thread Robert O'Callahan
On Sat, Jan 9, 2016 at 10:27 AM, Benjamin Smedberg wrote:

> What are the implications of this?
>
> The web-platform tests are pass/fail, right? So is it a bug if they pass
> but have different behaviors in e10s and non-e10s mode?
>

Yeah, I'm confused.

If a wpt test passes but with different output, then either there is no
problem or the test is incomplete and should be changed.

Rob