Re: Is there somewhere to get a report of new test failures from a web-platform-tests sync?

2017-02-10 Thread James Graham

On 10/02/17 06:34, Brian Birtles wrote:

I don't expect James to file bugs for all the new failures he
encounters when syncing (and I suspect if he did, many of them would
end up being marked invalid/duplicate because they're features we
don't implement yet), but is there somewhere we can get a report of
all expected test failures that don't have a bug annotation? Maybe
even something we could integrate into triage-center?


I don't know what triage-center is, but increasing the visibility of 
failures in web-platform-tests is certainly something that I think I 
should work on; we are currently letting compat problems slip through 
unnecessarily. If there is a preferred way to expose that information to 
developers I'm very interested to hear about it.


As a short term stopgap, you might be interested to know that all 
upstream PRs are now run through a stability checker that runs the tests 
both in Firefox and Chrome. This includes test submissions that Google 
upstreams. So if you look for PRs that you are interested in (e.g. [1] 
for web-animations) you can get a heads-up about incoming tests with 
Firefox/Chrome differences (work to add Edge and Safari to this 
stability checker is in progress, but there are technical difficulties 
we have to overcome there).


[1] 
https://github.com/w3c/web-platform-tests/pulls?q=is%3Apr+is%3Aopen+label%3Aweb-animations


___
dev-platform mailing list
dev-platform@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-platform


Re: Do we need to run web-platform-tests --manifest-update when touching existing tests?

2017-02-10 Thread James Graham

On 10/02/17 06:27, Brian Birtles wrote:

Hi,

It seems like the MANIFEST.json for web platform tests now includes a
checksum of test file contents. As a result, if you run './mach
web-platform-tests --manifest-update' on a clean checkout of m-c
you're likely to get a bunch of changes to MANIFEST.json showing up.

Should we be requiring people to update the MANIFEST.json whenever
they touch a file in testing/web-platform/tests (i.e. not just when
they add/remove files)?


This is too much to ask of people; it was effectively required when the 
manifest format change first landed and the result was that the 
associated lint got hidden on treeherder for being orange too often. 
Therefore I changed the lint to only complain if the manifest changes 
would result in the wrong tests being run, or tests being run incorrectly.


The current situation isn't ideal; this manifest is effectively a build 
artifact, but the process of generating it from scratch is rather slow and 
would add around a minute to the build time, which isn't acceptable. 
Making this faster is non-trivial because it's mostly parsing HTML 
files; that's super slow in Python (Python bindings for html5ever would 
be of interest here).


The advantage of the checksums in the file is faster incremental updates 
with lower complexity. Given that, one possible short term win would be 
to make --manifest-update the default for |mach wpt| so that people are 
more likely to update it correctly when they are authoring tests.
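In the meantime, after adding, removing, or editing test files you can 
refresh the manifest explicitly with:

  ./mach wpt-manifest-update

or by passing --manifest-update to |mach wpt| when running the affected 
tests, as above.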



(Currently when creating a patch queue that adds/removes files from
web-platform-tests my workflow involves first creating a base patch to
include all the checksum changes to m-c that haven't been added to
MANIFEST.json (so they don't show up in later patches in the series
when I need to run --manifest-update), regenerating that patch
whenever I rebase, and dropping it when I land the patch queue.
Probably I'm doing something wrong, however.)


Rebasing is a bit of a pain. I think it is possible to configure 
mercurial (or git) to just run |mach wpt-manifest-update| whenever 
there's a conflict in this file so that it's regenerated to be correct 
rather than needing manual work (there are edge cases where this likely 
won't work, but I think those are rare enough that it's worth trying).
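For example, with git the manual version of that is roughly the 
following (a sketch only; it assumes the manifest lives at 
testing/web-platform/meta/MANIFEST.json):

  # When a rebase stops on a conflict in MANIFEST.json, take either side
  # to get valid JSON back, then regenerate it from the current tree.
  git checkout --ours testing/web-platform/meta/MANIFEST.json
  ./mach wpt-manifest-update
  git add testing/web-platform/meta/MANIFEST.json
  git rebase --continue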


___
dev-platform mailing list
dev-platform@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-platform


Re: Is there somewhere to get a report of new test failures from a web-platform-tests sync?

2017-02-13 Thread James Graham

On 11/02/17 03:40, Brian Birtles wrote:

Yes, I saw that and was very impressed! That's super useful. For
Chrome, however, it would be even more useful if we could run those
tests with --enable-experimental-web-platform-features. A lot of the
Web Animations features we're testing are implemented in Chrome but
disabled unless that flag is set. I imagine that is true of other
features too.


Rick Byers, who owns the Google side of this, agreed, so I have some 
patches in review to enable the experimental flag.


___
dev-platform mailing list
dev-platform@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-platform


Re: Visual Studio Code recommended extensions

2017-02-23 Thread James Graham

On 23/02/17 16:48, Marco Bonardo wrote:


// Rust language support.
"saviorisdead.RustyCode"


I haven't used either (or VS Code much), but my understanding is that it 
is no longer maintained and that you should prefer 
https://github.com/editor-rs/vscode-rust.


___
dev-platform mailing list
dev-platform@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-platform


Re: Is there a way to improve partial compilation times?

2017-03-08 Thread James Graham

On 08/03/17 11:11, Frederik Braun wrote:

On 08.03.2017 01:17, Ralph Giles wrote:

I second Jeff's point about building with icecream[1]. If you work in
an office with a build farm, or near a fast desktop machine you can
pass jobs to, this makes laptop builds much more tolerable.



What do you mean by build farm?
Do some offices have dedicated build machines?
Or is this "just" a lot of desktop machines wired up to work together?


Several offices are using icecream with whatever desktop machines happen 
to be available to improve build times. The London office was told last 
year that there's a plan to buy dedicated build machines to add to the 
cluster, but I am unaware if there is any progress or whether the idea 
was dropped.


___
dev-platform mailing list
dev-platform@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-platform


Re: Is there a way to improve partial compilation times?

2017-03-08 Thread James Graham

On 08/03/17 14:24, Ehsan Akhgari wrote:

On 2017-03-08 6:40 AM, James Graham wrote:

On 08/03/17 11:11, Frederik Braun wrote:

On 08.03.2017 01:17, Ralph Giles wrote:

I second Jeff's point about building with icecream[1]. If you work in
an office with a build farm, or near a fast desktop machine you can
pass jobs to, this makes laptop builds much more tolerable.



What do you mean by build farm?
Do some offices have dedicated build machines?
Or is this "just" a lot of desktop machines wired up to work together?


Several offices are using icecream with whatever desktop machines happen
to be available to improve build times. The London office was told last
year that there's a plan to buy dedicated build machines to add to the
cluster, but I am unaware if there is any progress or whether the idea
was dropped.


What we did in the Toronto office was walk to people who ran Linux on
their desktop machines and install the icecream server on their
computers.  I suggest you do the same in London.  There is no need to
wait for dedicated build machines.  ;-)



Yes, we have done that. The dedicated hardware would be in addition to 
the existing linux desktop users.

___
dev-platform mailing list
dev-platform@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-platform


Re: Is there a way to improve partial compilation times?

2017-03-08 Thread James Graham

On 08/03/17 14:21, Ehsan Akhgari wrote:

On 2017-03-07 2:49 PM, Eric Rahm wrote:

I often wonder if unified builds are making things slower for folks who use
ccache (I assume one file changing would mean a rebuild for the entire
unified chunk), I'm not sure if there's a solution to that but it would be
interesting to see if compiling w/o ccache is actually faster at this point.


Unified builds are the only way we build so this is only of theoretical
interest.  But at any rate, if your use case is building the tree once
every couple of days, you should definitely disable ccache with or
without unified builds.  ccache is only helpful for folks who end up
compiling the same code over and over again (for example, if you use
interactive rebase a lot, or if you switch between branches, or use
other VCS commands that touch file modification times without changing
the contents).  Basically, if you don't switch between branches a
lot and don't write a lot of C++ code, ccache probably hurts you more
than it helps you.


At risk of stating the obvious, if you aren't touching C++ code (or 
maybe jsm?), and aren't using any funky compile options, you should be 
using an artifact build for best performance.


___
dev-platform mailing list
dev-platform@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-platform


Re: Project Stockwell (reducing intermittents) - March 2017 update

2017-03-09 Thread James Graham

On 09/03/17 19:53, Milan Sreckovic wrote:

Not a reply to this message, just continuing the thread.

I'd like to see us run all the intermittently disabled tests once a ...
week, say, or at some non-zero frequency, and automatically re-enable
the tests that magically get better.  I have a feeling that some
intermittent failures get fixed without us realizing it (e.g., we reduce
memory usage, so OOMs go away, we speed things up so timeouts stop
triggering) and it would be good to be able to re-enable those tests
that start passing.

It'd make me feel slightly less sad that we're disabling tests that do
their job 90% of the time...


This idea is appealing, but there are some tricky details. Tests may 
work fine when run in isolation but either cause problems, or be 
problematic, when run in conjunction with related tests. Obviously it's 
possible to deal with that situation, but it might be the difference 
between "and now run this script that automatically enables all the 
tests that were stable over  repetitions" and "the 
results of this need careful manual analysis and are likely to result in 
tests that flip-flop between enabled and disabled a few times before 
people wise up to their peculiar brokenness".


I'm not saying that it's impossible, but I do doubt it's as trivial as 
it first appears.
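For what it's worth, a rough manual version of that check (a sketch 
only; the suite, flag, and paths are illustrative) would be to re-run a 
disabled test repeatedly both on its own and alongside its neighbours 
before re-enabling it:

  # Repeat the single test in isolation, then the whole directory, to
  # catch failures that only show up when related tests run together.
  ./mach mochitest --repeat 20 path/to/disabled_test.html
  ./mach mochitest --repeat 20 path/to/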

___
dev-platform mailing list
dev-platform@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-platform


Planned tree closure Tue 2017-03-14 08:30 UTC

2017-03-13 Thread James Graham
We will be running a migration on the Treeherder database which will 
require pausing job ingestion at 08:30 UTC tomorrow (Tuesday). This is 
expected to take around 90 minutes, and trees will be closed for the 
duration.


Thank you for your patience.
___
dev-platform mailing list
dev-platform@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-platform


Re: The future of commit access policy for core Firefox

2017-03-13 Thread James Graham

On 13/03/17 14:45, Byron Jones wrote:

David Burns wrote:

We should try mitigate the security problem and fix our nit problem
instead of bashing that we can't handle re-reviews because of nits.

one way tooling could help here is to allow the reviewer to make minor
changes to the patch before it lands.
ie.  "r+, fix typo in comment before landing" would become "r+, i fixed
the comment typo"



Assuming you mean "and land without further review", I don't see how 
this has different security properties from r+-with-nits in the — 
reasonably common — case that the patch author is at least as trusted as 
the reviewer (e.g. both L3 today).


I do think that tooling to support multiple authors collaborating on a 
single branch is a good thing independent of the changes discussed in 
this thread.

___
dev-platform mailing list
dev-platform@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-platform


Re: Please do NOT hand-edit web platform test MANIFEST.json files

2017-03-21 Thread James Graham

On 20/03/17 22:15, gsquel...@mozilla.com wrote:


Sorry if it's a silly suggestion:
Could the current tool insert some helpful reminders *everywhere* in the 
generated file (so they can't be missed)?
E.g., every 2nd line would read: "// PSA: This file is auto-generated by ./mach 
wpt-manifest-update, please don't edit!" ;-)


It is of course possible but it's not trivial since JSON doesn't allow 
comments, so there would have to be some preprocessing magic and so on. 
Probably better to spend the time on a good solution.


___
dev-platform mailing list
dev-platform@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-platform


Re: A reminder about commit messages: they should be useful

2017-04-17 Thread James Graham

On 17/04/17 16:41, David Major wrote:

I'd like to add to this a reminder that commit messages should describe
the _change_ and not the _symptom_. In other words, "Bug XYZ: Crash at
Foo::Bar" is not a good summary.


An unfortunate pattern I see is non-descriptive commit messages for 
tests, which is particularly problematic for web-platform-tests that are 
later upstreamed. A commit message like "Bug 1234 - Part 2: Tests" is 
not totally terrible in the context of Mozilla Central — although 
obviously more detail about what is, and especially what is not, covered 
would be appreciated — but "Part 2: Tests" is utterly meaningless upstream.


When adding web-platform-tests it would be really appreciated if people 
would consider how the commit message will read outside the m-c 
repository, and at least mention which features the tests are intending 
to cover. Ideally we would also avoid bugzilla review cruft like part 
numbers, but I understand that a more realistic solution there is for 
the tooling to strip out more of this when upstreaming.

___
dev-platform mailing list
dev-platform@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-platform


|mach wpt| now allows running tests in other browsers

2017-06-01 Thread James Graham
Bug 1367041 recently landed on mozilla-central which should make it 
easier to run web-platform-tests from Mozilla source in (some) other 
browsers.


|mach wpt| now accepts an argument --product that specifies the browser 
to run the tests in. This accepts values of servo, chrome, and edge in 
addition to the obvious default of firefox. You must have the relevant 
browser installed (and, for servo, on your $PATH, unless you provide a 
--binary argument), but additional dependencies (e.g. Chromedriver) will 
be downloaded if not available (exception: Microsoft's WebDriver isn't 
yet downloaded automatically but a link to the relevant download site is 
provided). So for example:


mach wpt --product chrome testing/web-platform/tests/dom/historical.html

will run that single test in Chrome and

mach wpt --product servo --binary ~/servo/target/release/servo dom

will run all dom tests in the specified servo build.

If you want to add expectation metadata for a non-firefox browser you 
are testing it can be placed in testing/web-platform/products/{product}; 
this directory is ignored by the VCS (note that |mach wpt-update| 
doesn't yet know about different products, although that shouldn't be 
hard to fix).
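For example, a hypothetical expectation entry for a Chrome-only failure 
might look like this (the test and subtest names are illustrative, and 
it assumes the usual wptrunner layout where .ini files mirror test 
paths, so this would live at 
testing/web-platform/products/chrome/dom/historical.html.ini):

  [historical.html]
    [historical DOM features must be removed]
      expected: FAIL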


Support for additional browsers will come: in particular Safari, once a 
patch for wptrunner lands (although that is likely to always require 
some manual steps to enable automation), and maybe random browsers via 
Sauce Labs if there's demand.


Obvious note: this works on my machine but it wouldn't be too surprising 
if it doesn't work on your machine. If you run into problems file a bug 
in Testing::web-platform-tests or complain at me on irc.


___
dev-platform mailing list
dev-platform@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-platform


Re: Shipping Headless Firefox on Linux

2017-06-15 Thread James Graham

On 15/06/17 21:51, Ben Kelly wrote:

On Thu, Jun 15, 2017 at 4:37 PM, Nathan Froyd  wrote:


On Thu, Jun 15, 2017 at 2:02 PM, Brendan Dahl  wrote:

Headless will run less of the platform specific widget code and I don't
recommend using it for platform specific testing. It is targeted more at
web developers and testing regular content pages. There definitely will

be

cases where regular pages will need to exercise code that would vary per
platform (such as fullscreen), but hopefully we can provide good enough
emulation in headless and work to have a consistent enough behavior

across

platforms that it won't matter.


Would it be feasible to use headless mode for mochitests (or reftests,
etc. etc.)?  I know there are probably some mochitests which care
about the cases you mention above (e.g. fullscreen), but if that could
be taken care of in the headless code itself or we could annotate the
tests somehow, it would be a huge boon for running mochitests locally,
or even in parallel.  (We already have some support for running
reftests/crashtests in parallel.)



There are some tests which fail if the "screen" is not a particular size.
Those might be a problem as well.


FWIW [1] is a try run of web-platform-tests in headless mode. There are 
clearly some tests that are broken. If people thought it was high value 
I could add a --headless command line argument with support for running 
most tests in headless and specially annotated tests in a real window. 
But it wouldn't be possible to keep that annotation data up to date 
automatically with new imports, so it would be a best-effort solution.


[1] 
https://treeherder.mozilla.org/#/jobs?repo=try&revision=e4282575210badf4ab3a072d5ceab51ee2384e11&filter-searchStr=linux
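If you want to poke at this locally, one option (assuming your build 
honours the MOZ_HEADLESS environment variable; the test path is 
illustrative) is something like:

  # Run a single wpt test against a headless Firefox build.
  MOZ_HEADLESS=1 ./mach wpt testing/web-platform/tests/dom/historical.html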

___
dev-platform mailing list
dev-platform@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-platform


wpt CSS tests now running on Linux

2017-07-20 Thread James Graham
Bug 1341078 and dependencies just landed on inbound, which means we now 
have the W3C/web-platform-tests CSS tests in-tree and running in 
automation. This adds about 12,000 reftests for CSS features to the 
web-platform-tests suite. They are currently enabled in CI, but only on 
linux*, due to limited capacity on OSX, and issues with the harness on 
Windows. The tests will be enabled on other platforms once these 
problems are resolved.


Servo was already running many of these tests in their automation, so 
this landing plugs a gap in the stylo testing vs upstream.


Note that, as usual with wpt landings, these tests have been vetted for 
stability, but not for the actual results.


Changes to the css tests in this directory will be upstreamed to 
web-platform-tests in the same way as for any other web-platform-test. 
Note that the reftest harness versions of the Mozilla submitted tests 
are still running, so if you want to edit or add to those it is 
recommended to use the copy in layout/reftests/w3c-css/submitted/ since 
that will be correctly synchronised and currently runs on more platforms 
in CI.


The number of tests and nature of reftests means that this change added 
a large number of files to the repository (around 37,000). Apologies for 
any inconvenience caused by such a large changeset. I'm told that narrow 
clones are just around the corner and may make this kind of thing more 
tolerable in the future.

___
dev-platform mailing list
dev-platform@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-platform


Re: wpt CSS tests now running on Linux

2017-07-20 Thread James Graham

On 20/07/17 18:26, Emilio Cobos Álvarez wrote:

Thanks for this James! \o/

One question, do we run the CSS test linter on automation, or are there
any plans for it?


Yes, that should be run as part of the W lint job (e.g. [1]), which is 
run on pushes (including to try) that change files under 
testing/web-platform/tests/. We don't run it for changes under 
layout/reftests/w3c-css/submitted/ and it's not clear how easy that 
would be, since the lint is rather tied to the structure of the wpt 
repository.


Note that there are other reasons that a push might be blocked upstream 
but land in m-c (e.g. unstable tests). We are working to create an 
upstream PR earlier in the cycle, and improve the communication with 
test authors about problems upstreaming their changes (as well as 
corresponding improvements to downstreaming that should allow us to 
notify relevant people when "interesting" test changes are going to land).


[1]https://treeherder.mozilla.org/#/jobs?repo=mozilla-inbound&revision=60fa00c73b05bd2bc0e7485826a86ffed47627c9&filter-resultStatus=testfailed&filter-resultStatus=busted&filter-resultStatus=exception&filter-resultStatus=runnable&filter-resultStatus=pending&filter-resultStatus=running&filter-resultStatus=success&filter-searchStr=lint&selectedJob=115937895
___
dev-platform mailing list
dev-platform@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-platform


Re: reducing high try macosx pending counts

2017-08-03 Thread James Graham

On 02/08/17 22:30, Kim Moir wrote:

You may have noticed that the time to wait for macosx test results on try
has been very long (>1day) this week.

We have taken the following steps to address this problem

[...]

That sounds great! Thanks.

For everyone else:

It looks like the queues are still pretty long, and I imagine there are 
lots of pending jobs people scheduled that aren't really necessary any 
more (e.g. because you already found issues with a patch on other 
platforms, or landed in spite of the missing results).


If you have a try push which requested OSX jobs, but you don't need them 
now, it would help if you go back and cancel the remaining jobs for that 
push from treeherder (look for the grey circle with a cross inside in 
the top right of the push to cancel all unfinished jobs, and the similar 
icon in the bottom panel for individual jobs). Also if you are making 
new try pushes and don't specifically need OSX testing, it's possible to 
limit tests to certain platforms with try syntax like (for running tests 
on linux64 and linux64-stylo only):


try: -b do -p all -u web-platform-tests[x64,linux64-stylo]

___
dev-platform mailing list
dev-platform@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-platform


Re: disabled non-e10s tests on trunk

2017-08-16 Thread James Graham

On 16/08/17 01:26, Nils Ohlmeier wrote:

I guess not a lot of people are aware of it, but for WebRTC we still have two 
distinct implementations for the networking code.
So if I understand the impact here right we just lost test coverage for 
probably a couple of thousand lines of code.

[…]


I’m not sure how others do it, but our low level C++ unit tests don’t have an 
e10s mode at all.
Therefore we can’t simply delete the non-e10s WebRTC networking code either 
(without losing a ton of test coverage).


If the networking code is only covered by C++ unit tests, there is 
separate code for non-e10s vs e10s, and the unit tests don't work in 
e10s mode, doesn't that mean we currently don't have any test coverage 
for our shipping configuration on desktop? What am I missing?

___
dev-platform mailing list
dev-platform@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-platform


Re: disabled non-e10s tests on trunk

2017-08-16 Thread James Graham

On 15/08/17 21:39, Ben Kelly wrote:

On Tue, Aug 15, 2017 at 4:37 PM, Joel Maher  wrote:


All of the above mentioned tests are not run on Android (well
mochitest-media is to some degree).  Is 4 months unreasonable to fix the
related tests that do not run in e10s?  Is there another time frame that
seems more reasonable?



Last I checked it was your team that told me WPT on android was not an
immediate priority.  The WPT harness itself does not run there.


FWIW the story with wpt/Android is that originally Android didn't 
support the kind of remote control required to run wpt tests (i.e. 
marionette). That has subsequently been fixed, and it's believed to be 
possible to incorporate Android into the wpt harness without significant 
refactoring, but after that there is substantial, time-consuming work 
required to get from "it runs" to "this is a thing we can run in 
production".


I doubt that the work required to implement an Android backend, sort out 
issues with the relative slowness of the android emulator, and green up 
the tests, would be less than one person's work for a quarter. Even once 
this is done there would likely be ongoing problems updating the 
metadata on android for syncs from upstream simply from the additional 
slowness of try runs and probable additional intermittency.


So far there haven't been enough people working on wpt to make this a 
priority relative to other goals. Given the situation with e10s, it 
seems reasonable to reassess this when planning future work, but I 
wouldn't bet on such work being complete by Dec. 29th this year.

___
dev-platform mailing list
dev-platform@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-platform


Re: disabled non-e10s tests on trunk

2017-08-16 Thread James Graham

On 16/08/17 19:36, Ben Kelly wrote:

My only thought about windows7-debug is that android is a variant of
linux.  Running a linux platform might be closer to android behavior.  But
I don't have a known specific difference in mind.


Right it seems like there are two use cases here:

1) Tests that are really aimed at Desktop or are cross-product but don't 
run on e10s for (reasons).


2) Tests for features that are run in e10s on Desktop, but exercise 
functional differences in non-e10s and don't run in Android.


For 1) running on Windows makes some sense because that's where most 
users are. For 2) it makes no sense because it's the most different from 
Android. For those cases running on Linux(64) makes more sense (and is 
also usually where we have most capacity, so that helps with infra issues).

___
dev-platform mailing list
dev-platform@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-platform


Re: Implementing a Chrome DevTools Protocol server in Firefox

2017-08-31 Thread James Graham

On 31/08/17 02:14, Michael Smith wrote:

On 8/30/2017 15:56, David Burns wrote:
 > Do we know if the other vendors would see value in having this 
spec'ed properly so that we have true interop here? Reverse engineering 
seems like a "fun" project but what stops people from breaking stuff 
without realising?


Fortunately we're not reverse engineering here (for the most part), all 
protocol messages are specified in a machine-readable JSON format which 
includes inline documentation [0] --- this is what the cdp Rust library 
consumes. The spec is versioned and the authors do seem to follow a 
proper process of introducing new features as "experimental", 
stabilizing mature ones, and deprecating things before they're removed.


I think that the reverse engineering part is not the wire protocol, 
which is usually the most trivial part, but the associated semantics. It 
doesn't seem that useful to support the protocol unless we behave in the 
same way as Chrome in response to the messages. It's the specification 
of that behaviour which is — as far as I can tell — missing, and which 
seems likely to involve reverse engineering.


In general it seems unfortunate if we are deciding to implement a 
proprietary protocol rather than opting to either extend something that 
is already a standard (e.g. WebDriver) or perform standardisation work 
alongside the implementation. What alternatives have we considered here? 
Is it possible to extend existing standards with missing features? Or 
are the current tools using the protocol so valuable that we don't have 
any choice but to support them on their terms? If it's the latter, or we 
just think the Chrome protocol is so technically superior to the other 
options that we would be foolish to ignore it, can we work with Google 
to get it standardised? I think some meaningful attempt at 
standardisation should be a prerequisite to this kind of protocol 
implementation shipping in Firefox.

___
dev-platform mailing list
dev-platform@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-platform


Re: Implementing a Chrome DevTools Protocol server in Firefox

2017-08-31 Thread James Graham

On 31/08/17 19:42, Jim Blandy wrote:
Some possibly missing context: Mozilla Devtools wants to see this 
implemented for our own use. After much discussion last summer in 
London, the Firefox Devtools team decided to adopt the Chrome Debugging 
Protocol for the console and the JavaScript debugger. (The cases for 
converting the other tools like the Inspector are less compelling.)


Speaking as the designer of Firefox's protocol, the CDP is a de-facto 
standard. The Firefox protocol really has not seen much uptake outside 
Mozilla, whereas the Chrome Debugging Protocol is implemented with 
varying degrees of fidelity by several different browsers. "Proprietary" 
is not the right term here, but in the sense of "used nowhere else", one 
could argue that it is Mozilla that is using the proprietary protocol, 
not Chrome. In a real sense, it is more consistent with Mozilla's 
mission for us to join the rest of the community, implement the CDP for 
the tools where it makes sense, and participate in its standardization, 
than to continue to push a protocol nobody else uses.


I entirely agree that the current Firefox protocol is also proprietary. 
However I also assumed that it's considered an internal implementation 
detail rather than something we would expect people to interoperate 
with. If that wasn't the case then I apologise: I should have complained 
earlier :)


Going forward, if we implement a "de-facto" standard that is not 
actually standardised, we are assuming a large risk, in addition to the 
problems around our stated values. An obvious concern is that Google are 
free to change the protocol as they like, including in ways that are 
intentionally or accidentally incompatible with other implementations. 
We also know from past experience of implementing "de-facto" standards 
that implementation differences end up hardcoded into third party 
consumers (i.e. web pages in the case of DOM APIs), making it impossible 
to get interoperability without causing intolerable short-term breakage. 
This has prevented standardisation and compatibility of "de-facto" 
standards like innerText and contentEditable, which remain nominally 
equivalent but actually very different in all browsers.


If people are starting to standardise not just the protocol but also the 
semantics of CDP, that's great. But people tend to vastly underestimate 
how long standardisation will take, and overestimate the resources that 
they will find to work on it. So it would be good to see concrete 
progress before we are actually shipping.

___
dev-platform mailing list
dev-platform@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-platform


Re: Implementing a Chrome DevTools Protocol server in Firefox

2017-08-31 Thread James Graham

On 31/08/17 21:22, Jack Moffitt wrote:
Is there another alternative besides CDP you'd like to propose? 



I don't have an alternate proposal, and I feel like I must have been 
unclear at some point. I'm not saying "this is bad, period". I'm 
certainly not saying "this is bad because it isn't WebDriver". Given 
that people seem to be in agreement that technically the CDP is good, 
I'm saying "this is bad if it remains a vendor-controlled, partially 
documented, pseudo-standard". The fact that there is apparently interest 
in creating a standard is reassuring, but there doesn't seem to be any 
recent activity on remotedebug.org, so it's hard to tell what the real 
status is, or whether people have understood the amount of work that 
entails. I am slightly worried that there have been several replies 
suggesting that poor interoperability above the message layer won't be a 
big problem because the users will be technical and therefore happy to 
absorb the cost of backwards-incompatible changes between releases. 
Experience from WebDriver is that this isn't true.

___
dev-platform mailing list
dev-platform@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-platform


Testing & "Intent to Ship"

2017-09-01 Thread James Graham
Looking back over recent "Intent to Ship" emails for web platform 
exposed features, I notice that only half make any mention of 
accompanying tests.


Since cross-browser tests are one of the main ways we prevent today's 
exciting new feature being tomorrow's site-breaking compat nightmare, 
I'd like to encourage everyone to be more diligent about following the 
Intent to Ship template, and recording whether the feature they want to 
enable has adequate test coverage to demonstrate compatibility with 
other browsers' existing and future implementations.

___
dev-platform mailing list
dev-platform@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-platform


Re: Implementing a Chrome DevTools Protocol server in Firefox

2017-09-05 Thread James Graham

On 04/09/17 23:34, Jim Blandy wrote:

On Mon, Sep 4, 2017 at 7:36 AM, David Burns  wrote:

I don't think anyone would disagree with the reasons for doing this. I,

like James who brought it up earlier, am concerned that we from the emails
appear to think that implementing the wire protocol would be sufficient to
making sure we have the same semantics.

LOL, give us a little credit, okay? The authors of the email do not think
that. We want to have a properly written specification and conformance
tests. I think you're reading "we have no interest in established
standardization processes" when what we wrote was "the process is in very
early stages".

Do you think the Browser Testing Tools WG is the right body to work on a JS
debugging and console protocol, used by interactive developer tools? That
seems like a surprising choice to me.


It is certainly not the only possible venue, but if you want to do the 
work at the W3C then it's probably the easiest way to get things going 
from a Process point of view, since this kind of protocol would be in 
the general remit of the group, and the rechartering could add it 
specifically. Certainly the people currently in the group aren't the 
right ones to do the work, but adding new participants to work 
specifically on this would be trivial.



Also - at least as far as I know -  this is not where the current
participants in the discussion (Kenneth Auchenberg or Christian Bromann, to
name two) have been working. Is having a previously uninvolved standards
committee take up an area in which current activity is occurring elsewhere
considered friendly and cooperative behavior? It seems unfriendly to me. I
would like to avoid upsetting the people I'm hoping to work closely with.


I think you have misinterpreted the intent here. I don't think anyone is 
interested in doing a hostile takeover of existing work. But there is 
concern that the work actually happens. Pointing at remotedebug.org, 
which has been around since 2013 without producing any specification 
materials, isn't helping assuage my concerns, and I guess others are 
having a similar reaction. It is of course entirely possible that 
there's work going on that we can't see. But my interpretation of 
David's email is that he is trying to offer you options, not force you 
down a certain path. The W3C is not always the right venue to work in, 
but it is sometimes sought out by organisations who would likely 
participate in this work because of its relatively strong IPR policy.


I should stress that irrespective of venue I would expect this 
standardisation effort to take years; people always underestimate the 
work and time required for standards work. It will certainly require us 
to commit resources to make it happen.

___
dev-platform mailing list
dev-platform@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-platform


Re: Intermittent oranges and when to disable the related test case - a simplified policy

2017-09-12 Thread James Graham

On 12/09/17 14:55, Andrew Halberstadt wrote:

On Mon, Sep 11, 2017 at 10:33 PM Robert O'Callahan 
wrote:


On Tue, Sep 12, 2017 at 11:38 AM, Andrew Halberstadt <
ahalberst...@mozilla.com> wrote:


I don't think so, that data already exists and is query-able from
ActiveData:
https://activedata.allizom.org/tools/query.html#query_id=8pDOpeni



That query tells you about disabled tests, but doesn't know about *why* a
test was disabled. E.g. you can't distinguish tests disabled because
they're not expected to work on some (or all) platforms from tests that
were disabled for intermittent failures that should, in principle, be fixed.

Rob



True, though I don't know that gps' proposal would solve that either.

But this is a good idea, and is easy to solve from a technical standpoint.
We'd just need to agree on some standard manifest keys:


I'm pretty sure that the problem isn't technical, but actually getting 
people to do that consistently (plus retrofitting the data onto 
thousands of currently disabled tests). You would at least have to add a 
lint and a free pass for all the existing tests.

___
dev-platform mailing list
dev-platform@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-platform


Re: Reminder on Try usage and infrastructure resources

2017-09-14 Thread James Graham

On 14/09/17 16:48, Marco Bonardo wrote:

When I need to retrigger a mochitest-browser test multiple times (to
investigate an intermittent), often I end up running all the
mochitest-browser tests, looking at every log until I find the chunk
where the test is, and retrigger just that chunk. The chunk number
changes based on the platform and debug/opt, so it's painful.
Is there a way to trigger only the chunk that will contain a given
test, so I can save running all of the other chunks?


You might be able to use

mach try -p linux64 

in order to run a single chunk with just the chosen tests.
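For example, with a hypothetical test directory (the path here is 
purely illustrative):

  # Schedule only the jobs needed to run the browser mochitests under
  # this directory on linux64.
  ./mach try -p linux64 browser/components/sessionstore/test/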
___
dev-platform mailing list
dev-platform@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-platform


Re: Reminder on Try usage and infrastructure resources

2017-09-15 Thread James Graham

On 15/09/17 00:53, Dustin Mitchell wrote:

2017-09-14 18:32 GMT-04:00 Botond Ballo :

I think "-p all" is still useful for "T pushes" (and it sounds like
build jobs aren't the main concern resource-wise).


Correct -- all builds are in AWS.

I'd like to steer this away from "What legacy syntax should we use
instead?" and "How should we tweak the legacy try syntax?" to:

  How can we use the modern tryselect functionality to achieve more
precise try pushes?


I think that's a good discussion to have, but the original motivation 
for this thread aiui are recent incidents where there have been 12+ hour 
backclogs on try, causing problems across the org. In general we ought 
to solve this by being smarter about what's run automatically, but we 
aren't there yet. We also don't have full uptake of |mach try fuzzy| and 
in any case, people likely to be impacted by this all know try syntax. 
So a discussion in those terms seems meaningful.


I think there are some fairly simple rules people can apply to help with 
the observed, recurring problem. These are not official (I'm not in a 
position of authority here), but I assume people will correct anything 
that's wrong or controversial:


* -p all is generally OK because builds are on cloud machines and we 
aren't hardware constrained there. Obviously any unnecessary job, 
including builds, does cost money.


* Bare -p all -u all generally isn't OK. In particular it shouldn't be 
seen as the default "check before landing" try push. Of course, if you 
have a large cross-cutting change that genuinely could affect any test 
on any platform, it might be the right choice.


* A combination of selecting specific relevant suites and representative 
platforms using -u [platform] is generally a good choice. 
|mach try fuzzy| is a better way to schedule this kind of push (see the 
sketch after this list).


*  mach try allows specifying specific paths or directories. This allows 
even finer grained test selection where you are interested in specific 
tests.


* In general running tests on mac should be avoided if possible. This is 
our most hardware-constrained regression test platform. It helps a lot 
if people only request mac jobs when they think their change will affect 
mac differently to linux and windows.


* If you know your try push failed before all jobs complete, or you land 
a patch with jobs still pending, please take a moment to cancel all 
pending jobs from treeherder. That is disproportionately helpful for 
freeing up resources on backlogged platforms.


* I have no idea about performance tests.
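As mentioned above, here is a sketch of what the |mach try fuzzy| form 
of such a push might look like (it assumes the --query flag is available 
in your tree; the query string is illustrative):

  # Select just the linux64 web-platform-tests jobs without the
  # interactive picker.
  ./mach try fuzzy --query "web-platform-tests linux64"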
___
dev-platform mailing list
dev-platform@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-platform


Re: Reminder on Try usage and infrastructure resources

2017-09-15 Thread James Graham

On 15/09/17 18:45, Dan Mosedale wrote:

I wonder if this isn't (in large part) a design problem disguised as a
behavior problem.  The existing try syntax (even with try chooser) is so
finicky and filled with abbreviations that even after years of working with
it, I still regularly have to look up stuff and sometimes when I've been in
a hurry, I've done something more general than I really needed because it
was just too painful to figure out the exact thing.

I'd be pretty surprised if developers newer to the mozilla infrastructure
than I didn't end up doing this sort of thing substantially more frequently.

https://ahal.ca/blog/2017/mach-try-fuzzy/ seems like a fine step in the
right direction, and maybe that'll be enough.

But I do wonder if the path to saving substantial time and money in the
long run is to invest some real user-research / UX / design time into
designing a try configurator where it requires effort to do the
unnecessarily expensive thing, as opposed to the current situation, where
it requires effort to avoid the expensive thing.


I think that's a rather uncontroversial opinion. Historically we have 
been hampered by the fact that the set of try jobs was basically unknown 
and constantly changing, and the code was scattered across many 
repositories. Now that taskcluster defines everything in a single place 
and the majority of the code is in-tree it will be much easier to 
experiment with different frontends that make it easy to select the 
right jobs. That's what allowed ahal to write |mach try fuzzy|.


There is also a desire to have better change-based job selection, so 
that the default behaviour can be "run the jobs that are most likely to 
be affected by the change I just made".


However all of these improvements will take time, and in the meantime 
there are problems being caused by too-high backlog, so some changes in 
user behaviour will be helpful as we work toward better tools.

___
dev-platform mailing list
dev-platform@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-platform


Re: Intent to require `mach try` for submitting to Try

2017-09-18 Thread James Graham

On 18/09/17 09:27, Samael Wang wrote:

In a rare case that we need to send a "CLOSED TREE" try job,
will we be able to do that with ./mach try?
Last time I didn't use mach try to submit try job was because of that.



That doesn't work right now, but it should be easy to add a 
--closed-tree flag or similar. File a bug please?

___
dev-platform mailing list
dev-platform@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-platform


Re: Intent to require `mach try` for submitting to Try

2017-09-18 Thread James Graham

On 18/09/17 04:05, Eric Rescorla wrote:


But that's just a general observation; if you look at this specific case,
it might not be much effort to support native git for richer/future try
pushing. But that's very different from requiring all the tools to support
native git on an equal basis. And it seems reasonable to evaluate the
utility of this specific support via a poll, even one known to be biased.



I don't think that's true, for the reasons I indicated above. Rather,
there's a policy decision about whether we are going to have Git as a
first-class thing or whether we are going to continue to force everyone who
uses Git to fight with inadequate workflows. We know there are plenty of
people who use Git.


I don't entirely understand what concrete thing is being proposed here. 
As far as I can tell the git-hg parameter space contains the following 
points:


 1. Use hg on the server and require all end users to use it
 2. Use git on the server and require all end users to use it
 3. Use hg on the server side and use client-side tooling to allow git 
users to interact with the repository
 4. Use git on the server side and use client-side tooling to allow hg 
users to interact with the repository
 5. Provide some server side magic to present both git and hg to 
clients (with git, hg, or something else, as the on-disk format)


These all seem to have issues relative to the goal of "vanilla git with 
no custom code":


 1. Doesn't allow git to be used at all.
 2. Requires a multi-year transition away from hg. Probably not popular 
with hg fans.
 3. The status quo. Requires using a library for converting between hg 
and git (i.e. cinnabar) or some mozilla-specific custom scripts (the old 
moz-git-tools)
 4. Like 3. but with an additional multi-year transition and different 
custom tooling.
 5. Allows vanilla git and hg on the client side, but requires 
something complex, custom, and scary on the server side to allow pushing 
to either repo. Could be possible if we eliminate ~all manual pushes 
(i.e. everything goes via autoland), but cinnabar or similar is still 
there in the background.


Given none of those options seem to fit, the only other possibility I 
can think of is to skip the general problem of how to interact with the 
VCS for try specifically by making submitting to try not look like a VCS 
push, but like some other way of sending a blob of code to a remote 
machine (e.g. using some process that generates a patch file). But 
unless there's some extant way to achieve that it seems like it would be 
replacing known-working code (cinnabar) with new custom code.


So my best guess is that you mean that all pushes should go via autoland 
and we should provide read-only hg/git repositories, and try pushes 
should also go via something other than a vcs push. But I'm probably 
missing something.

___
dev-platform mailing list
dev-platform@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-platform


Re: Intent to ship: (hyperlink auditing)

2017-10-17 Thread James Graham

On 02/10/17 18:06, Anne van Kesteren wrote:

Bug: https://bugzilla.mozilla.org/show_bug.cgi?id=951104

Rationale: There's already a myriad of ways to obtain this data
through script. We might as well ship the protocol that both Chrome
and Safari ship in the hope that along with sendBeacon() it decreases
the usage of the slower alternatives; ultimately giving users a better
experience.

Previously: This was already discussed in
https://groups.google.com/d/msg/mozilla.dev.platform/DxvZVnc8rfo/RxSnyIFqxoQJ
and I think concluded given Jonas Sicking's statement in the
aforementioned bug, but since it's been a few years without action we
thought it would be worth it to send another ping.




Are there cross-browser (i.e. wpt) tests for this feature?
___
dev-platform mailing list
dev-platform@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-platform


Re: Intent to ship: CSP directive worker-src

2017-10-18 Thread James Graham

On 22/09/17 15:18, Christoph Kerschbaumer wrote:

Hey Everyone,

within CSP2 workers used to be governed by the child-src directive [0]. CSP3 
introduces the worker-src directive [1] which governs Workers, SharedWorkers as 
well as ServiceWorkers. Please note that the child-src directive has been 
deprecated within CSP3 in favor of worker-src as well as frame-src.

For backwards compatibility child-src will still be enforced for:
   * workers (if worker-src is not explicitly specified)
   * frames  (if frame-src is not explicitly specified)

We plan to ship the CSP directive worker-src within Firefox 58.


Do we have cross-browser (i.e. web-platform) tests for this feature?
___
dev-platform mailing list
dev-platform@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-platform


Re: Intent to ship: CSP directive worker-src

2017-10-18 Thread James Graham

On 18/10/17 10:35, Christoph Kerschbaumer wrote:



On Oct 18, 2017, at 11:25 AM, James Graham  wrote:

On 22/09/17 15:18, Christoph Kerschbaumer wrote:

Hey Everyone,
within CSP2 workers used to be governed by the child-src directive [0]. CSP3 
introduces the worker-src directive [1] which governs Workers, SharedWorkers as 
well as ServiceWorkers. Please note that the child-src directive has been 
deprecated within CSP3 in favor of worker-src as well as frame-src.
For backwards compatibility child-src will still be enforced for:
   * workers (if worker-src is not explicitly specified)
   * frames  (if frame-src is not explicitly specified)
We plan to ship the CSP directive worker-src within Firefox 58.


Do we have cross-browser (i.e. web-platform) tests for this feature?


Not yet. We just agreed with Chrome on the same fallback mechanism, see [1].
We are about to add mochitests for all the different fallback mechanisms though.


What's the reason for writing mochitests? It seems like this is 
something where we benefit from shared tests.

___
dev-platform mailing list
dev-platform@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-platform


Re: PSA: Microsoft VMs for testing

2017-11-07 Thread James Graham

On 07/11/17 13:47, Tom Ritter wrote:

Warning: they auto-shut down after 30 minutes (maybe? I never timed
it). I haven't put any effort into figuring out if that's
configurable, but I don't think it is.


I think that only happens once the trial period expires but you can 
reinstall the VM to reset that (you'll lose state of course).

___
dev-platform mailing list
dev-platform@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-platform


Re: Intent to ship: CSP Violation DOM Events

2017-11-17 Thread James Graham

On 17/11/17 05:55, Chung-Sheng Fu wrote:

Content Security Policy suggests Security Policy Violation DOM Events [1].
In case any of the directives within a policy are violated, such a
SecurityPolicyViolationEvent is generated and sent out to a reporting
endpoint associated with the policy. We are working on implementing those
violation events here [2] and plan to ship them within Firefox 59.


Do we have cross-browser (i.e. web-platform) tests covering this feature?
___
dev-platform mailing list
dev-platform@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-platform


Re: Intent to ship: CSP Violation DOM Events

2017-11-17 Thread James Graham

On 17/11/17 16:06, Daniel Veditz wrote:
On Fri, Nov 17, 2017 at 2:01 AM, James Graham <ja...@hoppipolla.co.uk> wrote:


Do we have cross-browser (i.e. web-platform) tests covering this
feature?


We fail many of the existing CSP web platform tests, despite having 
implemented most of the features, because they were written to use the 
violation events to check the results.


Is that an issue with our implementation or something we should fix in 
the tests? In either case it seems problematic to have such a feature 
and no way of checking for compatibility with other implementations.

___
dev-platform mailing list
dev-platform@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-platform


Re: Intent to implement (again): Shadow DOM

2017-11-27 Thread James Graham

On 27/11/17 12:20, smaug wrote:

This is basically an after the fact notification that
we're in progress of implementing Shadow DOM again, this time v1[1].
While doing this, the v0 implementation, which was never exposed to the 
web, will be removed.

v1 is luckily way simpler, so this all should simplify various bits in DOM.

FF60 might be quite realistic target to ship this, but there will be 
intent-to-ship

before that.

Many parts of the spec (DOM) are stable, but there are still a couple of 
tricky issues in HTML, like
session history handling with shadow DOM. However, Chrome and Safari are 
shipping v1 already.


Devtools will be able to see into the Shadow DOM.

Currently the work is under the old pref "dom.webcomponents.enabled"
but perhaps we should change that, so that people who were testing v0 
don't get

v1 APIs.


Do we have cross-browser (i.e. web-platform) tests for this feature, and 
have we assessed whether they are sufficiently complete to give us 
confidence of interop?

___
dev-platform mailing list
dev-platform@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-platform


Re: Intent to ship: CSS Shapes Module Level 1 (partial)

2017-11-29 Thread James Graham

On 29/11/17 11:03, Ting-Yu Lin wrote:

CSS Shapes Module Level 1 [1] defines three properties: "shape-outside",
"shape-margin", and "shape-image-threshold" (used with shape-outside:
), which allows the users to define non-rectangular shapes for
floating elements.

See the previous discussion for the main bug, pref, and examples in the
"Intent to implement" thread [2]. Also, no notable changes to the spec
since last "intent to implement" mail.

web-platform-tests is under
"web-platform-tests/css/vendor-imports/mozilla/mozilla-central-reftests/"


It's great that we're adding tests for this. As a broader conversation, 
I would like to work out what's required to move away from using the 
css/vendor-imports directory for layout reftests.


As far as I know it's only Gecko using that directory, and I think that 
continuing to put our tests there is problematic because it increases 
the chance that other vendors simply won't realise that there are tests 
they should be paying attention to (e.g. when looking at 
https://wpt.fyi) compared to putting them under css// in the 
same way as all other CSS tests (including those upstreamed from Blink).

___
dev-platform mailing list
dev-platform@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-platform


Re: Intent to Ship: Web Authentication

2017-12-06 Thread James Graham

On 05/12/17 20:44, J.C. Jones wrote:

Summary: Support public key cryptographic authentication devices
through Web Authentication.


This sounds pretty cool!


Testing:
Mochitests in-tree; https://webauthn.io/; https://webauthn.bin.coffee/
; Web Platform Tests in-progress


Are the web-platform-tests going to be done before we ship?

For my information, what was missing from wpt that meant you had to 
write mochitests? (I don't doubt that there are good reasons; it's just 
that understanding what they are helps shape future work.)

___
dev-platform mailing list
dev-platform@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-platform


Re: Intent to Ship: Set document.URL (location) to url of active document when navigation started for javascript:-generated documents

2017-12-12 Thread James Graham

On 12/12/17 14:49, Samael Wang wrote:

The current behavior that Firefox would set document.URL to the javascript: URL 
on navigating to the javascript:-generated document doesn't fit HTML5 standard:
https://html.spec.whatwg.org/#javascript-protocol

We're going to fix that in bug 836567, and ship it to 59. That would also 
indicate the URL shown on the address bar will no longer be the javascript: URL 
in this case.

You can find more details on
https://bugzilla.mozilla.org/show_bug.cgi?id=836567



Do we have cross-browser (web-platform) tests covering this change?
___
dev-platform mailing list
dev-platform@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-platform


This year in web-platform-tests

2017-12-15 Thread James Graham

I thought that people might be interested in a retrospective on the
progress with web-platform-tests in the last year. 2017 has seen big
advances in the adoption of web-platform-tests by the wider
browser community, and has brought considerable improvements to the
associated tooling and infrastructure. So, in no particular order:

== Sync ==
* About 600 commits from mozilla-central were upstreamed to GitHub.

* Google deployed a rapid 2-way sync process between
  web-platform-tests and the Blink repository. As a result,
  contributions from Chromium engineers to wpt more than tripled, with
  over 800 CLs upstreamed.

* WebKit and Edge made promising progress on running more tests and
  making contribution easier.

* We developed an improved two-way sync for Firefox which should go
  live in the new year. This is expected to dramatically improve the
  latency of our synchronisations and, by tracking upstream changes in
  bugzilla, improve the visibility of wpt changes to gecko developers.

== Test Coverage ==

* A policy of requiring test changes with normative spec changes was
  widely adopted by W3C and WHATWG specs. Over half of all
  browser-relevant specifications now have some variant of this policy.

* The CSS test suite was merged into web-platform-tests, bringing
  almost 30,000 layout tests into the suite. To support running this,
  we overhauled the wpt reftest runner to significantly improve the
  performance.

* Initial work on "testdriver" landed. This will provide a wpt
  frontend to test features that aren't available to web content. The
  initial implementation allows providing user clicks via WebDriver,
  but there are plans to significantly expand the capabilities in the
  future.

* Significantly improved the support for testing the WebDriver spec.

== Infrastructure ==

* https://wpt.fyi launched. This displays the results of the tests in
  Firefox, Chrome, Edge and Safari from daily runs of the full
  suite. There are plans to improve this in the future to make it
  clearer which test failures are identifying interoperability
  problems.

* wpt-verify job enabled as Tier 2 on treeherder. This is the wpt
  equivalent of the test-verify job for locating tests that are
  intermittent as they are initially committed.

* Added a CLI in upstream web-platform-tests which provides support
  for running tests in multiple browsers using a `wpt run` command (see
  the example after this list). This also supports running on Sauce
  Labs if you need to get a result for a browser that isn't available
  locally.

* Improved the usability of upstream PRs by consolidating all the
  relevant information into a single dashboard rather than spreading
  it over multiple GitHub comments.

* Improved the upstream CI testing to be faster and cover more
  browsers. Further improvements to increase reliability and improve
  the visibility of test results are planned for the near future.
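As an example of the CLI mentioned above (the test paths here are just
illustrative):

    # Run a single test, or a directory of tests, locally
    ./wpt run firefox dom/nodes/Document-createElement.html
    ./wpt run chrome dom/nodes/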

Thanks to everyone, both at Mozilla, and in the wider wpt community,
who contributed to these changes. I think we've really made a lot of
progress on making cross-browser testing an integral part of the 
process of browser engineering.
___
dev-platform mailing list
dev-platform@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-platform


Next year in web-platform-tests

2017-12-15 Thread James Graham

Following the summary of what we achieved in wpt in the last year, I'd
like to solicit input from the gecko community to inform the
priorities of various pieces of wpt work for the next year.

In order to maximise the compatibility benefit we have two long term
objectives for the web-platform-tests work at Mozilla:

* Make writing web-platform-tests the default for all
  web-exposed features, so that we only write unshared tests in
  exceptional cases (e.g. where there is a need to poke at Gecko
  internals).

* Ensure that gecko developers are using the web-platform-tests
  results to avoid interoperability issues in the features they
  develop.

Obviously we are some way from achieving those goals. I have a large
list of things that I think will make a difference, but obviously I
have a different perspective to gecko developers, so getting some
feedback on the priorities that you have would be good (I know I have
already have conversations with several people, but it seems good to
open up the question to a broader audience). In particular
it would help to hear about things that you would consider blockers to
removing tests from non-wpt suites that are duplicated in wpt
(assuming exact duplication), and limitations either in the
capabilities of wpt or in the workflow that lead to you writing other
test types for cross-browser features.

Thanks

(Note: I set the reply-to for the email version of this message to be 
off-list as an experiment to see if that avoids the anchoring effect 
where early replies set the direction of all subsequent discussion. But 
I'm very happy to have an on-list conversation about anything that you 
feel merits a broader audience).

___
dev-platform mailing list
dev-platform@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-platform


Re: Intent to implement: support CSS paint-order for HTML text

2017-12-24 Thread James Graham

On 24/12/2017 13:13, Emilio Cobos Álvarez wrote:

On 12/24/2017 02:01 PM, Jonathan Kew wrote:

Tests - Is this feature fully tested by web-platform-tests? No, as it is
not yet spec'd (see above). I propose to land a basic mozilla reftest
along with the patches in bug 1426146 (behind a pref); if/when the CSS
WG agrees to accept this issue in the spec, we can migrate the reftest
to WPT


Just FYI, other people land tests into WPT with .tentative.html in the
name, like:

   https://github.com/w3c/web-platform-tests/pull/8602

Not sure what's preferred, I believe that if the chances of this getting
spec'd are high it may be better, but...

James, do you know whether there's any official guideline for these kind
of situations?


The .tentative.html thing is an accepted convention for stuff that tests 
the presumed behaviour of a future spec, although it's possible that 
there aren't any CSS tests using the pattern yet.
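In practice the convention is just part of the filename, so the test for 
this could be named something like (name purely illustrative):

    paint-order-html-text-001.tentative.html

in whichever css/ directory it ends up in; the harness doesn't treat it 
specially, the infix just records that the tested behaviour isn't spec'd 
yet.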


I would certainly encourage using it here rather than having to remember 
to upstream a test at some later date.

___
dev-platform mailing list
dev-platform@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-platform


Improved wpt-sync now running experimentally

2018-02-09 Thread James Graham
The new sync for web-platform-tests is now running experimentally. This 
provides two-way sync between the w3c/web-platform-tests repository on 
GitHub and mozilla-central, so allowing gecko developers to contribute 
to web-platform-tests using their normal gecko workflow, and ensuring 
that we get all the upstream changes submitted by the community 
including engineers at Google, Apple, and Microsoft.


The new code is intended to provide the following improvements over the 
old periodic batch sync approach:


* Faster sync. The code to actually land changes to mozilla-central is 
still undergoing testing, but the intent is that we can get at least one 
wpt update per day once the system is fully operational.


* One bug per PR we downstream, filed in a component determined by the 
files changed in the PR.


* One PR per bug we upstream. Currently this will be created when a 
patch lands on inbound or autoland and should be merged when the patch 
reaches central. In some hypothetical future world in which there's a 
single entry point for submitting code to land in gecko (e.g. 
phabricator) this will change so that the PR is created when the code is 
submitted for review, so that upstream test results are available before 
landing (see next point).


* Upstream CI jobs run on PRs originating from gecko repositories. 
Previously we skipped upstream travis jobs on pushes we landed, 
occasionally causing breakage as a result. Now these jobs are run on all 
our pushes and the original bug should get a notification if the jobs fail.


* Notifications of notable changes introduced by upstream PRs. In 
particular we will add a comment when tests that used to pass start to 
not pass, when there are crashes or disabled tests, and for new tests 
that fail. This notification happens in the bug for the sync, but there 
is already an issue open to move things that obviously require attention 
(e.g. crashes) into their own bug.


If you notice problems with the sync, please file an issue [1] or 
complain in #wpt-sync on irc.  The project team consists of:


* jgraham and maja_zf (development, primary contacts)
* AutomatedTester (project management)

Issues are not unanticipated at this time, so thanks in advance for your 
patience as we work out the kinks in the system.


[1] https://github.com/mozilla/wpt-sync/issues

___
dev-platform mailing list
dev-platform@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-platform


Re: Improved wpt-sync now running experimentally

2018-02-09 Thread James Graham

On 09/02/2018 19:59, Josh Bowman-Matthews wrote:

On 2/9/18 1:26 PM, James Graham wrote:
* One bug per PR we downstream, filed in a component determined by the 
files changed in the PR.


What does this mean exactly? What is the desired outcome of these bugs?


They're tracking the process and will be closed when the PR lands in 
central. They are used for notifying gecko developers about the incoming 
change, and in particular contain the information about tests that went 
from passing to failing, and other problems during the import.


They are not essential to the sync so if they end up not working well at 
keeping people informed we can revisit the approach.

___
dev-platform mailing list
dev-platform@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-platform


Re: Improved wpt-sync now running experimentally

2018-02-14 Thread James Graham

On 12/02/2018 20:08, smaug wrote:

On 02/09/2018 10:39 PM, James Graham wrote:

On 09/02/2018 19:59, Josh Bowman-Matthews wrote:

On 2/9/18 1:26 PM, James Graham wrote:
* One bug per PR we downstream, filed in a component determined by 
the files changed in the PR.


What does this mean exactly? What is the desired outcome of these bugs?


They're tracking the process and will be closed when the PR lands in 
central. They are used for notifying gecko developers about the 
incoming change, and in particular contain the information about tests 
that went from passing to failing, and other problems during the import.


I guess I don't understand the bugmail. Most of the time I don't see any 
information about something failing. Am I supposed to look at the commit?

Or are new failures in bugmail like
"
Ran 2 tests and 44 subtests
OK : 2
PASS   : 34
FAIL   : 10
"

Are those 10 failures new failures, or failures from the test total?


That's the total failures. If that's all you see then nothing fell into 
one of the predefined categories of badness that get extra details added 
to the comment. If there is some information that you think should be 
present but is actually missing, please file an issue.

___
dev-platform mailing list
dev-platform@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-platform


Re: Intent to ship: Module scripts (ES6 modules)

2018-02-14 Thread James Graham

On 14/02/2018 14:13, jcoppe...@mozilla.com wrote:

I intend to turn on 

Re: Intent to ship: OpenType Variation Font support

2018-03-20 Thread James Graham

On 19/03/2018 22:32, Jonathan Kew wrote:
As of this week, for the mozilla-61 cycle, I plan to turn support for 
OpenType Font Variations on by default.


It has been developed behind the layout.css.font-variations.enabled and 
gfx.downloadable_fonts.keep_variation_tables preferences.


Other UAs shipping this or intending to ship it include:
   Safari (on macOS 10.13 or later)
   Chrome (and presumably other Blink-based UAs)
   MSEdge (on Windows 10 Fall Creators Update or later)

Bug to turn on by default: 
https://bugzilla.mozilla.org/show_bug.cgi?id=1447163


This feature was previously discussed in this "intent to implement" 
thread: 
https://groups.google.com/d/topic/mozilla.dev.platform/_FacI6Aw2BQ/discussion 


Are there now (cross-browser) tests for this feature?
___
dev-platform mailing list
dev-platform@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-platform


Re: Intent To Require Manifests For Vendored Code In mozilla-central

2018-04-10 Thread James Graham

On 10/04/2018 05:25, glob wrote:
mozilla-central contains code vendored from external sources. Currently 
there is no standard way to document and update this code. In order to 
facilitate automation around auditing, vendoring, and linting we intend 
to require all vendored code to be annotated with an in-tree YAML file, 
and for the vendoring process to be standardised and automated.



The plan is to create a YAML file for each library containing metadata 
such as the homepage url, vendored version, bugzilla component, etc. See 
https://goo.gl/QZyz4x for the full specification.


So we now have moz.build that in addition to build instructions, 
contains metadata for mozilla-authored code (e.g. bugzilla components) 
and moz.yaml that will contain similar metadata but only for 
non-mozilla-authored code, as well as Cargo.toml that will contain (some 
of) that metadata but only for code written in Rust.


As someone who ended up having to write code to update moz.build files 
programmatically, the situation where we have similar metadata spread 
over three different kinds of files, one of them Turing complete, 
doesn't make me happy. Rust may be unsolvable, but it would be good if 
we didn't have two mozilla-specific formats for specifying metadata 
about source files. It would be especially good if updating this 
metadata didn't require pattern matching a Python AST.

___
dev-platform mailing list
dev-platform@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-platform


Re: Intent To Require Manifests For Vendored Code In mozilla-central

2018-04-10 Thread James Graham

On 10/04/2018 14:34, Ted Mielczarek wrote:

On Tue, Apr 10, 2018, at 9:23 AM, James Graham wrote:

On 10/04/2018 05:25, glob wrote:

mozilla-central contains code vendored from external sources. Currently
there is no standard way to document and update this code. In order to
facilitate automation around auditing, vendoring, and linting we intend
to require all vendored code to be annotated with an in-tree YAML file,
and for the vendoring process to be standardised and automated.


The plan is to create a YAML file for each library containing metadata
such as the homepage url, vendored version, bugzilla component, etc. See
https://goo.gl/QZyz4x for the full specification.


So we now have moz.build that in addition to build instructions,
contains metadata for mozilla-authored code (e.g. bugzilla components)
and moz.yaml that will contain similar metadata but only for
non-mozilla-authored code, as well as Cargo.toml that will contain (some
of) that metadata but only for code written in Rust.

As someone who ended up having to write code to update moz.build files
programatically, the situation where we have similar metadata spread
over three different kinds of files, one of them Turing complete,
doesn't make me happy. Rust may be unsolvable, but it would be good if
we didn't have two mozilla-specific formats for specifying metadata
about source files. It would be especially good if updating this
metadata didn't require pattern matching a Python AST.


We are in fact rethinking the decision to put file metadata in moz.build files 
for these very reasons. I floated the idea of having it live in these same YAML 
files that glob is proposing for vendoring info since it feels very similar. I 
don't want to block his initial work on tangentially-related concerns, but I 
think we should definitely look into this once he gets a first version of his 
vendoring proposal working. I don't know if there's anything useful we can do 
about Cargo.toml--we obviously want to continue using existing Rust practices 
there. If there are specific things you need to do that are hard because of 
that I'd be interested to hear about them to see if there's anything we can 
improve.


That's great to hear! The main thing I currently have to do is 
automatically update bug component metadata when files move around 
during wpt imports. However one can certainly imagine having to script 
similar metadata updates. For example, I assume that wpt is not "third 
party" code according to the terms of this discussion, since it's also 
edited in-tree, and whatever tooling we have to support generic third 
party repos won't apply. But it would make sense to store the upstream 
revision of wpt in there rather than in a one-off custom file like we do 
currently.  So reusing the same moz.yaml format everywhere rather than 
having one case for "local" code and one for "remote" would make sense 
to me as someone maintaining what amounts to an edge case.

___
dev-platform mailing list
dev-platform@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-platform


Re: Intent to ship: PerformanceServerTiming

2018-04-24 Thread James Graham

On 24/04/2018 20:32, Valentin Gosu wrote:

Bug 1423495  is set
to land on m-c and we intend to let it ride the release train, meaning we
are targeting Firefox 61.

Chromium bug: https://bugs.chromium.org/p/chromium/issues/detail?id=702760

This affects web-compat, since per our "restrict new features to secure
origins policy" the serverTiming attribute will be undefined on unsecure
origins.
There is a bug on the spec to address this issue:
https://github.com/w3c/server-timing/issues/54

Link to the spec: https://w3c.github.io/server-timing/


What's the wpt test situation for this feature, and how do our results 
compare to other browsers?

___
dev-platform mailing list
dev-platform@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-platform


Re: Intent to ship: PerformanceServerTiming

2018-04-25 Thread James Graham

On 24/04/2018 22:36, Valentin Gosu wrote:
On 24 April 2018 at 22:44, James Graham wrote:

This affects web-compat, since per our "restrict new features to secure
origins policy" the serverTiming attribute will be undefined on unsecure
origins.
There is a bug on the spec to address this issue:
https://github.com/w3c/server-timing/issues/54

Link to the spec: https://w3c.github.io/server-timing/


What's the wpt test situation for this feature, and how do our
results compare to other browsers?


The WPT tests pass when run over HTTPS: 
https://w3c-test.org/server-timing/test_server_timing.html


If we are only supporting this in secure contexts, we should rename the 
test so that it has .https. in the filename which will cause it to be 
loaded over https when run (e.g. in our CI). If there is general 
agreement about restricting the feature to secure contexts, we should 
additionally add a test that it doesn't work over http.
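Concretely that rename would be something like (using the path from the
w3c-test.org link above):

    server-timing/test_server_timing.html
      -> server-timing/test_server_timing.https.html

and the "doesn't work over http" check would just be a separate test file
without the .https. infix.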


I can't imagine this would be controversial, but if it is we should at 
least ensure that there's a copy of the test set up to run over https.

___
dev-platform mailing list
dev-platform@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-platform


Re: Intent to ship: media-capabilities

2018-05-14 Thread James Graham

On 14/05/2018 16:19, Jean-Yves Avenard wrote:

Media Capabilities allow for web sites to better determine what content to 
serve to the end user.
Currently a media element offers the canPlayType method
(https://html.spec.whatwg.org/multipage/media.html#dom-navigator-canplaytype-dev)
to determine if a container/codec can be used. But the answer is limited to a
maybe/probably type answer.

It gives no ability to determine if a particular resolution can be played
well/smoothly enough or be done in a power-efficient manner (e.g. will it be
hardware accelerated).

This has been a particular problem with sites such as YouTube, which serves VP9
under all circumstances even if the user agent won't play it well (VP9 is
mostly done via software decoding and is CPU intensive). This has forced us to
indiscriminately disable VP9 altogether.
For YouTube to know that VP9 could be used for low resolution but not high-def
ones would allow them to select the right codec from the start.

This issue is tracked in bug 1409664
(https://bugzilla.mozilla.org/show_bug.cgi?id=1409664).

The proposed spec is available at https://wicg.github.io/media-capabilities/

Chrome shipped it a while ago now and, talking to several partners
(including YouTube, Netflix, Facebook etc.), Media Capabilities support has
been the number one request.



What is the testing situation for this feature? Do we have 
web-platform-tests?

___
dev-platform mailing list
dev-platform@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-platform


Re: Launch of Phabricator and Lando for mozilla-central

2018-06-07 Thread James Graham

On 06/06/2018 15:57, Mark Côté wrote:


Similarly, there are two other features which are not part of initial launch 
but will follow in subsequent releases:
* Stacked revisions. If you have a stack of revisions, that is, two or more 
revisions with parent-child relationships, Lando cannot land them all at once.  
You will need to individually land them. This is filed as 
https://bugzilla.mozilla.org/show_bug.cgi?id=1457525.


Have we considered the impact this will have on our CI load? If we 
currently have (say — I didn't bother to compute the actual number) an 
average of 2 commits per push, it seems like this change could increase 
the load on inbound by a corresponding factor of 2 (or perhaps less if 
the multiple-final-commit workflow is so bad that people start pushing 
fewer, larger, changes).

___
dev-platform mailing list
dev-platform@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-platform


Re: Fwd: WPT Developer Survey - June 2018

2018-06-20 Thread James Graham

On 19/06/2018 10:01, Andreas Tolfsen wrote:

If you run, write, or work with Web Platform Tests (WPT) in some
capacity, we would like to invite you to answer a short survey.

The survey helps us identify ergonomic problems so that we can
improve the tools for building an interoperable web platform.


Just to re-emphasise what Andreas said; I know there are a lot of 
surveys going on at the moment, and we had a Mozilla-internal wpt survey 
late last year, but if you work on gecko development or otherwise on 
web-platform features it will be really valuable to us if you take five 
minutes to respond to this survey and thus ensure that the cross-browser 
testing initiative understands, and can meet, the needs of the Mozilla 
community.



-- >8 --


From: Simon Pieters 
Subject: WPT Developer Survey - June 2018
Date: 19 June 2018 at 09:20:03 BST
To: "public-test-in...@w3.org" 
Resent-From: public-test-in...@w3.org

Hello public-test-infra!

We're gathering feedback about recent changes and pain points in
wpt. Please help us by filling out this survey (and passing it on
to others you know who work with wpt) so we can better prioritize
future work and improve the experience for everyone. We won't take
much of your time - promise!

https://goo.gl/forms/gO2hCgCMvqiAHCVd2

Thank you!


___
dev-platform mailing list
dev-platform@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-platform


Re: PSA: pay attention when setting multiple reviewers in Phabricator

2018-07-05 Thread James Graham

On 05/07/2018 18:19, Mark Côté wrote:
I sympathize with the concerns here; however, changing the default would 
be a very invasive change to Phabricator, which would not only be 
complex to implement but troublesome to maintain, as we upgrade 
Phabricator every week or two.


This is, however, something we can address with our new custom 
commit-series-friendly command-line tool. We are also working towards 
the superior solution of automatically selecting reviewers based on 
module owners and peers and enforcing this in Lando.


Automatically selecting reviewers sounds like a huge improvement, 
particularly for people making changes who haven't yet internalised the 
ownership status of the files they are touching (notably any kind of 
first-time or otherwise infrequent contributor to a specific piece of 
code). So I'm very excited about this change.


That said, basing it on the list of module owners & peers seems like it 
may not be the right decision for a number of reasons:


* The number of reviews for a given module can be very large and being 
unconditionally selected for every review in a module may be overwhelming.


* The list of module owners and peers is not uniformly well maintained 
(in at least some cases it suggests that components are owned by people 
who have not been involved with the project for several years). Although 
this should certainly be cleaned up, the fact is that the current data 
is not reliable in many cases.


* Oftentimes there is substructure within a module that means that some 
people should be reviewers in certain files/directories but have no 
knowledge of other parts.


* It usually desirable to have people perform code review for some time 
as part of the process of becoming a module owner or peer.


A better solution would be to have in-tree metadata files providing 
subscription rules for code review (e.g. a mapping of usernames to a 
list of patterns matching files). Module owners would be responsible for 
reviewing changes to these rules to ensure that automatic delegation 
happens to the correct people.
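To make that concrete, a sketch of the kind of file I have in mind (the
format, filename and entries are invented purely for illustration):

    # review-rules.yml (hypothetical)
    jgraham:
      - testing/web-platform/**
      - testing/mozbase/mozlog/**
    maja_zf:
      - testing/web-platform/tests/tools/wptrunner/**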

___
dev-platform mailing list
dev-platform@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-platform


Re: Intent to ship: Changes to how offset*, client*, scroll* behave on tables

2018-07-10 Thread James Graham

On 10/07/2018 17:25, Boris Zbarsky wrote:

Bug: https://bugzilla.mozilla.org/show_bug.cgi?id=820891

Summary: In other browsers, and arguably per spec as far as cssom-view 
specs things, various geometry APIs on tables should report values for 
the table wrapper, not the table itself, because they are defined to 
work on the "first" box generated by the element.  That means that the 
caption is included in the returned values and that things like 
clientWidth should include the table border, modulo the various 
box-sizing weirdness around tables.


Right now, we are just applying the geometry APIs to the table box 
itself.  The patches in the above bug change this.


The behavior of getBoundingClientRect and getClientRects is not being 
changed here, though there is lack of interop around it as well; I filed 
https://bugs.webkit.org/show_bug.cgi?id=187524 and 
https://bugs.chromium.org/p/chromium/issues/detail?id=862205 on that.


Our new behavior aligns much better with other browsers and the spec, 
but this is a general heads-up in case there is compat fallout due to 
browser-sniffing or something...


Are there web-platform-tests covering this behaviour (both the part we 
are changing and the part we aren't)?

___
dev-platform mailing list
dev-platform@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-platform


Re: PSA: Re-run old (non-syntax) try pushes with |mach try again|

2018-07-17 Thread James Graham

On 17/07/2018 21:16, Nicholas Alexander wrote:

Ahal,

On Tue, Jul 17, 2018 at 11:55 AM, Andrew Halberstadt wrote:


While |mach try fuzzy| is generally a better experience than try
syntax, there are a few cases where it can be annoying. One
common case was when you selected a bunch of tasks in the
interface and pushed. Then at a later date you wanted to push
the exact same set of tasks again. This used to be a really poor
experience as you needed to re-select all the same tasks
manually.

As of now, you can use |mach try again| instead. The general
workflow is:

This is awesome, thank you for building it!

Can it be extended to "named pushes"?  That is, right now I use my shell 
history to do `mach try fuzzy -q "'build-android | 'robocop", but nobody 
else will find that without me telling them, and it won't be 
automatically updated when robocop gets renamed.  That is, if I could 
`mach try fuzzy --named android-tier1` or something I could save myself 
some manual command editing and teach other people what a green try run 
means in my area.


./mach try fuzzy --save android-tier1 -q "'build-android | 'robocop"

And then run with

./mach try fuzzy --preset android-tier1

I think that's what you want? There isn't a way to share it or anything, 
but it works well for the use case of "I make the same set of try pushes 
repeatedly over many patches".

___
dev-platform mailing list
dev-platform@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-platform


Re: Developer Outreach - Web Platform Research and Recommendations

2018-07-26 Thread James Graham

On 26/07/2018 19:15, Dietrich Ayala wrote:


Why are we doing this?

The goals of this effort are to ensure that the web platform technologies we're 
investing in are meeting the highest priority needs of today's designers and 
developers, and to accelerate availability and maximize adoption of the 
technologies we've prioritized to meet these needs.


I think this is a great effort, and all the recommendations you make 
seem sensible.


Taking half a step back, the overriding goal seems to be to make 
developing for the web platform a compelling experience. I think one way 
to subdivide this overall goal is into two parts


* Ensure that the features that are added to the platform meet the 
requirements of content creators (i.e. web developers).


* Ensure that once shipped, using the features is as painless as 
possible. In particular for the web this means that developing content 
that works in multiple implementations should not be substantially more 
expensive than the cost of developing for a single implementation.


The first point seems relatively well covered by your plans; it's true 
that so far the approach to selecting which features to develop has been 
ad-hoc, and there's certainly room to improve.


The second point seems no less crucial to the long term health of the 
web; there is a lot of evidence that having multiple implementations of 
the platform is not a naturally stable equilibrium and in the absence of 
continued effort to maintain one it will drift toward a single dominant 
player and de-facto vendor control. The cheaper it is to develop content 
that works in many browsers, the easier it will be to retain this 
essential distinguishing feature of the web.


There are a number of things we can do to help ensure that the cost to 
developers of targeting multiple implementations is relatively low:


1) Write standards for each feature, detailed enough to implement 
without ambiguity.


2) Write a testsuite for each feature, ensure that it's detailed enough 
to catch issues and ensure that we are passing those tests when we ship 
a new feature.


3) Ensure that the performance profile of the feature is good enough 
compared to other implementations (in particular if it's relatively easy 
to hit performance problems in one implementation, that may prevent it 
being useful in that implementation even though it "works")


4) Ensure that developers using the feature have a convenient way to 
develop and debug the feature in each implementation.


5) Ensure that developers have a convenient way to do ongoing testing of 
their site against multiple different implementations so that it 
continues to work over time.


There are certainly more things I've missed.

On each of those items we are currently at a different stage of progress:

1) Compared to 14 years ago, we have got a lot better at this. Standards 
are usually written to be unambiguous and produce defined behaviour for 
all cases. Where they fall short of this we aren't always disciplined at 
providing feedback on the problems, and there are certainly other areas 
we can improve.


2) We now have a relatively well established cross-browser testsuite in 
web-platform-tests. We are still relatively poor at ensuring that 
features we implement are adequately tested (essentially the only 
process here is the informal one related to Intent to Implement emails) 
or that we actually match other implementations before we ship a feature.


3) Performance testing is obviously hard and whilst benchmarks are a 
thing, it's hard to make them representative of the entire gamut of 
possible uses of a feature. We are starting to work on more 
cross-browser performance testing, but this is difficult to get right. 
The main strategy seems to just to try to be fast in general. Devtools 
can be helpful in bridging the gap here if it can identify the cause of 
slowness either in general or in a specific engine.


4) This is obviously the role of devtools, making it convenient to 
develop inside the browser and possible to debug implementation-specific 
problems even where a developer isn't using a specific implementation 
all the time. Requiring devtools support for new features where it makes 
sense seems like a good step forward.


5) This is something we support via WebDriver, but it doesn't cover all 
features, and there seems to be some movement toward vendor-specific 
replacements (e.g. Google's Puppeteer), which prioritise the goal of 
making development and testing in a single browser easy at the expense 
of making cross-browser development and testing hard. This seems like an area 
where we need to do much better, by ensuring we can offer web developers 
a compelling story on how to test their products in multiple browsers.


So, to bring this back to your initiative, it seems that the only point 
above you really address is number 4 by recommending that devtools 
support is required for shipping new features. I fully agree that this 
is a good reco

Re: Developer Outreach - Web Platform Research and Recommendations

2018-07-31 Thread James Graham

On 27/07/2018 21:26, Dietrich Ayala wrote:

Additionally, much of what we're proposing is based directly on the
interviews we had with people in different roles in the development of the
web platform. Common themes were: lack of data for making
selection/prioritization decisions, visibility of what is in flight both in
Gecko and across all vendors, lack of overall coordination across vendors,
and visibility into adoption. Those themes are the first priority, and
drove this first set of actions.
Much of what you discuss is, as you noted, far better than in the past, so
maybe is why they didn't come up much in the interviews?


Without knowing what was in those interviews it's hard to conjecture 
about the reasons for any differences. All I can do is point out the 
issues I perceive.



2) Write a testsuite for each feature, ensure that it's detailed enough

to catch issues and ensure that we are passing those tests when we ship a
new feature.

2) We now have a relatively well established cross-browser testsuite in

web-platform-tests. We are still relatively poor at ensuring that features
we implement are adequately tested (essentially the only process here is
the informal one related to Intent to Implement emails) or that we actually
match other implementations before we ship a feature.

Can you share more about this, and some examples? My understanding is that
this lies mostly in the reviewer's hands. If we have these testsuites, are
they just not in automation, or not being used?


We have the testsuites and they are in automation, but our CI 
infrastructure is only designed to tell us about regressions relative to 
previous builds; it's not suitable for flagging general issues like "not 
enough of these tests pass".


Comparisons between browsers are (as of recently) available at wpt.fyi 
[1], but we don't have any process that requires people to look at this 
data.


We also know that many features which have some tests don't have enough 
tests (e.g. a recent XHR bugfix didn't cause any tests to start passing, 
indicating a problem with the coverage). This is a hard problem in 
general, of course, but even for new features we don't have any 
systematic approach to ensuring that the tests actually cover the 
feature in a meaningful way.



3) Ensure that the performance profile of the feature is good enough

compared to other implementations (in particular if it's relatively easy to
hit performance problems in one implementation, that may prevent it being
useful in that implementation even though it "works")

3) Performance testing is obviously hard and whilst benchmarks are a

thing, it's hard to make them representative of the entire gamut of
possible uses of a feature. We are starting to work on more cross-browser
performance testing, but this is difficult to get right. The main strategy
seems to just to try to be fast in general. Devtools can be helpful in
bridging the gap here if it can identify the cause of slowness either in
general or in a specific engine.

There is a lot of focus and work on perf generally, so not something that
really came up in the interviews. I'm interested in learning about gaps in
developer tooling, if you have some examples.


I note that there's a difference between "perf generally" and 
"compatibility-affecting perf" (although both are important and the 
latter is a subset of the former). Perf issues affect compatibility when 
they don't exist in other engines with sufficiently high combined 
marketshare. So something that is slow in Firefox but fast in all other 
browsers is likely to be used in real sites, whereas a feature that's 
fast in Firefox but slow in all other browsers probably won't get used 
much in the wild.


In terms of specific developer tooling, I don't have examples beyond the 
obvious that developers should be able to profile in a way that allows 
them to figure out which part of their code is causing slowness in 
particular implementations, in much the same way you would expect in 
other development scenarios.



4) Ensure that developers using the feature have a convenient way to

develop and debug the feature in each implementation.

4) This is obviously the role of devtools, making it convenient to

develop inside the browser and possible to debug implementation-specific
problems even where a developer isn't using a specific implementation all
the time. Requiring devtools support for new features where it makes sense
seems like a good step forward.

We've seen success and excitement when features are well supported with
tooling. We're asserting that *always* shipping tooling concurrently with
features in release will amplify adoption.


I entirely agree that coordinating tooling with features makes sense.


5) Ensure that developers have a convenient way to do ongoing testing of

their site against multiple different implementations so that it continues
to work over time.

5) This is something we support via WebDriver, but it doesn't cover all

Re: Developer Outreach - Web Platform Research and Recommendations

2018-08-01 Thread James Graham

On 31/07/18 10:34, James Graham wrote:
One of the underlying concerns I have here is that there are a lot of 
separate groups working on different parts of this. As one of the people 
involved, I would nevertheless struggle to articulate the overall 
Mozilla strategy to ensure that the web remains a compelling platform 
supporting multiple engine implementations. I believe it's important 
that we do better here and ensure that all these different teams have a 
shared understanding to set processes and priorities.


As a followup, I just came across [1], which details an explicit 
cross-functional strategy at Google for developer experience and web 
compatibility.


[1] 
https://medium.com/ben-and-dion/mission-improve-the-web-ecosystem-for-developers-3a8b55f46411

___
dev-platform mailing list
dev-platform@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-platform


Re: ./mach try fuzzy: A Try Syntax Alternative

2018-08-06 Thread James Graham

On 06/08/2018 01:25, Botond Ballo wrote:

Is there an easy way to do a T-push (builds on all platforms, tests on
one platform only) with |mach try fuzzy|?

I usually do T-pushes using try syntax, by Trychooser seems to be out
of date when it comes to building a T-push syntax for Android, so I'm
at a loss as to how to do a T-push for Android right now.


There are a couple of options. Interactively you can select all the 
builds you want, press ctrl+a (or whatever the select-all keybinding you 
have configured is), then do the same again with the tests you want, 
then accept all your choices.


If you want to construct a single query string that can be reused with 
--save, something like 'test-linux64 | build !ccov !pgo !msvc' seems to 
select all builds and tests just on linux64. Unfortunately I can't 
figure out any way to logically group expressions, which does make 
composing multiple terms more tricky.
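For example (preset name purely illustrative):

    ./mach try fuzzy --save linux64-T -q 'test-linux64 | build !ccov !pgo !msvc'
    ./mach try fuzzy --preset linux64-T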

___
dev-platform mailing list
dev-platform@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-platform


Re: Allowing web apps to delay layout/rendering on startup

2015-08-04 Thread James Graham

On 03/08/15 16:46, Bobby Holley wrote:

On Mon, Aug 3, 2015 at 12:37 AM, Jonas Sicking  wrote:


On Mon, Aug 3, 2015 at 12:32 AM, Anne van Kesteren 
wrote:

On Mon, Aug 3, 2015 at 9:23 AM, Jonas Sicking  wrote:

I think something like a  might be a
simpler solution here. Coupled with either simply removing the 
from the DOM, or having a function which indicates that rendering is
ok.


Neither of those deal well with multiple libraries being included in
the page, which is likely going forward with custom elements et al.


I suspect it's better for the various components to indicate when they
are done loading and let the page indicate which components are
critical, and which ones aren't.



Agreed. I think it would be very strange for a library to block all
rendering. The  tag (with removal to indicate readiness) sounds good
to me - then we don't even need a separate event.


I am extremely wary of designing a solution like this where there's a 
single master switch that any code can unilaterally flip; if the 
assumption that libraries will never want to delay rendering turns out 
to be false it will force page authors to deal with N library-specific 
protocols to indicate that they are no longer blocking rendering, and 
give any one component that ability to override all others.


Unrelatedly, I assume that people are thinking of this as a hint to the 
UA rather than an absolute requirement. I would certainly expect the UA 
to render in any case after some timeout so that sites with some mild 
brokenness that causes them not to unset the no-render flag for whatever 
reason (e.g. browser-specific codepaths and insufficient testing) are 
still actually usable.

___
dev-platform mailing list
dev-platform@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-platform


Re: Intent to ship: referrerpolicy attribute

2015-12-02 Thread James Graham

On 02/12/15 11:16, Franziskus Kiefer wrote:

There are web-platform-tests [1], though they're not up to date with the
spec. In particular, they still use |referrer| as attribute name instead of
|referrerpolicy|. The idl name is referrerPolicy, is that the
capitalisation issue you mean?

So there are no tests for interoperability at the moment.


It sounds like we need to fix these tests as part of this implementation 
work.


___
dev-platform mailing list
dev-platform@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-platform


Re: Too many oranges!

2015-12-22 Thread James Graham

On 22/12/15 17:22, Andrew Halberstadt wrote:

FWIW a summary of top orangefactor[1] oranges are posted regularly
to dev.tree-alerts. Configuring it to also post to
dev.platform is certainly possible if that's what people want. Though I
have a feeling that people will mostly ignore these emails anyway if we do.


I think previous discussions on this topic have revealed that people 
mostly ignore those emails if they're not personally actionable. So 
whilst there are a few people who heroically go through all of the OF 
top bugs and try to get traction on fixes, we would get much better 
results from individual mail to developers or teams showing all the top 
intermittents which they are responsible for.


With that in mind, it's perhaps worth detailing some of the improvements 
we have in mind for treeherder in 2016:


1) Automatic classification of intermittent failures to help sheriffs on 
integration trees, and quickly identify failures that are new on try 
pushes. Potentially, automatic notification when an intermittent 
increases in frequency,


2) A replacement for Orange Factor, based on the same data as the 
autoclassification, that will make it easier to present personalised 
(or, at least path-filtered) lists of top intermittents. These could be 
posted to specific lists where most subscribers would get fully 
actionable information.

___
dev-platform mailing list
dev-platform@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-platform


Re: unowned orange by team

2015-12-23 Thread James Graham

On 23/12/15 01:15, Ben Kelly wrote:

Hi all,

In an attempt to wrangle some of the orange plaguing the tree I've tried to
triage the top unowned bugs by team.

ateam/releng:

[…]

10) https://bugzilla.mozilla.org/show_bug.cgi?id=1231798


This is a web-platform-tests test which is an interesting case. De-facto 
I am on point for all problems in these tests, which is fine, but in 
practice there's a limited amount I can do. If the test is broken in 
some obvious way I can fix it, or I can disable it. In some cases I know 
the feature under test well enough to do something a bit more clever. 
But, in order to get the right fix, it would be better to loop in the 
people in the team responsible for the feature under test rather than 
just considering all wpt failures to be ateam's responsibility.

___
dev-platform mailing list
dev-platform@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-platform


Re: wptview [Re: e10s]

2016-01-08 Thread James Graham

On 08/01/16 22:41, Robert O'Callahan wrote:

On Sat, Jan 9, 2016 at 10:27 AM, Benjamin Smedberg 
wrote:


What are the implications of this?

The web-platform tests are pass/fail, right? So is it a bug if they pass
but have different behaviors in e10s and non-e10s mode?



Yeah, I'm confused.

If a wpt test passes but with different output, then either there is no
problem or the test is incomplete and should be changed.


Maybe I should clarify.

web-platform-tests are slightly different to most tests in that we run 
both tests we currently pass and tests that we currently don't pass. On 
treeherder all we check is that we got the same result in this run as we 
expected on the basis of previous runs. That result might be pass but 
might also be FAIL, ERROR, TIMEOUT, or even CRASH. So they are pass/fail 
from the point of view of "did we meet the expectation value", but the 
expectation value itself might not be a PASS (e.g. expected FAIL got 
PASS would turn treeherder orange, as would expected CRASH got ERROR).


For e10s runs we have the ability to set different expectation values 
than for non-e10s runs. This means that we can continue to run tests 
that behave differently in e10s and only disable unstable ones. This has 
the advantage that we will catch some additional types of regression 
e.g. one that causes a test that PASSes in non-e10s, previously FAILed 
in e10s and starts to CRASH in e10s whilst still PASSing in non-e10s. 
These would be missed if we just disabled all tests with differing 
behaviour.


The effect of all of this is that in order to understand what's actually 
needed to bring e10s builds up to par with non-e10s builds you need to 
look at the actual test results rather than just the list of disabled 
tests. I believe that there are both instances of tests that pass in 
non-e10s but not in e10s builds, and the reverse. wptview gives you the 
ability to do that using data directly from treeherder. The actual 
action to take on the basis of this data is obviously something for the 
people working on e10s to determine.


I hope that clarifies things somewhat?

Whilst I am here, it's always worth calling out contributions; wptview 
is Kalpesh's ateam "Quarter of Contribution" project and he has done 
great work.


P.S. I am currently on leave and will remain so until 18th Jan, so don't 
be surprised if I am unresponsive to follow-ups until then. Ms2ger is a 
good person to ask web-platform-tests questions to in the interim.


___
dev-platform mailing list
dev-platform@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-platform


Re: wptview [Re: e10s]

2016-01-09 Thread James Graham

On 09/01/16 15:43, Benjamin Smedberg wrote:

On 1/8/2016 6:02 PM, James Graham wrote:

On 08/01/16 22:41, Robert O'Callahan wrote:

On Sat, Jan 9, 2016 at 10:27 AM, Benjamin Smedberg

wrote:


What are the implications of this?

The web-platform tests are pass/fail, right? So is it a bug if they
pass
but have different behaviors in e10s and non-e10s mode?



Yeah, I'm confused.

If a wpt test passes but with different output, then either there is no
problem or the test is incomplete and should be changed.


Maybe I should clarify.

web-platform-tests are slightly different to most tests in that we run
both tests we currently pass and tests that we currently don't pass.
On treeherder all we check is that we got the same result in this run
as we expected on the basis of previous runs.


Is this "same as previous run" behavior automatic, or manually
annotated? Running tests which don't pass is supported and common on
many other test suites: fail-if and random-if are used to mark tests as
a known fail but still run them.


It is semi-automated. There are explicit annotations in separate 
metadata files for each test which have to be updated by hand (or using 
output from running the affected tests) when a feature introduces 
different test results (e.g. by fixing tests or adding new non-passing 
ones), but which are automatically generated using a try run for an 
otherwise known-good build when we update to a new version of the 
testsuite from upstream.



Is this a temporary state, where the end goal is to have the
web-platform tests use similar manifests to all of our other tests? Can
you provide some context about why web-platform tests are using
different infrastructure from everywhere else?


I will first note that "similar manifests to our other tests" isn't very 
specific; we already use multiple manifest formats. I will assume you 
mean manifestparser manifests as used for mochitest, but note that 
web-platform-tests contain a mix of both js-based tests and reftests, so 
a single existing format would be insufficient; below I will mainly 
concentrate on the tests that could be well described by 
manifestparser-style manifests, although much applies to both cases.


Unfortunately the web-platform-tests have some rather different 
constraints to other testsuites that make the manifestparser format 
insufficient.


web-platform-tests js tests can contain multiple tests per file 
(sometimes many thousands), so purely per-file metadata is inadequate. 
As I understand it, for other test types we supplement this with in-file 
annotations. In order for us to bidirectionally sync web-platform-tests 
it is essential that we never make local modifications to the test files 
other than intentional bugfixes or additions that are suitable to be 
upstreamed. This means that we must be able to set the expected result 
for each subtest (i.e. individual testcase within a file) in a separate 
local-only file. This is not supported in manifestparser files, nor did 
it seem easy to add.


The restriction on not modifying tests also means that things like prefs 
cannot be set in the tests themselves; it is convenient to use the same 
expectation data files to store this additional information. Rather more 
trivially web-platform-tests may CRASH or ERROR in production, which 
other test types cannot. Obviously support for this would be easier to 
add to manifestparser, but support not existing prevents confusion for 
the multiple test types where the feature wouldn't make sense.
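To make the shape of this concrete, one of these expectation files looks 
roughly like the following (test, subtest and pref names are invented, 
and the exact keys are from memory, so treat it as a sketch rather than 
a reference):

    [example.html]
      prefs: [dom.example.enabled:true]
      expected:
        if e10s: ERROR
        OK
      [name of a subtest we don't currently pass]
        expected: FAIL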


At this point I don't see any real advantages to trying to move to 
manifestparser for all web-platform-tests and many drawbacks, so I don't 
think it will happen. I am also not convinced that it's very relevant to 
the problem at hand; I don't see how the different manifest format is 
causing any issues. Indeed, now that most of our testsuites produce 
structured log output, you don't actually need to look at the input 
manifests at all.


The right thing to do is look at the log files produced from a test run. 
This is what wptview provides a GUI for, and what the test-informant 
tool ahal mentioned elsewhere does on the backend, but anyone with a 
little bit of time and a basic knowledge of the mozlog format (and 
treeherder API, perhaps) could have a go at making a one-off tool to 
answer this specific question efficiently. To do this one would consume 
all the structured logs for the e10s and non-e10s jobs on a push, and 
look for cases where the result is different for the same test in the 
two run types (this would also cover disabled tests that are recorded as 
SKIP).



The effect of all of this is that in order to understand what's
actually needed to bring e10s builds up to par with non-e10s builds
you need to look at the actual test results rather than just the list
of disabled tests. I believe that there are both instances of tests
that pass in non-e10s b

Re: e10s

2016-01-20 Thread James Graham

On 09/01/16 22:29, James Graham wrote:

At this point I don't see any real advantages to trying to move to
manifestparser for all web-platform-tests and many drawbacks, so I don't
think it will happen. I am also not convinced that it's very relevant to
the problem at hand; I don't see how the different manifest format is
causing any issues. Indeed, now that most of our testsuites produce
structured log output, you don't actually need to look at the input
manifests at all.

The right thing to do is look at the log files produced from a test run.
This is what wptview provides a GUI for, and what the test-informant
tool ahal mentioned elsewhere does on the backend, but anyone with a
little bit of time and a basic knowledge of the mozlog format (and
treeherder API, perhaps) could have a go at making a one-off tool to
answer this specific question efficiently. To do this one would consume
all the structured logs for the e10s and non-e10s jobs on a push, and
look for cases where the result is different for the same test in the
two run types (this would also cover disabled tests that are recorded as
SKIP).


So I wrote a script to do this i.e. to produce a complete list of the 
differences between e10s and non-e10s results in a given treeherder run. 
You can get it at [1].


The repository also contains some sample output from a randomly selected 
treeherder run; the .txt file is designed for human consumption; each 
line has the format


test name | subtest name | non-e10s result | e10s result | platforms
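If anyone wants to do something similar by hand, the core of the idea is 
small. A rough sketch (not the actual script, which also handles fetching 
logs from treeherder and grouping by platform), assuming you have already 
saved the mozlog structured-log files for the two job types locally:

    import json
    import sys

    def load_results(log_paths):
        """Map (test, subtest) -> result, read from mozlog structured logs."""
        results = {}
        for path in log_paths:
            with open(path) as f:
                for line in f:
                    try:
                        data = json.loads(line)
                    except ValueError:
                        continue
                    if data.get("action") == "test_status":
                        results[(data["test"], data["subtest"])] = data["status"]
                    elif data.get("action") == "test_end":
                        results[(data["test"], None)] = data["status"]
        return results

    # Arguments are comma-separated lists of log files for the two job types
    non_e10s = load_results(sys.argv[1].split(","))
    e10s = load_results(sys.argv[2].split(","))

    # Print only the tests/subtests where the two runs disagree
    for key in sorted(set(non_e10s) & set(e10s),
                      key=lambda k: (k[0], k[1] or "")):
        if non_e10s[key] != e10s[key]:
            test, subtest = key
            print("%s | %s | %s | %s" % (test, subtest or "",
                                         non_e10s[key], e10s[key]))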

Note that reftest harness based test types are currently not supported 
pending bug 1034290.


[1] https://github.com/jgraham/e10s-compare

___
dev-platform mailing list
dev-platform@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-platform


Re: To bump mochitest's timeout from 45 seconds to 90 seconds

2016-02-10 Thread James Graham

On 09/02/16 19:51, Marco Bonardo wrote:

On Tue, Feb 9, 2016 at 6:54 PM, Ryan VanderMeulen  wrote:


I'd have a much easier time accepting that argument if my experience
didn't tell me that nearly every single "Test took longer than expected" or
"Test timed out" intermittent ends with a RequestLongerTimeout as the fix



this sounds equivalent to saying "Since we don't have enough resources (or
a plan) to investigate why some tests take so long, let's give up"... But
then maybe we should have that explicit discussion, rather than assuming
it's a truth.


FWIW I think it's closer to the truth to say that these tests are not 
set up to be performance regression tests and as such they are difficult 
to use as incidental tests for that use case. For example we don't run 
them on machines with well-defined performance characteristics, don't 
make any effort to keep the tests themselves unchanged over time, and 
don't track the test runtime carefully in order to notice regressions. 
Using the test timeout as a threshold is a poor substitute because we 
will miss large regressions that nevertheless allow the test to finish 
inside the timeout, but be indirectly alerted (via intermittency) to 
smaller regressions, or test changes, for tests that were already close 
to the limit.


This isn't to say that we should never care if tests get slower, but 
that isn't a thing that we can reliably determine in our current setup, 
except in very coarse ways that are ineffective at directing engineering 
time onto the most important issues.


___
dev-platform mailing list
dev-platform@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-platform


Re: Triage Plan for Firefox Components

2016-04-01 Thread James Graham

On 01/04/16 01:02, Emma Humphries wrote:

I've responded to a similar comment in the google doc, but I'll repeat it
here.

Priority sounds like a great choice, but given that everyone's using the
Priority field their own way, there's enough heterogeneity in how it's used
to make it difficult.


I kind of feel like this is the story of Bugzilla. Every time there's a 
desire to do something slightly new the path of least resistance is to 
add yet another UI element for that one special case. Now 18 years later 
the interface is a confusing mess, and fields are either irrelevant or 
critically important depending on which component the bug happens to be in.


Which is not to argue for a particular solution here. I don't have a 
strong opinion. But once we have picked something, can we at least try 
to remove any UI that is more-or-less vestigial given that decision and, 
at least briefly, fight entropy by making things simpler and more 
consistent, rather than the reverse?

___
dev-platform mailing list
dev-platform@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-platform


Re: Why is Mozreview hassling me about squashed commits?

2016-04-02 Thread James Graham

On 02/04/16 21:59, Gregory Szorc wrote:

When you say "I almost never want to review individual commits and instead
want to review the changeset as a single diff," I'm confused because a
commit is a changeset (in Mercurial terms at least) and this statement is
contradictory. You seem to be saying that you want to look at a series of
changesets/commits as a single diff? (I'm guessing your definition of
"changeset" doesn't match mine.) Anyway, you seem to be advocating for the
GitHub model where there are N commits on a pull/review request but most
people typically only look at the aggregate/final diff.


FWIW in my experience the GitHub model is that people --amend every 
commit, and reviewing using the built-in tools is almost impossible 
because history keeps getting lost. That is just a deficiency of GitHub, 
of course, but it means that your comparison doesn't make much sense to me.



The widely practiced Firefox code review model is to try to ensure that
commits that reviewers see - whether on Bugzilla or MozReview - are as
close to their final, landing state as possible. In my opinion, the model
of submitting all the original, intermediate, to-be-thrown-away commits
adds all kinds of UI/UX challenges, overhead, and dissonance to the code
review process. It is much simpler for reviewers and the tooling to require
history rewriting occur on the client side and for the proposed, final
commits - and only the proposed, final commits - to be the thing exposed to
MozReview.


This seems like a mostly orthogonal issue to wanting to see the effect 
of squashing multiple commits. A patch author might have chosen to break 
their change into a number of small commits that build individually, yet 
a reviewer may want to see how they fit together into a coherent whole. 
Without this it's possible to end up in a "can't see the forest for the 
trees" situation, where each change looks OK on its own, but the overall 
patch series is missing an important element. Noticing what *isn't* 
present is much harder than verifying what is, and so we should make 
tools that can help reviewers as much as possible.


___
dev-platform mailing list
dev-platform@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-platform


Re: PSA: Cancel your old Try pushes

2016-04-15 Thread James Graham

On 15/04/16 18:09, Tim Guan-tin Chien wrote:

I wonder if there is any use cases to do multiple Try pushes of different
changesets but with the same bug number. Should we automatically cancel the
old ones when there is a new one?


Unfortunately there are legitimate uses for e.g. comparing the effects 
of two different changesets related to the same bug.


On the other hand, without thinking too hard about the implementation 
details (which I am inclined to believe would be more complex than you 
might expect due to missing APIs, auth, etc.), it seems like it might be 
possible to extend |mach try| to prompt to cancel old pushes for the 
same bug.


___
dev-platform mailing list
dev-platform@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-platform


Re: Please use "web-platform-tests --manifest-update" for updating wpt tests

2016-04-20 Thread James Graham

On 20/04/16 13:53, Josh Matthews wrote:

Servo has a script [1] that runs on the build machine that executes
--manifest-update and checks whether the contents of MANIFEST.json is
different before and after. We could do the same for Gecko and make it
turn the job orange on treeherder.


I plan to add this, along with the lint from upstream, once it is easy 
to add specific lint jobs to treeherder; aiui a general framework for 
adding this kind of job is currently in progress.


The problem with changes to reftests producing huge diffs in the 
manifest is now fixed upstream, and will be pulled in with the next wpt 
update.
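
For Gecko, a rough sketch of the same before/after check (not the Servo 
script; the manifest path and the exact command for regenerating the 
manifest without running tests are assumptions) could be as simple as:

// Sketch: fail if regenerating the wpt manifest changes the checked-in copy.
const { execSync } = require("child_process");
const fs = require("fs");

const manifest = "testing/web-platform/meta/MANIFEST.json"; // assumed path
const before = fs.readFileSync(manifest, "utf8");

// Assumed invocation; substitute whatever regenerates the manifest
// without also running the tests.
execSync("./mach web-platform-tests --manifest-update", { stdio: "inherit" });

const after = fs.readFileSync(manifest, "utf8");
if (before !== after) {
  console.error("MANIFEST.json is out of date; please regenerate it.");
  process.exit(1);
}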


___
dev-platform mailing list
dev-platform@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-platform


Re: Please use "web-platform-tests --manifest-update" for updating wpt tests

2016-04-20 Thread James Graham

On 20/04/16 14:13, Nathan Froyd wrote:

On Wed, Apr 20, 2016 at 8:59 AM, James Graham  wrote:

On 20/04/16 13:53, Josh Matthews wrote:

Servo has a script [1] that runs on the build machine that executes
--manifest-update and checks whether the contents of MANIFEST.json is
different before and after. We could do the same for Gecko and make it
turn the job orange on treeherder.


I plan to add this, along with the lint from upstream, once it is easy to
add specific lint jobs to treeherder; aiui a general framework for adding
this kind of job is currently in progress.


We can already do this, no?  We have an ESLint job in tree:

https://dxr.mozilla.org/mozilla-central/source/testing/taskcluster/tasks/branches/base_jobs.yml#276


Yes, it's already possible, but I think that ahal has patches that will 
make it easier. I'm hopeful that waiting for that will be a better 
long-term option.


___
dev-platform mailing list
dev-platform@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-platform


Re: PSA: Cancel your old Try pushes

2016-04-26 Thread James Graham

On 15/04/16 16:47, Ryan VanderMeulen wrote:

I'm sure most of you have experienced the pain of long backlogs on Try
(Windows in particular). While we'd all love to have larger pools of test
machines (and our Ops people are actively working on improving that!), one
often-overlooked thing people can do to help with the backlog Right Now is
to cancel pending jobs on pushes they no longer need (i.e. newer push to
Try, broken patch, already pushed to inbound, etc).


Based on a conversation yesterday, it seems that the features of |mach 
try| are not well known. In particular it allows running only a subset 
of tests in cases that you are doing an experimental push that you 
expect to affect mainly one area of the code. For example:


mach try -b do -p linux64 dom

would run every test under dom/ on linux64 only. The other command line 
arguments work like trychooser syntax. For technical reasons the 
resulting tests will all be run in a single chunk (hopefully TaskCluster 
will eventually allow this limitation to be lifted).


___
dev-platform mailing list
dev-platform@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-platform


Autoclassification of intermittent failures live on treeherder

2016-04-27 Thread James Graham
Autoclassification of (a subset of) intermittent failures is now running 
on treeherder. You may have spotted that some jobs are now annotated 
with a hollow star symbol; this means that the autoclassifier matched 
all the error lines in that job with previously observed intermittents. 
The star will become filled once a sheriff has verified the matches made 
by the autoclassifier.


In the short term this feature should not have a big impact on most 
developers; the main benefit is that the unfilled star will indicate a 
try orange that is entirely due to known intermittents.


The UI used to interact with the autoclassifier is hidden behind a URL 
parameter; add &autoclassify to your treeherder URL to see it. This UI 
is mainly for sheriffing, but can also be used to see if a job was 
partially classified. Note that saving classifications in this UI will 
affect what is classified in the future, across trees, so avoid saving 
anything that may not be an intermittent.


The current classification is very basic; it only matches test failures, 
and only where the test/status/message exactly match a previous 
classification. It also relies on structured logging. As a result, many 
relatively-common failures are not supported:


* Test failures with a variable message
* Crashes
* Suites that don't use structured logging at all or use it in a way 
that doesn't maintain the required invariants (notably reftests)

* Logging messages

Work to improve the range of supported scenarios is in progress. Once we 
are happy with the way that the system is working, it will be possible 
to integrate with other engineering productivity tools such as autoland.
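
To make the current matching rule concrete, here is a rough sketch (purely 
illustrative, not the treeherder implementation; the field names just 
mirror the description above) of what "all error lines match a previously 
observed intermittent" means:

// `knownIntermittents` stands in for previously saved classifications.
function lineMatches(errorLine, knownIntermittents) {
  return knownIntermittents.some(known =>
    known.test === errorLine.test &&
    known.status === errorLine.status &&
    known.message === errorLine.message);
}

function jobFullyClassified(errorLines, knownIntermittents) {
  // The hollow star only appears when every error line matched.
  return errorLines.length > 0 &&
         errorLines.every(line => lineMatches(line, knownIntermittents));
}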

___
dev-platform mailing list
dev-platform@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-platform


Re: KeyboardEvent question for docs update

2016-07-12 Thread James Graham

On 12/07/16 05:41, smaug wrote:

On 07/12/2016 07:24 AM, Eric Shepherd wrote:



I can continue to provide the per-OS information (I'd kind of like to --
but I have to consider the time involved), but if it's only marginally
helpful, it may not be worth the maintenance cost, so I'd like to see if
there are opinions on that.

I think the information is super useful, and I'm not aware of other
places in the web where
all that information is available. (but maybe there is some other
documentation somewhere)


I also found that this was the best (in the sense of "easiest to find 
using a search engine") resource containing this information when I was 
looking for it recently. If, however, the same information exists in 
other more canonical locations that I was simply unable to locate, I 
wouldn't object to the MDN page simply linking to these resources.


___
dev-platform mailing list
dev-platform@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-platform


Re: Intent to implement: CSS Houdini - Properties & Values API Level 1

2016-07-25 Thread James Graham

On 25/07/16 16:48, Daniel Holbert wrote:

On 07/25/2016 07:11 AM, Ms2ger wrote:

Hey Jonathan,

[...]

Do we know how other vendors feel about this?


Sentiment seems to be positive.

Browser vendors are collaborating on developing the Houdini specs, and I
haven't heard any serious reservations on this spec. (This is among the
more simple/stable of the Houdini family of specs.)

I believe we're not the only ones working on an implementation, too --
Google has a work-in-progress implementation of the Houdini "CSS Paint"
API (with a brief demo video here [1]), and that API layers on top of
this feature ("css properties & values"), which I think means they're
also working on implementing this feature.


Are there automated tests that will be shared with other vendors (and
Servo)?


There are some reftests on the bug [2] (final patch).


AIUI if you move those tests to somewhere under 
layout/reftests/w3c-css/submitted then dbaron will upstream them to the 
CSSWG testsuite at some point.


Alternatively I have no qualms about putting them in web-platform-tests 
where there are fewer metadata requirements and they will certainly be 
shared, but CSSWG might grumble at you.


Either way we should put them somewhere where it is possible for other 
vendors to check for interoperability.

___
dev-platform mailing list
dev-platform@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-platform


Re: Reorganization of Firefox-UI tests in mozilla-central

2016-09-02 Thread James Graham

On 02/09/16 10:37, Gijs Kruitbosch wrote:

On 02/09/2016 08:08, Henrik Skupin wrote:



The problematic piece here will be the package-tests step which
currently picks complete subfolders. It would mean if we mix-up tests
for firefox-ui-tests and eg. mochitests all would end-up twice in the
common.tests.zip archive. If we want to get away from using subfolders
we would have to improve the test archiver
(https://dxr.mozilla.org/mozilla-central/source/python/mozbuild/mozbuild/action/test_archive.py)

first to not only collect the directories of referenced manifests, but
also only pick those tests which are referenced and leave all others
behind. This would apply to all test suites currently covered by this
mozbuild action.


I am not familiar with this bit of our build architecture, but as far as
I can tell from a quick look it builds bits of the zipfile off the
objdir. So it collects mochitests from $OBJDIR/_tests/testing/mochitest,
where (again, AFAICT) things get installed via manifests.

We should be doing the same for firefox-ui tests. Not just to avoid
duplication of files in archives, but because otherwise, if we want to
add new tests somewhere else, we both have to add the manifests to the
build system and then modify this build system python file to make sure
they get included in the test archive. That would be wrong.


In the medium term we are trying to move away from requiring the 
package-tests step in favour of being able to run tests directly from a 
checkout. Therefore I suggest we avoid adding unnecessary dependencies 
on the objdir.

___
dev-platform mailing list
dev-platform@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-platform


Re: What platform features can we kill?

2013-10-10 Thread James Graham

On 10/10/13 15:28, Axel Hecht wrote:

On 10/10/13 2:43 PM, Jeff Walden wrote:

On 10/10/2013 02:27 PM, Axel Hecht wrote:

I agree with the sentiment, but not on the example.

Having been a peer of the XSLT module back in the days, we started
with a rather js DOM like implementation, and moved over to a pure
nsIContent etc impl, and each step there won us an order of magnitude
in perf.

But do we actually care about the perf of sites that use XSLT now, as
long as perf isn't completely abysmal?  A utility company showing
billing statements, I think we can slow down without feeling guilty.
But if, say, Google Maps or whichever used XSLT (I seem to remember
*something* Google used it, forcing Presto to implement XSLT, back in
the day -- maybe they've switched now, blink thread might say if I
checked it), we might care.

Jeff

My point is, the perf was completely abysmal, and the key is to use
nsINodeInfo for the xpath patterns instead of DOM localName and
namespaceURI string comparisons. There's also a benefit from using the
low-level atom-nsID-based content creation APIs.


Nevertheless it seems worth trying — at least in an experimental way — 
in case performance improvements of js and DOM APIs in the interim have 
made the difference small enough not to matter. If they haven't, that's 
interesting data on its own.


It may also be sufficient to adopt a presto-like XSLT implementation 
where (iirc; I haven't tested and don't remember too well) you just 
construct a string and feed it back through the HTML parser rather than 
trying to work on the output tree directly.
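
As a very rough illustration of that serialize-and-reparse idea (everything 
here is hypothetical; xsltToMarkupString stands in for a JS implementation 
of the transform that produces text rather than nodes):

// Hypothetical sketch of the "construct a string, reparse it" approach.
function applyTransform(sourceDoc, xsltToMarkupString, container) {
  const markup = xsltToMarkupString(sourceDoc);
  // Hand the string to the HTML parser instead of constructing the
  // output tree node-by-node.
  container.innerHTML = markup;
}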


___
dev-platform mailing list
dev-platform@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-platform


Re: Pushes to Backouts on Mozilla Inbound

2013-11-05 Thread James Graham

On 05/11/13 14:57, Kyle Huey wrote:

On Tue, Nov 5, 2013 at 10:44 PM, David Burns  wrote:


We appear to be doing 1 backout for every 15 pushes on a rough average[4].
This number I am sure you can all agree is far too high especially if we
think about the figures that John O'Duinn suggests[5] for the cost of each
push for running and testing. With the offending patch + backout we are
using 508 computing hours for essentially doing no changes to the tree and
then we do another 254 computing hours for the fixed reland. Note the that
the 508 hours doesn't include retriggers done by the Sheriffs to see if it
is intermittent or not.

This is a lot of wasted effort when we should be striving to get patches
to stick first time. Let's see if we can try make this figure 1 in 30
patches getting backed out.



What is your proposal for doing that?  What are the costs involved?  It
isn't very useful to say X is bad, let's not do X, without looking at what
it costs to not do X.

To give one hypothetical example, if it requires just two additional full
try pushes to avoid one backout, we haven't actually saved any computing
time.


So, as far as I can tell, the heart of the problem is that the 
end-to-end time for the build+test infrastructure is unworkably slow. I 
understand that waiting half a dozen hours — a significant fraction of a 
work day — for a try run is considered normal. This has a huge knock-on 
effect e.g. it requires people to context switch away from one problem 
whilst they wait, and context switch back into it once they have the 
results. Presumably it also encourages landing changes without proper 
testing, which increases the backout rate. It seems that this will cost 
a great deal not just in terms of compute hours (which are easy to 
measure) but also in terms of developer productivity (which is harder to 
measure, but could be even more significant).


What data do we currently have about why the wait time is so long? If 
this data doesn't exist, can we start to collect it? Are there easy wins 
to be had, or do we need to think about restructuring the way that we do 
builds and/or testing to achieve greater throughput?


___
dev-platform mailing list
dev-platform@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-platform


Re: Pushes to Backouts on Mozilla Inbound

2013-11-05 Thread James Graham

On 05/11/13 15:20, Till Schneidereit wrote:


Do we have any way to identify tests that break particularly often for
specific areas? If so, we could create a mach command that runs just these
tests and finishes quickly. Something like `mach canary-tests`.


Isn't the end game for this kind of approach one where you have a 
(frequently, automatically updated) map of code to tests, so the system 
knows that if a commit touches the code in file x, it needs to run 
the set of tests {T_x}? One can imagine using such a system on try to 
automatically run only the tests most likely to pick up regressions.


It is a lot of work to create the infrastructure for this kind of setup, 
however, and I don't know whether it would actually be enough of a win to 
be worthwhile.
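
A sketch of the data structure involved, just to make the idea concrete 
(all the paths and test names below are made up):

// Illustrative only: a map from source prefixes to the tests that have
// historically been affected when those paths changed.
const coverageMap = new Map([
  ["dom/events/", new Set(["dom/events/test_click.html"])],
  ["layout/generic/", new Set(["layout/reftests/bidi/"])],
]);

function testsForCommit(changedFiles) {
  const selected = new Set();
  for (const file of changedFiles) {
    for (const [prefix, tests] of coverageMap) {
      if (file.startsWith(prefix)) {
        tests.forEach(t => selected.add(t));
      }
    }
  }
  return selected;
}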


___
dev-platform mailing list
dev-platform@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-platform


Re: Pushes to Backouts on Mozilla Inbound

2013-11-06 Thread James Graham

On 06/11/13 15:49, Ryan VanderMeulen wrote:

On 11/6/2013 6:58 AM, Aryeh Gregor wrote:

Has anyone considered allowing try pushes to run only specified
directories of tests, and to allow incremental builds rather than
clobbers on try?  This would make try a heck of lot faster and
resource-efficient, for those who are willing to accept less certain
results.


What do we gain by having results that can't be trusted?


It could be a win if the results are misleading infrequently enough, 
relative to the time saved, that the expected time for getting a 
patch to stick on m-c decreases. That depends on P(result is 
different between try and clobber) and on the size of the time saving, 
neither of which I know, though.
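
To spell the tradeoff out, here is a tiny sketch; the names and the cost 
model are assumptions rather than measured data:

// Back-of-envelope model for a reduced (incremental/subset) try run.
// With probability pMiss the reduced run is misleadingly green and the
// patch later bounces, costing a backout plus a full reland cycle.
function expectedCost(tReduced, tFull, tBackout, pMiss) {
  return tReduced + pMiss * (tBackout + tFull);
}
// The reduced run is only a net win when expectedCost(...) < tFull.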

___
dev-platform mailing list
dev-platform@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-platform


Re: Reftests execute differently on Android or b2g?

2014-01-14 Thread James Graham

On 14/01/14 12:45, Neil wrote:


Indeed, the XML parsing didn't "block" when I switched to serving the
reftest from the HTTP server, and I had to add a dummy progress listener
to restore blocking behaviour.


Progress listeners blocking onload is a bug. Please don't rely on it in 
tests (or outside of tests).

___
dev-platform mailing list
dev-platform@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-platform


Re: Spring cleaning: Reducing Number & Footprint of HG Repos

2014-03-27 Thread James Graham

On 27/03/14 14:17, Armen Zambrano G. wrote:

On 14-03-26 08:27 PM, Bobby Holley wrote:

I don't understand what the overhead is. We don't run CI on user repos.
It's effectively just ssh:// + disk space, right? That seems totally
negligible.


FTR from an operations standpoint, it is never "just". Never.
If it was *just* we wouldn't even be having this conversation. Trust me.


To be fair there are also considerable costs associated with outsourcing 
VCS hosting, mostly associated with integrating the external hosting 
with other systems that need to work with the repository. For example 
W3C's web-platform-tests testsuite is being hosted on GitHub and as a 
result we have spent a non-trivial amount of effort on integration with 
a system for ensuring contributors agree to a CLA, a code review tool, 
synchronization of HEAD with a web server and various other things. This 
might be less effort than doing all the hosting at the W3C (although the 
reason we did it was purely that GitHub is familiar to potential 
contributors), but of course it will all have to be thrown away if we 
want to move providers in the future.



___
dev-platform mailing list
dev-platform@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-platform


Re: Policy for disabling tests which run on TBPL

2014-04-07 Thread James Graham

On 07/04/14 04:33, Andrew Halberstadt wrote:

On 06/04/14 08:59 AM, Aryeh Gregor wrote:

On Sat, Apr 5, 2014 at 12:00 AM, Ehsan Akhgari
 wrote:

Note that is only accurate to a certain point.  There are other
things which
we can do to guesswork our way out of the situation for Autoland, but of
course they're resource/time intensive (basically running orange
tests over
and over again, etc.)


Is there any reason in principle that we couldn't have the test runner
automatically rerun tests with known intermittent failures a few
times, and let the test pass if it passes a few times in a row after
the first fail?  This would be a much nicer option than disabling the
test entirely, and would still mean the test is mostly effective,
particularly if only specific failure messages are allowed to be
auto-retried.


Many of our test runners have that ability. But doing this implies that
intermittents are always the fault of the test. We'd be missing whole
classes of regressions (notably race conditions).


In practice how effective are we at identifying bugs that lead to 
instability? Is it more common that we end up disabling the test, or 
marking it as "known intermittent" and learning to live with the 
instability, both of which options reduce the test coverage, or is it 
more common that we realise that there is a code reason for the 
intermittent, and get it fixed?


If it is the latter then making the instability as obvious as possible 
makes sense, and the current setup where we run each test once can be 
regarded as a compromise between the ideal setup where we run each test 
multiple times and flag it as a fail if it ever fails, and the needs of 
performance.


If the former is true, it makes a lot more sense to do reruns of the 
tests that fail in order to keep them active at all, and store 
information about the fact that reruns occurred so that we can see when 
a test started giving unexpected results. This does rely on having some 
mechanism to make people care about genuine intermittents that they 
caused, but maybe the right way to do that is to have some batch tool 
that takes all the tests that have become intermittent, and does reruns 
until it has identified the commits that introduced the intermittency, 
and then files P1 bugs on the developer(s) it identifies to fix their bugs.
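
A sketch of what that batch tool's core loop might look like; the function 
names, the way a build is obtained and the way a test is run are all 
assumed rather than real:

// Hypothetical sketch: find the first commit at which a test became
// intermittent by rerunning it a fixed number of times per commit.
// `runTest(commit)` is an assumed callback returning true on a pass.
function isIntermittentAt(commit, runTest, runs) {
  let failures = 0;
  for (let i = 0; i < runs; i++) {
    if (!runTest(commit)) failures++;
  }
  return failures > 0 && failures < runs;
}

function firstIntermittentCommit(commits, runTest, runs) {
  // commits are ordered oldest to newest; assumes the oldest is clean
  // and the newest is known to be intermittent. With a finite number of
  // runs this can miss low-frequency intermittents, so `runs` has to be
  // chosen with the failure rate in mind.
  let lo = 0, hi = commits.length - 1;
  while (lo < hi) {
    const mid = Math.floor((lo + hi) / 2);
    if (isIntermittentAt(commits[mid], runTest, runs)) {
      hi = mid;
    } else {
      lo = mid + 1;
    }
  }
  return commits[lo];
}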

___
dev-platform mailing list
dev-platform@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-platform


Re: Policy for disabling tests which run on TBPL

2014-04-08 Thread James Graham

On 08/04/14 14:43, Andrew Halberstadt wrote:

On 07/04/14 11:49 AM, Aryeh Gregor wrote:

On Mon, Apr 7, 2014 at 6:12 PM, Ted Mielczarek 
wrote:

If a bug is causing a test to fail intermittently, then that test loses
value. It still has some value in that it can catch regressions that
cause it to fail permanently, but we would not be able to catch a
regression that causes it to fail intermittently.


To some degree, yes, marking a test as expected intermittent causes it
to lose value.  If the developers who work on the relevant component
think the lost value is important enough to track down the cause of
the intermittent failure, they can do so.  That should be their
decision, not something forced on them by infrastructure issues
("everyone else will suffer if you don't find the cause for this
failure in your test").  Making known intermittent failures not turn
the tree orange doesn't stop anyone from fixing intermittent failures,
it just removes pressure from them if they decide they don't want to.
If most developers think they have more important bugs to fix, then I
don't see a problem with that.


I think this proposal would make more sense if the state of our
infrastructure and tooling was able to handle it properly. Right now,
automatically marking known intermittents would cause the test to lose
*all* value. It's sad, but the only data we have about intermittents
comes from the sheriffs manually starring them. There is also currently
no way to mark a test KNOWN-RANDOM and automatically detect if it starts
failing permanently. This means the failures can't be starred and become
nearly impossible to discover, let alone diagnose.


So, what's the minimum level of infrastructure that you think would be 
needed to go ahead with this plan? To me it seems like the current 
system already isn't working very well, so the bar for moving forward 
with a plan that would increase the amount of data we had available to 
diagnose problems with intermittents, and reduce the amount of manual 
labour needed in marking them, should be quite low.


___
dev-platform mailing list
dev-platform@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-platform


Re: Policy for disabling tests which run on TBPL

2014-04-08 Thread James Graham

On 08/04/14 15:06, Ehsan Akhgari wrote:

On 2014-04-08, 9:51 AM, James Graham wrote:

On 08/04/14 14:43, Andrew Halberstadt wrote:

On 07/04/14 11:49 AM, Aryeh Gregor wrote:

On Mon, Apr 7, 2014 at 6:12 PM, Ted Mielczarek 
wrote:

If a bug is causing a test to fail intermittently, then that test
loses
value. It still has some value in that it can catch regressions that
cause it to fail permanently, but we would not be able to catch a
regression that causes it to fail intermittently.


To some degree, yes, marking a test as expected intermittent causes it
to lose value.  If the developers who work on the relevant component
think the lost value is important enough to track down the cause of
the intermittent failure, they can do so.  That should be their
decision, not something forced on them by infrastructure issues
("everyone else will suffer if you don't find the cause for this
failure in your test").  Making known intermittent failures not turn
the tree orange doesn't stop anyone from fixing intermittent failures,
it just removes pressure from them if they decide they don't want to.
If most developers think they have more important bugs to fix, then I
don't see a problem with that.


I think this proposal would make more sense if the state of our
infrastructure and tooling was able to handle it properly. Right now,
automatically marking known intermittents would cause the test to lose
*all* value. It's sad, but the only data we have about intermittents
comes from the sheriffs manually starring them. There is also currently
no way to mark a test KNOWN-RANDOM and automatically detect if it starts
failing permanently. This means the failures can't be starred and become
nearly impossible to discover, let alone diagnose.


So, what's the minimum level of infrastructure that you think would be
needed to go ahead with this plan? To me it seems like the current
system already isn't working very well, so the bar for moving forward
with a plan that would increase the amount of data we had available to
diagnose problems with intermittents, and reduce the amount of manual
labour needed in marking them, should be quite low.


dbaron raised the point that there are tests which are supposed to fail
intermittently if they detect a bug.  With that in mind, the tests
cannot be marked as intermittently failing by the sheriffs, less so in
an automated way (see the discussion in bug 918921).


Such tests are problematic indeed, but it seems like they're problematic 
in the current infrastructure too. For example if a test goes from 
always passing to failing 1 time in 10 when it regresses, the first time 
we see the regression is likely to be around 10 testruns after the 
problem is introduced. That presumably makes it rather hard to track 
down when things went wrong. Or are we running such tests N times 
where N is some high enough number that we are confident that the test 
has a 95% (or whatever) chance of failing if there is actually a 
regression? If not maybe we should be. Or perhaps the idea of 
independent testruns isn't useful in the face of all the state we have.
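
For concreteness, and assuming runs are independent: if a regression makes 
a test fail with probability p on each run, N runs catch it with 
probability 1 - (1 - p)^N, so a 95% chance of seeing at least one failure 
needs N >= log(0.05) / log(1 - p), which works out to about 29 runs for 
p = 0.1 and about 299 runs for p = 0.01.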


In any case this kind of test could be explicitly excluded from the 
reruns, which would make the situation the same as it is today.



But to answer your question, I think this is something which can be done
in the test harness itself so we don't need any special infra support
for it.  Note that I don't think that automatically marking such tests
is a good idea either way.


The infra support I had in mind was something like "automatically (doing 
something like) starring tests that only passed after being rerun" or 
"listing all tests that needed a rerun" or "having a tool to find the 
first build in which the test became intermittent". The goal of this 
extra infrastructure would be to get the new information about reruns 
out of the testharness and address the concern that doing automated 
reruns would mean people paying even less attention to intermittents 
than they do today.


___
dev-platform mailing list
dev-platform@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-platform


Re: Standardized assertion methods

2014-06-03 Thread James Graham
On 03/06/14 00:24, Chris Peterson wrote:
> On 6/2/14, 3:42 PM, Ehsan Akhgari wrote:
>> 2. I also value consistency more than my personal preferences, and based
>> on that, using the existing APIs in some tests and the new APIs in other
>> tests (even if we agreed that #1 above doesn't matter) is strictly worse
>> than the status quo.
> 
> btw, in the mozilla.dev.tech.javascript-engine.internals fork of this
> thread, bz and David Bruant pointed out that W3C's testharness and
> TC39's test262 each use yet another set of assertion function names. Any
> tests we import from those test suites will need glue code to integrate
> with our test harness(es).

In fact, for testharness.js tests (and the W3C web-platform-tests in
general) the plan is to have a dedicated test harness (bug 945222). This
is already up and running on tbpl on Cedar and will be turned on for
mozilla-central as soon as the intermittents are under control (Linux is
looking good, Windows has some issues with WebVTT tests, OSX shows a
little more variability).

As a result, in the near future we won't need glue code between
testharness.js tests and other kinds of tests.

FWIW I think the main problem with the CommonJS assertions is their
semantics. For example:

* Test assertions shouldn't silently type-cast, but ok, equal and
notEqual all do that. Their brevity compared to strictEqual and
notStrictEqual means that they are likely to be much more widely used.

* notStrictEqual and notDeepEqual are terrible names that actively
mislead about what the functions do.

* deepEqual has, as far as I can tell, underspecified semantics. I can't
tell if it is supposed to recurse into nested objects and, if so,
whether there is supposed to be any loop prevention (the spec talks
about "equivalent values" for every key, without defining what this means).

* throws doesn't seem to provide any way to test the type or
properties of the thrown object.

I know we have test harnesses using assertion functions that already
have some of these problems, but I don't see the benefit of adding a new
set of methods with exactly the same issues.
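
To make the coercion point concrete, here are the kinds of checks that
silently pass; the examples use Node's assert module as a stand-in for
the CommonJS-style API it implements:

// All of these assertions pass, which is exactly why the terse names
// are the risky ones.
const assert = require("assert");

assert.equal(1, "1");           // == coerces the string to a number
assert.equal(0, "");            // 0 == "" is true
assert.ok("0");                 // any truthy value passes
assert.notStrictEqual(1, "1");  // asserts !==, despite reading like "loosely equal"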
___
dev-platform mailing list
dev-platform@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-platform


Re: Standardized assertion methods

2014-06-03 Thread James Graham
I'm not sure I grasp your overall point, but I have a few comments.

On 03/06/14 11:22, Mike de Boer wrote:
> 1. The `Assert.*` namespace is optional and may be omitted. This
> module is also present in the addon-sdk and used _with_ that
> namespace, usually with a lowercase `assert.*`. Please pick whatever
> suits your fancy.

FWIW I consider that like other code tests should be optimised for
maintainability. Taking the position "pick whatever" doesn't help with this.

> 2. testharness.js, Mochitest, XPCShell’s head.js and other
> suite-runners that we use in-tree are needlessly monolithic. 

It's not needless if you want to run in non-Mozilla environments where
javascript modules don't yet exist. For platform-level tests, being able
to cross check our implementation against other implementations has
considerable value.

> They mix
> defining assertion methods, test setup, test definition, test
> teardown in one silo and dogmatically impose a test definition style.
> Their lack of modularity costs us flexibility in adopting and/ or
> promoting TDD development. They are, however, great for writing
> regression tests and that’s what we use them for.

I'm not sure I understand this criticism. testharness.js and Mochitest,
at least, are not really intended for writing unit tests, but for
writing functional tests. I have had no problem using testharness.js in
a setup where a comprehensive upfront testsuite was written in parallel
with the code, which seems to be pretty close to a TDD ethos. I don't
think that technical problems are preventing us adopting this
development methodology.

(Maybe for testing frontend code one can use mochitest for unit tests. I
don't know how that works).

> 4. None of the test-suites promote modularity and needlessly dictate
> a reporting style. What I mean by this is that there’s no way to hook
> different reporting styles in a test runner to promote TDD, for
> example. What does automation use to detect test failures? TAP[1] is
> something efficient and optimised for machine-reading, but we parse
> output lines in a format that is far from an industry standard. We
> humans delve through a whole bunch of scroll back to find the test
> case/ assertion we’re interested in. We even rely on our tooling to
> repeat all the failing tests at the end of a run.

Yes, the way we deal with test output has historically been suboptimal.
We are in the process of fixing that as we speak. We have developed a
json-based protocol [1] for test output. This is already being used for
web-platform-tests, and for FirefoxOS certsuite. There is current work
in progress for converting mochitest and marionette tests to this
format. Other suites will follow.

As you say, once we are using structured logging rather than trying to
parse human-readable logging, it should be possible to do a lot more
interesting things with the logging results. The structured logging
package already comes with formatters for a small number of formats
including, for example, something XUnit XML compatible. There are also
lots of ideas for how to improve the user interface to test results in
automation. These will come after the launch of treeherder.
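
As a flavour of what the protocol looks like on the wire, each event is a
single line of JSON; the sketch below is illustrative of the mozlog-style
fields rather than a normative list:

// Illustrative only: emit a few structured log lines for one test.
function logLine(action, extra) {
  const base = { action: action, time: Date.now(),
                 thread: "MainThread", pid: process.pid,
                 source: "example-harness" };
  console.log(JSON.stringify(Object.assign(base, extra)));
}

logLine("suite_start", { tests: ["/dom/example.html"] });
logLine("test_start", { test: "/dom/example.html" });
logLine("test_status", { test: "/dom/example.html", subtest: "first check",
                         status: "FAIL", expected: "PASS" });
logLine("test_end", { test: "/dom/example.html", status: "OK" });
logLine("suite_end", {});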

> 5. Assertion semantics are indeed poorly specified, across the board.
> Our switch from `do_check_matches()` to `deepEqual()` even revealed a
> buggy implementation there, which we didn’t know about. Apart from
> that, it was largely undocumented, not fully covered by unit tests
> except for the pathological cases. I’m actually a bit scared of what
> I’ll find in Mochitest[3] Type coercion is something specifiable, but
> I’m not sure whether that is something `ok`, `equal` and family
> should implement guards for. If there’s a wish/ call for more
> specific assertion methods like `is_truthy`, `is_falsy` and variants
> for all possible coercion traps, I think there’s room in Assert.jsm
> to add those. We are in the sad status quo that all assertion methods
> in all test suites are underspecified to some degree. The fact that
> these methods are an integral part of each suite makes it harder to
> change that. Assert.jsm breaks away from that approach to make these
> improvements possible to a wider audience. If we agree that more
> spec’ing is needed, we might as well fork the spec[2] to MDN and
> collectively work it out.

Changing the semantics of things that people are already using seems
like a uphill battle. I think if you wanted to introduce a common set of
assertion names across Mozilla harnesses, starting from CommonJS rather
than starting from a discussion of actual requirements was the wrong
approach.

> 7. Names of assertion methods are an excellent reason for
> bikeshedding. The main reason for the amount of time it took for the
> spec[2] to be formalised was exactly this, IIRC. Never mind that,
> like I said before: I’m fine with forking the spec and adding aliases
> for each assertion method if need be. I mostly care about the fact
> that we can implement them in one pl

Re: Standardized assertion methods

2014-06-03 Thread James Graham
On 03/06/14 12:27, Mike de Boer wrote:

>>> 4. None of the test-suites promote modularity and needlessly dictate
>>> a reporting style. What I mean by this is that there’s no way to hook
>>> different reporting styles in a test runner to promote TDD, for
>>> example. What does automation use to detect test failures? TAP[1] is
>>> something efficient and optimised for machine-reading, but we parse
>>> output lines in a format that is far from an industry standard. We
>>> humans delve through a whole bunch of scroll back to find the test
>>> case/ assertion we’re interested in. We even rely on our tooling to
>>> repeat all the failing tests at the end of a run.
>>
>> Yes, the way we deal with test output has historically been suboptimal.
>> We are in the process of fixing that as we speak. We have developed a
>> json-based protocol [1] for test output. This is already being used for
>> web-platform-tests, and for FirefoxOS certsuite. There is current work
>> in progress for converting mochitest and marionette tests to this
>> format. Other suites will follow.
> 
> Do you have bug numbers where I can follow that work in progress? It sounds 
> awesome!

Yes, sorry I should have included some.

The metabug for structured logging is bug 916295
The Mochitest work is happening in bug 886570 and Marionette in bug 956739.

Treeherder is using pivotal tracker rather than bugzilla, so I don't
have a bug number for that one, but the "story" for basic integration is
[1].

>> Changing the semantics of things that people are already using seems
>> like a uphill battle. I think if you wanted to introduce a common set of
>> assertion names across Mozilla harnesses, starting from CommonJS rather
>> than starting from a discussion of actual requirements was the wrong
>> approach.
> 
> That’s not what we’re doing here! Changing semantics is a non-goal, merging 
> assertion methods into one re-usable module is.
> 
> Taking the CommonJS spec as an umbrella for these simple assertion methods is 
> a causality and I think it helps provide a common, immediate understanding 
> for new contributors who’d like to write test for the code they contribute.
> 
> Sure, the semantics of `do_check_matches()` changed. But that method was only 
> used in two locations, its use not promoted in any wiki page and its 
> implementation lossy.

I was under the impression that you were proposing landing CommonJS
method support and then forking the commonjs spec to improve the
semantics. I may have misunderstood.

I think I would need some evidence to back up the hypothesis that new
contributers are unusually likely to know CommonJS compared to the N
other test frameworks that exist. For example most automation
contributers have a better working knowledge of Python than Javascript
and would be more comfortable with something that looks like Python's
unittest module. I imagine for front end developers there would be
different biases. For platform developers the story is likely to be
different again.

In any case, learning new assertion names isn't something that strikes
me as being a particularly high barrier to entry. With testharness.js
the only complaint I recall about the names is that they favour
explicitness over brevity. Confusion for people moving from other js
test frameworks has more often come from the differences in high-level
philosophy.

[1] https://www.pivotaltracker.com/s/projects/749519/stories/70575430

___
dev-platform mailing list
dev-platform@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-platform


Re: Standardized assertion methods

2014-06-03 Thread James Graham
On 03/06/14 14:16, Mike de Boer wrote:

> Writing wrappers in python around things to improve the current
> situation like a band-aid isn’t the way I’m used to fix things; I
> like to take the bull by the horns[1]
> 
> I’d like to ask _why_ structured logging needs to be bolted on top of
> what we have currently? Is it more work to fix the real problem? Are
> we less comfortable doing things in JS?

I'm not sure what "wrappers in python" you have in mind, but I think
that there are a couple of important points to address here.

The first is a general point about testing at Mozilla. There is a lot
more going on than just testing of js code. We have tests written in,
and for, C++, Javascript, Python, HTML, CSS and probably a bunch more
things that I have forgotten. In terms of requirements it's pretty
different from what's needed to test a small project entirely
implemented in a single language.

As it happens a lot of the orchestration of testing is implemented in
Python. I doubt js was even a viable choice at the time this
infrastructure was originally written, and now we have a set of mature
libraries for dealing with lots of the incidental complexity of testing
Mozilla products, like controlling system processes, and setting up B2G
devices. This code is largely decoupled from the tests themselves and
hopefully isn't something that most developers should need to care about.

But the reason that we don't just throw it all away and start again is
because doing so would be a huge cost for extremely uncertain benefit.
That doesn't mean that we can't work to improve things where they are
bad; I'm sure we all have areas that we think are ripe for change.

In terms of structured logging in particular; I don't know why you think
it's "bolted on", or why it isn't fixing the real problem. To be honest
I don't know what you think "the real problem" is. Structured logging is
basically just an API for communicating the results of tests to anyone
that cares to be notified of them, be it tbpl, mach, treeherder, an app
running in your browser that collects test results, or whatever. The
fact that it currently has a Python implementation just reflects the
fact that a lot of our test infrastructure is Python, and doesn't mean
that there's anything Python-specific about it. One could certainly
implement StructuredLog.jsm that would be entirely interoperable with
the Python code. Indeed, if you are writing a test harness that works
entirely in javascript, I would strongly encourage you to do just that.
The work on converting Mochitest to the new format might well have some
code that you could reuse.

If there are particular requirements that you think structured logging
doesn't meet this is probably a good time to discuss them, since we
haven't deployed it to too many places. Perhaps that discussion would be
better off-list since it might not be of interest to everyone.
___
dev-platform mailing list
dev-platform@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-platform


Re: Standardized assertion methods

2014-06-03 Thread James Graham
On 03/06/14 22:28, Jonas Sicking wrote:

> testharness.js still requires lots of boiler plate. Especially when
> writing async tests. And especially if you try to follow the rule that
> each test within a file should clean up after itself.

At this point testharness.js has taken several steps to allow more
minimal tests. For example it has the concept of "single page tests" so
that if you want to write one test per page you don't need to use lots
of boilerplate; a test looks like

<!doctype html>
<title>Test the load event fires</title>
<script src="/resources/testharness.js"></script>
<script src="/resources/testharnessreport.js"></script>
<script>
onload = done
</script>

At this point I think the main high-level semantic difference with
mochitest is that mochitest keeps going after asserts fail.
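
For comparison, a conventional asynchronous testharness.js test, in a page
that has already loaded testharness.js and testharnessreport.js, looks
something like this sketch (the image URL and the thing being tested are
made up):

// Assumes testharness.js and testharnessreport.js are already loaded.
async_test(function(t) {
  var img = new Image();
  img.onload = t.step_func_done(function() {
    assert_equals(img.naturalWidth, 10, "decoded width");
  });
  img.src = "/images/example.png";  // placeholder resource
}, "image load event fires and the image decodes");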

> I suspect this
> is part of the reason we haven't seen more people use it for normal
> regression-test writing.

I suspect the stronger reason is that the process for getting
testharness.js tests running on Mozilla infrastructure has been unclear.
Until that is solved — something which I expect to happen soon — it is
unreasonable to expect people to use it.

> * The fact that our httpd.js is completely non-standard and unlikely
> to ever run in other environments. Moving to node.js or pythons
> BaseHTTPServer seems much more likely to get it accepted in other
> environments. Node.js seems like the better choice given that all test
> writers will need to know JS anyway. Again, this isn't so much a
> problem with SimpleTest.js, but rather the environment that we use it
> in.

At this point, more or less the whole of Mochitest, as used at Mozilla,
is completely non-standard and unlikely to get accepted for running in
other environments.

web-platform-tests has standardised on a Python-based web server
designed specifically for writing tests. Care has been taken to ensure
that the test environment is portable to a number of different setups
e.g. running on a central server, running on an individual's machine,
running on automation, etc.
___
dev-platform mailing list
dev-platform@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-platform


Re: Standardized assertion methods

2014-06-03 Thread James Graham
On 03/06/14 20:34, Boris Zbarsky wrote:

> I'm arguing against Assert.jsm using the commonjs API names.

And I am arguing against using the CommonJS semantics. If we are adding
new assertions it shouldn't be ones that encourage broken tests.

___
dev-platform mailing list
dev-platform@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-platform


Re: Standardized assertion methods

2014-06-04 Thread James Graham
On 04/06/14 18:42, Mike de Boer wrote:
> On 04 Jun 2014, at 19:20, Ehsan Akhgari 
> wrote:
> 
>> On 2014-06-04, 5:45 AM, Mike de Boer wrote:
>>> On 04 Jun 2014, at 00:33, James Graham 
>>> wrote:
>>> 
>>>> On 03/06/14 20:34, Boris Zbarsky wrote:
>>>> 
>>>>> I'm arguing against Assert.jsm using the commonjs API names.
>>>> 
>>>> And I am arguing against using the CommonJS semantics. If we
>>>> are adding new assertions it shouldn't be ones that encourage
>>>> broken tests.
>>> 
>>> I think this is very subjective and, to be honest, the first time
>>> I heard someone say that the CommonJS semantics are broken, even
>>> encourage broken tests. The API surface is concise enough to
>>> limit the amount of typing and convey the meaning of the method
>>> used. They achieved this to closely follow the English verbs of
>>> operators used to test an code block. I really don’t see how much
>>> closer you’d like to get to 'doing what you say you’re going to
>>> do' as far as API semantics go. I realise that this reasoning is
>>> subjective too. Furthermore, are the tests we have currently
>>> broken? Is there something we need to get increasingly worried
>>> about?
>> 
>> Define broken.  We do have quirks in each of our test frameworks
>> that we'd do differently if we went back in time and wanted to redo
>> things again.
> 
> I wasn’t implying that they’re broken at all, it’s just that James
> was hinting at that.

OK, there seems to be some confusion about what I believe so I will try
to be as explicit as possible:

The CommonJS test assertions have semantics that are problematic when
writing tests. For example:

* They favour (through brevity) the use of == comparisons instead of ===
comparisons (or SameValue comparisons)

* They have function names that are ambiguous and therefore confusing
(notStrictEquals).

* They encourage the use of deepEqual which has underdefined semantics,
particularly in the case of objects that contain cycles (it looks like
Assert.jsm goes into an infinite loop in this case, but I may have
misread the code).

* The throws method encourages lazy testing since it doesn't provide any
way to inspect the properties of the thrown exception.

These concerns with semantics are irrespective of where these functions
are used i.e. this is not just a concern related to testharness.js
(although I would certainly not accept compatible assertions landing
there for the above reasons).

I think that having a common set of assertion functions in multiple
harnesses is a mildly worthwhile goal, as is a shared implementation,
albeit that the latter will only work within the Mozilla ecosystem. I
think that compatibility with NodeJS is a non-goal, or at least no more
important than compatibility with any other existing test frameworks.

I don't personally share the concern with the length of the assert
names, but I think this is a reasonable discussion to have.

I think the argument that "if we land these now we can change things in
the future" is troubling; we almost certainly won't be able to change
the behaviour of any existing assertion function, or at least doing so
will be a lot of work. And adding more and more assertions covering the
same functionality, but with different semantics is only going to make
things more confusing, negating the positive impact of sharing names
cross-suite.

Therefore, if we want to proceed with this work in order to get the
benefit of shared code/api, we should start by ditching the specifics of
current implementation, and design an API that actually meets all our
requirements.
___
dev-platform mailing list
dev-platform@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-platform


Re: Standardized assertion methods

2014-06-06 Thread James Graham
On 05/06/14 10:38, Mike de Boer wrote:

> As I tried to explain, the CommonJS API naively made sense to me at
> the time. To others as well, because we’re happily using it. As I now
> understand, some of us are very attached to a specific, different,
> API.

FWIW I don't think that I am attached to a "specific different api". I
am, however, attached to api semantics that make writing good tests
easy. I don't think that either CommonJS or SimpleTest achieve this in
their current form. For SimpleTest I think the main problems are:

* The is() and isnot methods use non-strict equality.

* ok() coerces its argument (this is a more minor problem).

* is and ok seem like pretty uninformative names. ise is even worse.

* isenot doesn't even exist.

* The API is largely undocumented. From reading MDN you would think that
is, isnot and ok were the only methods. I can't find any other
documentation except for the source.

* The API seems to be inconsistently exposed, e.g. doesThrow isn't placed
in the global scope but is on the SimpleTest object. But it seems like
some properties of SimpleTest that look like assertions are not, e.g. isa
seems to return a bool rather than calling ok().

* doesThrow doesn't provide any means of inspecting the object that was
thrown.

* isDeeply uses non-strict equality comparisons.

* All the todo stuff is mixing concerns. It forces you into a mode of
test writing where properties of a single implementation are hardcoded
into the testcases. This isn't a huge problem when there is only a
single relevant implementation, but we do a lot of work on standards
where there are multiple implementations.

* The fact that the implementation of todo* has to duplicate all the
comparison code is pretty terrible. Maybe that's why all the methods
other than ok, is, and isnot are undocumented, because they don't have
todo equivalents.

___
dev-platform mailing list
dev-platform@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-platform


Re: Standardized assertion methods

2014-06-06 Thread James Graham
On 06/06/14 11:41, Gijs Kruitbosch wrote:
> On 06/06/2014 10:29, James Graham wrote:
>> On 05/06/14 10:38, Mike de Boer wrote:
>>
>>> As I tried to explain, the CommonJS API naively made sense to me at
>>> the time. To others as well, because we’re happily using it. As I now
>>> understand, some of us are very attached to a specific, different,
>>> API.
>>
>> FWIW I don't think that I am attached to a "specific different api". I
>> am, however, attached to api semantics that make writing good tests
>> easy. I don't think that either CommonJS or SimpleTest achieve this in
>> their current form. For SimpltTest I think the main problems are:
>>
>> * The is() and isnot methods use non-strict equality.
> 
> I will go ahead and assert that if you have a test that relies on strict
> versus non-strict equality, you should be using type checks to make that
> explicit, not an extra '=' in your comparisons. Makes assumptions much
> more explicit ("this return value should be a string '5' and not a
> number 5") rather than implied by the comparison function.
> 
> IOW, I wouldn't consider this a bug.

So you believe that every single time you write is(some_func(), 5) you
should also write is(typeof some_func(), "number") ? That seems pretty
much insane to me and I will happily assert that no one actually does it
consistently.

If there are cases where you really don't care about the type — and I
can't think of very many — then in those cases you should explicitly
type convert as a signal that you are doing something strange.

>> * ok() coerces its argument (this is a more minor problem).
> 
> I would even say s/more minor problem/feature/.
> 
> I've lost count of the number of times I've done:
> 
> ok(document.getElementById("foo"), "Foo should exist in the document")

So what you are checking for there is !== null, not "is a thing that
coerces to true". But like I said this is a more minor concern.

___
dev-platform mailing list
dev-platform@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-platform


Re: Running mochitests from a copy of the objdir?

2014-08-20 Thread James Graham
On 20/08/14 18:38, Joshua Cranmer 🐧 wrote:
> On 8/20/2014 12:22 PM, L. David Baron wrote:
>> (I estimated that it was going to be faster to get that working than
>> to try to figure out how to use the packaged tests, since it was
>> possible to reverse-engineer from mochitest run inside mach, though
>> if there had been instructions on how to use packaged tests that
>> somebody had actually used before I'd likely have gone the other
>> way.)
> 
> Building packaged tests is easy (make package for the installer, make
> package-tests for the tests); running them is a little harder since you
> have to build the python runtests.py command line yourself. Or you can
> open up a tbpl log and grab the exact command line there. Certainly far
> easier than trying to work out how to run mozharness on a local system...
> 

Running mozharness on a local system is actually documented [1],
although I think that makes it sound harder than it actually is. I have
a run.sh script in the root of my mozharness checkout that looks like

#!/bin/bash

cd scripts

CONFIG_FILE=../configs/web_platform_tests/test_config.py
BUILD_ROOT=/home/jgraham/develop/gecko-dev/obj-x86_64-unknown-linux-gnu/dist
INSTALLER_URL=file://$BUILD_ROOT/firefox-34.0a1.en-US.linux-x86_64.tar.bz2
TEST_URL=file://$BUILD_ROOT/firefox-34.0a1.en-US.linux-x86_64.tests.zip

./web_platform_tests.py --no-read-buildbot-config --config-file \
$CONFIG_FILE --installer-url $INSTALLER_URL --test-url $TEST_URL

Obviously for different testsuites you need different config files or
command lines, e.g. mochitest-plain would have something like

CONFIG_FILE=../configs/unittests/linux_unittest.py

and a final command like:

./desktop_unittest.py --no-read-buildbot-config --mochitest-suite plain \
--config-file $CONFIG_FILE --installer-url $INSTALLER_URL --test-url \
$TEST_URL --download-symbols true

[1]
https://wiki.mozilla.org/ReleaseEngineering/Mozharness/How_to_run_tests_as_a_developer
___
dev-platform mailing list
dev-platform@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-platform

