Re: [gwt-contrib] RFC: sharded linking

2010-02-12 Thread Matt Mastracci
On 2010-02-12, at 1:15 PM, Ray Cromwell wrote:

 On Thu, Feb 11, 2010 at 4:43 PM, Scott Blum sco...@google.com wrote:

 - I dislike the whole transition period followed by having to forcibly
 update all linkers, unless there's a really compelling reason to do so.
 
 In general, I'd agree, but the number of linkers in the wild appears
 to be small; this may be a case of trying to preserve an API that only
 5 or 10 people in the world are using.

+1. I've written a handful of custom linkers (including one in the public 
gwt-firefox-extension project), but I'm used to updating them between GWT 
releases to work around subtle changes in the linker contract (e.g., the
evolution of hosted mode, various global-variable changes, etc.).

I'd rather have a clean linker system that changes from version to version than 
an awkward one with a lot of legacy interfaces.

Matt.

-- 
http://groups.google.com/group/Google-Web-Toolkit-Contributors


Re: [gwt-contrib] RFC: sharded linking

2010-02-12 Thread Lex Spoon
On Thu, Feb 11, 2010 at 7:43 PM, Scott Blum sco...@google.com wrote:

 I have a few comments, but first I wanted to raise the point that I'm not
 sure why we're having this argument about maximally sharded Precompiles at
 all.  For one thing, it's already implemented, and optional, via
 -XshardPrecompile.  I can't think of any reason to muck with this, or why
 it would have any relevance to sharded linking.  Can we just table that part
 for now, or is there something I'm missing?


There are still two modes, but there's no more need for an explicit
argument.  For Compiler, precompile is never sharded.  For the three-stage
entry points, full sharding happens iff all linkers are shardable.



 - I'm not sure why development mode wouldn't run a sharded link first.
  Wouldn't it make sense for development mode to work just like a production
 compile, running a single development-mode permutation shard link
 before running the final link?


Sure, we can do that. Note, though, that they will be running against an
empty ArtifactSet, because there aren't any compiles for them to look at.
 Thus, they won't typically do anything.



 2) Instead of trying to do automatic thinning, we just let the linkers
 themselves do the thinning.  For example, one of the most
 serialization-expensive things we do is serialize/deserialize symbolMaps.  To
 avoid this, we update SymbolMapsLinker to do most of its work during
 sharding, and update IFrameLinker (et al) to remove the CompilationResult
 during the sharded link so it never gets sent across to the final link.


In addition to the other issues pointed out, note that this adds ordering
constraints among the linkers.  Any linker that deletes something must run
after every linker that wants to look at it.  Your example wouldn't work as
is, because it would mean no POST linker can look at CompilationResults.  It
also wouldn't work to put the deletion in a POST linker, for the same
reason.  We'd have to work out a way for the deletions to happen last, after
all the normal linkage activity.

Suppose, continuing that idea, we add a POSTPOST order that is used only for
deletion.  If it's really only for deletion, then the usual link() API is
overly general, because it lets linkers both add and remove artifacts during
POSTPOST, which is not desired.  So, we want a POSTPOST API that is only for
deletion.  Linkers somehow or another mark artifacts for deletion, but not
anything else.  At this point, though, isn't it pretty much the same as the
automated thinning in the initial proposal?
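To make the equivalence concrete, here is a toy sketch of a deletion-only POSTPOST phase; every name in it (DeletionPass, markForDeletion, runPostPost) is hypothetical, not a real GWT API. Because linkers can only mark artifacts for removal and can add nothing, the phase reduces to computing a "keep" set, which is what the automated thinning in the original proposal does directly:

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical deletion-only pass: it can nominate artifacts for
// removal but has no way to add anything to the artifact set.
interface DeletionPass {
  Set<Object> markForDeletion(Set<Object> artifacts);
}

public class PostPostSketch {
  // Runs all deletion passes after normal linking; additions are
  // impossible by construction, so this is pure thinning.
  static Set<Object> runPostPost(Set<Object> artifacts, DeletionPass... passes) {
    Set<Object> remaining = new LinkedHashSet<>(artifacts);
    for (DeletionPass p : passes) {
      remaining.removeAll(p.markForDeletion(remaining));
    }
    return remaining;
  }

  public static void main(String[] args) {
    Set<Object> in = new LinkedHashSet<>();
    in.add("compilationResult");
    in.add("symbolMap");
    DeletionPass dropCompilationResults =
        artifacts -> Set.of("compilationResult");
    Set<Object> out = runPostPost(in, dropCompilationResults);
    if (out.contains("compilationResult") || !out.contains("symbolMap")) {
      throw new IllegalStateException("deletion-only pass misbehaved");
    }
    System.out.println("ok");
  }
}
```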


 The pros to this idea are (I think) that you don't break anyone... instead
 you opt-in to the optimization.  If you don't do anything, it should still
 work, but maybe slower than it could.

The proposal that started this thread also does not break anyone.

Lex


Re: [gwt-contrib] RFC: sharded linking

2010-02-11 Thread Scott Blum
I have a few comments, but first I wanted to raise the point that I'm not
sure why we're having this argument about maximally sharded Precompiles at
all.  For one thing, it's already implemented, and optional, via
-XshardPrecompile.  I can't think of any reason to muck with this, or why
it would have any relevance to sharded linking.  Can we just table that part
for now, or is there something I'm missing?


Okay, so now on to sharded linking itself.  Here's what I love:

- Love the overall goals: do more work in parallel and eliminate
serialization overhead.
- Love the idea of simulated sharding because it enforces consistency.
- Love that the linkers all run in the same order.

Here's what I don't love:

- I'm not sure why development mode wouldn't run a sharded link first.
 Wouldn't it make sense for development mode to work just like a production
compile, running a single development-mode permutation shard link
before running the final link?

- I dislike the whole transition period followed by having to forcibly
update all linkers, unless there's a really compelling reason to do so.
 Maybe I'm missing some use cases, but I don't see what problems result from
having some linkers run early and others run late.  As Lex noted, all the
linkers are largely independent of each other and mostly won't step on each
other's toes.

- It seems unnecessary to have to annotate Artifacts to say which ones are
transferable, because I thought we already mandated that all Artifacts have
to be transferable.

I have in mind a different proposal that I believe addresses the same goals,
but in a less-disruptive fashion.  Please feel free to poke holes in it:

1) Linker was made an abstract class specifically so that it could be
extended later.  I propose simply adding a new method linkSharded() with
the same semantics as link().  Linkers that don't override this method
would simply do nothing on the shards and possibly lose out on the
opportunity to shard work.  Linkers that can effectively do some work on
shards would override this method to do so.  (We might also have a
relinkSharded() for development mode.)
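A minimal sketch of that backward-compatible shape, with simplified stand-ins for the real gwt-dev Linker and ArtifactSet types (the method name linkSharded() is from the proposal above; everything else here is illustrative):

```java
import java.util.LinkedHashSet;

// Simplified stand-in for com.google.gwt.core.ext.linker.ArtifactSet.
class ArtifactSet extends LinkedHashSet<Object> {}

abstract class Linker {
  // Existing contract: runs once, at the final link.
  abstract ArtifactSet link(ArtifactSet artifacts);

  // Proposed addition: runs once per permutation shard. The default
  // is a no-op, so existing linkers keep working unchanged; they
  // just miss the chance to do work on the shards.
  ArtifactSet linkSharded(ArtifactSet artifacts) {
    return artifacts;
  }
}

// A linker that never overrides linkSharded() behaves as before.
class LegacyLinker extends Linker {
  @Override
  ArtifactSet link(ArtifactSet artifacts) {
    ArtifactSet out = new ArtifactSet();
    out.addAll(artifacts);
    out.add("final-output");
    return out;
  }
}

public class LinkShardedSketch {
  public static void main(String[] args) {
    Linker linker = new LegacyLinker();
    ArtifactSet perShard = linker.linkSharded(new ArtifactSet());
    // The legacy linker does nothing on the shard...
    if (!perShard.isEmpty()) throw new IllegalStateException();
    // ...and still produces its output at the final link.
    if (!linker.link(perShard).contains("final-output")) {
      throw new IllegalStateException();
    }
    System.out.println("ok");
  }
}
```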

2) Instead of trying to do automatic thinning, we just let the linkers
themselves do the thinning.  For example, one of the most
serialization-expensive things we do is serialize/deserialize symbolMaps.  To
avoid this, we update SymbolMapsLinker to do most of its work during
sharding, and update IFrameLinker (et al) to remove the CompilationResult
during the sharded link so it never gets sent across to the final link.
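Under that proposal, the thinning step itself might look roughly like this; CompilationResult and SymbolMapArtifact are simplified stand-ins for the real gwt-dev classes, and linkSharded() is the hypothetical per-shard hook from point 1:

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Simplified stand-ins; the real types live in gwt-dev.
class CompilationResult {}
class SymbolMapArtifact {}

public class ThinningSketch {
  // Hypothetical sharded-link step: produce the symbol map on the
  // shard, then drop the heavyweight CompilationResult so it is
  // never serialized across to the final Link node.
  static Set<Object> linkSharded(Set<Object> artifacts) {
    Set<Object> out = new LinkedHashSet<>(artifacts);
    for (Object a : artifacts) {
      if (a instanceof CompilationResult) {
        out.add(new SymbolMapArtifact()); // work done on the shard
        out.remove(a);                    // thin before transfer
      }
    }
    return out;
  }

  public static void main(String[] args) {
    Set<Object> in = new LinkedHashSet<>();
    in.add(new CompilationResult());
    Set<Object> out = linkSharded(in);
    boolean keptCompilationResult =
        out.stream().anyMatch(a -> a instanceof CompilationResult);
    boolean madeSymbolMap =
        out.stream().anyMatch(a -> a instanceof SymbolMapArtifact);
    if (keptCompilationResult || !madeSymbolMap) {
      throw new IllegalStateException("thinning misbehaved");
    }
    System.out.println("ok");
  }
}
```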

The pros to this idea are (I think) that you don't break anyone... instead
you opt-in to the optimization.  If you don't do anything, it should still
work, but maybe slower than it could.

The cons are... well maybe it's too simplistic and I'm missing some of the
corner cases, or ways this could break down.

Thoughts?
Scott


[gwt-contrib] RFC: sharded linking

2010-02-09 Thread Lex Spoon
This is a design doc about speeding up the link phase of GWT.  If you don't
maintain a linker, and if you don't have a multi-machine GWT build, then
none of this should matter to you.  If you do maintain a linker, let's make
sure your linker can be updated with the proposed changes.  If you do have a
multi-machine build, or if you have some ideas about them, then perhaps you
can help us get the best speed benefit possible out of this.

I want to speed up linking for multi-machine builds in two ways:

1. Allow more parts of linking to run in parallel.  In particular, anything
that happens once per permutation and does not need information from other
permutations can run in parallel.  As an example, the iframe linker chunks
the JavaScript of each permutation into multiple script tags.  That work
can happen in parallel once the linker API supports it.

2. Link does a lot of Java serialization for its artifacts, but the majority
of the artifacts in a compile are emitted artifacts that have no structure.
 They are just a named bag of bits, from the compiler's perspective.  It
would help if such artifacts did not need a round of Java serialization on
the Link node and could instead be bulk copied.
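The "named bag of bits" point can be illustrated with a toy artifact type that is copied as raw bytes instead of round-tripping through Java serialization (NamedBagOfBits is a hypothetical stand-in for GWT's emitted artifacts):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Illustrative stand-in: most artifacts in a compile are just a name
// plus opaque bytes, so they can be bulk-copied to the output instead
// of going through readObject/writeObject on the Link node.
class NamedBagOfBits {
  final String name;
  final byte[] bits;

  NamedBagOfBits(String name, byte[] bits) {
    this.name = name;
    this.bits = bits;
  }

  // Bulk copy: stream the bytes straight to disk, no Java serialization.
  void copyTo(Path outputDir) throws IOException {
    Files.write(outputDir.resolve(name), bits);
  }
}

public class BulkCopySketch {
  public static void main(String[] args) throws IOException {
    Path dir = Files.createTempDirectory("link-output");
    new NamedBagOfBits("app.js", "alert('hi')".getBytes()).copyTo(dir);
    if (!Files.exists(dir.resolve("app.js"))) {
      throw new IllegalStateException("bulk copy failed");
    }
    System.out.println("ok");
  }
}
```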


=== Transition ===

The compiler will support two compilation modes: maximal sharding and
simulated sharding.  Maximal sharding is used when all linkers support it
and the Precompile/CompilePerms/Link entry points are used.  Simulated
sharding is used when either some linker can't shard or when the Compiler
entry point is used.

Linkers individually indicate whether they implement the sharding or
non-sharding API. This allows linkers to be updated one by one and to leave
the non-sharding API behind once they do. It does not cause trouble with
other linkers, because in practice linkers are highly independent.  I've
looked at as many linkers as I could find to verify this.  Occasionally one
linker depends on another; in such a case they'll have to be updated in
tandem, but the need for that should be rare.

By default, a linker is assumed to want the legacy non-sharding API. For
such linkers, it isn't safe to assume that their generators or associated
artifacts can be serialized and then deserialized on a different
computer.
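One way the per-linker opt-in could look is a marker annotation checked at compile time (GWT ultimately shipped an annotation named @Shardable for this; the types below are simplified stand-ins either way):

```java
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;

// Hypothetical opt-in marker: a linker carrying it declares that it,
// its generators, and its artifacts survive serialization to another
// machine and implement the sharding API.
@Retention(RetentionPolicy.RUNTIME)
@interface Shardable {}

abstract class LinkerBase {}

@Shardable
class UpdatedLinker extends LinkerBase {}

class OldLinker extends LinkerBase {}

public class ShardableCheck {
  // The compiler picks maximal vs. simulated sharding by checking
  // whether every configured linker carries the marker.
  static boolean allShardable(LinkerBase... linkers) {
    for (LinkerBase l : linkers) {
      if (!l.getClass().isAnnotationPresent(Shardable.class)) {
        return false;
      }
    }
    return true;
  }

  public static void main(String[] args) {
    if (!allShardable(new UpdatedLinker())) throw new IllegalStateException();
    if (allShardable(new UpdatedLinker(), new OldLinker())) {
      throw new IllegalStateException();
    }
    System.out.println("ok");
  }
}
```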

The non-sharding API will be deprecated.  After the sharding API has been
out for one GWT release cycle, support for non-shardable linkers will be
dropped.


=== Maximal sharding ===

Currently, Precompile parses Java into ASTs and runs
generators. CompilePerms then runs one copy for each permutation, in
parallel. Each instance optimizes the AST for one permutation and then
converts it into JavaScript plus some additional artifacts. Finally, Link
takes the JavaScript and all the produced artifacts, runs the individual
linkers, and produces the final output. In summary, the three stages are:

current Precompile:

   - parse Java and run generators
   - output: number of permutations, AST, generated artifacts

current CompilePerms:

   - input: permutation id, AST
   - compile one permutation to JavaScript
   - output: JavaScript, generated artifacts

current Link:

   - input: JavaScript from all permutations, generated artifacts
   - run linkers on all artifacts
   - emit EmittedArtifacts into the final output

With maximal sharding, Precompile does no work except to count the number of
permutations. Each CompilePerms instance parses the Java into ASTs, runs
generators, and optimizes for a specific permutation. Additionally,
each CompilePerms instance also runs the shardable part of linkers on the
results for that permutation. It then thins the artifacts (see below) and
emits them. Finally, Link takes these results from the CompilePerms
instances, runs the final, non-shardable part of each linker, and emits all
the artifacts designated as emitted artifacts.  In summary, the
maximal-sharding staging looks like this:

new Precompile:

   - output: number of permutations

new CompilePerms:

   - input: permutation id
   - compile one permutation to JavaScript, including running generators
   - run the on-shard part of linkers
   - thin down the resulting artifacts, as defined below
   - output: JavaScript and the thinned down set of artifacts

new Link:

   - input: JavaScript and transferable artifacts from each permutation
   - run the final part of linkers, which can add more files to the final
   output
   - output: resulting emitted artifacts


=== Simulated Sharding ===

Simulated sharding uses the in-trunk compiler staging, but runs the linkers
as much as possible as if they were using the maximal sharding staging. The
sequence is the same whether the Compiler entry point is used or the
Precompile/CompilePerms/Link trio of entry points is used. Under
simulated sharding, the Precompile and CompilePerms steps run exactly as in
trunk. The Link stage, however, runs the linkers in a careful order so as to
use the sharded API for those linkers that have been updated:

   - For each compiled permutation,