Re: [gwt-contrib] RFC: sharded linking
On 2010-02-12, at 1:15 PM, Ray Cromwell wrote:

> On Thu, Feb 11, 2010 at 4:43 PM, Scott Blum sco...@google.com wrote:
>> - I dislike the whole transition period followed by having to forcibly update all linkers, unless there's a really compelling reason to do so.
>
> In general, I'd agree, but the number of linkers in the wild appears to be small; this may be a case of trying to preserve an API that only 5 or 10 people in the world are using.

+1. I've written a handful of custom linkers (including one in the public gwt-firefox-extension project), but I'm used to updating them between GWT releases to work around subtle changes in the linker contract (i.e., the evolution of hosted mode, various global variable changes, etc.). I'd rather have a clean linker system that changes from version to version than an awkward one with a lot of legacy interfaces.

Matt.

--
http://groups.google.com/group/Google-Web-Toolkit-Contributors
Re: [gwt-contrib] RFC: sharded linking
On Thu, Feb 11, 2010 at 7:43 PM, Scott Blum sco...@google.com wrote:

> I have a few comments, but first I wanted to raise the point that I'm not sure why we're having this argument about maximally sharded Precompiles at all. For one thing, it's already implemented, and optional, via -XshardPrecompile. I can't think of any reason to muck with this, or why it would have any relevance to sharded linking. Can we just table that part for now, or is there something I'm missing?

There are still two modes, but there's no more need for an explicit argument. For Compiler, precompile is never sharded. For the three-stage entry points, full sharding happens iff all linkers are shardable.

> - I'm not sure why development mode wouldn't run a sharded link first. Wouldn't it make sense if development mode works just like production compile, it just runs a single development mode permutation shard link before running the final link?

Sure, we can do that. Note, though, that they will be running against an empty ArtifactSet, because there aren't any compiles for them to look at. Thus, they won't typically do anything.

> 2) Instead of trying to do automatic thinning, we just let the linkers themselves do the thinning. For example, one of the most serialization-expensive things we do is serialize/deserialize symbolMaps. To avoid this, we update SymbolMapsLinker to do most of its work during sharding, and update IFrameLinker (et al) to remove the CompilationResult during the sharded link so it never gets sent across to the final link.

In addition to the other issues pointed out, note that this adds ordering constraints among the linkers. Any linker that deletes something must run after every linker that wants to look at it. Your example wouldn't work as is, because it would mean no POST linker can look at CompilationResults. It also wouldn't work to put the deletion in a POST linker, for the same reason.

We'd have to work out a way for the deletions to happen last, after all the normal link activity. Suppose, continuing that idea, we add a POSTPOST order that is used only for deletion. If it's really only for deletion, then the usual link() API is overly general, because it lets linkers both add and remove artifacts during POSTPOST, which is not desired. So, we want a POSTPOST API that is only for deletion: linkers somehow or another mark artifacts for deletion, but cannot do anything else. At this point, though, isn't it pretty much the same as the automated thinning in the initial proposal?

> The pros to this idea are (I think) that you don't break anyone... instead you opt-in to the optimization. If you don't do anything, it should still work, but maybe slower than it could.

The proposal that started this thread also does not break anyone.

Lex

--
http://groups.google.com/group/Google-Web-Toolkit-Contributors
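To make the "deletion-only POSTPOST" idea concrete, here is a minimal toy sketch of what such a restricted API could look like. All names here (ThinningHook, markDeletions, runThinning) are hypothetical and invented for illustration; nothing like this exists in GWT as-is. The key property is that hooks can only mark artifacts for removal, and all deletions are applied after every hook has seen the full artifact set, which is why the ordering constraints disappear:

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical deletion-only thinning hook: a linker returns the subset
// of artifacts it wants removed. It cannot add anything, because the
// driver only ever deletes what the hook returns.
interface ThinningHook {
    Set<String> markDeletions(Set<String> artifacts);
}

public class PostPostSketch {
    // The driver runs every hook against the *full* artifact set, then
    // applies all deletions at once. Because no hook sees a thinned set,
    // the order the hooks run in no longer matters.
    static Set<String> runThinning(Set<String> artifacts, ThinningHook... hooks) {
        Set<String> doomed = new LinkedHashSet<>();
        for (ThinningHook h : hooks) {
            doomed.addAll(h.markDeletions(artifacts));
        }
        Set<String> out = new LinkedHashSet<>(artifacts);
        out.removeAll(doomed);
        return out;
    }

    public static void main(String[] args) {
        Set<String> artifacts = new LinkedHashSet<>(
            java.util.Arrays.asList("CompilationResult", "symbolMap", "app.js"));
        Set<String> thinned = runThinning(artifacts,
            a -> java.util.Collections.singleton("CompilationResult"));
        System.out.println(thinned); // [symbolMap, app.js]
    }
}
```

As the message notes, once the API is restricted this far it is essentially the automated thinning from the initial proposal, just spelled differently.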
Re: [gwt-contrib] RFC: sharded linking
I have a few comments, but first I wanted to raise the point that I'm not sure why we're having this argument about maximally sharded Precompiles at all. For one thing, it's already implemented, and optional, via -XshardPrecompile. I can't think of any reason to muck with this, or why it would have any relevance to sharded linking. Can we just table that part for now, or is there something I'm missing?

Okay, so now on to sharded linking itself. Here's what I love:

- Love the overall goals: do more work in parallel and eliminate serialization overhead.
- Love the idea of simulated sharding because it enforces consistency.
- Love that the linkers all run in the same order.

Here's what I don't love:

- I'm not sure why development mode wouldn't run a sharded link first. Wouldn't it make sense for development mode to work just like production compile, except that it runs a single development-mode permutation shard link before running the final link?
- I dislike the whole transition period followed by having to forcibly update all linkers, unless there's a really compelling reason to do so. Maybe I'm missing some use cases, but I don't see what problems result from having some linkers run early and others run late. As Lex noted, all the linkers are largely independent of each other and mostly won't step on each other's toes.
- It seems unnecessary to have to annotate Artifacts to say which ones are transferable, because I thought we already mandated that all Artifacts have to be transferable.

I have in mind a different proposal that I believe addresses the same goals, but in a less disruptive fashion. Please feel free to poke holes in it:

1) Linker was made an abstract class specifically so that it could be extended later. I propose simply adding a new method linkSharded() with the same semantics as link(). Linkers that don't override this method would simply do nothing on the shards and possibly lose out on the opportunity to shard work. Linkers that can effectively do some work on shards would override this method to do so. (We might also have a relinkSharded() for development mode.)

2) Instead of trying to do automatic thinning, we just let the linkers themselves do the thinning. For example, one of the most serialization-expensive things we do is serialize/deserialize symbolMaps. To avoid this, we update SymbolMapsLinker to do most of its work during sharding, and update IFrameLinker (et al) to remove the CompilationResult during the sharded link so it never gets sent across to the final link.

The pros to this idea are (I think) that you don't break anyone... instead you opt in to the optimization. If you don't do anything, it should still work, but maybe slower than it could. The cons are... well, maybe it's too simplistic and I'm missing some of the corner cases, or ways this could break down.

Thoughts?
Scott

--
http://groups.google.com/group/Google-Web-Toolkit-Contributors
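The linkSharded() idea above can be sketched in miniature. The types below are simplified stand-ins, not GWT's real API (the actual com.google.gwt.core.ext.Linker works with TreeLogger, LinkerContext, and ArtifactSet); the point is just the shape of the proposal: a new method with a no-op default, so existing linkers keep working, and an opt-in override that does per-shard work and drops heavy artifacts before serialization:

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Simplified stand-ins for GWT's linker types, for illustration only.
class Artifact {
    final String name;
    Artifact(String name) { this.name = name; }
}

abstract class Linker {
    // Existing contract: runs once, at the final link.
    abstract Set<Artifact> link(Set<Artifact> artifacts);

    // Proposed addition: same semantics as link(), but runs once per
    // permutation shard. The default is a no-op pass-through, so linkers
    // that don't override it still work, just without the sharded speedup.
    Set<Artifact> linkSharded(Set<Artifact> artifacts) {
        return artifacts;
    }
}

// A linker that opts in: it does per-permutation work on the shard and
// removes a heavyweight artifact so it is never serialized to the Link node.
class ThinningLinker extends Linker {
    @Override
    Set<Artifact> linkSharded(Set<Artifact> artifacts) {
        Set<Artifact> out = new LinkedHashSet<>();
        for (Artifact a : artifacts) {
            if (!a.name.equals("CompilationResult")) {
                out.add(a);
            }
        }
        return out;
    }

    @Override
    Set<Artifact> link(Set<Artifact> artifacts) {
        return artifacts; // final-link work would go here
    }
}

public class LinkShardedSketch {
    public static void main(String[] args) {
        Set<Artifact> shard = new LinkedHashSet<>();
        shard.add(new Artifact("CompilationResult"));
        shard.add(new Artifact("symbolMap"));
        Set<Artifact> thinned = new ThinningLinker().linkSharded(shard);
        System.out.println(thinned.size()); // 1: the heavy artifact is gone
    }
}
```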
[gwt-contrib] RFC: sharded linking
This is a design doc about speeding up the link phase of GWT. If you don't maintain a linker, and if you don't have a multi-machine GWT build, then none of this should matter to you. If you do maintain a linker, let's make sure your linker can be updated with the proposed changes. If you do have a multi-machine build, or if you have some ideas about them, then perhaps you can help us get the best speed benefit possible out of this.

I want to speed up linking for multi-machine builds in two ways:

1. Allow more parts of linking to run in parallel. In particular, anything that happens once per permutation and does not need information from other permutations can run in parallel. As an example, the iframe linker chunks the JavaScript of each permutation into multiple script tags. That work can happen in parallel once the linker API supports it.

2. Link does a lot of Java serialization for its artifacts, but the majority of the artifacts in a compile are emitted artifacts that have no structure. They are just a named bag of bits, from the compiler's perspective. It would help if such artifacts did not need a round of Java serialization on the Link node and could instead be bulk copied.

=== Transition ===

The compiler will support two compilation modes: maximal sharding and simulated sharding. Maximal sharding is used when all linkers support it and the Precompile/CompilePerms/Link entry points are used. Simulated sharding is used either when some linker can't shard or when the Compiler entry point is used.

Linkers individually indicate whether they implement the sharding or non-sharding API. This allows linkers to be updated one by one and to leave the non-sharding API behind once they do. It does not cause trouble with other linkers, because in practice linkers are highly independent. I've looked at as many linkers as I could find to verify this. Occasionally one linker depends on another; in such a case they'll have to be updated in tandem, but the need for that should be rare.

By default, a linker is assumed to want the legacy non-sharding API. For such linkers, it isn't safe to assume that the linker or its associated artifacts can be safely serialized and then deserialized on a different computer. The non-sharding API will be deprecated. After the sharding API has been out for one GWT release cycle, support for non-shardable linkers will be dropped.

=== Maximal sharding ===

Currently, Precompile parses Java into ASTs and runs generators. CompilePerms then runs one copy for each permutation, in parallel. Each instance optimizes the AST for one permutation and then converts it into JavaScript plus some additional artifacts. Finally, Link takes the JavaScript and all the produced artifacts, runs the individual linkers, and produces the final output. In summary, the three stages are:

current Precompile:
- parse Java and run generators
- output: number of permutations, AST, generated artifacts

current CompilePerms:
- input: permutation id, AST
- compile one permutation to JavaScript
- output: JavaScript, generated artifacts

current Link:
- input: JavaScript from all permutations, generated artifacts
- run linkers on all artifacts
- emit EmittedArtifacts into the final output

With maximal sharding, Precompile does no work except to count the number of permutations. Each CompilePerms instance parses Java ASTs, runs generators, and optimizes for a specific permutation. Additionally, each CompilePerms instance also runs the shardable part of linkers on the results for that permutation. It then thins the artifacts (see below) and emits them. Finally, Link takes these results from the CompilePerms instances, runs the final, non-shardable part of each linker, and emits all the artifacts designated as emitted artifacts.
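The maximal-sharding flow can be modeled as a toy pipeline. This is an illustrative sketch only: artifacts are plain strings, the stage names mirror the doc, and a parallel stream stands in for independent CompilePerms machines; the real stages exchange serialized artifact sets:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

// Toy model of the proposed maximal-sharding staging.
public class ShardingStages {
    // new Precompile: only counts permutations (3 is an arbitrary example).
    static int precompile() { return 3; }

    // new CompilePerms: compile one permutation, run the on-shard part of
    // the linkers, and thin the artifacts before shipping them onward.
    static List<String> compilePerm(int permId) {
        List<String> artifacts = new ArrayList<>();
        artifacts.add("js-" + permId);
        artifacts.add("symbolMap-" + permId);        // produced on-shard
        artifacts.add("CompilationResult-" + permId);
        // thinning: drop the heavyweight artifact before serialization
        artifacts.removeIf(a -> a.startsWith("CompilationResult"));
        return artifacts;
    }

    // new Link: merge per-permutation results and run the final,
    // non-shardable part of each linker.
    static List<String> link(List<List<String>> shardResults) {
        return shardResults.stream().flatMap(List::stream)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        int perms = precompile();
        // CompilePerms instances are independent, so they can run in
        // parallel; a parallel stream stands in for separate build nodes.
        List<List<String>> shards = IntStream.range(0, perms).parallel()
                .mapToObj(ShardingStages::compilePerm)
                .collect(Collectors.toList());
        System.out.println(link(shards).size()); // 6: two artifacts per perm
    }
}
```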
In summary, the maximal-sharding staging looks like this:

new Precompile:
- output: number of permutations

new CompilePerms:
- input: permutation id
- compile one permutation to JavaScript, including running generators
- run the on-shard part of linkers
- thin down the resulting artifacts, as defined below
- output: JavaScript and the thinned-down set of artifacts

new Link:
- input: JavaScript and transferable artifacts from each permutation
- run the final part of linkers, which can add more files to the final output
- output: resulting emitted artifacts

=== Simulated Sharding ===

Simulated sharding uses the in-trunk compiler staging, but runs the linkers as much as possible as if they were using the maximal sharding staging. The sequence is the same whether the Compiler entry point is used or the Precompile/CompilePerms/Link trio of entry points is used. Under simulated sharding, the Precompile and CompilePerms steps run exactly as in trunk. The Link stage, however, runs the linkers in a careful order so as to use the sharded API for those linkers that have been updated:

- For each compiled permutation,