On Apr 10, 2013, at 1:15 PM, Filip Pizlo <fpi...@apple.com> wrote:

> 
> On Apr 10, 2013, at 12:37 PM, Dirk Pranke <dpra...@chromium.org> wrote:
> 
>> 
>> 
>> 
>> On Wed, Apr 10, 2013 at 7:37 AM, Filip Pizlo <fpi...@apple.com> wrote:
>> 
>> 
>> On Apr 10, 2013, at 2:29 AM, "Pozdnyakov, Mikhail" 
>> <mikhail.pozdnya...@intel.com> wrote:
>> 
>> >
>> >
>> >> Why not infer ParallelArrays automatically?
>> > Sorry, did not get it. Could you please elaborate?
>> 
>> Normal JS arrays. You use them with pure computation. Profiler notices this. 
>> And then the same things that you have in the ParallelArrays proposal now 
>> work except you don't need a new type.
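>> 
>> For example (hypothetical code; I'm just using the ParallelArray shape from 
>> the proposal):
>> 
>>     // With the proposed type, you opt in to parallelism explicitly:
>>     var pa = new ParallelArray([1, 2, 3, 4]);
>>     var doubled = pa.map(function (x) { return x * 2; });  // pure callback
>> 
>>     // The suggestion is that a profiler could notice the same purity here
>>     // and parallelize a plain array, with no new type at all:
>>     var doubled2 = [1, 2, 3, 4].map(function (x) { return x * 2; });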
>> 
>> 
>> Compiler writers have been chasing this problem for 50 years, with (AFAIK) 
>> very limited success for languages that don't give you compile-time 
>> annotations to help you along.
> 
> Let's demystify this a bit.  It's easy to say "OMG they did things for 50 
> years", and we're all guilty of saying things like this sometimes, but it's 
> a bit more interesting to identify what these people did for those 50 years, 
> where they got, and what the remaining problems are.  It is already possible 
> to autoparallelize loops, but there are remaining challenges that make it 
> impractical for "real code".  The problems are with:
> 
> A) Identifying loops that are candidates for sound parallelization.  In 
> formal-speak they look for the lack of "loop-carried side-effects", i.e. 
> something you'd do in one iteration that would be visible in a subsequent 
> iteration - this represents a dependency that inhibits parallelization.  But 
> if you prove that no loop-carried side-effects exist, then you're off to a 
> good start: you know that you could fork off some threads to run the loop's 
> body in parallel, and the program would still be guaranteed to get identical 
> results.  (I'm oversimplifying a bit - if you tried to auto-parallelize a 
> loop in a program that already had threads, you'd be in a world of hurt from 
> a memory model standpoint - but we don't have that problem in JS because all 
> code in JS is semantically serial and so the parallelization you did wouldn't 
> interfere with existing uses of threads, since there aren't any.)
> 
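> To make the term concrete, here is a hypothetical sketch (src, dst, and f 
> are made up for illustration):
> 
>     var src = [1, 2, 3, 4], dst = new Array(src.length), n = src.length;
>     function f(x) { return x * 2; }  // some pure function
> 
>     // No loop-carried side-effects (assuming f is pure and src and dst
>     // don't alias): each iteration only reads src[i] and writes dst[i],
>     // so the iterations could run in parallel and give the same answer.
>     for (var i = 0; i < n; ++i)
>         dst[i] = f(src[i]);
> 
>     // Loop-carried side-effect: iteration i reads what iteration i - 1
>     // wrote, so the iterations have to run in order.
>     for (var i = 1; i < n; ++i)
>         dst[i] = dst[i - 1] + src[i];
> 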
> B) Identifying loops that are candidates for profitable parallelization.  
> Spinning up a thread carries a fixed cost.  If you find a loop with no 
> loop-carried side-effects, but that loop actually only executes for a short 
> time, parallelizing it will be a performance regression - and likely a steep 
> one at that.  Finding which loops are profitable to parallelize is thought to 
> be a hard problem.  The traditional attempts to solve it centered around 
> making threads cheaper (e.g. Cilk, CML, and similar), or having stronger 
> inference of profitability (I think this is what http://ciao-lang.org does), 
> or pushing parallelizability hints into the language (X10, OpenMP, and these 
> ParallelArrays).
> 
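> For example, something like this hypothetical loop is sound to parallelize 
> but almost certainly not profitable:
> 
>     // 16 iterations of cheap arithmetic - nanoseconds of work in total.
>     // Forking even one worker costs orders of magnitude more than the
>     // whole loop, so "parallelizing" this would be a big regression.
>     var src = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15];
>     var dst = new Array(src.length);
>     for (var i = 0; i < src.length; ++i)
>         dst[i] = src[i] * 2;
> 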
> (A) is not a problem that is solved by ParallelArrays.  ParallelArrays 
> guarantee sequential equivalence; i.e. ParallelArray.prototype.forEach() will 
> behave like Array.prototype.forEach() if the closure you pass it has the 
> equivalent of "loop-carried side-effects".

Sorry, to clarify: ParallelArray.prototype.forEach() will _always_ behave 
identically to Array.prototype.forEach(); you only get parallelism if the 
runtime/compiler separately proves that there aren't any loop-carried 
side-effects.  Otherwise you just get a slower, bulkier implementation of 
forEach().
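
For example (hypothetical code, following the forEach() semantics described 
above):

    var pa = new ParallelArray([1, 2, 3, 4]);
    var total = 0;
    // The callback writes to a captured variable, so the engine can't prove
    // it free of loop-carried side-effects.  You still get exactly the
    // sequential answer - you just don't get any parallelism out of it.
    pa.forEach(function (x) { total += x; });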

> I.e. if you believe that (A) is not a solvable problem, then this constitutes 
> an argument against ParallelArrays, and not against inferring that a normal 
> array should behave like a ParallelArray.  If you can do the inference that 
> allows ParallelArray.prototype.forEach() to soundly parallelize, then you 
> can do that inference for Array.prototype.forEach(), as well.  I happen to 
> think that (A) is a big problem with ParallelArrays; I don't like that 
> these guys are proposing a language construct called "ParallelArray" that 
> will then quietly fail to parallelize a ton of code because the user 
> didn't realize that storing to a global variable shafts their side-effect 
> inference.
> 
> (B) is a problem that ParallelArrays attempt to solve.  But here's what we 
> know about the different approaches to solving (B):
> 
> - Making threads faster hasn't really worked.  Cilk's threads are fast enough 
> that if you use them yourself, manually, you'll be really happy - sort of, 
> except that it sometimes doesn't work like you'd expect because you've got 
> user-level scheduling - but the point is, they're not fast enough 
> that you could spin them up for _any_ loop in a program.  You'd get a 
> regression if you did.
> 
> - Stronger inference of profitability is a promising area, which isn't 
> adequately explored, IMO.  The approaches with which I'm most familiar focus 
> too much on static inference.
> 
> - Adding hints to the language results in programming models that nobody 
> uses.  Have you ever used OpenMP?  Probably not, and if you have, you 
> probably wanted to puke all over yourself.  It's a disgusting programming 
> model.  There's something offensive about writing what looks like a serial 
> loop, and then adding decorations to it to "declare" to the compiler the set 
> of things it is allowed to do to the loop.  Eventually these declarations get 
> so sophisticated that you realize that you made a poor life decision by using 
> them - you could have been more productive if you wrote the thread code 
> yourself.  There is a similar, though lesser, problem with X10; in X10 at 
> least you can, as in Cilk, just spin up a bunch of light-weight 
> threads directly.  But even X10 has so far had only limited success, and a 
> bunch of super smart people have been working on it for a decade.  In short, 
> hardly anybody uses these programming hints.  I'm opposed to adding such a 
> hint-based model to JS because we already know that similar models are used 
> by almost nobody - these are niche programming models that you don't 
> want to pick up unless you've literally run out of all other sensible options.
> 
> So the most promising thing appears to be strengthening automagic inference, 
> which brings me to:
> 
>> I'm not aware of any work on JIT compilers doing this.
> 
> Neither am I!
> 
>> Why do you think we could do better?
> 
> Because we're doing it in a JIT compiler.  The problem with (B) is that you 
> don't know, in a static ahead-of-time compiler, how long a loop will execute. 
>  And even if you did have a guess, if your guess turned out to be wrong and 
> you emitted code to call pthread_create or similar, you'd get really terrible 
> performance.  This is why traditional autoparallelization techniques fail in 
> static compilers: static compiler writers, like all other performance geeks, 
> don't like to add features that allege to improve performance but in fact 
> regress performance.
> 
> But in a JIT, we already have to do two things that the static people can't 
> *quite* do:
> 
> 1) Runtime feedback-driven decision making that is acutely aware of the 
> running time of loops.  Both JSC and V8 do already know your average trip 
> count through any loop; they may not know it explicitly (for example, in 
> JSC we don't have a data structure that will immediately tell you this 
> information), but we do know it implicitly (we happen to already have a loop 
> trip counter for deciding when to tier up, even though that counter is 
> currently not designed for the level of accuracy that autoparallelization 
> would need).  Point is, if we wanted the granularity of loop-execution-time 
> information necessary for inferring when it is profitable to parallelize a 
> loop that you've already proven sound to parallelize, then we could add it at 
> not much cost - we'd likely hijack existing loop profiling functionality.
> 
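> In cartoon form, the kind of check I mean (the constants, and the fields on 
> loopProfile, are all made up; nothing like this exists in JSC today):
> 
>     // Made-up constants: the fixed cost of spinning up a worker, and how
>     // much headroom we want before taking the risk.
>     var WORKER_SPIN_UP_COST = 50e-6;   // seconds, invented for illustration
>     var MIN_SPEEDUP_MARGIN  = 4;
> 
>     // Per-loop data the profiling tiers would already be accumulating.
>     function profitableToParallelize(loopProfile) {
>         var estimatedWork =
>             loopProfile.averageTripCount * loopProfile.averageIterationCost;
>         // Only worth forking workers if the loop does enough work to
>         // amortize the fixed cost of spinning them up.
>         return estimatedWork > WORKER_SPIN_UP_COST * MIN_SPEEDUP_MARGIN;
>     }
> 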
> 2) JITs are designed around sometimes making dumb decisions, bailing out when 
> we detect that we've made a dumb decision, reconstituting the state of the 
> world so that we're in a state "as if" we had never made that decision, 
> recording that we were dumb to have made that decision, and then trying again 
> without making that same dumb decision.  This closed-loop feedback is what 
> allows us to infer types, reshape objects, make arithmetic fast, etc.  We - 
> both us here in JSC and those V8 dudes - do it all the time, because we're 
> awesome at our jobs.  So, if we had some statistical profiling that told us 
> that it's likely that a loop takes a long time, we could add guards that 
> assert that this forever continues to be true; if it turns out to be false 
> then we deoptimize back to a serial world.  Conversely, if we had some 
> profiling that told us that it is _NOT_ profitable to parallelize a loop, we 
> could similarly guard this and "deoptimize" (more like *optimize*) into a 
> parallel loop mode.
> 
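> Again in cartoon form - this is the shape of the idea, not a design, and 
> PROFITABLE_TRIP_COUNT, runLoopSerially, and runLoopInParallel are all 
> hypothetical helpers:
> 
>     // The speculatively-parallel code keeps a cheap guard at loop entry.
>     function runLoop(src, dst, n) {
>         if (n < PROFITABLE_TRIP_COUNT) {
>             // Our "this loop runs for a long time" assumption failed, so
>             // "deoptimize" back to the plain serial loop (hypothetical
>             // helper).
>             return runLoopSerially(src, dst, n);
>         }
>         // Otherwise carve the iteration space into chunks for the workers
>         // (hypothetical helper).
>         return runLoopInParallel(src, dst, n);
>     }
> 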
> The irony here is that the overheads we already pay in JITs to enable even 
> basic type inference already give us most of the functionality we would need 
> to detect when it is profitable to autoparallelize.
> 
> This, to me, appears like a comprehensive solution to the profitability 
> problem.  It still leaves the soundness problem.  Here we have some help from 
> JIT craziness as well: for example, we wouldn't have to *prove* that a loop 
> has no carried side-effects; we could simply *speculate* that it doesn't and 
> insert appropriate checks.  This is easier.  I don't know if it's easy and 
> practical enough to put into a browser, though, which brings me to the 
> central point:
> 
> If it is impractical to infer that a forEach closure passed to ParallelArray 
> is safe-to-parallelize, then it follows that ParallelArrays are impractical, 
> and we should R- this whole programming model and look for something else - 
> like adding threads and locks to JS and calling it a day.  But if inferring 
> that a forEach closure is safe-to-parallelize is doable, then it is _no more_ 
> doable in a world where you have the ParallelArray construct - since that 
> construct only helps with the profitable-to-parallelize part of the equation. 
>  We could already do it for any closure, or any loop body.  And then we could 
> use existing JIT madness to take care of the profitability question.
> 
>> (Honest question, I'm not trying to be snarky, nor am I an expert).
> 
> Hope my answers help! :-)
> 
> Anyway, I was the one being snarky here. ;-)  I don't actually think that 
> this is a good feature for the web.  It would be better to just give JS a 
> more general-purpose concurrency construct - something that would work even 
> if you *did* have side effects!  I like threads and locks, but then again, 
> I've been told that I'm insane for liking them.  But I do know that doing the 
> work to infer the safe-to-parallelize property that ParallelArrays need would 
> be a *huge* opportunity cost for the project - and if it were successful then 
> it would be better to go all the way and just infer the 
> profitable-to-parallelize bit as well.  It's better to infer things for the 
> programmer, than to have the programmer tell you things, if at all possible!
> 
> -Filip
> 

_______________________________________________
webkit-dev mailing list
webkit-dev@lists.webkit.org
https://lists.webkit.org/mailman/listinfo/webkit-dev
