On Apr 10, 2013, at 1:15 PM, Filip Pizlo <fpi...@apple.com> wrote:

> On Apr 10, 2013, at 12:37 PM, Dirk Pranke <dpra...@chromium.org> wrote:
>
>> On Wed, Apr 10, 2013 at 7:37 AM, Filip Pizlo <fpi...@apple.com> wrote:
>>
>> On Apr 10, 2013, at 2:29 AM, "Pozdnyakov, Mikhail" <mikhail.pozdnya...@intel.com> wrote:
>>
>> >> Why not infer ParallelArrays automatically?
>> > Sorry, did not get it. Could you please elaborate?
>>
>> Normal JS arrays. You use them with pure computation. Profiler notices this. And then the same things that you have in the ParallelArrays proposal now work except you don't need a new type.
>>
>> Compiler writers have been chasing this problem for 50 years, with (AFAIK) very limited success for languages that don't give you compile-time annotations to help you along.
>
> Let's demystify this a bit. It's easy to say "OMG they did things for 50 years", and we're all guilty of saying things like this sometimes, but it's a bit more interesting to identify what these people did for those 50 years, where they got, and what the remaining problems are. It is already possible to autoparallelize loops, but there are challenges that remain that make it impractical for "real code". The problems are with:
>
> A) Identifying loops that are candidates for sound parallelization. In formal-speak they look for the lack of "loop-carried side-effects", i.e. something you'd do in one iteration that would be visible in a subsequent iteration - this represents a dependency that inhibits parallelization. But if you prove that no loop-carried side-effects exist, then you're off to a good start: you know that you could fork off some threads to run the loop's body in parallel, and the program would still be guaranteed to get identical results. (I'm oversimplifying a bit - if you tried to auto-parallelize a loop in a program that already had threads, you'd be in a world of hurt from a memory model standpoint - but we don't have that problem in JS because all code in JS is semantically serial, and so the parallelization you did wouldn't interfere with existing uses of threads, since there aren't any.)
>
> B) Identifying loops that are candidates for profitable parallelization. Spinning up a thread carries a fixed cost. If you find a loop with no loop-carried side-effects, but that loop actually only executes for a short time, parallelizing it will be a performance regression - and likely a steep one at that. Finding which loops are profitable to parallelize is thought to be a hard problem. The traditional attempts to solve it centered around making threads cheaper (i.e. Cilk, CML, and similar), or having stronger inference of profitability (I think this is what http://ciao-lang.org does), or pushing parallelizability hints into the language (X10, OpenMP, and these ParallelArrays).
>
> (A) is not a problem that is solved by ParallelArrays. ParallelArrays guarantee sequential equivalence; i.e. ParallelArray.prototype.forEach() will behave like Array.prototype.forEach() if the closure you pass it has the equivalent of "loop-carried side-effects".
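To make "loop-carried side-effects" concrete, here is a hypothetical sketch in plain JS (the data and callbacks are invented for illustration): the first callback does pure element-wise work and is the kind of loop a compiler could soundly parallelize, while the other two carry state from one iteration to the next and would have to stay serial.

    var values = [1, 2, 3, 4];

    // Parallelizable in principle: each iteration only reads its own element
    // and produces its own result; no iteration can observe another's work.
    var squared = values.map(function (x) { return x * x; });

    // Loop-carried side-effect: `sum` written in one iteration is read by the
    // next - the kind of dependency the analysis described above has to reject.
    var sum = 0;
    values.forEach(function (x) { sum += x; });

    // Same problem in disguise: pushing to a shared array (or storing to a
    // global) is visible to later iterations and defeats the inference.
    var log = [];
    values.forEach(function (x) { log.push(x * x); });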
Sorry, to clarify: ParallelArray.prototype.forEach() will _always_ behave identically to Array.prototype.forEach(), and it is only if the runtime/compiler separately proves that there are no loop-carried side-effects that you get parallelism. Otherwise you just get a slower and more bulky implementation of forEach().

> I.e. if you believe that (A) is not a solvable problem, then this constitutes an argument against ParallelArrays, and not against inferring that a normal array should behave like a ParallelArray. If you can do the inference that allows ParallelArray.prototype.forEach() to soundly parallelize, then you can do that inference for Array.prototype.forEach(), as well. I happen to think that (A) is a big problem with ParallelArrays; I don't like it that these guys are proposing a language construct called "ParallelArray" but then secretly they will probably not parallelize a ton of code because the user didn't realize that storing to a global variable shafts their inference for side-effects.
>
> (B) is a problem that ParallelArrays attempt to solve. But here's what we know about the different approaches to solving (B):
>
> - Making threads faster hasn't really worked. Cilk's threads are fast enough that if you use them yourself, manually, you'll be really happy, sort of, except that it sort of sometimes doesn't work like you'd expect because you've got user-level scheduling - but the point is, they're not fast enough that you could spin them up for _any_ loop in a program. You'd get a regression if you did.
>
> - Stronger inference of profitability is a promising area, which isn't adequately explored, IMO. The approaches with which I'm most familiar focus too much on static inference.
>
> - Adding hints to the language results in programming models that nobody uses. Have you ever used OpenMP? Probably not, and if you have, you probably wanted to puke all over yourself. It's a disgusting programming model. There's something offensive about writing what looks like a serial loop, and then adding decorations to it to "declare" to the compiler the set of things it could do to the loop. Eventually these declarations get so sophisticated that you realize that you made a poor life decision by using them - you could have been more productive if you wrote the thread code yourself. There is a similar, though lesser, problem with X10; in X10 at least you can do like in Cilk and just spin up a bunch of light-weight threads directly. But even X10 has so far had only limited success, and a bunch of super smart people have been working on it for a decade. In short, hardly anybody uses these programming hints. I'm opposed to adding such a hint-based model to JS because we already know that other such similar models are used by almost nobody - these are niche programming models that you don't want to pick up unless you've literally run out of all other sensible options.
>
> So the most promising thing appears to be strengthening automagic inference, which brings me to:
>
>> I'm not aware of any work on JIT compilers doing this.
>
> Neither am I!
>
>> Why do you think we could do better?
>
> Because we're doing it in a JIT compiler. The problem with (B) is that you don't know, in a static ahead-of-time compiler, how long a loop will execute. And even if you did have a guess, if your guess turned out to be wrong and you emitted code to call pthread_create or similar, you'd get really terrible performance.
> This is why traditional autoparallelization techniques fail in static compilers: static compiler writers, like all other performance geeks, don't like to add features that allege to improve performance but in fact regress performance.
>
> But in a JIT, we already have to do two things that the static people can't *quite* do:
>
> 1) Runtime feedback-driven decision making that is acutely aware of the running time of loops. Both JSC and V8 do already know your average trip count through any loop; they may not know it explicitly (i.e. for example in JSC we don't have a data structure that will immediately tell you this information) but we do know it implicitly (we happen to already have a loop trip counter for deciding when to tier up, even though that counter is currently not designed for the level of accuracy that autoparallelization would need). Point is, if we wanted to have the level of granularity of understanding of loop execution times that is necessary for inferring when it is profitable to parallelize a loop that you've already proven to be sound to parallelize, then we could add it, at not much cost - we'd likely somehow hijack existing loop profiling functionality.
>
> 2) JITs are designed around sometimes making dumb decisions, bailing out when we detect that we've made a dumb decision, reconstituting the state of the world so that we're in a state "as if" we had never made that decision, recording that we were dumb to have made that decision, and then trying again without making that same dumb decision. This closed-loop feedback is what allows us to infer types, reshape objects, make arithmetic fast, etc. We - both us here in JSC and those V8 dudes - do it all the time, because we're awesome at our jobs. So, if we had some statistical profiling that told us that it's likely that a loop takes a long time, we could add guards that assert that this forever continues to be true; if it turns out to be false then we deoptimize back to a serial world. Conversely, if we had some profiling that told us that it is _NOT_ profitable to parallelize a loop, we could similarly guard this and "deoptimize" (more like *optimize*) into a parallel loop mode.
>
> The irony here is that the overheads we already pay in JITs to enable even basic type inference already give us most of the functionality we would need to detect when it is profitable to autoparallelize.
>
> This, to me, appears like a comprehensive solution to the profitability problem. It still leaves the soundness problem. Here we have some help from the JIT-crazy as well: for example we wouldn't have to *prove* that a loop has no carried side-effects; we could simply *speculate* that it doesn't and insert appropriate checks. This is easier. I don't know if it's easy and practical enough to put into a browser, though, which brings me to the central point:
>
> If it is impractical to infer that a forEach closure passed to ParallelArray is safe-to-parallelize, then it follows that ParallelArrays are impractical, and we should R- this whole programming model and look for something else - like adding threads and locks to JS and calling it a day. But if inferring that a forEach closure is safe-to-parallelize is doable, then it is _no_more_ doable in a world where you have the ParallelArray construct - since that construct only helps with the profitable-to-parallelize part of the equation. We could already do it for any closure, or any loop body.
> And then we could use existing JIT madness to take care of the profitability question.
>
>> (Honest question, I'm not trying to be snarky, nor am I an expert.)
>
> Hope my answers help! :-)
>
> Anyway, I was the one being snarky here. ;-) I don't actually think that this is a good feature for the web. It would be better to just give JS a more general-purpose concurrency construct - something that would work even if you *did* have side effects! I like threads and locks, but then again, I've been told that I'm insane for liking them. But I do know that to do the work to infer the safe-to-parallelize bit that ParallelArrays need would be a *huge* opportunity cost for the project - and if it was successful then it would be better to go all the way and just infer the profitable-to-parallelize bit as well. It's better to infer things for the programmer than to have the programmer tell you things, if at all possible!
>
> -Filip
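To connect the two halves of the argument above - profitability profiling and speculate-then-deoptimize - here is a minimal, hypothetical sketch of the decision logic in plain JS. Everything in it is invented for illustration: in reality this logic lives inside the engine, the threshold would come from real profiling, and the "parallel" helper stands in for handing chunks to actual threads, which user-level JS cannot do.

    // Invented tuning constant: below this trip count, parallelism is assumed
    // not to pay for its fixed startup cost.
    var PROFITABILITY_THRESHOLD = 1 << 16;

    // Stand-in for the engine's parallel loop: it just walks the array in
    // chunks serially, because user-level JS has no shared-memory threads.
    // A real engine would hand each chunk to a worker thread.
    function runChunksAsIfParallel(array, body) {
        var CHUNK = 1024;
        for (var start = 0; start < array.length; start += CHUNK) {
            var end = Math.min(start + CHUNK, array.length);
            for (var i = start; i < end; i++) body(array[i], i);
        }
    }

    function profiledForEach(array, body) {
        // Profitability speculation: short loops stay serial; the trip count
        // plays the role of the loop counters the tiering heuristics keep.
        if (profiledForEach.speculationFailed || array.length < PROFITABILITY_THRESHOLD) {
            for (var i = 0; i < array.length; i++) body(array[i], i);
            return;
        }
        // Soundness speculation: assume the body has no loop-carried
        // side-effects and guard that assumption; a guard failure is crudely
        // modeled here as the body throwing. A real engine would also have to
        // reconstitute state "as if" the attempt never happened, which this
        // sketch does not attempt.
        try {
            runChunksAsIfParallel(array, body);
        } catch (speculationFailed) {
            profiledForEach.speculationFailed = true;  // remember; stay serial from now on
            for (var j = 0; j < array.length; j++) body(array[j], j);
        }
    }

Nothing here should be read as how JSC or V8 would actually implement this; it is only meant to show that the profitability and soundness speculations are the same kind of guard-and-bail-out machinery the quoted message describes for type inference.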
_______________________________________________
webkit-dev mailing list
webkit-dev@lists.webkit.org
https://lists.webkit.org/mailman/listinfo/webkit-dev