On Thu, Mar 15, 2012 at 7:46 PM, Allison Randal <[email protected]> wrote:
> Think of Erlang. Every subroutine is a safe point to split off parallel
> execution, because every subroutine is a self-contained unit. This is
> absolutely critical in moving toward modern concurrent implementations.
> They can handle things like data-parallelism in the background
> automatically. It doesn't make sense to jettison one of Parrot's best
> features and take a step backward toward stack-like dispatch.

I'm apprehensive about bacek's proposal, but I might not know enough
about his plan to really judge it. It's hard to say that CPS is one of
Parrot's strong points because we still don't implement it in a fully
leveraged, completely symmetric way. The idea that any Sub invocation
can be boxed up and dispatched to a different thread is a very
important part of the threading work that nine has been doing, and I
wouldn't want to do anything that damages the progress he has made. If
bacek says we can have speedups without sacrificing important
functionality, I'm inclined to trust him and see what he comes up with.

>>> The main solution for the performance problem is to replace the GC with
>>> a reasonably performant modern implementation. Another improvement would

I don't think GC is a major bottleneck anymore, at least not to the
magnitude that it used to be. The case can definitely be made that PMC
allocation and initialization are too slow, but GC (mark and sweep) is
not a problem right now. Most problems we have with GC have more to do
with the volume of allocated PMCs than with the underlying algorithm.
Allocating PMC headers and PMC data structures separately, from two
separate pools, has drawbacks.

We already try to reuse CallContext PMCs between the call and the
return of a sub invocation. If we keep a pool of them around we can try
to reuse them more often than that. We already cache and attempt to
reuse register frames by size. More caching and reusing is probably a
good idea.

> Something of a tangent, but how much of Parrot's current dispatch does
> 6model use? Anything? Parrot currently has a pile of pretty expensive
> corner cases baked into dispatch that were added for Perl 6. But, if
> Perl 6 isn't using them anymore, then ripping them out could give Parrot
> some substantial speed gains (and improve maintainability at the same
> time). The current multiple dispatch plumbing is a good example.
> It was designed for Perl 6, but AFAIK, Perl 6 doesn't use it anymore.

Perl 6 does use its own dispatcher, so there is a chance that we can
rip out some bits of our dispatcher that Perl 6 no longer relies on.
For instance, we now have a get_context_p opcode, which can get a call
context much more quickly than a get_params with :call_context.
Ripping out :call_context (which was never fully implemented anyway)
will be a small start. :named :optional and :named :slurpy args are
also much more expensive than many other arrangements. Of course,
ripping those things out does start to eat away at core dispatch
functionality, and we don't do that just for fun.

I'm going off on a tangent, I know. Going through PCC and looking for
features that are no longer worth the cost of supporting would be a
good exercise.

>> No. Current approach is exactly this. And it's slow. Twice slower for
>> the record. Because in 99% of the cases we are calling GC _twice_ to
>> allocate CallContext.

Twice for what, the CallContext and the hash for named args? I'm not
sure how we expect to get much faster here. Copying a pointer to a
register is just as expensive as copying the contents of that
register. Rearranging the caller's register frame to make for easy
access by the callee is just as expensive as unpacking contents out in
the callee. Again, if bacek says it's possible, I trust that it is.

A more constructive starting point, in my mind, is to go through our
list of features and supported behaviors and start cutting out things
which cost more than they are worth. When we have fewer requirements
to meet, we will be much more free to rearrange the core algorithms.

>> Anyway, current PCC approach is wrong from the beginning. We always
>> doing marshalling/demarshalling of arguments for all calls. And it's
>> _slow_. Really really slow.

I would really like to see a breakdown of the costs involved.
If we turn GC mark/sweep off, what are the relative costs of
CallContext allocation and initialization, marshalling caller args,
demarshalling callee params, and resetting the CallContext to prepare
for the return? A comparison of these things will help to inform our
decisions.

> I'm all for speeding things up. And I'll be the first to admit that
> Parrot's current dispatch system was only intended as a "temporary"
> partial fix of the old dispatch system (which was a horrible mass of
> spaghetti code.) But further fixes need to be based on profiling data,
> and not sacrifice Parrot's key competitive features.

If that's true, I'm not sure I've ever seen what the long-term,
non-temporary plan was supposed to be. I've got plenty of long-term
plans of my own, but I developed those plans privately, long after the
initial PCC refactors. If other people have other ideas for the long
road to follow, I would be very interested to hear them.

--Andrew Whitworth
_______________________________________________
http://lists.parrot.org/mailman/listinfo/parrot-dev
