Implementing junctions
I've been looking at various Perl 6 operators in terms of what their implementation would look like at the Parrot level. Junctions struck me as possibly highly problematic. The issue is: how would one go about compiling them at the point where they get passed to a function call as one of the parameters? For instance, in pugs, this works:

    pugs> my $a = 1|2
    (1 | 2)
    pugs> sin $a
    (0.8414709848078965 | 0.9092974268256817)
    pugs> sub foo() { return 1|2; }
    pugs> foo()
    (1 | 2)
    pugs> sin foo
    (0.8414709848078965 | 0.9092974268256817)

I came up with two ways:

1) Use a Junction PMC that overloads various conversion entries in its vtable. Operators would have to use continuations to loop over the junction's contents. -- This seems a non-starter, as vtable entries can't spawn a continuation that can reenter them. (Right?)

2) Expand explicitly. -- The problem is that you need to insert checks for each parameter before each function call, if a junction can be assigned to a variable as above. This means the Perl 6 calling conventions would have to explicitly typecheck parameters everywhere. This strikes me as much less efficient than, say, checking for ties everywhere -- and ties have to be explicitly declared.

Would it make sense to require that junctional variables *MUST* be declared as such, and likewise for functions that return junctions? A strict reading of S9 seems to imply that this is already the case, but possibly a clarification with some examples might be in order. So the example above would read:

    my Junction $a = 1|2
    sub foo(--> Junction) { return 1|2; }

And I hope this topic hasn't already been rehashed. :)

Miro
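To make option 2 concrete, here is a minimal Python sketch of explicit expansion: the caller checks each argument before the call and re-invokes the function once per junction element. The `Junction` class and `call_threaded` helper are purely illustrative names, not anything from pugs or Parrot.

```python
import math

class Junction:
    """Illustrative stand-in for a Perl 6 any-junction: a value that
    holds several alternatives at once."""
    def __init__(self, *values):
        self.values = tuple(values)
    def __repr__(self):
        return "(" + " | ".join(repr(v) for v in self.values) + ")"

def call_threaded(fn, *args):
    """Explicit expansion (option 2): scan each argument before the
    call and re-invoke the function once per junction element."""
    for i, a in enumerate(args):
        if isinstance(a, Junction):
            return Junction(*(call_threaded(fn, *args[:i], v, *args[i+1:])
                              for v in a.values))
    return fn(*args)          # no junctions left: plain call

a = Junction(1, 2)
print(call_threaded(math.sin, a))   # (0.8414709848078965 | 0.9092974268256817)
```

Note that this is exactly the cost the post worries about: every call site pays an `isinstance`-style check per argument, junction or not.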
Re: Perl 6 Summary for 2005-03-07 through 2005-03-22
[EMAIL PROTECTED] wrote:

>   pugs too lazy
>     Miroslav Silovic noticed that closing a file handle in pugs did
>     not force all the thunks associated with the file. While this was
>     a bug in pugs, it led to conversation about whether = should be
>     lazy or eager. Larry thinks that it will be safer to start eager
>     and become lazy than vice versa.
>     http://xrl.us/fijd

Accused of the original burst of insight -- pleading not guilty! ;) The original post was to perl6-compiler by Andrew Savige.

http://groups-beta.google.com/group/perl.perl6.compiler/browse_frm/thread/2aca524a1203cd41/912bea4d0d05554a?q=surprised#912bea4d0d05554a

Miro
Re: MMD as an object.
[EMAIL PROTECTED] wrote:

> Ok. If you'd really need such random dispatch, it could be done like
> this, when I interpret A12 correctly:
>
>     sub run_random_bar($x) {
>         my @meths = WALKMETH($x, :method('bar'));
>         my $meth  = @meths[rand(@meths.elems)];
>         $meth($x);
>     }
>
> or even with
>
>     my sub bar($x) {...}  # same body as above
>     bar($x);

This is enough to make me completely happy. :)

Miro
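For readers who haven't met WALKMETH, here is a rough Python analogue of the quoted code (the `walkmeth` helper and the classes are mine, purely illustrative): collect every method with a given name along the inheritance chain, then dispatch to one at random.

```python
import random

class A:
    def bar(self): return "A.bar"

class B(A):
    def bar(self): return "B.bar"

def walkmeth(obj, name):
    """Rough analogue of A12's WALKMETH: every method called `name`
    along the inheritance chain, most-derived first."""
    return [cls.__dict__[name] for cls in type(obj).__mro__
            if name in cls.__dict__]

def run_random_bar(x):
    meths = walkmeth(x, "bar")
    return random.choice(meths)(x)   # pick one candidate at random

print(run_random_bar(B()))   # "B.bar" or "A.bar", chosen at random
```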
Re: MMD as an object.
[EMAIL PROTECTED] wrote:

> What about multi subs? They can be defined everywhere. Given:
>
>     multi sub *foo(A $a, B $b) {...}
>
> Is this something like:
>
>     %globals{foo} --> MultiSub{foo_A_B => Sub, ...}
>
> What about a not so global multi:
>
>     multi sub foo(A $a, B $b) {...}
>
> Thanks for clarifying,
> leo

Uh, the real problem is the interaction between multisubs and modules. If you import a multi with the same short name from another module, you get a real issue that needs resolving -- especially if the two do fundamentally different things and you don't expect a clash. Like:

    module Users;
    multi get(Stream $f, UserDescriptor $u) {...}
    ...
    module FileTransfer;
    multi get(Server $s, File $f) {...}

Even if this is easy to fix by renaming, the error message would take a while to track down, and it'd be an annoying distraction during development. I believe the DWIM thing to do would be to merge multis into the same MMD object on module load. This would have to happen at runtime, since that's when module loading can occur.

Miro
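A minimal sketch of the merge-on-load idea in Python; the registry, helpers, and classes are hypothetical stand-ins (mirroring the Users/FileTransfer example), not Parrot's actual MMD machinery:

```python
# Hypothetical shared MMD registry: short name -> {signature: callable}
MULTIS = {}

def register_multi(name, sig, fn):
    """Merge a candidate into the shared MMD object for `name`.
    Called once per multi as its module is loaded; only an exact
    signature collision is an error."""
    table = MULTIS.setdefault(name, {})
    if sig in table:
        raise TypeError("ambiguous multi %s%r" % (name, sig))
    table[sig] = fn

def dispatch(name, *args):
    """Dispatch on the runtime types of the arguments."""
    sig = tuple(type(a) for a in args)
    return MULTIS[name][sig](*args)

class Stream: pass
class UserDescriptor: pass
class Server: pass
class File: pass

# "module Users" loads:
register_multi("get", (Stream, UserDescriptor), lambda f, u: "user record")
# "module FileTransfer" loads -- same short name, different signature,
# so the two multis merge into one MMD object with no clash:
register_multi("get", (Server, File), lambda s, f: "file contents")

print(dispatch("get", Server(), File()))   # file contents
```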
Re: continuation enhanced arcs
[EMAIL PROTECTED] wrote:

> What this means is that care must be taken when you are writing code
> that you expect to be invoked multiple times. However, if you are a
> function that on your second invocation returns via the continuation
> from your first invocation, you should probably expect to be called
> again, because it happened the first time!

The contentious point is precisely that there is no way to tell at compile time that you will be invoked multiple times. If something you pull from a library, or a callback passed to you, captures the continuation, you may notice that your return continuation has been promoted -- but at that point it's already too late to kill all I/N/S register usage and to double-reference all PMCs. Unless, of course, you keep the original AST and keep recompiling it to PIR. I'd argue that that is vastly more complex and error-prone than the solution Leo referred to in his post. It's also no better than refetching all lexicals.

Also, note that return-continuation-restores semantics is simply a (possibly premature) optimisation in its own right. CPS is supposed to treat every continuation the same (which turned out to be inefficient), and return continuations were added to simplify the most common case. So, while a return continuation is unpromoted, it's perfectly okay for it to behave any way it wants. Once it does get promoted, I'd argue that it should behave like a normal continuation: first because it's more practical (see above), and second because that way it doesn't break CPS semantics.

Miro
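To illustrate why a promoted return continuation implies "you may be called again", here is a tiny CPS sketch in Python, where the return continuation is just an ordinary callable that a caller can stash and re-invoke (all names illustrative):

```python
# In CPS, a function never returns; it invokes its continuation k.
# If anything stashes k and calls it again later, the code "after the
# call" runs a second time -- the multiple-return behaviour above.
captured = []

def add_one_cps(x, k):
    """CPS-style function: instead of returning, invoke k(result)."""
    captured.append(k)        # a library sneakily captures the return cont
    k(x + 1)

results = []
add_one_cps(41, results.append)   # normal "return": appends 42
captured[0](100)                  # re-invoke the promoted continuation
print(results)                    # [42, 100]
```

The point of the sketch: `add_one_cps` itself has no way to know, when compiled, whether its continuation will fire once or many times.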
Re: Lexicals, continuations, and register allocation
Leopold Toetsch wrote:

> Sure. But I've no confirmation from a compiler writer that it's
> possible. Annotating PIR can only work for nested closures. If
> libraries are involved you are out of luck. And we have such code
> already in library/Streams/Sub.imc.

I've been thinking about what could be implemented using continuations. Some possible ideas:

- multiple returns from regexp matches
- lazy lists, in particular gather/take
- junctions
- coroutines
- resumable exceptions

The thing is, we can't know what Real World Perl 6 code is going to look like. If it turns out that continuations are The Way to implement both junctions and the gather/take construct, 'out of luck' is a good description of the ensuing fun and merriment. Basically, any PMC can be silently swapped with a junction, possibly making anything that uses it backtrack (what are junctions supposed to be implemented as, anyway?), and I can imagine that lots of production Perl 6 code will have gather/takes all over the place -- they look like an utterly nice way to construct a list.

My point is that continuations are rare because mainstream languages (with a single exception) don't support them. BTW, I wonder if there are any research papers on the topic of register allocation in the presence of continuations. I just looked at CiteSeer, but a quick cursory search didn't come up with anything useful.

Miro
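Of the ideas above, gather/take is the easiest to sketch: Python generators are a restricted form of continuation, but they are enough to show the lazy-list behaviour (a sketch of the semantics, not of how Parrot would implement it):

```python
from itertools import count, islice

# gather/take sketch: `take` is just `yield`; the gather block is a
# generator that runs only as far as the consumer demands, so even an
# infinite loop is safe.
def gather_squares():
    for i in count(1):       # infinite -- fine, because it's lazy
        yield i * i          # take i*i

first_five = list(islice(gather_squares(), 5))
print(first_five)            # [1, 4, 9, 16, 25]
```

Full continuations are strictly more powerful (a generator can only resume at the one point it suspended), which is why gather/take keeps coming up as a continuation use case.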
Re: Exceptions, sub cleanup, and scope exit
Dan Sugalski wrote:

> It's also important for people writing these things to take into
> account the possibility that their exit actions may potentially be
> triggered multiple times, courtesy of the joys of continuations.

Hmm, the first thing to take into account is that return continuations can be promoted to full-blown continuations. This should affect the handlers in the same way -- so exception handlers may have become arbitrary invokable objects by the point the exception is thrown.

Miro
Re: calling conventions, tracebacks, and register allocator
[EMAIL PROTECTED] wrote:

> On Nov 8, 2004, at 11:15 AM, Matt Fowles wrote:
>
>> Dan~
>>
>> On Mon, 8 Nov 2004 13:45:08 -0500, Dan Sugalski [EMAIL PROTECTED] wrote:
>>
>>> The calling conventions and code surrounding them will *not* change
>>> now. When all the sub stuff, and the things that depend on it, are
>>> fully specified and implemented... *then* we can consider changes.
>>> Until then, things stand the way they are.
>>
>> I misunderstood. I thought you were saying that what is currently in
>> is final and will *never* be changed. Thanks for the clarification.
>
> Nevertheless, this is a legitimate topic for discussion, and the
> issues are fresh in people's minds. That's independent of any
> impediments that might block implementing changes at the current time.

I think it'd be a good idea to at least agree on a good TODO list, and commit it to the bugtracker. It may turn out that some changes are fine to delay, while others must be implemented now or never (because, for example, a large amount of compiler code would get broken by them).

Miro
Re: Why is the fib benchmark still slow - part 1
[EMAIL PROTECTED] wrote:

> a) accessing a new register frame and context
> b) during DOD/GC
>
> We have to address both areas to get rid of the majority of cache
> misses.
>
> ad a) For each level of recursion, we are allocating a new context
> structure and a new register frame. Half of these come from the
> recently implemented return continuation and register frame caches.
> The other half has to be freshly allocated. We get L2 cache misses
> for both the context and the register structure on exactly every
> second function call.

Or it would make sense to use multi-frame register chunks. I kept locality of access in mind but somehow never spelled it out. But I *think* I mentioned 64KB as a good chunk size precisely because it fits well into the CPU cache -- without ever specifying this as the reason. Anyway, if you can pop both register frames *and* context structures, you won't run GC too often, and everything will fit nicely into the cache. Is the context structure a PMC now (and does it have to be, if the code doesn't specifically request access to it)?

> ad b) The (currently broken) Parrot setting ARENA_DOD_FLAGS shows one
> possibility to reduce cache misses in DOD. During a sweep (which runs
> through all allocated object memory) the memory itself isn't touched;
> just a nibble per object is used, which holds the relevant
> information like is_live.

Is there a way to find out how many misses come from DOD, compared to register frame allocation? I believe that you shouldn't litter (i.e. create an immediately GCable object) on each function call -- at least not without a generational collector specifically optimised to work with this. That would entail a first generation that fits into the CPU cache, with live objects copied out of it. And that means a copying GC for Parrot, something that (IMHO) would be highly nontrivial to retrofit.

Miro
Re: Why is the fib benchmark still slow - part 1
Leopold Toetsch wrote:

>> I believe that you shouldn't litter (i.e. create an immediately
>> GCable object) on each function call - at least not without a
>> generational collector specifically optimised to work with this.
>
> The problem isn't the object creation per se, but the sweep through
> the *whole object memory* to detect dead objects. It's of course true
> that we don't need the return continuation PMC for the fib benchmark.

Well, creation is also a problem if you crawl the entire free heap before triggering the next GC round. You get a potential cache miss on each creation, on each mark, and on each destruction. To keep GC out of the way, the entire arena has to be confined to cache size or less.

> But a HLL-translated fib would use Integer PMCs for calculations.

Hmm, I'm nitpicking here, but that's not how e.g. Psyco works. It specialises each function to specific argument types and recompiles for each new argument type set. Assuming that you'll call only very few functions with more than 1-2 type combinations, this is a good tradeoff. It also removes a lot of consing, especially for arithmetic.

>> ... That would entail a first generation that fits into the CPU
>> cache, with live objects copied out of it. And that means a copying
>> GC for Parrot, something that (IMHO) would be highly nontrivial to
>> retrofit.
>
> A copying GC isn't really hard to implement. And it has the
> additional benefit of providing better cache locality. Nontrivial to
> retrofit or not, we need a generational GC.

Copying and generational are orthogonal concepts. It's quite possible to have a noncopying gengc (and a nongenerational copying GC, but that's beside the point). That gives you quick mark phases, though without as much gain in locality (I think the Boehm GC can do this).

The problem with a copying GC is that pointers can change under your feet at every opportunity. Embedded C libraries that try to manipulate GCed objects really hate it when that happens -- in particular where some pointers get cached in CPU registers -- and telling the GC to (un)protect a pointer is a chore for C programmers (as bad as manual refcounting). I suppose there are good solutions to this; I'm just not aware of any.

The gain is that you can guarantee that the youngest generation will be no bigger than X KB. That can be a very good thing. However, for the problem at hand -- namely, littering during function calls -- a custom deallocator (that'd be chunks) could be enough. In particular, it makes sense to use one in conjunction with a non-copying gengc.

Miro
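A toy sketch of the "first generation that fits into the CPU cache" idea: a fixed-size nursery whose survivors are copied out (promoted), so per-call litter dies without ever being swept individually. Everything here (the class, its fields, the explicit `roots` set) is illustrative, not Parrot code:

```python
class Nursery:
    """Toy copying first generation. Collection copies the live
    objects out and reclaims the whole nursery by resetting it."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.objs = []
        self.promotions = 0       # objects copied to the old generation

    def alloc(self, obj, roots):
        if len(self.objs) >= self.capacity:
            self.collect(roots)
        self.objs.append(obj)
        return obj

    def collect(self, roots):
        live = [o for o in self.objs if o in roots]  # toy reachability
        self.promotions += len(live)                 # copy-out survivors
        self.objs = []            # reclaim the nursery wholesale

nursery = Nursery(capacity=4)
roots = set()                     # nothing keeps the litter alive
for i in range(10):               # littering: one dead object per "call"
    nursery.alloc(i, roots)
print(nursery.promotions)         # 0 -- all litter died young, no copying
```

This is the property the post argues for: when the litter dies young, collection cost is proportional to the (few) survivors, not to the heap.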
Re: [Summary] Register stacks again
[EMAIL PROTECTED] wrote:

> Could we have the chunks only hold one frame and avoid much of the
> compaction work? If we return to the indirect access mechanism, we
> can switch register frames by changing one pointer. But if we keep
> one frame per chunk, we do not need to compact frames; standard
> DOD/GC will be able to reclaim frames. I recall there being
> efficiency issues with frames being allocated and deallocated too
> frequently, so we could have a special free list for frames. This
> proposal feels to me like a slightly simpler version of yours. Thus I
> would argue for it on the grounds of "do the simple thing first" and
> compare its efficiency.

Well, for code that doesn't do call/cc, bigger chunks mean that you can use them as a classical stack. So you won't ever have to allocate them, and never have to run compaction. For call/cc, you still don't have to compact them as often, since the non-captured continuations will get popped normally, and the watermark lowering will take care of temporarily captured continuations (between two GCs).

Basically, bigger chunks mean that frames are allocated using a special scheme just for them. Considering that you're going to allocate one on each function call, I would agree with LT that the complexity is justified (and it's not too bad -- the way I understand the Parrot internals, which is to say, not too well ;) -- arrays of PMC pointers already get copy-collected, and stack frame chunks are not too different from those).

Miro
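The "special free list for frames" suggestion can be sketched in a few lines of Python (all names hypothetical): fixed-size frames are recycled instead of going back to the general allocator, so a call/return-heavy loop touches the real allocator only once.

```python
class FrameAllocator:
    """Toy free list for fixed-size register frames."""
    FRAME_SLOTS = 4 * 16          # 4 register types x 16 registers

    def __init__(self):
        self.free = []            # recycled frames
        self.fresh = 0            # frames taken from the real allocator

    def alloc(self):
        if self.free:
            return self.free.pop()
        self.fresh += 1           # cache-cold path: truly new memory
        return [None] * self.FRAME_SLOTS

    def release(self, frame):
        # Only legal if no continuation still holds the frame.
        self.free.append(frame)

alloc = FrameAllocator()
for _ in range(1000):             # deep call/return pattern
    f = alloc.alloc()
    alloc.release(f)
print(alloc.fresh)                # 1 -- the same frame is reused throughout
```

The comment in `release` is where the call/cc complication enters: a captured continuation makes it unsafe to recycle eagerly, which is what pushes the design toward watermarked chunks plus GC.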
[Summary] Register stacks again
This is a summary of a private mail conversation between Leo and myself. <lie>No, it didn't start by me forgetting to fix Reply-To when trying to post a follow-up to the list.</lie> ;)

Essentially we whipped up a GC scheme for collecting the register stacks that doesn't make call/cc-using code, well, unusably slow. In addition to LT's original post on the register stack, here's how to allocate them and clean up after their use. LT, feel free to hit me with wet noodles if I forgot anything.

Terminology:
---
A register frame is an actual frame of 4x16 registers. A register chunk is a flat chunk of memory containing multiple register frames. It has a water mark that points to where a new frame should be pushed into the chunk. I'm using "stack" and "chunk" interchangeably.

Frames are subject to DOD, and chunks are subject to GC. There are plenty of tricks that can prevent GC from happening in many cases (read on for details). DOD is necessary anyway (to retrieve the live PMCs pointed to from the register frames). A chunk is pinned if GC currently can't copy it over and kill it (read on for details).

Allocation:
---
Register stacks should be allocated in fairly big chunks. However, since there must be at least one active chunk per thread (and per coroutine), choosing anything actually huge will pose a scaling problem.

Frames are allocated from the current stack, simply by advancing the water mark of the currently active chunk. If this causes the water mark to overflow, a new chunk needs to be allocated. Note that if a continuation has been captured and then invoked, the water mark will not necessarily point at the end of the current frame (since the invoked continuation may keep its registers deeper in the chunk).

Deallocation:
---
A stack frame can only be popped if the current continuation hasn't been captured (from the frame being popped). Here, "pop" means changing both the frame pointer and the watermark. This amounts to adding a flag to the current frame and bumping the flag if the return continuation gets moved into any directly accessible location. If the frame can't be popped, only the frame pointer should be changed to the caller's.

GC:
---
During DOD, the register frames should be marked (as parts of their chunks). Then the dead frames get dealt with in the following manner:

Remove the trailing dead frames from each chunk (by just lowering the water mark). If after this the water mark remains high (e.g. past 50% of the chunk) but more than a certain amount of the chunk is salvageable as dead frames (50% seems like a good number again), the chunk should be copied, all the frame pointers fixed up, and then the chunk gets killed. Essentially the chunks are treated as buffers.

The watermark lowering won't help in cases where continuations get invoked in a round-robin manner (wish I could think of some simple Scheme example here that behaves this way) and start littering the chunk with interlaced dead frames.

Two caveats:

The frame pointer of the currently active frames (there can be more than one due to threads) may be kept in a CPU register and can't be fixed up. So the chunk containing the currently active frame is pinned until it either overflows into another chunk or gets freed by popping.

Chunks can contain reverse pointers to the continuations that use their frames. When copying a frame, just go through these reverse pointers and fix the continuations they point to.

Performance:
---
This scheme requires some GC flags on each frame (as well as the reverse pointers), and possibly also next pointers to the frames, if they are not cut to equal size.

Without continuations, this behaves like C function calling. Nothing will read the return continuation, and so the frames will simply get popped from the stack on return. When continuations begin to prevent popping, the stack will start growing from the last captured continuation (even if it's dead). The watermark will drop in GC if the GC happens to hit while the active frame is way down the stack (i.e. just between two function calls). Otherwise, GC will wait till the chunk overflows (so that the active frame is in a higher chunk) and then copy the live frames to a newly allocated chunk, compacting several chunks together if possible. Copying can be skipped if the chunk is nearly full of live frames.

I think this about sums it up. Comments, corrections, wet noodles?

Miro
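A compact Python sketch of the scheme above: pushing a frame bumps the watermark, a plain return lowers it, and a frame whose return continuation was captured is flagged so it can't be popped, leaving it pinned until GC compacts the chunk. All names are illustrative:

```python
class Chunk:
    """Toy register chunk with a watermark and per-frame capture flags."""
    def __init__(self, size):
        self.size = size
        self.watermark = 0        # where the next frame will be pushed
        self.captured = {}        # frame offset -> continuation-captured?

    def push_frame(self, frame_size):
        if self.watermark + frame_size > self.size:
            raise MemoryError("chunk overflow: allocate a new chunk")
        offset = self.watermark
        self.watermark += frame_size
        self.captured[offset] = False
        return offset

    def capture_continuation(self, offset):
        # The return continuation escaped to a directly accessible
        # location: bump the frame's flag.
        self.captured[offset] = True

    def pop_frame(self, offset):
        """Pop = lower the watermark -- but only if nothing captured a
        continuation into this frame; otherwise the frame stays and
        only the frame pointer would move (GC reclaims it later)."""
        if self.captured[offset]:
            return False
        self.watermark = offset
        return True

c = Chunk(size=1024)
a = c.push_frame(64)          # offset 0,  watermark -> 64
b = c.push_frame(64)          # offset 64, watermark -> 128
print(c.pop_frame(b))         # True  -- plain return, watermark back to 64
b2 = c.push_frame(64)
c.capture_continuation(b2)
print(c.pop_frame(b2))        # False -- pinned by the captured continuation
print(c.watermark)            # 128 -- the stack now grows above the pin
```

This captures the fast path (stack-like pushes and pops with no GC involvement) and the slow path (captured frames accumulate until the chunk is copied and compacted, as described under GC above).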