Implementing junctions

2007-08-06 Thread Miroslav Silovic
I've been looking at various Perl 6 operators in terms of what their 
implementation would look like at the Parrot level. Junctions struck me as 
possibly highly problematic. The issue is how one would go about 
compiling them at the point when they get passed to a function call as one 
of the parameters.


For instance, in pugs, this works:

pugs> my $a = 1|2
(1 | 2)
pugs> sin $a
(0.8414709848078965 | 0.9092974268256817)
pugs> sub foo() { return 1|2; }
pugs> foo()
(1 | 2)
pugs> sin foo
(0.8414709848078965 | 0.9092974268256817)
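
A minimal Python sketch of what that pugs behaviour implies: applying a function to a junction maps it over the eigenstates and rewraps the results. The Junction class and autothread helper are invented for illustration; this is not Parrot or pugs code.

```python
import math

class Junction:
    def __init__(self, kind, values):
        self.kind = kind          # 'any' corresponds to |, 'all' to &
        self.values = tuple(values)
    def __repr__(self):
        sep = ' | ' if self.kind == 'any' else ' & '
        return '(' + sep.join(repr(v) for v in self.values) + ')'

def autothread(fn, arg):
    # A plain value passes straight through; a junction fans out.
    if isinstance(arg, Junction):
        return Junction(arg.kind, (fn(v) for v in arg.values))
    return fn(arg)

a = Junction('any', [1, 2])
print(autothread(math.sin, a))   # (0.8414709848078965 | 0.9092974268256817)
```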

I came up with two ways:

1) Use Junction PMC that overloads various conversion entries in its 
vtable. Operators will have to use continuations to loop over junction 
contents.


-- This seems a non-starter, as vtable entries can't spawn a 
continuation that can reenter them. (right?)


2) Expand explicitly.

-- Problem is, you need to insert checks for each parameter before each 
function call, if a junction can be assigned to a variable like above.


This means that the Perl 6 calling convention would have to explicitly 
typecheck parameters everywhere. This strikes me as much less efficient 
than, say, checking for ties everywhere. And ties have to be explicitly 
declared.
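
The per-call overhead of option 2 can be made concrete with a hedged Python sketch: the call wrapper below checks every argument on every call, which is exactly the cost being objected to here. All names (Junction, call_with_expansion) are illustrative, not real Parrot API.

```python
from itertools import product

class Junction:
    def __init__(self, values):
        self.values = tuple(values)

def call_with_expansion(fn, *args):
    # The per-parameter check this post worries about: done on every call.
    if not any(isinstance(a, Junction) for a in args):
        return fn(*args)                      # fast path: no junctions
    # Otherwise repeat the call over the cross product of eigenstates.
    expanded = [a.values if isinstance(a, Junction) else (a,) for a in args]
    return Junction(fn(*combo) for combo in product(*expanded))

print(call_with_expansion(lambda x, y: x + y, Junction([1, 2]), 10).values)
# (11, 12)
```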


Would it make sense to require that junctional variables *MUST* be 
declared as such, and the same for functions that return junctions? A strict 
reading of S9 seems to imply that this is already the case, but 
a clarification with some examples might be in order.


So the example above would read:

my Junction $a = 1|2
sub foo(--> Junction) { return 1|2; }

And I hope this topic hasn't already been rehashed. :)

   Miro



Re: Perl 6 Summary for 2005-03-07 through 2005-03-22

2005-03-23 Thread Miroslav Silovic
[EMAIL PROTECTED] wrote:
  pugs too lazy
   Miroslav Silovic noticed that closing a file handle in pugs did not
   force all the thunks associated with the file. While this was a bug in
   pugs, it led to conversation about whether = should be lazy or eager.
   Larry thinks that it will be safer to start eager and become lazy than
   vice versa.
   http://xrl.us/fijd
 

Accused of the original burst of insight, pleading not guilty! ;) The 
original post was to perl6-compiler by Andrew Savige.

http://groups-beta.google.com/group/perl.perl6.compiler/browse_frm/thread/2aca524a1203cd41/912bea4d0d05554a?q=surprised#912bea4d0d05554a
   Miro


Re: MMD as an object.

2005-03-11 Thread Miroslav Silovic
[EMAIL PROTECTED] wrote:
Ok. If you'd really need such random dispatch, it could be done like
this, when I interpret A12 correctly:
 sub run_random_bar($x) {
my @meths = WALKMETH($x, :method('bar'));
my $meth = @meths[rand(@meths.elems)];
$meth($x);
 }
or even with
 my sub bar($x) {...} # same body as above
 bar($x);
 

This is enough to make me completely happy. :)
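
The quoted A12 idea might be rendered in Python roughly as follows: collect every candidate method with a given short name, then dispatch to one at random. The walkmeth() helper here just walks the MRO; it is a stand-in for Parrot's WALKMETH, not the real thing.

```python
import random

def walkmeth(obj, name):
    # Collect all methods with this short name, most-derived first.
    meths = []
    for klass in type(obj).__mro__:
        if name in vars(klass):
            meths.append(vars(klass)[name])
    return meths

class A:
    def bar(self): return 'A.bar'
class B(A):
    def bar(self): return 'B.bar'

def run_random_bar(x):
    meths = walkmeth(x, 'bar')
    meth = random.choice(meths)     # @meths[rand(@meths.elems)]
    return meth(x)

print(run_random_bar(B()))          # 'A.bar' or 'B.bar', chosen at random
```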
   Miro


Re: MMD as an object.

2005-03-10 Thread Miroslav Silovic
[EMAIL PROTECTED] wrote:
What about multi subs? They can be defined everywhere. Given:
 multi sub *foo(A $a, B $b) {...}
Is this something like:
 %globals{foo}  -->  MultiSub{foo_A_B => Sub, ...}
What about a not so global multi:
 multi sub foo(A $a, B $b) {...}
Thanks for clarifying,
leo
 

Uh, the real problem is the interaction between multisubs and modules. 
If you import a multi with the same short name from another module, you 
get a real issue that needs resolving. Especially if they do 
fundamentally different things and you don't expect a clash. Like

module Users;
multi get(Stream $f, UserDescriptor $u) {...}
...
module FileTransfer;
multi get(Server $s, File $f) {...}
Even if this is easy to fix by renaming, the error message would take a 
while to track down, and it'd be an annoying distraction during development. 
I believe the DWIM thing to do would be to merge multis into the same 
MMD object on module load. This would have to happen during runtime, 
since that's when the module load can occur.
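
The merge-on-load behaviour argued for here could be sketched like this in Python. MultiSub, import_multi, and the keying on argument-type tuples are all invented for illustration; the point is only that loading a second module merges candidates rather than clobbering the first.

```python
class MultiSub:
    def __init__(self, name):
        self.name = name
        self.candidates = {}            # signature (type tuple) -> code
    def add(self, sig, code):
        self.candidates[sig] = code
    def __call__(self, *args):
        sig = tuple(type(a) for a in args)
        return self.candidates[sig](*args)

def import_multi(namespace, name, sig, code):
    # Merge instead of overwrite: the runtime step the post argues for.
    namespace.setdefault(name, MultiSub(name)).add(sig, code)

ns = {}
import_multi(ns, 'get', (str, int), lambda s, u: f'user {u} from {s}')
import_multi(ns, 'get', (str, str), lambda srv, f: f'file {f} from {srv}')
print(ns['get']('stream', 42))     # dispatches on (str, int)
```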

   Miro


Re: continuation enhanced arcs

2004-12-09 Thread Miroslav Silovic
[EMAIL PROTECTED] wrote:
What this means is that care must be taken when you are writing code
that you expect to be invoked multiple times.  However, if you are a
function that on your second invocation returns via the continuation
from your first invocation, you should probably expect to be called
again, because it happened the first time!
The contentious point is precisely that there is no way to tell at 
compile time that you will be invoked multiple times. If something you 
pull from a library, or a callback passed to you, captures the 
continuation, you may notice that your return continuation has been 
promoted, but at that point it's already too late to kill all I/N/S 
register usage and to double-reference all PMCs. Of course, unless you 
keep the original AST and keep recompiling it to PIR. I'd argue that 
that is vastly more complex and error-prone than the solution Leo referred 
to in his post. It's also no better than refetching all lexicals.

Also, note that return-continuation-restores semantics is simply a 
(possibly premature) optimisation in its own right. CPS is supposed to 
treat every continuation the same (which turned out to be inefficient), 
and then return continuations were added to simplify the most common 
case. So, while a return continuation is unpromoted, it's perfectly okay 
for it to behave any way it wants. Once it does get promoted, I'd argue 
that it should behave like a normal continuation, first because it's more 
practical (see above) and second because that way it doesn't break CPS 
semantics.
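
The point about promoted return continuations can be illustrated with a small CPS sketch in Python, where a return continuation is modelled as a plain function. Once a callee captures it, nothing stops the callee from invoking it a second time, re-entering the caller. Entirely illustrative; Parrot's continuations are PMCs, not Python closures.

```python
captured = []

def callee(k):
    captured.append(k)     # "promote" the return continuation by storing it
    k('first return')      # normal return, via the continuation

log = []

def caller():
    # Pass the return continuation explicitly, CPS-style.
    callee(lambda v: log.append(v))

caller()
captured[0]('second return')   # re-enter the "already finished" caller
print(log)                     # ['first return', 'second return']
```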

   Miro



Re: Lexicals, continuations, and register allocation

2004-11-24 Thread Miroslav Silovic
Leopold Toetsch wrote:
Sure. But I've no confirmation of a compiler writer that its possible. 
Annotating PIR can only work for nested closures. If libraries are 
involved you are out of luck.

And we have such code already in library/Streams/Sub.imc.
I've been thinking of what could be implemented using continuations. 
Some possible ideas:

   - multiple returns from regexp matches
   - lazy lists. In particular gather/take
   - junctions
   - coroutines
   - resumable exceptions
Thing is, we can't know what Real World Perl 6 code is going to look 
like. If it turns out that continuations are The Way to implement both 
junctions and the gather/take construct, 'out of luck' is a good description 
of the ensuing fun and merriment. Basically, any PMC can be silently 
swapped with a junction, possibly making anything that uses it backtrack 
(what are junctions supposed to be implemented as, anyway?), and I 
can imagine that lots of production Perl 6 code will have gather/takes 
all over the place - they look like an utterly nice way to construct a list.

My point is that continuations are rare because mainstream languages 
(with a single exception) don't support them.

BTW, I wonder if there are any research papers on the topic of register 
allocation in presence of continuations. I just looked at Citeseer, but 
a quick cursory search didn't come up with anything useful.

   Miro


Re: Exceptions, sub cleanup, and scope exit

2004-11-19 Thread Miroslav Silovic
Dan Sugalski wrote:
It's also important for people writing these things to take into 
account the possibility that their exit actions may potentially be 
triggered multiple times, courtesy of the joys of continuations.

Hmm, the first thing to take into account is that return 
continuations can be promoted to fully-blown continuations. This 
should affect the handlers in the same way - so exception handlers could 
have become arbitrary invokable objects by the point the exception 
is thrown.

   Miro


Re: calling conventions, tracebacks, and register allocator

2004-11-09 Thread Miroslav Silovic
[EMAIL PROTECTED] wrote:
On Nov 8, 2004, at 11:15 AM, Matt Fowles wrote:
Dan~
On Mon, 8 Nov 2004 13:45:08 -0500, Dan Sugalski [EMAIL PROTECTED] wrote:
The calling conventions and code surrounding them will *not* change
now. When all the sub stuff, and the things that depend on it, are
fully specified and implemented... *then* we can consider changes.
Until then, things stand the way they are.

I misunderstood.  I thought you were saying that what is currently in
is final and will *never* be changed.  Thanks for the clarification.

Nevertheless, this is a legitimate topic for discussion, and the 
issues are fresh in people's minds. That's independent of any 
impediments that might block implementing changes at the current time.

I think it'd be a good idea to at least agree on a good TODO list, and 
commit that to the bugtracker. Because it may turn out that some changes 
are fine to delay, while some must be implemented now or never (because, 
for example, a large amount of compiler code would get broken by 
them).

   Miro


Re: Why is the fib benchmark still slow - part 1

2004-11-05 Thread Miroslav Silovic
[EMAIL PROTECTED] wrote:
a) accessing a new register frame and context
b) during DOD/GC
We have to address both areas to get rid of the majority of cache
misses.
ad a)
For each level of recursion, we are allocating a new context structure
and a new register frame. Half of these come from the recently
implemented return continuation and register frame caches. The other
half has to be freshly allocated. We get L2 cache misses on exactly
every second function call, for both the context and the register
structure.
 

Or it would make sense to use multi-frame register chunks. I kept 
locality of access in mind but somehow never spelled it out. But I 
*think* I mentioned 64kb as a good chunk size precisely because it fits 
well into the CPU cache - without ever specifying this as the reason.

Anyway, if you can pop both register frames -and- context structures, 
you won't run GC too often, and everything will nicely fit into the 
cache.  Is the context structure a PMC now (and does it have to be, if 
the code doesn't specifically request access to it?)
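
A toy model of the multi-frame chunk idea above: frames are carved out of one flat chunk by advancing a water mark and popped by lowering it, so consecutive frames stay adjacent in memory (the locality argument). Sizes and names are invented; the 4x16-registers figure comes from the register-stack summary elsewhere in this archive.

```python
FRAME_SLOTS = 64          # one frame = 4x16 registers

class Chunk:
    def __init__(self, frames=16):
        self.slots = [None] * (frames * FRAME_SLOTS)
        self.watermark = 0
    def push_frame(self):
        if self.watermark + FRAME_SLOTS > len(self.slots):
            raise MemoryError('chunk overflow: allocate a new chunk')
        base = self.watermark          # frame pointer = offset into chunk
        self.watermark += FRAME_SLOTS
        return base
    def pop_frame(self):
        self.watermark -= FRAME_SLOTS

c = Chunk()
f1 = c.push_frame(); f2 = c.push_frame()
assert f2 - f1 == FRAME_SLOTS     # adjacent frames, cache-friendly
c.pop_frame(); c.pop_frame()
print(c.watermark)                # 0
```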

ad b)
The (currently broken) Parrot setting ARENA_DOD_FLAGS shows one
possibility to reduce cache misses in DOD. During a sweep (which runs
through all allocated object memory) the memory itself isn't touched,
just a nibble per object is used, which holds the relevant information
like is_live.
 

Is there a way to find out how many misses come from DoD, compared 
to register frame allocation?

I believe that you shouldn't litter (i.e. create an immediately GCable 
object) on each function call - at least not without a generational 
collector specifically optimised to work with this. That would entail 
a first generation that fits into the CPU cache, with live 
objects copied out of it. And this means a copying GC for Parrot, something 
that (IMHO) would be highly nontrivial to retrofit.
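
The nursery idea might look like this in miniature: allocate into a small first generation, and at collection copy only the survivors out and reset the nursery in one step, so per-call litter is reclaimed almost for free. Purely illustrative; Parrot's arenas work differently, and liveness is faked here with an explicit root set.

```python
class Nursery:
    def __init__(self, capacity):
        self.capacity = capacity
        self.objects = []          # young objects, newest last
    def alloc(self, obj, roots, old_gen):
        if len(self.objects) >= self.capacity:
            self.collect(roots, old_gen)
        self.objects.append(obj)
    def collect(self, roots, old_gen):
        # Copy out live objects, then discard the whole nursery at once.
        old_gen.extend(o for o in self.objects if o in roots)
        self.objects.clear()

old, roots = [], {'keep'}
n = Nursery(capacity=2)
for obj in ['litter1', 'keep', 'litter2', 'litter3']:
    n.alloc(obj, roots, old)
print(old)      # ['keep']: only the survivor was copied out
```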

   Miro


Re: Why is the fib benchmark still slow - part 1

2004-11-05 Thread Miroslav Silovic
Leopold Toetsch wrote:
I believe that you shouldn't litter (i.e. create an immediately 
GCable object) on each function call - at least not without 
generational collector specifically optimised to work with this.

The problem isn't the object creation per se, but the sweep through 
the *whole object memory* to detect dead objects. It's of course true 
that we don't need the return continuation PMC for the fib benchmark.
Well, creation is also the problem if you crawl the entire free heap
before triggering the next GC round. You get a potential cache miss on
each creation and on each mark and on each destruction. To keep GC out
of the way, the entire arena has to be confined to cache size or less.
But a HLL translated fib would use Integer PMCs for calculations.
Hmm, I'm nitpicking here, but it's not how e.g. Psyco works. It
specialises each function to specific argument types and recompiles for
each new argument type set. Assuming that you'll call only very few
functions with more than 1-2 type combinations, this is a good tradeoff.
It also removes a lot of consing, especially for arithmetics.
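
A rough sketch of that Psyco-style strategy: keep one specialised variant per tuple of argument types, creating a new variant the first time a type combination is seen. The cache below merely stands in for actual recompilation; all names are invented.

```python
specialisations = {}

def specialise(fn):
    def dispatch(*args):
        key = (fn.__name__, tuple(type(a) for a in args))
        if key not in specialisations:
            # Stand-in for "recompile for this argument type set".
            specialisations[key] = fn
        return specialisations[key](*args)
    return dispatch

@specialise
def add(x, y):
    return x + y

add(1, 2); add(1.0, 2.0); add(3, 4)
print(len(specialisations))   # 2: one variant per distinct type combination
```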

...  This would entail the first generation that fits into the CPU 
cache and copying out live objects from it. And this means copying GC 
for Parrot, something that (IMHO) would be highly nontrivial to 
retrofit.

A copying GC isn't really hard to implement. And it has the additional 
benefit of providing better cache locality. Nontrivial to retrofit or 
not, we need a generational GC.
Copying and generational are orthogonal concepts. It's quite possible to
have noncopying gengc (and nongenerational copying GC, but that's beside
the point). This gives you quick mark phases but without so much gain
with locality (I think Boehm GC can do this).
The problem with copying GC is that pointers can change under your feet
at every opportunity. Embedded C libraries that try to manipulate GCed
objects really hate it when that happens - in particular when some
pointers get cached in CPU registers - and telling the GC to (un)protect
a pointer is a chore for C programmers (as bad as manual refcounting). I
suppose there are good solutions to this; I'm just not aware of any.
The gain is that you can guarantee that the youngest generation will be
no bigger than X KB. This can be a very good thing.
However, for the problem at hand - namely, littering during function
calls, custom deallocator (that'd be chunks) could be enough. In
particular, it makes sense to use it in conjunction with a non-copying
gengc.
   Miro



Re: [Summary] Register stacks again

2004-10-19 Thread Miroslav Silovic
[EMAIL PROTECTED] wrote:
Could we have the chunks only hold one frame and avoid much of the
compaction work?  If we return to the indirect access mechanism, we
can switch register frames by changing one pointer.  But if we keep
one frame per chunk, we do not need to compact frames; standard
DOD/GC will be able to reclaim frames.  I recall there being
efficiency issues with frames being allocated/deallocated
too frequently, so we could have a special free list for frames.
This proposal feels to me like a slightly simpler version of yours. 
Thus I would argue for it on the grounds of do the simple thing first
and compare its efficiency.
 

Well, for code that doesn't do call/cc, bigger chunks mean that 
you can use them as a classical stack. So you won't ever have to 
allocate them, and never have to run the compaction. For call/cc, you 
still don't have to compact them as often, since the non-captured 
continuations will get popped normally, and the watermark lowering will 
take care of temporarily captured continuations (between two GCs).

Basically, bigger chunks mean that frames are allocated using a special 
scheme just for them. Considering that you're going to allocate one on 
each function call, I would agree with LT that the complexity is 
justified (and it's not too bad - the way I understand the Parrot 
internals, which is to say, not too well ;) - arrays of PMC pointers 
already get copy-collected, and stack frame chunks are not too different 
from these).

  Miro



[Summary] Register stacks again

2004-10-18 Thread Miroslav Silovic
This is a summary of a private mail conversation between Leo and myself. 
<lie>No, it didn't start by me forgetting to fix Reply-To when trying to 
post a follow-up on the list.</lie> ;)

Essentially we whipped up a GC scheme for collecting the register stacks 
that doesn't make call/cc-using code, well, unusably slow.

In addition to LT's original post on the register stack, here's how to 
allocate them and clean up after their use. LT, feel free to hit me with 
wet noodles if I forgot anything.

Terminology:
---
Register frame is an actual frame of 4x16 registers.
Register chunk is a flat chunk of memory containing multiple register 
frames. It has a water mark that points to where a new frame should be 
pushed into the chunk. I'm using stack and chunk interchangeably.

Frames are subject to DoD, and chunks are subject to GC. There are 
plenty of tricks that can prevent GC from happening in many cases (read 
on for details). DoD is necessary anyway (to retrieve the live PMCs 
pointed to from the register frames).

A chunk is pinned if GC currently can't copy it over and kill it (read 
on for details).

Allocation:
---
Register stacks should be allocated in fairly big chunks. However, since 
there must be at least one active chunk per thread (and per coroutine), 
choosing anything actually huge will pose a scaling problem.

Frames are allocated from the current stack, simply by advancing the 
water mark of the currently active chunk. If this causes the water mark 
to overflow, a new chunk needs to be allocated.

Note that if a continuation has been captured and then invoked, the 
water mark will not necessarily point at the end of the current frame 
(since the invoked continuation may keep its registers deeper in the chunk).

Deallocation:
---
A stack frame can only be popped if the current continuation hasn't 
been captured (from the frame being popped). Here, pop means changing 
both the frame pointer and the water mark. This amounts to adding a flag 
to the current frame and setting the flag if the return continuation gets 
moved into any directly accessible location. If the frame can't be 
popped, only the frame pointer should be changed to the caller's.
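
Under these assumptions, the deallocation rule might look like the sketch below: each frame carries an "escaped" flag, set when its return continuation is moved somewhere directly accessible, and pop lowers the water mark only when the flag is clear and the frame is topmost. All names and sizes are invented for illustration.

```python
FRAME_SIZE = 64

class Frame:
    def __init__(self, base):
        self.base = base
        self.continuation_escaped = False

class RegStack:
    def __init__(self):
        self.watermark = 0
        self.frames = []
    def push(self):
        f = Frame(self.watermark)
        self.watermark += FRAME_SIZE
        self.frames.append(f)
        return f
    def pop(self):
        f = self.frames.pop()
        if (not f.continuation_escaped
                and f.base + FRAME_SIZE == self.watermark):
            self.watermark = f.base   # true pop: the memory is reused
        # otherwise only the frame pointer moves; GC reclaims the frame later

s = RegStack()
s.push(); s.pop()
print(s.watermark)        # 0: clean call/return, stack-like behaviour
f = s.push()
f.continuation_escaped = True   # someone captured the return continuation
s.pop()
print(s.watermark)        # 64: the frame stays until GC lowers the mark
```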

GC:
---
During DoD, the register frames should be marked (as parts of their 
chunks). Then the dead frames get dealt with in the following manner:

Remove the trailing dead frames from each chunk (by just lowering the 
water mark).

If after this the water mark remains high (e.g. past 50% of the chunk) 
but more than a certain amount of the chunk is salvageable as dead frames 
(50% seems like a good number again), the chunk should be copied, all 
the frame pointers fixed up, and then the chunk gets killed. Essentially the 
chunks are treated as buffers. The watermark lowering won't help in 
cases when continuations get invoked in a round-robin manner (wish I 
could think of a simple Scheme example here that behaves this way) 
and start littering the chunk with interlaced dead frames.
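
The two GC steps here - lowering the water mark past trailing dead frames, then compacting a chunk that stays full but is mostly dead - can be sketched as follows. Frames are modelled as (live, payload) pairs and the thresholds follow the 50% figures in the text; this is a simplification, not Parrot's actual collector.

```python
def collect_chunk(frames):
    # frames: list of (is_live, payload) pairs, in stack order.
    while frames and not frames[-1][0]:
        frames.pop()                      # lower the water mark
    total = len(frames)
    dead = sum(1 for live, _ in frames if not live)
    if total and dead / total >= 0.5:
        # Compact: copy live frames to a new chunk; the caller would now
        # fix up frame pointers and the reverse pointers in continuations.
        return [f for f in frames if f[0]]
    return frames

chunk = [(True, 'f0'), (False, 'f1'), (False, 'f2'), (True, 'f3')]
print(collect_chunk(chunk))   # [(True, 'f0'), (True, 'f3')]
```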

Two caveats:
The frame pointers of the currently active frames (there can be more than 
one, due to threads) may be kept in CPU registers and can't be fixed up. 
So a chunk containing a currently active frame is pinned until it either 
overflows into another chunk or gets freed by popping.

Chunks can contain reverse pointers to the continuations that use their 
frames. When copying a frame, just go through these reverse pointers 
and fix the continuations they point to.

Performance:
---
This scheme requires some GC flags in each frame (as well as the reverse 
pointers). Possibly also next pointers to the frames, if they are not 
cut to equal size.

Without continuations, this behaves like C function calling. Nothing 
will read return continuation and so the frames will simply get popped 
from the stack on return.

When continuations begin to prevent popping, the stack will start 
growing from the last captured continuation (even if it's dead). 
The watermark will drop in GC if the GC happens to hit while the active 
frame is way down the stack (i.e. just between two function calls). 
Otherwise, GC will wait till the chunk overflows (so that the active 
frame is in a higher chunk) and will then copy the live frames to a 
newly allocated chunk, compacting several chunks together if possible. 
Copying can be skipped if the chunk is nearly full of live frames.

I think this about sums it up. Comments, corrections, wet noodles?
   Miro