Re: continuation enhanced arcs

2004-12-01 Thread Jeff Clites
On Dec 1, 2004, at 7:23 AM, Dan Sugalski wrote:
At 12:06 AM -0800 12/1/04, Jeff Clites wrote:
On Nov 30, 2004, at 11:45 AM, Dan Sugalski wrote:
In this example:
% cat continuation6.ruby
def strange
callcc {|continuation| $saved = continuation}
end
def outer
a = 0
strange()
a = a + 1
print "a = ", a, "\n"
end
Through the joys of reference types, a will continue to increase 
forevermore, assuming the compiler hasn't incorrectly put a in an 
int register. (Which'd be wrong)
Separate question, but then what would happen for languages which 
_do_ use primitive types? (Presumably, Perl6 would do that in the my 
int case.) If proper behavior requires basically never using the 
primitive I/N types, that seems like a waste.
Two potential options. One, they have a backing PMC for the lexicals 
(which they'll probably need anyway) and flush/restore at spots where 
things could get lost. Two, they don't actually get a low-level type 
but the compiler cheats and acts as if it was in spots where it's 
safe. (I admit, I'd always planned on having my int $foo use a PMC. 
The win would be for my int @foo which would also get a PMC, but one 
that had optimized backing store for the values)
But so it sounds like I and N registers are a bit of a waste then, no? 
They won't really be used. Even in your my int @foo case, you could 
have an optimized backing store for the array, but you'd have to 
on-the-fly create a PMC to hold any values you pulled out of the array, 
e.g. for a subsequent x = @foo[1] (in Perl6 syntax)--x would have to 
hold a full PMC there, it seems, so nothing there would end up in an I 
register (except as an intermediate in immediately creating the PMC).

Seems like instead we'd be able to simply use an I register, and be 
done with it. The returny-ness of a return continuation seems bogus, if 
it's supposed to survive promotion to a real continuation. All we need 
is some sort of per-frame storage (like lexical pads provide, but for 
all register types), and yes, explicit moving of things in and out of 
that storage if we want to re-use registers (and only necessary across 
a function call if we do plan on re-using a register). That's what 
happens in hardware CPUs.

But even with all of what you say below, I still think there's an issue 
with overlapping variable lifetimes. Consider my original pseudocode 
example again, with one added print statement:

x = 1
foo()
print x
print y
y = 2
return y
The first time foo() returns, the "print y" should find y holding 
undef (or something like that), and subsequent times it should find it 
holding 2. (That's certainly what the analogous goto would do.) That 
means that even with an intermediate Reference struct, you still can't 
use the same register to hold the Reference for x and the Reference for 
y, because their lifetimes overlap, but only due to the presence of 
continuations--without them, you could re-use the register. Re-entering 
the frame needs to leave all of the local variables intact, and that's 
stretching their lifetimes out to extend across most of the frame. Or 
so it would seem.

The contents can change over and over without the register itself 
ever changing.
But in this Ruby case, a = a + 1 actually creates a new Fixnum 
instance, so a ends up holding a different instance each time--you 
can verify that by printing out a.id in the print statement.
This is where the magic of objects comes in, for particularly loose 
values of magic. Ruby uses a double-reference system for objects the 
same way that perl 5 does -- that is, the underlying structure for a 
doesn't hold the object, rather it holds a pointer to another 
structure that holds the object. So it looks like:

 (name a) -> (a struct, addr 0x04) -> (object struct addr 0x08)
and after the store it looks like:
 (name a) -> (a struct, addr 0x04) -> (object struct addr 0x0C)
The PMC register would hold the 0x04 struct pointer, basically a 
Reference PMC, and assignments to it just switch the thing it refers 
to.
But what's the point of that--seems like a waste of an intermediate 
struct. And I assume the intermediate struct would be a PMC, no?

Essentially things like I and N registers are value types, PMC and 
strings are (for us) reference types, and many objects (including ruby 
and perl's objects) are double-reference types.
In Ruby everything is an object, so we'd have double references even 
for numbers (number-ish PMCs). That loops back to the point of my 
question in the thread "Basic compilation example (a + b)?". If a + b 
compiles as add P16, P18, P17, and registers actually hold some 
intermediate Reference type, then the MMD of add would be 
type-dispatching on the reference type, not the type of the objects 
inside (as the MMD is currently written). Since Ruby (and Python) have 
type-less variables, that implies to me a single Reference type at 
least for them, so our MMD would always call the same function for 
add

Re: continuation enhanced arcs

2004-11-30 Thread Jeff Clites
On Nov 30, 2004, at 5:28 AM, Dan Sugalski wrote:
At 1:45 AM -0800 11/29/04, Jeff Clites wrote:
On Nov 28, 2004, at 2:48 AM, Piers Cawley wrote:
I just thought of a heuristic that might help with register
preservation:
A variable/register should be preserved over a function call if 
either of the
following is true:

1. The variable is referred to again (lexically) after the function 
has
   returned.
2. The variable is used as the argument of a function call within the
   current compilation unit.
That doesn't solve it, though you'd think it would. Here's the 
counter-example:

x = 1
foo()
print x
y = 2
return y
You'd think that x and y could use the same memory location 
(register, variable--whatever), since ostensibly their lifetimes 
don't overlap. But continuation re-invocation can cause foo() to 
return multiple times, and each time it should print 1, but it 
won't if x and y use the same slot (it would print 2 each time 
after the first). In truth, their lifetimes do overlap, due to the 
hidden (potential) loops created by continuations.
Except... we've already declared that return continuations are 
special, and preserve the registers in the 16-31 range. So when we 
return from foo, regardless of how or how many times, the pointer to 
x's PMC will be in a register if it was in there before the call to 
foo, if it's in the preserved range. So in this case there's no 
problem. Things'll look like:

  x = 1    # new P16, .Integer; P16 = 1  # P16 has pointer value 0x04
  foo()    # foo invocation
  print x  # P16 still has pointer value 0x04
  y = 2    # new P16, .Integer; P16 = 2  # P16 now has pointer value 0x08
  return y # Passes back 0x08

With more or less clarity.
But the problem isn't preservation per se. When a continuation 
(originally captured inside of foo) is invoked, the frame will be 
restored with the register contents it had when it last executed, so 
P16 in your annotations will have pointer value 0x08 after the first 
time that continuation is invoked (because y = 2 will have executed 
and changed the register contents). That will result in print x 
printing the value 2, which is wrong from the perspective of the code 
(that line should always print 1). To do-the-right-thing, the 
register allocator either has to use a separate register to hold y, or 
needs to do some other preservation dance (instead of relying on 
preserved registers). And again, I think the reason for this is that 
the lifetimes of x and y overlap, so you can't just use the same 
register to store them. The only surprising part of all of this is that 
their lifetimes in fact overlap--they only overlap due to 
continuations, and wouldn't otherwise. It's the implicit branch from 
the return to the op-after-the-call-to-foo that's causing them to 
overlap.

None of this should have anything to do with return continuations 
specifically, since this is the case where the body of foo (or 
something called from it) creates a real continuation, which as I 
understand it is supposed to promote the return continuations to real 
continuations all the way up the stack.

JEff


Re: continuation enhanced arcs

2004-11-30 Thread Jeff Clites
On Nov 30, 2004, at 10:27 AM, Dan Sugalski wrote:
At 10:15 AM -0800 11/30/04, Jeff Clites wrote:

Oh. No, it won't. We've declared that return continuations will always 
leave the top half registers in the state they were when the return 
continuation was taken. In this case, when it's taken to pass into 
foo, P16 is 0x04. When that return continuation is invoked, no matter 
where or how many times, P16 will be set to 0x04. This does make 
return continuations slightly different from 'plain' continuations, 
but I think this is acceptable.
Ah, I see.
None of this should have anything to do with return continuations 
specifically, since this is the case where the body of foo (or 
something called from it) creates a real continuation, which as I 
understand it is supposed to promote the return continuations to 
real continuations all the way up the stack.
The return continuations have to maintain their returny-ness 
regardless, otherwise they can't be trusted and we'd need to 
unconditionally reload the registers after the return from foo(), 
since there's no way to tell whether we were invoked via a normal 
return continuation chain invocation, or whether something funky 
happened down deep in the call chain.
Yeah, so I think that won't work correctly. Here's an example from Ruby 
which I posted in a previous thread. If the return from the call to 
strange() by outer() always restores the registers as of the point the 
(return) continuation was created, then the below would print out "a = 1" 
over and over, but really it's intended that the value should 
increase, so with the behavior you describe, the following Ruby code 
wouldn't work right:

% cat continuation6.ruby
def strange
callcc {|continuation| $saved = continuation}
end
def outer
a = 0
strange()
a = a + 1
print "a = ", a, "\n"
end
# these two lines are main
outer()
$saved.call
% ruby continuation6.ruby
a = 1
a = 2
a = 3
a = 4
a = 5
a = 6
a = 7
a = 8
a = 9
a = 10
...infinite loop, by design
JEff


Re: EcmaScript

2004-11-29 Thread Jeff Clites
On Nov 28, 2004, at 4:58 AM, Herbert Snorrason wrote:
On Sat, 27 Nov 2004 21:49:49 -0500, Michael G Schwern 
[EMAIL PROTECTED] wrote:
On Sat, Nov 27, 2004 at 09:58:44PM +, Herbert Snorrason wrote:
It should. EcmaScript is also a relatively small language, which 
would
work strongly in its advantage...
A 188 page language spec is small? ;)
ECMA-262, ECMAScript Language Specification: 172 pages.
ECMA-334, C# Language Specification: 448 pages.
ISO 1539-1, Fortran Part 1, Base language: 567 pages.
ISO 1989, COBOL: 859 pages.
ISO 9899, C: 538 pages.
ISO 14882, C++: 757 pages.
Yes, it is. :)
You cheated:
Revised^5 Report on the Algorithmic Language Scheme: 50 pages.
But still, small by comparison with most. :)
JEff


Re: continuation enhanced arcs

2004-11-29 Thread Jeff Clites
On Nov 28, 2004, at 2:48 AM, Piers Cawley wrote:
I just thought of a heuristic that might help with register
preservation:
A variable/register should be preserved over a function call if either 
of the
following is true:

1. The variable is referred to again (lexically) after the function has
   returned.
2. The variable is used as the argument of a function call within the
   current compilation unit.
That doesn't solve it, though you'd think it would. Here's the 
counter-example:

x = 1
foo()
print x
y = 2
return y
You'd think that x and y could use the same memory location (register, 
variable--whatever), since ostensibly their lifetimes don't overlap. 
But continuation re-invocation can cause foo() to return multiple 
times, and each time it should print 1, but it won't if x and y use 
the same slot (it would print 2 each time after the first). In 
truth, their lifetimes do overlap, due to the hidden (potential) loops 
created by continuations.

The problem isn't preservation across calls per se, it's the implicit 
loops. Continuations are basically gumming up tons of potential 
optimizations.

JEff



Re: EcmaScript

2004-11-27 Thread Jeff Clites
On Nov 27, 2004, at 5:58 PM, liorean wrote:
On Sat, 27 Nov 2004 19:30:20 -0500, Sam Ruby [EMAIL PROTECTED] 
wrote:

Overall, JavaScript would be a good match for Parrot.  One place 
where it would significantly diverge at the moment is in the concept 
of a class.  Objects in JavaScript are little more than bundles of 
properties, some of which may be functions.  And classes are 
essentially templates for such objects.
I don't really think it's that strange. Essentially, all objects
contain a reference to their prototype. When getting a member of an
object, the object will first check its own members for the
corresponding identifier, then ask its prototype, and so on until the
prototype chain is depleted. Setting is always done on the object
itself. It's really not so much inheritance as it is conditional
runtime delegation. Functions are of course first class and shouldn't
differ from any other member - there is no native method/property
distinction in JavaScript, even though host objects may have such a
distinction.
This seems to me to be very much the way Python works as well, though 
the emphasis is different. (That is, the common case in Python is to 
define methods per-class rather than per-instance, and in JavaScript 
it's the opposite. But that's not a technological difference, just a 
cultural one.) I would think that the implementations would share a 
lot.

JEff


Re: Another issue with pdd03

2004-11-16 Thread Jeff Clites
On Nov 15, 2004, at 12:38 AM, Leopold Toetsch wrote:
Bill Coffman [EMAIL PROTECTED] wrote:
[ pdd03 ]
The way I read it, paragraph one implies that when you print P5 after
calling foo(), you are expecting to get the return value.  You didn't
save and restore register P5, so you wanted foo() to do something to
it.
The nasty thing of a function call is:
  i = foo()   # fine P5 returned
vs.
  foo()   # P5 clobbered by foo
(but you can replace P5 with P15 too, or every register R5..R15)
This has never bothered me, probably because of the comparison to the 
register-based calling conventions that the PPC uses: A called function 
(which returns a value) has to store its result in some register, 
whether or not the caller wants it. A call like i = foo() is really 
two steps: call the function, then copy the result from r3 to the 
appropriate location (maybe a location on the stack, maybe another 
register). It may be possible to optimize so that i is already using 
the same register as the return value, but in general that can't be 
arranged for most cases; consider how this would compile:

	i = foo()  // call foo, copy result from r3 to other register--must, since bar() would clobber
	j = bar()  // call bar, copy result from r3 to other register--could avoid copy, if j not needed past baz
	baz(i + j) // add those two other registers into r3, and call baz

And due to the register-preservation semantics on the PPC, even a call 
to a void-return function could clobber r3, since it could call another 
function which returns a result and thus uses r3.

Not that parrot has to necessarily work this way, but it at least has 
precedent, so it's not totally strange behavior.

JEff


Re: Another issue with pdd03

2004-11-16 Thread Jeff Clites
On Nov 14, 2004, at 9:32 AM, Leopold Toetsch wrote:
Defining now that P5 has to be preserved in main, because it's a 
possible return result of foo() and therefore may be clobbered by 
foo(), means that we have effectively just 16 registers per kind 
available for allocation around a function call.

If the latter is true according to current pdd03 then we are wasting 
half of our registers for a very doubtful advantage: being able to 
pass return values in R5..R15.
In effect this is quite similar to the PPC calling conventions: we have 
roughly half of the registers preserved across function calls. In terms 
of the volatile registers, it's fine to use them for local 
calculations, as long as either you're using them to hold values which 
don't need to persist across function calls, or you 
preserve-and-restore them yourself.

But that loops back to a previous proposal of mine: If they're not 
being preserved, and in fact need to be synced between caller and 
callee, then having these registers physically located in the 
interpreter structure, rather than in the bp-referenced frame, saves 
all the copying, and makes it more obvious what's going on.

JEff


Re: Continuations, basic blocks, loops and register allocation

2004-11-16 Thread Jeff Clites
On Nov 15, 2004, at 10:29 AM, Leopold Toetsch wrote:
Jeff Clites [EMAIL PROTECTED] wrote:
Picture this call stack:

	main --> A --> B --> C --> D --> E

The call from D to E uses the RESUMEABLE label, and E stores the
resulting continuation in a global, and everything up to main returns.
Then, main invokes the continuation, and we find ourselves back in D.
Now, C, B, and A will return again, without any compile-time hint.
That's fine. The return is an expected operation. I don't think that's
the problem. The error in gc_13 is a call path like:
   choose() -> try (with_cc) -> fail() --+
      ^                                  |
      +-------- (choose again) <---------+
      |
   choose() -> try (with_cc) -> fail() --+
      ^                                  |
      +-------- (choose again) <---------+
      |
   fail() -------> (backtrack again)
The problem now is not per se the path in main from the two choose()
calls down to fail is executed more than once (as is the case with
multiple returns). The problem is the loop in main. By denoting the 
loop
from the call to fail() to the first choose() with some kind of syntax,
the register allocator does the right thing.
But consider even this simple function:
sub B
{
a = 1
foo()
print a
b = 2
return b
}
If something called by foo() captures a continuation, and something 
invokes it after B() returns, then there's a hidden branch, in effect, 
from the return to the print, isn't there? The register allocator could 
decide to use the same register for a and b, but then the second 
return from foo() would print 2 instead of 1, which is wrong. And the 
author of B(), of course, may have no idea such a thing would happen, 
so wouldn't be able to supply any syntax to tell the compiler.

I'm just trying to come up with a simpler example, since it seems to me 
that there's a problem any time a function returns, and the last thing 
that executed in that frame wasn't a call to that function. (There's a 
lot going on in the gc_13 example, and I think some of it is 
distracting from the main point.)

But a RESUMABLE label seems like the information that's needed by the 
compiler. But on the other hand in an example like the above, the 
function B() may not be written to expect foo() to be resumed. So, what 
should happen at runtime, if the label is absent? We could declare 
undefined behavior, but that would mean that for defined behavior, 
you'd need the RESUMABLE label all the way up the stack, and Ruby and 
Scheme don't have that syntactic constraint. With Scheme, it's only 
clear from the syntax what's going on locally--but you can invoke a 
continuation far from any call/cc, if that continuation was stored away 
into a variable.

JEff


Re: Continuations, basic blocks, loops and register allocation

2004-11-16 Thread Jeff Clites
On Nov 16, 2004, at 10:03 AM, Matt Fowles wrote:
Since both you and Leo are arguing against me here, it seems likely that
I am wrong, but I would like to figure out exactly why I am wrong so
that I can correct my train of thought in the future.
Here's a real example you can play with, if you have Ruby installed:
% cat continuation6.ruby
def strange
callcc {|continuation| $saved = continuation}
end
def outer
a = 0
strange()
a = a + 1
print "a = ", a, "\n"
end
# these two lines are main
outer()
$saved.call
% ruby continuation6.ruby
a = 1
a = 2
a = 3
a = 4
a = 5
a = 6
a = 7
a = 8
a = 9
a = 10
...infinite loop, by design
What happens when the program runs is that outer() is called (only 
once) which creates a continuation (inside of strange()), increments a, 
prints and returns. The next thing that happens is that the 
continuation is invoked. Control jumps to the location in strange() 
right after the callcc line, then that return and we are at the line in 
outer() where 'a' is incremented. So 'a' increments from the last value 
it had in that frame (since we are magically back again inside of the 
same single invocation of outer()), then 'a' is printed and outer() 
returns again (note: outer only called once, returned twice so far), 
and then we call the continuation again, and start the loop over.

We only ever create one continuation in this example, since we only 
ever call strange() once. The continuation preserves the frame (the 
mapping from logical variables to their values), but not the values of 
those variables at the time the continuation was created. In effect, I 
think the continuation is arranging to preserve the state of the 
variables as they were when code in the frame was last executed, rather 
than at the time the continuation was created.
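
Here's a small variation that makes the distinction concrete (a 
hypothetical sketch; the file and variable names are mine, and the 
require line is only needed on newer Rubies):

% cat capture_time.ruby
require 'continuation' if RUBY_VERSION >= '1.9'
$first = true
def demo
  a = 0
  callcc {|k| $saved = k}    # captured while a == 0
  a = a + 1
  print "a is now ", a, "\n"
end
demo()
if $first
  $first = false
  $saved.call                # re-enter the same invocation of demo()
end
% ruby capture_time.ruby
a is now 1
a is now 2

If invoking the continuation restored the values as of capture time, the 
second line would read "a is now 1"; instead the latest value of a 
survives the re-entry.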

The behavior you were describing is what I had thought would happen, 
but then I realized I wasn't sure, so I confirmed that it wasn't. The 
above is the behavior of Ruby, and I believe Scheme works the same way. 
What you described would be useful for backtracking (jumping back not 
only to a previous location in a computation, but also its previous 
state), but it's not what these languages seem to do.

JEff


Re: Continuations, basic blocks, loops and register allocation

2004-11-15 Thread Jeff Clites
On Nov 14, 2004, at 3:03 AM, Leopold Toetsch wrote:
Matt Fowles [EMAIL PROTECTED] wrote:
Yes, but in the case of the continuation resuming after foo, the
continuation should restore the frame to the point where it was taken.
 Thus all of the registers will be exactly as they were when the
continuation was taken (i.e. in the correct place).
Yes, but Jeff's example wasn't really reflecting the problem.
How come? (Not quibbling--just afraid I'm missing something.) It seems 
that even this function body shows the problem:

a = 1
foo()
print a
b = 10
return b
It would seem (w/o continuations) that b should be able to re-use a's 
register, but if it does then we'll print 10 instead of 1 the second 
time.

So what to do:
1) Extending register frame size ad infinitum and never reuse a Parrot
register will definitely blow caches.
2) Generating an edge for every call to every previous call will blow
the CFG and cause huge pressure on the register allocator. A lot of
spilling will be the result.
3) Using lexicals all over is slower (but HLL compilers will very 
likely
emit code that does exactly that anyway). So the problem may not be a
real problem anyway. We just know that an optimizer can't remove the
refetch of lexicals in most of the subroutines.
It seems that, in terms of cache locality, the same problem is there 
with more registers v. spilling v. lexicals. That is, if you have 100 
local variables whose lifetimes overlap (due to continuations), then 
you need 100 distinct memory locations to (repeatedly) access.

4) Having an explicit syntax construct (call-with-current-continuation)
 that expresses verbatim what's going on, like e.g. with a reserved
word placed as a label:
  RESUMEABLE: x = choose(arr1)
I don't think that really helps either: If you have such a call, then 
all the frames higher up the stack also can return multiple times, so 
they have the behavior, even w/o the label.

On the other hand, this Ruby code really bugs me (note: $ variables 
in Ruby are globals):

% cat continuation5.ruby
def strange
callcc {|continuation| $saved = continuation}
end
def outer
a = 0
strange()
a = a + 1
print "a = ", a, "\n"
a = "hello"
print "a = ", a, "\n"
end
outer()
$saved.call
% ruby continuation5.ruby
a = 1
a = hello
continuation5.ruby:8:in `+': failed to convert Fixnum into String 
(TypeError)
from continuation5.ruby:8:in `outer'
from continuation5.ruby:14

What bugs me is that outer gets an error when the continuation is 
invoked, because the second time strange() returns, a is a string 
and so you can't add 1 to it. But looking at the definition of outer, 
you'd expect that you could never get such an error. (Without the line 
setting a to "hello", you get an infinite loop, printing increasing 
integers.)

JEff


Re: Continuations, basic blocks, loops and register allocation

2004-11-15 Thread Jeff Clites
On Nov 15, 2004, at 3:27 AM, Leopold Toetsch wrote:
Jeff Clites [EMAIL PROTECTED] wrote:
On Nov 14, 2004, at 3:03 AM, Leopold Toetsch wrote:

Yes, but Jeff's example wasn't really reflecting the problem.

How come?
Your code didn't reuse a after the call.
Oops.
It seems that, in terms of cache locality, the same problem is there
with more registers v. spilling v. lexicals.
Not quite. We can't extend just one register set, we'd do that for all 
4
kinds. You can not easily have 64 PMC registers and just 10 INTVALs.
A lexical access is just an array lookup (at least if the compiler uses
the indexed addressing of lexicals).
Ah. What I don't like, though, is that in the lexical case you 
have things sitting in an array, from which you need to move things 
back-and-forth to registers to work on them. In the more registers 
case, they're sitting in a quite similar array, but you can work on 
them directly there. Adding 2 such INTVALs is one op, instead of 4 (2 
loads, 1 add, 1 store). Since the problem exists for all register types 
(unless HLLs only use PMCs, which is possible), then we either need 4 
lexical arrays for maximum locality (so that they can be sized 
independently), or we need to be able to size the register frames 
independently for the 4 types. (I realize that currently lexicals must 
be PMCs, but it seems we have the same issue with other reg. types.)

... That is, if you have 100
local variables whose lifetimes overlap (due to continuations), then
you need 100 distinct memory locations to (repeatedly) access.
Sure. If the program is complex you need the storage anyway.
But I think the real problem is that it's only due to CPS that the 
lifetimes are overlapping so much--that's what's biting us. (By 
expanding my example code, you could come up with simple code which 
uses 100 locals but would only need 1 register w/o continuations, and 
needs 100 spots with them.) It's just a pretty unfortunate price 
we're paying, if the feature is not extensively used. Now we're just 
figuring out how to survive it.

4) Having an explicit syntax construct (call-with-current-continuation)
 that expresses verbatim what's going on, like e.g. with a reserved
word placed as a label:

  RESUMEABLE: x = choose(arr1)

I don't think that really helps either: If you have such a call, then
all the frames higher up the stack also can return multiple times, 
so
they have the behavior, even w/o the label.
The RESUMABLE label is of course at the invocation that might
resume, somewhere up the call chain. Again: the HLL compiler (or the
programmer) knows the effect of the continuation. Just the PIR code is
too dumb, i.e. is lacking this information.
Picture this call stack:
main --> A --> B --> C --> D --> E
The call from D to E uses the RESUMEABLE label, and E stores the 
resulting continuation in a global, and everything up to main returns. 
Then, main invokes the continuation, and we find ourselves back in D. 
Now, C, B, and A will return again, without any compile-time hint.
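
In Ruby terms, that's something like the following (a hypothetical 
sketch--the sub names just mirror the call stack above, and the require 
line is only needed on newer Rubies):

% cat deep_chain.ruby
require 'continuation' if RUBY_VERSION >= '1.9'
$passes = 0
def e
  callcc {|k| $saved = k}    # only E knows a continuation is being taken
end
def d; e();  print "D returns\n"; end
def c; d();  print "C returns\n"; end
def b; c();  print "B returns\n"; end
def a; b();  print "A returns\n"; end
a()
$passes = $passes + 1
$saved.call if $passes < 2   # main re-invokes: D, C, B, and A all return again
% ruby deep_chain.ruby
D returns
C returns
B returns
A returns
D returns
C returns
B returns
A returns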

On the other hand, this Ruby code really bugs me (note: $ variables
in Ruby are globals):

... , you get an infinite loop, printing increasing
integers.)
Sure. With callcc you are calling the function again and again.
I know--the infinite loop was the desired behavior (just mentioned it 
so that people wouldn't have to run it). What bugs me is that you can 
get that error, though looking at the code it should be impossible. The 
author of outer() might have no clue that could happen, so it's not 
really his bug, and the person using a continuation needs really 
detailed knowledge of everything in the call stack, to know if it will 
work. I guess that's just how it is, but it seems to mean that 
continuations have limited usefulness in languages with side-effects, 
except for very local usage (breaking out of a loop and such).
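
The local sort of use I have in mind is something like this (a 
hypothetical sketch; the nested-loop search is just an arbitrary 
example, and the require line is only needed on newer Rubies):

% cat escape.ruby
require 'continuation' if RUBY_VERSION >= '1.9'
# Purely local use of callcc: escape from two nested loops in one jump.
found = callcc do |escape|
  (1..10).each do |i|
    (1..10).each do |j|
      escape.call([i, j]) if i * j == 42   # jump straight out of both loops
    end
  end
  nil                                      # value if nothing matched
end
p found
% ruby escape.ruby
[6, 7]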

JEff


Re: Continuations, basic blocks, loops and register allocation

2004-11-14 Thread Jeff Clites
On Nov 14, 2004, at 1:53 PM, Dan Sugalski wrote:
Since, for example, it's completely reasonable (well, likely at least) 
for a called sub to rebind lexicals in its parent
What does that mean, exactly? It seems like that directly contradicts 
the meaning of "lexical". For instance, see Larry's comments from "Re: 
Why lexical pads" at September 25, 2004 10:01:42 PM PDT (the first of 
the 3 messages from him on that day).

JEff


Re: Continuations, basic blocks, loops and register allocation

2004-11-13 Thread Jeff Clites
On Nov 13, 2004, at 8:53 AM, Leopold Toetsch wrote:
2) Continuations (t/op/gc_13.imc [Hi Piers])
Again we have a hidden branch done by a Continuation, or better a 
loop. The control flow of the main program is basically represented by 
this conventional code fragment:

  arr1=[...]; arr2=[...]
   loop1: x = choose(arr1)
   loop2: y = choose(arr2)
  ...
 failed = fail()
 goto (loop1, loop2)[failed]
except that the gotos are performed by backtracking via the 
continuations. So we don't have these loop labels and the continuation 
continues at the next opcode after the invocation of choose() and not 
at the shown position above.

So again, the register allocator doesn't see that there is a branch, 
the variable's arr2 register is clobbered, in this case by the fail 
closure.

As we never know, if a subroutine captures the return continuation and 
creates such loops like above, we have in the absence of other syntax 
actually no chance to hold any variable in a register as far as I can 
see now. We'd have just to force using lexicals for all vars, except 
for leaf subroutines (that don't call other subs) or subs that just 
call one sub (they can't create loops).

Another idea is to create edges from all 1..n function calls in a sub 
to the previous 0..n-1 calls, so that basically all possible loops done 
via possible continuations are represented in the CFG.
That analysis looks correct to me--any time you have a function call, 
the subsequent op might be a branch target, if there is a subsequent 
function call.

We'd have just to force using lexicals for all vars
Having variable-size register sets would solve this, since you could 
have fixed assignments of variables to registers, and not have to 
suffer the overhead of moving data between registers and lexical pads 
over-and-over. Well, it doesn't really solve it--just makes it 
workable.

JEff


Re: Continuations, basic blocks, loops and register allocation

2004-11-13 Thread Jeff Clites
On Nov 13, 2004, at 11:16 AM, Matt Fowles wrote:
All~
On Sat, 13 Nov 2004 10:52:38 -0800, Jeff Clites [EMAIL PROTECTED] 
wrote:
On Nov 13, 2004, at 8:53 AM, Leopold Toetsch wrote:
We'd have just to force using lexicals for all vars
Having variable-size register sets would solve this, since you could
have fixed assignments of variables to registers, and not have to
suffer the overhead of moving data between registers and lexical pads
over-and-over. Well, it doesn't really solve it--just makes it
workable.
I like the idea of mandating lexical vars.  This would also eliminate
the need for spilling (I think), as the register allocator would only
need to refetch the lexical rather than save it off somewhere to be
restored later.
In a way I feel like they're both same thing, just under a different 
description: spilling means moving data back-and-forth between 
registers and some other storage, and so does using lexicals.

But the only reason we have to do that sort of dance (under either 
description) is because we are RISC-ish: We have a limited number of 
registers, and calculations can only target registers (that is, you 
can't add 2 numbers directly in a lexical pad or other storage--they 
have to be moved to registers first). You don't have to move data 
back-and-forth if either you have an unlimited number of (preserved) 
registers, or you allow calculations to act directly on other memory 
locations. And I think this is again just two different ways of 
describing the same thing: you have an unlimited number of stable 
storage locations, and you do calculations directly from those 
locations. It's just that the former (unlimited preserved registers) 
feels cleaner, and doesn't require an explosion of op variants.

That's oversimplifying a bit, but I feel like those are the core issues 
(stemming from the observation of Leo's that continuations in effect 
give all variables a lifetime that extends to their entire block, in 
most cases).

JEff


Re: Continuations, basic blocks, loops and register allocation

2004-11-13 Thread Jeff Clites
On Nov 13, 2004, at 2:46 PM, Matt Fowles wrote:
On Sat, 13 Nov 2004 14:08:12 -0800, Jeff Clites [EMAIL PROTECTED] 
wrote:
That's oversimplifying a bit, but I feel like those are the core 
issues
(stemming from the observation of Leo's that continuations in effect
give all variables a lifetime that extends to their entire block, in
most cases).
This does not give all variables extended lifetimes.  It only gives
variables that are used in the exception handler such a lifetime.
Thus temporaries used in calculations can be safely overwritten.
Perhaps we should try adding the control flow arcs to the basic block
analysis and see how the allocator handles it then...
Not all variables, but due to Leo's case (2), it should be all 
variables which are referenced after the first function call out of a 
subroutine, if there's a later function call; for instance, consider:

...
foo();
a = 10;
b = 12;
... use a and b
bar();
... never use a or b again
c = 1;
d = 2;
... use c and d
baz();
... never use c or d again
zoo();
Normally, the lifetime of a and b would end at the call to bar, and 
that of c and d would end at the call to baz, but due to continuations, 
the call to zoo might resume at the op right after the call to foo 
(since the continuation created when calling foo may have been 
captured), so a,b,c,d have to persist at least until the last function 
call is made.

We could teach the allocator about these possibilities as you 
mentioned, and that would give us correct program behavior, but we know 
a priori that we'll have much higher register pressure that you might 
expect, so a different overall strategy might work better.

JEff


Re: Tail calls and continuations

2004-11-11 Thread Jeff Clites
On Nov 10, 2004, at 11:53 PM, Leopold Toetsch wrote:
Jeff Clites [EMAIL PROTECTED] wrote:
...it sounds like we have an easy way to tell if a real continuation
has a claim on the register frame, because creating such a real
continuation can mark the frame,
There is no such mark. If ctx->current_cont isa(RetContinuation), then
it's safe to do the tail call.
Good--implicit mark then.
This OTOH means that we can do the
check only at runtime. Thus the tailcall or tailcallmethod opcodes
have to do plain calls if they detect such a situation.
Yep, although there will be some situations where we can know for sure 
at compile time--for instance if all function calls within a function 
are tail calls. (We could have a special sort of tailcalldontcheck, 
but that depends on our philosophy--if it's okay for the VM to trust 
the compiler. Or possibly the VM could detect this, and cache this 
information as a sort of dynamic optimization; or a bytecode verifier 
could ensure that tailcalldontcheck is valid in a given context. 
Multiple options here.)

And also, even when tailcall has to fall back to a plain call, it 
doesn't have to fall all the way back--it can still pass along the 
return continuation from its caller, and get some benefit.

... (In fact that mark should be a reference
count,
That's really not needed. If you return from the function and you call
it next time, you've again a RetContinuation. If the continuation was
created somewhere deeper in the call chain, it's gone or not after the
GC cycle.
But if ctx->current_cont has been promoted to a real continuation (as a 
result of something that happened deeper in the stack), it will never 
be turned back to a RetContinuation, even if it could have been (ie, if 
GC reclaimed the things that caused the promotion), so we might forego 
a tail call that we could have made.

And you know - starting with refcounting one object ends up
with refcounting containers holding that item ...
Not really. In this case, the only things which are allowed to point to 
a register frame (via a ctx) are the interpreter itself, and 
continuations. There are just a couple of specific points where the ref 
count would need to be incremented/decremented (including the destroy() 
of continuations). But that's the clear place to stop--you don't need 
to ref count the continuations themselves, since that wouldn't be 
practical (you'd need write barriers to always check if you're 
referencing or un-referencing a continuation, etc.). We wouldn't have 
to go off the deep end with it.

JEff


Re: #line directives from PMC file should now work

2004-11-11 Thread Jeff Clites
On Nov 11, 2004, at 6:53 AM, Leopold Toetsch wrote:
Nicholas Clark [EMAIL PROTECTED] wrote:
Builds pass with the --no-lines option in Makefile removed. Should 
this
be removed from the template Makefile so that all builds now use #line
directives?
Yep. Is there still that %ENV var around to turn line numbers off? - Or
was that in ops2c.pl - PARROT_NO_LINE is it, could be handy sometimes.
I'd really like a way to turn them off easily (for the ops as well, 
actually). I find them to be counterproductive (for our stuff), since 
what gets shown in the debugger isn't stuff you can actually get gdb to 
evaluate.

JEff


Re: Tail calls and continuations

2004-11-11 Thread Jeff Clites
On Nov 11, 2004, at 9:44 AM, Michael Walter wrote:
On Thu, 11 Nov 2004 12:30:16 -0500, Dan Sugalski [EMAIL PROTECTED] wrote:
Tail calls should be explicit, compile-time things. Otherwise we're
going to run afoul of traceback requirements and suchlike things
Nah, that's just the difference between running optimized and 
unoptimized. Actually, with a tailcall op that's effectively a hint, it 
would be ideal to have 3 run modes: a "just obey the hints" mode (for 
those that trust their compiler), a "never do a tail call" mode (for 
debugging purposes), and a "do tail calls whenever possible" mode 
(independent of whether the tailcall op was used). Not sure if those 
are run modes or assembler (optimizer) modes, though.

Besides, it's a
lot easier in general for a language compiler to decide when a tail
call's in order than it is for us.
I think it's pretty straightforward to tell, at least at the PIR level. 
Even at the pasm level, the only requirement is that there's 
effectively no code between a call from foo to bar, and a return from 
foo. It's a pure optimization--good fodder for an optimizer.

Even further, it's necessary for some languages
(Scheme)/paradigms (loop by recursion) that a tailcall is not just a
hint but mandatory.
I think that actually doesn't matter. Even in the case where we think 
we can't do a full tail call optimization (because of a continuation 
that's been taken), we can still actually remove the calling frame from 
the call stack--we just can't immediately re-use the register frame. 
That satisfies the Scheme requirement, I would think. You can still do 
unbounded recursion, just that GC may need to run to clean up call 
frames.

Leo said:
$ time parrot  -j fact.imc 1  # [1]
maximum recursion depth exceeded
I'd think that long-term our max recursion depth limit should only 
apply to net frame depth--tail calls shouldn't increase the count. 
(Probably we'd need 2 counts--net depth and logical depth.)

JEff


Re: Tail calls and continuations

2004-11-11 Thread Jeff Clites
On Nov 11, 2004, at 11:24 AM, Dan Sugalski wrote:
At 11:16 AM -0800 11/11/04, Jeff Clites wrote:
On Nov 11, 2004, at 9:44 AM, Michael Walter wrote:
On Thu, 11 Nov 2004 12:30:16 -0500, Dan Sugalski [EMAIL PROTECTED] 
wrote:

Even further, it's necessary for some languages
(Scheme)/paradigms (loop by recursion) that a tailcall is not just 
a
hint but mandatory.
I think that actually doesn't matter. Even in the case where we think 
we can't do a full tail call optimization (because of a continuation 
that's been taken), we can still actually remove the calling frame 
from the call stack--we just can't immediately re-use the register 
frame. That satisfies the Scheme requirement, I would think. You can 
still do unbounded recursion, just that GC may need to run to clean 
up call frames.
I only skimmed the earlier parts of this, but continuations shouldn't 
affect tail calls at all.
You should read the thread then.
If this is from some side effect of call speed optimization (I 
remember seeing some discussion of stack allocation of call frames or 
something, but I don't recall if it was before or after I said we 
weren't optimizing this stuff right now) then we need to rip out those 
optimizations.
Tail call optimization *is* an optimization. That's what the whole 
feature is.

And there's a difference between a pure optimization (which increases 
speed at the cost of design), and an architectural feature which 
improves speed.

JEff


Re: Tail calls and continuations

2004-11-11 Thread Jeff Clites
On Nov 11, 2004, at 12:59 PM, Leopold Toetsch wrote:
Jeff Clites [EMAIL PROTECTED] wrote:
I think that actually doesn't matter. Even in the case where we think
we can't do a full tail call optimization (because of a continuation
that's been taken), we can still actually remove the calling frame 
from
the call stack--we just can't immediately re-use the register frame.
As Dan already has outlined a Continuation doesn't have any impact on
tail calls (my argument WRT that was wrong)
I'm confused then. What from the previous discussion in this thread was 
incorrect?

JEff


Re: #line directives from PMC file should now work

2004-11-11 Thread Jeff Clites
On Nov 11, 2004, at 11:12 AM, Leopold Toetsch wrote:
Jeff Clites [EMAIL PROTECTED] wrote:
I'd really like a way to turn them off easily (for the ops as well,
actually). I find them to be counterproductive (for our stuff), since
what gets shown in the debugger isn't stuff you can actually get gdb 
to
evaluate.
It depends. While hacking PMC files it's much more useful to have the 
gcc
error line within the pmc files. Debugging ops is sometimes more
intuitive, if you see the corresponding .c file in the debugger, but 
not
always.
Sure--definitely depends on what you're doing. For me, I'm often 
debugging a JIT problem, and when I end up in a normal-op call I need 
to be able to examine the environment (registers, subsequent ops).

Anyway, as $ENV{PARROT_NO_LINE} is already in ops2c.pl, it should be
present in all build tools.
Or maybe also CLI options to the tools, so at least you can hand-modify 
the Makefile to easily turn off some, and leave on others. Having a 
Configure.pl parameter is also more consistent with everything else we 
do, compared to an env. variable.

JEff


Tail calls and continuations

2004-11-10 Thread Jeff Clites
I was thinking: Implementing tail calls seems easy; the normal calling 
sequence of "do some setup, then jump" just turns into "don't bother 
with (most of) the setup, just jump". That is, don't move a new 
register base-pointer into place, etc.

But there's one wiggle: If you've created a continuation previously 
(and it still exists), then any call has to preserve the frame--you 
basically can't do a tail call, with its key implication of the 
current frame vaporizing, or being re-used (depending on how you want 
to describe it).

But that's not too much of a problem, with the following:
1) Consider a tailcall op a recommendation--but have the VM do a 
regular call, if necessary.
2) Regular calls create continuations, so you can't do a tail call out 
of a function, if you've already done a regular call inside that 
function, _unless_ we have an (efficient) way to tell if any such 
continuation was saved. You can figure some of that out at 
compile-time (whether a regular call could have been already made), but 
you'd need runtime checks for other cases, unless you just forego a 
tail call any time you _could_ have already done a regular call (which 
avoids the runtime checks, but allows fewer actual tail calls).

Do any existing languages have both tail calls and continuations? 
Scheme mandates that anything which looks like a tail call, must be 
optimized as expected (other languages like Lisp merely permit it), but 
I don't know of Scheme having continuations. Scheme cares, of course, 
so that you can have ostensibly unlimited recursion, without running 
out of stack space (or really, memory). Tail calls and continuations 
seem a bit like opposites--one preserves state, the other destroys it.

Just thought I'd send out these thoughts, since the topic was mentioned 
recently.

JEff


Re: Tail calls and continuations

2004-11-10 Thread Jeff Clites
On Nov 10, 2004, at 3:08 PM, Leopold Toetsch wrote:
Jeff Clites [EMAIL PROTECTED] wrote:
But there's one wiggle: If you've created a continuation previously
(and it still exists), then any call has to preserve the frame
Well, that's what continuations are doing. They do preserve the frame,
which they have taken a snapshot of. And they preserve any frames up 
the
call chain. This is needed and implemented and working.
But here's the part I'm thinking about:
Continuations only copy the interpreter context, which contains the 
register bp, but of course the actual contents of the register frame 
are not duplicated. (This is as it should be.) The contents of the 
register frame are effectively preserved because in sub->invoke() we 
allocate a new register frame, and change the bp to point there. 
(That's fine too.)

But, in a tail-call-optimization case, we don't need to call 
new_register_frame() and copy_regs()--ex hypothesi, we can re-use the 
register frame already in-place. That's a big savings--that's the 
optimization I'm after. But of course, we can only do that if we know 
that a real continuation hasn't also captured the context.

But in light of what you say here...
The concept of a return continuation (which recycles itself and the
frame it returned from) doesn't have an influence on tail calls.
Whenever you see a continuation on the surface, it's one that is
preserving the frames and the context. Eventually there isn't even a
RetContinuation object but just a pointer to the calling frame. But
whenever you can see a continuation it's a real one
...it sounds like we have an easy way to tell if a real continuation 
has a claim on the register frame, because creating such a real 
continuation can mark the frame, and we can check for the mark in our 
tail-call implementation, and if it's marked then fall back to 
new_register_frame/copy_regs. (In fact that mark should be a reference 
count, keeping track of how many continuations have a claim on the 
register frame. That way, if a real continuation is created, but GC 
claims it before our tail-call attempt, we can still use our 
optimization.)

But maybe we're already doing something like that, and I missed it.
AFAIK the only problem with tailcalls and tail-recursive functions is 
to
properly detect them in some optimizer, or to convert to such tailcalls.
Seems like that shouldn't be too bad--we only need to know that there's 
effectively no code between the function call and subsequent return. 
(Though maybe telling call from return will be tricky.)

JEff


Re: Basic compilation example (a + b)?

2004-11-09 Thread Jeff Clites
On Nov 8, 2004, at 11:42 PM, Leopold Toetsch wrote:
Jeff Clites wrote:
new P16, 32  # .PerlInt
add P16, P18, P17
That's what worries me, and what prompted the question. You don't 
know at compile-time that the return type should be a PerlInt.
Yes, I've already stated that this needs fixing and I've proposed a 
scheme how to fix it.
Fine then, but in the scheme you recently mentioned, you were 
explicitly null-ing out the return register. That shouldn't be 
needed--unnecessary overhead. That is, unless we're assuming the 
semantics of references types--that the return register would hold a 
reference object to store into, always. But I don't think we can mix 
the behaviors--unless there's some special flag on a PMC to indicate 
that it's a reference type, but that seems awkward. But since the 
pie-thon output doesn't do that, I'll assume the plan is not to use 
such reference types, if that's meant to be the canonical example.

JEff


Re: No pow op with PMC arguments?

2004-11-09 Thread Jeff Clites
On Nov 8, 2004, at 3:08 AM, Leopold Toetsch wrote:
Jeff Clites [EMAIL PROTECTED] wrote:
No. The binary operations in Python are opcodes, as well as in 
Parrot.
And both provide the syntax to override the opcode doing a method 
call,
that's it.

I guess we'll just have to disagree here. I don't see any evidence of
this
UTSL please. The code is even inlined:
,--[ Python/ceval.c ]
|   case BINARY_ADD:
|   w = POP();
|   v = TOP();
|   if (PyInt_CheckExact(v) && PyInt_CheckExact(w)) {
|   /* INLINE: int + int */
|   register long a, b, i;
|   a = PyInt_AS_LONG(v);
|   b = PyInt_AS_LONG(w);
|   i = a + b;
|   if ((i^a) < 0 && (i^b) < 0)
|   goto slow_add;
|   x = PyInt_FromLong(i);
`
But I said, from an API/behavior perspective. How the regular Python 
interpreter is implemented isn't the point--it's how the language acts 
that's important. And I can't think of any user code in which a+b and 
a.__add__(b) act differently, and I think that's intentional--an 
explicit language design decision. The impl. above of BINARY_ADD is 
most likely an optimization--the code for BINARY_MULTIPLY (and 
exponentiation, division, etc.) looks like this:

case BINARY_MULTIPLY:
w = POP();
v = TOP();
x = PyNumber_Multiply(v, w);
Py_DECREF(v);
Py_DECREF(w);
SET_TOP(x);
if (x != NULL) continue;
break;
And again, what about Ruby? If you believe in matching the current 
philosophy of the language, it won't use ops for operators (but rather, 
method calls), and won't do the right thing for objects with 
vtable/MMDs, and no corresponding methods.

Not actually MMD in Python--behavior only depends on the left operand,
it seems.
It's hard to say what Python actually does. It's a mess of nested if's.
Just look at the behavior--that's what's important:
Behavior depends only on left operand:
>>> class Foo:
...   def __add__(a,b): return 7
...
>>> x = Foo()
>>> x + x
7
>>> x + 3
7
>>> x + "b"
7
>>> x + (1,2)
7
All the following are error cases. Error statement varies depending on 
left operand only:

>>> 3 + "b"
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
TypeError: unsupported operand type(s) for +: 'int' and 'str'
>>> 3 + x
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
TypeError: unsupported operand type(s) for +: 'int' and 'instance'
>>> 3 + (1,2)
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
TypeError: unsupported operand type(s) for +: 'int' and 'tuple'
>>> "b" + 3
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
TypeError: cannot concatenate 'str' and 'int' objects
>>> "b" + x
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
TypeError: cannot concatenate 'str' and 'instance' objects
>>> "b" + (1,2)
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
TypeError: cannot concatenate 'str' and 'tuple' objects
>>> (1,2) + 3
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
TypeError: can only concatenate tuple (not "int") to tuple
>>> (1,2) + "b"
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
TypeError: can only concatenate tuple (not "str") to tuple
>>> (1,2) + x
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
TypeError: can only concatenate tuple (not "instance") to tuple
   null dest
   dest = l + r
should produce a *new* dest PMC.

Yes, it's a separate issue, but it's pointing out a general design
problem with these ops--their baseline behavior isn't useful.
It *is* useful. If the destination exists, you can use it. The
destination PMC acts as a reference then, changing the value in place.
But in case of Python it's not of much use
Right, changing the value in-place would do the wrong thing, for 
Python. (It depends on whether the arguments to the op are references, 
or the actual values. If they're references, then it can work 
correctly, but then we don't want to be MMD dispatching on the 
(reference) types, but rather on the types of what they're pointing 
to.)

except for the inplace (augmented) operations.
Yes, but that ends up being just for the two-argument forms (and even 
those don't work for Python--a += 3 doesn't really update in-place in 
Python, but returns a new instance). In-place operators tend to only 
take one argument on the right side, so the p_p_p forms aren't useful 
for this.

..., but for
PMCs this could compile like a = b.plus(c).

but you don't need add_p_p_p, just method invocation.
Why should we do method invocation with all its overhead, if for the
normal case a plain function call will do it?
Ah, that's the key. Method invocation

Re: calling conventions, tracebacks, and register allocator

2004-11-09 Thread Jeff Clites
On Nov 8, 2004, at 11:15 AM, Matt Fowles wrote:
Dan~
On Mon, 8 Nov 2004 13:45:08 -0500, Dan Sugalski [EMAIL PROTECTED] wrote:
The calling conventions and code surrounding them will *not* change
now. When all the sub stuff, and the things that depend on it, are
fully specified and implemented... *then* we can consider changes.
Until then, things stand the way they are.
I misunderstood.  I thought you were saying that what is currently in
is final and will *never* be changed.  Thanks for the clarification.
Nevertheless, this is a legitimate topic for discussion, and the issues 
are fresh in people's minds. That's independent of any impediments that 
might block implementing changes at the current time.

JEff


Register allocation/spilling and register volatility

2004-11-08 Thread Jeff Clites
From other threads:
Now we are placing arguments or return values in registers according 
to PDD03 and the other end has immediately access to the placed 
values, because the register file is in the interpreter.

With the indirect addressing of the register frame, this argument 
passing is probably the most costly part of function calls and 
returns, because we have to copy the values.
and
Anyway, if you can pop both register frames -and- context structures, 
you won't run GC too often, and everything will nicely fit into the 
cache.
I thought about that too, but I'd rather have registers adjacent, so 
that the caller can place function arguments directly into the callee's 
in arguments.

OTOH it doesn't really matter, if the context structure is in the 
frame too. We'd just need to skip that gap. REG_INT(64) or I64 is as 
valid as I0 or I4, as long as it's assured, that it's exactly 
addressing the incoming argument area of the called function.
A problem with this is that you can't be sure that you can actually 
have the next frame of registers adjacent to the current frame--they 
might already be taken. Imagine A calls B, then B creates a 
continuation and stores it in a global, then returns. If A makes 
another function call, it can't just use the adjacent register 
frame--it's still taken. (I'm referring here to the sliding register 
window idea of course.) But this is reminding me of another idea I've 
had.

I've been thinking for a while about another idea to decrease some of 
the copying which we are now doing for function calls, as well as the 
allocation of register sets during subroutine calls, and which would as 
a side-effect allow us to reduce the need for register spilling.

First, a code snippet. Consider the following C code:
int a(int x) { return b(x); }
int b(int x) { return c(x + 1); }
int c(int x) { return x + 7; }
As compiled on the PPC, this code is very compact--each function is 
implemented by just one or two instructions, and only one register is 
used. There are two key factors here: (1) no register preservation is 
needed, and (2) the call from a() to b() is just a branch, since the 
call to a() has already loaded the registers with the correct values 
for the call to b().

By my thinking, register usage falls into three basic categories:
1) Registers used for parameter passing and return values for function 
calls.

2) Registers used to hold values which need to be preserved across 
function calls.

3) Registers used to hold values which do not need to be preserved 
across function calls.

Really, (1) and (3) are similar--in either case, you have registers 
whose values are allowed to change across function calls. (For registers 
which hold return values that's obvious, and for those used to pass 
parameters, it seems fair that these be expected to change across a 
function call.) So we really have two cases--volatile and non-volatile 
register usage.

In the PPC ABI, half the registers are treated as volatile, and the 
other half as non-volatile. For the PPC, this corresponds to 
caller-preserved v. callee-preserved. Because of continuations[1], 
parrot can't have callee-preserved registers, but it could still have 
volatile v. non-volatile registers. Here's what I'm thinking:

Keep the old-scheme registers inside the interpreter structure, *as 
well as* the new indirect registers. Call the registers in the 
interpreter I0 to I31, and the indirect registers I32 to whatever.[2] 
I'll call these the direct and indirect registers. By their very 
nature, the direct registers are volatile, and the indirect registers 
are non-volatile. What does this buy us? The following:

1) Parameter passing and return occur via the established calling 
conventions, in the volatile registers. No extra copying is needed upon 
function call or return--the volatile registers are physically the same 
for the caller and the callee.

2) Temporary calculations can use the volatile registers. Values 
which need preserving can use the non-volatile registers directly.

3) In functions which need no non-volatile registers, there's no need 
to allocate the indirect registers at all.

4) Cases like my example code above can compile just as compactly as 
the PPC case. With the volatile Parrot registers mapped to volatile CPU 
registers (in the PPC case at least), this would be highly efficient: 
you'd end up with asm similar to what you have in the C case above, and 
you'd not have a need to save the CPU registers back to Parrot 
registers (other than those used in the calling conventions) for Parrot 
function calls out of JIT code.

5) Because this opens things up to more than 32 registers, code could 
use as many registers as needed. You'd never have register spilling 
per-se in order to accommodate local variables, though you'd still want 
a smart register allocator which minimized the number of registers 
needed (for memory locality as well as minimizing allocation). But a 
very simple 

Re: No Cpow op with PMC arguments?

2004-11-08 Thread Jeff Clites
On Nov 8, 2004, at 12:50 AM, Leopold Toetsch wrote:
Jeff Clites [EMAIL PROTECTED] wrote:
On Nov 5, 2004, at 9:40 AM, Leopold Toetsch wrote:

In Python, semantically you know that you'll end up doing a method 
call
(or, behaving as though you had), so it's very roundabout to do a
method call by using an op which you know will fall back to doing a
method call. Clearer just to do the method call.
No. The binary operations in Python are opcodes, as well as in Parrot.
And both provide the syntax to override the opcode doing a method call,
that's it.
I guess we'll just have to disagree here. I don't see any evidence of 
this from an API/behavior perspective in Python. I think the existence 
of a separate Python opcode is just a holdover from a time when these 
infix operators only existed for built-in types (just inferring this). 
I can't find any case where Python would act differently, if these were 
compiled directly to method calls. And for Ruby, it's *explicit* that 
these operators are just method calls. And languages like Java don't 
have these operators at all, for objects.

The only thing that's special is that there are certain built-in
classes, and some of them implement __pow__, but that's not really
anything special about __pow__.
Yes. And these certain *builtin* classes have MMD functions for binary
opcodes.
Not actually MMD in Python--behavior only depends on the left operand, 
it seems.

And even the ops we currently have are broken semantically. Consider 
a
= b + c in Python. This can't compile to add_p_p_p, because for that
op to work, you have to already have an existing object in the first P
register specified. But in Python, a is going to hold the result of
b + c, which in general will be a new object and could be of any
type, and has nothing to do with what's already in a.
That's a totally different thing and we have to address that. I have
already proposed that the sequence:
   null dest
   dest = l + r
should produce a *new* dest PMC. That's quite simple. We just have to
pass the address of the dest PMC pointer instead of the PMC to all such
operations. Warnocked.
Yes, it's a separate issue, but it's pointing out a general design 
problem with these ops--their baseline behavior isn't useful. The 
result of l + r will not depend on what's to the left of the = by 
HLL semantics, for any case I can think of. (Perl cares about context, 
but that's not really the same thing.)

... I think
we should create PMC-based ops only if one of the following criteria
are met: (a) there's no other reasonable way to provide some needed
functionality,
So, this is already a perfect reason to have these opcodes with PMCs.
  a = b + c
behaves differently, if b and c are plain (small) integers or
overflowing integers or complex numbers and so on. You can't provide
this functionality w/o PMCs.
I don't understand this example. Certainly you need PMCs, but if b and 
c are I or N types, of course you'd use add_i_i_i or add_n_n_n, but for 
PMCs this could compile like a = b.plus(c). Of course you have to 
know I v. N v. P at compile-time, and there's no reason that I/N v. P 
pasm must look identical, for similar-looking HLL code. You need PMCs, 
but you don't need add_p_p_p, just method invocation.

For complex numbers and such, I'd want to be able to define classes for 
them in bytecode. For that to work, ops would eventually have to 
resolve to method calls anyway. (You can't create a new PMC and 
vtable/MMDs in bytecode.) Why not skip the middle-man?

JEff


Re: Register allocation/spilling and register volatility

2004-11-08 Thread Jeff Clites
On Nov 8, 2004, at 1:34 AM, Leopold Toetsch wrote:
Jeff Clites [EMAIL PROTECTED] wrote:
OTOH it doesn't really matter, if the context structure is in the
frame too. We'd just need to skip that gap. REG_INT(64) or I64 is as
valid as I0 or I4, as long as it's assured, that it's exactly
addressing the incoming argument area of the called function.

A problem with this is that you can't be sure that you can actually
have the next frame of registers adjacent to the current frame--they
might already be taken. Imagine A calls B, then B creates a
continuation and stores it in a global, then returns.
Please read the proposal summary by Miroslav Silovic, keyword 
watermark.
If frames aren't adjacent, normal argument copying can be done anyway.
This would seem to require the same types of runtime checks that you 
are objecting to below, unless user code is expected to explicitly 
check whether it's supposed to be assigning to I3 v I(3 + 32x?). And 
that still seems to require copying, in my case of a function a() which 
calls a function b() with the exact same arguments.

Keep the old-scheme registers inside the interpreter structure, *as
well as* the new indirect registers. Call the registers in the
interpreter I0 to I31, and the indirect registers I32 to whatever.
That would need two different addressing modes depending on the 
register
number. That'll lead to considerable code bloat: we'd have all possible
permutations for direct/indirect registers. Doing it at runtime only
would be a serious slowdown.
I discussed that in item (1) under Further notes.
It's not needed. I've a better scheme in mind, which addresses
efficiency as well as argument passing.
And spilling?
JEff


Re: Basic compilation example (a + b)?

2004-11-08 Thread Jeff Clites
On Nov 8, 2004, at 2:47 AM, Leopold Toetsch wrote:
Jeff Clites [EMAIL PROTECTED] wrote:
What pasm is supposed to correspond to this snippet of Python code
(assume this is inside a function, so these can be considered to be
local variables):

a = 7
b = 12
c = a + b
Run it through pie-thon. It should produce some reasonable code. For
leaf-functions (w/o introspection i.e. calls to locals()), the lexical
handling would get dropped. And a better translator would use lexical
opcodes by index and not by name.
It doesn't do-the-right-thing in the cases I'm interested in:
% cat pythonClass.py
class A:
    def __add__(x,y) : return "boo"
x = A()
y = x + 3
print y
% python pythonClass.py
boo
% perl pie-thon.pl pythonClass.py | ./parrot --python -
Can't find method '__get_number' for object 'py::A'
It looks like it's trying to get the float-value of x, rather than 
calling x's __add__ method.

But the part I was really wondering about is the a + b. This is what 
pie-thon.pl produces for that (you can just run it on the code fragment 
a + b--the context doesn't matter):

$P1 = new PerlInt   # BINARY_ADD
$P1 = a + b
corresponding pasm:
find_global P18, "a"
find_global P17, "b"
new P16, 32  # .PerlInt
add P16, P18, P17
That's what worries me, and what prompted the question. You don't know 
at compile-time that the return type should be a PerlInt. It could be 
anything--it's really up to a. This is regarding my concern that the 
p_p_p ops aren't very useful (in Python at least), and I can't figure 
out what we should be using instead.

So I'm still left wondering, how *should* this compile?
JEff


Re: No Cpow op with PMC arguments?

2004-11-07 Thread Jeff Clites
On Nov 5, 2004, at 10:03 AM, Sam Ruby wrote:
Jeff Clites wrote:
a) As Sam says, in Python y**z is just shorthand for 
y.__pow__(z)--they will compile down to exactly the same thing 
(required for Python to behave correctly). Since __pow__ isn't 
special, we don't need anything to support it that we wouldn't need 
for any other arbitrary method, say y.elbowCanOpener(z).
[snip]
So I don't think an op really gives us what we want. (Specifically, 
we could define a pow_p_p_p op, but then Python wouldn't use it for 
the case Sam brought up.) I'd apply the same argument to many of the 
other p_p_p ops that we have--they don't give us what we need at the 
HLL level (though they may still be necessary for other uses).
It is my intent that Python *would* use this method.  What is 
important isn't that y**z actually call y.__pow__(z), but that the two 
have the same effect.

Let's take a related example: y+z.  I could make sure that each are 
PMCs
[Not really a need for a check--everything will be a PMC, right? Or if 
not, you need to know before emitting the op anyway. Side issue, 
though.]

and then find the __add__ attribute, retrieve it, and then use it as a 
basis for a subroutine call, as the semantics of Python would seem to 
require.  Or I could simply emit the add op.

How do I make these the same?  I have a common base class for all 
python objects which defines an __add__ method thus:

 METHOD PMC* __add__(PMC *value) {
     PMC *ret = pmc_new(INTERP, dynclass_PyObject);
     mmd_dispatch_v_ppp(INTERP, SELF, value, ret, MMD_ADD);
     return ret;
 }
... so, people who invoke the __add__ method explicitly get the same 
function done, albeit at a marginally higher cost.
There are three problems I see with this:
1) If you have 2 PerlInts in Python code, then a + b will work, but 
a.__add__(b) won't, since PerlInts won't inherit from your Python 
base class. To my mind, that breaks an invariant of Python.

2) If your __add__ method above were somehow in place even for 
PerlInts, it would produce a PyObject as its result, instead of a 
PerlInt, which is what I would have expected. That's a basic problem 
with our p_p_p ops currently--the return type can't be decided by the 
implementation of the MMD method which is called.

and...
Now to complete this, what I plan to do is to also implement all 
MMD/vtable operations at the PyObject level and have them call the 
corresponding Python methods.  This will enable user level __add__ 
methods to be called.

Prior to calling the method, there will need to be a check to ensure 
that the method to be called was, in fact, overridden.  If not, a 
type_error exception will be thrown.
3) As described by Leo, the op would call the MMD dispatcher, which 
would ultimately do the method call, then your method above would call 
the MMD dispatcher, so you'd get an infinite loop, right? And if you 
avoid this by adding some check (as you mentioned) to make sure you 
only call __add__ if it was overridden (really, implemented at the user 
level), then your default implementation above will never be called, 
right?

If instead you just compile infix operators as method calls, then all 
of those problems go away, and it's much simpler conceptually and in 
terms of implementation.

And for Ruby (the language), it's explicit that infix operators are 
just an alternate syntax for method calls, so compiling them as ops is 
even more semantically problematic there.

JEff


Re: No Cpow op with PMC arguments?

2004-11-07 Thread Jeff Clites
On Nov 4, 2004, at 5:24 AM, Sam Ruby wrote:
[Referring to infix operators as an alternate syntax for a named method 
call]

What's the downside of compiling this code in this way?  If you are a 
Python programmer and all the objects that you are dealing with were 
created by Python code, then not much.  However, if somebody wanted to 
create a language independent complex number implementation, then it 
wouldn't exactly be obvious to a Python programmer how one would raise 
such a complex number to a given power.  Either the authors of the 
complex PMC would have to research and mimic the signatures of all the 
popular languages, or they would have to provide a fallback method 
that is accessible to all and educate people to use it.
Yes, and I think that compiling using ops makes things worse, because 
of languages such as Java which don't have operator overloading, so 
you'd have to make all functionality available as method calls anyway, 
so why bother with the ops? Methods are much more flexible, and don't 
bloat the VM.

Ultimately, Parrot will need something akin to .Net's concept of a 
Common Language Specification which defines a set of rules for 
designing code to interoperate.  A description of .Net's CLS rules can 
be found in sections 7 and 11 (a total of six pages) in the CLI 
Partition I - Architecture document[1].
I think that ultimately, code will break down into two categories:
1) Code designed with multiple languages in mind.
2) Code designed with only one language in mind.
Code in case (2) will be awkward to use in other languages, but it 
should definitely be possible to use it somehow. For case (1), we need 
to make this easy for library authors to do.

In terms of method naming, we may want to do something automatic (if 
you name your method such and such, it will appear to Python named 
this, and Ruby named that, and Perl named...), or it may be better to 
provide an explicit way to create language-specific method aliases. 
Some cases are simple (__mul__ in Python and * in Ruby and 
multiply in Java should all map to the same method for mathematical 
objects, probably), but others are more subtle (adding two array-like 
things means something different in different languages, potentially: 
append v. componentwise add v. componentwise add only if they have the 
same length). Doing something automatic saves a bunch of redundant 
work in the former case, but could cause problems in the latter.[1]  
And whatever the approach, it should be possible (even easy) for 
someone to take a library designed for only one language, and provide 
some cross-language mapping info and turn it into a nice cross-language 
library, without necessarily having to dig into the source code. (I'm 
thinking here of being able to specify the mapping in a document 
separate from the source code.) This is sort of treating method names 
as part of the interface, and not the implementation.

[1] Automatic mapping could also cause problems in the case where I 
design a library and intend it to be cross-language, but where I want 
the API to look identical across languages--I don't want to 
accidentally trip over a method name which happens to get translated 
for me, when I don't intend for that to happen. (For instance, I might 
name a method pow which controls some power level, not intending 
anything about exponentiation.) But, this wouldn't be a problem if the 
automatic approach were to let me somehow register a method as 
filling a certain role (my method 'blah' should be called to perform 
numeric addition), rather than inferring it from the method name. 
Probably a tricky balance between convenience and control/flexibility.

JEff


Re: Streams and Filters (Ticket #31921)

2004-11-07 Thread Jeff Clites
On Nov 7, 2004, at 2:25 AM, Jens Rieks wrote:
On Sunday 07 November 2004 09:48, Leopold Toetsch wrote:
* where exactly is the mismatch coming from?
Unix uses \n to indicate end-of-line, windows uses \r\n. The 
problem is,
that the perlhist.txt file is checked in as a text file. I'll 
recommit it
as a binary file in the hope that it fixes the problem.

The root of the problem is the different line ending, I have no idea 
how
parrot can deal with it, or if it should deal with it at all.
Ah, just to amplify what you've said, it sounds like the problem is 
that CVS on Windows is translating the line endings upon checkout, 
since it's a text file.

It sounds like the test is fine, just a bit tricky to get the 
perlhist.txt file in place unmodified. An alternative would be to 
create the test file on-the-fly, but flagging it as binary for CVS is 
probably simpler.

(In my opinion, the line-ending issue isn't a problem for Parrot in 
terms of this test--it's legitimate to expect that you can process a 
particular file, no matter what platform you happen to be on, and no 
matter if the file happens to have been created on another platform.)

JEff


Basic compilation example (a + b)?

2004-11-07 Thread Jeff Clites
What pasm is supposed to correspond to this snippet of Python code 
(assume this is inside a function, so these can be considered to be 
local variables):

a = 7
b = 12
c = a + b
Just a basic compilation question, brought to mind by a recent thread, 
because the answer isn't obvious to me.

JEff


Re: No Cpow op with PMC arguments?

2004-11-05 Thread Jeff Clites
On Nov 4, 2004, at 5:24 AM, Sam Ruby wrote:
From a Python or Ruby language perspective, infix operators are not 
fundamental operations associated with specific types, they are 
syntactic sugar for method calls.

At the moment, I'm compiling x=y**z into:
x = y.__pow__(z)
There is nothing reserved about the name __pow__.  Any class can 
define a method by this name, and such methods can accept arguments of 
any type, and return objects of any type.  They can be called 
explicitly, or via the infix syntax.
Of course--I should have realized that. I knew that's how Python 
handles +, etc.--don't know why I assumed exponentiation would be 
different.

So scratch what I said. I should have said this:
Languages tend to take one of the following two approaches when it 
comes to generalizing operations on basic types (numbers, strings) into 
operations on object types.

1) Generalization via conversion to a basic type. As an example, some 
languages generalize numeric addition, obj1 + obj2, as being 
syntactic sugar for something like, obj1.floatValue() + 
obj2.floatValue(). (That is, you do a basic operation on non-basic 
types by converting them into the relevant basic types, then performing 
the operation on those.) This is how Perl5 handles string 
concatenation--you string-concatenate two objects by string-ifying 
them, and concatenating those strings.

2) Generalization by method call. Some languages treat obj1 + obj2 as 
syntactic sugar for something like obj1.add(obj2). That is, you 
generalize in the obvious o.o. way. This is how Python (and C++) 
treats infix operators.

Different languages choose (1) v. (2), and can certainly mix-and-match 
(take one approach for some operations, another for others). Another 
way a language may mix-and-match is to do (2) if such a method is 
defined on the object, and fall back to (1) if it isn't.

Now from a Parrot perspective: Case (1) is already handled by 
Parrot--it's just an exercise in code generation by a compiler. For 
case (2), I think these operations correspond to method calls on 
objects (in the Parrot sense--the stuff in src/objects.c), not MMD or 
vtable operations accessed via custom ops. Here are a couple of 
examples why:

a) As Sam says, in Python y**z is just shorthand for 
y.__pow__(z)--they will compile down to exactly the same thing 
(required for Python to behave correctly). Since __pow__ isn't 
special, we don't need anything to support it that we wouldn't need 
for any other arbitrary method, say y.elbowCanOpener(z).

b) I can define arbitrary Python classes with arbitrary implementations 
of __pow__, and change those implementations on-the-fly, on a per-class 
or per-instance basis. These aren't new PMC-classes, and I don't think 
that the op-plus-MMD approach gives us the ability to handle that.

Summary: Both cases (1) and (2) are syntactic sugar, and case (1) is 
sugar for casting/conversion, and case (2) is sugar for object-method 
calls.

So I don't think an op really gives us what we want. (Specifically, we 
could define a pow_p_p_p op, but then Python wouldn't use it for the 
case Sam brought up.) I'd apply the same argument to many of the other 
p_p_p ops that we have--they don't give us what we need at the HLL 
level (though they may still be necessary for other uses).

JEff


Re: No Cpow op with PMC arguments?

2004-11-05 Thread Jeff Clites
On Nov 4, 2004, at 10:30 PM, Brent 'Dax' Royal-Gordon wrote:
On Thu, 4 Nov 2004 21:46:19 -0800, Jeff Clites [EMAIL PROTECTED] wrote:
On Nov 4, 2004, at 8:29 PM, Brent 'Dax' Royal-Gordon wrote:
This is true.  But how do you define a number?  Do you include
floating-point?  Fixed-point?  Bignum?  Bigrat?  Complex?  Surreal?
Matrix?  N registers don't even begin to encompass all the numbers
out there.
Floating point, and possibly integer. Those are the numeric primitives
of processors. Other aggregate mathematical types are always defined 
in
terms of those (in a computing context), one way or another.
Yes, but your decomposition (N2=P2; N3=P3; N1=N2+N3; P1=N1) doesn't
take anything but the primitives into account.
Yes--see my subsequent message (just sent a moment ago), as that 
decomposition isn't what I meant. But you asked how I define a 
number--that's what I was answering above, from a computing 
perspective, not a more general question.

My point there, when I said one way or another, was that for example, 
even in mathematics, you define addition, multiplication, etc. on 
complex numbers in terms of such operations over the real numbers. 
(That is, you define them in terms of operations on a pair of real 
numbers.) Based on the design of processors, and of Parrot, there's a 
good performance reason to define basic ops on integers and floating 
point numbers--namely, they'll often JIT down to single instructions. 
Such ops on PMCs won't have that performance benefit--they'll still 
involve table lookups and multiple function calls to execute. So a 
mechanism other than an op may be more appropriate there. There are a 
myriad of interesting mathematical types and operations, but they don't 
need dedicated ops to support them. (And even the seemingly obvious 
cases aren't: There are at least three different operations on vectors 
which could be called multiplication. I don't think the mul op 
should be used for any of them.)

JEff


Re: No Cpow op with PMC arguments?

2004-11-04 Thread Jeff Clites
On Nov 4, 2004, at 8:29 PM, Brent 'Dax' Royal-Gordon wrote:
Jeff Clites [EMAIL PROTECTED] wrote:
I.e., PMCs don't inherently exponentiate--numbers do, and you can
exponentiate PMCs by numberizing them, exponentiating, and creating a
PMC with the result.
This is true.  But how do you define a number?  Do you include
floating-point?  Fixed-point?  Bignum?  Bigrat?  Complex?  Surreal?
Matrix?  N registers don't even begin to encompass all the numbers
out there.
Floating point, and possibly integer. Those are the numeric primitives 
of processors. Other aggregate mathematical types are always defined in 
terms of those (in a computing context), one way or another.

JEff


Re: more vtables

2004-11-02 Thread Jeff Clites
On Nov 2, 2004, at 12:41 AM, Leopold Toetsch wrote:
When we have objects with finalizers, we have to run the finalizers in
order from most derived down the parent chain.
Maybe, but not necessarily. The case of loops means that we cannot 
always do this cleanly (no top of the chain), and the fact that 
finalizers may modify the topology of the object graph also makes this 
a bit ill-defined. There are several possible approaches, ranging from 
very conservative to very aggressive. Two real-world approaches already 
used are: make no guarantees about finalization order--only guarantee 
that a finalizer will be called at some point after an object becomes 
unreferenced, with no guarantees about ordering, and no guarantees that 
it will not be called if the object has already been re-rooted (Java); 
or, do graph-ordered finalization, and never call the finalizers of 
objects which are in loops (the Boehm collector does something along 
these lines, I believe).

So as you state, we've to
defer destruction, store these objects with finalizers somewhere, sort
them, run user code to finalize objects and so on.
Other objects are impacted as well--any object reachable from a 
finalizable object must wait for reclamation until finalizers have 
fired.

Doing that all in the destroy vtable is tedious, especially when user
code is involved too. The finalize vtable is overridable, the destroy
isn't. A split makes this functionality much cleaner.
I think what's bothering me about this is that I think part of the idea 
is to move closing of filehandles and such into destroy()--right? That 
sounds good--don't get rid of this resource until it's certain nothing 
will try to use it (eg, things which want to log a message during their 
finalization would use a filehandle). But I don't think this really 
solves it. For one thing, there will be similar-in-functionality code 
coming up in HLL's (doing some orderly shutdown of connections, which 
involves more than just closing the filehandle), and this will have to 
happen in a finalizer (since that's the only HLL option), which will 
feel too early in some cases. And even in a destroy(), you might want 
to do some I/O, and whether this will work will depend on whether the 
filehandles have been closed yet, and the same ordering concerns 
re-emerge.

So I think the split would help, in a sense, but only be a partial 
solution to an unsolvable problem, and therefore maybe is not very 
elegant. So I'm a bit on the fence.

(And in my parlance, just to be clear: finalizers are called after an 
object becomes un-referenced, and could re-root an object; 
destructors are called when the object is actually having its memory 
re-claimed--when it is going away for sure. No user code can be called 
from a destructor in the Parrot case, because of the possibility of 
re-rooting, and because of other constraints which may exist at 
destroy-time.)

JEff


Re: case mangling and binary strings

2004-11-02 Thread Jeff Clites
On Nov 2, 2004, at 10:46 AM, Dan Sugalski wrote:
At 1:42 PM -0500 11/2/04, Sam Ruby wrote:
I don't care if Parrot uses ICU on any platform.
I do care that Parrot supports utf-8 on every platform.
Ah, OK. Yes, we will support all the unicode encodings, as well as the 
unicode character set, on all platforms.
I'll point out, for the record, that it makes no sense to have strings 
holding binary data--the observation that fundamental string APIs make 
no sense for such strings is an indication that this is an attempt to 
use the wrong data type for the purpose. Just throwing an 
exception is not a magic cure--it's an indication that things are 
being modeled incorrectly. It shouldn't be necessary.

In terms of ICU specifically, I don't much care what underlying library 
we use. That said, if we want to support all the unicode encodings, 
we're going to have to provide all of the relevant functionality that 
ICU gives us. Trying to re-implement this, separately, from scratch, is 
a mistake--a waste of resources. A better approach (for Parrot, and for 
the computing community in general) would be to put our efforts toward 
getting ICU working on the platforms on which it is having problems. Or 
start with some other well-established library--but ICU seems to be the 
best around.

JEff


Re: Performance Statistics (and Graphs!)

2004-11-02 Thread Jeff Clites
On Nov 2, 2004, at 7:10 PM, Matt Diephouse wrote:
Joshua Gatcomb and I have been working a little under a week to set up
an automated build system for parrot that tracks performance (with
help from Dan's box). We have collected benchmark data for regular and
optimized builds with and without JIT from June 1st through October.
With some help from Perl and GD, we've put up several pages of graphs:
  http://www.sidhe.org/~timeparrot/graphs/
Both the build process and the creation of the graphs will (hopefully)
be run daily (automated). There are plans to do some other things with
the data as well (some notification scripts).
Very cool. A useful thing to do would be to preserve all of the builds, 
so that newly-created benchmarks could be back-filled. (That way, if we 
realize that we've not been effectively testing something, we can 
devise an appropriate benchmark, and evaluate the consequences of past 
design changes against it.)

JEff


Re: Are we done with big changes?

2004-11-01 Thread Jeff Clites
On Nov 1, 2004, at 6:14 AM, Dan Sugalski wrote:
Because I need to get strings working right, so I'm going to be 
implementing the encoding/charset library stuff, which is going to 
cause some major disruptions.
Please tag cvs before checking this in.
Thanks,
JEff


Re: Parrot on AMD64

2004-10-30 Thread Jeff Clites
On Oct 29, 2004, at 11:29 PM, Brent 'Dax' Royal-Gordon wrote:
I recently got my hands on money for an upgrade, and got an AMD64 and
motherboard and installed them.  I'm still using 32-bit Windows, but
I've also installed a 64-bit Gentoo on some previously unpartitioned
space.
...
Failed Test           Stat Wstat Total Fail  Failed  List of Failed
-------------------------------------------------------------------
t/library/streams.t      1   256    21    1   4.76%  11
3 tests and 53 subtests skipped.
Failed 1/124 test scripts, 99.19% okay. 1/1959 subtests failed, 99.95% okay.

Any thoughts on this?
Known failure currently for all platforms--see recent message [CVS ci]  
indirect register frame 9 - go. So it looks like you are passing all  
expected tests!

JEff



Re: Mostly a Perl task for the interested

2004-10-30 Thread Jeff Clites
On Oct 30, 2004, at 12:58 AM, Leopold Toetsch wrote:
Nicholas Clark [EMAIL PROTECTED] wrote:
On Fri, Oct 29, 2004 at 05:47:55PM +0200, Leopold Toetsch wrote:

* The created C code could benefit from #line directives to track 
where
  C code came from the input .pmc file, so that compiler errors are 
reported
  for the original .pmc file. Perl 5's xsubpp does this well, using 
#line
  directives to switch between foo.c and foo.xs, depending on whether 
that
  section of code was human written, or autogenerated. It makes 
things much
  easier while developing.
Yep, thanks. The hooks are there, as well as a command-line option to
turn line numbers off, which is sometimes useful.
Anyway most line number stuff is already inside, but it's broken and
needs fixing.
FYI I'm fiddling with classes/pmc2c2.pl, in connection with your 
ccache-ish feature request, but I'm not (much) touching Pmc2c.pm, so 
I shouldn't conflict with anyone working there.

JEff


[PATCH] PPC JIT failure for t/pmc/threads_8.pasm

2004-10-29 Thread Jeff Clites
I was getting a failure under JIT on PPC for t/pmc/threads_8.pasm, and 
the problem turned out to be that emitting a restart op takes 26 
instructions, or 104 bytes, and we were hitting the grow-the-arena 
logic just shy of what would have triggered a resize, then running off 
the end.

The below patch fixes this; really that magic number (200, now) needs 
to be bigger than the amount of space we'd ever need to emit the JIT 
code for a single op (plus saving registers and such), but with the 
possibility of dynamically loadable op libs (with JIT?), it's hard to 
say what number is guaranteed to be large enough. Or, we can pick a 
reasonable, largish number that works for the built-in ops (empirically 
determined, as now), and document that loadable JITted ops which could 
take more than this, need to make sure to grow the arena as necessary. 
(And we could provide a utility function to make this easy.)

JEff
Index: src/jit.c
===
RCS file: /cvs/public/parrot/src/jit.c,v
retrieving revision 1.95
diff -u -b -r1.95 jit.c
--- src/jit.c   25 Oct 2004 10:24:14 -  1.95
+++ src/jit.c   29 Oct 2004 07:50:09 -
@@ -1395,7 +1395,7 @@
     while (cur_op <= cur_section->end) {
         /* Grow the arena early */
         if (jit_info->arena.size <
-                (jit_info->arena.op_map[jit_info->op_i].offset + 100)) {
+                (jit_info->arena.op_map[jit_info->op_i].offset + 200)) {
 #if REQUIRES_CONSTANT_POOL
             Parrot_jit_extend_arena(jit_info);
 #else



AIX PPC JIT warning

2004-10-29 Thread Jeff Clites
Recently config/gen/platform/darwin/asm.s was added, containing 
Parrot_ppc_jit_restore_nonvolatile_registers(). Corresponding code also 
needs to be added to config/gen/platform/aix/asm.s -- Parrot should 
fail to link on AIX currently, without this. I didn't try to update the 
AIX asm.s myself, since I wasn't confident that I could do this 
correctly without having a way to test.

So, someone with AIX asm expertise, please take a look.
Thanks,
JEff


Re: Perl 6 and byte code

2004-10-27 Thread Jeff Clites
On Oct 27, 2004, at 6:24 AM, Gisle Aas wrote:
How about the code JITed from the bytecodes. Will it be shared?
The JITed code can't be shared directly--in its current form, it 
(intentionally) includes absolute addresses which wouldn't be valid for 
other processes.

But the exec core allows (or, will allow) bytecode to be compiled into 
a native executable (or presumably, library), so that would allow the 
advantages of native, memory-shared code.

JEff


[PATCH] Re: JIT and platforms warning

2004-10-23 Thread Jeff Clites
On Oct 22, 2004, at 3:57 AM, Leopold Toetsch wrote:
Jeff Clites wrote:
On Oct 22, 2004, at 1:01 AM, Leopold Toetsch wrote:
[JIT changes]
I just finished tracking down the source of a couple of JIT test 
failures on PPC--due to recent changes but only indirectly related, 
and pointing out things which needed fixing anyway (float register 
preservation issues). I'll send it in tomorrow after I've had a 
chance to clean it up and add some comments.
Please make sure to get a recent CVS copy and try to prepare the patch 
during my night ;)
Of course!
I've changed the PPC float register allocation yesterday, because it 
did look bogus - i.e. the non-volatile FPRs were allocated but not 
saved. That should be fixed.
Yep, that was the core of the issue. There's no free lunch--if we use 
the nonvolatile registers, we need to preserve/restore them in 
begin/end, but if we use the volatile registers, we need to preserve 
them across function calls (incl. normal op calls). So I added code to 
do the appropriate save/restore, and use the non-volatile registers for 
mapping--that should be less asm than what we'd have to do to use the 
volatile registers. (The surprising thing was that we only got 2 
failures when using the volatile registers--I'll look into creating 
some tests that would detect problems with register preservation.)

The other tricky part was that saving/restoring the FP registers is one 
instruction per saved register, so saving all 18 was exceeding the asm 
size we allocate in src/jit.c (in some cases), since we emit Parrot_end 
for all restart ops. The fix for this was to pull this asm out into a 
utility routine, and just call that from the asm. (This is only done 
for restoring the registers so far--I should do it for the preservation 
step too, but right now that's just inline.)

The attached patch also contains some other small improvements I'd been 
working on, and a few more jitted ops to demonstrate calling a C 
function from a jitted op.

While you're investigating PPC JIT: JIT debugging via stabs doesn't 
work at all here. I get around the gdb message (missing data segment) 
by inserting .data\n.text\n in the stabs file, but that's all. After 
this change and regenerating file.o I can load it with 
add-symbol-file file.o without any complaint, but the access to the 
memory region the jitted code occupies is not allowed, disassemble 
doesn't work ...
I'll take a look and see if I can figure it out. I remember gdb being 
uncooperative about disassembling, frustratingly, and I've moved to the 
habit of using something like x/300i jit_code as a workaround. 
Clearly it can access the memory region, so it seems like a gdb bug.

See attached the patch, plus the new asm.s file.
JEff
config/gen/platform/darwin/asm.s:




asm.s
Description: application/text


ppc-jit-preserve-fp.patch
Description: application/text


Re: [PATCH] Re: JIT and platforms warning

2004-10-23 Thread Jeff Clites
On Oct 23, 2004, at 4:20 AM, Leopold Toetsch wrote:
Jeff Clites [EMAIL PROTECTED] wrote:
See attached the patch, plus the new asm.s file.
Doesn't run, segfaults on even mops.pasm - please check.
I can't reproduce that here; parrot -j works for me with 
examples/{benchmarks,assembly}/mops.pasm, and all 'make testj' tests 
pass. (They were passing before, and I updated to pick up changes since 
I sent the patch, and still all passes.) I also tried building against 
a system ICU (was building against the parrot-supplied version), in 
case there were issues with calling into shared lib code (long shot), 
and no difference.

Below is myconfig--I'm not doing any special configuration (just 
running 'perl Configure.pl', and not building optimized). Do you have 
any uncommitted local changes which might be involved? I'm up-to-date 
from CVS, and have no uncommitted changes except for those in the 
patch.

JEff
Summary of my parrot 0.1.1 configuration:
  configdate='Sat Oct 23 13:06:30 2004'
  Platform:
osname=darwin, archname=darwin
jitcapable=1, jitarchname=ppc-darwin,
jitosname=DARWIN, jitcpuarch=ppc
execcapable=1
perl=perl
  Compiler:
cc='cc', ccflags='-g -pipe -pipe -fno-common -no-cpp-precomp 
-DHAS_TELLDIR_PROTOTYPE  -pipe -fno-common -Wno-long-double  
-I/sw/include',
  Linker and Libraries:
ld='c++', ldflags='  -flat_namespace  -L/sw/lib',
cc_ldflags='',
libs='-lm -lgmp'
  Dynamic Linking:
share_ext='.dylib', ld_share_flags='-dynamiclib',
load_ext='.bundle', ld_load_flags='-bundle -undefined suppress'
  Types:
iv=long, intvalsize=4, intsize=4, opcode_t=long, opcode_t_size=4,
ptrsize=4, ptr_alignment=1 byteorder=4321,
nv=double, numvalsize=8, doublesize=8

% gcc -v
Reading specs from /usr/libexec/gcc/darwin/ppc/3.3/specs
Thread model: posix
gcc version 3.3 20030304 (Apple Computer, Inc. build 1495)


Re: [PATCH] Re: JIT and platforms warning

2004-10-23 Thread Jeff Clites
On Oct 23, 2004, at 3:42 AM, Leopold Toetsch wrote:
Jeff Clites [EMAIL PROTECTED] wrote:
Yep, that was the core of the issue. There's no free lunch--if we use
the nonvolatile registers, we need to preserve/restore them in
begin/end, but if we use the volatile registers, we need to preserve
them across function calls (incl. normal op calls).
Good point and JIT/i386 does it wrong in core.ops. But normal 
non-JITted
code is not the problem - the framework preserves mapped registers or
better - it has to copy all mapped registers to Parrot registers so 
that C
code is able to see the actual values.
Ah, good. The problem I was seeing was caused by C<set_s_sc>, since my 
jit_emit_call_func was written with the assumption that we should be 
mapping only non-volatile registers (even though we were actually 
mapping volatiles, but at the end of the list, and even though we 
weren't yet preserving the appropriate float registers). But a recent 
check-in had us mapping the volatile float registers only, which 
exposed the problem.

... So I added code to
do the appropriate save/restore, and use the non-volatile registers 
for
mapping--that should be less asm than what we'd have to do to use the
volatile registers. (The surprising thing was that we only got 2
failures when using the volatile registers--I'll look into creating
some tests that would detect problems with register preservation.)
Well, allocation strategy on PPC (or I386) is to first use the
non-volatile registers. PPC has 14 usable registers. To provoke a
failure you'd need e.g. 15 different I-registers and then a JITted
function call like C<set_s_sc>.
We were allocating the volatile float registers first (or, only)--so 
C<set_s_sc> was blowing away an N-register, even with only one in use. 
That's why I was surprised there weren't more failures.

Anyway, I think we need a more general solution.
...
Solution: The JIT compiler/optimizer already calculates the register
mapping. We have to use this information for JIT pro- and epilogs.
That makes sense--I hadn't initially realized we were tracking this, 
but since we are, we should use it. Things may be a bit tricky 
fixup-wise, since the size of the prolog will depend on how many float 
registers we need to preserve. (We can save a variable number of int 
registers with just a single instruction in the PPC case, as long as we 
are using registers in order, or we don't mind saving a few extra if 
we are not.)

The allocation strategy should be adjusted and depend on code size and
register usage:
- big chunks of compiled code like whole modules should use the
  non-volatile registers first and then (if needed) the volatile
  registers.
- small chunks of code should use the volatile registers to reduce
  function startup and end sizes.
Yep, makes sense. And there may be a case for not mapping at all for 
very small chunks, but I'd have to experiment to know if that's be a 
win.

2) JITed functions with calls into Parrot
The jit2h JIT compiler needs a hint, that we call external code, e.g.
   CALL_FUNCTION(string_copy)
This notion is needed anyway for the EXEC core, which has to generate
fixup code for the executable. So with that information available, the
JIT compiler can surround such code with:
   PRESERVE_REGS();
   ...
   RESTORE_REGS();
The framework can now, depending on the register mapping, save and restore
volatile registers if needed.
Sounds good.
The other tricky part was that saving/restoring the FP registers is 
one
instruction per saved register, so saving all 18 was exceeding the asm
size we allocate in src/jit.c (in some cases), since we emit 
Parrot_end
for all restart ops.
Yep, that's suboptimal. I've done that on i386 because it was just 
easy.
But you are right, the Parrot_end() code should really be there only
once.
Yep, instead of emitting it multiple times and conditionally jumping 
over it, we can emit it just once and conditionally jump to it.

... , frustratingly, and I've moved to the
habit of using something like x/300i jit_code as a workaround.
Ah, yes - forgot that.
At least stepping does the right thing now, with your fix for the stabs 
file. Very nice to be able to do p I0 and such.

One other question: Your P_ARITH optimization in jit/ppc/core.jit. I 
can't come up with a case where this kicks in, since in the tests I've 
tried, prev_op is always NULL when JITting if_i_ic or unless_i_ic. If I 
set things to not map any int registers, then I don't hit the cases in 
jit_emit.h which set prev_op to 0, but build_asm is doing it--I can't 
come up with a small test case which isn't being treated as multiple 
sections, it seems. There's something there w.r.t. sections which I 
don't understand, but anyway I just wanted to know how to set things up 
so that I can see your optimization in action.

Thanks,
JEff


Re: A small Perl task

2004-10-23 Thread Jeff Clites
On Oct 23, 2004, at 5:14 AM, Leopold Toetsch wrote:
First, if you don't have it yet done, install ccache.
Thanks for the tip--seems awesome.
HOW IT WORKS
   The basic idea is to detect when you are compiling exactly the same
   code a 2nd time and use the previously compiled output. You detect
   that it is the same code by forming a hash of:
...
So you get really fast recompiles *except* for classes/*.c. What I'd 
like to have is (based on ccache's philosophy) a cache for 
classes/*.dump and classes/*.c files. Currently they are recreated 
permanently.
I started playing with this, got confused, and finally figured out why 
I'm confused.

In the dump case, you basically (it seems) need to parse the pmc file 
in order to determine the parent (chain), in order to determine the 
dependencies, in order to figure out what you need to put into your 
digest. (In the ccache case, it's the preprocessor doing this for you.) 
But by then, you've done most of the work I'd think (by parsing the pmc 
file). So at that point, you'd end up calculating a digest over a bunch 
of files, to avoid the Dumper overhead, and the latter may likely be 
faster. So this might not be a win for us, since it doesn't look like 
it would let us skip much.

I think this will only be worth doing if we have a faster way to 
determine the dependencies (basically the parents of the pmc in 
question). The Makefile implicitly knows this, so it might be possible 
for it to pass this into the script.

But I have the caching and use-the-best-digest-module logic worked out, 
I'm just stuck on figuring out what work it can skip. Let me know if 
you have any ideas, or if I'm missing something.

I guess maybe the *.{c,h} case is simpler, since there you only depend 
on your *.dump files (plus the script itself, etc.), though there's 
more work to figure out what files you're trying to generate (in the 
library case, etc.).

JEff


Re: JIT and platforms warning

2004-10-22 Thread Jeff Clites
On Oct 22, 2004, at 1:01 AM, Leopold Toetsch wrote:
[JIT changes]
I just finished tracking down the source of a couple of JIT test 
failures on PPC--due to recent changes but only indirectly related, and 
pointing out things which needed fixing anyway (float register 
preservation issues). I'll send it in tomorrow after I've had a chance 
to clean it up and add some comments.

JEff


Re: Register stacks, return continuations, and speeding up calling

2004-10-21 Thread Jeff Clites
On Oct 20, 2004, at 12:09 PM, Leopold Toetsch wrote:
Dan Sugalski wrote:
'Kay, now I'm confused. I thought we were talking about removing the 
registers from out of the interpreter structure, which'd leave us 
needing two pointers, one for the interpreter struct and one for the 
registers.
Ok, short summary of future layout of JIT regs:
itemPPC   i386

interpreter r13   -16(%ebp)
frame pointer   r16%ebx
Register addressing is done relative to the frame pointer, which will 
be in a register. The interpreter isn't used that often inside integer 
JIT code, so it isn't in a register on i386 but is easily reloaded 
into one.

Currently the frame pointer and the interpreter are the same.
Just to clarify: This is the approach wherein each frame gets a fresh 
set of registers, and function call and return (or continuation 
invocation) copy the relevant registers between the register sets? And 
this isn't quite the scheme from the towards a new call scheme 
thread, in which we'd be duplicating the interpreter context for each 
frame, right? (And the latter was what you did in Proof of concept - 
hack_42 (was: the whole and everything), right?)

Just trying to sort out all of the ideas.
JEff


Re: Pathological Register Allocation Test Generator

2004-10-21 Thread Jeff Clites
On Oct 20, 2004, at 11:24 PM, Leopold Toetsch wrote:
Bill Coffman [EMAIL PROTECTED] wrote:

And of course, lexicals and globals already have a storage, you don't
need to spill them.
I'm not sure that's true. If there's no 'eval' in scope, lexicals don't 
have to live in pads--they could purely exist in registers. And with 
tied namespaces and such, it may not be legitimate to re-fetch a global 
(ie, to fetch it multiple times, if the code appears to only fetch it 
once) -- one could pathologically have a global whose value appears to 
increase each time it's fetched, for instance, or you could end up with 
multiple round-trips to a database.

... Another thing that might be worth
checking, after parrot gets out of alpha, is if reducing or
increasing the number of registers will help performance.  Just a
thought.
The 4 x 32 is pretty good. It matches recent hardware too. But if a 
good
register algorithm shows that 4 x 16 is enough, we can of course
decrease the register number. Increasing shouldn't be necessary.
Matching hardware is probably not too significant--even though the PPC 
has 32 int registers, we can't map to all of them in JIT (some are 
dedicated to holding the interpreter pointer, etc.), and we'd really 
need 3 x 32 hardware int registers to accommodate all we'd like (I, S, 
and P registers). So even currently it's a loose match.

JEff


Re: Pathological Register Allocation Test Generator

2004-10-21 Thread Jeff Clites
On Oct 21, 2004, at 4:13 AM, Leopold Toetsch wrote:
Jeff Clites [EMAIL PROTECTED] wrote:
On Oct 20, 2004, at 11:24 PM, Leopold Toetsch wrote:

And of course, lexicals and globals already have a storage, you don't
need to spill them.

I'm not sure that's true.
It should read: if there are lexical or global opcodes, lexicals and
globals have a storage.
Ah yes, true.
tied namespaces and such, it may not be legitimate to re-fetch a 
global
(ie, to fetch it multiple times, if the code appears to only fetch it
once) -- one could pathologically have a global whose value appears to
increase each time it's fetched,
Then there's something horribly wrong with that usage of tie: it's not the
value that is refetched from the var - the var is refetched from the
namespace.
I think there'll be two types of tie--tied variables (like Perl has 
already), and tied namespaces (as supposedly some people really need, 
though I don't fully know why). But even without the above pathological 
case: with tied namespaces, a namespace fetch potentially has unknown 
overhead, and a compiler can't know if re-fetching is better than 
spilling. But on the other hand, maybe that's just part of the deal 
with tied namespaces--they may be fetched from more often than the code 
would imply, so the tied namespace needs to be prepared for that.

JEff


Re: Register stacks, return continuations, and speeding up calling

2004-10-21 Thread Jeff Clites
On Oct 21, 2004, at 8:34 AM, Leopold Toetsch wrote:
Dan Sugalski wrote:
In that case I won't worry about it, and I think I know what I'd like 
to do with the interpreter, the register frame, and the register 
backing stack. I'll muddle it about some and see where it goes.
JIT/i386 is up to date now that is: it doesn't do any absolute 
register addressing anymore.

So what next:
* deprecate the usage of almost all register stack push and pops?
I think we don't need them anymore. Register preserving is done as 
part of the call sequence.
Are we still planning to move the current return continuation and 
current sub, out of the registers and into their own spots in the 
interpreter context (in order to avoid the PMC creation overhead in the 
common case, etc.)? (Or, have we already done this?)

JEff


Re: C89

2004-10-21 Thread Jeff Clites
On Oct 21, 2004, at 11:51 AM, Dan Sugalski wrote:
At 11:25 AM -0700 10/21/04, Bill Coffman wrote:
I read somewhere that the requirement for parrot code is that it
should be compliant with the ANSI C'89 standard.  Can someone point me
to a description of the C89 spec, so I can make sure my reg_alloc.c
patch is C89 compliant?
I don't think the ANSI C89 spec is freely available, though I may be 
wrong. (Google didn't find it easily, but I don't always get along 
well with Google) If the patch builds without warning with parrot's 
standard switches then you should be OK. (ANSI C89 was the first big 
rev of C after the original K&R C. If you've got the second edition or 
later of the K&R C book, it uses the C89 spec)
Also, if you're compiling with gcc, then you can pass -std=c89 to the 
compiler to enforce that particular standard. (Apparently--though I 
haven't tried it.) I believe -ansi does the same thing.

JEff


Re: [perl #32036] [BUG] t/pmc/signal.t fails

2004-10-19 Thread Jeff Clites
On Oct 19, 2004, at 12:42 AM, Leopold Toetsch wrote:
Will Coleda [EMAIL PROTECTED] wrote:
t/pmc/signal...Hangup
I saw that once too: looks like the test script got the signal.
That's what my patch from last week was supposed to fix--I'm surprised 
it's still happening. We should currently be making sure not to kill 
the harness.

(However, if there's another copy of parrot running--for instance, 
stopped in a debugger somewhere--then that may get the signal 
erroneously.)

JEff


Re: [Proposal] JIT, exec core, threads, and architectures

2004-10-19 Thread Jeff Clites
On Oct 19, 2004, at 1:56 AM, Leopold Toetsch wrote:
Jeff Clites [EMAIL PROTECTED] wrote:
On Oct 17, 2004, at 3:18 AM, Leopold Toetsch wrote:

Nevertheless we have to create managed objects (a Packfile PMC) so
that we can recycle unused eval-segments.

True, and some eval-segments are done as soon as they run (eval 3 +
4), whereas others may result in code which needs to stay around 
(eval
sub {}), and even in the latter case not _all_ of the code
generated in the eval would need to stay around. It seems that it may
be hard to determine what can be recycled, and when.
Well, not really. As long as you have a reference to the code piece,
it's alive.
Yes, that's what I meant. In the case of:
$sum = eval "3 + 4";
you don't have any such reference. In the case of:
$sub = eval "sub { return 7 }";
you do. In the case of:
$sub = eval "3 + 4; sub { return 7 }";
you've got a reference to the sub still, but the 3 + 4 code is no 
longer reachable, so wouldn't need to stay around.

But it's possible that Parrot won't be able to tell the difference, and 
will have to keep around more than is necessary.

And we have to protect the packfile dictionary with mutexes, when 
this
managing structure changes i.e. when new segments gets chained into
this list or when they get destroyed.

Yes, though it's not clear to me if all eval-segments will need to go
into a globally-accessible dictionary. (e.g., it seems the 3 + 4 
case
above would not.)
It probably depends on the generated code. If this code creates globals
(e.g. Sub PMCs) it ought to stay around.
Yes, that's what I meant by not all--some yes, some no.
JEff


Re: [Proposal] JIT, exec core, threads, and architectures

2004-10-17 Thread Jeff Clites
On Oct 17, 2004, at 3:18 AM, Leopold Toetsch wrote:
Jeff Clites wrote:
On Oct 16, 2004, at 4:47 AM, Leopold Toetsch wrote:
Nevertheless we have to create managed objects (a Packfile PMC) so 
that we can recycle unused eval-segments.
True, and some eval-segments are done as soon as they run (eval 3 + 
4), whereas others may result in code which needs to stay around (eval 
sub {}), and even in the latter case not _all_ of the code 
generated in the eval would need to stay around. It seems that it may 
be hard to determine what can be recycled, and when.

And we have to protect the packfile dictionary with mutexes, when this 
managing structure changes i.e. when new segments gets chained into 
this list or when they get destroyed.
Yes, though it's not clear to me if all eval-segments will need to go 
into a globally-accessible dictionary. (e.g., it seems the 3 + 4 case 
above would not.)

OTOH it might be better to just toss all the constant table access 
in all instructions, except:

   set_n_nc
   set_s_sc
   set_p_pc   # alias set_p_kc
This would reduce the interpreter size significantly (compare the 
size of core_ops_cgp.o with core_ops_cg.o).
Reducing the size is good, but this doesn't overall reduce the number 
of accesses to the constant table, just changes which op is doing 
them.
Not quite, e.g:
   set P0["foo"], "bar"
   set S0, P0["foo"]
are 3 accesses to the constant table. It would be
   set S1, "foo"
   set S2, "bar"
   set P0[S1], S2
   set S0, P0[S1]
with 2 accesses as long as there is no pressure on the register 
allocator.
Sure, but you can do this optimization today--narrowing it down to just 
those 3 ops isn't required. But it only helps if local re-use of the 
same constants is frequent, and it may not be. (But still, it's a good 
optimization for a compiler to implement--it just may not have a huge 
effect.)

Also, there's some subtlety. This:
set S1, "foo"
set S2, "foo"
isn't the same as:
set S1, "foo"
set S2, S1
but rather:
set S1, "foo"
clone S2, S1
since 'set' copies in the s_sc case. (That's not a problem, just 
something to keep in mind.)

The assembler could still allow all constant variations of opcodes 
and just translate it.
For this we'd need a special register to hold the loaded constant, so 
that we don't overwrite a register which is in use.
No, just the registers we have anyway,
For PIR yes, but the PASM assembler can't know for sure what register 
would be safe to use--the code could be using its own obscure calling 
conventions.

JEff


Re: [Proposal] JIT, exec core, threads, and architectures

2004-10-16 Thread Jeff Clites
On Oct 16, 2004, at 12:26 AM, Leopold Toetsch wrote:
Jeff Clites [EMAIL PROTECTED] wrote:
... But, we use this currently, because
there is one issue with threads: With a thread, you don't start from
the beginning of the JITted code segment,
This isn't a threading issue. We can always start execution in the
middle of one segment, e.g. after an exception. That's already handled
on almost all JIT platforms and no problem. The code emitted in
Parrot_jit_begin gets the C<cur_opcode *> as argument and has to branch
there, always.
I was remembering wrong--we do this on PPC too.
On Oct 16, 2004, at 4:47 AM, Leopold Toetsch wrote:
String, number (and PMC) constants are all addressed in terms of the 
compiling interpreter.
...
When we do an eval() e.g. in a loop, we have to create a new constant 
table (and recycle it later, which is a different problem). Running 
such a compiled piece of code with different threads would currently 
do the wrong thing.
The correct constant table depends on the code segment, rather than the 
specific interpreter, right? That means that referencing the absolute 
address of the const table entry would be correct for JIT code no 
matter the executing thread, but getting the const table from the 
compiling interpreter is wrong if that interpreter isn't holding a 
reference to the corresponding code segment.

Access to constants in the constant table is not only a problem for 
the JIT runcore, it's a lengthy operation for all code. For a string 
constant at PC[i]:

   interpreter->code->const_table->constants[PC[i]]->u.string
These are 3 indirections to get at the constants pointer array, and 
worse, they depend on each other; emitting these 3 instructions on an 
i386 stalls for 1 cycle twice (but the compiler is clever and 
interleaves other instructions).

For the JIT core, we can precalculate the location of the constants 
array and store it in the stack or even in a register (on not so 
register-crippled machines like i386). It only needs reloading when 
an C<invoke> statement is emitted.
For PPC JIT, it seems that we are putting in the address of the 
specific const table entry, as an immediate.
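
As a rough analogy of that hoisting, in Python rather than in emitted 
JIT code (the attribute names below just mirror the C expression 
quoted above; the little Box class is only there so the sketch runs):

class Box(object):
    def __init__(self, **kw):
        self.__dict__.update(kw)

interp = Box(code=Box(const_table=Box(constants=[Box(u=Box(string="hi"))])))

def fetch_strings(interpreter, pcs):
    consts = interpreter.code.const_table.constants   # the 3 indirections, done once
    return [consts[pc].u.string for pc in pcs]        # each access is now one index

print(fetch_strings(interp, [0, 0]))                  # ['hi', 'hi']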

OTOH it might be better to just toss all the constant table access in 
all instructions, except:

   set_n_nc
   set_s_sc
   set_p_pc   # alias set_p_kc
This would reduce the interpreter size significantly (compare the size 
of core_ops_cgp.o with core_ops_cg.o).
Reducing the size is good, but this doesn't overall reduce the number 
of accesses to the constant table, just changes which op is doing them.

The assembler could still allow all constant variations of opcodes and 
just translate it.
For this we'd need a special register to hold the loaded constant, so 
that we don't overwrite a register which is in use.

JEff


Re: [Proposal] JIT, exec core, threads, and architectures

2004-10-15 Thread Jeff Clites
On Oct 14, 2004, at 12:10 PM, Leopold Toetsch wrote:
Proposal:
* we mandate that JIT code uses interpreter-relative addressing
- because almost all platforms do it
- because some platforms just can't do anything else
- and of course to avoid re-JITting for every new thread
FYI, the PPC JIT already does Parrot register addressing relative to 
the interpreter pointer, which as you said is already in a CPU 
register. This is actually fewer instructions than using absolute 
addressing would require (one rather than three).

We do still re-JIT for each thread on PPC, though we wouldn't have to 
(just never changed it to not). But, we use this currently, because 
there is one issue with threads: With a thread, you don't start from 
the beginning of the JITted code segment, but rather you need to 
start with a specific Parrot function call, somewhere in the middle. 
But you can't just jump to that instruction, because it would not have 
the setup code needed when entering the JITted section. So currently, 
we use a technique whereby the beginning of the JITted section has, 
right after the setup code, a jump to the correct starting address--in 
the main thread case, this is just a jump to the next instruction 
(essentially a noop), but in the thread case, it's a jump to the 
function which the thread is going to run. So right now the JITted code 
for a secondary thread differs by one instruction from that for the 
main thread. We'll need to work out a different mechanism for handling 
this--probably just a tiny separate JITted section to set things up for 
a secondary thread, before doing an inter-section jump to the right 
place.
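
A toy model of that trick, in ordinary Python rather than emitted 
machine code (all names here are invented): every entry runs the 
setup, then branches to whatever start offset the entering thread 
asked for.

def make_segment(ops):
    def enter(start_offset=0):
        frame = {"regs": [0, 0, 0, 0]}   # stand-in for the per-entry setup code
        pc = start_offset                # the jump emitted right after the setup
        while 0 <= pc < len(ops):
            pc = ops[pc](frame, pc)
        return frame
    return enter

def op_main(frame, pc):                  # pretend: first op of the main program
    frame["regs"][0] = 1
    return pc + 1

def op_thread_func(frame, pc):           # pretend: the function a new thread runs
    frame["regs"][1] = 2
    return -1                            # halt

segment = make_segment([op_main, op_thread_func])
segment(0)   # main thread: the jump just falls through to the next instruction
segment(1)   # secondary thread: same setup, then jump straight to its function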

JEff


[PATCH] Re: [perl #31978] [BUG] dynclasses broken

2004-10-14 Thread Jeff Clites
On Oct 13, 2004, at 6:36 PM, Will Coleda (via RT) wrote:
# New Ticket Created by  Will Coleda
# Please include the string:  [perl #31978]
# in the subject line of all future correspondence about this issue.
# URL: http://rt.perl.org:80/rt3/Ticket/Display.html?id=31978 
One of the recent changes has broken:
cd dynclasses  make
Which did, in fact, work for all of one checkout for me. =-)
This is on a fresh checkout (not update), configure, make, cd 
dynclasses and...
...
/Users/coke/research/parrot2/include/parrot/interpreter.h:61: error: 
conflicting
   types for `typedef struct Parrot_Interp*Parrot_Interp'
/Users/coke/research/parrot2/include/parrot/platform_interface.h:124: 
error: previous
   declaration as `struct Parrot_Interp'
The below patch should fix this.
The problem is that ${ld} for Mac OS X became 'c++' because that's 
needed to link libparrot.dylib since ICU contains C++ code. But, 
dynclasses/build.pl tried to take a shortcut, and compile and link in 
one step. This caused a problem, because the c++ compiler was choking 
on an odd typedef of ours. So we need to build with just a C compiler. 
(We could probably link with one in this case too, but we don't have 
separate configs for linking the stuff in dynclasses, v. everything 
else.)

So the fix here is just to compile and link as separate steps, always, 
rather than only when we need to combine multiple .o files.

At the same time, I'm not sure why we need this construct in a header:
struct Parrot_Interp;
typedef struct Parrot_Interp *Parrot_Interp;
I'm not surprised that chokes a C++ compiler, but I don't know why a C 
compiler tolerates it either. Not sure why this is necessary--I wonder 
if we could do something a bit less tricky?

JEff
Index: config/gen/makefiles/dynclasses_pl.in
===
RCS file: /cvs/public/parrot/config/gen/makefiles/dynclasses_pl.in,v
retrieving revision 1.2
diff -u -b -r1.2 dynclasses_pl.in
--- config/gen/makefiles/dynclasses_pl.in   8 Oct 2004 07:08:31 
-  1.2
+++ config/gen/makefiles/dynclasses_pl.in   14 Oct 2004 06:08:27 
-
@@ -13,14 +13,6 @@
 our $PMC2C = $PERL ${build_dir}${slash}classes${slash}pmc2c2.pl;

 # Actual commands
-sub compile_loadable_cmd {
-my ($target, $source) = @_;
-$LD $CFLAGS $LDFLAGS $LD_LOAD_FLAGS  .
-${cc_o_out} . $target .   .
--I${build_dir}${slash}include -I${build_dir}${slash}classes  .
-$source;
-};
-
 sub compile_cmd {
 my ($target, $source) = @_;
 $CC $CFLAGS  .
@@ -58,14 +50,8 @@
 } elsif ($mode eq 'compile') {
 my ($group_files, $pmc_group) = gather_groups(@pmcs);
-my @grouped_pmcs = grep { exists $pmc_group->{$_} } @pmcs;
-my @ungrouped_pmcs = grep { ! exists $pmc_group->{$_} } @pmcs;
-
-# Convert X.c -> X.so for all non-grouped X.c
-compile_loadable($_) foreach (@ungrouped_pmcs);
-
-# Convert X.c -> X.o for all grouped X.c
-compile($_) foreach (@grouped_pmcs);
+# Convert X.c -> X.o for all X.c
+compile($_) foreach (@pmcs);
 # lib-GROUP.c
 for my $group (keys %$group_files) {
@@ -80,6 +66,11 @@
 partial_link($group, lib-$group, @$pmcs)
   or die partial link of $group failed ($?)\n;
 }
+
+# Link non-grouped PMCs individually
+my @ungrouped_pmcs = grep { ! exists $pmc_group->{$_} } @pmcs;
+partial_link($_, $_) foreach (@ungrouped_pmcs);
+
 } elsif ($mode eq 'copy') {
 # Copy *.so - destination, where destination is the first
 # argument, given as --destination=DIRECTORY
@@ -166,17 +157,6 @@
 }
 }
-sub compile_loadable {
-my ($src_stem, $dest_stem) = @_;
-$dest_stem ||= $src_stem;
-if (needs_build("$dest_stem$LOAD_EXT", "$src_stem.c")) {
-run(compile_loadable_cmd("$dest_stem$LOAD_EXT", "$src_stem.c"))
-  or die "compile $src_stem.c failed ($?)\n";
-} else {
-print "$dest_stem$LOAD_EXT is up to date\n";
-}
-}
-
 sub partial_link {
 my ($group, @stems) = @_;
 my @sources = map { "$_$O" } @stems;


Re: dynamically loadable modules

2004-10-08 Thread Jeff Clites
On Oct 8, 2004, at 7:54 AM, Andy Dougherty wrote:
On Thu, 7 Oct 2004, Steve Fink wrote:
So what I need is names for these. At the moment, I'm mostly using 
$(SO)
for shared lib extensions, $(DYNMOD) for d-l-modules. The build flags I
generally call $(LD_SHARED) or something with "shared" for shared libs,
and something like $(LD_DYNMOD_FLAGS) for d-l-modules.

Clearly, I'm not very experienced with dealing with these things 
across
platforms, so I was hoping somebody (Andy?) might have a better sense
for what these things are called.
Sorry -- offhand I don't have any sense of any standard names
I don't think it's common to have a split like the dylib v. bundle 
split on Mac OS X, so there's probably not a common convention.

JEff


Re: ICU causing make to fail w/o --prefix=`pwd`

2004-10-08 Thread Jeff Clites
On Oct 8, 2004, at 9:03 AM, Steve Fink wrote:
If I just do
  perl Configure.pl
  make
right now, it builds the parrot executable ok but then fails when it
tries to compile the library .imc files. It's looking for the icu data
dir in $(prefix)/blib/lib/2.6.1. It works if I do
  perl Configure.pl --prefix=$(pwd)
  make
or set PARROT_ICU_DATA_DIR, but this seems like an unfriendly default
for developers.
I have a similar problem with the search path for loadable modules.
I think I probably broke this, btw, when I repaired 'make install'. I
had previously bandaged over the problem by defaulting ${prefix} to the
top-level directory. But I'm not sure how to fix it.
It's probably the issue of what to use at build-time v. install-time; 
I'd expect that nothing should be looked for via ${prefix} at build 
time, and rather it would just be used at install time (and possibly on 
the link line when building the shared libparrot, to set the install 
name).

JEff


Re: ICU causing make to fail w/o --prefix=`pwd`

2004-10-08 Thread Jeff Clites
On Oct 8, 2004, at 9:24 AM, Jeff Clites wrote:
On Oct 8, 2004, at 9:03 AM, Steve Fink wrote:
If I just do
  perl Configure.pl
  make
right now, it builds the parrot executable ok but then fails when it
tries to compile the library .imc files. It's looking for the icu data
dir in $(prefix)/blib/lib/2.6.1. It works if I do
  perl Configure.pl --prefix=$(pwd)
  make
or set PARROT_ICU_DATA_DIR, but this seems like an unfriendly default
for developers.
I have a similar problem with the search path for loadable modules.
I think I probably broke this, btw, when I repaired 'make install'. I
had previously bandaged over the problem by defaulting ${prefix} to 
the
top-level directory. But I'm not sure how to fix it.
It's probably the issue of what to use at build-time v. install-time; 
I'd expect that nothing should be looked for via ${prefix} at build 
time, and rather it would just be used at install time (and possibly 
on the link line when building the shared libparrot, to set the 
install name).
Ignore what I said. I misread that as failure when building the ICU 
data files. Sorry about that.

JEff


Re: Python bound and unbound methods

2004-10-07 Thread Jeff Clites
On Oct 6, 2004, at 11:49 PM, Leopold Toetsch wrote:
Jeff Clites [EMAIL PROTECTED] wrote:
3) I won't mention the problem of languages which allow an object to
have instance variables and instance methods of the same name (so that
in Python, a.b would be ambiguous if a is an object from such a
language).
Well, Python has that very problem. By dynamically defining an instance
variable, a method with that same name becomes inaccessible.
Well, for Python itself that's not a problem per se, it's a fundamental 
part of the design--there's only one slot for a given name, and it can 
hold a sub or something else, but it's just one slot. (You don't really 
have named methods, you just have named slots that may or may not hold 
subs.) That's just the way Python objects work. But for objects coming 
over from another language, you might really have separate slots for 
subs v. data, but if these are accessed from Python code, you don't 
have a way to specify which one you mean. It's very similar to the 
namespace problem with language crossing.
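
A minimal illustration of that Python behavior (plain Python, nothing 
Parrot-specific): one slot per name, so dynamically assigning an 
instance attribute shadows a method of the same name.

class A(object):
    def b(self):
        return "method"

a = A()
print(a.b())            # "method"
a.b = lambda: "data"    # dynamically define an attribute slot named b
print(a.b())            # "data" -- the class's method b is no longer reachable as a.b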

JEff


Python bound and unbound methods

2004-10-06 Thread Jeff Clites
Python's method call semantics allows you to look up a method of an 
object as an attribute, store it in a variable, and later invoke it 
like a regular function. This works similarly if you do the lookup on 
the class object, but when you invoke the function you need to pass 
in an instance as an argument. Consider:

class printWrapper:
    def __init__(self, message):
        self.message = message
    def printMe(self):
        print self.message

x = printWrapper("foo")
x.printMe()    # prints "foo"
a = x.printMe
a()            # bound method call; prints "foo"
b = printWrapper.printMe
b(x)           # unbound method call; prints "foo"
Do we have plans on how we might implement this via Parrot?
To put it another way, the expression foo.bar() in Python doesn't 
really parse as "invoke method bar on object foo", but rather as 
"look up attribute bar on object foo, and call the result as a 
function". That's an incredibly flexible and compact semantics, and 
lets you do stuff like this:

class omniWrapper:
    def __init__(self, message):
        self.message = message
    def printMe(self):
        print self.message
    def __getattr__(self, attrname):
        return self.printMe

z = omniWrapper("hello")
z.printMe()         # prints "hello"
z.igloo()           # also prints "hello"
z.anythingAtAll()   # anything prints "hello"
That is, you can appear to be calling methods that aren't there at all. 
It seems odd, but makes perfect sense in light of the above 
description--method calls in Python are not what you might think, based 
on how other languages act. But that would seem to imply that Python 
would need a very different method call infrastructure than Perl -- 
a.b() in Python means something very different than $a.b() in Perl. 
(That of course brings up questions of how things act when Python tries 
to call a method on a Perl object--syntax, semantics.)

JEff


Re: Python bound and unbound methods

2004-10-06 Thread Jeff Clites
On Oct 6, 2004, at 1:11 AM, Leopold Toetsch wrote:
Jeff Clites [EMAIL PROTECTED] wrote:
To put it another way, the expression foo.bar() in Python doesn't
really parse as "invoke method bar on object foo", but rather as
"look up attribute bar on object foo"
Well, there isn't much difference here. invoke method bar implies a
method lookup that normally checks the method cache for existing
(statically built) methods. If a method isn't found there, the dynamic
plan B comes in. And if Python code overrides a method (attribute slot)
the method cache has to be invalidated.
A few things though:
1) In my case of a = x.printMe, we need to be prepared to return a 
function created by currying a method with that particular instance, 
and for b = printWrapper.printMe we need to return a function which 
does type checking on its first argument (then calls the relevant 
method on it)--certainly doable, just odd. (A rough sketch of this 
follows item 3 below.)

2) I'd expect the method cache to be per-class, but Python can change 
an attribute slot on a per-instance basis (as well as a per-class 
basis), so we can't really use a per-class method cache (or, we need a 
flag on particular instances which tell us not to use it for them).

3) I won't mention the problem of languages which allow an object to 
have instance variables and instance methods of the same name (so that 
in Python, a.b would be ambiguous if a is an object from such a 
language).
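
Here's that rough sketch, in plain Python rather than any proposed 
Parrot API (the helper and class names are invented): a = x.printMe 
hands back the method curried with that instance, and 
b = printWrapper.printMe hands back a wrapper that type-checks its 
first argument.

def make_bound(instance, method):
    def bound(*args):
        return method(instance, *args)       # curry the instance in
    return bound

def make_unbound(cls, method):
    def unbound(first, *args):
        if not isinstance(first, cls):       # the type check mentioned in (1)
            raise TypeError("unbound method needs a %s instance" % cls.__name__)
        return method(first, *args)
    return unbound

class Greeter(object):                       # made-up class, just for the demo
    def hello(self):
        return "hi"

g = Greeter()
a = make_bound(g, Greeter.__dict__["hello"])           # like: a = x.printMe
b = make_unbound(Greeter, Greeter.__dict__["hello"])   # like: b = printWrapper.printMe
print(a())
print(b(g))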

More fun is probably to get the same inheritance (mro) as Python.
Yes, true, and I guess related--Python must first look for things 
overridden on the particular instance, I think, even before looking at 
its class.

JEff


Re: Namespaces, part 2

2004-10-05 Thread Jeff Clites
On Oct 4, 2004, at 8:25 AM, Dan Sugalski wrote:
Okay, since we've got the *basic* semantics down (unified namespace, 
namespace entries get a post-pended null character)
I'll ask again, what about subs? Do they get name-mangled too?
   $Px = find_global [key; key; key], 'name'
As Leo pointed out in a thread of the same name last year, this is a 
new syntax--keyed access on nothing. I assume you mean for [key; key; 
key] to serve as a sort of literal syntax for an array of strings? If 
so, we should make the syntax for that sort of thing explicit. Or, we 
can do what Leo suggested (in a thread of the same name last year, and 
more recently), and write this as keyed access on a particular 
namespace (typically, the root namespace):

$P0 = root_namespace
$P1 = find_global $P0['Foo'; 'Bar'; 'Baz'], 'xyzzy'
The case where the namespace is specified by string I think should go 
away.
I assume by "specified by string", you mean a string such as 
"Foo::Bar"? If so, I agree, and I'll note that it would be possible to 
write per-language utility functions to parse these apart--compilers 
will need to be able to process these, in some form, and so might 
programmers in general, but it doesn't need to be an op.

That is, if I have the namespace Foo, Foo::Bar, and Foo::Bar::Baz, and 
then have a thing named xyzzy in Foo::Bar::Baz, I can get it by doing:

   $P1 = find_global ['Foo'; 'Bar'; 'Baz'], 'xyzzy'
or
   $P0 = find_global ['Foo']
   $P1 = find_global $P0, ['Bar'; 'Baz']
   $P2 = find_global $P1, 'xyzzy'
...
That is, if we don't specify the name of the thing in the namespace we 
get the namespace PMC itself, which we can then go look things up in. 
This is handy since it means code can cache the namespace PMC so if 
there are a dozen variables to go fetch out, we don't have to do the 
multi-level hash lookup each time. (That'd be nasty)
Several things here:
1) By your name mangling scheme, it seems we don't need a special 
syntax for looking up a namespace--it's just something else in the 
namespace above it, so looking up Foo::Bar would be:

$P0 = root_namespace
$P1 = find_global $P0['Foo'], 'Bar'
But if everything's lumped together and name-mangled in a non-segmented 
namespace, then you don't need find_global, you just need this (as Leo 
suggested):

$P0 = root_namespace
$P1 = $P0['Foo'; 'Bar'] # P1 holds Foo::Bar
$P2 = $P1['Baz'; 'xyzzy']   # P2 holds xyzzy from Foo::Bar::Baz
$P2 = $P0['Foo'; 'Bar'; 'Baz'; 'xyzzy']  # same thing
I was formerly a proponent of keeping the final thing in a separate 
parameter, but with your current mangling-plus-flat-namespace proposal, 
I don't see a reason for it.

2) What's putting in the trailing null bytes (HLL programmer, compiler, 
or Parrot's implementation of the lookup)? I'd assume it would be the 
HLL compiler, so shouldn't the above be:

	$P2 = $P0['Foo\0'; 'Bar\0'; 'Baz\0'; '$xyzzy'] #or at least, shouldn't 
the PASM variant look like this

(In particular, I noticed you said, a thing named xyzzy, but didn't 
include a sigil?)

3) Canonical decomposition: Consider the following:
$P0 = root_namespace
$P2 = $P0['Foo\0'; 'Bar\0'; 'Baz\0'; '$xyzzy']
v.
$P0 = root_namespace
$P0 = $P0['Foo\0']
$P0 = $P0['Bar\0']
$P0 = $P0['Baz\0']
$P2 = $P0['$xyzzy']
(The example's the same even with your other syntax.)
These appear equivalent, _but_ with the former, the root namespace 
could take into consideration the whole list of keys, and decide what to 
return based on that information; with the latter, it only sees 
'Foo\0'. So in the general case (in light of namespace tie-ing), they 
could do different things. The upshot of this is that we need to make 
explicit the algorithm followed by a multi-keyed lookup--basically, 
iterative v. recursive, if you think it through. Recursive is more 
flexible and powerful (because any namespace along the way can short 
circuit the rest of the lookup), but means that HLL compilers must 
emit the first option above (with an iterative approach, where Parrot 
controls the algorithm and namespaces have less control, a compiler 
would have a choice).

4) Similar to (3), you can't do much caching of a namespace PMC, in 
light of tying and such. Consider:

$a = $Foo::Bar::bar;
somesub();
$b = $Foo::Bar::zoo;
A compiler can't safely optimize this to cache the lookup of the 
Foo::Bar namespace, because somesub() might perform tying of one of the 
relevant namespaces, or otherwise rearrange the hierarchy, and you'd 
get the wrong $zoo. Even without somesub(), in light of namespace tying 
it's possible that the lookup of $bar actually caused a namespace 
rearrangement, so you can't even cache across adjacent lookups (or a 
lookup and a store, such as $Foo::Bar::bar = $Foo::Bar::zoo).

5) Python. Language crossing issues aside, there are issues with Python 
itself. Python does not know, at compile-time, what's 

Re: Namespaces, part 2

2004-10-05 Thread Jeff Clites
On Oct 4, 2004, at 9:58 PM, Brent 'Dax' Royal-Gordon wrote:
Tim Bunce [EMAIL PROTECTED] wrote:
Now, with that out of the way, let's talk about overlaid namespaces.
I don't think I ever read a description of what the purpose of this 
was.
I get the what but not the why. Without the why it's hard to
critique the how.
Clearly I'm not Dan, but I think the idea here is that, for example,
in the following code:
module Foo::Bar {
class Baz { $quux }
}
You can have the current namespace actually be [ ::Foo::Bar::Baz,
::Foo::Bar, ::* ] (or, for the last one, whatever the namespace that
@*ARGS and friends are in is called), so that the search for $quux can
be done very easily.
This may have changed for Perl6, but at least for Perl5, non-lexicals 
are only ever looked for in the current package, the top-level (main) 
namespace, plus the pre-defined variables (don't know if these are 
handled at compile-time), according to the Camel. So having an 
arbitrary search list might not be used in the Perl case.

JEff


Re: Namespaces again

2004-10-04 Thread Jeff Clites
On Sep 29, 2004, at 9:01 PM, Brent 'Dax' Royal-Gordon wrote:
[Argh...]
Chip Salzenberg [EMAIL PROTECTED] wrote:
   parrot_alias(a, 'b', # dest: Python is unified, no 
need for a category here
a, 'b', 'scalar')   # src:  Perl is not unified, so 
source category is required

   parrot_alias(a, 'c',
a, 'c', 'array')# here's a different category, to 
get '@c'

or some such.  Yes it's ugly.  But if we can't fix a ditch, the least
we can do is put a big friendly warning sign on it.
Turns out the ugliness isn't needed anyway; Python already has a syntax 
to access a namespace entry by string name (via __dict__), and Python 
imports copy anyway, rather than alias. That is, this:

from os import WNOHANG
creates a WNOHANG variable in the local namespace with the value of 
os.WNOHANG, but it's a copy--changing one doesn't change the other. So, 
one could do this:

foo = Perl.Blah.__dict__["some-japanese-variable-name"]
when the variable name isn't something Python can accommodate directly, 
and get the same result.

Another way to do it would be to have each category actually be a
namespace.  In other words, Perl's namespaces are structured like
this:
File => package {
    ns => package {
        Find => package {
            sub => package {
                find => { ... }
            }
        },
        Path => package {
            sub => package {
                BUILD => { ... },
                DESTROY => { ... }
            }
        }
        ...
    },
    scalar => package {
        some_idiot => "put a variable in this package"
    }
}
And Python would access them like so:
File.ns.Path.sub.new()
Not perfect, certainly, but it would work, and be reasonably elegant.
Yep, this is exactly what I had in mind. Everything is hash-like 
anyway, so whether you call the category a namespace or just something 
namespace-like (since it's hash-like), doesn't matter so much, and on 
the Python side they're all just things which allow named attribute 
lookup.

(This does pose a problem going the other way, but I suspect Perl
could simply mark its own packages in some way, and fall back to a
simpler scheme, such as ignore the sigil, when it's munging another
language's namespaces.)
I'd think that, when importing a module, you'd need a concept of what 
structure of namespace you're dealing with (which maps roughly to what 
language was used to implement the module, though flat namespace 
languages might all act the same). As I touched on in another post, I 
think you'd need the concept of a wrapper--for instance, when Python 
code pulls in a Perl module, it would get a wrapper which acts 
Pythonish, and acts as a go-between for the real namespace. (Probably, 
this would be via namespace tie-ing.)

In this light, I think that going the other way is easy--all Python 
variables would look to Perl like scalars holding references (since 
this is the semantics that Python variables have). But see my recent 
post in the Namespaces, part 1 (new bits) thread for some caveats.

JEff


Re: Namespaces again

2004-10-04 Thread Jeff Clites
On Oct 1, 2004, at 5:45 AM, Leopold Toetsch wrote:
Jens Rieks [EMAIL PROTECTED] wrote:
On Friday 01 October 2004 08:42, Leopold Toetsch wrote:
sucks a lot less than making python programmers say
import Foo.ns.Bar.scalar.baz
But OTOH I can imagine that finally standard modules are present in 
Parrot that do the right thing.
E.g. in Python

  import os
would import a Parrot module that wraps these functions to Parrot. So
basically, when you have:
  os.setsid()  # Python
  POSIX::setsid(); # Perl
these two functions would be the same, living in some Parrot lib and
aliased to the language's equivalent.
So the import or use syntax shouldn't be changed just to accommodate
Parrot's namespace internals.
Well, it wouldn't be a syntax change per se, and it wouldn't be to 
accommodate Parrot--this is about language crossing.

You'd say, in Python, import Perl.ns.Foo.ns.Bar.scalar.baz only in 
the case where the namespaces you are dealing with are of Perl origin. 
This is Python accommodating Perl (or really, Perl accommodating 
Python). Perl has more structure to its namespaces than Python 
does--that's just a fact, and this is one way to deal with it.

In your example above, it looks like you're talking about one function, 
ostensibly living in two namespaces. That's fine--if you author a 
module in pasm (no HLL involved), then you get to structure your 
namespace any way you want. In this case, it sounds like you'd want 
your module to implement two different interfaces, one intended for 
Perl, and one intended for Python. That would be fine, and it would be 
optimal for both languages--at the cost of extra work for the module 
implementor, explicitly taking into account multiple languages. In the 
case of someone just writing a module in Perl, not thinking about other 
languages at all, the compiler will create a Perl-style interface for 
its namespace, and Python programmers will still be able to use this 
module, just with a somewhat odd-looking interface. But this isn't a 
new syntax for Python--from a Python perspective, that really is the 
structure of that module. It will just look like Perl programmers 
create really nested modules.

JEff


Re: Namespaces, part 1 (new bits)

2004-10-03 Thread Jeff Clites
More detailed responses are below, but some general comments first:
I think that no matter what the approach, there's an unavoidable 
mismatch between Perl and Python when it comes to variable naming; it's 
going to be a bit awkward to access Perl variables from within Python. 
I don't see any way around that. Either of the obvious approaches will 
have issues to deal with; these approaches are:

1) Treat Perl variables as having the sigil as part of the name.
-or-
2) Treat Perl variables as not including the sigil in the name, but 
have multiple categories.

For Python, the problem with (1) is that no Perl variables will have 
legal names from Python's point of view. With (2), the variable names 
are okay for Python, but Python doesn't have the concept of multiple 
categories of variables.

But, that's not as bad as it sounds, and I'll spell out how either of 
those could work. But with either approach, the integration won't be 
seamless--using a Perl-originating module (namespace) in Python will 
involve more work on the Python side than is required when using a 
Python-originating module (in Python). Here's how either approach could 
work. Consider the following Perl(5) package as an example:

# Perl5
package Gizmo;
$foo = 1;
@foo = ('a');
%foo = ('b' => 'c');
sub foo { return 1; }
Approach 1) Sigil is part of the name.
import Perl.Gizmo
	x = Perl.Gizmo.foo  # error--not defined
	x = Perl.Gizmo.__dict__['$foo']  # x now holds the value of Perl's $foo
	Perl.Gizmo.__dict__['$foo'] = 'hello'  # Perl's $foo now holds a 
Python string
	# similarly for hashes, arrays, etc.; subs would probably need a 
sigil, maybe '&'

That would actually work. It's awkward, but it reflects the fact that 
Python's symbol tables (like Perl's) can deal with entries which you 
couldn't create with the normal Python syntax.

Approach 2) Sigil is not part of the name. Use categories.
import Perl.Gizmo
	x = Perl.Gizmo.scalars.foo  # x now holds the value of Perl's $foo
	Perl.Gizmo.scalars.foo = 'hello'  # Perl's $foo now holds a Python 
string
	x = Perl.Gizmo.arrays.foo  # x now holds the value of Perl's @foo

Either approach would work from a Python perspective. The syntax of the 
second, to my eyes, is a bit less awkward. (Note that even with 
approach 2, you'd need to resort to the explicit 
Perl.Gizmo.scalars.__dict__[] style to access variables with 
non-ASCII names, but at least you wouldn't have to do that for *all* 
variables.)

Both approaches require some special Pythonish behavior of the 
Perl-originated namespace. Parrot's namespace-tie-ing mechanism could 
probably be leveraged to provide this behavior and avoid putting the 
burden on the Perl side (i.e., the extra work should be on the import 
side, since it's reaping the benefit of the language crossing). What 
I'm thinking here is that when Python tries to import a Parrot module, 
it actually gets a namespace object which is a wrapper (or adaptor) 
around the actual Perl namespace. (So in the example above, the 
Perl.Gizmo that Python gets is actually a special namespace which 
mediates interaction with the real Perl Gizmo namespace.) Here's what 
the adaptor would need to do in each case:

Case 1) The adaptor would need to expose a __dict__ attribute to 
provide a hash-like interface to the contents of the namespace. This is 
necessary because the real Perl namespace won't have the necessary 
__dict__ variable in place. This would be simple to 
implement--namespaces are already hash-like.

Case 2) The adaptor would satisfy a request for Perl.Gizmo.scalars.foo 
by asking the real Gizmo namespace for the entry foo in the scalars 
category. This would also be simple to implement if we have 
category-structured namespaces.
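
Here's a pure-Python sketch of that approach-2 adaptor (the class 
names are invented; in Parrot this role would presumably fall to a 
namespace PMC using the tie-ing mechanism):

class CategoryView(object):
    def __init__(self, backing):
        self.__dict__["_backing"] = backing      # name -> value for one category
    def __getattr__(self, name):
        return self._backing[name]
    def __setattr__(self, name, value):
        self._backing[name] = value

class PerlNamespaceAdaptor(object):
    def __init__(self, scalars, arrays, hashes):
        self.scalars = CategoryView(scalars)
        self.arrays = CategoryView(arrays)
        self.hashes = CategoryView(hashes)

# mirroring the Gizmo example above
Gizmo = PerlNamespaceAdaptor({"foo": 1}, {"foo": ["a"]}, {"foo": {"b": "c"}})
x = Gizmo.scalars.foo          # x now holds 1 (the value of Perl's $foo)
Gizmo.scalars.foo = "hello"    # writes through to the backing store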

Either approach requires that the import mechanism have an awareness of 
the language of the imported namespace (i.e., Perl modules need to be 
wrapped but Python modules don't, for import from Python code), but 
that's supplied by the Perl. prefix anyway.

Going from Python to Perl is easy--all Python variables look like 
Perl-scalars-holding-references, though we might want to special-case 
strings and numbers. See below for further notes on this.

Further comments, some of which will be redundant with the above:
On Sep 30, 2004, at 1:00 AM, Leopold Toetsch wrote:
Jeff Clites [EMAIL PROTECTED] wrote:
First off, Perl5 doesn't describe itself that way. The Camel states, 
"Note that we can use the same name for $days, @days, and %days 
without Perl getting confused."
While that's fine for Perl it doesn't help, if you want to access one
distinct days from Python:
  import days from Perl.foo   # which one?
So it's true that $foo and @foo are different items, but that can be
stated as, the scalar 'foo' and the array 'foo' are different items,
with the same names.
That does only work, if the context provides enough information, which
one of the foos should be used. That information isn't present 
always.
What I mean is, if someone

Re: Namespaces

2004-09-29 Thread Jeff Clites
On Sep 29, 2004, at 7:25 AM, Dan Sugalski wrote:
Okay, after seeing all the back and forth, here's what we're going to 
do.

Namespaces are going to be *simple*. They do two things, and only two 
things.

1) They provide a hierarchy for other namespaces
2) They bind names to PMCs
That's it. No typing, no classification, no nothing.
By postpending a null character, below, you _are_ doing 
typing/classification, of course. And, what about subs?

If languages want to do that, then they'd better darned well do it 
themselves. Yes, this is going to make interoperability a pain at the 
variable level, but it's pretty clear that we're just not going to be 
able to do that.

The next step, then, is to sketch out ops to query, read, and write 
name/PMC bindings and query/read/write/overlay hierarchies. (Which, I 
realize, many languages just won't use, and that's fine) I think it's 
best to go with a unified hierarchy/variable namespace, so we'll 
postpend a NUL to the end of non-variables. Printing NULs is a pain, 
so we should make sure that we've got good general-purpose binary 
printing routines people can use.
As Larry said, it's best to prepend it. And then we can call it a 
sigil. And since we're name mangling, pick a printable character like 
: for the prefix, since we'll know the initial character is always 
really a throw-away encoding of the syntactic category. Then we're back 
to _exactly_ my scheme, with the syntactic category names as single 
characters, and as an implementation detail shoving the category and 
name together in a single string instead of keeping them separate. And 
of course this will all be awkward for languages with distinct 
syntactic categories, but without natural name mangling. And it will 
probably result in more intermediate strings being allocated.

This is exactly like open(IN, "<$file") v. open(IN, "<", $file), and 
we're deciding that the former is vastly superior.

And of course, you do realize that in Perl5, this:
$data[1]
refers to the variable @data, so that even for Perl5 there will be 
name-mangling (not just the ability to use the literal sigil-plus-name 
as seen in the source).

Yes, I know, we could do better. If anyone wants to do so, go lock 
Larry, Matz, and Guido in a room together. When they hash out the 
language semantics we'll let 'em out and then do better. :)
We don't need Larry, Matz, and Guido to agree on anything here. I 
explained a scheme which would work for all of their languages, as well 
as Scheme, Common Lisp, Objective-C, Java, C, and probably anything 
else. It just requires acknowledging that different languages have 
different sets of syntactic categories, rather than pretending things 
are simpler than they are.

JEff


Re: Namespaces, part 1 (new bits)

2004-09-29 Thread Jeff Clites
On Sep 29, 2004, at 2:53 AM, Leopold Toetsch wrote:
Dan Sugalski [EMAIL PROTECTED] wrote:
Okay, so we've got two points of dispute:

1) Jeff doesn't think the sigil should be part of the variable name
Which isn't practicable. We can't strip off the sigil for perl5. It's
part of the variable name, $foo and @foo are different items.
Those statements don't follow from one another. :)
First off, Perl5 doesn't describe itself that way. The Camel states, 
"Note that we can use the same name for $days, @days, and %days without 
Perl getting confused." I'm asserting that it works perfectly well (and 
seems to have been the original intent) to say that Perl allows for a 
scalar, a hash, and an array all named foo, and the grammar always 
makes it clear which one you mean (mostly, via sigils).

So it's true that $foo and @foo are different items, but that can be 
stated as, the scalar 'foo' and the array 'foo' are different items, 
with the same names. (Just like there can be both a person named 
April and a month named April in English.)

Here are some further demonstrations that the seeming intent of Perl5 
is not to treat the sigil as actually part of the name, but as a 
feature of the grammar which indicates the syntactic category of the 
name:

	$hello and ${hello} are the same thing, but $hel{lo} is not -- the 
name can separate from the sigil
	$array[1] refers to @array, not $array -- the sigil changes depending 
on context, even for a given item

Secondly, Perl just clouds the issue, since it _could_ work either way. 
For other languages, you have distinct syntactic categories, but 
without the name decoration. For example, in Common Lisp, this:

(foo foo)
means, call the function foo, and pass the variable foo as an 
argument. So if the function foo doubles numbers, and the variable 
foo is set to 11, then the above evaluates to 22.

The reason I'm taking this as important, despite Common Lisp's not 
being a target language for Parrot, is that it's pointing out that 
namespaces across languages have the concept of dealing with multiple 
syntactic categories, though the number of such categories varies 
between languages. Ruby and Python are simple--they have unified 
namespaces, or so it seems--and Perl has a syntax which allows us to 
pretend that it has a unified namespace, although I think that's 
stretching the truth. (And, I think there's still an issue with 
namespace names and sub names, since they don't have sigils in the 
grammar--at least not as normally written.) But other languages aren't 
so simple, and if we oversimplify our treatment of namespaces, then we 
end up with something less elegant and less flexible that we could 
have.

If you want to use a perl5 module from Python which has both $foo and
@foo exported, we can just pitch a fit.
We should be able to handle accessing either, if Python provides a 
syntax for doing so.

And: we can't attach hints to the namespace lookup because you just
don't know, if Python wants the scalar foo or the array foo. There
is exactly one foo object that Python can use, that's it.
That's not accurate, and it's not a hint, it's a demand--the programmer 
should know exactly which one he wants (which is especially true if you 
are trying to think of the sigil as part of the name). Possible 
syntaxes within Python:

a = lookupPerlScalar("foo");
b = lookupPerlSub("foo");
c = lookupPerlArray("foo");
or it's:
a = lookupFromPerl("scalar", "foo");
or it's:
a = lookupInParrotNamespace("Perl", "scalar", "foo");
Since Python deals in references only, assignment syntax could work 
just fine for this, but if someone wanted a more aliasing-like syntax, 
it could work like what Chip suggested in another thread:

   parrot_alias(a, 'b', # dest: Python is unified, no need 
for a category here
a, 'b', 'scalar')   # src:  Perl is not unified, so 
source category is required

   parrot_alias(a, 'c',
a, 'c', 'array')# here's a different category, to 
get '@c'

or some such.  Yes it's ugly.  But if we can't fix a ditch, the least
we can do is put a big friendly warning sign on it.

I've lost hope for transparent aliasing, but it could work partially: 
say, imports from Perl to Python automatically alias scalars to 
Python variables of the same name, but arrays and hashes you have to 
pull in manually, and alias to an explicitly-specified name on the 
Python side.

It will never just work for all things, since this is a valid 
identifier in Common Lisp: *foo-foo*

So to exploit the full power of Parrot, languages will need to have 
syntaxes/functionality/API to access foreign namespaces, but things 
could be made transparent for at least a subset of variables. (And as 
an example, not _all_ identifiers in Common Lisp are so exotic.)

Python allows only bare names in the import statement:
  from a import foo [ as a_foo ]
but not:
  from a import @foo [ as a_foo ]
This seems to make things worse for treating the sigil as part of the 
name, 

Re: Namespaces again

2004-09-28 Thread Jeff Clites
On Sep 27, 2004, at 8:55 AM, Dan Sugalski wrote:
Okay, I've come to realize that it really helps if I'm clear about 
what I want, which kinda requires being clear about what I want.

There are two things in the namespaces I'm concerned about.
First are the actual objects one grabs out. Variables, sub objects, 
class objects -- actual stuff. Languages seem to fall into two 
categories -- those that group subroutines in with data and those that 
don't. (Most generally do)

Second there's the structure of the namespace. I really do want 
Foo::Bar to be handled with a Foo entry in the top-level namespace, 
and a Bar entry in the Foo namespace, and nothing that anyone does to 
any Foo object at the top level can damage that.
What I proposed in my 9/26/2004 post to the Namespaces, part 1 (new 
bits) thread directly accommodates all that.

Here is how that looks in data structure terms. (I'll use hash notation 
here, though the actual implementation could optimize.)

Let's say that all you have around are $Foo and $Foo::Bar::baz -- that 
is, a Foo variable at the top level, and a baz variable inside a Bar 
namespace inside a Foo namespace. The namespaces involved could look 
like this:

top-level namespace (say this is namespace #1):
{
    variables => { Foo => PerlScalar PMC (or whatever) },
    namespaces => { Foo => PerlNamespace PMC, call namespace #2 }
}
namespace #2 (the namespace from above):
{
    namespaces => { Bar => PerlNamespace PMC, call namespace #3 }
}
namespace #3 (the namespace from immediately above):
{
    variables => { baz => PerlScalar PMC }
}
It doesn't matter if the sigils are treated as part of the name--the 
structure's the same, you just have $Foo and Foo:: up there 
instead. (Though Perl maybe could have done without the structure for 
variable v. namespace, other languages don't have a natural syntax to 
distinguish them.) And of course, you'd have a subs => ... section if 
we had any subs in our example.

My above-mentioned post explained what you'd do for Python, and how it 
fits well into this. And for languages which need more sections, they 
would be accommodated as well.

The fact that the Foo class has an object named Foo at the top level 
which you can fool with is separate from the fact that there's a Foo 
*namespace* at the top level. Using a filesystem analogy, I want to 
have a directory and file named Foo, with file operations not touching 
the directory, and directory operations not touching the file.
Yep, that does that.
It seems like the simplest thing to do is to have a *real* unified 
system and go with a prefix or suffix scheme for namespaces that the 
namespace ops automatically pre/post-pend on the names.
I don't think there's any need for name mangling--that's the sort of 
thing you do when you need to shoehorn extra information into a 
structure that wasn't designed to hold it.

If you want to name mangle, _and_ not arbitrarily restrict what 
characters a language might allow in identifiers, then it makes sense 
to give everything a prefix, and it's unambiguous that the prefix is 
everything up to your first separator character (say, slash); for 
example: namespace/Foo, variable/Foo--there's no ambiguity or 
restriction, for instance variable/foo/bar is just the foo/bar 
variable in some language which allows slashes inside of variable 
names. And just to close the loop, you'd still express your 
$Foo::Bar::baz lookup like:

lookupVariableInNamespace P1, ["Foo"; "Bar"], "baz" # the things in the 
[...] are always namespace names

Logically, that's identical to what I have above--it's just another way 
to express the same structure, just an internal implementation detail, 
with some pros (maybe one less hash lookup) and cons (more intermediate 
strings created). You'd still say a given namespace has different 
sections to accommodate different categories of entities.

JEff


Re: Namespaces again

2004-09-28 Thread Jeff Clites
On Sep 28, 2004, at 7:02 AM, Aaron Sherman wrote:
Rather than trying to shuffle through the keyboard and find that 
special
character that can be used, why not have each language do it the way
that language is comfortable (e.g. place it in the regular namespace as
a variable like Python or place it in the regular namespace, but
append noise like Perl or hide it in some creative way for other
languages). For the most part, there's no performance penalty in having
a callback that the language/library/compiler provides because access 
to
the objects in question will be via a PMC, and only LOOKUP of that PMC
will be via namespace, no?

In that way, you could:
namespace_register  perl5_namespace_callback, "Perl5"
namespace_register  python_namespace_callback, "Python"
[...]
namespace_lookup    P6, "F\0o::Bar", "Perl5"
namespace_lookup    P7, "foo.bar", "Python"
the namespace callback could take a string and return whatever Parrot
needs to look up a namespace (Array PMC?), having encoded it according
to Parrot's rules.
That's similar in spirit to what I proposed of allowing PMC-subclassing 
of the default ParrotNamespace, so that namespaces created from 
different languages (often implicitly) could have different behaviors. 
But I'd keep the pulling-apart of F\0o::Bar into [F\0o; Bar] a 
compile-time task, so that at runtime the work's already been done 
(since the compiler knows what language it's compiling).

JEff


Re: Namespaces again

2004-09-28 Thread Jeff Clites
On Sep 28, 2004, at 8:58 AM, Jeff Clites wrote:
And just to close the loop, you'd still express your $Foo::Bar::baz 
lookup like:

lookupVariableInNamespace P1, ["Foo"; "Bar"], "baz" # the things in 
the [...] are always namespace names
Here are more examples, just to be clear:
(and the actual op names would be different, but I'm trying to be 
unambiguous here):

# $Foo::Bar::baz, or presumably Python's Foo.Bar.baz
lookupVariableInNamespace P1, ["Foo"; "Bar"], "baz"
# $::foo, or foo as a variable in the top-level namespace
lookupVariableInNamespace P1, [], "foo"
# Foo::bar
lookupSubInNamespace P1, ["Foo"], "bar"
Now, the above are shortcuts (or optimizations) for the generalized op:
lookupInNamespace P1, ["Foo"; "Bar"], .VARIABLE, "baz"
And, they're also shortcuts for doing things more manually:
# $Foo::Bar::baz
lookupNamespaceInNamespace P0, [], "Foo" # or rootNamespace P0
lookupNamespace P1, P0, "Bar" # lookup namespace Bar in namespace in P0
lookupVariable P2, P1, "baz"  # lookup variable baz in namespace in P1
And if we must treat the sigil as part of the name for Perl6 at the 
Parrot level, we just get stuff like:

# $Foo::Bar::baz
lookupVariableInNamespace P1, ["Foo::"; "Bar::"], "$baz"
But, it seems better to not do that.
JEff


Re: Namespaces again

2004-09-28 Thread Jeff Clites
On Sep 28, 2004, at 9:54 AM, Chip Salzenberg wrote:
According to Jeff Clites:
Let's say that all you have around are $Foo and $Foo::Bar::baz ...
top-level namespace (say this is namespace #1):
{
variables => { Foo => PerlScalar PMC (or whatever) },
namespaces => { Foo => PerlNamespace PMC, call namespace #2 }
}
I'm a bit confused by this example.  Don't you mean:
variables => { '$Foo' => PerlScalar PMC (or whatever) },
i.e. with the '$' in the key?
Sorry, accidentally sidestepped the array/scalar/hash issue. Given 
$foo, @foo, and %foo, we could either do:

variables => { '$foo' => whatever,
               '@foo' => whatever,
               '%foo' => whatever
}
-or-
scalars => { 'foo' => whatever },
arrays  => { 'foo' => whatever },
hashes  => { 'foo' => whatever }
The infrastructure's the same, it just depends on whether we want to 
(at the Parrot level) think of Perl as having one category of 
variables, or three. We could even have:

scalars => { '$foo' => whatever },
arrays  => { '@foo' => whatever },
hashes  => { '%foo' => whatever }
...if we wanted to model Perl as having three categories, but still 
treat the sigils as part of the name.

I think the second option has the most potential for smooth 
interoperability with other languages.

Presumably, all of Python's stuff would go in a single bucket in a 
Python-originating namespace. So it might look like:

{
    whatever => { a => namespace, b => object, c => sub }
}
Where there's only one category, and whatever could be references 
or objects or variables or scalars or whatever made the most 
sense from a cross-language standpoint. (I think the different 
semantics of Perl v. Python variables means that it might be 
appropriate for the categories to be different between the two 
languages, though Python and Ruby might match one another. But, as all 
Python variables are references, they're sorta-like Perl scalars, so 
maybe the category for Perl-scalars should be the same as that for 
Python-references--just depends on what works best for cross-language 
access, but the infrastructure's the same.)

And what I mean here is that how many categories are inside a given 
namespace depends on where it was created (in Perl code v. Python code, 
for example), but after that point the usage is the same.

JEff


Re: Namespaces again

2004-09-28 Thread Jeff Clites
On Sep 28, 2004, at 11:26 AM, Chip Salzenberg wrote:
According to Jeff Clites:
top-level namespace (say this is namespace #1):
{
variables => { Foo => PerlScalar PMC (or whatever) },
namespaces => { Foo => PerlNamespace PMC, call namespace #2 }
}
I think I get it.  You're replacing sigil characters and associated
name mangling, turning it into explicit named categories, thus
avoiding lots of anticipated string tricks.
Yep, exactly. And it works for Perl, but also fits languages which 
don't have a built-in mangling-via-sigils, yet still have different 
categories (like Common Lisp, which allows functions and variables of 
the same name).

And the named categories don't necessarily imply an additional hash 
lookup--with specialized ops for the common categories, we can 
fast-path getting to the correct hash to do the final lookup.

JEff


Re: Namespaces again

2004-09-28 Thread Jeff Clites
On Sep 28, 2004, at 12:28 PM, Chip Salzenberg wrote:
According to Dan Sugalski:
At 11:58 AM -0700 9/28/04, Jeff Clites wrote:
On Sep 28, 2004, at 11:26 AM, Chip Salzenberg wrote:
According to Jeff Clites:
top-level namespace (say this is namespace #1):
{
variables => { Foo => PerlScalar PMC (or whatever) },
namespaces => { Foo => PerlNamespace PMC, call namespace #2 }
}
I think I get it.  You're replacing sigil characters and associated
name mangling, turning it into explicit named categories, thus
avoiding lots of anticipated string tricks.
Yep, exactly.
And unfortunately dies a horrible death for languages that don't
categorize the same way as perl. :(
The horrible death you fear is unavoidable.  The variable categories
are an impedance mismatch that namespaces can't paper over.  Name
spaces have different dimensionality, if you will, in different
languages.  We can't *fix* that without changing the languages.
You said on IRC that the Perl/Python gap would have to be bridged with
one-shot namespace connection, something like Exporter.  (Perl $a::b
= Python 'a.b', Perl @a::c = Python 'a.c', etc.)  So the import
process could use Jeff-style categories to get some handle on what's
being imported.  e.g.:
   parrot_alias(a, 'b', # dest: Python is unified, no need 
for a category here
a, 'b', 'scalar')   # src:  Perl is not unified, so 
source category is required

   parrot_alias(a, 'c',
a, 'c', 'array')# here's a different category, to 
get '@c'

or some such.  Yes it's ugly.  But if we can't fix a ditch, the least
we can do is put a big friendly warning sign on it.
Exactly. If Python want to alias to a Perl variable, it's going to have 
to specify the category one way or another. Doing string-fiddling by 
putting a sigil on the front of the string is one way, but just 
specifying a category explicitly is another. The latter is a much more 
natural thing to do, in regard to languages which don't themselves have 
sigils. And the string-fiddling seems like more of a hack--why try to 
shove the category and name information together, into a single string, 
rather than leaving them separate?

I'm not even suggesting an implementation here really; it's just, 
namespaces need to handle multiple syntactic categories, and we need 
to have API to reflect this. And you hit the nail on the head that 
these are *syntactic* categories. These syntactic categories should 
always be known at compile-time (which is what makes them syntactic 
categories); for languages with sigils, it's easy, and for languages 
without them (most), the grammar of the language defines it.

And in terms of adaptors, I think that aliasing from Perl to Python is 
going to end up aliasing '\@c', and never '@c' really (since Python 
always holds references--really aliasing to '@c' would imply that 
assignment within Python would copy); conversely, going from Python to 
Perl would always end up as a scalar on the Perl side, since everything 
from Python would look like a reference to Perl. (At least, this seems 
like the most natural approach.)

JEff


Re: Why lexical pads

2004-09-26 Thread Jeff Clites
On Sep 25, 2004, at 10:27 PM, Larry Wall wrote:
On Sat, Sep 25, 2004 at 10:01:42PM -0700, Larry Wall wrote:
: We've also said that MY is a pseudopackage referring to the current
: lexical scope so that you can hand off your lexical scope to someone
: else to read (but not modify, unless you are currently compiling
: yourself).  However, random subroutines are not allowed access
: to your lexical scope unless you specifically give it to them,
: with the exception of $_ (as in 1 above).  Otherwise, what's the
: point of lexical scoping?
Note that this definition of MY as a *view* of the current lexical
scope from a particular spot is exactly what we already supply
to an C<eval>, so we're not really asking for anything that isn't
already needed implicitly.  MY is just the general way to invoke the
pessimization you would have to do for an C<eval> anyway.
A mildly interesting thought would be for C<eval> to take additional 
parameters to make explicit what's visible to the eval'd 
code--essentially making the running of the code like a subroutine 
call. So the traditional C<eval> would turn into something like "eval 
$str, MY", but you could also have "eval $str, $x, $y", or just "eval 
$str", which would execute in an empty lexical scope. That would 
allow additional optimizations at compile-time (and make MY the sole 
transporter of lexical scope), since not every C<eval> would need what 
MY provides, but even more importantly, it would allow the programmer 
to protect himself against accidentally referencing a lexical he didn't 
intend, just because the code in his string coincidentally used the 
same variable name. More optimization opportunities, and more explicit 
semantics.
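
For what it's worth, Python's own eval already has roughly this 
shape--the caller passes the namespaces explicitly, so the evaluated 
string sees only what it is handed:

x, y = 3, 4
print(eval("x + y", {}, {"x": x, "y": y}))   # like "eval $str, $x, $y": only x and y visible
print(eval("7 + 7", {}, {}))                 # like a bare "eval $str": empty lexical scope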

But that's now a language issue, so I'm cc-ing this over to there.
JEff


Re: Why lexical pads

2004-09-25 Thread Jeff Clites
On Sep 25, 2004, at 11:15 AM, Dan Sugalski wrote:
At 2:10 PM -0400 9/25/04, Chip Salzenberg wrote:
According to Dan Sugalski:
  Leaf subs and methods can know [their call paths], if we stipulate
 that vtable methods are on their own, which is OK with me.
So, given this sub and tied $*var:
   sub getvar { my $i = rand; $*var }
the FETCH method implementing $*var might not be able to see $i?
Which implies that there may be no pad and $i could be in a register?
Yeah, I think that's OK. I'm certainly OK with it, though there is an 
appeal to introspection. (Since you might want to fiddle with things 
in a debugger)
But for debugging, you'd want to be able to compile with 
optimizations disabled, so there's no real problem there.

(And also, a clever-enough debugger might be able to let you do by-name 
manipulations of things stored in registers--you just need to preserve 
enough information at compile-time, in a form the debugger can use, 
like a separate symbols file).

JEff


Re: towards a new call scheme

2004-09-25 Thread Jeff Clites
On Sep 24, 2004, at 1:13 AM, Leopold Toetsch wrote:
Piers Cawley [EMAIL PROTECTED] wrote:
I could be wrong here, but it seems to me that having a special
'tailinvoke' operator which simply reuses the current return
continuation instead of creating a new one would make for rather 
faster
tail calls than fetching the current continuation out of the 
interpreter
structure, then invoking the target sub with this new
continuation (ISTM that doing it this way means you're going to end up
doing a chunk of the work of creating a new return continuation 
anyway,
which rather defeats the purpose.)
Yep, that's true. We already have (unimplemented) C<tailcallmethod>
opcodes. A plain C<tailcall> for functions is still missing. Anyway, if
the current continuation has its location in the interpreter context,
it's simple to implement these opcodes.
You mentioned:
Well, basically, if an interpreter template is used (hanging off the 
subroutine), the previous interpreter template is the return 
continuation.
So this occurred to me: Not sure if it would be useful, but since you'd 
be able to trace back up the call stack, a tailcall for functions 
could take a numeric parameter, to let you invoke a continuation from 
further back in the stack, without having had to pass it in as an 
explicit parameter. So for instance:

	return_cc == alias for call_cc_indexed 0
	tailcall  == alias for call_cc_indexed 1
	call_cc_indexed 2 == invoke the continuation from 2 frames back, as 
though it had been passed down

It's just a generalization--it lets you do the equivalent of grabbing 
the current continuation, and passing that to a function which passes 
it to a function which passes it to a function...which invokes it, 
without having to ever create a real PMC.
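
For comparison, here's roughly what the explicit-passing version looks 
like (a Perl-ish sketch; current_continuation() is a made-up helper 
standing in for grabbing the return continuation as a real PMC):

	sub outer  { my $k = current_continuation(); middle($k) }  # reify it, pass it down
	sub middle { my ($k) = @_; inner($k) }                     # just forwarding
	sub inner  { my ($k) = @_; $k->(42) }                      # invoke the grandparent's continuation
	# With the indexed form, inner would instead do the equivalent of
	# "call_cc_indexed 2", and the continuation never has to exist as an
	# explicit parameter (or PMC) at all.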

But I'm not sure if that's a useful design pattern.
JEff


Re: Namespaces, part 1

2004-09-24 Thread Jeff Clites
On Sep 23, 2004, at 9:53 AM, Dan Sugalski wrote:
At 12:06 AM -0700 9/23/04, Jeff Clites wrote:
On Sep 22, 2004, at 8:13 PM, Dan Sugalski wrote:
At 7:32 PM -0700 9/22/04, Jeff Clites wrote:
*) If a language wants different types of variables to have the 
same name, it has to mangle the names. (So you can't have an 
Array, a String, and an Integer all named Foo) The only language I 
know of that does this is Perl, and we just need to include the 
sigil as part of the name and we're fine there.
This seems antithetical to cross-language functionality.
Why? Not to be snarky here, I'm curious.
Just that if I set a global $foo = 5 in Perl, I'd want to be able 
to change it from Python as, foo = 5.
The problem there is deciding whether foo is $foo, @foo, or %foo, all 
of which may exist simultaneously.
Yep, exactly my point/worry.
If they aren't named $foo, @foo, and %foo, but instead are all foo, 
then we're back to globs and globs are a place we don't want to be at.
Nah, it's no different from what we're already proposing with 
variable/sub/namespace partitioning--typeglobs are just an 
implementation choice. In particular, there are 2 obvious ways we could 
implement what we've got so far: a given namespace object manages 3 
hashes--one to lookup variables, one for subs, and one for other 
namespaces. Or, a namespace could manage a single hash, in which you 
lookup a little structure which has one slot to hold a variable, one 
slot to hold a sub, and one to hold a namespace. The latter approach is 
basically a typeglob. You wouldn't necessarily expose that, but as an 
implementation detail it has some advantages, if there'd ever be a 
situation where you wanted to (say) look for foo by retrieving the 
variable if there is one, else a sub if there isn't a variable, else a 
namespace (advantage because there'd only be one hash lookup). Whether 
you've got 3 conceptual sections or 5, you'd have the same sorts of 
implementation choices.
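
As a rough Perl-ish sketch of the two layouts (hypothetical--just 
hashes of hashes, with the scalars standing in for whatever PMCs 
actually get stored):

	my ($foo_var, $foo_sub, $foo_ns);   # placeholders for the stored PMCs
	# (a) one hash per kind of thing: "variable, else sub" costs two lookups
	my %ns_split = (
	    variable  => { foo => $foo_var },
	    sub       => { foo => $foo_sub },
	    namespace => { foo => $foo_ns  },
	);
	# (b) one hash of glob-like records: a single lookup reaches all three slots
	my %ns_glob = (
	    foo => { variable => $foo_var, sub => $foo_sub, namespace => $foo_ns },
	);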

In particular, a nice approach would be to allow a given namespace 
object to decide on its semantics based on the context (language 
probably) in which it was created--so a namespace created in Perl code 
might have 3 or 5 sections, and one created in another language might 
have only 1. If the namespace lookup API (ie, how you do a lookup once 
you've got hold of a particular namespace object) had a signature such 
as, lookup(namespace object, name of thing, type indicator), then the 
namespace could decide what it wants to do if it doesn't have a section 
for that type of thing--it could return null, fall back to a 
different type, signal an error, whatever.
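
A minimal sketch of that signature, assuming the glob-like layout from 
above and leaving the fallback policy entirely up to the namespace 
itself (the section names are invented for illustration):

	sub lookup {
	    my ($ns, $name, $type) = @_;     # e.g. lookup($ns, "foo", "variable")
	    my $entry = $ns->{$name} or return undef;
	    return $entry->{$type} if exists $entry->{$type};
	    return $entry->{sub};            # this namespace falls back to a sub;
	}                                    # another might return undef, or die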

Also, don't forget Perl5 (for Ponie)--it's never been characterized as 
having the sigil as part of the variable name. (At least, judging by 
the Camel.)

(Not to mention the cross-language issues you really run into, for 
languages that don't do a simple scalar/hash/array split, but use 
different basic classifications, or none at all) So even if we did 
force perl to do name mangling, we get other problems in return.

 From Python, I can't set it using $foo = 5, since that isn't 
syntactically valid in Python, and it's no fun at all to have to do 
something introspective like, 'setValueOfGlobal($foo, 5)'.
Yep. But, then, from Python you won't be able to set $nOl either, 
so I'm not sure it's that big a deal.
Yeah, but as I said before, I'm less worried that there will be _some_ 
such cases, and more bothered that even if you try to coordinate, you 
can't create in Perl a variable which is legal in Python, or create one 
in Python which is legal in Perl. It's bad if our approach leaves zero 
overlap. (And if we do have zero natural overlap between Perl6 and 
Python, I'd probably argue that we should partition Python and Ruby 
apart, so that introspective lookups would have to specify a language. 
But that's all just awkward.)

('specially since I'm not sure my mail client will let me either...)
Ha, yep, I assume you meant: $. (Your client specified the wrong 
encoding [ISO-8859-1 instead of Shift_JIS] in the MIME headers--but I 
know mine won't. :)

Besides, Python's OO and we all know global variables are evil things 
one doesn't use in OO code, right? :)
True, they're evil-ish, OO or not, but more benign in short scripts, 
which P/P/R tend to favor. (And of course globals inside namespaces are 
still globals, but better behaved since they have 
fancier::looking::names.)

(And one legitimate use of inside-a-package-but-still-global variables 
is for constants relevant to the package. Java does that a lot. As a 
made-up example, stuff like java.lang.math.PI)

For me, I want to be able to load some compiled bytecode module, and 
not care what language it was written in.

Alas, there will always be problems. We can make interoperability 
*possible*, but that doesn't mean we can make it *easy*.
Yes, but we should *try* to make it easy and elegant, and not resign 
ourselves too easily to giving up on it.

(Besides

Re: Why lexical pads

2004-09-24 Thread Jeff Clites
On Sep 24, 2004, at 8:07 AM, Aaron Sherman wrote:
On Fri, 2004-09-24 at 10:03, KJ wrote:
So, my question is, why would one need lexical pads anyway (why are 
they
there)?
They are there so that variables can be found by name in a lexically
scoped way. One example, in Perl 5, of this need is:
my $foo = 1;
return sub { $foo ++ };
Here, you keep this pad around for use by the anon sub (and anyone else
who still has access to that lexical scope) to find and modify the same
$foo every time. In this case it doesn't look like a by-name lookup,
and once optimized, it probably won't be, but remember that you are
allowed to say:
	perl -le 'sub x {my $foo = 1; return sub { ${foo}++ } }$x=x();print 
$x->(), $x->(), $x->()'

Which prints 012 because of the ability to find foo by name.
Ha, your example is actually wrong (but tricked me for a second). 
Here's a simpler case to demonstrate that you can't look up lexicals by 
name (in Perl5):

% perl -le '$x = 2; print ${x}'
2
% perl -le 'my $x = 2; print ${x}'
(printed nothing)
The first case prints 2 because $x is a global there; in the second 
case, it's a lexical, and ${x} is looking for a global.

In your example, ${foo} is actually addressing the global $foo, not 
your lexical. To demonstrate, both this:

perl -le 'sub x {my $foo = 7; return sub { ${foo}++ } }$x=x();print 
$x->(), $x->(), $x->()'

and this:
perl -le 'sub x { return sub { ${foo}++ } }$x=x();print $x->(), 
$x->(), $x->()'

also print 012.
(Your example should have printed 123, if the lexical $foo had been 
what the closure was incrementing.)

Someone else suggested that you need this for string eval, but you 
don't
really. You need it for by-name lookups, which string evals just happen
to also need. If you can't do by-name lookups, then string eval doesn't
need pads (and thus won't be able to access locals).
String eval is special because without it, you can tell at compile time 
all of the places where a lexical is used (ie, you can trace all of the 
places it's used back to which declaration matches them), and obviate 
the need for a by-name lookup (since Perl5 doesn't allow explicit 
by-name lookups of lexicals). For string evals to work in the lexical 
scope in which they occur, you do have to have by-name lookups 
behind-the-scenes. I believe that's all correct. And where this could 
be a savings is that, in any lexical scope in which there is no eval 
visible (looking down the tree of nested lexical scopes), then you 
don't need to save the name-to-variable mapping in nested pads. Add a 
call to eval, and you need to save a lot more stuff.
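
For a concrete (if contrived) Perl 5 illustration: in the first sub 
below, the compiler can see every use of $count and could in principle 
keep it out of any by-name pad entirely; in the second, the string 
eval forces $count to stay findable by name at runtime.

	sub counts_fine   { my $count = 0; $count++        for 1 .. 10; return $count }
	sub needs_the_pad { my $count = 0; eval '$count++' for 1 .. 10; return $count }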

JEff


Re: Why lexical pads

2004-09-24 Thread Jeff Clites
On Sep 24, 2004, at 6:51 PM, Aaron Sherman wrote:
On Fri, 2004-09-24 at 12:36, Jeff Clites wrote:
Ha, your example is actually wrong (but tricked me for a second).
Here's a simpler case to demonstrate that you can't look up lexicals 
by
name (in Perl5):
You are, of course, correct. If I'd been ignorant of that in the first
place, this would be much less embarrassing ;-)
No need to be embarrassed--it's easy to trick yourself. (I had 
forgotten myself, until I recently tried it while thinking whether 
lexical pads really needed a by-name API.)

However, the point is still sound, and that WILL work in P6, as I
understand it.
Hmm, that's too bad--it could be quite an opportunity for optimization, 
if you could use-and-discard lexical information at compile-time, when 
you know there's no eval around to need it.

JEff


Re: Why lexical pads

2004-09-24 Thread Jeff Clites
On Sep 24, 2004, at 7:32 PM, Dan Sugalski wrote:
At 7:28 PM -0700 9/24/04, Jeff Clites wrote:
On Sep 24, 2004, at 6:51 PM, Aaron Sherman wrote:
However, the point is still sound, and that WILL work in P6, as I
understand it.
Hmm, that's too bad--it could be quite an opportunity for 
optimization, if you could use-and-discard lexical information at 
compile-time, when you know there's no eval around to need it.
Even if not it's going in anyway. The introspection abilities are more 
than worth the extra memory that the name hashes use.
It's a compiler issue. You're right that no matter what, you need 
lexical pads as a feature in Parrot for...those cases where you need 
lexical pads. But it's nice to have stuff that a compiler can optimize 
away in a standard run, and maybe leave in place when running/compiling 
a debug version--but that's a matter of the semantics of the language. 
(And I'm less worried about the memory than I am about all of the 
pushing and popping and by-name stores and lookups, which could be 
optimized away to just register usage.)

JEff


Re: Namespaces, part 1

2004-09-23 Thread Jeff Clites
On Sep 22, 2004, at 8:13 PM, Dan Sugalski wrote:
At 7:32 PM -0700 9/22/04, Jeff Clites wrote:
One problem: Some languages (Scheme, for example, and arguably C) 
have a unified namespace for subs and variables. What to do there?
The easiest thing would be to allow the languages to store into 
multiple sections at once, or leave it to them to do the right thing 
as they need to.
The risk is that if they store into both, then setting a variable in 
Scheme code might blow away a sub in perl, and other than that it seems 
you'll have big-time ambiguity. But I don't know of an obvious 
solution.

Either way works OK for me, since neither Scheme nor C are really our 
target languages.
Well, my point here is that namespace semantics varies a lot between 
languages, and quite reasonably people seem to want to target a lot of 
different languages to Parrot, besides just Perl/Python/Ruby. And I'm 
not sure where Python and Ruby stand, in terms of sub/variable name 
overlap.

*) Sub and method names *will* collide, so be careful there.
I'm not sure that they will.
I'd really, *really* rather they did. Not because I'm up for 
collisions (I'm not) but because if you *do* leave methods in 
namespaces it allows you to do all sorts of very interesting things 
with lexically overlaid namespaces.
To me, it doesn't make sense to muddle basic semantics to allow tricks. 
In particular, for method invocations on objects, it makes sense that 
the lookup is very directly object to class, lookup method in class, 
rather than object to class, get class name, lookup method in global 
namespace. There's an efficiency issue, and the latter precludes truly 
anonymous classes (since they should have no name with which to do a 
lookup). I tend to think of this as classes live in namespaces, 
methods live in classes--and of course anonymous classes aren't stored 
in a namespace, but are only usable via a reference to the class (like 
lexical v. global variables).
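
In Perl 5 terms, the distinction is roughly the difference between 
these two (a sketch only; the second goes through the class *name*, 
which is exactly what a truly anonymous class wouldn't have, and the 
symbolic lookup assumes no 'strict refs'):

	$obj->bar(@args);                        # object -> class -> method
	my $method = \&{ ref($obj) . '::bar' };  # object -> class name -> lookup in the global namespace
	$obj->$method(@args);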

[Side note: Oh, and of course that reminds me--class names might need 
to be their own section of the namespace, or maybe they do just live in 
the, uh, namespace-namespace. My worry here is that the lookup API 
should really know the type of thing it's looking for--that is, have 
ops like lookupsub v. lookupvariable v. lookupnamespace--and it could 
be awkward if there is one that might return a namespace or might 
return a class, because those two things would have at least somewhat 
different API.]

And for existing libraries in languages which allow subs and methods 
with the same name, we can't retroactively change it--it's too late to 
be careful. But again, I'm not sure what it means to have a sub in a 
class, so they might just automatically stay out of each other's way. 
(For example, you could say Java had only methods, no functions, and 
Perl5 has only functions, no real methods.)

*) If a language wants different types of variables to have the same 
name, it has to mangle the names. (So you can't have an Array, a 
String, and an Integer all named Foo) The only language I know of 
that does this is Perl, and we just need to include the sigil as 
part of the name and we're fine there.
This seems antithetical to cross-language functionality.
Why? Not to be snarky here, I'm curious.
Just that if I set a global $foo = 5 in Perl, I'd want to be able to 
change it from Python as, foo = 5. From Python, I can't set it using 
$foo = 5, since that isn't syntactically valid in Python, and it's no 
fun at all to have to do something introspective like, 
'setValueOfGlobal($foo, 5)'.

For me, I want to be able to load some compiled bytecode module, and 
not care what language it was written in. That's the goal of a generic 
object runtime. If the module's docs say, set the global 'foo' to 
control..., then I'd want to do that in a way that is natural for 
whatever language I'm using ($foo in Perl, foo in Python...).

Strictly speaking, the names can be anything, which is fine. Nulls, 
control characters, punctuation, grammatically incorrect Bengali 
insults... we don't put any limits on 'em. And the only place that it 
makes any difference at all is when you're either doing the equivalent 
of C's extern or explicitly walking the namespace. In the latter case 
you ought to be ready for anything (since with Unicode I fully expect 
to see the full range of characters in use) and in the former case, 
well... you ought to be ready for anything since Unicode's going to 
put the full range of characters into use. :)
Sure, that's fine--there will certainly be cases where one language 
allows variable names that another can't access naturally, or maybe 
even at all. But in the Perl case, we'd end up having _all_ variable 
names turning out illegal in Python (and I suppose Ruby), which seems 
bad.

(Though it's somewhat academic. Besides leading to globs, and we *so* 
don't want to go there, Larry's declared that perl variable names 
include the sigil)
Of course, I'm not talking about Perl

Re: Namespaces, part 1

2004-09-22 Thread Jeff Clites
On Sep 22, 2004, at 10:58 AM, Dan Sugalski wrote:
*) There are three things that can be in a namespace: Another 
namespace, a method or sub, and a variable.

*) The names of namespaces, methods  subs, and variables do *not* 
collide. You may have a namespace Foo, a sub Foo, and a variable Foo 
at the same level of a namespace.
The easiest way to say these two points may be just to say that each 
namespace has three sections, and a given lookup is always a lookup in 
a specific section.

One problem: Some languages (Scheme, for example, and arguably C) have 
a unified namespace for subs and variables. What to do there?

*) Sub and method names *will* collide, so be careful there.
I'm not sure that they will. Classes are namespace-like, but aren't 
precisely namespaces. Since methods live in classes (and subs don't), 
this may just work itself out. (To give a concrete example, for a class 
Foo with an instance method bar, I never really look up any 
Foo::bar--instead, I invoke bar in an instance of Foo, which doesn't 
need to go looking for the relevant namespace--the namespace is 
directly attached to the Foo class.) It's a different story for class 
methods, which really are basically just subs (not really methods). But 
we still may have a problem if there are any languages which allow 
class and instance methods of the same name, in the same class.

*) If a language wants different types of variables to have the same 
name, it has to mangle the names. (So you can't have an Array, a 
String, and an Integer all named Foo) The only language I know of that 
does this is Perl, and we just need to include the sigil as part of 
the name and we're fine there.
This seems antithetical to cross-language functionality. Treating the 
perl sigil as part of the variable name is cute and all, but if we take 
that to heart at the Parrot level, then we really cut off 
cross-language access in any natural way. So I think Perl's $foo 
needs to look like foo to Python, or else everything will be nasty 
looking. Which will loop back to how to handle $foo and @foo at the 
same time.

You're right that functions' lacking sigils makes this less of a 
problem, but that seems like a fragile truce.

*) Namespaces, by default, support notifications.
What does that mean? Notification of what?
JEff


Re: [perl #31682] [BUG] Dynamic PMCS [Tcl]

2004-09-22 Thread Jeff Clites
On Sep 22, 2004, at 5:30 PM, Will Coleda (via RT) wrote:
# New Ticket Created by  Will Coleda
# Please include the string:  [perl #31682]
# in the subject line of all future correspondence about this issue.
# URL: http://rt.perl.org:80/rt3/Ticket/Display.html?id=31682 
...
$ make realclean
$ export LD_LIBRARY_PATH=.:blib/lib
$ perl Configure.pl
$ make
$ make shared
$ cd dynclasses; make
On OSX 10.3.5, I end up with:
ld: /Users/coke/research/parrot/blib/lib/libparrot.dylib is input for 
the dynamic link editor, is not relocatable by the static link editor 
again
...
As for the next error... huh?
Yeah, this is going to be a fun config dance. On Mac OS X, there are 
two distinct types of dynamic libraries--those which are meant to be 
linked against and automatically loaded at runtime (dylibs), and those 
which are meant to be manually loaded (bundles).

libparrot needs to be built as a dylib
things in dynclasses (or other plug-in-like stuff) need to be built 
as bundles

The Makefiles don't have this conceptual split currently, and so things 
(including libparrot) are being built as bundles, but getting the dylib 
extension (running the 'file' command on the library reveals the 
truth). A bit Frankenstein.

I worked out in the past the right configs to build parrot shared on 
Mac OS X, but I don't think I'd yet wired the Makefiles (and config 
process) to do that, and still have the other stuff build as bundles. 
I'll dig this up and see what I did.

JEff


incremental collector and finalization

2004-09-19 Thread Jeff Clites
Hi Leo:
I was reading over your incremental GC posts from about a month ago, and 
read the referenced paper--quite nice work you've done in implementing 
the ideas there.

I have one question: What about finalizers? I may have just missed it, 
but it would seem that calling finalizers would require another sweep 
over the items newly resident on the free list (the left-over items in 
the from-space), which would (unfortunately) take time proportional to 
the number of freed objects.

BUT, a nifty thing would be to actually delay finalization until an 
object is about to be re-used off of the free list. That is, treat ecru 
items as only-maybe free, and as they are pulled off of the free list 
for re-use, check to see if a finalizer needed to be called and if so, 
call it and move on to the next item on the free list (as the first one 
may now be referenced, and should be marked grey). This would allow 
finalization to be treated incrementally, at the cost of it happening 
later than it would otherwise (which I think is fine). But maybe this 
is what you had in mind already. This doesn't give ordered 
finalization, but that may be okay.
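
Here's the idea in pseudocode (written as Perl for concreteness; 
@free_list, run_finalizer, mark_grey, and run_gc_cycle are stand-ins 
for whatever the collector actually provides):

	our @free_list;                        # the maybe-free (ecru) items
	sub allocate_header {
	    while (my $obj = shift @free_list) {
	        if ($obj->{needs_finalize}) {
	            run_finalizer($obj);       # may have made $obj reachable again...
	            mark_grey($obj);           # ...so hand it back to the collector
	            next;                      # and try the next maybe-free item
	        }
	        return $obj;                   # genuinely free: safe to reuse now
	    }
	    return run_gc_cycle();             # free list exhausted
	}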

JEff


Re: No Autoconf, dammit!

2004-09-18 Thread Jeff Clites
On Sep 18, 2004, at 2:09 AM, [EMAIL PROTECTED] wrote:
* Nicholas Clark [EMAIL PROTECTED] [2004-09-08 17:37:52 +0100]:
The probing is going to *have* to get written in something that 
compiles
down to parrot bytecode to work on the autoconf-deprived systems, so 
with
that as a given there's no need for autoconf ahead of that.
How feasible would it be to write a sh -> parrot bytecode compiler?
:)
Ha, I'm sure it could probably be done, but of course most of what 
the shell does is invoke other programs, so in the common case it still 
wouldn't give you portability to non-Unix-like platforms.

JEff


Re: Namespaces

2004-09-14 Thread Jeff Clites
On Sep 13, 2004, at 1:07 AM, Luke Palmer wrote:
Jeff Clites writes:
On Sep 12, 2004, at 8:43 PM, Luke Palmer wrote:
Jeff Clites writes:
On Sep 7, 2004, at 6:26 AM, Dan Sugalski wrote:
*) Namespaces are hierarchical
So we can have [foo; bar; baz] for a namespace. Woo hoo and 
all
that. It'd map to the equivalent perl namespace of foo::bar::baz.
How does this hierarchical nature manifest? I ask because I don't 
know
of any languages which actually have nested namespaces,
Other than, um, well, Perl.
As an implementation detail yes, but I can't think of any Perl code
(except for explicit introspection) which reveals this.
Does:
print ${$Foo::{Bar::}{baz}}
Count as explicit introspection?
Well, that's another one of those cases where I don't see that it gives 
the programmer power that you don't already get by being able to do: 
print ${${Foo::Bar::}{baz}}.

Now it's true that if you think it's really important to tie
namespaces such that you can take over the name resolution for
[Foo], then you'd want the former. But I don't see that as really
terribly useful
A debugger.
I'm not sure exactly what you mean--if you're referring to being able 
to have a debugger intercept name resolution for only certain 
namespaces, then all you need to get this is the ability to intercept 
all name resolution--and just fall back to the default for namespaces 
you don't care about.

If you're talking about the ability to have the debugger dump out all 
namespaces with Foo at the top, or do a hierarchical printout, then a 
debugger can certainly do this without having truly nested namespaces. 
That's just output formatting/logic, really. And of course, that's 
introspection.

Sure.  That's not the idea behind the hierarchical namespaces either.  
A
lot of power comes out of being able to treat namespaces as a tree-like
data structure (I can actually attest to this: see Class::Closure), as
was pointed out before, being able to treat namespaces as a filesystem.
For me, the filesystem metaphor plays out a little differently. I think 
of an individual namespace as being like a DB file. It's convenient to 
organize files via a hierarchical filesystem--but files are not 
themselves nested (you don't have files inside of other files), and the 
API of files (open/close/read/write/seek/truncate) has nothing to do 
with anything hierarchical. So for me the analogy means that it's 
convenient to have hierarchical organization so that humans can keep 
CPAN organized, but that doesn't have runtime consequences.

Also, Perl 6's classes will behave more hierarchically:
module Foo;
class Bar {...}
my $x = Bar.new;   # Actually works, while Bar is Foo::Bar.
But Java has something similar (inner classes), and does it without 
nested namespaces. In fact it (I believe) unwinds stuff like this 
completely at compile-time, so that your last line compiles down to the 
same thing as my $x = Foo::Bar.new;.

What might do it for me, would be if stuff like this worked:
module Foo;
$x = 7;
class Bar {...}
# now, in a different lexical scope...
	sub Foo::Bar::blah { ++$x; } # Resolves to $Foo::x, because Bar is 
inside Foo

# now, elsewhere at runtime
	...some syntax which moves Bar from module Foo into module Zoo at 
runtime
	then the above sub is Zoo::Bar::blah, and $x resolves to $Zoo::x, if 
that exists

Not that I think any of that would necessarily be a good thing to have, 
but it would be the sort of thing which would naturally be implemented 
via nested namespaces. (And, if any significant language allows this 
sort of run-time namespace hierarchy rearrangement, then I'd see that 
as a good reason to nest--so that we don't cut out a whole language 
from Parrot-targetability.)

But, that said, I think I'm pretty much the only person who thinks that 
namespaces shouldn't nest conceptually. And I'm really more worried 
about the other issue I brought up, about how to deal with 
interoperability between different languages which have different ideas 
about namespace segmenting between different types of entities. That's 
a bigger problem.

JEff


Re: Namespaces

2004-09-13 Thread Jeff Clites
On Sep 12, 2004, at 8:43 PM, Luke Palmer wrote:
Jeff Clites writes:
On Sep 7, 2004, at 6:26 AM, Dan Sugalski wrote:
*) Namespaces are hierarchical
So we can have [foo; bar; baz] for a namespace. Woo hoo and all
that. It'd map to the equivalent perl namespace of foo::bar::baz.
How does this hierarchical nature manifest? I ask because I don't know
of any languages which actually have nested namespaces,
Other than, um, well, Perl.
As an implementation detail yes, but I can't think of any Perl code 
(except for explicit introspection) which reveals this. That is, 
seemingly nothing about Foo::Bar comes out of Foo, nor is it 
necessarily related to Foo::Baz.

so I'm not sure what this is meant to imply. In particular, if this
implies cascading lookup (meaning that a symbol looked up in [foo;
bar; baz] would fall back to looking in [foo; bar] if it
didn't find it), then that's not how Perl5 packages work.
No, the lookup is not cascading downwards.  It is cascading upwards,
however, so that in:
[ foo; bar; baz ]
The [ foo; bar ] part can be implemented differently.  This means
that Python's namespaces and Perl's namespaces can have different
semantics.  Picture a PythonStash and PerlStash pmc.
Sure, if they're nested then the baz namespace is looked up inside of 
bar which is looked up inside of foo--I get that. But what I'm 
wondering is, let's say you have $Foo::Bar::someVariable (in Perl5 
syntax). That means look up $someVariable, inside of the Foo::Bar 
namespace. Now, let's contrast two ways this namespace name might 
compile down:

[Foo; Bar]
v.
[Foo::Bar]
Now, what would behave differently in those two cases, in terms of the 
behavior of $Foo::Bar::someVariable? I can't think of anything.

Now it's true that if you think it's really important to tie namespaces 
such that you can take over the name resolution for [Foo], then you'd 
want the former. But I don't see that as really terribly useful--the 
examples I've heard of (e.g., turning the namespace lookup into an 
Oracle fetch) haven't seemed to provide much that a tied hash doesn't. 
And there are currently a ton of, for instance, Text::Blah modules on 
CPAN--but that doesn't actually mean that they have anything to do with 
a Text namespace--that's not the intention, and most of these modules 
have nothing to do with each other. That naming is to keep them 
conceptually organized, not functionally.

I can certainly see having different namespace semantics for different 
languages, but that doesn't imply arbitrary nesting. That might imply 
separate top-level namespaces, each with its own behavior (so 
overall, two levels of namespace), or you could just have one-level 
namespaces, but each with its individualized semantics, or all 
namespaces might be represented by the same type of data structure, 
which is just traversed differently from different languages. (And I'm 
not sure that the semantics do differ between languages--given a 
namespace and a name, there's seemingly only one thing to do.)

And again, I don't so much object to the idea of nested namespaces--I 
just feel that they'll slow down symbol lookups, without giving us much 
in return. I'm afraid we're adding complexity we don't need.

JEff


Re: Namespaces

2004-09-13 Thread Jeff Clites
On Sep 12, 2004, at 11:29 PM, Brent 'Dax' Royal-Gordon wrote:
Jeff Clites [EMAIL PROTECTED] wrote:
And again, I don't so much object to the idea of nested namespaces--I
just feel that they'll slow down symbol lookups, without giving us 
much
in return. I'm afraid we're adding complexity we don't need.
One thing this buys you is that you can have a Perl package:
class Foo::Bar {...}
And in Python, refer to it with Python's syntax:
 bar = __Perl.Foo.Bar()
Since both of them boil down to the same thing:
 [__Perl; Foo; Bar]
True, but you'd also get this if the Perl and Python compiler authors 
decided to coordinate and compile down to the same thing, even without 
namespace nesting. And the Ruby people could decide to do the same, or 
not, as they prefer. That is, you could get this benefit even if 
Foo::Bar didn't imply the existence of a Foo namespace. (And frankly, 
you could even keep the idea that a namespace is named by an array of 
strings, without implying nesting. That wouldn't have the performance 
impact of actual nesting.)

JEff


Re: Namespaces

2004-09-13 Thread Jeff Clites
On Sep 13, 2004, at 6:38 AM, Timm Murray wrote:
On Sunday 12 September 2004 10:08 pm, Jeff Clites wrote:

I'd say that the language-level namespaces should get nice reverse-DNS
names, like [com.perl.perl5], or whatever's appropriate.
No, no, no, no.  Bad Java programmer! :)
Ha, it's actually one of the few things I like about Java
The reverse-DNS namespace sounds great until you realize that not 
everyone has
a domain name.  One could imagine some developer whittling away a toy
language and hosting off Geocities/Tripod/whatever without having a 
proper
domain name.  Or even a programmer who doesn't have an Internet 
presence for
their language at all (lots of academic languages are like this).  And 
what
do you do if your domain name changes?  Worse, what do you do if 
someone
steals your domain name?

Sun tried using reverse-DNS namespaces for Java, and in my experience, 
it
causes too many problems for edge cases.
Well, the issue is, from my perspective, that the only reason anyone 
needs namespaces at all is that humans want nice short symbol names 
like invert(), and the only way to keep people from conflicting is to have a 
distinguishing prefix. But that doesn't help unless you can keep the 
prefixes from clashing, and the only ways to do that are to either have 
ugly randomly-generated prefixes, which nobody would like, or to 
explicitly coordinate, via a registry, and DNS is already one such 
registry. It's not perfect, but all of the above problems are 
avoidable. (That is, I don't need to have network access at all--I just 
need to _register_ a domain name, and keep the registration up-to-date. 
Or, find a domain-owner who's willing to let me use a subdomain--could 
be along the lines of edu.whatever-university.project-name. And of 
course this only matters if I'm planning to distribute my software.) 
That's certainly better than starting with no coordination at all, and 
forcing people to have to deal with potential conflicts for everything.

JEff


Re: Namespaces

2004-09-12 Thread Jeff Clites
On Sep 7, 2004, at 6:26 AM, Dan Sugalski wrote:
*) Namespaces are hierarchical
So we can have [foo; bar; baz] for a namespace. Woo hoo and all 
that. It'd map to the equivalent perl namespace of foo::bar::baz.
How does this hierarchical nature manifest? I ask because I don't know 
of any languages which actually have nested namespaces, so I'm not sure 
what this is meant to imply. In particular, if this implies cascading 
lookup (meaning that a symbol looked up in [foo; bar; baz] would 
fall back to looking in [foo; bar] if it didn't find it), then 
that's not how Perl5 packages work.

If it doesn't mean that, then I can't come up with any non-contrived 
functionality that nested namespaces give us, over non-nested 
namespaces, other than slower symbol lookup.

*) Namespaces and sub-spaces can be overlaid or aliased
So code can, for example, throw a layer over the ['foo'; 'bar'; 'baz'] 
part of the namespace that gets looked at first when searching for 
something. These layers can be scoped and shifted in and out, which 
means it's possible to have two or more [IO] namespaces that have 
completely (or partially) different contents depending on which space 
is in use.
I think I know what you are getting at, though it's a bit fuzzy still. 
But it may help to point out that Perl5 at least never searches 
multiple namespaces. If a variable is not found in any enclosing 
lexical scope, then it's looked up in the current package--no effort to 
determine which namespace to look in at all, actually. This loops back 
to my worry as to whether nesting implies fallback lookups.

It's also possible to hoist a sub-space up a few levels, so that the 
[IO] space and the [__INTERNAL; Perl5, IO] namespace are the 
same thing
This seems like Tim's chroot-ing, I guess.
*) The top-level namespace [__INTERNAL] is taken. It's ours, don't 
touch in user code.

Alternate names are fine. I'm seriously tempted to make it [\0\0]
I don't see a need to name it. Since everything's relative to it, it 
should just be []. So the [foo] namespace is automatically relative 
to the root. No need to make up a reserved name (which is always 
dangerous).

*) Each language has its own private second-level namespace. Core 
library code goes in here.

So perl 5 builtins, for example, would hang off of [__INTERNAL; 
perl5] unless it wants something lower-down

*) Parrot's base library goes into [_INTERNAL; Parrot]
I'd say that the language-level namespaces should get nice reverse-DNS 
names, like [com.perl.perl5], or whatever's appropriate. This is sort 
of the modern, best way to avoid name clashes (because there's an 
authoritative definition of who owns a given name). This is 
especially important when people start implementing languages with 
multiple dialects, like Lisp.

Leopold Toetsch wrote:
Dan Sugalski [EMAIL PROTECTED] wrote:
*) Namespaces are hierarchical
What do we do against collisions?
...
  .namespace [foo ; bar ]
prohibits any foo global in the top-level namespace.
Not necessarily, but it does bring up a different problem. The "not 
necessarily" comes in because I think of a namespace as really being 
multiple lookup tables. That is, in Perl5 you can have a function 
foo, a scalar foo, an array foo, and a hash foo, all at the 
same time. So really, a lookup is typed--you have to have the 
equivalent of look up the array named 'foo' in the 'blah' namespace 
(or, look up 'foo' in the array section of the 'blah' namespace). In 
the above example, you'd be looking up foo in the namespace section 
of the root namespace, so you _could_ still have a variable or whatever 
in the top-level namespace.
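
The Perl5 situation being described, concretely:

	our $foo = 1;         # scalar foo
	our @foo = (2, 3);    # array foo
	our %foo = (pi => 4); # hash foo
	sub foo { 5 }         # sub foo--all four coexist, so a lookup has to
	                      # say which kind of foo it wants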

But, this gets tricky right away: Perl5 has (arguably) a whole bunch of 
different sections in its namespace--function v. scalar v. array v. 
...; Common Lisp has separate namespace for functions v. other symbols 
(you can have a function foo and a variable foo), but doesn't have 
segmented variable types; Scheme has a single namespace (so you only 
get one foo).

So the problem here is that it would seem that Parrot should try to be 
agnostic about this sort of thing (and support different languages with 
different styles), but things will get tricky when crossing between 
languages (eg, if Perl has $foo and @foo, which does foo resolve to 
from Scheme code; if Scheme code defines bar, where does it end up 
from the perspective of Perl). And agnostic == push the difference 
into the code gen done by the compilers == you can't optimize much, 
at least probably.

[[And you can't really side-step this in the Perl5 case by trying to 
treat the sigil as part of the variable name--for one because that's 
not really true, and for another because that would probably make 
Perl5-defined variables inaccessible from, say, Python code.]]

Tim Bunce said:
I think a filesystem analogy is a very helpful one. Many of the
issues are similar and there's a lot of experience with how to
address them in creative ways.
I'd suggest adding:
*) Each interpreter has a 

Re: Bit ops on strings

2004-05-26 Thread Jeff Clites
On May 26, 2004, at 2:02 AM, Nicholas Clark wrote:
On Tue, May 25, 2004 at 07:48:32PM -0700, Jeff Clites wrote:
On May 25, 2004, at 12:26 PM, Dan Sugalski wrote:

Yup. UTF8 is Just another variable-width encoding. Do anything with 
it
and we convert it to a fixed-width encoding, in this case UTF32.
This has the unfortunate side-effect of wasting 50-75% of the storage
space in the common cases, of course.
True. But variable length encodings suck performance wise.
Yes--that was the point I made previously in this thread. But my 
proposed scheme was neither variable length nor egregiously wasteful of 
space.

The only thing that might be useful to cache on a UTF8 string is the 
highest
code point seen, so that we know whether to unpack to 8, 16 or 32 bit 
without
a scan. Presumably we can find this when we input validate on the
conversion from binary to UTF8.
This is basically what I implemented.
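
That is, with the highest code point cached at validation time, 
picking the fixed width becomes a constant-time decision--something 
along these lines (a sketch, not the actual code; $max_code_point is 
the cached value):

	my $bytes_per_char = $max_code_point <= 0xFF   ? 1
	                   : $max_code_point <= 0xFFFF ? 2
	                   :                             4;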
JEff

