Re: Lexical implementation work

2002-01-28 Thread Nicholas Clark

On Sat, Nov 10, 2001 at 12:37:24PM -0500, Dan Sugalski wrote:
 I think we're going to switch over to some sort of key creation op, but I'm 
 not sure at the moment. Constant keys are easy, of course--they can be 
 thrown up into the constants section, built at compile-time.

Do constants with precomputed hashes buy us enough to overcome the extra
memory needed to store the integer for the hash as well as the string?

On and off I've been playing with this in perl5, but it seems to be
impossible to make a benchmarkable difference (either faster *or* slower).
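
For concreteness, a constant key with a precomputed hash might look
something like this -- ConstKey and key_equal are invented names,
purely to illustrate the space/time trade being asked about:

    #include <stddef.h>
    #include <string.h>

    typedef struct ConstKey {
        const char   *str;   /* key text, in the constants section */
        size_t        len;   /* its length in bytes */
        unsigned long hash;  /* hash computed once, at compile time */
    } ConstKey;

    /* Compare a constant key against a hash bucket entry.  The
     * precomputed hash rejects most non-matches with one integer
     * compare; the cost is one extra word stored per key. */
    static int key_equal(const ConstKey *k, unsigned long entry_hash,
                         const char *entry_str, size_t entry_len)
    {
        if (k->hash != entry_hash || k->len != entry_len)
            return 0;
        return memcmp(k->str, entry_str, entry_len) == 0;
    }

Whether that extra word per key pays for itself is exactly the
benchmarking question.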

Nicholas Clark
-- 
EMCFT http://www.ccl4.org/~nick/CV.html



Re: Lexical implementation work

2002-01-28 Thread Dan Sugalski

At 11:11 PM +0000 1/28/02, Nicholas Clark wrote:
On Sat, Nov 10, 2001 at 12:37:24PM -0500, Dan Sugalski wrote:
  I think we're going to switch over to some sort of key creation op, but I'm
  not sure at the moment. Constant keys are easy, of course--they can be
  thrown up into the constants section, built at compile-time.

Do constants with precomputed hashes buy us enough to overcome the extra
memory needed to store the integer for the hash as well as the string?

Good question, and I'm not sure. I expect that for some it will--such 
as AUTOLOAD or ISA which are used a lot--and for most it won't 
matter. But the frequently accessed entries can be gotten even faster 
if we reserve slots for them in the various hashes and access them by 
offset instead.
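
A sketch of the reserved-slot idea (every name below is invented;
nothing like this exists in the tree yet):

    /* A stash with a few hot entries pinned at fixed offsets. */
    enum { SLOT_ISA, SLOT_AUTOLOAD, SLOT_DESTROY, NUM_FIXED_SLOTS };

    typedef struct Stash {
        void *fixed[NUM_FIXED_SLOTS]; /* hot entries, by offset */
        /* ... ordinary hash table for everything else ... */
    } Stash;

    /* Looking up ISA becomes one indexed load, no hashing at all. */
    #define STASH_ISA(s) ((s)->fixed[SLOT_ISA])
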
-- 
 Dan

--it's like this---
Dan Sugalski  even samurai
[EMAIL PROTECTED] have teddy bears and even
   teddy bears get drunk



Re: Lexical implementation work

2001-11-17 Thread Dan Sugalski

At 10:10 PM 11/13/2001 -0500, Ken Fox wrote:
QUESTIONS!

Who owns the bytecode format? How do I propose changes?

Nobody in particular at the moment; post your change proposals to
the list.

I need
a scope definition section. Each scope is assigned a per-module
id. I'm not sure what info is needed yet, but certainly a size
and a code ref (opcode address) for the DESTROY sub.

For scopes you'll probably also need a template lexical scratchpad ID.

The control stack isn't used for much

yet. :)

and it would simplify my
code a lot if we combine the frame stack with the control stack.

The control stack is for things the interpreter needs to track. More will 
go on it--local() restore information, scope markers (for scope cleanup), 
scoped locks, possibly exception handler markers (though I'm not sure about 
that yet)--a fair amount of stuff. It would seem appropriate for your frame 
information to go on it.
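
Put another way, the control stack would end up holding a mix of
tagged entries, along these lines (a sketch only; the tags and
layout are all invented for illustration):

    typedef enum {
        CTRL_LOCAL_RESTORE, /* value to put back when local() unwinds */
        CTRL_SCOPE_MARKER,  /* marks scope entry, for cleanup on exit */
        CTRL_SCOPED_LOCK,   /* lock to release when the scope exits */
        CTRL_EXC_HANDLER,   /* exception handler marker (tentative) */
        CTRL_FRAME_INFO     /* lexical frame bookkeeping */
    } CtrlTag;

    typedef struct CtrlEntry {
        CtrlTag tag;        /* what kind of entry this is */
        void   *data;       /* payload, interpreted per tag */
    } CtrlEntry;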

The only down-side I think will be with hand-coded assembler.

I wouldn't worry too much about the hand-coded stuff.

Anybody care if I subsume the control stack?

No. Or, rather, use the control stack for what you're doing.

Lastly, does anybody care if I change how assembler directives
are registered?

No. I like the leading period directives. It's either that or ALL-UPPERCASE 
directives, and that's kind of ugly.


Dan

--it's like this---
Dan Sugalski  even samurai
[EMAIL PROTECTED] have teddy bears and even
  teddy bears get drunk




RE: Lexical implementation work

2001-11-14 Thread Dan Sugalski

At 04:12 AM 11/11/2001 -0500, James Mastros wrote:
On Sat, 10 Nov 2001, Dan Sugalski wrote:
  At 01:39 PM 11/9/2001 -0800, Brent Dax wrote:
  Dan Sugalski:
  Of course.  Random question only very tangentially related to this: is
  INTVAL (and thus the I registers) supposed to be big enough to hold a
  pointer?
  INTVAL shouldn't ever get a pointer in it. We're not going to be casting
  pointers to ints and back anywhere, except possibly in some of the
  deep'n'evil spots (like the memory allocator).
Correction (and please correct this correction if I'm wrong):
An INTVAL should never get a /native/ pointer in it.  However, when we do
relative or absolute jumps in parrot code, the destination is an INTVAL.

Also, there's a good chance that PMC constants or non-constants may at some
point be native pointers, and it would probably help efficiency if
sizeof(INTVAL)==sizeof(PMC), no?

Nope. PMCs that hold pointers should stash them in the pointer slot of the 
PMC structure. (If I said we were requiring that to be a pointer to a PMC 
or a buffer structure, I relent--it must for GC, but if the PMC is tagged 
appropriately so the GC won't follow it, it can point to anything)
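
Roughly the shape being described, with the field names made up for
illustration:

    typedef struct PMC {
        struct VTABLE *vtable;  /* per-type function table */
        unsigned long  flags;   /* includes a "don't follow" GC bit */
        void          *data;    /* the pointer slot: a PMC or buffer
                                   pointer unless flagged otherwise */
    } PMC;

    #define PMC_private_ptr_FLAG 0x01 /* GC must not traverse data */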

Dan

--it's like this---
Dan Sugalski  even samurai
[EMAIL PROTECTED] have teddy bears and even
  teddy bears get drunk




Re: Lexical implementation work

2001-11-13 Thread Ken Fox

Dan Sugalski wrote:
 Nope, not stone tablet at all. More a sketch than anything else,
 since I'm not sure yet of all the things Larry's got in store.

Ok. I've made some more progress. There's a crude picture of
some of the internals at http://www.msen.com/~fox/parrotguts.png
The lexical stuff is at the bottom of the picture.

I've patched the assembler to accept syntax like:

.begin
  .reserve 1
  fetch_lex I0, 0, 0
.end

This is nice syntax for use in hand-coded assembler. I assume
that a compiler will take control of the scope definitions and
emit the enter/exit scope ops itself. (I'm throwing in the more
complicated sync_scope op too so a compiler can handle weird
jumps between scopes -- IMHO it's simpler to do this than provide
an introspective API that lets the compiler manually adjust
the scope stack.)
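
In pseudo-C, sync_scope would behave roughly like this (the helper
functions are placeholders; a scope counts as its own ancestor, so a
jump within the current scope is a no-op):

    typedef struct Interp Interp;

    extern int  current_scope(Interp *);
    extern int  scope_is_ancestor(int maybe_ancestor, int scope_id);
    extern void do_exit_scope(Interp *);
    extern void enter_scopes_down_to(Interp *, int target_scope_id);

    /* Make the runtime scope stack agree with the static scope that
     * contains the jump target: unwind (running cleanups) until the
     * current scope is an ancestor of the target, then enter every
     * scope between that ancestor and the target. */
    void sync_scope(Interp *interp, int target_scope_id)
    {
        while (!scope_is_ancestor(current_scope(interp), target_scope_id))
            do_exit_scope(interp);
        enter_scopes_down_to(interp, target_scope_id);
    }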

There's pseudo-code for the ops attached below. I'm probably
a couple of evenings away from having things working.

Unless people really hate the idea, I'm going to put 'v'
operand types into ops2c.pl. They'll look like 'c' types (e.g. 'nc'),
but reference the current lexical scope instead of the constant
section.

QUESTIONS!

Who owns the bytecode format? How do I propose changes? I need
a scope definition section. Each scope is assigned a per-module
id. I'm not sure what info is needed yet, but certainly a size
and a code ref (opcode address) for the DESTROY sub.

The control stack isn't used for much and it would simplify my
code a lot if we combine the frame stack with the control stack.
The only down-side I think will be with hand-coded assembler.
There may be a redundant restore scope pointer on the stack
when calling a sub in the same scope. (This is impossible with
Perl code BTW.) Well, there's also a purist downside too --
mixing opcode addresses with lexical storage. This is the thing
that makes capturing a closure easier, so I see it as an
advantage.

Anybody care if I subsume the control stack?

Lastly, does anybody care if I change how assembler directives
are registered? I think there are going to be a whole bunch of
symbol table and debugging directives going in. The current if
tests are kind of clunky.

Here's the op definitions (in pseudo-code form) taken from the
copy of core.ops I'm working on:

--- parrot/core.ops Wed Nov  7 22:34:10 2001
+++ my-parrot/core.ops  Tue Nov 13 21:13:53 2001
@@ -1930,6 +1930,159 @@
 
 ###
 
+=head2 Lexical Variables
+
+These opcodes implement lexical scopes and PMC lexical variables.
+This functionality is only concerned with scoping and storage,
+not with the symbol tables used for translating variable names to
+storage. Compilers often use the same symbol table for managing
+both lexical and dynamic variables.
+
+First, the terminology:
+
+Lexical Scope: A contiguous region of text (code) bounded by
+.begin/.end scope directives. A lexical scope may contain any
+number of non-overlapping interior (child) scopes. The flow
+control of the code does not affect these scoping rules.
+
+Lexical Variable: A variable that is visible only to code in the
+lexical scope, or an interior scope thereof, in which the variable
+is defined.
+
+Frame: A group of lexical variables defined in a lexical scope.
+The frame does not include the lexical variables defined in
+interior scopes -- each interior scope requires its own frame.
+
+Frame ID: The position of a frame relative to either the current
+scope (non-positive frame IDs), or the outer-most scope (positive
+IDs).
+
+Variable ID: The non-negative position of a variable relative to
+the start of a frame.
+
+Scope ID: A unique positive number assigned by the assembler or
+compiler to each lexical scope. Information about the scope, such
+as how much space is required for a frame, is retrieved using the
+scope ID.
+
+=over 4
+
+=item Bfetch_lex(out p variable, i|ic frame_id, i|ic variable_id)
+
+=item Bstore_lex(i|ic frame_id, i|ic variable_id, p variable)
+
+Note: While the PMC code is being developed, lexicals hold integers
+instead of PMCs. This changes the usage of lexicals because PMC
+lexicals will not need to be stored back to the frame.
+
+=item Benter_scope(ic scope_id)
+
+=item Bexit_scope()
+
+=item Bsync_scope(ic scope_id)
+
+=item B.begin [ID]
+
+B.begin is a pseudo-op that does two things: it begins a lexical
+scope at the current position in the code and it emits an
+Benter_scope op for the current scope. If ID is not provided, the
+assembler will generate one automatically.
+
+=item B.reserve variable_count
+
+=item B.end
+
+B.end is a pseudo-op that does two things: it ends the current
+lexical scope (returning to the enclosing lexical scope) and it
+emits an Bexit_scope op.
+
+=back
+
+=cut
+
+AUTO_OP fetch_lex(i, i|ic, i|ic) {
+  /*
+  int frame_id = $2;
+  int variable_id = $3;
+
+  if (frame_id <= 0) {
+    frame_id += interpreter->frame_display->used;
+  }
+
+  $1 = 

Re: Lexical implementation work

2001-11-12 Thread Jason Gloudon

On Sun, Nov 11, 2001 at 08:57:15PM -0800, Brent Dax wrote:

 You get the idea?  And as for multidimensional stuff, what's wrong with:
 
   fetchlex P1, @lol
   fetchary P2, P1, 1
   fetchary P3, P2, 2
   #...

Consider (from exegesis 2):

my int @hit_count is dim(100,366,24);

This array is supposed to contain bare integer values as opposed to scalars in
order to be memory efficient. It has been said that there would be a single PMC
which references the appropriate block or blocks of memory.

Also, any array access, for example @hit_count[5][5][6], has to be done
using the vtable of @hit_count, as there is no scalar for each entry.
Hence the variable number of keys.

 Yes, I know, some people don't like that multidimensional arrays are
 internally arrays of arrays, but that doesn't necessarily need to be
 obvious to the outside world, does it?  Internally, the nested arrays
 could have tweaked vtables to do whatever they ought to do.

You potentially make all multidimensional arrays slower this way, if you HAVE
to do a memory lookup for each dimension to find an entry, as opposed to
directly calculating its offset in a contiguous block of memory.
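
For the @hit_count declaration above, the contiguous layout turns an
access into one multiply-add chain and a single load, along these
lines (a sketch in plain C, not Parrot code):

    /* my int @hit_count is dim(100,366,24), laid out row-major */
    static int hit_count[100 * 366 * 24];

    static int fetch(int i, int j, int k)
    {
        /* offset = ((i * 366) + j) * 24 + k: no per-dimension
         * pointer chase, unlike an array-of-arrays layout */
        return hit_count[((i * 366) + j) * 24 + k];
    }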

-- 
Jason



Re: Lexical implementation work

2001-11-12 Thread Michael L Maraist

On Friday 09 November 2001 03:36 pm, Dan Sugalski wrote:
     Do we want non-PMC lexical support?

 Nope, I wasn't going to bother. All variables are PMCs. The int/string/num
 things are for internal speed hacks.

But can't a compiler generate one of these internal hacks?  My thoughts are 
that a given scope cannot return internal representations (like ints / 
strings), and must encapsulate them inside scalars, BUT the compiler may 
optimize the inside of a scope.  If no evals occur inside the block (rather 
common), then the lexicals aren't volatile.  If no divides are ever performed 
on a scalar, nor is it stringified, then the compiler might be able to 
assume it's a raw integer (whether it was declared my $foo as int 
or not).  This also assumes that the would-be-integer does not interact with 
any other scalars.  I believe there are op-codes that assign the value of 
primitives into a scalar (there would have to be), so scalars and primitives 
can still interact.  Since this is an optimization, I agree that its 
usefulness is limited.

From the above, the only uses I can see for declaring my $foo as int are to 
set flags (or utilize different vtables) to enforce integerness, and to tell 
the optimizer that it's ok to use a primitive integer even if the block 
contains a divide.  Note that:
$foo = int( $bar / $baz )
should have the same effect.

End result is that this is a problem that can be put off.

-Michael



RE: Lexical implementation work

2001-11-10 Thread Dan Sugalski

At 01:39 PM 11/9/2001 -0800, Brent Dax wrote:
Dan Sugalski:
# At 12:39 AM 11/9/2001 -0500, Ken Fox wrote:
# 3. We've adopted a register machine architecture to
# reduce push/pop stack traffic. Register save/load
# traffic is similar, but not nearly as bad.
# 
# Do we want to further reduce memory moves by allowing
# ops to work directly on lexicals?
#
# No, I don't think so--that's what the registers are for.
# Fetch out the PMC
# pointer into a PMC register and just go from there. Any
# changes made via
# the pointer will, of course, be reflected in the lexical, since we're
# working on the real thing.

Here's a thought--do we want to have variants on every PMC op to support
a key?  IIRC, assembly.pod proposes something like:

I think we're going to switch over to some sort of key creation op, but I'm 
not sure at the moment. Constant keys are easy, of course--they can be 
thrown up into the constants section, built at compile-time.

 fetchlex P1, %foo
 add P3, P1, key, 1

Why not just have:

 fetchlex P1, %foo #or whatever
 fetchhash P2, P1, key
 add P3, P2, 1

and save ourselves a few opcode numbers?

Why save the numbers? Also we'll end up needing variable-arg-count opcodes 
for proper multidimensional fetching.

# 4. There has been discussion about how useful non-PMC
# registers will be to the Perl compiler. If there are
# questions there, surely non-PMC lexicals would be even
# less useful to Perl.
# 
# Do we want non-PMC lexical support?
#
# Nope, I wasn't going to bother. All variables are PMCs. The
# int/string/num
# things are for internal speed hacks.

You may want to bounce that off the -language people--they seem to be
expecting that 'my int $foo' will only take up a pad entry and an
int-sized chunk of memory.

They can think what they like, but it's not going to happen for plain 
scalars. There's not much point, really--the space savings comes in with 
array and hash storage. The typical program doesn't have that many plain 
scalars.

# 5. Perl 5 uses pads and scratchpads for holding lexicals.
# 
# Do we want to consider other implementations such as
# traditional stack frames with a display or static
# link?
#
# Well, I don't think we can go with traditional stack frames
# as such, since
# the individual frames (and their children) may need to stick
# around for
# ages, courtesy of closures and suchlike things. (We almost
# end up with a
# linked list of stacks when you factor recursion in)

With closures, don't we just clone the PMC pointer and pad entry, thus
avoiding having to keep stack frames around until the end of time?  Or
am I missing something?

Bad explanation. What I meant was we need to keep stacks of call frames 
around, or something of the sort. If you've got a recursive set of calls 
and create a closure inside them, the closure only captures the top-most 
version of all the variables. (Basically the current pads for each block) 
You still need to keep all of 'em around and available, in case at runtime 
something walks up them with caller and MY.

We're going to keep the full pads around for closures, otherwise you set 
yourself up for problems with evals inside them.
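
A sketch of the pad arrangement that implies (all names invented for
illustration):

    typedef struct PMC PMC;

    /* Each block entry allocates a fresh pad; a closure pins the
     * chain that was current when it was created, so every depth of
     * a recursion captures its own set of variables. */
    typedef struct Pad {
        struct Pad *outer;  /* enclosing scope's pad (static link) */
        int         n_slots;
        PMC        *slot[]; /* this scope's lexicals, by offset */
    } Pad;

    typedef struct Closure {
        void *code;         /* opcode address of the sub's body */
        Pad  *pad;          /* captured pad chain, kept live by GC */
    } Closure;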

# Lexical implementation is as critical to
# Perl's performance as the dispatcher is, so we should
# take some time to get it right.
#
# I'm not sure it's as performance critical, but it's
# definitely important.
# Fast is, of course, good. :)

Of course.  Random question only very tangentially related to this: is
INTVAL (and thus the I registers) supposed to be big enough to hold a
pointer?

INTVAL shouldn't ever get a pointer in it. We're not going to be casting 
pointers to ints and back anywhere, except possibly in some of the 
deep'n'evil spots (like the memory allocator).

Dan

--it's like this---
Dan Sugalski  even samurai
[EMAIL PROTECTED] have teddy bears and even
  teddy bears get drunk




Lexical implementation work

2001-11-08 Thread Ken Fox

Simon just chastised me for talking instead of patching.
Most of the stuff I'm interested in is heavily related to
the implementation of lexicals, so that's where I'm going
to jump in.

There are a number of decisions to make about lexicals
and the current PDD is pretty slim. So, IMHO, the place
to start is with the PDD. Anybody have any more thoughts
on the interface? Dan? Is this stone tablet stuff yet?

The big five questions on my list are:

1. The find_lex/fetch_lex ops push a lot of work into the
   run-time when the compiler can figure out what frame a
   lexical appears in.

   Should there be an alternative, faster interface to
   lexicals that eliminates string lookups? The symbolic
   interface is required to support %MY and debugging, so
   it stays regardless of whether other interfaces exist.

2. Perl 5 doesn't support nested subs, but I haven't read
   anything about Perl 6 and I don't know if other
   targets for Parrot support them either.

   Do we want to support nested subs? Efficiently?

3. We've adopted a register machine architecture to
   reduce push/pop stack traffic. Register save/load
   traffic is similar, but not nearly as bad.

   Do we want to further reduce memory moves by allowing
   ops to work directly on lexicals?

4. There has been discussion about how useful non-PMC
   registers will be to the Perl compiler. If there are
   questions there, surely non-PMC lexicals would be even
   less useful to Perl.

   Do we want non-PMC lexical support?

5. Perl 5 uses pads and scratchpads for holding lexicals.

   Do we want to consider other implementations such as
   traditional stack frames with a display or static
   link?

I'm leaning towards yes on all questions.
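
For question 5, a display-based lookup would be one candidate worth
benchmarking. Roughly (names invented, nothing settled):

    typedef struct PMC PMC;

    typedef struct Frame {   /* lexicals of one scope */
        int   n_slots;
        PMC  *slot[];        /* indexed by compile-time variable id */
    } Frame;

    typedef struct Display { /* live frame at each nesting depth */
        int    used;         /* current static nesting depth */
        Frame *frame[64];    /* fixed cap, for the sketch only */
    } Display;

    /* A lexical fetch becomes two indexed loads and no string
     * lookup; the cost is keeping the display current on every
     * scope entry and exit. */
    static PMC *fetch(Display *d, int frame_id, int variable_id)
    {
        return d->frame[frame_id]->slot[variable_id];
    }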

If there aren't any obvious answers, I propose that we
implement a couple different approaches and objectively
compare them. Lexical implementation is as critical to
Perl's performance as the dispatcher is, so we should
take some time to get it right.

- Ken