RE: Bytecode metadata

2003-01-23 Thread Brent Dax
Dan Sugalski:
# Since it looks like it's time to extend the packfile format and the 
# in-memory bytecode layout, this would be the time to start discussing 
# metadata. What sorts of metadata do people think are useful to have 
# in either the packfile (on disk) or in the bytecode (in memory).

I do think that, whatever "native" (i.e. understood by Parrot) metadata
we support, we *must* allow for extensibility, both for future native
metadata and for third-party tools.  Moreover, this must not be
implemented with a special type of metadata block, or by using
sequentially-increasing numbers.  (The first means that any metadata we
decide to add in the future will be slower than the metadata we add now;
the second has problems with several third-party tools picking the same
number.)
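One way to meet both of Brent's constraints is sketched below in C. The assumptions are mine, not anything in the packfile format. Every metadata chunk carries a string tag in a reverse-domain style, and both the tags and the `MetaChunk` type are hypothetical. With this scheme, native and third-party metadata go through the same lookup, and two tools cannot collide by picking the same small integer.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical metadata chunk, identified by a string tag rather than a
 * sequentially assigned number. Independent tools cannot collide as long
 * as each uses a tag under a name it controls. */
typedef struct {
    const char *tag;           /* e.g. "org.parrot.debuginfo" (illustrative) */
    const unsigned char *data; /* opaque payload, interpreted by its owner   */
    size_t size;
} MetaChunk;

/* Native and third-party chunks use the same lookup, so metadata added
 * later is no slower to find than metadata defined today. */
const MetaChunk *find_chunk(const MetaChunk *chunks, size_t n, const char *tag)
{
    for (size_t i = 0; i < n; i++) {
        if (strcmp(chunks[i].tag, tag) == 0)
            return &chunks[i];
    }
    return NULL; /* absent or unknown: a reader is free to ignore it */
}
```

A real loader would probably hash the tags rather than scan them. The linear scan just keeps the sketch short.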

--Brent Dax <[EMAIL PROTECTED]>
@roles=map {"Parrot $_"} qw(embedding regexen Configure)

>How do you "test" this 'God' to "prove" it is who it says it is?
"If you're God, you know exactly what it would take to convince me. Do
that."
--Marc Fleury on alt.atheism





Re: [PATCH] nci.pmc mark routine

2003-01-23 Thread Leopold Toetsch
Steve Fink wrote:

> I'm confused by nci.pmc's mark() routine. It calls pobject_lives() on
> the ->cache.struct_val pointer. But in set_string_keyed(), that seems
> to be set to a pointer to a function, which is definitely not a PObj*.

Yes, that's right.

> The ->data field, on the other hand, appears to be a PObj*.

No, the data field is either the statically built native calling thunk
from nci.c:build_call_func() or a malloced thunk from the JIT's equivalent.

> ... Is the below patch correct?

Almost :-) No: in the end there is no PMC to mark at all, so the line

PObj_custom_mark_SET(SELF);

is wrong (and makes the mark routine obsolete).

Thanks for that analysis,

leo






Re: Transferring control between code segments, eval, and suchlike things

2003-01-23 Thread Leopold Toetsch
Dan Sugalski wrote:


> Okay, since this has all come up, here's the scoop from a design 
> perspective.

Hard stuff: it met my printer at midnight, after reading it onscreen
twice didn't help ;-)

First:

Definition #0: A bytecode segment is a sequence of code that is loaded
into memory with no execution of that code interspersed. So all subs,
modules, or whatever else is loaded from umpteen files may form one code
segment, *if* the runloop hasn't been entered. Put the other way round:
as soon as code is running, loading additional bytecode puts that code
into a different bytecode segment.


> Design Edict #1: Branches, which is any transfer of control that takes 
> an offset, may *not* escape the current bytecode segment.

> Design Edict #2: Jumps may go anywhere.

> Design Edict #3: All destinations *must* be marked as such in the 
> bytecode metadata segment. (I am officially nervous about this, as I can 
> see a number of ways to subvert this for evil)


I would define it this way: jumps may go to any location acquired via a
set_addr call, or to branch tables. Jumping anywhere else may kill your dog.

Jumps to a set_addr label are already recognized. Jump tables will
probably need some marker around them, so that their jump targets don't
get removed by dead code elimination.


> I'm only keeping jumps (and their corresponding jsr) around for 
> nostalgic reasons, and with the vague hope they may be useful. I'm not 
> sure about this.


They would be useful for a computed goto.


s/compreg/compile/g for($below);



> The compreg op should compile the passed code ...

> Design Edict #7: the compreg opcode will execute the compiled code, 
> calling in with parrot's calling conventions. If it should return 
> something, then it had darned well better build it and return it.


If the compile opcode has to execute the code, I would call it "eval".

But when compile and eval are separate stages, the HLL compiler may be
able to pull the compile stage out of, e.g., loops. So I think keeping
compiling and evaluating separate makes sense.
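Leo's point about hoisting the compile out of a loop can be shown with a toy C sketch. `compile_code`, `eval_segment` and `CodeSegment` are hypothetical stand-ins, not Parrot ops. The "compiler" just returns a fixed function pointer. The only point is that with separate stages, the compile happens once outside the loop while the eval runs on every iteration.

```c
/* Hypothetical compiled-code object, standing in for a bytecode PMC. */
typedef struct {
    int (*entry)(int);
} CodeSegment;

static int compile_count = 0; /* how often the expensive step ran */

static int increment(int x) { return x + 1; }

/* Stand-in for an expensive compile; a real one would translate src. */
CodeSegment compile_code(const char *src)
{
    (void)src;
    compile_count++;
    CodeSegment seg = { increment };
    return seg;
}

/* Stand-in for eval: runs code that has already been compiled. */
int eval_segment(const CodeSegment *seg, int arg)
{
    return seg->entry(arg);
}

/* The compile is hoisted out of the loop; only the eval runs n times. */
int run_loop(const char *src, int n)
{
    CodeSegment seg = compile_code(src);
    int acc = 0;
    for (int i = 0; i < n; i++)
        acc = eval_segment(&seg, acc);
    return acc;
}
```

With a combined compile-and-execute op, the same loop would pay for the compile on every iteration.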


Thanks for putting this together,
leo




Re: Transferring control between code segments, eval, and suchlike things

2003-01-23 Thread Leopold Toetsch
Benjamin Stuhl wrote:

> At 03:00 PM 1/22/2003 -0500, you wrote:
> ... Although,
> all this would seem to suggest that we'd need/want a special-purpose
> allocator for bytecode segments, since every sub has to fit within
> precisely one segment (and I know _I'd_ like to keep bytecode segments
> on their own memory pages, to e.g. maximize sharing on fork()).

IMHO this is a big waste of memory, and page-aligning code that runs
JITted doesn't buy anything.

>> Design Edict #7: the compreg opcode will execute the compiled code,
>> calling in with parrot's calling conventions. If it should return
>> something, then it had darned well better build it and return it.
>
> How does this play with
>
> eval 'sub bar { change_foo(); } BEGIN { bar(); }  (...stuff that depends on foo...)';
>
> ? The semantics of BEGIN{} would seem to require that bar be installed
> into the symbol table immediately... but then how do we reproduce that
> if we're e.g. loading precompiled bytecode?

Precompiled PBC plus eval is a PITA. This issue seems to imply some extra
parsing and symbol setup at load time. I don't know yet how to handle
this.

leo



Re: Parrot Developer Day(s)?

2003-01-23 Thread Leon Brocard
Dan Sugalski sent the following bits through the ether:

> Also, I know that we do have people scattered all over the world, but 
> if someone wants to try and get a list of who's where, we may find 
> it's worth it to get groups of people together. (I don't, after all, 
> have to be involved... :)

Right, I'm going to organise a sort of world map of Parrot
developers. If you consider yourself a Parrot developer (i.e. you write
Parrot code, patches, assembler, whatever; the more the merrier!),
please email me *privately* with the following form filled in:

Name:
Email:
Nickname:
City:
County/State:
Country:
Latitude:
Longitude:

The latitude and longitude should be given as +-DDD.DDD, that is, in
decimal degrees. For example, London, UK would be (51.500, -0.083).

Useful resources for finding your longitude, latitude:
http://www.getty.edu/research/tools/vocabulary/tgn/
http://www.ckdhr.com/dns-loc/finding.html
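If your source gives degrees, minutes and seconds instead, the decimal form is deg + min/60 + sec/3600. The result is negated for south latitudes and west longitudes. Here is a small C helper (the name is just illustrative):

```c
/* Convert degrees/minutes/seconds plus a hemisphere letter
 * ('N', 'S', 'E' or 'W') to signed decimal degrees. */
double dms_to_decimal(int deg, int min, double sec, char hemisphere)
{
    double value = deg + min / 60.0 + sec / 3600.0;
    /* South and west are negative in the +-DDD.DDD convention. */
    if (hemisphere == 'S' || hemisphere == 'W')
        value = -value;
    return value;
}
```

For example, 51°30'N, 0°05'W converts to (51.5, about -0.0833), which rounds to the (51.500, -0.083) pair given above.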

Thanks! Leon
-- 
Leon Brocard    http://www.astray.com/
scribot    http://www.scribot.com/

... Famous last words - You and what army?



Re: [PATCH] nci.pmc mark routine

2003-01-23 Thread Leopold Toetsch
Steve Fink wrote:

> I'm confused by nci.pmc's mark() routine. 

Ok, nci's mark() is gone.

But, and this is what confuses me, this patch needs a "make progclean"
for the changes to take effect. Without it, default_mark gets called,
because something (I don't know where) isn't recompiled.

Argh: pmc->vtable->init or ->set_string_keyed() is a totally static
thing. This implies that after changing a vtable function, every file
that uses a PMC of this type has to be rebuilt. Our dependencies don't
reflect this case.

leo



Re: Parrot Developer Day(s)?

2003-01-23 Thread Nicholas Clark
On Thu, Jan 23, 2003 at 10:23:22AM +, Leon Brocard wrote:
> Latitude:
> Longitude:

You forgot altitude. A proper ICBM block needs altitude :-)

> Useful resources for finding your longitude, latitude:

If you're in the UK you can get lat and long conversions from
http://www.streetmap.co.uk/
That's handy, as they let you look up locations by postcode, OS grid
reference and lots of other things. I'm waiting to see how many people
this is useful for.

Nicholas Clark



Re: Objects, finally (try 1)

2003-01-23 Thread Gopal V
If memory serves me right, Erik Bågfors wrote:
> > :-) Python basically requires that each step in the process be
> > overridable. (1. look up attribute 2. call attribute, at least in
> > `callmethod's case).

Would this be more like what you need?

obj.__dict__["foo"].__call__();

/me again shows up and says that the compiler designer can do this with
ease ... Or in this case the interpreter designer can implement an
``InvocationExpression'' in any way they want ...

I think Jython would be a fine example of how this could be done (though
won't speed suffer from a hash lookup without engine support?).

> Ruby needs to call the missing_method method (if I remember correctly). 
> So if "foo" doesn't exist, it would be good to be able to override
> callmethods behavior and make it call missing_method.

Like I said, the compiler designer can put that explicitly in the
generated code ... You don't actually need instructions to do that.
Also, explicit generation might prove better at handling all the
quirks future languages might encounter.
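Gopal's alternative puts the fallback in code the compiler emits, not in the VM. It might look like the C sketch below. `Object`, `lookup_method` and the two `call_*` functions are hypothetical. The fallback name follows the thread; Ruby's actual hook is spelled `method_missing`. The second function shows the catch Erik points out in his reply: code compiled without the fallback never reaches it.

```c
#include <stddef.h>
#include <string.h>

typedef int (*Method)(void);

/* Hypothetical object: a small table of named methods. */
typedef struct {
    const char *names[4];
    Method      impls[4];
    size_t      count;
} Object;

/* Step 1, like obj.__dict__["foo"]: find the method, or NULL. */
Method lookup_method(const Object *obj, const char *name)
{
    for (size_t i = 0; i < obj->count; i++)
        if (strcmp(obj->names[i], name) == 0)
            return obj->impls[i];
    return NULL;
}

/* What a compiler for a language with a missing-method hook might emit:
 * lookup, explicit fallback, then the call (like .__call__()). */
int call_with_fallback(const Object *obj, const char *name)
{
    Method m = lookup_method(obj, name);
    if (m == NULL)
        m = lookup_method(obj, "missing_method");
    return m ? m() : -1; /* -1: neither the method nor a fallback exists */
}

/* What a compiler for a language without that hook might emit. */
int call_plain(const Object *obj, const char *name)
{
    Method m = lookup_method(obj, name);
    return m ? m() : -1;
}

/* Example methods for the usage below. */
static int foo_impl(void) { return 1; }
static int missing_impl(void) { return 99; }
```

Take an object whose table holds `foo_impl` under "foo" and `missing_impl` under "missing_method". For it, `call_with_fallback(&o, "bar")` returns 99, while `call_plain(&o, "bar")` returns -1.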

My interest here is to obtain a clear and fast way to call stuff for
static compiled languages. :) 

Gopal
-- 
The difference between insanity and genius is measured by success



Re: Objects, finally (try 1)

2003-01-23 Thread Erik Bågfors
On Thu, 2003-01-23 at 15:46, Gopal V wrote:

> > Ruby needs to call the missing_method method (if I remember correctly). 
> > So if "foo" doesn't exist, it would be good to be able to override
> > callmethods behavior and make it call missing_method.
> 
> like I said , the compiler designer can put that explicitly in the 
> generated code ... You don't actually need instructions to do that.
> Also the explicit generation might prove to be better to handle all
> the quirks future languages might encounter

Sure, there is only one problem with that.  I don't know if it's a real
problem or not.

But if I write a library in ruby that depends on the missing_method
method, it will not be usable from other languages, since those languages
don't call missing_method if the method they try to call doesn't
exist.

Of course, in real life I don't think that's a problem because I haven't
seen much use of missing_method.

Also, having an instruction would be faster, which of course is more fun
:)

> My interest here is to obtain a clear and fast way to call stuff for
> static compiled languages. :) 

But the really interesting thing about parrot is that it is primarily
made for very dynamic languages.  Personally I think it's quite ok if C#
is a little bit slower under parrot than under mono/dotgnu/MS.NET, as
long as the dynamic languages are as fast or faster than they are now.

/Erik

-- 
Erik Bågfors   | [EMAIL PROTECTED]
Supporter of free software | GSM +46 733 279 273
fingerprint:  A85B 95D3 D26B 296B 6C60 4F32 2C0B 693D 6E32



Re: Objects, finally (try 1)

2003-01-23 Thread Gopal V
If memory serves me right, Erik Bågfors wrote:

> But if I write a library in ruby that depends on the missing_method
> method it will not be usable from other languages, since those languages
> doesn't call missing_method if the method they try to call doesn't
> exist.

Hmm... that's mixing up language features with virtual machine
instructions. Actually that's a gray area as far as I can see ... we'll
end up with n member-resolution schemes for n languages ...

> Of course, in real life I don't think that's a problem because I haven't
> seen much use of missing_method.

Unfortunately I use a lot of __getattr__ in my Python code (especially
GUI code) ...

> Also, having a instruction would be faster which of course is more fun
> :)

Yes... But wouldn't this make it not only ugly (as an instruction), but
slow as well? Like you said, only a small number of people use this
feature; does it make sense to slow down the rest? And how does hacker
ZZ add a new language with a different member lookup without getting
his patches into Parrot?

Anyway, I'm not *against* implementing this, I'm just questioning the
*need* to implement it ... Just a question for the philosophers ...

Gopal



Re: Bytecode metadata

2003-01-23 Thread chromatic
On Wed, 22 Jan 2003 13:27:47 +, Dan Sugalski wrote:

> Since it looks like it's time to extend the packfile format and the 
> in-memory bytecode layout, this would be the time to start discussing 
> metadata. What sorts of metadata do people think are useful to have 
> in either the packfile (on disk) or in the bytecode (in memory).

Comments, if a disassembler is to be able to reconstruct the original source
sufficiently well[1].

-- c

1) for the various values of "well" that include "semantic equivalence"



Re: Bytecode metadata

2003-01-23 Thread Juergen Boemmels
Hello, 

after quite a long time away from the keyboard and fighting through a huge
backlog of mail, I'm (hopefully) back again.

Dan Sugalski <[EMAIL PROTECTED]> writes:

> Since it looks like it's time to extend the packfile format and the
> in-memory bytecode layout, this would be the time to start discussing
> metadata. What sorts of metadata do people think are useful to have in
> either the packfile (on disk) or in the bytecode (in memory).

My current idea for the in-memory format of the bytecode is this:
one bytecode segment is a PMC consisting of three parts: the actual
bytecode (a flat array of opcode_t), the associated constants that
don't fit into an opcode_t (floats and strings), and a scratch area
for the JITed code. All other metadata will be attached as
properties (or maybe as elements of an aggregate), which makes
future extension easy. Invoking this PMC would simply start the
bytecode at its first instruction.

To support inter-segment jumps, a kind of symbol table is also
necessary. All externally reachable code points need some special
markup. This could be a special opcode extlabel_sc or an entry in a
symbol table. Also needed is a fixup of the outgoing calls, either by
modifying the bytecode or via a jump table. Both have their pros
and cons: modifying the bytecode prohibits a read-only mmap of the
data on disk, and the fixup needs to be done at load time, but once it
is done the impact on runtime speed is minimal, whereas the jump table
costs one extra indirection. But as stated elsewhere, the typical
inter-segment jump will be call/tailcall/callmethod/invoke, which
involve at least two indirections anyway.
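Below, the two fixup strategies Juergen weighs are set side by side in a hypothetical C sketch. The assumptions are: every op in this toy encoding is exactly two words (opcode, operand); an outgoing call is `OP_CALL_IMPORT <index>`; and `resolved`/`jump_table` hold the load-time addresses of the imports. None of these names come from Parrot.

```c
#include <stddef.h>
#include <stdint.h>

typedef intptr_t opcode_t; /* assumption: wide enough to hold an address */

enum { OP_CALL_IMPORT = 1, OP_CALL_ADDR = 2 };

/* Option 1: load-time fixup. Each OP_CALL_IMPORT <index> pair is rewritten
 * into OP_CALL_ADDR <address>. Cheap at run time, but the bytecode must be
 * writable, which rules out a read-only mmap of the file. */
void fixup_in_place(opcode_t *code, size_t len, const opcode_t *resolved)
{
    for (size_t pc = 0; pc + 1 < len; pc += 2) {
        if (code[pc] == OP_CALL_IMPORT) {
            code[pc] = OP_CALL_ADDR;
            code[pc + 1] = resolved[code[pc + 1]];
        }
    }
}

/* Option 2: jump table. The bytecode keeps the import index and stays
 * read-only; each call pays one extra indirection through the table. */
opcode_t resolve_via_table(const opcode_t *code, size_t pc,
                           const opcode_t *jump_table)
{
    return jump_table[code[pc + 1]];
}
```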

The on disk version is a matter of serializing and deserializing this
PMC.

> Keep in mind that parrot may be in the position where it has to ignore
> or mistrust the metadata, so be really cautious with things you
> propose as required.

Ok, to summarize:

ByteCodeSegment = {
  bytecode  => required;
  constants => only necessary if there are string or num constants;
  fixup     => (or jumptable) only necessary if there are outgoing jumps;
  symbols   => all possible incoming branch points, optional;
  JIT       => will be filled when the bytecode is invoked;

  source    => surely optional;
  debuginfo => also optional;
  ...
}
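Juergen's summary maps fairly directly onto a C struct. This is a sketch only. The field names follow his list, and `opcode_t` is a stand-in integer type. The validity check is my own guess at what a loader that may have to mistrust metadata would insist on: the bytecode itself, plus each optional part being self-consistent.

```c
#include <stdbool.h>
#include <stddef.h>

typedef long opcode_t; /* assumption: stand-in for Parrot's opcode_t */

/* Hypothetical layout following the summary. Only the bytecode is
 * required; any other part may be absent (NULL pointer, zero count). */
typedef struct {
    const opcode_t *bytecode;    /* required: flat array of ops            */
    size_t          code_len;
    const void     *constants;   /* only if there are string/num constants */
    size_t          n_constants;
    const size_t   *fixups;      /* (or jump table) only if outgoing jumps */
    size_t          n_fixups;
    const size_t   *symbols;     /* optional: incoming branch points       */
    size_t          n_symbols;
    void           *jit_code;    /* filled in when the segment is invoked  */
    const char     *source;      /* optional                               */
    const void     *debuginfo;   /* optional                               */
} ByteCodeSegment;

/* Insist only on what is required, plus consistency of the optional parts. */
bool segment_is_valid(const ByteCodeSegment *seg)
{
    if (seg->bytecode == NULL || seg->code_len == 0)
        return false;
    if (seg->n_constants > 0 && seg->constants == NULL)
        return false;
    if (seg->n_fixups > 0 && seg->fixups == NULL)
        return false;
    if (seg->n_symbols > 0 && seg->symbols == NULL)
        return false;
    return true;
}
```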

bye
boe.
-- 
Juergen Boemmels[EMAIL PROTECTED]
Fachbereich Physik  Tel: ++49-(0)631-205-2817
Universitaet Kaiserslautern Fax: ++49-(0)631-205-3906
PGP Key fingerprint = 9F 56 54 3D 45 C1 32 6F  23 F6 C7 2F 85 93 DD 47



Re: Bytecode metadata

2003-01-23 Thread Dave Mitchell
On Thu, Jan 23, 2003 at 09:21:45PM +0100, Juergen Boemmels wrote:
> My current idea for the in memory format of the bytecode is this:

I would strongly urge that any file-based bytecode format be arranged
in such a way that it (or most of it) can simply be mmap-ed in (RO),
analogously to executables.

This means that a Perl server that relies on a lot of modules, and which
forks for each connection (imagine a Perl-based web server), doesn't
consume acres of swap space just to hold a separate in-memory image of
all the modules in every Perl process.

This is a real problem that's hitting me hard with Perl 5 in my day job.
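Here is a minimal POSIX sketch of what Dave is asking for; the function names are illustrative. It maps the bytecode file read-only with `mmap(PROT_READ, MAP_PRIVATE)`. Because the pages are never written, they stay shared through the page cache between every process that maps the file. That includes children forked after the mapping is made. Nothing in the mapped region can be patched, so load-time fixups of the bytecode and this approach pull in opposite directions.

```c
#define _POSIX_C_SOURCE 200809L
#include <stddef.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/stat.h>

/* Map an already-open bytecode file read-only; returns NULL on failure.
 * The mapping stays valid even if the caller closes fd afterwards. */
const unsigned char *map_bytecode_ro(int fd, size_t *len_out)
{
    struct stat st;
    if (fstat(fd, &st) != 0) {
        perror("fstat");
        return NULL;
    }
    if (st.st_size <= 0)
        return NULL; /* nothing to map */

    /* PROT_READ: the loader cannot patch these pages, so they are never
     * copied and remain shared via the page cache across processes. */
    void *p = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    if (p == MAP_FAILED) {
        perror("mmap");
        return NULL;
    }
    *len_out = (size_t)st.st_size;
    return (const unsigned char *)p;
}

/* Release a mapping created by map_bytecode_ro. */
int unmap_bytecode(const unsigned char *p, size_t len)
{
    return munmap((void *)p, len);
}
```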

Dave.

-- 
Any [programming] language that doesn't occasionally surprise the
novice will pay for it by continually surprising the expert.
 - Larry Wall



Re: Bytecode metadata

2003-01-23 Thread James Michael DuPont

--- chromatic <[EMAIL PROTECTED]> wrote:
> On Wed, 22 Jan 2003 13:27:47 +, Dan Sugalski wrote:
> 
> > Since it looks like it's time to extend the packfile format and the
> > in-memory bytecode layout, this would be the time to start discussing
> > metadata. What sorts of metadata do people think are useful to have
> > in either the packfile (on disk) or in the bytecode (in memory).
> 
> Comments, if a disassembler is to be able to reconstruct the original
> source sufficiently well[1].
> 
> -- c
> 
> 1) for the various values of "well" that include "semantic equivalence"

Yes!
Deparsing: that would be great.

mike

=
James Michael DuPont
http://introspector.sourceforge.net/

__
Do you Yahoo!?
Yahoo! Mail Plus - Powerful. Affordable. Sign up now.
http://mailplus.yahoo.com



Re: Transferring control between code segments, eval, and suchlike things

2003-01-23 Thread Jason Gloudon
On Wed, Jan 22, 2003 at 03:00:37PM -0500, Dan Sugalski wrote:

> Destinations. These are a pain, since if we can go anywhere then the 
> JIT has to do all sorts of nasty and unpleasant things to compensate, 
> and to make every op a valid destination. Yuck.

Arbitrary jumps are not that difficult to deal with in the JIT.  The JIT
compiler can handle jumps to arbitrary addresses by falling back into the
interpreter if the destination does not coincide with a previously known entry
point, reentering the JIT code later at a safe point. pbc2c-generated code does
this. This way the JIT does not have to support making every instruction a safe
branch destination.
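Below is a toy C model of the fallback Jason describes, in which everything is hypothetical. Each op just adds its operand to an accumulator. `is_entry` marks the offsets where JIT code may be entered, and `run_jitted` stands in for native code. A jump to an unmarked offset is interpreted op by op until the next entry point, where execution re-enters the "JIT" code.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy program: every op adds its operand to an accumulator. */
typedef struct { int operand; } Op;

/* Stand-in for JIT-compiled native code. It may only be entered at a
 * known entry point; from there it runs to the end of the segment. */
static int run_jitted(const Op *code, size_t n, size_t pc, int acc)
{
    for (; pc < n; pc++)
        acc += code[pc].operand;
    return acc;
}

/* Jump to an arbitrary destination. If it isn't a known JIT entry point,
 * interpret one op at a time until one is reached, then re-enter the JIT. */
int jump_to(const Op *code, const bool *is_entry, size_t n, size_t dest,
            int acc, size_t *interpreted)
{
    size_t pc = dest;
    while (pc < n && !is_entry[pc]) {
        acc += code[pc].operand; /* the interpreter executes one op */
        pc++;
        (*interpreted)++;
    }
    if (pc < n)
        acc = run_jitted(code, n, pc, acc); /* safe point: back into JIT */
    return acc;
}
```

Only the marked offsets need JIT entry code; every other destination costs a short stretch of interpretation instead.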

-- 
Jason



RE: Transferring control between code segments, eval, and suchlike things

2003-01-23 Thread Fisher Mark
> Design Edict #3: All destinations *must* be marked as such in the 
> bytecode metadata segment. (I am officially nervous about this, as I 
> can see a number of ways to subvert this for evil)
[...]
> Design Edict #4: Dan is officially iffy on jumps, but can see them as 
> useful for lower-level statically bound languages such as forth, 
> Scheme, or C.

The combination of jumps and destinations
would help the implementation of COME FROM.
==
Mark Leighton Fisher   [EMAIL PROTECTED]
Thomson, Inc.  Indianapolis IN
"we have tamed lightning and used it to teach sand to think"




Re: Bytecode metadata

2003-01-23 Thread James Michael DuPont

--- Dave Mitchell <[EMAIL PROTECTED]> wrote:
> On Thu, Jan 23, 2003 at 09:21:45PM +0100, Juergen Boemmels wrote:
> > My current idea for the in memory format of the bytecode is this:
> 
> I would strongly urge any file-based byte-code format to arranged
> in such a way that it (or most of it) can simply be mmap-ed in (RO),
> analogously to executables.
> 
> This means that a Perl server that relies on a lot of modules, and
> which forks for each connection (imagine a Perl-based web server),
> doesn't consume acres of swap space just to have an in-memory image
> per Perl process, of all the modules.

Sounds good.

Could that be seen as similar to shared-memory communication with the
compiler, via memory-mapped file interfaces?

mike

> This is a real problem that's hitting me hard with Perl 5 in my day job.
> 
> Dave.
> 
> -- 
> Any [programming] language that doesn't occasionally surprise the
> novice will pay for it by continually surprising the expert.
>  - Larry Wall





Re: Transferring control between code segments, eval, and suchlike things

2003-01-23 Thread Juergen Boemmels
Dan Sugalski <[EMAIL PROTECTED]> writes:

> Okay, since this has all come up, here's the scoop from a design perspective.
> 
> First, the branch opcodes (branch, bsr, and the conditionals) are all
> meant for movement within a segment of bytecode. They are *not*
> supposed to leave a segment. To do so was arguably a bad idea, now
> it's officially an error. If you need to do so, branch to an op that
> can transfer across boundaries.
> 
> 
> Design Edict #1: Branches, which is any transfer of control that takes
> an offset, may *not* escape the current bytecode segment.

Okay with that.

> Next, jumps. Jumps take absolute addresses, so either need fixup at
> load time (blech), are only valid in dynamically generated code (okay,
> but limiting), or can only jump to values in registers (that's
> fine). Jumps aren't a problem in general.
> 
> 
> Design Edict #2: Jumps may go anywhere.

In the sense that every possible target (marked via #3) can be reached
with a jump, but bad things may happen if the target isn't valid.

> Destinations. These are a pain, since if we can go anywhere then the
> JIT has to do all sorts of nasty and unpleasant things to compensate,
> and to make every op a valid destination. Yuck.
> 
> 
> Design Edict #3: All destinations *must* be marked as such in the
> bytecode metadata segment. (I am officially nervous about this, as I
> can see a number of ways to subvert this for evil)

This is no more and no less evil than

branch -1

The destinations can be range-checked at load time, the assembler will
hopefully emit these offsets correctly, and they will be read-only after
compilation.
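Juergen's load-time range check might look like the C sketch below (names hypothetical). It also rejects destinations that fall inside an instruction. That extra check is my assumption, since an in-range offset can still point at an operand.

```c
#include <stdbool.h>
#include <stddef.h>

/* Validate a segment's declared branch destinations at load time.
 * op_start[i] is true when offset i begins an instruction (a table the
 * loader can build by walking the ops from offset 0). A destination is
 * accepted only if it lies inside the segment and on an op boundary. */
bool destinations_valid(const size_t *dests, size_t n_dests,
                        const bool *op_start, size_t code_len)
{
    for (size_t i = 0; i < n_dests; i++) {
        if (dests[i] >= code_len || !op_start[dests[i]])
            return false;
    }
    return true;
}
```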

> I'm only keeping jumps (and their corresponding jsr) around for
> nostalgic reasons, and with the vague hope they may be useful. I'm not
> sure about this.
> 
> 
> Design Edict #4: Dan is officially iffy on jumps, but can see them as
> useful for lower-level statically bound languages such as forth,
> Scheme, or C.
> 
> 
> That leads us to
> 
> Design Edict #5: Dan will accommodate semantics for languages outside
> the core set (perl, python, ruby) only if they don't compromise
> performance for the core set.
> 
> 
> Calling actual routines--subs, methods, functions, whatever--at the
> high level isn't done with branches or jumps. It is, instead, done
> with the call series of ops. (call, callmeth, callcc, tailcall,
> tailcallmeth, tailcallcc (though that one makes my head hurt), invoke)
> These are specifically for calling code that's potentially in other
> segments, and to call into them at fixed points. I think these need to
> be hashed out a bit to make them more JIT-friendly, but they're the
> primary transfer destination point

These calls are always jumps or jsrs in disguise. In the end they
always do a goto ADDRESS(something). This means that every
sub/method/continuation must be marked as described in #3.

> Design Edict #6: The first op in a sub is always a valid
> jump/branch/control transfer destination

This is essentially #3.

> Now. Eval. The compile opcode going in is phenomenally cool (thanks,
> Leo!) but has pointed out some holes in the semantics. I got handwavey
> and, well, it shows. No cookie for me.
> 
> 
> The compreg op should compile the passed code in the language that is
> indicated and should load that bytecode into the current
> interpreter. That means that if there are any symbols that get
> installed because someone's defined a sub then, well, they should get
> installed into the interpreter's symbol tables.

The compile would not install the symbols in the interpreter's symbol
table; it would store them somewhere in the bytecode metadata. The eval
should then install them in the interpreter's symbol table.

The problem really starts if BEGIN {...} blocks are used, because they
will be evaluated after the block is compiled but before the whole
compilation is finished.
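Here is a C sketch of Juergen's split, with all names hypothetical. Compile only records the symbols a segment defines, as segment metadata. Eval then installs them into the interpreter's table. The sketch deliberately leaves the BEGIN {} problem above unsolved, because there a symbol has to be visible before compilation finishes.

```c
#include <stddef.h>
#include <string.h>

#define SYMTAB_MAX 32

/* Hypothetical symbol: a name bound to an entry offset in some segment. */
typedef struct { const char *name; size_t entry; } Symbol;

/* What compile returns: besides the code (omitted here), the symbols the
 * segment defines, recorded as metadata instead of installed right away. */
typedef struct { Symbol pending[8]; size_t n_pending; } CompiledSegment;

/* The interpreter's symbol table. */
typedef struct { Symbol entries[SYMTAB_MAX]; size_t count; } SymbolTable;

const Symbol *symtab_find(const SymbolTable *st, const char *name)
{
    for (size_t i = 0; i < st->count; i++)
        if (strcmp(st->entries[i].name, name) == 0)
            return &st->entries[i];
    return NULL;
}

/* eval: install the recorded symbols, then (not shown) run the code. */
void eval_install(SymbolTable *st, const CompiledSegment *seg)
{
    for (size_t i = 0; i < seg->n_pending && st->count < SYMTAB_MAX; i++)
        st->entries[st->count++] = seg->pending[i];
}
```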

> Compiled code is an interesting thing. In some cases it should return
> a sub PMC, in some cases it should execute and return a value, and in
> some cases  it should install a bunch of stuff in a symbol table and
> then return a value. These correspond to:
> 
> 
> 
> eval "print 12";
> 
> $foo = eval "sub bar{return 1;}";
> 
> require foo.pm;
> 
> respectively. It's sort of a mixed bag, and unfortunately we can't
> count on the code doing the compilation to properly handle the
> semantics of the language being compiled. So...
> 
> 
> Design Edict #7: the compreg opcode will execute the compiled code,
> calling in with parrot's calling conventions. If it should return
> something, then it had darned well better build it and return it.

I find it better to leave compile and eval separate.
The compile opcode should simply return a bytecode PMC, which can then
be invoked some time later.

> Oh, and:
> 
> Design Edict #8: compreg is prototyped. It takes a single string and
> must return a single PMC. The compiler may cheat as need be. (No need
> to check and see if it returned a string, or an int)

It should return a bytecode segment.
 
> Yes, this

Re: Bytecode metadata

2003-01-23 Thread Dan Sugalski
At 10:29 PM -0800 1/22/03, James Michael DuPont wrote:

>You will probably think that this is overkill for parrot,


Why yes, yes I do. On the other hand, when we hand people bazookas to 
deal with their fly problems, we often find they start in on the 
elephant problems as well.

The proposal in general interests me--it looks like a general 
annotation system we can attach to the bytecode. (I admit, I haven't 
read the page you pointed at) I will admit, though, that I was 
thinking more about metadata that the engine could use itself, or 
would provide to programs running on it, but the scheme you've 
outlined may be useful for that.

'Swhat I get for asking a too-general question. :)
--
Dan

--"it's like this"---
Dan Sugalski  even samurai
[EMAIL PROTECTED] have teddy bears and even
  teddy bears get drunk


Re: Objects, finally (try 1)

2003-01-23 Thread Dan Sugalski
At 10:10 PM +0530 1/23/03, Gopal V wrote:

>If memory serves me right, Erik Bågfors wrote:
>> But if I write a library in ruby that depends on the missing_method
>> method it will not be usable from other languages, since those languages
>> doesn't call missing_method if the method they try to call doesn't
>> exist.
>
>Hmm... that's twisting language features with virtual machine instructions.
>Actually that's a gray area as far as I can see ... we'll have to go on
>with `n' number of methods for `n' languages for member resolution ...

Grey? Heck, take a step back, lots of parrot is done up in neon paisley. :-P

>> Of course, in real life I don't think that's a problem because I haven't
>> seen much use of missing_method.
>
>Unfortunately I use a lot of __getattr__ for my python code (especially
>GUI) ...

Perl also makes heavy use of this in some of the more interesting modules.

>> Also, having a instruction would be faster which of course is more fun
>> :)
>
>Yes... But this not only makes it ugly (as an instruction) , but slow as
>well ? . Like you said only a small number of people use this feature ,
>does it make sense to slow down the rest ? . And how does hacker ZZ add
>a new language with a different member lookup without getting his patches
>inside Parrot ?..


That's why the smarts isn't in the opcode function, but rather in the
vtable method lookup function. That way you only pay the cost if the
feature is used.

This can also work to the advantage of languages that need this
feature, as it means that classes that don't have to have a fallback
method lookup can use a faster lookup function that doesn't need to
do a second trace to look for missing functions.
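The design Dan describes can be sketched in C, with names and layout that are my own assumptions. Method lookup is a per-class function pointer. A class with AUTOLOAD/missing_method-style semantics installs the slower lookup with a second search, while every other class keeps the plain one. Because the call goes through the object's class, the object's semantics apply no matter which language's code makes the call.

```c
#include <stddef.h>
#include <string.h>

typedef int (*Method)(void);

typedef struct Class Class;

/* Hypothetical class with a lookup slot: each class picks the lookup
 * function it needs, so only classes with fallback semantics pay for
 * the second search. */
struct Class {
    const char *method_names[4];
    Method      methods[4];
    size_t      n_methods;
    Method    (*find_method)(const Class *cls, const char *name);
};

static Method search(const Class *cls, const char *name)
{
    for (size_t i = 0; i < cls->n_methods; i++)
        if (strcmp(cls->method_names[i], name) == 0)
            return cls->methods[i];
    return NULL;
}

/* Fast path: plain lookup, no second search. */
Method find_plain(const Class *cls, const char *name)
{
    return search(cls, name);
}

/* Slow path, only for classes that want a missing-method fallback. */
Method find_with_fallback(const Class *cls, const char *name)
{
    Method m = search(cls, name);
    return m ? m : search(cls, "missing_method");
}

/* What a method-call op would do: delegate lookup to the object's class. */
int call_method(const Class *cls, const char *name)
{
    Method m = cls->find_method(cls, name);
    return m ? m() : -1;
}

/* Example methods for the usage below. */
static int hello(void) { return 1; }
static int autoload(void) { return 42; }
```

Consider a class `dyn` that registers `autoload` under "missing_method" and uses `find_with_fallback`. For it, `call_method(&dyn, "nope")` returns 42, while the same call on a `find_plain` class returns -1.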


>Anyway I'm not *against* implementing this , I'm just questioning the
>*need* to implement this ... Just a question for the philosophers ...


The philosophers, alas, got drunk and started fighting over the one
fork at the table. Very messy.
--
Dan




RE: Bytecode metadata

2003-01-23 Thread Dan Sugalski
At 12:10 AM -0800 1/23/03, Brent Dax wrote:

>Dan Sugalski:
># Since it looks like it's time to extend the packfile format and the
># in-memory bytecode layout, this would be the time to start discussing
># metadata. What sorts of metadata do people think are useful to have
># in either the packfile (on disk) or in the bytecode (in memory).
>
>I do think that, whatever "native" (i.e. understood by Parrot) metadata
>we support, we *must* allow for extensibility, both for future native
>metadata and for third-party tools.


"Must" is an awfully strong word, there. We don't really "must" do 
anything, though I do realize the feature is useful, hence my 
question.

>Moreover, this must not be
>implemented with a special type of metadata block, or by using
>sequentially-increasing numbers.  (The first means that any metadata we
>decide to add in the future will be slower than the metadata we add now;
>the second has problems with several third-party tools picking the same
>number.)


I'm afraid extensible metadata is going to live in its own chunk 
unless someone can come up with a way to embed it without penalty. 
(And I'm generally considering using separate chunks for the metadata 
the engine does understand)
--
Dan



Re: Objects, finally (try 1)

2003-01-23 Thread Dan Sugalski
At 8:42 AM +0100 1/23/03, Erik Bågfors wrote:

>On Wed, 2003-01-22 at 19:46, Christopher Armstrong wrote:
>> On Wed, Jan 15, 2003 at 01:57:28AM -0500, Dan Sugalski wrote:
>> > At 9:37 PM -0500 1/14/03, Christopher Armstrong wrote:
>> > >But who knows, maybe it could be made modular enough (i.e., more
>> > >interface-oriented?) to allow the best of both worlds -- I'm far too
>> > >novice wrt Parrot to figure out what it'd look like, unfortunately.
>> >
>> > It'll actually look like what we have now. If you can come up with
>> > something more abstract than:
>> >
>> >   callmethod P1, "foo"
>> >
>> > that delegates the calling of the foo method to the method dispatch
>> > vtable entry for the object in P1, well... gimme, I want it. :)
>>
>> Just curious. Exactly how overridable is that `callmethod'? I don't
>> really know anything about the vtable stuff in Parrot, but is it
>> possible to totally delegate the lookup/calling of "foo" to a function
>> that's bound somehow to P1? Or does the "foo" entry have to exist in
>> the vtable already? Sorry for the naive question :-) Oh, and if you
>> just want to point me at a source file, I guess I can try reading it
>> :-) Python basically requires that each step in the process be
>> overridable. (1. look up attribute 2. call attribute, at least in
>> `callmethod's case).
>
>Ruby needs to call the missing_method method (if I remember correctly).
>So if "foo" doesn't exist, it would be good to be able to override
>callmethods behavior and make it call missing_method.


This'll be core functionality if languages want to use it. Perl has a
similar function, AUTOLOAD, that gets called if you make a
nonexistent method call. It sounds like we need generic pre and post
method call handler functionality as well, which should be
interesting to design.
--
Dan




Re: Objects, finally (try 1)

2003-01-23 Thread Dan Sugalski
At 1:46 PM -0500 1/22/03, Christopher Armstrong wrote:

>On Wed, Jan 15, 2003 at 01:57:28AM -0500, Dan Sugalski wrote:
>> At 9:37 PM -0500 1/14/03, Christopher Armstrong wrote:
>> >But who knows, maybe it could be made modular enough (i.e., more
>> >interface-oriented?) to allow the best of both worlds -- I'm far too
>> >novice wrt Parrot to figure out what it'd look like, unfortunately.
>>
>> It'll actually look like what we have now. If you can come up with
>> something more abstract than:
>>
>>   callmethod P1, "foo"
>>
>> that delegates the calling of the foo method to the method dispatch
>> vtable entry for the object in P1, well... gimme, I want it. :)
>
>Just curious. Exactly how overridable is that `callmethod'?


Completely. It ultimately delegates finding the method to the PMC via 
its vtable, so you can then do whatever you want. We're going to 
provide some convenience functions and predefined functionality so 
everyone doesn't have to reimplement the same stuff over and over.

Delegating to the PMC also means that perl objects or ruby objects 
will behave the way they should regardless of what language's code is 
using them. I'm not really expecting too much in the way of different 
behaviour between the languages, but the differences that are there 
should be respected.
--
Dan



Re: Objects, finally (try 1)

2003-01-23 Thread Dan Sugalski
At 8:16 PM +0530 1/23/03, Gopal V wrote:

>If memory serves me right, Erik Bågfors wrote:
>> Ruby needs to call the method_missing method (if I remember correctly).
>>
>> So if "foo" doesn't exist, it would be good to be able to override
>> callmethod's behavior and make it call method_missing.
>
>
>Like I said, the compiler designer can put that explicitly in the
>generated code ... You don't actually need instructions to do that.
>Also, explicit generation might prove better at handling all
>the quirks future languages might encounter.


Or hiding it in the objects themselves, so we can make sure the
expense of generality is only in place for those objects or classes
that need it, rather than for everyone.


>My interest here is to obtain a clear and fast way to call stuff for
>static compiled languages. :)


Fair enough, though that would argue for embedding the functionality
in the objects and not the generated code, as AUTOLOAD searching
should be done for a method call on a perl object regardless of
whether the language making the method call supports it. If your C#
code calls a method on a perl object it gets, that method resolution
should be done with perl semantics, not C# semantics.
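A minimal C sketch of that principle: the object, not the caller, decides how a name resolves. Each class supplies its own `find_method` through its vtable, so a Perl-style class can fall back to AUTOLOAD while a statically typed class simply reports failure, and the calling code is identical either way. `PMC`, `VTable`, `find_method` and `callmethod` here are simplified, hypothetical stand-ins, not Parrot's real structures.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified stand-ins; not Parrot's real vtable layout. */
typedef struct PMC PMC;
typedef const char *(*method_impl)(PMC *self);

typedef struct {
    /* Each class decides how a name resolves to code. */
    method_impl (*find_method)(PMC *self, const char *name);
} VTable;

struct PMC {
    const VTable *vtable;
};

static const char *perl_greet(PMC *self)    { (void)self; return "perl greet"; }
static const char *perl_autoload(PMC *self) { (void)self; return "perl AUTOLOAD"; }
static const char *static_greet(PMC *self)  { (void)self; return "static greet"; }

/* Perl-style resolution: unknown names go to AUTOLOAD. */
static method_impl perl_find_method(PMC *self, const char *name)
{
    (void)self;
    if (strcmp(name, "greet") == 0)
        return perl_greet;
    return perl_autoload;
}

/* Statically typed resolution: unknown names simply fail. */
static method_impl static_find_method(PMC *self, const char *name)
{
    (void)self;
    if (strcmp(name, "greet") == 0)
        return static_greet;
    return NULL;
}

static const VTable perl_vtable   = { perl_find_method };
static const VTable static_vtable = { static_find_method };

/* What a callmethod-style op would do whatever language emitted it:
 * ask the object's own vtable, never apply the caller's rules. */
const char *callmethod(PMC *obj, const char *name)
{
    method_impl m = obj->vtable->find_method(obj, name);
    return m ? m(obj) : NULL;
}
```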
--
Dan




Re: Bytecode metadata

2003-01-23 Thread Juergen Boemmels
Dave Mitchell <[EMAIL PROTECTED]> writes:

> On Thu, Jan 23, 2003 at 09:21:45PM +0100, Juergen Boemmels wrote:
> > My current idea for the in memory format of the bytecode is this:
> 
> I would strongly urge any file-based byte-code format to be arranged
> in such a way that it (or most of it) can simply be mmap-ed in (RO),
> analogously to executables.
> 
> This means that a Perl server that relies on a lot of modules, and which
> forks for each connection (imagine a Perl-based web server), doesn't
> consume acres of swap space just to have an in-memory image per Perl
> process, of all the modules.

This might be possible if the byteorder, wordsize, defaultencoding
etc. are the same in the file on disk and the host.
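Whether a file can be used in place comes down to comparing a few header fields with the host. A small C sketch, assuming a hypothetical header that records the writer's word size, byte order and default encoding (Parrot's actual packfile header may store these differently); `PackHeader` and `can_use_in_place` are illustrative names.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical header fields; Parrot's real packfile header may differ. */
typedef struct {
    uint8_t wordsize;    /* sizeof(opcode_t) on the machine that wrote the file */
    uint8_t byteorder;   /* 0 = little-endian, 1 = big-endian */
    uint8_t encoding;    /* some id for the default string encoding */
} PackHeader;

/* Detect the host byte order at run time. */
uint8_t host_byteorder(void)
{
    const uint16_t probe = 1;
    unsigned char first;
    memcpy(&first, &probe, 1);
    return first == 1 ? 0 : 1;   /* low byte stored first: little-endian */
}

/* 1 if the file can be used exactly as stored (for example mmap-ed
 * read-only), 0 if it has to be converted while loading. */
int can_use_in_place(const PackHeader *h, uint8_t host_wordsize, uint8_t host_encoding)
{
    return h->wordsize == host_wordsize
        && h->byteorder == host_byteorder()
        && h->encoding == host_encoding;
}
```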

bye
boe
-- 
Juergen Boemmels[EMAIL PROTECTED]
Fachbereich Physik  Tel: ++49-(0)631-205-2817
Universitaet Kaiserslautern Fax: ++49-(0)631-205-3906
PGP Key fingerprint = 9F 56 54 3D 45 C1 32 6F  23 F6 C7 2F 85 93 DD 47



Re: Bytecode metadata

2003-01-23 Thread Dan Sugalski
At 10:31 PM +0100 1/23/03, Juergen Boemmels wrote:

>Dave Mitchell <[EMAIL PROTECTED]> writes:
>
>> On Thu, Jan 23, 2003 at 09:21:45PM +0100, Juergen Boemmels wrote:
>> > My current idea for the in memory format of the bytecode is this:
>>
>> I would strongly urge any file-based byte-code format to be arranged
>> in such a way that it (or most of it) can simply be mmap-ed in (RO),
>> analogously to executables.
>>
>> This means that a Perl server that relies on a lot of modules, and which
>> forks for each connection (imagine a Perl-based web server), doesn't
>> consume acres of swap space just to have an in-memory image per Perl
>> process, of all the modules.
>
>This might be possible if the byteorder, wordsize, defaultencoding
>etc. are the same in the file on disk and the host.


Which will generally be the case, I expect. Tell a sysadmin that they 
can reduce the memory footprint of mod_parrot by 50% by running a 
utility (that we provide in the parrot kit) over the library and I 
expect you'll see smoke from the keyboard as he/she whips off the 
command at supersonic speeds... :)
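One plausible job for such a utility is rewriting the opcode stream into the host's byte order once, offline, so the loader never has to touch it. A minimal C sketch, assuming 32-bit opcodes; changing word size (32 to 64 bits) would need widening and is not shown. `swap_segment` is a hypothetical helper name.

```c
#include <stddef.h>
#include <stdint.h>

/* Reverse the byte order of one 32-bit word. */
static uint32_t swap32(uint32_t w)
{
    return  (w >> 24)
          | ((w >> 8) & 0x0000FF00u)
          | ((w << 8) & 0x00FF0000u)
          |  (w << 24);
}

/* Convert a segment of 32-bit opcodes in place to the other byte order.
 * A conversion tool would run this once, offline, so that the loader can
 * later map the result read-only without modifying it. */
void swap_segment(uint32_t *words, size_t count)
{
    for (size_t i = 0; i < count; i++)
        words[i] = swap32(words[i]);
}
```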
--
Dan



Re: Bytecode metadata

2003-01-23 Thread Juergen Boemmels
Dan Sugalski <[EMAIL PROTECTED]> writes:

> >This might be possible if the byteorder, wordsize, defaultencoding
> >etc. are the same in the file on disk and the host.
> 
> Which will generally be the case, I expect. Tell a sysadmin that they
> can reduce the memory footprint of mod_parrot by 50% by running a
> utility (that we provide in the parrot kit) over the library and I
> expect you'll see smoke from the keyboard as he/she whips off the
> command at supersonic speeds... :)

It might even be possible to dump the jitted code. This would speed up
startup. Then strip the bytecode to reduce the size of the file
and TADA: Yet another new binary format.

I'm really not sure if I'm serious here
boe



Re: Bytecode metadata

2003-01-23 Thread Dan Sugalski
At 8:39 PM + 1/23/03, Dave Mitchell wrote:

>On Thu, Jan 23, 2003 at 09:21:45PM +0100, Juergen Boemmels wrote:
>
>> My current idea for the in memory format of the bytecode is this:
>
>I would strongly urge any file-based byte-code format to be arranged
>in such a way that it (or most of it) can simply be mmap-ed in (RO),
>analogously to executables.


This is the way the bytecode currently works, and we will *not* 
switch to any bytecode format that doesn't at least allow the 
executable code to be mmapped in.
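One practical detail if segments are to be mapped individually: POSIX mmap() only accepts file offsets that are a multiple of the page size, so a writer would need to pad each mappable segment to such a boundary. A small C sketch with hypothetical helper names. Since the machine that writes the file and the one that maps it may use different page sizes, a real writer would probably pad to a conservative size rather than its own page size.

```c
#define _POSIX_C_SOURCE 200809L
#include <stddef.h>
#include <unistd.h>

/* Round an offset up to the next multiple of page_size, which must be a
 * power of two. A segment starting at such an offset can be mmap-ed on
 * its own. */
size_t align_to_page(size_t offset, size_t page_size)
{
    return (offset + page_size - 1) & ~(page_size - 1);
}

/* The host's page size, with a common default if sysconf() fails. */
size_t host_page_size(void)
{
    long ps = sysconf(_SC_PAGESIZE);
    return ps > 0 ? (size_t)ps : 4096;
}
```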
--
Dan



RE: Bytecode metadata

2003-01-23 Thread Brent Dax
Dan Sugalski:
# At 12:10 AM -0800 1/23/03, Brent Dax wrote:
# >Dan Sugalski:
# ># Since it looks like it's time to extend the packfile format and the
# ># in-memory bytecode layout, this would be the time to start discussing
# ># metadata. What sorts of metadata do people think are useful to have
# ># in either the packfile (on disk) or in the bytecode (in memory).
# >
# >I do think that, whatever "native" (i.e. understood by Parrot) metadata
# >we support, we *must* allow for extensibility, both for future native
# >metadata and for third-party tools.
# 
# "Must" is an awfully strong word, there. We don't really "must" do
# anything, though I do realize the feature is useful, hence my
# question.

A strong word for a strong opinion.  :^)  Besides, I did qualify it with
an "I do think", which is another way to say IMO.

# >  Moreover, this must not be
# >implemented with a special type of metadata block, or by using
# >sequentially-increasing numbers.  (The first means that any metadata we
# >decide to add in the future will be slower than the metadata we add
# >now; the second has problems with several third-party tools picking the
# >same number.)
# 
# I'm afraid extensible metadata is going to live in its own chunk
# unless someone can come up with a way to embed it without penalty.
# (And I'm generally considering using separate chunks for the metadata
# the engine does understand)

Are you expecting to have chunk type determined by order?  If so, what
will you do if a future restructuring means you either don't need chunk
type X or you need a new, highly incompatible version?  Will you leave
in an "empty" ghost chunk?

I would suggest (roughly) the following format for a chunk:

   TYPE: One 32-bit number
VERSION: One 32-bit number; suggested usage is as four eight-bit
         components
   SIZE: One 32-bit number of bytes (or maybe 64-bit)
   DATA: arbitrary length

For C-heads, think of it like this:

struct Chunk {
    opcode_t type;
    opcode_t version;
    opcode_t size;
    unsigned char data[];   /* payload; a flexible array member */
};
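Because every chunk carries its own size, a reader can step over chunk types it doesn't understand. A minimal C sketch of walking such a stream, assuming the three header fields are 32-bit words already in host byte order and that SIZE counts the bytes of DATA only; `ChunkView`, `next_chunk` and `make_version` are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* A parsed view of one chunk; data points into the original buffer. */
typedef struct {
    uint32_t type;
    uint32_t version;
    uint32_t size;               /* assumed: length of DATA in bytes */
    const unsigned char *data;
} ChunkView;

/* Pack a version as four 8-bit components, e.g. 1.2.0.0. */
uint32_t make_version(unsigned a, unsigned b, unsigned c, unsigned d)
{
    return ((uint32_t)(a & 0xFFu) << 24) | ((uint32_t)(b & 0xFFu) << 16)
         | ((uint32_t)(c & 0xFFu) << 8)  |  (uint32_t)(d & 0xFFu);
}

/* Read the chunk starting at buf[*pos] and advance *pos past it, whether
 * or not the caller understands its type. Returns 0 on success, -1 if the
 * buffer is truncated. memcpy avoids unaligned reads. */
int next_chunk(const unsigned char *buf, size_t len, size_t *pos, ChunkView *out)
{
    const size_t header = 3 * sizeof(uint32_t);
    if (*pos > len || len - *pos < header)
        return -1;
    memcpy(&out->type,    buf + *pos,     sizeof(uint32_t));
    memcpy(&out->version, buf + *pos + 4, sizeof(uint32_t));
    memcpy(&out->size,    buf + *pos + 8, sizeof(uint32_t));
    if (len - *pos - header < out->size)
        return -1;
    out->data = buf + *pos + header;
    *pos += header + out->size;
    return 0;
}
```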

Type IDs less than 256 would be reserved to Parrot (so we have plenty of
room for future expansion); all third-party tools would use some sort of
cryptographic checksum of the tool's name and the data structure's name,
making sure (of course) that their type ID was greater than 255.
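Deriving a third-party type ID from names might look like the following C sketch. It uses 32-bit FNV-1a purely to stay short and dependency-free; that is a swapped-in, non-cryptographic hash, whereas the proposal calls for a cryptographic checksum. `third_party_type_id` and `PARROT_RESERVED_TYPES` are hypothetical names.

```c
#include <stdint.h>

/* FNV-1a, 32-bit. A stand-in only: a real scheme would use a
 * cryptographic checksum as proposed. */
static uint32_t fnv1a_update(uint32_t h, const char *s)
{
    for (; *s; s++) {
        h ^= (unsigned char)*s;
        h *= 16777619u;
    }
    return h;
}

#define PARROT_RESERVED_TYPES 256u   /* hypothetical: IDs 0..255 belong to Parrot */

uint32_t third_party_type_id(const char *tool, const char *structure)
{
    uint32_t h = 2166136261u;        /* FNV offset basis */
    h = fnv1a_update(h, tool);
    h *= 16777619u;                  /* fold in a NUL separator byte (xor with 0 is a no-op) */
    h = fnv1a_update(h, structure);
    if (h < PARROT_RESERVED_TYPES)
        h += PARROT_RESERVED_TYPES;  /* stay clear of the reserved range */
    return h;
}
```

The separator keeps ("ab", "c") and ("a", "bc") from hashing the same byte sequence.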

If there's a directory of some sort, it should record the type ID and
the offset to the beginning of the chunk.  This should allow for a
fairly quick lookup by type.  If you think that there might be a demand
for multiple instances of the same type of metadata, you may want to add
a chunk ID of some sort.
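A directory as described could be as simple as an array of (type, chunk ID, offset) entries. A minimal C sketch of lookup by type, with the optional chunk ID included; `DirEntry` and `dir_find` are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

/* One directory entry: which chunk type, which instance, where it starts. */
typedef struct {
    uint32_t type;
    uint32_t chunk_id;   /* distinguishes several chunks of the same type */
    uint32_t offset;     /* from the start of the file */
} DirEntry;

/* Linear scan; a writer could also sort by (type, chunk_id) and let the
 * reader binary-search. Returns NULL if the file has no such chunk. */
const DirEntry *dir_find(const DirEntry *dir, size_t n, uint32_t type, uint32_t chunk_id)
{
    for (size_t i = 0; i < n; i++)
        if (dir[i].type == type && dir[i].chunk_id == chunk_id)
            return &dir[i];
    return NULL;
}
```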

--Brent Dax <[EMAIL PROTECTED]>




Re: Bytecode metadata

2003-01-23 Thread Dan Sugalski
At 11:48 AM -0800 1/23/03, chromatic wrote:

>On Wed, 22 Jan 2003 13:27:47 +, Dan Sugalski wrote:
>
>> Since it looks like it's time to extend the packfile format and the
>> in-memory bytecode layout, this would be the time to start discussing
>> metadata. What sorts of metadata do people think are useful to have
>> in either the packfile (on disk) or in the bytecode (in memory).
>
>Comments, if a disassembler is to be able to reconstruct the original source
>sufficiently well[1].


Noted. I can see problems with multiline comments across multiline 
code, but that's probably rare enough to not really care much about.
--
Dan



Re: Bytecode metadata

2003-01-23 Thread James Michael DuPont

--- Juergen Boemmels <[EMAIL PROTECTED]> wrote:
> Hello,
> 
> after quite a long time away from keyboard and fighting through a
> huge backlog of mail I'm (hopefully) back again.
> 
> Dan Sugalski <[EMAIL PROTECTED]> writes:
> 
> > Since it looks like it's time to extend the packfile format and the
> > in-memory bytecode layout, this would be the time to start discussing
> > metadata. What sorts of metadata do people think are useful to have in
> > either the packfile (on disk) or in the bytecode (in memory).
> 
> My current idea for the in memory format of the bytecode is this:
> One bytecode segment is a PMC consisting of three parts: the actual
> bytecode (a flat array of opcode_t), the associated constants that
> don't fit into an opcode_t (floats and strings), and a scratch area
> for the JITed code. All other metadata will be attached as
> properties (or maybe as elements of an aggregate). This will be an
> easy way for future extension. The invoke call to this PMC would
> simply start the bytecode from the first instruction.
> 
> To support inter-segment jumps a kind of symbol table is also
> necessary. All externally reachable code points need some special
> markup. This could be a special opcode extlabel_sc or an entry in a
> symbol table. Also needed is a fixup of the outgoing calls, either via
> modification of the bytecode or via a jumptable. Both have their pros
> and cons: the bytecode modification prohibits a read-only mmap of the
> data on disk, and the fixup needs to be done at load time, but once
> this is done the impact on runtime speed is minimal, whereas the
> jumptable costs one extra indirection. But as stated somewhere else, the
> typical inter-segment jump will be call/tailcall/callmethod/invoke,
> which are at least two indirections.
> 
> The on-disk version is a matter of serializing and deserializing this
> PMC.
> 
> > Keep in mind that parrot may be in the position where it has to ignore
> > or mistrust the metadata, so be really cautious with things you
> > propose as required.
> 
> Ok to summarize:
> 
> ByteCodeSegment = {
>   bytecode  => required;
>   constants => only necessary if string or num constants;
>   fixup     => (or jumptable) only necessary if outgoing jumps;
>   symbols   => all possible incoming branch points, optional;
>   JIT       => will be filled when bytecode is invoked;
> 
>   source    => surely optional;
>   debuginfo => also optional;
>   ...
> }
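The jumptable option in that summary can be sketched in C: the bytecode refers to outgoing targets by slot number, and only the table is patched at load time, so the code itself can stay in a read-only mapping. This is a hypothetical layout written for illustration, assuming 32-bit opcodes; `ByteCodeSegment`, `resolve_jump` and `fixup_slot` are not Parrot's actual structures or functions.

```c
#include <stddef.h>
#include <stdint.h>

typedef int32_t opcode_t;   /* assumption: 32-bit opcodes */

/* Hypothetical layout mirroring the summary; not Parrot's actual struct. */
typedef struct ByteCodeSegment {
    const opcode_t *bytecode;        /* required; may point into a read-only mmap */
    size_t          code_size;       /* in opcode_t units */
    const double   *num_constants;   /* only if needed */
    const char    **str_constants;   /* only if needed */
    /* Outgoing jumps: the code stores a slot index instead of an address,
     * so loading patches this writable table and never the code. */
    const opcode_t **jumptable;
    size_t          n_jumps;
    void           *jit_code;        /* filled in lazily when first invoked */
} ByteCodeSegment;

/* Resolve an inter-segment jump: one extra indirection through the table. */
const opcode_t *resolve_jump(const ByteCodeSegment *seg, opcode_t slot)
{
    if (slot < 0 || (size_t)slot >= seg->n_jumps)
        return NULL;
    return seg->jumptable[slot];
}

/* Load-time fixup: point a slot at an entry point in the target segment. */
int fixup_slot(ByteCodeSegment *seg, size_t slot,
               const ByteCodeSegment *target, size_t entry)
{
    if (slot >= seg->n_jumps || entry >= target->code_size)
        return -1;
    seg->jumptable[slot] = target->bytecode + entry;
    return 0;
}
```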


I LIKE IT.
Do bytecodes have a type? Does each bytecode have metadata?
Here is the metadata I have collected from the parrot source code so
far. It should be a set of predicates to define all the other metadata
needed.

First, this is the core metadata for storing perl code,
in order of simplicity:

identifier_node
    Names of things

boolean_type, integer_type, real_type
    Types of things that are simple

All *_decls have a type that is a type_*.
All *_decls have a name that is a type_decl or identifier_node.

const_decl
    Constant values
var_decl
    Variable values


The rest of the more complex types need a tree_list:
tree_list

function_decl,
    parm_decl     # list of

array_type
    integer_cst,  # list of

enumeral_type
    integer_cst   # list of

record_type, union_type,
    field_decl    # list of

# a void is very special
void_type

The following are derived types:
pointer_type, reference_type

# function types allow for linkage
function_type,
    type_*        # we have a list of

# here the user defines its own
type_decl

# this is a commonly defined user type
complex_type

=
James Michael DuPont
http://introspector.sourceforge.net/




RE: Bytecode metadata

2003-01-23 Thread Dan Sugalski
At 2:48 PM -0800 1/23/03, Brent Dax wrote:

>Dan Sugalski:
># At 12:10 AM -0800 1/23/03, Brent Dax wrote:
># >Dan Sugalski:
># ># Since it looks like it's time to extend the packfile format and the
># ># in-memory bytecode layout, this would be the time to start discussing
># ># metadata. What sorts of metadata do people think are useful to have
># ># in either the packfile (on disk) or in the bytecode (in memory).
># >
># >I do think that, whatever "native" (i.e. understood by Parrot) metadata
># >we support, we *must* allow for extensibility, both for future native
># >metadata and for third-party tools.
>#
># "Must" is an awfully strong word, there. We don't really "must" do
># anything, though I do realize the feature is useful, hence my
># question.
>
>A strong word for a strong opinion.  :^)  Besides, I did qualify it with
>an "I do think", which is another way to say IMO.


Heh. I try and avoid the absolute statements. This is all 
engineering, and engineering is applied economics--you juggle 
features and make compromises to get the thing that meets your needs 
as best as possible at a cost you can manage. Allowing extensibility 
is Really Keen, but has its associated cost that has to be balanced 
against everything else.

Having said that, I think we can do this, but I want a better feel 
for what we need, what we want, and what it'll cost before we make a 
decision.

>Are you expecting to have chunk type determined by order?


Yes and no. Yes in that I want the first few chunks, the ones that 
are required, to be at fixed offsets. Following that will be a 
directory, and from there we can index off to wherever we need to.
--
Dan



RE: Bytecode metadata

2003-01-23 Thread James Michael DuPont

--- Brent Dax <[EMAIL PROTECTED]> wrote:
> Dan Sugalski:
> # At 12:10 AM -0800 1/23/03, Brent Dax wrote:
> # >Dan Sugalski:
> # ># Since it looks like it's time to extend the packfile format and the
> # ># in-memory bytecode layout, this would be the time to start discussing
> # ># metadata. What sorts of metadata do people think are useful to have
> # ># in either the packfile (on disk) or in the bytecode (in memory).
> # >
> # >I do think that, whatever "native" (i.e. understood by Parrot) metadata
> # >we support, we *must* allow for extensibility, both for future native
> # >metadata and for third-party tools.
> # 
> # "Must" is an awfully strong word, there. We don't really "must" do
> # anything, though I do realize the feature is useful, hence my
> # question.
> 
> A strong word for a strong opinion.  :^)  Besides, I did qualify it with
> an "I do think", which is another way to say IMO.
> 
> # >  Moreover, this must not be
> # >implemented with a special type of metadata block, or by using
> # >sequentially-increasing numbers.  (The first means that any metadata we
> # >decide to add in the future will be slower than the metadata we add
> # >now; the second has problems with several third-party tools picking the
> # >same number.)
> # 
> # I'm afraid extensible metadata is going to live in its own chunk
> # unless someone can come up with a way to embed it without penalty.
> # (And I'm generally considering using separate chunks for the metadata
> # the engine does understand)
> 
> Are you expecting to have chunk type determined by order?  If so, what
> will you do if a future restructuring means you either don't need chunk
> type X or you need a new, highly incompatible version?  Will you leave
> in an "empty" ghost chunk?
> 
> I would suggest (roughly) the following format for a chunk:
> 
>    TYPE: One 32-bit number
> VERSION: One 32-bit number; suggested usage is as four eight-bit
>          components
>    SIZE: One 32-bit number of bytes (or maybe 64-bit)
>    DATA: arbitrary length
> 
> For C-heads, think of it like this:
> 
>   struct Chunk {
>       opcode_t type;
>       opcode_t version;
>       opcode_t size;
>       unsigned char data[];   /* payload; a flexible array member */
>   };
> 
> Type IDs less than 256 would be reserved to Parrot (so we have plenty of
> room for future expansion); all third-party tools would use some sort of
> cryptographic checksum of the tool's name and the data structure's name,
> making sure (of course) that their type ID was greater than 255.
> 
> If there's a directory of some sort, it should record the type ID and
> the offset to the beginning of the chunk.  This should allow for a
> fairly quick lookup by type.  If you think that there might be a demand
> for multiple instances of the same type of metadata, you may want to add
> a chunk ID of some sort.

Cool!
That means we can use opcodes to store the introspector data!

We need to have the metadata paired with the opcodes.

Basically this means storing the source code in some AST form in the
metadata for full reflection and introspection on the expression
level.
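Pairing metadata with opcodes could be done with a side table keyed by bytecode offset, so the opcode stream itself stays untouched (and mappable read-only). A minimal C sketch, assuming entries are sorted by offset and each one applies until the next; `Annotation` and `annotation_for` are hypothetical names, and the payload is just a string where an AST node reference might go.

```c
#include <stddef.h>
#include <stdint.h>

/* One annotation: applies from this bytecode offset until the next entry. */
typedef struct {
    uint32_t    offset;   /* in opcode_t units */
    const char *note;     /* e.g. a source line, a comment or an AST node id */
} Annotation;

/* Find the annotation covering pc in a table sorted by offset, using a
 * binary search for the last entry whose offset is <= pc.
 * Returns NULL if pc precedes the first entry. */
const Annotation *annotation_for(const Annotation *tab, size_t n, uint32_t pc)
{
    size_t lo = 0, hi = n;   /* find the first entry with offset > pc */
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (tab[mid].offset <= pc)
            lo = mid + 1;
        else
            hi = mid;
    }
    return lo == 0 ? NULL : &tab[lo - 1];
}
```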


mike




Re: Bytecode metadata

2003-01-23 Thread Leopold Toetsch
Dan Sugalski wrote:


>Since it looks like it's time to extend the packfile format and the
>in-memory bytecode layout, this would be the time to start discussing
>metadata. What sorts of metadata do people think are useful to have in
>either the packfile (on disk) or in the bytecode (in memory).

I'm currently simplifying all the packfile routines. They still read
the old format, but the compat code is now centralized in one place.

The main change is now this structure:

struct PackFile_funcs {
    PackFile_Segment_new_func_t         new_seg;
    PackFile_Segment_destroy_func_t     destroy;
    PackFile_Segment_packed_size_func_t packed_size;
    PackFile_Segment_pack_func_t        pack;
    PackFile_Segment_unpack_func_t      unpack;
    PackFile_Segment_dump_func_t        dump;
};

All registered types define these functions to make pack/unpack/dump work
for their type.
Registered types are consecutively numbered; unknown types still get
unpacked or dumped:

typedef enum {
    PF_DIR_SEG,
    PF_UNKNOWN_SEG,
    PF_FIXUP_SEG,
    PF_CONST_SEG,
    PF_BYTEC_SEG,
    PF_DEBUG_SEG,

    PF_MAX_SEG
} pack_file_flags;

All packfile sizes/offsets are in opcode_t, not bytes, for simplicity,
though this might need a conversion (but we don't seem to handle
wordsize transforms now anyway).
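The register-or-fall-back behaviour can be sketched in C as a table of function pointers indexed by segment type, with unregistered types routed to the PF_UNKNOWN_SEG handlers. `SegFuncs` is a hypothetical, heavily simplified stand-in for struct PackFile_funcs (a single function with an invented signature), and the extra header word counted for constant segments is made up for the example.

```c
#include <stddef.h>
#include <stdint.h>

typedef int32_t opcode_t;   /* assumption for the sketch */

typedef enum {
    PF_DIR_SEG, PF_UNKNOWN_SEG, PF_FIXUP_SEG, PF_CONST_SEG,
    PF_BYTEC_SEG, PF_DEBUG_SEG,
    PF_MAX_SEG
} pack_file_flags;

/* Hypothetical, simplified stand-in for struct PackFile_funcs. */
typedef struct {
    const char *name;
    /* How many opcode_t words the segment occupies when packed. */
    size_t (*packed_size)(const opcode_t *seg, size_t n_ops);
} SegFuncs;

/* Unknown segments are treated as opaque words and copied through. */
static size_t raw_size(const opcode_t *seg, size_t n_ops) { (void)seg; return n_ops; }

/* Invented for the example: constants carry one extra count word. */
static size_t const_size(const opcode_t *seg, size_t n_ops) { (void)seg; return n_ops + 1; }

static SegFuncs registry[PF_MAX_SEG];

void register_type(pack_file_flags t, SegFuncs f) { registry[t] = f; }

/* Must run before funcs_for(); registers the fallback handlers. */
void register_defaults(void)
{
    register_type(PF_UNKNOWN_SEG, (SegFuncs){ "unknown", raw_size });
    register_type(PF_CONST_SEG,   (SegFuncs){ "constants", const_size });
}

/* Functions for a segment type, falling back to the PF_UNKNOWN_SEG
 * handlers so unregistered types can still be packed or dumped. */
const SegFuncs *funcs_for(unsigned type)
{
    if (type < (unsigned)PF_MAX_SEG && registry[type].packed_size)
        return &registry[type];
    return &registry[PF_UNKNOWN_SEG];
}

/* Sizes stay in opcode_t units; convert only at the I/O boundary. */
size_t ops_to_bytes(size_t n_ops) { return n_ops * sizeof(opcode_t); }
```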

leo



Re: Bytecode metadata

2003-01-23 Thread Leopold Toetsch
Dave Mitchell wrote:


>On Thu, Jan 23, 2003 at 09:21:45PM +0100, Juergen Boemmels wrote:
>
>> My current idea for the in memory format of the bytecode is this:
>
>I would strongly urge any file-based byte-code format to be arranged
>in such a way that it (or most of it) can simply be mmap-ed in (RO),
>analogously to executables.


How many mmaps can $arch have, per program and in total?
Could we hit some limits here if every module loaded gets (and stays)
mmap()ed?


>Dave.


leo





Re: Bytecode metadata

2003-01-23 Thread Dan Sugalski
At 7:23 AM +0100 1/24/03, Leopold Toetsch wrote:

>Dave Mitchell wrote:
>
>>On Thu, Jan 23, 2003 at 09:21:45PM +0100, Juergen Boemmels wrote:
>>
>>>My current idea for the in memory format of the bytecode is this:
>>
>>I would strongly urge any file-based byte-code format to be arranged
>>in such a way that it (or most of it) can simply be mmap-ed in (RO),
>>analogously to executables.
>
>How many mmaps can $arch have, per program and in total?
>Could we hit some limits here if every module loaded gets (and
>stays) mmap()ed?

We certainly could, which I suppose would argue for building 
sufficient smarts into the bytecode loader to switch to file reading if 
an mmap fails. It'll be slower, but working is generally a good thing.
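A minimal POSIX C sketch of that fallback: try a read-only mmap() first and, if it fails, read the file into a heap buffer instead. `LoadedFile`, `load_packfile` and `unload_packfile` are hypothetical names; error handling is kept to the bare minimum.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Either a read-only mapping or a malloc'd copy of the file. */
typedef struct {
    void  *data;
    size_t size;
    int    mapped;   /* 1 if data came from mmap, 0 if read into the heap */
} LoadedFile;

/* Try a read-only mmap first; if that fails (for example because a
 * mapping limit was hit), fall back to reading the whole file. */
int load_packfile(const char *path, LoadedFile *out)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    struct stat st;
    if (fstat(fd, &st) != 0 || st.st_size <= 0) {
        close(fd);
        return -1;
    }
    out->size = (size_t)st.st_size;

    void *p = mmap(NULL, out->size, PROT_READ, MAP_SHARED, fd, 0);
    if (p != MAP_FAILED) {
        out->data = p;
        out->mapped = 1;
        close(fd);                /* the mapping stays valid after close */
        return 0;
    }

    /* Slower path: plain reads into a heap buffer. */
    unsigned char *buf = malloc(out->size);
    if (buf == NULL) {
        close(fd);
        return -1;
    }
    size_t done = 0;
    while (done < out->size) {
        ssize_t n = read(fd, buf + done, out->size - done);
        if (n <= 0) {
            free(buf);
            close(fd);
            return -1;
        }
        done += (size_t)n;
    }
    close(fd);
    out->data = buf;
    out->mapped = 0;
    return 0;
}

/* Release the file through whichever path loaded it. */
void unload_packfile(LoadedFile *f)
{
    if (f->mapped)
        munmap(f->data, f->size);
    else
        free(f->data);
}
```

Callers never need to know which path was taken; `unload_packfile` checks the `mapped` flag.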
--
Dan

--"it's like this"---
Dan Sugalski  even samurai
[EMAIL PROTECTED] have teddy bears and even
  teddy bears get drunk


Odd JIT timings

2003-01-23 Thread Dan Sugalski
I just gave a run of examples/assembly/mops_p.pasm, getting some 
performance numbers. Here's an interesting timing.

  no jit:   24.9 seconds
  with jit: 33.6 seconds

This is... odd. And on PPC, FWIW, and I'm not sure if it happens on x86.

Someone care to check it out and poke around a bit?
--
Dan
