RE: Bytecode metadata
Dan Sugalski: # Since it looks like it's time to extend the packfile format and the # in-memory bytecode layout, this would be the time to start discussing # metadata. What sorts of metadata do people think are useful to have # in either the packfile (on disk) or in the bytecode (in memory). I do think that, whatever "native" (i.e. understood by Parrot) metadata we support, we *must* allow for extensibility, both for future native metadata and for third-party tools. Moreover, this must not be implemented with a special type of metadata block, or by using sequentially-increasing numbers. (The first means that any metadata we decide to add in the future will be slower than the metadata we add now; the second has problems with several third-party tools picking the same number.) --Brent Dax <[EMAIL PROTECTED]> @roles=map {"Parrot $_"} qw(embedding regexen Configure) >How do you "test" this 'God' to "prove" it is who it says it is? "If you're God, you know exactly what it would take to convince me. Do that." --Marc Fleury on alt.atheism
Re: [PATCH] nci.pmc mark routine
Steve Fink wrote: I'm confused by nci.pmc's mark() routine. It calls pobject_lives() on the ->cache.struct_val pointer. But in set_string_keyed(), that seems to be set to a pointer to a function, which is definitely not a PObj*. Yes, that's right. The ->data field, on the other hand, appears to be a PObj*. No the data field is either the statically built native calling thunk from nci.c:build_call_func() or a malloced thunk from JITs equivalent. ... Is the below patch correct? Almost :-) No, we don't have a PMC to mark finally, the line PObj_custom_mark_SET(SELF); is wrong (and makes the mark obsolete) Thanks for that analysis, leo
Re: Transferring control between code segments, eval, and suchlike things
Dan Sugalski wrote: Okay, since this has all come up, here's the scoop from a design perspective. Hard stuff did meet my printer at midnight, reading it onscreen twice didn't help ;-) First: Definition #0: A bytecode segment is a sequence of code, which is loaded into memory with no execution of such code interspersed. So all subs, modules, whatever loaded from zig files may be one code segment, *if* the runloop wasn't entered. Or: as soon as the code is running, loading additional bytecode puts this code into a different bytecode segment. Design Edict #1: Branches, which is any transfer of control that takes an offset, may *not* escape the current bytecode segment. Design Edict #2: Jumps may go anywhere. Design Edict #3: All destinations *must* be marked as such in the bytecode metadata segment. (I am officially nervous about this, as I can see a number of ways to subvert this for evil) I would define: Jumps may go to any location acquired per set_addr call or to branch tables. Jumping somewhere else may kill your dog. Jumping to a set_addr label is recognized already, jump tables may probably need some marker around them, so that the jump targets won't get killed by dead code elimination. I'm only keeping jumps (and their corresponding jsr) around for nostalgic reasons, and with the vague hope they may be useful. I'm not sure about this. They would be useful for a computed goto. s/compreg/compile/g for($below); The compreg op should compile the passed code ... Design Edict #7: the compreg opcode will execute the compiled code, calling in with parrot's calling conventions. If it should return something, then it had darned well better build it and return it. If the compile opcode has to execute the code, I would call it "eval". But: When compile and eval are separate stages, the HL might be able to pull the compile stage out of e.g. loops. So I think keeping compiling and evaling separate makes sense. Thanks for putting this together, leo
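The point about pulling the compile stage out of loops can be sketched roughly as follows. This is only an illustration: Python's built-in compile() and eval() stand in for separate "compile" and "invoke" steps, and the function names run_naive and run_hoisted are made up for the example. Nothing here is Parrot code.

```python
# Illustration only: Python's built-in compile() and eval() stand in for
# separate "compile" and "eval" stages. The helper names are hypothetical.

def run_naive(source, values):
    """Recompile the source on every iteration (compile fused with eval)."""
    results = []
    for x in values:
        code = compile(source, "<eval>", "eval")   # compile stage, repeated
        results.append(eval(code, {"x": x}))        # eval stage
    return results


def run_hoisted(source, values):
    """Compile once outside the loop, then only evaluate inside it."""
    code = compile(source, "<eval>", "eval")        # compile stage, once
    return [eval(code, {"x": x}) for x in values]    # eval stage only


naive = run_naive("x * 2 + 1", [1, 2, 3])
hoisted = run_hoisted("x * 2 + 1", [1, 2, 3])
print(naive, hoisted)
```

Both functions give the same results; the second only becomes possible when the compile step returns something that can be invoked later instead of executing immediately.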
Re: Transferring control between code segments, eval, and suchlike things
Benjamin Stuhl wrote: At 03:00 PM 1/22/2003 -0500, you wrote: ... Although, all this would seem to suggest that we'd need/want a special-purpose allocator for bytecode segments, since every sub has to fit within precisely one segment (and I know _I'd_ like to keep bytecode segments on their own memory pages, to e.g. maximize sharing on fork()). IMHO this is a big waste of memory - and running this page aligned code JITted doesn't buy anything. Design Edict #7: the compreg opcode will execute the compiled code, calling in with parrot's calling conventions. If it should return something, then it had darned well better build it and return it. How does this play with eval 'sub bar { change_foo(); } BEGIN { bar(); } (...stuff that depends on foo...)'; ? The semantics of BEGIN{} would seem to require that bar be installed into the symbol table immediately... but then how do we reproduce that if we're e.g. loading precompiled bytecode? Precompiled PBC and eval is a PITA. This issue seems to imply some extra parsing during load time and setting up symbols. I dunno yet, how to handle this. leo
Re: Parrot Developer Day(s)?
Dan Sugalski sent the following bits through the ether: > Also, I know that we do have people scattered all over the world, but > if someone wants to try and get a list of who's where, we may find > it's worth it to get groups of people together. (I don't, after all, > have to be involved... :) Right, I'm going to organise a sort of world map of Parrot developers. If you consider yourself a Parrot developer (ie write Parrot code, patches, assembler, whatever, the more the merrier!) please email me *privately* filling in the following form: Name: Email: Nickname: City: County/State: Country: Latitude: Longitude: The longitude and latitude should be +-DDD.DDD, that is decimal degrees. For example London, UK would be (51.500, -0.083). Useful resources for finding your longitude, latitude: http://www.getty.edu/research/tools/vocabulary/tgn/ http://www.ckdhr.com/dns-loc/finding.html Thanks! Leon -- Leon Brocard.http://www.astray.com/ scribot.http://www.scribot.com/ ... Famous last words - You and what army?
Re: [PATCH] nci.pmc mark routine
Steve Fink wrote: I'm confused by nci.pmc's mark() routine. Ok, nci's mark() is gone. But - what confuses me - this patch needs a "make progclean" for changes to take effect. Without it, default_mark gets called, because something, I don't know where, isn't recompiled. Argh: pmc->vtable->init or ->set_string_keyed() is a totally static thing. This implies that, after changing a vtable function, each file that uses a PMC of this type has to be rebuilt. Our dependencies don't reflect this case. leo
Re: Parrot Developer Day(s)?
On Thu, Jan 23, 2003 at 10:23:22AM +, Leon Brocard wrote: > Latitude: > Longitude: You forgot altitude. A proper ICBM block needs altitude :-) > Useful resources for finding your longitude, latitude: If you're in the UK you can get lat and long conversions from http://www.streetmap.co.uk/ Which is useful, as they let you look up locations by postcode, OS grid reference and lots of other things. I await to see how many people this is useful for. Nicholas Clark
Re: Objects, finally (try 1)
If memory serves me right, Erik Bågfors wrote: > > :-) Python basically requires that each step in the process be > > overridable. (1. look up attribute 2. call attribute, at least in > > `callmethod's case). would this be more of what you need ? obj.__dict__["foo"].__call__(); /me again shows up and says that the compiler designer can do this with ease ... Or in this case the interpreter designer can implement an ``InvocationExpression'' in anyway they want ... I think Jython would be a fine example of how this could be done (though speed suffers on a hash lookup without engine support ?). > Ruby needs to call the missing_method method (if I remember correctly). > So if "foo" doesn't exist, it would be good to be able to override > callmethods behavior and make it call missing_method. like I said , the compiler designer can put that explicitly in the generated code ... You don't actually need instructions to do that. Also the explicit generation might prove to be better to handle all the quirks future languages might encounter My interest here is to obtain a clear and fast way to call stuff for static compiled languages. :) Gopal -- The difference between insanity and genius is measured by success
Re: Objects, finally (try 1)
On Thu, 2003-01-23 at 15:46, Gopal V wrote: > > Ruby needs to call the missing_method method (if I remember correctly). > > So if "foo" doesn't exist, it would be good to be able to override > > callmethods behavior and make it call missing_method. > > like I said , the compiler designer can put that explicitly in the > generated code ... You don't actually need instructions to do that. > Also the explicit generation might prove to be better to handle all > the quirks future languages might encounter Sure, there is only one problem with that. I don't know if it's a real problem or not. But if I write a library in ruby that depends on the missing_method method it will not be usable from other languages, since those languages doesn't call missing_method if the method they try to call doesn't exist. Of course, in real life I don't think that's a problem because I haven't seen much use of missing_method. Also, having a instruction would be faster which of course is more fun :) > My interest here is to obtain a clear and fast way to call stuff for > static compiled languages. :) But the really interesting thing about parrot is that it is primarily made for very dynamic languages. Personally I think it's quite ok if C# is a little bit slower under parrot than under mono/dotgnu/MS.NET, as long as the dynamic languages are as fast or faster than they are now. /Erik -- Erik Bågfors | [EMAIL PROTECTED] Supporter of free software | GSM +46 733 279 273 fingerprint: A85B 95D3 D26B 296B 6C60 4F32 2C0B 693D 6E32
Re: Objects, finally (try 1)
If memory serves me right, Erik Bågfors wrote: > But if I write a library in ruby that depends on the missing_method > method it will not be usable from other languages, since those languages > doesn't call missing_method if the method they try to call doesn't > exist. Hmm... that's twisting language features with virtual machine instructions. Actually that's a gray area as far as I can see ... we'll have to go on with `n' number of methods for `n' languages for member resolution ... > Of course, in real life I don't think that's a problem because I haven't > seen much use of missing_method. Unfortunately I use a lot of __getattr__ for my python code (especially GUI) ... > Also, having a instruction would be faster which of course is more fun > :) Yes... But this not only makes it ugly (as an instruction) , but slow as well ? . Like you said only a small number of people use this feature , does it make sense to slow down the rest ? . And how does hacker ZZ add a new language with a different member lookup without getting his patches inside Parrot ?.. Anyway I'm not *against* implementing this , I'm just questioning the *need* to implement this ... Just a question for the philosophers ... Gopal -- The difference between insanity and genius is measured by success
Re: Bytecode metadata
On Wed, 22 Jan 2003 13:27:47 +, Dan Sugalski wrote: > Since it looks like it's time to extend the packfile format and the > in-memory bytecode layout, this would be the time to start discussing > metadata. What sorts of metadata do people think are useful to have > in either the packfile (on disk) or in the bytecode (in memory). Comments, if a disassembler is to be able to reconstruct the original source sufficiently well[1]. -- c 1) for the various values of "well" that include "semantic equivalence"
Re: Bytecode metadata
Hello, after quite a long time away from keyboard and fighting through a huge backlog of mail I'm (hopefully) back again. Dan Sugalski <[EMAIL PROTECTED]> writes: > Since it looks like it's time to extend the packfile format and the > in-memory bytecode layout, this would be the time to start discussing > metadata. What sorts of metadata do people think are useful to have in > either the packfile (on disk) or in the bytecode (in memory). My current idea for the in-memory format of the bytecode is this: One bytecode segment is a PMC consisting of three parts: the actual bytecode (a flat array of opcode_t), the associated constants, which don't fit into an opcode_t (floats and strings), and a scratch area for the JITed code. All other metadata will be attached as properties (or maybe as elements of an aggregate). This will be an easy way for future extension. The invoke call to this PMC would simply start the bytecode from the first instruction. To support inter-segment jumps a kind of symbol table is also necessary. All externally reachable codepoints need some special markup. This could be a special opcode extlabel_sc or an entry in a symbol table. Also needed is a fixup of the outgoing calls, either via modification of the bytecode or via a jumptable. Both have their pros and cons: The bytecode modification prohibits a read-only mmap of the data on disk and the fixup needs to be done at load time, but once this is done the impact on the runtime speed is minimal, whereas the jumptable is one extra indirection. But as stated somewhere else the typical inter-segment jump will be call/tailcall/callmethod/invoke, which are at least two indirections. The on-disk version is a matter of serializing and deserializing this PMC. > Keep in mind that parrot may be in the position where it has to ignore > or mistrust the metadata, so be really cautious with things you > propose as required.
Ok to summarize:
ByteCodeSegment = {
  bytecode => required;
  constants => only necessary if string or num constants;
  fixup => (or jumptable) only necessary if outgoing jumps;
  symbols => all possible incoming branchpoints, optional;
  JIT => will be filled when bytecode is invoked;
  source => surely optional;
  debuginfo => also optional;
  ...
}
bye boe. -- Juergen Boemmels[EMAIL PROTECTED] Fachbereich Physik Tel: ++49-(0)631-205-2817 Universitaet Kaiserslautern Fax: ++49-(0)631-205-3906 PGP Key fingerprint = 9F 56 54 3D 45 C1 32 6F 23 F6 C7 2F 85 93 DD 47
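The summary above could be modelled along these lines. This is a minimal sketch, not the proposed PMC: the field names follow the summary, but the Python types, the check_symbols() and link() helpers, and the choice of the jumptable variant (rather than patching the bytecode) are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ByteCodeSegment:
    """Hypothetical in-memory model of the summarized segment layout."""
    bytecode: List[int]                                            # required
    constants: List[object] = field(default_factory=list)         # string/num constants
    jumptable: Dict[str, object] = field(default_factory=dict)    # outgoing jumps
    symbols: Dict[str, int] = field(default_factory=dict)         # incoming entry points
    properties: Dict[str, object] = field(default_factory=dict)   # source, debuginfo, ...

    def check_symbols(self):
        """Range-check every exported entry point against the bytecode."""
        for name, offset in self.symbols.items():
            if not 0 <= offset < len(self.bytecode):
                raise ValueError(f"symbol {name!r} points outside the segment")


def link(segment, exporters):
    """Resolve outgoing jumps via the jumptable; the bytecode stays untouched."""
    for name in segment.jumptable:
        for other in exporters:
            if name in other.symbols:
                segment.jumptable[name] = (other, other.symbols[name])
                break
        else:
            raise LookupError(f"unresolved symbol {name!r}")


lib = ByteCodeSegment(bytecode=[0, 1, 2, 3], symbols={"bar": 2})
main = ByteCodeSegment(bytecode=[9, 9], jumptable={"bar": None})
lib.check_symbols()
link(main, [lib])
target_segment, target_offset = main.jumptable["bar"]
print(target_offset)
```

Because link() only fills in the jumptable, the bytecode array itself is never modified, which is the property that would keep a read-only mapping of the on-disk data possible.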
Re: Bytecode metadata
On Thu, Jan 23, 2003 at 09:21:45PM +0100, Juergen Boemmels wrote: > My current idea for the in memory format of the bytecode is this: I would strongly urge any file-based byte-code format to arranged in such a way that it (or most of it) can simply be mmap-ed in (RO), analogously to executables. This means that a Perl server that relies on a lot of modules, and which forks for each connection (imagine a Perl-based web server), doesn't consume acres of swap space just to have an in-memory image per Perl process, of all the modules. This is a real problem that's hitting me hard with Perl 5 in my day job. Dave. -- Any [programming] language that doesn't occasionally surprise the novice will pay for it by continually surprising the expert. - Larry Wall
Re: Bytecode metadata
--- chromatic <[EMAIL PROTECTED]> wrote: > On Wed, 22 Jan 2003 13:27:47 +, Dan Sugalski wrote: > > > Since it looks like it's time to extend the packfile format and the > > > in-memory bytecode layout, this would be the time to start > discussing > > metadata. What sorts of metadata do people think are useful to have > > > in either the packfile (on disk) or in the bytecode (in memory). > > Comments, if a disassembler is to be able to reconstruct the original > source > sufficiently well[1]. > > -- c > > 1) for the various values of "well" that include "semantic equivalence" Yes! Deparsing. that would be great. mike = James Michael DuPont http://introspector.sourceforge.net/ __ Do you Yahoo!? Yahoo! Mail Plus - Powerful. Affordable. Sign up now. http://mailplus.yahoo.com
Re: Transferring control between code segments, eval, and suchlike things
On Wed, Jan 22, 2003 at 03:00:37PM -0500, Dan Sugalski wrote: > Destinations. These are a pain, since if we can go anywhere then the > JIT has to do all sorts of nasty and unpleasant things to compensate, > and to make every op a valid destination. Yuck. Arbitrary jumps are not that difficult to deal with in the JIT. The JIT compiler can handle jumps to arbitrary addresses by falling back into the interpreter if the destination does not coincide with a previously known entry point, reentering the JIT code later at a safe point. pbc2c generated code does this. This way the JIT does not have to support making every instruction a safe branch destination. -- Jason
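A toy model of the fallback described above might look like this. It is a sketch under stated assumptions: the tiny "add"/"end" instruction set, the JIT_ENTRIES table and the make_block() helper are invented for the example, and plain Python closures stand in for JIT-generated code. It does not reflect how pbc2c or the real JIT is implemented.

```python
# Toy program: each "add" op adds its operand to an accumulator.
PROGRAM = [("add", 1), ("add", 10), ("add", 100), ("add", 1000), ("end",)]


def interpret_one(pc, acc):
    """Interpreter fallback: execute exactly one op, return (next_pc, acc)."""
    op = PROGRAM[pc]
    if op[0] == "add":
        return pc + 1, acc + op[1]
    return None, acc  # "end" stops execution


def make_block(start, stop):
    """Stand-in for JIT output: runs ops start..stop-1 in one go."""
    total = sum(PROGRAM[i][1] for i in range(start, stop))

    def block(acc):
        return stop, acc + total

    return block


# Only these offsets are known entry points into "JIT" code (hypothetical).
JIT_ENTRIES = {0: make_block(0, 2), 2: make_block(2, 4)}


def run_from(pc, acc=0):
    """Jump to any pc: use compiled code at known entries, else interpret."""
    trace = []
    while pc is not None:
        if pc in JIT_ENTRIES:
            trace.append(("jit", pc))
            pc, acc = JIT_ENTRIES[pc](acc)
        else:
            trace.append(("interp", pc))
            pc, acc = interpret_one(pc, acc)
    return acc, trace


result, trace = run_from(1)  # offset 1 is not a known entry point
print(result, trace)
```

A jump into the middle of a compiled block is interpreted one op at a time until execution reaches the next known entry point, at which point the compiled code takes over again.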
RE: Transferring control between code segments, eval, and suchlike things
> Design Edict #3: All destinations *must* be marked as such in the > bytecode metadata segment. (I am officially nervous about this, as I > can see a number of ways to subvert this for evil) [...] > Design Edict #4: Dan is officially iffy on jumps, but can see them as > useful for lower-level statically bound languages such as forth, > Scheme, or C. The combination of jumps and destinations would help the implementation of COME FROM. == Mark Leighton Fisher [EMAIL PROTECTED] Thomson, Inc. Indianapolis IN "we have tamed lightning and used it to teach sand to think"
Re: Bytecode metadata
--- Dave Mitchell <[EMAIL PROTECTED]> wrote: > On Thu, Jan 23, 2003 at 09:21:45PM +0100, Juergen Boemmels wrote: > > My current idea for the in memory format of the bytecode is this: > > I would strongly urge any file-based byte-code format to arranged > in such a way that it (or most of it) can simply be mmap-ed in (RO), > analogously to executables. > > This means that a Perl server that relies on a lot of modules, and > which > forks for each connection (imagine a Perl-based web server), doesn't > consume acres of swap space just to have an in-memory image per Perl > process, of all the modules. sounds good. could that be seen as similar to shared memory communication with the compile, via mem-mapped file interfaces? mike > This is a real problem that's hitting me hard with Perl 5 in my day > job. > > Dave. > > -- > Any [programming] language that doesn't occasionally surprise the > novice will pay for it by continually surprising the expert. > - Larry Wall = James Michael DuPont http://introspector.sourceforge.net/
Re: Transferring control between code segments, eval, and suchlike things
Dan Sugalski <[EMAIL PROTECTED]> writes: > Okay, since this has all come up, here's the scoop from a design perspective. > > First, the branch opcodes (branch, bsr, and the conditionals) are all > meant for movement within a segment of bytecode. They are *not* > supposed to leave a segment. To do so was arguably a bad idea, now > it's officially an error. If you need to do so, branch to an op that > can transfer across boundaries. > > > Design Edict #1: Branches, which is any transfer of control that takes > an offset, may *not* escape the current bytecode segment. Okay with that. > Next, jumps. Jumps take absolute addresses, so either need fixup at > load time (blech), are only valid in dynamically generated code (okay, > but limiting), or can only jump to values in registers (that's > fine). Jumps aren't a problem in general. > > > Design Edict #2: Jumps may go anywhere. In the sense that every possible target (via #3) can be reached with a jump, but bad things may happen if the target isn't valid. > Destinations. These are a pain, since if we can go anywhere then the > JIT has to do all sorts of nasty and unpleasant things to compensate, > and to make every op a valid destination. Yuck. > > > Design Edict #3: All destinations *must* be marked as such in the > bytecode metadata segment. (I am officially nervous about this, as I > can see a number of ways to subvert this for evil) This is no more or less evil than branch -1. The destinations can be range-checked at load time, the assembler will hopefully emit these offsets correctly, and they will be read-only after compilation. > I'm only keeping jumps (and their corresponding jsr) around for > nostalgic reasons, and with the vague hope they may be useful. I'm not > sure about this. > > > Design Edict #4: Dan is officially iffy on jumps, but can see them as > useful for lower-level statically bound languages such as forth, > Scheme, or C.
> > > That leads us to > > Design Edict #5: Dan will accommodate semantics for languages outside > the core set (perl, python, ruby) only if they don't compromise > performance for the core set. > > > Calling actual routines--subs, methods, functions, whatever--at the > high level isn't done with branches or jumps. It is, instead, done > with the call series of ops. (call, callmeth, callcc, tailcall, > tailcallmeth, tailcallcc (though that one makes my head hurt), invoke) > These are specifically for calling code that's potentially in other > segments, and to call into them at fixed points. I think these need to > be hashed out a bit to make them more JIT-friendly, but they're the > primary transfer destination point These calls are always jumps or jsr in disguise. In the end they always do a goto ADDRESS(something). This means that every sub/method/continuation must be marked per #3. > Design Edict #6: The first op in a sub is always a valid > jump/branch/control transfer destination This is essentially #3. > Now. Eval. The compile opcode going in is phenomenally cool (thanks, > Leo!) but has pointed out some holes in the semantics. I got handwavey > and, well, it shows. No cookie for me. > > > The compreg op should compile the passed code in the language that is > indicated and should load that bytecode into the current > interpreter. That means that if there are any symbols that get > installed because someone's defined a sub then, well, they should get > installed into the interpreter's symbol tables. The compile would not install the symbols in the interpreter's symbol table; it would store them somewhere in the bytecode metadata. The eval should install them in the interpreter's symbol table. The problem really starts if BEGIN {...} blocks are used, because they will be evaluated after the block is compiled but before the whole compile is finished. > Compiled code is an interesting thing. 
In some cases it should return > a sub PMC, in some cases it should execute and return a value, and in > some cases it should install a bunch of stuff in a symbol table and > then return a value. These correspond to: > > > > eval "print 12"; > > $foo = eval "sub bar{return 1;}"; > > require foo.pm; > > respectively. It's sort of a mixed bag, and unfortunately we can't > count on the code doing the compilation to properly handle the > semantics of the language being compiled. So... > > > Design Edict #7: the compreg opcode will execute the compiled code, > calling in with parrot's calling conventions. If it should return > something, then it had darned well better build it and return it. I find it better to leave compile and eval separate. The compile opcode should simply return a bytecode PMC which can then be invoked sometime later. > Oh, and: > > Design Edict #8: compreg is prototyped. It takes a single string and > must return a single PMC. The compiler may cheat as need be. (No need > to check and see if it returned a string, or an int) It should return a bytecode segment. > Yes, this
Re: Bytecode metadata
At 10:29 PM -0800 1/22/03, James Michael DuPont wrote: You will probably think that this is overkill for parrot, Why yes, yes I do. On the other hand, when we hand people bazookas to deal with their fly problems, we often find they start in on the elephant problems as well. The proposal in general interests me--it looks like a general annotation system we can attach to the bytecode. (I admit, I haven't read the page you pointed at) I will admit, though, that I was thinking more about metadata that the engine could use itself, or would provide to programs running on it, but the scheme you've outlined may be useful for that. 'Swhat I get for asking a too-general question. :) -- Dan --"it's like this"--- Dan Sugalski even samurai [EMAIL PROTECTED] have teddy bears and even teddy bears get drunk
Re: Objects, finally (try 1)
At 10:10 PM +0530 1/23/03, Gopal V wrote: If memory serves me right, Erik Bågfors wrote: But if I write a library in ruby that depends on the missing_method method it will not be usable from other languages, since those languages doesn't call missing_method if the method they try to call doesn't exist. Hmm... that's twisting language features with virtual machine instructions. Actually that's a gray area as far as I can see ... we'll have to go on with `n' number of methods for `n' languages for member resolution ... Grey? Heck, take a step back, lots of parrot is done up in neon paisley. :-P > Of course, in real life I don't think that's a problem because I haven't seen much use of missing_method. Unfortunately I use a lot of __getattr__ for my python code (especially GUI) ... Perl also makes heavy use of this in some of the more interesting modules. > Also, having a instruction would be faster which of course is more fun :) Yes... But this not only makes it ugly (as an instruction) , but slow as well ? . Like you said only a small number of people use this feature , does it make sense to slow down the rest ? . And how does hacker ZZ add a new language with a different member lookup without getting his patches inside Parrot ?.. That's why the smarts isn't in the opcode function, but rather in the vtable method lookup function. That way you only pay the cost if the feature is used. This can also work to the advantage of languages that need this feature, as it means that classes that don't have to have a fallback method lookup can use a faster lookup function that doesn't need to do a second trace to look for missing functions. Anyway I'm not *against* implementing this , I'm just questioning the *need* to implement this ... Just a question for the philosophers ... The philosophers, alas, got drunk and started fighting over the one fork at the table. Very messy. 
-- Dan --"it's like this"--- Dan Sugalski even samurai [EMAIL PROTECTED] have teddy bears and even teddy bears get drunk
RE: Bytecode metadata
At 12:10 AM -0800 1/23/03, Brent Dax wrote: Dan Sugalski: # Since it looks like it's time to extend the packfile format and the # in-memory bytecode layout, this would be the time to start discussing # metadata. What sorts of metadata do people think are useful to have # in either the packfile (on disk) or in the bytecode (in memory). I do think that, whatever "native" (i.e. understood by Parrot) metadata we support, we *must* allow for extensibility, both for future native metadata and for third-party tools. "Must" is an awfully strong word, there. We don't really "must" do anything, though I do realize the feature is useful, hence my question. Moreover, this must not be implemented with a special type of metadata block, or by using sequentially-increasing numbers. (The first means that any metadata we decide to add in the future will be slower than the metadata we add now; the second has problems with several third-party tools picking the same number.) I'm afraid extensible metadata is going to live in its own chunk unless someone can come up with a way to embed it without penalty. (And I'm generally considering using separate chunks for the metadata the engine does understand) -- Dan --"it's like this"--- Dan Sugalski even samurai [EMAIL PROTECTED] have teddy bears and even teddy bears get drunk
Re: Objects, finally (try 1)
At 8:42 AM +0100 1/23/03, Erik Bågfors wrote: On Wed, 2003-01-22 at 19:46, Christopher Armstrong wrote: On Wed, Jan 15, 2003 at 01:57:28AM -0500, Dan Sugalski wrote: > At 9:37 PM -0500 1/14/03, Christopher Armstrong wrote: > >But who knows, maybe it could be made modular enough (i.e., more > >interface-oriented?) to allow the best of both worlds -- I'm far too > >novice wrt Parrot to figure out what it'd look like, unfortunately. > > It'll actually look like what we have now. If you can come up with > something more abstract than: > > callmethod P1, "foo" > > that delegates the calling of the foo method to the method dispatch > vtable entry for the object in P1, well... gimme, I want it. :) Just curious. Exactly how overridable is that `callmethod'? I don't really know anything about the vtable stuff in Parrot, but is it possible to totally delegate the lookup/calling of "foo" to a function that's bound somehow to P1? Or does the "foo" entry have to exist in the vtable already? Sorry for the naive question :-) Oh, and if you just want to point me at a source file, I guess I can try reading it :-) Python basically requires that each step in the process be overridable. (1. look up attribute 2. call attribute, at least in `callmethod's case). Ruby needs to call the missing_method method (if I remember correctly). So if "foo" doesn't exist, it would be good to be able to override callmethods behavior and make it call missing_method. This'll be core functionality if languages want to use it. Perl has a similar function, AUTOLOAD, that gets called if you make a nonexistent method call. It sounds like we need generic pre and post method call handler functionality as well, which should be interesting to design. -- Dan --"it's like this"--- Dan Sugalski even samurai [EMAIL PROTECTED] have teddy bears and even teddy bears get drunk
Re: Objects, finally (try 1)
At 1:46 PM -0500 1/22/03, Christopher Armstrong wrote: On Wed, Jan 15, 2003 at 01:57:28AM -0500, Dan Sugalski wrote: At 9:37 PM -0500 1/14/03, Christopher Armstrong wrote: >But who knows, maybe it could be made modular enough (i.e., more >interface-oriented?) to allow the best of both worlds -- I'm far too >novice wrt Parrot to figure out what it'd look like, unfortunately. It'll actually look like what we have now. If you can come up with something more abstract than: callmethod P1, "foo" that delegates the calling of the foo method to the method dispatch vtable entry for the object in P1, well... gimme, I want it. :) Just curious. Exactly how overridable is that `callmethod'? Completely. It ultimately delegates finding the method to the PMC via its vtable, so you can then do whatever you want. We're going to provide some convenience functions and predefined functionality so everyone doesn't have to reimplement the same stuff over and over. Delegating to the PMC also means that perl objects or ruby objects will behave the way they should regardless of what language's code is using them. I'm not really expecting too much in the way of different behaviour between the languages, but the differences that are there should be respected. -- Dan --"it's like this"--- Dan Sugalski even samurai [EMAIL PROTECTED] have teddy bears and even teddy bears get drunk
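The delegation described above can be sketched like this. It is only an illustration of the idea that the lookup smarts live with the object and not in the opcode: the SimpleVTable and FallbackVTable classes, the find_method() name and the callmethod() helper are hypothetical and are not Parrot's actual vtable interface.

```python
class SimpleVTable:
    """Fast path: one lookup, no fallback search."""

    def __init__(self, methods):
        self.methods = methods

    def find_method(self, name):
        return self.methods.get(name)


class FallbackVTable(SimpleVTable):
    """On a miss, look for a catch-all method (AUTOLOAD-like, hypothetical)."""

    def __init__(self, methods, fallback_name):
        super().__init__(methods)
        self.fallback_name = fallback_name

    def find_method(self, name):
        method = self.methods.get(name)
        if method is not None:
            return method
        fallback = self.methods.get(self.fallback_name)
        if fallback is None:
            return None
        # Pass the missing method's name through to the catch-all.
        return lambda obj, *args: fallback(obj, name, *args)


class Obj:
    def __init__(self, vtable):
        self.vtable = vtable


def callmethod(obj, name, *args):
    """The 'opcode' has no smarts: it just asks the object's vtable."""
    method = obj.vtable.find_method(name)
    if method is None:
        raise AttributeError(name)
    return method(obj, *args)


plain = Obj(SimpleVTable({"foo": lambda self: "foo called"}))
perlish = Obj(FallbackVTable(
    {"AUTOLOAD": lambda self, name, *args: f"autoloaded {name}"}, "AUTOLOAD"))
print(callmethod(plain, "foo"))
print(callmethod(perlish, "bar"))
```

Only objects that carry the fallback-aware lookup pay for the second search, and the caller's language plays no part in how the method is resolved.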
Re: Objects, finally (try 1)
At 8:16 PM +0530 1/23/03, Gopal V wrote: If memory serves me right, Erik Bågfors wrote: > Ruby needs to call the missing_method method (if I remember correctly). So if "foo" doesn't exist, it would be good to be able to override callmethods behavior and make it call missing_method. like I said , the compiler designer can put that explicitly in the generated code ... You don't actually need instructions to do that. Also the explicit generation might prove to be better to handle all the quirks future languages might encounter Or hiding it in the objects themselves, so we can make sure the expense of generality is only in place for those objects or classes that need it, rather than for everyone. My interest here is to obtain a clear and fast way to call stuff for static compiled languages. :) Fair enough, though that would argue for embedding the functionality in the objects and not the generated code, as AUTOLOAD searching should be done for a method call on a perl object regardless of whether the language making the method call supports it. If your C# code calls a method on a perl object it gets, that method resolution should be done with perl semantics, not C# semantics. -- Dan --"it's like this"--- Dan Sugalski even samurai [EMAIL PROTECTED] have teddy bears and even teddy bears get drunk
Re: Bytecode metadata
Dave Mitchell <[EMAIL PROTECTED]> writes:

> On Thu, Jan 23, 2003 at 09:21:45PM +0100, Juergen Boemmels wrote:
> > My current idea for the in memory format of the bytecode is this:
>
> I would strongly urge any file-based byte-code format to be arranged
> in such a way that it (or most of it) can simply be mmap-ed in (RO),
> analogously to executables.
>
> This means that a Perl server that relies on a lot of modules, and which
> forks for each connection (imagine a Perl-based web server), doesn't
> consume acres of swap space just to have an in-memory image per Perl
> process, of all the modules.

This might be possible if the byteorder, wordsize, defaultencoding
etc. are the same in the file on disk and the host.

bye
boe
--
Juergen Boemmels                        [EMAIL PROTECTED]
Fachbereich Physik                      Tel: ++49-(0)631-205-2817
Universitaet Kaiserslautern             Fax: ++49-(0)631-205-3906
PGP Key fingerprint = 9F 56 54 3D 45 C1 32 6F  23 F6 C7 2F 85 93 DD 47
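A rough sketch of the compatibility condition mentioned above, in C. The `FileHeader` fields, the `opcode_t` typedef and `can_map_directly` are assumptions made for the example; the real packfile header may record this differently, and the default-encoding check is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef long opcode_t;   /* assumption: stand-in for Parrot's opcode_t */

/* Hypothetical header fields; the real packfile header may differ. */
typedef struct {
    uint8_t wordsize;    /* sizeof(opcode_t) on the machine that wrote the file */
    uint8_t byteorder;   /* 0 = little-endian, 1 = big-endian */
} FileHeader;

/* Detect host byte order by looking at the first byte of a known value. */
uint8_t host_byteorder(void) {
    const uint16_t probe = 1;
    return *(const uint8_t *)&probe == 1 ? 0 : 1;
}

/* Returns 1 if the file could be mapped read-only and used in place,
 * 0 if it would have to be converted while loading. */
int can_map_directly(const FileHeader *h) {
    assert(h != NULL);
    return h->wordsize == sizeof(opcode_t) && h->byteorder == host_byteorder();
}
```

A loader built this way would only take the mmap path when `can_map_directly` returns 1 and would otherwise read and convert the file.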
Re: Bytecode metadata
At 10:31 PM +0100 1/23/03, Juergen Boemmels wrote:
>Dave Mitchell <[EMAIL PROTECTED]> writes:
>
>> On Thu, Jan 23, 2003 at 09:21:45PM +0100, Juergen Boemmels wrote:
>> > My current idea for the in memory format of the bytecode is this:
>>
>> I would strongly urge any file-based byte-code format to be arranged
>> in such a way that it (or most of it) can simply be mmap-ed in (RO),
>> analogously to executables.
>>
>> This means that a Perl server that relies on a lot of modules, and
>> which forks for each connection (imagine a Perl-based web server),
>> doesn't consume acres of swap space just to have an in-memory image
>> per Perl process, of all the modules.
>
>This might be possible if the byteorder, wordsize, defaultencoding
>etc. are the same in the file on disk and the host.

Which will generally be the case, I expect. Tell a sysadmin that they
can reduce the memory footprint of mod_parrot by 50% by running a
utility (that we provide in the parrot kit) over the library and I
expect you'll see smoke from the keyboard as he/she whips off the
command at supersonic speeds... :)
--
Dan
Re: Bytecode metadata
Dan Sugalski <[EMAIL PROTECTED]> writes:

> >This might be possible if the byteorder, wordsize, defaultencoding
> >etc. are the same in the file on disk and the host.
>
> Which will generally be the case, I expect. Tell a sysadmin that they
> can reduce the memory footprint of mod_parrot by 50% by running a
> utility (that we provide in the parrot kit) over the library and I
> expect you'll see smoke from the keyboard as he/she whips off the
> command at supersonic speeds... :)

It might even be possible to dump the jitted code. This would speed up
startup. Then strip the bytecode to reduce the size of the file and
TADA: Yet another new binary format.

I'm really not sure if I'm serious here
boe
--
Juergen Boemmels
Re: Bytecode metadata
At 8:39 PM + 1/23/03, Dave Mitchell wrote:
>On Thu, Jan 23, 2003 at 09:21:45PM +0100, Juergen Boemmels wrote:
>> My current idea for the in memory format of the bytecode is this:
>
>I would strongly urge any file-based byte-code format to be arranged
>in such a way that it (or most of it) can simply be mmap-ed in (RO),
>analogously to executables.

This is the way the bytecode currently works, and we will *not* switch
to any bytecode format that doesn't at least allow the executable code
to be mmapped in.
--
Dan
RE: Bytecode metadata
Dan Sugalski:
# At 12:10 AM -0800 1/23/03, Brent Dax wrote:
# >Dan Sugalski:
# ># Since it looks like it's time to extend the packfile format and the
# ># in-memory bytecode layout, this would be the time to start discussing
# ># metadata. What sorts of metadata do people think are useful to have
# ># in either the packfile (on disk) or in the bytecode (in memory).
# >
# >I do think that, whatever "native" (i.e. understood by Parrot) metadata
# >we support, we *must* allow for extensibility, both for future native
# >metadata and for third-party tools.
#
# "Must" is an awfully strong word, there. We don't really "must" do
# anything, though I do realize the feature is useful, hence my
# question.

A strong word for a strong opinion. :^) Besides, I did qualify it with
an "I do think", which is another way to say IMO.

# > Moreover, this must not be
# >implemented with a special type of metadata block, or by using
# >sequentially-increasing numbers. (The first means that any metadata we
# >decide to add in the future will be slower than the metadata we add
# >now; the second has problems with several third-party tools picking
# >the same number.)
#
# I'm afraid extensible metadata is going to live in its own chunk
# unless someone can come up with a way to embed it without penalty.
# (And I'm generally considering using separate chunks for the metadata
# the engine does understand)

Are you expecting to have chunk type determined by order? If so, what
will you do if a future restructuring means you either don't need chunk
type X or you need a new, highly incompatible version? Will you leave
in an "empty" ghost chunk?

I would suggest (roughly) the following format for a chunk:

    TYPE:    One 32-bit number
    VERSION: One 32-bit number; suggested usage is as four eight-bit
             components
    SIZE:    One 32-bit number of bytes (or maybe 64-bit)
    DATA:    arbitrary length

For C-heads, think of it like this:

    struct Chunk {
        opcode_t type;
        opcode_t version;
        opcode_t size;
        char     data[];    /* size bytes of payload */
    };

Type IDs less than 256 would be reserved to Parrot (so we have plenty of
room for future expansion); all third-party tools would use some sort of
cryptographic checksum of the tool's name and the data structure's name,
making sure (of course) that their type ID was greater than 255.

If there's a directory of some sort, it should record the type ID and
the offset to the beginning of the chunk. This should allow for a
fairly quick lookup by type. If you think that there might be a demand
for multiple instances of the same type of metadata, you may want to add
a chunk ID of some sort.

--Brent Dax <[EMAIL PROTECTED]>
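One way the type ID and version fields proposed above might be computed, sketched in C. This is only an illustration: the function names are made up, and FNV-1a is used as a short stand-in hash. It is NOT cryptographic, so a real tool following the proposal would substitute a proper digest and truncate it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define RESERVED_TYPE_IDS 256u  /* IDs 0..255 belong to Parrot in the proposal */

/* FNV-1a, 32-bit. NOT cryptographic; used here only as a short stand-in. */
static uint32_t fnv1a(uint32_t h, const char *s) {
    assert(s != NULL);
    while (*s != '\0') {
        h ^= (uint8_t)*s++;
        h *= 16777619u;
    }
    return h;
}

/* Derive a third-party type ID from the tool name and the structure name. */
uint32_t third_party_type_id(const char *tool, const char *structure) {
    uint32_t h = fnv1a(2166136261u, tool);
    h = fnv1a(h, "/");           /* separator between the two names */
    h = fnv1a(h, structure);
    return h | RESERVED_TYPE_IDS;   /* setting bit 8 forces the ID to be >= 256 */
}

/* Pack a version as four eight-bit components. */
uint32_t pack_version(uint8_t major, uint8_t minor, uint8_t patch, uint8_t build) {
    return ((uint32_t)major << 24) | ((uint32_t)minor << 16) |
           ((uint32_t)patch << 8)  |  (uint32_t)build;
}
```

Two tools can still collide with a 32-bit ID, just far less often than with hand-picked sequential numbers; that trade-off is the reason for hashing names at all.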
Re: Bytecode metadata
At 11:48 AM -0800 1/23/03, chromatic wrote:
>On Wed, 22 Jan 2003 13:27:47 +, Dan Sugalski wrote:
>
>> Since it looks like it's time to extend the packfile format and the
>> in-memory bytecode layout, this would be the time to start discussing
>> metadata. What sorts of metadata do people think are useful to have
>> in either the packfile (on disk) or in the bytecode (in memory).
>
>Comments, if a disassembler is to be able to reconstruct the original
>source sufficiently well[1].

Noted. I can see problems with multiline comments across multiline
code, but that's probably rare enough to not really care much about.
--
Dan
Re: Bytecode metadata
--- Juergen Boemmels <[EMAIL PROTECTED]> wrote:
> Hello,
>
> after quite a long time away from keyboard and fighting through a huge
> backlog of mail I'm (hopefully) back again.
>
> Dan Sugalski <[EMAIL PROTECTED]> writes:
>
> > Since it looks like it's time to extend the packfile format and the
> > in-memory bytecode layout, this would be the time to start discussing
> > metadata. What sorts of metadata do people think are useful to have in
> > either the packfile (on disk) or in the bytecode (in memory).
>
> My current idea for the in memory format of the bytecode is this:
> One bytecode segment is a PMC consisting of three parts: the actual
> bytecode (a flat array of opcode_t), the associated constants, which
> don't fit into an opcode_t (floats and strings), and a scratch area
> for the JITed code. All other metadata will be attached as
> properties (or maybe as elements of an aggregate). This will be an
> easy way for future extension. The invoke call to this PMC would
> simply start the bytecode from the first instruction.
>
> To support inter-segment jumps a kind of symbol table is also
> necessary. All externally reachable codepoints need some special
> markup. This could be a special opcode extlabel_sc or an entry in a
> symbol table. Also needed is a fixup of the outgoing calls, either via
> modification of the bytecode or via a jumptable. Both have their pros
> and cons: The bytecode modification prohibits a readonly mmap of the
> data on disk and the fixup needs to be done at load-time, but once this
> is done the impact on the runtime speed is minimal, whereas the
> jumptable is one extra indirection. But as stated somewhere else the
> typical inter-segment jump will be call/tailcall/callmethod/invoke,
> which are at least two indirections.
>
> The on disk version is a matter of serializing and deserializing this
> PMC.
>
> > Keep in mind that parrot may be in the position where it has to ignore
> > or mistrust the metadata, so be really cautious with things you
> > propose as required.
>
> Ok to summarize:
>
> ByteCodeSegment = {
>   bytecode  => required;
>   constants => only necessary if string or num constants;
>   fixup     => (or jumptable) only necessary if outgoing jumps;
>   symbols   => all possible incoming branchpoints, optional;
>   JIT       => will be filled when bytecode is invoked;
>
>   source    => surely optional;
>   debuginfo => also optional;
>   ...
> }

I LIKE IT.

Bytecodes have a type? Each bytecode has meta-data?

Here are the metadata I have collected from the parrot source code so
far. It should be a set of predicates to define all the other
meta-data needed.

First, this is the core meta-data for storing perl code, in order of
simplicity:

identifier_node
    Name of things
boolean_type, integer_type, real_type
    types of things that are simple

all *_decls have a type that is a type_*
all *_decls have a name that is a type_decl or identifier_node

const_decl
    Constant values
var_decl
    variable values

The rest of the more complex types need a tree_list:

tree_list
function_decl, parm_decl            # list of
array_type integer_cst,             # list of
enumeral_type integer_cst           # list of
record_type, union_type, field_decl # list of

# a void is very special
void_type

The following are derived types:

pointer_type, reference_type
# function types allow for linkage
function_type, type_*               # we have a list of
# here the user defines its own
type_decl
# this is a commonly defined user type
complex_type,

=====
James Michael DuPont
http://introspector.sourceforge.net/
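To make the jumptable side of the fixup trade-off concrete, here is a small C sketch. The struct is a cut-down, hypothetical version of the summary above (only `bytecode` and a jumptable), and `bind_jump` / `resolve_jump` are invented names. The idea shown: load-time fixup writes only into the table, so the bytecode can stay in a read-only mapping, at the price of one extra lookup per outgoing jump.

```c
#include <assert.h>
#include <stddef.h>

typedef long opcode_t;               /* assumption: stand-in for Parrot's opcode_t */

/* Hypothetical segment layout, reduced to the two parts discussed here. */
typedef struct {
    const opcode_t  *bytecode;       /* required; may live in a read-only mapping */
    size_t           bytecode_len;
    const opcode_t **jumptable;      /* optional; writable, filled at load time */
    size_t           jumptable_len;
} ByteCodeSegment;

/* Load-time fixup: bind table slot `slot` to an address in another segment. */
int bind_jump(ByteCodeSegment *seg, size_t slot, const opcode_t *target) {
    assert(seg != NULL);
    if (seg->jumptable == NULL || slot >= seg->jumptable_len)
        return 0;
    seg->jumptable[slot] = target;   /* the bytecode itself is never modified */
    return 1;
}

/* Run time: one extra indirection through the table. */
const opcode_t *resolve_jump(const ByteCodeSegment *seg, size_t slot) {
    assert(seg != NULL);
    if (seg->jumptable == NULL || slot >= seg->jumptable_len)
        return NULL;
    return seg->jumptable[slot];
}
```

The alternative (patching target addresses straight into the bytecode) would drop the lookup in `resolve_jump` but would need a private, writable copy of the code in every process.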
RE: Bytecode metadata
At 2:48 PM -0800 1/23/03, Brent Dax wrote:
>Dan Sugalski:
># At 12:10 AM -0800 1/23/03, Brent Dax wrote:
># >Dan Sugalski:
># ># Since it looks like it's time to extend the packfile format and the
># ># in-memory bytecode layout, this would be the time to start discussing
># ># metadata. What sorts of metadata do people think are useful to have
># ># in either the packfile (on disk) or in the bytecode (in memory).
># >
># >I do think that, whatever "native" (i.e. understood by Parrot) metadata
># >we support, we *must* allow for extensibility, both for future native
># >metadata and for third-party tools.
>#
># "Must" is an awfully strong word, there. We don't really "must" do
># anything, though I do realize the feature is useful, hence my
># question.
>
>A strong word for a strong opinion. :^) Besides, I did qualify it with
>an "I do think", which is another way to say IMO.

Heh. I try and avoid the absolute statements. This is all engineering,
and engineering is applied economics--you juggle features and make
compromises to get the thing that meets your needs as best as possible
at a cost you can manage. Allowing extensibility is Really Keen, but
has its associated cost that has to be balanced against everything
else.

Having said that, I think we can do this, but I want a better feel for
what we need, what we want, and what it'll cost before we make a
decision.

>Are you expecting to have chunk type determined by order?

Yes and no. Yes in that I want the first few chunks, the ones that are
required, to be at fixed offsets. Following that will be a directory,
and from there we can index off to wherever we need to.
--
Dan
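The "directory, and from there we can index off" part might look roughly like this in C. The `DirEntry` layout and `find_chunk` are assumptions for the sake of the example, not the packfile format itself; the required chunks at fixed offsets would be read directly and never go through this lookup.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical directory entry: type ID plus offset of the chunk in the file. */
typedef struct {
    uint32_t type;
    uint32_t offset;    /* measured from the start of the file */
} DirEntry;

/* Linear scan; returns the chunk offset, or 0 if no chunk has that type.
 * Offset 0 can mean "absent" because the file header occupies it. */
uint32_t find_chunk(const DirEntry *dir, size_t count, uint32_t type) {
    assert(dir != NULL || count == 0);
    for (size_t i = 0; i < count; i++) {
        if (dir[i].type == type)
            return dir[i].offset;
    }
    return 0;
}
```

A linear scan is probably fine for a handful of chunks; a loader expecting many third-party chunks could keep the directory sorted by type and use a binary search instead.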
RE: Bytecode metadata
--- Brent Dax <[EMAIL PROTECTED]> wrote:
> Dan Sugalski:
> # At 12:10 AM -0800 1/23/03, Brent Dax wrote:
> # >Dan Sugalski:
> # ># Since it looks like it's time to extend the packfile format and the
> # ># in-memory bytecode layout, this would be the time to start discussing
> # ># metadata. What sorts of metadata do people think are useful to have
> # ># in either the packfile (on disk) or in the bytecode (in memory).
> # >
> # >I do think that, whatever "native" (i.e. understood by Parrot) metadata
> # >we support, we *must* allow for extensibility, both for future native
> # >metadata and for third-party tools.
> #
> # "Must" is an awfully strong word, there. We don't really "must" do
> # anything, though I do realize the feature is useful, hence my
> # question.
>
> A strong word for a strong opinion. :^) Besides, I did qualify it with
> an "I do think", which is another way to say IMO.
>
> # > Moreover, this must not be
> # >implemented with a special type of metadata block, or by using
> # >sequentially-increasing numbers. (The first means that any metadata we
> # >decide to add in the future will be slower than the metadata we add
> # >now; the second has problems with several third-party tools picking
> # >the same number.)
> #
> # I'm afraid extensible metadata is going to live in its own chunk
> # unless someone can come up with a way to embed it without penalty.
> # (And I'm generally considering using separate chunks for the metadata
> # the engine does understand)
>
> Are you expecting to have chunk type determined by order? If so, what
> will you do if a future restructuring means you either don't need chunk
> type X or you need a new, highly incompatible version? Will you leave
> in an "empty" ghost chunk?
>
> I would suggest (roughly) the following format for a chunk:
>
>     TYPE:    One 32-bit number
>     VERSION: One 32-bit number; suggested usage is as four eight-bit
>              components
>     SIZE:    One 32-bit number of bytes (or maybe 64-bit)
>     DATA:    arbitrary length
>
> For C-heads, think of it like this:
>
>     struct Chunk {
>         opcode_t type;
>         opcode_t version;
>         opcode_t size;
>         char     data[];
>     };
>
> Type IDs less than 256 would be reserved to Parrot (so we have plenty of
> room for future expansion); all third-party tools would use some sort of
> cryptographic checksum of the tool's name and the data structure's name,
> making sure (of course) that their type ID was greater than 255.
>
> If there's a directory of some sort, it should record the type ID and
> the offset to the beginning of the chunk. This should allow for a
> fairly quick lookup by type. If you think that there might be a demand
> for multiple instances of the same type of metadata, you may want to add
> a chunk ID of some sort.

Cool! That means we can use opcodes to store the introspector data!

We need to have the meta data paired with the opcodes. Basically this
means storing the source code in some AST form in the meta-data for
full reflection and introspection on the expression level.

mike
Re: Bytecode metadata
Dan Sugalski wrote:

> Since it looks like it's time to extend the packfile format and the
> in-memory bytecode layout, this would be the time to start discussing
> metadata. What sorts of metadata do people think are useful to have in
> either the packfile (on disk) or in the bytecode (in memory).

I'm currently simplifying the whole set of packfile routines. It still
reads the old format, but the compat code is now centralized in one
place.

The main change is this structure:

    struct PackFile_funcs {
        PackFile_Segment_new_func_t         new_seg;
        PackFile_Segment_destroy_func_t     destroy;
        PackFile_Segment_packed_size_func_t packed_size;
        PackFile_Segment_pack_func_t        pack;
        PackFile_Segment_unpack_func_t      unpack;
        PackFile_Segment_dump_func_t        dump;
    };

All registered types define these functions to make pack/unpack/dump
work for their type. Registered types are consecutively numbered;
unknown types still get unpacked or dumped:

    typedef enum {
        PF_DIR_SEG,
        PF_UNKNOWN_SEG,
        PF_FIXUP_SEG,
        PF_CONST_SEG,
        PF_BYTEC_SEG,
        PF_DEBUG_SEG,
        PF_MAX_SEG
    } pack_file_flags;

All packfile sizes/offsets are counted in opcode_t units, not bytes,
for simplicity - though this might need a conversion (but we don't seem
to handle wordsize transforms now anyway).

leo
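A stripped-down illustration of the per-type function table idea, in C. It is not the patch described above: the table has a single `packed_size` slot instead of six, and `SegType`, `register_seg_funcs` and `seg_packed_size` are hypothetical names. What it tries to show is the dispatch pattern, including the fallback that lets unknown segment types still be handled.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, reduced version of the per-type function table. */
typedef enum { SEG_DIR, SEG_UNKNOWN, SEG_CONST, SEG_MAX } SegType;

typedef struct {
    size_t (*packed_size)(size_t payload_words);  /* sizes in opcode_t units */
} SegFuncs;

/* Default behaviour: a segment is stored as-is. */
static size_t default_packed_size(size_t payload_words) {
    return payload_words;
}

/* Example override: this type stores a count word before its payload. */
size_t const_packed_size(size_t payload_words) {
    return payload_words + 1;
}

static SegFuncs registry[SEG_MAX] = {
    { default_packed_size },   /* SEG_DIR */
    { default_packed_size },   /* SEG_UNKNOWN */
    { default_packed_size },   /* SEG_CONST, until something is registered */
};

/* Register custom functions for one known type; returns 0 if out of range. */
int register_seg_funcs(int type, SegFuncs funcs) {
    assert(funcs.packed_size != NULL);
    if (type < 0 || type >= SEG_MAX)
        return 0;
    registry[type] = funcs;
    return 1;
}

/* Type numbers outside the registry fall back to the SEG_UNKNOWN handlers. */
size_t seg_packed_size(int type, size_t payload_words) {
    int t = (type >= 0 && type < SEG_MAX) ? type : SEG_UNKNOWN;
    return registry[t].packed_size(payload_words);
}
```

Routing unrecognised type numbers to the "unknown" entry is what would let an older reader skip or dump a segment written by a newer tool instead of failing outright.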
Re: Bytecode metadata
Dave Mitchell wrote:

> On Thu, Jan 23, 2003 at 09:21:45PM +0100, Juergen Boemmels wrote:
> > My current idea for the in memory format of the bytecode is this:
>
> I would strongly urge any file-based byte-code format to be arranged
> in such a way that it (or most of it) can simply be mmap-ed in (RO),
> analogously to executables.

How many mmap's can $arch have for one program and for all?
Could we hit some limits here, if every module loaded gets (and stays)
mmap()ed?

> Dave.

leo
Re: Bytecode metadata
At 7:23 AM +0100 1/24/03, Leopold Toetsch wrote:
>Dave Mitchell wrote:
>
>> On Thu, Jan 23, 2003 at 09:21:45PM +0100, Juergen Boemmels wrote:
>> > My current idea for the in memory format of the bytecode is this:
>>
>> I would strongly urge any file-based byte-code format to be arranged
>> in such a way that it (or most of it) can simply be mmap-ed in (RO),
>> analogously to executables.
>
>How many mmap's can $arch have for one program and for all?
>Could we hit some limits here, if every module loaded gets (and stays)
>mmap()ed?

We certainly could, which I suppose would argue for building in
sufficient smarts to the bytecode loader to switch to file reading if
an mmap fails. It'll be slower, but working is generally a good thing.
--
Dan
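For what it's worth, the "switch to file reading if an mmap fails" idea could look something like the following on a POSIX system. This is a sketch only: `LoadedFile`, `load_file` and `unload_file` are invented names, and the `allow_mmap` flag exists purely so the fallback path can be exercised on purpose.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

typedef struct {
    void  *data;
    size_t size;
    int    is_mapped;   /* 1 if data came from mmap, 0 if from malloc + read */
} LoadedFile;

/* Fallback path helper: read exactly `size` bytes into buf. */
static int read_fully(int fd, void *buf, size_t size) {
    size_t done = 0;
    while (done < size) {
        ssize_t n = read(fd, (char *)buf + done, size - done);
        if (n <= 0)
            return 0;               /* error or unexpected end of file */
        done += (size_t)n;
    }
    return 1;
}

/* Returns 1 on success, 0 on failure. Tries a read-only mapping first and
 * falls back to reading the file into heap memory if mapping fails. */
int load_file(const char *path, int allow_mmap, LoadedFile *out) {
    assert(path != NULL && out != NULL);
    struct stat st;
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return 0;
    if (fstat(fd, &st) != 0 || st.st_size <= 0) {
        close(fd);
        return 0;
    }

    out->size = (size_t)st.st_size;
    out->is_mapped = 0;
    out->data = MAP_FAILED;
    if (allow_mmap)
        out->data = mmap(NULL, out->size, PROT_READ, MAP_PRIVATE, fd, 0);

    if (out->data != MAP_FAILED) {
        out->is_mapped = 1;
    } else {
        out->data = malloc(out->size);
        if (out->data == NULL || !read_fully(fd, out->data, out->size)) {
            free(out->data);
            close(fd);
            return 0;
        }
    }
    close(fd);   /* a mapping stays valid after the descriptor is closed */
    return 1;
}

void unload_file(LoadedFile *f) {
    assert(f != NULL);
    if (f->is_mapped)
        munmap(f->data, f->size);
    else
        free(f->data);
}
```

The caller sees the same `data`/`size` pair either way; only `unload_file` needs to know which path was taken.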
Odd JIT timings
I just gave a run of examples/assembly/mops_p.pasm, getting some
performance numbers. Here's an interesting timing.

    no jit:   24.9 seconds
    with jit: 33.6 seconds

This is... odd. And on PPC, FWIW, and I'm not sure if it happens on
x86. Someone care to check it out and poke around a bit?
--
Dan