Re: Program lifecycle
At 09:34 PM 8/10/00 -0400, Bradley M. Kuhn wrote:
> Three notes on the Syntax tree (which I would probably call Intermediate
> Representation, or IR, but the name is irrelevant :).

Yep. It's more a MI, with bytecodes being an LI. (And both are IRs.) I've been browsing through compiler design books, as if you couldn't tell... :)

> First, I believe that it is completely reasonable and probably useful to
> consider allowing an optimization step that operates directly on the IR.
> Bytecodes are often harder to optimize than the IR.

Fair enough. I think we should pass the syntax tree on to the optimizer in addition to the bytecode stream for just that sort of thing.

> Second, I think that it is terribly important that the format of the IR be
> well described in a document separate from the code. I am willing to help
> maintain this document, but I think it is imperative that we don't let the
> "implementation be the reference". If the code changes, the document must
> change to reflect it.

Oh, yes! Once things get a bit more solid on the language end, I expect we'll have syntax and bytecode working groups to hammer those out and nail 'em down.

> Third, I am very glad that Dan has placed the execution engine very far from
> the IR. Whether or not we want to have an execution engine (which I tend to
> call a VM :) that works directly on the IR or one that always goes through
> bytecode, or both, I think we must keep a high wall of abstraction between
> the IR and the VM.

Bytecode is an IR too, just with a different target. I do want that heavy wall, though. That backend could be a perl2jvm translator, or a TIL version of the interpreter (if we don't go that way to start), or a frontend to a 'real' compiler like GCC or the DEC compilers. (Both GCC and the DEC compilers have one backend for a whole bunch of language front-ends. If they can front-end Fortran, C++, and COBOL, we ought to be able to do it with perl...)
Dan

--"it's like this"--
Dan Sugalski                    even samurai
[EMAIL PROTECTED]                have teddy bears and even
                                teddy bears get drunk
Re: Program lifecycle
At 09:57 PM 8/9/00 -0700, Matthew Cline wrote:
> On Wed, 09 Aug 2000, Nathan Torkington wrote:
> > It seems to me that a perl5 program exists as several things:
> >  - pure source code (ASCII or Unicode)
> >  - a stream of tokens from the parser
> >  - a munged stream of tokens from the parser (e.g., use Foo has
> >    become BEGIN { require Foo; Foo->import })
> >  - an unthreaded and unoptimized optree
>
> Isn't there a tree of whatchamacallits between a token stream and the
> optree, and also a symbol table? I'm not too up on compilers...

I think so. There are some thingamabobs in there too. :)

I think we'll see at least a syntax tree, a bytecode stream, and an optree in perl 6, depending on where you look. That's still sort of up in the air, though. (We might see machine code too, if I can convince myself that it can be done portably.)

Dan
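The "munged stream of tokens" step quoted above can be illustrated with a small sketch. This is a toy under loud assumptions: the function name `munge_use`, the flat list-of-strings token format, and the choice of Python are all hypothetical and have nothing to do with how perl5 actually represents tokens. It only shows the rewrite of `use Foo;` into `BEGIN { require Foo; Foo->import }` as a pass over a token stream.

```python
from typing import List


def munge_use(tokens: List[str]) -> List[str]:
    """Rewrite every `use Module ;` triple into its BEGIN-block equivalent.

    Hypothetical token format: a flat list of strings. Only the simplest
    form (`use`, one module name, `;`) is handled; anything else is
    passed through untouched.
    """
    out: List[str] = []
    i = 0
    while i < len(tokens):
        is_simple_use = (
            tokens[i] == "use"
            and i + 2 < len(tokens)
            and tokens[i + 2] == ";"
        )
        if is_simple_use:
            module = tokens[i + 1]
            out += ["BEGIN", "{", "require", module, ";",
                    module, "->", "import", "}"]
            i += 3  # skip `use`, the module name, and `;`
        else:
            out.append(tokens[i])
            i += 1
    return out
```

A pretty printer would want the list before this pass runs, which is the reason for keeping the unmunged and munged streams as separate things.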
Re: Program lifecycle
At 10:01 PM 8/9/00 -0600, Nathan Torkington wrote:
> Would it make sense for the parsing of a Perl program to be done as:
>  - tokenize without rewriting (e.g., use stays as it is)
>  - structure without rewriting (e.g., constant subs are unfolded)
>  - rewrite for optimizations and actual ops

The structure I've been thinking of looks like:

     Program Text
          |
          |
          |
          V
    +-----------+
    | Lex/parse |
    +-----------+
          |
     Syntax tree
          |
          V
    +-----------+
    | Bytecoder |
    +-----------+
          |
      Bytecodes
          |
          V
    +-----------+
    | Optimizer |
    +-----------+
          |
      Optimized
      bytecodes
          |
          V
    +-----------+
    | Execution |
    |  Engine   |
    +-----------+

With each box being replaceable, and the process being freezable between boxes. The lexer and parser probably ought to be separated, thinking about it, and we probably want to allow folks to wedge at least C code into each bit. (I'm not sure whether allowing you to write part of the optimizer in perl would be a win, but I suppose if it was saving the byte stream to disk...)

Dan
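The box diagram above, with "each box being replaceable, and the process being freezable between boxes", can be sketched roughly as follows. Everything here is an assumption made for illustration: the `Pipeline` class, the stage function names, the toy language (sums of integers), the stack-machine instruction tuples, and the use of Python's `pickle` for freezing are all hypothetical, not a proposed design.

```python
import pickle
from typing import Any, Callable, List, Tuple

Stage = Tuple[str, Callable[[Any], Any]]


def lex_parse(text: str) -> Any:
    # Toy syntax tree for "1 + 2 + 3": left-nested ("add", left, right).
    terms = [int(t) for t in text.split("+")]
    tree: Any = terms[0]
    for term in terms[1:]:
        tree = ("add", tree, term)
    return tree


def bytecoder(tree: Any) -> list:
    # Post-order walk emitting stack-machine instructions.
    if isinstance(tree, int):
        return [("push", tree)]
    _, left, right = tree
    return bytecoder(left) + bytecoder(right) + [("add",)]


def optimizer(code: list) -> list:
    # Constant folding: push a, push b, add  ->  push (a + b).
    out: list = []
    for ins in code:
        foldable = (ins == ("add",) and len(out) >= 2
                    and out[-1][0] == "push" and out[-2][0] == "push")
        if foldable:
            b = out.pop()[1]
            a = out.pop()[1]
            out.append(("push", a + b))
        else:
            out.append(ins)
    return out


def execute(code: list) -> int:
    # Minimal stack machine for the two instructions above.
    stack: List[int] = []
    for ins in code:
        if ins[0] == "push":
            stack.append(ins[1])
        else:
            b = stack.pop()
            a = stack.pop()
            stack.append(a + b)
    return stack.pop()


DEFAULT_STAGES: List[Stage] = [
    ("lex/parse", lex_parse),
    ("bytecoder", bytecoder),
    ("optimizer", optimizer),
    ("execute", execute),
]


class Pipeline:
    def __init__(self, stages: List[Stage]):
        self.stages = list(stages)

    def replace(self, name: str, fn: Callable[[Any], Any]) -> None:
        # Swap one box for another with the same interface.
        self.stages = [(n, fn if n == name else f) for n, f in self.stages]

    def run(self, data: Any, start: int = 0, stop: Any = None) -> Any:
        # Run stages[start:stop]; stopping early leaves a freezable stream.
        for _, fn in self.stages[start:stop]:
            data = fn(data)
        return data


def freeze(stream: Any) -> bytes:
    return pickle.dumps(stream)


def thaw(blob: bytes) -> Any:
    return pickle.loads(blob)
```

For example, `Pipeline(DEFAULT_STAGES).run("1 + 2 + 3", stop=2)` stops after the bytecoder; the result can be passed through `freeze`, written to disk, and later resumed with `run(thaw(blob), start=2)`.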
Re: Program lifecycle
You may also want to be able to short circuit some of the steps. Especially where the startup time may outweigh the win of optimization.

And if there could be different execution engines. Machine level, bytecode, (and perhaps straight out of the syntax tree.)

Hmm, might that make some debugging easier?

chaim

"DS" == Dan Sugalski [EMAIL PROTECTED] writes:

DS> The structure I've been thinking of looks like:

DS>      Program Text
DS>           |
DS>           |
DS>           |
DS>           V
DS>     +-----------+
DS>     | Lex/parse |
DS>     +-----------+
DS>           |
DS>      Syntax tree
DS>           |
DS>           V
DS>     +-----------+
DS>     | Bytecoder |
DS>     +-----------+
DS>           |
DS>       Bytecodes
DS>           |
DS>           V
DS>     +-----------+
DS>     | Optimizer |
DS>     +-----------+
DS>           |
DS>       Optimized
DS>       bytecodes
DS>           |
DS>           V
DS>     +-----------+
DS>     | Execution |
DS>     |  Engine   |
DS>     +-----------+

DS> With each box being replaceable, and the process being freezable between
DS> boxes. The lexer and parser probably ought to be separated, thinking about
DS> it, and we probably want to allow folks to wedge at least C code into each
DS> bit. (I'm not sure whether allowing you to write part of the optimizer in
DS> perl would be a win, but I suppose if it was saving the byte stream to disk...)

--
Chaim Frenkel                   Nonlinear Knowledge, Inc.
[EMAIL PROTECTED]                +1-718-236-0183
Re: Program lifecycle
"NT" == Nathan Torkington [EMAIL PROTECTED] writes: NT - source filters munge the pure source code NT - cpp-like macros would work with token streams NT - pretty printers need unmunged tokens in an unoptimized tree, which NTmay well be unfeasible I was thinking of macros as being passed some arguments but then can either manipulate the raw source code or ask the lexer/parser for parsed tokens. chaim -- Chaim FrenkelNonlinear Knowledge, Inc. [EMAIL PROTECTED] +1-718-236-0183
Re: Program lifecycle
At 03:36 PM 8/10/00 -0400, Chaim Frenkel wrote:
> You may also want to be able to short circuit some of the steps.
> Especially where the startup time may outweigh the win of optimization.

The only one that's skippable is the optimizer, really. I'd planned on having to pass it some indicator of how aggressive it should be.

> And if there could be different execution engines. Machine level, bytecode,
> (and perhaps straight out of the syntax tree.)

Yup. Hence the "replaceable" bit. :) The boxes would all have a fixed and well-defined interface, and the various streams (syntax tree and bytecode) would also be well-defined. If you wanted to build an execution box that instead dumped out Java bytecodes, well, sounds like a good plan to me. :)

> Hmm, might that make some debugging easier?

Might. Hard to say, though if we get them as black boxes at least it'll make debugging more compartmentalized.

Dan
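The two ideas in this reply, an optimizer that takes an aggressiveness indicator and execution boxes that share one fixed interface, can be sketched like this. It is a minimal illustration under assumptions: the instruction tuples, the `level` parameter, and the function names `optimize`, `run_backend`, `listing_backend`, and `finish` are all hypothetical. The second backend only dumps a plain-text listing; it stands in for the "dump out Java bytecodes" idea and does not produce real JVM bytecode.

```python
from typing import Any, Callable, List, Tuple

Instr = Tuple  # hypothetical format: ("push", n) or ("add",)


def optimize(code: List[Instr], level: int = 1) -> List[Instr]:
    """level 0 skips optimization entirely; level >= 1 folds constants."""
    if level <= 0:
        return list(code)
    out: List[Instr] = []
    for ins in code:
        foldable = (ins == ("add",) and len(out) >= 2
                    and out[-1][0] == "push" and out[-2][0] == "push")
        if foldable:
            b = out.pop()[1]
            a = out.pop()[1]
            out.append(("push", a + b))
        else:
            out.append(ins)
    return out


def run_backend(code: List[Instr]) -> int:
    # Execution box: interpret the instructions on a small stack machine.
    stack: List[int] = []
    for ins in code:
        if ins[0] == "push":
            stack.append(ins[1])
        else:
            b = stack.pop()
            a = stack.pop()
            stack.append(a + b)
    return stack.pop()


def listing_backend(code: List[Instr]) -> str:
    # Replacement box with the same input: emit a text listing instead.
    return "\n".join(" ".join(str(part) for part in ins) for ins in code)


def finish(code: List[Instr],
           backend: Callable[[List[Instr]], Any],
           level: int = 1) -> Any:
    # The caller picks how hard to optimize and which box consumes the result.
    return backend(optimize(code, level))
```

With this shape, a short script could call `finish(code, run_backend, level=0)` to avoid paying optimizer startup cost, while a long-running program would pass a higher level.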