Re: PGE bug?
On Thu, Jun 30, 2005 at 10:44:03AM -0500, Patrick R. Michaud wrote: If it would help for me to give more details about the bsr/ret scheme I'm using, I'll be glad to post it. I could certainly give a Perl 6 equivalent of the rule we're looking at. But essentially the key is that a bsr always represents a backtracking point, a ret always indicates fail, a goto R* is a tailcall, and the named registers are variables in the scope of all of the R* subroutines. Does this mean that you're using the same recursive approach that the perl 5 regular expression engine uses? (Not that I understand much of the perl 5 engine, except that uses recursion to maintain parts of state) Nicholas Clark
Re: [perl #36374] [PATCH] segmented context and register memory
On Mon, Jun 27, 2005 at 08:29:59PM -0400, Chip Salzenberg wrote: On Mon, Jun 27, 2005 at 05:11:10PM -0400, Andy Dougherty wrote: Well, I think there are already way too many pointer casts and related games in the source. Perhaps more to the point, not all casts are going to work. Well, there is that. :-) struct group *g = malloc(sizeof *g) /* OR WHATEVER METHOD OF ALLOC */ ^typo Then gcc should just shut up, IMO. When I cast from void* or char* to whatever*, it's my lookout as programmer if that's an error, and there's nothing about it that deserves a compiler warning ... mostly because there are some things in C that require it, and the compiler can't possibly distinguish the good from the bogus. It's like warning about '=' in boolean expressions. Yeah, I know. :-P We may well have the warning because we turned on as many bitchy flags in gcc as possible. Do all our casts avoid creating aliasing problems? Nicholas Clark
Re: Parrot Segfault
On Wed, Jun 29, 2005 at 08:37:44AM -0700, chromatic wrote: On Wed, 2005-06-29 at 10:59 -0400, Matt Fowles wrote: Would it be reasonable to not run tests that are known to leave core files? I feel like after a successful build there should not be evidence like this left around... People could set ulimit (though we'd have to tell them all to do that) or Parrot::Test could have an END block that removes all core.* files unless an environment variable is set. You're making an assumption here about the name of core files, and where they are located. (Although setting ulimit -c above 0 on OS X is painful, given the size of the core files generated. So it's unlikely that parrot will be writing into /cores/ ) Nicholas Clark
Re: [pirate] Setting up Pirate Parrot
Curtis Hall wrote: Ok, update. Have Pirate and Parrot running smoothly now. Had Python version 2.4, which Pirate didn't like. Had to link it to my 2.3 version. Had to mess with PATHing a bit and now I'm all set. Wanted to introduce myself. I'm a senior here at the UofA in Tucson, AZ working in comp sci. I'm going to be working on Pirate so as to generalize the IMC emitter, plus work on some of the constructs such as slices, floor divisions, power operators, and to-string operators. Hopefully, in that order, and by the end of summer. I'm learning as much as I can about Pirate and Parrot and would welcome any advice or expertise. Curt ___ pirate mailing list [EMAIL PROTECTED] http://cornerhost.com/mailman/listinfo/pirate I've been working on a python compiler also, feel free to take a look,, svn co http://svn.openfoundry.org/pyparrot languages/python/pyparrot My current boggle is how to handle the self parameter to method functions. You can do things like this in python def foobar( arg1, arg2 ): print arg1, arg2 class A(): self.__add__ = foobar aa = A() print aa + 5 I'm happy to help out and discuss. Kevin Tew
Re: PGE bug?
Patrick R. Michaud wrote: I suspect it's an issue with register spilling, that I15 is being reused somewhere later to represent something other than the cutting value. Please not that this has nothing to do with register spilling, where due to a lack of registers these are stored into (and fetched from) an array in P31. We've got a problem of a register changing it's value - or not. After another half of a day with gdb, I've eventually tracked it down. What happens is this: the pattern is ident, which happens to be just another Rule. So during matching the precompiled coroutine for matching ident is entered to try this rule at various positions. But the relevant code piece in EVAL_1 [1] looks like this: lastpos = length target if pos = 0 goto try_at_pos pos = 0 try_match: cutting = 0 ... try_at_pos: ... if cutting != 0 goto fail bails out at first pos is already 1, therefore cutting get's never initialized and has some garbage value. I've a rather lengthy typescript here, which tracks all changes to I15 and clearly shows this reason. I don't think that Parrot should guarantee that registers have some predifined value on entering a subroutine. We have to NULL out PMC and STRING registers already for GC, which is a rather costy operation. But if folks prefer that we should clean registers, we can of course do it. It's probably not the best idea though (for the long run and performance-wise). An alternative would be to fill integer and number registers with some arbitrary and changing values (in debug builds) to simplify catching such errors. There is further some (disabled) code in imcc/cfg.c that tries to catch such uninitialized variables but I doubt that it would have warned in that case, as there is one possible execution path to initialize cutting. leo [1] parrot -D20 foo.pir creates EVAL_n (n = 1...) for each evaled code piece in turn. So EVAL_1 is the precompiled ident rule, EVAL_2 the code for matching the test file pattern.
Re: [pirate] Setting up Pirate Parrot
Kevin Tew wrote: I've been working on a python compiler also, feel free to take a look,, svn co http://svn.openfoundry.org/pyparrot languages/python/pyparrot My current boggle is how to handle the self parameter to method functions. You can do things like this in python def foobar( arg1, arg2 ): print arg1, arg2 class A(): self.__add__ = foobar aa = A() print aa + 5 Well, the Python translator has to recognice that __add__ and such has a special meaning in CPython. Therefore the getattribute (in that case of the class - but not only) has to check for such special names and in that case store the Sub object for foobar into the A namespace under the name __add - that is Parrot's notation for the thingy. The rest i.e. calling the sub on the last line, is done by Parrot's MMD implementation. Kevin Tew leo
another month - another code freeze
Time flies like an arrow. I remember doing the last release was just a few days ago. Anyway: * feature freeze starts now - please no feature changes to parrot core - bug-fixes, documentation updates, test reports (PLATFORMS) are very welcome - updates to languages are welcome to, but please make sure that tests are rather silent for the release * release will be done around Sun Jul, 3rd 10:00 MESZ which should be 8:00 GMT - am. leo
Re: PGE bug?
On Fri, Jul 01, 2005 at 02:34:27PM +0200, Leopold Toetsch wrote: Please not that this has nothing to do with register spilling, where due to a lack of registers these are stored into (and fetched from) an array in P31. We've got a problem of a register changing it's value - or not. Noted, and thanks for the clarification. After another half of a day with gdb, I've eventually tracked it down. [...] lastpos = length target if pos = 0 goto try_at_pos pos = 0 try_match: cutting = 0 ... try_at_pos: ... if cutting != 0 goto fail bails out at first pos is already 1, therefore cutting get's never initialized and has some garbage value. You're absolutely correct, and many thanks for the work in catching this. Originally the cutting = 0 initializer was at the top of the routine, but apparently when I added the try_match loop I didn't move it to outside of the loop. I don't think that Parrot should guarantee that registers have some predifined value on entering a subroutine. I fully agree. Again, thanks for the effort in tracking this down. Now fixed in r8484. Pm
Re: PGE bug?
On Fri, Jul 01, 2005 at 08:38:01AM +0100, Nicholas Clark wrote: On Thu, Jun 30, 2005 at 10:44:03AM -0500, Patrick R. Michaud wrote: If it would help for me to give more details about the bsr/ret scheme I'm using, I'll be glad to post it. I could certainly give a Perl 6 equivalent of the rule we're looking at. But essentially the key is that a bsr always represents a backtracking point, a ret always indicates fail, a goto R* is a tailcall, and the named registers are variables in the scope of all of the R* subroutines. Does this mean that you're using the same recursive approach that the perl 5 regular expression engine uses? (Not that I understand much of the perl 5 engine, except that uses recursion to maintain parts of state) Well, I don't know if I'm using the same recursive approach that perl 5 uses, but yes PGE uses a subroutine call chain to maintain certain parts of its backtracking status. The only recursive call is for quantified group expressions of various sorts -- subpatterns and subrules. Outside of that, it's just a linear chain of subroutine calls (no recursion), and even then subroutine calls are only invoked where some sort of range quantification is needed to keep backtracking points. And the subroutine calls used are of the bsr/ret variety, as opposed to using the continuation passing style calls. For example, an expression like: /^ab c**{3} \d+ xyz$/ will have a maximum bsr/ret call depth of 1, to handle the \d+. Something like: /^(abc)**{3} xyz$/ will have a maximum bsr/ret call depth of 4, to handle the quantification of the group and match object. (Groups require an extra initial bsr call to initialize the group.) And something like /^( [abc | def]**{3} )* xyz$/ can end up with a lot of bsr/rets because of the nested and quantified groups. I certainly won't claim this is the best way to do it, but for the time being it seemed easier and more straightforward to just use bsr/ret to maintain state than to build a separate chain to avoid using the execution stack. Each pattern gets its own Coroutine and its own copy of the stacks anyway, and the option to take an entirely different approach is still available to us if/when we need it -- we'll just change the codegen. (We can even have/use multiple codegen algorithms and let an optimizer pick the best one.) Pm
Re: PGE bug?
On Fri, Jul 01, 2005 at 08:38:01AM +0100, Nicholas Clark wrote: : Does this mean that you're using the same recursive approach that the perl 5 : regular expression engine uses? (Not that I understand much of the perl 5 : engine, except that uses recursion to maintain parts of state) No, Perl 5 has to use recursion to emulate co-routines. Parrot has real co-routines/continuations, so PGE can actually return at the end of a submatch where P5 has to recurse, since P5 can only represent failure by a return. So I think PGE's recursion is much more faithful to the shape of the problem. Larry
Re: PGE bug?
On Fri, Jul 01, 2005 at 11:11:01AM -0500, Patrick R. Michaud wrote: On Fri, Jul 01, 2005 at 08:38:01AM +0100, Nicholas Clark wrote: Does this mean that you're using the same recursive approach that the perl 5 regular expression engine uses? (Not that I understand much of the perl 5 engine, except that uses recursion to maintain parts of state) Well, I don't know if I'm using the same recursive approach that perl 5 uses, but yes PGE uses a subroutine call chain to maintain certain parts of its backtracking status. The only recursive call is for quantified group expressions of various sorts -- subpatterns and subrules. Outside of that, it's just a linear chain of subroutine Does that include simple quantifiers such as * and + ? I certainly won't claim this is the best way to do it, but for the time being it seemed easier and more straightforward to just use bsr/ret to maintain state than to build a separate chain to avoid using the execution stack. Each pattern gets its own Coroutine and its own It's just that I'm wondering whether you're going the same way as the Perl 5 regexp engine and hence the current implementation will share its biggest weakness - repetitive matches against long strings blowing the stack. Nicholas Clark
Re: PGE bug?
On Fri, Jul 01, 2005 at 05:46:30PM +0100, Nicholas Clark wrote: On Fri, Jul 01, 2005 at 11:11:01AM -0500, Patrick R. Michaud wrote: On Fri, Jul 01, 2005 at 08:38:01AM +0100, Nicholas Clark wrote: Does this mean that you're using the same recursive approach that the perl 5 regular expression engine uses? (Not that I understand much of the perl 5 engine, except that uses recursion to maintain parts of state) Well, I don't know if I'm using the same recursive approach that perl 5 uses, but yes PGE uses a subroutine call chain to maintain certain parts of its backtracking status. The only recursive call is for quantified group expressions of various sorts -- subpatterns and subrules. Outside of that, it's just a linear chain of subroutine Does that include simple quantifiers such as * and + ? The thing that causes recursion is quantified groups; a quantifier on a simple atom won't do it on its own. For example, the pattern / a* \d+ / will never cause recursion, regardless of the number of a's or digits encountered. There will be at most two bsr's. However, the pattern / (a)* (\d)+ / will have some recursion in it depending on the input string, because of the need to generate and destroy the intervening match objects for each captured 'a' and digit. We may be able to eventually optimize simple cases like this to be a bit smarter about how it manages the call stack, but in the general case (with potentially nested and quantified subpatterns and rules) I think we pretty much need to have the stack around. It's just that I'm wondering whether you're going the same way as the Perl 5 regexp engine and hence the current implementation will share its biggest weakness - repetitive matches against long strings blowing the stack. Well, since each rule invocation ends up with its own stack (it's a Coroutine), I'm hoping this won't be a big issue. But if it does turn out to be one, I think we'll find a way to deal with it then. :-) Pm
Re: PGE bug?
On Fri, Jul 01, 2005 at 12:02:44PM -0500, Patrick R. Michaud wrote: : Well, since each rule invocation ends up with its own stack : (it's a Coroutine), I'm hoping this won't be a big issue. But if : it does turn out to be one, I think we'll find a way to deal with : it then. :-) Well, for simple subpatterns you can compress it somewhat if you have a foolproof way of reconstructing prior states from fail states, so that you don't have to keep actual states around, but maybe just a range of possible states. But the simple fact is that you have to keep the backtracking info *somewhere*, or you can't backtrack. The only practical solution is finding some way to cut out unreachable states, and that's hard in the absence of explicit user guidance. All that being said, heaps are easier to share than stacks--though my vague recollection is that Parrot stacks are really in the heap, so maybe that particular subproblem is already dealt with. Larry
Re: [perl #36374] [PATCH] segmented context and register memory
On Jun 30, 2005, at 21:30, Andrew Dougherty wrote: Failed 7/157 test scripts, 95.54% okay. 22/2625 subtests failed, 99.16% okay. Failed Test Stat Wstat Total Fail Failed List of Failed Thanks for trying it out and testing it. I've found hopefully a lot of these bug. The one tail-call did just use an uninitialized object. I'll send an updated patch probably tomorrow. Thanks, leo
Re: [pirate] Setting up Pirate Parrot
On Jul 1, 2005, at 19:46, Michal Wallace wrote: aa = A() print aa + 5 Hmm. I'm pretty sure this is handled automagically by the Python pmc's in pirate... Using the + in pir (or the add op) actually invokes a dispatch Err, *if* the python translater emits $Px = aa + 5 it's of course up to Parrot to call A.__add() (it one is there). If the Python translator creates a function call for each 'add' opcode, well then performance will suck. ... I'm not 100% sure if Sam implemented it, but if so it would be in the pmc's, not the code generator. I'm pretty sure that he did not implement it. Pirate doesn't have to know anything special about __add__ because foobar is just another pmc - one that happens to be a callable. Well foobar is a plain subroutine. The only tricky part I see is that because foobar doesn't have the explicit self parameter, Err, why should a plain subroutine have a self parameter? Anyway, I *think* the existing pmc's solve all these problems. Why not test it? leo
Re: PGE bug?
On Fri, Jul 01, 2005 at 09:43:02AM -0700, Larry Wall wrote: On Fri, Jul 01, 2005 at 08:38:01AM +0100, Nicholas Clark wrote: : Does this mean that you're using the same recursive approach that the perl 5 : regular expression engine uses? (Not that I understand much of the perl 5 : engine, except that uses recursion to maintain parts of state) No, Perl 5 has to use recursion to emulate co-routines. Parrot has real co-routines/continuations, so PGE can actually return at the end of a submatch where P5 has to recurse, since P5 can only represent failure by a return. So I think PGE's recursion is much more faithful to the shape of the problem. Aha. I didn't realise that co-routines were the need for recursion in perl 5's regexp engine. Does this mean that to remove recursion from perl 5, instead of re-writing the engine to be iterative, it might be easier to emulate co-routines using setjmp/longjmp, retaining almost all of the existing code? http://www.chiark.greenend.org.uk/~sgtatham/coroutines.html Nicholas Clark
Re: PGE bug?
On Fri, Jul 01, 2005 at 07:12:40PM +0100, Nicholas Clark wrote: : Does this mean that to remove recursion from perl 5, : instead of re-writing the engine to be iterative, it might be easier to : emulate co-routines using setjmp/longjmp, retaining almost all of the existing : code? Easier, yes. More efficient--probably not, depending on the architecture. Of course, if we slow down Perl 5 enough it'll encourage people to migrate. Hopefully to Perl 6, but that's not guaranteed. :-) And it might not be easier if your setjmp/longjmp clobbers the wrong registers. It's been a while since I've played with setjmp/longjmp, but my recollection of the docs tells me that you basically can't trust your registers after a longjmp. Possibly a smattering of volatile declarations fixes that, along with slowing the code down. Dunno. Larry
Re: [perl #36437] [BUG] PGE recursion, bus error
Attempting to come up with a simplistic math grammar that has one possible operand (A) and one possible operator (*) - so that things like A, A*A, and A*A*A*A*A are all parsed. This simplistic example (thanks to spinclad on #perl6) cause PGE to explode. $ cat ta.p6r grammar f; rule atom { A } rule binary { expr \* atom } rule expr { binary } That probably shouldn't die so horribly, but it should die. We don't support left-recursive grammars, and this is one. A non-left-recursive grammar that matches the same language is: grammar f; rule atom { A } rule binary { atom \* binary } Luke
Re: [perl #36437] [BUG] PGE recursion, bus error
All~ On 7/1/05, Luke Palmer [EMAIL PROTECTED] wrote: Attempting to come up with a simplistic math grammar that has one possible operand (A) and one possible operator (*) - so that things like A, A*A, and A*A*A*A*A are all parsed. This simplistic example (thanks to spinclad on #perl6) cause PGE to explode. $ cat ta.p6r grammar f; rule atom { A } rule binary { expr \* atom } rule expr { binary } That probably shouldn't die so horribly, but it should die. We don't support left-recursive grammars, and this is one. A non-left-recursive grammar that matches the same language is: grammar f; rule atom { A } rule binary { atom \* binary } Not to pick gnits or show off my ignorance too much, but shouldn't the binary rule make the second part optional somehow? rule binary { atom [\* binary]? } Matt -- Computer Science is merely the post-Turing Decline of Formal Systems Theory. -???
Re: PGE recursion, bus error
Er, that doesn't seem to match A or A*A or A*A*A... grammar f; rule atom { A } rule binary { atom \* expr } rule expr { atom | binary } looks better. Now.. how to make this preferentially match the /whole/ string... Ah: grammar f; rule atom { A } rule binary { atom \* any } rule any{ [atom | binary ]} rule expr { ^^ any $$ } Thanks, Luke! (The original ticket about PGE exploding still stands, but I don't need anything else at the moment.) Luke Palmer via RT writes: Attempting to come up with a simplistic math grammar that has one possible operand (A) and one possible operator (*) - so that things like A, A*A, and A*A*A*A*A are all parsed. This simplistic example (thanks to spinclad on #perl6) cause PGE to explode. $ cat ta.p6r grammar f; rule atom { A } rule binary { expr \* atom } rule expr { binary } That probably shouldn't die so horribly, but it should die. We don't support left-recursive grammars, and this is one. A non-left-recursive grammar that matches the same language is: grammar f; rule atom { A } rule binary { atom \* binary } Luke
Re: [pirate] Setting up Pirate Parrot
On Fri, 1 Jul 2005, Leopold Toetsch wrote: Kevin Tew wrote: I've been working on a python compiler also, feel free to take a look,, svn co http://svn.openfoundry.org/pyparrot languages/python/pyparrot My current boggle is how to handle the self parameter to method functions. You can do things like this in python def foobar( arg1, arg2 ): print arg1, arg2 class A(): self.__add__ = foobar aa = A() print aa + 5 Well, the Python translator has to recognice that __add__ and such has a special meaning in CPython. Therefore the getattribute (in that case of the class - but not only) has to check for such special names and in that case store the Sub object for foobar into the A namespace under the name __add - that is Parrot's notation for the thingy. The rest i.e. calling the sub on the last line, is done by Parrot's MMD implementation. Hmm. I'm pretty sure this is handled automagically by the Python pmc's in pirate... Using the + in pir (or the add op) actually invokes a dispatch method in the PMC that handles the situation. I'm not 100% sure if Sam implemented it, but if so it would be in the pmc's, not the code generator. Pirate doesn't have to know anything special about __add__ because foobar is just another pmc - one that happens to be a callable. The only tricky part I see is that because foobar doesn't have the explicit self parameter, it has to be handled differently by getattr. (In cpython there's a difference beween a function and a 'bound' method, and I think the magic happens in getattr) Anyway, I *think* the existing pmc's solve all these problems. Sincerely, Michal J Wallace Sabren Enterprises, Inc. - contact: [EMAIL PROTECTED] hosting: http://www.cornerhost.com/ my site: http://www.withoutane.com/ -
Re: [TCLCORE] ParTcl, Perl6 Grammars
Will Coleda wrote: Thanks to Matt Diephouse, partcl (parrot on tcl) is now able to run part of tcl's cvs-latest test suite. We don't run enough of tcl at the moment to run the tests natively, but by pulling the tests out of the tcltest framework and converting them (sanely, we hope), we are now passing 10% of tcl's test suite. (discounting tests for [clock], which are rather numerous. =-). As we build up enough speed, the plan will be to drop the test conversion and simply run them natively. Cool! Doing clock should actually be mostly fairly easy as it is largely implemented in Tcl. It might be easier to focus next on getting things complete enough that you can use the tcltest framework though. That's actually a useful fraction of the language, and will also allow you to leverage everybody else's work without the problems of accidentally- introduced bloopers... Perl6 is introducing a few new features, at least one of which I think might be worth stealing into Tcl. For now, the easiest way to get these is to implement them in partcl, where we'll basically get them for free. Good idea. However, there are a couple of things that we really will hold the line on very strongly. They are: 1) strict left-to-right evaluation, and 2) strict immutability of values as seen at the Tcl script level. (You can change what value is in a variable, but you can't change the value itself.) 3) Tcl's super-late binding would be nice to retain, but it's hard to do any optimization under its conditions; feel free to ignore that principle for now (and possibly indefinitely.) But apart from these items, you can steal anything that's not nailed down. ;^D In Perl6, regular expressions (will) have been updated to rules. This is not something I can simply summarize in this email, but some examples and explanations have already been developed: http://dev.perl.org/perl6/doc/design/apo/A05.html :: Apocalypse 5 (Initial Design Info) http://dev.perl.org/perl6/doc/design/exe/E05.html :: Exegesis 5 (Explanation) http://dev.perl.org/perl6/doc/design/syn/S05.html :: Synopsis 5 (Summary of changes from Perl5) These rules (and the grammars they can be combined into) are already (partially, at least) available directly in parrot, so wrapping them with a bit of Tcl syntax would be straightforward. Whoosh! That's a lot of changes. Some of which make sense from a Tcl perspective and some of which don't (e.g. the difference between bytes and characters). It will be interesting to see what you wrest out of it. If anyone would like to take the time to figure out an appropriately Tcl-like syntax (add more options to regsub? create a new builtin? Put this all in a namespace/ensemble itself?), I'd be happy to add the prototype command to partcl. Perhaps the TIP process would be the best way to approach this (with the thought that this wouldn't be considered for the actual core, at least not at first.) In process-terms, producing a new RE engine as an extension package requires no permission from anyone at all, let alone a TIP. Just do it and please report on what you've achieved; I'm sure we'd be very interested to hear more details (I'd suggest a paper in the Tcl conference, but you've probably slipped even the real-deadline after the official-deadline for this year). Please interpret this message as encouragement and an interest in how things turn out; it is intended as such. (Can't help though; I'm over- committed to other projects.) Donal.
lower in default find_name scope...?
The following PIR code produces NCI as the output on my system: $ cat lower.pir .sub main @MAIN $P0 = find_name lower $S0 = typeof $P0 print $S0 print \n .end $ parrot lower.pir NCI $ I somewhat expected find_name to return a 'not found' error, as it does below for alpha: $ cat alpha.pir .sub main @MAIN $P0 = find_name alpha $S0 = typeof $P0 print $S0 print \n .end $ parrot alpha.pir Name 'alpha' not found current instr.: 'main' pc 0 (alpha.pir:2) $ What symbol entry am I managing to accidentally grab for lower, and where is it coming from? I've tried find_global and don't seem to find it: $ cat lower2.pir .sub main @MAIN $P0 = find_global lower $S0 = typeof $P0 print $S0 print \n .end $ parrot lower2.pir Global 'lower' not found current instr.: 'main' pc 0 (lower2.pir:2) $ Suggestions? Pm