Re: Rules and hypotheticals: continuations versus callbacks
At 12:00 AM + 3/20/03, Simon Cozens wrote: [EMAIL PROTECTED] (Matthijs Van Duin) writes: OK, I suppose that works although that still means you're moving the complexity from the perl implementation to its usage: in this case, the perl 6 parser which is written in perl 6 No, I don't believe that's what's happening. My concern is that at some point, there *will* need to be a bootstrapped parser which is written in some low level language, outputting Parrot bytecode, and it *will* need to be able to reconfigure itself mid-match. I think. I can't remember why I'm so convinced of this, and I'm too tired to think it through with examples right now, and I might be wrong anyway, but at least I can be ready with a solution if it proves necessary. :) You may well be right--I don't think so, but I'm not at my clearest either. I don't see that it'll be needed outside the initial bootstrap parser if at all, so I'm not too worried. (And the low-level language for it will probably be perl 5, since I'd far rather build something with a Parse::RecDescent grammar than a hand-nibbler in C) -- Dan --it's like this--- Dan Sugalski even samurai [EMAIL PROTECTED] have teddy bears and even teddy bears get drunk
Re: Rules and hypotheticals: continuations versus callbacks
--- Matthijs van Duin [EMAIL PROTECTED] wrote:

>>> They should be though, if a variable was hypothesized when the continuation was taken, then it should be hypothesized when that continuation is invoked.
>>
>> Should they? Does hypotheticalization count as data modification (in which case it shouldn't) or control modification (in which case it should),
>
> Isn't that the whole point of hypotheses in perl 6? You talk about successful and unsuccessful de-hypothesizing, and about abnormal exits etc.. you seem to have a much more complex model of hypotheses than what's in my head.

The complex model is right -- in other words, if hypotheses are to be a first-class part of the language then they must interoperate with all the other language features.

So there's several ways out of a sub. What does a hypo do in all these cases?

let $x - Declares a hypothesis. I think this should be a verb. (Which is to say, a function instead of a storage class.) (This in turn suggests that primitive types can't be hypothesized, although arrays thereof could be.)

fail - Pretty strongly suggests that the hypo is ignored. Also suggests continuation/backtracking behavior. Do we really know what fail does? See discard, below.

discard $x - (suggested) Obvious keyword for failing one single hypothesis.

keep $x - Suggests the value is made permanent. This should be a verb. C<keep> with no args should just keep every hypo in the current <fillintheSCOPE>.

return - This is unclear. On the one hand, it's a transfer of control and I think that C<let> and C<keep> are data, not control, modifiers. On the other hand, it's one of the normal ways to leave a block, and could be argued either way. (Also: remember the bad old days of C<my> vs. C<local>. I propose that hypos fail by default - to keep the distinction clear in the minds of newbies.)

throw - Exceptions unwind the call stack. If let is a control action, it should undo. If let is a data action, it should not. To me, C<let> is an explicit action taken against a variable, and should not be undone by this. (Of course, if unwinding the call stack causes the variable to go out of scope, it's not an issue.)

continuation: goto - Again: continuations are transfers of control, not data. If let is a control action, continuations will have to know if they are transferring back up the stack or if they are transferring to some new, never-before-seen (on the stack) place. I think that C<let> should be a data action, in which case this doesn't affect the hypothesis.

generators: yield - Same issues, although the argument can be made that since you can resume a generator, the hypo should be confined to the extant-but-inactive scope.

> To me, what 'let' does is temporize a variable except it doesn't get restored if you leave the scope, but only if you use a continuation to go back to a point where it wasn't hypothesized yet.

Yes and no. I agree it shouldn't get restored, see above. However, continuations don't touch data. So a global variable that has been hypo'ed should (IMO) remain so after the continuation. Frankly, if you mix the two, it's YOUR job to understand the ramifications.

> When the last continuation taken *before* the hypothesis is gone, so is the old version and thus the hypothesized variable becomes permanent.

I disagree. Proposal and acceptance should be explicit actions.

> The behavior regarding coroutines followed naturally from this, and so does the behavior inside regexen if they use the continuation semantics for backtracking -- which is what I'm suggesting. This leaves only behavior regarding preemptive threads, which is actually very easy to solve: disallow hypothesizing shared variables -- it simply makes no sense to do that. Now that I think of it, temporizing shared variables is equally bad news, so this isn't something new.

Why? If you constrain hypotheses to the thread (making it a control action instead of a data action) this could be a way to get cheap MUXing. Hypothesize all the new values you wish, then pay once to get a mux, then keep all the data values while you've got the mux. Shrinks your critical region:

    {: is synchronized($mux)
        keep all;
    }

OTOH, if you are really using threads well, then your app may construct a hypothesis based on user input, and the math and visualizer and gui threads may all need to work in that hypothetical space. (Which makes continuations
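(The "data action" reading of C<let> discussed above can be illustrated outside Perl. The following is a minimal sketch in Python, not Perl 6, and every name in it -- Hypotheses, let, keep, discard -- is invented purely for illustration: hypothesizing journals the old binding, keep commits, discard rolls back.)

    class Hypotheses:
        """Journal of tentative bindings layered over a plain dict."""
        def __init__(self, env):
            self.env = env        # the 'real' variables
            self.journal = []     # (name, old value) pairs, oldest first

        def let(self, name, value):
            # hypothesize: bind tentatively, remembering what was there
            # (for simplicity this sketch assumes the name is already bound)
            self.journal.append((name, self.env.get(name)))
            self.env[name] = value

        def keep(self):
            # accept every pending hypothesis: forget the old values
            self.journal.clear()

        def discard(self):
            # fail: restore the old values, newest hypothesis first
            while self.journal:
                name, old = self.journal.pop()
                self.env[name] = old

    env = {"x": 1}
    h = Hypotheses(env)
    h.let("x", 2)       # tentatively x = 2
    h.discard()         # backtrack: x is 1 again
    h.let("x", 3)
    h.keep()            # commit: x stays 3
    print(env)          # {'x': 3}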
Re: Rules and hypotheticals: continuations versus callbacks
On Thu, Mar 20, 2003 at 08:49:28AM -0800, Austin Hastings wrote:
> --- Matthijs van Duin [EMAIL PROTECTED] wrote:
>> you seem to have a much more complex model of hypotheses than what's in my head.
>
> The complex model is right -- in other words, if hypotheses are to be a first-class part of the language then they must interoperate with all the other language features.
> (lots of explanation here)

You're simply expanding on the details of your complex model - not on the need for it in the first place. I'll see if I can write some details/examples of my model later, and show how it interacts with various language features in a simple way.

>> This leaves only behavior regarding preemptive threads, which is actually very easy to solve: disallow hypothesizing shared variables -- it simply makes no sense to do that. Now that I think of it, temporizing shared variables is equally bad news, so this isn't something new.

I just realized there's another simple alternative: make it cause the variable to become thread-local for that particular thread.

> Hypothesize all the new values you wish, then pay once to get a mux, then keep all the data values while you've got the mux. Shrinks your critical region

You're introducing entirely new semantics here, and personally I think you're abusing hypotheses, although I admit in an interesting and potentially useful way. I'll spend some thought on that.

> My experience has been that when anyone says "I don't see why anyone would ...", Damian immediately posts an example of why.

No problem since it works fine in my model (I had already mentioned that earlier) - I just said *I* don't see why anyone would.. :-)

> So, stop talking about rexen. When everyone groks how continuations should work, it'll fall out.

rexen were the main issue: Dan was worried about performance

> (And if you reimplement the rexengine using continuations and outperform Dan's version by 5x or better, then we'll have another Geek Cruise to Glacier Bay and strand Dan on an iceberg. :-)

I don't intend to outperform him.. I intend to get the same performance with cleaner, simpler and more generic semantics. But as I said in my previous post.. give me some time to work out the details.. maybe I'll run into fatal problems making the whole issue moot :)

BTW, you say "reimplement"? Last time I checked hypothetical variables weren't implemented yet, let alone properly interacting with continuations. Maybe it's just sitting in someone's local version, but until I have something to examine, I can't really compare its performance to my system.

-- Matthijs van Duin -- May the Forth be with you!
Re: Rules and hypotheticals: continuations versus callbacks
On Tue, Mar 18, 2003 at 09:28:43PM -0700, Luke Palmer wrote:
> Plan 1: Pass each rule a I<success> continuation (rather than a backtrack one), and have it just return on failure. The big difference between this and your example is that C<let>s are now implemented just like C<temp>s. Too bad C<let> needs non-regex behavior, too.

That's mechanism #2, not #1

You probably don't mean the word "continuation" here though, since a continuation never returns, so once a rule would invoke the success continuation it can't regain control except via another continuation. You probably simply mean a closure

> Plan 2: Call subrules as plain old subs and have them throw a backtrack exception on failure (or just return a failure-reporting value... same difference, more or less).

But.. say you have: <foo> <bar>

How would this be implemented? When bar fails, it needs to backtrack into foo, which has already returned. Are you saying every rule will be an explicit state machine?

> This has the advantage that C<let> behaves consistently with the rest of Perl

What do you mean?

> I looked around in Parrot a little, and it seems like continuations are done pretty efficiently.

Yes, I noticed that too

-- Matthijs van Duin -- May the Forth be with you!
Re: Rules and hypotheticals: continuations versus callbacks
Matthijs van Duin wrote:
> Which system is likely to run faster on parrot?

I would propose, estimate the ops you need and test it :) E.g. call a continuation 1e6 times and communicate state with one global (a lexical is probably the same speed, i.e. a hash lookup)

    $ cat a.pasm
        new P5, .PerlInt
        set P5, 100
        store_global "foo", P5
        new P0, .Continuation
        set_addr I0, endcont
        set P0, I0
    endcont:
        find_global P4, "foo"
        unless P4, done
        dec P4
        # store_global "foo", P4  -- no need to store, P4 is a reflike thingy
        invoke
    done:
        print "done\n"
        end

    $ time imcc -P a.pasm
    done
    real    0m0.881s

    $ imcc -p a.pasm
    done

     OPERATION PROFILE

     CODE   OP FULL NAME          CALLS   TOTAL TIME   AVG TIME
     ----   ------------------    -----   ----------   --------
        0   end                       1     0.01         0.01
       40   set_addr_i_ic             1     0.01         0.01
       66   set_p_i                   1     0.01         0.01
       67   set_p_ic                  1     0.04         0.04
      226   unless_p_ic             101     0.595025     0.01
      276   dec_p                   100     0.599946     0.01
      758   store_global_sc_p         1     0.06         0.06
      760   find_global_p_sc        101     1.037922     0.01
      786   new_p_ic                  2     0.11         0.05
      819   invoke                  100     0.914063     0.01
      883   print_sc                  1     0.005299     0.005299
     ----   ------------------    -----   ----------   --------
       11                           410     3.152280     0.01

So you can estimate that the more heavy opcodes take about 1us, more light vtable functions are ~double that speed with the slow, profiling core. CGP (or JIT) is 3 - 4 times faster.

-O3 compiled parrot, Athlon 800, i386/linux

leo
Re: Rules and hypotheticals: continuations versus callbacks
On Wed, Mar 19, 2003 at 10:38:54AM +0100, Leopold Toetsch wrote:
> I would propose, estimate the ops you need and test it :)

Hmm, good point. Or even better.. I should just implement both examples and benchmark them; they're simple enough and the ops are available. I guess it's time to familiarize myself with pasm :)

-- Matthijs van Duin -- May the Forth be with you!
Re: Rules and hypotheticals: continuations versus callbacks
On Wed, Mar 19, 2003 at 01:01:28PM +0100, Matthijs van Duin wrote:
> On Wed, Mar 19, 2003 at 10:38:54AM +0100, Leopold Toetsch wrote:
>> I would propose, estimate the ops you need and test it :)
>
> Hmm, good point. Or even better.. I should just implement both examples and benchmark them; they're simple enough and the ops are available.

...except I forgot entirely about 'let'. However, the implementation of 'let' will have an impact on the performance of both systems.. oh well, I'll just have to estimate like you said :-)

-- Matthijs van Duin -- May the Forth be with you!
Re: Rules and hypotheticals: continuations versus callbacks
On Wed, Mar 19, 2003 at 10:38:54AM +0100, Leopold Toetsch wrote:
> I would propose, estimate the ops you need and test it :)

I haven't completed testing yet, however it's becoming clear to me that this is likely to be a pointless effort. There are so many variables that can affect performance here that the results I may find in these tests are unlikely to have any relation to the performance of rules in practice.

1. making continuations affects the performance of *other* code (COW)
2. the 'let' operation is missing and all attempts to fake it are silly
3. to really test it, I'd need to make subrules full subroutines, but then the performance difference will probably disappear in the overhead of all other stuff. To test I'd need large, realistic patterns; and I'm certainly not in the mood to write PIR for them manually.

And it appears that on my machine continuations and garbage collection have a quarrel, which also makes testing problematic.

I guess the only way to find out is to implement both systems and compare them using a large test set of realistic grammars. Or of course just implement it using continuations (system #1), since the speed difference probably isn't gonna be huge anyway.

Here is my test program for continuations and the results on my machine:

    # "aaab" ~~ / ^ [ a | a* ] ab <fail> /

        set I5, 1000
        sweepoff        # or bus error
        collectoff      # or segmentation fault
    begin:
        set S0, "aaab"
        set I0, 0
        new P0, .Continuation
        set_addr I1, second
        set P0, I1
        rx_literal S0, I0, "a", backtrack
        branch third
    second:
        new P0, .Continuation
        set_addr I1, fail
        set P0, I1
    deeper:
        rx_literal S0, I0, "a", third
        save P0                     # normally hypothesize
        new P0, .Continuation
        set_addr I1, unwind
        set P0, I1
        branch deeper
    unwind:
        dec I0                      # normally de-hypothesize
        restore P0                  # normally de-hypothesize
    third:
        rx_literal S0, I0, "ab", backtrack
        sub I0, 2                   # normally de-hypothesize
    backtrack:
        invoke
    fail:
        dec I5
        if I5, begin
        end

     OPERATION PROFILE

     CODE   OP FULL NAME             CALLS   TOTAL TIME   AVG TIME
     ----   ---------------------    -----   ----------   --------
        0   end                          1     0.29         0.29
       40   set_addr_i_ic             5000     0.024928     0.05
       46   set_i_ic                  1001     0.010573     0.11
       60   set_s_sc                  1000     0.005717     0.06
       66   set_p_i                   5000     0.016201     0.03
      213   if_i_ic                   1000     0.002848     0.03
      274   dec_i                     4000     0.011390     0.03
      370   sub_i_ic                  1000     0.004227     0.04
      675   save_p                    3000     0.192309     0.64
      682   restore_p                 3000     0.246457     0.82
      719   branch_ic                 4000     0.012216     0.03
      770   sweepoff                     1     0.14         0.14
      772   collectoff                   1     0.03         0.03
      786   new_p_ic                  5000     0.179403     0.36
      819   invoke                    5000     0.026285     0.05
      962   rx_literal_s_i_sc_ic     10000     0.054260     0.05
     ----   ---------------------    -----   ----------   --------
       16                            48004     0.786861     0.16

iBook; PPC G3; 700 Mhz

-- Matthijs van Duin -- May the Forth be with you!
Re: Rules and hypotheticals: continuations versus callbacks
At 10:05 AM +0100 3/19/03, Matthijs van Duin wrote:
> But.. say you have: <foo> <bar>
>
> How would this be implemented? When bar fails, it needs to backtrack into foo, which has already returned. Are you saying every rule will be an explicit state machine?

By compile-time interpolation. <foo> isn't so much a subroutine as a macro. For this to work, if we had:

    foo: \w+?
    bar: [plugh]{2,5}

then what the regex engine *really* got to compile would be:

    (\w+?) ([plugh]{2,5})

with names attached to the two paren groups.

Treating them as actual subroutines leads to madness, continuations don't quite work, and coroutines could pull it off if we could pass data back into a coroutine on reinvocation, but... We do, after all, want this fast, right?

-- Dan --it's like this--- Dan Sugalski even samurai [EMAIL PROTECTED] have teddy bears and even teddy bears get drunk
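(The compile-time interpolation described above has a rough everyday analogue with ordinary regexes: treat each named rule as a pattern fragment and splice the fragments into one compiled expression, with named groups standing in for the subrule captures. A small Python sketch, purely illustrative -- the rule names and the splicing are made up here and this is not the Perl 6 engine:)

    import re

    # "Rules" as plain pattern fragments, in the spirit of
    #     foo: \w+?        bar: [plugh]{2,5}
    rules = {
        "foo": r"\w+?",
        "bar": r"[plugh]{2,5}",
    }

    # "rule test { <foo> <bar> }" compiled by splicing the fragments in;
    # the named groups play the role of the subrule captures.
    test = re.compile(r"(?P<foo>%(foo)s)\s*(?P<bar>%(bar)s)" % rules)

    m = test.match("x plugh")
    print(m.group("foo"), m.group("bar"))    # -> x plugh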
Re: Rules and hypotheticals: continuations versus callbacks
On Wed, Mar 19, 2003 at 10:40:02AM -0500, Dan Sugalski wrote:
> By compile-time interpolation. <foo> isn't so much a subroutine as a macro. For this to work, if we had:
>     foo: \w+?
>     bar: [plugh]{2,5}
> then what the regex engine *really* got to compile would be:
>     (\w+?) ([plugh]{2,5})
> with names attached to the two paren groups. Treating them as actual subroutines leads to madness,

Ehm, Foo.test cannot inline Foo.foo since it may be overridden:

    grammar Foo {
        rule foo { \w+? }
        rule bar { [plugh]{2,5} }
        rule test { <foo> <bar> }
    }

    grammar Bar is Foo {
        rule foo { <alpha>+? }
    }

What you say is only allowed if I put "is inline" on foo.

> continuations don't quite work

Care to elaborate on that? I'd say they work fine

> We do, after all, want this fast, right?

Of course, and we should optimize as much as we can - but not optimize *more* than we can. Rules need generic backtracking semantics, and that's what I'm talking about. Optimizations to avoid the genericity of these backtracking semantics is for later.

-- Matthijs van Duin -- May the Forth be with you!
Re: Rules and hypotheticals: continuations versus callbacks
At 4:52 PM +0100 3/19/03, Matthijs van Duin wrote:
> On Wed, Mar 19, 2003 at 10:40:02AM -0500, Dan Sugalski wrote:
>> By compile-time interpolation. <foo> isn't so much a subroutine as a macro. For this to work, if we had:
>>     foo: \w+?
>>     bar: [plugh]{2,5}
>> then what the regex engine *really* got to compile would be:
>>     (\w+?) ([plugh]{2,5})
>> with names attached to the two paren groups. Treating them as actual subroutines leads to madness,
>
> Ehm, Foo.test cannot inline Foo.foo since it may be overridden:
>
>     grammar Foo {
>         rule foo { \w+? }
>         rule bar { [plugh]{2,5} }
>         rule test { <foo> <bar> }
>     }
>
>     grammar Bar is Foo {
>         rule foo { <alpha>+? }
>     }
>
> What you say is only allowed if I put "is inline" on foo.

At the time I run the regex, I can inline things. There's nothing that prevents it. Yes, at compile time it's potentially an issue, since things can be overridden later, but that's going to be relatively rare, and can be dealt with by selective recompilation.

By the time the regex is actually executed, it's fully specified. By definition if nothing else--you aren't allowed to selectively redefine rules in the middle of a regex that uses those rules. Or, rather, you can but the update won't take effect until after the end of the regex, the same way that you can't redefine a sub you're in the middle of executing. (And yes, I'm aware that if you do that you'll pick up the new version if you recursively call, but that won't work with regexes)

>> continuations don't quite work
>
> Care to elaborate on that? I'd say they work fine

There's issues with hypothetical variables and continuations. (And with coroutines as well) While this is a general issue, they come up most with regexes.

>>> We do, after all, want this fast, right?
>
> Of course, and we should optimize as much as we can - but not optimize *more* than we can. Rules need generic backtracking semantics, and that's what I'm talking about.

No. No, in fact they don't. Rules need very specific backtracking semantics, since rules are fairly specific. We're talking about backtracking in regular expressions, which is a fairly specific generality. If you want to talk about a more general backtracking that's fine, but it won't apply to how regexes backtrack.

-- Dan --it's like this--- Dan Sugalski even samurai [EMAIL PROTECTED] have teddy bears and even teddy bears get drunk
Re: Rules and hypotheticals: continuations versus callbacks
On Wed, Mar 19, 2003 at 11:09:01AM -0500, Dan Sugalski wrote:
> At the time I run the regex, I can inline things. There's nothing that prevents it. Yes, at compile time it's potentially an issue, since things can be overridden later,

OK, but that's not how you initially presented it :-)

> you aren't allowed to selectively redefine rules in the middle of a regex that uses those rules. Or, rather, you can but the update won't take effect until after the end

I don't recall having seen such a restriction mentioned in Apoc 5. While I'm a big fan of optimization, especially for something like this, I think we should be careful with introducing mandatory restrictions just to aid optimization. ("is inline" will allow such optimizations of course)

> There's issues with hypothetical variables and continuations. (And with coroutines as well) While this is a general issue, they come up most with regexes.

I'm still curious what you're referring to exactly. I've outlined possible semantics for hypothetical variables in earlier posts that should work.

>>> We do, after all, want this fast, right?
>>
>> Of course, and we should optimize as much as we can - but not optimize *more* than we can. Rules need generic backtracking semantics, and that's what I'm talking about.
>
> No. No, in fact they don't. Rules need very specific backtracking semantics, since rules are fairly specific. We're talking about backtracking in regular expressions, which is a fairly specific generality. If you want to talk about a more general backtracking that's fine, but it won't apply to how regexes backtrack.

My impression from A5 and A6 is that rules are methods. They're looked up like methods, they can be invoked like methods, etc. I certainly want to be able to write rules myself, manually, when I think it's appropriate; and use these as subrules in other methods. Generic backtracking semantics are needed for that, and should at least conceptually also apply to normal rules.

When common sub-patterns are inlined, simple regexen will not use runtime subrules at all, so the issue doesn't exist there - that covers everything you would do with regexen in perl 5 for example. When you do use real sub-rules, you're getting into the domain previously held by Parse::RecDescent and the like. While these should of course still be as fast as possible, a tiny bit of overhead on top of a regular regex is understandable.

However, such overhead might not even be needed at all: whenever possible optimizations should be applied, and rules are free to use special hacky but fast calling semantics to subrules if they determine that's possible. But I don't think a special optimization should be elevated to the official semantics. I say, make generic semantics first, and then optimize the heck out of it.

-- Matthijs van Duin -- May the Forth be with you!
Re: Rules and hypotheticals: continuations versus callbacks
Matthijs van Duin wrote:
> sweepoff    # or bus error
> collectoff  # or segmentation fault

Please try:

    /* set this to 1 for tracing the system stack and processor registers */
    #define TRACE_SYSTEM_AREAS 1

in dod.c (works for me). Though I don't know if processor registers on PPC get traced by this (it might not stand optimization if not). Code is not that deeply looked at, that we can safely turn off stack tracing yet.

leo
Re: Rules and hypotheticals: continuations versus callbacks
On Wed, Mar 19, 2003 at 11:09:01AM -0500, Dan Sugalski wrote:
> By the time the regex is actually executed, it's fully specified. By definition if nothing else--you aren't allowed to selectively redefine rules in the middle of a regex that uses those rules. Or, rather, you can but the update won't take effect until after the end of the regex, the same way that you can't redefine a sub you're in the middle of executing. (And yes, I'm aware that if you do that you'll pick up the new version if you recursively call, but that won't work with regexes)

Are you implying that

    $fred = rx/fred/;
    $string ~~ m:w/ $fred { $fred = rx/barney/; } rubble /

won't match barney rubble?

-Scott
-- Jonathan Scott Duff [EMAIL PROTECTED]
Re: Rules and hypotheticals: continuations versus callbacks
On Wed, 19 Mar 2003, Jonathan Scott Duff wrote:
> Are you implying that
>
>     $fred = rx/fred/;
>     $string ~~ m:w/ $fred { $fred = rx/barney/; } rubble /
>
> won't match barney rubble?

Or, worse, that

    $fred = rx/fred/;
    $string ~~ m:w/ { $fred = rx/barney/; } $fred rubble /

won't, either?

/s
Re: Rules and hypotheticals: continuations versus callbacks
At 10:41 AM -0600 3/19/03, Jonathan Scott Duff wrote:
> On Wed, Mar 19, 2003 at 11:09:01AM -0500, Dan Sugalski wrote:
>> By the time the regex is actually executed, it's fully specified. By definition if nothing else--you aren't allowed to selectively redefine rules in the middle of a regex that uses those rules. Or, rather, you can but the update won't take effect until after the end of the regex, the same way that you can't redefine a sub you're in the middle of executing. (And yes, I'm aware that if you do that you'll pick up the new version if you recursively call, but that won't work with regexes)
>
> Are you implying that
>
>     $fred = rx/fred/;
>     $string ~~ m:w/ $fred { $fred = rx/barney/; } rubble /
>
> won't match barney rubble?

Potentially, no. What, then, should happen if you do:

    $barney = rx/barney/;
    $string = "barney rubble";
    $string ~~ m:w/ $barney { $barney = rx/fred/; } rubble /;

The regex shouldn't match, since you've invalidated part of the match in the middle.

I can potentially see constructs of the form $var be taken as indirect rule invocations and their dispatch left to runtime, complete with the potential for bizarre after-the-fact invalidations, but as regex rules in the regex stream rather than as generic code.

-- Dan --it's like this--- Dan Sugalski even samurai [EMAIL PROTECTED] have teddy bears and even teddy bears get drunk
Re: Rules and hypotheticals: continuations versus callbacks
At 5:38 PM +0100 3/19/03, Matthijs van Duin wrote:
>> At the time I run the regex, I can inline things. There's nothing that prevents it. Yes, at compile time it's potentially an issue, since things can be overridden later,
>
> OK, but that's not how you initially presented it :-)

Then I wasn't clear enough, sorry. This is perl -- the state of something at compile time is just a suggestion as to how things ultimately work. The state at the time of execution is the only thing that really matters, and I shortcut.

>> you aren't allowed to selectively redefine rules in the middle of a regex that uses those rules. Or, rather, you can but the update won't take effect until after the end
>
> I don't recall having seen such a restriction mentioned in Apoc 5.

I'll nudge Larry to add it explicitly, but in general redefinitions of code that you're in the middle of executing don't take effect immediately, and it's not really any different for regex rules than for subs.

> While I'm a big fan of optimization, especially for something like this, I think we should be careful with introducing mandatory restrictions just to aid optimization. ("is inline" will allow such optimizations of course)

Actually, we should be extraordinarily liberal with the application of restrictions at this phase. It's far easier to lift a restriction later than to impose it later, and I very much want to stomp out any constructs that will force slow code execution. Yes, I may lose, but if I don't try... My job, after all, is to make it go fast. If you want something that'll require things to be slow then I don't want you to have it. :)

>> There's issues with hypothetical variables and continuations. (And with coroutines as well) While this is a general issue, they come up most with regexes.
>
> I'm still curious what you're referring to exactly. I've outlined possible semantics for hypothetical variables in earlier posts that should work.

The issue of hypotheticals is complex.

>>>> We do, after all, want this fast, right?
>>>
>>> Of course, and we should optimize as much as we can - but not optimize *more* than we can. Rules need generic backtracking semantics, and that's what I'm talking about.
>>
>> No. No, in fact they don't. Rules need very specific backtracking semantics, since rules are fairly specific. We're talking about backtracking in regular expressions, which is a fairly specific generality. If you want to talk about a more general backtracking that's fine, but it won't apply to how regexes backtrack.
>
> My impression from A5 and A6 is that rules are methods. They're looked up like methods, they can be invoked like methods, etc.

They aren't methods, though. They're not code in general, they're regex constructions in specific. Because they live in the symbol table and in some cases can be invoked as subs/methods doesn't make them subs or methods, it makes them regex constructs with funky wrappers if you want to use them in a non-regex manner.

> I certainly want to be able to write rules myself, manually, when I think it's appropriate; and use these as subrules in other methods. Generic backtracking semantics are needed for that, and should at least conceptually also apply to normal rules.

No, no it shouldn't. Rules are rules for regexes, they are *not* subs. If you want generic backtracking to work, then there can't be any difference between:

    rule foo { \w+ }

and

    sub foo { ... }

but there must be. With rules as regex constructs the semantics are much simpler.

If we allow rules to be arbitrary code not only do we have to expose a fair amount of the internals of the regex engine to the sub so it can actually work on the stream and note its position (which is fine, I can do that) we also need to be able to pause foo in the middle and jump back in while passing in parameters of some sort. Neither continuations nor standard coroutines are sufficient in this instance, since the reinvocation must *both* preserve the state of the code at the time it exited but also pass in an indication as to what the sub should do. For example, if the foo sub was treated as a rule and we backtrack, should it slurp more or less? If rules are just plain regex rules and not potentially arbitrary code, the required semantics are much simpler.

Then there's the issue of being able to return continuations from within arbitrary unnamed blocks, since the block in this:

    $foo ~~ m:w/<alpha> {...} <number>/;

should be able to participate in the backtracking activities if we're not drawing a distinction between rules and generic code. (Yeah, the syntax is wrong, but you get the point)

Ultimately the question is "How do you backtrack into arbitrary code, and how do we know that the arbitrary code can be backtracked into?" My answer is "we don't", but I'm not sure how popular that particular answer is.

> When common sub-patterns are inlined, simple regexen will not use runtime subrules at all, so the
Re: Rules and hypotheticals: continuations versus callbacks
[EMAIL PROTECTED] (Dan Sugalski) writes: you aren't allowed to selectively redefine rules in the middle of a regex that uses those rules. This is precisely what a macro does. -- How should I know if it works? That's what beta testers are for. I only coded it. (Attributed to Linus Torvalds, somewhere in a posting)
Re: Rules and hypotheticals: continuations versus callbacks
At 5:47 PM + 3/19/03, Simon Cozens wrote: [EMAIL PROTECTED] (Dan Sugalski) writes: you aren't allowed to selectively redefine rules in the middle of a regex that uses those rules. This is precisely what a macro does. Not once execution starts, no. -- Dan --it's like this--- Dan Sugalski even samurai [EMAIL PROTECTED] have teddy bears and even teddy bears get drunk
Re: Rules and hypotheticals: continuations versus callbacks
[EMAIL PROTECTED] (Dan Sugalski) writes: At 5:47 PM + 3/19/03, Simon Cozens wrote: [EMAIL PROTECTED] (Dan Sugalski) writes: you aren't allowed to selectively redefine rules in the middle of a regex that uses those rules. This is precisely what a macro does. Not once execution starts, no. Compilation's just execution of a regex, albeit the Perl6::Grammar::program regex, and that regex will need to be modified while it's in operation in order to pick up macro is parsed definitions and apply them to the rest of what it's parsing. -- * DrForr digs around for a fresh IV drip bag and proceeds to hook up. dngor Coffee port. DrForr Firewalled, like everything else around here.
Re: Rules and hypotheticals: continuations versus callbacks
At 5:54 PM + 3/19/03, Simon Cozens wrote: [EMAIL PROTECTED] (Dan Sugalski) writes: At 5:47 PM + 3/19/03, Simon Cozens wrote: [EMAIL PROTECTED] (Dan Sugalski) writes: you aren't allowed to selectively redefine rules in the middle of a regex that uses those rules. This is precisely what a macro does. Not once execution starts, no. Compilation's just execution of a regex, albeit the Perl6::Grammar::program regex, and that regex will need to be modified while it's in operation in order to pick up macro is parsed definitions and apply them to the rest of what it's parsing. Ah, damn, I wasn't thinking far enough out. I'm not sure it'll work quite like that, with a single call to the regex engine that spits out everything in one go. More likely it'll be a set of iterative calls to the engine that terminate at natural sequence points, potentially with recursive calls into the parsing regex. -- Dan --it's like this--- Dan Sugalski even samurai [EMAIL PROTECTED] have teddy bears and even teddy bears get drunk
Re: Rules and hypotheticals: continuations versus callbacks
[EMAIL PROTECTED] (Dan Sugalski) writes: Compilation's just execution of a regex, albeit the Perl6::Grammar::program regex, and that regex will need to be modified while it's in operation in order to pick up macro is parsed definitions and apply them to the rest of what it's parsing. Ah, damn, I wasn't thinking far enough out. This, you see, is precisely why some of us started work last year on a regular expression engine which could handle having its expressions rewritten during the match... ;) -- Last week I forgot how to ride a bicycle. -- Steven Wright
Re: Rules and hypotheticals: continuations versus callbacks
On Wed, Mar 19, 2003 at 12:35:19PM -0500, Dan Sugalski wrote:
> Then I wasn't clear enough, sorry. This is perl -- the state of something at compile time is just a suggestion as to how things ultimately work.

Yes, hence my surprise about actually inlining stuff, luckily that was just a misunderstanding :-)

> I'll nudge Larry to add it explicitly, but in general redefinitions of code that you're in the middle of executing don't take effect immediately, and it's not really any different for regex rules than for subs.

Ah, but we're not redefining the sub that's running, but the subs it's about to call. That works for subs, and Simon Cozens already pointed out we certainly also need it for rules :-)

> Actually, we should be extraordinarily liberal with the application of restrictions at this phase. It's far easier to lift a restriction later than to impose it later,

This is perl 6, we can add a new restriction next week

> and I very much want to stomp out any constructs that will force slow code execution. Yes, I may lose, but if I don't try...

You're absolutely right, and optimization is very important to me too. But you can't *only* look at the speed of constructs, or we'll be coding in C or assembly :-) We'll need to meet in the middle..

> The issue of hypotheticals is complex.

Well, I'm a big boy, I'm sure I can handle it. Are you even talking about semantics or implementation here? Because I already gave my insights on semantics, and I have 'em in my head for implementation too but I should probably take those to perl6-internals instead.

> Ultimately the question is "How do you backtrack into arbitrary code, and how do we know that the arbitrary code can be backtracked into?" My answer is "we don't", but I'm not sure how popular that particular answer is.

>> I say, make generic semantics first, and then optimize the heck out of it.
>
> That's fine. I disagree. :)

Now that Simon Cozens has established that sub-rules need to be looked up at runtime, I think we can both be happy:

As far as I can see, a rule will consist of two parts: the wrapper that will handle stuff when the rule is invoked as a normal method, perhaps handle modifiers, handle searches for unanchored matches, set up the state, etc; and the actual body that does a match at the current position.

Now, what you want is that subrule-invocation goes directly from body to body, skipping the overhead of method invocation to the wrapper. I say, when you look up the method for a subrule, check if it is a regular rule and if so call its body directly, and otherwise use the generic mechanism.

I'll get my lovely generic semantics with the direct body-body calling hidden away as an optimization detail, and I get the ability to write rule-methods in perl code. You still get your low-overhead body-body calls and therefore the speed you desire (hopefully). Since you need to fetch the rule body anyway, there should be no extra overhead: where you'd normally throw an error (non-rule invoked as subrule) you'd switch to generic invocation instead.

Sounds like a good deal? :-)

-- Matthijs van Duin -- May the Forth be with you!
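(A sketch of the wrapper/body split proposed above, in Python rather than Perl 6, with invented names throughout -- Rule, CustomRule, body, match, call_subrule: an ordinary compiled rule exposes a cheap body() that matches at a given position, the wrapper match() is the generic user-facing entry point, and a caller invoking a subrule takes the body-to-body fast path when it recognizes an ordinary rule, falling back to the generic call for anything else.)

    class Rule:
        """An ordinary compiled rule: cheap body plus a user-facing wrapper."""
        def __init__(self, literal):
            self.literal = literal

        def body(self, s, pos):
            """Match at pos; return the new position or None. No setup cost."""
            end = pos + len(self.literal)
            return end if s[pos:end] == self.literal else None

        def match(self, s, pos=0):
            """Wrapper: generic entry point (could handle modifiers, state, ...)."""
            return self.body(s, pos)

    class CustomRule:
        """A hand-written 'rule' that only provides the generic interface."""
        def match(self, s, pos=0):
            return pos + 1 if pos < len(s) and s[pos].isdigit() else None

    def call_subrule(rule, s, pos):
        # The optimization: go body-to-body when we can, stay generic otherwise.
        if isinstance(rule, Rule):
            return rule.body(s, pos)
        return rule.match(s, pos)

    print(call_subrule(Rule("ab"), "abc", 0))    # 2  (fast path)
    print(call_subrule(CustomRule(), "7x", 0))   # 1  (generic path)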
Re: Rules and hypotheticals: continuations versus callbacks
At 8:04 PM +0100 3/19/03, Matthijs van Duin wrote: On Wed, Mar 19, 2003 at 12:35:19PM -0500, Dan Sugalski wrote: I'll nudge Larry to add it explicitly, but in general redefinitons of code that you're in the middle of executing don't take effect immediately, and it's not really any different for regex rules than for subs. Ah, but we're not redefining the sub that's running, but the subs it's about to call. That works for subs, and Simon Cozens already pointed out we certainly also need it for rules :-) Well, I'm not 100% sure we need it for rules. Simon's point is well-taken, but on further reflection what we're doing is subclassing the existing grammar and reinvoking the regex engine on that subclassed grammar, rather than redefining the grammar actually in use. The former doesn't require runtime redefinitions, the latter does, and I think we're going to use the former scheme. Actually, we should be extraordinarily liberal with the application of restrictions at this phase. It's far easier to lift a restriction later than to impose it later, This is perl 6, we can add a new restriction next week We can't add them once we hit betas. I'd as soon add them now, rather than later. and I very much want to stomp out any constructs that will force slow code execution. Yes, I may lose, but if I don't try... You're absolutely right, and optimization is very important to me too. But you can't *only* look at the speed of constructs, or we'll be coding in C or assembly :-) We'll need to meet in the middle.. Well, not to be too cranky (I'm somewhat ill at the moment, so I'll apologize in advance) but... no. No, we don't actually have to, though if we could that'd be nice. The issue of hypotheticals is complex. Well, I'm a big boy, I'm sure I can handle it. Are you even talking about semantics or implementation here? Because I already gave my insights on semantics, and I have 'em in my head for implementation too but I should probably take those to perl6-internals instead. Semantics. Until Larry's nailed down what he wants, there are issues of reestablishing hypotheticals on continuation reinvocation, flushing those hypotheticals multiple times, what happens to hypotheticals when you invoke a continuation with hypotheticals in effect, what happens to hypotheticals inside of coroutines when you establish them then yield out, and when hypotheticals are visible to other threads. I read through your proposal (I'm assuming it's the one that started this thread) and it's not sufficient unless I missed something, which I may have. Ultimately the question is How do you backtrack into arbitrary code, and how do we know that the arbitrary code can be backtracked into? My answer is we don't, but I'm not sure how popular that particular answer is. I say, make generic semantics first, and then optimize the heck out of it. That's fine. I disagree. :) Now that Simon Cozens has established that sub-rules need to be looked up at runtime, Well Sounds like a good deal? :-) At the moment, no. It seems like a potentially large amount of overhead for no particular purpose, really. I don't see any win in the regex case, and you're not generalizing it out to the point where there's a win there. (I can see where it would be useful in the general case, but we've come nowhere near touching that) -- Dan --it's like this--- Dan Sugalski even samurai [EMAIL PROTECTED] have teddy bears and even teddy bears get drunk
Re: Rules and hypotheticals: continuations versus callbacks
On Wed, Mar 19, 2003 at 02:31:58PM -0500, Dan Sugalski wrote:
> Well, I'm not 100% sure we need it for rules. Simon's point is well-taken, but on further reflection what we're doing is subclassing the existing grammar and reinvoking the regex engine on that subclassed grammar, rather than redefining the grammar actually in use. The former doesn't require runtime redefinitions, the latter does, and I think we're going to use the former scheme.

That's not the impression I got from Simon. It would also be rather annoying.. think about balanced braces etc, take this rather contrived, but valid example:

    $x ~~ m X { macro ... yada yada yada; } X;

It seems to me that you're really inside a grammar rule when that macro is defined. Otherwise you'd have to keep a lot of state outside the parser to keep track of such things, which is exactly what perl grammars were supposed to avoid I think.

> We can't add them once we hit betas. I'd as soon add them now, rather than later.

Well, I'd rather not add it at all :-)

>> We'll need to meet in the middle..
>
> Well, not to be too cranky (I'm somewhat ill at the moment, so I'll apologize in advance) but... no. No, we don't actually have to, though if we could that'd be nice.

OK, strictly speaking that's true, but I think we can

> Semantics. Until Larry's nailed down what he wants, there are issues of reestablishing hypotheticals on continuation reinvocation,

They should be though, if a variable was hypothesized when the continuation was taken, then it should be hypothesized when that continuation is invoked.

> flushing those hypotheticals multiple times,

No idea what you mean

> what happens to hypotheticals when you invoke a continuation with hypotheticals in effect,

Basically de-hypothesize all current hypotheticals, and re-hypothesize the ones that were hypothesized when the continuation was taken. You can of course optimize this by skipping the common ancestry, if you know what I mean

> what happens to hypotheticals inside of coroutines when you establish them then yield out,

This follows directly from the implementation of coroutines: the first yield is a normal return, so if you hypothesize $x before that it'll stay hypothesized. If you then hypothesize $y outside the coroutine and call the coroutine again, $y will be de-hypothesized. If the coroutine then hypothesizes $z and yields out, $z will be de-hypothesized and $y re-hypothesized. $x will be unaffected by all this

> and when hypotheticals are visible to other threads.

I haven't thought of that, but to be honest I'm not a big fan of preemptive threading anyway. Cooperative threading using continuations is probably faster, has no synchronization issues. And the behavior of hypotheticals follows naturally there (you can use 'let' or 'temp' to create thread-local variables in that case)

> I read through your proposal (I'm assuming it's the one that started this thread) and it's not sufficient unless I missed something, which I may have.

Also look at Sean O'Rourke's reply and my reply to that; it contains additional info.

>> Sounds like a good deal? :-)
>
> At the moment, no. It seems like a potentially large amount of overhead for no particular purpose, really.

I have to admit I don't know the details of how your system works, but what I had in mind didn't have any extra overhead at all -- under the (apparently still debatable) assumption that you need to look up subrules at runtime anyway. You do agree that if that is possible, it *is* a good deal?

> I don't see any win in the regex case, and you're not generalizing it out to the point where there's a win there. (I can see where it would be useful in the general case, but we've come nowhere near touching that)

We have come near it.. backtracking is easy using continuations, and we can certainly have rules set the standard for the general case.

-- Matthijs van Duin -- May the Forth be with you!
Re: Rules and hypotheticals: continuations versus callbacks
At 9:14 PM +0100 3/19/03, Matthijs van Duin wrote: On Wed, Mar 19, 2003 at 02:31:58PM -0500, Dan Sugalski wrote: Well, I'm not 100% sure we need it for rules. Simon's point is well-taken, but on further reflection what we're doing is subclassing the existing grammar and reinvoking the regex engine on that subclassed grammar, rather than redefining the grammar actually in use. The former doesn't require runtime redefinitions, the latter does, and I think we're going to use the former scheme. That's not the impression I got from Simon It would also be rather annoying.. think about balanced braces etc, take this rather contrieved, but valid example: $x ~~ m X { macro ... yada yada yada; } X; It seems to be that you're really inside a grammar rule when that macro is defined. Right. Macro definition ends, you subclass off the parser object, then immediately call into it, and it eats until the end of the regex, at which point it exits and so does the parent, for lack of input, and the resulting parse tree is turned to bytecode and executed. Otherwise you'd have to keep a lot of state outside the parser to keep track of such things, which is exactly what perl grammars were supposed to avoid I think. You, as a user-level programmer, don't have to track the state. The parser code will, but that's not a big deal. We'll need to meet in the middle.. Well, not to be too cranky (I'm somewhat ill at the moment, so I'll apologize in advance) but... no. No, we don't actually have to, though if we could that'd be nice. OK, strictly speaking that's true, but I think we can Semantics. Until Larry's nailed down what he wants, there are issues of reestablishing hypotheticals on continuation reinvocation, They should be though, if a variable was hypothesized when the continuation was taken, then it should be hypothesized when that continuation is invoked. Should they? Does hypotheticalization count as data modification (in which case it shouldn't) or control modification (in which case it should), and do you restore the hypothetical value at the time the continuation was taken or just re-hypotheticalize the variables? (Which makes continuations potentially more expensive as you need to then save off more info so on invocation you can restore the hypothetical state) What about co-routines, then? And does a yield from a coroutine count as normal or abnormal exit for pushing of hypothetical state outward, or doesn't it count at all? flushing those hypotheticals multiple times, Not idea what you mean I hypotheticalize the variables. I then take a continuation. Flow continues normally, exits off the end normally, hypothetical values get pushed out. I invoke the continuation, flow continues, exits normally. Do I push the values out again? what happens to hypotheticals when you invoke a continuation with hypotheticals in effect, Basically de-hypothesize all current hypotheticals, How? Successfully or unsuccessfully? Does it even *count* as an exit at all if there's a pending continuation that could potentially exit the hypotheticalizing block later? what happens to hypotheticals inside of coroutines when you establish them then yield out, This follows directly from the implementation of coroutines: the first yield is a normal return, so if you hypothesize $x before that it'll stay hypothesized. if you then hypothesize $y outside the coroutine and call the coroutine again, $y will be de-hypothesized. Why? That doesn't make much sense, really. 
If a variable is hypotheticalized outside the coroutine when I invoke it, the coroutine should see the hypothetical variable. But what about yields from within a coroutine that's hypotheticalized a variable? That's neither a normal nor an abnormal return, so what happens? If the coroutine then hypothesizes $z and yields out, $z will be de-hypothesized and $y re-hypothesized. $x will be unaffected by all this Yech. I don't think that's the right thing to do. and when hypotheticals are visible to other threads. I haven't thought of that, but to be honest I'm not a big fan of preemptive threading anyway. Doesn't matter whether you like it or not, they're a fact that must be dealt with. (And scare up a dual or better processor machine and I'll blow the doors off a cooperative threading scheme, synchronization overhead or not) I read through your proposal (I'm assuming it's the one that started this Sounds like a good deal? :-) At the moment, no. It seems like a potentially large amount of overhead for no particular purpose, really. I don't see any win in the regex case, and you're not generalizing it out to the point where there's a win there. (I can see where it would be useful in the general case, but we've come nowhere near touching that) -- Dan --it's like this--- Dan Sugalski even samurai [EMAIL PROTECTED] have teddy bears and even teddy bears get drunk
Re: Rules and hypotheticals: continuations versus callbacks
On Wed, Mar 19, 2003 at 03:46:50PM -0500, Dan Sugalski wrote:
> Right. Macro definition ends, you subclass off the parser object, then immediately call into it ... You, as a user-level programmer, don't have to track the state. The parser code will, but that's not a big deal.

OK, I suppose that works although that still means you're moving the complexity from the perl implementation to its usage: in this case, the perl 6 parser which is written in perl 6 -- but I can well imagine other people want to do the same, and they'll have to do a similar hack. I really don't like that, perl normally moves the complexity away from the programmer and into perl.

>> They should be though, if a variable was hypothesized when the continuation was taken, then it should be hypothesized when that continuation is invoked.
>
> Should they? Does hypotheticalization count as data modification (in which case it shouldn't) or control modification (in which case it should),

Isn't that the whole point of hypotheses in perl 6?

You talk about successful and unsuccessful de-hypothesizing, and about abnormal exits etc.. you seem to have a much more complex model of hypotheses than what's in my head. I could be entirely missing things of course, but I haven't seen any evidence of that yet.

To me, what 'let' does is temporize a variable except it doesn't get restored if you leave the scope, but only if you use a continuation to go back to a point where it wasn't hypothesized yet. When the last continuation taken *before* the hypothesis is gone, so is the old version and thus the hypothesized variable becomes permanent.

The behavior regarding coroutines followed naturally from this, and so does the behavior inside regexen if they use the continuation semantics for backtracking -- which is what I'm suggesting. This leaves only behavior regarding preemptive threads, which is actually very easy to solve: disallow hypothesizing shared variables -- it simply makes no sense to do that. Now that I think of it, temporizing shared variables is equally bad news, so this isn't something new.

> (Which makes continuations potentially more expensive as you need to then save off more info so on invocation you can restore the hypothetical state)

Actually, I think 'let' can handle this.. it's only invocation of continuations that will become more expensive because it needs to deal with the hypothesized variables

> What about co-routines, then? And does a yield from a coroutine count as normal or abnormal exit for pushing of hypothetical state outward, or doesn't it count at all?

Your terminology gets rather foreign to me at this point. Assuming a co-routine is implemented using continuations, their behavior follows directly from the description above, and I think the resulting behavior looks fine. I don't see why people would hypothesize variables inside a co-routine anyway.

> I hypotheticalize the variables. I then take a continuation. Flow continues normally, exits off the end normally, hypothetical values get pushed out. I invoke the continuation, flow continues, exits normally. Do I push the values out again?

If it ends normally, the variable isn't de-hypothesized at all. Also, the continuation was created *after* you hypothesized the variable, so when you invoke it nothing will happen to the variable.

> How? Successfully or unsuccessfully? Does it even *count* as an exit at all if there's a pending continuation that could potentially exit the hypotheticalizing block later?

You're making 0% sense to me, apparently because your mental model of hypothesizing differs radically from mine.

> Why? That doesn't make much sense, really.

Probably the same problem in opposite direction :-)

> (And scare up a dual or better processor machine and I'll blow the doors off a cooperative threading scheme, synchronization overhead or not)

Of course, for CPU-intensive applications that spread their computation over multiple threads on a multi-processor machine, you'll certainly need preemptive multithreading. When exactly is the last time you wrote such an application in perl? :-)

Seriously though, I think in the common case cooperative threading is likely to be superior.. it has low overhead, it should have faster context switch time, you have no synchronization issues, and you can mostly avoid the need for explicit yielding: in many applications threads will regularly block on something anyway (which will yield to another thread)

But anyway, this is getting off-topic.. I'll save it for later. Regex first

> No. Honestly I still don't see the *point*, certainly not in regards to regular expressions and rules. The hypothetical issues need dealing with in general for threads, coroutines, and continuations, but I don't see how any of this brings anything to rules for the parsing engine. The flow control semantics the regex/parser needs to deal with are small and simple. I just don't see the point of trying to make it
Re: Rules and hypotheticals: continuations versus callbacks
[EMAIL PROTECTED] (Matthijs Van Duin) writes: OK, I suppose that works although that still means you're moving the complexity from the perl implementation to its usage: in this case, the perl 6 parser which is written in perl 6 No, I don't believe that's what's happening. My concern is that at some point, there *will* need to be a bootstrapped parser which is written in some low level language, outputting Parrot bytecode, and it *will* need to be able to reconfigure itself mid-match. I think. I can't remember why I'm so convinced of this, and I'm too tired to think it through with examples right now, and I might be wrong anyway, but at least I can be ready with a solution if it proves necessary. :) -- There is no safe investment. To love at all is to be vulnerable. ... The only place outside Heaven where you can be perfectly safe from all the dangers and pertubations of love is Hell. -CS Lewis The Four Loves
Re: Rules and hypotheticals: continuations versus callbacks
I would like to express my sincere gratitude to all of you for working through these issues. I bent my brain on the Perl 5 regex engine, and that was just a simple recurse-on-success engine--and I'm not the only person it drove mad. I deeply appreciate that Perl 6's regex engine may drive you even madder. But such sacrifices are at the heart of why people love Perl. Thanks! Larry
Re: Rules and hypotheticals: continuations versus callbacks
On Tue, 18 Mar 2003, Matthijs van Duin wrote:
> and maybe also: What is the current plan? although I got the impression earlier that there isn't any yet for invoking subrules :-)

See line 1014, languages/perl6/P6C/rule.pm. The hack I used was to call rules like ordinary subs, and have them push marks onto the regex stack before they return. I'm not sure if this can be made to work with hypotheticals, and I'm sure it won't interact kindly with continuation-taking, but there's _something_.

As for the interaction with continuations, I was about to post some of my concerns when I received your long and well-thought-out mail. I need to think about the discussion so far a bit more, but briefly:

(1) There's more than one way to go when combining dynamically-scoped variables with continuations: for example, do you use dynamic bindings from where the continuation was taken, or from where it's invoked? (see e.g. Scheme's dynamic-wind).

(2) (internals) The functional-language people have found that full continuations are slow, and put a lot of effort into avoiding them where possible. Backtracking languages like Icon and Prolog are implemented by special mechanisms rather than general continuations, probably for this reason. So if we're forced to do a regex engine using full continuations, it will probably be dog-slow

(3) On the other hand, we probably want people to intermix regex backtracking, continuation-taking, and hypothetical/dynamic variables, and have it do the right thing, where "right" means something like "mind-bendingly difficult to reason about, but consistent". How do we want these features to play with each other?

(4) (internals) Given that Parrot has so many different control mechanisms (call/ret, exceptions, closures, continuations, ...), how do we maintain consistency? And how much of that is parrot's responsibility (versus the perl6 compiler's)?

/s
Rules and hypotheticals: continuations versus callbacks
A quick note before I begin: the stuff in this email isn't just an implementation detail - it impacts the language too, that's why I'm posting it here. Should I cross-post to perl6-internals? (I'm not really familiar with that list yet)

I've recently spent thought on backtracking into rules, and the exact nature of hypothetical variables etc. The two systems for backtracking into a subrule that are described below are the best I can think of right now, but maybe I'm completely missing something and something much simpler is possible - in which case I'd love to hear about it of course. :-)

My main questions are: Is there a simpler system I'm overlooking? Which of the two systems would you prefer if speed isn't the issue? Which system is likely to run faster on parrot? and maybe also: What is the current plan? although I got the impression earlier that there isn't any yet for invoking subrules :-)

Anyway, I will use the following grammar for examples:

    rule foo  { a }
    rule bar  { a+ }
    rule quux { ab }
    rule test { [ <foo> | <bar> ] <quux> }

Mechanism 1 -- Continuations

Continuations can be used to reset the state of the world to the previous backtracking point. Various ways can be imagined to pass around the continuation. I picked one that seems fairly clean to me and doesn't create any more continuations than strictly necessary.

One thing I'll need to explain is what 'let' means in this system: it makes sure the variable is restored if a continuation is invoked that was created before the variable was hypothesized. This generalization means you can hypothesize any variable you can temporize.

OK, let's look at how the rule 'test' could be implemented using continuations. Note that I'm not paying attention to optimization at this point.

    method test (Continuation ?backtrack is rw) {
        backtrack or mark(backtrack) or return;
        $_ = .new;
        if (mark(backtrack)) {
            let $.{foo} = .foo(backtrack);
        }
        elsif (mark(backtrack)) {
            let $.{bar} = .bar(backtrack);
        }
        else {
            backtrack;
        }
        let $.{quux} = .quux(backtrack);
        return $_;
    }

where mark is a utility sub defined as something like:

    sub mark (Continuation backtrack is rw) {
        my cc = callcc { $_ } or return 0;
        let backtrack = cc.assuming(undef);
        return 1;
    }

Let's see how this would match on 'aaab':

0. A new state object is created (Ignore the first line, it's for later)
1. The first mark hypothesizes backtrack.
2. $.{foo} is hypothesized. foo matches 'a', and since it doesn't do any backtracking, it leaves backtrack alone.
3. $.{quux} is hypothesized. quux fails, it calls backtrack:
   a. $.{quux} is de-hypothesized.
   b. $.{foo} is de-hypothesized.
   c. backtrack is de-hypothesized.
   d. inside the first mark undef is returned from callcc. mark returns false.
4. The second mark hypothesizes backtrack.
5. $.{bar} is hypothesized. bar matches 'aaa' and hypothesizes backtrack
6. $.{quux} is hypothesized. quux fails, it calls backtrack:
   a. $.{quux} is de-hypothesized
   b. inside bar, it backtracks to match 'aa'. bar returns again
7. $.{quux} is hypothesized. quux matches, leaves backtrack alone.

Note that the backtrack remains hypothesized after test completes. Let's say test is followed by <fail> and see what happens:

8. fail calls backtrack
   a. $.{quux} is de-hypothesized
   b. inside bar, it backtracks to match 'a'. bar returns again
9. $.{quux} is hypothesized. quux fails, it calls backtrack:
   a. $.{quux} is de-hypothesized
   b. inside bar, backtrack is de-hypothesized. bar calls backtrack
   c. $.{bar} is de-hypothesized
   d. backtrack is de-hypothesized
   e. inside the second mark undef is returned from callcc. mark returns false.
10. backtrack is called, causing backtracking into whatever was before test.

This only leaves the issue of how to deal with the top-level, where the continuation is omitted. The magical first line will in that case create a continuation which will simply return from the rule if the match fails. If the top-level match succeeds then the backtrack variable disappears into thin air, and with it all backtracking information (the continuations and de-hypotheticalization info).

Note that the user can of course choose to retain the backtracking info, and use it later to cause backtracking into the match after it has completed:

    if Grammar.rule($string, my backtrack) {
        ...
        if i_dont_like_this_match {
            backtrack;   # try another
        }
    }

Mechanism 2 -- Callbacks

The second mechanism is a bit more mundane. The idea is that every rule will get passed a closure that's called to match whatever comes next. If the "whatever comes next" fails, the rule can backtrack and call the closure again.

This time 'let' is exactly the same as 'temp' except hypothetical variables are only allowed inside the dynamic scope of a subroutine with a special trait
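(Mechanism 2 above -- pass each rule a closure that matches "whatever comes next", and let the rule retry that closure when it needs to backtrack -- can be sketched in a few lines of Python. Everything here is invented purely for illustration, reusing the example grammar from the post; it is not a proposal for the real engine.)

    # Each rule gets (s, pos, rest), where `rest` is a closure that matches
    # "whatever comes next" from the position the rule hands it.  A rule
    # backtracks by calling `rest` again with a different position.

    def lit(text):
        def rule(s, pos, rest):
            if s.startswith(text, pos):
                return rest(pos + len(text))
            return False
        return rule

    def foo(s, pos, rest):                  # rule foo { a }
        return lit("a")(s, pos, rest)

    def bar(s, pos, rest):                  # rule bar { a+ } -- greedy, then gives back
        end = pos
        while end < len(s) and s[end] == "a":
            end += 1
        for stop in range(end, pos, -1):    # on failure, retry `rest` a bit shorter
            if rest(stop):
                return True
        return False

    def quux(s, pos, rest):                 # rule quux { ab }
        return lit("ab")(s, pos, rest)

    def test(s, pos, rest):                 # rule test { [ <foo> | <bar> ] <quux> }
        after_alt = lambda p: quux(s, p, rest)
        return foo(s, pos, after_alt) or bar(s, pos, after_alt)

    done = lambda p: True                   # top level: any way to finish will do
    print(test("aaab", 0, done))            # True: <bar> gives back one 'a', <quux> takes 'ab'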
Re: Rules and hypotheticals: continuations versus callbacks
> My main questions are: Is there a simpler system I'm overlooking? Which of the two systems would you prefer if speed isn't the issue?

Mechanism 1.

> Which system is likely to run faster on parrot?

They're both likely to be very slow.

> and maybe also: What is the current plan? although I got the impression earlier that there isn't any yet for invoking subrules :-)

Sure there is. It's boiling away in my brain and in my local parrot copy. (You haven't seen any commits because I'm overhauling it and it doesn't... well... work yet). My plan is to allow whatever we find is best, and swap them around and benchmark them separately. But the two engines I have in mind include one very similar to your two examples, and one more classical approach.

Plan 1: Pass each rule a I<success> continuation (rather than a backtrack one), and have it just return on failure. The big difference between this and your example is that C<let>s are now implemented just like C<temp>s. Too bad C<let> needs non-regex behavior, too.

Plan 2: Call subrules as plain old subs and have them throw a backtrack exception on failure (or just return a failure-reporting value... same difference, more or less). This has the advantage that C<let> behaves consistently with the rest of Perl. It has the disadvantage that we have to manually implement backtracking through individual rules. It has the advantage that it's easier to optimize.

I looked around in Parrot a little, and it seems like continuations are done pretty efficiently. So, I can't really say which of these would be faster, but I'd guess the latter. I'll be writing them both, though, so we'll see :)

Luke
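(Plan 2 above -- subrules as plain subs that throw a backtrack exception on failure -- looks roughly like the following Python sketch. The Backtrack class and the rule functions are invented for illustration; note how the caller has to re-drive alternatives by hand, which is exactly the "manually implement backtracking through individual rules" disadvantage mentioned.)

    class Backtrack(Exception):
        """Raised by a subrule to report failure to its caller."""

    def lit(text, s, pos):
        if s.startswith(text, pos):
            return pos + len(text)
        raise Backtrack()

    def foo(s, pos):             # rule foo { a }
        return lit("a", s, pos)

    def bar(s, pos):             # rule bar { a+ } -- greedy; re-trying shorter
        end = pos                # matches would be the caller's problem (the
        while end < len(s) and s[end] == "a":   # "manual backtracking" cost)
            end += 1
        if end == pos:
            raise Backtrack()
        return end

    def quux(s, pos):            # rule quux { ab }
        return lit("ab", s, pos)

    def test(s, pos):            # rule test { [ <foo> | <bar> ] <quux> }
        for alt in (foo, bar):   # try each alternative in turn
            try:
                return quux(s, alt(s, pos))
            except Backtrack:
                continue         # that alternative (or what followed it) failed
        raise Backtrack()

    print(test("aab", 0))        # 3: <foo> eats 'a', <quux> eats 'ab'

    try:                         # "aaab" needs <bar> to give back one 'a';
        test("aaab", 0)          # this naive scheme can't, so the match fails
    except Backtrack:
        print("no match")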