Re: Rules and hypotheticals: continuations versus callbacks

2003-03-20 Thread Dan Sugalski
At 12:00 AM +0000 3/20/03, Simon Cozens wrote:
>[EMAIL PROTECTED] (Matthijs Van Duin) writes:
>> OK, I suppose that works although that still means you're moving the
>> complexity from the perl implementation to its usage: in this case,
>> the perl 6 parser which is written in perl 6
>
>No, I don't believe that's what's happening. My concern is that at some
>point, there *will* need to be a bootstrapped parser which is written in
>some low level language, outputting Parrot bytecode, and it *will* need
>to be able to reconfigure itself mid-match.
>
>I think. I can't remember why I'm so convinced of this, and I'm too tired
>to think it through with examples right now, and I might be wrong anyway,
>but at least I can be ready with a solution if it proves necessary. :)
You may well be right -- I don't think so, but I'm not at my clearest
either. I don't see that it'll be needed outside the initial
bootstrap parser, if at all, so I'm not too worried. (And the
low-level language for it will probably be perl 5, since I'd far
rather build something with a Parse::RecDescent grammar than a
hand-nibbler in C)
--
Dan

--it's like this---
Dan Sugalski  even samurai
[EMAIL PROTECTED] have teddy bears and even
  teddy bears get drunk


Re: Rules and hypotheticals: continuations versus callbacks

2003-03-20 Thread Austin Hastings

--- Matthijs van Duin [EMAIL PROTECTED] wrote:
> On Wed, Mar 19, 2003 at 03:46:50PM -0500, Dan Sugalski wrote:
>>> They should be though, if a variable was hypothesized when the
>>> continuation was taken, then it should be hypothesized when that
>>> continuation is invoked.
>> Should they? Does hypotheticalization count as data modification (in
>> which case it shouldn't) or control modification (in which case it
>> should),
> Isn't that the whole point of hypotheses in perl 6?  You talk about
> successful and unsuccessful de-hypothesizing, and about abnormal
> exits etc..  you seem to have a much more complex model of hypotheses
> than what's in my head.

The complex model is right -- in other words, if hypotheses are to be a
first-class part of the language then they must interoperate with all
the other language features.

So there are several ways out of a sub. What does a hypo do in each of
these cases?

let $x   - Declares a hypothesis. I think this should be a verb.
           (Which is to say, a function instead of a storage
           class.)
           (This in turn suggests that primitive types can't be
           hypothesized, although arrays thereof could be.)

fail     - Pretty strongly suggests that the hypo is ignored.
           Also suggests continuation/backtracking behavior. Do we
           really know what fail does? See discard, below.

discard $x - (suggested) Obvious keyword for failing one single
           hypothesis.

keep $x  - Suggests the value is made permanent. This should be
           a verb. C<keep> with no args should just keep every
           hypo in the current I<fill-in-the-blank> SCOPE.

return   - This is unclear. On the one hand, it's a transfer of
           control, and I think that C<let> and C<keep> are data,
           not control, modifiers. On the other hand, it's one of
           the normal ways to leave a block, and could be argued
           either way.

           (Also: remember the bad old days of C<my> vs. C<local>.
           I propose that hypos fail by default - to keep the
           distinction clear in the minds of newbies.)

throw    - Exceptions unwind the call stack. If let is a
           control action, it should undo. If let is a data
           action, it should not. To me, C<let> is an explicit
           action taken against a variable, and should not be
           undone by this. (Of course, if unwinding the call stack
           causes the variable to go out of scope, it's not an
           issue.)

continuation: goto
         - Again: continuations are transfers of control, not
           data. If let is a control action, continuations
           will have to know if they are transferring back up
           the stack or if they are transferring to some new,
           never-before-seen (on the stack) place. I think that
           C<let> should be a data action, in which case this
           doesn't affect the hypothesis.

generators: yield
         - Same issues, although the argument can be made that
           since you can resume a generator, the hypo should
           be confined to the extant-but-inactive scope.

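As a rough illustration of the "data action" reading argued above -- a
minimal sketch in Python, with invented names, in which let/keep/discard/
fail act on storage and are untouched by transfers of control:

    class HypoScope:
        """Tracks hypothesized bindings within one scope."""
        def __init__(self, store):
            self.store = store    # the variable store (a plain dict here)
            self.saved = {}       # name -> original value while hypothesized

        def let(self, name, value):
            """Hypothesize: bind a new value, remembering the old one."""
            if name not in self.saved:
                self.saved[name] = self.store.get(name)
            self.store[name] = value

        def keep(self, name=None):
            """Make hypotheses permanent (all of them if no name is given)."""
            for n in ([name] if name is not None else list(self.saved)):
                self.saved.pop(n, None)

        def discard(self, name):
            """Fail one single hypothesis: restore the original value."""
            self.store[name] = self.saved.pop(name)

        def fail(self):
            """Fail every hypothesis in this scope."""
            for n in list(self.saved):
                self.discard(n)

    v = {'x': 1}
    s = HypoScope(v)
    s.let('x', 99)        # hypothesized
    s.fail()              # backtrack: all hypotheses discarded
    assert v['x'] == 1    # original value restored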

> To me, what 'let' does is temporize a variable except it doesn't get
> restored if you leave the scope, but only if you use a continuation
> to go back to a point where it wasn't hypothesized yet.

Yes and no.

I agree it shouldn't get restored, see above. However, continuations
don't touch data. So a global variable that has been hypo'ed should
(IMO) remain so after the continuation. 

Frankly, if you mix the two, it's YOUR job to understand the
ramifications.

> When the last continuation taken *before* the hypothesis is gone, so
> is the old version and thus the hypothesized variable becomes
> permanent.

I disagree. Proposal and acceptance should be explicit actions.

> The behavior regarding coroutines followed naturally from this, and
> so does the behavior inside regexen if they use the continuation
> semantics for backtracking -- which is what I'm suggesting.
>
> This leaves only behavior regarding preemptive threads, which is
> actually very easy to solve:  disallow hypothesizing shared
> variables -- it simply makes no sense to do that.  Now that
> I think of it, temporizing shared variables is equally bad news,
> so this isn't something new.

Why?

If you constrain hypotheses to the thread (making it a control action
instead of a data action) this could be a way to get cheap MUXing.
Hypothesize all the new values you wish, then pay once to get a mux,
then keep all the data values while you've got the mux. Shrinks your
critical region:

{: is synchronized($mux)
  keep all;
}
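A rough Python sketch of that idea (invented names; the compute_* callables
stand in for whatever work builds up the hypothesis):

    import threading

    shared = {'a': 0, 'b': 0}
    mux = threading.Lock()

    def update(compute_a, compute_b):
        # Hypothesize lock-free: build all the new values on the side.
        hypo = {'a': compute_a(), 'b': compute_b()}
        # Pay once for the mux; the critical region is just the "keep".
        with mux:
            shared.update(hypo)

    update(lambda: 1, lambda: 2)
    assert shared == {'a': 1, 'b': 2}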

OTOH, if you are really using threads well, then your app may construct
a hypothesis based on user input, and the math and visualizer and gui
threads may all need to work in that hypothetical space.

>> (Which makes continuations

Re: Rules and hypotheticals: continuations versus callbacks

2003-03-20 Thread Matthijs van Duin
On Thu, Mar 20, 2003 at 08:49:28AM -0800, Austin Hastings wrote:
> --- Matthijs van Duin [EMAIL PROTECTED] wrote:
>> you seem to have a much more complex model of hypotheses
>> than what's in my head.
> The complex model is right -- in other words, if hypotheses are to be a
> first-class part of the language then they must interoperate with all
> the other language features.
> (lots of explanation here)

You're simply expanding on the details of your complex model - not on the
need for it in the first place.

I'll see if I can write some details/examples of my model later, and show 
how it interacts with various language features in a simple way.


>> This leaves only behavior regarding preemptive threads, which is
>> actually very easy to solve:  disallow hypothesizing shared
>> variables -- it simply makes no sense to do that.  Now that
>> I think of it, temporizing shared variables is equally bad news,
>> so this isn't something new.

I just realized there's another simple alternative: make it cause the
variable to become thread-local for that particular thread.


> Hypothesize all the new values you wish, then pay once to get a mux,
> then keep all the data values while you've got the mux. Shrinks your
> critical region

You're introducing entirely new semantics here, and personally I think
you're abusing hypotheses, although I admit in an interesting and
potentially useful way. I'll spend some thought on that.


> My experience has been that when anyone says "I don't see why anyone
> would ...", Damian immediately posts an example of why.

No problem, since it works fine in my model (I had already mentioned that
earlier) - I just said *I* don't see why anyone would.. :-)


> So, stop talking about rexen. When everyone groks how continuations
> should work, it'll fall out.

rexen were the main issue: Dan was worried about performance

> (And if you reimplement the rexengine using continuations and outperform
> Dan's version by 5x or better, then we'll have another Geek Cruise to
> Glacier Bay and strand Dan on an iceberg. :-)

I don't intend to outperform him.. I intend to get the same performance
with cleaner, simpler and more generic semantics.

But as I said in my previous post.. give me some time to work out the 
details.. maybe I'll run into fatal problems making the whole issue moot :)

BTW, you say "reimplement"?  Last time I checked, hypothetical variables
weren't implemented yet, let alone properly interacting with continuations.
Maybe it's just sitting in someone's local version, but until I have
something to examine, I can't really compare its performance to my system.

--
Matthijs van Duin  --  May the Forth be with you!


Re: Rules and hypotheticals: continuations versus callbacks

2003-03-19 Thread Matthijs van Duin
On Tue, Mar 18, 2003 at 09:28:43PM -0700, Luke Palmer wrote:
> Plan 1:  Pass each rule a I<success> continuation (rather than a
> backtrack one), and have it just return on failure.  The big
> difference between this and your example is that C<let>s are now
> implemented just like C<temp>s.  Too bad C<let> needs non-regex
> behavior, too.

That's mechanism #2, not #1

You probably don't mean the word "continuation" here though, since
a continuation never returns, so once a rule would invoke the success
continuation it can't regain control except via another continuation

You probably simply mean a closure


> Plan 2:  Call subrules as plain old subs and have them throw a
> backtrack exception on failure (or just return a failure-reporting
> value... same difference, more or less).

But.. say you have:

<foo> <bar>

How would this be implemented?  When bar fails, it needs to backtrack
into foo, which has already returned.  Are you saying every rule will be
an explicit state machine?

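For comparison, here's a toy Python sketch of the success-callback scheme
(invented names; plain closures standing in for "continuations", per the
point above): when <bar> fails, its caller just returns False into <foo>,
which is still live on the call stack and can try its next alternative --
no explicit state machine needed:

    def foo(s, pos, succeed):          # rule foo { a+ }
        n = pos
        while n < len(s) and s[n] == 'a':
            n += 1
        while n > pos:                 # longest match first, then give back
            if succeed(n):
                return True
            n -= 1
        return False                   # no alternatives left: fail upward

    def bar(s, pos, succeed):          # rule bar { ab }
        if s.startswith('ab', pos):
            return succeed(pos + 2)
        return False

    def test(s, pos, succeed):         # rule test { <foo> <bar> }
        return foo(s, pos, lambda p: bar(s, p, succeed))

    assert test('aaab', 0, lambda p: p == 4)      # matches "aa" then "ab"
    assert not test('aaac', 0, lambda p: True)    # <bar> never matches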

> This has the advantage that C<let> behaves consistently with the
> rest of Perl

What do you mean?


> I looked around in Parrot a little, and it seems like continuations
> are done pretty efficiently.

Yes, I noticed that too

--
Matthijs van Duin  --  May the Forth be with you!


Re: Rules and hypotheticals: continuations versus callbacks

2003-03-19 Thread Leopold Toetsch
Matthijs van Duin wrote:

> Which system is likely to run faster on parrot?

I would propose: estimate the ops you need and test it :)

E.g. call a continuation 1e6 times and communicate state with one global
(a lexical is probably the same speed, i.e. a hash lookup)

$ cat a.pasm
new P5, .PerlInt
set P5, 100
store_global "foo", P5
new P0, .Continuation
set_addr I0, endcont
set P0, I0              # continuation P0 resumes at endcont:
endcont:
find_global P4, "foo"
unless P4, done
dec P4
# store_global "foo", P4 -- no need to store, P4 is a reflike thingy
invoke                  # invoke P0, looping back to endcont:
done:
print "done\n"
end
$ time imcc -P a.pasm
done
real    0m0.881s

$ imcc -p a.pasm
done
   OPERATION PROFILE

 CODE   OP FULL NAME            CALLS  TOTAL TIME    AVG TIME
 ----   ---------------------   -----  ----------    --------
    0   end                         1    0.01          0.01
   40   set_addr_i_ic               1    0.01          0.01
   66   set_p_i                     1    0.01          0.01
   67   set_p_ic                    1    0.04          0.04
  226   unless_p_ic               101    0.595025      0.01
  276   dec_p                     100    0.599946      0.01
  758   store_global_sc_p           1    0.06          0.06
  760   find_global_p_sc          101    1.037922      0.01
  786   new_p_ic                    2    0.11          0.05
  819   invoke                    100    0.914063      0.01
  883   print_sc                    1    0.005299      0.005299
 ----   ---------------------   -----  ----------    --------
   11                             410    3.152280      0.01
So you can estimate that the heavier opcodes take about 1us; the lighter
vtable functions are about double that speed, with the slow, profiling
core. CGP (or JIT) is 3-4 times faster.

-O3 compiled parrot, Athlon 800, i386/linux

leo



Re: Rules and hypotheticals: continuations versus callbacks

2003-03-19 Thread Matthijs van Duin
On Wed, Mar 19, 2003 at 10:38:54AM +0100, Leopold Toetsch wrote:
> I would propose: estimate the ops you need and test it :)

Hmm, good point

Or even better.. I should just implement both examples and benchmark them; 
they're simple enough and the ops are available.

I guess it's time to familiarize myself with pasm :)

--
Matthijs van Duin  --  May the Forth be with you!


Re: Rules and hypotheticals: continuations versus callbacks

2003-03-19 Thread Matthijs van Duin
On Wed, Mar 19, 2003 at 01:01:28PM +0100, Matthijs van Duin wrote:
> On Wed, Mar 19, 2003 at 10:38:54AM +0100, Leopold Toetsch wrote:
>> I would propose: estimate the ops you need and test it :)
> Hmm, good point
>
> Or even better.. I should just implement both examples and benchmark them;
> they're simple enough and the ops are available.

except I forgot entirely about let

however the implementation of let will have an impact on the performance of
both systems.. oh well, I'll just have to estimate like you said :-)

--
Matthijs van Duin  --  May the Forth be with you!


Re: Rules and hypotheticals: continuations versus callbacks

2003-03-19 Thread Matthijs van Duin
On Wed, Mar 19, 2003 at 10:38:54AM +0100, Leopold Toetsch wrote:
> I would propose: estimate the ops you need and test it :)

I haven't completed testing yet, however it's becoming clear to me that
this is likely to be a pointless effort

There are so many variables that can affect performance here that the 
results I may find in these tests are unlikely to have any relation to 
the performance of rules in practice.

1. making continuations affects the performance of *other* code (COW)
2. the let operation is missing and all attempts to fake it are silly
3. to really test it, I'd need to make subrules full subroutines, but then 
  the performance difference will probably disappear in the overhead of all 
  other stuff.  To test I'd need large, realistic patterns; and I'm 
  certainly not in the mood to write PIR for them manually.

And it appears that on my machine continuations and garbage collection have
a quarrel, which also makes testing problematic.

I guess the only way to find out is to implement both systems and compare
them using a large test set of realistic grammars.  Or of course just
implement it using continuations (system #1), since the speed difference
probably isn't gonna be huge anyway.

Here is my test program for continuations, and the results on my machine:

# "aaab" ~~ / ^ [ a | a* ] ab <fail> /

set I5, 1000
sweepoff        # or bus error
collectoff      # or segmentation fault

begin:
set S0, "aaab"
set I0, 0
new P0, .Continuation
set_addr I1, second
set P0, I1
rx_literal S0, I0, "a", backtrack
branch third
second:
new P0, .Continuation
set_addr I1, fail
set P0, I1
deeper:
rx_literal S0, I0, "a", third
save P0         # normally hypothesize
new P0, .Continuation
set_addr I1, unwind
set P0, I1
branch deeper
unwind:
dec I0          # normally de-hypothesize
restore P0      # normally de-hypothesize
third:
rx_literal S0, I0, "ab", backtrack
sub I0, 2       # normally de-hypothesize
backtrack:
invoke
fail:
dec I5
if I5, begin
end


   OPERATION PROFILE

 CODE   OP FULL NAME            CALLS  TOTAL TIME    AVG TIME
 ----   ---------------------   -----  ----------    --------
    0   end                         1    0.29          0.29
   40   set_addr_i_ic            5000    0.024928      0.000005
   46   set_i_ic                 1001    0.010573      0.000011
   60   set_s_sc                 1000    0.005717      0.000006
   66   set_p_i                  5000    0.016201      0.000003
  213   if_i_ic                  1000    0.002848      0.000003
  274   dec_i                    4000    0.011390      0.000003
  370   sub_i_ic                 1000    0.004227      0.000004
  675   save_p                   3000    0.192309      0.000064
  682   restore_p                3000    0.246457      0.000082
  719   branch_ic                4000    0.012216      0.000003
  770   sweepoff                    1    0.14          0.14
  772   collectoff                  1    0.03          0.03
  786   new_p_ic                 5000    0.179403      0.000036
  819   invoke                   5000    0.026285      0.000005
  962   rx_literal_s_i_sc_ic    10000    0.054260      0.000005
 ----   ---------------------   -----  ----------    --------
   16                           48004    0.786861      0.000016

iBook; PPC G3; 700 MHz

--
Matthijs van Duin  --  May the Forth be with you!


Re: Rules and hypotheticals: continuations versus callbacks

2003-03-19 Thread Dan Sugalski
At 10:05 AM +0100 3/19/03, Matthijs van Duin wrote:
> But.. say you have:
>
> <foo> <bar>
>
> How would this be implemented?  When bar fails, it needs to
> backtrack into foo, which has already returned.  Are you saying
> every rule will be an explicit state machine?
By compile-time interpolation. <foo> isn't so much a subroutine as a
macro. For this to work, if we had:

  foo: \w+?
  bar: [plugh]{2,5}

then what the regex engine *really* got to compile would be:

   (\w+?) ([plugh]{2,5})

with names attached to the two paren groups. Treating them as actual 
subroutines leads to madness, continuations don't quite work, and 
coroutines could pull it off if we could pass data back into a 
coroutine on reinvocation, but...

We do, after all, want this fast, right?
--
Dan
--it's like this---
Dan Sugalski  even samurai
[EMAIL PROTECTED] have teddy bears and even
  teddy bears get drunk


Re: Rules and hypotheticals: continuations versus callbacks

2003-03-19 Thread Matthijs van Duin
On Wed, Mar 19, 2003 at 10:40:02AM -0500, Dan Sugalski wrote:
> By compile-time interpolation. <foo> isn't so much a subroutine as a
> macro. For this to work, if we had:
>
>   foo: \w+?
>   bar: [plugh]{2,5}
>
> then what the regex engine *really* got to compile would be:
>
>    (\w+?) ([plugh]{2,5})
>
> with names attached to the two paren groups. Treating them as actual
> subroutines leads to madness,

Ehm, Foo.test cannot inline Foo.foo since it may be overridden:

grammar Foo {
    rule foo { \w+? }
    rule bar { [plugh]{2,5} }
    rule test { <foo> <bar> }
}

grammar Bar is Foo {
    rule foo { <alpha>+? }
}

What you say is only allowed if I put "is inline" on foo.



> continuations don't quite work

Care to elaborate on that?  I'd say they work fine

> We do, after all, want this fast, right?

Of course, and we should optimize as much as we can - but not optimize
*more* than we can.  Rules need generic backtracking semantics, and that's
what I'm talking about.  Optimizations to avoid the genericity of these
backtracking semantics are for later.

--
Matthijs van Duin  --  May the Forth be with you!


Re: Rules and hypotheticals: continuations versus callbacks

2003-03-19 Thread Dan Sugalski
At 4:52 PM +0100 3/19/03, Matthijs van Duin wrote:
> On Wed, Mar 19, 2003 at 10:40:02AM -0500, Dan Sugalski wrote:
>> By compile-time interpolation. <foo> isn't so much a subroutine as
>> a macro. For this to work, if we had:
>>
>>   foo: \w+?
>>   bar: [plugh]{2,5}
>>
>> then what the regex engine *really* got to compile would be:
>>
>>    (\w+?) ([plugh]{2,5})
>>
>> with names attached to the two paren groups. Treating them as
>> actual subroutines leads to madness,
>
> Ehm, Foo.test cannot inline Foo.foo since it may be overridden:
>
> grammar Foo {
>     rule foo { \w+? }
>     rule bar { [plugh]{2,5} }
>     rule test { <foo> <bar> }
> }
>
> grammar Bar is Foo {
>     rule foo { <alpha>+? }
> }
>
> What you say is only allowed if I put "is inline" on foo.
At the time I run the regex, I can inline things. There's nothing 
that prevents it. Yes, at compile time it's potentially an issue, 
since things can be overridden later, but that's going to be 
relatively rare, and can be dealt with by selective recompilation.

By the time the regex is actually executed, it's fully specified. By 
definition if nothing else--you aren't allowed to selectively 
redefine rules in the middle of a regex that uses those rules. Or, 
rather, you can but the update won't take effect until after the end 
of the regex, the same way that you can't redefine a sub you're in 
the middle of executing. (And yes, I'm aware that if you do that 
you'll pick up the new version if you recursively call, but that 
won't work with regexes)

>> continuations don't quite work
> Care to elaborate on that?  I'd say they work fine
There's issues with hypothetical variables and continuations. (And 
with coroutines as well) While this is a general issue, they come up 
most with regexes.

>> We do, after all, want this fast, right?
> Of course, and we should optimize as much as we can - but not
> optimize *more* than we can.  Rules need generic backtracking
> semantics, and that's what I'm talking about.
No. No, in fact they don't. Rules need very specific backtracking 
semantics, since rules are fairly specific. We're talking about 
backtracking in regular expressions, which is a fairly specific 
generality. If you want to talk about a more general backtracking 
that's fine, but it won't apply to how regexes backtrack.
--
Dan

--it's like this---
Dan Sugalski  even samurai
[EMAIL PROTECTED] have teddy bears and even
  teddy bears get drunk


Re: Rules and hypotheticals: continuations versus callbacks

2003-03-19 Thread Matthijs van Duin
On Wed, Mar 19, 2003 at 11:09:01AM -0500, Dan Sugalski wrote:
> At the time I run the regex, I can inline things. There's nothing
> that prevents it. Yes, at compile time it's potentially an issue,
> since things can be overridden later,

OK, but that's not how you initially presented it :-)


> you aren't allowed to selectively redefine rules in the middle of a regex
> that uses those rules. Or, rather, you can but the update won't take
> effect until after the end

I don't recall having seen such a restriction mentioned in Apoc 5.

While I'm a big fan of optimization, especially for something like this,
I think we should be careful with introducing mandatory restrictions just
to aid optimization.  ("is inline" will allow such optimizations, of course)


> There's issues with hypothetical variables and continuations. (And
> with coroutines as well) While this is a general issue, they come up
> most with regexes.

I'm still curious what you're referring to exactly.  I've outlined possible
semantics for hypothetical variables in earlier posts that should work.


>>> We do, after all, want this fast, right?
>> Of course, and we should optimize as much as we can - but not
>> optimize *more* than we can.  Rules need generic backtracking
>> semantics, and that's what I'm talking about.
> No. No, in fact they don't. Rules need very specific backtracking
> semantics, since rules are fairly specific. We're talking about
> backtracking in regular expressions, which is a fairly specific
> generality. If you want to talk about a more general backtracking
> that's fine, but it won't apply to how regexes backtrack.
My impression from A5 and A6 is that rules are methods.  They're looked up 
like methods, they can be invoked like methods, etc.

I certainly want to be able to write rules myself, manually, when I think 
it's appropriate; and use these as subrules in other methods.  Generic 
backtracking semantics are needed for that, and should at least conceptually 
also apply to normal rules.

When common sub-patterns are inlined, simple regexen will not use runtime 
subrules at all, so the issue doesn't exist there - that covers everything 
you would do with regexen in perl 5 for example.

When you do use real sub-rules, you're getting into the domain previously
held by Parse::RecDescent and the like.  While these should of course still
be as fast as possible, a tiny bit of overhead on top of regular regexes is
understandable.

However, such overhead might not even be needed at all:  whenever possible
optimizations should be applied, and rules are free to use special hacky
but fast calling semantics to subrules if they determine that's possible.
But I don't think a special optimization should be elevated to the official
semantics.  I say, make generic semantics first, and then optimize the heck
out of it.

--
Matthijs van Duin  --  May the Forth be with you!


Re: Rules and hypotheticals: continuations versus callbacks

2003-03-19 Thread Leopold Toetsch
Matthijs van Duin wrote:

> sweepoff        # or bus error
> collectoff      # or segmentation fault

Please try:

/* set this to 1 for tracing the system stack and processor registers */
#define TRACE_SYSTEM_AREAS 1

in dod.c (works for me).

Though I don't know if processor registers on PPC get traced by this
(it might not stand optimization if not).

The code is not yet that deeply looked at that we can safely turn off
stack tracing.

leo



Re: Rules and hypotheticals: continuations versus callbacks

2003-03-19 Thread Jonathan Scott Duff
On Wed, Mar 19, 2003 at 11:09:01AM -0500, Dan Sugalski wrote:
> By the time the regex is actually executed, it's fully specified. By
> definition if nothing else--you aren't allowed to selectively
> redefine rules in the middle of a regex that uses those rules. Or,
> rather, you can but the update won't take effect until after the end
> of the regex, the same way that you can't redefine a sub you're in
> the middle of executing. (And yes, I'm aware that if you do that
> you'll pick up the new version if you recursively call, but that
> won't work with regexes)

Are you implying that

$fred = rx/fred/;
$string ~~ m:w/ <$fred> { $fred = rx/barney/; } rubble /

won't match "barney rubble"?

-Scott
-- 
Jonathan Scott Duff
[EMAIL PROTECTED]


Re: Rules and hypotheticals: continuations versus callbacks

2003-03-19 Thread Sean O'Rourke
On Wed, 19 Mar 2003, Jonathan Scott Duff wrote:
> Are you implying that
>
>   $fred = rx/fred/;
>   $string ~~ m:w/ <$fred> { $fred = rx/barney/; } rubble /
>
> won't match "barney rubble"?

Or, worse, that

   $fred = rx/fred/;
   $string ~~ m:w/ { $fred = rx/barney/; } <$fred> rubble /

won't, either?

/s



Re: Rules and hypotheticals: continuations versus callbacks

2003-03-19 Thread Dan Sugalski
At 10:41 AM -0600 3/19/03, Jonathan Scott Duff wrote:
> On Wed, Mar 19, 2003 at 11:09:01AM -0500, Dan Sugalski wrote:
>> By the time the regex is actually executed, it's fully specified. By
>> definition if nothing else--you aren't allowed to selectively
>> redefine rules in the middle of a regex that uses those rules. Or,
>> rather, you can but the update won't take effect until after the end
>> of the regex, the same way that you can't redefine a sub you're in
>> the middle of executing. (And yes, I'm aware that if you do that
>> you'll pick up the new version if you recursively call, but that
>> won't work with regexes)
>
> Are you implying that
>
> $fred = rx/fred/;
> $string ~~ m:w/ <$fred> { $fred = rx/barney/; } rubble /
>
> won't match "barney rubble"?

Potentially, no. What, then, should happen if you do:

   $barney = rx/barney/;
   $string = "barney rubble";
   $string ~~ m:w/ <$barney> { $barney = rx/fred/; } rubble /;

The regex shouldn't match, since you've invalidated part of the match
in the middle.

I can potentially see constructs of the form <$var> being taken as
indirect rule invocations and their dispatch left to runtime,
complete with the potential for bizarre after-the-fact invalidations,
but as regex rules in the regex stream rather than as generic code.
--
Dan

--it's like this---
Dan Sugalski  even samurai
[EMAIL PROTECTED] have teddy bears and even
  teddy bears get drunk


Re: Rules and hypotheticals: continuations versus callbacks

2003-03-19 Thread Dan Sugalski
At 5:38 PM +0100 3/19/03, Matthijs van Duin wrote:
> On Wed, Mar 19, 2003 at 11:09:01AM -0500, Dan Sugalski wrote:
>> At the time I run the regex, I can inline things. There's nothing
>> that prevents it. Yes, at compile time it's potentially an issue,
>> since things can be overridden later,
>
> OK, but that's not how you initially presented it :-)
Then I wasn't clear enough, sorry. This is perl -- the state of
something at compile time is just a suggestion as to how things
ultimately work. The state at the time of execution is the only thing
that really matters, and I shortcut.

you aren't allowed to selectively redefine rules in the middle of a 
regex that uses those rules. Or, rather, you can but the update 
won't take effect until after the end
I don't recall having seen such a restriction mentioned in Apoc 5.
I'll nudge Larry to add it explicitly, but in general redefinitions of
code that you're in the middle of executing don't take effect
immediately, and it's not really any different for regex rules than
for subs.

> While I'm a big fan of optimization, especially for something like
> this, I think we should be careful with introducing mandatory
> restrictions just to aid optimization.  ("is inline" will allow such
> optimizations, of course)
Actually, we should be extraordinarily liberal with the application 
of restrictions at this phase. It's far easier to lift a restriction 
later than to impose it later, and I very much want to stomp out any 
constructs that will force slow code execution. Yes, I may lose, but 
if I don't try...

My job, after all, is to make it go fast. If you want something 
that'll require things to be slow then I don't want you to have it. :)

>> There's issues with hypothetical variables and continuations. (And
>> with coroutines as well) While this is a general issue, they come
>> up most with regexes.
>
> I'm still curious what you're referring to exactly.  I've outlined
> possible semantics for hypothetical variables in earlier posts that
> should work.
The issue of hypotheticals is complex.

>>>> We do, after all, want this fast, right?
>>> Of course, and we should optimize as much as we can - but not
>>> optimize *more* than we can.  Rules need generic backtracking
>>> semantics, and that's what I'm talking about.
>> No. No, in fact they don't. Rules need very specific backtracking
>> semantics, since rules are fairly specific. We're talking about
>> backtracking in regular expressions, which is a fairly specific
>> generality. If you want to talk about a more general backtracking
>> that's fine, but it won't apply to how regexes backtrack.
>
> My impression from A5 and A6 is that rules are methods.  They're
> looked up like methods, they can be invoked like methods, etc.
They aren't methods, though. They're not code in general, they're 
regex constructions in specific. Because they live in the symbol 
table and in some cases can be invoked as subs/methods doesn't make 
them subs or methods, it makes them regex constructs with funky 
wrappers if you want to use them in a non-regex manner.

I certainly want to be able to write rules myself, manually, when I 
think it's appropriate; and use these as subrules in other methods. 
Generic backtracking semantics are needed for that, and should at 
least conceptually also apply to normal rules.
No, no it shouldn't. Rules are rules for regexes, they are *not* subs.
If you want generic backtracking to work, then there can't be any
difference between:

  rule foo { \w+ }

and

  sub foo { ... }
but there must be. With rules as regex constructs the semantics are 
much simpler. If we allow rules to be arbitrary code not only do we 
have to expose a fair amount of the internals of the regex engine to 
the sub so it can actually work on the stream and note its position 
(which is fine, I can do that) we also need to be able to pause foo 
in the middle and jump back in while passing in parameters of some 
sort. Neither continuations nor standard coroutines are sufficient in 
this instance, since the reinvocation must *both* preserve the state 
of the code at the time it exited but also pass in an indication as 
to what the sub should do. For example, if the foo sub was treated as 
a rule and we backtrack, should it slurp more or less?

If rules are just plain regex rules and not potentially arbitrary 
code, the required semantics are much simpler.

Then there's the issue of being able to return continuations from 
within arbitrary unnamed blocks, since the block in this:

   $foo ~~ m:w/ <alpha> {...} <number> /;

should be able to participate in the backtracking activities if we're 
not drawing a distinction between rules and generic code. (Yeah, the 
syntax is wrong, but you get the point)

Ultimately the question is "How do you backtrack into arbitrary code,
and how do we know that the arbitrary code can be backtracked into?"
My answer is "we don't", but I'm not sure how popular that particular
answer is.

> When common sub-patterns are inlined, simple regexen will not use
> runtime subrules at all, so the

Re: Rules and hypotheticals: continuations versus callbacks

2003-03-19 Thread Simon Cozens
[EMAIL PROTECTED] (Dan Sugalski) writes:
> you aren't allowed to selectively redefine
> rules in the middle of a regex that uses those rules.

This is precisely what a macro does.

-- 
How should I know if it works?  That's what beta testers are for.  I only
coded it.
(Attributed to Linus Torvalds, somewhere in a posting)


Re: Rules and hypotheticals: continuations versus callbacks

2003-03-19 Thread Dan Sugalski
At 5:47 PM +0000 3/19/03, Simon Cozens wrote:
> [EMAIL PROTECTED] (Dan Sugalski) writes:
>> you aren't allowed to selectively redefine
>> rules in the middle of a regex that uses those rules.
>
> This is precisely what a macro does.
Not once execution starts, no.
--
Dan
--it's like this---
Dan Sugalski  even samurai
[EMAIL PROTECTED] have teddy bears and even
  teddy bears get drunk


Re: Rules and hypotheticals: continuations versus callbacks

2003-03-19 Thread Simon Cozens
[EMAIL PROTECTED] (Dan Sugalski) writes:
> At 5:47 PM +0000 3/19/03, Simon Cozens wrote:
>> [EMAIL PROTECTED] (Dan Sugalski) writes:
>>> you aren't allowed to selectively redefine
>>> rules in the middle of a regex that uses those rules.
>>
>> This is precisely what a macro does.
>
> Not once execution starts, no.

Compilation's just execution of a regex, albeit the Perl6::Grammar::program
regex, and that regex will need to be modified while it's in operation in
order to pick up "macro ... is parsed" definitions and apply them to the
rest of what it's parsing.

-- 
* DrForr digs around for a fresh IV drip bag and proceeds to hook up.
dngor Coffee port.
DrForr Firewalled, like everything else around here.


Re: Rules and hypotheticals: continuations versus callbacks

2003-03-19 Thread Dan Sugalski
At 5:54 PM +0000 3/19/03, Simon Cozens wrote:
> [EMAIL PROTECTED] (Dan Sugalski) writes:
>> At 5:47 PM +0000 3/19/03, Simon Cozens wrote:
>>> [EMAIL PROTECTED] (Dan Sugalski) writes:
>>>> you aren't allowed to selectively redefine
>>>> rules in the middle of a regex that uses those rules.
>>>
>>> This is precisely what a macro does.
>> Not once execution starts, no.
>
> Compilation's just execution of a regex, albeit the Perl6::Grammar::program
> regex, and that regex will need to be modified while it's in operation in
> order to pick up "macro ... is parsed" definitions and apply them to the
> rest of what it's parsing.
Ah, damn, I wasn't thinking far enough out. I'm not sure it'll work 
quite like that, with a single call to the regex engine that spits 
out everything in one go. More likely it'll be a set of iterative 
calls to the engine that terminate at natural sequence points, 
potentially with recursive calls into the parsing regex.

--
Dan
--it's like this---
Dan Sugalski  even samurai
[EMAIL PROTECTED] have teddy bears and even
  teddy bears get drunk


Re: Rules and hypotheticals: continuations versus callbacks

2003-03-19 Thread Simon Cozens
[EMAIL PROTECTED] (Dan Sugalski) writes:
>> Compilation's just execution of a regex, albeit the Perl6::Grammar::program
>> regex, and that regex will need to be modified while it's in operation in
>> order to pick up "macro ... is parsed" definitions and apply them to the
>> rest of what it's parsing.
>
> Ah, damn, I wasn't thinking far enough out.

This, you see, is precisely why some of us started work last year on a
regular expression engine which could handle having its expressions
rewritten during the match... ;)

-- 
Last week I forgot how to ride a bicycle.  -- Steven Wright


Re: Rules and hypotheticals: continuations versus callbacks

2003-03-19 Thread Matthijs van Duin
On Wed, Mar 19, 2003 at 12:35:19PM -0500, Dan Sugalski wrote:
Then I wasn't clear enough, sorry. This is perl -- the state of 
something at compile time is just a suggestion as to how things 
ultimately work.
Yes, hence my surprise about actually inlining stuff, luckily that was 
just a misunderstanding :-)


> I'll nudge Larry to add it explicitly, but in general redefinitions of
> code that you're in the middle of executing don't take effect
> immediately, and it's not really any different for regex rules than
> for subs.

Ah, but we're not redefining the sub that's running, but the subs it's
about to call.  That works for subs, and Simon Cozens already pointed out
we certainly also need it for rules :-)


> Actually, we should be extraordinarily liberal with the application
> of restrictions at this phase. It's far easier to lift a restriction
> later than to impose it later,

This is perl 6, we can add a new restriction next week

> and I very much want to stomp out any constructs that will force slow
> code execution. Yes, I may lose, but if I don't try...

You're absolutely right, and optimization is very important to me too.  But
you can't *only* look at the speed of constructs, or we'll be coding in C
or assembly :-)

We'll need to meet in the middle..


> The issue of hypotheticals is complex.

Well, I'm a big boy, I'm sure I can handle it.  Are you even talking about
semantics or implementation here?  Because I already gave my insights on
semantics, and I have 'em in my head for implementation too, but I should
probably take those to perl6-internals instead.

> Ultimately the question is "How do you backtrack into arbitrary code,
> and how do we know that the arbitrary code can be backtracked into?"
> My answer is "we don't", but I'm not sure how popular that particular
> answer is.
>
>> I say, make generic semantics first, and then optimize the heck out of it.
> That's fine. I disagree. :)
Now that Simon Cozens has established that sub-rules need to be looked up 
at runtime, I think we can both be happy:

As far as I can see, a rule will consist of two parts: the wrapper, which
handles stuff when the rule is invoked as a normal method, perhaps handles
modifiers, handles searches for unanchored matches, sets up the state,
etc.; and the actual body, which does a match at the current position.

Now, what you want is that subrule-invocation goes directly from body to 
body, skipping the overhead of method invocation to the wrapper.  I say, 
when you look up the method for a subrule, check if it is a regular rule 
and if so call its body directly, and otherwise use the generic mechanism.

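A tiny Python sketch of that split (invented names; a "body" here is just
a callable from (string, position) to a new position or None):

    class Rule:
        def __init__(self, body):
            self.body = body              # the match-at-this-position part

    def match(rule, s, pos=0):
        """Wrapper: handles unanchored searching, setup, etc., then the body."""
        for start in range(pos, len(s) + 1):
            end = rule.body(s, start)
            if end is not None:
                return (start, end)
        return None

    def call_subrule(callee, s, pos):
        """Subrule invocation: body-to-body when the callee is a plain rule,
        generic invocation otherwise."""
        if isinstance(callee, Rule):
            return callee.body(s, pos)    # fast path: skip the wrapper
        return callee(s, pos)             # generic: any user-written matcher

    digit = Rule(lambda s, p: p + 1 if p < len(s) and s[p].isdigit() else None)
    assert match(digit, "ab3") == (2, 3)
    assert call_subrule(digit, "3", 0) == 1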
I'll get my lovely generic semantics with the direct body-to-body calling
hidden away as an optimization detail, and I get the ability to write
rule-methods in perl code.

You still get your low-overhead body-body calls and therefore the speed 
you desire (hopefully).  Since you need to fetch the rule body anyway, 
there should be no extra overhead: where you'd normally throw an error 
(non-rule invoked as subrule) you'd switch to generic invocation instead.

Sounds like a good deal? :-)

--
Matthijs van Duin  --  May the Forth be with you!


Re: Rules and hypotheticals: continuations versus callbacks

2003-03-19 Thread Dan Sugalski
At 8:04 PM +0100 3/19/03, Matthijs van Duin wrote:
> On Wed, Mar 19, 2003 at 12:35:19PM -0500, Dan Sugalski wrote:
>> I'll nudge Larry to add it explicitly, but in general redefinitions
>> of code that you're in the middle of executing don't take effect
>> immediately, and it's not really any different for regex rules than
>> for subs.
>
> Ah, but we're not redefining the sub that's running, but the subs
> it's about to call.  That works for subs, and Simon Cozens already
> pointed out we certainly also need it for rules :-)
Well, I'm not 100% sure we need it for rules. Simon's point is 
well-taken, but on further reflection what we're doing is subclassing 
the existing grammar and reinvoking the regex engine on that 
subclassed grammar, rather than redefining the grammar actually in 
use. The former doesn't require runtime redefinitions, the latter 
does, and I think we're going to use the former scheme.

>> Actually, we should be extraordinarily liberal with the application
>> of restrictions at this phase. It's far easier to lift a
>> restriction later than to impose it later,
>
> This is perl 6, we can add a new restriction next week
We can't add them once we hit betas. I'd as soon add them now, rather 
than later.

>> and I very much want to stomp out any constructs that will force
>> slow code execution. Yes, I may lose, but if I don't try...
>
> You're absolutely right, and optimization is very important to me
> too.  But you can't *only* look at the speed of constructs, or we'll
> be coding in C or assembly :-)
>
> We'll need to meet in the middle..
Well, not to be too cranky (I'm somewhat ill at the moment, so I'll 
apologize in advance) but... no. No, we don't actually have to, 
though if we could that'd be nice.

>> The issue of hypotheticals is complex.
>
> Well, I'm a big boy, I'm sure I can handle it.  Are you even talking
> about semantics or implementation here?  Because I already gave my
> insights on semantics, and I have 'em in my head for implementation
> too, but I should probably take those to perl6-internals instead.
Semantics. Until Larry's nailed down what he wants, there are issues 
of reestablishing hypotheticals on continuation reinvocation, 
flushing those hypotheticals multiple times, what happens to 
hypotheticals when you invoke a continuation with hypotheticals in 
effect, what happens to hypotheticals inside of coroutines when you 
establish them then yield out, and when hypotheticals are visible to 
other threads.

I read through your proposal (I'm assuming it's the one that started 
this thread) and it's not sufficient unless I missed something, which 
I may have.

>> Ultimately the question is "How do you backtrack into arbitrary
>> code, and how do we know that the arbitrary code can be backtracked
>> into?" My answer is "we don't", but I'm not sure how popular that
>> particular answer is.
>
>>> I say, make generic semantics first, and then optimize the heck out of it.
>> That's fine. I disagree. :)
> Now that Simon Cozens has established that sub-rules need to be
> looked up at runtime,

Well

> Sounds like a good deal? :-)
At the moment, no. It seems like a potentially large amount of 
overhead for no particular purpose, really. I don't see any win in 
the regex case, and you're not generalizing it out to the point where 
there's a win there. (I can see where it would be useful in the 
general case, but we've come nowhere near touching that)
--
Dan

--it's like this---
Dan Sugalski  even samurai
[EMAIL PROTECTED] have teddy bears and even
  teddy bears get drunk


Re: Rules and hypotheticals: continuations versus callbacks

2003-03-19 Thread Matthijs van Duin
On Wed, Mar 19, 2003 at 02:31:58PM -0500, Dan Sugalski wrote:
> Well, I'm not 100% sure we need it for rules. Simon's point is
> well-taken, but on further reflection what we're doing is subclassing
> the existing grammar and reinvoking the regex engine on that
> subclassed grammar, rather than redefining the grammar actually in
> use. The former doesn't require runtime redefinitions, the latter
> does, and I think we're going to use the former scheme.

That's not the impression I got from Simon

It would also be rather annoying.. think about balanced braces etc; take
this rather contrived, but valid example:

$x ~~ m X {
    macro ... yada yada yada;
} X;

It seems to me that you're really inside a grammar rule when that macro
is defined.  Otherwise you'd have to keep a lot of state outside the
parser to keep track of such things, which is exactly what perl grammars
were supposed to avoid, I think.

> We can't add them once we hit betas. I'd as soon add them now, rather
> than later.

Well, I'd rather not add it at all :-)


>> We'll need to meet in the middle..
> Well, not to be too cranky (I'm somewhat ill at the moment, so I'll
> apologize in advance) but... no. No, we don't actually have to,
> though if we could that'd be nice.

OK, strictly speaking that's true, but I think we can


> Semantics. Until Larry's nailed down what he wants, there are issues
> of reestablishing hypotheticals on continuation reinvocation,

They should be though; if a variable was hypothesized when the continuation
was taken, then it should be hypothesized when that continuation is invoked.

> flushing those hypotheticals multiple times,

No idea what you mean

> what happens to hypotheticals when you invoke a continuation with
> hypotheticals in effect,

Basically de-hypothesize all current hypotheticals, and re-hypothesize
the ones that were hypothesized when the continuation was taken.  You can
of course optimize this by skipping the common ancestry, if you know
what I mean

> what happens to hypotheticals inside of coroutines when you
> establish them then yield out,

This follows directly from the implementation of coroutines: the first
yield is a normal return, so if you hypothesize $x before that it'll stay
hypothesized. If you then hypothesize $y outside the coroutine and call
the coroutine again, $y will be de-hypothesized. If the coroutine then
hypothesizes $z and yields out, $z will be de-hypothesized and $y
re-hypothesized.  $x will be unaffected by all this

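A sketch of that rule in Python (generators standing in for coroutines,
invented helper names): crossing the coroutine boundary swaps out the
hypotheses that belong to the other side, while $x, hypothesized before
the first yield, stays in effect everywhere:

    env = {'x': 0, 'y': 0, 'z': 0}

    def swap(saved):
        """Exchange current values of the given names with the saved ones."""
        for name in saved:
            saved[name], env[name] = env[name], saved[name]

    def coro():
        env['x'] = 1                             # let $x before first yield
        yield                                    # first yield: normal return
        inside = {'z': env['z']}; env['z'] = 3   # let $z inside the coroutine
        swap(inside)                             # yield out: $z de-hypothesized
        yield
        swap(inside)                             # ...and back on re-entry

    c = coro(); next(c)                          # first call
    assert env['x'] == 1                         # $x stayed hypothesized

    outside = {'y': env['y']}; env['y'] = 2      # let $y outside the coroutine
    swap(outside); next(c); swap(outside)        # second call: $y hidden inside
    assert (env['y'], env['z']) == (2, 0)        # $y back, $z de-hypothesized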

> and when hypotheticals are visible to other threads.

I haven't thought of that, but to be honest I'm not a big fan of preemptive
threading anyway.  Cooperative threading using continuations is probably
faster, and has no synchronization issues.  And the behavior of
hypotheticals follows naturally there (you can use 'let' or 'temp' to
create thread-local variables in that case)


> I read through your proposal (I'm assuming it's the one that started
> this thread) and it's not sufficient unless I missed something, which
> I may have.

Also look at Sean O'Rourke's reply and my reply to that; it contains
additional info.


>> Sounds like a good deal? :-)
> At the moment, no. It seems like a potentially large amount of
> overhead for no particular purpose, really.

I have to admit I don't know the details of how your system works, but
what I had in mind didn't have any extra overhead at all -- under the
(apparently still debatable) assumption that you need to look up subrules
at runtime anyway.

You do agree that if that is possible, it *is* a good deal?


> I don't see any win in the regex case, and you're not generalizing it out
> to the point where there's a win there. (I can see where it would be
> useful in the general case, but we've come nowhere near touching that)

We have come near it.. backtracking is easy using continuations, and we can
certainly have rules set the standard for the general case.

--
Matthijs van Duin  --  May the Forth be with you!


Re: Rules and hypotheticals: continuations versus callbacks

2003-03-19 Thread Dan Sugalski
At 9:14 PM +0100 3/19/03, Matthijs van Duin wrote:
> On Wed, Mar 19, 2003 at 02:31:58PM -0500, Dan Sugalski wrote:
>> Well, I'm not 100% sure we need it for rules. Simon's point is
>> well-taken, but on further reflection what we're doing is
>> subclassing the existing grammar and reinvoking the regex engine on
>> that subclassed grammar, rather than redefining the grammar
>> actually in use. The former doesn't require runtime redefinitions,
>> the latter does, and I think we're going to use the former scheme.
>
> That's not the impression I got from Simon
>
> It would also be rather annoying.. think about balanced braces etc;
> take this rather contrived, but valid example:
>
> $x ~~ m X {
>     macro ... yada yada yada;
> } X;
>
> It seems to me that you're really inside a grammar rule when that
> macro is defined.
Right. Macro definition ends, you subclass off the parser object, 
then immediately call into it, and it eats until the end of the 
regex, at which point it exits and so does the parent, for lack of 
input, and the resulting parse tree is turned to bytecode and 
executed.

> Otherwise you'd have to keep a lot of state outside the parser to
> keep track of such things, which is exactly what perl grammars were
> supposed to avoid, I think.

You, as a user-level programmer, don't have to track the state. The
parser code will, but that's not a big deal.

>>> We'll need to meet in the middle..
>> Well, not to be too cranky (I'm somewhat ill at the moment, so I'll
>> apologize in advance) but... no. No, we don't actually have to,
>> though if we could that'd be nice.
>
> OK, strictly speaking that's true, but I think we can

>> Semantics. Until Larry's nailed down what he wants, there are issues
>> of reestablishing hypotheticals on continuation reinvocation,
>
> They should be though; if a variable was hypothesized when the
> continuation was taken, then it should be hypothesized when that
> continuation is invoked.
Should they? Does hypotheticalization count as data modification (in 
which case it shouldn't) or control modification (in which case it 
should), and do you restore the hypothetical value at the time the 
continuation was taken or just re-hypotheticalize the variables? 
(Which makes continuations potentially more expensive as you need to 
then save off more info so on invocation you can restore the 
hypothetical state)

What about co-routines, then? And does a yield from a coroutine count 
as normal or abnormal exit for pushing of hypothetical state outward, 
or doesn't it count at all?

>> flushing those hypotheticals multiple times,
>
> No idea what you mean
I hypotheticalize the variables. I then take a continuation. Flow 
continues normally, exits off the end normally, hypothetical values 
get pushed out. I invoke the continuation, flow continues, exits 
normally. Do I push the values out again?

>> what happens to hypotheticals when you invoke a continuation with
>> hypotheticals in effect,
>
> Basically de-hypothesize all current hypotheticals,
How? Successfully or unsuccessfully? Does it even *count* as an exit 
at all if there's a pending continuation that could potentially exit 
the hypotheticalizing block later?

>> what happens to hypotheticals inside of coroutines when you
>> establish them then yield out,
>
> This follows directly from the implementation of coroutines: the
> first yield is a normal return, so if you hypothesize $x before that
> it'll stay hypothesized. If you then hypothesize $y outside the
> coroutine and call the coroutine again, $y will be de-hypothesized.
Why? That doesn't make much sense, really. If a variable is
hypotheticalized outside the coroutine when I invoke it, the
coroutine should see the hypothetical variable. But what about yields
from within a coroutine that's hypotheticalized a variable? That's
neither a normal nor an abnormal return, so what happens?

> If the coroutine then hypothesizes $z and yields out, $z will be
> de-hypothesized and $y re-hypothesized.  $x will be unaffected by
> all this

Yech. I don't think that's the right thing to do.

>> and when hypotheticals are visible to other threads.
>
> I haven't thought of that, but to be honest I'm not a big fan of
> preemptive threading anyway.
Doesn't matter whether you like it or not, they're a fact that must 
be dealt with. (And scare up a dual or better processor machine and 
I'll blow the doors off a cooperative threading scheme, 
synchronization overhead or not)

>> I read through your proposal (I'm assuming it's the one that started this
>>> Sounds like a good deal? :-)
>> At the moment, no. It seems like a potentially large amount of
>> overhead for no particular purpose, really.
> I have to admit I don't know the details of how your system works,
> but what I had in mind didn't have any extra overhead at all --
> under the (apparently still debatable) assumption that you need to
> look up subrules at runtime anyway.
>
> You do agree that if that is possible, it *is* a good deal?
No. Honestly I still don't see the *point*, certainly not in regards 
to regular expressions and 

Re: Rules and hypotheticals: continuations versus callbacks

2003-03-19 Thread Matthijs van Duin
On Wed, Mar 19, 2003 at 03:46:50PM -0500, Dan Sugalski wrote:
> Right. Macro definition ends, you subclass off the parser object,
> then immediately call into it
> ...
> You, as a user-level programmer, don't have to track the state. The
> parser code will, but that's not a big deal.
OK, I suppose that works, although that still means you're moving the
complexity from the perl implementation to its usage: in this case, the
perl 6 parser which is written in perl 6 -- but I can well imagine other
people will want to do the same, and they'll have to do a similar hack.

I really don't like that; perl normally moves the complexity away from the
programmer and into perl.


>> They should be though; if a variable was hypothesized when the
>> continuation was taken, then it should be hypothesized when that
>> continuation is invoked.
> Should they? Does hypotheticalization count as data modification (in
> which case it shouldn't) or control modification (in which case it
> should),

Isn't that the whole point of hypotheses in perl 6?  You talk about
successful and unsuccessful de-hypothesizing, and about abnormal exits
etc..  you seem to have a much more complex model of hypotheses than
what's in my head.

I could be entirely missing things of course, but I haven't seen any
evidence of that yet.

To me, what 'let' does is temporize a variable except it doesn't get 
restored if you leave the scope, but only if you use a continuation to 
go back to a point where it wasn't hypothesized yet.

When the last continuation taken *before* the hypothesis is gone, so is 
the old version and thus the hypothesized variable becomes permanent.

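A rough model of those semantics in Python (invented names; a saved journal
index stands in for a continuation, which Python lacks): 'let' journals the
old value, leaving a scope restores nothing, and only going back through a
continuation taken *before* the 'let' restores the original:

    env = {}
    journal = []                    # (name, old value), in hypothesis order

    def let(name, value):
        journal.append((name, env.get(name)))
        env[name] = value

    def take_continuation():
        return len(journal)         # how much had been hypothesized so far

    def invoke(k):
        while len(journal) > k:     # undo every 'let' made after the capture
            name, old = journal.pop()
            env[name] = old

    env['x'] = 'original'
    k = take_continuation()         # continuation taken before the 'let'
    let('x', 'hypothetical')
    assert env['x'] == 'hypothetical'    # leaving a scope changes nothing
    invoke(k)                            # going back through the continuation
    assert env['x'] == 'original'        # ...de-hypothesizes $x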
The behavior regarding coroutines followed naturally from this, and so does 
the behavior inside regexen if they use the continuation semantics for 
backtracking -- which is what I'm suggesting.

This leaves only the behavior regarding preemptive threads, which is
actually very easy to solve:  disallow hypothesizing shared variables -- it
simply makes no sense to do that.  Now that I think of it, temporizing
shared variables is equally bad news, so this isn't something new.


> (Which makes continuations potentially more expensive as you need to
> then save off more info so on invocation you can restore the
> hypothetical state)

Actually, I think 'let' can handle this.. it's only the invocation of
continuations that will become more expensive, because it needs to deal
with the hypothesized variables

> What about co-routines, then? And does a yield from a coroutine count
> as normal or abnormal exit for pushing of hypothetical state outward,
> or doesn't it count at all?

Your terminology gets rather foreign to me at this point.  Assuming a
co-routine is implemented using continuations, their behavior follows
directly from the description above, and I think the resulting behavior
looks fine.  I don't see why people would hypothesize variables inside a
co-routine anyway.


> I hypotheticalize the variables. I then take a continuation. Flow
> continues normally, exits off the end normally, hypothetical values
> get pushed out. I invoke the continuation, flow continues, exits
> normally. Do I push the values out again?

If it ends normally, the variable isn't de-hypothesized at all.  Also, the
continuation was created *after* you hypothesized the variable, so when you
invoke it nothing will happen to the variable.


> How? Successfully or unsuccessfully? Does it even *count* as an exit
> at all if there's a pending continuation that could potentially exit
> the hypotheticalizing block later?

You're making 0% sense to me, apparently because your mental model of
hypothesizing differs radically from mine.

> Why? That doesn't make much sense, really.

Probably the same problem in the opposite direction :-)

> (And scare up a dual or better processor machine and I'll blow the doors
> off a cooperative threading scheme, synchronization overhead or not)

Of course, for CPU-intensive applications that spread their computation
over multiple threads on a multi-processor machine, you'll certainly need
preemptive multithreading.

When exactly is the last time you wrote such an application in perl? :-)

Seriously though, I think in the common case cooperative threading is likely 
to be superior.. it has low overhead, it should have faster context switch 
time, you have no synchronization issues, and you can mostly avoid the need 
for explicit yielding: in many applications threads will regularly block on 
something anyway (which will yield to another thread)

But anyway, this is getting off-topic.. I'll save it for later.  Regex first


> No. Honestly I still don't see the *point*, certainly not in regards
> to regular expressions and rules. The hypothetical issues need
> dealing with in general for threads, coroutines, and continuations,
> but I don't see how any of this brings anything to rules for the
> parsing engine.
>
> The flow control semantics the regex/parser needs to deal with are
> small and simple. I just don't see the point of trying to make it
Re: Rules and hypotheticals: continuations versus callbacks

2003-03-19 Thread Simon Cozens
[EMAIL PROTECTED] (Matthijs Van Duin) writes:
> OK, I suppose that works although that still means you're moving the
> complexity from the perl implementation to its usage: in this case,
> the perl 6 parser which is written in perl 6

No, I don't believe that's what's happening. My concern is that at some
point, there *will* need to be a bootstrapped parser which is written in
some low level language, outputting Parrot bytecode, and it *will* need
to be able to reconfigure itself mid-match.

I think. I can't remember why I'm so convinced of this, and I'm too tired
to think it through with examples right now, and I might be wrong anyway,
but at least I can be ready with a solution if it proves necessary. :)

-- 
There is no safe investment. To love at all is to be vulnerable. ... 
The only place outside Heaven where you can be perfectly safe from all the
dangers and pertubations of love is Hell.
 -CS Lewis The Four Loves


Re: Rules and hypotheticals: continuations versus callbacks

2003-03-19 Thread Larry Wall
I would like to express my sincere gratitude to all of you for working
through these issues.  I bent my brain on the Perl 5 regex engine,
and that was just a simple recurse-on-success engine--and I'm not
the only person it drove mad.  I deeply appreciate that Perl 6's
regex engine may drive you even madder.  But such sacrifices are at
the heart of why people love Perl.  Thanks!

Larry


Re: Rules and hypotheticals: continuations versus callbacks

2003-03-19 Thread Sean O'Rourke
On Tue, 18 Mar 2003, Matthijs van Duin wrote:
 and maybe also:
  What is the current plan?

 although I got the impression earlier that there isn't any yet for invoking
 subrules :-)

See line 1014, languages/perl6/P6C/rule.pm.  The hack I used was to call
rules like ordinary subs, and have them push marks onto the regex stack
before they return.  I'm not sure if this can be made to work with
hypotheticals, and I'm sure it won't interact kindly with
continuation-taking, but there's _something_.

As for the interaction with continuations, I was about to post some of my
concerns when I received your long and well-thought-out mail.  I need to
think about the discussion so far a bit more, but briefly:

(1) There's more than one way to go when combining dynamically-scoped
variables with continuations: for example, do you use dynamic bindings
from where the continuation was taken, or from where it's invoked?  (see
e.g. Scheme's dynamic-wind; a small sketch follows this list).

(2) (internals) The functional-language people have found that full
continuations are slow, and put a lot of effort into avoiding them where
possible.  Backtracking languages like Icon and Prolog are implemented by
special mechanisms rather than general continuations, probably for this
reason.  So if we're forced to do a regex engine using full continuations,
it will probably be dog-slow.

(3) On the other hand, we probably want people to intermix regex
backtracking, continuation-taking, and hypothetical/dynamic variables, and
have it do the right thing, where "right" means something like
"mind-bendingly difficult to reason about, but consistent".  How do we
want these features to play with each other?

(4) (internals) Given that Parrot has so many different control mechanisms
(call/ret, exceptions, closures, continuations, ...), how do we maintain
consistency?  And how much of that is parrot's responsibility (versus the
perl6 compiler's)?
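
To illustrate the choice in point (1), here is a rough Python sketch, with
generators standing in for continuations; with_binding, body, and the
variable names are all invented.  In dynamic-wind style, the captured
binding is re-installed every time control re-enters the suspended
computation and the caller's binding is restored every time control leaves;
dropping the wind/unwind steps gives the other choice, where re-entered
code simply sees the invoker's bindings:

dynamic = {"topic": "outer"}           # one dynamically-scoped variable

def with_binding(name, value, gen):
    """Run gen with dynamic[name] = value re-installed on every re-entry."""
    while True:
        saved = dynamic[name]          # wind: save the caller's binding
        dynamic[name] = value
        try:
            item = next(gen)
        except StopIteration:
            return
        finally:
            dynamic[name] = saved      # unwind: restore on every exit
        yield item

def body():
    yield dynamic["topic"]             # observes the binding at re-entry time
    yield dynamic["topic"]

g = with_binding("topic", "inner", body())
print(next(g), dynamic["topic"])       # inner outer
print(next(g))                         # inner (re-entry re-installed it)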

/s



Rules and hypotheticals: continuations versus callbacks

2003-03-18 Thread Matthijs van Duin
A quick note before I begin: the stuff in this email isn't just an 
implementation detail - it impacts the language too; that's why I'm 
posting it here.  Should I cross-post to perl6-internals?  (I'm not 
really familiar with that list yet.)

I've recently given some thought to backtracking into rules, and the exact 
nature of hypothetical variables etc.  The two systems for backtracking 
into a subrule that are described below are the best I can think of right 
now, but maybe I'm completely missing something and something much simpler 
is possible - in which case I'd love to hear about it of course. :-)

My main questions are:

Is there a simpler system I'm overlooking?
Which of the two systems would you prefer if speed isn't the issue?
Which system is likely to run faster on parrot?
and maybe also:
What is the current plan?
although I got the impression earlier that there isn't any yet for invoking 
subrules :-)

Anyway, I will use the following grammar for examples:
   rule foo { a }
   rule bar { a+ }
   rule quux { ab }
   rule test { [ <foo> | <bar> ] <quux> }
 Mechanism 1 -- Continuations 

Continuations can be used to reset the state of the world to the 
previous backtracking point.

Various ways can be imagined to pass around the continuation.  I picked 
one that seems fairly clean to me and doesn't create any more 
continuations than strictly necessary.

One thing I'll need to explain is what 'let' means in this system: it 
makes sure the variable is restored if a continuation is invoked that was 
created before the variable was hypothesized.  This generalization means 
you can hypothesize any variable you can temporize.
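
The variable-restoring half of this (ignoring the continuation machinery)
can be sketched as a Prolog-style undo trail; this is only an illustration,
and trail, choice_point, and backtrack_to are invented names:

_MISSING = object()
trail = []                             # (container, key, old_value) records

def let(container, key, value):
    """Hypothesize container[key] = value, remembering how to undo it."""
    trail.append((container, key, container.get(key, _MISSING)))
    container[key] = value

def choice_point():
    return len(trail)                  # a backtracking point = trail depth

def backtrack_to(point):
    """De-hypothesize everything done since point was taken."""
    while len(trail) > point:
        container, key, old = trail.pop()
        if old is _MISSING:
            del container[key]
        else:
            container[key] = old

match = {}
cp = choice_point()
let(match, "foo", "a")                 # hypothesized...
backtrack_to(cp)                       # ...and de-hypothesized: match == {}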

OK, let's look at how the rule 'test' could be implemented using 
continuations.  Note that I'm not paying attention to optimization at this 
point.

method test (Continuation ?&backtrack is rw) {
    &backtrack or mark(&backtrack) or return;   # the magical first line (see below)
    $_ = .new;                                  # create the new state object
    if (mark(&backtrack)) {
        let $.{foo} = .foo(&backtrack);
    } elsif (mark(&backtrack)) {
        let $.{bar} = .bar(&backtrack);
    } else {
        backtrack;
    }
    let $.{quux} = .quux(&backtrack);
    return $_;
}
where mark is a utility sub defined as something like:

sub mark (Continuation &backtrack is rw) {
    my &cc = callcc { $_ }  or return 0;   # undef here when the continuation is invoked later
    let &backtrack = &cc.assuming(undef);  # hypothesize the caller's backtrack
    return 1;
}
Let's see how this would match on 'aaab':

0. A new state object is created.  (Ignore the first line; it's for later.)

1. The first mark hypothesizes backtrack.

2. $.{foo} is hypothesized.  foo matches 'a', and since it doesn't do any 
backtracking, it leaves backtrack alone.

3. $.{quux} is hypothesized.  quux fails, it calls backtrack:
 a. $.{quux} is de-hypothesized.
 b. $.{foo} is de-hypothesized.
 c. backtrack is de-hypothesized.
 d. inside the first mark undef is returned from callcc.  mark returns false.
4. The second mark hypothesizes backtrack.

5. $.{bar} is hypothesized.  bar matches 'aaa' and hypothesizes backtrack

6. $.{quux} is hypothesized.  quux fails, it calls backtrack:
 a. $.{quux} is de-hypothesized
 b. inside bar, it backtracks to match 'aa'.  bar returns again
7. $.{quux} is hypothesized.  quux matches, leaves backtrack alone.

Note that backtrack remains hypothesized after test completes.  Let's 
say test is followed by fail and see what happens:

8. fail calls backtrack
 a. $.{quux} is de-hypothesized
 b. inside bar, it backtracks to match 'a'.  bar returns again
9. $.{quux} is hypothesized.  quux fails, it calls backtrack:
 a. $.{quux} is de-hypothesized
 b. inside bar, backtrack is de-hypothesized. bar calls backtrack
 c. $.{bar} is de-hypothesized
 d. backtrack is de-hypothesized
 e. inside the second mark undef is returned from callcc.  mark returns false.
10. backtrack is called, causing backtracking into whatever was before test.

This only leaves the issue of how to deal with the top-level, where the 
continuation is omitted.  The magical first line will in that case create 
a continuation which will simply return from the rule if the match fails.  
If the top-level match succeeds then the backtrack variable disappears 
into thin air, and with it all backtracking information (the continuations 
and de-hypotheticalization info).

Note that the user can of course choose to retain the backtracking info, 
and use it later to cause backtracking into the match after it has completed:

if Grammar.rule($string, my &backtrack) {
    ...
    if i_dont_like_this_match {
        backtrack; # try another
    }
}
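
In Python one can mimic that retained backtrack handle with a generator,
where each next() plays the role of "backtrack; # try another" (bar and the
position-based match representation are invented for the sketch):

def bar(s, pos):
    """rule bar { a+ } -- yield every possible end position, longest first."""
    n = 0
    while pos + n < len(s) and s[pos + n] == "a":
        n += 1
    for take in range(n, 0, -1):
        yield pos + take

matches = bar("aaab", 0)
end = next(matches)        # first match: 'aaa' (end == 3)
end = next(matches)        # i_dont_like_this_match: backtrack -> 'aa' (end == 2)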
 Mechanism 2 -- Callbacks 

The second mechanism is a bit more mundane.  The idea is that every rule 
will get passed a closure that's called to match "whatever comes next".  If 
the "whatever comes next" fails, the rule can backtrack and call the 
closure again.

This time 'let' is exactly the same as 'temp' except hypothetical 
variables are only allowed inside the dynamic scope of a subroutine with a 
special trait.
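
A rough Python analogue of this callback scheme, using the example grammar
(positions instead of match objects; lit, after, and every other helper
name here is invented).  The closure k plays "whatever comes next", and bar
calls it again after giving back a character:

def lit(ch):
    def rule(s, pos, k):
        return pos < len(s) and s[pos] == ch and k(pos + 1)
    return rule

def foo(s, pos, k):                     # rule foo { a }
    return lit("a")(s, pos, k)

def bar(s, pos, k):                     # rule bar { a+ }
    n = 0
    while pos + n < len(s) and s[pos + n] == "a":
        n += 1
    for take in range(n, 0, -1):        # backtrack: retry k with one char less
        if k(pos + take):
            return True
    return False

def quux(s, pos, k):                    # rule quux { ab }
    return lit("a")(s, pos, lambda p: lit("b")(s, p, k))

def test(s, pos, k):                    # rule test { [ <foo> | <bar> ] <quux> }
    after = lambda p: quux(s, p, k)
    return foo(s, pos, after) or bar(s, pos, after)

print(test("aaab", 0, lambda p: p == 4))   # True: bar gives back one 'a'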

Re: Rules and hypotheticals: continuations versus callbacks

2003-03-18 Thread Luke Palmer
 My main questions are:
 
  Is there a simpler system I'm overlooking?
  Which of the two systems would you prefer if speed isn't the issue?

Mechanism 1.

  Which system is likely to run faster on parrot?

They're both likely to be very slow.

 and maybe also:
  What is the current plan?

 although I got the impression earlier that there isn't any yet for invoking 
 subrules :-)

Sure there is.  It's boiling away in my brain and in my local parrot
copy.  (You haven't seen any commits because I'm overhauling it and it
doesn't... well... work yet).

My plan is to allow whatever we find is best, and swap them around
and benchmark them separately.  But the two engines I have in mind
include one very similar to your two examples, and one more classical
approach.

Plan 1:  Pass each rule a I<success> continuation (rather than a
backtrack one), and have it just return on failure.  The big
difference between this and your example is that C<let>s are now
implemented just like C<temp>s.  Too bad C<let> needs non-regex
behavior, too.

Plan 2:  Call subrules as plain old subs and have them throw a
backtrack exception on failure (or just return a failure-reporting
value... same difference, more or less).  This has the advantage that
C<let> behaves consistently with the rest of Perl.  It has the
disadvantage that we have to manually implement backtracking through
individual rules.  It has the advantage that it's easier to optimize.
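
One way Plan 2's failure-reporting style could look, sketched in Python
(None signals failure, and the attempt counter is an invented stand-in for
re-entering a rule at its next alternative).  Note how the caller has to
drive the backtracking loop by hand, which is exactly the disadvantage
named above:

def bar(s, pos, attempt=0):
    """rule bar { a+ }: attempt 0 is greedy, each retry gives back one char."""
    n = 0
    while pos + n < len(s) and s[pos + n] == "a":
        n += 1
    take = n - attempt
    return pos + take if take > 0 else None

def quux(s, pos):
    return pos + 2 if s[pos:pos + 2] == "ab" else None

def test(s, pos):
    alternatives = (
        lambda a: pos + 1 if a == 0 and s[pos:pos + 1] == "a" else None,  # foo
        lambda a: bar(s, pos, a),                                         # bar
    )
    for sub in alternatives:
        attempt = 0
        while (mid := sub(attempt)) is not None:   # manual backtracking loop
            end = quux(s, mid)
            if end is not None:
                return end
            attempt += 1
    return None

print(test("aaab", 0))    # 4: foo can't feed quux, bar retries with 'aa'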

I looked around in Parrot a little, and it seems like continuations
are done pretty efficiently.  So, I can't really say which of these
would be faster, but I'd guess the latter.

I'll be writing them both, though, so we'll see :)

Luke