Re: Patterns

2007-01-07 Thread Larry Wall
On Sun, Jan 07, 2007 at 03:33:19AM -0800, Jonathan Lang wrote:
: IIRC, you don't even need .accepts for this.  You could just say
: 
:  given $pattern {
:when $a ~~ $_ { ... }
:when $b ~~ $_ { ... }
:when $c ~~ $_ { ... }
:  }
: 
: ...since an explicit smart-match is a boolean expression (it is, isn't
: it?), and boolean expressions implicitly smart-match according to
: their truth.  Right?

Very nearly, except that ~~ merely promises to return something
that can be used as a boolean, not something that is of Bool type.
And, in fact, for a normal regex you'd get a Match object, which as
a form of Capture would then try to compare itself back to $pattern,
which is not what you want.  So you'd have to force it:

given $pattern {
  when true $a ~~ $_ { ... }
  when true $b ~~ $_ { ... }
  when true $c ~~ $_ { ... }
}

Larry


Re: Patterns

2007-01-07 Thread Jonathan Lang

Larry Wall wrote:

Lots of interesting ideas.  But I don't think the reverse-test
situation will arise all that frequently.  How 'bout we let the user
just say:

my macro statement_control: { "when .accepts: " }

or some such...


IIRC, you don't even need .accepts for this.  You could just say

 given $pattern {
   when $a ~~ $_ { ... }
   when $b ~~ $_ { ... }
   when $c ~~ $_ { ... }
 }

...since an explicit smart-match is a boolean expression (it is, isn't
it?), and boolean expressions implicitly smart-match according to
their truth.  Right?

--
Jonathan "Dataweaver" Lang


Re: Patterns

2007-01-06 Thread Larry Wall
Lots of interesting ideas.  But I don't think the reverse-test
situation will arise all that frequently.  How 'bout we let the user
just say:

my macro statement_control: { "when .accepts: " }

or some such...

Larry


Re: Patterns

2007-01-06 Thread Ashley Winters

On 1/5/07, Larry Wall <[EMAIL PROTECTED]> wrote:

Anyway, that gives us:

given $pattern {
when .accepts(42) {...}
}


I think this type of usage should be encouraged with a bit more
huffmanization. My first thought would be to add C to invert
the arguments to ~~ versus C.

given @something {
   when $this { ... }# @something ~~ $this
   against $that { ... }# $that ~~ @something
}

That would help keep the ~~ DWIM table from trying to guess on which
side you really wanted @something on.

- Ashley Winters


Re: Patterns

2007-01-06 Thread Jonathan Lang

Larry Wall wrote:

Anyway, that gives us:

given $pattern {
when .accepts(42) {...}
}

which given typical usage patterns of switch statements is probably
adequately huffmanized, unless we want to go for something shorter
than accepts/rejects, like

acc/rej
pix/nix
ok/bogus


...at which point, legibility becomes an issue.  .accepts() and
.rejects() seem to work well enough on a case-by-case basis - although
it occurs to me that someone who intends to put the pattern first
within a given/when construct is probably going to want to do so for
every case, or very nearly so:

 given $pattern {
   when .accepts($a) { ... }
   when .accepts($b) { ... }
   when .accepts($c) { ... }
   when .accepts($d) { ... }
 }

It would be nice if there were some way to "factor out" the .accepts()
in the above code.  Maybe something like:

 given $pattern {
   when $a { ... } # if $a ~~ $pattern { ... ; next }
   when $b { ... } # if $b ~~ $pattern { ... ; next }
   when $c { ... } # if $c ~~ $pattern { ... ; next }
   when $d { ... } # if $d ~~ $pattern { ... ; next }
 } is backward;

Although I don't like the fact that you have to wait until reading the
whole block to find out that the cases are being processed inside-out.

--

Would it be possible to put a closure on the left and an argument list
on the right, with the effect being that the closure gets called using
the argument list?  This would let you do something like:

 given { ^x ~~ $pattern } {
   when $a { ... } # if $a ~~ $pattern { ... ; next }
   when $b { ... } # if $b ~~ $pattern { ... ; next }
   when $c { ... } # if $c ~~ $pattern { ... ; next }
   when $d { ... } # if $d ~~ $pattern { ... ; next }
 }

or

 given { ^x <= $number < ^y } {
   when 1, 2 { ... } $ if 1 <= $number < 2 { ... ; next }
   when 2, 3 { ... } $ if 2 <= $number < 3 { ... ; next }
 }

(Yes, I know that this second case could be handled more flexibly
using right-side Ranges.  TIMTOWTDI.)

The main problem that I have with this idea is its rigidity.  The
_only_ way that the topic can be used in the above is as a test for
the 'when' clauses; and it messes with the 'when *' idiom.  (But then,
any code that follows the final 'when' in a block is already the 'if
all else fails...' code; 'when *' is merely syntactic sugar to
highlight this fact.)

--

It would be nice to have something that is to 'given' as 'loop' is to
'for', letting you specify a topic and a separate case tester up
front:

 test $pattern, { $^x ~~ $_ } {
   when $a { ... } # if $a ~~ $pattern { ... ; next }
   when $b { ... } # if $b ~~ $pattern { ... ; next }
   when $c { ... } # if $c ~~ $pattern { ... ; next }
   when $d { ... } # if $d ~~ $pattern { ... ; next }
 }

Or, to avoid additional keywords, just say that a 'given' statement
with two arguments uses the first argument as its topic and the second
argument as the case tester to replace '{ $_ ~~ $^x }' in 'when'
clauses.  Better, use 'when' as is, but introduce a 'test' alternative
that is customized by the second parameter:

 given $pattern, { $^x ~~ $_ } {
   test $a { ... } # if $a ~~ $pattern { ... ; next }
   test $b { ... } # if $b ~~ $pattern { ... ; next }
   test $c { ... } # if $c ~~ $pattern { ... ; next }
   test $d { ... } # if $d ~~ $pattern { ... ; next }
   when * { ... } # if $pattern ~~ * { ... ; next }
 }

That way, 'given' and 'when' still behave exactly as currently
described; the case tester only comes into play when you explicitly
ask for it.

--

At this point, though, the case tester is looking like it might be
better as a property of the 'given' block:

 given $pattern {
   TEST { $^x ~~ $_ }
   test $a { ... } # if $a ~~ $pattern { ... ; next }
   test $b { ... } # if $b ~~ $pattern { ... ; next }
   test $c { ... } # if $c ~~ $pattern { ... ; next }
   test $d { ... } # if $d ~~ $pattern { ... ; next }
   when * { ... } # if $pattern ~~ * { ... ; next }
 }

Or go ahead and let TEST change the behavior of 'when' within the
block, and rely on nested blocks to allow different kinds of tests
within a 'given' block:

 given $pattern
 {
   {
 TEST -> $x { $x ~~ $_ }
 when $a { ... } # if $a ~~ $pattern { ... ; next }
 when $b { ... } # if $b ~~ $pattern { ... ; next }
 when $c { ... } # if $c ~~ $pattern { ... ; next }
 when $d { ... } # if $d ~~ $pattern { ... ; next }
   }
   when * { ... } # if $pattern ~~ * { ... ; next }
 }

...and someone silly enough to want to frequently change the case
tester within a given block is advised to use 'if' instead.

What do you think?

--
Jonathan "Dataweaver" Lang


Re: Patterns

2007-01-05 Thread Larry Wall
At the moment I'm leaning toward:

$pattern.accepts($thing)
$pattern.rejects($thing)

and going the other way

$thing ~~ $pattern
$thing.match($pattern)
$thing.subst($pattern)

and most generally, something like:

$thing.does($pattern)
$thing.when($pattern)

By the way, the Pattern type is already called Selector in the
synopses.  Or perhaps Selector is a bit more general.  Arguably Pattern
is the role that defines .accepts and .rejects,  but Selector is
really the Pattern|Junction type you use in a signature.  But we can't
just have Pattern does Junction or everything turns into a junction.  :/

Or maybe they should be the other way around, where Selector is the role
and Pattern is the sig type.  Or maybe the sig type should be Where so
we can say

multi grep (Where $where, [EMAIL PROTECTED]) {...}

and call it with :where{...} or where => {...}.  Selector is kind of
a stupid name in any event.

Anyway, that gives us:

given $pattern {
when .accepts(42) {...}
}

which given typical usage patterns of switch statements is probably
adequately huffmanized, unless we want to go for something shorter
than accepts/rejects, like

acc/rej
pix/nix
ok/bogus

Of course, all of the .rejects() variants could simply be spelled
.accepts().not, but that seem unnecessarily cruel.

Larry


Re: Patterns

2007-01-05 Thread Larry Wall
On Fri, Jan 05, 2007 at 08:47:18PM +, Luke Palmer wrote:
: I propose that we remove the following two lines from the smart match
: table in S03:
: 
:HashAny   hash entry existence exists $_{$x}
:Array   Any   array contains item* any($_) === $x
: 
: These are the two lines with Any on the right side.  I want to remove
: these so that we can have an extensible ~~ operator.
: 
: You can think of $x ~~ $y as saying "$x matches $y", i.e. $y is some
: data pattern and we are asking if $x conforms to it.  I want to be
: able to add my own kinds of patterns, and entries with Any in the
: right side undermine this.

It's much worse than just those two entries, because any entry not marked
with * is currently considered reversible.  However, I think you're
the path toward sanity.

: Why?  Let's say I created a pattern  which implemented another textual
: matcher alongside Perl's regexes, which was more convenient for some
: things (I shouldn't have to convince you that this is possible and
: useful).  Let's call it Irregex.  I want it to work just like Perl's
: regexes; i.e. I want it so that if you ask whether an array matches
: it, it will match against the elements of the array as though they
: were characters.  So I just overload ~~ to work on (Array, Irregex)
: and that overrules (Array, Any) so there is no MMD ambiguity.
: 
: However!  Let's say I'm using a payroll library (or something, sorry
: for the mundane examples; I'm trying to keep it realistic) which sends
: a notification if any user's status matches a pattern, and it is
: implemented like so:
: 
:method notify(Any|Junction $pattern) {
:if @.users>>. ~~ $pattern {
:# send notification
:}
:}
: 
: This is perfectly fine, elegant code given the ~~ table.  But it
: breaks since we have added our new pattern type!  It doesn't check if
: any user's status matches, it checks if the users' statuses'
: concatenated considered as atoms matches, totally wrong!

I'm not so worried about extensibility as I am about consistency between
compile time and run time, such that the optimizer can optimize without
surprising the user.  But this also argues against any anonymous Any on
the right.  If all the when expressions are Num, the optimizer wants to
be able to evaluate the given in a numeric context whether or not you
put + on the front of it.

: So, I'm essentially asking for ~~ to be singly-dispatched based on its
: right argument, which gets to pick how to interpret the left one.  In
: order to encourage this usage, I propose a design like so:
: 
:role Pattern {
:method match($item) {...}
:}
:# Code, Bool, Undef, Whatever, Num, Junction, Str, Hash, ... ...
: all do Pattern
: 
:sub *infix:<~~> ($item, Pattern $pat) {
:$pat.match($item);
:}

One little problem here is that .match is currently defined the reverse
of that for the method form of m//.

: So to add a new pattern, you just implement the Pattern role, and you are 
: safe.
: 
: If you are very attached do those two rows of the smart match table,
: we can reimplement them as Pattern objects, at the expense of some
: unreadability:
: 
:given @array {
:when 42 { ... }
:}
: 
: Becomes:
: 
:given @array {
:when contains 42 { ... }
:}
: 
: etc.

Rather than special casing array matching, I'd say we want a general form
for explicitly reversing the figure/ground.  If .match demands a pattern on
the left, then

given @array {
when .match(42) {...}
}

will already reverse the test to mean 42 ~~ @array.  As I say, we need to
figure out what to do with "foo".match(/bar/) though.

Larry


Re: Patterns and junctions

2003-04-04 Thread Paul

--- Luke Palmer <[EMAIL PROTECTED]> wrote:
[extremely large *SNIP*]
> Maybe the "|"/"||" distinction isn't needed, and we just need a
> declarator on rules that says they are side-effect-free, and can thus
> be optimized.
[snip]
> I like this solution better than making a new operator.  In Perl
> tradition, it puts the optimization burdon on the compiler, not the
> programmer.

Agreed. Minimize the need for new grammar in the language, including
operators, but provide a way that code can DWIM the same result. 
"If you build it, they will come".

__
Do you Yahoo!?
Yahoo! Tax Center - File online, calculators, forms, and more
http://tax.yahoo.com


Re: Patterns and junctions

2003-04-04 Thread Luke Palmer
> --- "Adam D. Lopresto" <[EMAIL PROTECTED]> wrote:
> > I propose that since the empty pattern is no longer legal (and
> > about time), we use "|" in patterns to indicate alternation without
> > preference, and "||" to indicate "try the first, then the second,
> > etc".
> 
> Hmm 
> A neat idea, but can you elaborate on when it would matter?

It would matter a lot for optimization.  If each alternative is a
subrule, each of which calls a lot of other subrules, then you could
gain a lot by not needing an ordered search through the possibilities.

I like Adam's symbols, with the junction/short-circuit
parallels.  Literally. :)

I'm currently working on Parse::OutsideIn, an experiment for larger
(non-peephole) optimizations on P6 regexen.  The idea is that it does
a recursive-descent match until it finds a top-down-independent
section of the grammar (very common), and then uses a compiled LR
bottom-up method on that section.

And of course, the "|"/"||" distinction could make that easier.

> > So
> >   "cat dog fox" ~~ /( fox | dog | cat )/;
> > might match any of the three, and in fact is quite likely to match
> > cat (since scanning the string could be a very long process), but
> >   "cat dog fox" ~~ /( fox || dog || cat )/;
> > would be guaranteed to first try fox, then if that fails backtrack
> > and try dog, and if that fails backtract to try cat.  The choice of
> > symbols parallels the junction and short-circuiting or operators,
> > and in most cases when an alternation is specified, the programmer
> > has no preference among the alternatives (indeed, in most cases only
> > would could match anyway), so it seems silly to force the engine to
> > prefer the first one.
> 
> I thought the requirement to prefer the first was largely an
> optimization, so that it could bail when it had a match without
> continuing to search. ("Wouldn't you know, it was in the *last* place I
> looked for it!")
> 
> Are you proposing a more DFA-ish behavior, maybe?
> Some programs actually embad a DFA for a quick "does it match?" as
> *well* as an NFA for pulling out backreferences once the DFA finds a
> hit, but I doubt anybody is planning to embed a pre-matching DFA in
> Perl (if so, somebody please tell me now).

I am, in some guise.  Here's another, related optimization I'm
considering.  I've noticed that Parse::RecDescent (through $RD_TRACE)
spends a lot of time trying trees that would _never_ match... none of
their subrules/grandsubrules could possibly.

So, at grammar compile-time, I compute the closure (not the Perl kind)
of each rule's initial state, to find all the terminals that could
come up first.  Then, at a high level, I try to match one of the
terminals.  If it did match, I cache the result and progress down the
tree (perhaps guided by which one matched).  If it didn't match, I
fail the rule and don't even bother.

For instance:

statement: expression | declaration
declaration: sub_declaration
# ... lots of rules rooted at declaration
sub_decl_keyword: /sub/
method_decl_keyword: /method/
# ... rules rooted at expression

So at compile time, it would find the closure of C, which
would turn out to be /sub/|/method/.  It would match one of those,
before trying C and if it didn't, it would move on to
matching C.

My theory is that, if P6 is going to be used for grammars, that
little, local optimizations aren't going to help as much as these
kinds.

> > I'm imagining that in the first example, the implementation would
> > probably build an FSA and process each letter as it comes, while the
> > second would rely on backtracking.
> 
> Sounds like DFA/NFA to me

Or an FSA.  You can't do grammars in a DFA.

> > What think you all?

I rather like the idea.  Finally, I'll talk about the language
implications of the idea

So you're proposing that "|" be a "match order not guaranteed" match.
There are some constructs that that doesn't like very much:

m:w/ if  then  else 
   | if  then /

One would argue that this would be a case where you would use "||".
The times where match order matters in a recursive-descent match are
the same as the times of a shift/reduce conflict in shift/reduce
parsers.

So what I'm looking for is some kind of guarantee that "|" could
provide, instead of being entirely random.  Junctions don't guarantee
that they evaluate all their junctends, or that they do it in any
particular order.  But they do guarantee that the result would be the
same (ignoring side-effects) as if they had.

Maybe that's where our guarantee lies---failure side-effects.

grammar X {
rule A { B || C }
rule B { (\w+) {
if $1 == $naughty_word {
print "No bad words!";
fail;
}
}}
rule C {...}
}
grammar Y is X {
rule A { B | C }
}

The difference between these two grammars is that X would be required
to print "No bad words!" if C was the one that finally matched, while
Y would no

Re: Patterns and junctions

2003-04-04 Thread Paul

--- "Adam D. Lopresto" <[EMAIL PROTECTED]> wrote:
> I propose that since the empty pattern is no longer legal (and
> about time), we use "|" in patterns to indicate alternation without
> preference, and "||" to indicate "try the first, then the second,
> etc".

Hmm 
A neat idea, but can you elaborate on when it would matter?

> So
>   "cat dog fox" ~~ /( fox | dog | cat )/;
> might match any of the three, and in fact is quite likely to match
> cat (since scanning the string could be a very long process), but
>   "cat dog fox" ~~ /( fox || dog || cat )/;
> would be guaranteed to first try fox, then if that fails backtrack
> and try dog, and if that fails backtract to try cat.  The choice of
> symbols parallels the junction and short-circuiting or operators,
> and in most cases when an alternation is specified, the programmer
> has no preference among the alternatives (indeed, in most cases only
> would could match anyway), so it seems silly to force the engine to
> prefer the first one.

I thought the requirement to prefer the first was largely an
optimization, so that it could bail when it had a match without
continuing to search. ("Wouldn't you know, it was in the *last* place I
looked for it!")

Are you proposing a more DFA-ish behavior, maybe?
Some programs actually embad a DFA for a quick "does it match?" as
*well* as an NFA for pulling out backreferences once the DFA finds a
hit, but I doubt anybody is planning to embed a pre-matching DFA in
Perl (if so, somebody please tell me now).

> I'm imagining that in the first example, the implementation would
> probably build an FSA and process each letter as it comes, while the
> second would rely on backtracking.

Sounds like DFA/NFA to me
 
> What think you all?

Dunno. Would swell the footprint some, but might have some merit.
DFA's are generally faster, I believe, but P5's NFA is optimized to a
beautiful degree. Might be possible just by tuning the engine some
more, but then the codebase is still swelling

Dan? Parrot's pure NFA, right?

__
Do you Yahoo!?
Yahoo! Tax Center - File online, calculators, forms, and more
http://tax.yahoo.com