Re: Patterns
On Sun, Jan 07, 2007 at 03:33:19AM -0800, Jonathan Lang wrote: : IIRC, you don't even need .accepts for this. You could just say : : given $pattern { :when $a ~~ $_ { ... } :when $b ~~ $_ { ... } :when $c ~~ $_ { ... } : } : : ...since an explicit smart-match is a boolean expression (it is, isn't : it?), and boolean expressions implicitly smart-match according to : their truth. Right? Very nearly, except that ~~ merely promises to return something that can be used as a boolean, not something that is of Bool type. And, in fact, for a normal regex you'd get a Match object, which as a form of Capture would then try to compare itself back to $pattern, which is not what you want. So you'd have to force it: given $pattern { when true $a ~~ $_ { ... } when true $b ~~ $_ { ... } when true $c ~~ $_ { ... } } Larry
Re: Patterns
Larry Wall wrote: Lots of interesting ideas. But I don't think the reverse-test situation will arise all that frequently. How 'bout we let the user just say: my macro statement_control: { "when .accepts: " } or some such... IIRC, you don't even need .accepts for this. You could just say given $pattern { when $a ~~ $_ { ... } when $b ~~ $_ { ... } when $c ~~ $_ { ... } } ...since an explicit smart-match is a boolean expression (it is, isn't it?), and boolean expressions implicitly smart-match according to their truth. Right? -- Jonathan "Dataweaver" Lang
Re: Patterns
Lots of interesting ideas. But I don't think the reverse-test situation will arise all that frequently. How 'bout we let the user just say: my macro statement_control: { "when .accepts: " } or some such... Larry
Re: Patterns
On 1/5/07, Larry Wall <[EMAIL PROTECTED]> wrote: Anyway, that gives us: given $pattern { when .accepts(42) {...} } I think this type of usage should be encouraged with a bit more huffmanization. My first thought would be to add C to invert the arguments to ~~ versus C. given @something { when $this { ... }# @something ~~ $this against $that { ... }# $that ~~ @something } That would help keep the ~~ DWIM table from trying to guess on which side you really wanted @something on. - Ashley Winters
Re: Patterns
Larry Wall wrote: Anyway, that gives us: given $pattern { when .accepts(42) {...} } which given typical usage patterns of switch statements is probably adequately huffmanized, unless we want to go for something shorter than accepts/rejects, like acc/rej pix/nix ok/bogus ...at which point, legibility becomes an issue. .accepts() and .rejects() seem to work well enough on a case-by-case basis - although it occurs to me that someone who intends to put the pattern first within a given/when construct is probably going to want to do so for every case, or very nearly so: given $pattern { when .accepts($a) { ... } when .accepts($b) { ... } when .accepts($c) { ... } when .accepts($d) { ... } } It would be nice if there were some way to "factor out" the .accepts() in the above code. Maybe something like: given $pattern { when $a { ... } # if $a ~~ $pattern { ... ; next } when $b { ... } # if $b ~~ $pattern { ... ; next } when $c { ... } # if $c ~~ $pattern { ... ; next } when $d { ... } # if $d ~~ $pattern { ... ; next } } is backward; Although I don't like the fact that you have to wait until reading the whole block to find out that the cases are being processed inside-out. -- Would it be possible to put a closure on the left and an argument list on the right, with the effect being that the closure gets called using the argument list? This would let you do something like: given { ^x ~~ $pattern } { when $a { ... } # if $a ~~ $pattern { ... ; next } when $b { ... } # if $b ~~ $pattern { ... ; next } when $c { ... } # if $c ~~ $pattern { ... ; next } when $d { ... } # if $d ~~ $pattern { ... ; next } } or given { ^x <= $number < ^y } { when 1, 2 { ... } $ if 1 <= $number < 2 { ... ; next } when 2, 3 { ... } $ if 2 <= $number < 3 { ... ; next } } (Yes, I know that this second case could be handled more flexibly using right-side Ranges. TIMTOWTDI.) The main problem that I have with this idea is its rigidity. The _only_ way that the topic can be used in the above is as a test for the 'when' clauses; and it messes with the 'when *' idiom. (But then, any code that follows the final 'when' in a block is already the 'if all else fails...' code; 'when *' is merely syntactic sugar to highlight this fact.) -- It would be nice to have something that is to 'given' as 'loop' is to 'for', letting you specify a topic and a separate case tester up front: test $pattern, { $^x ~~ $_ } { when $a { ... } # if $a ~~ $pattern { ... ; next } when $b { ... } # if $b ~~ $pattern { ... ; next } when $c { ... } # if $c ~~ $pattern { ... ; next } when $d { ... } # if $d ~~ $pattern { ... ; next } } Or, to avoid additional keywords, just say that a 'given' statement with two arguments uses the first argument as its topic and the second argument as the case tester to replace '{ $_ ~~ $^x }' in 'when' clauses. Better, use 'when' as is, but introduce a 'test' alternative that is customized by the second parameter: given $pattern, { $^x ~~ $_ } { test $a { ... } # if $a ~~ $pattern { ... ; next } test $b { ... } # if $b ~~ $pattern { ... ; next } test $c { ... } # if $c ~~ $pattern { ... ; next } test $d { ... } # if $d ~~ $pattern { ... ; next } when * { ... } # if $pattern ~~ * { ... ; next } } That way, 'given' and 'when' still behave exactly as currently described; the case tester only comes into play when you explicitly ask for it. -- At this point, though, the case tester is looking like it might be better as a property of the 'given' block: given $pattern { TEST { $^x ~~ $_ } test $a { ... } # if $a ~~ $pattern { ... ; next } test $b { ... } # if $b ~~ $pattern { ... ; next } test $c { ... } # if $c ~~ $pattern { ... ; next } test $d { ... } # if $d ~~ $pattern { ... ; next } when * { ... } # if $pattern ~~ * { ... ; next } } Or go ahead and let TEST change the behavior of 'when' within the block, and rely on nested blocks to allow different kinds of tests within a 'given' block: given $pattern { { TEST -> $x { $x ~~ $_ } when $a { ... } # if $a ~~ $pattern { ... ; next } when $b { ... } # if $b ~~ $pattern { ... ; next } when $c { ... } # if $c ~~ $pattern { ... ; next } when $d { ... } # if $d ~~ $pattern { ... ; next } } when * { ... } # if $pattern ~~ * { ... ; next } } ...and someone silly enough to want to frequently change the case tester within a given block is advised to use 'if' instead. What do you think? -- Jonathan "Dataweaver" Lang
Re: Patterns
At the moment I'm leaning toward: $pattern.accepts($thing) $pattern.rejects($thing) and going the other way $thing ~~ $pattern $thing.match($pattern) $thing.subst($pattern) and most generally, something like: $thing.does($pattern) $thing.when($pattern) By the way, the Pattern type is already called Selector in the synopses. Or perhaps Selector is a bit more general. Arguably Pattern is the role that defines .accepts and .rejects, but Selector is really the Pattern|Junction type you use in a signature. But we can't just have Pattern does Junction or everything turns into a junction. :/ Or maybe they should be the other way around, where Selector is the role and Pattern is the sig type. Or maybe the sig type should be Where so we can say multi grep (Where $where, [EMAIL PROTECTED]) {...} and call it with :where{...} or where => {...}. Selector is kind of a stupid name in any event. Anyway, that gives us: given $pattern { when .accepts(42) {...} } which given typical usage patterns of switch statements is probably adequately huffmanized, unless we want to go for something shorter than accepts/rejects, like acc/rej pix/nix ok/bogus Of course, all of the .rejects() variants could simply be spelled .accepts().not, but that seem unnecessarily cruel. Larry
Re: Patterns
On Fri, Jan 05, 2007 at 08:47:18PM +, Luke Palmer wrote: : I propose that we remove the following two lines from the smart match : table in S03: : :HashAny hash entry existence exists $_{$x} :Array Any array contains item* any($_) === $x : : These are the two lines with Any on the right side. I want to remove : these so that we can have an extensible ~~ operator. : : You can think of $x ~~ $y as saying "$x matches $y", i.e. $y is some : data pattern and we are asking if $x conforms to it. I want to be : able to add my own kinds of patterns, and entries with Any in the : right side undermine this. It's much worse than just those two entries, because any entry not marked with * is currently considered reversible. However, I think you're the path toward sanity. : Why? Let's say I created a pattern which implemented another textual : matcher alongside Perl's regexes, which was more convenient for some : things (I shouldn't have to convince you that this is possible and : useful). Let's call it Irregex. I want it to work just like Perl's : regexes; i.e. I want it so that if you ask whether an array matches : it, it will match against the elements of the array as though they : were characters. So I just overload ~~ to work on (Array, Irregex) : and that overrules (Array, Any) so there is no MMD ambiguity. : : However! Let's say I'm using a payroll library (or something, sorry : for the mundane examples; I'm trying to keep it realistic) which sends : a notification if any user's status matches a pattern, and it is : implemented like so: : :method notify(Any|Junction $pattern) { :if @.users>>. ~~ $pattern { :# send notification :} :} : : This is perfectly fine, elegant code given the ~~ table. But it : breaks since we have added our new pattern type! It doesn't check if : any user's status matches, it checks if the users' statuses' : concatenated considered as atoms matches, totally wrong! I'm not so worried about extensibility as I am about consistency between compile time and run time, such that the optimizer can optimize without surprising the user. But this also argues against any anonymous Any on the right. If all the when expressions are Num, the optimizer wants to be able to evaluate the given in a numeric context whether or not you put + on the front of it. : So, I'm essentially asking for ~~ to be singly-dispatched based on its : right argument, which gets to pick how to interpret the left one. In : order to encourage this usage, I propose a design like so: : :role Pattern { :method match($item) {...} :} :# Code, Bool, Undef, Whatever, Num, Junction, Str, Hash, ... ... : all do Pattern : :sub *infix:<~~> ($item, Pattern $pat) { :$pat.match($item); :} One little problem here is that .match is currently defined the reverse of that for the method form of m//. : So to add a new pattern, you just implement the Pattern role, and you are : safe. : : If you are very attached do those two rows of the smart match table, : we can reimplement them as Pattern objects, at the expense of some : unreadability: : :given @array { :when 42 { ... } :} : : Becomes: : :given @array { :when contains 42 { ... } :} : : etc. Rather than special casing array matching, I'd say we want a general form for explicitly reversing the figure/ground. If .match demands a pattern on the left, then given @array { when .match(42) {...} } will already reverse the test to mean 42 ~~ @array. As I say, we need to figure out what to do with "foo".match(/bar/) though. Larry
Re: Patterns and junctions
--- Luke Palmer <[EMAIL PROTECTED]> wrote: [extremely large *SNIP*] > Maybe the "|"/"||" distinction isn't needed, and we just need a > declarator on rules that says they are side-effect-free, and can thus > be optimized. [snip] > I like this solution better than making a new operator. In Perl > tradition, it puts the optimization burdon on the compiler, not the > programmer. Agreed. Minimize the need for new grammar in the language, including operators, but provide a way that code can DWIM the same result. "If you build it, they will come". __ Do you Yahoo!? Yahoo! Tax Center - File online, calculators, forms, and more http://tax.yahoo.com
Re: Patterns and junctions
> --- "Adam D. Lopresto" <[EMAIL PROTECTED]> wrote: > > I propose that since the empty pattern is no longer legal (and > > about time), we use "|" in patterns to indicate alternation without > > preference, and "||" to indicate "try the first, then the second, > > etc". > > Hmm > A neat idea, but can you elaborate on when it would matter? It would matter a lot for optimization. If each alternative is a subrule, each of which calls a lot of other subrules, then you could gain a lot by not needing an ordered search through the possibilities. I like Adam's symbols, with the junction/short-circuit parallels. Literally. :) I'm currently working on Parse::OutsideIn, an experiment for larger (non-peephole) optimizations on P6 regexen. The idea is that it does a recursive-descent match until it finds a top-down-independent section of the grammar (very common), and then uses a compiled LR bottom-up method on that section. And of course, the "|"/"||" distinction could make that easier. > > So > > "cat dog fox" ~~ /( fox | dog | cat )/; > > might match any of the three, and in fact is quite likely to match > > cat (since scanning the string could be a very long process), but > > "cat dog fox" ~~ /( fox || dog || cat )/; > > would be guaranteed to first try fox, then if that fails backtrack > > and try dog, and if that fails backtract to try cat. The choice of > > symbols parallels the junction and short-circuiting or operators, > > and in most cases when an alternation is specified, the programmer > > has no preference among the alternatives (indeed, in most cases only > > would could match anyway), so it seems silly to force the engine to > > prefer the first one. > > I thought the requirement to prefer the first was largely an > optimization, so that it could bail when it had a match without > continuing to search. ("Wouldn't you know, it was in the *last* place I > looked for it!") > > Are you proposing a more DFA-ish behavior, maybe? > Some programs actually embad a DFA for a quick "does it match?" as > *well* as an NFA for pulling out backreferences once the DFA finds a > hit, but I doubt anybody is planning to embed a pre-matching DFA in > Perl (if so, somebody please tell me now). I am, in some guise. Here's another, related optimization I'm considering. I've noticed that Parse::RecDescent (through $RD_TRACE) spends a lot of time trying trees that would _never_ match... none of their subrules/grandsubrules could possibly. So, at grammar compile-time, I compute the closure (not the Perl kind) of each rule's initial state, to find all the terminals that could come up first. Then, at a high level, I try to match one of the terminals. If it did match, I cache the result and progress down the tree (perhaps guided by which one matched). If it didn't match, I fail the rule and don't even bother. For instance: statement: expression | declaration declaration: sub_declaration # ... lots of rules rooted at declaration sub_decl_keyword: /sub/ method_decl_keyword: /method/ # ... rules rooted at expression So at compile time, it would find the closure of C, which would turn out to be /sub/|/method/. It would match one of those, before trying C and if it didn't, it would move on to matching C. My theory is that, if P6 is going to be used for grammars, that little, local optimizations aren't going to help as much as these kinds. > > I'm imagining that in the first example, the implementation would > > probably build an FSA and process each letter as it comes, while the > > second would rely on backtracking. > > Sounds like DFA/NFA to me Or an FSA. You can't do grammars in a DFA. > > What think you all? I rather like the idea. Finally, I'll talk about the language implications of the idea So you're proposing that "|" be a "match order not guaranteed" match. There are some constructs that that doesn't like very much: m:w/ if then else | if then / One would argue that this would be a case where you would use "||". The times where match order matters in a recursive-descent match are the same as the times of a shift/reduce conflict in shift/reduce parsers. So what I'm looking for is some kind of guarantee that "|" could provide, instead of being entirely random. Junctions don't guarantee that they evaluate all their junctends, or that they do it in any particular order. But they do guarantee that the result would be the same (ignoring side-effects) as if they had. Maybe that's where our guarantee lies---failure side-effects. grammar X { rule A { B || C } rule B { (\w+) { if $1 == $naughty_word { print "No bad words!"; fail; } }} rule C {...} } grammar Y is X { rule A { B | C } } The difference between these two grammars is that X would be required to print "No bad words!" if C was the one that finally matched, while Y would no
Re: Patterns and junctions
--- "Adam D. Lopresto" <[EMAIL PROTECTED]> wrote: > I propose that since the empty pattern is no longer legal (and > about time), we use "|" in patterns to indicate alternation without > preference, and "||" to indicate "try the first, then the second, > etc". Hmm A neat idea, but can you elaborate on when it would matter? > So > "cat dog fox" ~~ /( fox | dog | cat )/; > might match any of the three, and in fact is quite likely to match > cat (since scanning the string could be a very long process), but > "cat dog fox" ~~ /( fox || dog || cat )/; > would be guaranteed to first try fox, then if that fails backtrack > and try dog, and if that fails backtract to try cat. The choice of > symbols parallels the junction and short-circuiting or operators, > and in most cases when an alternation is specified, the programmer > has no preference among the alternatives (indeed, in most cases only > would could match anyway), so it seems silly to force the engine to > prefer the first one. I thought the requirement to prefer the first was largely an optimization, so that it could bail when it had a match without continuing to search. ("Wouldn't you know, it was in the *last* place I looked for it!") Are you proposing a more DFA-ish behavior, maybe? Some programs actually embad a DFA for a quick "does it match?" as *well* as an NFA for pulling out backreferences once the DFA finds a hit, but I doubt anybody is planning to embed a pre-matching DFA in Perl (if so, somebody please tell me now). > I'm imagining that in the first example, the implementation would > probably build an FSA and process each letter as it comes, while the > second would rely on backtracking. Sounds like DFA/NFA to me > What think you all? Dunno. Would swell the footprint some, but might have some merit. DFA's are generally faster, I believe, but P5's NFA is optimized to a beautiful degree. Might be possible just by tuning the engine some more, but then the codebase is still swelling Dan? Parrot's pure NFA, right? __ Do you Yahoo!? Yahoo! Tax Center - File online, calculators, forms, and more http://tax.yahoo.com