Re: Hypothetical synonyms
Aaron Sherman wrote at Wed, 28 Aug 2002 00:34:15 +0200:

    $stuff = (defined($1)?$1:$2) if /^\s*(?:(.*?)|(\S+))/;

It gives me the idea of a missing feature. What really should be expressed is:

    my ($stuff) = /^\s*(°.*?°|\S+)/;

where the ° character would mean "don't capture the previous element".

I think that a way of uncapturing elements of a regexp would be really nice, as it would help to express things directly instead of in complicated ways.

The ° character doesn't have any special meaning; that's why I chose it in the above example. However, it also looks like a small capture symbol, and as it isn't filled in, it could well symbolize uncapturing.

I don't know how hard it would be to implement or whether it has already been discussed.

Greetings,
Janek
Capturing alternations (was Re: Hypothetical synonyms)
In a message dated Thu, 29 Aug 2002, Janek Schleicher writes:

Aaron Sherman wrote at Wed, 28 Aug 2002 00:34:15 +0200:

    $stuff = (defined($1)?$1:$2) if /^\s*(?:(.*?)|(\S+))/;

It gives me the idea of a missing feature. What really should be expressed is:

    my ($stuff) = /^\s*(°.*?°|\S+)/;

where the ° character would mean "don't capture the previous element".

Hmm. One thing that has always bothered me about regexes is capturing parentheses in alternations. It seems to me that:

    my ($stuff) = /^\s* [ (.*?) | (\S+) ]/;

should DWIM somehow, since it's impossible that both parens will capture. So when the same number of capturing parens appear in each branch of an alternation, they should factor out to being a single return value.

Is this possible in the general case?

Trey
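For comparison, here is a minimal sketch of the workaround being discussed, written in Python rather than in the Perl 6 syntax still under design in this thread. The pattern (a double-quoted string or a bare word) and the helper name `first_token` are assumptions made for illustration only. `lastindex` is the standard attribute of Python match objects that gives the index of the last capturing group that took part in the match, so it plays roughly the role of the `defined($1)?$1:$2` test.

```python
import re

# Hypothetical pattern: either a double-quoted string or a bare word.
# Only one of the two capturing groups can take part in any given match.
TOKEN = re.compile(r'^\s*(?:"(.*?)"|(\S+))')


def first_token(text):
    """Return whichever alternative captured, or None if nothing matched."""
    m = TOKEN.match(text)
    if m is None:
        return None
    # lastindex is the number of the last group that matched, so this
    # picks group 1 for the quoted branch and group 2 for the bare branch.
    return m.group(m.lastindex)
```

The caller gets a single value back regardless of which branch matched, which is the "factor out to a single return value" behaviour asked for above, achieved by hand.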
Re: Capturing alternations (was Re: Hypothetical synonyms)
Piers wrote:

Not exactly DWIM, but how about:

    my $stuff = /^\s* [ (.*?) | (\S+) ] : { $foo := $+ }/;

Assuming $+ means 'the last capture group matched' as it does now.

Or just:

    my $stuff = /^\s* [ $foo:=(.*?) | $foo:=(\S+) ]/;

BTW, that doesn't actually *do* the match. It merely puts a reference to a rule object into $stuff. Perhaps we all actually meant variants on:

    my $stuff = m/^\s* [ $0:=(.*?) | $0:=(\S+) ]/;

???

Damian
Re: Does ::: constrain the pattern engine implementation?
On Wed, Aug 28, 2002 at 10:36:54AM -0700, Larry Wall wrote:

That is a worthy consideration, but expressiveness takes precedence over it in this case. DFAs are really only good for telling you *whether* and *where* a pattern matches as a whole. They are relatively useless for telling you *how* a pattern matches.

Actually, DFAs can also tell you *how* a pattern matches. For instance the RE library (http://www.sourceforge.net/projects/libre/) is DFA-based and supports a lot of Perl 5 regular expression features:

- submatches
- left-most matching semantics
- greedy and non-greedy operators (*, *?, ?, ??)
- zero-length assertions such as ^, $, \b.

I don't know how to implement back-references and embedded Perl code using a DFA. Look-ahead and look-behind assertions can probably be implemented, though this looks tricky. But these features are not used that often. So, I believe that most real-life Perl 5 regular expressions can be handled by a DFA.

Add to that the fact that most real-life patterns don't generally do much backtracking, because they're written to succeed, not to fail. This pattern never backtracks, for instance:

    my ($num) = /^Items: (\d+)/;

Another advantage of DFAs over NFAs is that the core of a DFA-based pattern matching implementation is a small simple loop which is executed very efficiently by modern CPUs. On the other hand, the core of an NFA-based implementation is much larger and much more complex, and I'm not sure it is executed as efficiently. In particular, there are probably a lot more branch mispredictions.

As an example of what performance improvement one can sometimes achieve using a DFA-based rather than an NFA-based implementation, here are the results I get on my computer for the regular expression benchmark from the Great Computer Language Shootout (http://www.bagley.org/~doug/shootout/bench/regexmatch/):

    Perl 5                         3.49s
    OCaml with PCRE (NFA-based)    3.17s
    OCaml with RE (DFA-based)      0.34s

-- Jerome
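To make the "small simple loop" concrete, here is a hedged sketch in Python of a table-driven DFA for the example pattern `^Items: (\d+)`. It is not how the RE library is implemented; the table layout and the names `build_table` and `match_items` are assumptions for illustration. It shows only that each input character is examined once, with no backtracking, and that the submatch can be recovered by remembering where the digit state was entered.

```python
PREFIX = "Items: "
DIGITS = "0123456789"


def build_table():
    """Build a (state, char) -> state transition table for ^Items: \\d+."""
    table = {}
    # States 0..6 consume the literal prefix one character at a time.
    for state, ch in enumerate(PREFIX):
        table[(state, ch)] = state + 1
    first_digit = len(PREFIX)   # state reached once the prefix is consumed
    accept = first_digit + 1    # state reached after at least one digit
    for d in DIGITS:
        table[(first_digit, d)] = accept
        table[(accept, d)] = accept
    return table, first_digit, accept


TABLE, FIRST_DIGIT, ACCEPT = build_table()


def match_items(text):
    """Return the captured digits, or None if the pattern does not match."""
    state = 0
    start = None
    end = 0
    for pos, ch in enumerate(text):
        nxt = TABLE.get((state, ch))
        if nxt is None:
            break               # no transition: stop, never back up
        if state == FIRST_DIGIT:
            start = pos         # remember where the submatch begins
        state = nxt
        end = pos + 1
    if state != ACCEPT:
        return None
    return text[start:end]
```

A production engine would use dense arrays instead of a dictionary and a general scheme for tracking submatches; the point here is only the shape of the inner loop.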
backtracking into { code }
A question: Do rules matched in a { code } block set backtrack points for the outer rule? For example, are these rules equivalent?

    rule expr1 { term { /operators/ or fail } term }
    rule expr2 { term operators term }

And a comment: It would be nice to have procedural control over backtracking so that { code } can fail, succeed (not fail), or succeed and commit. Right now we can follow { code } with ::, :::, etc. but that does not allow much control. I'm a little afraid of what happens in an LL(Inf) grammar if backtracking states aren't aggressively pruned.

- Ken
Re: Hypothetical synonyms
The ° character doesn't have any special meaning; that's why I chose it in the above example. However, it also looks like a small capture symbol, and as it isn't filled in, it could well symbolize uncapturing.

Interesting idea. I'm not sure if I agree with it yet. However, I don't agree with your syntax, as I can't type that character.

Is it possible to modify what was captured?

    / ([ \\ . { chop; chop } | [^\\] ]*?) /

Or is that just too ugly?

Luke
Re: backtracking into { code }
On Thu, 2002-08-29 at 08:05, Ken Fox wrote:

A question: Do rules matched in a { code } block set backtrack points for the outer rule? For example, are these rules equivalent?

    rule expr1 { term { /operators/ or fail } term }
    rule expr2 { term operators term }

And a comment: It would be nice to have procedural control over backtracking so that { code } can fail, succeed (not fail), or succeed and commit. Right now we can follow { code } with ::, :::, etc. but that does not allow much control. I'm a little afraid of what happens in an LL(Inf) grammar if backtracking states aren't aggressively pruned.

Well, if /.../ is returning a result object (let's say CORE::RX::Result), then I would imagine it's an easy enough thing to let you create your own, or return the one from a rule that you invoke, e.g.:

    rule { term { /operators/.commit(1) or fail } term }

The hypothetical commit() method being one that would take a number and modify the result object so that it commits as if you had used that many colons.

{} inside a rule would, I imagine, be implemented like so:

    sub rxbraces ($code) {
        my $stat = $code();
        if $stat.isa(CORE::RX::Result) {
            return $stat;
        } else {
            my $r is CORE::RX::Result;
            $r.success($stat); # Boolean status-setting method
            return $r;
        }
    }

Or the moral equivalent. In other words, it should be able to return a result of your choosing.

Sorry if I've missed some of the design. My Perl 6 pseudo-code may not be legal.
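The wrapping idea in the pseudo-code above can be tried out in a runnable form. The following Python sketch is an assumption-laden stand-in, not Perl 6: the `Result` class, its `commit_level` field and the `commit()` method mirror the hypothetical CORE::RX::Result and commit() from this message and do not correspond to any existing API.

```python
class Result:
    """Hypothetical stand-in for the CORE::RX::Result object."""

    def __init__(self, success, commit_level=0):
        self.success = bool(success)
        self.commit_level = commit_level  # number of colons to emulate

    def commit(self, level):
        """Record a commit level and return self so calls can be chained."""
        self.commit_level = level
        return self

    def __bool__(self):
        # Lets a result be used in 'result or fail' style tests.
        return self.success


def rxbraces(code):
    """Run a code block and make sure the caller gets a Result back."""
    stat = code()
    if isinstance(stat, Result):
        return stat          # the block chose its own result object
    return Result(stat)      # otherwise wrap the plain truth value
```

Because `commit()` returns the result itself, an expression shaped like `/operators/.commit(1) or fail` keeps the success or failure of the inner match while attaching the commit level to it.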
Re: backtracking into { code }
Aaron Sherman wrote:

    rule { term { /operators/.commit(1) or fail } term }

The hypothetical commit() method being one that would take a number and

That would only be useful if the outer rule can backtrack into the inner /operators/ rule. Can it?

I agree with you that a commit method would be useful -- especially when used on $self. I'd probably write your example as:

    rule { term { m/operators { $self.commit(1) }/ or fail } term }

which is of course just a complicated

    rule { term { m/operators :/ or fail } term }

BTW, why isn't fail a method? Then a rule could pass itself to a sub-rule and allow the sub-rule to fail its parent, but not the entire match. Isn't failing just invoking the last continuation on the backtrack stack?

- Ken
Re: auto deserialization
From: Dan Sugalski

I actually had something a bit more subversive in mind, where the assignment operator for the Date class did some magic the same way we do now when we do math on strings.

I was thinking of a simple general-purpose rule. If the variable is typed, and its class has a standard static method for instantiating from a string, and if a String object is being assigned to the variable, then the class's deserialization method is called, returning the new object and assigning it to the variable.
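The general-purpose rule proposed here can be sketched in Python, with a descriptor playing the part of the typed variable. Everything named below (`Typed`, `from_string`, the `Date` fields, the `Event` class and the `YYYY-MM-DD` format) is an assumption made for illustration; the thread does not specify a method name or a string format.

```python
class Typed:
    """Descriptor emulating a typed variable with auto-deserialization."""

    def __init__(self, cls):
        self.cls = cls

    def __set_name__(self, owner, name):
        self.slot = "_" + name

    def __get__(self, obj, objtype=None):
        if obj is None:
            return self
        return getattr(obj, self.slot)

    def __set__(self, obj, value):
        # The proposed rule: a string assigned to a typed slot is handed
        # to the class's standard from-string constructor, if it has one.
        if isinstance(value, str) and hasattr(self.cls, "from_string"):
            value = self.cls.from_string(value)
        if not isinstance(value, self.cls):
            raise TypeError("expected " + self.cls.__name__)
        setattr(obj, self.slot, value)


class Date:
    def __init__(self, year, month, day):
        self.year, self.month, self.day = year, month, day

    @classmethod
    def from_string(cls, text):
        # Hypothetical format: YYYY-MM-DD.
        year, month, day = (int(part) for part in text.split("-"))
        return cls(year, month, day)


class Event:
    when = Typed(Date)
```

With these definitions, `Event().when = "2002-08-29"` stores a `Date` object, while assigning a value that is neither a string nor a `Date` raises `TypeError`.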
Re: backtracking into { code }
On Thu, 2002-08-29 at 10:28, Ken Fox wrote:

Aaron Sherman wrote:

    rule { term { /operators/.commit(1) or fail } term }

The hypothetical commit() method being one that would take a number and

That would only be useful if the outer rule can backtrack into the inner /operators/ rule. Can it?

Of course not. In the same way that

    rule foo { b }
    rule bar { a foo+ b }
    abb =~ /bar/

would not. You backtrack OVER it, and that's when your commit (of whatever degree) would come into play.

I agree with you that a commit method would be useful -- especially when used on $self. I'd probably write your example as:

    rule { term { m/operators { $self.commit(1) }/ or fail } term }

which is of course just a complicated

    rule { term { m/operators :/ or fail } term }

There's no way that can affect anything, as : doesn't affect calling rules, e.g.:

    rule foo { b : }
    rule bar { a foo+ b }
    abb =~ /bar/

will match, because the foo rule never needs to backtrack. If foo had used <commit>, then you'd fail, but that's a horse of a different animal.

The goal was to dynamically cause backtracking over inline code to fail.
Re: auto deserialization
On Thu, Aug 29, 2002 at 07:52:42AM -0700, Steve Canfield wrote:

From: Dan Sugalski

I actually had something a bit more subversive in mind, where the assignment operator for the Date class did some magic the same way we do now when we do math on strings.

I was thinking of a simple general-purpose rule. If the variable is typed, and its class has a standard static method for instantiating from a string, and if a String object is being assigned to the variable, then the class's deserialization method is called, returning the new object and assigning it to the variable.

This is possibly more an internals question, but I was assuming that the serialization/deserialization methods would normally be converting an object to an efficient packed 8-bit binary serial format (much like Storable does). In which case, is it a counterproductive assumption to expect (or mandate) that the incoming serialization method on a class accepts well-formed, human-readable Unicode (or Shift-JIS or ASCII or whatever) strings?

Surely a class is allowed to make a distinction between the format that it uses to serialize itself, and the format(s) of initialization strings it accepts?

Nicholas Clark
Re: Hypothetical synonyms
Luke Palmer wrote at Thu, 29 Aug 2002 15:21:57 +0200:

The ° character doesn't have any special meaning; that's why I chose it in the above example. However, it also looks like a small capture symbol, and as it isn't filled in, it could well symbolize uncapturing.

Interesting idea. I'm not sure if I agree with it yet. However, I don't agree with your syntax, as I can't type that character.

Yeah, that's of course a problem. But I can't think of any other typeable character with no other meaning that could be chosen.

Is it possible to modify what was captured?

    / ([ \\ . { chop; chop } | [^\\] ]*?) /

Or is that just too ugly?

IMHO, that looks as ugly as the other workaround solutions :-)

I think the greatest strength of Perl is that it expresses simple things in a simple, short and natural way. Such a regexp behaviour would simplify a lot of jobs where we have to use workarounds instead of just saying the simple thing: "Match it, capture the relevant parts and ignore some irrelevant subparts." It's always possible to implement it with

- more captures, joined together later, or
- a substitution regexp/transliteration on the captured part to remove the irrelevant subparts.

IMHO it is comparable to the "group it, but don't capture it" problem, which was solved with the (?:) syntax.

In that regard, a (?_...) (question mark underscore) syntax could also be an idea, with the meaning "group it, don't capture it, not even in surrounding captures". With it, the OP's problem would look like:

    /\s*((?_).*?(?_°)|\S+)/;

(I chose the underscore as it is typeable and could have the mnemonic meaning of some underlying, unimportant background group.)

But perhaps I'm only dreaming.

Cheerio,
Janek
Re: Hypothetical synonyms
Don't forget you can parameterize rules with subrules. I don't see any reason you couldn't write a pick (.*?) | (\S+) kind of rule and do whatever you like with the submatched bits. Larry
Re: declaring if and while (was: rule, rx and sub)
On Thu, 29 Aug 2002, Thomas A. Boyer wrote: : Am I getting this straight? As straight as any of us are getting it thus far. :-) The process is intended to be convergent. That doesn't guarantee it will converge, but that's the intention. When I'm playing golf, I always expect to knock the ball into the hole. And I'm happy if the ball ends up closer to the hole than it was. Larry