Re: RFC 166 (does-not-match)

Richard Proctor Tue, 29 Aug 2000 11:12:38 -0700
On Tue 29 Aug, Mark-Jason Dominus wrote:
> 
> Richard Proctor's RFC166 says:
> 
> > =head2 Matching Not a pattern
> > 
> > (?^pattern) matches anything that does not match the pattern.  On
> > its own, one can use !~ etc to negatively match patterns, but to
> > match a pattern that has foo(anything but not baz)bar is currently
> > difficult.  With this syntax it would simply be /foo(?^baz)bar/.
> 
> The problem with this proposal is that it's really unclear what it
> means.

This is going to need a much better definition...

> 
> The reason we don't have this feature today is not that it has never
> been thought of before.  People have thought of this a hundred times.
> The problem is that nobody has ever figured out how it should work.
> I don't mean that the implementation is difficult. I mean that nobody
> understands what such a feature actually means.  Richard doesn't say
> this in his RFC, even for the simple examples he raises.  He just
> assumes that it will be obvious, but it isn't.  
> 
>         "foo-bazbar"  =~ /foo(?^baz)bar/    # true or false?
>         "foo-baz-bar" =~ /foo(?^baz)bar/    # true or false?

The simple answer is that both are false.

> OK, I'm going to try to invent a meaning for (?^baz).  I'm going to
> choose what appears to be a reasonable choice, and see what happens.
> 
> Let's suppose that what (?^baz) means is "match any substring that is
> not 'baz'."  That is a reasonably clear meaning.  Then it behaves like
> (.*)(?{$1 ne 'baz'}) does today.  Then the examples above are both
> true.

No, your example is wrong: it should behave as (.*)(?{$1 !~ /baz/}), so both
examples are false.  (?^foo) matches any substring that does not match the
pattern foo.
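
For concreteness, a minimal sketch of how this reading might be approximated
in current Perl.  It assumes (?^baz) can be stood in for by a
lookahead-guarded run, (?:(?!baz).)*, which is only an approximation of the
definition above (the lookahead can peek past the end of the gap); the sub
name foo_notbaz_bar is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: approximates /foo(?^baz)bar/ under the reading
# "the text between foo and bar must not contain baz".
# (?:(?!baz).)* consumes one character at a time and refuses to step
# onto any position where "baz" begins.
sub foo_notbaz_bar {
    my ($str) = @_;
    return $str =~ /foo((?:(?!baz).)*)bar/s ? 1 : 0;
}

for my $s ("foo-bazbar", "foo-baz-bar", "foo-quux-bar") {
    printf "%-14s %s\n", $s,
        foo_notbaz_bar($s) ? "matches" : "does not match";
}
```

Under this approximation the two strings from the quoted examples do not
match, while a string such as "foo-quux-bar" does.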

> 
> Now let's see how that choice works out.
> 
>         "foobaz" =~ /foo.*(?^baz)/
> 
> This is TRUE, because "foo" matches "foo", ".*" matches "baz", and
> "(?^baz)" matches the empty string at the end, which is a substring
> that is not "baz".

This is a traditional problem with a greedy .*; it does, however, raise
the question: is (?^baz) greedy?  I think the right answer is that it should
not be (but I am open to debate on that).

> 
> In fact, with this apparently reasonable choice of meaning for
> (?^baz), /foo.*(?^baz)/ will match anything that /foo.*/ will.  The
> (?^baz) has hardly any effect at all.

With a greedy .* the (?^baz) has no effect, unless something follows that
has to be matched.
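
The swallowing effect can be seen with today's regex engine.  The sketch
below again assumes a lookahead-guarded run as a stand-in for (?^baz); the
names $not_baz and split_match are hypothetical and not part of the RFC.

```perl
use strict;
use warnings;

# Hypothetical stand-in for (?^baz): a run of characters in which
# "baz" never begins.
my $not_baz = qr/(?:(?!baz).)*/s;

# Returns the text taken by .* and by the stand-in,
# or an empty list if the match fails.
sub split_match {
    my ($str) = @_;
    return $str =~ /foo(.*)($not_baz)/s ? ($1, $2) : ();
}

my ($dot_star, $negated) = split_match("foobaz");
print "'.*' took '$dot_star', the stand-in took '$negated'\n";
```

Here the greedy .* consumes "baz" and the stand-in is left with the empty
string, so it places no constraint on the match.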

> 
> It is a good thing that we did not implement it that way, because it
> is sure to become an instant FAQ:  "Why does /foo.*(?^baz)/ match
> 'foobaz'?"  You are going to see this question in comp.lang.perl.misc
> every week.

I think one should outlaw .* before or after a (?^foo) construct, as
the result is meaningless.

> 
> So this choice I made for the meaning of (?^baz) appears to have been
> the wrong one. I could go on and make a different reasonable-seeming
> choice and show what was wrong with it, but I don't want to belabor my
> point, which is:
> 
> Every choice anyone has ever made for the meaning of (?^baz) has
> always been the wrong one for one reason or another.  So without a
> detailed explanation of what (?^baz) might mean, suggesting that Perl
> 6 have one is not helpful.  

I can tighten the definition up.  If there have been calls for a 
(?^baz) type construct before, there will be again.  It is a matter of
getting the definition straightforward and useable.

Richard
