On Sat, 2 Sep 2000 15:16:20 -0400, Peter Heslin wrote:
>> This looks more natural to me:
>>
>> /(?`!G+A+T+)GA+C/
>Your version is closer to the way lookbehind works now, so this syntax
>might be thought to be clearer; I should add to the RFC an explicit
>note about this.
Look at your original requirement:
>>If you want to match text that matches /GA+C/, but not when it
>>follows /G+A+T+/,
I find that "my" syntax most closely matches this requirement.
The reason I find it clearer is that it states WHAT it should
match, not HOW it should match.
>Perhaps the same functionality I want could be
>achieved with your syntax. Would /(?`!G+A+T+)(?:GA+)C/ mean "Match GA+,
>then do the lookbehind, then match C"? Is so, I'd be happy with that.
I have the feeling that you're looking too closely at the implementation
details. Lookbehind of a more complicated kind, like this one, should
*automatically* be postponed until a point where it makes sense. At the
very least, the engine should find a "G", or even "GA", before even
considering whether it is preceded by a "T", let alone by something
that matches /G+A+T+/.
But that is not your problem; it is a problem of regex optimization.
I feel that your originally proposed syntax is weird. Look at this
variation:
/GA+(?:C(?!(?r)T+A+G+)|T(?!(?r)G+A+C+))/
which says: /GA+C/ but not preceded by /T+A+G+/, or /GA+T/ but not
preceded by /G+A+C+/.
The discrepancy between *where* this is specified and where it
should match really bugs me.
Here's my version:
/(?`!T+A+G+)GA+C|(?`!G+A+C+)GA+T/
You gain nothing, and you lose clarity.
p.s. /(?`!T+A+G+)/ does not mean "must match something that doesn't
match /T+A+G+/". Instead, if something, *anything*, immediately before
this point can match /T+A+G+/, the whole match fails.
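To make those semantics concrete, here is a minimal Python sketch (an
illustration only, not part of either proposal). Python's re module, like
Perl today, only accepts fixed-width lookbehind, so the sketch checks the
lookbehind by hand: for each candidate start of the main pattern, it searches
the text before that point for the lookbehind pattern anchored at the end.
The helper name match_not_preceded_by is hypothetical.

```python
import re

def match_not_preceded_by(text, main, behind):
    """Hypothetical helper: return (start, end) spans where `main` matches,
    but only when NO string ending right at the start position matches
    `behind`. This emulates a variable-length negative lookbehind such as
    the proposed /(?`!G+A+T+)GA+C/."""
    main_re = re.compile(main)
    # Anchor the lookbehind pattern at the end of the prefix: if ANY
    # suffix of the preceding text matches, the candidate is rejected.
    behind_re = re.compile("(?:" + behind + r")\Z")
    spans = []
    i = 0
    while i <= len(text):
        m = main_re.match(text, i)  # try the main pattern at position i
        if m and not behind_re.search(text[:i]):
            spans.append(m.span())
            # Continue after the match (step by 1 on an empty match).
            i = m.end() if m.end() > i else i + 1
        else:
            i += 1
    return spans
```

For example, match_not_preceded_by("AGATGAC", r"GA+C", r"G+A+T+") finds
nothing: the prefix "AGAT" as a whole doesn't match /G+A+T+/, but its
suffix "GAT" does, and that alone is enough to make the match fail.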
--
Bart.