Philip Hazel wrote:
> On Sat, 12 Sep 2009, Sheri wrote:
>
>   
>> Interesting, thanks for the detailed explanation. Seems odd however that 
>> a lookbehind version works in 7.9?
>>
>>   re> /(?<=abc)|(?<=def)/g
>> data> abcdefghi
>>  0:
>>  0:
>> data>
>>     
>
> Lookbehind is different! It finds the empty string at the point at which 
> it is starting the match. That is, in your example above, the match
> "bumpalong point" (to use Friedl's terminology) is 3. But if you use
>
>   /abc\K|def\K/g
>   
> the match happens when the bumpalong point is 0. The lookbehind version 
> does indeed work in 7.9. Having found the empty match at offset 3, it 
> fails to find a non-empty match, and so moves on by one character. 
> Matches then fail until it reaches offset 6, at which point it finds the 
> def match.
>
> With \K, however, doing the same thing doesn't work. Having found the 
> empty match, it looks for a non-empty match starting at offset 3, and 
> fails. So it moves to offset 4, and so misses the next match. That's why 
> it has to ignore only the empty match at offset 3 itself, not one that 
> is found later in the string. Is that clear? I know it's tricky!
>
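
To check that I follow, here is a rough toy model of the two cases. It is only a sketch and does not call PCRE at all: the TOY_* flags, toy_exec() and toy_global() are names I made up, the two patterns are hard-coded, and the loop is my assumption of how a typical /g-style caller retries after an empty match.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-ins for the option bits; NOT the real pcre.h values. */
enum { TOY_NOTEMPTY = 1, TOY_NOTEMPTY_ATSTART = 2, TOY_ANCHORED = 4 };

/* The two hard-coded toy patterns. */
enum toy_pattern {
    TOY_LOOKBEHIND,  /* behaves like (?<=abc)|(?<=def) */
    TOY_KEEP         /* behaves like abc\K|def\K       */
};

/* True if the three-character word occurs in s at offset pos. */
static int word_at(const char *s, size_t len, size_t pos, const char *word)
{
    return pos + 3 <= len && memcmp(s + pos, word, 3) == 0;
}

/* One attempt at bumpalong point p; on success *m is where the empty match is reported. */
static int try_at(enum toy_pattern pat, const char *s, size_t len, size_t p, size_t *m)
{
    if (pat == TOY_KEEP) {
        if (word_at(s, len, p, "abc") || word_at(s, len, p, "def")) {
            *m = p + 3;              /* \K moves the reported start past the word */
            return 1;
        }
        return 0;
    }
    if (p >= 3 && (word_at(s, len, p - 3, "abc") || word_at(s, len, p - 3, "def"))) {
        *m = p;                      /* lookbehind matches at the bumpalong point itself */
        return 1;
    }
    return 0;
}

/* Toy matcher: both patterns only ever match the empty string. */
static int toy_exec(enum toy_pattern pat, const char *s, size_t len,
                    size_t start, int options, size_t *m)
{
    assert(start <= len);
    for (size_t p = start; p <= len; p++) {
        size_t pos;
        if (try_at(pat, s, len, p, &pos)) {
            /* NOTEMPTY rejects every empty match; ATSTART only one at 'start'. */
            int rejected = (options & TOY_NOTEMPTY) ||
                           ((options & TOY_NOTEMPTY_ATSTART) && pos == start);
            if (!rejected) { *m = pos; return 1; }
        }
        if (options & TOY_ANCHORED) break;
    }
    return 0;
}

/* /g-style loop: after an empty match, retry anchored at the same offset with
   retry_flag set; if that fails, move on by one character and match normally. */
static size_t toy_global(enum toy_pattern pat, const char *s, int retry_flag,
                         size_t *out, size_t max)
{
    size_t len = strlen(s), start = 0, n = 0;
    int options = 0;
    while (start <= len && n < max) {
        size_t m;
        if (toy_exec(pat, s, len, start, options, &m)) {
            out[n++] = m;
            start = m;
            options = retry_flag | TOY_ANCHORED;
        } else if (options == 0) {
            break;                   /* a normal attempt failed: no more matches */
        } else {
            start++;                 /* the retry failed: bump along by one */
            options = 0;
        }
    }
    return n;
}
```

If I have modelled it right, on "abcdefghi" the lookbehind pattern reports offsets 3 and 6 with either retry flag, while the \K pattern reports only offset 3 with TOY_NOTEMPTY and both 3 and 6 with TOY_NOTEMPTY_ATSTART.
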
>   
>> I understand why you are making another option, but it sounds like as a 
>> result all user apps that do multiple matching (and the C++ module) will 
>> need to be modified to benefit. In fact if using a shared library, it 
>> will need to be processed one way if using version 8.0 or later and 
>> another if using 7.9 and earlier.
>>     
>
> Aarrgghh. I had not thought of that. What I *had* thought of was that 
> people might be using PCRE_NOTEMPTY for completely different purposes, 
> and I did not want to break their applications.
>
>   
>> Have you considered giving the new option value to the old functionality?
>>     
>
> Oh dear, oh dear, this looks like I have to make a judgement as to which 
> change will annoy the fewest people, remembering that the problem arises 
> only if \K is used in a pattern that matches an empty string. Something 
> like  
>
>   /ab\Kc|de\Kf/
>   
> works fine in 7.9. I take your point about shared libraries etc. I am 
> now in a quandary as to what is the best way to proceed.
>
> Anybody else on this list got any ideas? If I do as Sheri suggests, and 
> give PCRE_NOTEMPTY_ATSTART the bit value that was PCRE_NOTEMPTY, and 
> give PCRE_NOTEMPTY a new value, pre-compiled applications that use a 
> shared library would automatically move to the new functionality, but 
> any that actually wanted the PCRE_NOTEMPTY functionality would go wrong.
>
> [At this point, I went away and played Mozart (string quartet) for an
> hour. Helps clear the brain...]
>
> Hmm... having thought about this for a bit, I think I am going to stick 
> with things the way they are. This is my reasoning:
>
> * If I change the functionality of the existing options bit, programs 
> that use PCRE_NOTEMPTY for completely independent purposes (nothing to do 
> with /g-style processing) will suddenly break on a PCRE upgrade.
>
> * On the other hand, programs which at present use PCRE_NOTEMPTY for 
> /g-style processing are not working properly in the presence of \K 
> patterns that match empty strings, a relatively rare situation (at 
> least, that's my guess).
>
> It seems to me to be better to choose the action that allows people to
> improve the behaviour of their programs that are not working quite right
> over an action that breaks programs that are working well.
>
> I take the point about PCRE versions. But if programmers are changing 
> their programs anyway, and want to remain compatible with previous 
> releases, it isn't too hard to do something like
>
> #if PCRE_MAJOR >= 8
> options |= PCRE_NOTEMPTY_ATSTART;  /* new option, 8.0 and later */
> #else
> options |= PCRE_NOTEMPTY;          /* only choice in 7.9 and earlier */
> #endif
>
> which, in any case, is exactly the same as what you have to do for any 
> new option that is added.
>
> Philip
>
>   
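
For the shared-library case, the #if only reflects the headers the program was built against. One possible alternative, purely as a sketch, is to decide at run time from the string that pcre_version() returns (something like "7.9 2009-04-11"). The helper name and the MY_* constants below are made up for illustration; a real program would use the values from pcre.h, and one built against 7.9 headers would have to supply the new bit value itself.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical placeholders; a real program would use the pcre.h constants. */
enum { MY_NOTEMPTY = 1, MY_NOTEMPTY_ATSTART = 2 };

/* version: a string that begins with the major number, e.g. "7.9 2009-04-11".
   Returns the option to set when retrying after an empty match. */
static int retry_option_for(const char *version)
{
    assert(version != NULL);
    long major = strtol(version, NULL, 10);   /* parsing stops at the '.' */
    return major >= 8 ? MY_NOTEMPTY_ATSTART : MY_NOTEMPTY;
}
```

A caller would pass pcre_version() as the argument. This does nothing for applications that are already compiled, so it does not remove the need to modify them.
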
Will you or Craig be implementing the fix in the cpp wrapper?

Regards,
Sheri

-- 
## List details at http://lists.exim.org/mailman/listinfo/pcre-dev 
