Philip Hazel wrote:
> On Sat, 12 Sep 2009, Sheri wrote:
>
>> Interesting, thanks for the detailed explanation. Seems odd however that
>> a lookbehind version works in 7.9?
>>
>> re> /(?<=abc)|(?<=def)/g
>> data> abcdefghi
>>  0:
>>  0:
>> data>
>>
>
> Lookbehind is different! It finds the empty string at the point at which
> it is starting the match. That is, in your example above, the match
> "bumpalong point" (to use Friedl's terminology) is 3. But if you use
>
>   /abc\K|def\K/g
>
> the match happens when the bumpalong point is 0. The lookbehind version
> does indeed work in 7.9. Having found the empty match at offset 3, it
> fails to find a non-empty match, and so moves on by one character.
> Matches then fail until it reaches offset 6, at which point it finds the
> def match.
>
> With \K, however, doing the same thing doesn't work. Having found the
> empty match, it looks for a non-empty match starting at offset 3, and
> fails. So it moves to offset 4, and so misses the next match. That's why
> it has to ignore only the empty match at offset 3 itself, not one that
> is found later in the string. Is that clear? I know it's tricky!
>
>
>> I understand why you are making another option, but it sounds like as a
>> result all user apps that do multiple matching (and the C++ module) will
>> need to be modified to benefit. In fact if using a shared library, it
>> will need to be processed one way if using version 8.0 or later and
>> another if using 7.9 and earlier.
>>
>
> Aarrgghh. I had not thought of that. What I *had* thought of was that
> people might be using PCRE_NOTEMPTY for completely different purposes,
> and I did not want to break their applications.
>
>
>> Have you considered giving the new option value to the old functionality?
>>
>
> Oh dear, oh dear, this looks like I have to make a judgement as to which
> change will annoy the fewest people, remembering that the problem arises
> only if \K is used in a pattern that matches an empty string.
> Something like
>
>   /ab\Kc|de\Kf/
>
> works fine in 7.9. I take your point about shared libraries etc. I am
> now in a quandary as to what is the best way to proceed.
>
> Anybody else on this list got any ideas? If I do as Sheri suggests, and
> give PCRE_NOTEMPTY_ATSTART the bit value that was PCRE_NOTEMPTY, and
> give PCRE_NOTEMPTY a new value, pre-compiled applications that use a
> shared library would automatically move to the new functionality, but
> any that actually wanted the PCRE_NOTEMPTY functionality would go wrong.
>
> [At this point, I went away and played Mozart (string quartet) for an
> hour. Helps clear the brain...]
>
> Hmm... having thought about this for a bit, I think I am going to stick
> with things the way they are. This is my reasoning:
>
> * If I change the functionality of the existing options bit, programs
>   that use PCRE_NOTEMPTY for completely independent purposes (nothing
>   to do with /g-style processing) will suddenly break on a PCRE upgrade.
>
> * On the other hand, programs which at present use PCRE_NOTEMPTY for
>   /g-style processing are not working properly in the presence of \K
>   patterns that match empty strings, a relatively rare situation (at
>   least, that's my guess).
>
> It seems to me to be better to choose the action that allows people to
> improve the behaviour of their programs that are not working quite right
> over an action that breaks programs that are working well.
>
> I take the point about PCRE versions. But if programmers are changing
> their programs anyway, and want to remain compatible with previous
> releases, it isn't too hard to do something like
>
>   #if PCRE_MAJOR >= 8
>     options |= PCRE_NOTEMPTY_ATSTART;
>   #else
>     options |= PCRE_NOTEMPTY;
>   #endif
>
> which, in any case, is exactly the same as what you have to do for any
> new option that is added.
>
> Philip
>

Will you or Craig be implementing the fix in the cpp wrapper?

Regards,
Sheri

--
## List details at http://lists.exim.org/mailman/listinfo/pcre-dev
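
For anyone following the thread, the difference between the two options in a /g-style loop can be illustrated with a small self-contained sketch. It does not call PCRE at all: toy_exec() is a hypothetical stand-in hard-wired to the pattern abc\K|def\K, and the TOY_* flags and toy_global() are invented names that only mimic PCRE_ANCHORED, PCRE_NOTEMPTY and PCRE_NOTEMPTY_ATSTART as they are described above. The loop follows the retry-then-advance procedure from the discussion and is an assumption about how an application might be structured, not a copy of any real program.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical flags for this toy model; they only mimic the PCRE options. */
enum {
    TOY_ANCHORED         = 1,  /* try the start offset only, no bumpalong */
    TOY_NOTEMPTY         = 2,  /* reject every empty match */
    TOY_NOTEMPTY_ATSTART = 4   /* reject an empty match only at the start offset */
};

/* Toy matcher hard-wired to abc\K|def\K. On success it reports the empty
   string just after "abc" or "def". Returns 1 and sets *ms and *me on a
   match, 0 otherwise. */
static int toy_exec(const char *s, size_t len, size_t start, int flags,
                    size_t *ms, size_t *me)
{
    for (size_t pos = start; pos <= len; pos++) {      /* bumpalong */
        if (pos + 3 <= len &&
            (memcmp(s + pos, "abc", 3) == 0 ||
             memcmp(s + pos, "def", 3) == 0)) {
            size_t at = pos + 3;       /* \K resets the reported start */
            int empty_ok = !(flags & TOY_NOTEMPTY) &&
                           !((flags & TOY_NOTEMPTY_ATSTART) && at == start);
            if (empty_ok) { *ms = *me = at; return 1; }
        }
        if (flags & TOY_ANCHORED) break;
    }
    return 0;
}

/* /g-style loop: after an empty match, retry at the same offset with
   retry_flag plus anchoring; if that fails, advance by one character.
   Stores match offsets in out[] and returns how many were found. */
static size_t toy_global(const char *s, int retry_flag, size_t *out, size_t max)
{
    assert(s != NULL && out != NULL);
    size_t len = strlen(s), start = 0, n = 0;
    int flags = 0;
    while (start <= len && n < max) {
        size_t ms, me;
        if (toy_exec(s, len, start, flags, &ms, &me)) {
            out[n++] = ms;
            start = me;
            flags = (ms == me) ? (retry_flag | TOY_ANCHORED) : 0;
        } else if (flags != 0) {
            start++;                   /* the retry failed: move on by one */
            flags = 0;
        } else {
            break;                     /* ordinary search failed: finished */
        }
    }
    return n;
}
```

On the subject "abcdefghi", toy_global() with TOY_NOTEMPTY records only the match at offset 3, because the retry at offset 3 rejects the empty match that def\K reports at offset 6. With TOY_NOTEMPTY_ATSTART the same retry accepts it, so offsets 3 and 6 are both recorded.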
