Hi, Bram

On Fri, Mar 28, 2008 at 2:47 AM, Bram Moolenaar <[EMAIL PROTECTED]> wrote:
>
> Xiaozhou Liu wrote:
>
> > During the development of the new regexp, one thing confuses me a lot:
> > ordered alternation. (e.g. given r.e. 'ab\|abc' and text 'abc', 'ab'
> > matched, not 'abc')
> >
> > I know that 100% compatibility is one of the project goals. So I try
> > to keep this feature in the new regexp. But the problem is, ordered
> > alternation is kind of 'side effect' of the original back track regexp
> > matcher. AFAIK, It is very hard to implement this feature in the new,
> > truly NFA matcher, if it is not impossible. We can resort to the
> > original regexp when we see '\|', but we don't solve the problem
> > perfectly.
> >
> > So does anyone really need this feature to be kept? If so, please do
> > tell me. For me, the removal of this 'feature' won't break anything.
>
> It is close to impossible to check that a change like this doesn't break
> existing scripts. And when something breaks, e.g. a syntax file, a
> normal user is very unlikely to be able to figure out what caused the
> problem.
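
For anyone following the thread who wants to see the difference concretely, here is a small sketch in Python rather than Vim script. It assumes only Python's standard `re` module, which is also a backtracking matcher and spells alternation `|` instead of `\|`. The helper `longest_alternative` is a hypothetical name used for illustration; it emulates POSIX-style leftmost-longest matching by trying each branch separately.

```python
import re

text = "abc"

# Backtracking matcher: branches are tried left to right and the first
# branch that succeeds wins, so 'ab' is reported even though 'abc' also fits.
ordered = re.match(r"ab|abc", text).group(0)


def longest_alternative(alternatives, text):
    """Hypothetical helper: emulate leftmost-longest by trying every
    branch at the start of the text and keeping the longest match."""
    best = None
    for alt in alternatives:
        m = re.match(alt, text)
        if m and (best is None or len(m.group(0)) > len(best)):
            best = m.group(0)
    return best


longest = longest_alternative(["ab", "abc"], text)

print(ordered)  # ab
print(longest)  # abc
```

A script that relies on the first result would silently change behavior if the engine switched to the second.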

Yes, I fully understand the importance of compatibility with the old
regexp engine. But I brought this issue up because this particular
feature is actually a side effect of the old engine, and it is also
not compliant with the POSIX standard, as others have pointed out. So I
think it is not entirely beyond reason to consider changing existing
scripts so that they do not depend on ordered alternation.

> I stick to the opinion that the new regexp engine must work exactly
> like the existing one.

OK, if you think ordered alternation should not be an exception to this
rule, then I will proceed to implement this behavior.

> Most things can be made to work that way. I
> also thought that this behavior of an alternate branch could be made to
> work in a DFA, with some effort.

Yes, it is doable, but it is unnatural for an NFA matcher and kind of
breaks its "beauty".

> And otherwise we would have to fall
> back to the old engine when there is an alternate branch in the regexp.

As Nikolai has pointed out, due to the importance of \|, we should not
go with this solution.

Regards,
Xiaozhou

--~--~---------~--~----~------------~-------~--~----~
You received this message from the "vim_dev" maillist.
For more information, visit http://www.vim.org/maillist.php
-~----------~----~----~----~------~----~------~--~---
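
P.S. To illustrate what "doable" could look like, below is a minimal sketch of one known approach: simulate the NFA with a thread list kept in priority order, and when a thread reaches a match, drop every lower-priority thread. This is only an illustration in Python, not the code of the new engine. The instruction names (`CHAR`, `SPLIT`, `MATCH`), the hand-compiled program `PROG` for 'ab\|abc', and the function names are all assumptions made up for this example.

```python
CHAR, SPLIT, MATCH = "char", "split", "match"

# Hand-compiled program for 'ab\|abc' (hypothetical instruction set).
PROG = [
    (SPLIT, 1, 4),  # try the first branch (pc 1) before the second (pc 4)
    (CHAR, "a"),
    (CHAR, "b"),
    (MATCH,),
    (CHAR, "a"),
    (CHAR, "b"),
    (CHAR, "c"),
    (MATCH,),
]


def add_thread(prog, threads, seen, pc):
    """Append pc to the thread list, following SPLIT in priority order."""
    if pc in seen:
        return
    seen.add(pc)
    op = prog[pc]
    if op[0] == SPLIT:
        add_thread(prog, threads, seen, op[1])  # higher priority first
        add_thread(prog, threads, seen, op[2])
    else:
        threads.append(pc)


def match_prefix(prog, text, ordered=True):
    """Return the length of the match anchored at the start, or None.

    ordered=True  : first-branch-wins (the old engine's behavior)
    ordered=False : longest match wins
    """
    current, seen = [], set()
    add_thread(prog, current, seen, 0)
    best = None
    for pos in range(len(text) + 1):
        nxt, seen = [], set()
        for pc in current:
            op = prog[pc]
            if op[0] == MATCH:
                best = pos
                if ordered:
                    break  # discard all lower-priority threads
            elif pos < len(text) and op[1] == text[pos]:
                add_thread(prog, nxt, seen, pc + 1)
        current = nxt
        if not current:
            break
    return best


print(match_prefix(PROG, "abc", ordered=True))   # 2, i.e. 'ab'
print(match_prefix(PROG, "abc", ordered=False))  # 3, i.e. 'abc'
```

The only difference between the two modes is the `break` on a match, which is exactly the part that feels unnatural: the simulation has to carry a notion of thread priority that a plain state-set NFA does not need.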