Hi, Bram

On Fri, Mar 28, 2008 at 2:47 AM, Bram Moolenaar <[EMAIL PROTECTED]> wrote:
>
>
>  Xiaozhou Liu wrote:
>
>  > During the development of the new regexp, one thing confuses me a lot:
>  > ordered alternation. (e.g. given r.e. 'ab\|abc' and text 'abc', 'ab'
>  > matched, not 'abc')
>  >
>  > I know that 100% compatibility is one of the project goals. So I try
>  > to keep this feature
>  > in the new regexp. But the problem is, ordered alternation is kind of
>  > 'side effect'
>  > of the original back track regexp matcher. AFAIK, It is very hard to
>  > implement this
>  > feature in the new, truly NFA matcher, if it is not impossible. We can resort
>  > to the original regexp when we see '\|',  but we don't solve the
>  > problem perfectly.
>  >
>  > So does anyone really need this feature to be kept? If so, please do tell me.
>  > For me, the removal of this 'feature' won't break anything.
>
>  It is close to impossible to check that a change like this doesn't break
>  existing scripts.  And when something breaks, e.g. a syntax file, a
>  normal user is very unlikely to be able to figure out what caused the
>  problem.

Yes, I fully understand the importance of compatibility with the old regexp
engine. But I brought this issue up because this particular feature is
actually a side-effect of the old engine, and also non-compliant with the
POSIX standard, as others have pointed out. So I think it is not entirely
beyond reason to consider changing existing scripts so that they no longer
depend on ordered alternation.

>  I stick to the opinion that the new regexp engine must work exactly
>  like the existing one.

OK, if you think ordered alternation should not be an exception to this rule,
then I will proceed to implement this behavior.

>  Most things can be made to work that way.  I
>  also thought that this behavior of an alternate branch could be made to
>  work in a DFA, with some effort.

Yes, it is doable, but it is unnatural for an NFA matcher and somewhat breaks
its "beauty".
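
To make the difference concrete, here is a rough sketch (plain Python, not
Vim code, and not the planned implementation) of a lockstep, NFA-style match
over literal branches only. The function name `match_alternation` and the
`ordered` flag are made up for illustration. The idea it assumes is that the
threads are kept in branch order, and that order acts as a priority: when a
branch finishes, every lower-priority thread is dropped, which gives the same
answer as the backtracking matcher.

```python
def match_alternation(branches, text, ordered=True):
    """Match literal branches against the start of text in lockstep.

    Returns the matched prefix, or None if no branch matches.
    ordered=True  : earlier branches win (backtracking-compatible).
    ordered=False : the longest match wins (POSIX-style).
    """
    # Live threads are branch indices, kept in priority (branch) order.
    threads = list(range(len(branches)))
    best = None
    for pos in range(len(text) + 1):
        survivors = []
        for b in threads:
            if pos == len(branches[b]):
                # Branch b has been consumed completely: it matches here.
                best = branches[b]
                if ordered:
                    # Drop all lower-priority threads. Higher-priority
                    # threads already in survivors may still override.
                    break
                continue
            if pos < len(text) and text[pos] == branches[b][pos]:
                survivors.append(b)  # thread advances by one character
        threads = survivors
        if not threads:
            break
    return best


if __name__ == "__main__":
    print(match_alternation(["ab", "abc"], "abc", ordered=True))
    print(match_alternation(["ab", "abc"], "abc", ordered=False))
```

The extra bookkeeping (keeping threads sorted by priority and cutting off the
lower ones) is exactly the part that feels bolted on to the matcher.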

>  And otherwise we would have to fall
>  back to the old engine when there is an alternate branch in the regexp.


As Nikolai has pointed out, due to the importance of \|, we should not go
with this solution.

Regards,
Xiaozhou

--~--~---------~--~----~------------~-------~--~----~
You received this message from the "vim_dev" maillist.
For more information, visit http://www.vim.org/maillist.php
-~----------~----~----~----~------~----~------~--~---
