Re: regexp: does anyone really need ordered alternation?

2008-03-31 Fir de Conversatie Ian Young

On Mon, Mar 31, 2008 at 9:04 AM, Antony Scriven [EMAIL PROTECTED] wrote:

  On 31/03/2008, Ian Young [EMAIL PROTECTED] wrote:

[...]

  
I did a couple informal benchmarks when I preparing for
a short talk I gave at my school.  The slides are at
http://vim-soc-regexp.googlecode.com/files/ThursExtra.odp,
and near the end are some graphs with the data
I collected.  Short version: the pathological cases
mirror Russ Cox's results pretty closely. The
non-pathological cases tend to be a bit slower with the
new engine than the old, but we're talking differences of
a few nanoseconds in most cases.  A few cases had more
substantial differences (in the order of hundreds of
nanoseconds), which is the reason I believe some work
should go into optimizing before we incorporate the new
engine into a release.

  Do you know why the new engine is slower than the old one in
  these cases? Are your benchmarks for matching only, or do
  they include the compilation phase? Also doesn't the old
  engine use something like Boyer-Moore on string literals in
  the regexp to home in quickly on a potential match? Could
  you make use of that code too? --Antony



To some extent, a minor slowdown on many regexps is unavoidable.  When
the backtracking engine guesses right on one of the first tries, it
has executed very quickly. In contrast, the NFA implementation tries
all paths of execution simultaneously until at least one match is
found.  So on regexps with a lot of branches, the NFA engine will
often do more work.  The idea is that it's worth it to trade a few
nanoseconds here and there for the really massive delays on cases the
backtracking engine handles poorly (such as those illustrated in Russ'
article and my slides).

The graphs in the slideshow are only execution time. Compile time for
the new engine is slightly slower across the board, but nothing in
compilation topped 20ns, so it seemed less important (although our
compilation phase is certainly not as streamlined as it could be).

I forgot to link to the second file, which is a more complete set of
data for the curious.  You can find it at
http://vim-soc-regexp.googlecode.com/files/times_basic.ods.  It
includes graphs of both compilation and execution times, plus all the
raw data: time dumps for each run and the line numbers that will match
bars on the graph to the regexp with that runtime. You can see in this
file which of the regexps I tested were noticeably slower in the new
implementation: generally those with lots of branches where the greedy
backtracking approach guesses right the first time, such as
/a*a*a*a*a*b/ against 'ab'.

As for some sort of optimization for string literals, it's certainly a
possibility, but we haven't looked into it (we've been focusing on
getting it running correctly before we bother optimizing).  I don't
know if the old code has such a thing - I haven't looked at it
carefully enough.


Ian

--~--~-~--~~~---~--~~
You received this message from the vim_dev maillist.
For more information, visit http://www.vim.org/maillist.php
-~--~~~~--~~--~--~---



Re: regexp: does anyone really need ordered alternation?

2008-03-30 Fir de Conversatie Xiaozhou Liu

On Thu, Mar 27, 2008 at 5:54 PM, Antony Scriven [EMAIL PROTECTED] wrote:

  On 26/03/2008, Xiaozhou Liu [EMAIL PROTECTED] wrote:

 Hi Vimmers,
   
 During the development of the new regexp, one thing
 confuses me a lot: ordered alternation. (e.g. given r.e.
 'ab\|abc' and text 'abc', 'ab' matched, not 'abc')
   
 I know that 100% compatibility is one of the project
 goals. So I try to keep this feature in the new regexp.
 But the problem is, ordered alternation is kind of 'side
 effect' of the original back track regexp matcher.
 AFAIK, It is very hard to implement this feature in the
 new, truly NFA matcher, if it is not impossible. We can
 resort to the original regexp when we see '\|',  but we
 don't solve the problem perfectly.

  I thought Russ Cox had solved this in the code on his
  website, or am I mistaken?

Yes, Russ had solved this :-)

Xiaozhou

--~--~-~--~~~---~--~~
You received this message from the vim_dev maillist.
For more information, visit http://www.vim.org/maillist.php
-~--~~~~--~~--~--~---



Re: regexp: does anyone really need ordered alternation?

2008-03-30 Fir de Conversatie Bram Moolenaar


Nikolai Weibull wrote:

 On Sat, Mar 29, 2008 at 8:14 PM, Bram Moolenaar [EMAIL PROTECTED] wrote:
 
   Considering the recent OOXML fuzz I have lowered my appreciation for
   standards considerably.
 
 Considering that much of what people are complaining about regarding
 OOXML is things that exist in OOXML due to Office's backwards
 compatibility, this is somewhat ironic.

I wasn't referring to the text of the proposed standard, I was referring
to how it is forced to become a standard no matter what the quality is.
Microsoft wants OOXML to become an ISO standard.  Lots of people look
the other way, only a small group dares to stand up and say what they
think.  This is all political, has nothing to do with the quality of the
standard.

For people who don't know what I'm talking about:
http://news.google.com/news?hl=enq=ooxml
The first article at the moment has a nice overview of what is
happening:
http://community.zdnet.co.uk/blog/0,100567,10007689o-2000331761b,00.htm

-- 
hundred-and-one symptoms of being an internet addict:
191. You rate eating establishments not by the quality of the food,
 but by the availability of electrical outlets for your PowerBook.

 /// Bram Moolenaar -- [EMAIL PROTECTED] -- http://www.Moolenaar.net   \\\
///sponsor Vim, vote for features -- http://www.Vim.org/sponsor/ \\\
\\\download, build and distribute -- http://www.A-A-P.org///
 \\\help me help AIDS victims -- http://ICCF-Holland.org///

--~--~-~--~~~---~--~~
You received this message from the vim_dev maillist.
For more information, visit http://www.vim.org/maillist.php
-~--~~~~--~~--~--~---



Re: regexp: does anyone really need ordered alternation?

2008-03-30 Fir de Conversatie Ian Young

Here's another problem with changing behavior in the new engine: we
would have to modify the backtracking engine as well to prevent it
from using ordered alternation.  Since our new engine can't handle
certain things and falls back to the old one, they must behave the
same. For example, we can't have aa\|aab and \(a\)\1\|aab match
different things - the horror!

Where is Russ' code that solves this problem?  I believe at one point
we discussed using a simplified type of tagged nfa to get around
this, but I can't find that conversation anywhere.

On Fri, Mar 28, 2008 at 9:53 AM,  [EMAIL PROTECTED] wrote:

  If the new engine is supposed to have Posix-compatible behaviour,

It's not, but I think this is a relevant issue.  What about the idea
of having a flag to change regexp behavior from Vim style to POSIX
style (or at least a close approximation)?  Would this be useful?

Let me throw a couple different ideas out:

It's possible someone could rewrite the current engines to behave
differently depending on the desired behavior.  However, I don't know
how much work that would be.  The type of greediness should be easy to
manipulate in the new engine, but I can't speak to changing the old
engine, or to the other things that Vim does differently from POSIX.

Alternatively, the changes we made to Vim code make it fairly easy to
plug in different regexp engines at will - I got part of the way on a
PCRE-driven Vim, although I don't believe I got certain features like
multiline matching working.  It should be doable, though, and it's
possible that a PCRE (or another regexp engine) dependency could be
included at compile time or released as an unofficial patch or
something.  Again, runtime behavior could be specified with a flag.


  To show that this would make a big difference, solid benchmark data
  would be valuable. Has anyone benchmarked the new engine?

I did a couple informal benchmarks when I preparing for a short talk I
gave at my school.  The slides are at
http://vim-soc-regexp.googlecode.com/files/ThursExtra.odp, and near
the end are some graphs with the data I collected.  Short version: the
pathological cases mirror Russ Cox's results pretty closely. The
non-pathological cases tend to be a bit slower with the new engine
than the old, but we're talking differences of a few nanoseconds in
most cases.  A few cases had more substantial differences (in the
order of hundreds of nanoseconds), which is the reason I believe some
work should go into optimizing before we incorporate the new engine
into a release.


Cheers,
Ian

--~--~-~--~~~---~--~~
You received this message from the vim_dev maillist.
For more information, visit http://www.vim.org/maillist.php
-~--~~~~--~~--~--~---



Re: regexp: does anyone really need ordered alternation?

2008-03-29 Fir de Conversatie Bram Moolenaar


Matt Wozniski wrote:

 On Thu, Mar 27, 2008 at 2:47 PM, Bram Moolenaar wrote:
   Xiaozhou Liu wrote:
 
During the development of the new regexp, one thing confuses me a lot:
ordered alternation. (e.g. given r.e. 'ab\|abc' and text 'abc', 'ab'
matched, not 'abc')
   
I know that 100% compatibility is one of the project goals. So I try
to keep this feature
in the new regexp. But the problem is, ordered alternation is kind of
'side effect'
of the original back track regexp matcher. AFAIK, It is very hard to
implement this
feature in the new, truly NFA matcher, if it is not impossible. We can 
  resort
to the original regexp when we see '\|',  but we don't solve the
problem perfectly.
   
So does anyone really need this feature to be kept? If so, please do 
  tell me.
For me, the removal of this 'feature' won't break anything.
 
   It is close to impossible to check that a change like this doesn't break
   existing scripts.  And when something breaks, e.g. a syntax file, a
   normal user is very unlikely to be able to figure out what caused the
   problem.
 
   I stick to the opinion that the new regexp engine must work exactly
   like the existing one.  Most things can be made to work that way.  I
   also thought that this behavior of an alternate branch could be made to
   work in a DFA, with some effort.  And otherwise we would have to fall
   back to the old engine when there is an alternate branch in the regexp.
 
 For what it's worth, I disagree strongly.  This behavior is nothing
 but a bug in the existing implementation - a documented bug, but a bug
 nonetheless.

I don't see why one would call this a bug.  It's works this way
intentionally.

 In this particular case, I definitely think that we
 should strive for compatibility with other regex engines, rather than
 backwards compatibility with older vim versions.  And, since this new
 regex engine would likely not be introduced until vim 8.0, there is no
 better time to break backwards compatibility.  Since the old guarantee
 that the leftmost alternation is matched first in fact makes regexes
 work differently than in Perl, and in every POSIX implementation, and
 make a DFA-based regex engine harder to write, I think we should
 consider them as a historical fluke, and any syntax files that rely on
 them to have been too cozy with the current engine.  We should
 re-write such syntax files, rather than keep a useful improvement out
 of vim for their sake.

No, backwards compatibility is more important.  I don't want a newer
version of Vim not being used because it breaks something that people
can't figure out.  A jump from 7.x to 8.x doesn't mean it's allowed to
break things.

Considering the recent OOXML fuzz I have lowered my appreciation for
standards considerably.

The Vim regexp engine isn't conforming to a standard and doesn't work
like any other engine anyway, so I see no reason to make a detail work
that way.  It still will work quite different from other engines.

-- 
hundred-and-one symptoms of being an internet addict:
189. You put your e-mail address in the upper left-hand corner of envelopes.

 /// Bram Moolenaar -- [EMAIL PROTECTED] -- http://www.Moolenaar.net   \\\
///sponsor Vim, vote for features -- http://www.Vim.org/sponsor/ \\\
\\\download, build and distribute -- http://www.A-A-P.org///
 \\\help me help AIDS victims -- http://ICCF-Holland.org///

--~--~-~--~~~---~--~~
You received this message from the vim_dev maillist.
For more information, visit http://www.vim.org/maillist.php
-~--~~~~--~~--~--~---



Re: regexp: does anyone really need ordered alternation?

2008-03-29 Fir de Conversatie Nikolai Weibull

On Sat, Mar 29, 2008 at 8:14 PM, Bram Moolenaar [EMAIL PROTECTED] wrote:

  Considering the recent OOXML fuzz I have lowered my appreciation for
  standards considerably.

Considering that much of what people are complaining about regarding
OOXML is things that exist in OOXML due to Office's backwards
compatibility, this is somewhat ironic.

--~--~-~--~~~---~--~~
You received this message from the vim_dev maillist.
For more information, visit http://www.vim.org/maillist.php
-~--~~~~--~~--~--~---



Re: regexp: does anyone really need ordered alternation?

2008-03-28 Fir de Conversatie Matt Wozniski

On Thu, Mar 27, 2008 at 2:47 PM, Bram Moolenaar wrote:
  Xiaozhou Liu wrote:

   During the development of the new regexp, one thing confuses me a lot:
   ordered alternation. (e.g. given r.e. 'ab\|abc' and text 'abc', 'ab'
   matched, not 'abc')
  
   I know that 100% compatibility is one of the project goals. So I try
   to keep this feature
   in the new regexp. But the problem is, ordered alternation is kind of
   'side effect'
   of the original back track regexp matcher. AFAIK, It is very hard to
   implement this
   feature in the new, truly NFA matcher, if it is not impossible. We can 
 resort
   to the original regexp when we see '\|',  but we don't solve the
   problem perfectly.
  
   So does anyone really need this feature to be kept? If so, please do tell 
 me.
   For me, the removal of this 'feature' won't break anything.

  It is close to impossible to check that a change like this doesn't break
  existing scripts.  And when something breaks, e.g. a syntax file, a
  normal user is very unlikely to be able to figure out what caused the
  problem.

  I stick to the opinion that the new regexp engine must work exactly
  like the existing one.  Most things can be made to work that way.  I
  also thought that this behavior of an alternate branch could be made to
  work in a DFA, with some effort.  And otherwise we would have to fall
  back to the old engine when there is an alternate branch in the regexp.

For what it's worth, I disagree strongly.  This behavior is nothing
but a bug in the existing implementation - a documented bug, but a bug
nonetheless.  In this particular case, I definitely think that we
should strive for compatibility with other regex engines, rather than
backwards compatibility with older vim versions.  And, since this new
regex engine would likely not be introduced until vim 8.0, there is no
better time to break backwards compatibility.  Since the old guarantee
that the leftmost alternation is matched first in fact makes regexes
work differently than in Perl, and in every POSIX implementation, and
make a DFA-based regex engine harder to write, I think we should
consider them as a historical fluke, and any syntax files that rely on
them to have been too cozy with the current engine.  We should
re-write such syntax files, rather than keep a useful improvement out
of vim for their sake.

~Matt

--~--~-~--~~~---~--~~
You received this message from the vim_dev maillist.
For more information, visit http://www.vim.org/maillist.php
-~--~~~~--~~--~--~---



Re: regexp: does anyone really need ordered alternation?

2008-03-28 Fir de Conversatie Matthew Winn

On Fri, 28 Mar 2008 02:09:19 -0400, Matt Wozniski [EMAIL PROTECTED]
wrote:

 For what it's worth, I disagree strongly.  This behavior is nothing
 but a bug in the existing implementation - a documented bug, but a bug
 nonetheless.  In this particular case, I definitely think that we
 should strive for compatibility with other regex engines, rather than
 backwards compatibility with older vim versions.
[snip]
 Since the old guarantee
 that the leftmost alternation is matched first in fact makes regexes
 work differently than in Perl,

Many regular expression engines have ordered alternation, including
Perl's. Given that alternations do have an order, why force that order
to be disregarded? Responding to the ordering of alternatives may mean
that the regular expressions aren't pure, but any notion of purity
in the regular expression language was abandoned years ago in favour
of functionality.

-- 
Matthew Winn

--~--~-~--~~~---~--~~
You received this message from the vim_dev maillist.
For more information, visit http://www.vim.org/maillist.php
-~--~~~~--~~--~--~---



Re: regexp: does anyone really need ordered alternation?

2008-03-28 Fir de Conversatie Antony Scriven

On 28/03/2008, [EMAIL PROTECTED] [EMAIL PROTECTED] wrote:

  [...]
 
  Jeffrey Friedel discusses this in his book Mastering
  Regular Expressions in chapter 4, section NFA, DFA, and
  POSIX. [...] Jeffrey writes: If efficiency
  is an issue with a Traditional NFA (and with
  backtracking, believe me, it can be), it is doubly so
  with a POSIX NFA since there can be so much more
  backtracking. [...]
 
  [...]

Friedl appears to be a bit confused with his terminology.
When he says NFA he appears to mean a backtracking
implementation. Xiaozhou Liu is proposing a true NFA
implementation which searches for alternatives in parallel
without backtracking. --Antony

--~--~-~--~~~---~--~~
You received this message from the vim_dev maillist.
For more information, visit http://www.vim.org/maillist.php
-~--~~~~--~~--~--~---



Re: regexp: does anyone really need ordered alternation?

2008-03-28 Fir de Conversatie Mikołaj Machowski

lurker mode off

From user point of view: does this new way of treating \| bring speed 
gains?

\| is one of the most time expensive operators in Vim. If we could have 
improvement here I would say - go for it.

lurker mode on


Piotr Żaczek na Pokładzie! Gdynia 03.04.08, godz. 19.00 
http://klik.wp.pl/?adr=http%3A%2F%2Fcorto.www.wp.pl%2Fas%2Fzaczek.htmlsid=293



--~--~-~--~~~---~--~~
You received this message from the vim_dev maillist.
For more information, visit http://www.vim.org/maillist.php
-~--~~~~--~~--~--~---



Re: regexp: does anyone really need ordered alternation?

2008-03-28 Fir de Conversatie Nico Weber

 Interesting selection of languages to try ;-)  You may have picked the
 ones with a common regex code base.  Russ Cox's article certainly  
 shows
 similar performance/behaviour (See section and graph on Performance at
 http://swtch.com/~rsc/regexp/regexp1.html)  I wonder what Tcl and  
 awk do?

That's looks like an interesting article, thanks.

It doesn't really matter if all those languages share implementation  
code. I just wanted to say that leftmost-longest is not what many  
popular languages do. Java doesn't offer a repl, so I can't be  
bothered to check that ;-)

Tcl does do a longest match, Javascript does the same as Python, Ruby  
and Perl (at least the Spidermonkey implementation; the spec suggests  
that this is correct ( 
http://www.ecma-international.org/publications/files/ECMA-ST/Ecma-262.pdf 
  , 15.10.2.3)).

 nico$ tclsh
 % regexp {ab|abc} abc matched
 1
 % puts $matched
 abc

 nico$ ~/src/js/spidermonkey/src/Darwin_DBG.OBJ/js
 js abc.match(/ab|abc/)
 ab

Nico

--~--~-~--~~~---~--~~
You received this message from the vim_dev maillist.
For more information, visit http://www.vim.org/maillist.php
-~--~~~~--~~--~--~---



Re: regexp: does anyone really need ordered alternation?

2008-03-27 Fir de Conversatie Antony Scriven

On 26/03/2008, Xiaozhou Liu [EMAIL PROTECTED] wrote:

   Hi Vimmers,
 
   During the development of the new regexp, one thing
   confuses me a lot: ordered alternation. (e.g. given r.e.
   'ab\|abc' and text 'abc', 'ab' matched, not 'abc')
 
   I know that 100% compatibility is one of the project
   goals. So I try to keep this feature in the new regexp.
   But the problem is, ordered alternation is kind of 'side
   effect' of the original back track regexp matcher.
   AFAIK, It is very hard to implement this feature in the
   new, truly NFA matcher, if it is not impossible. We can
   resort to the original regexp when we see '\|',  but we
   don't solve the problem perfectly.

I thought Russ Cox had solved this in the code on his
website, or am I mistaken?

   So does anyone really need this feature to be kept?

I don't need it, and though I'd prefer the longest match
rather than the first alternative (as specified by POSIX)
I don't really care too much as long as it is well
documented. And since the original vi didn't have
alternation, we don't need to worry about compatibility in
that regard.

  If so, please do tell me. For me, the removal of this
  'feature' won't break anything.

It won't break anything that I use regexps for. But...
I know Parsing Expression Grammars can make use of this
feature to give precedence to one match over another. You
might want to check whether any of the syntax files do
something similar. --Antony

--~--~-~--~~~---~--~~
You received this message from the vim_dev maillist.
For more information, visit http://www.vim.org/maillist.php
-~--~~~~--~~--~--~---



Re: regexp: does anyone really need ordered alternation?

2008-03-27 Fir de Conversatie Ben Schmidt

Antony Scriven wrote:
 On 26/03/2008, Xiaozhou Liu [EMAIL PROTECTED] wrote:
 
Hi Vimmers,
  
During the development of the new regexp, one thing
confuses me a lot: ordered alternation. (e.g. given r.e.
'ab\|abc' and text 'abc', 'ab' matched, not 'abc')
  
I know that 100% compatibility is one of the project
goals. So I try to keep this feature in the new regexp.
But the problem is, ordered alternation is kind of 'side
effect' of the original back track regexp matcher.
AFAIK, It is very hard to implement this feature in the
new, truly NFA matcher, if it is not impossible. We can
resort to the original regexp when we see '\|',  but we
don't solve the problem perfectly.
 
 I thought Russ Cox had solved this in the code on his
 website, or am I mistaken?
 
So does anyone really need this feature to be kept?
 
 I don't need it, and though I'd prefer the longest match
 rather than the first alternative (as specified by POSIX)
 I don't really care too much as long as it is well
 documented. And since the original vi didn't have
 alternation, we don't need to worry about compatibility in
 that regard.

An interesting twist. Can you clarify which behaviour POSIX specifies (your 
sentence above is ambiguous)?

   If so, please do tell me. For me, the removal of this
   'feature' won't break anything.
 
 It won't break anything that I use regexps for. But...
 I know Parsing Expression Grammars can make use of this
 feature to give precedence to one match over another. You
 might want to check whether any of the syntax files do
 something similar. --Antony

I'm not sure a syntax file itself should, though it possibly could, because it 
would be in the ordering of the commands and Vim internal implementation of 
syntax 
highlighting that it would be relevant.

I imagine the most robust way to test this would be automated testing--get a 
Vim 
script to output the synID for each character in a complex source file in each 
language and compare the output of Vims compiled with each regex engine.

Ben.


--~--~-~--~~~---~--~~
You received this message from the vim_dev maillist.
For more information, visit http://www.vim.org/maillist.php
-~--~~~~--~~--~--~---



Re: regexp: does anyone really need ordered alternation?

2008-03-27 Fir de Conversatie Antony Scriven

On 27/03/2008, Ben Schmidt [EMAIL PROTECTED] wrote:

   Antony Scriven wrote:
   [...]
  
   [...] I'd prefer the longest match rather than the
   first alternative (as specified by POSIX) [...]
 
  An interesting twist. Can you clarify which behaviour
  POSIX specifies (your sentence above is ambiguous)?

You're right, sorry. POSIX specifies longest match. --Antony

--~--~-~--~~~---~--~~
You received this message from the vim_dev maillist.
For more information, visit http://www.vim.org/maillist.php
-~--~~~~--~~--~--~---



Re: regexp: does anyone really need ordered alternation?

2008-03-27 Fir de Conversatie Nikolai Weibull

On Thu, Mar 27, 2008 at 1:16 PM, Antony Scriven [EMAIL PROTECTED] wrote:

  On 27/03/2008, Ben Schmidt [EMAIL PROTECTED] wrote:

 Antony Scriven wrote:

   I'd prefer the longest match rather than the
   first alternative (as specified by POSIX)

  An interesting twist. Can you clarify which behaviour
  POSIX specifies (your sentence above is ambiguous)?

 You're right, sorry. POSIX specifies longest match.

/left-most/ longest.  Big difference.

--~--~-~--~~~---~--~~
You received this message from the vim_dev maillist.
For more information, visit http://www.vim.org/maillist.php
-~--~~~~--~~--~--~---



Re: regexp: does anyone really need ordered alternation?

2008-03-27 Fir de Conversatie Antony Scriven

On 27/03/2008, Nikolai Weibull [EMAIL PROTECTED] wrote:

  On Thu, Mar 27, 2008 at 1:16 PM, Antony Scriven
  [EMAIL PROTECTED] wrote:
 
On 27/03/2008, Ben Schmidt [EMAIL PROTECTED] wrote:
 
 Antony Scriven wrote:
 
 I'd prefer the longest match rather than the
 first alternative (as specified by POSIX)
 
An interesting twist. Can you clarify which behaviour
POSIX specifies (your sentence above is ambiguous)?
 
   You're right, sorry. POSIX specifies longest match.
 
  /left-most/ longest.  Big difference.

I thought the `left-most' part was a given and we were
discussing which of the alternatives would be subsequently
picked; sorry if I was misleading. In this case POSIX
specifies that the alternative which gives the longest
overall match is picked, whereas backtracking
implementations (like that in Vim) are usually designed to
pick the first alternative that gives an overall match, even
if a different choice would give a longer overall match.
I know you know this already, I'm just trying to be clear!
Now, I'm still not sure I've managed to do that, so please
rephrase my explanation for me if you think it is necessary
:-) --Antony

--~--~-~--~~~---~--~~
You received this message from the vim_dev maillist.
For more information, visit http://www.vim.org/maillist.php
-~--~~~~--~~--~--~---



Re: regexp: does anyone really need ordered alternation?

2008-03-27 Fir de Conversatie Nikolai Weibull

On Thu, Mar 27, 2008 at 5:37 PM, Antony Scriven [EMAIL PROTECTED] wrote:

  On 27/03/2008, Nikolai Weibull [EMAIL PROTECTED] wrote:

  /left-most/ longest.  Big difference.

  I thought the `left-most' part was a given and we were
  discussing which of the alternatives would be subsequently
  picked; sorry if I was misleading. In this case POSIX
  specifies that the alternative which gives the longest
  overall match is picked, whereas backtracking
  implementations (like that in Vim) are usually designed to
  pick the first alternative that gives an overall match, even
  if a different choice would give a longer overall match.
  I know you know this already, I'm just trying to be clear!
  Now, I'm still not sure I've managed to do that, so please
  rephrase my explanation for me if you think it is necessary

Don't worry.  I was just making it clear to everyone what we were discussing.
(And your explanation is fine ;-)

--~--~-~--~~~---~--~~
You received this message from the vim_dev maillist.
For more information, visit http://www.vim.org/maillist.php
-~--~~~~--~~--~--~---



Re: regexp: does anyone really need ordered alternation?

2008-03-27 Fir de Conversatie Xiaozhou Liu

On Thu, Mar 27, 2008 at 8:39 AM, Ben Schmidt
[EMAIL PROTECTED] wrote:


  I prefer the behaviour which I presume you have in your NFA implementation, 
 of
  preferring longer matches, just as * is greedy by default, so would actually
  welcome the change.

Yes, that indeed is the behavior of the NFA matcher.

--~--~-~--~~~---~--~~
You received this message from the vim_dev maillist.
For more information, visit http://www.vim.org/maillist.php
-~--~~~~--~~--~--~---



Re: regexp: does anyone really need ordered alternation?

2008-03-27 Fir de Conversatie Xiaozhou Liu

On Thu, Mar 27, 2008 at 5:54 PM, Antony Scriven [EMAIL PROTECTED] wrote:
  I thought Russ Cox had solved this in the code on his
  website, or am I mistaken?

Thanks for the pointer, I'm not aware of this. I'll have a look at the code.

 So does anyone really need this feature to be kept?

  I don't need it, and though I'd prefer the longest match
  rather than the first alternative (as specified by POSIX)
  I don't really care too much as long as it is well
  documented. And since the original vi didn't have
  alternation, we don't need to worry about compatibility in
  that regard.

OK. That's good.

If so, please do tell me. For me, the removal of this
'feature' won't break anything.

  It won't break anything that I use regexps for. But...
  I know Parsing Expression Grammars can make use of this
  feature to give precedence to one match over another. You
  might want to check whether any of the syntax files do
  something similar. --Antony

This sounds like a pretty serious problem. I'll do the checking after I tie up
the last few remaining bugs in the NFA matcher. Or better yet, can someone
more familiar with the syntax files whether any syntax is dependent on this
particular behavior of the regexp engine? Thanks!

Xiaozhou

--~--~-~--~~~---~--~~
You received this message from the vim_dev maillist.
For more information, visit http://www.vim.org/maillist.php
-~--~~~~--~~--~--~---



Re: regexp: does anyone really need ordered alternation?

2008-03-27 Fir de Conversatie Bram Moolenaar


Xiaozhou Liu wrote:

 During the development of the new regexp, one thing confuses me a lot:
 ordered alternation. (e.g. given r.e. 'ab\|abc' and text 'abc', 'ab'
 matched, not 'abc')
 
 I know that 100% compatibility is one of the project goals. So I try
 to keep this feature
 in the new regexp. But the problem is, ordered alternation is kind of
 'side effect'
 of the original back track regexp matcher. AFAIK, It is very hard to
 implement this
 feature in the new, truly NFA matcher, if it is not impossible. We can resort
 to the original regexp when we see '\|',  but we don't solve the
 problem perfectly.
 
 So does anyone really need this feature to be kept? If so, please do tell me.
 For me, the removal of this 'feature' won't break anything.

It is close to impossible to check that a change like this doesn't break
existing scripts.  And when something breaks, e.g. a syntax file, a
normal user is very unlikely to be able to figure out what caused the
problem.

I stick to the opinion that the new regexp engine must work exactly
like the existing one.  Most things can be made to work that way.  I
also thought that this behavior of an alternate branch could be made to
work in a DFA, with some effort.  And otherwise we would have to fall
back to the old engine when there is an alternate branch in the regexp.

-- 
hundred-and-one symptoms of being an internet addict:
174. You know what a listserv is.

 /// Bram Moolenaar -- [EMAIL PROTECTED] -- http://www.Moolenaar.net   \\\
///sponsor Vim, vote for features -- http://www.Vim.org/sponsor/ \\\
\\\download, build and distribute -- http://www.A-A-P.org///
 \\\help me help AIDS victims -- http://ICCF-Holland.org///

--~--~-~--~~~---~--~~
You received this message from the vim_dev maillist.
For more information, visit http://www.vim.org/maillist.php
-~--~~~~--~~--~--~---



Re: regexp: does anyone really need ordered alternation?

2008-03-27 Fir de Conversatie Xiaozhou Liu

Hi, Bram

On Fri, Mar 28, 2008 at 2:47 AM, Bram Moolenaar [EMAIL PROTECTED] wrote:


  Xiaozhou Liu wrote:

   During the development of the new regexp, one thing confuses me a lot:
   ordered alternation. (e.g. given r.e. 'ab\|abc' and text 'abc', 'ab'
   matched, not 'abc')
  
   I know that 100% compatibility is one of the project goals. So I try
   to keep this feature
   in the new regexp. But the problem is, ordered alternation is kind of
   'side effect'
   of the original back track regexp matcher. AFAIK, It is very hard to
   implement this
   feature in the new, truly NFA matcher, if it is not impossible. We can 
 resort
   to the original regexp when we see '\|',  but we don't solve the
   problem perfectly.
  
   So does anyone really need this feature to be kept? If so, please do tell 
 me.
   For me, the removal of this 'feature' won't break anything.

  It is close to impossible to check that a change like this doesn't break
  existing scripts.  And when something breaks, e.g. a syntax file, a
  normal user is very unlikely to be able to figure out what caused the
  problem.

Yes, I fully understand the importance of compatibility with the old regexp
engine. But I brought this issue up because this particular feature is
actually a side-effect of the old engine, and also non-compliant with the
POSIX standard, as others have pointed out. So I think it is not entirely
beyond reason to consider changing existing scripts to be not dependent on
ordered alternation.

  I stick to the opinion that the new regexp engine must work exactly
  like the existing one.

OK, if you think ordered alternation should not be an exception to this rule,
then I will proceed to implement this behavior.

  Most things can be made to work that way.  I
  also thought that this behavior of an alternate branch could be made to
  work in a DFA, with some effort.

Yes, it is doable, but is unnatural for an NFA matcher, and kind of breaks its
beauty.

  And otherwise we would have to fall
  back to the old engine when there is an alternate branch in the regexp.


As Nikolai have pointed out, due to the importance of \|, we should not go
with this solution.

Regards,
Xiaozhou

--~--~-~--~~~---~--~~
You received this message from the vim_dev maillist.
For more information, visit http://www.vim.org/maillist.php
-~--~~~~--~~--~--~---



regexp: does anyone really need ordered alternation?

2008-03-26 Fir de Conversatie Xiaozhou Liu

Hi Vimmers,

During the development of the new regexp, one thing confuses me a lot:
ordered alternation. (e.g. given r.e. 'ab\|abc' and text 'abc', 'ab'
matched, not 'abc')

I know that 100% compatibility is one of the project goals. So I try
to keep this feature
in the new regexp. But the problem is, ordered alternation is kind of
'side effect'
of the original back track regexp matcher. AFAIK, It is very hard to
implement this
feature in the new, truly NFA matcher, if it is not impossible. We can resort
to the original regexp when we see '\|',  but we don't solve the
problem perfectly.

So does anyone really need this feature to be kept? If so, please do tell me.
For me, the removal of this 'feature' won't break anything.

Thanks,
Xiaozhou

--~--~-~--~~~---~--~~
You received this message from the vim_dev maillist.
For more information, visit http://www.vim.org/maillist.php
-~--~~~~--~~--~--~---



Re: regexp: does anyone really need ordered alternation?

2008-03-26 Fir de Conversatie Ben Schmidt

Xiaozhou Liu wrote:
 Hi Vimmers,
 
 During the development of the new regexp, one thing confuses me a lot:
 ordered alternation. (e.g. given r.e. 'ab\|abc' and text 'abc', 'ab'
 matched, not 'abc')
 
 I know that 100% compatibility is one of the project goals. So I try
 to keep this feature
 in the new regexp. But the problem is, ordered alternation is kind of
 'side effect'
 of the original back track regexp matcher. AFAIK, It is very hard to
 implement this
 feature in the new, truly NFA matcher, if it is not impossible. We can resort
 to the original regexp when we see '\|',  but we don't solve the
 problem perfectly.
 
 So does anyone really need this feature to be kept? If so, please do tell me.
 For me, the removal of this 'feature' won't break anything.

I prefer the behaviour which I presume you have in your NFA implementation, of 
preferring longer matches, just as * is greedy by default, so would actually 
welcome the change.

Ben.




--~--~-~--~~~---~--~~
You received this message from the vim_dev maillist.
For more information, visit http://www.vim.org/maillist.php
-~--~~~~--~~--~--~---