Re: regexp: does anyone really need ordered alternation?
On Mon, Mar 31, 2008 at 9:04 AM, Antony Scriven [EMAIL PROTECTED] wrote: On 31/03/2008, Ian Young [EMAIL PROTECTED] wrote: [...] I did a couple informal benchmarks when I preparing for a short talk I gave at my school. The slides are at http://vim-soc-regexp.googlecode.com/files/ThursExtra.odp, and near the end are some graphs with the data I collected. Short version: the pathological cases mirror Russ Cox's results pretty closely. The non-pathological cases tend to be a bit slower with the new engine than the old, but we're talking differences of a few nanoseconds in most cases. A few cases had more substantial differences (in the order of hundreds of nanoseconds), which is the reason I believe some work should go into optimizing before we incorporate the new engine into a release. Do you know why the new engine is slower than the old one in these cases? Are your benchmarks for matching only, or do they include the compilation phase? Also doesn't the old engine use something like Boyer-Moore on string literals in the regexp to home in quickly on a potential match? Could you make use of that code too? --Antony To some extent, a minor slowdown on many regexps is unavoidable. When the backtracking engine guesses right on one of the first tries, it has executed very quickly. In contrast, the NFA implementation tries all paths of execution simultaneously until at least one match is found. So on regexps with a lot of branches, the NFA engine will often do more work. The idea is that it's worth it to trade a few nanoseconds here and there for the really massive delays on cases the backtracking engine handles poorly (such as those illustrated in Russ' article and my slides). The graphs in the slideshow are only execution time. Compile time for the new engine is slightly slower across the board, but nothing in compilation topped 20ns, so it seemed less important (although our compilation phase is certainly not as streamlined as it could be). I forgot to link to the second file, which is a more complete set of data for the curious. You can find it at http://vim-soc-regexp.googlecode.com/files/times_basic.ods. It includes graphs of both compilation and execution times, plus all the raw data: time dumps for each run and the line numbers that will match bars on the graph to the regexp with that runtime. You can see in this file which of the regexps I tested were noticeably slower in the new implementation: generally those with lots of branches where the greedy backtracking approach guesses right the first time, such as /a*a*a*a*a*b/ against 'ab'. As for some sort of optimization for string literals, it's certainly a possibility, but we haven't looked into it (we've been focusing on getting it running correctly before we bother optimizing). I don't know if the old code has such a thing - I haven't looked at it carefully enough. Ian --~--~-~--~~~---~--~~ You received this message from the vim_dev maillist. For more information, visit http://www.vim.org/maillist.php -~--~~~~--~~--~--~---
Re: regexp: does anyone really need ordered alternation?
On Thu, Mar 27, 2008 at 5:54 PM, Antony Scriven [EMAIL PROTECTED] wrote: On 26/03/2008, Xiaozhou Liu [EMAIL PROTECTED] wrote: Hi Vimmers, During the development of the new regexp, one thing confuses me a lot: ordered alternation. (e.g. given r.e. 'ab\|abc' and text 'abc', 'ab' matched, not 'abc') I know that 100% compatibility is one of the project goals. So I try to keep this feature in the new regexp. But the problem is, ordered alternation is kind of 'side effect' of the original back track regexp matcher. AFAIK, It is very hard to implement this feature in the new, truly NFA matcher, if it is not impossible. We can resort to the original regexp when we see '\|', but we don't solve the problem perfectly. I thought Russ Cox had solved this in the code on his website, or am I mistaken? Yes, Russ had solved this :-) Xiaozhou --~--~-~--~~~---~--~~ You received this message from the vim_dev maillist. For more information, visit http://www.vim.org/maillist.php -~--~~~~--~~--~--~---
Re: regexp: does anyone really need ordered alternation?
Nikolai Weibull wrote: On Sat, Mar 29, 2008 at 8:14 PM, Bram Moolenaar [EMAIL PROTECTED] wrote: Considering the recent OOXML fuzz I have lowered my appreciation for standards considerably. Considering that much of what people are complaining about regarding OOXML is things that exist in OOXML due to Office's backwards compatibility, this is somewhat ironic. I wasn't referring to the text of the proposed standard, I was referring to how it is forced to become a standard no matter what the quality is. Microsoft wants OOXML to become an ISO standard. Lots of people look the other way, only a small group dares to stand up and say what they think. This is all political, has nothing to do with the quality of the standard. For people who don't know what I'm talking about: http://news.google.com/news?hl=enq=ooxml The first article at the moment has a nice overview of what is happening: http://community.zdnet.co.uk/blog/0,100567,10007689o-2000331761b,00.htm -- hundred-and-one symptoms of being an internet addict: 191. You rate eating establishments not by the quality of the food, but by the availability of electrical outlets for your PowerBook. /// Bram Moolenaar -- [EMAIL PROTECTED] -- http://www.Moolenaar.net \\\ ///sponsor Vim, vote for features -- http://www.Vim.org/sponsor/ \\\ \\\download, build and distribute -- http://www.A-A-P.org/// \\\help me help AIDS victims -- http://ICCF-Holland.org/// --~--~-~--~~~---~--~~ You received this message from the vim_dev maillist. For more information, visit http://www.vim.org/maillist.php -~--~~~~--~~--~--~---
Re: regexp: does anyone really need ordered alternation?
Here's another problem with changing behavior in the new engine: we would have to modify the backtracking engine as well to prevent it from using ordered alternation. Since our new engine can't handle certain things and falls back to the old one, they must behave the same. For example, we can't have aa\|aab and \(a\)\1\|aab match different things - the horror! Where is Russ' code that solves this problem? I believe at one point we discussed using a simplified type of tagged nfa to get around this, but I can't find that conversation anywhere. On Fri, Mar 28, 2008 at 9:53 AM, [EMAIL PROTECTED] wrote: If the new engine is supposed to have Posix-compatible behaviour, It's not, but I think this is a relevant issue. What about the idea of having a flag to change regexp behavior from Vim style to POSIX style (or at least a close approximation)? Would this be useful? Let me throw a couple different ideas out: It's possible someone could rewrite the current engines to behave differently depending on the desired behavior. However, I don't know how much work that would be. The type of greediness should be easy to manipulate in the new engine, but I can't speak to changing the old engine, or to the other things that Vim does differently from POSIX. Alternatively, the changes we made to Vim code make it fairly easy to plug in different regexp engines at will - I got part of the way on a PCRE-driven Vim, although I don't believe I got certain features like multiline matching working. It should be doable, though, and it's possible that a PCRE (or another regexp engine) dependency could be included at compile time or released as an unofficial patch or something. Again, runtime behavior could be specified with a flag. To show that this would make a big difference, solid benchmark data would be valuable. Has anyone benchmarked the new engine? I did a couple informal benchmarks when I preparing for a short talk I gave at my school. The slides are at http://vim-soc-regexp.googlecode.com/files/ThursExtra.odp, and near the end are some graphs with the data I collected. Short version: the pathological cases mirror Russ Cox's results pretty closely. The non-pathological cases tend to be a bit slower with the new engine than the old, but we're talking differences of a few nanoseconds in most cases. A few cases had more substantial differences (in the order of hundreds of nanoseconds), which is the reason I believe some work should go into optimizing before we incorporate the new engine into a release. Cheers, Ian --~--~-~--~~~---~--~~ You received this message from the vim_dev maillist. For more information, visit http://www.vim.org/maillist.php -~--~~~~--~~--~--~---
Re: regexp: does anyone really need ordered alternation?
Matt Wozniski wrote: On Thu, Mar 27, 2008 at 2:47 PM, Bram Moolenaar wrote: Xiaozhou Liu wrote: During the development of the new regexp, one thing confuses me a lot: ordered alternation. (e.g. given r.e. 'ab\|abc' and text 'abc', 'ab' matched, not 'abc') I know that 100% compatibility is one of the project goals. So I try to keep this feature in the new regexp. But the problem is, ordered alternation is kind of 'side effect' of the original back track regexp matcher. AFAIK, It is very hard to implement this feature in the new, truly NFA matcher, if it is not impossible. We can resort to the original regexp when we see '\|', but we don't solve the problem perfectly. So does anyone really need this feature to be kept? If so, please do tell me. For me, the removal of this 'feature' won't break anything. It is close to impossible to check that a change like this doesn't break existing scripts. And when something breaks, e.g. a syntax file, a normal user is very unlikely to be able to figure out what caused the problem. I stick to the opinion that the new regexp engine must work exactly like the existing one. Most things can be made to work that way. I also thought that this behavior of an alternate branch could be made to work in a DFA, with some effort. And otherwise we would have to fall back to the old engine when there is an alternate branch in the regexp. For what it's worth, I disagree strongly. This behavior is nothing but a bug in the existing implementation - a documented bug, but a bug nonetheless. I don't see why one would call this a bug. It's works this way intentionally. In this particular case, I definitely think that we should strive for compatibility with other regex engines, rather than backwards compatibility with older vim versions. And, since this new regex engine would likely not be introduced until vim 8.0, there is no better time to break backwards compatibility. Since the old guarantee that the leftmost alternation is matched first in fact makes regexes work differently than in Perl, and in every POSIX implementation, and make a DFA-based regex engine harder to write, I think we should consider them as a historical fluke, and any syntax files that rely on them to have been too cozy with the current engine. We should re-write such syntax files, rather than keep a useful improvement out of vim for their sake. No, backwards compatibility is more important. I don't want a newer version of Vim not being used because it breaks something that people can't figure out. A jump from 7.x to 8.x doesn't mean it's allowed to break things. Considering the recent OOXML fuzz I have lowered my appreciation for standards considerably. The Vim regexp engine isn't conforming to a standard and doesn't work like any other engine anyway, so I see no reason to make a detail work that way. It still will work quite different from other engines. -- hundred-and-one symptoms of being an internet addict: 189. You put your e-mail address in the upper left-hand corner of envelopes. /// Bram Moolenaar -- [EMAIL PROTECTED] -- http://www.Moolenaar.net \\\ ///sponsor Vim, vote for features -- http://www.Vim.org/sponsor/ \\\ \\\download, build and distribute -- http://www.A-A-P.org/// \\\help me help AIDS victims -- http://ICCF-Holland.org/// --~--~-~--~~~---~--~~ You received this message from the vim_dev maillist. For more information, visit http://www.vim.org/maillist.php -~--~~~~--~~--~--~---
Re: regexp: does anyone really need ordered alternation?
On Sat, Mar 29, 2008 at 8:14 PM, Bram Moolenaar [EMAIL PROTECTED] wrote: Considering the recent OOXML fuzz I have lowered my appreciation for standards considerably. Considering that much of what people are complaining about regarding OOXML is things that exist in OOXML due to Office's backwards compatibility, this is somewhat ironic. --~--~-~--~~~---~--~~ You received this message from the vim_dev maillist. For more information, visit http://www.vim.org/maillist.php -~--~~~~--~~--~--~---
Re: regexp: does anyone really need ordered alternation?
On Thu, Mar 27, 2008 at 2:47 PM, Bram Moolenaar wrote: Xiaozhou Liu wrote: During the development of the new regexp, one thing confuses me a lot: ordered alternation. (e.g. given r.e. 'ab\|abc' and text 'abc', 'ab' matched, not 'abc') I know that 100% compatibility is one of the project goals. So I try to keep this feature in the new regexp. But the problem is, ordered alternation is kind of 'side effect' of the original back track regexp matcher. AFAIK, It is very hard to implement this feature in the new, truly NFA matcher, if it is not impossible. We can resort to the original regexp when we see '\|', but we don't solve the problem perfectly. So does anyone really need this feature to be kept? If so, please do tell me. For me, the removal of this 'feature' won't break anything. It is close to impossible to check that a change like this doesn't break existing scripts. And when something breaks, e.g. a syntax file, a normal user is very unlikely to be able to figure out what caused the problem. I stick to the opinion that the new regexp engine must work exactly like the existing one. Most things can be made to work that way. I also thought that this behavior of an alternate branch could be made to work in a DFA, with some effort. And otherwise we would have to fall back to the old engine when there is an alternate branch in the regexp. For what it's worth, I disagree strongly. This behavior is nothing but a bug in the existing implementation - a documented bug, but a bug nonetheless. In this particular case, I definitely think that we should strive for compatibility with other regex engines, rather than backwards compatibility with older vim versions. And, since this new regex engine would likely not be introduced until vim 8.0, there is no better time to break backwards compatibility. Since the old guarantee that the leftmost alternation is matched first in fact makes regexes work differently than in Perl, and in every POSIX implementation, and make a DFA-based regex engine harder to write, I think we should consider them as a historical fluke, and any syntax files that rely on them to have been too cozy with the current engine. We should re-write such syntax files, rather than keep a useful improvement out of vim for their sake. ~Matt --~--~-~--~~~---~--~~ You received this message from the vim_dev maillist. For more information, visit http://www.vim.org/maillist.php -~--~~~~--~~--~--~---
Re: regexp: does anyone really need ordered alternation?
On Fri, 28 Mar 2008 02:09:19 -0400, Matt Wozniski [EMAIL PROTECTED] wrote: For what it's worth, I disagree strongly. This behavior is nothing but a bug in the existing implementation - a documented bug, but a bug nonetheless. In this particular case, I definitely think that we should strive for compatibility with other regex engines, rather than backwards compatibility with older vim versions. [snip] Since the old guarantee that the leftmost alternation is matched first in fact makes regexes work differently than in Perl, Many regular expression engines have ordered alternation, including Perl's. Given that alternations do have an order, why force that order to be disregarded? Responding to the ordering of alternatives may mean that the regular expressions aren't pure, but any notion of purity in the regular expression language was abandoned years ago in favour of functionality. -- Matthew Winn --~--~-~--~~~---~--~~ You received this message from the vim_dev maillist. For more information, visit http://www.vim.org/maillist.php -~--~~~~--~~--~--~---
Re: regexp: does anyone really need ordered alternation?
On 28/03/2008, [EMAIL PROTECTED] [EMAIL PROTECTED] wrote: [...] Jeffrey Friedel discusses this in his book Mastering Regular Expressions in chapter 4, section NFA, DFA, and POSIX. [...] Jeffrey writes: If efficiency is an issue with a Traditional NFA (and with backtracking, believe me, it can be), it is doubly so with a POSIX NFA since there can be so much more backtracking. [...] [...] Friedl appears to be a bit confused with his terminology. When he says NFA he appears to mean a backtracking implementation. Xiaozhou Liu is proposing a true NFA implementation which searches for alternatives in parallel without backtracking. --Antony --~--~-~--~~~---~--~~ You received this message from the vim_dev maillist. For more information, visit http://www.vim.org/maillist.php -~--~~~~--~~--~--~---
Re: regexp: does anyone really need ordered alternation?
lurker mode off From user point of view: does this new way of treating \| bring speed gains? \| is one of the most time expensive operators in Vim. If we could have improvement here I would say - go for it. lurker mode on Piotr Żaczek na Pokładzie! Gdynia 03.04.08, godz. 19.00 http://klik.wp.pl/?adr=http%3A%2F%2Fcorto.www.wp.pl%2Fas%2Fzaczek.htmlsid=293 --~--~-~--~~~---~--~~ You received this message from the vim_dev maillist. For more information, visit http://www.vim.org/maillist.php -~--~~~~--~~--~--~---
Re: regexp: does anyone really need ordered alternation?
Interesting selection of languages to try ;-) You may have picked the ones with a common regex code base. Russ Cox's article certainly shows similar performance/behaviour (See section and graph on Performance at http://swtch.com/~rsc/regexp/regexp1.html) I wonder what Tcl and awk do? That's looks like an interesting article, thanks. It doesn't really matter if all those languages share implementation code. I just wanted to say that leftmost-longest is not what many popular languages do. Java doesn't offer a repl, so I can't be bothered to check that ;-) Tcl does do a longest match, Javascript does the same as Python, Ruby and Perl (at least the Spidermonkey implementation; the spec suggests that this is correct ( http://www.ecma-international.org/publications/files/ECMA-ST/Ecma-262.pdf , 15.10.2.3)). nico$ tclsh % regexp {ab|abc} abc matched 1 % puts $matched abc nico$ ~/src/js/spidermonkey/src/Darwin_DBG.OBJ/js js abc.match(/ab|abc/) ab Nico --~--~-~--~~~---~--~~ You received this message from the vim_dev maillist. For more information, visit http://www.vim.org/maillist.php -~--~~~~--~~--~--~---
Re: regexp: does anyone really need ordered alternation?
On 26/03/2008, Xiaozhou Liu [EMAIL PROTECTED] wrote: Hi Vimmers, During the development of the new regexp, one thing confuses me a lot: ordered alternation. (e.g. given r.e. 'ab\|abc' and text 'abc', 'ab' matched, not 'abc') I know that 100% compatibility is one of the project goals. So I try to keep this feature in the new regexp. But the problem is, ordered alternation is kind of 'side effect' of the original back track regexp matcher. AFAIK, It is very hard to implement this feature in the new, truly NFA matcher, if it is not impossible. We can resort to the original regexp when we see '\|', but we don't solve the problem perfectly. I thought Russ Cox had solved this in the code on his website, or am I mistaken? So does anyone really need this feature to be kept? I don't need it, and though I'd prefer the longest match rather than the first alternative (as specified by POSIX) I don't really care too much as long as it is well documented. And since the original vi didn't have alternation, we don't need to worry about compatibility in that regard. If so, please do tell me. For me, the removal of this 'feature' won't break anything. It won't break anything that I use regexps for. But... I know Parsing Expression Grammars can make use of this feature to give precedence to one match over another. You might want to check whether any of the syntax files do something similar. --Antony --~--~-~--~~~---~--~~ You received this message from the vim_dev maillist. For more information, visit http://www.vim.org/maillist.php -~--~~~~--~~--~--~---
Re: regexp: does anyone really need ordered alternation?
Antony Scriven wrote: On 26/03/2008, Xiaozhou Liu [EMAIL PROTECTED] wrote: Hi Vimmers, During the development of the new regexp, one thing confuses me a lot: ordered alternation. (e.g. given r.e. 'ab\|abc' and text 'abc', 'ab' matched, not 'abc') I know that 100% compatibility is one of the project goals. So I try to keep this feature in the new regexp. But the problem is, ordered alternation is kind of 'side effect' of the original back track regexp matcher. AFAIK, It is very hard to implement this feature in the new, truly NFA matcher, if it is not impossible. We can resort to the original regexp when we see '\|', but we don't solve the problem perfectly. I thought Russ Cox had solved this in the code on his website, or am I mistaken? So does anyone really need this feature to be kept? I don't need it, and though I'd prefer the longest match rather than the first alternative (as specified by POSIX) I don't really care too much as long as it is well documented. And since the original vi didn't have alternation, we don't need to worry about compatibility in that regard. An interesting twist. Can you clarify which behaviour POSIX specifies (your sentence above is ambiguous)? If so, please do tell me. For me, the removal of this 'feature' won't break anything. It won't break anything that I use regexps for. But... I know Parsing Expression Grammars can make use of this feature to give precedence to one match over another. You might want to check whether any of the syntax files do something similar. --Antony I'm not sure a syntax file itself should, though it possibly could, because it would be in the ordering of the commands and Vim internal implementation of syntax highlighting that it would be relevant. I imagine the most robust way to test this would be automated testing--get a Vim script to output the synID for each character in a complex source file in each language and compare the output of Vims compiled with each regex engine. Ben. --~--~-~--~~~---~--~~ You received this message from the vim_dev maillist. For more information, visit http://www.vim.org/maillist.php -~--~~~~--~~--~--~---
Re: regexp: does anyone really need ordered alternation?
On 27/03/2008, Ben Schmidt [EMAIL PROTECTED] wrote: Antony Scriven wrote: [...] [...] I'd prefer the longest match rather than the first alternative (as specified by POSIX) [...] An interesting twist. Can you clarify which behaviour POSIX specifies (your sentence above is ambiguous)? You're right, sorry. POSIX specifies longest match. --Antony --~--~-~--~~~---~--~~ You received this message from the vim_dev maillist. For more information, visit http://www.vim.org/maillist.php -~--~~~~--~~--~--~---
Re: regexp: does anyone really need ordered alternation?
On Thu, Mar 27, 2008 at 1:16 PM, Antony Scriven [EMAIL PROTECTED] wrote: On 27/03/2008, Ben Schmidt [EMAIL PROTECTED] wrote: Antony Scriven wrote: I'd prefer the longest match rather than the first alternative (as specified by POSIX) An interesting twist. Can you clarify which behaviour POSIX specifies (your sentence above is ambiguous)? You're right, sorry. POSIX specifies longest match. /left-most/ longest. Big difference. --~--~-~--~~~---~--~~ You received this message from the vim_dev maillist. For more information, visit http://www.vim.org/maillist.php -~--~~~~--~~--~--~---
Re: regexp: does anyone really need ordered alternation?
On 27/03/2008, Nikolai Weibull [EMAIL PROTECTED] wrote: On Thu, Mar 27, 2008 at 1:16 PM, Antony Scriven [EMAIL PROTECTED] wrote: On 27/03/2008, Ben Schmidt [EMAIL PROTECTED] wrote: Antony Scriven wrote: I'd prefer the longest match rather than the first alternative (as specified by POSIX) An interesting twist. Can you clarify which behaviour POSIX specifies (your sentence above is ambiguous)? You're right, sorry. POSIX specifies longest match. /left-most/ longest. Big difference. I thought the `left-most' part was a given and we were discussing which of the alternatives would be subsequently picked; sorry if I was misleading. In this case POSIX specifies that the alternative which gives the longest overall match is picked, whereas backtracking implementations (like that in Vim) are usually designed to pick the first alternative that gives an overall match, even if a different choice would give a longer overall match. I know you know this already, I'm just trying to be clear! Now, I'm still not sure I've managed to do that, so please rephrase my explanation for me if you think it is necessary :-) --Antony --~--~-~--~~~---~--~~ You received this message from the vim_dev maillist. For more information, visit http://www.vim.org/maillist.php -~--~~~~--~~--~--~---
Re: regexp: does anyone really need ordered alternation?
On Thu, Mar 27, 2008 at 5:37 PM, Antony Scriven [EMAIL PROTECTED] wrote: On 27/03/2008, Nikolai Weibull [EMAIL PROTECTED] wrote: /left-most/ longest. Big difference. I thought the `left-most' part was a given and we were discussing which of the alternatives would be subsequently picked; sorry if I was misleading. In this case POSIX specifies that the alternative which gives the longest overall match is picked, whereas backtracking implementations (like that in Vim) are usually designed to pick the first alternative that gives an overall match, even if a different choice would give a longer overall match. I know you know this already, I'm just trying to be clear! Now, I'm still not sure I've managed to do that, so please rephrase my explanation for me if you think it is necessary Don't worry. I was just making it clear to everyone what we were discussing. (And your explanation is fine ;-) --~--~-~--~~~---~--~~ You received this message from the vim_dev maillist. For more information, visit http://www.vim.org/maillist.php -~--~~~~--~~--~--~---
Re: regexp: does anyone really need ordered alternation?
On Thu, Mar 27, 2008 at 8:39 AM, Ben Schmidt [EMAIL PROTECTED] wrote: I prefer the behaviour which I presume you have in your NFA implementation, of preferring longer matches, just as * is greedy by default, so would actually welcome the change. Yes, that indeed is the behavior of the NFA matcher. --~--~-~--~~~---~--~~ You received this message from the vim_dev maillist. For more information, visit http://www.vim.org/maillist.php -~--~~~~--~~--~--~---
Re: regexp: does anyone really need ordered alternation?
On Thu, Mar 27, 2008 at 5:54 PM, Antony Scriven [EMAIL PROTECTED] wrote: I thought Russ Cox had solved this in the code on his website, or am I mistaken? Thanks for the pointer, I'm not aware of this. I'll have a look at the code. So does anyone really need this feature to be kept? I don't need it, and though I'd prefer the longest match rather than the first alternative (as specified by POSIX) I don't really care too much as long as it is well documented. And since the original vi didn't have alternation, we don't need to worry about compatibility in that regard. OK. That's good. If so, please do tell me. For me, the removal of this 'feature' won't break anything. It won't break anything that I use regexps for. But... I know Parsing Expression Grammars can make use of this feature to give precedence to one match over another. You might want to check whether any of the syntax files do something similar. --Antony This sounds like a pretty serious problem. I'll do the checking after I tie up the last few remaining bugs in the NFA matcher. Or better yet, can someone more familiar with the syntax files whether any syntax is dependent on this particular behavior of the regexp engine? Thanks! Xiaozhou --~--~-~--~~~---~--~~ You received this message from the vim_dev maillist. For more information, visit http://www.vim.org/maillist.php -~--~~~~--~~--~--~---
Re: regexp: does anyone really need ordered alternation?
Xiaozhou Liu wrote: During the development of the new regexp, one thing confuses me a lot: ordered alternation. (e.g. given r.e. 'ab\|abc' and text 'abc', 'ab' matched, not 'abc') I know that 100% compatibility is one of the project goals. So I try to keep this feature in the new regexp. But the problem is, ordered alternation is kind of 'side effect' of the original back track regexp matcher. AFAIK, It is very hard to implement this feature in the new, truly NFA matcher, if it is not impossible. We can resort to the original regexp when we see '\|', but we don't solve the problem perfectly. So does anyone really need this feature to be kept? If so, please do tell me. For me, the removal of this 'feature' won't break anything. It is close to impossible to check that a change like this doesn't break existing scripts. And when something breaks, e.g. a syntax file, a normal user is very unlikely to be able to figure out what caused the problem. I stick to the opinion that the new regexp engine must work exactly like the existing one. Most things can be made to work that way. I also thought that this behavior of an alternate branch could be made to work in a DFA, with some effort. And otherwise we would have to fall back to the old engine when there is an alternate branch in the regexp. -- hundred-and-one symptoms of being an internet addict: 174. You know what a listserv is. /// Bram Moolenaar -- [EMAIL PROTECTED] -- http://www.Moolenaar.net \\\ ///sponsor Vim, vote for features -- http://www.Vim.org/sponsor/ \\\ \\\download, build and distribute -- http://www.A-A-P.org/// \\\help me help AIDS victims -- http://ICCF-Holland.org/// --~--~-~--~~~---~--~~ You received this message from the vim_dev maillist. For more information, visit http://www.vim.org/maillist.php -~--~~~~--~~--~--~---
Re: regexp: does anyone really need ordered alternation?
Hi, Bram On Fri, Mar 28, 2008 at 2:47 AM, Bram Moolenaar [EMAIL PROTECTED] wrote: Xiaozhou Liu wrote: During the development of the new regexp, one thing confuses me a lot: ordered alternation. (e.g. given r.e. 'ab\|abc' and text 'abc', 'ab' matched, not 'abc') I know that 100% compatibility is one of the project goals. So I try to keep this feature in the new regexp. But the problem is, ordered alternation is kind of 'side effect' of the original back track regexp matcher. AFAIK, It is very hard to implement this feature in the new, truly NFA matcher, if it is not impossible. We can resort to the original regexp when we see '\|', but we don't solve the problem perfectly. So does anyone really need this feature to be kept? If so, please do tell me. For me, the removal of this 'feature' won't break anything. It is close to impossible to check that a change like this doesn't break existing scripts. And when something breaks, e.g. a syntax file, a normal user is very unlikely to be able to figure out what caused the problem. Yes, I fully understand the importance of compatibility with the old regexp engine. But I brought this issue up because this particular feature is actually a side-effect of the old engine, and also non-compliant with the POSIX standard, as others have pointed out. So I think it is not entirely beyond reason to consider changing existing scripts to be not dependent on ordered alternation. I stick to the opinion that the new regexp engine must work exactly like the existing one. OK, if you think ordered alternation should not be an exception to this rule, then I will proceed to implement this behavior. Most things can be made to work that way. I also thought that this behavior of an alternate branch could be made to work in a DFA, with some effort. Yes, it is doable, but is unnatural for an NFA matcher, and kind of breaks its beauty. And otherwise we would have to fall back to the old engine when there is an alternate branch in the regexp. As Nikolai have pointed out, due to the importance of \|, we should not go with this solution. Regards, Xiaozhou --~--~-~--~~~---~--~~ You received this message from the vim_dev maillist. For more information, visit http://www.vim.org/maillist.php -~--~~~~--~~--~--~---
Re: regexp: does anyone really need ordered alternation?
Xiaozhou Liu wrote: Hi Vimmers, During the development of the new regexp, one thing confuses me a lot: ordered alternation. (e.g. given r.e. 'ab\|abc' and text 'abc', 'ab' matched, not 'abc') I know that 100% compatibility is one of the project goals. So I try to keep this feature in the new regexp. But the problem is, ordered alternation is kind of 'side effect' of the original back track regexp matcher. AFAIK, It is very hard to implement this feature in the new, truly NFA matcher, if it is not impossible. We can resort to the original regexp when we see '\|', but we don't solve the problem perfectly. So does anyone really need this feature to be kept? If so, please do tell me. For me, the removal of this 'feature' won't break anything. I prefer the behaviour which I presume you have in your NFA implementation, of preferring longer matches, just as * is greedy by default, so would actually welcome the change. Ben. --~--~-~--~~~---~--~~ You received this message from the vim_dev maillist. For more information, visit http://www.vim.org/maillist.php -~--~~~~--~~--~--~---