Re: Feedback new regex engine - more utf-8 and an esoteric \NULL case?

Bram Moolenaar Tue, 21 May 2013 03:05:19 -0700

Marc Weber wrote:

> 1) That the new regex *silently* fails if something is not supported is
>   not an option - you should throw an error IMHO so that people know that
>   something goes wrong.


That happens (you get an error) if you set 'regexpengine' to 2.

> 2) https://gist.github.com/MarcWeber/5616733
> I've created an unfinished QuickCheck script to compare the old and the
> new engine - however because the "new engine" is not documented other
> than "should work on most syntax files" and "does not implement
> everything" I'm not sure what to include in that text
> 
> Summary: it looks promising. I expected to find more issues.

More testing is always good.  I have been testing with a bunch of
syntax-highlighted files.  That turned up a few problems, including one
in the old engine!  It's not sufficient though.

> It found these cases behaving differently:
> 
> The first [] is always the regex, the second is the string to match
> against (using matchall).
> 
> 1)
>   RegexTests [\_F] ["\NULa"]
>   \NUL is the 0 byte - which is read by readfile() (not using 'b' flag)
> 
>   new: ['a', '', '', '', '', '', '', '', '', '']
>   old: ['^@', '', '', '', '', '', '', '', '', ''] 

I fixed \F yesterday.  Hmm, but you say you include patch 981.
What is the Vim command to reproduce this?  The newline character should
represent a NUL.  However, a string cannot contain a NUL.
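
A minimal sketch of how such a reproduction might look, assuming the
test string reaches Vim through a file.  The temp file and the s:
variable names are made up for illustration.  It relies on writefile()
writing a NL inside a list item as a NUL byte, and on readfile()
(without the 'b' flag) turning a NUL byte back into a NL:

```vim
" Hypothetical reproduction sketch, to be sourced as a script.
let s:fname = tempname()

" A NL inside a list item is written as a NUL byte; the 'b' flag avoids
" appending a trailing newline, so the file holds exactly NUL + 'a'.
call writefile(["\na"], s:fname, 'b')

" Reading without 'b': the NUL byte comes back as a NL inside the item.
let s:lines = readfile(s:fname)
call delete(s:fname)

" Run the same pattern through the old (1) and the new (2) engine.
echo matchlist(s:lines[0], '\%#=1\_F')
echo matchlist(s:lines[0], '\%#=2\_F')
```

If the two echo lines differ, the difference is in the engines and not
in how the file was read.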

> 2) and all the others: they seem to be utf-8 related
>   echo '1' =~ '\%#=1\o{\?Ä\Z' 
>   echo '1' =~ '\%#=2\o{\?Ä\Z' 

Yes, that looks like a bug.

> From what I tested I got no segfault, and most generated tests seem to
> pass. Please note that I consider "new engine finding something which is
> not implemented" and "old engine does not parse regex" success.
> 
> Another test is this: (first is regex, second is the string to match
> against):
>   [ú\Z] [""]
> 
>   I cannot reproduce this using such viml code only:
> 
>   let reg = 'ú\Z'
>   let t = ""
>   echo matchlist('\%#=1'.reg, t)
>   echo matchlist('\%#=2'.reg, t)
> 
>   setting t to '1' however causes the difference
> 
>   result of matchlist:
>     new: ['', '', '', '', '', '', '', '', '', '']
>     old: []  
> 
>   So maybe with t="" this is a readfile related issue, too?

Isn't this the same for the old and the new engine?
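
One thing to check in the snippet quoted above: matchlist() takes the
text first and the pattern second, matchlist({expr}, {pat}).  A small
sketch of a comparison helper with that argument order follows.  The
function name CompareEngines is hypothetical, and the script is assumed
to be saved as utf-8:

```vim
scriptencoding utf-8

" Hypothetical helper: run one pattern through both regexp engines and
" return both matchlist() results plus whether they are equal.
function! CompareEngines(pat, text) abort
  " matchlist({expr}, {pat}): the text comes first, the pattern second.
  let l:old_res = matchlist(a:text, '\%#=1' . a:pat)
  let l:new_res = matchlist(a:text, '\%#=2' . a:pat)
  return {'old': l:old_res, 'new': l:new_res, 'same': l:old_res ==# l:new_res}
endfunction

echo CompareEngines('ú\Z', '')
echo CompareEngines('ú\Z', '1')
```

If a pattern is rejected by one engine, the function aborts with that
error instead of returning, so a test driver would need a try/catch
around the call.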


> RegexTests [\p\+] ["\236a"]
>   new: ['a', '', '', '', '', '', '', '', '', '']
>   old: ['ìa', '', '', '', '', '', '', '', '', ''] 
> again a utf-8 issue as well as this:
> RegexTests [¤\|\Z] ["a"]

Yes, apparently \236 is not seen as a printable character.

> Tested with version:  Included patches: 1-981 
> 
> I'm not sure what is influencing vim's utf-8 handling?
> I have &encoding=utf-8 set.

Need more testing...

> client-server communication fails every 500 times or so - no bytes
> are returned.

Timing issue?

-- 
JOHN CLEESE PLAYED: SECOND SOLDIER WITH A KEEN INTEREST IN BIRDS, LARGE MAN
                    WITH DEAD BODY, BLACK KNIGHT, MR NEWT (A VILLAGE
                    BLACKSMITH INTERESTED IN BURNING WITCHES), A QUITE
                    EXTRAORDINARILY RUDE FRENCHMAN, TIM THE WIZARD, SIR
                    LAUNCELOT
                 "Monty Python and the Holy Grail" PYTHON (MONTY) PICTURES LTD

 /// Bram Moolenaar -- [email protected] -- http://www.Moolenaar.net   \\\
///        sponsor Vim, vote for features -- http://www.Vim.org/sponsor/ \\\
\\\  an exciting new programming language -- http://www.Zimbu.org        ///
 \\\            help me help AIDS victims -- http://ICCF-Holland.org    ///

