Nikolay Pavlov wrote:

> 2016-01-03 3:53 GMT+03:00 Random832 <[email protected]>:
> 
> > mrosic (Vim Github Repository) <[email protected]> writes:
> > > "\a" is an alphabetic character [A-Za-z] Unfortunately that doesn't
> > > seem to include "ä" which makes Austrian sysadmins sad because they
> > > don't have syntax highlighting for Log files in January due to that
> > > error Note that for example echo -e "ä\na" | grep [a-z] will match
> > > both lines because unlike vim, grep considers "ä" to be a part of
> > > [a-z] which is exactly how it should be
> >
> >
> > That's a somewhat dangerous way of thinking. [a-z] is _not_ "all
> > lowercase characters" in any definition. For example, "ž" isn't in a-z
> > in most locales. In many locales, all but one of the basic 26 uppercase
> > letters are in a-z (typically A is not included, since it is before a;
> > occasionally this is reversed and Z is excluded instead).
> >
> > Certainly ä wouldn't be in a-z in Swedish. In this day and age it's
> > really not "safe" to use ranges as character classes in normal locales
> > (they're safe, as far as it goes, in the POSIX locale, but obviously
> > won't include non-ASCII characters). It is irritating that [[:alpha:]]
> > doesn't work with unicode characters, though.
> >
> 
> Help explicitly says that character classes like [[:alpha:]] work only for
> 8-bit characters. I am actually surprised with [[:lower:][:upper:]]
> matching ä (also Russian а and Greek α) in a UTF-8 locale because all three
> are not 8-bit characters at once in any locale.

The old regex engine had limits on what [] could contain.  It basically
made a list of all possible characters.  Thus it was limited to 8-bit
characters.
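
As a rough illustration of that limitation (a Python sketch, not Vim's
actual C code; build_class_table and table_match are hypothetical names,
and Python's str.isalpha merely stands in for whatever test the engine
applied), a class expanded into a fixed table of 8-bit values can never
match anything outside that table:

```python
def build_class_table(predicate):
    """Expand a character class into one boolean slot per 8-bit value."""
    return [predicate(chr(code)) for code in range(256)]


def table_match(table, ch):
    """Look a character up in the table; anything past its end cannot match."""
    code = ord(ch)
    return code < len(table) and table[code]


# Hypothetical stand-in for an "alphabetic" class.
alpha_table = build_class_table(str.isalpha)

print(table_match(alpha_table, "a"))   # inside the table
print(table_match(alpha_table, "α"))   # U+03B1 lies outside the 256 slots
```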

The new engine can do this properly.  However, for [:alpha:] it still
checks that the character is between 1 and 255.  For [:lower:] and
[:upper:] there is no such check, which is why those two match
characters such as ä, а and α.
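
A minimal sketch of that asymmetry (Python, not the engine's real code;
match_alpha, match_lower and match_upper are hypothetical names, and
Python's Unicode-aware str methods are an assumed stand-in for the
checks Vim performs):

```python
def match_alpha(ch):
    """[:alpha:] as described: guarded by a 1..255 range check."""
    code = ord(ch)
    return 1 <= code <= 255 and ch.isalpha()


def match_lower(ch):
    """[:lower:] as described: no range check."""
    return ch.islower()


def match_upper(ch):
    """[:upper:] as described: no range check."""
    return ch.isupper()


# Greek alpha (U+03B1) and Cyrillic a (U+0430) are both above 255.
for ch in ("α", "а"):
    print(ch, match_alpha(ch), match_lower(ch))
```

With the guard in place, the alphabetic test rejects both characters
while the lowercase test accepts them.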

While migrating from the old to the new regexp engine we wanted them to
behave exactly the same, so that we could be sure nothing broke.  But now
that the new engine works well (except for a few bugs), we might take
advantage of what it can do and implement character classes for
multi-byte characters.

Alternatively, we could also change this for the old engine, but that's
not going to be easy.

-- 
User:       I'm having problems with my text editor.
Help desk:  Which editor are you using?
User:       I don't know, but it's version VI (pronounced: 6).
Help desk:  Oh, then you should upgrade to version VIM (pronounced: 994).

 /// Bram Moolenaar -- [email protected] -- http://www.Moolenaar.net   \\\
///        sponsor Vim, vote for features -- http://www.Vim.org/sponsor/ \\\
\\\  an exciting new programming language -- http://www.Zimbu.org        ///
 \\\            help me help AIDS victims -- http://ICCF-Holland.org    ///
