Re: [vim/vim] Search for Russian letter range `[а-яА-Я ]` misses the letters `ё` and `Ё` (#1751)

2017-11-14 Thread Marvin Renich
* Marvin Renich  [171114 14:57]:
> It suggests using [[:lower:][:upper:]] to do something close to what you
> want (it will also find non-Russian letters).  The help does not mention
> any character class that includes exactly Russian letters, so the best
> you are going to be able to do is [А-яЁё].

If encoding is cp1251, [[:alpha:]] might work (i.e. find ASCII and
Russian letters, including Ё and ё), since that is an 8-bit encoding,
but I haven't tried it.

...Marvin

-- 
-- 
You received this message from the "vim_dev" maillist.
Do not top-post! Type your reply below the text you are replying to.
For more information, visit http://www.vim.org/maillist.php

--- 
You received this message because you are subscribed to the Google Groups 
"vim_dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to vim_dev+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: [vim/vim] Search for Russian letter range `[а-яА-Я ]` misses the letters `ё` and `Ё` (#1751)

2017-11-14 Thread Marvin Renich
* sergeevabc  [171114 08:15]:
> @10110111, stumbled upon your comment accidentally and decided to test on my 
> end.
> ```
> $ set LC_ALL=ru_RU.utf8
> 
> $ grep --version
> grep (GNU grep) 3.0
> 
> $ echo Ёжик под зелёной ёлкой. | grep --color "[а-яА-Я ]"
> Ёжик под зелёной ёлкой.
> ^   ^^^
> ```
> Ё, ё and . are not painted red.

In vim patterns, [a-z] is a character range, not a character class.  It
specifically searches for characters whose code values are within the
range.  Ё and ё are outside the range [а-яА-Я ] for both cp1251 and
utf-8.
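
As a rough check of the cp1251 half of that claim, here is a small
Python sketch (not Vim code; the helper names cp1251_byte and
in_byte_range are made up for illustration).  It only compares byte
values and says nothing about how Vim's regexp engine walks a range:

```python
# Illustrative sketch: where do Ё and ё sit relative to the byte span
# А..я in the single-byte cp1251 encoding?

def cp1251_byte(ch):
    """Byte value of one character in cp1251 (hypothetical helper)."""
    return ch.encode("cp1251")[0]

def in_byte_range(ch, low, high):
    """True if ch's cp1251 byte lies between those of low and high."""
    return cp1251_byte(low) <= cp1251_byte(ch) <= cp1251_byte(high)

# Escapes are used so the letters are unambiguous:
# U+0401 Ё, U+0451 ё, U+0410 А, U+044F я.
IO_UPPER, IO_LOWER, A_UPPER, YA_LOWER = "\u0401", "\u0451", "\u0410", "\u044F"

inside = {ch: in_byte_range(ch, A_UPPER, YA_LOWER) for ch in (IO_UPPER, IO_LOWER)}

for ch in (IO_UPPER, A_UPPER, YA_LOWER, IO_LOWER):
    print(ch, hex(cp1251_byte(ch)))
print(inside)
```

In cp1251 the letters А through я occupy bytes 0xC0 through 0xFF, while
Ё and ё are 0xA8 and 0xB8, so both fall below the span.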

If you read :help /collection and go down to the discussion of
character classes, you will see that it lists [:alpha:], [:lower:],
and [:upper:], among others.  It also says:

  These items only work for 8-bit characters, except [:lower:] and
  [:upper:] also work for multi-byte characters when using the new
  regexp engine.

It suggests using [[:lower:][:upper:]] to do something close to what you
want (it will also find non-Russian letters).  The help does not mention
any character class that includes exactly Russian letters, so the best
you are going to be able to do is [А-яЁё].
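
To illustrate the difference between the two bracket expressions, here
is a sketch using Python's re module.  This is an assumption-laden
stand-in, not Vim's regexp engine: Python also treats a range as a span
of code points, which is the behavior described above.  The helper name
unmatched is made up, and a space is added to the second collection so
that both treat blanks the same way:

```python
import re
import unicodedata

# NFC makes sure ё and й are single precomposed code points.
text = unicodedata.normalize("NFC", "Ёжик под зелёной ёлкой.")

def unmatched(collection, s):
    """Characters of s that the given bracket expression does not match."""
    return [ch for ch in s if not re.fullmatch(collection, ch)]

# The range from the original report: misses Ё and ё.
narrow = unmatched(r"[а-яА-Я ]", text)

# The collection suggested above, plus a space.
wide = unmatched(r"[А-яЁё ]", text)

print(narrow)
print(wide)
```

With the first collection, Ё, both ё and the final period are left
over; with the second, only the period is.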

Vim's regexp engine is working as defined; the fact that Unicode and
cp1251 do not have all the Russian alphabetic characters in a single
range is the issue.  You could request that a character class be added
to do what you want; if you can also provide a patch, that would
significantly increase the chance that the feature would be added.

...Marvin



Re: [vim/vim] Search for Russian letter range `[а-яА-Я ]` misses the letters `ё` and `Ё` (#1751)

2017-11-14 Thread Tony Mechelynck
This (not catching Ёё with [А-Яа-я]) is expected when $LC_COLLATE is a
locale with no knowledge of Cyrillic alphabetization, for instance C:

Ё U+0401 CYRILLIC CAPITAL LETTER IO
А U+0410 CYRILLIC CAPITAL LETTER A
Я U+042F CYRILLIC CAPITAL LETTER YA
а U+0430 CYRILLIC SMALL LETTER A
я U+044F CYRILLIC SMALL LETTER YA
ё U+0451 CYRILLIC SMALL LETTER IO

As you can see, Cyrillic Ё and ё are outside the range [А-Яа-я]. This
is why, under ":help /[]", the pattern [А-яЁё] is mentioned to catch
all (Russian) Cyrillic letters.
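
The table above can be reproduced with Python's unicodedata module.
This is only a sketch to make the code point comparison concrete; the
helper name in_plain_range is hypothetical:

```python
import unicodedata

# U+0401 Ё, U+0410 А, U+042F Я, U+0430 а, U+044F я, U+0451 ё
letters = "\u0401\u0410\u042F\u0430\u044F\u0451"

rows = [(ch, ord(ch), unicodedata.name(ch)) for ch in letters]
for ch, code_point, name in rows:
    print(f"{ch} U+{code_point:04X} {name}")

def in_plain_range(ch):
    """Membership in [А-Яа-я] when ranges are compared by code point."""
    cp = ord(ch)
    return 0x0410 <= cp <= 0x042F or 0x0430 <= cp <= 0x044F

outside = [ch for ch in letters if not in_plain_range(ch)]
print(outside)
```

Only Ё (U+0401) and ё (U+0451) end up in the outside list.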

OTOH, with $LC_COLLATE set to some Cyrillic locale (and assuming Vim
takes it into consideration, about which I'm not sure), Ё and ё sort
together with Е and е, between Дд and Жж, so they would be included.
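
A minimal sketch of what range membership by collation order, rather
than by code point, could look like.  The hand-written alphabet string
below stands in for a real Cyrillic locale and treats ё as its own
letter between е and ж; it is an assumption for illustration, not how
libc, grep, or Vim implement collation:

```python
import unicodedata

# Hypothetical stand-in for a Russian collation table (lowercase only).
RUSSIAN_LOWER = unicodedata.normalize(
    "NFC", "абвгдеёжзийклмнопрстуфхцчшщъыьэюя"
)
RANK = {ch: i for i, ch in enumerate(RUSSIAN_LOWER)}

def alphabet_key(word):
    """Sort key following the alphabet above; unknown characters go last."""
    word = unicodedata.normalize("NFC", word).lower()
    return [RANK.get(ch, len(RANK)) for ch in word]

def in_collation_range(ch, low, high):
    """True if ch sorts between low and high in the alphabet above."""
    ch = unicodedata.normalize("NFC", ch).lower()
    return ch in RANK and RANK[low] <= RANK[ch] <= RANK[high]

words = [unicodedata.normalize("NFC", w) for w in ["жук", "ёж", "дом", "ель"]]

by_code_point = sorted(words)                 # ё (U+0451) sorts after я
by_alphabet = sorted(words, key=alphabet_key)  # ё sorts between е and ж

print(by_code_point)
print(by_alphabet)
print(in_collation_range("\u0451", "а", "я"))
```

Sorted by code point, "ёж" comes last; sorted by the alphabet key it
lands between "ель" and "жук", and ё counts as inside the span а..я.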

Best regards,
Tony.

On Tue, Nov 14, 2017 at 2:15 PM, sergeevabc  wrote:
> @10110111, stumbled upon your comment accidentally and decided to test on my
> end.
>
> $ set LC_ALL=ru_RU.utf8
>
> $ grep --version
> grep (GNU grep) 3.0
>
> $ echo Ёжик под зелёной ёлкой. | grep --color "[а-яА-Я ]"
> Ёжик под зелёной ёлкой.
> ^   ^^^
>
> Ё, ё and . are not painted red.


Re: [vim/vim] Search for Russian letter range `[а-яА-Я ]` misses the letters `ё` and `Ё` (#1751)

2017-06-07 Thread Nikolay Aleksandrovich Pavlov
2017-06-07 16:47 GMT+03:00 Ruslan Kabatsayev :
>> Not correct, not semantically. What Vim does is deliberately ignoring
>> LC_COLLATE while grep acts according to this category which puts “ё”
>> between “е” and “ж” like in the Russian alphabet.
>
> I'm not really sure that LC_COLLATE or even any LC_* influences this. Even
> if I unset all LC_* variables and set LC_ALL=en_US.UTF-8, I still get the
> expected behavior of grep and sed.

You need to set LC_COLLATE to a locale that itself uses Cyrillic
letters *but has a different alphabet* (specifically, one that does not
have “ё” between “а” and “я”). I do not know how libc authors determine
the defaults for scripts foreign to a locale, but most likely they took
DUCET (Default Unicode Collation Element Table) and altered only the
locales that needed it. That is the intended use of DUCET, after all
(from http://unicode.org/reports/tr10/):

> Instead, the goal of DUCET is to provide a reasonable default ordering for 
> all scripts that are not tailored. Any characters used in the language of 
> primary interest for collation are expected to be tailored to meet all the 
> appropriate linguistic requirements for that language. For example, for a 
> user interested primarily in the Malayalam language, DUCET would be tailored 
> to get all details correct for the expected Malayalam collation order, while 
> leaving other characters (Greek, Cyrillic, Han, and so forth) in the default 
> order, because the order of those other characters is not of primary concern. 
> Conversely, a user interested primarily in the Greek language would use a 
> Greek-specific tailoring, while leaving the Malayalam (and other) characters 
> in their default order in the table.



Re: [vim/vim] Search for Russian letter range `[а-яА-Я ]` misses the letters `ё` and `Ё` (#1751)

2017-06-07 Thread Nikolay Aleksandrovich Pavlov
2017-06-07 14:17 GMT+03:00 Christian Brabandt :
> Hm, ё is 'ё' U+0451 Dec:1105 CYRILLIC SMALL LETTER IO (io), while я is
> 'я' U+044F Dec:1103 CYRILLIC SMALL LETTER YA (ja).
> Also Ё is 'Ё' U+0401 Dec:1025 CYRILLIC CAPITAL LETTER IO (IO), while А is
> 'А' U+0410 Dec:1040 CYRILLIC CAPITAL LETTER A (A=).
>
> So both letters are clearly out of your range. I would say Vim is correct
> here.

Not correct, not semantically. What Vim does is deliberately ignoring
LC_COLLATE while grep acts according to this category which puts “ё”
between “е” and “ж” like in the Russian alphabet.
