On 30/03/14 09:03, Nikolay Pavlov wrote:


On Mar 30, 2014 3:35 AM, "Dmitry Frank" wrote:
>
> Hello all.
>
> The match() function returns the index of the first match, but if there are multi-byte characters before the first match, each multi-byte character is counted as several characters, so the index comes out wrong.
>
> Say, match("foobar", "bar") returns 3, which is correct. But match("яfoobar", "bar") returns 5, which is wrong (it should be 4).

This is completely correct. What are you going to do with 4? "яfoobar"[4] is "o" (specifically, the second one).


This is only marginally correct, even according to my documentation (7.3.475),
whose description of match() *starts* by talking about characters and *ends* by
talking about bytes, even when referring to the same notions. stridx(),
strpart(), and most other functions talk about bytes from the outset, with no
mention of characters. At minimum, the OP was probably misled by match()'s
description.


> But we surely need to make match() work as expected when &encoding is "utf-8" too.

>

Also col(), string indexing, /\%Nc and so on? Not going to happen, this is an incompatible change.


This kind of flat-refusal mentality gets nobody anywhere.

You can't go touting Vim around as a multilingual editor, filled with features
and settings that handle multi-byte encodings and ISO 10646, if this kind of
English-only support prevails in the script language and prevents you from
processing what the user has input in the first place.

There are so many easy real-life examples I could cherry-pick to show why the
OP's thinking is correct that it isn't funny.

For example, say I'm processing buffer lines or user input in Japanese (the
input language I use) and the first 20 characters are not useful. So you think
I can just go and do this?

    match(szUserInput, szSearchString, 20)

In 8-bit *legacy* encodings, maybe. But in UTF-8? You must be kidding! Here's
what I have as my input:

    "????? ???? ?     ???2????????????????????",

I am looking for "??" in the right-hand portion (character 33). Just how on
earth do I specify the position *in bytes*, as match() expects, of the 20th
*character*? By forcing me, the user, to *binary dump* every string I want to
use in order to extract the byte index? And what if that position has to be
calculated dynamically from previous user or file input? This is typically
necessary, as even whitespace can vary in width in Japanese: an isspace()-like
whitespace test succeeds, but the number of bytes occupied varies.

Incidentally, in the above example, character 20 is the first character of
"??", the word after the larger run of whitespace in the middle. However,
*byte* 20 is the "?" of the 3rd word "???". Thus, the Vim script:

    let szLine = "????? ???? ?     ???2?????????? ??????????"
    let szSearch = input(...)
    ...
    match(szLine, szSearch, 20)

comes back with 24 (byte 24). At minimum, I want it to come back with 79 (the byte index of what I'm looking for), except that there was no easy way to dynamically compute 40, the byte position where the search actually needs to start.
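
For illustration only, here is a minimal sketch of a character-based wrapper,
assuming 'encoding' is utf-8 and that the builtins byteidx() and strchars() are
available in the Vim build in use. MatchChar() is a hypothetical helper name,
not an existing Vim function. It converts the character start position to a
byte position, runs match(), then converts the byte result back to a character
index.

```vim
" Hypothetical helper (not part of Vim): like match(), but {start} and the
" returned index are counted in characters instead of bytes.
" Assumes 'encoding' is utf-8.
function! MatchChar(str, pat, start)
  " byteidx() maps a character index to a byte index (-1 if out of range).
  let l:bstart = byteidx(a:str, a:start)
  if l:bstart < 0
    return -1
  endif
  " match() takes its start position in bytes and returns a byte index.
  let l:bidx = match(a:str, a:pat, l:bstart)
  if l:bidx < 0
    return -1
  endif
  " Count the characters in front of the match to get a character index.
  return strchars(strpart(a:str, 0, l:bidx))
endfunction

" The example from the start of the thread; should echo 4, not 5.
echo MatchChar("яfoobar", "bar", 0)
```

One caveat: byteidx() does not count composing characters separately, while
strchars() with a single argument does, so the two conversions can disagree on
text that contains combining marks.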

This basic lack of support in the script language for multi-lingual features
needs to be addressed, either through new functions or by fixing the existing
ones so that they match the behaviour the user expects when modifying *related*
settings like encoding, fileencoding, etc.




>
> --
> Regards,
> Dmitry
>
> --
> --
> You received this message from the "vim_dev" maillist.
> Do not top-post! Type your reply below the text you are replying to.
> For more information, visit http://www.vim.org/maillist.php
>
> ---
> You received this message because you are subscribed to the Google Groups "vim_dev" group. > To unsubscribe from this group and stop receiving emails from it, send an email to [email protected] <mailto:vim_dev%[email protected]>.
> For more options, visit https://groups.google.com/d/optout.
