On 30/03/14 09:03, Nikolay Pavlov wrote:
On Mar 30, 2014 3:35 AM, "Dmitry Frank" <[email protected]> wrote:
>
> Hello all.
>
> The match() function returns the index of the first match, but if there
> are multi-byte chars before the first match, each multi-byte char is
> counted as several chars, so the index becomes wrong.
>
> Say, match("foobar", "bar") returns 3, which is correct. But
> match("яfoobar", "bar") returns 5, which is wrong (should be 4).
This is completely correct. What are you going to do with 4?
"яfoobar"[4] is "o" (specifically, second one).
This is only marginally correct, even according to my documentation
(7.3.475), which *starts* by talking about characters and *ends* by
talking about bytes, even when referring to the same notions. stridx(),
strpart(), and most other functions talk about bytes from the outset,
with no mention of characters. At minimum, the OP was probably misled by
match()'s description.
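
The two numbers under discussion (5 and 4) can be reproduced with a short
script. This is only a sketch: it assumes 'encoding' is utf-8, that the file
is saved as UTF-8, and that strchars() is available; the s: variable names
are made up for illustration.

```vim
" Sketch only: assumes 'encoding' is utf-8 and this file is saved as UTF-8.
scriptencoding utf-8

let s:text = "яfoobar"

" match() reports a byte index: "я" occupies two bytes in UTF-8.
let s:byte_idx = match(s:text, "bar")

" Count the characters in front of the match to get a character index.
let s:char_idx = strchars(strpart(s:text, 0, s:byte_idx))

echo s:byte_idx   " 5
echo s:char_idx   " 4
echo s:text[4]    " o  (string indexing is by byte as well)
```

Source the file (for example with :source %) to see the three values. Note
that strchars() counts composing characters separately, so the character
index may differ from what a user perceives for text that contains them.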
> But we surely need to make match() work as expected when &encoding
> is "utf-8" too.
>
Also col(), string indexing, /\%Nc and so on? Not going to happen; this
is an incompatible change.
This kind of flat-refusal mentality gets nobody anywhere.
You can't go touting Vim around as a multilingual editor, and fill it
with lots of features and settings that handle multi-byte encodings and
ISO-10646 support, if this kind of English-only support prevails in the
script language and prevents you from processing what the user has input
in the first place.
There are so many easy real-life examples I could cherry-pick to show why
the OP's thinking is correct that it isn't funny.
For example, say in Japanese (the input language I use) I'm processing
buffer lines or user input where the first 20 characters are not useful.
So you think I can go and just do this?
match(szUserInput, szSearchString, 20)
In 8-bit *legacy* encodings, maybe. But in UTF-8? You must be kidding!
Here's what I have as my input:
"????? ???? ? ???2????????????????????",
I am looking for "??" in the right hand portion (character 33). Just how
on earth do I specify the position *in bytes*, as match() expects, of the
20th *character*? By forcing me, the user, to *binary dump* every string
I want to use in order to extract the byte index? What if that position
has to be calculated dynamically based on previous user/file input? (This
is typically necessary, as even whitespace can vary in width in Japanese,
meaning an isspace()-like whitespace test succeeds but the number of bytes
occupied varies.)
Incidentally, in the above example, character 20 is the first character
of "??", the word after the larger whitespace portion in the middle.
However, *byte* 20 is the "?" of the 3rd word "???". Thus, the Vim script:
let szLine = "????? ???? ? ???2?????????? ??????????"
let szSearch = input(...)
...
echo match(szLine, szSearch, 20)
comes back with 24 (byte 24). At minimum, I want it to come back with 79
(the byte index of what I'm looking for), except that there was no easy
way to dynamically compute 40, the byte position where the search
actually needs to start.
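
For anyone hitting the same wall, one possible workaround is to convert the
character offset to a byte offset with byteidx() before calling match(). The
sketch below assumes 'encoding' is utf-8 and that byteidx() and strchars()
are present in the Vim build at hand. MatchFromChar() and MatchCharIdx() are
hypothetical helper names, not existing Vim functions, and this does not
change what match() itself returns.

```vim
" Sketch only: assumes 'encoding' is utf-8 and this file is saved as UTF-8.
scriptencoding utf-8

" Hypothetical helper: like match(), but start_char counts characters.
" Still returns a byte index, or -1 when there is no match.
function! MatchFromChar(text, pat, start_char)
  " byteidx() gives the byte offset of the start_char'th character,
  " or -1 when the string has fewer characters than that.
  let l:start_byte = byteidx(a:text, a:start_char)
  if l:start_byte < 0
    return -1
  endif
  return match(a:text, a:pat, l:start_byte)
endfunction

" Hypothetical helper: same search, but the result counts characters.
function! MatchCharIdx(text, pat, start_char)
  let l:byte_idx = MatchFromChar(a:text, a:pat, a:start_char)
  if l:byte_idx < 0
    return -1
  endif
  return strchars(strpart(a:text, 0, l:byte_idx))
endfunction

echo MatchFromChar("яяяfoobar", "bar", 3)   " 9 (byte index)
echo MatchCharIdx("яяяfoobar", "bar", 3)    " 6 (character index)
```

One caveat: byteidx() folds composing characters into the preceding base
character, while strchars() counts them separately, so the two helpers can
disagree on input that contains composing characters.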
This basic lack of support in the script language for multilingual
features needs to be addressed, either through new functions or by fixing
the existing ones so that they match the behaviour the user expects when
modifying *related* settings like encoding, fileencoding, etc.
>
> --
> Regards,
> Dmitry
>
> --
> --
> You received this message from the "vim_dev" maillist.
> Do not top-post! Type your reply below the text you are replying to.
> For more information, visit http://www.vim.org/maillist.php
>
> ---
> You received this message because you are subscribed to the Google
Groups "vim_dev" group.
> To unsubscribe from this group and stop receiving emails from it,
send an email to [email protected]
<mailto:vim_dev%[email protected]>.
> For more options, visit https://groups.google.com/d/optout.