Ozaki Kiichi wrote:

> I have some questions about vim_iswordc() and vim_iswordp().
> 
> [1] I understand that a Unicode character can be passed as the argument of
> vim_iswordc() (and vim_iswordc_buf()).
> If so, I think the following must hold. Is this correct?
> 
> premise:
> 
> * 'encoding' is "utf-8"
> * "c" is an int holding a Unicode character (U+NNNNNNNN)
> * "p" is a (char_u *) pointing to the UTF-8 sequence for "c"
> 
> (e.g. c == 0x100 <=> p == "\xC4\x80")
> 
> then: vim_iswordc(c) == vim_iswordp(p) is TRUE
> 
> 
> [2] If [1] is correct, it is a problem that there is a character such that
> vim_iswordc(c) != vim_iswordp(p) with specific 'iskeyword'.
> 
> when c == 0xA0, p == "\xC2\xA0", and 'iskeyword' includes 160
> 
> (e.g. iskeyword=@,48-57,_,128-167,224-235 (default on windows))
> 
> then: vim_iswordc(c) == 1, but vim_iswordp(p) == 0.
> 
> because:
> 
> In vim_iswordc_buf();
> 
> https://github.com/vim/vim/blob/342156637/src/charset.c#L911
> 
> c == 0xA0 < 0x100
> 
> https://github.com/vim/vim/blob/342156637/src/charset.c#L911
> 
> c > 0 and c < 0x100, and c is in buf->b_chartab
> (GET_CHARTAB(buf, c) != 0).
> 
> Thus, vim_iswordc_buf(c, buf) is TRUE.
> 
> In vim_iswordp_buf();
> 
> https://github.com/vim/vim/blob/342156637/src/charset.c#L921
> 
> has_mbyte is TRUE and MB_BYTE2LEN(*p) == 2. (*p == 0xC2)
> 
> https://github.com/vim/vim/blob/342156637/src/charset.c#L922
> https://github.com/vim/vim/blob/342156637/src/mbyte.c#L898
> 
> calls mb_get_class(p) -> utf_class(utf_ptr2char(p)).
> utf_ptr2char(p) == 0xA0, is equal to c, so utf_class(0xA0).
> 
> https://github.com/vim/vim/blob/342156637/src/mbyte.c#L2781
> 
> c < 0x100 and c == 0xa0, then return 0.
> 
> Thus, vim_iswordp_buf(p, buf) is FALSE.
> 
> 
> [3] This does not have much to do with the above;
> 
> It appears that
> * vim_iswordp_buf() should use mb_get_class_buf()
> * vim_iswordp() should return vim_iswordp_buf(p, curbuf)
> 
> https://gist.github.com/ichizok/1be6efa8364777cf167003e2c5676d8f
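
For reference, the byte sequences used in the premise and in [2] follow
directly from the two-byte UTF-8 encoding.  Below is a minimal standalone
sketch, not Vim code: encode_utf8_short(), decode_utf8_short() and
check_premise_examples() are hypothetical helper names, and only code
points below 0x800 are handled.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of Vim: encode a code point below 0x800
 * as UTF-8.  Returns the number of bytes written (1 or 2), or 0 when c is
 * outside the range handled by this sketch. */
size_t encode_utf8_short(unsigned c, unsigned char *out)
{
    if (c < 0x80) {
        out[0] = (unsigned char)c;
        return 1;
    }
    if (c < 0x800) {
        out[0] = (unsigned char)(0xC0 | (c >> 6));    /* 110xxxxx */
        out[1] = (unsigned char)(0x80 | (c & 0x3F));  /* 10xxxxxx */
        return 2;
    }
    return 0;
}

/* Inverse for 1- and 2-byte sequences; assumes well-formed input. */
unsigned decode_utf8_short(const unsigned char *p)
{
    if (p[0] < 0x80)
        return p[0];
    return ((unsigned)(p[0] & 0x1F) << 6) | (unsigned)(p[1] & 0x3F);
}

/* The two pairs quoted above: 0x100 <=> C4 80 and 0xA0 <=> C2 A0. */
void check_premise_examples(void)
{
    unsigned char buf[2];

    assert(encode_utf8_short(0x100, buf) == 2);
    assert(buf[0] == 0xC4 && buf[1] == 0x80);
    assert(encode_utf8_short(0xA0, buf) == 2);
    assert(buf[0] == 0xC2 && buf[1] == 0xA0);
    assert(decode_utf8_short(buf) == 0xA0);
}
```

Calling check_premise_examples() from a small main() exercises both pairs.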

Thanks for looking into this.  The patch seems to be missing a change to
mbyte.c.  mb_get_class_buf() needs to be adjusted to use
vim_iswordc_buf() when the character is below 256.
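
Roughly, the adjustment could look like the standalone model below.  This
is only a sketch under assumptions, not the actual mbyte.c code:
model_buf, model_set_range(), model_iswordc_buf() and
model_get_class_buf() are made-up stand-ins, the class numbering (0 blank,
1 punctuation, 2 word) is assumed, and characters from 0x100 up are just
given a placeholder class.

```c
#include <stdbool.h>

/* Stand-in for a buffer's 'iskeyword' table (b_chartab in Vim), covering
 * only characters below 0x100.  Hypothetical type for this sketch. */
typedef struct {
    bool is_word[256];
} model_buf;

/* Mark the inclusive range lo..hi as keyword characters, similar to a
 * "lo-hi" item in 'iskeyword'. */
void model_set_range(model_buf *buf, int lo, int hi)
{
    for (int c = lo; c <= hi && c < 256; c++)
        if (c >= 0)
            buf->is_word[c] = true;
}

/* Stand-in for vim_iswordc_buf(), restricted to c < 0x100. */
bool model_iswordc_buf(int c, const model_buf *buf)
{
    return c > 0 && c < 0x100 && buf->is_word[c];
}

/* Buffer-aware class lookup.  Assumed classes: 0 blank, 1 punctuation,
 * 2 word.  Below 0x100 the table of the buffer passed in is consulted,
 * not that of the current buffer. */
int model_get_class_buf(int c, const model_buf *buf)
{
    if (c < 0x100) {
        /* Mirrors the quoted analysis, where utf_class() returns 0 for
         * 0xa0.  Whether this special case should stay is not decided
         * here. */
        if (c == 0xa0)
            return 0;
        return model_iswordc_buf(c, buf) ? 2 : 1;
    }
    return 2;   /* placeholder: wider characters are not modelled */
}
```

With two model buffers whose tables differ, the class of a character below
256 then depends on the buffer passed in.  In this model 0xA0 still gets
class 0 even when the table marks it as a keyword character.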

We also should have a test for this.


-- 
% cat /usr/include/life.h
void life(void);

 /// Bram Moolenaar -- [email protected] -- http://www.Moolenaar.net   \\\
///        sponsor Vim, vote for features -- http://www.Vim.org/sponsor/ \\\
\\\  an exciting new programming language -- http://www.Zimbu.org        ///
 \\\            help me help AIDS victims -- http://ICCF-Holland.org    ///
