Re: word segmentation in Vim

Tony Mechelynck Tue, 20 Jan 2009 17:31:16 -0800

On 20/01/09 17:36, Xie wrote:
> hi everybody
>
> Vim is being used around the world, in many different languages. As
> the help indicated, a "word" in Vim is defined as "a sequence of
> letters, digits and underscores ... bla bla bla ...". But that's the
> word for alphabetic languages. Has Vim considered expanding this
> concept to more complex multi-byte languages such as Chinese, Japanese
> or Korean and use some word segmentation algorithm accordingly for
> "w"/"b" etc ?
>
>
> --
> Xie


Well, not only can this "be changed by the 'iskeyword' option" (for
single-byte characters), but Vim also "knows" (for multibyte characters)
that most characters are "word characters", while some (such as
U+3000 IDEOGRAPHIC SPACE, U+3001 IDEOGRAPHIC COMMA and U+3002
IDEOGRAPHIC FULL STOP) are non-word characters.
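
To make the idea concrete, here is a rough sketch in Python (not Vim's
actual code, and not its real character tables) of splitting text into
"words" by character class. The names char_class and split_words are
made up for illustration, and the classification relies on Python's own
Unicode predicates, which is only an approximation of what Vim does.

```python
def char_class(ch: str) -> str:
    """Illustrative classification only; Vim uses its own tables."""
    if ch.isspace():                 # includes U+3000 IDEOGRAPHIC SPACE
        return "blank"
    if ch.isalnum() or ch == "_":    # letters, digits, underscore, ideographs
        return "word"
    return "punct"                   # e.g. U+3001, U+3002


def split_words(text: str) -> list[str]:
    """Group consecutive characters of the same class, dropping blanks."""
    words = []
    current = ""
    current_class = None
    for ch in text:
        cls = char_class(ch)
        # A change of class ends the current run.
        if cls != current_class and current:
            if current_class != "blank":
                words.append(current)
            current = ""
        current += ch
        current_class = cls
    if current and current_class != "blank":
        words.append(current)
    return words


if __name__ == "__main__":
    print(split_words("你好\u3001世界\u3002"))
    print(split_words("你好\u3000世界"))
```

In this sketch a run of ideographs stays together as one "word", and
the ideographic comma and full stop form separate punctuation runs.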

What Vim does _not_ do AFAIK is regard every CJK character as a separate 
"word". If you want that, you should use the commands for "character 
under cursor" etc. rather than "word under cursor" etc.
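
For contrast, treating every CJK character as its own "word" could
look roughly like the Python sketch below. This is purely illustrative
and is not something Vim provides. The function names are hypothetical,
and the range U+4E00..U+9FFF (the CJK Unified Ideographs block) is an
assumption chosen to keep the example short; it ignores kana, Hangul
and the extension blocks.

```python
def is_cjk_ideograph(ch: str) -> bool:
    """True for the basic CJK Unified Ideographs block only (assumption)."""
    return "\u4e00" <= ch <= "\u9fff"


def split_per_ideograph(text: str) -> list[str]:
    """Each ideograph is its own token; other non-blank runs stay together."""
    tokens = []
    run = ""
    for ch in text:
        if is_cjk_ideograph(ch):
            if run:                  # flush any pending non-CJK run
                tokens.append(run)
                run = ""
            tokens.append(ch)
        elif ch.isspace():
            if run:
                tokens.append(run)
                run = ""
        else:
            run += ch
    if run:
        tokens.append(run)
    return tokens


if __name__ == "__main__":
    print(split_per_ideograph("vim 你好"))
```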


Best regards,
Tony.
-- 
A lack of leadership is no substitute for inaction.
