On 04/01/09 07:53, pansz wrote:
> Tony Mechelynck wrote:
>> I'm not sure. I suppose that option was defined before Unicode became
>> well-known, maybe even before it existed, when most charsets were of the
>> 8-bit kind except for East-Asian scripts, which required "special" MBCS
>> versions of the OSes anyway (such as MS-DOS 2.25).
>>
>> Once the Unicode standard was published, it included not only mappings
>> of codepoints to glyphs but also quite a lot of metadata about these
>> codepoints (such as wide vs. narrow vs. ambiguous, LTR vs. RTL vs.
>> ambiguous, lower/ upper/ titlecase, punctuation, number systems, etc.).
>> However, Vim versions with -multi_byte must still be supported, and they
>> don't have access to that wealth of meta-information. Also, IIUC it's in
>> the ASCII range that there is most variation between programming
>> languages, operating systems, human languages, etc. concerning which
>> characters may be used in which circumstances.
>
> CJK languages are not in the ASCII range at all, and I bet CJK speakers
> make up more than 30% of the world population. Vim is for programmers,
> but is it _only_ for programmers?

No, but each hanzi (not fullwidth punct) is supposed to be a "word" or 
"word part" of some kind, with punctuation, whitespace and diacritics 
all totally outside the "word" range. "Not" is a word in English, 
regardless of whether it's used alone or in "cannot" or 
"notwithstanding". Those two compounds sound almost Chinese-like to me, 
though I don't really know more than a handful of Chinese words. I suppose that 
if English, like Japanese, used Han-script, "notwithstanding" might be 
written not-against-stay-now with four glyphs? But I'm daydreaming.

>
> The difficulty may be that 'iskeyword' is a whitelist, not a
> blacklist: we cannot easily blacklist a single Unicode character in
> 'iskeyword' without knowing *all* the Unicode characters which match
> iswalpha().

A more important difficulty is that 'iskeyword' applies only to Unicode 
codepoints U+0000 to U+007F when 'encoding' is UTF-8 (or any Unicode 
value aliased to UTF-8 for internal memory), and to characters 0x00 to 
0xFF when it isn't. Otherwise we might perhaps use ":setlocal isk-=不 
isk-=之" or some such. This would also mean several arrays of 2 gigabits 
rather than 256 bits to remember the settings (Vim treats the Unicode 
range as 0 to 0x7FFFFFFF; even if it limited itself to the current 
official maximum of U+10FFFF it would still mean a big increase).
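
To put rough numbers on that, here is a small Python sketch (not Vim code; the function name bitmap_bytes is made up for illustration). It assumes one flag bit per character value, which is how I read the 256-bit figure above, and computes the size of a single such table for each range:

```python
def bitmap_bytes(max_value):
    """Bytes needed to hold one flag bit for every value in 0..max_value."""
    bits = max_value + 1
    return (bits + 7) // 8  # round up to whole bytes


# Ranges discussed above; the labels are only descriptive.
ranges = [
    ("8-bit charset, 0x00 to 0xFF", 0xFF),
    ("Unicode codespace, up to U+10FFFF", 0x10FFFF),
    ("31-bit range, 0 to 0x7FFFFFFF", 0x7FFFFFFF),
]

for label, top in ranges:
    print(f"{label}: {bitmap_bytes(top)} bytes per table")
```

So one table goes from 32 bytes to about 136 KiB for the official Unicode range, or 256 MiB for the full 31-bit range, and several such tables would be needed.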

>
> Perhaps the simplest approach is to add an option 'isnkeyword' which
> supports any Unicode character, so that we can blacklist some Unicode
> characters while still keeping the 'iskeyword' option functioning.

Hm. Don't know if Bram would accept that, but you can always try to 
publish (and maintain) an unofficial patch to the C source. Don't know 
how easy (and foolproof) it would be. For a single option, a has() 
feature might be useful but it's less needed than for a whole batch of 
them: we would always be able to test ":if exists('+isnkeyword')".
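
For what it's worth, the logic of such a blacklist could look roughly like the following Python sketch. This is only a guess at the intended semantics, not anything in Vim: 'isnkeyword' does not exist, the function and variable names are invented, and Python's str.isalpha() merely stands in for iswalpha(). ASCII characters are looked up in a whitelist, as 'iskeyword' does today; anything above that counts as a keyword character if it is alphabetic and not blacklisted.

```python
import string

# Stand-in for an 'iskeyword'-style whitelist over the ASCII range (assumption).
ASCII_WHITELIST = set(string.ascii_letters + string.digits + "_")


def is_keyword_char(ch, whitelist=ASCII_WHITELIST, blacklist=frozenset()):
    """Return True if ch would count as a keyword character in this sketch."""
    if ord(ch) < 0x80:
        # ASCII: decided by the whitelist, like the existing option.
        return ch in whitelist
    # Non-ASCII: alphabetic by default, unless explicitly blacklisted
    # (the role the proposed 'isnkeyword' option would play).
    return ch.isalpha() and ch not in blacklist


# Hypothetical blacklist, using the two hanzi from the example above.
blacklist = {"不", "之"}
for ch in ["a", "-", "中", "不", "，"]:
    print(repr(ch), is_keyword_char(ch, blacklist=blacklist))
```

The point of the split is that the blacklist only needs to store the few excluded characters, so no table covering the whole Unicode range is required.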


Best regards,
Tony.
-- 
A truly wise man never plays leapfrog with a unicorn.

--~--~---------~--~----~------------~-------~--~----~
You received this message from the "vim_use" maillist.
For more information, visit http://www.vim.org/maillist.php
-~----------~----~----~----~------~----~------~--~---
