Andre Sihera <[email protected]> wrote:

> On 01/11/15 00:01, mattn wrote:
>>>
>>> Thanks.  Is there any corner case where we would need a few more bytes
>>> than MAXPATHL?
>>
>> In UTF-8, a character takes at most 4 bytes. So MAXPATHL * 4.
>>
> No, the maximum length of a UTF-8 character is 6 bytes, as that is the
> maximum required to encode all characters in ISO10646. The currently
> defined character space only uses 4 bytes but new characters are always
> being added.
>
> Note that ISO10646 is *not* a linear space. New characters can be
> added anywhere in the space, including the very last character at the
> top end (0xFFFFFFFF).
>
> We don't want to be changing this every time new characters are added
> to the ISO standard, and it's hardly an issue of memory, so just set it to
> the maximum from the start.

No, Unicode is limited to U+10FFFF. Yes, the original UTF-8 scheme could
encode more, but RFC 3629 restricts UTF-8 to U+10FFFF, so the maximum
allowed sequence length is 4 bytes.
See:

https://en.wikipedia.org/wiki/UTF-8
http://stackoverflow.com/questions/5924105/how-many-characters-can-be-mapped-with-unicode

According to https://en.wikipedia.org/wiki/Unicode, Unicode 8.0 (the latest)
currently defines 120,737 characters, out of 1,114,112 code points in the
range U+0000..U+10FFFF. So there are still plenty of available code points.

Having said that, a file may contain invalid UTF-8 with sequences longer
than 4 bytes. Such sequences should be treated as errors and must not
crash Vim.

Dominique

-- 
You received this message from the "vim_dev" maillist.
Do not top-post! Type your reply below the text you are replying to.
For more information, visit http://www.vim.org/maillist.php

--- 
You received this message because you are subscribed to the Google Groups 
"vim_dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
For more options, visit https://groups.google.com/d/optout.
