Andre Sihera <[email protected]> wrote:

> On 01/11/15 00:01, mattn wrote:
>>>
>>> Thanks. Is there any corner case where we would need a few more bytes
>>> than MAXPATHL?
>>
>> In utf-8, max bytes of letter should be 4. So MAXPATHL * 4.
>>
> No, the maximum length of a UTF-8 character is 6 bytes, as that is the
> maximum required to encode all characters in ISO10646. The currently
> defined character space only uses 4 bytes but new characters are always
> being added.
>
> Note that ISO10646 is *not* a linear space. New characters can be
> added anywhere in the space, including the very last character at the
> top end (0xFFFFFFFF).
>
> We don't want to be changing this every time new characters are added
> to the ISO standard, and its hardly an issue of memory, so just set to the
> maximum from the start.

No, Unicode is limited to U+10FFFF. Yes, UTF-8 could encode more, but it's
not allowed. So the maximum allowed sequence size is 4 bytes. See:

https://en.wikipedia.org/wiki/UTF-8
http://stackoverflow.com/questions/5924105/how-many-characters-can-be-mapped-with-unicode

According to https://en.wikipedia.org/wiki/Unicode, Unicode 8.0 (the latest)
currently defines 120,737 characters. So there are still plenty of available
code points.

Having said that, a file may contain invalid UTF-8 with sequences longer
than 4 bytes. These should be treated as errors and should not crash Vim.

Dominique
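
To illustrate the 4-byte limit and the "treat longer sequences as errors"
point, here is a minimal sketch in C. It is not Vim's actual code:
utf8_valid_len() and UTF8_MAX_BYTES are hypothetical names made up for this
example. It assumes the usual rules for well-formed UTF-8 (no overlong
forms, no surrogates, nothing above U+10FFFF) and returns 0 for anything
else, including the old 5- and 6-byte lead bytes.

```c
#include <stddef.h>

/* Hypothetical constant: longest well-formed UTF-8 sequence, in bytes. */
enum { UTF8_MAX_BYTES = 4 };

/*
 * Hypothetical helper (not part of Vim): return the length (1..4) of the
 * well-formed UTF-8 sequence at the start of s[0..n-1], or 0 if the bytes
 * there are not valid UTF-8.
 */
int utf8_valid_len(const unsigned char *s, size_t n)
{
    unsigned long cp = 0;   /* decoded code point */
    int len = 0;
    int i;

    if (n == 0)
        return 0;
    if (s[0] < 0x80)
        return 1;                                   /* ASCII */
    else if ((s[0] & 0xE0) == 0xC0) { len = 2; cp = s[0] & 0x1F; }
    else if ((s[0] & 0xF0) == 0xE0) { len = 3; cp = s[0] & 0x0F; }
    else if ((s[0] & 0xF8) == 0xF0) { len = 4; cp = s[0] & 0x07; }
    else
        return 0;   /* stray continuation byte, or a 5/6-byte lead, 0xFE, 0xFF */

    if ((size_t)len > n)
        return 0;                                   /* truncated sequence */

    for (i = 1; i < len; i++) {
        if ((s[i] & 0xC0) != 0x80)
            return 0;                               /* not a continuation byte */
        cp = (cp << 6) | (unsigned long)(s[i] & 0x3F);
    }

    /* Reject overlong encodings. */
    if ((len == 2 && cp < 0x80) || (len == 3 && cp < 0x800)
            || (len == 4 && cp < 0x10000))
        return 0;
    /* Reject UTF-16 surrogates and anything beyond the Unicode range. */
    if ((cp >= 0xD800 && cp <= 0xDFFF) || cp > 0x10FFFF)
        return 0;

    return len;
}
```

With a check along these lines, a buffer of MAXPATHL * UTF8_MAX_BYTES bytes
would be enough for MAXPATHL valid characters, and invalid input would have
to be handled by the caller (for example byte by byte) instead of being
decoded as a longer sequence.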
