Matt Wozniski wrote:
> Bram Moolenaar wrote:
> >
> > Tony Mechelynck wrote:
> >
> >> Vim is now capable of displaying any Unicode codepoint for which the
> >> installed 'guifont' has a glyph, even outside the BMP (i.e., even above
> >> U+FFFF), but there's no easy way to represent those "high" codepoints by
> >> Unicode value in strings: I mean, "\uxxxx" and "\Uxxxx" still accept no
> >> more than four hex digits.
> >>
> >> I propose to keep "\uxxxx" at its present meaning, but extend
> >> "\Uxxxxxxxx" to allow additional hex digits (either up to a total of 8
> >> hex digits, in line with ^VUxxxxxxxx as opposed to ^Vuxxxx in Insert
> >> mode, or at least up to the value \U10FFFF, above which the Unicode
> >> Consortium has decided that "there never shall be a valid Unicode
> >> codepoint at any future time").
> >
> > It does cause problems for something like "\U12345" which would now be
> > the character 0x1234 followed by the character 5.  After the change it
> > would become one character 0x12345.
> >
> > I don't see a convenient alternative though.  Anyone?
>
> Well, I don't know about *convenient*, but one option would be to
> continue allowing \u to use 1-to-4 hex digits, and require that \U use
> exactly 8 (or exactly 6, if we only support up to \U10FFFF) hex
> digits.  On the one hand, it will break just about every existing
> place where someone used \U instead of \u.  On the other hand, the fix
> is trivial, and it gives an actual reason for supporting both \u and
> \U.  I think it's better than the alternative you propose, since
> changing the definition from "1-to-4 hex digits" to "1-to-8 hex
> digits" will cause things to fail in non-obvious ways, and changing
> the definition to "exactly 8 hex digits" should usually cause a more
> obvious failure that we could assign a helpful error number to.

Requiring exactly 8 hex digits helps with the incompatibility.  However,
Unicode code points need at most 6 hex digits, so one would have to type
at least two extra leading zeros.
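
To make the three candidate rules for "\U" concrete, here is a rough
Python sketch.  It is only a model of the behaviours discussed in this
thread, not Vim's actual parser; the function name decode_U and its
parameters max_digits and exact are made up for illustration.

```python
HEX_DIGITS = "0123456789abcdefABCDEF"

def decode_U(text, max_digits=4, exact=False):
    """Expand \\U escapes in text (hypothetical model, not Vim's code).

    max_digits: largest number of hex digits consumed after \\U.
    exact:      if True, require exactly max_digits hex digits.
    """
    out = []
    i = 0
    while i < len(text):
        if text.startswith("\\U", i):
            start = i + 2
            j = start
            # Greedily take hex digits, but never more than max_digits.
            while j < len(text) and j - start < max_digits and text[j] in HEX_DIGITS:
                j += 1
            digits = text[start:j]
            if exact and len(digits) != max_digits:
                raise ValueError("\\U needs exactly %d hex digits" % max_digits)
            if digits:
                value = int(digits, 16)
                if value > 0x10FFFF:
                    raise ValueError("code point above U+10FFFF")
                out.append(chr(value))
                i = j
                continue
        # Not an escape (or no digits followed): copy the character as-is.
        out.append(text[i])
        i += 1
    return "".join(out)

# The same literal under the three rules:
four = decode_U(r"\U12345")                 # 1-to-4 digits: U+1234 then "5"
eight = decode_U(r"\U12345", max_digits=8)  # 1-to-8 digits: one char U+12345
# decode_U(r"\U12345", max_digits=8, exact=True) raises ValueError
```

With the 1-to-8 rule the result silently changes from two characters to
one, while the exactly-8 rule fails loudly, which is the trade-off
described above.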
It's also easy to type the wrong number of digits with such a long
sequence.

The other suggestion about Perl gives me this idea: "\x(123456)".  This
has two advantages:
1. It's backwards compatible.
2. Avoids accidentally typing the wrong number of hex digits.
3. Allows typing a hex digit next as a separate character.

Eh, _three_ advantages.

I think Perl uses "\x{123456}", but () is easier to type than {},
especially on some keyboards.  Don't see a reason to use {}.

-- 
Not too long ago, unzipping in public was illegal...

 /// Bram Moolenaar -- b...@moolenaar.net -- http://www.Moolenaar.net   \\\
///        sponsor Vim, vote for features -- http://www.Vim.org/sponsor/ \\\
\\\        download, build and distribute -- http://www.A-A-P.org        ///
 \\\            help me help AIDS victims -- http://ICCF-Holland.org    ///
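
For illustration, the "\x(123456)" form proposed above could be
recognised roughly as follows.  This is a Python sketch under the
assumption that the parentheses enclose one or more hex digits; the
names PAREN_ESCAPE and expand_paren_escapes are hypothetical and nothing
here reflects an actual Vim implementation.

```python
import re

# Hypothetical syntax: backslash, "x", "(", one or more hex digits, ")".
PAREN_ESCAPE = re.compile(r"\\x\(([0-9A-Fa-f]+)\)")

def expand_paren_escapes(text):
    """Replace each \\x(HEX) with the character it names."""
    def replace(match):
        value = int(match.group(1), 16)
        if value > 0x10FFFF:
            raise ValueError("code point above U+10FFFF")
        return chr(value)
    return PAREN_ESCAPE.sub(replace, text)

# The closing parenthesis ends the number, so a hex digit that follows
# stays a separate character (advantage 3 in the list above).
result = expand_paren_escapes(r"\x(12345)6")   # U+12345 followed by "6"
```

Because the digits are delimited, their count no longer matters: both
"\x(41)" and "\x(000041)" would name the same character in this sketch.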