On Sun, Mar 26, 2017 at 5:58 PM, Chris Angelico <ros...@gmail.com> wrote:
>> The Windows console can render any character in the BMP, but it
>> requires configuring font linking for fallback fonts. It's Windows, so
>> of course the supported UTF format is UTF-16. The console's UTF-8
>> support (codepage 65001) is too buggy to even consider using it.
>
> Is it actually UTF-16, or is it UCS-2?

Pedantically speaking it's UCS-2. Console buffers aren't necessarily
valid UTF-16, i.e. they can contain lone surrogate codes or invalid
surrogate pairs. How a surrogate code gets rendered depends on the
font: it could be an empty box, a box containing a question mark, or
simply empty space. That applies even to a valid UTF-16 surrogate
pair, so the console can't display non-BMP characters such as emoji.
Such characters can still be copied to the clipboard and displayed in
another program.
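
To make the UCS-2 vs. UTF-16 distinction concrete, here's a minimal
sketch in plain Python. It has nothing to do with the console API
itself, and the helper names to_units and is_valid_utf16 are made up
for illustration. It just checks whether a sequence of 16-bit code
units is well-formed UTF-16:

```python
def to_units(text):
    """Split a str into 16-bit code units (hypothetical helper).

    'surrogatepass' lets lone surrogates through instead of raising.
    """
    data = text.encode("utf-16-le", "surrogatepass")
    return [int.from_bytes(data[i:i + 2], "little")
            for i in range(0, len(data), 2)]


def is_valid_utf16(units):
    """Return True if every surrogate is part of a high+low pair."""
    i = 0
    while i < len(units):
        unit = units[i]
        if 0xD800 <= unit <= 0xDBFF:
            # A high surrogate must be followed by a low surrogate.
            if i + 1 < len(units) and 0xDC00 <= units[i + 1] <= 0xDFFF:
                i += 2
                continue
            return False
        if 0xDC00 <= unit <= 0xDFFF:
            # A low surrogate with no preceding high surrogate.
            return False
        i += 1
    return True


emoji = to_units("\U0001F600")   # non-BMP, stored as a surrogate pair
lone = to_units("a\ud83db")      # high surrogate with no partner

print([hex(u) for u in emoji], is_valid_utf16(emoji))
print([hex(u) for u in lone], is_valid_utf16(lone))
```

Both sequences are acceptable as UCS-2-style buffers of 16-bit
values, but only the first one is valid UTF-16.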

Windows file systems are also UCS-2. For the most part it's not an
issue since the source of text and filenames will be valid UTF-16.
-- 
https://mail.python.org/mailman/listinfo/python-list
