Re: Dealing with Unencodeable Characters

Ken Whistler Thu, 06 Oct 2016 11:33:04 -0700


On 10/6/2016 7:54 AM, Charlotte Buff wrote:

If theoretically I wanted to convert an old Shift JIS document containing emoji to Unicode, how should I ideally handle Shibuya 109?

And the general answer to that is convert to U+FFFD, unless you are doing something specific and know what you are doing ... in which case you can use PUA or insert an image, or whatever else you need to do.
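As a rough illustration of that default, here is a minimal Python sketch, not a definitive converter. It walks the bytes itself rather than relying on errors="replace", because Python's shift_jis codec reports an unknown lead byte as a one-byte error and would then misread the trail byte. The function names, the overrides parameter, and the byte pair F0 40 are assumptions made up for the example; F0 40 is just an arbitrary value in the user-defined range, not the actual carrier code for Shibuya 109.

```python
REPLACEMENT = "\ufffd"  # U+FFFD REPLACEMENT CHARACTER


def is_lead_byte(b):
    """True if b can start a two-byte Shift JIS sequence (including the
    user-defined range that carrier extensions used)."""
    return 0x81 <= b <= 0x9F or 0xE0 <= b <= 0xFC


def decode_sjis_lossy(data, overrides=None):
    """Decode Shift JIS bytes, emitting U+FFFD for anything unmappable.

    overrides is a hypothetical dict of bytes -> str for the case where you
    know what you are doing and want, say, a PUA code point instead.
    """
    overrides = overrides or {}
    out = []
    i = 0
    while i < len(data):
        # Treat lead byte + following byte as one unit, so a single unknown
        # character becomes a single U+FFFD.
        width = 2 if is_lead_byte(data[i]) and i + 1 < len(data) else 1
        chunk = data[i:i + width]
        if chunk in overrides:
            out.append(overrides[chunk])
        else:
            try:
                out.append(chunk.decode("shift_jis"))
            except UnicodeDecodeError:
                out.append(REPLACEMENT)
        i += width
    return "".join(out)


# Illustrative input: two kanji, one unmappable pair, one more kanji.
sample = "渋谷".encode("shift_jis") + b"\xf0\x40" + "駅".encode("shift_jis")
generic = decode_sjis_lossy(sample)
private = decode_sjis_lossy(sample, overrides={b"\xf0\x40": "\ue000"})
```

The PUA mapping in the second call is only meaningful by private agreement between sender and receiver, which is why U+FFFD stays the default. Note that this sketch also swallows the second byte when a valid lead byte is followed by an invalid trail byte; a real converter would want to handle that case more carefully.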

This is not a character *standardization* issue that requires the UTC to come up with a generic interchange solution for every pre-Unicode character encoding of everything that ever was, whether it be some oddball Shift JIS extensions that were omitted in the consensus on encoding the Japanese Carrier Emoji:


http://www.unicode.org/reports/tr51/tr51-7.html#Japanese_Carrier

or other odds and ends from bizarre, dead-end, disused character encodings from a previous generation.

By the way, the biggest ongoing problem we deal with here is the continuing urge to proliferate font-encoded hacks for particular languages and writing systems. The text interchange problems that such schemes pose on an ongoing basis far, far outweigh issues like what to do with a Shibuya 109 emoji, imo.


--Ken
