Hi Chris,

On 20 November 2018 03:39:02 GMT+05:30, Chris Koerner <[email protected]> wrote:
>
>== Did you know? ==

Thanks for the informative "Did you know?" section. It was an interesting read. :-)

>* Letters are encoded internally by computers as numbers: for example,
>“A” is 65 and “a” is 97.[3] Years ago, programs and even websites
>would use different encodings[4] to represent text, often leading to
>unreadable gibberish on screen. Unicode[5] was intended to be a single
>encoding for most of the world’s writing systems. The most-used parts
>of it fit into a 16-bit representation,[6] which can handle about 65
>thousand characters. But that's not enough for the very large number
>of rare and historical Chinese, Japanese, and Korean (CJK) characters,
>which are represented in 16-bit Unicode using “surrogate pairs”.[7]
>1,024 Unicode characters are set aside to be “high surrogates” (the
>first half of a 32-bit character) and 1,024 characters are set aside
>to be “low surrogates” (the second half). By themselves, the surrogates
>aren’t valid and don’t represent anything, but in pairs they can
>represent over a million additional characters. Since these characters
>are usually rare, software can sometimes treat them incorrectly and
>split them up, which can result in you seeing the Unicode replacement
>character �,[8] which is used when something has gone wrong processing
>Unicode text. (When the character is fine, but you don’t have a font
>to show it, you sometimes get little squares instead. Since the most
>common source of these squares for English speakers is unrepresented
>CJK characters, a slang term for the squares is “tofu”.[9])
>
>[0] https://phabricator.wikimedia.org/T168427
>[1] https://phabricator.wikimedia.org/T209293
>[2] https://phabricator.wikimedia.org/T209156
>[3] https://en.wikipedia.org/wiki/ASCII#Printable_characters
>[4]
>https://en.wikipedia.org/wiki/Character_encoding#Common_character_encodings
>[5] https://en.wikipedia.org/wiki/Unicode
>[6] https://en.wikipedia.org/wiki/UTF-16
>[7]
>https://en.wikipedia.org/wiki/Universal_Character_Set_characters#Surrogates
>[8]
>https://en.wikipedia.org/wiki/Specials_(Unicode_block)#Replacement_character
>[9] https://en.wiktionary.org/wiki/tofu#Noun
>

-- 
Sivaraam

Sent from my Android device with K-9 Mail. Please excuse my brevity.

_______________________________________________
Wikitech-l mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/wikitech-l
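
The surrogate-pair arithmetic described in the quoted section can be sketched in a few lines of Python. This is only an illustration and not part of the original newsletter: the function names `to_surrogate_pair` and `from_surrogate_pair` are made up for this example, and U+20000 (the first code point of CJK Unified Ideographs Extension B) is just a convenient test value.

```python
from typing import Tuple

# The two reserved ranges of 1,024 code points each.
HIGH_START, HIGH_END = 0xD800, 0xDBFF  # "high surrogates" (first half)
LOW_START, LOW_END = 0xDC00, 0xDFFF    # "low surrogates" (second half)


def to_surrogate_pair(code_point: int) -> Tuple[int, int]:
    """Split a code point above U+FFFF into (high, low) 16-bit units.

    Hypothetical helper name, used here for illustration only.
    """
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("code point does not need a surrogate pair")
    offset = code_point - 0x10000        # leaves a 20-bit value
    high = HIGH_START + (offset >> 10)   # top 10 bits
    low = LOW_START + (offset & 0x3FF)   # bottom 10 bits
    return high, low


def from_surrogate_pair(high: int, low: int) -> int:
    """Recombine a (high, low) pair into the original code point."""
    if not (HIGH_START <= high <= HIGH_END and LOW_START <= low <= LOW_END):
        raise ValueError("not a valid high/low surrogate pair")
    return 0x10000 + ((high - HIGH_START) << 10) + (low - LOW_START)


# 1,024 high surrogates times 1,024 low surrogates: "over a million".
pair_count = (HIGH_END - HIGH_START + 1) * (LOW_END - LOW_START + 1)

high, low = to_surrogate_pair(0x20000)

# Python's built-in UTF-16 encoder produces the same two 16-bit units.
utf16 = chr(0x20000).encode("utf-16-be")

# Splitting the pair leaves a lone high surrogate. Decoding it with
# errors="replace" substitutes the replacement character U+FFFD.
broken = utf16[:2].decode("utf-16-be", errors="replace")

print(hex(high), hex(low), pair_count, repr(broken))
```

The last few lines mimic what happens when software cuts a string between the two halves of a pair: the leftover half is not valid on its own, so a tolerant decoder shows the replacement character instead.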
