Dzahn added a comment.

Let me paste my entire Gerrit comment from back then here:
--
So, we are removing x01-x08, x0b, x0c and x0f-x1f, which means the entire control character range [1] except x00, x09, x0a, x0d and x0e, which we keep. These are all C0 control characters [2]; the C1 range (x80-x9f) is not touched.

Let's look at the ones we keep:

x00 = NUL [3]
x09 = (horizontal) tab (\t)
x0a = line feed (\n)
x0d = carriage return (\r)
x0e = shift out (to a different char set) [4]

I'm not sure about NUL and 'shift out', but \t, \n and \r certainly make sense.

Wikipedia says on [5] that "Most of these characters play no explicit role in Unicode text handling. The characters U+0000 <control-0000>, U+0009 <control-0009> (HT), U+000A <control-000A> (LF), U+000D <control-000D> (CR), and U+0085 <control-0085> (CR+LF) are commonly used in text processing as formatting characters."

[1] http://www.utf8-chartable.de/unicode-utf8-table.pl?utf8=string-literal
[2] https://en.wikipedia.org/wiki/C0_and_C1_control_codes
[3] https://en.wikipedia.org/wiki/Null_character
[4] https://en.wikipedia.org/wiki/Shift_Out_and_Shift_In_characters
[5] https://en.wikipedia.org/wiki/Unicode_control_characters#ISO_6429_control_characters_.28C0_and_C1.29
--

TASK DETAIL
https://phabricator.wikimedia.org/T815
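
For illustration only, the ranges described in the comment above can be sketched as a single regular-expression character class. This is not the actual Gerrit change; the names CONTROL_CHARS and strip_control_chars are hypothetical, and Python is used only as a convenient notation.

```python
import re

# Hypothetical sketch of the filter described in the comment, not the real patch.
# Removed: x01-x08, x0b, x0c, x0f-x1f.
# Kept:    x00 (NUL), x09 (\t), x0a (\n), x0d (\r), x0e (shift out).
CONTROL_CHARS = re.compile(r"[\x01-\x08\x0b\x0c\x0f-\x1f]")


def strip_control_chars(text: str) -> str:
    """Return text with the listed C0 control characters removed."""
    return CONTROL_CHARS.sub("", text)
```

With this sketch, tab, line feed and carriage return pass through unchanged, as do NUL and 'shift out'; whether those last two should also be stripped is the open question raised in the comment.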
