Dzahn added a comment.

Let me paste my entire Gerrit comment from back then here:

--
So, we are removing x01-x08, x0b, x0c and x0f-x1f, which means the entire
control character range [1] except for x00, x09, x0a, x0d and x0e, which we
keep. These are all C0 control characters [2]; the C1 range (x80-x9f) is not
among them.

Let's look at the ones we keep:

x00 = NUL [3]
x09 = (horizontal) tab (\t)
x0a = line feed (\n)
x0d = carriage return (\r)
x0e = shift out (to a different character set) [4]

I'm not sure about NUL and 'shift out' but \t, \n and \r certainly make sense.
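
As a rough illustration of the ranges above, here is a minimal Python sketch. It is not the code from the Gerrit change: the names CONTROL_CHARS, KEPT and strip_control_chars are made up for this example, and it only assumes that the filter is a plain character-class removal over the listed ranges.

```python
import re

# Character class built from the ranges listed in the comment:
# x01-x08, x0b, x0c and x0f-x1f are removed.
# (Hypothetical name; not taken from the actual change.)
CONTROL_CHARS = re.compile(r"[\x01-\x08\x0b\x0c\x0f-\x1f]")

# The five characters the comment says are kept:
# NUL, tab, line feed, carriage return, shift out.
KEPT = "\x00\x09\x0a\x0d\x0e"


def strip_control_chars(text: str) -> str:
    """Return text with the listed C0 control characters removed."""
    return CONTROL_CHARS.sub("", text)
```

Running strip_control_chars over every code point from x00 to x1f should leave exactly the five kept characters, which is a quick way to check the ranges against the list above.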

Wikipedia says on [5] that "Most of these characters play no explicit role in 
Unicode text handling. The characters U+0000 <control-0000>, U+0009 
<control-0009> (HT), U+000A <control-000A> (LF), U+000D <control-000D> (CR), 
and U+0085 <control-0085> (CR+LF) are commonly used in text processing as 
formatting characters."

[1] http://www.utf8-chartable.de/unicode-utf8-table.pl?utf8=string-literal
[2] https://en.wikipedia.org/wiki/C0_and_C1_control_codes
[3] https://en.wikipedia.org/wiki/Null_character
[4] https://en.wikipedia.org/wiki/Shift_Out_and_Shift_In_characters
[5] https://en.wikipedia.org/wiki/Unicode_control_characters#ISO_6429_control_characters_.28C0_and_C1.29
--

TASK DETAIL
  https://phabricator.wikimedia.org/T815

To: Dzahn
Cc: wikibugs-l, chasemp, Dzahn, QChris, Aklapper, Qgil


