> I do think we should use some kind of Unicode-standard-endorsed
> definition of "printable" (as long as it excludes all ASCII escapes),

I think

  unicodedata.category(c)[0] != "C"

is fairly close. That excludes control characters (Cc), format
characters (Cf), surrogates (Cs), private-use characters (Co) and
unassigned characters (Cn). We should then also escape \, ' and ",
following the traditional algorithm.
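
As a rough sketch of how that could fit together (the names
is_printable and escape are made up for illustration, and the choice
of \x, \u and \U escapes for everything non-printable is an assumption,
not a settled design):

```python
import unicodedata

def is_printable(c):
    # Printable = anything outside the major class "C"
    # (Cc, Cf, Cs, Co and Cn all start with "C").
    return unicodedata.category(c)[0] != "C"

# Characters that are always escaped, even though they are printable.
ALWAYS_ESCAPED = "\\'\""

def escape(s):
    # Hypothetical repr-like helper: wraps s in single quotes and
    # escapes backslash, both quote characters and every character
    # for which is_printable() is false.
    out = []
    for c in s:
        code = ord(c)
        if c in ALWAYS_ESCAPED:
            out.append("\\" + c)
        elif is_printable(c):
            out.append(c)
        elif code < 0x100:
            out.append("\\x%02x" % code)
        elif code < 0x10000:
            out.append("\\u%04x" % code)
        else:
            out.append("\\U%08x" % code)
    return "'" + "".join(out) + "'"
```

With this, ordinary non-ASCII letters pass through unchanged, while
all ASCII control characters (which are Cc) still get escaped.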

Printable would then be all letters, numbers, punctuation and symbols,
but also marks (e.g. COMBINING TILDE, COMBINING RIGHT HARPOON ABOVE) and
separators (SPACE, NO-BREAK SPACE, THREE-PER-EM SPACE, LINE SEPARATOR,
PARAGRAPH SEPARATOR). It might be reasonable to also exclude the line
separator (Zl) and paragraph separator (Zp) categories, each of which
contains only one character.
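
To check which category a given character falls into, and to see what
the stricter variant might look like, something along these lines
should do (is_printable_strict is a made-up name, and excluding Zl and
Zp is only the option floated above, not a decision):

```python
import unicodedata

# A combining mark, another combining mark, and the separators
# mentioned above.
examples = ["\u0303", "\u20d1", " ", "\u00a0", "\u2004", "\u2028", "\u2029"]

for c in examples:
    # Prints code point, general category and character name.
    print("U+%04X %s %s" % (ord(c), unicodedata.category(c),
                            unicodedata.name(c)))

def is_printable_strict(c):
    # Same as the "not in class C" test, but additionally treats
    # LINE SEPARATOR (Zl) and PARAGRAPH SEPARATOR (Zp) as
    # non-printable. Space separators (Zs) remain printable.
    cat = unicodedata.category(c)
    return cat[0] != "C" and cat not in ("Zl", "Zp")
```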

Regards,
Martin
_______________________________________________
Python-3000 mailing list
Python-3000@python.org
http://mail.python.org/mailman/listinfo/python-3000