> I do think we should use some kind of Unicode-standard-endorsed
> definition of "printable" (as long as it excludes all ASCII escapes),

I think unicodedata.category(c)[0] != "C" is fairly close. That excludes control characters (Cc), format characters (Cf), surrogates (Cs), private-use characters (Co) and unassigned characters (Cn). We should then also escape \, ' and ", following the traditional algorithm.

Printable would then be all letters, numbers, punctuation and symbols, but also marks (e.g. COMBINING TILDE, COMBINING RIGHT HARPOON ABOVE) and separators (SPACE, NO-BREAK SPACE, THREE-PER-EM SPACE, LINE SEPARATOR, PARAGRAPH SEPARATOR).

It might be reasonable to also exclude the line separator (Zl) and paragraph separator (Zp) categories, each of which contains only one character.

Regards,
Martin
_______________________________________________
Python-3000 mailing list
Python-3000@python.org
http://mail.python.org/mailman/listinfo/python-3000
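
For illustration, here is a minimal sketch of the rule described above, not a proposed implementation. The names is_printable and escape and the exclude_line_separators flag are hypothetical. It escapes \, ' and " unconditionally and always emits \x, \u or \U hex escapes for non-printable characters, which is an assumption made for brevity; the traditional algorithm also uses named escapes such as \n and only escapes the enclosing quote.

```python
import unicodedata


def is_printable(ch, exclude_line_separators=True):
    """Return True if ch counts as printable under the category test.

    Hypothetical helper: anything in a "C" category (Cc, Cf, Cs, Co, Cn)
    is not printable. Optionally the Zl and Zp separators are excluded too.
    """
    cat = unicodedata.category(ch)
    if cat[0] == "C":
        return False
    if exclude_line_separators and cat in ("Zl", "Zp"):
        return False
    return True


def escape(s):
    """Return a single-quoted, escaped rendering of s (hypothetical sketch)."""
    out = []
    for ch in s:
        if ch in ("\\", "'", '"'):
            # Backslash and both quote characters are always escaped here.
            out.append("\\" + ch)
        elif is_printable(ch):
            out.append(ch)
        else:
            # Non-printable: pick the shortest hex escape that fits.
            cp = ord(ch)
            if cp <= 0xFF:
                out.append("\\x%02x" % cp)
            elif cp <= 0xFFFF:
                out.append("\\u%04x" % cp)
            else:
                out.append("\\U%08x" % cp)
    return "'" + "".join(out) + "'"
```

With this sketch, NO-BREAK SPACE and combining marks pass through unchanged, while LINE SEPARATOR is written as \u2028 unless exclude_line_separators is set to False in is_printable.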