Rauli Ruohonen writes:

> On 6/3/07, "Martin v. Löwis" <[EMAIL PROTECTED]> wrote:
> > Sure - but how can Python tell whether a non-normalized string was
> > intentionally put into the source, or as a side effect of the editor
> > modifying it?
>
> It can't, but does it really need to? It could always assume the latter.

No, it can't.  One might want to write Python code that implements
normalization algorithms, for example, and there will be "binary
strings".  Only in the context of Unicode text are you allowed to do
those things.  This would require Python to internally distinguish
between Unicode text files and other files.

[example of a dictionary application using Unicode strings]

> Now if these are written by two different people using different
> editors, one might be normalized in a different way than the other,
> and the code would look all right but mysteriously fail to work.

It seems to me that once we have a proper separation between bytes
objects and unicode objects, the latter should always be compared
internally to the dictionary using the kinds of techniques described
in UTS#10 and UTR#30.  External normalization is not the right way to
handle this issue.

> But a partial solution is better than no solution.

Not if it leads to unexpected failures that are hard to diagnose,
especially in the face of human belief that this problem has been
"solved".

> The line ending there is '\r\n', and Python normalizes it when
> reading in the source code, even though '\r\n' matters even less
> than doing NFC normalization.

That's not a Python language normalization; that's an artifact of the
line-reading function.  It's deliberate, of course, but it's not
really a character-level transformation; it's a line-level one.  If I
start up an interpreter and type

>>> a = """^V^M^V^J"""
>>> repr(a)
"'\\r\\n'"

(On my Mac; on other systems the quoting character for key entry of
control characters is probably different.)

_______________________________________________
Python-3000 mailing list
Python-3000@python.org
http://mail.python.org/mailman/listinfo/python-3000
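
To make the dictionary failure discussed above concrete, here is a minimal sketch of two spellings of the same visible character missing each other as dictionary keys. The names `nfc`, `nfd`, `table` and the helper `nfc_key` are hypothetical and exist only for this illustration. The thread argues for comparison along the lines of UTS#10; the sketch instead uses `unicodedata.normalize` at lookup time as a simple stand-in, so it shows the symptom and one possible internal comparison, not the full technique.

```python
import unicodedata

# Two spellings of the same visible character "é".
nfc = "\u00e9"    # precomposed: LATIN SMALL LETTER E WITH ACUTE
nfd = "e\u0301"   # decomposed: 'e' followed by COMBINING ACUTE ACCENT

# Hypothetical dictionary application: the key was typed in one editor...
table = {nfc: "found"}

# ...and the lookup string came from another. The strings look alike on
# screen but are different code point sequences, so the lookup misses.
plain_hit = nfd in table


def nfc_key(s):
    """Hypothetical helper: canonical form used only for comparison."""
    return unicodedata.normalize("NFC", s)


# Comparing through a canonical form at lookup time (an assumption standing
# in for UTS#10-style comparison) makes the two spellings meet, without
# rewriting the strings as they appear in the source.
normalized_table = {nfc_key(k): v for k, v in table.items()}
normalized_hit = nfc_key(nfd) in normalized_table

print(plain_hit, normalized_hit)
```

Note that the original strings are left untouched here; only the comparison goes through a canonical form, which keeps code that deliberately handles non-normalized data working.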