Greg Ewing writes:

> But an FF or VT is not *just* a line break, it can
> have other semantics attached to it as well. So
> treating it just the same as a \n by default would be
> wrong, I think.

*Python* does the right thing: it leaves the line break character(s) in place. It's not Python's problem if programmers go around stripping characters just because they happen to be at the end of the line. If you do care, you're already in trouble if you strip willy-nilly:

    >>> len("a\014\n")
    3
    >>> len("a\014\n".strip())
    1
    >>> len("a\014\n".strip() + "\n")
    2
    >>> "a\r\n"[:-1]
    'a\r'

I think the odds are really good that there are already more people who will expect Python to be Unicode-ly correct than who have already-defined semantics for FF or VT that just happen to work right if you strip the terminating LF but not a terminating FF.

The remaining issue, embedding those characters in the interior of lines but considering them not line breaks, is considered by the Unicode technical committee a non-issue. Those characters are mandatory breaks because the expectation is *very* consistent (they say). I gather you think it's reasonable, too; you just worry that the additional semantics may get lost with current newline-stripping heuristics.

As far as existing programs that will go postal if you hand them a line that's terminated with FF or VT, I don't see any conceptual problem with a codec (universal newline) that on input of "a\014" returns "a\014\n". Getting the details right (i.e., respecting POLA) will require some thought and maybe some fiddly options, but it will work.

Always-do-right-it-will-gratify-some-people-and-astonish-the-rest-ly y'rs
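
A rough sketch of the "a\014" -> "a\014\n" idea, for concreteness. It is only an illustration, not a proposed codec: the function name with_trailing_newline is hypothetical, and leaving CR, CRLF and the other Unicode line boundaries untouched is an assumption made to keep it short. It relies on str.splitlines(), which in current Python 3 treats FF and VT as line boundaries.

```python
# FF and VT end a line but carry extra meaning, so keep them and
# append an LF instead of replacing them.
EXTRA_BREAKS = ("\x0b", "\x0c")  # VT, FF


def with_trailing_newline(text):
    """Split text into lines; lines ending in FF or VT also get a "\n".

    Hypothetical helper: other line endings are passed through as-is.
    """
    lines = []
    for line in text.splitlines(keepends=True):
        if line.endswith(EXTRA_BREAKS):
            line += "\n"
        lines.append(line)
    return lines


if __name__ == "__main__":
    for line in with_trailing_newline("a\x0cb\n"):
        # Removing only the LF leaves the FF in place for anyone who cares.
        print(repr(line), "->", repr(line.rstrip("\n")))
```

A consumer that removes only the trailing "\n" still sees the FF or VT, while one that calls strip() loses it, same as today. Note that an input already of the form "a\014\n" comes out as two lines under this sketch, which is one of the details that would need the fiddly options.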