On 26/09/2007, Dino Viehland <[EMAIL PROTECTED]> wrote:
> My understanding is that users can write code that uses only \n and
> Python will write the end-of-line character(s) that are appropriate
> for the platform when writing to a file. That's what I meant by uses
> \n for everything internally.

OK, so far so good - although I'm not *quite* sure there's a
self-consistent definition of "code that only uses \n". I'll assume you
mean code that has a concept of lines, where lines never contain
anything other than text (specifically, neither \r nor \n can appear in
a line; I'll punt on whether other weird stuff like form feed is legal),
and where, whenever your code needs to write data to a file, it writes
lines with \n alone between them.

> But if you write \r\n to a file Python completely ignores the presence
> of the \r and transforms the \n into a \r\n anyway, hence the \r\r in
> the resulting stream. My last question is simply does anyone find
> writing \r\r\n when the original string contained \r\n a useful
> behavior - personally I don't see how it is.

In the above model, lines can't contain \r and between lines you only
ever write \n - so where did the \r\n come from? If you receive what you
think are lines from an outside source, and they contain \r, then you
didn't sanity check your data. If you receive a block of raw
(effectively binary!) data which you want to translate into your model,
it's up to you how you cut it up into lines. If you read data using one
of Python's text modes, it's up to you to understand how it works.

> But Guido's response makes this sound like it's a problem w/ VC++
> stdio implementation and not something that Python is explicitly
> doing.

I'm not sure it's a CRT issue. Certainly the \r\n vs \n confusion comes
from the CRT - the underlying OS (just like Unix!!!!) only deals in
files as streams of bytes. But ultimately, "lines" are an abstraction in
your code. All the CRT (and Python) do is help (or maybe hinder) you
with the "normal" cases.

> Anyway, it'd might be useful to have a text-mode file that you can
> write \r\n to and only get \r\n in the resulting file.

I can't comment on that, other than to say that if you defined the
semantic model better (lines, how things are encoded/decoded to files,
etc, somewhat like I tried to above) it would be more obvious what use
case this is trying to address.

> But if the general sentiment is s.replace('\r', '') is the way to go
> we can advice our users of the behavior when interoperating w/ APIs
> that return \r\n in strings.

I'd say users of the relevant APIs need to understand how the APIs
represent "lines", so that they can convert the received data to their
program's model of lines. Of course, that probably corresponds to
something like s.replace('\r', '') or, likely more correctly,
data_lines = s.split('\r\n').

A "rule of thumb" that doesn't make it clear that the concept of "line"
has 2 different binary representations in 2 different areas (data back
from APIs vs data from files) is likely to lead ultimately to mistakes
and confusion.

If you think this is bad, wait until you have to deal with Unicode
issues like what *encoding* the data is being supplied to you in. That
makes guessing newline conventions seem simple (at least to this
parochial English-speaker :-)). Although as this is IronPython, you may
already have that covered...

Paul.

PS In real life, you often just want a cheap and cheerful answer. For
that, "strip out spurious \r characters" may be fine.
_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
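
To make the \r\r\n case and the split('\r\n') suggestion concrete, here is a minimal sketch. It is an illustration under assumptions, not what Python or the CRT does internally: it uses the io module's TextIOWrapper with newline="\r\n" to simulate Windows-style text-mode translation on any platform, and the helper names write_text and to_lines are made up for this example.

```python
import io


def write_text(data, newline="\r\n"):
    # Hypothetical helper: push `data` through a text layer that
    # translates each "\n" to `newline` on write (as a Windows text-mode
    # file would) and return the bytes that would end up on disk.
    raw = io.BytesIO()
    text = io.TextIOWrapper(raw, encoding="ascii", newline=newline)
    text.write(data)
    text.flush()
    return raw.getvalue()


def to_lines(api_data):
    # Hypothetical helper: convert data whose line separator is "\r\n"
    # into the program's own model, a list of lines with no "\r" or "\n".
    return api_data.split("\r\n")


# Data as it might come back from an API that uses "\r\n" between lines.
api_data = "first\r\nsecond"

# Writing it unchanged: the "\n" is translated, the "\r" is left alone.
naive = write_text(api_data)

# Converting to lines first, then writing with "\n" alone between them.
clean = write_text("\n".join(to_lines(api_data)))

print(repr(naive))  # b'first\r\r\nsecond'
print(repr(clean))  # b'first\r\nsecond'
```

The point of the sketch is that the text layer only ever looks at \n; keeping \r out of your lines is the job of whatever converts the API's data into your program's model.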