I thought some more about the universal newlines situation, and I think I can handle all the use cases with a single 'newline' parameter. The use cases are:
(A) input use cases:

(1) newline=None: input with default universal newlines mode; lines
    may end in \r, \n, or \r\n, and these are translated to \n.

(2) newline='': input with untranslated universal newlines mode;
    lines may end in \r, \n, or \r\n, and these are returned
    untranslated.

(3) newline='\r', newline='\n', newline='\r\n': input lines must end
    with the given character(s), and these are translated to \n.

(B) output use cases:

(1) newline=None: every \n written is translated to os.linesep.

(2) newline='': no translation takes place.

(3) newline='\r', newline='\n', newline='\r\n': every \n written is
    translated to the value of newline.

Note that cases (2) are new, and case (3) changes from the current PEP
and/or from the current implementation (which seems to deviate from
the PEP).

Also note that it doesn't matter whether .readline(), .read() or
.read(N) is used. The PEP is currently unclear on this and the
implementation is wrong.

Proposed language for the PEP:

``.__init__(self, buffer, encoding=None, newline=None)``

    ``buffer`` is a reference to the ``BufferedIOBase`` object to
    be wrapped with the ``TextIOWrapper``.

    ``encoding`` refers to an encoding to be used for translating
    between the byte-representation and character-representation.
    If it is ``None``, then the system's locale setting will be
    used as the default.

    ``newline`` can be ``None``, ``''``, ``'\n'``, ``'\r'``, or
    ``'\r\n'``; all other values are illegal. It controls the
    handling of line endings. It works as follows:

    * On input, if ``newline`` is ``None``, universal newlines
      mode is enabled. Lines in the input can end in ``'\n'``,
      ``'\r'``, or ``'\r\n'``, and these are translated into
      ``'\n'`` before being returned to the caller. If it is
      ``''``, universal newline mode is enabled, but line endings
      are returned to the caller untranslated.
      If it has any of the other legal values, input lines are
      only terminated by the given string, and the line ending is
      returned to the caller translated to ``'\n'``.

    * On output, if ``newline`` is ``None``, any ``'\n'``
      characters written are translated to the system default
      line separator, ``os.linesep``. If ``newline`` is ``''``,
      no translation takes place. If ``newline`` is any of the
      other legal values, any ``'\n'`` characters written are
      translated to the given string.

    Further notes on the ``newline`` parameter:

    * ``'\r'`` support is still needed for some OSX applications
      that produce files using ``'\r'`` line endings; Excel (when
      exporting to text) and Adobe Illustrator EPS files are the
      most common examples.

    * If translation is enabled, it happens regardless of which
      method is called for reading or writing. For example,
      ``f.read()`` will always produce the same result as
      ``''.join(f.readlines())``.

    * If universal newlines without translation are requested on
      input (i.e. ``newline=''``), if a system read operation
      returns a buffer ending in ``'\r'``, another system read
      operation is done to determine whether it is followed by
      ``'\n'`` or not. In universal newlines mode with
      translation, the second system read operation may be
      postponed until the next read request, and if the following
      system read operation returns a buffer starting with
      ``'\n'``, that character is simply discarded.

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)
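
For illustration only, the ``newline`` rules proposed above can be modelled in a few lines of plain Python. This is a rough sketch that works on already-decoded strings, not the ``TextIOWrapper`` implementation. The helper names (``translate_input``, ``translate_output``, ``split_lines``, ``translate_chunks``) are hypothetical and appear neither in the PEP nor in the ``io`` module. The last helper assumes the "discard a leading ``'\n'`` after a trailing ``'\r'``" behaviour described in the final note.

```python
import os
import re

# Values the proposal allows for ``newline``; anything else is illegal.
LEGAL_NEWLINES = (None, "", "\n", "\r", "\r\n")


def _check(newline):
    if newline not in LEGAL_NEWLINES:
        raise ValueError("illegal newline value: %r" % (newline,))


def translate_input(text, newline=None):
    """Hypothetical model of the proposed input rules."""
    _check(newline)
    if newline is None:
        # Universal newlines with translation: \r\n, \r and \n become \n.
        return text.replace("\r\n", "\n").replace("\r", "\n")
    if newline == "":
        # Universal newlines without translation: data is left untouched.
        return text
    # Only the given string terminates a line; it is translated to \n.
    return text.replace(newline, "\n")


def translate_output(text, newline=None):
    """Hypothetical model of the proposed output rules."""
    _check(newline)
    if newline is None:
        return text.replace("\n", os.linesep)
    if newline == "":
        return text
    return text.replace("\n", newline)


def split_lines(text):
    """Split text on \\r\\n, \\r or \\n, keeping each line ending."""
    return re.findall(r"[^\r\n]*(?:\r\n|\r|\n)|[^\r\n]+", text)


def translate_chunks(chunks):
    """Model of newline=None input arriving in separate system reads.

    A chunk ending in \\r is translated right away; if the next
    non-empty chunk starts with \\n, that character is discarded.
    """
    pending_cr = False
    for chunk in chunks:
        if not chunk:
            continue  # an empty read must not lose the pending state
        if pending_cr and chunk.startswith("\n"):
            chunk = chunk[1:]
        pending_cr = chunk.endswith("\r")
        yield chunk.replace("\r\n", "\n").replace("\r", "\n")
```

Under this model, ``translate_input("a\r\nb\rc\n")`` gives ``"a\nb\nc\n"``, while passing ``newline=''`` returns the text unchanged. Joining the result of ``split_lines`` always reproduces its input, which is the ``read()`` versus ``readlines()`` equivalence mentioned in the notes.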