[issue20121] quopri_codec newline handling

2015-01-19 Thread Martin Panter

Martin Panter added the comment:

Here is patch v2, which fixes some more bugs I uncovered in the 
quoted-printable encoders:

* The binascii version would unnecessarily break a 76-character line (the 
maximum length) if the line ended with an =XX escape code
* The native Python version would insert soft line breaks in the middle of =XX 
escape codes
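
As a rough illustration of the two bugs listed above, the following sketch 
checks encoder output for lines longer than 76 characters and for =XX escapes 
that are not complete within a single line. The helper name qp_line_problems 
is made up for this example; it is not part of quopri or binascii.

```python
import quopri
import re

MAX_LINE = 76  # maximum encoded line length, not counting the line ending

# A well-formed line consists of literal bytes and complete =XX escapes,
# optionally followed by a single "=" that marks a soft line break.
_WELL_FORMED = re.compile(rb'(?:[^=]|=[0-9A-Fa-f]{2})*=?')


def qp_line_problems(encoded):
    """Return a list of (line_number, description) for suspicious lines."""
    problems = []
    for number, line in enumerate(encoded.split(b'\n'), start=1):
        line = line.rstrip(b'\r')  # accept both LF and CRLF line endings
        if len(line) > MAX_LINE:
            problems.append((number, 'longer than 76 characters'))
        if not _WELL_FORMED.fullmatch(line):
            problems.append((number, 'incomplete =XX escape'))
    return problems


# Every "=" byte must be escaped as =3D, so this input forces soft breaks.
sample = quopri.encodestring(b'=' * 100)
print(qp_line_problems(sample))
```

An empty list means neither problem was detected. The result for the sample 
depends on the Python version and on which implementation quopri ends up using.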

--
type:  -> behavior
Added file: http://bugs.python.org/file37783/quopri-newline.v2.patch

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue20121] quopri_codec newline handling

2015-01-17 Thread Martin Panter

Martin Panter added the comment:

Here is a patch that clarifies in the documentation and test suite how newlines 
work in the “quopri” and “binascii” modules. It also fixes the native Python 
implementation to support CRLFs.

* \n is used by default (e.g. for soft line breaks if the input has no hard 
line breaks)
* CRLF is used instead if found in input (even in non-text mode!)
* Typos in the documentation are corrected
* quopri uses istext=True
* header flag does not affect newline encoding; only istext affects it

One corner case concerns me slightly: binascii.b2a_qp(istext=False) will use \n 
for soft line breaks by default, but will suddenly switch to CRLF if the input 
data happens to contain a CRLF sequence. This is despite the CRLFs from the 
data being encoded and therefore not appearing in the output themselves.
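
The corner case can be observed with a small experiment, sketched here under 
the assumption that binascii.b2a_qp() still accepts the istext argument and 
still picks the soft line break style from the input. The helper 
soft_break_style is made up for this example.

```python
import binascii


def soft_break_style(encoded):
    """Report which soft line break sequence appears in encoded data."""
    if b'=\r\n' in encoded:
        return 'CRLF'
    if b'=\n' in encoded:
        return 'LF'
    return None  # no soft line breaks at all


long_run = b'x' * 200  # long enough to force soft line breaks

for data in (long_run, long_run + b'\r\n' + long_run):
    encoded = binascii.b2a_qp(data, istext=False)
    # Show whether the input contained a CRLF and which soft break was used.
    print(b'\r\n' in data, soft_break_style(encoded))
```

No expected output is given on purpose, since the behaviour may differ between 
Python versions.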

--
assignee:  -> docs@python
components: +Documentation
keywords: +patch
nosy: +docs@python
Added file: http://bugs.python.org/file37756/quopri-newline.patch




[issue20121] quopri_codec newline handling

2014-12-17 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

The pure Python implementation returns a different result.

>>> import quopri
>>> quopri.encodestring(b'\r\n')
b'\r\n'
>>> quopri.a2b_qp = quopri.b2a_qp = None
>>> quopri.encodestring(b'\r\n')
b'=0D\n'

See also issue18022.

--
nosy: +serhiy.storchaka




[issue20121] quopri_codec newline handling

2014-12-17 Thread Martin Panter

Martin Panter added the comment:

Okay, so maybe the documentation should include these restrictions on encoding:

* The data being encoded should only include \r or \n bytes that are part of \n 
or \r\n newline sequences. Encoding arbitrary non-text data is not supported.
* The two kinds of newlines should not be mixed
* If \n is used for newlines in the input, the encoder will output \n newlines, 
and they will need converting to CRLF in a later step to conform to the RFC
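
The later conversion step mentioned in the last point could look roughly like 
this. It is only a sketch: to_crlf and encode_with_crlf are hypothetical 
helpers, and it assumes the input follows the restrictions above (text only, 
one newline convention).

```python
import quopri
import re


def to_crlf(data):
    """Convert bare LF newlines to CRLF, leaving existing CRLFs alone."""
    # The lookbehind skips any LF that is already preceded by a CR.
    return re.sub(rb'(?<!\r)\n', b'\r\n', data)


def encode_with_crlf(text_bytes):
    """Encode text as quoted-printable, then normalise line endings."""
    return to_crlf(quopri.encodestring(text_bytes))


print(encode_with_crlf(b'one\ntwo\n'))
```

Converting after encoding also covers any soft line breaks the encoder inserts, 
because those end in the same newline sequence as the hard line breaks.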

--




[issue20121] quopri_codec newline handling

2014-12-17 Thread Marc-Andre Lemburg

Marc-Andre Lemburg added the comment:

I agree with Vajrasky: a patch for the documentation would probably be a good 
idea.

Note that mixing line end conventions in a single text is never a good idea. If 
you stick to one line end convention, there's no problem with the codec, AFAICT.

>>> import codecs
>>> codecs.encode(b'\r\n\r\n', 'quopri_codec')
b'\r\n\r\n'
>>> codecs.decode(_, 'quopri_codec')
b'\r\n\r\n'
>>> codecs.encode(b'\n\n', 'quopri_codec')
b'\n\n'
>>> codecs.decode(_, 'quopri_codec')
b'\n\n'
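
To check up front whether some input sticks to one line end convention, 
something like the following could be used. This is a sketch only; 
newline_styles and has_mixed_newlines are names invented for the example.

```python
import re

_STYLE_NAMES = {b'\r\n': 'CRLF', b'\r': 'CR', b'\n': 'LF'}


def newline_styles(data):
    """Return the set of line ending styles found in a bytes object."""
    # The alternation tries CRLF first so it is not counted as CR plus LF.
    return {_STYLE_NAMES[m.group()] for m in re.finditer(rb'\r\n|\r|\n', data)}


def has_mixed_newlines(data):
    """True if more than one line ending style occurs in the data."""
    return len(newline_styles(data)) > 1


print(has_mixed_newlines(b'\r\n\r\n'))
print(has_mixed_newlines(b'\r\n\n'))
```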

--
nosy: +lemburg




[issue20121] quopri_codec newline handling

2014-12-17 Thread Martin Panter

Martin Panter added the comment:

RFC 1521 says that a text newline should be encoded as CRLF, and that any 
combination of 0x0D and 0x0A bytes that do not represent newlines should be 
encoded like other control characters as =0D and =0A.

Since in Python 3 the codec outputs bytes, I don’t think there is any excuse 
for it to be outputting plain CR or LF bytes. The question is, do they 
represent newlines to be encoded as CRLF, or just data bytes that need ordinary 
encoding?
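
For the second interpretation (CR and LF as ordinary data bytes), 
binascii.b2a_qp() can be called with istext=False, which escapes them as =0D 
and =0A. A short sketch, with encode_binary being a made-up wrapper name:

```python
import binascii


def encode_binary(data):
    """Encode bytes as quoted-printable, treating CR and LF as plain data."""
    return binascii.b2a_qp(data, istext=False)


for data in (b'\r\n\n', b'\n\r\n'):
    encoded = encode_binary(data)
    # Decoding should give back exactly the bytes that went in.
    print(data, encoded, binascii.a2b_qp(encoded) == data)
```

Note that the quopri_codec itself does not do this; as the patch notes say, 
quopri uses istext=True.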

--
nosy: +vadmium
versions: +Python 3.4




[issue20121] quopri_codec newline handling

2014-01-05 Thread Vajrasky Kok

Vajrasky Kok added the comment:

The quopri_codec uses the binascii.b2a_qp function.

>>> import binascii
>>> binascii.b2a_qp('\r\n\n\n\n')
'\r\n\r\n\r\n\r\n'

The logic in b2a_qp for dealing with newlines is to check whether the first 
line uses \r\n or \n.

If it uses \r\n, then the newlines of all remaining lines will be converted to 
\r\n. If it uses \n, then they will all be converted to \n.

There is a comment about this in the source code.

/* See if this string is using CRLF line ends */
/* XXX: this function has the side effect of converting all of
 * the end of lines to be the same depending on this detection
 * here */
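
In Python terms, the detection described above amounts to roughly the 
following. This is a paraphrase of the behaviour as described in this comment, 
not a translation of the actual C source, and uses_crlf is a made-up name.

```python
def uses_crlf(data):
    """Guess the newline convention from the first LF byte only."""
    first_lf = data.find(b'\n')
    # CRLF is assumed only if the first LF is directly preceded by a CR.
    return first_lf > 0 and data[first_lf - 1:first_lf] == b'\r'


print(uses_crlf(b'\r\n\n\n\n'))  # first line ends in CRLF
print(uses_crlf(b'\n\r\n'))      # first line ends in a bare LF
```

Only the first newline is inspected, which is why the two inputs in the 
original report are normalised in opposite directions.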

I am not sure what the appropriate action is here, but a doc fix should be 
acceptable.

--
nosy: +vajrasky




[issue20121] quopri_codec newline handling

2014-01-04 Thread R. David Murray

Changes by R. David Murray :


--
nosy: +r.david.murray




[issue20121] quopri_codec newline handling

2014-01-04 Thread Fred Stober

New submission from Fred Stober:

While trying to encode some binary data, I encountered this behaviour of the 
quopri_codec:

>>> '\r\n\n'.encode('quopri_codec').decode('quopri_codec')
'\r\n\r\n'
>>> '\n\r\n'.encode('quopri_codec').decode('quopri_codec')
'\n\n'

If this behaviour is really intended, it should be mentioned in the 
documentation that this codec is not bijective.

--
components: Library (Lib)
messages: 207281
nosy: fredstober
priority: normal
severity: normal
status: open
title: quopri_codec newline handling
versions: Python 2.7
