Re: [Python-Dev] Patch making the current email package (mostly) support bytes

2010-10-13 Thread Stephen J. Turnbull
Steven D'Aprano writes: I don't think anyone has ever suggested change for change's sake. If they have, I'd love to read the PEP for it. Not to mention the BDFL's pronouncement message! <wink>

Re: [Python-Dev] Patch making the current email package (mostly) support bytes

2010-10-12 Thread lutz
From: Barry Warsaw ba...@python.org To: python-dev@python.org Subject: Re: [Python-Dev] Patch making the current email package (mostly) support bytes On Oct 08, 2010, at 03:44 PM, l...@rmi.net wrote: Ultimately, development in the open source world is driven by the very few with time

Re: [Python-Dev] Patch making the current email package (mostly) support bytes

2010-10-12 Thread Steven D'Aprano
On Wed, 13 Oct 2010 03:01:57 am l...@rmi.net wrote: So my point is just this: Change for change's sake is truly not what most Python users want.  If Python core developers want 3.X to become as popular as 2.X, they should be less concerned with posts on this list or hands at a conference, than

Re: [Python-Dev] Patch making the current email package (mostly) support bytes

2010-10-08 Thread Barry Warsaw
On Oct 08, 2010, at 12:37 PM, Stephen J. Turnbull wrote: Ouch. RFC 822 line wrapping is a bytes-bytes transformation, and the client shouldn't see it at all unless it inspects the wire format. Header wrapping sucks even more because it's supposed to take the semantic context into account, which

Re: [Python-Dev] Patch making the current email package (mostly) support bytes

2010-10-08 Thread Stephen J. Turnbull
Barry Warsaw writes: Header wrapping sucks even more because it's supposed to take the semantic context into account, which means that a generic Header wrapping algorithm cannot work for everything. E.g. Received: headers are supposed to wrap after the semicolon. Received headers are an
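To make the Received: point concrete, here is a rough sketch of a header-specific fold that breaks after the semicolon rather than at an arbitrary column. The function name `fold_received`, the tab continuation indent, and the sample value are assumptions made for illustration; this is not code from the email package or from the patch.

```python
def fold_received(value, indent="\t"):
    """Fold a Received: header value after its final semicolon.

    Hypothetical helper: the date-time part follows the last ';',
    so that is the semantically sensible place to wrap.
    """
    head, sep, date = value.rpartition(";")
    if not sep:
        # No semicolon found: leave the value untouched.
        return value
    # Continuation lines must start with whitespace.
    return head.rstrip() + ";\r\n" + indent + date.strip()


folded = fold_received(
    "from mail.example.org by mx.example.net; Fri, 08 Oct 2010 12:37:38 +0900"
)
```

A generic wrapper that only counts columns has no way to know the semicolon matters, which is the argument for per-header folding rules.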

Re: [Python-Dev] Patch making the current email package (mostly) support bytes

2010-10-08 Thread lutz
tentative state of 3.X, stability matters. --Mark Lutz (http://learning-python.com, http://rmi.net/~lutz) -Original Message- From: R. David Murray rdmur...@bitdance.com To: l...@rmi.net Subject: Re: [Python-Dev] Patch making the current email package (mostly) support bytes Date

Re: [Python-Dev] Patch making the current email package (mostly) support bytes

2010-10-08 Thread lutz
.) --Mark Lutz (http://learning-python.com, http://rmi.net/~lutz) -Original Message- From: Stephen J. Turnbull step...@xemacs.org To: l...@rmi.net Subject: Re: [Python-Dev] Patch making the current email package (mostly) support bytes Date: Fri, 08 Oct 2010 14:33:22

Re: [Python-Dev] Patch making the current email package (mostly) support bytes

2010-10-08 Thread Stephen J. Turnbull
Barry Warsaw writes: On Oct 07, 2010, at 04:40 AM, Stephen J. Turnbull wrote: I'm fairly certain that most of the modern causes of [Unicode errors in Mailman] are post-parse modifications of the message. IOW, in Mailman's architecture, we try to parse the raw data into a Message object

Re: [Python-Dev] Patch making the current email package (mostly) support bytes

2010-10-08 Thread R. David Murray
On Fri, 08 Oct 2010 15:51:45 -, l...@rmi.net wrote: For my part, one week from now I'll be standing up again in front of a group of 20 Python beginners, and basically apologizing for both the present and ongoing 3.X changes they must conform to in the near future. Python may not be

Re: [Python-Dev] Patch making the current email package (mostly) support bytes

2010-10-08 Thread R. David Murray
On Sat, 09 Oct 2010 01:06:29 +0900, Stephen J. Turnbull step...@xemacs.org wrote: That mess is entirely unnecessary in Python 3. Text and wire format can be easily distinguished with three different representations of email: Unicode for the conceptual RFC 822 layer (of course this is an

Re: [Python-Dev] Patch making the current email package (mostly) support bytes

2010-10-08 Thread R. David Murray
On Fri, 08 Oct 2010 15:44:45 -, l...@rmi.net wrote: Thanks for both your reply and work, David. I'm going to have to test my email clients under the 3.2 patch when it gels. It's good to hear that email5 API support remains a goal. I just landed the patch (though without the MIME encoding

Re: [Python-Dev] Patch making the current email package (mostly) support bytes

2010-10-08 Thread R. David Murray
On Fri, 08 Oct 2010 23:55:37 +0900, Stephen J. Turnbull step...@xemacs.org wrote: I should think you *want* addresses and suchlike structured headers (Content-Type with several RFC 2231 parameters, anyone?) to line up nicely, too. So generic folding algorithms are really only applicable to

Re: [Python-Dev] Patch making the current email package (mostly) support bytes

2010-10-08 Thread R. David Murray
On Fri, 08 Oct 2010 12:37:38 +0900, Stephen J. Turnbull step...@xemacs.org wrote: *If* you have an 8-bit value of unknown encoding on input, this will appear in the Header's value as a surrogate. Hm, OK, I see the problem ... as usual, it's that the only efficient thing to do is encode using

Re: [Python-Dev] Patch making the current email package (mostly) support bytes

2010-10-08 Thread Stephen J. Turnbull
R. David Murray writes: On Sat, 09 Oct 2010 01:06:29 +0900, Stephen J. Turnbull step...@xemacs.org wrote: That mess is entirely unnecessary in Python 3. Text and wire format can be easily distinguished with three different representations of email: Unicode for the conceptual RFC 822

Re: [Python-Dev] Patch making the current email package (mostly) support bytes

2010-10-08 Thread Barry Warsaw
On Oct 08, 2010, at 03:44 PM, l...@rmi.net wrote: Ultimately, development in the open source world is driven by the very few with time to show up, rather than by the very many who depend on it. This can unfortunately lead to the perception of thrashing by end users. Some even come to see the

Re: [Python-Dev] Patch making the current email package (mostly) support bytes

2010-10-08 Thread R. David Murray
On Sat, 09 Oct 2010 02:48:23 +0900, Stephen J. Turnbull step...@xemacs.org wrote: R. David Murray writes: On Sat, 09 Oct 2010 01:06:29 +0900, Stephen J. Turnbull step...@xemacs.org wrote: That mess is entirely unnecessary in Python 3. Text and wire format can be easily

Re: [Python-Dev] Patch making the current email package (mostly) support bytes

2010-10-07 Thread R. David Murray
On Thu, 07 Oct 2010 03:31:34 +0900, Stephen J. Turnbull step...@xemacs.org wrote: R. David Murray writes: 5. Return the content, with non-ASCII bytes replaced with ? characters. That hadn't occurred to me (and it makes me sick to contemplate it). That said, this is probably

Re: [Python-Dev] Patch making the current email package (mostly) support bytes

2010-10-07 Thread R. David Murray
Stephen J. Turnbull stephen at xemacs.org writes: R. David Murray writes: We're (in the current patch) not punting on handling non-conforming email, we're punting on handling non-conforming bytes *if the headers that contain them need to be modified*. The headers can still be

Re: [Python-Dev] Patch making the current email package (mostly) support bytes

2010-10-07 Thread R. David Murray
On Thu, 07 Oct 2010 15:00:04 +0900, Stephen J. Turnbull step...@xemacs.org wrote: R. David Murray writes: But that's not interesting; you did that with Python 3. We want to Of course I did it with Python3. It's the Python3 email codebase I'm working with (and have to work

Re: [Python-Dev] Patch making the current email package (mostly) support bytes

2010-10-07 Thread lutz
Stephen J. Turnbull wrote (giving me an opening to jump in here): R. David Murray writes: In other words, my proposed patch only makes email5 1/8 to 1/4 broken, instead of half broken as it is now. But not un-broken enough for Mailman, it sounds like. IMO, not in the long run. But

Re: [Python-Dev] Patch making the current email package (mostly) support bytes

2010-10-07 Thread R. David Murray
On Thu, 07 Oct 2010 16:03:18 -, l...@rmi.net wrote: I'm forwarding a link to the code of these clients to David by private email in case they might be useful as a test case (O'Reilly has already posted them ahead of the book, but they may be a bit too heavy for use in formal testing).

Re: [Python-Dev] Patch making the current email package (mostly) support bytes

2010-10-07 Thread Barry Warsaw
On Oct 07, 2010, at 04:40 AM, Stephen J. Turnbull wrote: And the email API currently promises not to raise during parsing, which is a contract my patch does not change. Which is a contract that has historically been broken frequently. Unhandled UnicodeErrors have been one of the most common
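As a small illustration of the "does not raise during parsing" contract, the sketch below feeds a message with a stray 8-bit byte in a header to the bytes parser. It assumes a Python 3 release that provides `email.message_from_bytes`; the message content is made up for the example.

```python
import email

# Made-up message: the Subject header carries an undeclared 8-bit byte.
raw = b"Subject: caf\xe9\nFrom: a@example.com\n\nhello\n"

# Parsing is expected to succeed even though the header is non-conforming.
msg = email.message_from_bytes(raw)

# Conforming parts of the message remain accessible as ordinary strings.
sender = msg["From"]
body = msg.get_payload()
```

The sketch only shows that parsing itself goes through; what happens when the non-conforming header is later read or modified is the separate question argued over in this thread.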

Re: [Python-Dev] Patch making the current email package (mostly) support bytes

2010-10-07 Thread Stephen J. Turnbull
R. David Murray writes: The MIME-charset = UNKNOWN dodge might be a better way of handling this. That is a very interesting idea. It is the *right* thing to do, since it would mean that a message parsed as bytes could be generated via Generator and passed to, say, smtplib without

Re: [Python-Dev] Patch making the current email package (mostly) support bytes

2010-10-07 Thread Stephen J. Turnbull
l...@rmi.net writes: To put that more strongly, the Python user base is much larger than this list's readership. Agreed. Nevertheless, this is the channel (not channel) that the developers listen on, and substantial effort is made to let Python users know that. I think they do know it,

Re: [Python-Dev] Patch making the current email package (mostly) support bytes

2010-10-06 Thread Stephen J. Turnbull
R. David Murray writes: version of headers to the email5 API, but since any such data would be non-RFC compliant anyway, [access to non-conforming headers by reparsing the bytes] will just have to be good enough for now. But that's potentially unpleasant for, say, Mailman. AFAICS, what

Re: [Python-Dev] Patch making the current email package (mostly) support bytes

2010-10-06 Thread R. David Murray
On Wed, 06 Oct 2010 12:22:18 +0900, Stephen J. Turnbull step...@xemacs.org wrote: Nick Coghlan writes: - if you pass in bytes data and know what you are doing, then you can access that raw bytes data and do your own decoding At what level, though? To take an interesting example I

Re: [Python-Dev] Patch making the current email package (mostly) support bytes

2010-10-06 Thread R. David Murray
On Wed, 06 Oct 2010 22:55:00 +0900, Stephen J. Turnbull step...@xemacs.org wrote: R. David Murray writes: version of headers to the email5 API, but since any such data would be non-RFC compliant anyway, [access to non-conforming headers by reparsing the bytes] will just have to be

Re: [Python-Dev] Patch making the current email package (mostly) support bytes

2010-10-06 Thread Stephen J. Turnbull
R. David Murray writes: 5. Return the content, with non-ASCII bytes replaced with ? characters. That hadn't occurred to me (and it makes me sick to contemplate it). That said, this is probably good enough for Mailman-like apps to limp along for most users. It's certainly good enough
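For reference, option 5 can be sketched in a few lines of Python 3. The helper name `mask_non_ascii` and the sample input are illustrative assumptions; this is not the patch's implementation.

```python
def mask_non_ascii(data: bytes) -> str:
    """Return data as text, with every non-ASCII byte shown as '?'.

    Hypothetical helper: lossy by design, one '?' per byte, so a
    multi-byte character becomes several question marks.
    """
    masked = bytes(b if b < 0x80 else ord("?") for b in data)
    return masked.decode("ascii")


# Made-up input: one Latin-1 byte, then a two-byte sequence whose
# second byte happens to fall in the ASCII range.
shown = mask_non_ascii(b"caf\xe9 \x83\x65")
```

The second half of the input shows why this is only good enough to limp along: the trailing byte of a multi-byte character can survive as an unrelated ASCII letter.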

Re: [Python-Dev] Patch making the current email package (mostly) support bytes

2010-10-06 Thread Stephen J. Turnbull
R. David Murray writes: So the only parsing issue is if Mailman cares about *the non-ASCII bytes* in the headers it cares about. If it has to modify headers that contain non-ASCII bytes (for example, addresses and Subject) and cares about preserving the non-ASCII bytes, then there is

Re: [Python-Dev] Patch making the current email package (mostly) support bytes

2010-10-05 Thread Nick Coghlan
On Tue, Oct 5, 2010 at 3:41 PM, Stephen J. Turnbull step...@xemacs.org wrote: R. David Murray writes: Only if the email package contains a coding error would the surrogates escape and cause problems for user code. I don't think it is reasonable to internalize surrogates that way; some

Re: [Python-Dev] Patch making the current email package (mostly) support bytes

2010-10-05 Thread R. David Murray
On Tue, 05 Oct 2010 22:05:33 +1000, Nick Coghlan wrote: On Tue, Oct 5, 2010 at 3:41 PM, Stephen J. Turnbull step...@xemacs.org wrote: R. David Murray writes: Only if the email package contains a coding error would the surrogates escape and cause problems for user code. I don't think

Re: [Python-Dev] Patch making the current email package (mostly) support bytes

2010-10-05 Thread Stephen J. Turnbull
Nick Coghlan writes: - if you pass in bytes data and know what you are doing, then you can access that raw bytes data and do your own decoding At what level, though? To take an interesting example I used to see frequently: From: t...@tokyo.jp (Taro Yamada in 8-bit Shift JIS) So I
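A sketch of "do your own decoding" for a header like the one above, assuming the application already knows (or guesses) that the comment is Shift JIS. The address, the name, and the variable names are invented for the example; only the codec calls are standard Python 3.

```python
# Made-up header: the display name is raw Shift JIS with no RFC 2047 encoding.
name = "山田太郎"
raw = b"From: taro@example.jp (" + name.encode("shift_jis") + b")"

# Carry the 8-bit bytes through a str as escaped surrogates.
text = raw.decode("ascii", "surrogateescape")

# Pick out the comment. Shift JIS trail bytes never collide with '(' or ')'.
escaped = text[text.index("(") + 1 : text.rindex(")")]

# Turn the escaped text back into the original bytes, then decode properly.
recovered = escaped.encode("ascii", "surrogateescape").decode("shift_jis")
```

Note that the charset has to come from outside the message; nothing in the header itself says "Shift JIS", which is what makes the level at which raw bytes are exposed matter.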

Re: [Python-Dev] Patch making the current email package (mostly) support bytes

2010-10-04 Thread Scott Dial
On 10/2/2010 7:00 PM, R. David Murray wrote: The clever hack (thanks ultimately to Martin) is to accept 8bit data by encoding it using the ASCII codec and the surrogateescape error handler. I've seen this idea pop up in a number of threads. I worry that you are all inventing a new kind of dual
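For readers unfamiliar with the handler under discussion, here is a minimal sketch of the surrogateescape round trip in Python 3. The sample bytes and variable names are illustrative assumptions, not taken from the patch. Strictly speaking, the 8-bit input is decoded to str with the ASCII codec, and the same handler is used to encode it back to bytes on output.

```python
# Illustrative input: ASCII text plus one stray 8-bit byte (0xE9).
raw = b"Subject: caf\xe9"

# Decoding with surrogateescape maps each non-ASCII byte 0xNN to the
# lone surrogate U+DCNN instead of raising UnicodeDecodeError.
text = raw.decode("ascii", "surrogateescape")

# Encoding with the same handler restores the original bytes exactly.
restored = text.encode("ascii", "surrogateescape")

# Without the handler the escaped surrogate cannot be encoded, which is
# what happens if such a string leaks out to unsuspecting code.
try:
    text.encode("utf-8")
    leaked_cleanly = True
except UnicodeEncodeError:
    leaked_cleanly = False
```

The last step is the concern raised here: the string looks like ordinary text until some later code tries to encode it without the handler.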

Re: [Python-Dev] Patch making the current email package (mostly) support bytes

2010-10-04 Thread R. David Murray
On Mon, 04 Oct 2010 12:32:26 -0400, Scott Dial scott+python-...@scottdial.com wrote: On 10/2/2010 7:00 PM, R. David Murray wrote: The clever hack (thanks ultimately to Martin) is to accept 8bit data by encoding it using the ASCII codec and the surrogateescape error handler. I've seen

Re: [Python-Dev] Patch making the current email package (mostly) support bytes

2010-10-04 Thread Barry Warsaw
On Oct 02, 2010, at 07:00 PM, R. David Murray wrote: The advantage of this patch is that it means Python3.2 can have an email module that is capable of handling a significant proportion of the applications where the ability to process binary email data is required. Like others, I'm concerned

Re: [Python-Dev] Patch making the current email package (mostly) support bytes

2010-10-04 Thread Stephen J. Turnbull
R. David Murray writes: On Mon, 04 Oct 2010 12:32:26 -0400, Scott Dial scott+python-...@scottdial.com wrote: On 10/2/2010 7:00 PM, R. David Murray wrote: The clever hack (thanks ultimately to Martin) is to accept 8bit data by encoding it using the ASCII codec and the

[Python-Dev] Patch making the current email package (mostly) support bytes

2010-10-02 Thread R. David Murray
A while back on some issue or another I remember telling someone that if there was any sort of clever hack that would allow the current email package (email5) to work with bytes we would have implemented it. Well, I've come up with a clever hack. The idea came out of a conversation with Antoine.

Re: [Python-Dev] Patch making the current email package (mostly) support bytes

2010-10-02 Thread Benjamin Peterson
2010/10/2 R. David Murray rdmur...@bitdance.com: Regardless of whether or not this patch or a descendant thereof is accepted I still intend to continue working on email6. There are many other bugs in the current email package that require a rewrite of parts of its infrastructure, and the

Re: [Python-Dev] Patch making the current email package (mostly) support bytes

2010-10-02 Thread R. David Murray
On Sat, 02 Oct 2010 19:15:57 -0500, Benjamin Peterson benja...@python.org wrote: 2010/10/2 R. David Murray rdmur...@bitdance.com: Regardless of whether or not this patch or a descendant thereof is accepted I still intend to continue working on email6. There are many other bugs in

Re: [Python-Dev] Patch making the current email package (mostly) support bytes

2010-10-02 Thread Nick Coghlan
On Sun, Oct 3, 2010 at 9:00 AM, R. David Murray rdmur...@bitdance.com wrote: I do not propose that this is a *good* API, since it has the classic problem that if there are coding bugs in the email module strings may escape that have surrogates in them and we end up with programs that work most