Re: [Email-SIG] [Python-Dev] Dropping bytes support in json
On Apr 10, 2009, at 3:04 PM, Stephen J. Turnbull wrote:

> Shouldn't this thread move lock, stock, and .signature to email-sig?

Yep. I'll try to be more conscientious about removing python-dev from the CC.

> Idempotency? I'm not sure what that means in the context of the email package ... multiplication by zero? <wink> Do you mean that .parse().to_wire() should be idempotent? Yes, I think that's a good idea, and it shouldn't be too hard to implement by (optionally?) caching the whole original message or individual components (headers with all whitespace including folding cached verbatim, etc). I think caching has to be done, since stuff like "did the original fold with a leading tab or a leading space, and at what column" and so on seems kind of pointless to encode as attributes on Header objects.

I tend to agree. I'm also happy if there's a way to tell, say, the parser that an application doesn't care about that. All that extra caching will have a memory overhead that you should only pay for if you care.

-Barry

___
Email-SIG mailing list
Email-SIG@python.org
Your options: http://mail.python.org/mailman/options/email-sig/archive%40mail-archive.com
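A minimal sketch of the caching idea discussed above, not the email package's actual design. The names `CachedHeader`, `parse_headers`, and the `keep_raw` switch are hypothetical; the sketch only assumes that each header keeps its original bytes (folding included) until someone changes its value, and that applications can opt out of the cache to save memory.

```python
class CachedHeader:
    """Hypothetical header object that remembers its original wire form."""

    def __init__(self, name, value, raw=None):
        self.name = name
        self._value = value
        self._raw = raw  # verbatim bytes, folding whitespace included, or None

    @property
    def value(self):
        return self._value

    @value.setter
    def value(self, new_value):
        self._value = new_value
        self._raw = None  # the cached bytes no longer match the value

    def to_wire(self):
        # Reuse the cached bytes when we have them; otherwise regenerate.
        if self._raw is not None:
            return self._raw
        return f"{self.name}: {self._value}\r\n".encode("ascii")


def parse_headers(data, keep_raw=True):
    """Split an ASCII header block into CachedHeader objects.

    keep_raw=False models an application that does not care about
    byte-for-byte round trips and does not want to pay for the cache.
    """
    chunks = []
    for line in data.splitlines(keepends=True):
        if line[:1] in (b" ", b"\t") and chunks:
            chunks[-1] += line  # continuation line of a folded header
        else:
            chunks.append(line)
    headers = []
    for raw in chunks:
        name, _, rest = raw.partition(b":")
        value = " ".join(p.strip() for p in rest.decode("ascii").splitlines())
        headers.append(
            CachedHeader(name.decode("ascii"), value, raw if keep_raw else None)
        )
    return headers


def to_wire(headers):
    return b"".join(h.to_wire() for h in headers)
```

With the cache, `to_wire(parse_headers(data))` reproduces a tab-folded header exactly; without it, the fold is regenerated in a canonical form and the original column and whitespace are lost.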
Re: [Email-SIG] [Python-Dev] Dropping bytes support in json
At 10:11 -0400 04/13/2009, Barry Warsaw wrote:

> On Apr 10, 2009, at 11:08 AM, James Y Knight wrote:
>
>> Until you write a parser for every header, you simply cannot decode to unicode. The only sane choices are:
>> 1) raw bytes
>> 2) parsed structured data
>
> The email package does not need a parser for every header, but it should provide a framework that applications (or third party libraries) can use to extend the built-in header parsers. A bare minimum for functionality requires a Content-Type parser. I think the email package should also include an address header (Originator, Destination) parser, and a Message-ID header parser. Possibly others. The default would probably be some unstructured parser for headers like Subject.

I think the email package should have a parser for every header. All the headers defined in normal mail RFCs should have their own parser, and there would be a default parser for unhandled headers, probably the Unstructured parser. Users could add their own, probably by importing some module that knew how to add its parsing to the email package parsers.

--
TonyN.:' mailto:tonynel...@georgeanelson.com
' http://www.georgeanelson.com/
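One way to picture the extensible-parser framework described above is a registry keyed by header name with an unstructured fallback. This is only a sketch: `register_parser`, `parse_header`, and the deliberately naive Content-Type parser (it splits on every ";" and ignores quoted semicolons) are hypothetical names and simplifications, not an existing email package API.

```python
_PARSERS = {}  # lower-cased header name -> parser function


def register_parser(*names):
    """Decorator that registers a parser for one or more header names."""
    def decorator(func):
        for name in names:
            _PARSERS[name.lower()] = func
        return func
    return decorator


def parse_unstructured(value):
    """Default parser: collapse whitespace, return plain text."""
    return " ".join(value.split())


@register_parser("Content-Type")
def parse_content_type(value):
    """Naive Content-Type parser (does not handle ';' inside quotes)."""
    main, *params = [part.strip() for part in value.split(";")]
    maintype, _, subtype = main.lower().partition("/")
    parsed = {}
    for param in params:
        if not param:
            continue
        key, _, val = param.partition("=")
        parsed[key.strip().lower()] = val.strip().strip('"')
    return {"maintype": maintype, "subtype": subtype, "params": parsed}


def parse_header(name, value):
    """Dispatch to the registered parser, else the unstructured default."""
    return _PARSERS.get(name.lower(), parse_unstructured)(value)
```

A third-party module would then only need to be imported for its `@register_parser(...)` decorators to take effect, which matches the "import a module that adds its parsing" suggestion.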
Re: [Email-SIG] [Python-Dev] Dropping bytes support in json
On Apr 9, 2009, at 11:59 PM, Tony Nelson wrote:

>> Thinking about this stuff makes me nostalgic for the sloppy happy days of Python 2.x
>
> You now have the opportunity to finally unsnarl that mess. It is not an insurmountable opportunity.

No, it's just a full time job <wink>. Now where did I put that hack-drink-coffee-twitter clone?

-Barry
Re: [Email-SIG] [Python-Dev] Dropping bytes support in json
On approximately 4/10/2009 9:56 AM, came the following characters from the keyboard of Barry Warsaw:

> On Apr 10, 2009, at 1:19 AM, gl...@divmod.com wrote:
>
>> On 02:38 am, ba...@python.org wrote:
>>
>>> So, what I'm really asking is this. Let's say you agree that there are use cases for accessing a header value as either the raw encoded bytes or the decoded unicode. What should this return:
>>>
>>>     message['Subject']
>>>
>>> The raw bytes or the decoded unicode?
>>
>> My personal preference would be to just deprecate this API, and get rid of it, replacing it with a slightly more explicit one.
>>
>>     message.headers['Subject']
>>     message.bytes_headers['Subject']
>
> This is pretty darn clever Glyph. Stop that! :)
>
> I'm not 100% sure I like the name .bytes_headers or that .headers should be the decoded header (rather than have .headers return the bytes thingie and say .decoded_headers return the decoded thingies), but I do like the general approach.

If one name has to be longer than the other, it should be the bytes version. Real user code is more likely to want to use the text version, and hopefully there will be more of that type of code than implementations using bytes.

Of course, one could use message.header and message.bythdr and they'd be the same length.

--
Glenn -- http://nevcal.com/
===
A protocol is complete when there is nothing left to remove.
-- Stuart Cheshire, Apple Computer, regarding Zero Configuration Networking
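To make the two spellings concrete, here is a rough sketch of a message object exposing both views over the same stored bytes. The `Message` and `_HeaderView` classes are hypothetical illustrations of the proposal; only `email.header.decode_header` and `email.header.make_header` are real standard-library functions, and the sketch assumes raw header values are ASCII (RFC 2047 encoded where needed).

```python
from email.header import decode_header, make_header


def _to_text(raw_value):
    """Decode RFC 2047 encoded words in an ASCII header value to str."""
    return str(make_header(decode_header(raw_value.decode("ascii"))))


class _HeaderView:
    """Read-only, case-insensitive view that converts values on access."""

    def __init__(self, raw, convert):
        self._raw = raw
        self._convert = convert

    def __getitem__(self, name):
        return self._convert(self._raw[name.lower()])


class Message:
    """Hypothetical message: bytes are stored, text is derived on demand."""

    def __init__(self, raw_headers):
        self._raw = {name.lower(): value for name, value in raw_headers.items()}
        self.headers = _HeaderView(self._raw, _to_text)            # str values
        self.bytes_headers = _HeaderView(self._raw, lambda v: v)   # raw bytes


msg = Message({"Subject": b"=?utf-8?q?caf=C3=A9?="})
text_subject = msg.headers["Subject"]
wire_subject = msg.bytes_headers["Subject"]
```

Because neither view is reachable through `message['Subject']`, the caller always states which representation is wanted, which is the point of the proposal.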
Re: [Email-SIG] [Python-Dev] Dropping bytes support in json
On Apr 10, 2009, at 2:00 PM, Glenn Linderman wrote:

> If one name has to be longer than the other, it should be the bytes version. Real user code is more likely to want to use the text version, and hopefully there will be more of that type of code than implementations using bytes.

I'm not sure we know that yet, actually. Nothing written for Python 2 counts, and email is too broken in 3 for any sane person to be writing such code for Python 3.

> Of course, one could use message.header and message.bythdr and they'd be the same length.

I was trying to figure out what a 'thdr' was that we'd want to index 'by' it. :)

-Barry
Re: [Email-SIG] [Python-Dev] Dropping bytes support in json
On Apr 10, 2009, at 2:06 PM, Michael Foord wrote:

> Shouldn't headers always be text?

/me weeps
Re: [Email-SIG] [Python-Dev] Dropping bytes support in json
Shouldn't this thread move lock, stock, and .signature to email-sig?

Barry Warsaw writes:

>>> It does seem to make sense to think about headers as text header names and text header values.
>>
>> I disagree. IMHO, structured header types should have object values, and something like
>
> While I agree, there's still a need for a higher level API that makes it easy to do the simple things.

Sure. I'm suggesting that the way to determine whether something is simple or not is by whether it falls out naturally from correct structure. Ie, no operations that only a Cirque du Soleil juggler can perform are allowed.

> I agree that the Message class needs to be strict. A parser needs to be lenient;

Not always. The Postel Principle only applies to stuph coming in off the wire. But we're *also* going to be parsing pseudo-email components that are being handed to us by applications (eg, the perennial control-character-in-the-unremovable-address Mailman bug). Our parser should Just Say No to that crap.

> see the .defects attribute introduced in the current email package. Oh, and this reminds me that we still haven't talked about idempotency. That's an important principle in the current email package, but do we need to give up on that?

Idempotency? I'm not sure what that means in the context of the email package ... multiplication by zero? <wink>

Do you mean that .parse().to_wire() should be idempotent? Yes, I think that's a good idea, and it shouldn't be too hard to implement by (optionally?) caching the whole original message or individual components (headers with all whitespace including folding cached verbatim, etc). I think caching has to be done, since stuff like "did the original fold with a leading tab or a leading space, and at what column" and so on seems kind of pointless to encode as attributes on Header objects.

> [Description of MessageTextView and MessageWireView elided.]
>
> This seems similar to Glyph's basic idea, but with a different spelling.

Yes. I don't much care which way it's done, and Glyph's style of spelling is more explicit. But I was thinking in terms of the number of people who are surely going to sing "Mama don' 'low no Unicodes roun' here" and squeal "codec WTF?! outta mah face, man!"
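The strict-versus-lenient distinction argued above can be sketched as a single check with two modes: data off the wire is repaired and the problem recorded as a defect, while data handed in by an application is rejected outright. `parse_value`, `HeaderValueError`, and the defect strings are hypothetical; the list of defects only loosely mirrors the `.defects` attribute mentioned in the thread.

```python
import re

# Control characters that should never appear in an unfolded header value.
# Horizontal tab (0x09) is deliberately allowed.
_CONTROL = re.compile(rb"[\x00-\x08\x0a-\x1f\x7f]")


class HeaderValueError(ValueError):
    """Raised in strict mode when a header value is malformed."""


def parse_value(raw, strict):
    """Return (cleaned_bytes, defects) for a header value.

    strict=False: input came off the wire, so be liberal (Postel) and
                  record what was repaired.
    strict=True:  input came from an application, so refuse bad data.
    """
    defects = []
    if _CONTROL.search(raw):
        if strict:
            raise HeaderValueError("control character in header value")
        defects.append("control character removed")
        raw = _CONTROL.sub(b"", raw)
    return raw, defects
```

For example, `parse_value(b"list\x01@example.com", strict=False)` cleans the value and reports one defect, whereas the same call with `strict=True` raises `HeaderValueError`.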
Re: [Email-SIG] [Python-Dev] Dropping bytes support in json
On Apr 9, 2009, at 11:55 AM, Daniel Stutzbach wrote:

> On Thu, Apr 9, 2009 at 6:01 AM, Barry Warsaw ba...@python.org wrote:
>
>> Anyway, aside from that decision, I haven't come up with an elegant way to allow /output/ in both bytes and strings (input is I think theoretically easier by sniffing the arguments).
>
> Won't this work? (assuming dumps() always returns a string)
>
>     from json import dumps
>
>     def dumpb(obj, encoding='utf-8', *args, **kw):
>         s = dumps(obj, *args, **kw)
>         return s.encode(encoding)

So, what I'm really asking is this. Let's say you agree that there are use cases for accessing a header value as either the raw encoded bytes or the decoded unicode. What should this return:

    message['Subject']

The raw bytes or the decoded unicode?

Okay, so you've picked one. Now how do you spell the other way?

The Message class probably has these explicit methods:

    Message.get_header_bytes('Subject')
    Message.get_header_string('Subject')

(or better names... it's late and I'm tired ;). One of those maps to message['Subject'] but which is the more obvious choice?

Now, setting headers. Sometimes you have some unicode thing and sometimes you have some bytes. You need to end up with bytes in the ASCII range and you'd like to leave the header value unencoded if so. But in both cases, you might have bytes or characters outside that range, so you need an explicit encoding, defaulting to utf-8 probably.

    Message.set_header('Subject', 'Some text', encoding='utf-8')
    Message.set_header('Subject', b'Some bytes')

One of those maps to message['Subject'] = ???

I'm open to any suggestions here!

-Barry
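The header-setting rule described above (accept str or bytes, leave pure-ASCII values unencoded, otherwise encode with an explicit charset defaulting to utf-8) might look roughly like the sketch below. The function name `encode_header_value` is hypothetical and is not the `set_header` method under discussion; `email.header.Header` is the real standard-library class used here for RFC 2047 encoding.

```python
from email.header import Header, decode_header, make_header


def encode_header_value(value, encoding="utf-8"):
    """Return wire-ready ASCII bytes for a str or bytes header value.

    bytes input is first decoded with `encoding` (so invalid bytes raise
    UnicodeDecodeError); pure-ASCII text is passed through unencoded;
    anything else becomes RFC 2047 encoded words in `encoding`.
    """
    if isinstance(value, bytes):
        value = value.decode(encoding)
    try:
        return value.encode("ascii")  # nothing to encode, leave as-is
    except UnicodeEncodeError:
        return Header(value, encoding).encode().encode("ascii")


plain = encode_header_value("Some text")
from_bytes = encode_header_value(b"Some bytes")
encoded = encode_header_value("caf\u00e9")
round_trip = str(make_header(decode_header(encoded.decode("ascii"))))
```

Whichever of the two call styles ends up behind `message['Subject'] = ...`, both funnel into the same ASCII-only wire representation, so the choice is about the default argument type rather than about two code paths.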
Re: [Email-SIG] [Python-Dev] Dropping bytes support in json
On Apr 9, 2009, at 11:21 PM, Nick Coghlan wrote:

> Barry Warsaw wrote:
>
>> I don't know whether the parameter thing will work or not, but you're probably right that we need to get the bytes-everywhere API first.
>
> Given that json is a wire protocol, that sounds like the right approach for json as well. Once bytes-everywhere works, then a text API can be built on top of it, but it is difficult to build a bytes API on top of a text one.

Agreed!

> So I guess the IO library *is* the right model: bytes at the bottom of the stack, with text as a wrapper around it (mediated by codecs).

Yes, that's a very interesting (and proven?) model. I don't quite see how we could apply that to email and json, but it seems like there's a good idea there. ;)

-Barry
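For reference, the layering Nick describes is visible directly in the standard `io` module: a bytes object sits at the bottom and `io.TextIOWrapper` adds the text interface through a codec. The helper names `text_to_wire` and `wire_to_text` are made up for this illustration; everything they call is real standard-library API.

```python
import io


def text_to_wire(text, encoding="utf-8"):
    """Write str through a text layer and return the bytes underneath."""
    raw = io.BytesIO()  # bytes at the bottom of the stack
    # newline="" disables newline translation so CRLF passes through as-is.
    wrapper = io.TextIOWrapper(raw, encoding=encoding, newline="")
    wrapper.write(text)
    wrapper.flush()  # push buffered text down to the bytes layer
    return raw.getvalue()


def wire_to_text(data, encoding="utf-8"):
    """Read bytes back through the same kind of text wrapper."""
    wrapper = io.TextIOWrapper(io.BytesIO(data), encoding=encoding, newline="")
    return wrapper.read()


wire = text_to_wire("caf\u00e9\r\n")
text = wire_to_text(wire)
```

The text layer never needs to exist for code that only handles bytes, which is why building text on top of bytes is easy and the reverse is not.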
Re: [Email-SIG] [Python-Dev] Dropping bytes support in json
Barry Warsaw writes:

> There are really two ways to look at an email message. It's either an unstructured blob of bytes, or it's a structured tree of objects.

Indeed!

> Those objects have headers and payload. The payload can be of any type, though I think it generally breaks down into strings for text/* types and bytes for anything else (not counting multiparts).

*sigh* Why are you back-tracking?

The payload should be of an appropriate *object* type. Atomic object types will have their content stored as string or bytes [nb I use Python 3 terminology throughout]. Composite types (multipart/*) won't need string or bytes attributes AFAICS.

Start by implementing the application/octet-stream and text/plain;charset=utf-8 object types, of course.

> It does seem to make sense to think about headers as text header names and text header values.

I disagree. IMHO, structured header types should have object values, and something like

    message['to'] = "Barry 'da FLUFL' Warsaw <ba...@python.org>"

should be smart enough to detect that it's a string and attempt to (flexibly) parse it into a fullname and a mailbox adding escapes, etc. Whether these should be structured objects or they can be strings or bytes, I'm not sure (probably bytes, not strings, though -- see next example). OTOH

    message['to'] = b'''Barry 'da.FLUFL' Warsaw <ba...@python.org>'''

should assume that the client knows what they are doing, and should parse it strictly (and I mean be a real bastard, eg, raise an exception on any non-ASCII octet), merely dividing it into fullname and mailbox, and caching the bytes for later insertion in a wire-format message.

> In that case, I think you want the values as unicodes, and probably the headers as unicodes containing only ASCII. So your table would be strings in both cases. OTOH, maybe your application cares about the raw underlying encoded data, in which case the header names are probably still strings of ASCII-ish unicodes and the values are bytes. It's this distinction (and I think the competing use cases) that make a true Python 3.x API for email more complicated.

I don't see why you can't have the email API be specific, with message['to'] always returning a structured_header object (or maybe even more specifically an address_header object), and methods like

    message['to'].build_header_as_text()

which returns

    "To: Barry 'da.FLUFL' Warsaw <ba...@python.org>"

and

    message['to'].build_header_in_wire_format()

which returns

    b"To: Barry 'da.FLUFL' Warsaw <ba...@python.org>"

Then have email.textview.Message and email.wireview.Message which provide a simple interface where message['to'] would invoke .build_header_as_text() and .build_header_in_wire_format() respectively.

> Thinking about this stuff makes me nostalgic for the sloppy happy days of Python 2.x

Er, yeah.

Nostalgic-for-the-BITNET-days-where-everything-was-Just-EBCDIC-ly y'rs,
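A rough sketch of the structured-header proposal above, under heavy simplifying assumptions: the header object holds a fullname and a mailbox, offers the two `build_header_*` methods named in the message, and two thin message classes choose which one `message['to']` calls. The class names `AddressHeader`, `TextViewMessage`, and `WireViewMessage`, the address `barry@example.org`, and the complete absence of quoting and escaping logic are illustrative only; `email.header.Header` is the real standard-library class used for non-ASCII names.

```python
from email.header import Header


class AddressHeader:
    """Hypothetical structured address header (no quoting/escaping logic)."""

    def __init__(self, name, fullname, mailbox):
        self.name = name
        self.fullname = fullname
        self.mailbox = mailbox

    def build_header_as_text(self):
        return f"{self.name}: {self.fullname} <{self.mailbox}>"

    def build_header_in_wire_format(self):
        try:
            self.fullname.encode("ascii")
            phrase = self.fullname
        except UnicodeEncodeError:
            # Non-ASCII display names become RFC 2047 encoded words.
            phrase = Header(self.fullname, "utf-8").encode()
        return f"{self.name}: {phrase} <{self.mailbox}>".encode("ascii")


class _BaseMessage:
    def __init__(self):
        self._headers = {}

    def add(self, header):
        self._headers[header.name.lower()] = header


class TextViewMessage(_BaseMessage):
    """message['to'] yields str, as in the proposed email.textview.Message."""

    def __getitem__(self, name):
        return self._headers[name.lower()].build_header_as_text()


class WireViewMessage(_BaseMessage):
    """message['to'] yields bytes, as in the proposed email.wireview.Message."""

    def __getitem__(self, name):
        return self._headers[name.lower()].build_header_in_wire_format()


to = AddressHeader("To", "Barry Warsaw", "barry@example.org")
text_msg, wire_msg = TextViewMessage(), WireViewMessage()
text_msg.add(to)
wire_msg.add(to)
```

Both views share the same structured object, so the text/bytes decision is made once, at the point where the application picks a view, rather than on every header access.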