[issue9360] nntplib cleanup

2010-09-29 Thread Antoine Pitrou

Antoine Pitrou pit...@free.fr added the comment:

I've committed the latest patch in r85111.

--
resolution:  -> fixed
stage: patch review -> committed/rejected
status: open -> closed

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue9360
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue9360] nntplib cleanup

2010-09-25 Thread Antoine Pitrou

Antoine Pitrou pit...@free.fr added the comment:

And here is a patch integrating doc fixes and updates.

--
Added file: http://bugs.python.org/file19011/nntplib_cleanup7.patch




[issue9360] nntplib cleanup

2010-09-24 Thread Antoine Pitrou

Antoine Pitrou pit...@free.fr added the comment:

Here is a patch adding tests for article(), head() and body(), as well as for 
the file parameter.
As far as code is concerned, this should now be complete. Documentation remains 
to be updated :)

--
Added file: http://bugs.python.org/file19002/nntplib_cleanup6.patch




[issue9360] nntplib cleanup

2010-09-23 Thread Antoine Pitrou

Antoine Pitrou pit...@free.fr added the comment:

Yes, unescaped utf-8 is explicitly allowed in headers by RFC 3977:

   The content of a header SHOULD be in UTF-8.  However, if an
   implementation receives an article from elsewhere that uses octets in
   the range 128 to 255 in some other manner, it MAY pass it to a client
   or server without modification.  Therefore, implementations MUST be
   prepared to receive such headers, and data derived from them (e.g.,
   in the responses from the OVER command, Section 8.3), and MUST NOT
   assume that they are always UTF-8.

(I guess this means NNTP is now more modern than SMTP :-)).

Actually, there's a test with an actual utf-8 header in the unit tests.
(as for “implementations MUST be prepared to receive such headers, and data 
derived from them”, this is accounted for by using surrogateescape).

--




[issue9360] nntplib cleanup

2010-09-20 Thread Antoine Pitrou

Antoine Pitrou pit...@free.fr added the comment:

> Gah.  I reviewed patch4, didn't notice today's update :(

Oops, I'm sorry for that. It turns out most of your comments are useful anyway.

> Why is it a good thing to make file a keyword only parameter?

I was thinking that if we want to add other function parameters, it will
be possible to do so while keeping the (mnemonically useful) convention
that the file parameter is last.

You are right that testing that parameter is still on the TODO list...
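
A minimal sketch of the keyword-only convention described above. The function name `fetch_body` and its arguments are hypothetical and not taken from the patch; the bare `*` in the signature is what forces `file` to be passed by name, so new parameters could later be added before it.

```python
import io

def fetch_body(message_spec=None, *, file=None):
    """Hypothetical method: everything after the bare * is keyword-only."""
    lines = [b"first line", b"second line"]  # stand-in for a server response
    if file is None:
        return lines
    # When a file object is given, write the lines there instead of returning them.
    for line in lines:
        file.write(line + b"\n")
    return []

buf = io.BytesIO()
fetch_body("<id@example.invalid>", file=buf)   # accepted: file passed by keyword

try:
    fetch_body("<id@example.invalid>", buf)    # rejected: file passed positionally
except TypeError as exc:
    positional_error = str(exc)
```

A call site that still passes the file positionally fails immediately with a TypeError, which is the kind of omission a test on that parameter would catch.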

> On line 193 you index fmt, and *then* you check the length.  When the
> number of tokens is too long, an IndexError is raised.

Yes, this is fixed in patch5 :)
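
For illustration only, here is a sketch of checking the length before indexing. The helper `parse_overview_fields` is hypothetical and is not the code from patch5; it only shows how surplus tokens on an overview line can be ignored rather than raising IndexError.

```python
def parse_overview_fields(tokens, fmt):
    """Pair overview tokens with field names, tolerating extra tokens.

    Hypothetical helper: `fmt` is the list of field names announced by the
    server, `tokens` the tab-separated values of one overview line.
    """
    fields = {}
    for i, token in enumerate(tokens):
        if i >= len(fmt):
            # More tokens than announced fields: stop before indexing
            # past the end of fmt.
            break
        fields[fmt[i]] = token
    return fields

fmt = ["subject", "from", "date"]
# The trailing tab produces one extra, empty token.
line = "Hello\tsomeone@example.invalid\tMon, 20 Sep 2010 10:00:00 +0000\t"
parsed = parse_overview_fields(line.split("\t"), fmt)
```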

> Could the 'date' field in the xover headers also be a DateTime rather
> than a string?  And :bytes and :lines be ints?  Or is that being too
> DWIMish?

We could indeed parse the xover/over fields, but the issue is that there
can be extra fields which therefore won't get parsed (or not all of them
if we decide to recognize some e.g. Xref).

Also, people could want to get at the non-parsed representation
(especially for the date field), which means we would need two APIs with
two different levels of abstraction.

> If the date field isn't returned as a datetime, though, should there
> be a helper method for converting it, or should we just assume that
> people will use email.utils mktime_tz and parsedate_tz?

parsedate or parsedate_tz is the right thing to use indeed (dates follow
the same format as in e-mails). As for the from and subject fields, the
provided decode_header() is highly recommended (as the __main__ block
shows).
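
A short sketch of the conversions mentioned above, using made-up header values. It assumes `email.header.decode_header` as a stand-in for the nntplib helper (nntplib itself was removed from the standard library in Python 3.13); `email.utils.parsedate_tz` and `mktime_tz` are the functions named in the question.

```python
import datetime
from email.header import decode_header, make_header
from email.utils import mktime_tz, parsedate_tz

# Illustrative values, not taken from a real server response.
raw_date = "Mon, 20 Sep 2010 10:30:00 +0200"
raw_subject = "=?utf-8?q?caf=C3=A9_news?="

# parsedate_tz keeps the UTC offset; mktime_tz turns the tuple into a
# POSIX timestamp that accounts for it.
timestamp = mktime_tz(parsedate_tz(raw_date))
when = datetime.datetime.fromtimestamp(timestamp, tz=datetime.timezone.utc)

# Decode the RFC 2047 encoded word into a plain str.
subject = str(make_header(decode_header(raw_subject)))
```

Keeping the overview fields as strings and converting on demand, as here, leaves the unparsed representation available to callers who want it.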

> Am I correct that the purpose of _NNTPBase is to make testing easier?

Yes.

> There seem to be only three lines of code in common between the file
> and the non-file case in _getlongresp.  I think it would be clearer to
> make them two different routines, or at least move the three common
> lines to the top and then do an if test on file is None (that is, put
> the simpler, non-file case first).

I'll take a look.

> My little nntp client script doesn't really test very much of the
> nntplib interface, nor is it very complex.  The change to xover
> considerably simplified that part of my script, so I very much like
> that change.  I was also able to drop my implementation of
> decode_header.  So overall this patch is a significant improvement, IMO.

Thank you :)

--




[issue9360] nntplib cleanup

2010-09-20 Thread R. David Murray

R. David Murray rdmur...@bitdance.com added the comment:

OK.  It doesn't seem all that likely that we'll be adding parameters, but you 
never know.  So I'm OK with file becoming keyword only.

As for the overview headers...if they aren't decoded through decode_header 
already, why are they unicode?  If they haven't been decoded, I'd expect them 
to be bytes.  Or does the standard really say that these headers, extracted 
from the bytes message data, will be valid utf-8?  (I realize that as things 
stand now with email5 you'll in fact *have* to decode them in order to process 
them with decode_header, but that doesn't change the fact that if they are 
unicode I would expect that they'd be *fully* decoded.)

But I take your point about the datatypes otherwise.  Also, when we get to 
email6 all headers should be turned into Header objects, and there will be a 
way to get a datetime out of a Date Header object.

--




[issue9360] nntplib cleanup

2010-09-19 Thread Antoine Pitrou

Antoine Pitrou pit...@free.fr added the comment:

Here is the current state of the cleanup work. Coding is roughly finished; most 
methods are unit-tested. The documentation remains to be done. Review welcome.

--
stage:  -> patch review
Added file: http://bugs.python.org/file18931/nntplib_cleanup5.patch




[issue9360] nntplib cleanup

2010-09-19 Thread Nick Coghlan

Nick Coghlan ncogh...@gmail.com added the comment:

To make the distinction easier to remember, would it help if the methods that 
are currently set to return bytes instead accepted the typical encoding+errors 
parameters, with parallel *b APIs to get at the raw bytes?

My concern with the current API is that there isn't a clear indicator during 
normal programming as to which APIs return strings and which return the raw 
bytes and hence require further decoding.

--
nosy: +ncoghlan




[issue9360] nntplib cleanup

2010-09-19 Thread Antoine Pitrou

Antoine Pitrou pit...@free.fr added the comment:

> To make the distinction easier to remember, would it help if the
> methods that are currently set to return bytes instead accepted the
> typical encoding+errors parameters, with parallel *b APIs to get at
> the raw bytes?

Not really, no. For raw messages, which encoding+errors must be used
depends on the returned contents; it's not something the client can know
up front. Moreover, different parts of the returned bytes may need
decoding using different encodings (for example if there are several
MIME parts to the message). People should use the email package to parse
the raw messages, as I assume they already do in 2.x.
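
As a hedged illustration of that last point, the sketch below hands raw article bytes to the email package and lets the message's own headers decide the charset. The article content is made up, and `email.message_from_bytes` is the current standard-library entry point rather than anything specific to this patch.

```python
import email

# Made-up raw article lines, as bytes, with an 8bit Latin-1 body.
raw_lines = [
    b"From: someone@example.invalid",
    b"Subject: test",
    b"MIME-Version: 1.0",
    b"Content-Type: text/plain; charset=iso-8859-1",
    b"Content-Transfer-Encoding: 8bit",
    b"",
    b"caf\xe9",
]

msg = email.message_from_bytes(b"\r\n".join(raw_lines))

# The charset comes from the message itself, not from the caller.
charset = msg.get_content_charset()
text = msg.get_payload(decode=True).decode(charset)
```

Decoding the same bytes as UTF-8 up front would have failed or produced garbage, which is why the raw-message methods return bytes.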

Apart from raw message bodies, NNTP data has well-defined encodings and
that's why I can take and return unicode (although as stated, I also use
surrogateescape to be fault-tolerant in the face of broken servers).

> My concern with the current API is that there isn't a clear indicator
> during normal programming as to which APIs return strings and which
> return the raw bytes and hence require further decoding.

That's a documentation issue. I haven't touched the docs yet :)

--




[issue9360] nntplib cleanup

2010-09-19 Thread R. David Murray

R. David Murray rdmur...@bitdance.com added the comment:

I tested this against my existing py3k nntp client code.

Why is it a good thing to make file a keyword only parameter? But given that it 
is, line 720 needs to use it as such, which omission means you're missing at 
least one test :)

On line 193 you index fmt, and *then* you check the length.  When the number of 
tokens is too long, an IndexError is raised.  (Note that the offending overview 
line comes from the gmane group comp.lang.python.mime.devel, and the offense is 
an extra '' field on one of the records.  No idea why it got added to that one 
record.  Looks like it is message 701, artid 4111bba9.3040...@harvee.org)

Could the 'date' field in the xover headers also be a DateTime rather than a 
string?  And :bytes and :lines be ints?  Or is that being too DWIMish?  If the 
date field isn't returned as a datetime, though, should there be a helper 
method for converting it, or should we just assume that people will use 
email.utils mktime_tz and parsedate_tz?

Am I correct that the purpose of _NNTPBase is to make testing easier?

There seem to be only three lines of code in common between the file and the 
non-file case in _getlongresp.  I think it would be clearer to make them two 
different routines, or at least move the three common lines to the top and then 
do an if test on file is None (that is, put the simpler, non-file case first).

My little nntp client script doesn't really test very much of the nntplib 
interface, nor is it very complex.  The change to xover considerably simplified 
that part of my script, so I very much like that change.  I was also able to 
drop my implementation of decode_header.  So overall this patch is a significant 
improvement, IMO.

I haven't given the code as thorough a review as might be optimal, but I 
certainly think you are headed in the right direction.

--




[issue9360] nntplib cleanup

2010-09-19 Thread R. David Murray

R. David Murray rdmur...@bitdance.com added the comment:

Gah.  I reviewed patch4, didn't notice today's update :(

--




[issue9360] nntplib cleanup

2010-09-13 Thread R. David Murray

R. David Murray rdmur...@bitdance.com added the comment:

Assuming we can break backward compatibility, it sounds fine to me.

--




[issue9360] nntplib cleanup

2010-09-13 Thread Giampaolo Rodola'

Changes by Giampaolo Rodola' g.rod...@gmail.com:


--
nosy: +giampaolo.rodola




[issue9360] nntplib cleanup

2010-09-07 Thread Antoine Pitrou

Antoine Pitrou pit...@free.fr added the comment:

Note that according to RFC 3977, “The character set for all NNTP commands is 
UTF-8”.

But it also says this about multi-line data blocks:

   Note that texts using an encoding (such as UTF-16 or UTF-32) that may
   contain the octets NUL, LF, or CR other than a CRLF pair cannot be
   reliably conveyed in the above format (that is, they violate the MUST
   requirement above).  However, except when stated otherwise, this
   specification does not require the content to be UTF-8, and therefore
   (subject to that same requirement) it MAY include octets above and
   below 128 mixed arbitrarily.

IMO, it should decode/encode by default using utf-8 (with the surrogateescape 
error handler for easy round-tripping with non-compliant servers), except for 
raw articles (bodies / envelopes) where bytes should be returned.
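
A minimal sketch of the round-tripping behaviour this relies on. The byte string is an invented example of what a non-compliant server might send; `surrogateescape` is the standard Python error handler named above.

```python
# A Latin-1 encoded byte (0xE9) that is not valid UTF-8.
raw = b"Subject: caf\xe9"

# Undecodable bytes become lone surrogates instead of raising an error.
text = raw.decode("utf-8", errors="surrogateescape")

# Encoding with the same handler restores the original bytes exactly.
round_tripped = text.encode("utf-8", errors="surrogateescape")

# A strict encode refuses the surrogate, so the bad data cannot leak
# out silently as valid UTF-8.
try:
    text.encode("utf-8")
    strict_failed = False
except UnicodeEncodeError:
    strict_failed = True
```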

--
nosy: +pitrou




[issue9360] nntplib cleanup

2010-07-23 Thread Dmitry Jemerov

New submission from Dmitry Jemerov intelliy...@gmail.com:

The patch performs an extensive cleanup of nntplib:
 - Change API methods to return strings instead of bytes. This breaks API 
compatibility, but given that the parameters need to be passed as strings and 
many of the returned values would need to be passed to other API methods, I 
consider the current API to be broken. I've discussed this with Brett at the 
EuroPython sprint, and he agrees.
 - Add tests.
 - Add pending deprecation warnings for xgtitle() and xpath() methods, which 
are not useful in modern environments.
 - Use named tuples for returned values where appropriate.
 - Modernize the implementation a little bit.
 - Clean up the docstrings.
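
To illustrate two of the items above, here is a rough sketch of a named tuple return value and a pending deprecation warning. The names `GroupInfo`, `parse_group_line` and `xpath_like` are chosen for illustration and are assumptions, not necessarily what the patch defines.

```python
import collections
import warnings

# Illustrative named tuple for one line of a group listing.
GroupInfo = collections.namedtuple("GroupInfo", ["group", "last", "first", "flag"])

def parse_group_line(line):
    """Hypothetical parser: 'name last first flag' -> GroupInfo."""
    group, last, first, flag = line.split()
    return GroupInfo(group, int(last), int(first), flag)

def xpath_like(message_id):
    """Hypothetical stand-in for a method slated for deprecation."""
    warnings.warn("this method is not useful in modern environments",
                  PendingDeprecationWarning, stacklevel=2)
    return None

info = parse_group_line("comp.lang.python 3000 1 y")

# Capture the warning so it can be inspected instead of printed.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    xpath_like("<id@example.invalid>")
```

Fields of the result are reachable both by name (`info.last`) and by position (`info[1]`), so code that unpacks plain tuples keeps working.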

--
components: Library (Lib)
files: nntplib_cleanup.patch
keywords: patch
messages: 111364
nosy: Dmitry.Jemerov
priority: normal
severity: normal
status: open
title: nntplib cleanup
versions: Python 3.2
Added file: http://bugs.python.org/file18159/nntplib_cleanup.patch




[issue9360] nntplib cleanup

2010-07-23 Thread R. David Murray

R. David Murray rdmur...@bitdance.com added the comment:

I haven't looked at the patch in detail, but if my quick skim is correct, it 
looks like you are decoding using latin1.  If you don't provide any way to get 
at the original bytes (which I presume you don't if you have converted the API 
to return strings), then it will be difficult and non-obvious to correctly 
decode messages using a content transfer encoding of 8bit and some other 
character set.  (Of course, it is currently impossible to do this using the 
Py3k email package either, but I'm working on fixing that).

--
nosy: +r.david.murray




[issue9360] nntplib cleanup

2010-07-23 Thread Dmitry Jemerov

Dmitry Jemerov intelliy...@gmail.com added the comment:

This is an issue only for the actual article content, right? I'll be happy to 
extend the API to allow getting the original bytes of the article content, 
while keeping the rest of API (like group names) string-based.

--




[issue9360] nntplib cleanup

2010-07-23 Thread R. David Murray

R. David Murray rdmur...@bitdance.com added the comment:

Correct, if by 'article content' you mean what is returned by the article 
command.  So I think it is necessary to be able to get bytes for two commands: 
article and body.  Then for symmetry it should also be possible to get bytes 
from the head command.  (It is actually possible for headers to include 8bit 
data, but it is a violation of the RFC and both rare outside of spam and 
impossible to handle sensibly in most cases...and it may be the case that most 
if not all NNTP servers sanitize such headers; I don't have any experience in 
that area.)

On the other hand, this tells me that a possible use case I had been wondering 
about in email is in fact a valid concern: being able to parse a data stream 
that contains both strings and bytes :(  That is, one might use nntplib in a 
mode where you pull the headers as text and the body as bytes and want to hand 
it off to email to build a Message object from those parts.

--
stage:  -> patch review
type:  -> feature request
