Re: urllib2 - iteration over non-sequence

Gary Herron Sat, 09 Jun 2007 21:08:16 -0700

Paul Rubin wrote:
> Erik Max Francis <[EMAIL PROTECTED]> writes:
>   
>> This is really wasteful, as there's no point in reading in the whole
>> file before iterating over it.  To get the same effect as file
>> iteration in later versions, use the .xreadlines method::
>>
>>      for line in aFile.xreadlines():
>>          ...
>>     
>
> Ehhh, a heck of a lot of web pages don't have any newlines, so you end
> up getting the whole file anyway, with that method.  Something like
>
>    for line in iter(lambda: aFile.read(4096), ''): ...
>
> may be best.
>   
Certainly there are cases where xreadlines or read(bytecount) is
reasonable, but only if the total page size is *very* large.  For
most web pages, you guys are just nit-picking (or showing off) when you
suggest that the full read implemented by readlines is wasteful.
Moreover, the original problem was with sockets -- which don't have
xreadlines.  That seems to be a method on regular file objects.
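
For the record, here is a rough sketch of the read(bytecount) idea for an
object that only offers read().  It assumes Python 3 syntax, uses io.BytesIO
as a stand-in for the network response so it runs offline, and the helper
name iter_chunks is made up for illustration:

```python
import io
from functools import partial

def iter_chunks(fileobj, size=4096):
    """Return an iterator over blocks of at most `size` bytes from fileobj."""
    # iter(callable, sentinel) keeps calling fileobj.read(size) and stops
    # as soon as it returns the sentinel b"" (end of data).
    return iter(partial(fileobj.read, size), b"")

# Stand-in for a response object: 10000 bytes with no newlines at all,
# the case where line-based iteration would return everything in one piece.
page = io.BytesIO(b"x" * 10000)
sizes = [len(chunk) for chunk in iter_chunks(page)]
print(sizes)  # [4096, 4096, 1808]
```

Only the object's read() method is used, so nothing here depends on
xreadlines being available.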


For simplicity, I'd still suggest my original use of readlines.  If
and when you find you are downloading web pages with sizes that are
putting a serious strain on your memory footprint, then one of the other
suggestions might be indicated.
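
A minimal sketch of that simple approach, again with assumptions spelled out:
Python 3 syntax, io.BytesIO standing in for whatever urlopen() returns, and
page_lines being a made-up helper name rather than anything from the library:

```python
import io

def page_lines(response):
    """Read the whole response into memory and return its lines as a list."""
    # readlines() pulls in everything at once; fine for ordinary pages.
    return response.readlines()

# Stand-in for a real response so the example runs without a network.
fake_response = io.BytesIO(b"<html>\n<body>hi</body>\n</html>\n")
lines = page_lines(fake_response)
for line in lines:
    print(line.rstrip())
```

The list can be iterated as often as needed, at the cost of holding the
whole page in memory.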

Gary Herron




-- 
http://mail.python.org/mailman/listinfo/python-list
