urllib2 disable proxy

2008-01-02 Thread Dimitrios Apostolou
Hello list, 

I've been looking for a way to explicitly disable the use of proxies with 
urllib2, no matter what the environment dictates. Unfortunately I can't find 
a way in the documentation, and reading the source leads me to believe that 
something like the following does the job: 

req.set_proxy(None,None)

Where req is a urllib2.Request instance. So is there an official way of doing 
this? Perhaps it should be added in the documentation?


Thanks in advance, 
Dimitris
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: urllib2 disable proxy

2008-01-03 Thread Dimitrios Apostolou


On Wed, 2 Jan 2008, Rob Wolfe wrote:

> Dimitrios Apostolou <[EMAIL PROTECTED]> writes:
>
>> Hello list,
>>
>> I've been looking for a way to explicitly disable the use of proxies with
>> urllib2, no matter what the environment dictates. Unfortunately I can't find
>> a way in the documentation, and reading the source leads me to believe that
>> something like the following does the job:
>>
>> req.set_proxy(None,None)
>>
>> Where req is a urllib2.Request instance. So is there an official way of doing
>> this? Perhaps it should be added in the documentation?
>
> I believe that the recommended way is to use `urllib2.ProxyHandler`.
> Take a look at:
> http://www.voidspace.org.uk/python/articles/urllib2.shtml

Thanks for the pointer, I will use that way. However it seems a rather 
inelegant way to do something so simple, and I was hoping not to mess 
with ProxyHandler, especially since I want *no* proxy... IMHO something 
like the following would be more elegant:

req.set_proxy('','http')

or

req.set_proxy(None,'http')


However these ways *don't* work. Do you think I should file a feature 
request somewhere or send this to the python-dev list?
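
For reference, a minimal sketch of the ProxyHandler approach suggested 
above. It assumes that passing an empty dictionary to ProxyHandler is 
enough to make urllib2 ignore the proxy settings found in the environment. 
The try/except import is only there so that the same sketch also runs on 
Python 3, where urllib2 became urllib.request; the names request_mod, 
no_proxy_handler and opener, and the example URL, are placeholders.

```python
try:
    import urllib2 as request_mod  # Python 2, as used in this thread
except ImportError:
    import urllib.request as request_mod  # Python 3 home of the same classes

# An empty mapping means "no proxies at all"; the environment is not consulted.
no_proxy_handler = request_mod.ProxyHandler({})

# Passing our own ProxyHandler instance replaces the default one that
# build_opener() would otherwise create from the environment settings.
opener = request_mod.build_opener(no_proxy_handler)

# Hypothetical usage (commented out so the sketch does no network access):
# response = opener.open("http://www.example.com/")
# request_mod.install_opener(opener)  # make urlopen() use it as well
```

Only requests made through this opener (or through urlopen() after 
install_opener) are affected; other openers keep their own proxy settings.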


Thank you for the help,
Dimitris


>
> HTH,
> Rob


urllib2 rate limiting

2008-01-10 Thread Dimitrios Apostolou
Hello list,

I want to limit the download speed when using urllib2. In particular, 
having several parallel downloads, I want to make sure that their total 
speed doesn't exceed a maximum value.

I can't find a simple way to achieve this. After some research there are 
a few things I can try, but I'm stuck on the details:

1) Can I override some method in _socket.py to achieve this, and perhaps 
make this generic enough to work even with libraries other than urllib2?

2) There is the urllib.urlretrieve() function, which accepts a reporthook 
parameter. Perhaps I can have reporthook increment a global counter and 
sleep as necessary when a threshold is reached.
However there is nothing similar in urllib2. Isn't urllib2 supposed 
to be a superset of urllib in functionality? Why is there no reporthook 
parameter in any of urllib2's functions?
Moreover, even the existing way reporthook can be used doesn't seem quite 
right: reporthook(blocknum, bs, size) is always called with bs=8K, even 
for the last block, and sometimes (blocknum*bs > size) is possible if the 
server sends a wrong Content-Length HTTP header.

3) Perhaps I can use filehandle.read(1024) and manually read as many 
chunks of data as I need. However I think this would generally be 
inefficient, and I'm not sure how it would work because of urllib2's 
internal buffering.
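
As an illustration of idea 2), here is a rough sketch of a reporthook that 
sleeps whenever the download runs ahead of a target rate. It is not taken 
from any library: make_throttling_reporthook and its clock/sleep 
parameters are hypothetical names, the latter two injectable so the logic 
can be tried without a real download. It also clamps blocknum*bs to size, 
since the product can overshoot as described above.

```python
import time


def make_throttling_reporthook(max_bytes_per_sec, clock=time.time,
                               sleep=time.sleep):
    """Return a reporthook that caps the average rate of one download."""
    start = [None]  # a list, so the nested function can update it on Python 2

    def reporthook(blocknum, bs, size):
        if start[0] is None:
            # First call (blocknum == 0) happens before any data is read.
            start[0] = clock()
            return
        transferred = blocknum * bs
        if size > 0:
            # blocknum * bs may exceed the real size on the last block.
            transferred = min(transferred, size)
        # Time the transfer should have taken at the target rate so far.
        expected = transferred / float(max_bytes_per_sec)
        elapsed = clock() - start[0]
        if expected > elapsed:
            sleep(expected - elapsed)

    return reporthook


# Hypothetical usage with Python 2's urllib (commented out, no network here):
# import urllib
# urllib.urlretrieve(url, filename,
#                    reporthook=make_throttling_reporthook(50 * 1024))
```

This limits a single download only. Capping the total of several parallel 
downloads would need the counter to be shared between them and protected 
by a lock.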

So how do you think I can achieve rate limiting in urllib2?


Thanks in advance,
Dimitris

P.S. And something simpler: How can I prevent urllib2 from following 
redirections to foreign hosts?


Re: urllib2 rate limiting

2008-01-10 Thread Dimitrios Apostolou
On Thu, 10 Jan 2008, Rob Wolfe wrote:

> Dimitrios Apostolou <[EMAIL PROTECTED]> writes:
>
>> P.S. And something simpler: How can I prevent urllib2 from following
>> redirections to foreign hosts?
>
> You need to subclass `urllib2.HTTPRedirectHandler`, override
> `http_error_301` and `http_error_302` methods and throw
> `urllib2.HTTPError` exception.

Thanks! I think for my case it's better to override the redirect_request 
method and return a Request only when the redirection goes to the 
same site. Just one more question, because I can't find the meaning of 
the (req, fp, code, msg, hdrs) parameters in the docs: to read the URL I 
get redirected to (the 'Location:' HTTP header?), should I check the hdrs 
parameter, or is there a better way?


Thanks,
Dimitris


>
> http://diveintopython.org/http_web_services/redirects.html
>
> HTH,
> Rob


Re: urllib2 rate limiting

2008-01-10 Thread Dimitrios Apostolou
On Thursday 10 January 2008 22:42:44 Rob Wolfe wrote:
> Dimitrios Apostolou <[EMAIL PROTECTED]> writes:
> > On Thu, 10 Jan 2008, Rob Wolfe wrote:
> >> Dimitrios Apostolou <[EMAIL PROTECTED]> writes:
> >>> P.S. And something simpler: How can I prevent urllib2 from following
> >>> redirections to foreign hosts?
> >>
> >> You need to subclass `urllib2.HTTPRedirectHandler`, override
> >> `http_error_301` and `http_error_302` methods and throw
> >> `urllib2.HTTPError` exception.
> >
> > Thanks! I think for my case it's better to override the redirect_request
> > method and return a Request only when the redirection goes to the
> > same site. Just one more question, because I can't find the meaning of
> > the (req, fp, code, msg, hdrs) parameters in the docs: to read the URL I
> > get redirected to (the 'Location:' HTTP header?), should I check the
> > hdrs parameter, or is there a better way?
>
> Well, according to the documentation there is no better way.
> But I looked into the source code of `urllib2` and it seems
> that the `redirect_request` method takes one more parameter,
> `newurl`, which is probably what you're looking for. ;)
>
> Regards,
> Rob

Cool! :-) Sometimes undocumented features provide superb solutions... I wonder 
if there is something similar for rate limiting :-s
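
A minimal sketch of the redirect_request override discussed above, 
assuming that "same site" means "same hostname" and that newurl arrives 
as an absolute URL. The class name SameHostRedirectHandler is made up for 
this example, and the try/except import only exists so the sketch also 
runs on Python 3, where the same classes live in urllib.request.

```python
try:  # Python 2, as used in this thread
    from urllib2 import HTTPError, HTTPRedirectHandler, Request, build_opener
    from urlparse import urlparse
except ImportError:  # Python 3 locations of the same names
    from urllib.error import HTTPError
    from urllib.request import HTTPRedirectHandler, Request, build_opener
    from urllib.parse import urlparse


class SameHostRedirectHandler(HTTPRedirectHandler):
    """Follow a redirect only if it stays on the original request's host."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        old_host = urlparse(req.get_full_url()).hostname
        new_host = urlparse(newurl).hostname
        if new_host != old_host:
            # Refuse the redirect: no other handler should try to follow it.
            raise HTTPError(req.get_full_url(), code,
                            "redirect to foreign host refused: %s" % newurl,
                            headers, fp)
        # Same host: let the stock implementation build the new Request.
        return HTTPRedirectHandler.redirect_request(
            self, req, fp, code, msg, headers, newurl)


# Our instance takes the place of the default HTTPRedirectHandler.
opener = build_opener(SameHostRedirectHandler())
# Hypothetical usage: opener.open("http://www.example.com/some/page")
```

Comparing hostnames is the simplest reading of "same site"; a stricter or 
looser rule (same scheme and port, or same parent domain) would go in the 
same place.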


Thank you,
Dimitris


Re: urllib2 rate limiting

2008-01-12 Thread Dimitrios Apostolou
On Fri, 11 Jan 2008, Nick Craig-Wood wrote:
> Here is an implementation based on that idea.  I've used urllib rather
> than urllib2 as that is what I'm familiar with.

Thanks! Really nice implementation. However I'm stuck with urllib2 because 
of its extra functionality so I'll try to implement something similar 
using handle.read(1024) to read in small chunks.

It really seems weird that urllib2 is missing reporthook functionality!
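
One possible shape for the chunked-read idea, sketched under assumptions: 
SharedRateLimiter and throttled_copy are hypothetical helpers, not part of 
urllib2, and this is not the implementation referred to above. A single 
limiter object is shared by all download threads so that their combined 
rate stays under the cap, and an optional progress callback stands in for 
the missing reporthook.

```python
import threading
import time


class SharedRateLimiter(object):
    """Caps the combined throughput of every thread calling consume()."""

    def __init__(self, max_bytes_per_sec, clock=time.time, sleep=time.sleep):
        self.rate = float(max_bytes_per_sec)
        self.clock = clock
        self.sleep = sleep
        self.lock = threading.Lock()
        self.next_free = clock()  # when the already reserved bytes are "paid"

    def consume(self, nbytes):
        # Reserve a time slot for nbytes under the lock, sleep outside it.
        self.lock.acquire()
        try:
            now = self.clock()
            start = max(now, self.next_free)
            self.next_free = start + nbytes / self.rate
            wait = self.next_free - now
        finally:
            self.lock.release()
        if wait > 0:
            self.sleep(wait)


def throttled_copy(src, dst, limiter, chunk_size=1024, progress=None):
    """Copy src to dst in small chunks, charging each chunk to the limiter."""
    total = 0
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        dst.write(chunk)
        total += len(chunk)
        limiter.consume(len(chunk))
        if progress is not None:
            progress(total)  # reporthook-like callback: bytes copied so far
    return total


# Hypothetical usage with urllib2 (commented out, no network access here):
# limiter = SharedRateLimiter(100 * 1024)      # one object for all threads
# throttled_copy(urllib2.urlopen(url), open(path, "wb"), limiter)
```

The throttling happens at the application level: data may still pile up in 
socket buffers for a while, but once the reader slows down, TCP flow 
control should slow the sender as well.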


Thank you,
Dimitris
