Re: download x bytes at a time over network

2009-03-17 Thread Saurabh
 This isn't exactly how things work.  The server *sends* you bytes.  It can
 send you a lot at once.  To some extent you can control how much it sends
 before it waits for you to catch up, but you don't have anywhere near
 byte-level control (you might have something like 32kb or 64kb level
 control).

What about in Python 3 ?
It seems to have some header like the one below : b'b495' - binary mode
with 46229 bytes ? Or is it something else ?

 import urllib.request
 url = "http://feeds2.feedburner.com/jquery/"
 handler = urllib.request.urlopen(url)
 data = handler.read(1000)
 print("Content :\n%s \n%s \n%s" % ('=' * 100, data, '=' * 100))
Content :

b'b495\r\n<?xml version="1.0" encoding="UTF-8"?>\r\n<?xml-stylesheet
type="text/xsl" media="screen"
href="/~d/styles/rss2full.xsl"?><?xml-stylesheet type="text/css"
media="screen"
href="http://feeds2.feedburner.com/~d/styles/itemcontent.css"?><!--
generator="wordpress/2.0.11" --><rss
xmlns:content="http://purl.org/rss/1.0/modules/content/"
xmlns:wfw="http://wellformedweb.org/CommentAPI/"
xmlns:dc="http://purl.org/dc/elements/1.1/"
version="2.0">\r\n\r\n<channel>\r\n\t<title>jQuery
Blog</title>\r\n\t<link>http://blog.jquery.com</link>\r\n\t<description>New
Wave Javascript.</description>\r\n\t<pubDate>Fri, 13 Mar 2009 13:07:07
+</pubDate>\r\n\t<generator>http://wordpress.org/?v=2.0.11</generator>\r\n\t<language>en</language>\r\n\t\t\t<atom10:link
xmlns:atom10="http://www.w3.org/2005/Atom" rel="self"
href="http://feeds2.feedburner.com/jquery" type="application/rss+xml"
/><item>\r\n\t\t<title>This Week in jQuery, vol.
1</title>\r\n\t\t<link>http://blog.jquery.com/2009/03/13/this-week-in-jquery-vol-1/</link>\r\n\t\t<comments>http:'


--
http://mail.python.org/mailman/listinfo/python-list


Re: download x bytes at a time over network

2009-03-17 Thread n . s . buttar
2009/3/16 Saurabh phoneth...@gmail.com:
 I want to download content from the net - in chunks of x bytes or characters
 at a time - so that it doesn't pull the entire content in one shot.

 import urllib2
 url = "http://python.org/"
 handler = urllib2.urlopen(url)

 data = handler.read(100)
 print "Content :\n%s \n%s \n%s" % ('=' * 100, data, '=' * 100)

 # Disconnect the internet

 data = handler.read(100) # I want it to throw an exception because it can't
 # fetch from the net
 print "Content :\n%s \n%s \n%s" % ('=' * 100, data, '=' * 100) # But
 # still works !

 Apparently, handler = urllib2.urlopen(url) reads the entire data into a buffer
 and handler.read(100) is just reading from the buffer ?

 Is it possible to get content from the net in chunks ? Or do I need to use
 HTTPClient/sockets directly ?

 Thanks




Yes, you can do it with urllib(2). Please take a look at the following
HTTP headers, which facilitate this kind of transfer:

Content-Range http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.16
Range http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.35

You can set the values for these and send the request to get partial content.
However, let me warn you that not all servers allow this.
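As a hedged sketch (the URL and byte range are illustrative, and written for Python 3's urllib.request rather than urllib2), a partial request might look like this; a server that honors the Range header answers "206 Partial Content", while one that ignores it sends the full body:

```python
import urllib.request

# Ask for only the first 100 bytes of the resource (illustrative URL).
req = urllib.request.Request("http://python.org/")
req.add_header("Range", "bytes=0-99")

# resp = urllib.request.urlopen(req)
# resp.status == 206 means the server honored the range;
# resp.read() is then at most 100 bytes.
```

Checking the status code is important precisely because of the warning above: a server that does not support ranges quietly answers 200 with the whole document.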


Re: download x bytes at a time over network

2009-03-17 Thread Steven D'Aprano
Please excuse my replying to a reply instead of the original, but the 
original doesn't show up on my news feed.


On 2009/3/16 Saurabh phoneth...@gmail.com:
 I want to download content from the net - in chunks of x bytes or
 characters at a time - so that it doesn't pull the entire content in one
 shot.

 import urllib2
 url = "http://python.org/"
 handler = urllib2.urlopen(url)

 data = handler.read(100)
 print "Content :\n%s \n%s \n%s" % ('=' * 100, data, '=' * 100)

 # Disconnect the internet

 data = handler.read(100) # I want it to throw an exception because it
 # can't fetch from the net
 print "Content :\n%s \n%s \n%s" % ('=' * 100, data, '=' * 100)
 # But still works !


Perhaps you have a local caching web proxy server, which has downloaded 
the entire document, and even though you have disconnected the Internet, 
you haven't disconnected the proxy which is feeding you the rest of the 
file from the server's cache.


 Apparently, handler = urllib2.urlopen(url) reads the entire data into a
 buffer and handler.read(100) is just reading from the buffer ?

I could be wrong, but I don't think so.

I think you have the right approach. By default, urllib2 uses a 
ProxyHandler which will try to auto-detect any proxies (according to 
environment variables, registry entries or similar).




-- 
Steven


Re: download x bytes at a time over network

2009-03-17 Thread Saurabh
Here's the reason behind wanting to get chunks at a time.
I'm actually retrieving data from a list of RSS feeds and need to
continuously check for the latest posts.
But I don't want to depend on the Last-Modified header or the pubDate tag
in channel, because a lot of feeds just output date('now') instead
of the actual last-updated timestamp.
But when continuously checking for the latest posts, I don't want to
bombard other people's bandwidth - so I just want to get chunks of
bytes at a time and internally check <item>...</item> against the
timestamp values in my database.
Is there a better way to achieve this ?


Re: download x bytes at a time over network

2009-03-17 Thread R. David Murray
Saurabh phoneth...@gmail.com wrote:
  This isn't exactly how things work.  The server *sends* you bytes.  It can
  send you a lot at once.  To some extent you can control how much it sends
  before it waits for you to catch up, but you don't have anywhere near
  byte-level control (you might have something like 32kb or 64kb level
  control).
 
 What about in Python 3 ?
 It seems to have some header like the one below : b'b495' - binary mode
 with 46229 bytes ? Or is it something else ?
 
 import urllib.request
 url = "http://feeds2.feedburner.com/jquery/"
 handler = urllib.request.urlopen(url)
 data = handler.read(1000)
 print("Content :\n%s \n%s \n%s" % ('=' * 100, data, '=' * 100))
 Content :
 
 b'b495\r\n<?xml version="1.0" encoding="UTF-8"?>\r\n<?xml-stylesheet

That b'... is the string representation of the bytes object returned
by urllib.request.  Remember that in Python 3, bytes and strings are two
very different types.  You get bytes from urllib.request because urllib
can't know what encoding the bytes are in.  You have to decide how to
decode them in order to convert them into a unicode string object.
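A tiny illustration of that distinction (the sample bytes are made up):

```python
raw = b'<?xml version="1.0" encoding="UTF-8"?>'  # what urllib.request hands back
text = raw.decode("utf-8")                       # explicit decode: bytes -> str

# Indexing shows the difference: raw[0] is an int (the byte value),
# while text[0] is the one-character string '<'.
```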

--
R. David Murray   http://www.bitdance.com



Re: download x bytes at a time over network

2009-03-17 Thread Jorgen Grahn
On Tue, 17 Mar 2009 13:38:31 +0530, Saurabh phoneth...@gmail.com wrote:
 Here's the reason behind wanting to get chunks at a time.
 I'm actually retrieving data from a list of RSS feeds and need to
 continuously check for the latest posts.
 But I don't want to depend on the Last-Modified header or the pubDate tag
 in channel, because a lot of feeds just output date('now') instead
 of the actual last-updated timestamp.
 But when continuously checking for the latest posts, I don't want to
 bombard other people's bandwidth - so I just want to get chunks of
 bytes at a time and internally check <item>...</item> against the
 timestamp values in my database.
 Is there a better way to achieve this ?

I don't know much about RSS, but one approach is: "If they are too lazy
to provide the information which protects their bandwidth, they
deserve to be bombarded." But they also deserve a polite mail telling
them that they have that problem.

/Jorgen

-- 
  // Jorgen Grahn grahn@Ph'nglui mglw'nafh Cthulhu
\X/ snipabacken.se  R'lyeh wgah'nagl fhtagn!


Re: download x bytes at a time over network

2009-03-17 Thread Jean-Paul Calderone

On Tue, 17 Mar 2009 12:15:23 +0530, Saurabh phoneth...@gmail.com wrote:

This isn't exactly how things work.  The server *sends* you bytes.  It can
send you a lot at once.  To some extent you can control how much it sends
before it waits for you to catch up, but you don't have anywhere near
byte-level control (you might have something like 32kb or 64kb level
control).


What about in Python 3 ?
It seems to have some header like the one below : b'b495' - binary mode
with 46229 bytes ? Or is it something else ?


That's just a bug in urllib in Python 3.0.

Jean-Paul


Re: download x bytes at a time over network

2009-03-17 Thread R. David Murray
Jean-Paul Calderone exar...@divmod.com wrote:
 On Tue, 17 Mar 2009 12:15:23 +0530, Saurabh phoneth...@gmail.com wrote:
  This isn't exactly how things work.  The server *sends* you bytes.  It can
  send you a lot at once.  To some extent you can control how much it sends
  before it waits for you to catch up, but you don't have anywhere near
  byte-level control (you might have something like 32kb or 64kb level
  control).
 
 What about in Python 3 ?
 It seems to have some header like the one below : b'b495' - binary mode
 with 46229 bytes ? Or is it something else ?
 
 That's just a bug in urllib in Python 3.0.

What makes you say that's a bug?  Did I miss something?  (Which is entirely
possible!)

--
R. David Murray   http://www.bitdance.com



Re: download x bytes at a time over network

2009-03-17 Thread n . s . buttar
http://code.activestate.com/recipes/114217/

On Tue, Mar 17, 2009 at 1:38 PM, Saurabh phoneth...@gmail.com wrote:
 Here's the reason behind wanting to get chunks at a time.
 I'm actually retrieving data from a list of RSS feeds and need to
 continuously check for the latest posts.
 But I don't want to depend on the Last-Modified header or the pubDate tag
 in channel, because a lot of feeds just output date('now') instead
 of the actual last-updated timestamp.
 But when continuously checking for the latest posts, I don't want to
 bombard other people's bandwidth - so I just want to get chunks of
 bytes at a time and internally check <item>...</item> against the
 timestamp values in my database.
 Is there a better way to achieve this ?


Re: download x bytes at a time over network

2009-03-17 Thread Jean-Paul Calderone

On Tue, 17 Mar 2009 15:17:56 + (UTC), R. David Murray 
rdmur...@bitdance.com wrote:

Jean-Paul Calderone exar...@divmod.com wrote:

On Tue, 17 Mar 2009 12:15:23 +0530, Saurabh phoneth...@gmail.com wrote:
 This isn't exactly how things work.  The server *sends* you bytes.  It can
 send you a lot at once.  To some extent you can control how much it sends
 before it waits for you to catch up, but you don't have anywhere near
 byte-level control (you might have something like 32kb or 64kb level
 control).

What about in Python 3 ?
It seems to have some header like the one below : b'b495' - binary mode
with 46229 bytes ? Or is it something else ?

That's just a bug in urllib in Python 3.0.


What makes you say that's a bug?  Did I miss something?  (Which is entirely
possible!)


I saw it in the Python issue tracker. :)  Python 3.0 broke handling of
chunked HTTP responses.  Instead of interpreting the chunk length prefixes,
it delivered them as part of the response.
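That accounts for the b'b495' prefix exactly: in chunked transfer encoding, each chunk is preceded by its size in hexadecimal followed by CRLF, and 0xb495 is precisely the 46229 bytes asked about earlier in the thread. A one-line check:

```python
# The chunk-length prefix is the chunk size in hexadecimal.
prefix = b"b495"
chunk_len = int(prefix, 16)
print(chunk_len)  # -> 46229, matching the byte count in the question
```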

Jean-Paul


Re: download x bytes at a time over network

2009-03-17 Thread R. David Murray
Jean-Paul Calderone exar...@divmod.com wrote:
 On Tue, 17 Mar 2009 15:17:56 + (UTC), R. David Murray 
 rdmur...@bitdance.com wrote:
 Jean-Paul Calderone exar...@divmod.com wrote:
  On Tue, 17 Mar 2009 12:15:23 +0530, Saurabh phoneth...@gmail.com wrote:
   This isn't exactly how things work.  The server *sends* you bytes.  It 
   can
   send you a lot at once.  To some extent you can control how much it 
   sends
   before it waits for you to catch up, but you don't have anywhere near
   byte-level control (you might have something like 32kb or 64kb level
   control).
  
  What about in Python 3 ?
  It seems to have some header like the one below : b'b495' - binary mode
  with 46229 bytes ? Or is it something else ?
 
  That's just a bug in urllib in Python 3.0.
 
 What makes you say that's a bug?  Did I miss something?  (Which is entirely
 possible!)
 
 I saw it in the Python issue tracker. :)  Python 3.0 broke handling of
 chunked HTTP responses.  Instead of interpreting the chunk length prefixes,
 it delivered them as part of the response.

Ah, got you.  Thanks for the info.

--
R. David Murray   http://www.bitdance.com



Re: download x bytes at a time over network

2009-03-17 Thread Gabriel Genellina
En Tue, 17 Mar 2009 17:09:35 -0200, R. David Murray  
rdmur...@bitdance.com escribió:

Jean-Paul Calderone exar...@divmod.com wrote:
 On Tue, 17 Mar 2009 15:17:56 + (UTC), R. David Murray
 rdmur...@bitdance.com wrote:
  Jean-Paul Calderone exar...@divmod.com wrote:
   On Tue, 17 Mar 2009 12:15:23 +0530, Saurabh phoneth...@gmail.com wrote:
    It seems to have some header like the one below : b'b495' - binary mode
    with 46229 bytes ? Or is it something else ?

   That's just a bug in urllib in Python 3.0.

  What makes you say that's a bug?  Did I miss something?  (Which is
  entirely possible!)

 I saw it in the Python issue tracker. :)  Python 3.0 broke handling of
 chunked HTTP responses.  Instead of interpreting the chunk length
 prefixes, it delivered them as part of the response.

Ah, got you.  Thanks for the info.


Just for completeness, here is the tracker issue:
http://bugs.python.org/issue4631

--
Gabriel Genellina



Re: download x bytes at a time over network

2009-03-17 Thread Matt Nordhoff
Saurabh wrote:
 Here's the reason behind wanting to get chunks at a time.
 I'm actually retrieving data from a list of RSS feeds and need to
 continuously check for the latest posts.
 But I don't want to depend on the Last-Modified header or the pubDate tag
 in channel, because a lot of feeds just output date('now') instead
 of the actual last-updated timestamp.
 But when continuously checking for the latest posts, I don't want to
 bombard other people's bandwidth - so I just want to get chunks of
 bytes at a time and internally check <item>...</item> against the
 timestamp values in my database.
 Is there a better way to achieve this ?

For the feeds that *do* set Last-Modified properly, won't you be using
*more* bandwidth by downloading part of the feed instead of just using
If-Modified-Since?
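For those feeds, a conditional GET costs almost nothing. A hedged sketch using Python 3's urllib.request (URL and date are illustrative):

```python
import urllib.request

# Send the timestamp of the copy we last stored; a compliant server
# answers "304 Not Modified" with an empty body if nothing changed.
req = urllib.request.Request("http://feeds2.feedburner.com/jquery/")
req.add_header("If-Modified-Since", "Fri, 13 Mar 2009 13:07:07 GMT")

# try:
#     resp = urllib.request.urlopen(req)   # 200: feed changed, read it
# except urllib.error.HTTPError as err:
#     if err.code != 304:                  # 304: unchanged, skip this feed
#         raise
```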


download x bytes at a time over network

2009-03-16 Thread Saurabh
I want to download content from the net - in chunks of x bytes or characters
at a time - so that it doesn't pull the entire content in one shot.

import urllib2
url = "http://python.org/"
handler = urllib2.urlopen(url)

data = handler.read(100)
print "Content :\n%s \n%s \n%s" % ('=' * 100, data, '=' * 100)

# Disconnect the internet

data = handler.read(100) # I want it to throw an exception because it can't
# fetch from the net
print "Content :\n%s \n%s \n%s" % ('=' * 100, data, '=' * 100) # But
# still works !

Apparently, handler = urllib2.urlopen(url) reads the entire data into a buffer
and handler.read(100) is just reading from the buffer ?

Is it possible to get content from the net in chunks ? Or do I need to use
HTTPClient/sockets directly ?

Thanks


Re: download x bytes at a time over network

2009-03-16 Thread Jean-Paul Calderone

On Mon, 16 Mar 2009 13:02:07 +0530, Saurabh phoneth...@gmail.com wrote:

I want to download content from the net - in chunks of x bytes or characters
at a time - so that it doesn't pull the entire content in one shot.


This isn't exactly how things work.  The server *sends* you bytes.  It can
send you a lot at once.  To some extent you can control how much it sends
before it waits for you to catch up, but you don't have anywhere near
byte-level control (you might have something like 32kb or 64kb level
control).

If you try downloading a large enough file and increasing the numbers in
your example, then you can probably see the behavior you were expecting.

Jean-Paul