Re: download x bytes at a time over network
> This isn't exactly how things work. The server *sends* you bytes. It can
> send you a lot at once. To some extent you can control how much it sends
> before it waits for you to catch up, but you don't have anywhere near
> byte-level control (you might have something like 32kb or 64kb level
> control).

What about in Python 3? The response seems to have some header like the one
below: b'b495' - binary mode with 46229 bytes? Or is it something else?

    import urllib.request

    url = "http://feeds2.feedburner.com/jquery/"
    handler = urllib.request.urlopen(url)
    data = handler.read(1000)
    print("Content :\n%s \n%s \n%s" % ('=' * 100, data, '=' * 100))

Output:

    Content :
    b'b495\r\n<?xml version="1.0" encoding="UTF-8"?>\r\n<?xml-stylesheet
    type="text/xsl" media="screen" href="/~d/styles/rss2full.xsl"?><?xml-stylesheet
    type="text/css" media="screen"
    href="http://feeds2.feedburner.com/~d/styles/itemcontent.css"?><!--
    generator="wordpress/2.0.11" --><rss
    xmlns:content="http://purl.org/rss/1.0/modules/content/"
    xmlns:wfw="http://wellformedweb.org/CommentAPI/"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
    version="2.0">\r\n\r\n<channel>\r\n\t<title>jQuery
    Blog</title>\r\n\t<link>http://blog.jquery.com</link>\r\n\t<description>New
    Wave Javascript.</description>\r\n\t<pubDate>Fri, 13 Mar 2009 13:07:07
    +0000</pubDate>\r\n\t<generator>http://wordpress.org/?v=2.0.11</generator>\r\n\t<language>en</language>\r\n\t\t\t<atom10:link
    xmlns:atom10="http://www.w3.org/2005/Atom" rel="self"
    href="http://feeds2.feedburner.com/jquery"
    type="application/rss+xml" /><item>\r\n\t\t<title>This Week in jQuery,
    vol. 1</title>\r\n\t\t<link>http://blog.jquery.com/2009/03/13/this-week-in-jquery-vol-1/</link>\r\n\t\t<comments>http:'

--
http://mail.python.org/mailman/listinfo/python-list
Re: download x bytes at a time over network
2009/3/16 Saurabh <phoneth...@gmail.com>:
> I want to download content from the net - in chunks of x bytes or
> characters at a time - so that it doesn't pull the entire content in
> one shot.
>
>     import urllib2
>
>     url = "http://python.org/"
>     handler = urllib2.urlopen(url)
>     data = handler.read(100)
>     print "Content :\n%s \n%s \n%s" % ('=' * 100, data, '=' * 100)
>
>     # Disconnect the internet
>     data = handler.read(100)  # I want it to throw an exception
>                               # because it can't fetch from the net
>     print "Content :\n%s \n%s \n%s" % ('=' * 100, data, '=' * 100)
>     # But it still works!
>
> Apparently, handler = urllib2.urlopen(url) takes the entire data in a
> buffer and handler.read(100) is just reading from the buffer?
>
> Is it possible to get content from the net in chunks? Or do I need to
> use HTTPClient/sockets directly?

Yes, you can do it with urllib(2). Please take a look at the following
HTTP headers, which facilitate this kind of transfer:

Content-Range
http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.16

Range
http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.35

You can set the values for these and send the request to get partial
content. However, let me warn you that not all servers allow this.
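As a rough sketch of the Range approach in Python 3's urllib.request (the
helper name and URL are just placeholders, not anything from the thread):

```python
import urllib.request

def make_range_request(url, start, end):
    """Build a GET request for bytes [start, end] of a resource.

    A server that supports range requests answers with status 206
    (Partial Content) and a Content-Range header; one that does not
    simply ignores the header and returns the whole body with 200.
    """
    req = urllib.request.Request(url)
    req.add_header("Range", "bytes=%d-%d" % (start, end))
    return req

# Opening this would fetch at most the first 100 bytes from a
# cooperating server; here we only build the request object.
req = make_range_request("http://python.org/", 0, 99)
```

Whether you actually got a partial response can be checked by looking at
the status code (206 vs. 200) before trusting the byte count.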
Re: download x bytes at a time over network
Please excuse my replying to a reply instead of the original, but the
original doesn't show up on my news feed.

On 2009/3/16 Saurabh <phoneth...@gmail.com> wrote:
> I want to download content from the net - in chunks of x bytes or
> characters at a time - so that it doesn't pull the entire content in
> one shot.
>
>     import urllib2
>
>     url = "http://python.org/"
>     handler = urllib2.urlopen(url)
>     data = handler.read(100)
>     print "Content :\n%s \n%s \n%s" % ('=' * 100, data, '=' * 100)
>
>     # Disconnect the internet
>     data = handler.read(100)  # I want it to throw an exception
>                               # because it can't fetch from the net
>     print "Content :\n%s \n%s \n%s" % ('=' * 100, data, '=' * 100)
>     # But it still works!

Perhaps you have a local caching web proxy server which has downloaded
the entire document, and even though you have disconnected the Internet,
you haven't disconnected the proxy, which is feeding you the rest of the
file from the server's cache.

> Apparently, handler = urllib2.urlopen(url) takes the entire data in a
> buffer and handler.read(100) is just reading from the buffer?

I could be wrong, but I don't think so. I think you have the right
approach. By default, urllib2 uses a ProxyHandler which will try to
auto-detect any proxies (according to environment variables, registry
entries or similar).

--
Steven
Re: download x bytes at a time over network
Here's the reason behind wanting to get chunks at a time. I'm actually
retrieving data from a list of RSS feeds and need to continuously check
for the latest posts. But I don't want to depend on the Last-Modified
header or the pubDate tag in the channel, because a lot of feeds just
output date('now') instead of the actual last-updated timestamp. And when
continuously checking for latest posts, I don't want to bombard other
people's bandwidth - so I just want to get chunks of bytes at a time and
internally check each <item>...</item> against the timestamp values in my
database. Is there a better way to achieve this?
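One way to sketch that "read a chunk, scan for complete items, stop early"
idea (a hypothetical helper, not code from the thread; a real feed reader
should also cope with attributes, CDATA and encodings, so treat this as a
minimal illustration only):

```python
import io
import re

def iter_items(response, chunk_size=1024):
    """Read from a file-like HTTP response in chunks and yield each
    complete <item>...</item> block as soon as it has fully arrived,
    so the caller can stop reading (and close the connection) once it
    sees an item already present in its database."""
    buf = b""
    pattern = re.compile(rb"<item>.*?</item>", re.DOTALL)
    while True:
        chunk = response.read(chunk_size)
        if not chunk:  # end of stream
            break
        buf += chunk
        last_end = 0
        for match in pattern.finditer(buf):
            yield match.group(0)
            last_end = match.end()
        buf = buf[last_end:]  # keep any partially received item

# io.BytesIO stands in for the network response here:
feed = io.BytesIO(b"<rss><channel><item>a</item><item>b</item></channel></rss>")
items = list(iter_items(feed, chunk_size=16))
# items == [b"<item>a</item>", b"<item>b</item>"]
```

Breaking out of the loop early leaves the rest of the feed undownloaded,
which is the bandwidth saving being asked about.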
Re: download x bytes at a time over network
Saurabh <phoneth...@gmail.com> wrote:
>> This isn't exactly how things work. The server *sends* you bytes. It
>> can send you a lot at once. To some extent you can control how much
>> it sends before it waits for you to catch up, but you don't have
>> anywhere near byte-level control (you might have something like 32kb
>> or 64kb level control).
>
> What about in Python 3? It seems to have some header like the one
> below: b'b495' - binary mode with 46229 bytes? Or is it something else?
>
>     import urllib.request
>
>     url = "http://feeds2.feedburner.com/jquery/"
>     handler = urllib.request.urlopen(url)
>     data = handler.read(1000)
>     print("Content :\n%s \n%s \n%s" % ('=' * 100, data, '=' * 100))
>
>     Content :
>     b'b495\r\n<?xml version="1.0" encoding="UTF-8"?>\r\n<?xml-stylesheet

That b'... is the string representation of the bytes object returned by
urllib.request. Remember that in Python 3, bytes and strings are two very
different types. You get bytes from urllib.request because urllib can't
know what encoding the bytes are in. You have to decide how to decode
them in order to convert them into a unicode string object.

--
R. David Murray
http://www.bitdance.com
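To make that point concrete, a minimal sketch (the literal below is a
shortened stand-in for the real feed data, using the UTF-8 encoding the
feed declares):

```python
# read() on a urllib.request response gives bytes; printing them shows
# the b'...' repr. Decoding with the declared charset gives a str.
raw = b'<?xml version="1.0" encoding="UTF-8"?>\r\n<title>jQuery Blog</title>'
text = raw.decode("utf-8")

print(type(raw).__name__)   # bytes
print(type(text).__name__)  # str
```

For a real response you would take the charset from the Content-Type
header (or the XML declaration) rather than hard-coding it.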
Re: download x bytes at a time over network
On Tue, 17 Mar 2009 13:38:31 +0530, Saurabh <phoneth...@gmail.com> wrote:
> Here's the reason behind wanting to get chunks at a time. I'm actually
> retrieving data from a list of RSS feeds and need to continuously
> check for the latest posts. But I don't want to depend on the
> Last-Modified header or the pubDate tag in the channel, because a lot
> of feeds just output date('now') instead of the actual last-updated
> timestamp. And when continuously checking for latest posts, I don't
> want to bombard other people's bandwidth - so I just want to get
> chunks of bytes at a time and internally check each <item>...</item>
> against the timestamp values in my database. Is there a better way to
> achieve this?

I don't know much about RSS, but: if they are too lazy to provide the
information which protects their bandwidth, they deserve being bombarded.
But they also deserve a polite mail telling them that they have that
problem.

/Jorgen

--
  // Jorgen Grahn <grahn@    Ph'nglui mglw'nafh Cthulhu
\X/   snipabacken.se>        R'lyeh wgah'nagl fhtagn!
Re: download x bytes at a time over network
On Tue, 17 Mar 2009 12:15:23 +0530, Saurabh <phoneth...@gmail.com> wrote:
>> This isn't exactly how things work. The server *sends* you bytes. It
>> can send you a lot at once. To some extent you can control how much
>> it sends before it waits for you to catch up, but you don't have
>> anywhere near byte-level control (you might have something like 32kb
>> or 64kb level control).
>
> What about in Python 3? It seems to have some header like the one
> below: b'b495' - binary mode with 46229 bytes? Or is it something else?

That's just a bug in urllib in Python 3.0.

Jean-Paul
Re: download x bytes at a time over network
Jean-Paul Calderone <exar...@divmod.com> wrote:
> On Tue, 17 Mar 2009 12:15:23 +0530, Saurabh <phoneth...@gmail.com> wrote:
>> What about in Python 3? It seems to have some header like the one
>> below: b'b495' - binary mode with 46229 bytes? Or is it something
>> else?
>
> That's just a bug in urllib in Python 3.0.

What makes you say that's a bug? Did I miss something? (Which is
entirely possible!)

--
R. David Murray
http://www.bitdance.com
Re: download x bytes at a time over network
http://code.activestate.com/recipes/114217/

On Tue, Mar 17, 2009 at 1:38 PM, Saurabh <phoneth...@gmail.com> wrote:
> Here's the reason behind wanting to get chunks at a time. I'm actually
> retrieving data from a list of RSS feeds and need to continuously
> check for the latest posts. But I don't want to depend on the
> Last-Modified header or the pubDate tag in the channel, because a lot
> of feeds just output date('now') instead of the actual last-updated
> timestamp. And when continuously checking for latest posts, I don't
> want to bombard other people's bandwidth - so I just want to get
> chunks of bytes at a time and internally check each <item>...</item>
> against the timestamp values in my database. Is there a better way to
> achieve this?
Re: download x bytes at a time over network
On Tue, 17 Mar 2009 15:17:56 +0000 (UTC), R. David Murray
<rdmur...@bitdance.com> wrote:
> Jean-Paul Calderone <exar...@divmod.com> wrote:
>> On Tue, 17 Mar 2009 12:15:23 +0530, Saurabh <phoneth...@gmail.com> wrote:
>>> What about in Python 3? It seems to have some header like the one
>>> below: b'b495' - binary mode with 46229 bytes? Or is it something
>>> else?
>>
>> That's just a bug in urllib in Python 3.0.
>
> What makes you say that's a bug? Did I miss something? (Which is
> entirely possible!)

I saw it in the Python issue tracker. :)  Python 3.0 broke handling of
chunked HTTP responses. Instead of interpreting the chunk length
prefixes, it delivered them as part of the response.

Jean-Paul
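This explanation can be checked against the numbers in the original post:
in HTTP chunked transfer encoding, each chunk is preceded by its size in
hexadecimal on its own CRLF-terminated line, and that is exactly what the
stray prefix is.

```python
# "b495" is a chunked transfer-coding size line that Python 3.0's
# urllib failed to strip: the hexadecimal length of the next chunk.
size_line = b"b495"
chunk_length = int(size_line, 16)
print(chunk_length)  # 46229 - the byte count mentioned in the thread
```

So the mysterious "binary mode with 46229 bytes" header was the chunk
size of the first chunk of the FeedBurner response, leaking through.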
Re: download x bytes at a time over network
Jean-Paul Calderone <exar...@divmod.com> wrote:
> On Tue, 17 Mar 2009 15:17:56 +0000 (UTC), R. David Murray
> <rdmur...@bitdance.com> wrote:
>> Jean-Paul Calderone <exar...@divmod.com> wrote:
>>> That's just a bug in urllib in Python 3.0.
>>
>> What makes you say that's a bug? Did I miss something? (Which is
>> entirely possible!)
>
> I saw it in the Python issue tracker. :)  Python 3.0 broke handling of
> chunked HTTP responses. Instead of interpreting the chunk length
> prefixes, it delivered them as part of the response.

Ah, got you. Thanks for the info.

--
R. David Murray
http://www.bitdance.com
Re: download x bytes at a time over network
On Tue, 17 Mar 2009 17:09:35 -0200, R. David Murray
<rdmur...@bitdance.com> wrote:
> Jean-Paul Calderone <exar...@divmod.com> wrote:
>> I saw it in the Python issue tracker. :)  Python 3.0 broke handling
>> of chunked HTTP responses. Instead of interpreting the chunk length
>> prefixes, it delivered them as part of the response.
>
> Ah, got you. Thanks for the info.

Just for completeness, here is the tracker issue:
http://bugs.python.org/issue4631

--
Gabriel Genellina
Re: download x bytes at a time over network
Saurabh wrote:
> Here's the reason behind wanting to get chunks at a time. I'm actually
> retrieving data from a list of RSS feeds and need to continuously
> check for the latest posts. But I don't want to depend on the
> Last-Modified header or the pubDate tag in the channel, because a lot
> of feeds just output date('now') instead of the actual last-updated
> timestamp. And when continuously checking for latest posts, I don't
> want to bombard other people's bandwidth - so I just want to get
> chunks of bytes at a time and internally check each <item>...</item>
> against the timestamp values in my database. Is there a better way to
> achieve this?

For the feeds that *do* set Last-Modified properly, won't you be using
*more* bandwidth by downloading part of the feed instead of just using
If-Modified-Since?
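For those well-behaved feeds, a conditional GET is nearly free: a server
that honours If-Modified-Since answers 304 Not Modified with an empty
body when nothing has changed. A rough Python 3 sketch (function name,
URL and timestamp are placeholders):

```python
import urllib.request
from email.utils import formatdate

def conditional_get_request(url, last_fetch_timestamp):
    """Build a GET carrying If-Modified-Since set to the time of the
    last successful fetch (a Unix timestamp), formatted as an RFC 2822
    date in GMT as HTTP requires."""
    req = urllib.request.Request(url)
    req.add_header("If-modified-since",
                   formatdate(last_fetch_timestamp, usegmt=True))
    return req

req = conditional_get_request("http://python.org/", 1237208400)
# Opening this would raise urllib.error.HTTPError with code 304 if the
# resource is unchanged since that date; here we only build the request.
```

In practice you would store the server's own Last-Modified value from the
previous response and echo it back, rather than using your local clock.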
download x bytes at a time over network
I want to download content from the net - in chunks of x bytes or
characters at a time - so that it doesn't pull the entire content in one
shot.

    import urllib2

    url = "http://python.org/"
    handler = urllib2.urlopen(url)
    data = handler.read(100)
    print "Content :\n%s \n%s \n%s" % ('=' * 100, data, '=' * 100)

    # Disconnect the internet
    data = handler.read(100)  # I want it to throw an exception
                              # because it can't fetch from the net
    print "Content :\n%s \n%s \n%s" % ('=' * 100, data, '=' * 100)
    # But it still works!

Apparently, handler = urllib2.urlopen(url) takes the entire data in a
buffer and handler.read(100) is just reading from the buffer?

Is it possible to get content from the net in chunks? Or do I need to
use HTTPClient/sockets directly?

Thanks
Re: download x bytes at a time over network
On Mon, 16 Mar 2009 13:02:07 +0530, Saurabh <phoneth...@gmail.com> wrote:
> I want to download content from the net - in chunks of x bytes or
> characters at a time - so that it doesn't pull the entire content in
> one shot.

This isn't exactly how things work. The server *sends* you bytes. It can
send you a lot at once. To some extent you can control how much it sends
before it waits for you to catch up, but you don't have anywhere near
byte-level control (you might have something like 32kb or 64kb level
control).

If you try downloading a large enough file and increasing the numbers in
your example, then you can probably see the behavior you were expecting.

Jean-Paul
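The read loop itself looks the same either way; what varies is how much of
the body is already buffered locally. A minimal sketch (io.BytesIO stands
in for the network response here, so it runs without a connection):

```python
import io

def read_in_chunks(response, chunk_size=100):
    """Drain a file-like HTTP response chunk_size bytes at a time.

    read(n) returns at most n bytes per call and b"" at end of stream.
    How much data actually sits in the OS socket buffer between calls
    is governed by TCP flow control, not by Python.
    """
    while True:
        chunk = response.read(chunk_size)
        if not chunk:  # b"" signals end of stream
            return
        yield chunk

fake_response = io.BytesIO(b"x" * 250)
sizes = [len(chunk) for chunk in read_in_chunks(fake_response)]
# sizes == [100, 100, 50]
```

With a real urllib(2) response and a large enough file, the later
iterations genuinely wait on the network, which is the behavior the
original post was trying to observe.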