On 05/27/2013 04:47 PM, Bryan Britten wrote:
Hey, everyone!
I'm very new to Python and have only been using it for a couple of days, but
I have some experience in programming (albeit mostly statistical programming
in SAS or R), so I'm hoping someone can answer this question in a technical
way, but without too much jargon.
The issue I'm having is that I'm trying to pull information from a website to
practice Python with, but I'm having trouble getting the data in a timely
fashion. If I use the following code:
<code>
import json
import urllib
urlStr = "https://stream.twitter.com/1/statuses/sample.json"
twtrDict = [json.loads(line) for line in urllib.urlopen(urlStr)]
</code>
I get a memory issue. I'm running 32-bit Python 2.7 with 4 gigs of RAM if that
helps at all.
Which OS?
The first question I'd ask is how big this file is. I can't tell, since the
URL needs a username & password to actually fetch it. But it's not unusual
to need at least double the file's size in memory, and on Windoze a 32-bit
process is limited to 2 GB of address space, regardless of how much RAM your
hardware might have.
If you separately fetch the file, then you can experiment with it,
including cutting it down to a dozen lines, and see if you can deal with
that much.
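For instance, once you have a local copy, something like this keeps only the
first dozen lines for experimenting (the file paths and sample contents here
are stand-ins, since I don't have your actual file):

```python
import itertools
import os
import tempfile

# Build a small stand-in file first (the real download needs credentials).
src = os.path.join(tempfile.mkdtemp(), "sample.json")
with open(src, "wb") as f:
    for i in range(100):
        f.write(('{"id": %d}\n' % i).encode("ascii"))

# Copy just the first 12 lines to a new file; islice stops reading
# after 12 lines, so the rest of the file is never touched.
dst = src + ".head"
with open(src, "rb") as fin, open(dst, "wb") as fout:
    fout.writelines(itertools.islice(fin, 12))
```

Then you can run your json.loads experiment against the cut-down file
without risking your memory.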
How could you fetch it? With wget, with a browser (and Save As), or with a
simple loop that calls read(4096) repeatedly and writes each block to a
local file. Don't forget to open the output file in 'wb' mode, as you don't
know yet what line endings the data might use.
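A minimal version of that loop might look like the following. It's sketched
against an in-memory file object rather than the live URL (since the real
endpoint needs credentials); with urllib you'd pass urllib.urlopen(urlStr)
as the source instead:

```python
import io
import os
import tempfile

def save_stream(src, out_path, chunk_size=4096):
    """Copy a file-like object to disk in fixed-size binary chunks."""
    total = 0
    with open(out_path, "wb") as out:  # 'wb': no line-ending translation
        while True:
            block = src.read(chunk_size)
            if not block:  # an empty read means end of stream
                break
            out.write(block)
            total += len(block)
    return total

# Stand-in for the network response: 10000 bytes in memory.
data = b"x" * 10000
path = os.path.join(tempfile.mkdtemp(), "sample.json")
n = save_stream(io.BytesIO(data), path)
print(n)  # 10000
```

Because it only ever holds one 4096-byte block at a time, this works no
matter how large the file turns out to be.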
Once you have an idea what the data looks like, you can answer such
questions as whether it's json at all, whether the lines each contain a
single json record, or what.
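One quick way to check is to try json.loads on each of the first few lines
and see which ones parse (the sample lines below are made up for
illustration; json.loads raises ValueError on anything that isn't valid
JSON):

```python
import json

# Hypothetical sample: two JSON records and one line of something else.
lines = [
    b'{"text": "hello", "id": 1}',
    b'not json at all',
    b'{"text": "world", "id": 2}',
]

records, bad = [], []
for i, line in enumerate(lines):
    try:
        records.append(json.loads(line.decode("utf-8")))
    except ValueError:  # raised by json.loads on invalid input
        bad.append(i)

print(len(records), bad)  # 2 [1]
```

If every line parses on its own, you have one JSON record per line and can
process the file incrementally instead of loading it all at once.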
For all we know, the file might be a few terabytes in size.
--
DaveA