Hey, everyone! 

I'm very new to Python and have only been using it for a couple of days, but I 
have some experience in programming (albeit mostly statistical programming in 
SAS or R), so I'm hoping someone can answer this question in a technical way, 
but without too much jargon.

The issue I'm having is that I'm trying to pull information from a website to 
practice Python with, but I'm having trouble getting the data in a timely 
fashion. If I use the following code:

<code>
import json
import urllib

urlStr = "https://stream.twitter.com/1/statuses/sample.json"

twtrDict = [json.loads(line) for line in urllib.urlopen(urlStr)]
</code>

I get a memory error. I'm running 32-bit Python 2.7 with 4 GB of RAM, if that 
helps at all.

If I use the following code:

<code>
import urllib

urlStr = "https://stream.twitter.com/1/statuses/sample.json"

fileHandle = urllib.urlopen(urlStr)

twtrText = fileHandle.readlines()
</code>

It takes hours (upwards of 6 or 7, if not more) to finish executing the last 
command.

With that being said, my question is whether there is a more efficient way to 
do this. I'm worried that if the .readlines() call alone takes this long, 
actually working with the data is going to be a computational nightmare.
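For concreteness, here is a minimal sketch of the kind of incremental approach I have in mind: read the response line by line and stop after a fixed number of records, so memory use stays bounded even if the stream never ends. The `fake_stream` object and the `MAX_TWEETS` cap are hypothetical stand-ins for illustration; in real use `fake_stream` would be the file-like object returned by `urllib.urlopen(urlStr)`.

<code>
import json
import itertools
import io

# Hypothetical stand-in for the network stream: in real use this would be
# the file-like object returned by urllib.urlopen(urlStr). A streaming
# endpoint never "ends", so reading every line at once (readlines, or a
# list comprehension over the whole response) can never finish.
fake_stream = io.StringIO(
    '{"id": 1, "text": "first tweet"}\n'
    '{"id": 2, "text": "second tweet"}\n'
    '{"id": 3, "text": "third tweet"}\n'
)

# Read incrementally and stop after a fixed number of records (an
# assumed cap, chosen here just for the example).
MAX_TWEETS = 2
twtrDict = [json.loads(line)
            for line in itertools.islice(fake_stream, MAX_TWEETS)]
</code>

The key difference from the snippets above is `itertools.islice`, which pulls only `MAX_TWEETS` lines from the stream and then stops, instead of trying to exhaust an endless response.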

Thanks in advance for any insights or advice!
-- 
http://mail.python.org/mailman/listinfo/python-list
