On Monday, May 27, 2013 7:58:05 PM UTC-4, Dave Angel wrote:
> On 05/27/2013 04:47 PM, Bryan Britten wrote:
> > Hey, everyone!
> >
> > I'm very new to Python and have only been using it for a couple of days,
> > but have some experience in programming (albeit mostly statistical
> > programming in SAS or R) so I'm hoping someone can answer this question in
> > a technical way, but without using an abundant amount of jargon.
> >
> > The issue I'm having is that I'm trying to pull information from a website
> > to practice Python with, but I'm having trouble getting the data in a
> > timely fashion. If I use the following code:
> >
> > <code>
> > import json
> > import urllib
> >
> > urlStr = "https://stream.twitter.com/1/statuses/sample.json"
> >
> > twtrDict = [json.loads(line) for line in urllib.urlopen(urlStr)]
> > </code>
> >
> > I get a memory issue. I'm running 32-bit Python 2.7 with 4 gigs of RAM if
> > that helps at all.
>
> Which OS?

I'm operating on Windows 7.

> The first question I'd ask is how big this file is. I can't tell, since
> it needs a user name & password to actually get the file.

If you have Twitter, you can just use your log-in information to access the file.

> But it's not unusual to need at least double that space in memory, and in Windoze
> you're limited to two gig max, regardless of how big your hardware might be.
>
> If you separately fetch the file, then you can experiment with it,
> including cutting it down to a dozen lines, and see if you can deal with
> that much.
>
> How could you fetch it? With wget, with a browser (and saveAs), with a
> simple loop which uses read(4096) repeatedly and writes each block to a
> local file. Don't forget to use 'wb', as you don't know yet what line
> endings it might use.

I'm not familiar with using read(4096), I'll have to look into that. When I tried to just save the file, my computer just sat in limbo for some time and didn't seem to want to process the command.

> Once you have an idea what the data looks like, you can answer such
> questions as whether it's json at all, whether the lines each contain a
> single json record, or what.

Based on my *extremely* limited knowledge of JSON, that's definitely the type of file this is.
Here is a snippet of what is seen when you log in:

{"created_at":"Tue May 28 03:09:23 +0000 2013","id":339216806461972481,"id_str":"339216806461972481","text":"RT @aleon_11: Sigo creyendo que las noches lluviosas me acercan mucho m\u00e1s a ti!","source":"\u003ca href=\"http:\/\/blackberry.com\/twitter\" rel=\"nofollow\"\u003eTwitter for BlackBerry\u00ae\u003c\/a\u003e","truncated":false,"in_reply_to_status_id":null,"in_reply_to_status_id_str":null,"in_reply_to_user_id":null,"in_reply_to_user_id_str":null,"in_reply_to_screen_name":null,"user":{"id":310910123,"id_str":"310910123","name":"\u2661","screen_name":"LaMarielita_","location":"","url":null,"description":"MERCADOLOGA & PUBLICISTA EN PROCESO, AMO A MI DIOS & MI FAMILIA\u2665 ME ENCANTA REIRME , MOLESTAR & HABLAR :D BFF, pancho, ale & china :) LY\u2661","protected":false,"followers_count":506,"friends_count":606,"listed_count":1,"created_at":"Sat Jun 04 15:24:19 +0000 2011","favourites_count":207,"utc_offset":-25200,"time_zone":"Mountain Time (US & Canada)","geo_enabled":false,"verified":false,"statuses_count":17241,"lang":"es","contributors_enabled":false,"is_translator":false,"profile_background_color":"FF6699","profile_background_image_url":"http:\/\/a0.twimg.com\/images\/themes\/theme11\/bg.gif","profile_background_image_url_https":"https:\/\/si0.twimg.com\/images\/themes\/theme11\/bg.gif","profile_background_tile":true,"profile_image_url":"http:\/\/a0.twimg.com\/profile_images\/3720425493\/13a48910e56ca34edeea07ff04075c77_normal.jpeg","profile_image_url_https":"https:\/\/si0.twimg.com\/profile_images\/3720425493\/13a48910e56ca34edeea07ff04075c77_normal.jpeg","profile_link_color":"B40B43","profile_sidebar_border_color":"CC3366","profile_sidebar_fill_color":"E5507E","profile_text_color":"362720","profile_use_background_image":true,"default_profile":false,"default_profile_image":false,"following":null,"follow_request_sent":null,"notifications":null},"geo":null,"coordinates":null,"place":null,"contributors":null,"retweeted_status":{"created_at":"Tue May 28 02:57:40 +0000 2013","id":339213856922537984,"id_str":"339213856922537984","text":"Sigo creyendo que las noches lluviosas me acercan mucho m\u00e1s a ti!","source":"web","truncated":false,"in_reply_to_status_id":null,"in_reply_to_status_id_str":null,"in_reply_to_user_id":null,"in_reply_to_user_id_str":null,"in_reply_to_screen_name":null,"user":{"id":105252134,"id_str":"105252134","name":"Alejandra Le\u00f3n","screen_name":"aleon_11","location":"Guatemala","url":null,"description":"La vida se disfruta m\u00e1s, cuando no se le pone tanta importancia.","protected":false,"followers_count":143,"friends_count":251,"listed_count":0,"created_at":"Fri Jan 15 20:49:38 +0000 2010","favourites_count":83,"utc_offset":-28800,"time_zone":"Pacific Time (US & Canada)","geo_enabled":false,"verified":false,"statuses_count":1863,"lang":"es","contributors_enabled":false,"is_translator":false,"profile_background_color":"F8F2FC","profile_background_image_url":"http:\/\/a0.twimg.com\/profile_background_images\/811443451\/81abf2f37ee3e37deda396befa7fb557.jpeg","profile_background_image_url_https":"https:\/\/si0.twimg.com\/profile_background_images\/811443451\/81abf2f37ee3e37deda396befa7fb557.jpeg","profile_background_tile":true,"profile_image_url":"http:\/\/a0.twimg.com\/profile_images\/3578979563\/e973196904e25af5d960f2971616eb61_normal.jpeg","profile_image_url_https":"https:\/\/si0.twimg.com\/profile_images\/3578979563\/e973196904e25af5d960f2971616eb61_normal.jpeg","profile_banner_url":"https:\/\/pbs.twimg.com\/profile_banners\/105252134\/1364957374","profile_link_color":"F01A1A","profile_sidebar_border_color":"000000","profile_sidebar_fill_color":"7AC3EE","profile_text_color":"3D1957","profile_use_background_image":true,"default_profile":false,"default_profile_image":false,"following":null,"follow_request_sent":null,"notifications":null},"geo":null,"coordinates":null,"place":null,"contributors":null,"retweet_count":2,"favorite_count":0,"entities":{"hashtags":[],"symbols":[],"urls":[],"user_mentions":[]},"favorited":false,"retweeted":false,"lang":"es"},"retweet_count":0,"favorite_count":0,"entities":{"hashtags":[],"symbols":[],"urls":[],"user_mentions":[{"screen_name":"aleon_11","name":"Alejandra Le\u00f3n","id":105252134,"id_str":"105252134","indices":[3,12]}]},"favorited":false,"retweeted":false,"filter_level":"low"}

> For all we know, the file might be a few terabytes in size.
>
> --
> DaveA
--
http://mail.python.org/mailman/listinfo/python-list
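
For anyone unsure what the read(4096) loop suggested above might look like, here is a minimal sketch, not a definitive implementation. It is written for Python 3 and works on any file-like object, so it can be tried without a network connection. With the Python 2.7 setup from the original post, the object returned by urllib.urlopen(urlStr) would play the role of `src`. The names `save_in_chunks` and `first_records`, the `max_bytes` cap, and the sample data are all made up for illustration, and logging in to Twitter is not shown. The cap is there because nobody in the thread knows how big the download is.

```python
import io
import json
import os
import tempfile


def save_in_chunks(src, dest_path, chunk_size=4096, max_bytes=1000000):
    """Copy at most max_bytes from a file-like object to a local file."""
    written = 0
    # 'wb' writes the bytes exactly as received, whatever the line endings are.
    with open(dest_path, "wb") as dest:
        while written < max_bytes:
            block = src.read(min(chunk_size, max_bytes - written))
            if not block:  # an empty read means the source is exhausted
                break
            dest.write(block)
            written += len(block)
    return written


def first_records(path, limit=12):
    """Parse up to `limit` non-blank lines of the file as JSON records."""
    records = []
    with open(path, "rb") as f:
        for raw in f:
            line = raw.strip()
            if not line:  # skip blank lines between records
                continue
            try:
                records.append(json.loads(line.decode("utf-8")))
            except ValueError:
                # Not valid JSON; with a capped download this is most likely
                # a final line that was cut off, so stop here.
                break
            if len(records) >= limit:
                break
    return records


# Stand-in for the real stream: two complete records and one cut-off record.
sample = b'{"id": 1, "text": "a"}\r\n\r\n{"id": 2, "text": "b"}\r\n{"id": 3, "te'
demo_path = os.path.join(tempfile.mkdtemp(), "sample.json")
written = save_in_chunks(io.BytesIO(sample), demo_path, chunk_size=16)
records = first_records(demo_path)
print(written, [r["id"] for r in records])
```

Only one block of `chunk_size` bytes and at most `limit` parsed records are held in memory at a time, whereas the list comprehension in the original post keeps every parsed line for as long as the server keeps sending data.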