On Fri, 2010-09-03 at 13:29 +0200, Virgil Stokes wrote:
> A more direct question on accessing stock information from Yahoo.
>
> First, use your browser to go to:
> http://finance.yahoo.com/q/cp?s=%5EGSPC+Components
>
> Now, you see the first 50 rows of a 500 row table of information on
> S&P 500 index. You can LM click on
>
> 1 -50 of 500 |First|Previous|Next|Last
>
> below the table to position to any of the 10 pages.
>
> I would like to use Python to do the following.
>
> Loop on each of the 10 pages and for each page extract information for
> each row --- How can this be accomplished automatically in Python?
>
> Let's take the first page (as shown by default). It is easy to see the
> link to the data for "A" is http://finance.yahoo.com/q?s=A. That is, I
> can just move my cursor over the "A" and I see this URL in the message
> at the bottom of my browser (Explorer 8). If I LM click on "A" then I
> will go to this link --- Do this!
>
> You should now see a table which shows information on this stock and
> this is the information that I would like to extract. I would like to
> do this for all 500 stocks without the need to enter the symbols for
> them (e.g. "A", "AA", etc.). It seems clear that this should be
> possible since all the symbols are in the first column of each of the
> 50 tables --- but it is not at all clear how to extract these
> automatically in Python.
>
> Hopefully, you understand my problem. Again, I would like Python to
> cycle through these 10 pages and extract this information for each
> symbol in this table.
>
> --V

Here's a quick hack to get the SP500 symbols from the visual page with
the index letters. From this collection you can then order fifty at a
time from the download facility. (If you get a better idea from Yahoo,
you'll post it of course.)

    # Python 2 (print statement, urllib.urlopen)
    def get_SP500_symbols ():
        import urllib
        symbols = []
        url = 'http://finance.yahoo.com/q/cp?s=^GSPC&alpha=%c'
        # One request per index letter, A through Z
        for c in [chr(n) for n in range (ord ('A'), ord ('Z') + 1)]:
            print url % c
            f = urllib.urlopen (url % c)
            html = f.readlines ()
            f.close ()
            for line in html:
                # The symbols sit on the line that starts with this marker
                if line.lstrip ().startswith ('</script><span id="yfs_params_vcr"'):
                    line_split = line.split (':')
                    # Sixth colon-separated field: quoted, comma-separated list
                    s = [item.strip ().upper () for item in
                         line_split [5].replace ('"', '').split (',')]
                    # The last three items are not symbols
                    symbols.extend (s [:-3])
        return symbols

    # Not quite 500 (!?)

Frederic
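
A minimal sketch of how the parsing step in the hack above could be split
from the download, so it can be checked without hitting Yahoo, and how the
result could then be cut into batches of fifty. It is written for Python 3,
whereas the hack itself is Python 2. The names extract_symbols, dedupe and
chunks are made up for this sketch, and the sample line is invented: only
its shape (marker, colon-separated fields, three trailing non-symbol items)
follows what the parsing above expects, not a real Yahoo page. Whether
duplicates actually occur across the letter pages is an assumption I have
not checked.

```python
MARKER = '</script><span id="yfs_params_vcr"'


def extract_symbols(html_lines):
    """Collect symbols from lines starting with MARKER (same rule as the hack)."""
    symbols = []
    for line in html_lines:
        if not line.lstrip().startswith(MARKER):
            continue
        fields = line.split(':')
        if len(fields) < 6:
            continue  # layout differs from what the hack expects; skip it
        items = [item.strip().upper()
                 for item in fields[5].replace('"', '').split(',')]
        symbols.extend(items[:-3])  # last three items are not symbols
    return symbols


def dedupe(symbols):
    """Drop blanks and repeats, keeping first-seen order."""
    seen = set()
    unique = []
    for s in symbols:
        if s and s not in seen:
            seen.add(s)
            unique.append(s)
    return unique


def chunks(seq, size=50):
    """Split seq into consecutive batches of at most `size` items."""
    return [seq[i:i + size] for i in range(0, len(seq), size)]


if __name__ == '__main__':
    # Invented sample line, NOT real Yahoo output.
    sample = ['  </script><span id="yfs_params_vcr">'
              'f0:f1:f2:f3:f4:"a,aa,aapl,x,y,z"\n']
    found = dedupe(extract_symbols(sample + sample))
    print(found)             # ['A', 'AA', 'AAPL']
    print(chunks(found, 2))  # [['A', 'AA'], ['AAPL']]
```

With the real pages, the list returned by the download loop would go through
dedupe and then chunks, giving batches of at most fifty symbols to request.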