On Fri, 2010-09-03 at 13:29 +0200, Virgil Stokes wrote:
> A more direct question on accessing stock information from Yahoo.
> First, use your browser to go to:
> http://finance.yahoo.com/q/cp?s=%5EGSPC+Components
> Now, you see the first 50 rows of a 500 row table of information on
> S&P 500 index. You can LM click on
>   1 -50 of 500 |First|Previous|Next|Last
> below the table to position to any of the 10 pages.
> I would like to use Python to do the following.
> Loop on each of the 10 pages and for each page extract information for
> each row --- How can this be accomplished automatically in Python?
> Let's take the first page (as shown by default). It is easy to see the
> link to the data for "A" is http://finance.yahoo.com/q?s=A. That is, I
> can just move my cursor over the "A" and I see this URL in the message
> at the bottom of my browser (Explorer 8). If I LM click on "A" then I
> will go to this link --- Do this!
> You should now see a table which shows information on this stock and
> this is the information that I would like to extract. I would like to
> do this for all 500 stocks without the need to enter the symbols for
> them (e.g. "A", "AA", etc.). It seems clear that this should be
> possible since all the symbols are in the first column of each of the
> 50 tables --- but it is not at all clear how to extract these
> automatically in Python. 
> Hopefully, you understand my problem. Again, I would like Python to
> cycle through these 10 pages and extract this information for each
> symbol in this table.
> --V

Here's a quick hack to get the S&P 500 symbols from the components page,
using its alphabetical index (one request per letter). With the collected
symbols you can then request quotes fifty at a time from the download
facility. (If you get a better idea from Yahoo, you'll post it of course.)

def get_SP500_symbols ():
        import urllib
        symbols = []
        url = 'http://finance.yahoo.com/q/cp?s=^GSPC&alpha=%c'
        for c in [chr(n) for n in range (ord ('A'), ord ('Z') + 1)]:            
                print url % c
                f = urllib.urlopen (url % c)
                html = f.readlines ()
                f.close ()
                for line in html:
                        if line.lstrip ().startswith ('</script><span 
                                line_split = line.split (':')
                                s = [item.strip ().upper () for item in
                                     line_split [5].replace ('"', '').split (',')]
                                symbols.extend (s [:-3])

        return symbols 
        # Not quite 500 (!?)
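
For the "fifty at a time" step, here is a minimal sketch of splitting the collected symbols into groups of at most fifty. It is written in Python 3 syntax, unlike the Python 2 code above. The helper name `batches` and the comma separator are assumptions made for illustration; check what the download facility actually expects before building a request from each group.

```python
def batches(symbols, size=50):
    """Yield successive lists of at most `size` symbols (hypothetical helper)."""
    for start in range(0, len(symbols), size):
        yield symbols[start:start + size]


if __name__ == "__main__":
    # Placeholder symbols for illustration only; in practice pass the
    # list returned by get_SP500_symbols ().
    sample = ["SYM%d" % n for n in range(120)]
    for group in batches(sample):
        # Assumption: symbols are joined with commas for the request.
        joined = ",".join(group)
        print(len(group), joined[:30] + "...")
```

Each group can then be handled in its own request, so a list of roughly 500 symbols takes about ten round trips.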


