On Tue, 12 Aug 2014 13:00:30 -0700 (PDT) Simon Evans <musicalhack...@yahoo.co.uk> wrote:
> Dear Programmers,
> I have been looking at the You tube 'Web Scraping Tutorials' of Chris Reeves.
> I have tried a few of his python programs in the Python27 command prompt, but
> altered them from accessing data using links say from the Dow Jones index, to
> accessing the details I would be interested in accessing from the 'Racing
> Post' on a daily basis. Anyhow, the code it returns is not in the example I
> am going to give, is not the information I am seeking, instead of returning
> the given odds on a horse, it only returns a [], which isn't much use.
> I would be glad if you could tell me where I am going wrong.
> Yours faithfully
> Simon Evans.
> --------------------------------------------------------------------------------
> >>>import urllib
> >>>import re
> >>>htmlfile = urllib.urlopen("http://www.racingpost.com/horses2/cards/card.sd?
> > race_id=600048r_date=2014-05-08#raceTabs=sc_")
> htmltext = htmlfile.read()
> regex = '<strong>1<a href="http://www.racingpost.com/horses/horse_home.sd?
> > horse_id=758752"onclick="scorecards.send("horse_name":):return
> Html.popup(this,
> > {width:695,height:800})"title="Full details about this HORSE">Lively
> > Baron</a>9/4F</strong><br/>'
> >>>pattern = re.compile(regex)
> >>>odds=re.findall(pattern,htmltext)
> >>>print odds
> []
> >>>
> --------------------------------------------------------------------------------
> >>>import urllib
> >>>import re
> >>>htmlfile = urllib.urlopen("http://www.racingpost.com/horses2/cards/card.sd?
> > >>>race_id=600048r_date=2014-05-08#raceTabs=sc_")
> >>>htmltext = htmlfile.read()
> >>>regex = '<a></a>'
> >>>pattern = re.compile(regex)
> >>>odds=re.findall(pattern,htmltext)
> >>>print odds
> []
> >>>
> -------------------------------------------------------------------------------

If you want web scraping, you want to use
http://www.crummy.com/software/BeautifulSoup/ .  End of story.
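
To make the point concrete, here is a minimal sketch of why an exact-markup regex tends to come back with [] while a parser does not. Two things work against the regex in the question: any difference in whitespace or attributes between the pattern and the page breaks the match, and the unescaped `?` in the URL is a regex metacharacter, so the pattern never matches a literal question mark. The sketch uses the standard library's html.parser rather than BeautifulSoup, purely so it runs without installing anything. SAMPLE_HTML, RunnerParser and parse_runners are hypothetical names, and the markup is an assumption modelled loosely on the regex in the question, not the real Racing Post page.

```python
import re
from html.parser import HTMLParser

# Hypothetical markup, loosely modelled on the pattern in the question.
# The real page layout is unknown and will almost certainly differ.
SAMPLE_HTML = (
    '<strong>1 <a href="http://www.racingpost.com/horses/horse_home.sd?horse_id=758752"\n'
    '   title="Full details about this HORSE">Lively Baron</a> 9/4F</strong><br/>'
)


class RunnerParser(HTMLParser):
    """Collect (horse name, odds) pairs from <strong>... <a>name</a> odds</strong>."""

    def __init__(self):
        super().__init__()
        self.runners = []
        self._in_strong = False
        self._in_link = False
        self._name = None

    def handle_starttag(self, tag, attrs):
        if tag == "strong":
            self._in_strong = True
            self._name = None
        elif tag == "a" and self._in_strong:
            self._in_link = True

    def handle_endtag(self, tag):
        if tag == "a":
            self._in_link = False
        elif tag == "strong":
            self._in_strong = False

    def handle_data(self, data):
        text = data.strip()
        if not text:
            return
        if self._in_link:
            # Text inside the link is taken to be the horse's name.
            self._name = text
        elif self._in_strong and self._name is not None:
            # First text after the link, still inside <strong>, is taken as the odds.
            self.runners.append((self._name, text))
            self._name = None


def parse_runners(html):
    parser = RunnerParser()
    parser.feed(html)
    parser.close()
    return parser.runners


# An exact-markup regex in the style of the question: it expects no whitespace
# between tags, and its unescaped '?' does not match a literal question mark.
exact = re.compile(
    '<strong>1<a href="http://www.racingpost.com/horses/horse_home.sd?horse_id=758752"'
    'title="Full details about this HORSE">Lively Baron</a>9/4F</strong>'
)

regex_hits = exact.findall(SAMPLE_HTML)
runners = parse_runners(SAMPLE_HTML)

print(regex_hits)
print(runners)
```

With BeautifulSoup installed, the same idea is shorter: build the tree with BeautifulSoup(htmltext, "html.parser"), then use find_all() to pick out the tags and get_text() to read their contents. Which tags and attributes to look for depends on the actual page source, so check that first.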
--
Rob Gaddi, Highland Technology -- www.highlandtechnology.com
Email address domain is currently out of order.  See above to fix.