Many thanks, Martin! I had indeed skipped creating the tree object and a few other things you pointed out. Here is my finished code, which actually works:

from lxml import html
import requests

page = requests.get("http://joplin.craigslist.org/search/w4m")
tree = html.fromstring(page.text)
titles = tree.xpath('//a[@class="hdrlnk"]/text()')
try:
    for title in titles:
        print(title)
except:
    pass

Pretty simple. Thanks for the help!

On Sat, Aug 22, 2015 at 4:20 PM Martin A. Brown <mar...@linux-ip.net> wrote:

> Hi there Anthony,
>
> > I'm pretty new to lxml but I pretty much thought I'd understood
> > the basics. However, for some reason, my first attempt at using it
> > is failing miserably.
> >
> > Here's the deal:
> >
> > I'm parsing a specific page on Craigslist
> > (http://joplin.craigslist.org/search/rea) and trying to retrieve the
> > text of each link on that page. When I do an "inspect element" in
> > Firefox, a sample anchor link looks like this:
> >
> > <a href="/reb/5185592209.html" data-id="5185592209" class="hdrlnk">FIRST
> > OPEN HOUSE TOMORROW 2:00pm-4:00pm!!! (8-23-15)</a>
> >
> > The code I'm using to try to get the link text is this:
> >
> > from lxml import html
> > import requests
> >
> > page = requests.get("http://joplin.craigslist.org/search/rea")
>
> You are missing something here that takes the page.content, parses
> it and creates a variable called tree.
>
> > titles = tree.xpath('//a[@title="hdrlnk"]/text()')
>
> And, your xpath is incorrect. Play with this in the interactive
> browser and you will be able to correct your xpath. I think you
> will notice from the example anchor link above that the attribute of
> the <a/> HTML elements you want to grab is "class", not "title".
> Therefore:
>
> titles = tree.xpath('//a[@class="hdrlnk"]/text()')
>
> Is probably closer.
>
> > print titles
> >
> > The last line, where it supposedly will print the text of each anchor
> > returns [].
> >
> > I can't seem to figure out what I'm doing wrong. lxml seems pretty
> > straightforward but I can't seem to get this down.
>
> Again, I'd recommend playing with the data in an interactive console
> session. You will be able to figure out exactly which xpath gets
> you the data you would like, and then you can drop it into your
> script.
>
> Good luck,
>
> -Martin
>
> --
> Martin A. Brown
> http://linux-ip.net/
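
For anyone reading this thread later, here is a minimal sketch of the two fixes discussed above: building the tree with html.fromstring, and matching on the "class" attribute rather than "title". It is written for Python 3 and, so that it runs without network access, it parses an inline string instead of the live Craigslist page. The SAMPLE markup is an assumption: it wraps the anchor quoted in the thread in a made-up html/body/p skeleton, and the names by_title and by_class are only for illustration.

```python
from lxml import html

# Hypothetical stand-in for the downloaded page: the sample anchor from the
# thread wrapped in a minimal document (the wrapper markup is an assumption).
SAMPLE = """
<html><body>
<p><a href="/reb/5185592209.html" data-id="5185592209" class="hdrlnk">FIRST OPEN HOUSE TOMORROW 2:00pm-4:00pm!!! (8-23-15)</a></p>
</body></html>
"""

# The step missing from the first attempt: parse the markup into a tree.
tree = html.fromstring(SAMPLE)

# The original XPath tests a "title" attribute the anchor does not have,
# so it matches nothing.
by_title = tree.xpath('//a[@title="hdrlnk"]/text()')

# The corrected XPath tests the "class" attribute, which the anchor does have.
by_class = tree.xpath('//a[@class="hdrlnk"]/text()')

print(by_title)
print(by_class)
```

Trying both expressions against a small known snippet like this, in an interactive session, is a quick way to confirm an XPath before pointing it at the real page.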