> The attribute match should be completely contained within the square
> brackets:

Oh, that's a pretty gross mistake. OK, the syntax is noted. I won't make
that mistake again.

> Additionally, notice that the anchor tags you're trying to get are *not*
> children of the divs you're selecting. They are *siblings*, which means
> they are at the same "level" in the markup hierarchy as the divs.

Aiee! Yeah, totally understood. The reason I focused on the sibling div
(lesson-status-icon) is that I couldn't find anything to "grab" onto in the
anchor tag itself, although now that I think about it, maybe something like:

    response.xpath("//a[starts-with(@href, '/lessons/')]")

would get me the list I wanted without resorting to the roundabout method of
finding the sibling tag and going one past it. (My first instinct was a
wildcard like '/lessons/*', but XPath doesn't do glob matching inside
attribute values, so starts-with() seems to be the way to say that.)

Thanks for the lesson -- you've given me something to experiment and play
with. Much appreciated!

Pete

On Wednesday, January 25, 2017 at 4:29:59 PM UTC-5, Joey Espinosa wrote:
>
> First, this part is wrong:
>
>     for div in response.xpath("//div[@class]='lesson-status-icon'"):
>
> The attribute match should be completely contained within the square
> brackets:
>
>     for div in response.xpath("//div[@class='lesson-status-icon']"):
>
> Additionally, notice that the anchor tags you're trying to get are *not*
> children of the divs you're selecting. They are *siblings*, which means
> they are at the same "level" in the markup hierarchy as the divs. If you
> are insistent on selecting those divs (maybe because they're more reliably
> selectable to you?), then you can use the "following-sibling" axis:
>
>     for anchor in response.xpath(
>             "//div[@class='lesson-status-icon']/following-sibling::a/@href"):
>         print anchor.extract()
>
> I can't check it right now, but give that a shot.
> On Tue, Jan 24, 2017 at 9:32 PM Peter <p...@dirac.org> wrote:
>
>> Trying to scrape some URLs from this page (the stuff highlighted in
>> yellow in the screenshot is what I'm looking for):
>>
>> <https://lh3.googleusercontent.com/-MeiUXL6STxs/WIgIqtFhtPI/AAAAAAAACHU/KO9L90WRgMI9xKA2azq9upycDjQYXxo4ACLcB/s1600/cpod.jpg>
>>
>> I didn't quite understand the section on selectors and XPath, but this
>> was my attempt at getting those URLs:
>>
>>     def grab_page(self, response):
>>         for div in response.xpath("//div[@class]='lesson-status-icon'"):
>>             print( div.xpath("a[@href]").extract() )
>>             print( div.xpath("a[@href]/text()").extract() )
>>             print( div.extract() )
>>         for div in response.xpath("//div[@class]='lesson-status-icon'").xpath("/a[@href]"):
>>             print( div.text().extract() )
>>
>> I'm flailing and drowning. Can someone please put me on the right path?
>> What's the right syntax to grab the URLs?
>>
>> Thanks!!!
>>
>> --
>> You received this message because you are subscribed to the Google Groups
>> "scrapy-users" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to scrapy-users...@googlegroups.com.
>> To post to this group, send email to scrapy...@googlegroups.com.
>> Visit this group at https://groups.google.com/group/scrapy-users.
>> For more options, visit https://groups.google.com/d/optout.
>
> --
> Respectfully,
>
> *Joey Espinosa*
> Chief Technology Officer
> *Vote.org* <https://www.vote.org/>
> Phone: (305) 747-1711
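[Editor's note] For anyone following along, here is a minimal, runnable sketch of the two selectors discussed above. It uses lxml, whose XPath 1.0 engine handles these expressions the same way Scrapy's selectors do. The HTML snippet is invented for illustration and is not the real page's markup.

```python
# Demonstrates both approaches from the thread against a made-up
# snippet of lesson markup (assumed structure: each <li> holds a
# status-icon <div> followed by its sibling <a>).
from lxml import etree

html = """
<ul>
  <li><div class="lesson-status-icon"></div>
      <a href="/lessons/intro">Intro</a></li>
  <li><div class="lesson-status-icon"></div>
      <a href="/lessons/tones">Tones</a></li>
</ul>
"""
doc = etree.HTML(html)

# Joey's suggestion: anchor the query on the div, then step sideways
# to the <a> at the same level with the following-sibling axis.
sibling_hrefs = doc.xpath(
    "//div[@class='lesson-status-icon']/following-sibling::a/@href")
print(sibling_hrefs)  # ['/lessons/intro', '/lessons/tones']

# Pete's idea, corrected: XPath has no '*' glob inside attribute
# values, so starts-with() expresses the "/lessons/*" intent.
direct_hrefs = doc.xpath("//a[starts-with(@href, '/lessons/')]/@href")
print(direct_hrefs)  # ['/lessons/intro', '/lessons/tones']
```

Inside a Scrapy spider the same expressions work on `response.xpath(...)`; append `/@href` and call `.extract()` (or `.getall()` in modern Scrapy) to get the URL strings.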