> The attribute match should be completely contained within the square
> brackets:

Oh, that's a pretty gross mistake. OK, the syntax is noted. I won't make
that mistake again.

> Additionally, notice that the anchor tags you're trying to get are *not*
> children of the divs you're selecting. They are *siblings*, which means
> they are at the same "level" in the markup hierarchy as the divs.

Aiee! Yeah, totally understood. The reason I focused on the sibling div
(lesson-status-icon) is that I couldn't find anything to "grab" onto in the
anchor tag itself, although now that I think about it, maybe something like:

    response.xpath("//a[starts-with(@href, '/lessons/')]")

would get me the list I wanted without resorting to the roundabout method of
finding the sibling tag and going one past it. (My first instinct was a
wildcard like '/lessons/*', but XPath doesn't do glob matching inside
attribute values, so starts-with() seems to be the way to say that.)

Thanks for the lesson -- you've given me something to experiment and play
with. Much appreciated!

Pete

On Wednesday, January 25, 2017 at 4:29:59 PM UTC-5, Joey Espinosa wrote:
>
> First, this part is wrong:
>
>     for div in response.xpath("//div[@class]='lesson-status-icon'"):
>
> The attribute match should be completely contained within the square
> brackets:
>
>     for div in response.xpath("//div[@class='lesson-status-icon']"):
>
> Additionally, notice that the anchor tags you're trying to get are *not*
> children of the divs you're selecting. They are *siblings*, which means
> they are at the same "level" in the markup hierarchy as the divs. If you
> are insistent on selecting those divs (maybe because they're more reliably
> selectable to you?), then you can use the "following-sibling" axis:
>
>     for anchor in response.xpath(
>             "//div[@class='lesson-status-icon']/following-sibling::a/@href"):
>         print anchor.extract()
>
> I can't check it right now, but give that a shot.
> On Tue, Jan 24, 2017 at 9:32 PM Peter <p...@dirac.org> wrote:
>
>> Trying to scrape some URLs from this page (the stuff highlighted in
>> yellow in the screenshot is what I'm looking for):
>>
>> <https://lh3.googleusercontent.com/-MeiUXL6STxs/WIgIqtFhtPI/AAAAAAAACHU/KO9L90WRgMI9xKA2azq9upycDjQYXxo4ACLcB/s1600/cpod.jpg>
>>
>> I didn't quite understand the section on selectors and XPath, but this
>> was my attempt at getting those URLs:
>>
>>     def grab_page(self, response):
>>         for div in response.xpath("//div[@class]='lesson-status-icon'"):
>>             print( div.xpath("a[@href]").extract() )
>>             print( div.xpath("a[@href]/text()").extract() )
>>             print( div.extract() )
>>         for div in response.xpath("//div[@class]='lesson-status-icon'").xpath("/a[@href]"):
>>             print( div.text().extract() )
>>
>> I'm flailing and drowning. Can someone please put me on the right path?
>> What's the right syntax to grab the URLs?
>>
>> Thanks!!!
>>
>> --
>> You received this message because you are subscribed to the Google Groups
>> "scrapy-users" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to scrapy-users...@googlegroups.com.
>> To post to this group, send email to scrapy...@googlegroups.com.
>> Visit this group at https://groups.google.com/group/scrapy-users.
>> For more options, visit https://groups.google.com/d/optout.
>
> --
> Respectfully,
>
> *Joey Espinosa*
> Chief Technology Officer
> *Vote.org* <https://www.vote.org/>
> Phone: (305) 747-1711
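[Editor's note] For anyone following along, here is a minimal, runnable sketch of the two selectors discussed above. It uses lxml, whose XPath 1.0 engine handles these expressions the same way Scrapy's selectors do. The HTML snippet is invented for illustration and is not the real page's markup.

```python
# Demonstrates both approaches from the thread against a made-up
# snippet of lesson markup (assumed structure: each <li> holds a
# status-icon <div> followed by its sibling <a>).
from lxml import etree

html = """
<ul>
  <li><div class="lesson-status-icon"></div>
      <a href="/lessons/intro">Intro</a></li>
  <li><div class="lesson-status-icon"></div>
      <a href="/lessons/tones">Tones</a></li>
</ul>
"""
doc = etree.HTML(html)

# Joey's suggestion: anchor the query on the div, then step sideways
# to the <a> at the same level with the following-sibling axis.
sibling_hrefs = doc.xpath(
    "//div[@class='lesson-status-icon']/following-sibling::a/@href")
print(sibling_hrefs)  # ['/lessons/intro', '/lessons/tones']

# Pete's idea, corrected: XPath has no '*' glob inside attribute
# values, so starts-with() expresses the "/lessons/*" intent.
direct_hrefs = doc.xpath("//a[starts-with(@href, '/lessons/')]/@href")
print(direct_hrefs)  # ['/lessons/intro', '/lessons/tones']
```

Inside a Scrapy spider the same expressions work on `response.xpath(...)`; append `/@href` and call `.extract()` (or `.getall()` in modern Scrapy) to get the URL strings.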