Hello
I have a very simple Scrapy CrawlSpider, and I have given it a simple rule:
"crawl/follow any link that contains '/search/listings'". But the spider is
not crawling/following any of these links.
I have confirmed that the start URL contains many links with hrefs containing
'/search/listings', so the links are there.
Any idea what's going wrong?
from scrapy import log
from scrapy.contrib.spiders import CrawlSpider, Rule
from scrapy.contrib.linkextractors import LinkExtractor

class MySpider(CrawlSpider):
    name = "MySpider"
    allowed_domains = ["mywebsite.com"]
    start_urls = ["http://www.mywebsite.com/results"]

    rules = [Rule(LinkExtractor(allow=['/search/listings(.*)']),
                  callback="parse2")]

    def parse2(self, response):
        # This function is never called
        log.start("log.txt")
        log.msg("Page crawled: " + response.url)
The start URL "http://www.mywebsite.com/results" contains these links that
I want the rule to apply to:

<a href='/search/listings?clue=healthcare&eventType=sort&p=2' class='button button-pagination' data-page='2'>2</a>
<a href='/search/listings?clue=healthcare&eventType=sort&p=3' class='button button-pagination' data-page='3'>3</a>
<a href='/search/listings?clue=healthcare&eventType=sort&p=4' class='button button-pagination' data-page='4'>4</a>
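To rule out the regex itself, I also ran a quick standalone check (plain `re`,
outside Scrapy) that the allow pattern matches these hrefs, and it does:

```python
import re

# Same pattern as in the Rule's LinkExtractor(allow=[...])
pattern = re.compile(r'/search/listings(.*)')

hrefs = [
    '/search/listings?clue=healthcare&eventType=sort&p=2',
    '/search/listings?clue=healthcare&eventType=sort&p=3',
    '/search/listings?clue=healthcare&eventType=sort&p=4',
]

# Every href matches; note LinkExtractor applies the pattern to the
# absolute URL, which still contains the '/search/listings' substring.
for href in hrefs:
    assert pattern.search(href) is not None
    print(href, '-> matches')
```

So the pattern doesn't seem to be the problem, at least not on its own.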
--
You received this message because you are subscribed to the Google Groups
"scrapy-users" group.