Hello
I have a very simple Scrapy CrawlSpider, and I have given it a simple rule:
"crawl/follow any link that contains '/search/listings'". But the spider is
not crawling/following any of these links.
I have confirmed that the start URL contains many links with hrefs containing
'/search/listings', so the links are there.
Any idea what's going wrong?
from scrapy import log
from scrapy.contrib.spiders import CrawlSpider, Rule
from scrapy.contrib.linkextractors import LinkExtractor

class MySpider(CrawlSpider):
    name = "MySpider"
    allowed_domains = ["mywebsite.com"]
    start_urls = ["http://www.mywebsite.com/results"]

    rules = [Rule(LinkExtractor(allow=['/search/listings(.*)']),
                  callback="parse2")]

    def parse2(self, response):
        # This function is never called
        log.start("log.txt")
        log.msg("Page crawled: " + response.url)
The start URL "http://www.mywebsite.com/results" contains these links that
I want the rule to apply to:

<a href='/search/listings?clue=healthcare&eventType=sort&p=2' class='button button-pagination' data-page='2'>2</a>
<a href='/search/listings?clue=healthcare&eventType=sort&p=3' class='button button-pagination' data-page='3'>3</a>
<a href='/search/listings?clue=healthcare&eventType=sort&p=4' class='button button-pagination' data-page='4'>4</a>
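To rule out the regex itself, I also ran a quick standalone check (plain `re`,
outside Scrapy) that the allow pattern matches these hrefs, and it does:

```python
import re

# Same pattern as in the Rule's LinkExtractor(allow=[...])
pattern = re.compile(r'/search/listings(.*)')

hrefs = [
    '/search/listings?clue=healthcare&eventType=sort&p=2',
    '/search/listings?clue=healthcare&eventType=sort&p=3',
    '/search/listings?clue=healthcare&eventType=sort&p=4',
]

# Every href matches; note LinkExtractor applies the pattern to the
# absolute URL, which still contains the '/search/listings' substring.
for href in hrefs:
    assert pattern.search(href) is not None
    print(href, '-> matches')
```

So the pattern doesn't seem to be the problem, at least not on its own.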
--
You received this message because you are subscribed to the Google Groups
"scrapy-users" group.