That's weird, I get nearly 2000 items running your spider (with a custom ItemloadItem):
https://gist.github.com/redapple/aa274c729ee912de46ce

On Saturday, February 28, 2015 at 8:10:36 PM UTC+1, JEBI93 wrote:
>
> Here's the full script: http://pastebin.com/13eNky9W. After I changed from
> parse to parse_page, I don't get anything scraped.
>
> On Saturday, February 28, 2015 at 4:51:10 PM UTC+1, Paul Tremberth wrote:
>>
>> Hi,
>>
>> CrawlSpider and a custom parse() method do not play well together. See
>> the warning a bit below
>> http://doc.scrapy.org/en/latest/topics/spiders.html#crawling-rules
>> It's easy to miss.
>>
>> Try renaming your parse() method to something like parse_page(), and
>> reference this new callback name in your rule.
>> On Feb 28, 2015 at 16:17, "JEBI93" <[email protected]> wrote:
>>
>>> Hey guys, I have a small problem when trying to crawl 10+ pages. Here's
>>> the code:
>>>
>>> class ItemspiderSpider(CrawlSpider):
>>>     name = "itemspider"
>>>     allowed_domains = ["openstacksummitnovember2014paris.sched.org"]
>>>     start_urls = ['http://openstacksummitnovember2014paris.sched.org/directory/attendees/']
>>>
>>>     rules = (
>>>         Rule(SgmlLinkExtractor(allow=r'/directory/attendees/\d+'),
>>>              callback='parse', follow=True),
>>>     )
>>>
>>> The problem is that when I run this code I only get results from the
>>> first page, not the others. I tried changing start_urls to something
>>> like this, and it worked fine:
>>>
>>>     start_urls = [
>>>         'http://openstacksummitnovember2014paris.sched.org/directory/attendees/1',
>>>         'http://openstacksummitnovember2014paris.sched.org/directory/attendees/2',
>>>         'http://openstacksummitnovember2014paris.sched.org/directory/attendees/3',
>>>         'http://openstacksummitnovember2014paris.sched.org/directory/attendees/4',
>>>         # etc.
>>>     ]
>>>
>>> I'm guessing I messed up the allow part; my regex is probably not right.
>>>
>>> --
>>> You received this message because you are subscribed to the Google
>>> Groups "scrapy-users" group.
>>> To unsubscribe from this group and stop receiving emails from it, send
>>> an email to [email protected].
>>> To post to this group, send email to [email protected].
>>> Visit this group at http://groups.google.com/group/scrapy-users.
>>> For more options, visit https://groups.google.com/d/optout.
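
On the regex question in the original post: the allow pattern can be checked on its own, outside Scrapy. This is only a minimal sketch, assuming the link extractor applies the pattern as a search over the absolute URL (my understanding of how allow behaves, not something stated in this thread). The sample URLs are the ones from the question, and the helper name allowed() is made up for illustration.

```python
import re

# Pattern copied from the Rule in the quoted spider.
ALLOW = re.compile(r'/directory/attendees/\d+')

BASE = 'http://openstacksummitnovember2014paris.sched.org/directory/attendees/'

# Sample URLs: the directory start page plus the first four numbered pages.
urls = [BASE] + [BASE + str(n) for n in range(1, 5)]


def allowed(url):
    """Hypothetical helper: True if the pattern is found anywhere in the URL."""
    return ALLOW.search(url) is not None


# Map each URL to whether the pattern accepts it.
matches = {url: allowed(url) for url in urls}

for url, ok in matches.items():
    print(ok, url)
```

Under that assumption the numbered pages match and the bare directory page does not, which suggests the regex is fine and the callback name is the more likely culprit.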
