name = "gumtreeSpider"
allowed_domains = ["gumtree.com.au"]
seed = 'http://www.gumtree.com.au/s-jobs/page-%d/c9302?ad=wanted
<http://www.google.com/url?q=http%3A%2F%2Fwww.gumtree.com.au%2Fs-jobs%2Fpage-1%2Fc9302%3Fad%3Dwanted&sa=D&sntz=1&usg=AFQjCNG81op--ulBIkaJOJDoexLkhBRFQg>
'
start_urls = [
seed % i for i in range(10)
]
rules = (
Rule(SgmlLinkExtractor(allow=('/s-jobs/page-\d+c9302?ad=wanted')),
callback='parse', follow=True),
)
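
For reference, here's the whole thing assembled into one runnable spider (a
minimal sketch against the Scrapy 0.24-era import paths; parse_item is just a
placeholder callback -- your real extraction logic is in the pastebin script):

from scrapy.contrib.spiders import CrawlSpider, Rule
from scrapy.contrib.linkextractors.sgml import SgmlLinkExtractor


class GumtreespiderSpider(CrawlSpider):
    name = "gumtreeSpider"
    allowed_domains = ["gumtree.com.au"]

    # Template for the paginated listing URLs; %d is the page number.
    seed = 'http://www.gumtree.com.au/s-jobs/page-%d/c9302?ad=wanted'
    start_urls = [seed % i for i in range(1, 11)]

    rules = (
        # Escaped '?' and the '/' before c9302, so the pattern actually
        # matches the pagination links on the page.
        Rule(SgmlLinkExtractor(allow=(r'/s-jobs/page-\d+/c9302\?ad=wanted',)),
             callback='parse_item', follow=True),
    )

    def parse_item(self, response):
        # Placeholder: just log each page reached; swap in your selectors.
        self.log('Scraping %s' % response.url)

If you're on Scrapy 0.24+, the docs recommend the newer
scrapy.contrib.linkextractors.LinkExtractor over SgmlLinkExtractor, but the
allow pattern works the same way with either.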
On Wednesday, March 4, 2015 at 1:45:57 AM UTC+8, JEBI93 wrote:
>
> Again, I don't know how to deal with pagination. Anyway, here's the problem:
> class GumtreespiderSpider(CrawlSpider):
>     name = "gumtreeSpider"
>     allowed_domains = ["gumtree.com.au"]
>     start_urls = ['http://www.gumtree.com.au/s-jobs/page-1/c9302?ad=wanted']
>
>     rules = (
>         Rule(SgmlLinkExtractor(allow=('/s-jobs/page-\d+c9302?ad=wanted')),
>              callback='parse', follow=True),
>     )
>
> What I'm trying to do is iterate with \d+ to scrape 100+ pages, but it
> returns only the first one (the start_urls one).
> Here's the full script: http://pastebin.com/CYrPvZuc