name = "gumtreeSpider"
allowed_domains = ["gumtree.com.au"]
seed = 'http://www.gumtree.com.au/s-jobs/page-%d/c9302?ad=wanted
<http://www.google.com/url?q=http%3A%2F%2Fwww.gumtree.com.au%2Fs-jobs%2Fpage-1%2Fc9302%3Fad%3Dwanted&sa=D&sntz=1&usg=AFQjCNG81op--ulBIkaJOJDoexLkhBRFQg>
'
start_urls = [
seed % i for i in range(10)
]
rules = (
Rule(SgmlLinkExtractor(allow=('/s-jobs/page-\d+c9302?ad=wanted')),
callback='parse', follow=True),
)
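
For reference, here's the whole thing assembled into one runnable spider (a
minimal sketch against the Scrapy 0.24-era import paths; parse_item is just a
placeholder callback -- your real extraction logic is in the pastebin script):

from scrapy.contrib.spiders import CrawlSpider, Rule
from scrapy.contrib.linkextractors.sgml import SgmlLinkExtractor


class GumtreespiderSpider(CrawlSpider):
    name = "gumtreeSpider"
    allowed_domains = ["gumtree.com.au"]

    # Template for the paginated listing URLs; %d is the page number.
    seed = 'http://www.gumtree.com.au/s-jobs/page-%d/c9302?ad=wanted'
    start_urls = [seed % i for i in range(1, 11)]

    rules = (
        # Escaped '?' and the '/' before c9302, so the pattern actually
        # matches the pagination links on the page.
        Rule(SgmlLinkExtractor(allow=(r'/s-jobs/page-\d+/c9302\?ad=wanted',)),
             callback='parse_item', follow=True),
    )

    def parse_item(self, response):
        # Placeholder: just log each page reached; swap in your selectors.
        self.log('Scraping %s' % response.url)

If you're on Scrapy 0.24+, the docs recommend the newer
scrapy.contrib.linkextractors.LinkExtractor over SgmlLinkExtractor, but the
allow pattern works the same way with either.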
On Wednesday, March 4, 2015 at 1:45:57 AM UTC+8, JEBI93 wrote:
>
> Again, I don't know how to deal with pagination. Anyway, here's the problem:
> class GumtreespiderSpider(CrawlSpider):
>     name = "gumtreeSpider"
>     allowed_domains = ["gumtree.com.au"]
>     start_urls = ['http://www.gumtree.com.au/s-jobs/page-1/c9302?ad=wanted']
>
>     rules = (
>         Rule(SgmlLinkExtractor(allow=('/s-jobs/page-\d+c9302?ad=wanted')),
>              callback='parse', follow=True),
>     )
>
> What I'm trying to do is iterate with \d+ to scrape 100+ pages, but it
> returns only the first one (the start_urls one).
> Here's the full script: http://pastebin.com/CYrPvZuc