You can pass arguments to the spider; the spider can store them as attributes
and have the parse methods use them.
I can't see a general use for passing XPaths or IDs to a spider, but it
should be easy with the start_requests method. Something like:
from scrapy import Spider

class MySpider(Spider):
    name = 'myspider'
    start_url_template = 'http://somehost.com/somepath?product={product_id}'

    def __init__(self, product_ids, *args, **kwargs):
        super(MySpider, self).__init__(*args, **kwargs)
        self.product_ids = product_ids

    def start_requests(self):
        # build one request per product id from the URL template
        for product_id in self.product_ids:
            yield self.make_requests_from_url(
                self.start_url_template.format(product_id=product_id))
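You would then start the crawl with spider arguments on the command line, e.g.

scrapy crawl myspider -a product_ids=1,2,3

(the spider name here is just illustrative). Arguments passed with -a arrive
as strings, so the spider would need to split the comma-separated value
itself, for example self.product_ids = product_ids.split(',') in __init__.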
On Thursday, January 29, 2015 at 00:15:16 UTC-2, user12345 wrote:
>
> I'm working on a scrapy project where a "rabbit client" and "crawl worker"
> work together to consume scrape requests from a queue. These requests have
> more configuration than a start_url - it could be something like url and a
> set of xpaths, or a domain-specific configuration, like site-specific
> product ID (from which we programmatically build the url) and optional
> identifiers like color, style, and size to further specify the item one
> wants to scrape.
>
> I'm wondering if it would be desirable to have built-in support for more
> specific "crawl configurations" like this within the framework? If that's
> the case, I'd be more than happy to have a design discussion and hash out
> the details.
>
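For the queue-driven case you describe, a rough sketch of a spider consuming
such crawl configurations could look like this (assuming each configuration
is a JSON dict with a url and an optional mapping of field names to XPaths;
all names here are illustrative, not an existing Scrapy API):

import json
from scrapy import Spider, Request

class ConfigDrivenSpider(Spider):
    name = 'config_driven'

    def __init__(self, configs='[]', *args, **kwargs):
        super(ConfigDrivenSpider, self).__init__(*args, **kwargs)
        # configs is a JSON list, e.g.
        # '[{"url": "http://somehost.com/item/1", "xpaths": {"name": "//h1/text()"}}]'
        self.configs = json.loads(configs)

    def start_requests(self):
        for config in self.configs:
            # carry the per-request configuration along in meta
            yield Request(config['url'], meta={'config': config})

    def parse(self, response):
        config = response.meta['config']
        item = {}
        for field, xpath in config.get('xpaths', {}).items():
            item[field] = response.xpath(xpath).extract()
        # yield the extracted fields as a plain dict item
        yield item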