You can pass arguments to the spider; the spider can store them as attributes
and have the parse methods use them.
I can't see a general use for passing XPaths or IDs to a spider, but it
should be easy with the start_requests method. Something like:
from scrapy import Spider

class MySpider(Spider):
    name = 'myspider'
    start_url_template = 'http://somehost.com/somepath?product={product_id}'

    def __init__(self, product_ids, *args, **kwargs):
        super(MySpider, self).__init__(*args, **kwargs)
        self.product_ids = product_ids

    def start_requests(self):
        # build one request per product id from the URL template
        for product_id in self.product_ids:
            yield self.make_requests_from_url(
                self.start_url_template.format(product_id=product_id))
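You would then start the crawl with spider arguments on the command line, e.g.

scrapy crawl myspider -a product_ids=1,2,3

(the spider name here is just illustrative). Arguments passed with -a arrive
as strings, so the spider would need to split the comma-separated value
itself, for example self.product_ids = product_ids.split(',') in __init__.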
On Thursday, January 29, 2015 at 00:15:16 UTC-2, user12345 wrote:
>
> I'm working on a scrapy project where a "rabbit client" and "crawl worker"
> work together to consume scrape requests from a queue. These requests have
> more configuration than a start_url - it could be something like url and a
> set of xpaths, or a domain-specific configuration, like site-specific
> product ID (from which we programmatically build the url) and optional
> identifiers like color, style, and size to further specify the item one
> wants to scrape.
>
> I'm wondering if it would be desirable to have built-in support for more
> specific "crawl configurations" like this within the framework? If that's
> the case, I'd be more than happy to have a design discussion and hash out
> the details.
>
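For the queue-driven case you describe, a rough sketch of a spider consuming
such crawl configurations could look like this (assuming each configuration
is a JSON dict with a url and an optional mapping of field names to XPaths;
all names here are illustrative, not an existing Scrapy API):

import json
from scrapy import Spider, Request

class ConfigDrivenSpider(Spider):
    name = 'config_driven'

    def __init__(self, configs='[]', *args, **kwargs):
        super(ConfigDrivenSpider, self).__init__(*args, **kwargs)
        # configs is a JSON list, e.g.
        # '[{"url": "http://somehost.com/item/1", "xpaths": {"name": "//h1/text()"}}]'
        self.configs = json.loads(configs)

    def start_requests(self):
        for config in self.configs:
            # carry the per-request configuration along in meta
            yield Request(config['url'], meta={'config': config})

    def parse(self, response):
        config = response.meta['config']
        item = {}
        for field, xpath in config.get('xpaths', {}).items():
            item[field] = response.xpath(xpath).extract()
        # yield the extracted fields as a plain dict item
        yield item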