Hi everyone,
I am writing a Scrapy spider that will crawl about 1000 domains, and I am wondering whether there is a way to track the number of domains crawled, because crawling 1000 domains in a single process will take a long time. If I could track how many domains have been processed, I could trigger a task such as sending an email after every 100 of the 1000 domains are done.
I searched the internet but could not find anything relevant. If anyone knows a way, please tell me. If there is no good way, I will fall back to tracking the number of URLs crawled, but it would be better to track the number of domains. One idea I am considering is sketched after my spider code below.
from scrapy.contrib.spiders import CrawlSpider, Rule
from scrapy.contrib.linkextractors.sgml import SgmlLinkExtractor

class MySpider(CrawlSpider):
    name = 'alok2'

    # 'list.txt' contains the domains I have to crawl, one per line
    allowed_domains = [line.strip() for line in open('list.txt')]
    start_urls = ['http://' + domain for domain in allowed_domains]

    rules = [Rule(SgmlLinkExtractor(), callback='parse_item', follow=True)]

    def __init__(self, category=None, *args, **kwargs):
        super(MySpider, self).__init__(*args, **kwargs)
        # this is to keep track of domains whose links have all been crawled
        self.count = 0

    def parse_start_url(self, response):
        return self.parse_item(response)

    def parse_item(self, response):
        #lines
        #lines
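
One idea I am considering (not sure if this is the proper Scrapy way) is to record the domain of every response inside parse_item and fire my task whenever the count of distinct domains crosses another multiple of 100. A rough sketch under my own assumptions: seen_domains, notified_at and send_progress_mail are names I made up, urlparse is just used to pull the host out of response.url, and the actual mail sending (e.g. scrapy.mail.MailSender or plain smtplib) is left as a placeholder. Note this counts a domain as soon as its first page comes back, not when all of its links are finished, so it is only an approximation of what I really want.

from urlparse import urlparse  # Python 2; use urllib.parse on Python 3

class MySpider(CrawlSpider):
    # ... same name / allowed_domains / start_urls / rules as above ...

    def __init__(self, category=None, *args, **kwargs):
        super(MySpider, self).__init__(*args, **kwargs)
        self.seen_domains = set()   # distinct domains that returned at least one response
        self.notified_at = 0        # count at which the last notification was sent

    def parse_item(self, response):
        domain = urlparse(response.url).netloc
        if domain not in self.seen_domains:
            self.seen_domains.add(domain)
            # every time another 100 distinct domains have been seen, trigger the task
            if len(self.seen_domains) - self.notified_at >= 100:
                self.notified_at = len(self.seen_domains)
                self.send_progress_mail(self.notified_at)  # hypothetical helper
        #lines (normal item extraction continues here)

    def send_progress_mail(self, n):
        # placeholder: plug in scrapy.mail.MailSender or smtplib here
        self.log('crawled responses from %d distinct domains so far' % n)

Would something like this be reasonable, or is there a cleaner way (signals, stats, a middleware) to know when a whole domain has actually finished?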