Deep Crawl for a single domain

Valentino Hudhra Thu, 18 Dec 2014 09:42:57 -0800

Hi, 

I want to get every page under http://dir.uk4net.com/.


Here is my code:

class Uk4NetSpider(CrawlSpider):
    name = "uk4net"
    allowed_domains = ["http://dir.uk4net.com/"]
    start_urls = ["http://dir.uk4net.com/"]
    rules = (
        Rule(LxmlLinkExtractor(allow=()), callback="parse_items"),
    )

    def parse_item(self, response):
        ...


For some reason, only links to other domains are extracted from the start 
URL. Is this because all of the internal URLs are relative? If so, how can 
I capture them?

Thanks,

Valentino 
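
Note on the question above: relative links are normally resolved against 
the page URL before any domain filtering happens, so they are probably not 
the cause by themselves. One thing that may be worth checking is the 
allowed_domains entry, which Scrapy expects to be a bare domain name 
("dir.uk4net.com") rather than a full URL. The sketch below uses only the 
Python standard library to illustrate the idea. It is not Scrapy's actual 
filtering code; the helper names resolve_links and is_allowed and the 
sample hrefs are made up for illustration.

```python
from urllib.parse import urljoin, urlparse


def resolve_links(base_url, hrefs):
    """Resolve each href (relative or absolute) against the page URL."""
    return [urljoin(base_url, href) for href in hrefs]


def is_allowed(url, allowed_domains):
    """Return True if the URL's host equals, or is a subdomain of, an entry.

    Hypothetical helper: a simplified stand-in for an offsite filter.
    """
    host = urlparse(url).hostname or ""
    return any(host == d or host.endswith("." + d) for d in allowed_domains)


base = "http://dir.uk4net.com/"
# Sample hrefs are assumptions, not taken from the real site.
links = resolve_links(base, ["/business/", "page2.html", "http://example.com/"])

for link in links:
    print(
        link,
        is_allowed(link, ["dir.uk4net.com"]),          # bare domain name
        is_allowed(link, ["http://dir.uk4net.com/"]),  # full URL, as in the post
    )
```

In this sketch the two relative hrefs resolve to URLs on dir.uk4net.com and 
pass the check only when the entry is the bare domain name; with the full 
URL as the entry, no host ever matches. Two other details in the posted 
spider may also matter: the rule names the callback "parse_items" while the 
method is called parse_item, and a Rule that has a callback does not follow 
links unless follow=True is passed.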
