Hi, 

I want to get every page under http://dir.uk4net.com/. 

Here is my code:

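from scrapy.spiders import CrawlSpider, Rule
from scrapy.linkextractors.lxmlhtml import LxmlLinkExtractor


class Uk4NetSpider(CrawlSpider):
    name = "uk4net"
    allowed_domains = ["http://dir.uk4net.com/"]
    start_urls = ["http://dir.uk4net.com/"]
    rules = (
        # Extract all links on each crawled page; responses go to parse_item
        Rule(LxmlLinkExtractor(allow=()), callback="parse_item"),
    )

    def parse_item(self, response):
        ...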


For some reason, only links to other domains are extracted from the start
URL. Does this have to do with the fact that all the internal URLs are
relative? If so, how can I capture them?
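For reference, relative links on their own are not normally a problem: a link extractor resolves each href against the page URL, much like Python's `urljoin`. The stdlib-only sketch below is a simplified illustration, not Scrapy's actual code. The helper names `resolve_links` and `is_allowed` and the sample hrefs are hypothetical. It also mimics a host check in the spirit of `allowed_domains`, which Scrapy documents as a list of domain names (e.g. `"dir.uk4net.com"`) rather than full URLs.

```python
from urllib.parse import urljoin, urlparse


def resolve_links(page_url, hrefs):
    """Resolve each href (relative or absolute) against the page URL."""
    return [urljoin(page_url, href) for href in hrefs]


def is_allowed(url, allowed_domains):
    """Simplified host check (hypothetical stand-in for offsite filtering):
    allow the exact domain or any subdomain of it."""
    host = urlparse(url).hostname or ""
    return any(host == d or host.endswith("." + d) for d in allowed_domains)


page = "http://dir.uk4net.com/"
# Hypothetical hrefs as they might appear in the page's HTML
hrefs = ["/category/shops", "listing?id=1", "http://example.org/"]
links = resolve_links(page, hrefs)

for link in links:
    # Compare a bare domain name with a full URL in the allowed list
    print(link,
          is_allowed(link, ["dir.uk4net.com"]),
          is_allowed(link, ["http://dir.uk4net.com/"]))
```

In this sketch the internal links resolve correctly, but only the bare domain name matches their host; a full URL in the allowed list matches nothing.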

Thanks,

Valentino 

-- 
You received this message because you are subscribed to the Google Groups 
"scrapy-users" group.
Visit this group at http://groups.google.com/group/scrapy-users.