Hello All,
  I have been reading the Scrapy documentation and the mailing list archives but cannot 
find a working example. I don't find the documentation very helpful on how to 
use process_links().

  All I need to do is analyse each URL as it is extracted and, in certain 
circumstances, modify it before passing it back to Scrapy for spidering.
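
  For context, here is roughly what I imagine the callback should end up 
looking like. I'm assuming the objects passed to process_links() are scrapy 
Link objects with a .url attribute, and the https rewrite is just a made-up 
placeholder for whatever modification I actually need:

    def process_links(self, links):
        # hypothetical modification: force https on every extracted link
        for link in links:
            if link.url.startswith('http://'):
                link.url = 'https://' + link.url[len('http://'):]
        return links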

  As a test, I would just like to print out each URL as it is being 
processed, but I cannot even get that to work. Example code is below, which I 
am running with "scrapy runspider test.py" (or should I be calling it 
differently?). My goal is to create a list of URLs which can be passed to the 
rest of my Python code for analysis.

from scrapy.contrib.spiders import CrawlSpider, Rule
from scrapy.contrib.linkextractors.sgml import SgmlLinkExtractor


class Demo(CrawlSpider):
    name = 'www.linux.com'  # name must be a string, not a list
    allowed_domains = ['www.linux.com']  # allowed_domains must be a list
    start_urls = ['http://www.linux.com']

    rules = (
        Rule(SgmlLinkExtractor(), process_links='process_links', follow=True),
    )

    # this must be indented inside the class, otherwise CrawlSpider
    # never finds it and the rule silently does nothing
    def process_links(self, links):
        for link in links:
            print 'link: ', link.url  # just print each URL as it is processed
        return links
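
  Once the printing works, my plan was to accumulate the URLs on the spider 
instead of printing them. Something roughly like this (collected_urls is just 
a name I made up):

    def process_links(self, links):
        # stash each URL so the rest of my code can analyse the list later
        self.collected_urls = getattr(self, 'collected_urls', [])
        for link in links:
            self.collected_urls.append(link.url)
        return links

Does that seem like a reasonable approach, or is there a better way to get the 
URLs out to the rest of my Python code?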


Thank you!
Paul.
