Hi, I would like to crawl a set of URLs looking only for a specific type of file, for example all images or RSS feeds.
Right now I can successfully generate/fetch/parse/index, but I don't want to do a lot of parse work if I don't need to. I also don't want to parse any of the files I find; I just want to grab a link to each one. I understand I can dump the linkdb using "bin/nutch readlinkdb mycrawl/linkdb -dump linkdbout -format csv", but what would be the most efficient Nutch cycle to get links to these files without doing a lot of extraneous parsing work?
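For context, the cycle I'm running now looks roughly like the sketch below (Nutch 1.x CLI; the "urls" seed directory and the image-extension list in the grep are just placeholders from my setup, and "mycrawl"/"linkdbout" are the same paths as in the command above):

  # inject seed URLs into the crawldb
  bin/nutch inject mycrawl/crawldb urls
  # generate a fetch list and pick up the newest segment
  bin/nutch generate mycrawl/crawldb mycrawl/segments
  SEGMENT=$(ls -d mycrawl/segments/2* | tail -1)
  # fetch the segment, then parse it (the parse step is what I'd like to minimize)
  bin/nutch fetch "$SEGMENT"
  bin/nutch parse "$SEGMENT"
  # fold results back into the crawldb and invert outlinks into the linkdb
  bin/nutch updatedb mycrawl/crawldb "$SEGMENT"
  bin/nutch invertlinks mycrawl/linkdb -dir mycrawl/segments
  # dump the linkdb and filter the dump for the extensions I care about
  bin/nutch readlinkdb mycrawl/linkdb -dump linkdbout -format csv
  grep -iE '\.(jpg|jpeg|png|gif)' linkdbout/part-*

My guess is I still need to parse the HTML pages to extract outlinks at all, but I'd like to avoid fetching and parsing the target files themselves. Thanks.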

