Hi,

I am new to Nutch and just getting started. I want to crawl one specific page and, under that page, follow only specific links. For example, I want to crawl only http://nutch.apache.org/downloads.html, and under this page I want to crawl only, say, *.txt links.

These links can be active links, or they could be embedded in some div, as we often see in forums where a link to a file upload/download site is pasted into a div, e.g. htp://example.com/movie_abcd/firstpart.avi. Here I just want to crawl links ending with avi.

I am confused about regex-urlfilter, because so far it is the only filter I have used and I am not familiar with the other URL filters, such as the prefix and suffix URL filters. Do they also play an important role in solving my problem? How can I achieve this?

I look forward to your answers.

Thanks

--
View this message in context: http://lucene.472066.n3.nabble.com/Crawl-and-Index-specific-links-on-specific-page-tp4106524.html
Sent from the Nutch - User mailing list archive at Nabble.com.
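
To make the requirement concrete, here is a rough Python sketch of the filtering behaviour I am after. It is not Nutch configuration: the rule list only imitates the +/- first-match-wins style of regex-urlfilter.txt, and the names RULES, accept, PLAIN_URL and wanted_links are made up for illustration. The pattern for spotting pasted links in text is a loose assumption, not something Nutch provides.

```python
import re

# Hypothetical rules in the style of regex-urlfilter.txt:
# '+' accepts, '-' rejects, and the first matching rule decides.
RULES = [
    ("+", r"^http://nutch\.apache\.org/downloads\.html$"),  # the one page to crawl
    ("+", r"\.(txt|avi)$"),                                  # wanted file links
    ("-", r"."),                                             # reject everything else
]


def accept(url):
    """Return True if the first rule matching the URL is a '+' rule."""
    for sign, pattern in RULES:
        if re.search(pattern, url):
            return sign == "+"
    return False  # no rule matched, so the URL is dropped


# Loose pattern (an assumption) for URLs pasted as plain text inside a div.
PLAIN_URL = re.compile(r"https?://[^\s\"'<>]+")


def wanted_links(page_text):
    """Find URL-looking strings in raw text and keep only the accepted ones."""
    return [u for u in PLAIN_URL.findall(page_text) if accept(u)]
```

With these rules, accept() lets through the downloads page itself and any URL ending in .txt or .avi, and wanted_links() applies the same check to links that appear only as text.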

