Hello Group,
I need to index URLs that match a particular URL pattern,
and I have added the pattern to crawl-urlfilter.txt. For example, I want to index all
URLs of www.text.com that are under the products directory, so my regex is
+^http://www.text.com/products/.*
urls/my.txt contains the following entry:
http://www.text.com, which means I want to start indexing from the main page of
www.text.com. However, Nutch does not index anything, and when I run it, it says
No URLs to fetch - check your seed list and URL filters.
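
As a rough illustration of the setup described above, the following Python sketch mimics how an ordered list of +/- regex rules might be applied to the seed URL. It is not Nutch code: the function name `accepts`, the sample product URL, and the behaviour assumed here (rules tried in order, first match decides, URLs matching no rule are rejected) are assumptions made for illustration only.

```python
import re

# Rule list mirroring the single line added to crawl-urlfilter.txt.
RULES = ["+^http://www.text.com/products/.*"]


def accepts(url, rules=RULES):
    """Return True if the first rule whose regex is found in url starts with '+'.

    Assumption: rules are tried in order, the first match decides, and a URL
    that matches no rule at all is rejected.
    """
    for rule in rules:
        sign, pattern = rule[0], rule[1:]
        if re.search(pattern, url):
            return sign == "+"
    return False


seed = "http://www.text.com"                        # entry from urls/my.txt
product = "http://www.text.com/products/index.html"  # hypothetical product page

print(seed, accepts(seed))
print(product, accepts(product))
```

Under these assumptions the seed http://www.text.com does not match the products pattern, so it would be dropped before any page under /products/ could be reached from it.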
I am sure this must have been answered before. I have already searched the archive but
was not able to find any suggestions.
I would really appreciate it if you could offer your suggestions or let me
know which classes to look into.
Thanks in advance.
- BR