Hi,

How, or where, can I define which URLs are accepted while crawling?
I want to index only the pages whose links have a certain format, e.g.

http://www.myCompany.com/myServlet?
(While crawling I currently get all the links under my company host, but I
need more filtering.)

# accept hosts in MY.DOMAIN.NAME
+^http://([a-z0-9]*\.)*myCompany.com/
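The rule above only restricts the host, so every path under it passes. The small Java check below illustrates this, assuming the text after the leading `+` is treated as a Java regex and applied as a search from the start of the URL (the `^` anchor makes `Matcher.find()` behave like a prefix match). The sample URLs are hypothetical, and this is a sketch of the matching, not Nutch's own filter code.

```java
import java.util.regex.Pattern;

public class DomainRuleCheck {
    // The regex from the "+^http://([a-z0-9]*\.)*myCompany.com/" line, without the
    // leading '+' (which marks an accept rule in the filter file).
    static final Pattern DOMAIN_RULE =
            Pattern.compile("^http://([a-z0-9]*\\.)*myCompany.com/");

    static boolean matches(String url) {
        // find() plus the '^' anchor: the URL must start with the pattern
        return DOMAIN_RULE.matcher(url).find();
    }

    public static void main(String[] args) {
        String[] urls = {
            "http://www.myCompany.com/myServlet?id=1", // hypothetical servlet URL
            "http://www.myCompany.com/about.html",     // hypothetical non-servlet URL
            "http://www.example.org/"                  // hypothetical off-site URL
        };
        for (String url : urls) {
            System.out.println(matches(url) + "  " + url);
        }
    }
}
```

Both company URLs print `true`, which matches the observation that every link under the host gets through.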

I want to index all pages whose link starts with
"http://www.myCompany.com/myServlet?"...
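A stricter rule set could be tried in the same filter file. The Java sketch below models how such rules would be evaluated, assuming the semantics described in the comments of Nutch's regex URL filter files: `+` accepts, `-` rejects, the first matching line decides, and a URL that matches no line is rejected. The rule list is hypothetical. Note that `.` and `?` are regex metacharacters, so they are escaped in the servlet pattern. This models the idea and is not Nutch's actual `RegexURLFilter` implementation.

```java
import java.util.List;
import java.util.regex.Pattern;

public class ServletUrlFilter {
    // One filter line: '+' = accept, '-' = reject, followed by a regex.
    record Rule(boolean accept, Pattern pattern) {
        static Rule parse(String line) {
            return new Rule(line.charAt(0) == '+', Pattern.compile(line.substring(1)));
        }
    }

    // Hypothetical rules: accept only servlet URLs, then reject everything else.
    static final List<Rule> RULES = List.of(
            Rule.parse("+^http://www\\.myCompany\\.com/myServlet\\?"),
            Rule.parse("-."));

    // The first rule whose regex matches decides; no match means reject.
    static boolean accept(String url) {
        for (Rule rule : RULES) {
            if (rule.pattern().matcher(url).find()) {
                return rule.accept();
            }
        }
        return false;
    }

    public static void main(String[] args) {
        for (String url : List.of(
                "http://www.myCompany.com/myServlet?id=1",
                "http://www.myCompany.com/about.html")) {
            System.out.println((accept(url) ? "accept " : "reject ") + url);
        }
    }
}
```

One consequence worth checking: a rejected URL is not fetched, so servlet links that appear only on non-servlet pages would not be discovered unless those pages are seeds or are also accepted.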

Thanks for any ideas.

Regards,
Cem
-- 
Sent from the Nutch - User mailing list archive at Nabble.com.
