Hi,

I found that I can use crawl-urlfilter.txt to restrict
crawling to a single domain with
"
# accept hosts in MY.DOMAIN.NAME
+^http://([a-z0-9]*\.)*MY.DOMAIN.NAME/
"
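To check which URLs a rule like this lets through, here is a small Java sketch that tests the same pattern against a few URLs. It is only an illustration. It uses a hypothetical domain, example.com, in place of MY.DOMAIN.NAME, with the dots escaped so they match only a literal '.'. It also assumes substring-search semantics (java.util.regex find(), anchored by the leading ^). This is not Nutch code.

```java
import java.util.regex.Pattern;

public class DomainFilterDemo {
    // Hypothetical domain standing in for MY.DOMAIN.NAME; dots are escaped
    // so "\\." matches only a literal '.' rather than any character.
    static final Pattern ACCEPT =
        Pattern.compile("^http://([a-z0-9]*\\.)*example\\.com/");

    // find() searches for the pattern; the leading ^ pins it to the URL start.
    static boolean accepted(String url) {
        return ACCEPT.matcher(url).find();
    }

    public static void main(String[] args) {
        String[] urls = {
            "http://example.com/index.html",      // bare domain
            "http://www.example.com/a/b",         // one subdomain
            "http://news.sub.example.com/",       // nested subdomains
            "http://example.org/",                // different domain
            "http://example.com.evil.org/"        // look-alike host
        };
        for (String u : urls) {
            System.out.println(u + " -> " + (accepted(u) ? "accept" : "reject"));
        }
    }
}
```

The first three URLs are accepted and the last two rejected. Because the pattern requires "/" right after the domain, a URL with an explicit port (e.g. :8080) would also be rejected by this rule.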

However, I found that when I don't use bin/nutch crawl...,
crawl-urlfilter.txt doesn't filter out the domains
I don't want.

Can I use regex-urlfilter.txt to restrict the domain the
same way crawl-urlfilter.txt does?
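For reference, here is a rough Java sketch of how a +/- rule file of this kind can be evaluated. It assumes the rules are read top to bottom, the first matching rule decides, and a URL that matches no rule is rejected. This is my understanding of the regex URL filter's behaviour, not Nutch's actual implementation. The example.com domain and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

public class RuleListDemo {
    // One parsed line of a rule file: '+' (accept) or '-' (reject) plus a regex.
    static final class Rule {
        final boolean accept;
        final Pattern pattern;
        Rule(boolean accept, Pattern pattern) {
            this.accept = accept;
            this.pattern = pattern;
        }
    }

    // Parse rule-file text, skipping blank lines, comments and malformed lines.
    static List<Rule> parse(String text) {
        List<Rule> rules = new ArrayList<>();
        for (String raw : text.split("\n")) {
            String line = raw.trim();
            if (line.isEmpty() || line.startsWith("#")) continue;
            char sign = line.charAt(0);
            if (sign != '+' && sign != '-') continue;
            rules.add(new Rule(sign == '+', Pattern.compile(line.substring(1))));
        }
        return rules;
    }

    // First matching rule wins; no match means the URL is rejected.
    static boolean filter(List<Rule> rules, String url) {
        for (Rule r : rules) {
            if (r.pattern.matcher(url).find()) return r.accept;
        }
        return false;
    }

    public static void main(String[] args) {
        String file = String.join("\n",
            "# accept hosts in example.com (hypothetical domain)",
            "+^http://([a-z0-9]*\\.)*example\\.com/",
            "# reject everything else",
            "-.");
        List<Rule> rules = parse(file);
        for (String u : new String[] {"http://www.example.com/", "http://www.example.org/"}) {
            System.out.println(u + " -> " + (filter(rules, u) ? "accept" : "reject"));
        }
    }
}
```

Under these assumptions, rule order matters: a catch-all "-." placed before the domain rule would reject everything.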

Thanks,

Michael Ji

_______________________________________________
Nutch-developers mailing list
https://lists.sourceforge.net/lists/listinfo/nutch-developers