Nutch URL filter help

ajaxtrend Mon, 03 Dec 2007 18:13:23 -0800

Hello Group,
I am trying to fetch all URLs from http://xyz.org, where the URL
structure follows the pattern http://xyz.org/2007/12/23.
   
  The urls/my.txt file contains: http://xyz.org
  and conf/crawl-urlfilter.txt has this filter:
+^http://indianeconomy.org/[0-9]{4}/[0-9]{2}/[0-9]{2}/\\w*
   
  But Nutch still fetches other URLs from http://xyz.org too, such as
http://abc.com.
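
  One way to check the rule in isolation is a small Java program. This is only
a sketch: the class name UrlFilterCheck and the sample URLs are made up for
illustration, it assumes the "\\w" in the rule above is meant as the regex \w,
and it assumes a "+" rule accepts a URL when the pattern is found in it. It
does not load Nutch or read crawl-urlfilter.txt.

```java
import java.util.regex.Pattern;

public class UrlFilterCheck {
    // The rule as posted, minus the leading '+'.
    // In a Java string literal, "\\w" is the regex \w (a word character).
    static final Pattern RULE = Pattern.compile(
        "^http://indianeconomy.org/[0-9]{4}/[0-9]{2}/[0-9]{2}/\\w*");

    // Assumption: a '+' rule accepts a URL when the pattern is found in it.
    static boolean accepts(String url) {
        return RULE.matcher(url).find();
    }

    public static void main(String[] args) {
        // Hypothetical sample URLs, chosen only to exercise the pattern.
        String[] urls = {
            "http://indianeconomy.org/2007/12/23/some_post",
            "http://indianeconomy.org/2007/12/23",
            "http://xyz.org",
            "http://xyz.org/2007/12/23/",
            "http://abc.com"
        };
        for (String url : urls) {
            // Print '+' for URLs the rule matches and '-' for the rest.
            System.out.println((accepts(url) ? "+ " : "- ") + url);
        }
    }
}
```

  Under these assumptions only the first sample URL matches: the pattern
requires a trailing slash after the day, and it names a host other than the
one in urls/my.txt.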
   
  I am not sure whether I am doing something wrong, and I would appreciate
your help on this.
   
  regards,
  Ranjan


       
