Hi all! I'm a newbie using Nutch.
I index my web site with the command:

/APPS2/nutch-0.9/bin/nutch crawl /APPS2/nutch-0.9/urls/urls.txt -dir /APPS2/nutch-0.9/crawl.db -depth 5 -threads 10
In my crawl-urlfilter.xml file, I have:

-^(file|ftp|mailto|https):
-\.(gif|GIF|jpg|JPG|png|PNG|ico|ICO|css|sit|eps|wmf|zip|ppt|mpg|xls|gz|rpm|tgz|mov|MOV|exe|jpeg|JPEG|bmp|BMP|js|wmv|WMV|ra|RA|ram|RAM|css)$
[EMAIL PROTECTED]
# accept hosts in MY.DOMAIN.NAME
+^http://([a-z0-9]*\.)*(unirioja.es|otrodominio.org)/
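
For anyone reading along, here is a rough Python sketch of how I understand these filter rules to be applied: each URL is tested against the rules in order, the first pattern found anywhere in the URL decides (`-` rejects, `+` accepts), and a URL that matches no rule is rejected. This is an illustration only, not Nutch code. The names `RULES` and `accepts` are made up for the example, and the line shown as [EMAIL PROTECTED] is left out because its real content is not visible in the post.

```python
import re

# Rules copied from the post, in file order.
# The line obscured as "[EMAIL PROTECTED]" is omitted (its content is unknown).
RULES = [
    ("-", r"^(file|ftp|mailto|https):"),
    ("-", r"\.(gif|GIF|jpg|JPG|png|PNG|ico|ICO|css|sit|eps|wmf|zip|ppt|mpg|xls|gz"
          r"|rpm|tgz|mov|MOV|exe|jpeg|JPEG|bmp|BMP|js|wmv|WMV|ra|RA|ram|RAM|css)$"),
    ("+", r"^http://([a-z0-9]*\.)*(unirioja.es|otrodominio.org)/"),
]


def accepts(url):
    """Return True if the first matching rule is a '+' rule (assumed semantics)."""
    for sign, pattern in RULES:
        # Unanchored search: the pattern may match anywhere in the URL
        # unless it carries its own ^ or $ anchors.
        if re.search(pattern, url):
            return sign == "+"
    # Assumption: a URL that matches no rule at all is dropped.
    return False


if __name__ == "__main__":
    for url in (
        "http://www.unirioja.es/index.html",
        "https://www.unirioja.es/",
        "http://www.unirioja.es/logo.gif",
        "http://example.com/",
    ):
        print(url, accepts(url))
```

Under these assumptions, note that the first rule also rejects every https: URL, so only plain http pages on the two listed domains get through.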
