Thanks S.P. for your quick response.
> 1. Check logs/hadoop.log file. Do you see any lines containing the
> string "fetching". Such lines should clearly show what URLs have been
> fetched.
There are many "fetching" lines there, so I don't think that is the cause.
> 2. One reason may be that all URLs are blocked in
> conf/crawl-urlfilter.txt. Did you edit this file as per the tutorial?
> If not, this is most certainly the problem. An easy way to allow all
> URLs would be to replace the .- in the end with .+
>
Yes, like this:
# accept hosts in MY.DOMAIN.NAME
+^http://([a-z0-9]*\.)*hustoo.net/
# skip everything else
+.
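To double-check which URLs these rules let through, here is a small standalone sketch. It is NOT Nutch code: it only re-implements, under my assumption, the behaviour of the regex URL filter as I understand it (rules are tried top to bottom, the first pattern found in the URL decides, "+" accepts, "-" rejects, and a URL matching no rule is rejected). The class and method names are hypothetical, and unlike Nutch this sketch silently skips lines that do not start with "+" or "-".

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/** Hypothetical checker mimicking first-match-wins URL filter rules (not the Nutch API). */
public class UrlFilterCheck {
    // One rule: accept ('+') or reject ('-') plus its compiled regex.
    record Rule(boolean accept, Pattern pattern) {}

    static List<Rule> parse(String... lines) {
        List<Rule> rules = new ArrayList<>();
        for (String line : lines) {
            line = line.trim();
            if (line.isEmpty() || line.startsWith("#")) continue; // skip blanks and comments
            char sign = line.charAt(0);
            if (sign != '+' && sign != '-') continue;             // this sketch ignores other lines
            rules.add(new Rule(sign == '+', Pattern.compile(line.substring(1))));
        }
        return rules;
    }

    /** The first rule whose pattern is found in the URL decides; no match means rejected. */
    static boolean accepts(List<Rule> rules, String url) {
        for (Rule r : rules) {
            if (r.pattern().matcher(url).find()) {
                return r.accept();
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // The rules from my crawl-urlfilter.txt above.
        List<Rule> rules = parse(
            "# accept hosts in MY.DOMAIN.NAME",
            "+^http://([a-z0-9]*\\.)*hustoo.net/",
            "# skip everything else",
            "+.");
        for (String url : new String[] {"http://www.hustoo.net/index.html", "http://example.com/"}) {
            System.out.println(url + " -> " + (accepts(rules, url) ? "accepted" : "rejected"));
        }
    }
}
```

If the filter really works this way, a final "+." accepts every URL that reaches it, so this file should not be what is blocking the hustoo.net pages (it also means the crawl is no longer limited to that domain).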
What do you think about
Tomcat 6.0\webapps\nutch-0.9\WEB-INF\classes\nutch-site.xml:
<configuration>
<property>
<name>searcher.dir</name>
<value>C:\nutch-0.9\crawl\</value>
</property>
</configuration>
At first I thought it might be a "\" (backslash) problem, or a problem with the path in general?
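One way to rule out the path itself is a quick standalone check of the directory named in searcher.dir. This is a hypothetical helper, not part of Nutch. It assumes the layout the Nutch 0.9 crawl command writes (crawldb, linkdb, segments, indexes and a merged index), and that the search webapp needs an index/ (or indexes/) and a segments/ directory under searcher.dir. Note that XML gives no special meaning to "\", so the backslashes reach the application as-is; trying forward slashes is just one common way to take them out of the equation.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

/** Hypothetical helper: checks that a searcher.dir value looks like a finished crawl directory. */
public class SearcherDirCheck {
    static String check(String searcherDir) {
        Path dir = Paths.get(searcherDir);
        if (!Files.isDirectory(dir)) {
            return "not a directory: " + dir.toAbsolutePath();
        }
        // Assumption: the searcher looks for a merged index/ or per-segment indexes/.
        boolean hasIndex = Files.isDirectory(dir.resolve("index"))
                || Files.isDirectory(dir.resolve("indexes"));
        if (!hasIndex) {
            return "no index/ or indexes/ under " + dir.toAbsolutePath();
        }
        if (!Files.isDirectory(dir.resolve("segments"))) {
            return "no segments/ under " + dir.toAbsolutePath();
        }
        return "ok: " + dir.toAbsolutePath();
    }

    public static void main(String[] args) {
        // Default is my searcher.dir value written with forward slashes.
        String dir = args.length > 0 ? args[0] : "C:/nutch-0.9/crawl";
        System.out.println(check(dir));
    }
}
```

If the check prints "ok" for C:\nutch-0.9\crawl\ but the webapp still returns no hits, the problem is more likely in how Tomcat picks up nutch-site.xml than in the path string.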
Thanks for any suggestions.