error nutch recrawl

2009-07-07 Thread Maurizio Croci
Hi, I am trying to recrawl a website using a shell script (I already have a webDB...). The site has some links to other pages (.html, .doc, .pdf, ...), but this error occurred: ... Generator: Selecting best-scoring urls due for fetch. Generator: starting Generator: segment: crawl/segments/20090703140431

Re: error nutch recrawl

2009-07-07 Thread xiao yang
You can use bin/hadoop fs -rmr crawl to delete the whole directory and then recrawl. On Tue, Jul 7, 2009 at 1:47 AM, Maurizio Croci croci.mauri...@gmail.com wrote: Hi, I try to REcrawl (with a shell-script. I have already a webDB...) a website (with some links to other webpage, .html, .doc, .pdf,
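
The suggestion above could be scripted roughly as follows. This is only a sketch under stated assumptions: the variable names (CRAWL_DIR, SEED_DIR, DEPTH, DRY_RUN), the seed directory "urls" and the depth of 3 are illustrative and do not come from the thread. Note that removing the crawl directory also discards the existing webDB, so what follows is a fresh crawl rather than an incremental recrawl.

```shell
#!/bin/sh
# Sketch: delete the old crawl directory, then crawl again from the seed list.
# All variable names and default values below are assumptions for illustration.
CRAWL_DIR="${CRAWL_DIR:-crawl}"   # directory named in the reply above
SEED_DIR="${SEED_DIR:-urls}"      # hypothetical directory holding the seed URL list
DEPTH="${DEPTH:-3}"               # hypothetical crawl depth
DRY_RUN="${DRY_RUN:-1}"           # 1 = only print the commands, 0 = execute them

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

# Remove the previous crawl data; start the new crawl only if that succeeded.
fresh_crawl() {
    run bin/hadoop fs -rmr "$CRAWL_DIR" &&
    run bin/nutch crawl "$SEED_DIR" -dir "$CRAWL_DIR" -depth "$DEPTH"
}

fresh_crawl
```

With the default DRY_RUN=1 the script only prints what it would do. Setting DRY_RUN=0 and running it from the Nutch installation directory would actually delete the data, so it is worth checking the printed commands first.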

Re: Problems when deploy nutch-1.0.war

2009-07-07 Thread claus westerkamp
Hello, I think you need to check /etc/default/tomcat6 for TOMCAT5_SECURITY. Alex McLintock wrote: OK, here is how I fixed this in my Ubuntu 9.04 setup using the normal tomcat6 Ubuntu package. I added this permission line to /etc/tomcat6/policy.d/04webapps.policy: grant { // Attempt to

Solr Integration since v1.0 ?

2009-07-07 Thread Alex McLintock
I've looked at a lot of tutorials for linking Nutch and Solr but it seems that this has been improved a lot in version 1.0. Can anyone point me at documentation which takes this into account? Cheers Alex