Hi, I have installed Nutch 0.9 and crawled a news website. Searching the resulting index returned hits. I then recrawled the same site, but searches did not return hits for the new pages, even though the updated URLs appear in the log file.
Example: I crawled on the 17th and recrawled on the 23rd. The log file shows the URLs from the 23rd, for example:

    indexer.Indexer - Indexing [http://......./2007.07.23.html] with analyzer [EMAIL PROTECTED] (null)

Does "[EMAIL PROTECTED] (null)" indicate an error? Please help me understand how to recrawl a website.

I used the following script for the recrawl:

    bin/nutch generate $1/crawldb $1/segments -adddays 5
    segment=`ls -d $1/segments/* | tail -1 | grep "[a-zA-Z0-9/]*"`
    bin/nutch fetch $segment
    bin/nutch updatedb $1/crawldb $segment

    bin/nutch generate $1/crawldb $1/segments -adddays 5
    s2=`ls -d $1/segments/2* | tail -1`
    bin/nutch fetch $s2
    bin/nutch updatedb $1/crawldb $s2

    bin/nutch generate $1/crawldb $1/segments -adddays 5
    s3=`ls -d $1/segments/2* | tail -1`
    bin/nutch fetch $s3
    bin/nutch updatedb $1/crawldb $s3

    rm -r $1/indexes
    bin/nutch invertlinks $1/linkdb $1/segments/*
    bin/nutch index $1/indexes $1/crawldb $1/linkdb $1/segments/*

Thanks in advance.

Regards,
Anuradha.
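
P.S. For easier reading, the three generate/fetch/updatedb rounds in the script can also be written as a loop. This is only a sketch: the `recrawl` function, the `NUTCH` variable, and the depth argument are names made up here and are not part of Nutch. The Nutch commands and options are the same ones used in the script, and the sketch is not claimed to change the indexing result.

```shell
#!/bin/sh
# Sketch of the recrawl steps as a loop. NUTCH, recrawl, and the depth
# argument are illustrative names, not part of Nutch itself.
NUTCH="${NUTCH:-bin/nutch}"

recrawl() {
    crawl_dir=$1        # directory holding crawldb, segments, linkdb, indexes
    depth=${2:-3}       # number of generate/fetch/updatedb rounds
    i=1
    while [ "$i" -le "$depth" ]; do
        "$NUTCH" generate "$crawl_dir/crawldb" "$crawl_dir/segments" -adddays 5 || return 1
        # Segment directories are named by timestamp, so the last one listed is the newest.
        segment=$(ls -d "$crawl_dir"/segments/2* | tail -1)
        "$NUTCH" fetch "$segment" || return 1
        "$NUTCH" updatedb "$crawl_dir/crawldb" "$segment" || return 1
        i=$((i + 1))
    done
    # Rebuild the link database and the index from all segments.
    rm -rf "$crawl_dir/indexes"
    "$NUTCH" invertlinks "$crawl_dir/linkdb" "$crawl_dir"/segments/* || return 1
    "$NUTCH" index "$crawl_dir/indexes" "$crawl_dir/crawldb" "$crawl_dir/linkdb" "$crawl_dir"/segments/*
}
```

It would be called as `recrawl crawl 3`, where `crawl` stands for the crawl directory that was passed as `$1` in the original script.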
