Hi Matt!

In the article at http://today.java.net/pub/a/today/2006/02/16/introduction-to-nutch-2.html you said the recrawl script has a problem updating the live search index. In my tests with Nutch 0.7.2, the index cannot be updated when I run the script, because Tomcat has it loaded in memory. Could you suggest a modification to the script, or to NutchBean, that lets the index be updated without restarting Tomcat? (At the moment I run net stop "Apache Tomcat" before updating the index.)
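To make the current workaround concrete, here is a minimal sketch of the stop / recrawl / start sequence. It assumes a bash environment where the Windows net command is on the PATH (for example Cygwin). The service name is the one from the net stop command above. The recrawl_with_restart function, the RECRAWL_CMD variable and the ./recrawl.sh path are hypothetical names used only for illustration.

```shell
#!/bin/bash
# Sketch only: stop the servlet container, run the recrawl, then start it again.
# SERVICE_NAME matches the `net stop "Apache Tomcat"` command mentioned above.
# RECRAWL_CMD and ./recrawl.sh are hypothetical placeholders for the recrawl script.

SERVICE_NAME=${SERVICE_NAME:-"Apache Tomcat"}
RECRAWL_CMD=${RECRAWL_CMD:-./recrawl.sh}

recrawl_with_restart() {
    local crawl_dir=$1
    if [ -z "$crawl_dir" ]; then
        echo "Usage: recrawl_with_restart crawl_dir" >&2
        return 1
    fi

    # Stop the service so the web application releases the index files.
    net stop "$SERVICE_NAME" || return 1

    # Run the recrawl while nothing has the index open; keep its exit status.
    "$RECRAWL_CMD" "$crawl_dir"
    local status=$?

    # Start the service again even if the recrawl failed.
    net start "$SERVICE_NAME" || return 1

    return $status
}
```

It would be called as recrawl_with_restart crawl, where crawl is the crawl directory. Search is unavailable for the whole recrawl, which is exactly the downtime I would like to avoid.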
Thanks

On 7/21/06, Matthew Holt <[EMAIL PROTECTED]> wrote:
Thanks for putting up with all the messages to the list... Here is the recrawl script for 0.8.0 if anyone is interested.

Matt

-------------------------------

#!/bin/bash

# Nutch recrawl script.
# Based on 0.7.2 script at http://today.java.net/pub/a/today/2006/02/16/introduction-to-nutch-2.html
# Modified by Matthew Holt

if [ -n "$1" ]
then
  crawl_dir=$1
else
  echo "Usage: recrawl crawl_dir [depth] [adddays]"
  exit 1
fi

if [ -n "$2" ]
then
  depth=$2
else
  depth=5
fi

if [ -n "$3" ]
then
  adddays=$3
else
  adddays=0
fi

# EDIT THIS - List the location to your nutch servlet container.
nutch_dir=/usr/local/apache-tomcat-5.5.17/webapps/nutch/

# No need to edit anything past this line #
webdb_dir=$crawl_dir/crawldb
segments_dir=$crawl_dir/segments
linkdb_dir=$crawl_dir/linkdb
index_dir=$crawl_dir/index

# The generate/fetch/update cycle
for ((i=1; i <= depth ; i++))
do
  bin/nutch generate $webdb_dir $segments_dir -adddays $adddays
  segment=`ls -d $segments_dir/* | tail -1`
  bin/nutch fetch $segment
  bin/nutch updatedb $webdb_dir $segment
done

# Update segments
bin/nutch invertlinks $linkdb_dir -dir $segments_dir

# Index segments
new_indexes=$crawl_dir/newindexes
#ls -d $segments_dir/* | tail -$depth | xargs
bin/nutch index $new_indexes $webdb_dir $linkdb_dir $segments_dir/*

# De-duplicate indexes
bin/nutch dedup $new_indexes

# Merge indexes
bin/nutch merge $index_dir $new_indexes

# Tell Tomcat to reload index
touch $nutch_dir/WEB-INF/web.xml

# Clean up
rm -rf $new_indexes
--
Lourival Junior
Universidade Federal do Pará
Curso de Bacharelado em Sistemas de Informação
http://www.ufpa.br/cbsi
Msn: [EMAIL PROTECTED]