Hi Matt and Lourival,

Matt, thank you for the recrawl script. Any plans to commit it to trunk?

Lourival, here is the part of the script that "reloads Tomcat". It's not the cleanest approach, but it should work:

# Tell Tomcat to reload index
touch $nutch_dir/WEB-INF/web.xml

HTH,
Renaud

Lourival Júnior wrote:
> Hi Matt!
>
> In the article found at
> http://today.java.net/pub/a/today/2006/02/16/introduction-to-nutch-2.html
> you said the re-crawl script has a problem with updating the live search
> index. In my tests with Nutch version 0.7.2, when I run the script the
> index cannot be updated because Tomcat loads it into memory. Could you
> suggest a modification to this script or to the NutchBean that accepts
> modifications to the index without restarting Tomcat? (Currently, I run
> net stop "Apache Tomcat" before updating the index.)
>
> Thanks
>
> On 7/21/06, Matthew Holt <[EMAIL PROTECTED]> wrote:
>>
>> Thanks for putting up with all the messages to the list... Here is the
>> recrawl script for 0.8.0 if anyone is interested.
>> Matt
>> -------------------------------
>>
>> #!/bin/bash
>>
>> # Nutch recrawl script.
>> # Based on 0.7.2 script at
>> # http://today.java.net/pub/a/today/2006/02/16/introduction-to-nutch-2.html
>> # Modified by Matthew Holt
>>
>> if [ -n "$1" ]
>> then
>>   crawl_dir=$1
>> else
>>   echo "Usage: recrawl crawl_dir [depth] [adddays]"
>>   exit 1
>> fi
>>
>> if [ -n "$2" ]
>> then
>>   depth=$2
>> else
>>   depth=5
>> fi
>>
>> if [ -n "$3" ]
>> then
>>   adddays=$3
>> else
>>   adddays=0
>> fi
>>
>> # EDIT THIS - Set the location of your nutch servlet container.
>> nutch_dir=/usr/local/apache-tomcat-5.5.17/webapps/nutch/
>>
>> # No need to edit anything past this line #
>> webdb_dir=$crawl_dir/crawldb
>> segments_dir=$crawl_dir/segments
>> linkdb_dir=$crawl_dir/linkdb
>> index_dir=$crawl_dir/index
>>
>> # The generate/fetch/update cycle
>> for ((i=1; i <= depth ; i++))
>> do
>>   bin/nutch generate $webdb_dir $segments_dir -adddays $adddays
>>   segment=`ls -d $segments_dir/* | tail -1`
>>   bin/nutch fetch $segment
>>   bin/nutch updatedb $webdb_dir $segment
>> done
>>
>> # Update segments
>> bin/nutch invertlinks $linkdb_dir -dir $segments_dir
>>
>> # Index segments
>> new_indexes=$crawl_dir/newindexes
>> #ls -d $segments_dir/* | tail -$depth | xargs
>> bin/nutch index $new_indexes $webdb_dir $linkdb_dir $segments_dir/*
>>
>> # De-duplicate indexes
>> bin/nutch dedup $new_indexes
>>
>> # Merge indexes
>> bin/nutch merge $index_dir $new_indexes
>>
>> # Tell Tomcat to reload index
>> touch $nutch_dir/WEB-INF/web.xml
>>
>> # Clean up
>> rm -rf $new_indexes
>>
>

-- 
Renaud Richardet
COO America
Wyona Inc. - Open Source Content Management - Apache Lenya
office +1 857 776-3195 mobile +1 617 230 9112
renaud.richardet <at> wyona.com
http://www.wyona.com
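For reference, the "reload" step in the script can be wrapped in a small function that fails loudly when the path is wrong. This is only a minimal sketch: the function name reload_webapp is hypothetical (it is not part of Nutch or Tomcat), and it assumes Tomcat is configured to watch WEB-INF/web.xml and reload the webapp when that file's timestamp changes.

```shell
#!/bin/bash

# Hypothetical helper (not part of Nutch or Tomcat): ask the servlet
# container to reload a webapp by updating the modification time of its
# WEB-INF/web.xml. Assumes the container watches that file for changes.
reload_webapp() {
    local webapp_dir=${1%/}                    # drop one trailing slash, if present
    local descriptor="$webapp_dir/WEB-INF/web.xml"

    # A wrong nutch_dir would otherwise create a stray web.xml or fail quietly.
    if [ ! -f "$descriptor" ]; then
        echo "reload_webapp: $descriptor not found" >&2
        return 1
    fi

    touch "$descriptor"
}
```

In the recrawl script, the last step would then read `reload_webapp "$nutch_dir" || exit 1` in place of the bare touch, so a mistyped nutch_dir is reported instead of silently leaving the old index live.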
