Renaud Richardet wrote:
> Hi Matt and Lourival,
>
> Matt, thank you for the recrawl script. Any plans to commit it to trunk?
>
> Lourival, here's the part of the script that "reloads Tomcat". It's not
> the cleanest, but it should work:
>
> # Tell Tomcat to reload index
> touch $nutch_dir/WEB-INF/web.xml
>
> HTH,
> Renaud
>
>
> Lourival Júnior wrote:
>> Hi Matt!
>>
>> In the article found at
>> http://today.java.net/pub/a/today/2006/02/16/introduction-to-nutch-2.html
>> you said the re-crawl script has a problem with updating the live search
>> index. In my tests with Nutch version 0.7.2, when I run the script the
>> index cannot be updated because Tomcat loads it into memory. Could you
>> suggest a modification to this script or to the NutchBean that accepts
>> modifications to the index without restarting Tomcat? (Currently I run
>> net stop "Apache Tomcat" before updating the index.)
>>
>> Thanks
>>
>> On 7/21/06, Matthew Holt <[EMAIL PROTECTED]> wrote:
>>>
>>> Thanks for putting up with all the messages to the list... Here is the
>>> recrawl script for 0.8.0 if anyone is interested.
>>> Matt
>>> -------------------------------
>>>
>>> #!/bin/bash
>>>
>>> # Nutch recrawl script.
>>> # Based on 0.7.2 script at
>>> # http://today.java.net/pub/a/today/2006/02/16/introduction-to-nutch-2.html
>>>
>>> # Modified by Matthew Holt
>>>
>>> if [ -n "$1" ]
>>> then
>>>   crawl_dir=$1
>>> else
>>>   echo "Usage: recrawl crawl_dir [depth] [adddays]"
>>>   exit 1
>>> fi
>>>
>>> if [ -n "$2" ]
>>> then
>>>   depth=$2
>>> else
>>>   depth=5
>>> fi
>>>
>>> if [ -n "$3" ]
>>> then
>>>   adddays=$3
>>> else
>>>   adddays=0
>>> fi
>>>
>>>
>>> # EDIT THIS - List the location to your nutch servlet container.
>>> nutch_dir=/usr/local/apache-tomcat-5.5.17/webapps/nutch/
>>>
>>> # No need to edit anything past this line #
>>> webdb_dir=$crawl_dir/crawldb
>>> segments_dir=$crawl_dir/segments
>>> linkdb_dir=$crawl_dir/linkdb
>>> index_dir=$crawl_dir/index
>>>
>>> # The generate/fetch/update cycle
>>> for ((i=1; i <= depth ; i++))
>>> do
>>>   bin/nutch generate $webdb_dir $segments_dir -adddays $adddays
>>>   segment=`ls -d $segments_dir/* | tail -1`
>>>   bin/nutch fetch $segment
>>>   bin/nutch updatedb $webdb_dir $segment
>>> done
>>>
>>> # Update segments
>>> bin/nutch invertlinks $linkdb_dir -dir $segments_dir
>>>
>>> # Index segments
>>> new_indexes=$crawl_dir/newindexes
>>> #ls -d $segments_dir/* | tail -$depth | xargs
>>> bin/nutch index $new_indexes $webdb_dir $linkdb_dir $segments_dir/*
>>>
>>> # De-duplicate indexes
>>> bin/nutch dedup $new_indexes
>>>
>>> # Merge indexes
>>> bin/nutch merge $index_dir $new_indexes
>>>
>>> # Tell Tomcat to reload index
>>> touch $nutch_dir/WEB-INF/web.xml
>>>
>>> # Clean up
>>> rm -rf $new_indexes
>>>
>>>
>>
>>
>

I'll commit it to trunk. I just have to modify it a little so users don't
have to edit the Tomcat location in the file and can pass it on the command
line instead. Kinda busy at work with this right now, so I'll follow up
later regarding the commit.

Matt
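For anyone who wants the command-line version before the commit lands, here is a minimal sketch of how the argument handling might look. It is an assumption, not the committed script: the argument order (servlet container location first) and the function names parse_recrawl_args and reload_index are hypothetical. The defaults for depth and adddays match the script quoted above.

```shell
#!/bin/bash
# Sketch only: the argument order (nutch_dir first) and both function
# names are assumptions for illustration, not the script committed to Nutch.

# Parse "recrawl nutch_dir crawl_dir [depth] [adddays]" into global variables.
parse_recrawl_args() {
  if [ -z "$1" ] || [ -z "$2" ]; then
    echo "Usage: recrawl nutch_dir crawl_dir [depth] [adddays]" >&2
    return 1
  fi
  nutch_dir=${1%/}   # servlet container webapp dir; drop one trailing slash
  crawl_dir=$2
  depth=${3:-5}      # same default as the quoted script
  adddays=${4:-0}    # same default as the quoted script
}

# Touch web.xml, the same trick the quoted script uses to make Tomcat
# reload the webapp and pick up the merged index.
reload_index() {
  touch "$1/WEB-INF/web.xml"
}
```

The body of the script would then start with parse_recrawl_args "$@" || exit 1, run the generate/fetch/update cycle unchanged, and finish with reload_index "$nutch_dir" in place of the hard-coded path.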
_______________________________________________
Nutch-general mailing list
https://lists.sourceforge.net/lists/listinfo/nutch-general
