Hi Matt and Lourival,

Matt, thank you for the recrawl script. Any plans to commit it to trunk?

Lourival, here is the part of the script that "reloads Tomcat". It's not
the cleanest approach, but it should work:
# Tell Tomcat to reload index
touch $nutch_dir/WEB-INF/web.xml
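
If you want something a bit more defensive, here is a rough sketch of the same idea wrapped in a function. The name reload_index is just something I made up for illustration; it is not part of Matt's script. It assumes Tomcat is watching WEB-INF/web.xml for changes (which I believe is the default setup), so touching the file makes it reload the webapp and reopen the index.

```shell
#!/bin/bash

# Hypothetical helper (not part of the original recrawl script).
# Touches WEB-INF/web.xml under the given webapp directory so that Tomcat,
# assuming it watches that file, reloads the webapp.
reload_index() {
  # Strip one trailing slash so we don't build a path with "//" in it
  local webapp_dir=${1%/}
  local web_xml="$webapp_dir/WEB-INF/web.xml"

  # Refuse to create a stray web.xml if the path is wrong
  if [ ! -f "$web_xml" ]; then
    echo "web.xml not found: $web_xml" >&2
    return 1
  fi

  touch "$web_xml"
}
```

You would then call it as reload_index "$nutch_dir" at the end of the script. The only real difference from the plain touch is that it fails loudly when nutch_dir points to the wrong place instead of silently creating a new file.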

HTH,
Renaud


Lourival Júnior wrote:
> Hi Matt!
>
> In the article found at
> http://today.java.net/pub/a/today/2006/02/16/introduction-to-nutch-2.html
> you said the re-crawl script has a problem with updating the live search
> index. In my tests with Nutch version 0.7.2, when I run the script the
> index cannot be updated because Tomcat loads it into memory. Could you
> suggest a modification to this script or to the NutchBean that accepts
> modifications to the index without restarting Tomcat? (Currently I run
> net stop "Apache Tomcat" before updating the index.)
>
> Thanks
>
> On 7/21/06, Matthew Holt <[EMAIL PROTECTED]> wrote:
>>
>> Thanks for putting up with all the messages to the list... Here is the
>> recrawl script for 0.8.0 if anyone is interested.
>>         Matt
>> -------------------------------
>>
>> #!/bin/bash
>>
>> # Nutch recrawl script.
>> # Based on 0.7.2 script at
>> # http://today.java.net/pub/a/today/2006/02/16/introduction-to-nutch-2.html
>> # Modified by Matthew Holt
>>
>> if [ -n "$1" ]
>> then
>>   crawl_dir=$1
>> else
>>   echo "Usage: recrawl crawl_dir [depth] [adddays]"
>>   exit 1
>> fi
>>
>> if [ -n "$2" ]
>> then
>>   depth=$2
>> else
>>   depth=5
>> fi
>>
>> if [ -n "$3" ]
>> then
>>   adddays=$3
>> else
>>   adddays=0
>> fi
>>
>>
>> # EDIT THIS - Set the location of the nutch webapp in your servlet container.
>> nutch_dir=/usr/local/apache-tomcat-5.5.17/webapps/nutch/
>>
>> # No need to edit anything past this line #
>> webdb_dir=$crawl_dir/crawldb
>> segments_dir=$crawl_dir/segments
>> linkdb_dir=$crawl_dir/linkdb
>> index_dir=$crawl_dir/index
>>
>> # The generate/fetch/update cycle
>> for ((i=1; i <= depth ; i++))
>> do
>>   bin/nutch generate $webdb_dir $segments_dir -adddays $adddays
>>   segment=`ls -d $segments_dir/* | tail -1`
>>   bin/nutch fetch $segment
>>   bin/nutch updatedb $webdb_dir $segment
>> done
>>
>> # Update the link database from the segments
>> bin/nutch invertlinks $linkdb_dir -dir $segments_dir
>>
>> # Index segments
>> new_indexes=$crawl_dir/newindexes
>> #ls -d $segments_dir/* | tail -$depth | xargs
>> bin/nutch index $new_indexes $webdb_dir $linkdb_dir $segments_dir/*
>>
>> # De-duplicate indexes
>> bin/nutch dedup $new_indexes
>>
>> # Merge indexes
>> bin/nutch merge $index_dir $new_indexes
>>
>> # Tell Tomcat to reload index
>> touch $nutch_dir/WEB-INF/web.xml
>>
>> # Clean up
>> rm -rf $new_indexes
>>
>>
>
>

-- 
Renaud Richardet
COO America
Wyona Inc.  -   Open Source Content Management   -   Apache Lenya
office +1 857 776-3195                     mobile +1 617 230 9112
renaud.richardet <at> wyona.com              http://www.wyona.com


_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general
