Renaud Richardet wrote:
> Hi Matt and Lourival,
>
> Matt, thank you for the recrawl script. Any plans to commit it to trunk?
>
> Lourival, here is the part of the script that "reloads Tomcat". It's not
> the cleanest approach, but it should work:
> # Tell Tomcat to reload index
> touch $nutch_dir/WEB-INF/web.xml
>
> HTH,
> Renaud
>
>
> Lourival Júnior wrote:
>> Hi Matt!
>>
>> In the article found at
>> http://today.java.net/pub/a/today/2006/02/16/introduction-to-nutch-2.html
>> you said the re-crawl script has a problem with updating the live search
>> index. In my tests with Nutch version 0.7.2, when I run the script the
>> index cannot be updated because Tomcat has it loaded in memory. Could you
>> suggest a modification to this script or to the NutchBean that accepts
>> modifications to the index without restarting Tomcat? (Currently, I run
>> net stop "Apache Tomcat" before updating the index.)
>>
>> Thanks
>>
>> On 7/21/06, Matthew Holt <[EMAIL PROTECTED]> wrote:
>>>
>>> Thanks for putting up with all the messages to the list... Here is the
>>> recrawl script for 0.8.0 if anyone is interested.
>>>         Matt
>>> -------------------------------
>>>
>>> #!/bin/bash
>>>
>>> # Nutch recrawl script.
>>> # Based on 0.7.2 script at
>>> # http://today.java.net/pub/a/today/2006/02/16/introduction-to-nutch-2.html
>>>
>>> # Modified by Matthew Holt
>>>
>>> if [ -n "$1" ]
>>> then
>>>   crawl_dir=$1
>>> else
>>>   echo "Usage: recrawl crawl_dir [depth] [adddays]"
>>>   exit 1
>>> fi
>>>
>>> if [ -n "$2" ]
>>> then
>>>   depth=$2
>>> else
>>>   depth=5
>>> fi
>>>
>>> if [ -n "$3" ]
>>> then
>>>   adddays=$3
>>> else
>>>   adddays=0
>>> fi
>>>
>>>
>>> # EDIT THIS - Set the location of the nutch webapp in your servlet container.
>>> nutch_dir=/usr/local/apache-tomcat-5.5.17/webapps/nutch/
>>>
>>> # No need to edit anything past this line #
>>> webdb_dir=$crawl_dir/crawldb
>>> segments_dir=$crawl_dir/segments
>>> linkdb_dir=$crawl_dir/linkdb
>>> index_dir=$crawl_dir/index
>>>
>>> # The generate/fetch/update cycle
>>> for ((i=1; i <= depth ; i++))
>>> do
>>>   bin/nutch generate "$webdb_dir" "$segments_dir" -adddays "$adddays"
>>>   segment=$(ls -d "$segments_dir"/* | tail -1)
>>>   bin/nutch fetch "$segment"
>>>   bin/nutch updatedb "$webdb_dir" "$segment"
>>> done
>>>
>>> # Update the link database from the segments
>>> bin/nutch invertlinks $linkdb_dir -dir $segments_dir
>>>
>>> # Index segments
>>> new_indexes=$crawl_dir/newindexes
>>> #ls -d $segments_dir/* | tail -$depth | xargs
>>> bin/nutch index $new_indexes $webdb_dir $linkdb_dir $segments_dir/*
>>>
>>> # De-duplicate indexes
>>> bin/nutch dedup $new_indexes
>>>
>>> # Merge indexes
>>> bin/nutch merge $index_dir $new_indexes
>>>
>>> # Tell Tomcat to reload index
>>> touch $nutch_dir/WEB-INF/web.xml
>>>
>>> # Clean up
>>> rm -rf $new_indexes
>>>
>>>
>>
>>
>
I'll commit it to trunk. I just have to modify it a little so users don't
have to edit the Tomcat location in the file and can pass it on the
command line instead. I'm kind of busy at work with this right now, so I'll
follow up later regarding the commit.
Matt
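
For anyone who wants to try this before it is committed, here is a minimal
sketch of how the Tomcat location could be passed on the command line. It is
not the committed version. The function names resolve_webapp_dir and
reload_webapp and the NUTCH_WEBAPP_DIR environment variable are hypothetical
and are not part of Nutch. It also assumes that Tomcat is configured to
reload the webapp when WEB-INF/web.xml changes, which is what the touch line
in the script relies on.

```shell
#!/bin/bash

# Hypothetical helper: pick the webapp directory from the first argument,
# falling back to the NUTCH_WEBAPP_DIR environment variable (an assumption,
# not a Nutch setting).
resolve_webapp_dir() {
  local dir=${1:-${NUTCH_WEBAPP_DIR:-}}
  # Strip one trailing slash so later paths do not contain "//"
  dir=${dir%/}
  printf '%s\n' "$dir"
}

# Hypothetical helper: touch WEB-INF/web.xml so the servlet container
# notices the change. Returns 1 without touching anything if the file is
# missing, so a wrong path is reported instead of silently creating files.
reload_webapp() {
  local dir descriptor
  dir=$(resolve_webapp_dir "${1:-}")
  descriptor="$dir/WEB-INF/web.xml"
  if [ -z "$dir" ] || [ ! -f "$descriptor" ]; then
    echo "web.xml not found under '$dir'; skipping reload" >&2
    return 1
  fi
  touch "$descriptor"
}
```

In the recrawl script, the hard-coded nutch_dir and the touch line could then
be replaced by a call such as reload_webapp "$4", with the usage message
extended to mention the extra argument.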

_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general
