Do you mean that this error occurs only on Windows? I haven't tested on Linux yet. Does anyone have a solution for this problem on Windows/Tomcat?
On 7/25/06, Thomas Delnoij <[EMAIL PROTECTED]> wrote:
Lourival. I have typically seen the same issues on a cygwin/windows setup. The
only thing that worked for me was shutting down and restarting Tomcat, instead
of just reloading the context. On Linux I don't have these issues anymore.

Rgrds, Thomas

On 7/21/06, Lourival Júnior <[EMAIL PROTECTED]> wrote:
> Ok. However, a few minutes ago I ran the script exactly as you said and I
> still get this error:
>
> Exception in thread "main" java.io.IOException: Cannot delete _0.f0
>         at org.apache.lucene.store.FSDirectory.create(FSDirectory.java:195)
>         at org.apache.lucene.store.FSDirectory.init(FSDirectory.java:176)
>         at org.apache.lucene.store.FSDirectory.getDirectory(FSDirectory.java:141)
>         at org.apache.lucene.index.IndexWriter.<init>(IndexWriter.java:225)
>         at org.apache.nutch.indexer.IndexMerger.merge(IndexMerger.java:92)
>         at org.apache.nutch.indexer.IndexMerger.main(IndexMerger.java:160)
>
> I don't know, but I think it occurs because Nutch tries to delete some file
> that Tomcat has loaded into memory, giving a permission/access error. Any
> idea?
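Lourival's hunch is the likely root cause: on Windows a file cannot be deleted while another process still has it open, so when Tomcat's searcher holds the old index files, IndexMerger's delete fails; on Linux the unlink succeeds and the old files vanish once the last handle closes, which is why Thomas no longer sees the problem there. One way to shrink the Tomcat downtime, instead of stopping it for the whole recrawl, is to merge into a scratch directory Tomcat has never opened and stop the service only for the final swap. A minimal sketch, untested, reusing the 0.7.2 merge invocation, the crawl directory, and the "Apache Tomcat" service name that appear later in this thread:

#!/bin/bash
# Sketch only: merge into a fresh directory so IndexMerger never has
# to delete files that Tomcat still holds open.
crawl_dir=crawl-legislacao         # assumption: crawl dir from this thread
new_index=$crawl_dir/index.new     # scratch dir Tomcat has never opened

# IndexMerger writes into the fresh directory; no locked files involved.
ls -d $crawl_dir/segments/* | xargs bin/nutch merge $new_index

# The swap still replaces files Tomcat has open, so on Windows the
# service must be down, but only for this brief window.
net stop "Apache Tomcat"
rm -rf $crawl_dir/index
mv $new_index $crawl_dir/index
net start "Apache Tomcat"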
> On 7/21/06, Matthew Holt <[EMAIL PROTECTED]> wrote:
> > Lourival Júnior wrote:
> > > I think it won't work for me because I'm using Nutch version 0.7.2.
> > > Actually I use this script (comments originally in Portuguese):
> > >
> > > #!/bin/bash
> > >
> > > # A simple script to run a Nutch re-crawl
> > > # Script source:
> > > # http://today.java.net/pub/a/today/2006/02/16/introduction-to-nutch-2.html
> > >
> > > #{
> > >
> > > if [ -n "$1" ]
> > > then
> > >   crawl_dir=$1
> > > else
> > >   echo "Usage: recrawl crawl_dir [depth] [adddays]"
> > >   exit 1
> > > fi
> > >
> > > if [ -n "$2" ]
> > > then
> > >   depth=$2
> > > else
> > >   depth=5
> > > fi
> > >
> > > if [ -n "$3" ]
> > > then
> > >   adddays=$3
> > > else
> > >   adddays=0
> > > fi
> > >
> > > webdb_dir=$crawl_dir/db
> > > segments_dir=$crawl_dir/segments
> > > index_dir=$crawl_dir/index
> > >
> > > # Stop the Tomcat service
> > > #net stop "Apache Tomcat"
> > >
> > > # The generate/fetch/update cycle
> > > for ((i=1; i <= depth ; i++))
> > > do
> > >   bin/nutch generate $webdb_dir $segments_dir -adddays $adddays
> > >   segment=`ls -d $segments_dir/* | tail -1`
> > >   bin/nutch fetch $segment
> > >   bin/nutch updatedb $webdb_dir $segment
> > >   echo
> > >   echo "End of cycle $i."
> > >   echo
> > > done
> > >
> > > # Update segments
> > > echo
> > > echo "Updating segments..."
> > > echo
> > > mkdir tmp
> > > bin/nutch updatesegs $webdb_dir $segments_dir tmp
> > > rm -R tmp
> > >
> > > # Index segments
> > > echo "Indexing segments..."
> > > echo
> > > for segment in `ls -d $segments_dir/* | tail -$depth`
> > > do
> > >   bin/nutch index $segment
> > > done
> > >
> > > # De-duplicate indexes
> > > # "bogus" argument is ignored but needed due to
> > > # a bug in the number of args expected
> > > bin/nutch dedup $segments_dir bogus
> > >
> > > # Merge indexes
> > > #echo "Merging the segments..."
> > > #echo
> > > ls -d $segments_dir/* | xargs bin/nutch merge $index_dir
> > >
> > > chmod -R 777 $index_dir
> > >
> > > # Start the Tomcat service
> > > #net start "Apache Tomcat"
> > >
> > > echo "Done."
> > >
> > > #} > recrawl.log 2>&1
> > >
> > > As you suggested, I used the touch command instead of stopping Tomcat.
> > > However, I get the error posted in the previous message. I'm running
> > > Nutch on the Windows platform with cygwin. I only get no errors when I
> > > stop Tomcat. I use this command to call the script:
> > >
> > > ./recrawl crawl-legislacao 1
> > >
> > > Could you give me more clarifications?
> > >
> > > Thanks a lot!
> > >
> > > On 7/21/06, Matthew Holt <[EMAIL PROTECTED]> wrote:
> > >> Lourival Júnior wrote:
> > >> > Hi Renaud!
> > >> >
> > >> > I'm a newbie with shell scripts and I know stopping the Tomcat
> > >> > service is not the best way to do this. The problem is, when I run
> > >> > the re-crawl script with Tomcat started I get this error:
> > >> >
> > >> > 060721 132224 merging segment indexes to: crawl-legislacao2\index
> > >> > Exception in thread "main" java.io.IOException: Cannot delete _0.f0
> > >> >         at org.apache.lucene.store.FSDirectory.create(FSDirectory.java:195)
> > >> >         at org.apache.lucene.store.FSDirectory.init(FSDirectory.java:176)
> > >> >         at org.apache.lucene.store.FSDirectory.getDirectory(FSDirectory.java:141)
> > >> >         at org.apache.lucene.index.IndexWriter.<init>(IndexWriter.java:225)
> > >> >         at org.apache.nutch.indexer.IndexMerger.merge(IndexMerger.java:92)
> > >> >         at org.apache.nutch.indexer.IndexMerger.main(IndexMerger.java:160)
> > >> >
> > >> > So, I want another way to re-crawl my pages without this error and
> > >> > without restarting Tomcat. Could you suggest one?
> > >> >
> > >> > Thanks a lot!
> > >>
> > >> Try this updated script and tell me what command exactly you run to
> > >> call the script. Let me know the error message then.
> > >>
> > >> Matt
> > >>
> > >> #!/bin/bash
> > >>
> > >> # Nutch recrawl script.
> > >> # Based on the 0.7.2 script at
> > >> # http://today.java.net/pub/a/today/2006/02/16/introduction-to-nutch-2.html
> > >> # Modified by Matthew Holt
> > >>
> > >> if [ -n "$1" ]
> > >> then
> > >>   nutch_dir=$1
> > >> else
> > >>   echo "Usage: recrawl servlet_path crawl_dir [depth] [adddays]"
> > >>   echo "servlet_path - Path of the nutch servlet (i.e. /usr/local/tomcat/webapps/ROOT)"
> > >>   echo "crawl_dir - Name of the directory the crawl is located in."
> > >>   echo "[depth] - The link depth from the root page that should be crawled."
> > >>   echo "[adddays] - Advance the clock # of days for fetchlist generation."
> > >>   exit 1
> > >> fi
> > >>
> > >> if [ -n "$2" ]
> > >> then
> > >>   crawl_dir=$2
> > >> else
> > >>   echo "Usage: recrawl servlet_path crawl_dir [depth] [adddays]"
> > >>   echo "servlet_path - Path of the nutch servlet (i.e. /usr/local/tomcat/webapps/ROOT)"
> > >>   echo "crawl_dir - Name of the directory the crawl is located in."
> > >>   echo "[depth] - The link depth from the root page that should be crawled."
> > >>   echo "[adddays] - Advance the clock # of days for fetchlist generation."
> > >>   exit 1
> > >> fi
> > >>
> > >> if [ -n "$3" ]
> > >> then
> > >>   depth=$3
> > >> else
> > >>   depth=5
> > >> fi
> > >>
> > >> if [ -n "$4" ]
> > >> then
> > >>   adddays=$4
> > >> else
> > >>   adddays=0
> > >> fi
> > >>
> > >> # Only change if your crawl subdirectories are named something different
> > >> webdb_dir=$crawl_dir/crawldb
> > >> segments_dir=$crawl_dir/segments
> > >> linkdb_dir=$crawl_dir/linkdb
> > >> index_dir=$crawl_dir/index
> > >>
> > >> # The generate/fetch/update cycle
> > >> for ((i=1; i <= depth ; i++))
> > >> do
> > >>   bin/nutch generate $webdb_dir $segments_dir -adddays $adddays
> > >>   segment=`ls -d $segments_dir/* | tail -1`
> > >>   bin/nutch fetch $segment
> > >>   bin/nutch updatedb $webdb_dir $segment
> > >> done
> > >>
> > >> # Update segments
> > >> bin/nutch invertlinks $linkdb_dir -dir $segments_dir
> > >>
> > >> # Index segments
> > >> new_indexes=$crawl_dir/newindexes
> > >> #ls -d $segments_dir/* | tail -$depth | xargs
> > >> bin/nutch index $new_indexes $webdb_dir $linkdb_dir $segments_dir/*
> > >>
> > >> # De-duplicate indexes
> > >> bin/nutch dedup $new_indexes
> > >>
> > >> # Merge indexes
> > >> bin/nutch merge $index_dir $new_indexes
> > >>
> > >> # Tell Tomcat to reload index
> > >> touch $nutch_dir/WEB-INF/web.xml
> > >>
> > >> # Clean up
> > >> rm -rf $new_indexes
> >
> > Oh yeah, you're right, the one I sent out was for 0.8. You should just
> > be able to put this at the end of your script:
> >
> > # Tell Tomcat to reload index
> > touch $nutch_dir/WEB-INF/web.xml
> >
> > and fill in the appropriate path, of course.
> >
> > gluck
> > matt
>
> --
> Lourival Junior
> Universidade Federal do Pará
> Curso de Bacharelado em Sistemas de Informação
> http://www.ufpa.br/cbsi
> Msn: [EMAIL PROTECTED]
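Matt's touch trick works because Tomcat notices the changed timestamp on web.xml and reloads the context, which makes the Nutch webapp reopen the index. As Thomas reports above, a context reload is often not enough on Windows, because the old searcher keeps the index files locked, so a helper along these lines may be needed; a sketch, assuming a standard Tomcat layout and the "Apache Tomcat" service name used earlier in this thread (neither verified here):

# Reload the Nutch webapp after a recrawl (sketch, untested).
# $1 - deployed webapp path, e.g. /usr/local/tomcat/webapps/ROOT
reload_index() {
    nutch_dir=$1
    case "$(uname -s)" in
        CYGWIN*)
            # On Windows the running searcher keeps index files locked,
            # so a context reload is not enough; restart the service.
            net stop "Apache Tomcat"
            net start "Apache Tomcat"
            ;;
        *)
            # Elsewhere, touching web.xml makes Tomcat reload the
            # context and reopen the already-replaced index.
            touch "$nutch_dir/WEB-INF/web.xml"
            ;;
    esac
}

reload_index /usr/local/tomcat/webapps/ROOT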
--
Lourival Junior
Universidade Federal do Pará
Curso de Bacharelado em Sistemas de Informação
http://www.ufpa.br/cbsi
Msn: [EMAIL PROTECTED]