Re: Very long time just before fetching and just after parsing

kemical Thu, 14 Feb 2013 00:27:08 -0800

Hi,

I didn't manage to run the invertlinks and solrindex commands on only some
segments, since it seems those commands only work on the segments parent dir.
So I made a small change to my fetch/parse/update/index loop.


*In short:*
I generate new segments in an empty "current_segments" dir. When the crawl
is done, I move the segments to the usual crawl/segments/ dir.


*My Code:*
# Round 1: generate a segment in the staging dir, then fetch, parse, update
bin/nutch generate crawl/crawldb current_segments -topN 50000
s1=`ls -d current_segments/2* | tail -1`
bin/nutch fetch $s1
bin/nutch parse $s1
bin/nutch updatedb crawl/crawldb $s1

# Round 2: same steps on the next generated segment
bin/nutch generate crawl/crawldb current_segments -topN 50000
s1=`ls -d current_segments/2* | tail -1`
bin/nutch fetch $s1
bin/nutch parse $s1
bin/nutch updatedb crawl/crawldb $s1

# Invert links and index only the segments from this crawl
bin/nutch invertlinks crawl/linkdb -dir current_segments
bin/nutch solrindex http://127.0.0.1:8983/solr/ crawl/crawldb -linkdb \
    crawl/linkdb current_segments/*

# Move the finished segments to the main segments dir
mv current_segments/* crawl/segments/
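
The "pick the newest segment" and "move the staged segments" steps above
could be wrapped in two small helpers. This is only a sketch: the function
names latest_segment and promote_segments are hypothetical (they are not part
of Nutch), and it assumes segment dir names start with "2" as in the ls
pattern above.

```shell
#!/bin/sh
# Sketch only: these helper names are made up for illustration.

# Print the most recent segment directory under $1. Segment names are
# timestamps, so the last entry in sorted order is the newest one.
latest_segment() {
    ls -d "$1"/2* 2>/dev/null | tail -1
}

# Move every staged segment directory from $1 into $2 (created if missing).
# Stops and returns non-zero as soon as one move fails.
promote_segments() {
    mkdir -p "$2" || return 1
    for seg in "$1"/2*; do
        [ -d "$seg" ] || continue   # skip the unexpanded pattern if empty
        mv "$seg" "$2"/ || return 1
    done
    return 0
}
```

With these, s1 can be set with s1=`latest_segment current_segments`, and the
final mv becomes "promote_segments current_segments crawl/segments". Chaining
it with && after the invertlinks and solrindex commands would keep segments
in the staging dir whenever indexing fails.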

*Conclusion / Question*
From my tests, I haven't seen anything go wrong doing it this way. Since it's
not really the way I found in the Nutch documentation, I'd like confirmation
from other users that there are no side effects.




--
View this message in context: 
http://lucene.472066.n3.nabble.com/Very-long-time-just-before-fetching-and-just-after-parsing-tp4037673p4040384.html
Sent from the Nutch - User mailing list archive at Nabble.com.