Isn't the whole point of segments to be able to work on them simultaneously? That's the problem I've been having with them.
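If the goal is to work on several segments at once, one way to sketch that in bash is to background one fetch job per segment and wait for all of them. This is a minimal, hypothetical sketch, not Markus's suggestion verbatim: the `NUTCH` variable, the `fetch_segments_in_parallel` function name, and the `crawl/segments` layout are assumptions, and `NUTCH` is overridable so the loop logic can be exercised without a Nutch install.

```shell
#!/usr/bin/env bash
# Sketch: run one background "nutch fetch" per generated segment, then
# wait for all of them before moving on to parsing/updating.
# NUTCH defaults to the usual Nutch 1.x launcher but can be overridden.
NUTCH="${NUTCH:-bin/nutch}"

fetch_segments_in_parallel() {
    local pids=()
    for SEGMENT in crawl/segments/*; do
        [ -d "$SEGMENT" ] || continue     # skip if no segments exist yet
        "$NUTCH" fetch "$SEGMENT" &       # one background fetcher per segment
        pids+=("$!")
    done
    if [ "${#pids[@]}" -gt 0 ]; then
        wait "${pids[@]}"                 # block until every fetch finishes
    fi
}
```

Whether parallel fetching of multiple local segments actually helps depends on politeness settings and fetcher threads; on a single host the fetchers compete for the same bandwidth and per-host crawl delays.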
On Mon, Jan 2, 2012 at 7:27 AM, Markus Jelsma <[email protected]> wrote:

> Use maxNumSegments on the generator to make more segments, then loop
> through them with bash and fetch and parse.
>
> On Thursday 29 December 2011 21:29:02 Bai Shen wrote:
> > Currently, I'm using a shell script to run my Nutch crawl. It seems to
> > work okay, but it only generates one segment at a time. Does anybody
> > have any suggestions for how to improve it, make it work with multiple
> > segments, etc?
> >
> > Thanks.
> >
> > while true
> > do
> >     bin/nutch generate crawl/crawldb crawl/segments -topN 10000 -noFilter -noNorm
> >     export SEGMENT=`hadoop fs -ls crawl/segments | tail -1 | awk '{print $8}'`
> >     bin/nutch fetch $SEGMENT
> >     bin/nutch parse $SEGMENT
> >     bin/nutch updatedb crawl/crawldb $SEGMENT
> >     bin/nutch invertlinks crawl/linkdb $SEGMENT
> >     bin/nutch solrindex http://solr:8080/solr crawl/crawldb -linkdb crawl/linkdb $SEGMENT
> >     bin/nutch solrdedup http://solr:8080/solr
> >     hadoop fs -mv $SEGMENT crawl/old
> > done
>
> --
> Markus Jelsma - CTO - Openindex
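Markus's suggestion (generate several segments in one pass, then loop over them) can be sketched roughly like this. The `-maxNumSegments` flag is the generator option he refers to; the `NUTCH` variable, the `crawl_round` function name, and the segment count of 4 are illustrative assumptions, and `NUTCH` is overridable so the loop can be tested without a real Nutch install.

```shell
#!/usr/bin/env bash
# Sketch: one generate pass producing up to 4 segments, then a sequential
# fetch/parse/update loop over every segment directory it created.
# Assumes the Nutch 1.x local-filesystem layout (crawl/crawldb, crawl/segments).
NUTCH="${NUTCH:-bin/nutch}"

crawl_round() {
    # -maxNumSegments tells the generator to emit several segments at once
    # instead of the single segment the original script gets per iteration.
    "$NUTCH" generate crawl/crawldb crawl/segments \
        -topN 10000 -maxNumSegments 4 -noFilter

    # Loop through the generated segments with bash, as suggested.
    for SEGMENT in crawl/segments/*; do
        [ -d "$SEGMENT" ] || continue   # skip the literal glob if empty
        "$NUTCH" fetch "$SEGMENT"
        "$NUTCH" parse "$SEGMENT"
        "$NUTCH" updatedb crawl/crawldb "$SEGMENT"
    done
}
```

Compared with the `hadoop fs -ls | tail -1` approach in the quoted script, iterating the glob visits every new segment rather than only the most recently listed one; processed segments would still need to be moved aside (as the original does with `hadoop fs -mv`) so they are not re-fetched next round.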

