Hi,

Maybe you can use something like this to check the return code from the previous command:
$NUTCH_HOME/bin/nutch crawl urls -dir $crawldb -solr $solrurl -depth $depth
rc=$?
if [ $rc -ne 0 ]; then
    exit $rc
fi
$NUTCH_HOME/bin/nutch solrindex $solrurl $crawldb/crawldb/ -linkdb

(Note that the exit status has to be saved into a variable first: by the time "exit $?" runs, $? already holds the result of the [ ] test, not of the crawl command.)

Also, the bin/nutch crawl command is DEPRECATED; please use the crawl script instead:

gxl@gxl-desktop:~/workspace/java/nutch-svn/bin$ ./crawl
Missing seedDir : crawl <seedDir> <crawlDir> <solrURL> <numberOfRounds>

On Thu, Mar 14, 2013 at 12:17 PM, David Philip <[email protected]> wrote:

> Hi,
>
> While running the crawl command, the error below occurred, and so indexing
> of the other URLs that were fetched successfully failed.
> Can you please tell me if there is any way, in the crawl script [below],
> to continue crawling even when such an error occurs?
>
> I think this error occurred because a crawl initiated some time back was
> stopped abruptly, so it created a segment folder without its respective
> sub-folders. The next time the crawl command was re-run, it gave the error
> below. What is the best way to handle this error so that the crawl
> continues?
>
> *Error:*
> SolrIndexer: starting at 2013-03-13 23:21:30
> SolrIndexer: deleting gone documents: true
> SolrIndexer: URL filtering: false
> SolrIndexer: URL normalizing: false
> org.apache.hadoop.mapred.InvalidInputException: Input path does not exist:
> file:/home/ubuntu/Downloads/apache-nutch-1.6/crawlService/segments/20130313140839/crawl_fetch
> Input path does not exist:
> file:/home/ubuntu/Downloads/apache-nutch-1.6/crawlService/segments/20130313140839/crawl_parse
> Input path does not exist:
> file:/home/ubuntu/Downloads/apache-nutch-1.6/crawlService/segments/20130313140839/parse_data
> Input path does not exist:
> file:/home/ubuntu/Downloads/apache-nutch-1.6/crawlService/segments/20130313140839/parse_text
> FINISHED: Crawl completed
>
> *Script I am using:*
> export JAVA_HOME=/usr/lib/jvm/java-6-openjdk
> export NUTCH_HOME=/home/ubuntu/Downloads/apache-nutch-1.6
> depth=1
> solrurl=http://xx.xx.xx.xx:8080/solrnutch
> crawldb=$NUTCH_HOME/crawlService
>
> $NUTCH_HOME/bin/nutch crawl urls -dir $crawldb -solr $solrurl -depth $depth
>
> $NUTCH_HOME/bin/nutch solrindex $solrurl $crawldb/crawldb/ -linkdb
> $crawldb/linkdb -dir $crawldb/segments/ *-deleteGone*
>
> echo "FINISHED: Crawl completed!"
>
> *Note:* I know that writing a script to call the commands individually is
> best, but I started with the crawl command, so I was working with it only.
> If a script using the individual commands can handle this exception, let
> me know.
>
> Thanks - David
> --
> Don't Grow Old, Grow Up... :-)
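On the original question of continuing past an incomplete segment: one workaround is to move broken segment directories aside before running solrindex. Below is a minimal, hedged sketch, not anything shipped with Nutch. The function name and the "move aside" policy are my own; the list of required sub-folders (crawl_fetch, crawl_parse, parse_data, parse_text) is taken from the "Input path does not exist" messages above.

```shell
#!/bin/sh
# Sketch: quarantine any segment directory that is missing the sub-folders
# solrindex reads, so that indexing of the remaining segments can proceed.
prune_incomplete_segments() {
    segdir=$1
    for seg in "$segdir"/*; do
        [ -d "$seg" ] || continue
        for sub in crawl_fetch crawl_parse parse_data parse_text; do
            if [ ! -d "$seg/$sub" ]; then
                echo "skipping incomplete segment: $seg (missing $sub)"
                # Move aside rather than delete, so it can be inspected later.
                mv "$seg" "$seg.incomplete"
                break
            fi
        done
    done
}
```

You would call it just before the indexing step, e.g. `prune_incomplete_segments "$crawldb/segments"`, and then pass `-dir $crawldb/segments/` to solrindex as before.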

