Thank you. Distributed mode was going to be my next step, so you saved me some of the learning curve!
________________________________
From: Julien Nioche [mailto:[email protected]]
Sent: Fri 11/26/2010 2:03 AM
To: [email protected]
Subject: Re: No Such File or directory problem

> > If I do an echo $SEGMENT I get : "crawl/segments/ls -tr
> > crawl/segments|tail -1"
> >
> > r...@nutchmaster:/usr/share/nutch# export SEGMENT=crawl/segments/'ls -tr
> > crawl/segments|tail -1'
>
> The definition of SEGMENT does not seem right. I think that you are using
> single-quotes instead of back-ticks (grave accent), i.e., you should have
>
> export SEGMENT=crawl/segments/`ls -tr crawl/segments|tail -1`
>

Using 'ls' is fine when crawling on a single machine with a local filesystem but if you use Nutch in distributed mode you won't be able to retrieve the location. Better to use 'hadoop fs -ls' to locate the segments e.g.

    # capture the name of the segment
    SEGMENT=`hadoop fs -ls $CRAWL_PATH/segments/ | grep segments | sed -e "s/\//\\n/g" | egrep 20[0-9]+ | sort -n | tail -n 1`
    echo "Operating on segment : $SEGMENT"

this will work on whatever underlying FS (distrib or local) is setup in your Nutch/Hadoop config

J.

--
Open Source Solutions for Text Engineering
http://digitalpebble.blogspot.com/
http://www.digitalpebble.com
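
For anyone reading this thread later, here is a small sketch of the two points above: single quotes leave the command as literal text while command substitution runs it, and the newest segment name can be picked out of a `hadoop fs -ls` style listing. The helper name `latest_segment` and the sample listing are made up for illustration, and the real listing layout may differ between Hadoop versions, so treat this as a starting point rather than a drop-in replacement for the pipeline quoted above.

```shell
#!/bin/sh
# Single quotes keep the text literal; $(...) (or back-ticks) runs the command.
LITERAL='ls -tr crawl/segments|tail -1'
SUBSTITUTED=$(printf '%s\n' 20101125010101 20101126020301 | tail -n 1)
echo "literal:     $LITERAL"
echo "substituted: $SUBSTITUTED"

# latest_segment (hypothetical helper): reads a listing in the style of
# `hadoop fs -ls` on stdin and prints the newest segment name.
# Assumes segment names are timestamps starting with "20", as in the
# egrep pattern quoted in the thread.
latest_segment() {
    awk -F/ '{print $NF}' | grep -E '^20[0-9]+$' | sort -n | tail -n 1
}

# Sample listing, invented for illustration; real output may differ in layout.
SAMPLE_LISTING='Found 2 items
drwxr-xr-x   - nutch supergroup          0 2010-11-25 01:01 /user/nutch/crawl/segments/20101125010101
drwxr-xr-x   - nutch supergroup          0 2010-11-26 02:03 /user/nutch/crawl/segments/20101126020301'

SEGMENT=$(printf '%s\n' "$SAMPLE_LISTING" | latest_segment)
echo "Operating on segment : $SEGMENT"
```

On a real installation the sample listing would be replaced by the live command, along the lines of `hadoop fs -ls "$CRAWL_PATH/segments/" | latest_segment`, with `CRAWL_PATH` set as in the message above.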

