Thank you,
 
Distributed was going to be my next step. You saved me some of the learning 
curve!

________________________________

From: Julien Nioche [mailto:[email protected]]
Sent: Fri 11/26/2010 2:03 AM
To: [email protected]
Subject: Re: No Such File or directory problem



> > If I do an echo $SEGMENT I get:
> >   "crawl/segments/ls -tr crawl/segments|tail -1"
> >
> > r...@nutchmaster:/usr/share/nutch# export SEGMENT=crawl/segments/'ls -tr crawl/segments|tail -1'
>
> The definition of SEGMENT does not seem right. I think that you are using
> single-quotes instead of back-ticks (grave accent), i.e., you should have
>  export SEGMENT=crawl/segments/`ls -tr crawl/segments|tail -1`
>
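
A small sketch of the difference between the two quoting styles. The temporary directory and the two segment names are made up for the demo, and it sorts by name instead of using 'ls -tr' because the demo directories are created at the same instant:

```shell
# Hypothetical demo directory standing in for crawl/segments
demo=$(mktemp -d)
mkdir -p "$demo/segments/20101125120000" "$demo/segments/20101126020300"

# Single quotes: the text is kept literally, nothing is executed
literal=$demo/segments/'ls -tr segments|tail -1'

# Back-ticks: the command runs and its output is substituted
latest=$demo/segments/`ls "$demo/segments" | sort | tail -1`

echo "single quotes: $literal"
echo "back-ticks:    $latest"
```

The first echo prints the command text itself, which matches the symptom reported above; the second prints the path of the newest demo segment.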

Using 'ls' is fine when crawling on a single machine with a local filesystem,
but if you use Nutch in distributed mode the segments are stored on the
distributed filesystem, so a local 'ls' won't be able to retrieve the
location.
It is better to use 'hadoop fs -ls' to locate the segments, e.g.:

  # capture the name of the segment
  SEGMENT=`hadoop fs -ls $CRAWL_PATH/segments/ | grep segments |
    sed -e "s/\//\\n/g" | egrep 20[0-9]+ | sort -n | tail -n 1`
  echo "Operating on segment : $SEGMENT"

This will work on whatever underlying FS (distributed or local) is set up in
your Nutch/Hadoop config.
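
To see what each stage of that pipeline does without a running cluster, here is a rough sketch that applies the same steps to a sample listing. The listing text (paths, owner, timestamps) is invented and only assumed to resemble what 'hadoop fs -ls' prints; 'tr' is swapped in for the 'sed' call, and the pattern is anchored so that only all-digit path components are kept:

```shell
# Hypothetical listing, in the general shape that 'hadoop fs -ls' prints
# (the paths, owner and timestamps are made up for illustration)
listing='Found 2 items
drwxr-xr-x   - nutch supergroup          0 2010-11-25 12:00 /user/nutch/crawl/segments/20101125120000
drwxr-xr-x   - nutch supergroup          0 2010-11-26 02:03 /user/nutch/crawl/segments/20101126020300'

# Keep the segment lines, split each path on "/", keep the components that
# are purely digits starting with 20, then take the numerically largest one
SEGMENT=$(printf '%s\n' "$listing" | grep segments | tr '/' '\n' \
  | grep -E '^20[0-9]+$' | sort -n | tail -n 1)

echo "Operating on segment : $SEGMENT"
```

Note that the result is the bare segment name (here 20101126020300), not the full path, so it still needs to be appended to $CRAWL_PATH/segments/ when passed to other commands.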
J.

--
Open Source Solutions for Text Engineering

http://digitalpebble.blogspot.com/
http://www.digitalpebble.com/

