you should add the property below:
<property>
<name>hadoop.job.ugi</name>
<value>rider,iamsolomon</value>
</property>
that should do it!
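For reference, here is a minimal sketch of a complete config file carrying that property (the <configuration> root element is required; as far as I know the value is the job's user followed by comma-separated groups):

<?xml version="1.0"?>
<configuration>
  <!-- run Nutch/Hadoop jobs as user "rider" in group "iamsolomon" -->
  <property>
    <name>hadoop.job.ugi</name>
    <value>rider,iamsolomon</value>
  </property>
</configuration>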
2009/12/1 Mischa Tuffield <[email protected]>
> Hello Brian,
>
> You're getting a response from another newbie here, so I could be wrong (do
> excuse me if I am).
>
> If you are attempting to serve a search index from the local filesystem, you
> need to have the following in your nutch-site.xml:
>
> <property>
> <name>fs.default.name</name>
> <value>file:///</value>
> </property>
>
> The fs.default.name property is required in nutch-site.xml when you build your
> .war file for deployment to Tomcat. It should be accompanied by the config
> below, which should point to the directory your index has been copied to; in
> my case it looks something like this:
>
> <property>
> <name>searcher.dir</name>
> <value>/home/nutch/nutch/service/crawl</value>
> <description>
> Path to root of crawl. This directory is searched (in
> order) for either the file search-servers.txt, containing a list of
> distributed search servers, or the directory "index" containing
> merged indexes, or the directory "segments" containing segment
> indexes.
> </description>
> </property>
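>
> Putting the two together, a rough sketch of the relevant part of my
> nutch-site.xml looks like this (the searcher.dir path is just my local
> setup; point it at wherever your crawl lives):
>
> <?xml version="1.0"?>
> <configuration>
>   <!-- search the local filesystem rather than HDFS -->
>   <property>
>     <name>fs.default.name</name>
>     <value>file:///</value>
>   </property>
>   <!-- root of the crawl; must contain an "index" or "segments" dir,
>        or a search-servers.txt file -->
>   <property>
>     <name>searcher.dir</name>
>     <value>/home/nutch/nutch/service/crawl</value>
>   </property>
> </configuration>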
>
> Regarding your second question:
>
> bin/nutch readdb yourcrawldir/crawldb -dump <output_dir> -format csv
>
> Gives you a nice flat file serialisation of your crawl database.
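>
> For example (a rough sketch; "yourcrawldir" and "dump_out" are just
> placeholder paths, and the dump lands in Hadoop-style part files):
>
> # dump the crawldb as CSV into a new output directory
> bin/nutch readdb yourcrawldir/crawldb -dump dump_out -format csv
> # have a look at the first few rows
> head dump_out/part-00000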
>
> I hope this helps,
>
> Mischa
> On 1 Dec 2009, at 08:44, brian wrote:
>
> > also, I would like to know how to extract flat text files of the crawl data.
>
> ___________________________________
> Mischa Tuffield
> Email: [email protected]
> Homepage - http://mmt.me.uk/
> Garlik Limited, 2 Sheen Road, Richmond, TW9 1AE, UK
> +44(0)20 8973 2465 http://www.garlik.com/
> Registered in England and Wales 535 7233 VAT # 849 0517 11
> Registered office: Thames House, Portsmouth Road, Esher, Surrey, KT10 9AD