you should add the property below:
<property>
<name>hadoop.job.ugi</name>
<value>rider,iamsolomon</value>
</property>
that should do it!
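For reference, here is a minimal sketch of a complete config file carrying that property (the <configuration> root element is required; as far as I know the value is the job's user followed by comma-separated groups):

<?xml version="1.0"?>
<configuration>
  <!-- run Nutch/Hadoop jobs as user "rider" in group "iamsolomon" -->
  <property>
    <name>hadoop.job.ugi</name>
    <value>rider,iamsolomon</value>
  </property>
</configuration>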
2009/12/1 Mischa Tuffield <[email protected]>
> Hello Brian,
>
> You're getting a response from another newbie here, so I could be wrong (do
> excuse me if I am).
>
> If you are attempting to serve a search index from the local filesystem, you
> need to have the following in your nutch-site.xml:
>
> <property>
> <name>fs.default.name</name>
> <value>file:///</value>
> </property>
>
> The fs.default.name property is required in nutch-site.xml when you build your
> .war file for deployment to Tomcat. It should be accompanied by the config
> below, which should point to the directory your index has been copied to; in
> my case it looks something like this:
>
> <property>
> <name>searcher.dir</name>
> <value>/home/nutch/nutch/service/crawl</value>
> <description>
> Path to root of crawl. This directory is searched (in
> order) for either the file search-servers.txt, containing a list of
> distributed search servers, or the directory "index" containing
> merged indexes, or the directory "segments" containing segment
> indexes.
> </description>
> </property>
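>
> Putting the two together, a rough sketch of the relevant part of my
> nutch-site.xml looks like this (the searcher.dir path is just my local
> setup; point it at wherever your crawl lives):
>
> <?xml version="1.0"?>
> <configuration>
>   <!-- search the local filesystem rather than HDFS -->
>   <property>
>     <name>fs.default.name</name>
>     <value>file:///</value>
>   </property>
>   <!-- root of the crawl; must contain an "index" or "segments" dir,
>        or a search-servers.txt file -->
>   <property>
>     <name>searcher.dir</name>
>     <value>/home/nutch/nutch/service/crawl</value>
>   </property>
> </configuration>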
>
> Regarding your second question:
>
> bin/nutch readdb yourcrawldir/crawldb -dump <output_dir> -format csv
>
> Gives you a nice flat file serialisation of your crawl database.
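>
> For example (a rough sketch; "yourcrawldir" and "dump_out" are just
> placeholder paths, and the dump lands in Hadoop-style part files):
>
> # dump the crawldb as CSV into a new output directory
> bin/nutch readdb yourcrawldir/crawldb -dump dump_out -format csv
> # have a look at the first few rows
> head dump_out/part-00000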
>
> I hope this helps,
>
> Mischa
> On 1 Dec 2009, at 08:44, brian wrote:
>
> > also, I would like to know how to extract flat text files of the crawl data.
>
> ___________________________________
> Mischa Tuffield
> Email: [email protected]
> Homepage - http://mmt.me.uk/
> Garlik Limited, 2 Sheen Road, Richmond, TW9 1AE, UK
> +44(0)20 8973 2465 http://www.garlik.com/
> Registered in England and Wales 535 7233 VAT # 849 0517 11
> Registered office: Thames House, Portsmouth Road, Esher, Surrey, KT10 9AD