You can also pass it to most jobs with $ nutch <job>
-Dhadoop.tmp.dir=bla args. This can even be automated with some shell
scripting.
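For what it's worth, a minimal sketch of that per-job approach might look like the loop below. The site names and crawl paths are made up for illustration; the -D flag is passed as described above.

```shell
# Hypothetical sketch: one crawl per site, each with its own
# hadoop.tmp.dir so the jobs don't collide under /tmp/hadoop-username.
# The commands are only echoed here; drop the "echo" to actually run them.
for site in site-a site-b; do
  echo bin/nutch crawl "urls/$site" -Dhadoop.tmp.dir="/tmp/hadoop-$site"
done
```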
On Fri, 2 Mar 2012 00:49:36 -0500, Jeremy Villalobos
<jeremyvillalo...@gmail.com> wrote:
It is a small number of crawlers, so I copied a runtime for each, and
therefore each has its own configuration files.
Jeremy
On Thu, Mar 1, 2012 at 10:57 PM, remi tassing wrote:
How did you define that property so it's different for each job?
Remi
On Friday, March 2, 2012, Jeremy Villalobos wrote:
> That is what I was looking for, thank you.
>
> This property was added to:
> $NUTCH_DIR/runtime/local/conf/nutch-site.xml
>
> Jeremy
>
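As a sketch, the property Jeremy mentions would look something like this in nutch-site.xml (the directory value is just an example; each runtime would use a different one):

```xml
<!-- in $NUTCH_DIR/runtime/local/conf/nutch-site.xml -->
<property>
  <name>hadoop.tmp.dir</name>
  <value>/tmp/hadoop-crawl1</value>
  <description>Per-runtime temp dir so concurrent crawls do not collide.</description>
</property>
```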
> On Thu, Mar 1, 2012 at 7:01 PM, Markus Jelsma wrote:
>
>> you can either:
>>
>> 1. run on hadoop
>> 2. not run multiple concurrent jobs on a local machine
>> 3. set a hadoop.tmp.dir per job
>> 4. merge all crawls to a single crawl
>>
>>
>> On Thu, 1 Mar 2012 16:26:00 -0500, Jeremy Villalobos <
>> jeremyvillalo...@gmail.com [4]> wrote:
>>
>>> Hello:
>>>
>>> I am running multiple small crawls on one machine. I notice that they
>>> are conflicting because they all access
>>>
>>> /tmp/hadoop-username/mapred
>>>
>>> How do I change the location of this folder?
>>>
>>> Do I have to use Hadoop to run multiple crawlers, each specific to a
>>> site?
>>>
>>> thanks
>>>
>>> Jeremy
>>>
>>
>> --
>> Markus Jelsma - CTO - Openindex
>> http://www.linkedin.com/in/markus17 [5]
>> 050-8536600 / 06-50258350
>>
>
Links:
------
[1] mailto:tassingr...@gmail.com
[2] mailto:jeremyvillalo...@gmail.com
[3] mailto:markus.jel...@openindex.io
[4] mailto:jeremyvillalo...@gmail.com
[5] http://www.linkedin.com/in/markus17