To change the directory where Hadoop stores its data when you are running in
local mode, create a conf/hadoop-site.xml file with the following content:

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
  <name>hadoop.tmp.dir</name>
  <value>....</value>
</property>
</configuration>

If you want to run several local Nutch instances in parallel, give each one its
own installation (at least its own conf directory). That way each instance can
define its own hadoop.tmp.dir, along with any other settings that should differ
from one instance to the next.
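
As a rough illustration of the per-instance setup described above, here is a
minimal Python sketch. The function names, the instance names, and the
"hadoop-tmp" subdirectory are all hypothetical choices of mine, not anything
Nutch or Hadoop requires. It only writes hadoop-site.xml; the rest of each conf
directory would still have to be copied from an existing installation.

```python
from pathlib import Path
from xml.sax.saxutils import escape

# Same structure as the hadoop-site.xml shown above, with the value filled in.
TEMPLATE = """<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
  <name>hadoop.tmp.dir</name>
  <value>{tmp_dir}</value>
</property>
</configuration>
"""


def write_hadoop_site(conf_dir, tmp_dir):
    """Write conf_dir/hadoop-site.xml with hadoop.tmp.dir set to tmp_dir."""
    conf_dir = Path(conf_dir)
    conf_dir.mkdir(parents=True, exist_ok=True)
    site = conf_dir / "hadoop-site.xml"
    # Escape the path so characters such as '&' stay valid XML.
    site.write_text(TEMPLATE.format(tmp_dir=escape(str(tmp_dir))), encoding="utf-8")
    return site


def prepare_instances(base_dir, names):
    """Create one conf directory and one temp directory per instance name.

    The layout base_dir/<name>/conf and base_dir/<name>/hadoop-tmp is an
    assumption made for this example only.
    """
    base_dir = Path(base_dir)
    sites = {}
    for name in names:
        root = base_dir / name
        tmp_dir = (root / "hadoop-tmp").resolve()
        tmp_dir.mkdir(parents=True, exist_ok=True)
        sites[name] = write_hadoop_site(root / "conf", tmp_dir)
    return sites
```

Calling prepare_instances("instances", ["crawl-a", "crawl-b"]) would give each
instance a distinct hadoop.tmp.dir, so the two crawls no longer share a
temporary directory.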

RemyA

On 11 June 2012 at 21:08, Lewis John Mcgibbney wrote:

> Hi Emre,
> 
> I suppose you could use some kind of conditional regex configuration,
> however this would assume that you are bargaining on all outlink(s)
> from some given page to be similar in nature... which I cannot see
> being a realistic vision.
> 
> 
> On Mon, Jun 11, 2012 at 6:39 PM, Emre Çelikten <[email protected]> wrote:
>> This is like running N instances of
>> Nutch in parallel with each instance having its own regex-urlfilter.
> 
> If you are instead looking to do the above I think you can do this
> locally however each instance cannot share the same /tmp/ directory:
> change /tmp/ per crawl or run on Hadoop or run in sequence if you can
> live with it.
> 
> hth
> 
> Lewis
> 
> 
> 
> -- 
> Lewis
