To change the directory where Hadoop stores its data when you are running in local mode, create a conf/hadoop-site.xml file containing the following:
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>....</value>
  </property>
</configuration>

If you want to run several local Nutch instances in parallel, make sure each one has its own installation (or at least its own conf directory), so that you can define a specific hadoop.tmp.dir for each, along with whatever other configuration you want to vary from one instance to the other.

RemyA

On 11 June 2012 at 21:08, Lewis John Mcgibbney wrote:

> Hi Emre,
>
> I suppose you could use some kind of conditional regex configuration,
> however this would assume that you are bargaining on all outlink(s)
> from some given page to be similar in nature... which I cannot see
> being a realistic vision.
>
>
> On Mon, Jun 11, 2012 at 6:39 PM, Emre Çelikten <[email protected]> wrote:
>> This is like running N instances of
>> Nutch in parallel with each instance having its own regex-urlfilter.
>
> If you are instead looking to do the above I think you can do this
> locally however each instance cannot share the same /tmp/ directory:
> change /tmp/ per crawl or run on Hadoop or run in sequence if you can
> live with it.
>
> hth
>
> Lewis
>
> -- 
> Lewis
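
PS: in case it helps, here is a minimal sketch of how the per-instance setup could be scripted. It only writes a conf/hadoop-site.xml with a distinct hadoop.tmp.dir for each instance. The directory layout (nutch-0, nutch-1, ... each with a hadoop-tmp subdirectory) and the function names are my own assumptions for illustration and are not part of Nutch or Hadoop.

```python
import os
import tempfile
from xml.etree import ElementTree as ET


def write_hadoop_site(conf_dir, tmp_dir):
    """Write conf_dir/hadoop-site.xml with hadoop.tmp.dir set to tmp_dir."""
    os.makedirs(conf_dir, exist_ok=True)

    # Build the <configuration><property>...</property></configuration> tree.
    configuration = ET.Element("configuration")
    prop = ET.SubElement(configuration, "property")
    ET.SubElement(prop, "name").text = "hadoop.tmp.dir"
    ET.SubElement(prop, "value").text = tmp_dir

    path = os.path.join(conf_dir, "hadoop-site.xml")
    with open(path, "w", encoding="utf-8") as f:
        # Same header lines as in the hand-written file above.
        f.write('<?xml version="1.0"?>\n')
        f.write('<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>\n')
        f.write(ET.tostring(configuration, encoding="unicode"))
        f.write("\n")
    return path


def setup_instances(base_dir, count):
    """Create `count` conf directories, each with its own hadoop.tmp.dir.

    The nutch-<i>/conf and nutch-<i>/hadoop-tmp layout is hypothetical.
    """
    paths = []
    for i in range(count):
        instance_dir = os.path.join(base_dir, "nutch-%d" % i)
        tmp_dir = os.path.join(instance_dir, "hadoop-tmp")
        conf_dir = os.path.join(instance_dir, "conf")
        paths.append(write_hadoop_site(conf_dir, tmp_dir))
    return paths


if __name__ == "__main__":
    # Demo run in a scratch directory; point base at your real location instead.
    base = tempfile.mkdtemp(prefix="nutch-instances-")
    for written in setup_instances(base, 3):
        print(written)
```

You would still need to copy the rest of the conf directory (regex-urlfilter and so on) into each instance yourself; the script only takes care of the hadoop.tmp.dir part.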

