Hi Emre,

I suppose you could use some kind of conditional regex configuration.
However, that assumes all the outlinks from a given page are similar
in nature, which I don't think is realistic.


On Mon, Jun 11, 2012 at 6:39 PM, Emre Çelikten <[email protected]> wrote:
> This is like running N instances of
> Nutch in parallel with each instance having its own regex-urlfilter.

If that is what you are after instead, I think you can do it locally.
However, the instances cannot share the same /tmp/ directory, so you
would need to change /tmp/ per crawl, run on Hadoop, or run the crawls
in sequence if you can live with that.
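
A rough sketch of the "separate /tmp/ per crawl" option, in case it
helps. It only plans the layout and does not start Nutch. The names
plan_instances and tmp_dir_property, and the directory layout, are made
up for illustration. I am also assuming that your bin/nutch script picks
up NUTCH_CONF_DIR and that the temp location is controlled by the
hadoop.tmp.dir property in each instance's nutch-site.xml; please check
both against your version.

```python
import os
from pathlib import Path


def plan_instances(names, base_dir):
    """Return one launch plan per crawl, each with its own conf and tmp dir.

    Hypothetical helper: nothing here is part of Nutch itself.
    """
    plans = []
    for name in names:
        root = Path(base_dir) / name
        conf_dir = root / "conf"  # would hold this crawl's regex-urlfilter.txt
        tmp_dir = root / "tmp"    # private scratch space, not shared with others
        # Assumption: bin/nutch reads NUTCH_CONF_DIR from the environment.
        env = dict(os.environ, NUTCH_CONF_DIR=str(conf_dir))
        plans.append(
            {"name": name, "conf_dir": conf_dir, "tmp_dir": tmp_dir, "env": env}
        )
    return plans


def tmp_dir_property(tmp_dir):
    """Build the XML property to paste into that instance's nutch-site.xml.

    Assumption: hadoop.tmp.dir is the property that moves the temp directory.
    """
    return (
        "<property>\n"
        "  <name>hadoop.tmp.dir</name>\n"
        f"  <value>{tmp_dir}</value>\n"
        "</property>"
    )


if __name__ == "__main__":
    for plan in plan_instances(["news", "blogs"], "/data/crawls"):
        print(plan["name"], plan["env"]["NUTCH_CONF_DIR"])
        print(tmp_dir_property(plan["tmp_dir"]))
```

Each plan's env could then be passed to whatever you use to launch the
crawl for that instance, so no two instances read the same filter file
or write to the same temp directory.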

hth

Lewis



-- 
Lewis
