Answering both of your questions here, as I am catching up on some mail.
On Fri, Sep 30, 2016 at 5:04 AM, <user-digest-h...@nutch.apache.org> wrote:
> From: Sachin Shaju <sachi...@mstack.com>
> To: firstname.lastname@example.org
> Date: Fri, 30 Sep 2016 10:00:04 +0530
> Subject: Re: Nutch in production
> Thank you guys for your replies. I will look into the suggestions you gave.
> But I have one more query. How can I trigger nutch from a queue system in a
> distributed environment ?
Well, this is a bit more tricky, of course. As per my other mailing list
thread, you can easily use the REST API and the NutchServer for publishing
Nutch workflows, so I would advise you to look into that.
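To make the queue-to-REST idea a bit more concrete, here is a minimal sketch of a queue consumer that turns each message into a job submission against the Nutch server. It rests on several assumptions: the server was started with `bin/nutch startserver` and listens on port 8081, jobs are created with a JSON POST to `/job/create` carrying `crawlId`, `type`, `confId` and `args`, and Python's `queue.Queue` stands in for whatever distributed queue you actually run. The message keys (`crawl_id`, `type`, `args`) are hypothetical. Please check the endpoint and field names against the Nutch REST API documentation for your version before relying on them.

```python
import json
import queue
import urllib.request

# Assumption: NutchServer started via `bin/nutch startserver` on its default
# port; adjust host/port for your cluster.
NUTCH_SERVER = "http://localhost:8081"


def build_job_request(crawl_id, job_type, args=None, conf_id="default"):
    """Build a POST to the (assumed) /job/create endpoint of the Nutch server."""
    body = {
        "crawlId": crawl_id,
        "type": job_type,      # e.g. "INJECT", "GENERATE", "FETCH"
        "confId": conf_id,
        "args": args or {},
    }
    return urllib.request.Request(
        NUTCH_SERVER + "/job/create",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def consume(messages, send=urllib.request.urlopen):
    """Drain a queue of job messages and submit each one to the Nutch server.

    `messages` is a stand-in for a real distributed queue (Kafka, RabbitMQ,
    SQS, ...). Each message is a dict with the hypothetical keys
    crawl_id, type and (optionally) args.
    """
    responses = []
    while True:
        try:
            msg = messages.get_nowait()
        except queue.Empty:
            break
        req = build_job_request(msg["crawl_id"], msg["type"], msg.get("args"))
        responses.append(send(req))
    return responses
```

Passing `send` in as a parameter keeps the HTTP call swappable, so the same consumer can be exercised without a live server, or wrapped with retries in production.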
> Can REST api be a real option in distributed mode
As per my other thread... yes :) The one limitation is getting the injected
URLs into HDFS for use within the rest of the workflow.
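Until the REST API can ingest seeds into HDFS itself, one possible interim workaround is to stage the seed list with the Hadoop CLI before submitting the INJECT job. The sketch below writes one URL per line to a local `seed.txt` and then runs `hdfs dfs -mkdir -p` and `hdfs dfs -put -f` to copy it into HDFS. The paths are hypothetical, and the argument name under which the INJECT job expects the seed directory is not shown here; look that up in the Nutch REST documentation for your release.

```python
import subprocess
from pathlib import Path


def stage_seeds(urls, local_seed_dir, hdfs_seed_dir, run=subprocess.run):
    """Write seeds locally, then copy them into HDFS via the `hdfs dfs` CLI.

    Returns the list of commands that were executed. `run` defaults to
    subprocess.run and can be replaced for testing.
    """
    local = Path(local_seed_dir)
    local.mkdir(parents=True, exist_ok=True)
    seed_file = local / "seed.txt"
    # One URL per line, the plain-text seed format Nutch's injector reads.
    seed_file.write_text("\n".join(urls) + "\n", encoding="utf-8")

    remote_file = hdfs_seed_dir.rstrip("/") + "/seed.txt"
    commands = [
        # Create the target directory (and parents) if it does not exist.
        ["hdfs", "dfs", "-mkdir", "-p", hdfs_seed_dir],
        # Copy the seed file into HDFS, overwriting any previous copy.
        ["hdfs", "dfs", "-put", "-f", str(seed_file), remote_file],
    ]
    for cmd in commands:
        run(cmd, check=True)
    return commands
```

The HDFS seed directory can then be passed in the `args` of the INJECT job request.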
> Or whether I will have to go for a command line invocation for nutch ?
I think that we need to provide a patch for Nutch trunk to enable ingestion
of the injected seeds into HDFS via the REST API. Right now this
functionality is lacking. I've created a ticket for it at
We will try to address this before the pending Nutch 1.13 release; however, I
cannot promise anything.