Hi Sachin,

I'm answering both of your questions here as I am catching up with some mail.
On Fri, Sep 30, 2016 at 5:04 AM, <[email protected]> wrote:

> From: Sachin Shaju <[email protected]>
> To: [email protected]
> Cc:
> Date: Fri, 30 Sep 2016 10:00:04 +0530
> Subject: Re: Nutch in production
>
> Thank you guys for your replies. I will look into the suggestions you gave.
> But I have one more query. How can I trigger nutch from a queue system in a
> distributed environment ?

Well, this is a bit more tricky of course. As per my other mailing list thread, you can easily use the REST API and the NutchServer for publishing Nutch workflows, so I would advise you to look into that.

> Can REST api be a real option in distributed mode ?

As per my other thread... yes :) The one limitation is getting the injected URLs into HDFS for use within the rest of the workflow.

> Or whether I will have to go for a command line invocation for nutch ?

I think we need to provide a patch for Nutch trunk to enable ingestion of the injected seeds into HDFS via the REST API. Right now this functionality is lacking. I've created a ticket for it at https://issues.apache.org/jira/browse/NUTCH-2327

We will try to address this before the pending Nutch 1.13 release; however, I cannot promise anything.

Thanks
Lewis
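A minimal sketch of the queue-triggered REST approach discussed above: a consumer pulls crawl requests off a queue and submits each one to NutchServer as a job. It assumes NutchServer is listening at http://localhost:8081 (its usual default) and exposes a POST /job/create endpoint taking a JSON body with crawlId, type, confId and args fields, as described in the Nutch 1.x REST API documentation; check these against your Nutch version. Python's queue.Queue stands in for the real distributed queue, and the contents of args depend on the job type, so they are taken from the queue message as-is. In distributed mode this still hits the NUTCH-2327 limitation: the seed URLs for an INJECT job must already be reachable in HDFS.

```python
import json
import queue
import urllib.request

# Assumption: NutchServer base URL; 8081 is the usual default port, adjust for your cluster.
NUTCH_SERVER = "http://localhost:8081"


def build_job_request(crawl_id, job_type, args, conf_id="default"):
    """Build the JSON body for a NutchServer /job/create call.

    Field names are assumed from the Nutch 1.x REST API docs.
    """
    return {
        "crawlId": crawl_id,
        "type": job_type,  # e.g. "INJECT", "GENERATE", "FETCH"
        "confId": conf_id,
        "args": args,  # job-specific arguments, passed through unchanged
    }


def submit_job(body, base_url=NUTCH_SERVER):
    """POST a job to NutchServer and return the raw response body (expected to be the job id)."""
    req = urllib.request.Request(
        base_url + "/job/create",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return resp.read().decode("utf-8")


def drain_queue(q, submit=submit_job):
    """Consume every pending crawl request and submit each one as a Nutch job.

    queue.Queue is a stand-in for a real distributed queue (Kafka, RabbitMQ, ...).
    Returns the list of values returned by `submit`, in submission order.
    """
    job_ids = []
    while True:
        try:
            msg = q.get_nowait()
        except queue.Empty:
            break
        body = build_job_request(msg["crawlId"], msg["type"], msg.get("args", {}))
        job_ids.append(submit(body))
    return job_ids
```

Passing `submit` in as a parameter keeps the queue handling testable without a running NutchServer; in production the default `submit_job` performs the actual HTTP call.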

