Hi Johannes,

On Tue, Sep 16, 2014 at 10:19 AM, <[email protected]> wrote:
>
> is it possible to have nutch as a kind of stand-alone crawl server only
> spoken to via the REST API?
>

Yes, this is possible. We just finished a Google Summer of Code project which
addresses exactly this via a Wicket-based web application. We are working on
the final aspects of the patch before it is attached to the relevant issue:
https://issues.apache.org/jira/browse/NUTCH-841

> I found the generic tutorial to setup nutch server with Cassandra and
> found this wiki page https://wiki.apache.org/nutch/NutchRESTAPI but it
> leaves me a bit confused about How I can actually start some full fetch
> cycles.

Yep, this is something we need to add to the documentation. We will do this in
due course.

> I probably need to create some fetch job, but what is actually the full
> command with options to send via REST?
>

https://wiki.apache.org/nutch/NutchRESTAPI#Create_job

> Might anybody maybe point to some working examples, I started digging
> through the java code, but it seems to be only generic key-value setting.
>

A fully fledged crawl command has been deprecated in Nutch for a while.
Therefore the REST commands you submit to the Nutch 2.X REST API (I suggest you
use Nutch 2.3-SNAPSHOT) need to be chained together sequentially: you create
one job per crawl step and submit the next one only after the previous one has
finished.

I've been testing this out over the summer using the RESTClient plugin for
Firefox... it's been working well.

Hope this helps you out.
Lewis
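
For illustration, the chaining described above could be sketched roughly as
follows in Python. This is a minimal sketch under assumptions, not a tested
recipe: the /job/create path, the JSON field names (crawlId, type, confId,
args) and the job type names are taken from my reading of the Create_job
section of the wiki page and should be checked against your Nutch version.
The step order in CYCLE, the "seedDir" argument and the wait_until_done
callback are hypothetical placeholders you would need to adapt.

```python
import json
import urllib.request

# Assumed order of one crawl cycle. The job type names are an assumption
# based on the Create_job wiki section; verify them for your Nutch version.
CYCLE = ["INJECT", "GENERATE", "FETCH", "PARSE", "UPDATEDB"]


def build_cycle(crawl_id, conf_id="default", args_by_type=None):
    """Build one job payload per crawl step, in the order they should run."""
    args_by_type = args_by_type or {}
    return [
        {
            "crawlId": crawl_id,
            "type": job_type,
            "confId": conf_id,
            # Per-step options are plain key-value pairs; empty if none given.
            "args": args_by_type.get(job_type, {}),
        }
        for job_type in CYCLE
    ]


def submit(base_url, payload):
    """POST a single job payload and return the raw response body."""
    request = urllib.request.Request(
        base_url.rstrip("/") + "/job/create",  # assumed endpoint path
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")


def run_cycle(base_url, payloads, wait_until_done):
    """Submit the jobs one at a time, waiting for each before the next.

    wait_until_done is a caller-supplied (hypothetical) function that takes
    the response of submit() and blocks until that job has finished.
    """
    for payload in payloads:
        wait_until_done(submit(base_url, payload))
```

Usage would look something like
run_cycle("http://localhost:8081", build_cycle("crawl-01", args_by_type={"INJECT": {"seedDir": "urls"}}), my_wait),
where the host, port, crawl id, seed directory and my_wait are all placeholders
for your own setup.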

