Hi Johannes

On Tue, Sep 16, 2014 at 10:19 AM, <[email protected]> wrote:

>
> is it possible to have nutch as a kind of stand-alone crawl server only
> spoken to via the REST API?
>

Yes, this is possible.
We just finished a Google Summer of Code project that addresses exactly
this via a Wicket-based web application. We are working on the final
aspects of the patch before it is attached to the relevant issue:
https://issues.apache.org/jira/browse/NUTCH-841


> I found the generic tutorial to setup nutch server with Cassandra and
> found this wiki page https://wiki.apache.org/nutch/NutchRESTAPI but it
> leaves me a bit confused about How I can actually start some full fetch
> cycles.


Yep, this is something we need to add to the documentation. We will do
this in due course.


> I probably need to create some fetch job, but what is actually the full
> command with options to send via REST?
>

https://wiki.apache.org/nutch/NutchRESTAPI#Create_job


> Might anybody maybe point to some working examples, I started digging
> through the java code, but it seems to be only generic key-value setting.
>


The all-in-one crawl command has been deprecated in Nutch for a while, so
there is no single REST call that runs a full crawl. Instead, the commands
you submit to the Nutch 2.X REST API (I suggest you use Nutch 2.3-SNAPSHOT)
need to be chained together sequentially, one job per step.
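
As a rough sketch of what chaining the jobs sequentially could look like
from a script rather than a browser plugin, something along these lines
might work. Please treat every specific in it as an assumption to check
against the wiki page and your own server: the base URL and port, the
/job/create and /job/<id> paths, the payload field names, the job type
names, and the state strings. The helper names (job_payload, run_chain and
so on) are made up for illustration and are not part of Nutch.

```python
import json
import time
import urllib.parse
import urllib.request

# Assumption: where the Nutch server listens; adjust to your setup.
BASE_URL = "http://localhost:8081"

# Assumption: job type names for one crawl round, in order.
CYCLE = ["GENERATE", "FETCH", "PARSE", "UPDATEDB"]


def job_payload(job_type, crawl_id, conf_id="default", args=None):
    """Build the JSON body for one job-creation request (field names assumed)."""
    return {
        "crawlId": crawl_id,
        "type": job_type,
        "confId": conf_id,
        "args": args or {},
    }


def http_submit(payload):
    """POST a job to the assumed /job/create endpoint and return the job id."""
    req = urllib.request.Request(
        BASE_URL + "/job/create",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return resp.read().decode("utf-8").strip()


def http_state(job_id):
    """GET the assumed /job/<id> endpoint and return its 'state' field."""
    url = BASE_URL + "/job/" + urllib.parse.quote(job_id, safe="")
    with urllib.request.urlopen(url) as resp:
        return json.loads(resp.read().decode("utf-8")).get("state")


def run_chain(job_types, crawl_id, submit=http_submit, state=http_state,
              poll_seconds=5.0):
    """Submit jobs one at a time, waiting for each to finish before the next."""
    finished = []
    for job_type in job_types:
        job_id = submit(job_payload(job_type, crawl_id))
        while True:
            current = state(job_id)
            if current == "FINISHED":  # assumed state string
                break
            if current in ("FAILED", "KILLED"):  # assumed state strings
                raise RuntimeError(
                    f"{job_type} job {job_id} ended in state {current}")
            time.sleep(poll_seconds)
        finished.append((job_type, job_id))
    return finished
```

With a server running, a single round after seeding might then be
run_chain(["INJECT"] + CYCLE, "crawl-01"), and further rounds would repeat
run_chain(CYCLE, "crawl-01"). The submit and state arguments are there so
the chaining logic can be tried out without a live server.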

I've been testing this out over the summer using the RESTClient plugin for
Firefox, and it has been working well.
Hope this helps you out.
Lewis
