Re: Running Crawls via REST API

Lewis John Mcgibbney Tue, 16 Sep 2014 08:41:31 -0700

Hi Johannes

On Tue, Sep 16, 2014 at 10:19 AM, <[email protected]> wrote:


>
> is it possible to have nutch as a kind of stand-alone crawl server only
> spoken to via the REST API?
>

Yes, this is possible.
We just finished a Google Summer of Code project that addresses exactly
this via a Wicket-based web application. We are finishing the last parts
of the patch before attaching it to the relevant issue:
https://issues.apache.org/jira/browse/NUTCH-841


> I found the generic tutorial to set up a nutch server with Cassandra and
> found this wiki page https://wiki.apache.org/nutch/NutchRESTAPI but it
> leaves me a bit confused about how I can actually start some full fetch
> cycles.


Yep, this is something we need to add to the documentation. We will do this
in due course.


> I probably need to create some fetch job, but what is actually the full
> command with options to send via REST?
>

https://wiki.apache.org/nutch/NutchRESTAPI#Create_job


> Could anybody point to some working examples? I started digging
> through the java code, but it seems to be only generic key-value setting.
>


The all-in-one crawl command has been deprecated in Nutch for a while.
Therefore the REST commands you submit to the Nutch 2.X REST API (I suggest
you use Nutch 2.3-SNAPSHOT) need to be chained together sequentially, one
job per step of the crawl cycle (inject, generate, fetch, parse, updatedb).
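
A rough sketch of what that chaining could look like from a client, in
Python. Treat all of the specifics as assumptions to check against the wiki
page above: the base URL (localhost:8081), the POST /job/create and
GET /job/{id} paths, the JSON field names (type, crawlId, confId, args), the
seedDir argument, the job type names and the job state names. Per-step
arguments other than seedDir are left empty here. The function names are
made up for illustration.

```python
import json
import time
import urllib.parse
import urllib.request

# Assumption: address of a running Nutch server; adjust to your setup.
BASE_URL = "http://localhost:8081"

# One crawl round, in order. Type names are assumed, verify against the wiki.
CRAWL_STEPS = ["INJECT", "GENERATE", "FETCH", "PARSE", "UPDATEDB"]

# Assumed terminal job states; anything else is treated as "still running".
TERMINAL_STATES = ("FINISHED", "FAILED", "KILLED")


def build_job(job_type, crawl_id, conf_id="default", args=None):
    """Return the JSON-serialisable body for one job-creation request."""
    return {
        "type": job_type,
        "crawlId": crawl_id,
        "confId": conf_id,
        "args": args or {},
    }


def http_submit(job):
    """POST one job; assumes the response body is the new job id."""
    request = urllib.request.Request(
        BASE_URL + "/job/create",
        data=json.dumps(job).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8").strip()


def http_state(job_id):
    """GET one job; assumes a JSON object with a "state" field comes back."""
    url = BASE_URL + "/job/" + urllib.parse.quote(job_id, safe="")
    with urllib.request.urlopen(url) as response:
        return json.loads(response.read().decode("utf-8")).get("state")


def run_round(crawl_id, seed_dir, submit=http_submit, get_state=http_state,
              poll_seconds=5.0):
    """Submit each step in turn, waiting for it to finish before the next."""
    finished = []
    for step in CRAWL_STEPS:
        # Only the inject step gets the seed directory in this sketch.
        args = {"seedDir": seed_dir} if step == "INJECT" else {}
        job_id = submit(build_job(step, crawl_id, args=args))
        state = get_state(job_id)
        while state not in TERMINAL_STATES:
            time.sleep(poll_seconds)
            state = get_state(job_id)
        if state != "FINISHED":
            raise RuntimeError(f"{step} job {job_id} ended in state {state}")
        finished.append(job_id)
    return finished
```

Calling run_round("crawl-01", "/path/to/seeds") would run one round against
the server at BASE_URL; repeating the steps after INJECT gives further
rounds. The submit and get_state parameters exist so the chaining logic can
be tried out with stand-in functions before pointing it at a real server.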

I've been testing this out over the summer using the RESTClient plugin for
Firefox, and it's been working well.
Hope this helps you out.
Lewis
