Re: Nutch as a service

Sachin Shaju Wed, 05 Oct 2016 21:21:07 -0700

Hi Sujen,

Thanks for the reply. I actually created that Stack Overflow post myself. :)
I have a few more questions.
 1. Do I have to run the server on the Hadoop NameNode itself?
 2. I have tested the Nutch server on Hadoop, but during the *fetch phase* it
throws a *NullPointerException*. The log is below.

16/10/05 18:53:59 ERROR impl.JobWorker: Cannot run job worker!
java.lang.NullPointerException
    at java.util.Arrays.sort(Arrays.java:1438)
    at org.apache.nutch.fetcher.Fetcher.run(Fetcher.java:564)
    at org.apache.nutch.service.impl.JobWorker.run(JobWorker.java:71)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)

I've checked the source code. The exception is caused by the absence of a
segment parameter in the REST call for fetch. I expected the server to pick
the latest segment automatically, but it does not work that way.

The request I used is:

POST /job/create
{
    "type":"FETCH",
    "confId":"news",
    "crawlId":"crawl001",
    "args": {}
}

Am I missing anything here?
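
One way to test the diagnosis above would be to pass the segment explicitly in "args" instead of leaving it empty. The Python sketch below only builds the request body for POST /job/create. The argument key "segment" and the example segment path are assumptions that are not confirmed anywhere in this thread, so check them against the Fetcher source of your Nutch version before relying on them.

```python
import json


def build_fetch_job(conf_id, crawl_id, segment_path=None):
    """Build the JSON body of a FETCH job for POST /job/create.

    The top-level fields mirror the request quoted in this message.
    """
    args = {}
    if segment_path is not None:
        # ASSUMPTION: "segment" is the key the fetcher reads from args.
        # The thread only says that a segment parameter is missing.
        args["segment"] = segment_path
    return {
        "type": "FETCH",
        "confId": conf_id,
        "crawlId": crawl_id,
        "args": args,
    }


if __name__ == "__main__":
    # Hypothetical segment path, for illustration only.
    body = build_fetch_job("news", "crawl001",
                           "crawl001/segments/20161005185300")
    print(json.dumps(body, indent=4))
```

Calling build_fetch_job("news", "crawl001") without a segment reproduces the original request with an empty "args" object, which makes it easy to compare the two cases.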




Regards,
Sachin Shaju

[email protected]
+919539887554

On Thu, Oct 6, 2016 at 5:03 AM, Sujen Shah <[email protected]> wrote:

> Hi Sachin,
>
> The Nutch REST API is built using the Apache CXF framework and JAX-RS. The
> Nutch server uses an embedded Jetty server to service HTTP requests.
> You can find out more about CXF and Jetty here (
> http://cxf.apache.org/docs/overview.html).
>
> The server runs on one machine, waiting for HTTP requests. Once a request is
> received, it starts the requested Nutch job (which might be distributed,
> e.g. a fetch job).
>
>
> Just for visibility on the user list, this question was asked on
> Stack Overflow. The question and follow-up discussion can be found at
> http://stackoverflow.com/questions/39853492/working-of-nutch-server-in-distributed-mode
>
> Thanks
> Sujen
>
>
>
> Regards,
> Sujen Shah
> M.S - Computer Science
> University of Southern California
> http://www.linkedin.com/in/sujenshah
>
> On Tue, Oct 4, 2016 at 6:18 AM, Sachin Shaju <[email protected]> wrote:
>
> > Hi,
> >     I would like to know how the Nutch server actually works. Does it use a
> > listener for incoming crawl requests, or is it a continuously running
> > server?
> > Regards,
> > Sachin Shaju
> >
> > [email protected]
> >
> > --
> >
> >
> > The information contained in this electronic message and any attachments to
> > this message are intended for the exclusive use of the addressee(s) and may
> > contain proprietary, confidential or privileged information. If you are not
> > the intended recipient, you should not disseminate, distribute or copy this
> > e-mail. Please notify the sender immediately and destroy all copies of this
> > message and any attachments.
> >
> > WARNING: Computer viruses can be transmitted via email. The recipient
> > should check this email and any attachments for the presence of viruses.
> > The company accepts no liability for any damage caused by any virus
> > transmitted by this email.
> >
> > www.mStack.com
> >
>

