Hi,
I have tested running Nutch in server mode by starting it *locally* with the
bin/nutch startserver command. Is it possible to start Nutch in *server mode*
on top of a Hadoop cluster (in a distributed environment) and submit crawl
requests to the server using the Nutch REST API?
Please help.
Reg
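For the REST API question above, a minimal sketch of how a client might submit one job to a running Nutch server. It assumes the server listens on localhost:8081 and accepts a JSON job description at POST /job/create with the fields crawlId, type, confId and args. These endpoint and field names are assumptions to check against the REST API documentation for your Nutch version, and the "url_dir" argument key is hypothetical. The request is only built here, not sent.

```python
import json
import urllib.request

# Assumed server address; adjust to wherever `bin/nutch startserver` runs.
NUTCH_SERVER = "http://localhost:8081"


def build_job_request(crawl_id, job_type, args, conf_id="default"):
    """Build (but do not send) a POST request describing one Nutch job.

    The /job/create path and the JSON field names are assumptions.
    """
    payload = {
        "crawlId": crawl_id,
        "type": job_type,  # assumed job types, e.g. "INJECT", "GENERATE", "FETCH"
        "confId": conf_id,
        "args": args,
    }
    return urllib.request.Request(
        NUTCH_SERVER + "/job/create",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical inject job; "url_dir" is an illustrative argument name.
req = build_job_request("CrawlTest", "INJECT", {"url_dir": "urls/"})
print(req.get_method(), req.full_url)
# To actually submit it: urllib.request.urlopen(req).read()
```

In this model each crawl step would be a separate job request, so a production client would have to sequence the steps itself.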
Hi,
I have been experimenting with some crawl cycles in Nutch and would like to set up
a distributed crawl environment. However, I wonder how I can trigger Nutch for
incoming crawl requests in a production system. I have read about the Nutch REST
API. Is that the only real option I have? Or can I run Nutch as a
contin
I was trying to pass custom options to the *bin/crawl* script and ran into
an issue. I passed a custom config property in my crawl command to ignore
external outlinks, like this:
*bin/crawl -i -D elastic.index=test -D db.ignore.external.links=true urls/ CrawlTest/ 3*
But this is not working. Then I set
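The command above can also be assembled programmatically, which makes it easy to see exactly which -D properties are handed to the script. A minimal sketch, assuming bin/crawl is run from the Nutch runtime directory; the helper name build_crawl_argv is hypothetical, and the sketch only reproduces the invocation, it does not explain why the property had no effect.

```python
import shlex


def build_crawl_argv(seed_dir, crawl_dir, rounds, props=None, index=False):
    """Return the argument list for a bin/crawl invocation (hypothetical helper)."""
    argv = ["bin/crawl"]
    if index:
        argv.append("-i")
    # Each configuration override becomes its own "-D key=value" pair.
    for key, value in (props or {}).items():
        argv += ["-D", f"{key}={value}"]
    argv += [seed_dir, crawl_dir, str(rounds)]
    return argv


argv = build_crawl_argv(
    "urls/", "CrawlTest/", 3,
    props={"elastic.index": "test", "db.ignore.external.links": "true"},
    index=True,
)
print(shlex.join(argv))
# bin/crawl -i -D elastic.index=test -D db.ignore.external.links=true urls/ CrawlTest/ 3
```

Passing the list to subprocess.run(argv, check=True) would run it without any shell quoting surprises.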
Cool... thanks for posting.
On Wed, Sep 28, 2016 at 1:36 AM, wrote:
>
> user Digest 28 Sep 2016 08:36:56 - Issue 2648
>
> Topics (messages 32792 through 32792)
>
> Arch 1.9.2 is available
> 32792 by: Arkadi.Kosmynin.csiro.au
>
Hi Sachin,
Just a suggestion here: you can use Apache Kafka to produce and consume
events that map to incoming crawl requests, crawl status, and much more.
I have created a prototype production queue [0] that runs on top of a
supercomputer (TACC Wrangler) and integrated it with Kafka.
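The event-driven setup described above can be sketched with Python's standard-library queue standing in for two Kafka topics; no Kafka client is used. The event fields (crawl_id, seed_dir, rounds), the status values, and the choice to launch one bin/crawl run per request are illustrative assumptions, not the prototype referenced in [0].

```python
import queue

# In-process stand-ins for two hypothetical Kafka topics.
crawl_requests = queue.Queue()
crawl_status = queue.Queue()


def handle_request(event, launch):
    """Map one crawl-request event to a bin/crawl run and publish status events."""
    crawl_id = event["crawl_id"]
    crawl_status.put({"crawl_id": crawl_id, "status": "STARTED"})
    argv = ["bin/crawl", event["seed_dir"], f"crawls/{crawl_id}", str(event["rounds"])]
    ok = launch(argv)
    crawl_status.put({"crawl_id": crawl_id, "status": "FINISHED" if ok else "FAILED"})


def drain(launch):
    """Process every request currently waiting in the queue."""
    while True:
        try:
            event = crawl_requests.get_nowait()
        except queue.Empty:
            return
        handle_request(event, launch)


# A dry-run launcher that records commands instead of executing them.
launched = []


def dry_run(argv):
    launched.append(argv)
    return True


crawl_requests.put({"crawl_id": "site-a", "seed_dir": "urls/site-a", "rounds": 2})
drain(dry_run)
print(launched)
```

A real launcher could be lambda argv: subprocess.run(argv).returncode == 0; keeping it injectable lets the queue logic be tested without Nutch installed.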
Yep, also check out the work that Sujen Shah (also on my team at JPL and USC)
just merged, which lets you publish events from Nutch crawling to an ActiveMQ
queue. That should enable all sorts of production dashboards and analytics.
++
Ch
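To illustrate the kind of dashboard such a queue enables, here is a minimal sketch that tallies crawl events per type and per host. It assumes each message arrives as a JSON string with hypothetical eventType and url fields; the real message format should be taken from the merged publisher code, and no ActiveMQ client is used here.

```python
import json
from collections import Counter
from urllib.parse import urlparse


def summarize(messages):
    """Count events by (hypothetical) event type and by host."""
    by_type = Counter()
    by_host = Counter()
    for raw in messages:
        event = json.loads(raw)
        by_type[event["eventType"]] += 1
        by_host[urlparse(event["url"]).netloc] += 1
    return by_type, by_host


sample = [
    '{"eventType": "fetch_end", "url": "http://example.org/a"}',
    '{"eventType": "fetch_end", "url": "http://example.org/b"}',
    '{"eventType": "parse_end", "url": "http://example.com/"}',
]
by_type, by_host = summarize(sample)
print(by_type.most_common(), by_host.most_common())
```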
You are welcome.
> -----Original Message-----
> From: lewis john mcgibbney [mailto:lewi...@apache.org]
> Sent: Friday, 30 September 2016 2:22 AM
> To: user@nutch.apache.org
> Subject: Re: Arch 1.9.2 is available
>
> Cool... thanks for posting.
>
> On Wed, Sep 28, 2016 at 1:36 AM,
> wrote:
>
>
Thank you guys for your replies. I will look into your suggestions.
But I have one more question: how can I trigger Nutch from a queue system in a
distributed environment? Is the REST API a real option in distributed mode,
or will I have to go for a command-line invocation of Nutch?
Can I have a link to this?
Regards,
Sachin Shaju
sachi...@mstack.com
+919539887554
On Thu, Sep 29, 2016 at 11:13 PM, Mattmann, Chris A (3980) <
chris.a.mattm...@jpl.nasa.gov> wrote:
> Yep also check out the work that Sujen Shah just merged (also on my team
> at JPL and
> USC) where you can pub
Hi Ralf,
Do you mean the Open Graph Protocol [0] markup here?
If so, and if it is present within the page, then it is already parsed
out and stored within Parse [1] and can be accessed via Parse.getData().
Please use the ParserChecker to double-check this and, if necessary, post an
example here so that I can be corre
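Before running the ParserChecker, it can help to confirm that a page really carries Open Graph markup. A minimal standard-library sketch that collects og: properties from meta tags of the form property="og:..." content="..."; this is an independent check, not Nutch's HTML parser, and it says nothing about which metadata keys Nutch stores in ParseData.

```python
from html.parser import HTMLParser


class OpenGraphCollector(HTMLParser):
    """Collect Open Graph properties declared as <meta property="og:..." content="...">."""

    def __init__(self):
        super().__init__()
        self.properties = {}

    def handle_starttag(self, tag, attrs):
        # Self-closing <meta ... /> tags are routed here by the default handle_startendtag.
        if tag != "meta":
            return
        attributes = dict(attrs)
        prop = attributes.get("property") or ""
        if prop.startswith("og:") and attributes.get("content") is not None:
            self.properties[prop] = attributes["content"]


page = """<html><head>
<meta property="og:title" content="Example page">
<meta property="og:type" content="website">
<meta name="description" content="not Open Graph">
</head><body></body></html>"""

collector = OpenGraphCollector()
collector.feed(page)
print(collector.properties)
```

If this finds og: properties but Nutch's parse output does not show them, that difference is the kind of example worth posting to the list.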