Thank you Tom and Marty,

Here is the snippet for configuring the plugin:

        <!-- activate the elasticsearch indexer plugin  -->
        <property>
                <name>plugin.includes</name>
                <value>protocol-http|urlfilter-regex|parse-(html|tika)|index-(basic|anchor)|indexer-elastic|scoring-opic|urlnormalizer-(pass|regex|basic)</value>
                <description>Regular expression naming plugin directory names to
                include.  Any plugin not matching this expression is excluded.
                In any case you need at least include the nutch-extensionpoints plugin. By
                default Nutch includes crawling just HTML and plain text via HTTP,
                and basic indexing and search plugins. In order to use HTTPS please enable
                protocol-httpclient, but be aware of possible intermittent problems with the
                underlying commons-httpclient library.
                </description>
        </property>
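
Because plugin.includes is a regular expression matched against plugin directory names, it can be worth checking which names a given value actually selects. The sketch below approximates that check with `grep -Ex` (POSIX extended regex, whole-line match). That is an assumption: Nutch itself uses Java regular expressions, although for a plain alternation like this one the two should agree. The candidate names passed at the end are examples only, not a listing of the plugins directory.

```shell
#!/bin/sh
# Value of plugin.includes, written on one line (no line break inside it).
includes='protocol-http|urlfilter-regex|parse-(html|tika)|index-(basic|anchor)|indexer-elastic|scoring-opic|urlnormalizer-(pass|regex|basic)'

# Print the candidate plugin names that match the whole expression.
# Assumption: grep -Ex stands in for Java's full-string regex match.
selected_plugins() {
  printf '%s\n' "$@" | grep -Ex "($includes)"
}

# Candidate names are illustrative.
selected_plugins protocol-http protocol-httpclient indexer-elastic indexer-solr index-anchor
```

With these candidates, protocol-httpclient and indexer-solr are not printed, since the whole name has to match one of the alternatives.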

And here is the Gist:
https://gist.github.com/dcremonini/563e612e9d5c7051ea31c3a7fd9f5966

One thing, among others, that I may be missing is the invertlinks step.
Cheers
Daniele Cremonini


-----Original Message-----
From: Marty-Scott Sainty (NWIS - Software Development) [mailto:[email protected]]
Sent: Friday, 18 November 2016 16:44
To: [email protected]
Subject: RE: Nutch2 - What are exactly the steps to execute?

Hi Tom,

Make sure you have specified the Elasticsearch indexer plugin in
/conf/nutch-site.xml:

  <property>
    <name>plugin.includes</name>
    <value>indexer-elastic</value>
  </property>


-----Original Message-----
From: Tom Chiverton [mailto:[email protected]]
Sent: 18 November 2016 15:38
To: [email protected]
Subject: Re: Nutch2 - What are exactly the steps to execute?

Please post the output of each step.

You might want to use something like a GitHub Gist for that as it could be
fairly long over email.
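
One way to collect that output is to run each step through a small wrapper that writes one log file per step. This is only a sketch: `NUTCH`, `LOG_DIR`, `run_step` and `run_all` are made-up names, and the commands are the ones from the original question quoted below.

```shell
#!/bin/sh
# Assumptions: run from the Nutch runtime directory; NUTCH and LOG_DIR are
# illustrative variable names, not Nutch settings.
NUTCH="${NUTCH:-bin/nutch}"
LOG_DIR="${LOG_DIR:-logs/steps}"

# Run one Nutch command and save its stdout and stderr to
# $LOG_DIR/<command>.log, together with the exit status.
run_step() {
  step="$1"
  mkdir -p "$LOG_DIR"
  echo "== $NUTCH $*" > "$LOG_DIR/$step.log"
  "$NUTCH" "$@" >> "$LOG_DIR/$step.log" 2>&1
  status=$?
  echo "== exit status: $status" >> "$LOG_DIR/$step.log"
  return "$status"
}

# The six steps from the original question, stopping at the first failure.
run_all() {
  run_step inject /apps/nutch-urls/ &&
  run_step generate -topN 40 &&
  run_step fetch -all &&
  run_step parse -all &&
  run_step updatedb -all &&
  run_step index -all
}

# Only run the crawl when asked to, so the file can also be sourced.
if [ "${1:-}" = "--run" ]; then
  run_all
fi
```

The resulting files (inject.log, generate.log and so on) can then be pasted into a Gist one per step.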

Tom


On 18/11/16 14:28, Daniele Cremonini wrote:
> Hello,
>
> I installed and configured Nutch2 with MongoDB and Elasticsearch.
>
> I'm fairly sure the configuration is correct, but I don't see how to
> invoke Nutch.
>
> The page https://wiki.apache.org/nutch/NutchTutorial gives, I think,
> enough detail to run Nutch 1.x, but on
> https://wiki.apache.org/nutch/Nutch2Tutorial the Invoke chapter is
> pretty thin.
>
> What I did:
>
> bin/nutch inject /apps/nutch-urls/
> bin/nutch generate -topN 40
> bin/nutch fetch -all
> bin/nutch parse -all
> bin/nutch updatedb -all
> bin/nutch index -all
>
> but Nutch never tries to index any data. I know this because I added
> some extra logging to ElasticIndexWriter.
>
> Could anybody give me some ideas?
>
> Thanks
> Daniele
