RE: Nutch2 - What are exactly the steps to execute?

2016-11-21, lewis john mcgibbney
Hi Daniele, in short, if I were you I would look into using the readdb resource (https://wiki.apache.org/nutch/bin/nutch%20readdb). This will let you take a peek into your MongoDB table and find out which documents are present. By the looks of it from your Gist, nothing is being fetched and
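To act on that suggestion from a script, the readdb statistics can be captured programmatically. The sketch below only assembles and runs the command. It assumes the Nutch 2.x form of `readdb` (the WebTableReader job), which takes `-stats` and an optional `-crawlId`, rather than the 1.x form that expects a CrawlDb path. The `nutch_bin` path and the crawl id are placeholders; running `bin/nutch readdb` with no arguments prints the options your build actually accepts.

```python
import subprocess
from typing import List, Optional


def readdb_stats_command(nutch_bin: str, crawl_id: Optional[str] = None) -> List[str]:
    """Build the argv for `bin/nutch readdb -stats` (Nutch 2.x form, assumed)."""
    cmd = [nutch_bin, "readdb", "-stats"]
    if crawl_id:
        # Assumed: the crawl id must match the one used for inject/generate.
        cmd += ["-crawlId", crawl_id]
    return cmd


def run_readdb_stats(nutch_bin: str, crawl_id: Optional[str] = None) -> str:
    """Run the command and return stdout plus stderr for inspection."""
    result = subprocess.run(
        readdb_stats_command(nutch_bin, crawl_id),
        capture_output=True,
        text=True,
        check=False,  # keep the output even if the job exits non-zero
    )
    return result.stdout + result.stderr
```

For example, `print(run_readdb_stats("bin/nutch", "mycrawl"))`, where `mycrawl` is a hypothetical crawl id. Depending on the log4j configuration, the statistics may also (or only) appear in `logs/hadoop.log`.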

RE: Nutch2 - What are exactly the steps to execute?

2016-11-18, Daniele Cremonini
Thank you, Tom and Marty. Here is the snippet for configuring the plugins:
plugin.includes: protocol-http|urlfilter-regex|parse-(html|tika)|index-(basic|anchor)|indexer-elastic|scoring-opic|urlnormalizer-(pass|regex|basic)
Regular expression
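Nutch treats `plugin.includes` as one regular expression and loads a plugin only when the whole plugin id matches it (Java's `Matcher.matches()`). The sketch below mimics that check in Python with `re.fullmatch`. The plugin ids passed to it are illustrative, not read from a real Nutch installation. Because the value is used as a single pattern, any stray whitespace left inside it (for example from wrapping the XML) becomes part of the regex.

```python
import re
from typing import Iterable, List

# Pattern based on the plugin.includes value quoted in the message.
PLUGIN_INCLUDES = (
    "protocol-http|urlfilter-regex|parse-(html|tika)|index-(basic|anchor)"
    "|indexer-elastic|scoring-opic|urlnormalizer-(pass|regex|basic)"
)


def included_plugins(pattern: str, plugin_ids: Iterable[str]) -> List[str]:
    """Return the ids that the whole pattern matches, like Matcher.matches()."""
    compiled = re.compile(pattern)
    return [pid for pid in plugin_ids if compiled.fullmatch(pid)]
```

For instance, `included_plugins(PLUGIN_INCLUDES, ["protocol-httpclient"])` returns an empty list, because `protocol-http` only matches that exact id, and `included_plugins("index-(basic|anchor )", ["index-anchor"])` is also empty because of the space before `)`.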

RE: Nutch2 - What are exactly the steps to execute?

2016-11-18, Marty-Scott Sainty (NWIS - Software Development)
Hi Tom, make sure you have specified the Elasticsearch indexer plugin, indexer-elastic, in plugin.includes in /conf/nutch-site.xml.
-----Original Message-----
From: Tom Chiverton [mailto:t...@extravision.com]
Sent: 18 November 2016 15:38
To: user@nutch.apache.org
Subject: Re:
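One way to verify that advice without reading the XML by eye is to parse `nutch-site.xml` and test the `plugin.includes` value directly. This is a minimal sketch assuming the usual Hadoop-style layout (a `<configuration>` element containing `<property>` elements with `<name>` and `<value>` children); like Nutch, it requires the plugin id to match the whole pattern. `SAMPLE_SITE_XML` is a made-up minimal file, not the poster's configuration.

```python
import re
import xml.etree.ElementTree as ET
from typing import Optional


def get_property(xml_text: str, name: str) -> Optional[str]:
    """Return the <value> of the <property> whose <name> equals `name`, if any."""
    root = ET.fromstring(xml_text)
    for prop in root.iter("property"):
        if (prop.findtext("name") or "").strip() == name:
            return (prop.findtext("value") or "").strip()
    return None


def plugin_enabled(xml_text: str, plugin_id: str) -> bool:
    """True if plugin.includes is set here and its regex fully matches plugin_id."""
    includes = get_property(xml_text, "plugin.includes")
    return includes is not None and re.fullmatch(includes, plugin_id) is not None


# Hypothetical minimal nutch-site.xml used for illustration only.
SAMPLE_SITE_XML = """<?xml version="1.0"?>
<configuration>
  <property>
    <name>plugin.includes</name>
    <value>protocol-http|urlfilter-regex|parse-(html|tika)|indexer-elastic</value>
  </property>
</configuration>"""
```

`plugin_enabled(SAMPLE_SITE_XML, "indexer-elastic")` is true for the sample. A `False` result can also mean the property is simply absent from `nutch-site.xml`, in which case the default from `nutch-default.xml` applies; this check does not read that file.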

Re: Nutch2 - What are exactly the steps to execute?

2016-11-18, Tom Chiverton
Please post the output of each step. You might want to use something like a GitHub Gist for that, as it could be fairly long for an email.
Tom
On 18/11/16 14:28, Daniele Cremonini wrote:
Hello, I installed and configured Nutch2 with MongoDB and Elasticsearch. I’m pretty convinced that the
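Posting the output of each step is easier if every step writes to its own log file, which can then be pasted into a Gist. The sketch below assumes one round of the individual Nutch 2.x jobs `inject`, `generate`, `fetch`, `parse`, `updatedb` and `index`, with `-all` selecting every batch and `-crawlId` naming the crawl. These argument forms are assumed from the 2.x usage messages and should be checked by running each command without arguments. The seed directory, crawl id, `-topN` value and log directory are placeholders.

```python
import subprocess
from pathlib import Path
from typing import List, Tuple

Step = Tuple[str, List[str]]


def crawl_steps(seed_dir: str, crawl_id: str, top_n: int = 50) -> List[Step]:
    """One crawl round as (step name, bin/nutch arguments); flags are assumed."""
    common = ["-crawlId", crawl_id]
    return [
        ("inject", ["inject", seed_dir] + common),
        ("generate", ["generate", "-topN", str(top_n)] + common),
        ("fetch", ["fetch", "-all"] + common),
        ("parse", ["parse", "-all"] + common),
        ("updatedb", ["updatedb", "-all"] + common),
        ("index", ["index", "-all"] + common),
    ]


def run_and_log(nutch_bin: str, steps: List[Step], log_dir: str = "nutch-logs") -> None:
    """Run each step in order, writing stdout/stderr to <log_dir>/<nn>-<step>.log."""
    out = Path(log_dir)
    out.mkdir(parents=True, exist_ok=True)
    for i, (name, args) in enumerate(steps, start=1):
        result = subprocess.run([nutch_bin] + args, capture_output=True, text=True)
        log_file = out / f"{i:02d}-{name}.log"
        log_file.write_text(result.stdout + result.stderr)
        if result.returncode != 0:
            # Stop at the first failing step; its log is the one worth posting.
            print(f"{name} failed with exit code {result.returncode}; see {log_file}")
            break
```

A call such as `run_and_log("bin/nutch", crawl_steps("urls", "testcrawl"))` leaves numbered logs (`01-inject.log`, `02-generate.log`, and so on) in `nutch-logs/`, so it is clear at which step the crawl stopped producing output.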