Re: indexing to Solr

2016-11-21 Thread Michael Coffey
Thanks for the info, I will try again with Solr 5.4.1!
I think it would be helpful if the tutorial said which version(s) of Solr
work with Nutch, perhaps calling attention to the ivy file you mentioned in
your email. The download link in the "Setup Solr for Search" section
currently points to a choice of 5.5.3 or 6.3.0. I ran into NUTCH-2267 with
both of the Solr versions I tried (6.3.0 and 5.5.3).


RE: Nutch2 - What are exactly the steps to execute?

2016-11-21 Thread lewis john mcgibbney
Hi Daniele,
In short, if I were you I would look into using the readdb resource
https://wiki.apache.org/nutch/bin/nutch%20readdb
This will enable you to take a peek into your MongoDB table and find out
which documents are present. By the looks of it from your Gist nothing is
being fetched and therefore no outlinks are being parsed out... however I
may be wrong. You can check using the readdb resource as above.
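
A minimal sketch of that check, assuming a Nutch 2.x runtime where readdb
accepts -stats and -dump (the options differ from the 1.x page linked above,
so confirm them with the usage message printed by a bare bin/nutch readdb).
The wrapper function, the NUTCH_BIN variable and the readdb-dump directory
name are hypothetical, used here only for illustration.

```shell
# Path to the Nutch launcher script; bin/nutch relative to the runtime
# directory is an assumption, override NUTCH_BIN if yours lives elsewhere.
NUTCH_BIN="${NUTCH_BIN:-bin/nutch}"

# Hypothetical wrapper: run a readdb option, or report why it cannot run.
readdb() {
  if [ -x "$NUTCH_BIN" ]; then
    "$NUTCH_BIN" readdb "$@"
  else
    echo "no executable Nutch script at $NUTCH_BIN" >&2
    return 1
  fi
}

# Status summary for the whole web table.
readdb -stats || echo "readdb -stats skipped"

# Dump the rows to a local directory for inspection
# (the output directory name is illustrative).
readdb -dump readdb-dump || echo "readdb -dump skipped"
```

If the summary shows no fetched rows, the fetch step is the place to look
before worrying about the index step.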
hth

On Sat, Nov 19, 2016 at 8:09 AM,  wrote:

> From: Daniele Cremonini 
> To: 
> Cc:
> Date: Fri, 18 Nov 2016 15:28:49 +0100 (CET)
> Subject: Nutch2 - What are exactly the steps to execute?
> Hello,
>
> I installed and configured Nutch2 with MongoDB and Elasticsearch.
>
> I’m pretty convinced that the configuration is correct, but I don’t see
> how to invoke Nutch.
>
> On this page: https://wiki.apache.org/nutch/NutchTutorial there are, I
> think, enough details to invoke Nutch 1.x,
> but on this page: https://wiki.apache.org/nutch/Nutch2Tutorial the Invoke
> chapter is pretty sparse.
>
> What I did:
>
> bin/nutch inject /apps/nutch-urls/
> bin/nutch generate -topN 40
> bin/nutch fetch -all
> bin/nutch parse -all
> bin/nutch updatedb -all
> bin/nutch index -all
>
> but Nutch never tries to index data. I know because I added some extra
> logging to ElasticIndexWriter.
>
> Can anybody give me some ideas?
>
>


Re: indexing to Solr

2016-11-21 Thread lewis john mcgibbney
Hi Michael,

On Sat, Nov 19, 2016 at 8:09 AM,  wrote:

> From: Michael Coffey 
> To: "user@nutch.apache.org" 
> Cc:
> Date: Fri, 18 Nov 2016 21:15:14 +0000 (UTC)
> Subject: indexing to Solr
> Where can I find up-to-date information on indexing to Solr?


http://wiki.apache.org/nutch/NutchTutorial
in particular
https://wiki.apache.org/nutch/NutchTutorial#Step-by-Step:_Indexing_into_Apache_Solr
If you find any issues with this tutorial then please let us know. Thank
you.


> When I search the web, I find tutorials that use the deprecated solrindex
> command. I also find questions where people want to know why it doesn't
> work.
>

That is because the only official documentation resides at
http://wiki.apache.org/nutch/NutchTutorial


> I have a good nutch 1.12 installation on a working hadoop cluster and a
> Solr 6.3.0 installation which works for their gettingstarted example.
>

You should use the version of Solr specified for the Nutch release. This is
Solr 5.4.1, as defined in the indexer-solr plugin ivy.xml.
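
One way to check this for whichever Nutch release you have is to read the
SolrJ revision out of the plugin's ivy.xml. The sketch below is an
assumption-laden illustration: the helper name solrj_rev is hypothetical, and
the sample dependency line is a stand-in for what the real file is expected
to contain, not a copy of it.

```shell
# Hypothetical helper: print the rev attribute of the solr-solrj dependency
# found in the ivy.xml file given as the first argument.
solrj_rev() {
  sed -n 's/.*name="solr-solrj"[^>]*rev="\([^"]*\)".*/\1/p' "$1"
}

# Demo against a stand-in file; the dependency line is an assumed shape.
sample_ivy=$(mktemp)
cat > "$sample_ivy" <<'EOF'
<ivy-module version="1.0">
  <dependencies>
    <dependency org="org.apache.solr" name="solr-solrj" rev="5.4.1"/>
  </dependencies>
</ivy-module>
EOF

solrj_rev "$sample_ivy"
rm -f "$sample_ivy"
```

Against a source checkout the call would look like
solrj_rev src/plugin/indexer-solr/ivy.xml, assuming that is where the plugin
keeps its ivy file in your tree.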


> I have questions like: Do I need to create a core and a collection in Solr?


Yes, I would. This is explained at
https://wiki.apache.org/nutch/NutchTutorial#Setup_Solr_for_search


> Do I need an http or cloud type server? Do I need solr.zookeeper.url?
>

This is not a Nutch question; it depends on your preferred Solr
configuration. If you are just starting out then I would say it is not a big
deal... experiment and go with what works best for your requirements and
resource capacity.


> What else needs to be set in nutch-site.xml?
>

Not much. For reference though, here are the Solr configuration options.
https://github.com/apache/nutch/blob/master/conf/nutch-default.xml#L1750-L1826
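
As an illustration, the Solr URL can also be passed on the command line
instead of being set in nutch-site.xml. The sketch below only prints the
command so it can be reviewed first. The core name "nutch", the crawl
directory "crawl" and the helper name index_cmd are assumptions, and the
command expects the replacement for the deprecated solrindex to be
bin/nutch index with a -dir option for the segments directory, so check the
usage message of your release before running it.

```shell
# Assumed values; adjust to your own setup.
SOLR_URL="http://localhost:8983/solr/nutch"   # hypothetical core named "nutch"
CRAWL_DIR="crawl"                              # hypothetical crawl directory

# Hypothetical helper: build the index command for a Solr URL ($1) and a
# crawl directory ($2) without running it.
index_cmd() {
  printf 'bin/nutch index -Dsolr.server.url=%s %s/crawldb -linkdb %s/linkdb -dir %s/segments\n' \
    "$1" "$2" "$2" "$2"
}

index_cmd "$SOLR_URL" "$CRAWL_DIR"
```

Setting solr.server.url in nutch-site.xml has the same effect and avoids
repeating the URL on every run.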


> What about schema?
>

This is covered in
https://wiki.apache.org/nutch/NutchTutorial#Setup_Solr_for_search


>
> Thanks for all the help so far!
>
>
No problems. Any more issues, ping us here and we will help.
Ta