Re: indexing to Solr
Thanks for the info, I will try again with Solr 5.4.1! I think it would be helpful if the tutorial said which version(s) of Solr work with Nutch, perhaps calling attention to the ivy file you mentioned in your email. The download link in your "Setup Solr for Search" section currently points to a choice of 5.5.3 or 6.3.0. I ran into NUTCH-2267 on both of the Solr versions I tried to work with (6.3.0 and 5.5.3).

From: lewis john mcgibbney
To: "user@nutch.apache.org"
Sent: Monday, November 21, 2016 10:34 AM
Subject: Re: indexing to Solr

Hi Michael,

On Sat, Nov 19, 2016 at 8:09 AM, wrote:
> From: Michael Coffey
> To: "user@nutch.apache.org"
> Cc:
> Date: Fri, 18 Nov 2016 21:15:14 +0000 (UTC)
> Subject: indexing to Solr
>
> Where can I find up-to-date information on indexing to Solr?

http://wiki.apache.org/nutch/NutchTutorial, in particular https://wiki.apache.org/nutch/NutchTutorial#Step-by-Step:_Indexing_into_Apache_Solr
If you find any issues with this tutorial then please let us know. Thank you.

> When I search the web, I find tutorials that use the deprecated solrindex
> command. I also find questions where people want to know why it doesn't
> work.

That is because the only official documentation resides at http://wiki.apache.org/nutch/NutchTutorial

> I have a good nutch 1.12 installation on a working hadoop cluster and a
> Solr 6.3.0 installation which works for their gettingstarted example.

You should use the specified version of Solr for the Nutch release. This is Solr 5.4.1, as defined in the indexer-solr plugin ivy.xml.

> I have questions like:
> Do I need to create a core and a collection in solr?

Yes, I would. This is explained at https://wiki.apache.org/nutch/NutchTutorial#Setup_Solr_for_search

> Do I need http or cloud type server? Do I need solr.zookeeper.url?

This is not a Nutch question. This is your preferred Solr configuration. If you are just starting out then I would say it is not a big deal... experiment and go with what works best for your requirements and resource capacity.

> What else needs to be set in nutch-site.xml?

Not much. For reference though, here are the Solr configuration options: https://github.com/apache/nutch/blob/master/conf/nutch-default.xml#L1750-L1826

> What about schema?

This is covered in https://wiki.apache.org/nutch/NutchTutorial#Setup_Solr_for_search

> Thanks for all the help so far!

No problems. Any more issues, ping us here and we will help.

Ta
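The version advice above can be checked mechanically. What follows is a minimal sketch that assumes a Nutch source checkout in a hypothetical `NUTCH_SRC` directory, and also assumes the plugin declares its SolrJ dependency as `solr-solrj` with a `rev` attribute. `solr_version_for_nutch` reads the revision pinned in `src/plugin/indexer-solr/ivy.xml`, which Lewis gives as 5.4.1 for the current release. `index_to_solr` builds a `bin/nutch index` command, the replacement for the deprecated `solrindex`, and passes the Solr URL as the `solr.server.url` property through Hadoop's `-D` generic option. Several details are illustrative assumptions: the core URL `http://localhost:8983/solr/nutch`, the `crawl/` layout, both function names, and the `DRY_RUN` switch. Check the options against the tutorial and `nutch-default.xml` for your release.

```shell
#!/bin/sh
# Sketch only: match Solr to a Nutch 1.x release, then index a crawl.
# Hypothetical defaults: NUTCH_SRC points at a Nutch source checkout,
# the Solr core is called "nutch", and the crawl lives under ./crawl.

# Print the solr-solrj revision pinned in the indexer-solr plugin's ivy.xml.
solr_version_for_nutch() {
  ivy="${1:-${NUTCH_SRC:-.}/src/plugin/indexer-solr/ivy.xml}"
  # Keep the solr-solrj dependency line and extract its rev="..." value.
  grep 'name="solr-solrj"' "$ivy" |
    sed 's/.*rev="\([^"]*\)".*/\1/' |
    head -n 1
}

# Print (DRY_RUN=1) or run `bin/nutch index` for a Nutch 1.x crawl directory.
index_to_solr() {
  solr_url="${1:-http://localhost:8983/solr/nutch}"  # hypothetical core URL
  crawl_dir="${2:-crawl}"                            # hypothetical crawl dir
  # The Solr URL is passed as a Hadoop -D property; segments are taken
  # from every subdirectory of $crawl_dir/segments via -dir.
  set -- bin/nutch index -Dsolr.server.url="$solr_url" \
    "$crawl_dir/crawldb" -linkdb "$crawl_dir/linkdb" -dir "$crawl_dir/segments"
  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}
```

For example, `solr_version_for_nutch ~/nutch/src/plugin/indexer-solr/ivy.xml` prints the pinned Solr version. `DRY_RUN=1 index_to_solr` previews the indexing command before you run it for real from the Nutch runtime directory.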
RE: Nutch2 - What are exactly the steps to execute?
Hi Daniele,

In short, if I were you I would look into using the readdb resource: https://wiki.apache.org/nutch/bin/nutch%20readdb
This will enable you to take a peek into your MongoDB table and find out which documents are present. By the looks of your Gist, nothing is being fetched and therefore no outlinks are being parsed out... however, I may be wrong. You can check using the readdb resource above.

hth

On Sat, Nov 19, 2016 at 8:09 AM, wrote:
> From: Daniele Cremonini
> To:
> Cc:
> Date: Fri, 18 Nov 2016 15:28:49 +0100 (CET)
> Subject: Nutch2 - What are exactly the steps to execute?
>
> Hello,
>
> I installed and configured Nutch2 with MongoDB and Elasticsearch.
>
> I’m pretty convinced that the configuration is correct, but I don’t see how
> to invoke Nutch.
>
> In this page: https://wiki.apache.org/nutch/NutchTutorial there are, I
> think, enough details to call Nutch 1.x,
> but in this page: https://wiki.apache.org/nutch/Nutch2Tutorial the Invoke
> chapter is pretty poor.
>
> What I did:
>
> bin/nutch inject /apps/nutch-urls/
> bin/nutch generate -topN 40
> bin/nutch fetch -all
> bin/nutch parse -all
> bin/nutch updatedb -all
> bin/nutch index –all
>
> but Nutch never tries to index data; I know because I enriched the logging
> activity of ElasticIndexWriter a little bit.
>
> May anybody give me some ideas?
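Lewis's suggestion can be folded into the sequence Daniele ran. This minimal sketch repeats the same Nutch 2.x commands and finishes with `bin/nutch readdb -stats`. That command reports counts from the WebPage table (the MongoDB collection here), so you can see whether anything was fetched at all before suspecting the indexer. `run`, `crawl_round` and the `DRY_RUN` switch are hypothetical helpers. The seed path and `-topN 40` are copied from the quoted message. Every flag is written with a plain ASCII hyphen. The quoted `index` command shows an en dash (`–all`), and `bin/nutch` may not treat that as the `-all` option.

```shell
#!/bin/sh
# Sketch only: one Nutch 2.x crawl round, then a look at the WebPage table.
# Assumptions (hypothetical): run from the Nutch 2.x runtime directory so
# bin/nutch exists, the default crawlId is used, seeds live in $1.

# Print the command (DRY_RUN=1) or execute it, reporting a failure on stderr.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf '%s\n' "$*"
  else
    "$@" || { echo "failed: $*" >&2; return 1; }
  fi
}

# inject -> generate -> fetch -> parse -> updatedb -> index, stopping at the
# first failing step; readdb -stats then summarises the WebPage table so you
# can see whether any pages were actually fetched before indexing.
crawl_round() {
  seeds="${1:-/apps/nutch-urls/}"
  top_n="${2:-40}"
  run bin/nutch inject "$seeds" &&
    run bin/nutch generate -topN "$top_n" &&
    run bin/nutch fetch -all &&
    run bin/nutch parse -all &&
    run bin/nutch updatedb -all &&
    run bin/nutch index -all &&
    run bin/nutch readdb -stats
}
```

`DRY_RUN=1 crawl_round` prints the seven commands without running them. `crawl_round /apps/nutch-urls/ 40` executes them and stops at the first step that fails.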