Here is an issue with the official Nutch tutorial.
In the "Setup Solr for search" section, it gives the following instructions:
* download binary file from here
* unzip to $HOME/apache-solr, we will now refer to this as ${APACHE_SOLR_HOME}
* cd ${APACHE_SOLR_HOME}/example
* java -jar start.jar
Unfortunately, there is no start.jar in the example directory. When I instead
try to use the start.jar in the server directory, Java says "WARNING: Nothing
to start, exiting ..."
You need something like the following to start Solr:
$APACHE_SOLR_HOME/bin/solr start -e cloud -noprompt
In this case, I am using Solr 5.4.1.
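For comparison, here is a minimal sketch that brings Solr 5.4.1 up as a single standalone node through the bin/solr control script, instead of the SolrCloud example started by the command above. Assumptions: Solr is unpacked at $APACHE_SOLR_HOME (defaulting to $HOME/apache-solr, as in the tutorial), and the function name start_solr_for_nutch and the core name "nutch" are hypothetical. It does not install the Nutch schema. The tutorial covers that step separately.

```shell
#!/bin/sh
# Hypothetical helper: start Solr 5.x standalone and create a core for Nutch.
# Assumes Solr was unpacked to $APACHE_SOLR_HOME (default: $HOME/apache-solr).
start_solr_for_nutch() {
  solr_home="${APACHE_SOLR_HOME:-$HOME/apache-solr}"
  core="${1:-nutch}"   # hypothetical core name
  # Solr 5.x is started with bin/solr, not "java -jar start.jar" in example/.
  # This starts one standalone node in the background (port 8983 by default).
  "$solr_home/bin/solr" start || return 1
  # Create a core on that node. Nutch's schema still has to be applied separately.
  "$solr_home/bin/solr" create -c "$core" || return 1
}
```

Usage, under the same assumptions: `start_solr_for_nutch nutch`, after which the core would be reachable at http://localhost:8983/solr/nutch. Whether you run standalone or SolrCloud is a Solr configuration choice, as discussed later in this thread.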
Also, as mentioned previously, the tutorial says nothing about which version of
Solr to use.
From: lewis john mcgibbney <[email protected]>
To: "[email protected]" <[email protected]>
Sent: Monday, November 21, 2016 10:34 AM
Subject: Re: indexing to Solr
Hi Michael,
On Sat, Nov 19, 2016 at 8:09 AM, <[email protected]> wrote:
> From: Michael Coffey <[email protected]>
> To: "[email protected]" <[email protected]>
> Cc:
> Date: Fri, 18 Nov 2016 21:15:14 +0000 (UTC)
> Subject: indexing to Solr
> Where can I find up-to-date information on indexing to Solr?
http://wiki.apache.org/nutch/NutchTutorial
in particular
https://wiki.apache.org/nutch/NutchTutorial#Step-by-Step:_Indexing_into_Apache_Solr
If you find any issues with this tutorial then please let us know. Thank
you.
> When I search the web, I find tutorials that use the deprecated solrindex
> command. I also find questions where people want to know why it doesn't
> work.
>
That is because the only official documentation resides at
http://wiki.apache.org/nutch/NutchTutorial
> I have a good nutch 1.12 installation on a working hadoop cluster and a
> Solr 6.3.0 installation which works for their gettingstarted example.
>
You should use the version of Solr specified for your Nutch release. For
Nutch 1.12 this is Solr 5.4.1, as defined in the indexer-solr plugin's ivy.xml.
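If you are unsure which Solr version a particular Nutch release expects, one way to check is to read it straight from that Ivy file. A minimal sketch, assuming a Nutch source checkout where the file sits at src/plugin/indexer-solr/ivy.xml and the solr-solrj dependency, including its rev attribute, is declared on a single line. The function name solr_version_for_nutch is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the solr-solrj revision pinned by the
# indexer-solr plugin. Assumes the <dependency ... name="solr-solrj" rev="..."/>
# declaration is on one line of the Ivy file.
solr_version_for_nutch() {
  ivy="${1:-src/plugin/indexer-solr/ivy.xml}"
  # Keep only the solr-solrj line, then extract the value of rev="...".
  grep 'name="solr-solrj"' "$ivy" | sed -n 's/.*rev="\([^"]*\)".*/\1/p' | head -n 1
}
```

Run from the top of a Nutch checkout, `solr_version_for_nutch` would print the pinned version, for example 5.4.1 for the release discussed here.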
> I have questions like:
> Do I need to create a core and a collection in Solr?
Yes, I would. This is explained at
https://wiki.apache.org/nutch/NutchTutorial#Setup_Solr_for_search
> Do I need http or cloud type server?
> Do I need solr.zookeeper.url?
>
This is not a Nutch question; it comes down to your preferred Solr
configuration. If you are just starting out, I would say it is not a big
deal: experiment and go with whatever best suits your requirements and
resource capacity.
> What else needs to be set in nutch-site.xml?
>
Not much. For reference though, here are the Solr configuration options.
https://github.com/apache/nutch/blob/master/conf/nutch-default.xml#L1750-L1826
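As an illustration of how one of those options is commonly used, solr.server.url can also be passed as a -D property on the bin/nutch command line instead of being set in nutch-site.xml. A minimal sketch under these assumptions: it runs from a Nutch 1.x runtime directory containing bin/nutch, the crawl is laid out as crawl/crawldb, crawl/linkdb and crawl/segments, and the core URL is http://localhost:8983/solr/nutch. The function name index_crawl_to_solr and the NUTCH_BIN override are hypothetical. Please check the flags against the usage message that bin/nutch index prints on your release.

```shell
#!/bin/sh
# Hypothetical helper: index every segment of a crawl into one Solr core.
# Assumes a Nutch 1.x runtime dir; NUTCH_BIN is a hypothetical override
# for the path to the nutch script (default: bin/nutch).
index_crawl_to_solr() {
  solr_url="${1:-http://localhost:8983/solr/nutch}"   # hypothetical core URL
  crawl_dir="${2:-crawl}"
  nutch="${NUTCH_BIN:-bin/nutch}"
  # -Dsolr.server.url sets the Solr endpoint for this run only.
  # -linkdb supplies inlink anchor text; -dir indexes all segments in the directory.
  "$nutch" index -Dsolr.server.url="$solr_url" \
    "$crawl_dir/crawldb" -linkdb "$crawl_dir/linkdb" -dir "$crawl_dir/segments"
}
```

Putting the same property in nutch-site.xml instead makes it apply to every run, which is usually more convenient once the Solr URL is settled.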
> What about schema?
>
This is covered in
https://wiki.apache.org/nutch/NutchTutorial#Setup_Solr_for_search
>
> Thanks for all the help so far!
>
>
No problem. If you have any more issues, ping us here and we will help.
Ta