Re: Unable to find documentation for Nutch 1.12, Wiki is outdated

Sebastian Greenholtz Mon, 01 Aug 2016 11:02:12 -0700

I struggled with the same thing recently. Nutch 1.12 does work with Solr
6.1.0, but you have to do two things differently.

1. The schema file that comes with Solr is originally named managed_schema
and it's stored in
${SOLR_HOME}/server/solr/configsets/managed_schema

This file should be renamed to schema.xml.
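
A minimal sketch of the rename, in case it helps. The function name rename_schema and its directory argument are hypothetical; pass whichever conf directory actually holds the schema in your install. Depending on the Solr version the file may be spelled managed-schema (with a hyphen), so the sketch checks both spellings.

```shell
# Hypothetical helper: rename Solr's managed schema file to schema.xml.
# $1 is the directory that holds the schema file (an assumption; adjust
# it to wherever the file lives in your Solr install).
rename_schema() {
    conf_dir="$1"
    # Try the underscore spelling from this thread, then the hyphenated one.
    for name in managed_schema managed-schema; do
        if [ -f "$conf_dir/$name" ]; then
            mv "$conf_dir/$name" "$conf_dir/schema.xml"
            echo "renamed $name to schema.xml"
            return 0
        fi
    done
    echo "no managed schema found in $conf_dir" >&2
    return 1
}
```

Call it as rename_schema /path/to/conf; it returns non-zero and leaves the directory untouched if neither file is present.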

2. To index with Solr, first start up Solr from the command line:

${SOLR_HOME}/bin/solr start -e cloud -noprompt

Solr should start up at localhost:8983/solr

To run the indexing:

${NUTCH_HOME}/bin/crawl -i -D solr.server.url=http://localhost:8983/solr/gettingstarted urls/ segments/ 2

Some of these parameters can be changed. They are explained here:
https://wiki.apache.org/nutch/bin/crawl

What isn't explained anywhere is that the solr.server.url value is the
base URL of the Solr admin, followed by a forward slash and the core name.
For the example project, the core is called gettingstarted.
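
As a rough illustration of how that value is put together, here is a small sketch. The variable names SOLR_BASE_URL, CORE_NAME and SOLR_SERVER_URL are my own, not anything Nutch or Solr defines; the values are the ones from the example above.

```shell
# Assumed values: the base URL Solr starts on and the example core name.
SOLR_BASE_URL="http://localhost:8983/solr"
CORE_NAME="gettingstarted"

# solr.server.url is the base URL, a forward slash, then the core name.
# ${SOLR_BASE_URL%/} strips one trailing slash, if any, to avoid "//".
SOLR_SERVER_URL="${SOLR_BASE_URL%/}/${CORE_NAME}"

echo "$SOLR_SERVER_URL"
```

The result can then be passed to bin/crawl as -D solr.server.url="$SOLR_SERVER_URL" in place of the literal URL.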

Hope that helps!

Sebastian

On Mon, Aug 1, 2016, 11:39 AM Ondřej Sojka <[email protected]> wrote:

> For the last three days, I've been struggling to make Nutch index one
> website into Solr. The tutorial on your wiki is extremely outdated and the
> command line tool doesn't work as expected. I think I may have managed to
> crawl the site, but not to index it into Solr. I'm trying to run bin/nutch
> solrindex crawl (the crawldb I previously passed to bin/crawl), but it
> returns just the solrindex help. From the help output, I think the
> crawldb is the only mandatory parameter.
>
> I think there must be another source of documentation besides the wiki
> for recent versions of Nutch, or is the wiki the only source of
> documentation? Which versions of Solr is Nutch 1.12 compatible with?
>
> Ondrej Sojka
>
