Re: Unable to find documentation for Nutch 1.12, Wiki is outdated

Alexandre Rafalovitch Mon, 01 Aug 2016 17:06:06 -0700

If you run Solr in cloud mode, replacing the files on the filesystem
after the bin/solr start command has no effect, because the
configuration now lives in ZooKeeper. The files need to be uploaded
with Solr's zkcli.sh script.


So, the new instructions need to refer to that as well.
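
A minimal sketch of that upload step, assuming the zkcli.sh script that
ships under server/scripts/cloud-scripts and the embedded ZooKeeper of
the cloud example on localhost:9983 (Solr port plus 1000). The install
path, the config directory my_nutch_conf and the config name
gettingstarted are assumptions for illustration, not values confirmed in
this thread. The script only prints the command so it can be checked
before it is run:

```shell
#!/bin/sh
# Dry-run sketch: build the command that uploads an edited config
# directory to ZooKeeper. All four values below are assumptions.
SOLR_HOME="${SOLR_HOME:-/opt/solr}"
ZK_HOST="${ZK_HOST:-localhost:9983}"    # embedded ZooKeeper of the cloud example
CONF_DIR="${CONF_DIR:-$SOLR_HOME/server/solr/configsets/my_nutch_conf/conf}"  # hypothetical directory
CONF_NAME="${CONF_NAME:-gettingstarted}"  # assumed config name in ZooKeeper

# Print the upconfig command line instead of executing it.
upconfig_cmd() {
    printf '%s -zkhost %s -cmd upconfig -confdir %s -confname %s\n' \
        "$SOLR_HOME/server/scripts/cloud-scripts/zkcli.sh" \
        "$ZK_HOST" "$CONF_DIR" "$CONF_NAME"
}

upconfig_cmd
```

After a real upload the collection most likely has to be reloaded
(Collections API, action=RELOAD) before the new schema is picked up.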

On the other hand, if only a single server is used, the configuration
can be supplied when the core is created, with bin/solr
create_core -c collectionName -d configTemplateDirPath
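
As a sketch of the single-server case, assuming Solr is installed under
/opt/solr, the core is to be called nutch, and the template is the
basic_configs configset shipped with Solr 6.x (all three are
placeholders, not values from this thread). Again the command is only
printed:

```shell
#!/bin/sh
# Dry-run sketch: build the create_core command for a standalone Solr.
# The three values below are assumptions; substitute your own.
SOLR_HOME="${SOLR_HOME:-/opt/solr}"
CORE_NAME="${CORE_NAME:-nutch}"    # hypothetical core name
CONF_TEMPLATE="${CONF_TEMPLATE:-$SOLR_HOME/server/solr/configsets/basic_configs}"

# Print the create_core command line instead of executing it.
create_core_cmd() {
    printf '%s create_core -c %s -d %s\n' \
        "$SOLR_HOME/bin/solr" "$CORE_NAME" "$CONF_TEMPLATE"
}

create_core_cmd
```

Put the Nutch schema.xml into a copy of the template directory before
creating the core, so that the core starts out with it.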

Regards,
   Alex.
----
Newsletter and resources for Solr beginners and intermediates:
http://www.solr-start.com/


On 2 August 2016 at 04:01, Sebastian Greenholtz <[email protected]> wrote:
> I struggled with the same thing recently. Nutch 1.12 does work with Solr
> 6.1.0, but you have to do two things differently.
>
> 1. The schema file that comes with Solr is originally named managed_schema
> and it's stored in
> ${SOLR_HOME}/server/solr/configsets/managed_schema
>
> This file should be renamed to schema.xml.
>
> 2. To index with Solr, first start up Solr from the command line:
>
> ${SOLR_HOME}/bin/solr start -e cloud -noprompt
>
> Solr should start up at localhost:8983/solr
>
> To run the indexing:
>
> ${NUTCH_HOME}/bin/crawl -I -D solr.server.url=http://localhost:8983/solr/gettingstarted urls/ segments/ 2
>
> Some of these parameters can be changed. They are explained here:
> https://wiki.apache.org/nutch/bin/crawl
>
> The thing that isn't explained anywhere is that your solr.server.url value
> is the base url for Solr admin with the core name after the forward slash.
> For the example project, the core is called gettingstarted.
>
> Hope that helps!
>
> Sebastian
>
> On Mon, Aug 1, 2016, 11:39 AM Ondřej Sojka <[email protected]> wrote:
>
>> For the last three days, I've been struggling to make Nutch index one
>> website into Solr. The tutorial on your wiki is extremely outdated and the
>> command line tool doesn't work as expected. I think I may have managed to
>> crawl the site, but not to index it into Solr. I'm trying to run bin/nutch
>> solrindex crawl (crawl being the crawldb I previously passed to bin/crawl),
>> but it just prints the solrindex help. Judging by that help output, the
>> crawldb is the only mandatory parameter.
>>
>> I think there must be another source of documentation besides the wiki
>> for recent versions of Nutch, or is the wiki the only source of
>> documentation? Which versions of Solr is Nutch 1.12 compatible with?
>>
>> Ondrej Sojka
>>
