Re: Unable to find documentation for Nutch 1.12, Wiki is outdated

Mattmann, Chris A (3980) Mon, 01 Aug 2016 11:05:27 -0700

Great work, Sebastian, thank you for this. Would you be willing to
update the wiki with this info? Please let me know your username
and I will grant you permissions.


Cheers,
Chris

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Chief Architect
Instrument Software and Science Data Systems Section (398)
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 168-519, Mailstop: 168-527
Email: [email protected]
WWW:  http://sunset.usc.edu/~mattmann/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Director, Information Retrieval and Data Science Group (IRDS)
Adjunct Associate Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
WWW: http://irds.usc.edu/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

On 8/1/16, 11:01 AM, "Sebastian Greenholtz" <[email protected]> wrote:

>I struggled with the same thing recently. Nutch 1.12 does work with Solr
>6.1.0, but you have to do two things differently.
>
>1. The schema file that comes with Solr is originally named managed_schema
>and it's stored in
>${SOLR_HOME}/server/solr/configsets/managed_schema
>
>This file should be renamed to schema.xml.
>
>2. To index with Solr, first start Solr from the command line:
>
>${SOLR_HOME}/bin/solr start -e cloud -noprompt
>
>Solr should start up at localhost:8983/solr
>
>To run the indexing:
>
>${NUTCH_HOME}/bin/crawl -i -D solr.server.url=http://localhost:8983/solr/gettingstarted urls/ segments/ 2
>
>Some of these parameters can be changed. They are explained here:
>https://wiki.apache.org/nutch/bin/crawl
>
>The thing that isn't explained anywhere is that the solr.server.url value
>is the Solr base URL followed by a forward slash and the core name.
>For the example project, the core is called gettingstarted.
>
>Hope that helps!
>
>Sebastian
>
>On Mon, Aug 1, 2016, 11:39 AM Ondřej Sojka <[email protected]> wrote:
>
>> For the last three days, I've been struggling to make Nutch index one
>> website into Solr. The tutorial on your wiki is extremely outdated and the
>> command line tool doesn't work as expected. I think I have managed to
>> crawl the site, but not to index it into Solr. I'm trying to run bin/nutch
>> solrindex crawl (the crawldb I previously passed to bin/crawl), but it
>> returns just the solrindex help. Going by that help output, I would
>> think the crawldb is the only mandatory parameter.
>>
>> I think there must be another source of documentation besides the wiki
>> for recent versions of Nutch, or is the wiki the only source of
>> documentation? Which versions of Solr is Nutch 1.12 compatible with?
>>
>> Ondrej Sojka
>>
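
To illustrate the solr.server.url point from Sebastian's reply, here is a
minimal shell sketch that joins a Solr base URL and a core name. The helper
name solr_core_url is made up for illustration and is not part of Nutch or
Solr. The base URL and core name are the ones from the example in the thread,
so adjust them for your own setup.

```shell
#!/bin/sh
# Sketch: build the solr.server.url value as <Solr base URL>/<core name>.
# solr_core_url is a hypothetical helper, not part of Nutch or Solr.
solr_core_url() {
    base=${1%/}                  # drop one trailing slash, if present
    printf '%s/%s\n' "$base" "$2"
}

# Values taken from the example in the thread (assumed local Solr install).
url=$(solr_core_url "http://localhost:8983/solr/" "gettingstarted")

# Print the property argument that would be passed to bin/crawl.
printf '%s\n' "-D solr.server.url=$url"
```

With the values above, this prints
-D solr.server.url=http://localhost:8983/solr/gettingstarted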
