Re: Unable to find documentation for Nutch 1.12, Wiki is outdated

Guy McD Tue, 02 Aug 2016 10:16:38 -0700

In the command

${NUTCH_HOME}/bin/crawl -i -D solr.server.url=http://localhost:8983/solr/gettingstarted urls/ segments/ 2

'gettingstarted' is the name of the core we want the data indexed into, correct?
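Sebastian's rule in the quoted message below (the Solr base URL, then a slash, then the core name) fits in a small shell helper. This is only a sketch. The function name `solr_server_url` is hypothetical and is not part of Nutch or Solr. When Solr is started with `-e cloud`, `gettingstarted` is actually a collection name, and the cores behind it carry shard/replica suffixes. The collection URL is still the value to pass here.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Nutch or Solr): build the value for
# -D solr.server.url=... from the Solr base URL and a core/collection name.
solr_server_url() {
  local base="${1%/}"   # drop one trailing slash so we never emit "//"
  local name="$2"
  printf '%s/%s\n' "$base" "$name"
}

url="$(solr_server_url "http://localhost:8983/solr/" gettingstarted)"
# printf (not echo) so the leading "-D" is never parsed as an option
printf '%s\n' "-D solr.server.url=$url"
```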


Does the renaming of managed_schema.xml to schema.xml also apply to Solr
5.2.1? The file doesn't appear to be there.
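On the file name: Sebastian's path leaves out the configset and its `conf/` directory. The stock Solr 6 configsets we know of spell the file `managed-schema`, with a hyphen and no extension. An older configset may already ship a `schema.xml`, which could explain not finding a managed schema on 5.2.1, though we have not checked that release. The sketch below copies a configset and renames whichever variant it finds, leaving the original untouched. The function name and the example paths are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: copy a Solr configset to a new directory and, in the
# copy only, rename the managed schema file to schema.xml.
# Usage: prepare_configset <source-configset-dir> <new-dir>  (new-dir must not exist)
prepare_configset() {
  local src="$1" dest="$2" f
  if [ -e "$dest" ]; then
    echo "refusing to overwrite existing $dest" >&2
    return 1
  fi
  cp -R "$src" "$dest" || return 1
  # Try the stock Solr 6 name first, then the spellings used in this thread.
  for f in managed-schema managed_schema managed_schema.xml; do
    if [ -f "$dest/conf/$f" ]; then
      mv "$dest/conf/$f" "$dest/conf/schema.xml"
      echo "renamed conf/$f to conf/schema.xml"
      return 0
    fi
  done
  # Older configsets may already ship a classic schema.xml.
  if [ -f "$dest/conf/schema.xml" ]; then
    echo "conf/schema.xml already present, nothing to rename"
    return 0
  fi
  echo "no schema file found in $dest/conf" >&2
  return 1
}
```

Example (paths are assumptions): `prepare_configset "$SOLR_HOME/server/solr/configsets/basic_configs" /tmp/nutch_conf`. Copying instead of renaming in place keeps the shipped configsets intact for other cores.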

Guy McDowell
http://www.GuyMcDowell.com




On Mon, Aug 1, 2016 at 9:04 PM, Alexandre Rafalovitch wrote:

> If you run Solr in cloud mode, replacing the files on the
> filesystem after the bin/start command does nothing, as they now live in
> ZooKeeper. They need to be uploaded with zkCli.sh.
>
> So, the new instructions need to refer to that as well.
>
> On the other hand, if only a single server is used, the config directory
> can be supplied when the core is created, with
> bin/solr create_core -c collectionName -d configTemplateDirPath
>
> Regards,
>    Alex.
> ----
> Newsletter and resources for Solr beginners and intermediates:
> http://www.solr-start.com/
>
>
> On 2 August 2016 at 04:01, Sebastian Greenholtz wrote:
> > I struggled with the same thing recently. Nutch 1.12 does work with Solr
> > 6.1.0, but you have to do two things differently.
> >
> > 1. The schema file that comes with Solr is originally named managed_schema
> > and it's stored in
> > ${SOLR_HOME}/server/solr/configsets/managed_schema
> >
> > This file should be renamed to schema.xml.
> >
> > 2. To index with Solr, first start Solr from the command line:
> >
> > ${SOLR_HOME}/bin/solr start -e cloud -noprompt
> >
> > Solr should start up at localhost:8983/solr
> >
> > To run the indexing:
> >
> > ${NUTCH_HOME}/bin/crawl -i -D solr.server.url=http://localhost:8983/solr/gettingstarted urls/ segments/ 2
> >
> > Some of these parameters can be changed. They are explained here:
> > https://wiki.apache.org/nutch/bin/crawl
> >
> > The thing that isn't explained anywhere is that your solr.server.url value
> > is the base URL of the Solr admin, with the core name appended after a
> > forward slash. For the example project, the core is called gettingstarted.
> >
> > Hope that helps!
> >
> > Sebastian
> >
> > On Mon, Aug 1, 2016, 11:39 AM Ondřej Sojka wrote:
> >
> >> For the last three days, I've been struggling to make Nutch index one
> >> website into Solr. The tutorial on your wiki is extremely outdated and the
> >> command line tool doesn't work as expected. I think I may have managed to
> >> crawl the website, but not to index it into Solr. I'm trying to run
> >> bin/nutch solrindex crawl (crawl is my crawldb, which I previously passed
> >> to bin/crawl), but it just prints the solrindex help. Judging by that help,
> >> the crawldb seems to be the only mandatory parameter.
> >>
> >> I assume there must be some source of documentation for recent versions
> >> of Nutch other than the wiki, or is the wiki the only source of
> >> documentation? Which versions of Solr is Nutch 1.12 compatible with?
> >>
> >> Ondrej Sojka
> >>
>
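Alex's two cases (single server vs. cloud mode) and Sebastian's crawl command can be combined into one short script. This is a sketch under stated assumptions, not a tested recipe. The install paths, the prepared config directory and the dry-run `run` helper are hypothetical. The ZooKeeper address `localhost:9983` assumes the embedded ZooKeeper started by `bin/solr start -e cloud` on port 8983, and the cloud branch assumes the config set in ZooKeeper is named after the collection. `-i` is the index flag as we understand the Nutch 1.12 `bin/crawl` usage (`<Seed Dir> <Crawl Dir> <Num Rounds>`). With `DRY_RUN=1`, the default, the script only prints the commands.

```shell
#!/usr/bin/env bash
# Sketch of the two setups discussed in this thread. Every path and name
# below is an assumption; override them via environment variables.
set -u

SOLR_HOME="${SOLR_HOME:-/opt/solr}"      # hypothetical Solr install dir
NUTCH_HOME="${NUTCH_HOME:-/opt/nutch}"   # hypothetical Nutch runtime dir
CORE="${CORE:-gettingstarted}"           # core/collection to index into
CONF_DIR="${CONF_DIR:-/tmp/nutch_conf}"  # prepared configset (contains conf/)
MODE="${MODE:-standalone}"               # "standalone" or "cloud"
DRY_RUN="${DRY_RUN:-1}"                  # 1 = print commands, do not run them

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

if [ "$MODE" = "cloud" ]; then
  # Cloud mode: the config lives in ZooKeeper, so edits on disk after startup
  # have no effect. Upload the edited conf/ and reload the collection.
  run "$SOLR_HOME/server/scripts/cloud-scripts/zkcli.sh" -zkhost localhost:9983 \
    -cmd upconfig -confdir "$CONF_DIR/conf" -confname "$CORE"
  run curl "http://localhost:8983/solr/admin/collections?action=RELOAD&name=$CORE"
else
  # Single server: start without -e cloud and hand the config to the new core.
  run "$SOLR_HOME/bin/solr" start
  run "$SOLR_HOME/bin/solr" create_core -c "$CORE" -d "$CONF_DIR"
fi

# Seed dir urls/, crawl dir crawl/, 2 rounds; index into the core above.
run "$NUTCH_HOME/bin/crawl" -i -D "solr.server.url=http://localhost:8983/solr/$CORE" \
  urls/ crawl/ 2
```

For example, `MODE=cloud ./setup.sh` (script name hypothetical) prints the upload and reload commands instead of the standalone ones. Set `DRY_RUN=0` only after checking each printed command against your own installation.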
