The command you are using is not pointing to the specific Solr index you created: http://localhost:8983/solr needs to be changed to the URL of the core you created, e.g. http://localhost:8983/solr/new_core_name. (The admin UI displays the core as http://localhost:8983/solr/#/new_core_name, but everything after the # is a browser-side fragment and must be dropped in the URL you pass to the indexing command.)
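For concreteness, the corrected invocation might be built like this. This is a sketch only: `new_core_name` is a hypothetical core name — substitute the core you actually created — and the command string assumes you run it from the Nutch working directory, as in the thread.

```shell
# Hypothetical core name -- replace with the core you created in Solr 5.
CORE_NAME="new_core_name"

# The indexing endpoint is the plain core URL, without the "#/" admin-UI fragment.
SOLR_URL="http://localhost:8983/solr/${CORE_NAME}"

# The same solrindex command from the thread, pointed at the core URL
# (shown here as a string; run it from the Nutch working directory).
CMD="bin/nutch solrindex ${SOLR_URL} crawl/crawldb/ -linkdb crawl/linkdb/ crawl/segments/20150406231502/ -filter -normalize"
echo "$CMD"
```

The only change from the failing command is the URL: `…/solr` becomes `…/solr/<core>`, which is why the 404 ("Not Found" on `/solr/update`) goes away — the update handler lives at `/solr/<core>/update`.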
On Tue, Apr 7, 2015 at 2:33 AM, Anchit Jain <[email protected]> wrote:

> I followed the instructions given on your blog, created a new core for
> nutch data, and copied nutch's schema.xml into it.
> Then I ran the following command in the nutch working directory:
>
> bin/nutch solrindex http://localhost:8983/solr crawl/crawldb/ -linkdb crawl/linkdb/ crawl/segments/20150406231502/ -filter -normalize
>
> But the same error occurs as in previous runs.
>
> On Tue, 7 Apr 2015 at 10:00 Jeff Cocking <[email protected]> wrote:
>
> > Solr5 is multicore by default. You have not finished the install by
> > setting up solr5's core. I would suggest you look at the link I sent to
> > finish up your setup.
> >
> > After you finish your install your solr URL will be
> > http://localhost:8983/solr/#/core_name.
> >
> > Jeff Cocking
> >
> > I apologize for my brevity. This was sent from my mobile device while I
> > should be focusing on something else... like a meeting, driving, family, etc.
> >
> > > On Apr 6, 2015, at 11:16 PM, Anchit Jain <[email protected]> wrote:
> > >
> > > I have already installed Solr. I want to integrate it with nutch.
> > > Whenever I try to issue this command to nutch:
> > >
> > > bin/nutch solrindex http://localhost:8983/solr crawl/crawldb/ -linkdb crawl/linkdb/ crawl/segments/20150406231502/ -filter -normalize
> > >
> > > I always get an error:
> > >
> > > Indexer: java.io.IOException: Job failed!
> > >     at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1357)
> > >     at org.apache.nutch.indexer.IndexingJob.index(IndexingJob.java:114)
> > >     at org.apache.nutch.indexer.IndexingJob.run(IndexingJob.java:176)
> > >     at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
> > >     at org.apache.nutch.indexer.IndexingJob.main(IndexingJob.java:186)
> > >
> > > Here is the complete hadoop log for the process; I have marked the error part in it.
> > >
> > > 2015-04-07 09:38:06,613 INFO indexer.IndexingJob - Indexer: starting at 2015-04-07 09:38:06
> > > 2015-04-07 09:38:06,684 INFO indexer.IndexingJob - Indexer: deleting gone documents: false
> > > 2015-04-07 09:38:06,685 INFO indexer.IndexingJob - Indexer: URL filtering: true
> > > 2015-04-07 09:38:06,685 INFO indexer.IndexingJob - Indexer: URL normalizing: true
> > > 2015-04-07 09:38:06,893 INFO indexer.IndexWriters - Adding org.apache.nutch.indexwriter.solr.SolrIndexWriter
> > > 2015-04-07 09:38:06,893 INFO indexer.IndexingJob - Active IndexWriters :
> > > SOLRIndexWriter
> > >     solr.server.url : URL of the SOLR instance (mandatory)
> > >     solr.commit.size : buffer size when sending to SOLR (default 1000)
> > >     solr.mapping.file : name of the mapping file for fields (default solrindex-mapping.xml)
> > >     solr.auth : use authentication (default false)
> > >     solr.auth.username : use authentication (default false)
> > >     solr.auth : username for authentication
> > >     solr.auth.password : password for authentication
> > >
> > > 2015-04-07 09:38:06,898 INFO indexer.IndexerMapReduce - IndexerMapReduce: crawldb: crawl/crawldb
> > > 2015-04-07 09:38:06,898 INFO indexer.IndexerMapReduce - IndexerMapReduce: linkdb: crawl/linkdb
> > > 2015-04-07 09:38:06,898 INFO indexer.IndexerMapReduce - IndexerMapReduces: adding segment: crawl/segments/20150406231502
> > > 2015-04-07 09:38:07,036 WARN util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
> > > 2015-04-07 09:38:07,540 INFO anchor.AnchorIndexingFilter - Anchor deduplication is: off
> > > 2015-04-07 09:38:07,565 INFO regex.RegexURLNormalizer - can't find rules for scope 'indexer', using default
> > > 2015-04-07 09:38:09,552 INFO regex.RegexURLNormalizer - can't find rules for scope 'indexer', using default
> > > 2015-04-07 09:38:10,642 INFO regex.RegexURLNormalizer - can't find rules for scope 'indexer', using default
> > > 2015-04-07 09:38:10,734 INFO regex.RegexURLNormalizer - can't find rules for scope 'indexer', using default
> > > 2015-04-07 09:38:10,895 INFO regex.RegexURLNormalizer - can't find rules for scope 'indexer', using default
> > > 2015-04-07 09:38:11,088 INFO regex.RegexURLNormalizer - can't find rules for scope 'indexer', using default
> > > 2015-04-07 09:38:11,219 INFO indexer.IndexWriters - Adding org.apache.nutch.indexwriter.solr.SolrIndexWriter
> > > 2015-04-07 09:38:11,237 INFO solr.SolrMappingReader - source: content dest: content
> > > 2015-04-07 09:38:11,237 INFO solr.SolrMappingReader - source: title dest: title
> > > 2015-04-07 09:38:11,237 INFO solr.SolrMappingReader - source: host dest: host
> > > 2015-04-07 09:38:11,237 INFO solr.SolrMappingReader - source: segment dest: segment
> > > 2015-04-07 09:38:11,237 INFO solr.SolrMappingReader - source: boost dest: boost
> > > 2015-04-07 09:38:11,237 INFO solr.SolrMappingReader - source: digest dest: digest
> > > 2015-04-07 09:38:11,237 INFO solr.SolrMappingReader - source: tstamp dest: tstamp
> > > 2015-04-07 09:38:11,526 INFO solr.SolrIndexWriter - Indexing 250 documents
> > > 2015-04-07 09:38:11,526 INFO solr.SolrIndexWriter - Deleting 0 documents
> > > 2015-04-07 09:38:11,644 INFO solr.SolrIndexWriter - Indexing 250 documents
> > > *2015-04-07 09:38:11,699 WARN mapred.LocalJobRunner - job_local1245074757_0001*
> > > *org.apache.solr.common.SolrException: Not Found*
> > >
> > > *Not Found*
> > >
> > > *request: http://localhost:8983/solr/update?wt=javabin&version=2*
> > > *    at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:430)*
> > > *    at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:244)*
> > > *    at org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:105)*
> > > *    at org.apache.nutch.indexwriter.solr.SolrIndexWriter.write(SolrIndexWriter.java:135)*
> > > *    at org.apache.nutch.indexer.IndexWriters.write(IndexWriters.java:88)*
> > > *    at org.apache.nutch.indexer.IndexerOutputFormat$1.write(IndexerOutputFormat.java:50)*
> > > *    at org.apache.nutch.indexer.IndexerOutputFormat$1.write(IndexerOutputFormat.java:41)*
> > > *    at org.apache.hadoop.mapred.ReduceTask$OldTrackingRecordWriter.write(ReduceTask.java:458)*
> > > *    at org.apache.hadoop.mapred.ReduceTask$3.collect(ReduceTask.java:500)*
> > > *    at org.apache.nutch.indexer.IndexerMapReduce.reduce(IndexerMapReduce.java:323)*
> > > *    at org.apache.nutch.indexer.IndexerMapReduce.reduce(IndexerMapReduce.java:53)*
> > > *    at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:522)*
> > > *    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:421)*
> > > *    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:398)*
> > > *2015-04-07 09:38:12,408 ERROR indexer.IndexingJob - Indexer: java.io.IOException: Job failed!*
> > > *    at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1357)*
> > > *    at org.apache.nutch.indexer.IndexingJob.index(IndexingJob.java:114)*
> > > *    at org.apache.nutch.indexer.IndexingJob.run(IndexingJob.java:176)*
> > > *    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)*
> > > *    at org.apache.nutch.indexer.IndexingJob.main(IndexingJob.java:186)*
> > >
> > > On Tue, 7 Apr 2015 at 03:18 Jeff Cocking <[email protected]> wrote:
> > >
> > > > With Solr5.0.0 you can skip that step. Solr will auto create your schema
> > > > document based on the data being provided.
> > > >
> > > > One of the new features with Solr5 is the install/service feature. I did a
> > > > quick write up on how to install Solr5 on Centos. Might be something
> > > > useful there for you.
> > > >
> > > > http://www.cocking.com/apache-solr-5-0-install-on-centos-7/
> > > >
> > > > jeff
> > > >
> > > > On Mon, Apr 6, 2015 at 3:13 PM, Anchit Jain <[email protected]> wrote:
> > > >
> > > > > I want to index nutch results using *Solr 5.0*, but as mentioned in
> > > > > https://wiki.apache.org/nutch/NutchTutorial there is no directory
> > > > > ${APACHE_SOLR_HOME}/example/solr/collection1/conf/
> > > > > in solr 5.0. So where do I have to copy *schema.xml*?
> > > > > Also there is no *start.jar* present in the example directory.
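Since the root cause in this thread is a missing core, here is a rough sketch of the Solr 5 setup the replies describe. The paths are assumptions for illustration only: Solr 5.x unpacked at /opt/solr, Nutch at /opt/nutch, and a hypothetical core named "nutch" — adjust all three to your install.

```shell
# Hypothetical locations -- adjust to your install.
SOLR_HOME="/opt/solr"     # where Solr 5.x is unpacked
NUTCH_HOME="/opt/nutch"   # Nutch working directory
CORE_NAME="nutch"

# In Solr 5 the old example/solr/collection1/conf layout (and start.jar) is
# gone; each core's config lives under server/solr/<core>/conf instead.
CONF_DIR="${SOLR_HOME}/server/solr/${CORE_NAME}/conf"
echo "$CONF_DIR"

# The setup steps, shown as comments because they touch a live Solr install:
#   "$SOLR_HOME/bin/solr" create -c "$CORE_NAME"     # create the core
#   cp "$NUTCH_HOME/conf/schema.xml" "$CONF_DIR/"    # install Nutch's schema
# then point bin/nutch solrindex at http://localhost:8983/solr/$CORE_NAME
```

The `bin/solr create -c <name>` script replaces the old manual `example/` workflow, which is why the tutorial's `${APACHE_SOLR_HOME}/example/solr/collection1/conf/` directory no longer exists in 5.0.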

