Re: Nutch 1.9 integration with Solr 5.0.0

yeshwanth kumar Tue, 07 Apr 2015 11:53:59 -0700

Which DB are you using?
If you have time constraints for your task, write a MapReduce job that
reads from the DB and then indexes into Solr.


I use the crawl script for fetching, parsing, storing and indexing all at
once, instead of running each step individually with the nutch script.

The error clearly points to the URL, so run a curl or write some simple
Java code that uses HTTP GET and POST to read and write documents. This
will tell you whether the URL is correct or not.
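
A minimal sketch of such a check in Java, using only the JDK. The class
name SolrUrlCheck and its helper methods are made up for illustration.
It assumes the core is reachable at <base>/<core> and accepts XML update
messages at <base>/<core>/update, which is an assumption about your setup
and not something confirmed in this thread. It also drops anything after
"#" in the URL, because HTTP clients do not send that part to the server.
That would fit the quoted log, where the request went to
http://localhost:8983/solr/ even though the command used .../solr/#/foo.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URI;
import java.nio.charset.StandardCharsets;

public class SolrUrlCheck {

    /** Drops any "#..." fragment; HTTP clients do not send it to the server. */
    public static String stripFragment(String url) {
        int hash = url.indexOf('#');
        return hash < 0 ? url : url.substring(0, hash);
    }

    /** Builds <base>/<core>/update (assumed layout of a per-core update URL). */
    public static String updateUrl(String solrBase, String core) {
        String base = stripFragment(solrBase);
        if (!base.endsWith("/")) {
            base += "/";
        }
        return base + core + "/update";
    }

    /** Sends a GET request and returns the HTTP status code. */
    public static int get(String url) throws IOException {
        HttpURLConnection conn =
                (HttpURLConnection) URI.create(url).toURL().openConnection();
        conn.setRequestMethod("GET");
        try {
            return conn.getResponseCode();
        } finally {
            conn.disconnect();
        }
    }

    /** POSTs an XML commit message and returns the HTTP status code. */
    public static int postCommit(String url) throws IOException {
        HttpURLConnection conn =
                (HttpURLConnection) URI.create(url).toURL().openConnection();
        conn.setRequestMethod("POST");
        conn.setDoOutput(true);
        conn.setRequestProperty("Content-Type", "text/xml; charset=UTF-8");
        try {
            try (OutputStream out = conn.getOutputStream()) {
                out.write("<commit/>".getBytes(StandardCharsets.UTF_8));
            }
            return conn.getResponseCode();
        } finally {
            conn.disconnect();
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 2) {
            System.out.println("Usage: java SolrUrlCheck <solrBaseUrl> <coreName>");
            return;
        }
        String base = stripFragment(args[0]);
        String update = updateUrl(args[0], args[1]);
        System.out.println("GET  " + base + " -> " + get(base));
        System.out.println("POST " + update + " -> " + postCommit(update));
    }
}
```

With a recent JDK this can be run as
"java SolrUrlCheck.java http://localhost:8983/solr foo". If the POST does
not return 200, the URL given to Nutch is probably not the right one for
that core.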

Sent from mobile, please excuse any typographical errors.
On Apr 7, 2015 12:00 PM, "Anchit Jain" <[email protected]> wrote:

> Same error :-( .
>
> So no workaround for the error?
>
> On Tuesday 07 April 2015 10:06 PM, Jeff Cocking wrote:
>
>> I use the following for all my indexing work.
>>
>> Usage: bin/crawl <seedDir> <crawlDir> <solrURL> <numberOfRounds>
>> Example: bin/crawl urls/ TestCrawl/ http://localhost:8983/solr/ 2
>>
>>
>> On Tue, Apr 7, 2015 at 11:20 AM, Anchit Jain <[email protected]>
>> wrote:
>>
>>> Yes, it is working correctly from the browser. I can also manually add
>>> documents from the web browser, but not through nutch.
>>> I am not able to figure out where the problem is.
>>>
>>> Is there any manual way of adding crawldb and linkdb to the nutch besides
>>> that command?
>>>
>>>
>>> On Tuesday 07 April 2015 09:47 PM, Jeff Cocking wrote:
>>>
>>>> There can be numerous reasons... Hosts.conf, firewall, etc.  These are
>>>> all unique to your system.
>>>>
>>>> Have you viewed the solr admin panel via a browser?  This is a critical
>>>> step in the installation.  This validates SOLR can accept HTTP commands.
>>>>
>>>> On Tue, Apr 7, 2015 at 9:53 AM, Anchit Jain <[email protected]>
>>>> wrote:
>>>>
>>>>> I created a new core named *foo*. Then I copied the *schema.xml* from
>>>>> *nutch* into *var/solr/data/foo/conf* with changes as described in
>>>>> https://wiki.apache.org/nutch/NutchTutorial.
>>>>> I changed the URL to http://localhost:8983/solr/#/foo
>>>>> so the new command is
>>>>> "bin/nutch solrindex http://localhost:8983/solr/#/foo crawl/crawldb/
>>>>> -linkdb crawl/linkdb/ crawl/segments/20150406231502/ -filter
>>>>> -normalize"
>>>>> But now I am getting the error
>>>>> *org.apache.solr.common.SolrException: HTTP method POST is not supported by this URL*
>>>>>
>>>>> Is some other change also required in the URL to support POST requests?
>>>>>
>>>>> Full log
>>>>>
>>>>> 2015-04-07 20:10:56,068 INFO  indexer.IndexingJob - Indexer: starting at 2015-04-07 20:10:56
>>>>> 2015-04-07 20:10:56,178 INFO  indexer.IndexingJob - Indexer: deleting gone documents: false
>>>>> 2015-04-07 20:10:56,178 INFO  indexer.IndexingJob - Indexer: URL filtering: true
>>>>> 2015-04-07 20:10:56,178 INFO  indexer.IndexingJob - Indexer: URL normalizing: true
>>>>> 2015-04-07 20:10:56,727 INFO  indexer.IndexWriters - Adding org.apache.nutch.indexwriter.solr.SolrIndexWriter
>>>>> 2015-04-07 20:10:56,727 INFO  indexer.IndexingJob - Active IndexWriters :
>>>>> SOLRIndexWriter
>>>>> solr.server.url : URL of the SOLR instance (mandatory)
>>>>> solr.commit.size : buffer size when sending to SOLR (default 1000)
>>>>> solr.mapping.file : name of the mapping file for fields (default solrindex-mapping.xml)
>>>>> solr.auth : use authentication (default false)
>>>>> solr.auth.username : use authentication (default false)
>>>>> solr.auth : username for authentication
>>>>> solr.auth.password : password for authentication
>>>>>
>>>>>
>>>>> 2015-04-07 20:10:56,772 INFO  indexer.IndexerMapReduce - IndexerMapReduce: crawldb: crawl/crawldb
>>>>> 2015-04-07 20:10:56,772 INFO  indexer.IndexerMapReduce - IndexerMapReduce: linkdb: crawl/linkdb
>>>>> 2015-04-07 20:10:56,772 INFO  indexer.IndexerMapReduce - IndexerMapReduces: adding segment: crawl/segments/20150406231502
>>>>> 2015-04-07 20:10:57,205 WARN  util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
>>>>> 2015-04-07 20:10:58,020 INFO  anchor.AnchorIndexingFilter - Anchor deduplication is: off
>>>>> 2015-04-07 20:10:58,134 INFO  regex.RegexURLNormalizer - can't find rules for scope 'indexer', using default
>>>>> 2015-04-07 20:11:00,114 INFO  regex.RegexURLNormalizer - can't find rules for scope 'indexer', using default
>>>>> 2015-04-07 20:11:01,205 INFO  regex.RegexURLNormalizer - can't find rules for scope 'indexer', using default
>>>>> 2015-04-07 20:11:01,344 INFO  regex.RegexURLNormalizer - can't find rules for scope 'indexer', using default
>>>>> 2015-04-07 20:11:01,577 INFO  regex.RegexURLNormalizer - can't find rules for scope 'indexer', using default
>>>>> 2015-04-07 20:11:01,788 INFO  regex.RegexURLNormalizer - can't find rules for scope 'indexer', using default
>>>>> 2015-04-07 20:11:01,921 INFO  indexer.IndexWriters - Adding org.apache.nutch.indexwriter.solr.SolrIndexWriter
>>>>> 2015-04-07 20:11:01,986 INFO  solr.SolrMappingReader - source: content dest: content
>>>>> 2015-04-07 20:11:01,986 INFO  solr.SolrMappingReader - source: title dest: title
>>>>> 2015-04-07 20:11:01,986 INFO  solr.SolrMappingReader - source: host dest: host
>>>>> 2015-04-07 20:11:01,986 INFO  solr.SolrMappingReader - source: segment dest: segment
>>>>> 2015-04-07 20:11:01,987 INFO  solr.SolrMappingReader - source: boost dest: boost
>>>>> 2015-04-07 20:11:01,987 INFO  solr.SolrMappingReader - source: digest dest: digest
>>>>> 2015-04-07 20:11:01,987 INFO  solr.SolrMappingReader - source: tstamp dest: tstamp
>>>>> 2015-04-07 20:11:02,266 INFO  solr.SolrIndexWriter - Indexing 250 documents
>>>>> 2015-04-07 20:11:02,267 INFO  solr.SolrIndexWriter - Deleting 0 documents
>>>>> 2015-04-07 20:11:02,512 INFO  solr.SolrIndexWriter - Indexing 250 documents
>>>>> *2015-04-07 20:11:02,576 WARN  mapred.LocalJobRunner - job_local1831338118_0001*
>>>>> *org.apache.solr.common.SolrException: HTTP method POST is not supported by this URL*
>>>>>
>>>>> *HTTP method POST is not supported by this URL*
>>>>>
>>>>> request: http://localhost:8983/solr/
>>>>> at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:430)
>>>>> at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:244)
>>>>> at org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:105)
>>>>> at org.apache.nutch.indexwriter.solr.SolrIndexWriter.write(SolrIndexWriter.java:135)
>>>>> at org.apache.nutch.indexer.IndexWriters.write(IndexWriters.java:88)
>>>>> at org.apache.nutch.indexer.IndexerOutputFormat$1.write(IndexerOutputFormat.java:50)
>>>>> at org.apache.nutch.indexer.IndexerOutputFormat$1.write(IndexerOutputFormat.java:41)
>>>>> at org.apache.hadoop.mapred.ReduceTask$OldTrackingRecordWriter.write(ReduceTask.java:458)
>>>>> at org.apache.hadoop.mapred.ReduceTask$3.collect(ReduceTask.java:500)
>>>>> at org.apache.nutch.indexer.IndexerMapReduce.reduce(IndexerMapReduce.java:323)
>>>>> at org.apache.nutch.indexer.IndexerMapReduce.reduce(IndexerMapReduce.java:53)
>>>>> at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:522)
>>>>> at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:421)
>>>>> at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:398)
>>>>> 2015-04-07 20:11:02,724 ERROR indexer.IndexingJob - Indexer: java.io.IOException: Job failed!
>>>>> at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1357)
>>>>> at org.apache.nutch.indexer.IndexingJob.index(IndexingJob.java:114)
>>>>> at org.apache.nutch.indexer.IndexingJob.run(IndexingJob.java:176)
>>>>> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>>>>> at org.apache.nutch.indexer.IndexingJob.main(IndexingJob.java:186)
>>>>>
>>>>>
>>>>> On Tuesday 07 April 2015 06:53 PM, Jeff Cocking wrote:
>>>>>
>>>>>> The command you are using is not pointing to the specific solr index
>>>>>> you created.  The http://localhost:8983/solr needs to be changed to the
>>>>>> URL for the core created.  It should look like
>>>>>> http://localhost:8983/solr/#/new_core_name.
>>>>>>
>>>>>>
>>>>>> On Tue, Apr 7, 2015 at 2:33 AM, Anchit Jain <[email protected]>
>>>>>> wrote:
>>>>>>
>>>>>>> I followed the instructions given on your blog, created a new core
>>>>>>> for nutch data, and copied the schema.xml of nutch into it.
>>>>>>> Then I ran the following command in the nutch working directory:
>>>>>>> bin/nutch solrindex http://localhost:8983/solr crawl/crawldb/
>>>>>>> -linkdb crawl/linkdb/ crawl/segments/20150406231502/ -filter -normalize
>>>>>>>
>>>>>>> But the same error still occurs, as in the previous runs.
>>>>>>>
>>>>>>> On Tue, 7 Apr 2015 at 10:00 Jeff Cocking <[email protected]>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Solr5 is multicore by default. You have not finished the install by
>>>>>>>> setting up solr5's core.  I would suggest you look at the link I
>>>>>>>> sent to finish up your setup.
>>>>>>>>
>>>>>>>> After you finish your install your solr URL will be
>>>>>>>> http://localhost:8983/solr/#/core_name.
>>>>>>>>
>>>>>>>> Jeff Cocking
>>>>>>>>
>>>>>>>> I apologize for my brevity.
>>>>>>>> This was sent from my mobile device while I should be focusing on
>>>>>>>> something else.....
>>>>>>>> Like a meeting, driving, family, etc.
>>>>>>>>
>>>>>>>> On Apr 6, 2015, at 11:16 PM, Anchit Jain <[email protected]>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> I have already installed Solr. I want to integrate it with nutch.
>>>>>>>>> Whenever I try to issue this command to nutch
>>>>>>>>> "bin/nutch solrindex http://localhost:8983/solr crawl/crawldb/
>>>>>>>>> -linkdb crawl/linkdb/ crawl/segments/20150406231502/ -filter
>>>>>>>>> -normalize"
>>>>>>>>> I always get an error
>>>>>>>>>
>>>>>>>>> Indexer: java.io.IOException: Job failed!
>>>>>>>>> at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1357)
>>>>>>>>> at org.apache.nutch.indexer.IndexingJob.index(IndexingJob.java:114)
>>>>>>>>> at org.apache.nutch.indexer.IndexingJob.run(IndexingJob.java:176)
>>>>>>>>> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>>>>>>>>> at org.apache.nutch.indexer.IndexingJob.main(IndexingJob.java:186)
>>>>>>>>>
>>>>>>>>> Here is the complete hadoop log for the process. I have underlined
>>>>>>>>> the error part in it.
>>>>>>>>>
>>>>>>>>> 2015-04-07 09:38:06,613 INFO  indexer.IndexingJob - Indexer: starting at 2015-04-07 09:38:06
>>>>>>>>> 2015-04-07 09:38:06,684 INFO  indexer.IndexingJob - Indexer: deleting gone documents: false
>>>>>>>>> 2015-04-07 09:38:06,685 INFO  indexer.IndexingJob - Indexer: URL filtering: true
>>>>>>>>> 2015-04-07 09:38:06,685 INFO  indexer.IndexingJob - Indexer: URL normalizing: true
>>>>>>>>> 2015-04-07 09:38:06,893 INFO  indexer.IndexWriters - Adding org.apache.nutch.indexwriter.solr.SolrIndexWriter
>>>>>>>>> 2015-04-07 09:38:06,893 INFO  indexer.IndexingJob - Active IndexWriters :
>>>>>>>>> SOLRIndexWriter
>>>>>>>>> solr.server.url : URL of the SOLR instance (mandatory)
>>>>>>>>> solr.commit.size : buffer size when sending to SOLR (default 1000)
>>>>>>>>> solr.mapping.file : name of the mapping file for fields (default solrindex-mapping.xml)
>>>>>>>>> solr.auth : use authentication (default false)
>>>>>>>>> solr.auth.username : use authentication (default false)
>>>>>>>>> solr.auth : username for authentication
>>>>>>>>> solr.auth.password : password for authentication
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> 2015-04-07 09:38:06,898 INFO  indexer.IndexerMapReduce - IndexerMapReduce: crawldb: crawl/crawldb
>>>>>>>>> 2015-04-07 09:38:06,898 INFO  indexer.IndexerMapReduce - IndexerMapReduce: linkdb: crawl/linkdb
>>>>>>>>> 2015-04-07 09:38:06,898 INFO  indexer.IndexerMapReduce - IndexerMapReduces: adding segment: crawl/segments/20150406231502
>>>>>>>>> 2015-04-07 09:38:07,036 WARN  util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
>>>>>>>>> 2015-04-07 09:38:07,540 INFO  anchor.AnchorIndexingFilter - Anchor deduplication is: off
>>>>>>>>> 2015-04-07 09:38:07,565 INFO  regex.RegexURLNormalizer - can't find rules for scope 'indexer', using default
>>>>>>>>> 2015-04-07 09:38:09,552 INFO  regex.RegexURLNormalizer - can't find rules for scope 'indexer', using default
>>>>>>>>> 2015-04-07 09:38:10,642 INFO  regex.RegexURLNormalizer - can't find rules for scope 'indexer', using default
>>>>>>>>> 2015-04-07 09:38:10,734 INFO  regex.RegexURLNormalizer - can't find rules for scope 'indexer', using default
>>>>>>>>> 2015-04-07 09:38:10,895 INFO  regex.RegexURLNormalizer - can't find rules for scope 'indexer', using default
>>>>>>>>> 2015-04-07 09:38:11,088 INFO  regex.RegexURLNormalizer - can't find rules for scope 'indexer', using default
>>>>>>>>> 2015-04-07 09:38:11,219 INFO  indexer.IndexWriters - Adding org.apache.nutch.indexwriter.solr.SolrIndexWriter
>>>>>>>>> 2015-04-07 09:38:11,237 INFO  solr.SolrMappingReader - source: content dest: content
>>>>>>>>> 2015-04-07 09:38:11,237 INFO  solr.SolrMappingReader - source: title dest: title
>>>>>>>>> 2015-04-07 09:38:11,237 INFO  solr.SolrMappingReader - source: host dest: host
>>>>>>>>> 2015-04-07 09:38:11,237 INFO  solr.SolrMappingReader - source: segment dest: segment
>>>>>>>>> 2015-04-07 09:38:11,237 INFO  solr.SolrMappingReader - source: boost dest: boost
>>>>>>>>> 2015-04-07 09:38:11,237 INFO  solr.SolrMappingReader - source: digest dest: digest
>>>>>>>>> 2015-04-07 09:38:11,237 INFO  solr.SolrMappingReader - source: tstamp dest: tstamp
>>>>>>>>> 2015-04-07 09:38:11,526 INFO  solr.SolrIndexWriter - Indexing 250 documents
>>>>>>>>> 2015-04-07 09:38:11,526 INFO  solr.SolrIndexWriter - Deleting 0 documents
>>>>>>>>> 2015-04-07 09:38:11,644 INFO  solr.SolrIndexWriter - Indexing 250 documents
>>>>>>>>> *2015-04-07 09:38:11,699 WARN  mapred.LocalJobRunner - job_local1245074757_0001*
>>>>>>>>> *org.apache.solr.common.SolrException: Not Found*
>>>>>>>>>
>>>>>>>>> *Not Found*
>>>>>>>>>
>>>>>>>>> *request: http://localhost:8983/solr/update?wt=javabin&version=2*
>>>>>>>>> *at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:430)*
>>>>>>>>> *at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:244)*
>>>>>>>>> *at org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:105)*
>>>>>>>>> *at org.apache.nutch.indexwriter.solr.SolrIndexWriter.write(SolrIndexWriter.java:135)*
>>>>>>>>> *at org.apache.nutch.indexer.IndexWriters.write(IndexWriters.java:88)*
>>>>>>>>> *at org.apache.nutch.indexer.IndexerOutputFormat$1.write(IndexerOutputFormat.java:50)*
>>>>>>>>> *at org.apache.nutch.indexer.IndexerOutputFormat$1.write(IndexerOutputFormat.java:41)*
>>>>>>>>> *at org.apache.hadoop.mapred.ReduceTask$OldTrackingRecordWriter.write(ReduceTask.java:458)*
>>>>>>>>> *at org.apache.hadoop.mapred.ReduceTask$3.collect(ReduceTask.java:500)*
>>>>>>>>> *at org.apache.nutch.indexer.IndexerMapReduce.reduce(IndexerMapReduce.java:323)*
>>>>>>>>> *at org.apache.nutch.indexer.IndexerMapReduce.reduce(IndexerMapReduce.java:53)*
>>>>>>>>> *at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:522)*
>>>>>>>>> *at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:421)*
>>>>>>>>> *at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:398)*
>>>>>>>>> *2015-04-07 09:38:12,408 ERROR indexer.IndexingJob - Indexer: java.io.IOException: Job failed!*
>>>>>>>>> *at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1357)*
>>>>>>>>> *at org.apache.nutch.indexer.IndexingJob.index(IndexingJob.java:114)*
>>>>>>>>> *at org.apache.nutch.indexer.IndexingJob.run(IndexingJob.java:176)*
>>>>>>>>> *at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)*
>>>>>>>>> *at org.apache.nutch.indexer.IndexingJob.main(IndexingJob.java:186)*
>>>>>>>>>
>>>>>>>>> On Tue, 7 Apr 2015 at 03:18 Jeff Cocking <[email protected]>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> With Solr5.0.0 you can skip that step.  Solr will auto create your
>>>>>>>>>> schema document based on the data being provided.
>>>>>>>>>>
>>>>>>>>>> One of the new features with Solr5 is the install/service
>>>>>>>>>> feature. I did a quick write up on how to install Solr5 on Centos.
>>>>>>>>>> Might be something useful there for you.
>>>>>>>>>>
>>>>>>>>>> http://www.cocking.com/apache-solr-5-0-install-on-centos-7/
>>>>>>>>>>
>>>>>>>>>> jeff
>>>>>>>>>>
>>>>>>>>>> On Mon, Apr 6, 2015 at 3:13 PM, Anchit Jain <[email protected]>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>> I want to index nutch results using *Solr 5.0* but as mentioned in
>>>>>>>>>>> https://wiki.apache.org/nutch/NutchTutorial there is no directory
>>>>>>>>>>> ${APACHE_SOLR_HOME}/example/solr/collection1/conf/
>>>>>>>>>>> in Solr 5.0. So where do I have to copy *schema.xml*?
>>>>>>>>>>> Also there is no *start.jar* present in the example directory.
>>>>>>>>>>>
>>>>>>>>>>>
>
