Same error :-( So is there no workaround for it?
On Tuesday 07 April 2015 10:06 PM, Jeff Cocking wrote:
I use the following for all my indexing work.

    Usage:   bin/crawl <seedDir> <crawlDir> <solrURL> <numberOfRounds>
    Example: bin/crawl urls/ TestCrawl/ http://localhost:8983/solr/ 2

On Tue, Apr 7, 2015 at 11:20 AM, Anchit Jain <[email protected]> wrote:

Yes, it is working correctly from the browser. I can also add documents manually from the web browser, but not through Nutch. I am not able to figure out where the problem is. Is there any manual way of indexing the crawldb and linkdb besides that command?

On Tuesday 07 April 2015 09:47 PM, Jeff Cocking wrote:

There can be numerous reasons: hosts.conf, firewall, etc. These are all unique to your system. Have you viewed the Solr admin panel via a browser? This is a critical step in the installation; it validates that Solr can accept HTTP commands.

On Tue, Apr 7, 2015 at 9:53 AM, Anchit Jain <[email protected]> wrote:

I created a new core named "foo". Then I copied the schema.xml from Nutch into var/solr/data/foo/conf, with the changes described in https://wiki.apache.org/nutch/NutchTutorial. I changed the URL to http://localhost:8983/solr/#/foo, so the new command is:

    bin/nutch solrindex http://localhost:8983/solr/#/foo crawl/crawldb/ -linkdb crawl/linkdb/ crawl/segments/20150406231502/ -filter -normalize

But now I am getting the error:

    org.apache.solr.common.SolrException: HTTP method POST is not supported by this URL

Is some other change also required in the URL to support POST requests?
Full log:

    2015-04-07 20:10:56,068 INFO  indexer.IndexingJob - Indexer: starting at 2015-04-07 20:10:56
    2015-04-07 20:10:56,178 INFO  indexer.IndexingJob - Indexer: deleting gone documents: false
    2015-04-07 20:10:56,178 INFO  indexer.IndexingJob - Indexer: URL filtering: true
    2015-04-07 20:10:56,178 INFO  indexer.IndexingJob - Indexer: URL normalizing: true
    2015-04-07 20:10:56,727 INFO  indexer.IndexWriters - Adding org.apache.nutch.indexwriter.solr.SolrIndexWriter
    2015-04-07 20:10:56,727 INFO  indexer.IndexingJob - Active IndexWriters :
    SOLRIndexWriter
        solr.server.url : URL of the SOLR instance (mandatory)
        solr.commit.size : buffer size when sending to SOLR (default 1000)
        solr.mapping.file : name of the mapping file for fields (default solrindex-mapping.xml)
        solr.auth : use authentication (default false)
        solr.auth.username : use authentication (default false)
        solr.auth : username for authentication
        solr.auth.password : password for authentication
    2015-04-07 20:10:56,772 INFO  indexer.IndexerMapReduce - IndexerMapReduce: crawldb: crawl/crawldb
    2015-04-07 20:10:56,772 INFO  indexer.IndexerMapReduce - IndexerMapReduce: linkdb: crawl/linkdb
    2015-04-07 20:10:56,772 INFO  indexer.IndexerMapReduce - IndexerMapReduces: adding segment: crawl/segments/20150406231502
    2015-04-07 20:10:57,205 WARN  util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
    2015-04-07 20:10:58,020 INFO  anchor.AnchorIndexingFilter - Anchor deduplication is: off
    2015-04-07 20:10:58,134 INFO  regex.RegexURLNormalizer - can't find rules for scope 'indexer', using default
    2015-04-07 20:11:00,114 INFO  regex.RegexURLNormalizer - can't find rules for scope 'indexer', using default
    2015-04-07 20:11:01,205 INFO  regex.RegexURLNormalizer - can't find rules for scope 'indexer', using default
    2015-04-07 20:11:01,344 INFO  regex.RegexURLNormalizer - can't find rules for scope 'indexer', using default
    2015-04-07 20:11:01,577 INFO  regex.RegexURLNormalizer - can't find rules for scope 'indexer', using default
    2015-04-07 20:11:01,788 INFO  regex.RegexURLNormalizer - can't find rules for scope 'indexer', using default
    2015-04-07 20:11:01,921 INFO  indexer.IndexWriters - Adding org.apache.nutch.indexwriter.solr.SolrIndexWriter
    2015-04-07 20:11:01,986 INFO  solr.SolrMappingReader - source: content dest: content
    2015-04-07 20:11:01,986 INFO  solr.SolrMappingReader - source: title dest: title
    2015-04-07 20:11:01,986 INFO  solr.SolrMappingReader - source: host dest: host
    2015-04-07 20:11:01,986 INFO  solr.SolrMappingReader - source: segment dest: segment
    2015-04-07 20:11:01,987 INFO  solr.SolrMappingReader - source: boost dest: boost
    2015-04-07 20:11:01,987 INFO  solr.SolrMappingReader - source: digest dest: digest
    2015-04-07 20:11:01,987 INFO  solr.SolrMappingReader - source: tstamp dest: tstamp
    2015-04-07 20:11:02,266 INFO  solr.SolrIndexWriter - Indexing 250 documents
    2015-04-07 20:11:02,267 INFO  solr.SolrIndexWriter - Deleting 0 documents
    2015-04-07 20:11:02,512 INFO  solr.SolrIndexWriter - Indexing 250 documents
    2015-04-07 20:11:02,576 WARN  mapred.LocalJobRunner - job_local1831338118_0001
    org.apache.solr.common.SolrException: HTTP method POST is not supported by this URL

    HTTP method POST is not supported by this URL

    request: http://localhost:8983/solr/
        at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:430)
        at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:244)
        at org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:105)
        at org.apache.nutch.indexwriter.solr.SolrIndexWriter.write(SolrIndexWriter.java:135)
        at org.apache.nutch.indexer.IndexWriters.write(IndexWriters.java:88)
        at org.apache.nutch.indexer.IndexerOutputFormat$1.write(IndexerOutputFormat.java:50)
        at org.apache.nutch.indexer.IndexerOutputFormat$1.write(IndexerOutputFormat.java:41)
        at org.apache.hadoop.mapred.ReduceTask$OldTrackingRecordWriter.write(ReduceTask.java:458)
        at org.apache.hadoop.mapred.ReduceTask$3.collect(ReduceTask.java:500)
        at org.apache.nutch.indexer.IndexerMapReduce.reduce(IndexerMapReduce.java:323)
        at org.apache.nutch.indexer.IndexerMapReduce.reduce(IndexerMapReduce.java:53)
        at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:522)
        at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:421)
        at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:398)
    2015-04-07 20:11:02,724 ERROR indexer.IndexingJob - Indexer: java.io.IOException: Job failed!
        at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1357)
        at org.apache.nutch.indexer.IndexingJob.index(IndexingJob.java:114)
        at org.apache.nutch.indexer.IndexingJob.run(IndexingJob.java:176)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
        at org.apache.nutch.indexer.IndexingJob.main(IndexingJob.java:186)

On Tuesday 07 April 2015 06:53 PM, Jeff Cocking wrote:

The command you are using is not pointing to the specific Solr index you created. The http://localhost:8983/solr needs to be changed to the URL for the core you created. It should look like http://localhost:8983/solr/#/new_core_name.
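A note on why the "#" form misbehaves: everything after "#" in a URL is a client-side fragment that HTTP clients never transmit, so a request aimed at http://localhost:8983/solr/#/foo actually reaches http://localhost:8983/solr/ — exactly the URL shown in the failing "request:" log line above. A minimal shell sketch (the core name "foo" is taken from the thread) of recovering the endpoint Nutch needs:

```shell
# Admin-UI address copied from the browser; '#/foo' is only a fragment
ui_url="http://localhost:8983/solr/#/foo"

# HTTP clients drop the fragment, so this is what Solr actually sees:
wire_url="${ui_url%%#*}"
echo "$wire_url"    # http://localhost:8983/solr/

# The core endpoint to pass to bin/nutch solrindex has no '#' in it:
core_url="${ui_url%%/#*}/${ui_url##*#/}"
echo "$core_url"    # http://localhost:8983/solr/foo
```

With that URL the command would presumably read `bin/nutch solrindex http://localhost:8983/solr/foo crawl/crawldb/ -linkdb crawl/linkdb/ crawl/segments/20150406231502/ -filter -normalize`, assuming the core exists as described.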
On Tue, Apr 7, 2015 at 2:33 AM, Anchit Jain <[email protected]> wrote:

I followed the instructions given on your blog, created a new core for the Nutch data, and copied Nutch's schema.xml into it. Then I ran the following command in the Nutch working directory:

    bin/nutch solrindex http://localhost:8983/solr crawl/crawldb/ -linkdb crawl/linkdb/ crawl/segments/20150406231502/ -filter -normalize

But the same error came up as in the previous runs.

On Tue, 7 Apr 2015 at 10:00 Jeff Cocking <[email protected]> wrote:

Solr 5 is multicore by default. You have not finished the install: you still need to set up Solr 5's core. I would suggest you look at the link I sent to finish your setup. Once you do, your Solr URL will be http://localhost:8983/solr/#/core_name.

Jeff Cocking
I apologize for my brevity. This was sent from my mobile device while I should be focusing on something else... like a meeting, driving, family, etc.

On Apr 6, 2015, at 11:16 PM, Anchit Jain <[email protected]> wrote:

I have already installed Solr and want to integrate it with Nutch. Whenever I issue this command to Nutch:

    bin/nutch solrindex http://localhost:8983/solr crawl/crawldb/ -linkdb crawl/linkdb/ crawl/segments/20150406231502/ -filter -normalize

I always get the error: Indexer: java.io.IOException: Job failed!
    at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1357)
    at org.apache.nutch.indexer.IndexingJob.index(IndexingJob.java:114)
    at org.apache.nutch.indexer.IndexingJob.run(IndexingJob.java:176)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
    at org.apache.nutch.indexer.IndexingJob.main(IndexingJob.java:186)

Here is the complete hadoop log for the process. I have marked the error part in it.

    2015-04-07 09:38:06,613 INFO  indexer.IndexingJob - Indexer: starting at 2015-04-07 09:38:06
    2015-04-07 09:38:06,684 INFO  indexer.IndexingJob - Indexer: deleting gone documents: false
    2015-04-07 09:38:06,685 INFO  indexer.IndexingJob - Indexer: URL filtering: true
    2015-04-07 09:38:06,685 INFO  indexer.IndexingJob - Indexer: URL normalizing: true
    2015-04-07 09:38:06,893 INFO  indexer.IndexWriters - Adding org.apache.nutch.indexwriter.solr.SolrIndexWriter
    2015-04-07 09:38:06,893 INFO  indexer.IndexingJob - Active IndexWriters :
    SOLRIndexWriter
        solr.server.url : URL of the SOLR instance (mandatory)
        solr.commit.size : buffer size when sending to SOLR (default 1000)
        solr.mapping.file : name of the mapping file for fields (default solrindex-mapping.xml)
        solr.auth : use authentication (default false)
        solr.auth.username : use authentication (default false)
        solr.auth : username for authentication
        solr.auth.password : password for authentication
    2015-04-07 09:38:06,898 INFO  indexer.IndexerMapReduce - IndexerMapReduce: crawldb: crawl/crawldb
    2015-04-07 09:38:06,898 INFO  indexer.IndexerMapReduce - IndexerMapReduce: linkdb: crawl/linkdb
    2015-04-07 09:38:06,898 INFO  indexer.IndexerMapReduce - IndexerMapReduces: adding segment: crawl/segments/20150406231502
    2015-04-07 09:38:07,036 WARN  util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
    2015-04-07 09:38:07,540 INFO  anchor.AnchorIndexingFilter - Anchor deduplication is: off
    2015-04-07 09:38:07,565 INFO  regex.RegexURLNormalizer - can't find rules for scope 'indexer', using default
    2015-04-07 09:38:09,552 INFO  regex.RegexURLNormalizer - can't find rules for scope 'indexer', using default
    2015-04-07 09:38:10,642 INFO  regex.RegexURLNormalizer - can't find rules for scope 'indexer', using default
    2015-04-07 09:38:10,734 INFO  regex.RegexURLNormalizer - can't find rules for scope 'indexer', using default
    2015-04-07 09:38:10,895 INFO  regex.RegexURLNormalizer - can't find rules for scope 'indexer', using default
    2015-04-07 09:38:11,088 INFO  regex.RegexURLNormalizer - can't find rules for scope 'indexer', using default
    2015-04-07 09:38:11,219 INFO  indexer.IndexWriters - Adding org.apache.nutch.indexwriter.solr.SolrIndexWriter
    2015-04-07 09:38:11,237 INFO  solr.SolrMappingReader - source: content dest: content
    2015-04-07 09:38:11,237 INFO  solr.SolrMappingReader - source: title dest: title
    2015-04-07 09:38:11,237 INFO  solr.SolrMappingReader - source: host dest: host
    2015-04-07 09:38:11,237 INFO  solr.SolrMappingReader - source: segment dest: segment
    2015-04-07 09:38:11,237 INFO  solr.SolrMappingReader - source: boost dest: boost
    2015-04-07 09:38:11,237 INFO  solr.SolrMappingReader - source: digest dest: digest
    2015-04-07 09:38:11,237 INFO  solr.SolrMappingReader - source: tstamp dest: tstamp
    2015-04-07 09:38:11,526 INFO  solr.SolrIndexWriter - Indexing 250 documents
    2015-04-07 09:38:11,526 INFO  solr.SolrIndexWriter - Deleting 0 documents
    2015-04-07 09:38:11,644 INFO  solr.SolrIndexWriter - Indexing 250 documents
    2015-04-07 09:38:11,699 WARN  mapred.LocalJobRunner - job_local1245074757_0001
    org.apache.solr.common.SolrException: Not Found

    Not Found

    request: http://localhost:8983/solr/update?wt=javabin&version=2
        at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:430)
        at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:244)
        at org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:105)
        at org.apache.nutch.indexwriter.solr.SolrIndexWriter.write(SolrIndexWriter.java:135)
        at org.apache.nutch.indexer.IndexWriters.write(IndexWriters.java:88)
        at org.apache.nutch.indexer.IndexerOutputFormat$1.write(IndexerOutputFormat.java:50)
        at org.apache.nutch.indexer.IndexerOutputFormat$1.write(IndexerOutputFormat.java:41)
        at org.apache.hadoop.mapred.ReduceTask$OldTrackingRecordWriter.write(ReduceTask.java:458)
        at org.apache.hadoop.mapred.ReduceTask$3.collect(ReduceTask.java:500)
        at org.apache.nutch.indexer.IndexerMapReduce.reduce(IndexerMapReduce.java:323)
        at org.apache.nutch.indexer.IndexerMapReduce.reduce(IndexerMapReduce.java:53)
        at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:522)
        at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:421)
        at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:398)
    2015-04-07 09:38:12,408 ERROR indexer.IndexingJob - Indexer: java.io.IOException: Job failed!
        at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1357)
        at org.apache.nutch.indexer.IndexingJob.index(IndexingJob.java:114)
        at org.apache.nutch.indexer.IndexingJob.run(IndexingJob.java:176)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
        at org.apache.nutch.indexer.IndexingJob.main(IndexingJob.java:186)

On Tue, 7 Apr 2015 at 03:18 Jeff Cocking <[email protected]> wrote:

With Solr 5.0.0 you can skip that step. Solr will auto-create your schema document based on the data being provided. One of the new features in Solr 5 is the install/service feature. I did a quick write-up on how to install Solr 5 on CentOS; there might be something useful there for you.
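This "Not Found" fails for a related reason: SolrJ appends /update to whatever base URL it is given, and in a multicore Solr 5 there is no update handler at the server root, only under a core's path. A small sketch of the two URL shapes (the core name "foo" is the one used elsewhere in the thread):

```shell
# Base URL passed to bin/nutch solrindex; it names no core
base="http://localhost:8983/solr"
core="foo"   # core name from elsewhere in the thread

# SolrJ appends /update to the base URL it is given:
echo "${base}/update"           # the URL the log shows returning 404
echo "${base}/${core}/update"   # per-core update handler that should exist
```

So the fix is the same in both failure modes: the URL given to solrindex must include the core name in its path.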
http://www.cocking.com/apache-solr-5-0-install-on-centos-7/

jeff

On Mon, Apr 6, 2015 at 3:13 PM, Anchit Jain <[email protected]> wrote:

I want to index Nutch results using Solr 5.0, but the directory ${APACHE_SOLR_HOME}/example/solr/collection1/conf/ mentioned in https://wiki.apache.org/nutch/NutchTutorial does not exist in Solr 5.0. So where do I have to copy schema.xml? Also, there is no start.jar present in the example directory.
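The paths the tutorial names are from the Solr 4.x layout. In Solr 5 the server is started with the bin/solr script and cores are created explicitly; a sketch of where schema.xml would then go, assuming a default Solr 5 install (the paths and the core name "nutch" are assumptions, not from the thread):

```shell
# Solr 5 ships neither example/solr/collection1/conf/ nor start.jar.
# From the Solr install directory one would instead run (not executed here):
#
#   bin/solr start            # replaces: java -jar start.jar
#   bin/solr create -c nutch  # creates a core named "nutch"
#
SOLR_HOME="server/solr"   # default core root in Solr 5 (an assumption)
CORE="nutch"              # hypothetical core name
# Nutch's schema.xml would be copied into this conf directory:
echo "${SOLR_HOME}/${CORE}/conf"
```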

