Hello - solrindex is deprecated and should no longer be used. The exception itself means Hadoop tried to open the Solr URL as an input path (the trace fails inside FileInputFormat.listStatus), which suggests the URL ended up among the crawldb/segment arguments. Anyway, you should use

bin/nutch index -Dsolr.server.url=http://blablabla crawldb segmentpath
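
With the paths and core name from your own message, that would look roughly like this (assuming $s1 still holds the segment path, as in your solrindex call):

bin/nutch index -Dsolr.server.url=http://localhost:8983/solr/nutch_solr_data_core \
    crawl/crawldb/ -linkdb crawl/linkdb/ $s1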
 
 
-----Original message-----
> From:Guy McD <[email protected]>
> Sent: Thursday 24th December 2015 14:30
> To: [email protected]
> Subject: java.io.IOException: No FileSystem for scheme: http
> 
> When running
> 
> bin/nutch solrindex \
> http://localhost:8983/solr/nutch_solr_data_core \
> crawl/crawldb/ -linkdb crawl/linkdb/ $s1
> 
> the Nutch results do not get indexed into Solr. Solr shows no docs in the
> core.
> 
> The only thing that looks like an error message in the response to that
> command is:
> Indexer: java.io.IOException: No FileSystem for scheme: http
> 
> 1. Is that the issue?
> I suspect it has more to do with the Java implementation than anything, but
> not sure where to go with that suspicion.
> 2. How do I fix it?
> Or at least point me in the right direction to figure it out for myself.
> 
> Background:
> 
>    - Followed Nutch tutorial at Apache's Nutch page.
>    - Everything appears to have worked to this point.
>    - nutch_solr_data_core does exist as a core in Solr.
>    - Data was definitely brought back by the crawl.
>    - Searched the mail archives and Google for the phrase "No FileSystem for
>    scheme: http". No useful responses were found. Seems like whenever this
>    question is asked, it doesn't get answered.
> 
> 
> Specs:
> 
>    - Nutch 1.11
>    - Solr 5.4.0
>    - Java Default for Ubuntu 14.04 LTS
> 
> 
> Response Message in Entirety:
> root@Walleye:/nutch/nutch# bin/nutch solrindex \
> http://localhost:8983/solr/nutch_solr_data_core \
> crawl/crawldb/ -linkdb crawl/linkdb/ \
> $s1
> Indexer: starting at 2015-12-24 08:44:07
> Indexer: deleting gone documents: false
> Indexer: URL filtering: false
> Indexer: URL normalizing: false
> Active IndexWriters :
> SolrIndexWriter
>         solr.server.type : Type of SolrServer to communicate with (default
> 'http' however options include 'cloud', 'lb' and 'concurrent')
>         solr.server.url : URL of the Solr instance (mandatory)
>         solr.zookeeper.url : URL of the Zookeeper URL (mandatory if 'cloud'
> value for solr.server.type)
>         solr.loadbalance.urls : Comma-separated string of Solr server
> strings to be used (madatory if 'lb' value for solr.server.type)
>         solr.mapping.file : name of the mapping file for fields (default
> solrindex-mapping.xml)
>         solr.commit.size : buffer size when sending to Solr (default 1000)
>         solr.auth : use authentication (default false)
>         solr.auth.username : username for authentication
>         solr.auth.password : password for authentication
> 
> 
> Indexer: java.io.IOException: No FileSystem for scheme: http
>         at
> org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2385)
>         at
> org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2392)
>         at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:89)
>         at
> org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2431)
>         at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2413)
>         at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:368)
>         at org.apache.hadoop.fs.Path.getFileSystem(Path.java:296)
>         at
> org.apache.hadoop.mapred.FileInputFormat.singleThreadedListStatus(FileInputFormat.java:256)
>         at
> org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:228)
>         at
> org.apache.hadoop.mapred.SequenceFileInputFormat.listStatus(SequenceFileInputFormat.java:45)
>         at
> org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:304)
>         at
> org.apache.hadoop.mapreduce.JobSubmitter.writeOldSplits(JobSubmitter.java:520)
>         at
> org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:512)
>         at
> org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:394)
>         at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1285)
>         at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1282)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:415)
>         at
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
>         at org.apache.hadoop.mapreduce.Job.submit(Job.java:1282)
>         at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:562)
>         at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:557)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:415)
>         at
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
>         at
> org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:557)
>         at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:548)
>         at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:833)
>         at org.apache.nutch.indexer.IndexingJob.index(IndexingJob.java:145)
>         at org.apache.nutch.indexer.IndexingJob.run(IndexingJob.java:222)
>         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
>         at org.apache.nutch.indexer.IndexingJob.main(IndexingJob.java:231)
> 
> Guy McDowell
> [email protected]
> http://www.GuyMcDowell.com
> 
