When running

bin/nutch solrindex \ http://localhost:8983/solr/nutch_solr_data_core \
crawl/crawldb/ -linkdb crawl/linkdb/ $s1

the Nutch results do not get indexed into Solr. Solr shows no docs in the
core.

The only line in the output of that command that looks like an error
message is:
Indexer: java.io.IOException: No FileSystem for scheme: http

1. Is that the issue?
I suspect it has more to do with the Java implementation than anything else,
but I'm not sure where to go with that suspicion.
2. How do I fix it?
Or, at minimum, please point me in the right direction so I can figure it
out myself.
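
One detail I noticed while writing this up: the command above has a
backslash followed by a space in the middle of the first line, before the
URL. I'm not sure whether that's related, but here is a generic (not
Nutch-specific) demonstration of how the shell parses it: a mid-line
backslash-space is an escaped literal space glued onto the next word, not a
line continuation.

```shell
# A backslash at end-of-line continues the command onto the next line;
# a backslash followed by a space mid-line escapes that space into a
# literal character that becomes part of the next argument.
# Here the second argument is " two" (with a leading space):
printf '[%s]\n' one \ two

# Compare with a true end-of-line continuation, which yields plain "two":
printf '[%s]\n' one \
two
```

So if the `\ ` was typed mid-line rather than at the end of a line, the
Solr URL would reach Nutch with a leading space attached.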

Background:

   - Followed Nutch tutorial at Apache's Nutch page.
   - Everything appears to have worked to this point.
   - nutch_solr_data_core does exist as a core in Solr.
   - Data was definitely brought back by the crawl.
   - Searched the mail archives and Google for the phrase No FileSystem for
   scheme: http. No useful responses were found; whenever this question is
   asked, it seems to go unanswered.


Specs:

   - Nutch 1.11
   - Solr 5.4.0
   - Java Default for Ubuntu 14.04 LTS


Response Message in Its Entirety:
root@Walleye:/nutch/nutch# bin/nutch solrindex \
http://localhost:8983/solr/nutch_solr_data_core \ crawl/crawldb/ -linkdb crawl/linkdb/ $s1
Indexer: starting at 2015-12-24 08:44:07
Indexer: deleting gone documents: false
Indexer: URL filtering: false
Indexer: URL normalizing: false
Active IndexWriters :
SolrIndexWriter
        solr.server.type : Type of SolrServer to communicate with (default 'http' however options include 'cloud', 'lb' and 'concurrent')
        solr.server.url : URL of the Solr instance (mandatory)
        solr.zookeeper.url : URL of the Zookeeper URL (mandatory if 'cloud' value for solr.server.type)
        solr.loadbalance.urls : Comma-separated string of Solr server strings to be used (madatory if 'lb' value for solr.server.type)
        solr.mapping.file : name of the mapping file for fields (default solrindex-mapping.xml)
        solr.commit.size : buffer size when sending to Solr (default 1000)
        solr.auth : use authentication (default false)
        solr.auth.username : username for authentication
        solr.auth.password : password for authentication


Indexer: java.io.IOException: No FileSystem for scheme: http
        at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2385)
        at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2392)
        at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:89)
        at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2431)
        at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2413)
        at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:368)
        at org.apache.hadoop.fs.Path.getFileSystem(Path.java:296)
        at org.apache.hadoop.mapred.FileInputFormat.singleThreadedListStatus(FileInputFormat.java:256)
        at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:228)
        at org.apache.hadoop.mapred.SequenceFileInputFormat.listStatus(SequenceFileInputFormat.java:45)
        at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:304)
        at org.apache.hadoop.mapreduce.JobSubmitter.writeOldSplits(JobSubmitter.java:520)
        at org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:512)
        at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:394)
        at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1285)
        at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1282)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
        at org.apache.hadoop.mapreduce.Job.submit(Job.java:1282)
        at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:562)
        at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:557)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
        at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:557)
        at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:548)
        at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:833)
        at org.apache.nutch.indexer.IndexingJob.index(IndexingJob.java:145)
        at org.apache.nutch.indexer.IndexingJob.run(IndexingJob.java:222)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
        at org.apache.nutch.indexer.IndexingJob.main(IndexingJob.java:231)

Guy McDowell
[email protected]
http://www.GuyMcDowell.com
