Aha! Thank you Markus.

And now that's on the books for those that follow after me.
Getting a path error now, but even I can figure that one out.
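
For those that follow: per Markus's reply below, the Solr URL goes in as a
Java property (-Dsolr.server.url=...) rather than as a positional argument.
The command should look roughly like this; the core name, paths, and the $s1
segment variable are from my setup per the tutorial, so adjust them to yours:

  bin/nutch index -Dsolr.server.url=http://localhost:8983/solr/nutch_solr_data_core \
    crawl/crawldb/ -linkdb crawl/linkdb/ $s1

As far as I can tell, that also explains the exception below: passed
positionally, the URL gets treated as a Hadoop input path, and Hadoop's
FileSystem has no handler for the "http" scheme.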

Merry Christmas, Happy Holidays, and a Prosperous, Contented New Year

Guy McDowell
[email protected]
http://www.GuyMcDowell.com




On Thu, Dec 24, 2015 at 9:55 AM, Markus Jelsma <[email protected]>
wrote:

> Hello - solrindex should no longer exist. Anyway, you should use
>
> bin/nutch index -Dsolr.server.url=http://blablabla crawldb segmentpath
>
>
> -----Original message-----
> > From:Guy McD <[email protected]>
> > Sent: Thursday 24th December 2015 14:30
> > To: [email protected]
> > Subject: java.io.IOException: No FileSystem for scheme: http
> >
> > When running
> >
> > bin/nutch solrindex http://localhost:8983/solr/nutch_solr_data_core \
> >   crawl/crawldb/ -linkdb crawl/linkdb/ $s1
> >
> > the Nutch results do not get indexed into Solr. Solr shows no docs in the
> > core.
> >
> > The only thing that looks like an error message in the response to that
> > command is:
> > Indexer: java.io.IOException: No FileSystem for scheme: http
> >
> > 1. Is that the issue?
> > I suspect it has more to do with the Java implementation than anything,
> > but I'm not sure where to go with that suspicion.
> > 2. How do I fix it?
> > Or at least point me in the right direction to figure it out for myself.
> >
> > Background:
> >
> >    - Followed Nutch tutorial at Apache's Nutch page.
> >    - Everything appears to have worked to this point.
> >    - nutch_solr_data_core does exist as a core in Solr.
> >    - Data was definitely brought back by the crawl.
> >    - Searched mail archives and Google for the phrase "No FileSystem for
> >      scheme: http". No useful responses were found. Seems like whenever
> >      this question is asked, it doesn't get answered.
> >
> >
> > Specs:
> >
> >    - Nutch 1.11
> >    - Solr 5.4.0
> >    - Default Java for Ubuntu 14.04 LTS
> >
> >
> > Response Message in Entirety:
> > root@Walleye:/nutch/nutch# bin/nutch solrindex \
> >   http://localhost:8983/solr/nutch_solr_data_core \
> >   crawl/crawldb/ -linkdb crawl/linkdb/ $s1
> > Indexer: starting at 2015-12-24 08:44:07
> > Indexer: deleting gone documents: false
> > Indexer: URL filtering: false
> > Indexer: URL normalizing: false
> > Active IndexWriters :
> > SolrIndexWriter
> >         solr.server.type : Type of SolrServer to communicate with
> >             (default 'http' however options include 'cloud', 'lb' and 'concurrent')
> >         solr.server.url : URL of the Solr instance (mandatory)
> >         solr.zookeeper.url : URL of the Zookeeper URL (mandatory if
> >             'cloud' value for solr.server.type)
> >         solr.loadbalance.urls : Comma-separated string of Solr server
> >             strings to be used (mandatory if 'lb' value for solr.server.type)
> >         solr.mapping.file : name of the mapping file for fields (default
> >             solrindex-mapping.xml)
> >         solr.commit.size : buffer size when sending to Solr (default 1000)
> >         solr.auth : use authentication (default false)
> >         solr.auth.username : username for authentication
> >         solr.auth.password : password for authentication
> >
> >
> > Indexer: java.io.IOException: No FileSystem for scheme: http
> >         at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2385)
> >         at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2392)
> >         at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:89)
> >         at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2431)
> >         at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2413)
> >         at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:368)
> >         at org.apache.hadoop.fs.Path.getFileSystem(Path.java:296)
> >         at org.apache.hadoop.mapred.FileInputFormat.singleThreadedListStatus(FileInputFormat.java:256)
> >         at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:228)
> >         at org.apache.hadoop.mapred.SequenceFileInputFormat.listStatus(SequenceFileInputFormat.java:45)
> >         at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:304)
> >         at org.apache.hadoop.mapreduce.JobSubmitter.writeOldSplits(JobSubmitter.java:520)
> >         at org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:512)
> >         at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:394)
> >         at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1285)
> >         at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1282)
> >         at java.security.AccessController.doPrivileged(Native Method)
> >         at javax.security.auth.Subject.doAs(Subject.java:415)
> >         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
> >         at org.apache.hadoop.mapreduce.Job.submit(Job.java:1282)
> >         at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:562)
> >         at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:557)
> >         at java.security.AccessController.doPrivileged(Native Method)
> >         at javax.security.auth.Subject.doAs(Subject.java:415)
> >         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
> >         at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:557)
> >         at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:548)
> >         at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:833)
> >         at org.apache.nutch.indexer.IndexingJob.index(IndexingJob.java:145)
> >         at org.apache.nutch.indexer.IndexingJob.run(IndexingJob.java:222)
> >         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
> >         at org.apache.nutch.indexer.IndexingJob.main(IndexingJob.java:231)
> >
> > Guy McDowell
> > [email protected]
> > http://www.GuyMcDowell.com
> >
>
