All,

I have a couple of websites that I need to crawl, and the command line below
used to work, I think. Solr is up and running and everything is fine there,
and I can go through and index the site, but I really need the results added
to Solr after the crawl. Does anyone have any idea how to make that happen,
or what I'm doing wrong? These errors are being thrown from Hadoop, which I
am not using at all.

$ bin/nutch crawl urls -dir crawl -threads 10 -depth 100 -topN 50 -solrindex http://localhost:8983/solr
crawl started in: crawl
rootUrlDir = http://localhost:8983/solr
threads = 10
depth = 100
indexer=lucene
topN = 50
Injector: starting at 2010-12-20 15:23:25
Injector: crawlDb: crawl/crawldb
Injector: urlDir: http://localhost:8983/solr
Injector: Converting injected urls to crawl db entries.
Exception in thread "main" java.io.IOException: No FileSystem for scheme: http
        at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1375)
        at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:66)
        at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1390)
        at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:196)
        at org.apache.hadoop.fs.Path.getFileSystem(Path.java:175)
        at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:169)
        at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:201)
        at org.apache.hadoop.mapred.JobClient.writeOldSplits(JobClient.java:810)
        at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:781)
        at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:730)
        at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1249)
        at org.apache.nutch.crawl.Injector.inject(Injector.java:217)
        at org.apache.nutch.crawl.Crawl.main(Crawl.java:124)
