Why are you using solrindex in the argument? It is used when we need to index the crawled data into Solr. For more, read http://wiki.apache.org/nutch/NutchTutorial .
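For example, judging from the log in the quoted message (`rootUrlDir = http://localhost:8983/solr`), the crawl command took the Solr URL as its seed-URL directory rather than as an option value; in Nutch 1.x the Solr indexing can instead be run as a separate step after the crawl. A rough sketch only — the `crawl/*` paths assume the default `-dir crawl` layout from the question, and the exact `solrindex` argument order can differ between Nutch 1.x releases, so check `bin/nutch solrindex` usage for your version:

```shell
# Run the crawl without the Solr URL on the command line...
bin/nutch crawl urls -dir crawl -threads 10 -depth 100 -topN 50

# ...then push the crawled data into Solr as a separate indexing step.
# Paths assume the default "-dir crawl" layout (crawldb, linkdb, segments).
bin/nutch solrindex http://localhost:8983/solr crawl/crawldb crawl/linkdb crawl/segments/*
```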
Also, for Nutch-Solr integration this is a very useful blog: http://www.lucidimagination.com/blog/2009/03/09/nutch-solr/ I integrated Nutch and Solr and it works well.

Thanks

On Tue, Dec 21, 2010 at 1:57 AM, Adam Estrada-2 [via Lucene] <ml-node+2122347-622655030-146...@n3.nabble.com> wrote:

> All,
>
> I have a couple of websites that I need to crawl, and the following
> command line used to work, I think. Solr is up and running and everything
> is fine there, and I can go through and index the site, but I really need
> the results added to Solr after the crawl. Does anyone have any idea how
> to make that happen, or what I'm doing wrong? These errors are being
> thrown from Hadoop, which I am not using at all.
>
> $ bin/nutch crawl urls -dir crawl -threads 10 -depth 100 -topN 50 -solrindex http://localhost:8983/solr
> crawl started in: crawl
> rootUrlDir = http://localhost:8983/solr
> threads = 10
> depth = 100
> indexer=lucene
> topN = 50
> Injector: starting at 2010-12-20 15:23:25
> Injector: crawlDb: crawl/crawldb
> Injector: urlDir: http://localhost:8983/solr
> Injector: Converting injected urls to crawl db entries.
> Exception in thread "main" java.io.IOException: No FileSystem for scheme: http
>         at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1375)
>         at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:66)
>         at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1390)
>         at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:196)
>         at org.apache.hadoop.fs.Path.getFileSystem(Path.java:175)
>         at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:169)
>         at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:201)
>         at org.apache.hadoop.mapred.JobClient.writeOldSplits(JobClient.java:810)
>         at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:781)
>         at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:730)
>         at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1249)
>         at org.apache.nutch.crawl.Injector.inject(Injector.java:217)
>         at org.apache.nutch.crawl.Crawl.main(Crawl.java:124)

--
Kumar Anurag