BLEH! <facepalm> This is entirely possible to do in a single step AS LONG AS YOU GET THE SYNTAX CORRECT ;-)
http://www.lucidimagination.com/blog/2010/09/10/refresh-using-nutch-with-solr/

bin/nutch crawl urls -dir crawl -threads 10 -depth 100 -topN 50 -solr http://localhost:8983/solr

The correct param is -solr NOT -solrindex.

Cheers,
Adam

On Mon, Jan 3, 2011 at 11:45 AM, Adam Estrada <estrada.a...@gmail.com> wrote:

> All,
>
> I realize that the documentation says that you crawl first and then add to
> Solr, but I spent several hours running the same command through Cygwin with
> -solrindex http://localhost:8983/solr on the command line (e.g. bin/nutch
> crawl urls -dir crawl -threads 10 -depth 100 -topN 50 -solrindex
> http://localhost:8983/solr) and it worked. Does anyone know why it's not
> working for me anymore? I am using the Lucid build of Solr, which is what I
> was using before. I neglected to write down the command line syntax, which is
> biting me in the arse. Any tips on this one would be great!
>
> Thanks,
> Adam
>
> On Mon, Dec 20, 2010 at 4:21 PM, Anurag <anurag.it.jo...@gmail.com> wrote:
>
>> Why are you using solrindex in the argument? It is used when we need to
>> index the crawled data in Solr.
>> For more, read http://wiki.apache.org/nutch/NutchTutorial .
>>
>> Also, for nutch-solr integration this is a very useful blog:
>> http://www.lucidimagination.com/blog/2009/03/09/nutch-solr/
>> I integrated nutch and solr and it works well.
>>
>> Thanks
>>
>> On Tue, Dec 21, 2010 at 1:57 AM, Adam Estrada-2 [via Lucene]
>> <ml-node+2122347-622655030-146...@n3.nabble.com> wrote:
>>
>> > All,
>> >
>> > I have a couple websites that I need to crawl and the following command
>> > line used to work I think.
>> > Solr is up and running and everything is fine there
>> > and I can go through and index the site but I really need the results
>> > added to Solr after the crawl. Does anyone have any idea on how to make
>> > that happen or what I'm doing wrong? These errors are being thrown from
>> > Hadoop which I am not using at all.
>> >
>> > $ bin/nutch crawl urls -dir crawl -threads 10 -depth 100 -topN 50 -solrindex http://localhost:8983/solr
>> > crawl started in: crawl
>> > rootUrlDir = http://localhost:8983/solr
>> > threads = 10
>> > depth = 100
>> > indexer=lucene
>> > topN = 50
>> > Injector: starting at 2010-12-20 15:23:25
>> > Injector: crawlDb: crawl/crawldb
>> > Injector: urlDir: http://localhost:8983/solr
>> > Injector: Converting injected urls to crawl db entries.
>> > Exception in thread "main" java.io.IOException: No FileSystem for scheme: http
>> >         at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1375)
>> >         at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:66)
>> >         at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1390)
>> >         at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:196)
>> >         at org.apache.hadoop.fs.Path.getFileSystem(Path.java:175)
>> >         at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:169)
>> >         at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:201)
>> >         at org.apache.hadoop.mapred.JobClient.writeOldSplits(JobClient.java:810)
>> >         at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:781)
>> >         at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:730)
>> >         at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1249)
>> >         at org.apache.nutch.crawl.Injector.inject(Injector.java:217)
>> >         at org.apache.nutch.crawl.Crawl.main(Crawl.java:124)
>> >
>> > ------------------------------
>> > View message @
>> > http://lucene.472066.n3.nabble.com/Nutch-and-Solr-integration-tp2122347p2122347.html
>>
>> --
>> Kumar Anurag
>>
>> --
>> View this message in context:
>> http://lucene.472066.n3.nabble.com/Nutch-and-Solr-integration-tp2122347p2122623.html
>> Sent from the Solr - User mailing list archive at Nabble.com.
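
For anyone landing on this thread later: judging by the quoted log (rootUrlDir = http://localhost:8983/solr, indexer=lucene), the crawl command did not recognise -solrindex and appears to have treated the Solr URL as the seed directory, which would explain the "No FileSystem for scheme: http" error from Hadoop. Below is a minimal sketch of a guard against that slip. The function name build_crawl_cmd is hypothetical and not part of Nutch; it only prints the command line (a dry run) and never invokes bin/nutch itself.

```shell
#!/bin/sh
# Hypothetical helper (not part of Nutch): assemble a one-step crawl command
# and refuse the -solrindex flag, which the thread reports as the wrong one.
build_crawl_cmd() {
  solr_url=$1   # Solr endpoint, e.g. http://localhost:8983/solr
  shift         # remaining arguments are passed through to "bin/nutch crawl"
  for arg in "$@"; do
    if [ "$arg" = "-solrindex" ]; then
      echo "error: use -solr, not -solrindex, with 'bin/nutch crawl'" >&2
      return 1
    fi
  done
  # Dry run: print the command instead of executing it.
  echo "bin/nutch crawl $* -solr $solr_url"
}

# Same options as in the thread; inspect the printed line before running it.
build_crawl_cmd http://localhost:8983/solr urls -dir crawl -threads 10 -depth 100 -topN 50
```

Printing rather than executing keeps the sketch safe to try on a machine without Nutch installed; copy the printed line into the shell once it looks right.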