Why are you using solrindex in the argument? It is used when we need to index the crawled data into Solr. For more, read http://wiki.apache.org/nutch/NutchTutorial .
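For example, judging from the log in the quoted message (`rootUrlDir = http://localhost:8983/solr`), the crawl command took the Solr URL as its seed-URL directory rather than as an option value; in Nutch 1.x the Solr indexing can instead be run as a separate step after the crawl. A rough sketch only — the `crawl/*` paths assume the default `-dir crawl` layout from the question, and the exact `solrindex` argument order can differ between Nutch 1.x releases, so check `bin/nutch solrindex` usage for your version:

```shell
# Run the crawl without the Solr URL on the command line...
bin/nutch crawl urls -dir crawl -threads 10 -depth 100 -topN 50

# ...then push the crawled data into Solr as a separate indexing step.
# Paths assume the default "-dir crawl" layout (crawldb, linkdb, segments).
bin/nutch solrindex http://localhost:8983/solr crawl/crawldb crawl/linkdb crawl/segments/*
```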
Also, for Nutch-Solr integration this is a very useful blog: http://www.lucidimagination.com/blog/2009/03/09/nutch-solr/ I integrated Nutch and Solr and it works well.

Thanks

On Tue, Dec 21, 2010 at 1:57 AM, Adam Estrada-2 [via Lucene] <ml-node+2122347-622655030-146...@n3.nabble.com> wrote:

> All,
>
> I have a couple of websites that I need to crawl, and the following
> command line used to work, I think. Solr is up and running and everything
> is fine there, and I can go through and index the site, but I really need
> the results added to Solr after the crawl. Does anyone have any idea how
> to make that happen, or what I'm doing wrong? These errors are being
> thrown from Hadoop, which I am not using at all.
>
> $ bin/nutch crawl urls -dir crawl -threads 10 -depth 100 -topN 50 -solrindex http://localhost:8983/solr
> crawl started in: crawl
> rootUrlDir = http://localhost:8983/solr
> threads = 10
> depth = 100
> indexer=lucene
> topN = 50
> Injector: starting at 2010-12-20 15:23:25
> Injector: crawlDb: crawl/crawldb
> Injector: urlDir: http://localhost:8983/solr
> Injector: Converting injected urls to crawl db entries.
> Exception in thread "main" java.io.IOException: No FileSystem for scheme: http
>         at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1375)
>         at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:66)
>         at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1390)
>         at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:196)
>         at org.apache.hadoop.fs.Path.getFileSystem(Path.java:175)
>         at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:169)
>         at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:201)
>         at org.apache.hadoop.mapred.JobClient.writeOldSplits(JobClient.java:810)
>         at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:781)
>         at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:730)
>         at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1249)
>         at org.apache.nutch.crawl.Injector.inject(Injector.java:217)
>         at org.apache.nutch.crawl.Crawl.main(Crawl.java:124)

--
Kumar Anurag