I’m running Nutch 2.2.1 on top of a Hadoop 1.7 cluster and I’m using Gora to store the crawled data on a remote Cassandra 2.0.4 cluster.
I wish to set up a SolrCloud cluster and index the crawled data on it. Can I do that by integrating Nutch and Solr? The tutorial on the website only explains the integration for the case where the crawl data is stored on the filesystem and Solr is running locally. My situation is this:

1. Crawled data is on a remote Cassandra cluster.
2. The SolrCloud cluster will also be remote.

All nodes are, however, in the same data centre and just two racks apart. I’m running everything on private IPs within the data centre, and nothing goes to the public internet.

--
Manikandan Saravanan
Architect - Technology
TheSocialPeople

