Thank you very much for your replies,

Yes Otis, one possibility is to copy my data to HDFS and then apply a map
function to create the intermediate indexes across the cluster, using the
Solr Java library in HDFS.

I have some doubts concerning this solution:

1 - Do the intermediate indexes that get created really need to be merged?
    I mean, is there any mechanism in SolrCloud to easily combine those
    intermediate indexes and serve them as if they were a "whole index",
    in a distributed fashion?

2 - Can I serve these different indexes with Solr or SolrCloud directly
    from HDFS? Google says no :), so maybe I need to copy the indexes to
    a local file system and point Solr at it.
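On question 1, for what it's worth: Solr's CoreAdmin API has a "mergeindexes" action that folds one or more existing index directories into a target core. A rough sketch (core name and index paths are made up here, and the target core must exist and be able to read the directories, e.g. after copying them out of HDFS):

```shell
# Merge two on-disk intermediate indexes into an existing core "core0",
# then commit so the merged documents become visible.
curl "http://localhost:8983/solr/admin/cores?action=mergeindexes&core=core0&indexDir=/data/index1&indexDir=/data/index2"
curl "http://localhost:8983/solr/core0/update?commit=true"
```

This merges into a single core rather than distributing the result, so it answers "can they be combined" more than "can SolrCloud serve them as one distributed index".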

Timothy, thank you for your tips. I am looking at Pig.
CloudSolrServer seems an interesting piece of the architecture, especially
for discovering Solr endpoints and then possibly replicating my index, but
I was wondering if I need to implement that myself or if Solr will take
care of it for me. Maybe I just didn't get your tip due to my newbie
knowledge of Solr.
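As far as I understand it, CloudSolrServer (from SolrJ) reads the cluster state from ZooKeeper and routes requests to the right shard leaders itself, and replication within a collection is handled by SolrCloud according to the collection's replicationFactor, so there should be nothing to implement by hand. A minimal sketch, assuming a hypothetical ZooKeeper ensemble and collection name:

```java
import org.apache.solr.client.solrj.impl.CloudSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class CloudIndexExample {
    public static void main(String[] args) throws Exception {
        // Hypothetical ZooKeeper hosts; CloudSolrServer discovers the
        // live Solr nodes from the cluster state kept in ZooKeeper.
        CloudSolrServer server = new CloudSolrServer("zk1:2181,zk2:2181,zk3:2181");
        server.setDefaultCollection("mycollection");

        // Documents are routed to the correct shard automatically;
        // SolrCloud then replicates them to the shard's replicas.
        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", "1");
        doc.addField("title", "hello from SolrJ");
        server.add(doc);
        server.commit();

        server.shutdown();
    }
}
```

This needs a running SolrCloud cluster and the SolrJ jars on the classpath, so treat it as an illustration of the idea rather than something to run as-is.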

I am sorry if I am confusing some concepts or not being very precise in my
words.

Jack, thank you for sharing the DataStax solution, I will definitely take a
look since it's free :). But anyway, the objective of this project is for
me to learn Solr and Hadoop. :)

Thank you,
Rui Vaz
