In simple terms:

HDFS is good for file-oriented replication. Solr is good for index replication.

Consequently, if an application's update operations are not atomic at the 
file level, HDFS is not adequate. Solr with live index updates is such a case. 
Running Solr on HDFS (as a file system) will impose limitations due to HDFS 
properties. Even then, indexing itself still won't use Hadoop.

If you produce indexes and distribute them as finalized, read-only structures 
(e.g., through Hadoop jobs), HDFS is fine. Solr does not need to know much 
about HDFS in that case.
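
As a rough sketch of that build-then-publish pattern: the snippet below uses 
plain Python on a local file system as a stand-in for HDFS. The names 
publish_index and build_segment_files are made up for illustration and are not 
Solr or Hadoop APIs. The idea is only that readers never see a half-written 
index, because the finished directory becomes visible through one rename.

```python
import os
import tempfile


def publish_index(build_segment_files, live_root, version):
    """Build an index in a staging directory, then publish it with one rename.

    build_segment_files is a hypothetical callback that writes the index
    files into the directory it receives (a stand-in for an indexing job).
    """
    os.makedirs(live_root, exist_ok=True)

    # Build out of sight: readers only look at directories named "index-*".
    staging = tempfile.mkdtemp(prefix=".staging-", dir=live_root)
    build_segment_files(staging)

    # Finalize: a published index is never modified in place.
    for name in os.listdir(staging):
        os.chmod(os.path.join(staging, name), 0o444)

    # Publish: a single directory rename exposes the whole index at once.
    final_dir = os.path.join(live_root, f"index-{version}")
    os.rename(staging, final_dir)
    return final_dir
```

A new index version would go into a new directory rather than overwriting the 
old one, which is what keeps the replication purely file-oriented.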

The third option in the picture is record-based replication, to be handled by 
HBase, Cassandra or ZooKeeper, depending on requirements.

Cheers,
Jürgen
