On 6/7/2018 6:41 AM, Greenhorn Techie wrote:
As HDFS has its own replication mechanism, with an HDFS replication
factor of 3 and a SolrCloud replication factor of 3, does that mean
each document will end up with around 9 copies stored in HDFS?
If so, is there a way to configure HDFS or Solr such that only three
copies are maintained overall?

Yes, that is exactly what happens.

SolrCloud replication assumes that each of its replicas is a completely independent index.  I am not aware of anything in Solr's HDFS support that can use one HDFS index directory for multiple replicas.  At the most basic level, a Solr index is a Lucene index.  Lucene goes to great lengths to make sure that an index *CANNOT* be used in more than one place.
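
To spell out the arithmetic: because each Solr replica is a separate index, HDFS replicates the files of each one on its own, so the two factors multiply.  Here is a minimal sketch of that calculation.  The function name is made up for illustration and is not part of any Solr or HDFS API, and it assumes every replica's index lives in HDFS at the same HDFS replication factor.

```python
def physical_copies(solr_replication_factor: int, hdfs_replication_factor: int) -> int:
    """Return how many stored copies of each document end up in HDFS.

    Hypothetical helper for illustration only. Each Solr replica is an
    independent index, and HDFS replicates the files of each index
    separately, so the totals multiply.
    """
    if solr_replication_factor < 1 or hdfs_replication_factor < 1:
        raise ValueError("replication factors must be at least 1")
    return solr_replication_factor * hdfs_replication_factor


# The setup described in the question: 3 Solr replicas, HDFS factor 3.
print(physical_copies(3, 3))  # prints 9
```

By the same arithmetic, a total of three copies requires one of the two factors to be 1.  Whether that is acceptable operationally is a separate question that this calculation does not answer.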

Perhaps somebody who is more familiar with HdfsDirectoryFactory can offer you a solution.  But as far as I know, there isn't one.

Thanks,
Shawn
