On 4/9/2021 4:38 PM, Natarajan, Rajeswari wrote:
I am trying to understand how Solr co-locates documents that share a prefix
when the compositeId router is used.
I created a collection with 2 shards and the compositeId router, then published 3 docs: 2 with the prefix
"tenant1!" in the docId field and 1 with the prefix "tenant2!" in the docId.
I queried the collection with the shards=shard1 and shards=shard2 parameters.
All 3 documents are in shard1, and shard2 has no documents.
Is there a threshold number of docs that must be present in shard1 before
shard2 is considered?
According to https://sematext.com/blog/solrcloud-large-tenants-and-routing/ ,
documents with a first-level prefix will be routed to one shard. Is it
possible to have the documents of one tenant occupy one shard in a collection
that uses the compositeId router?
Composite routing like that does not exactly let you choose which shards
will be used.
Here's a relevant quote from the reference guide:
'So "IBM/3!12345" will take 3 bits from the shard key and 29 bits from
the unique doc id, spreading the tenant over 1/8th of the shards in the
collection. Likewise if the num value was 2 it would spread the
documents across 1/4th the number of shards. At query time, you include
the prefix(es) along with the number of bits into your query with the
_route_ parameter (i.e., q=solr&_route_=IBM/3!) to direct queries to
specific shards.'
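To make the bit arithmetic in that quote concrete, here is a minimal Python sketch. It is not Solr code. The function names are made up for illustration, and the default of 16 prefix bits when no "/N" is given is my understanding of the router's behavior rather than something the quote states.

```python
def composite_masks(key_bits: int = 16):
    """Return (key_mask, doc_mask) for a two-part composite id.

    key_bits is the number after the slash in an id like "IBM/3!12345".
    The default of 16 is an assumption about what applies without "/N".
    """
    if not 0 <= key_bits <= 32:
        raise ValueError("key_bits must be between 0 and 32")
    # High-order bits come from the hash of the prefix (the shard key).
    key_mask = (0xFFFFFFFF << (32 - key_bits)) & 0xFFFFFFFF
    # The remaining low-order bits come from the hash of the doc id part.
    doc_mask = ~key_mask & 0xFFFFFFFF
    return key_mask, doc_mask


def tenant_fraction(key_bits: int) -> float:
    """Fraction of the 32-bit hash space that one prefix is confined to."""
    return 1 / (2 ** key_bits)


key_mask, doc_mask = composite_masks(3)
print(hex(key_mask), hex(doc_mask), tenant_fraction(3))
```

With 3 bits the prefix pins down only the top 3 bits of the hash, so the tenant is confined to 1/8th of the hash space. With 2 bits that becomes 1/4th, which matches the quote.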
The part before the ! is hashed, and so is the part after it.
Bits from the two hashes are then combined, with the prefix hash supplying
the high-order bits, and that full hash decides which shard will get the document.
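A rough Python sketch of that combination step follows. Several things in it are assumptions for illustration only: CRC32 stands in for the MurmurHash3 function that Solr actually uses, the hash space is split into equal contiguous ranges numbered from 0, and "/N" suffixes on the prefix are not parsed. It will therefore not reproduce Solr's real shard assignments, only the shape of the logic.

```python
import zlib


def stand_in_hash(text: str) -> int:
    # Stand-in only: Solr uses MurmurHash3, not CRC32.
    return zlib.crc32(text.encode("utf-8")) & 0xFFFFFFFF


def composite_hash(doc_id: str, key_bits: int = 16) -> int:
    """Combine the prefix hash (high bits) with the doc id hash (low bits)."""
    prefix, sep, rest = doc_id.partition("!")
    if not sep:
        # No "!" in the id: hash the whole id, no co-location.
        return stand_in_hash(doc_id)
    key_mask = (0xFFFFFFFF << (32 - key_bits)) & 0xFFFFFFFF
    doc_mask = ~key_mask & 0xFFFFFFFF
    return (stand_in_hash(prefix) & key_mask) | (stand_in_hash(rest) & doc_mask)


def shard_for(doc_id: str, num_shards: int = 2) -> int:
    # Hypothetical layout: equal contiguous hash ranges, 0-based shard index.
    return (composite_hash(doc_id) * num_shards) >> 32


for doc_id in ("tenant1!doc1", "tenant1!doc2", "tenant2!doc1"):
    print(doc_id, shard_for(doc_id))
```

Both tenant1 ids share their top 16 bits, so they always fall in the same range. Whether tenant2 falls in the same range or the other one depends entirely on what its prefix happens to hash to.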
You can't say "use these specific shards" with that capability. The
tenant part just confines that tenant's documents to a reduced portion of
the hash range, and therefore to a reduced number of shards. Because
hashing decides which shards those are, there's never any guarantee that
tenant1 will land on different shards from tenant2. There is no
document-count threshold involved either: with only two shards, both of
your prefixes simply happened to hash into shard1's range. So you cannot
use this to accomplish your original goal of determining the index size of
a single tenant within a collection.
Thanks,
Shawn