Thanks for the reply.

It doesn't matter to me which shard a document ends up in; what matters is
how many shards a customer's documents end up spread across.

And it seems like I wouldn't have control over that as the number of shards
grows.
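
To make my concern concrete, here is a rough sketch of the arithmetic as I
understand it from the explanation quoted below. It is not Solr code: the
function names are made up, the prefix hash 0x9ABCDEF0 is an arbitrary
example value, and I treat the hash space as unsigned 0..2^32-1 with
equal-sized shards (Solr's real ranges are signed). It counts the shards
whose range overlaps the prefix's range, which is an upper bound; the real
number can be lower if the documents happen to hash into fewer shards.

```python
HASH_BITS = 32

def prefix_range(prefix_hash, bits):
    """Inclusive (lo, hi) range of hashes that share the top `bits` bits."""
    shift = HASH_BITS - bits
    lo = (prefix_hash >> shift) << shift
    return lo, lo + (1 << shift) - 1

def equal_shards(n):
    """Cut the unsigned 32-bit hash space into n contiguous inclusive ranges."""
    total = 1 << HASH_BITS
    bounds = [i * total // n for i in range(n + 1)]
    return [(bounds[i], bounds[i + 1] - 1) for i in range(n)]

def split_shard(shards, index):
    """Replace one shard with two halves, leaving the other shards untouched."""
    lo, hi = shards[index]
    mid = lo + (hi - lo) // 2
    return shards[:index] + [(lo, mid), (mid + 1, hi)] + shards[index + 1:]

def shards_touched(shards, lo, hi):
    """Count the shards whose range overlaps [lo, hi]."""
    return sum(1 for s_lo, s_hi in shards if s_lo <= hi and lo <= s_hi)

# Hypothetical hash of the "customer2" prefix, with 1 bit taken from it (/1).
lo, hi = prefix_range(0x9ABCDEF0, 1)

four = equal_shards(4)
print(shards_touched(four, lo, hi))    # 2

# Split every shard in half: 4 shards become 8.
eight = four
for i in reversed(range(4)):
    eight = split_shard(eight, i)
print(shards_touched(eight, lo, hi))   # 4
```

So under these assumptions a /1 prefix that covers 2 of 4 shards can cover
up to 4 of 8 once every shard is split, which is exactly what I was hoping
to avoid.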

thanks,
hasmik



On Wed, May 18, 2022 at 11:38 AM Shawn Heisey <apa...@elyograg.org> wrote:

> On 5/18/22 08:42, Hasmik Sarkezians wrote:
> > Have a question about shard splitting and compositeId usage. We are
> > starting a solr collection with X number of shards for our multi-tenant
> > application. We are assuming that the number of shards will increase over
> > time as the number of customers grows as well as the customer data.
> >
> > We are thinking of using the <customerId>/num!docId format to specify
> > multiple shards for my tenants depending on the number of records that we
> > will index. We will start with 4 shards and then my assumption is that we
> > use the shard split to add more shards to the collection.
> >
> > customer size X = 1 shard and as such the compositeId would be
> > customer1!docId
> > customer size 5*X = 2 shards and as such the compositeId would be
> > customer2/1!docId
> >
> > And now if I split the shards and the number of shards becomes 5, 6, 7, 8
> > what happens to the data? The point is I don't want customer2 to end up in
> > 4 shards when we get to have 8 shards. If someone can shed some light here
> > I would appreciate it.
>
> I wonder if you have a good understanding of how a compositeId works.
>
> The prefix does not directly dictate what shard a document will end up
> in.  It determines how many bits of the full 32-bit ID hash will be
> computed from the prefix and how many from the rest of the ID.
>
>
> https://solr.apache.org/guide/8_11/shards-and-indexing-data-in-solrcloud.html#document-routing
>
> Something not stated there is how many bits are used if the number is
> not specified.  Looking at the code, the default appears to be 16 if the
> number of parts in the ID is 2, and 8 if the number of parts in the ID
> is 3.  I don't think it supports more than 3 parts.
>
> When you split a shard, the hash range for the shard will be split, and
> the range for the new shards will be smaller than any other shards that
> were not split.  So it may not be completely predictable which shards a
> composite ID will be stored in when you split them.  If you split ALL
> shards in half, then a prefix that limited the number of shards to 2
> could result in those documents being split across 4 shards, but
> depending on how many documents there are with that prefix and EXACTLY
> how the hashes end up being divided, it could be as low as 2 shards and
> as high as 4.  If there are a lot of documents with that prefix, chances
> are that it would be 4 shards.
>
> If you want explicit control over which shard a document ends up in, you
> cannot use compositeId.  You'll have to use the implicit router and
> designate a field where the name of the shard will go.  I don't think
> splitting shards is possible with the implicit router.
>
> Thanks,
> Shawn
>
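
For my own notes, this is how I read the description above of the prefix
bits in a two-part ID. It is only a sketch: CRC32 stands in for whatever
hash Solr actually uses, composite_hash is a made-up name, and the default
of 16 bits is the value Shawn reports from his reading of the code.

```python
import zlib

def stand_in_hash(text):
    # Assumption: CRC32 is only a placeholder for Solr's real hash function.
    return zlib.crc32(text.encode("utf-8")) & 0xFFFFFFFF

def composite_hash(composite_id, default_bits=16):
    """Combine the top bits of the prefix hash with the low bits of the doc hash."""
    prefix_part, doc_part = composite_id.split("!", 1)
    if "/" in prefix_part:
        prefix, bits_text = prefix_part.split("/", 1)
        bits = int(bits_text)
    else:
        prefix, bits = prefix_part, default_bits
    prefix_mask = (0xFFFFFFFF << (32 - bits)) & 0xFFFFFFFF
    doc_mask = 0xFFFFFFFF ^ prefix_mask
    return (stand_in_hash(prefix) & prefix_mask) | (stand_in_hash(doc_part) & doc_mask)

# Same prefix, default 16 bits: the top 16 bits of the hash match.
a = composite_hash("customer1!doc1")
b = composite_hash("customer1!doc2")
print(a >> 16 == b >> 16)

# customer2/1: only the top bit is fixed by the prefix.
c = composite_hash("customer2/1!doc1")
d = composite_hash("customer2/1!doc2")
print(c >> 31 == d >> 31)
```

The fewer bits come from the prefix, the wider the slice of the hash range
a customer's documents can land in, and so the more shards they can touch.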


-- 

Hasmik Sarkezians

VP, Applications

E: hasmik.sarkezi...@zoominfo.com

805 Broadway Street, Suite 900
Vancouver, WA 98660

www.zoominfo.com



