Thank you for taking the time to explain this.

hasmik

On 2022/05/18 16:56:29 Hasmik Sarkezians wrote:
> Thanks for the reply.
>
> It doesn't matter to me which shard a document ends up in; what matters
> is how many shards each customer's documents end up spread across.
>
> It seems I wouldn't have control over that as the number of shards
> grows.
>
> thanks,
> hasmik
>
>
>
> On Wed, May 18, 2022 at 11:38 AM Shawn Heisey <[email protected]> wrote:
>
> > On 5/18/22 08:42, Hasmik Sarkezians wrote:
> > > I have a question about shard splitting and compositeId usage. We are
> > > starting a Solr collection with X shards for our multi-tenant
> > > application. We assume that the number of shards will increase over
> > > time as the number of customers and the volume of customer data grow.
> > >
> > > We are thinking of using the <customerId>/num!docId format to spread
> > > each tenant across multiple shards, depending on the number of records
> > > that we will index. We will start with 4 shards, and my assumption is
> > > that we will use shard splitting to add more shards to the collection.
> > >
> > > customer size X = 1 shard, so the compositeId would be
> > > customer1!docId
> > > customer size 5*X = 2 shards, so the compositeId would be
> > > customer2/1!docId
> > >
> > > Now if I split the shards and the number of shards becomes 5, 6, 7,
> > > or 8, what happens to the data? The point is that I don't want
> > > customer2 to end up in 4 shards once we have 8 shards. If someone can
> > > shed some light here, I would appreciate it.
> >
> > I wonder if you have a good understanding of how a compositeId works.
> >
> > The prefix does not directly dictate which shard a document will end up
> > in.  It determines how many bits of the full 32-bit ID hash will be
> > computed from the prefix and how many from the rest of the ID.
> >
> > https://solr.apache.org/guide/8_11/shards-and-indexing-data-in-solrcloud.html#document-routing
> >
> > Something not stated there is how many bits are used if the number is
> > not specified.  Looking at the code, the default appears to be 16 if the
> > number of parts in the ID is 2, and 8 if the number of parts in the ID
> > is 3.  I don't think it supports more than 3 parts.
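To make the bit split concrete, here is a minimal Python sketch of two-part compositeId hashing as described above. It handles only two-part IDs and assumes the 16-bit default mentioned above when no "/N" is given. Solr itself hashes with MurmurHash3; this sketch substitutes zlib.crc32 purely so it runs with the standard library, so its hash values will not match a real cluster. All function names are hypothetical.

```python
import zlib

HASH_BITS = 32
MASK32 = (1 << HASH_BITS) - 1


def hash32(s: str) -> int:
    # Stand-in 32-bit hash: Solr uses MurmurHash3, CRC32 is used here only so
    # the sketch runs with the standard library.
    return zlib.crc32(s.encode("utf-8")) & MASK32


def prefix_mask(bits: int) -> int:
    # Mask selecting the top `bits` bits of a 32-bit hash.
    if not 0 <= bits <= HASH_BITS:
        raise ValueError("bits must be between 0 and 32")
    return (MASK32 << (HASH_BITS - bits)) & MASK32


def composite_hash(doc_id: str, default_bits: int = 16) -> int:
    # Two-part ids only: "prefix!rest" or "prefix/N!rest" (assumed default 16).
    prefix, sep, rest = doc_id.partition("!")
    if not sep:
        return hash32(doc_id)  # no "!" means the whole id is hashed
    name, slash, bits_str = prefix.partition("/")
    bits = int(bits_str) if slash else default_bits
    mask = prefix_mask(bits)
    # Top `bits` bits come from the prefix hash, the remaining bits from the rest.
    return (hash32(name) & mask) | (hash32(rest) & ~mask & MASK32)


if __name__ == "__main__":
    for doc_id in ("customer1!doc1", "customer1!doc2", "customer2/1!doc1"):
        print(f"{doc_id:20} {composite_hash(doc_id):08x}")
```

With 16 bits, every customer1 document shares the top 16 hash bits, so all of them fall in a 1/65536 slice of the hash space; with "/1", customer2 documents share only the top bit and can land anywhere in half of it.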
> >
> > When you split a shard, its hash range is divided, and the ranges for
> > the new shards will be smaller than those of any shards that were not
> > split.  So it may not be completely predictable which shards a
> > composite ID will be stored in after you split them.  If you split ALL
> > shards in half, then a prefix that limited the number of shards to 2
> > could result in those documents being split across 4 shards, but
> > depending on how many documents there are with that prefix and EXACTLY
> > how the hashes end up being divided, it could be as low as 2 shards and
> > as high as 4.  If there are a lot of documents with that prefix, chances
> > are that it would be 4 shards.
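The effect of splitting can be sketched the same way. This hypothetical Python sketch treats the hash space as unsigned 0 to 2^32-1 (Solr actually uses signed 32-bit ranges), assumes each split divides a range into two equal halves, and again uses CRC32 in place of MurmurHash3. `reachable` returns the shards whose ranges overlap the block of hashes a prefix can produce; that is an upper bound, and `occupied` shows which of those shards a given set of document IDs actually lands in.

```python
import zlib

HASH_BITS = 32
MASK32 = (1 << HASH_BITS) - 1
FULL = 1 << HASH_BITS


def hash32(s: str) -> int:
    # Stand-in for Solr's MurmurHash3 so the sketch needs only the stdlib.
    return zlib.crc32(s.encode("utf-8")) & MASK32


def equal_ranges(num_shards: int):
    # Divide the unsigned 32-bit hash space into contiguous, inclusive ranges.
    size = FULL // num_shards
    return [
        (i * size, FULL - 1 if i == num_shards - 1 else (i + 1) * size - 1)
        for i in range(num_shards)
    ]


def split(ranges, index):
    # Replace one shard's range with two equal halves; other shards are untouched.
    start, end = ranges[index]
    mid = start + (end - start) // 2
    return ranges[:index] + [(start, mid), (mid + 1, end)] + ranges[index + 1:]


def reachable(prefix: str, bits: int, ranges):
    # Hashes for "prefix/bits!..." share their top `bits` bits, so they fall
    # in one contiguous block [lo, hi]; return the shards overlapping it.
    mask = (MASK32 << (HASH_BITS - bits)) & MASK32
    lo = hash32(prefix) & mask
    hi = lo | (~mask & MASK32)
    return [i for i, (s, e) in enumerate(ranges) if s <= hi and e >= lo]


def occupied(prefix: str, bits: int, doc_ids, ranges):
    # Shards that actually receive at least one of the given documents.
    mask = (MASK32 << (HASH_BITS - bits)) & MASK32
    result = set()
    for doc_id in doc_ids:
        h = (hash32(prefix) & mask) | (hash32(doc_id) & ~mask & MASK32)
        result.update(i for i, (s, e) in enumerate(ranges) if s <= h <= e)
    return result


if __name__ == "__main__":
    four = equal_ranges(4)
    print(len(reachable("customer2", 1, four)))  # 2 of 4 shards

    eight = four
    for i in reversed(range(len(four))):  # split every shard in half
        eight = split(eight, i)
    print(len(reachable("customer2", 1, eight)))  # 4 of 8 shards

    five = split(four, 0)  # split only the first shard
    print(len(reachable("customer2", 1, five)))  # 2 or 3, depending on the prefix hash

    docs = [f"doc{i}" for i in range(1000)]
    print(sorted(occupied("customer2", 1, docs, eight)))
```

Splitting only some shards is what makes the outcome uneven: the prefix's block of hashes may or may not cover the smaller, split ranges.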
> >
> > If you want explicit control over which shard a document ends up in, you
> > cannot use compositeId.  You'll have to use the implicit router and
> > designate a field where the name of the shard will go.  I don't think
> > splitting shards is possible with the implicit router.
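For the implicit-router alternative, the following Python sketch only builds Solr Collections API request URLs; it sends nothing. The `router.name=implicit`, `shards`, and `router.field` parameters belong to the CREATE action, and the CREATESHARD action is how a new shard is added to a collection that uses the implicit router. The base URL, the collection name `tenants`, and the field name `shard_s` are hypothetical.

```python
from urllib.parse import urlencode

# Hypothetical Solr base URL; adjust for a real cluster.
SOLR_BASE = "http://localhost:8983/solr"


def collections_api_url(**params: str) -> str:
    # Build a Collections API request URL with a GET-style query string.
    return f"{SOLR_BASE}/admin/collections?{urlencode(params)}"


def create_implicit_collection(name: str, shards, router_field: str) -> str:
    # Shard names are fixed up front; each document carries the name of its
    # target shard in `router_field`.
    return collections_api_url(**{
        "action": "CREATE",
        "name": name,
        "router.name": "implicit",
        "shards": ",".join(shards),
        "router.field": router_field,
    })


def add_shard(collection: str, shard: str) -> str:
    # CREATESHARD adds a new, empty, named shard to an implicit-router collection.
    return collections_api_url(action="CREATESHARD", collection=collection, shard=shard)


if __name__ == "__main__":
    print(create_implicit_collection("tenants", ["shard1", "shard2"], "shard_s"))
    print(add_shard("tenants", "shard3"))
```

With this approach the application, not the hash, decides placement, so keeping a customer on a fixed set of shards is a matter of what it writes into the routing field.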
> >
> > Thanks,
> > Shawn
> >
>
>
> --
>
> Hasmik Sarkezians
>
> VP, Applications
>
> E: [email protected]
>
> 805 Broadway Street, Suite 900
> Vancouver, WA 98660
>
> www.zoominfo.com
