Thanks for the reply. It doesn't matter to me which shard a document ends up in; what matters is how many shards a customer's documents end up in.
And it seems like I wouldn't have control over that as the number of shards grows.

thanks,
hasmik

On Wed, May 18, 2022 at 11:38 AM Shawn Heisey <apa...@elyograg.org> wrote:

> On 5/18/22 08:42, Hasmik Sarkezians wrote:
> > Have a question about shard splitting and compositeId usage. We are
> > starting a solr collection with X number of shards for our multi-tenant
> > application. We are assuming that the number of shards will increase over
> > time as the number of customers grows as well as the customer data.
> >
> > We are thinking of using the <customerId>/num!docId format to specify
> > multiple shards for my tenants depending on the number of records that we
> > will index. We will start with 4 shards and then my assumption is that we
> > use the shard split to add more shards to the collection.
> >
> > customer size X = 1 shard and as such the compositeId would be
> > customer1!docId
> > customer size 5*X = 2 shards and as such the compositeId would be
> > customer2/1!docId
> >
> > And now if I split the shards and the number of shards becomes 5, 6, 7, 8
> > what happens to the data? The point is I don't want customer2 to end up in
> > 4 shards when we get to have 8 shards. If someone can shed some light here
> > I would appreciate it.
>
> I wonder if you have a good understanding of how a compositeId works.
>
> The prefix does not directly dictate what shard a document will end up
> in. It determines how many bits of the full 32-bit ID hash will be
> computed from the prefix and how many from the rest of the ID.
>
> https://solr.apache.org/guide/8_11/shards-and-indexing-data-in-solrcloud.html#document-routing
>
> Something not stated there is how many bits are used if the number is
> not specified. Looking at the code, the default appears to be 16 if the
> number of parts in the ID is 2, and 8 if the number of parts in the ID
> is 3. I don't think it supports more than 3 parts.
>
> When you split a shard, the hash range for the shard will be split, and
> the range for each new shard will be smaller than that of any shard that
> was not split. So it may not be completely predictable which shards a
> composite ID will be stored in when you split them. If you split ALL
> shards in half, then a prefix that limited the number of shards to 2
> could result in those documents being split across 4 shards, but
> depending on how many documents there are with that prefix and EXACTLY
> how the hashes end up being divided, it could be as low as 2 shards and
> as high as 4. If there are a lot of documents with that prefix, chances
> are that it would be 4 shards.
>
> If you want explicit control over which shard a document ends up in, you
> cannot use compositeId. You'll have to use the implicit router and
> designate a field where the name of the shard will go. I don't think
> splitting shards is possible with the implicit router.
>
> Thanks,
> Shawn

--
Hasmik Sarkezians
VP, Applications
E: hasmik.sarkezi...@zoominfo.com
805 Broadway Street, Suite 900
Vancouver, WA 98660
www.zoominfo.com
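
For anyone following the thread, the behaviour Shawn describes (some bits of the 32-bit hash taken from the prefix, the rest from the document ID, and ranges being halved on a split) can be modelled in a few lines of Python. This is only a rough sketch and not Solr code: `toy_hash` uses CRC32 as a stand-in for whatever hash Solr really uses, the hash space is treated as unsigned, and all function names here are invented for the illustration.

```python
import zlib

HASH_BITS = 32
FULL_MASK = 0xFFFFFFFF


def toy_hash(text):
    """Stand-in 32-bit hash (CRC32). Assumption: NOT the hash Solr uses."""
    return zlib.crc32(text.encode("utf-8")) & FULL_MASK


def composite_hash(prefix, doc_id, prefix_bits=16):
    """Top `prefix_bits` bits come from the prefix, the rest from the doc ID."""
    prefix_mask = (FULL_MASK << (HASH_BITS - prefix_bits)) & FULL_MASK
    doc_mask = FULL_MASK >> prefix_bits
    return (toy_hash(prefix) & prefix_mask) | (toy_hash(doc_id) & doc_mask)


def even_ranges(num_shards):
    """Divide the hash space into contiguous half-open ranges, one per shard."""
    bounds = [i * 2**HASH_BITS // num_shards for i in range(num_shards + 1)]
    return list(zip(bounds[:-1], bounds[1:]))


def split_range(hash_range):
    """Model a shard split: cut one hash range in half."""
    lo, hi = hash_range
    mid = (lo + hi) // 2
    return [(lo, mid), (mid, hi)]


def shards_touched(ranges, hashes):
    """Indexes of the shards whose range contains at least one hash."""
    return {i for i, (lo, hi) in enumerate(ranges)
            if any(lo <= h < hi for h in hashes)}


# customer2/1!docN: 1 bit from the prefix, 31 bits from the document ID.
hashes = [composite_hash("customer2", f"doc{i}", prefix_bits=1)
          for i in range(1000)]

four_shards = even_ranges(4)
eight_shards = [half for r in four_shards for half in split_range(r)]

before = len(shards_touched(four_shards, hashes))
after = len(shards_touched(eight_shards, hashes))
print("shards used with 4 shards:", before)
print("shards used after splitting all 4:", after)
```

In this model a 1-bit prefix pins the documents to one half of the hash space, so they occupy at most 2 of 4 equal shards and at most 4 of 8 once every shard has been split in half, which is the growth the original question was trying to avoid.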