Thank you for taking the time to explain this.

hasmik

On 2022/05/18 16:56:29 Hasmik Sarkezians wrote:
> Thanks for the reply.
>
> It doesn't matter to me which shard the document ends up in, just matters
> how many shards the document ends up with:
>
> And seems like I wouldn't have control over that as the number of shards
> grows.
>
> thanks,
> hasmik
>
> On Wed, May 18, 2022 at 11:38 AM Shawn Heisey <[email protected]> wrote:
>
> > On 5/18/22 08:42, Hasmik Sarkezians wrote:
> > > Have a question about shard splitting and compositeId usage. We are
> > > starting a solr collection with X number of shards for our multi-tenant
> > > application. We are assuming that the number of shards will increase over
> > > time as the number of customers grows as well as the customer data.
> > >
> > > We are thinking of using the <customerId>/num!docId format to specify
> > > multiple shards for my tenants depending on the number of records that we
> > > will index. We will start with 4 shards and then my assumption is that we
> > > use the shard split to add more shards to the collection.
> > >
> > > customer size X = 1 shard and as such the compositeId would be
> > > customer1!docId
> > > customer size 5*X = 2 shards and as such the compositeId would be
> > > customer2/1!docId
> > >
> > > And now if I split the shards and the number of shards becomes 5, 6, 7, 8
> > > what happens to the data? The point is I don't want the customer2 endup in
> > > 4 shards when we get to have 8 shards. If someone can shed some light here
> > > I would appreciate it.
> >
> > I wonder if you have a good understanding of how a compositeId works.
> >
> > The prefix does not directly dictate what shard a document will end up
> > in. It determines how many bits of the full 32-bit ID hash will be
> > computed from the prefix and how many from the rest of the ID.
> >
> > https://solr.apache.org/guide/8_11/shards-and-indexing-data-in-solrcloud.html#document-routing
> >
> > Something not stated there is how many bits are used if the number is
> > not specified. Looking at the code, the default appears to be 16 if the
> > number of parts in the ID is 2, and 8 if the number of parts in the ID
> > is 3. I don't think it supports more than 3 parts.
> >
> > When you split a shard, the hash range for the shard will be split, and
> > the range for the new shards will be smaller than any other shards that
> > were not split. So it may not be completely predictable which shards a
> > composite ID will be stored in when you split them. If you split ALL
> > shards in half, then a prefix that limited the number of shards to 2
> > could result in those documents being split across 4 shards, but
> > depending on how many documents there are with that prefix and EXACTLY
> > how the hashes end up being divided, it could be as low as 2 shards and
> > as high as 4. If there are a lot of documents with that prefix, chances
> > are that it would be 4 shards.
> >
> > If you want explicit control over which shard a document ends up in, you
> > cannot use compositeId. You'll have to use the implicit router and
> > designate a field where the name of the shard will go. I don't think
> > splitting shards is possible with the implicit router.
> >
> > Thanks,
> > Shawn
>
> --
> Hasmik Sarkezians
> VP, Applications
> E: [email protected]
> 805 Broadway Street, Suite 900
> Vancouver, WA 98660
> www.zoominfo.com
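
For anyone finding this thread later, here is a rough Python sketch of the hash-range reasoning Shawn describes above. It is not Solr code. It assumes an unsigned 32-bit hash space divided evenly among the shards, and it uses zlib.crc32 as a stand-in for the MurmurHash3 function Solr actually uses, so it will not reproduce real shard assignments. The function names (prefix_range, even_shards, split_shard, shards_spanned) are made up for illustration. It only shows why a prefix that covers a fixed fraction of the hash space touches more shards as the shard ranges get smaller.

```python
import zlib

HASH_BITS = 32
HASH_SPACE = 1 << HASH_BITS


def prefix_range(prefix, bits=16):
    """Return the inclusive hash range that every ID with this prefix falls in.

    The top `bits` bits come from the prefix hash; the remaining bits come
    from the rest of the ID, so they can take any value.
    """
    # Assumption: CRC32 is only a stand-in here; Solr uses MurmurHash3.
    prefix_hash = zlib.crc32(prefix.encode("utf-8")) & 0xFFFFFFFF
    low_bits = HASH_BITS - bits
    start = (prefix_hash >> low_bits) << low_bits
    end = start + (1 << low_bits) - 1
    return start, end


def even_shards(count):
    """Divide the hash space into `count` contiguous, near-equal ranges."""
    bounds = [HASH_SPACE * i // count for i in range(count + 1)]
    return [(bounds[i], bounds[i + 1] - 1) for i in range(count)]


def split_shard(shards, index):
    """Split one shard's hash range in half, leaving the others untouched."""
    lo, hi = shards[index]
    mid = lo + (hi - lo) // 2
    return shards[:index] + [(lo, mid), (mid + 1, hi)] + shards[index + 1:]


def shards_spanned(prefix_rng, shards):
    """Count the shards whose range overlaps the prefix's hash range."""
    start, end = prefix_rng
    return sum(1 for lo, hi in shards if lo <= end and start <= hi)


# customer2/1!docId: 1 bit from the prefix, so half of the hash space.
half = prefix_range("customer2", bits=1)
print(shards_spanned(half, even_shards(4)))  # 2
print(shards_spanned(half, even_shards(8)))  # 4

# Splitting only one of the 4 shards gives 2 or 3, depending on whether
# the split shard lies inside the prefix's half of the hash space.
print(shards_spanned(half, split_shard(even_shards(4), 0)))
```

Under these assumptions, the /1 prefix always covers half of the hash space, so the number of shards it can touch grows with the shard count. How many of those shards actually hold documents still depends on how the individual document hashes are distributed.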
