On 3/2/2018 11:43 AM, Stephen Lewis wrote: > I'm wondering what information you may be able to provide on performance > implications of implicit routing VS composite ID routing. In particular, > I'm curious what the horizontal scaling behavior may be of implicit routing > or composite ID routing with and without the "/<bits>" param appended on.
The hash calculations should probably introduce so little overhead that you'd never notice it. I once implemented a hash algorithm using two hash classes built into Java. I'm pretty sure that it was NOT a fast implementation ... and it could calculate over a million hashes per second on input strings of about 20 characters. The hash algorithm used by CompositeId (one of the MurmurHash implementations) is supposed to be one of the fastest algorithms in the world. Unless your uniqueId field values are extremely huge, I really doubt that hash calculation is a significant source of overhead. The use of implicit doesn't automatically mean there's no overhead for routing. The implicit router can still redirect documents to different shards, it just does it explicitly, usually with a shard name in a particular field, rather than by hash calculation. > A relatively simple assessment I've done belowleads me to believe the > following is likely the case: if we have S shards and B as our "/bits" > param, then resource usage would Big O scale as follows (note: Previously > I've received the advice that any shard should be capped at a max of 120M > documents, which is where the cap on docs/shard-key comes from) Query performance should be about the same for any routing type. It does look like when you use compositeId and actually implement shard keys, you can limit queries to those shards, but a *general* query is going to hit all shards. If your query rate is very low (or shards are distributed across a lot of hardware that has significant spare CPU capacity) performance isn't going to be dramatically different for a query that hits 2 shards versus one that hits 6 shards. If your query rate is high, then more shards probably will be noticeably slower than fewer shards. For the maximum docs to allow per shard: It depends. For some indexes and use cases, a million documents per shard might be way too big. For others, 500 million per shard might have incredible performance. There are no hard rules about this. It's entirely dependent on what you're actually doing. Thanks, Shawn