Alright. So shard splitting and composite routing play nicely together. Thank you, Anshum.

On Wed, Feb 4, 2015 at 11:24 AM, Anshum Gupta <ans...@anshumgupta.net> wrote:

> In one line: shard splitting doesn't depend on the routing mechanism, only
> on the hash range, so documents with the same prefix can end up split apart.
>
> Here's an overview of routing in SolrCloud:
> * Routing happens based on a hash value.
> * The hash is calculated from the multiple parts of the routing key. In
>   the case of A!B, the upper 16 bits are obtained from murmurhash(A) and
>   the lower (LSB) 16 bits are obtained from murmurhash(B). This sends the
>   docs to the right shard.
> * When querying using A!, all shards that contain hashes in the range from
>   (16 bits of murmurhash(A))-0000 to (16 bits of murmurhash(A))-ffff are
>   used.
>
> When you split a shard, say for the range 00000000 - ffffffff, it is split
> in the middle (by default). Over multiple splits, docs for the same A!
> prefix might end up on different shards, but the request routing should
> take care of that.
>
> You can read more about routing here:
> https://lucidworks.com/blog/solr-cloud-document-routing/
> http://lucidworks.com/blog/multi-level-composite-id-routing-solrcloud/
>
> and shard splitting here:
> http://lucidworks.com/blog/shard-splitting-in-solrcloud/
>
> On Wed, Feb 4, 2015 at 12:59 AM, Gili Nachum <gilinac...@gmail.com> wrote:
>
> > Hi, I'm also interested. When using the composite ID, the _route_
> > information is not kept on the document itself, so to me it looks like
> > it's not possible, as the split API
> > <https://cwiki.apache.org/confluence/display/solr/Collections+API#CollectionsAPI-api3>
> > doesn't have a relevant parameter to split correctly.
> > I can report back once I try it in practice.
> >
> > On Mon, Nov 10, 2014 at 7:27 PM, Ian Rose <ianr...@fullstory.com> wrote:
> >
> > > Howdy -
> > >
> > > We are using composite IDs of the form <user>!<event>. This ensures
> > > that all events for a user are stored in the same shard.
> > >
> > > I'm assuming, from the description of how composite ID routing works,
> > > that if you split a shard, the "split point" of the hash range for
> > > that shard is chosen to maintain the invariant that all documents that
> > > share a routing prefix (before the "!") will still map to the same
> > > (new) shard. Is that accurate?
> > >
> > > A naive shard-split implementation (e.g. one that chose the hash range
> > > split point arbitrarily) could end up with "child" shards that split a
> > > routing prefix.
> > >
> > > Thanks,
> > > Ian
>
> --
> Anshum Gupta
> http://about.me/anshumgupta
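
For anyone reading this thread later, here is a minimal Python sketch of the A!B hashing and range splitting described in the thread. It is an illustration under stated assumptions, not Solr's code: zlib.crc32 stands in for MurmurHash3, hash ranges are treated as unsigned 00000000 - ffffffff as written in the thread, and all function names (h32, composite_hash, prefix_range, split_range, shards_for_prefix) are made up for this example.

```python
import zlib


def h32(s):
    # Stand-in 32-bit hash (CRC32). This is NOT Solr's MurmurHash3;
    # it is used here only so the sketch runs with the standard library.
    return zlib.crc32(s.encode("utf-8")) & 0xFFFFFFFF


def composite_hash(route_key):
    # For "A!B": upper 16 bits come from hash(A), lower 16 bits from hash(B).
    # Keys without "!" are simply hashed as a whole (assumption for this sketch).
    if "!" not in route_key:
        return h32(route_key)
    prefix, _, rest = route_key.partition("!")
    return (h32(prefix) & 0xFFFF0000) | (h32(rest) & 0x0000FFFF)


def prefix_range(prefix):
    # All hashes a query for "A!" has to cover: <top 16 bits>0000 .. <top 16 bits>ffff.
    top = h32(prefix) & 0xFFFF0000
    return top, top | 0xFFFF


def split_range(lo, hi):
    # Split an inclusive hash range in the middle, ignoring routing prefixes.
    mid = lo + (hi - lo) // 2
    return (lo, mid), (mid + 1, hi)


def shards_for_prefix(prefix, shard_ranges):
    # A shard must be queried if its range overlaps the prefix's hash range.
    lo, hi = prefix_range(prefix)
    return [name for name, (s_lo, s_hi) in shard_ranges.items()
            if s_lo <= hi and lo <= s_hi]


if __name__ == "__main__":
    lo, hi = prefix_range("userA")
    # Hypothetical child shards whose boundary cuts through userA's range.
    shards = {"shard1_0": (0x00000000, lo + 0x7FFF),
              "shard1_1": (lo + 0x8000, 0xFFFFFFFF)}
    print(shards_for_prefix("userA", shards))
```

If a split boundary falls inside a prefix's 64K-wide hash range, documents for that prefix live on both child shards, and a query for that prefix has to go to both. That is the "request routing should take care of that" part of Anshum's answer.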