> : > > Ok, so it wouldn't be possible to have a smaller, faster authoritative
> : > > shard for near-real-time updates while keeping the entire dataset in a
> : > > second shard which is updated less frequently?
>
> I believe Otis's point is that many people use distributed search across
> shards where some are large and mostly static and one is small and
> frequently updated with new docs in order to get some performance
> advantages out of the long cache lifetimes on the larger shard(s) ... but
> this typically works best when you only "add" new docs, and don't modify old
> ones (or only modify docs added very recently so they're always in the
> small shard) while the bigger shards are treated as "archives" that don't
> change.
>
> To be deterministic you can't have the same uniqueKey in multiple shards.

Hmm, partitioning by document has a lot of merit, but making the choice of shard (configurably) deterministic when a uniqueKey appears in more than one shard would seem to enable some interesting features, such as simple 'tagging' by partitioning on document fields.

For example, you could have a large, essentially read-only index of documents and a separate small index for tags. To tag a document, you would create (or update) a document in the tag index containing the uniqueKey from the main index as well as a multivalued tag field. Whenever you search, you fire off a distributed search across the two shards, but pull the fields from the main index, e.g.:

  /solr/select?fq=tag1&shards=main_index/path,tag_index/path&q=*:*

My specific use case is a bit more involved, but if there were either some way to deterministically pick the source shard *or* to dynamically (additively) merge the multiple docs sharing the same uniqueKey from separate shards, it would be quite helpful. The latter would provide the general-case functionality of partial document updates, and would be even more powerful. However, I could get by with just the former: using the main index for all scoring while being able to augment documents for filtering.

I'm not a Solr expert by any means, so if there is another recommended way to achieve this, I'd love some guidance. Or, if this is just a rare case, I guess it's time for me to roll up my sleeves and hack on some Solr code. Making QueryComponent configurably deterministic would suffice (e.g. a "shard.primary=main_index/path" parameter, perhaps? Or even just treating the shards parameter as an ordered list with the primary first?). Adding field merging would likely be... more involved, though.

Thanks in advance for any advice!

-pete
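
P.S. In case it helps pin down what I mean, here is a rough client-side sketch in Python of the two behaviours. It is not Solr code and does not reflect how QueryComponent actually merges shard responses. The function name, the "id" uniqueKey field, and the sample docs are all made up for illustration. It assumes each shard's results arrive as a list of dicts, and that the shards are passed in priority order with the primary first.

```python
def merge_by_unique_key(shard_results, unique_key="id", additive=False):
    """Merge per-shard doc lists, treating the shard order as priority order.

    shard_results: list of doc lists, primary shard first (an assumption
                   mirroring the "ordered shards parameter" idea).
    additive=False: deterministic pick, the first shard containing a key wins.
    additive=True:  later shards may only fill in fields that the
                    higher-priority doc does not already have.
    """
    merged = {}  # dicts preserve insertion order in Python 3.7+
    for docs in shard_results:
        for doc in docs:
            key = doc[unique_key]
            if key not in merged:
                # First time this key is seen: take the doc as-is (copied).
                merged[key] = dict(doc)
            elif additive:
                # Same key seen in a lower-priority shard: add missing fields only.
                for field, value in doc.items():
                    merged[key].setdefault(field, value)
    return list(merged.values())


# Hypothetical sample data: a main index and a small tag index.
main_index = [{"id": "1", "title": "A"}, {"id": "2", "title": "B"}]
tag_index = [{"id": "1", "tag": ["x", "y"]}]

primary_only = merge_by_unique_key([main_index, tag_index])
augmented = merge_by_unique_key([main_index, tag_index], additive=True)
```

With additive=False, doc "1" comes back exactly as the main index has it. With additive=True, it also picks up the tag field from the tag index, while title still comes from the main index.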