bq: - the documents are organized in "shards" according to date (integer) and language (a possibly extensible discrete set)
bq: - the indexes are disjunct

OK, I'm having a hard time getting my head around these two statements.

If the indexes are disjunct in the sense that you only ever search one at a time, then they are different "collections" in SolrCloud jargon.

If, on the other hand, they form one big collection and you want to search them all with a single query, I suggest that in SolrCloud land you don't want them to be discrete shards. My reasoning: say you have a bunch of documents for October 2014 in Spanish. If you put them all on a single shard, every query for them has to be serviced by that one shard, so you don't get any parallelism.

If it really does make sense in your case to route all those docs to a single shard, then Michael's comment is spot-on: use the compositeId router.

Best,
Erick

On Mon, Nov 10, 2014 at 11:50 AM, Michael Della Bitta
<michael.della.bi...@appinions.com> wrote:
> Hi Michal,
>
> Is there a particular reason to shard your collections like that? If it was
> mainly for ease of operations, I'd consider just using CompositeId to
> prevent specific types of queries hotspotting particular nodes.
>
> If your ingest rate is fast, you might also consider making each
> "collection" an alias that points to many actual collections, and
> periodically closing off a collection and starting a new one. This prevents
> cache churn and the impact of large merges.
>
> Michael
>
> On 11/10/14 08:03, Michal Krajňanský wrote:
>>
>> Hi All,
>>
>> I have been working on a project that has long employed a Lucene indexer.
>>
>> Currently, the system implements proprietary document routing and index
>> plugging/unplugging on top of Lucene and, of course, contains a large
>> body of indexes. Recently the idea came up to migrate from Lucene to
>> SolrCloud, which appears to be more powerful than our proprietary system.
>>
>> Could you suggest the best way to migrate the system to SolrCloud
>> seamlessly, given that reindexing is not an option?
>>
>> - all the existing indexes represent a single collection in terms of
>> SolrCloud
>> - the documents are organized in "shards" according to date (integer) and
>> language (a possibly extensible discrete set)
>> - the indexes are disjunct
>>
>> I have been able to convert the existing indexes to the newest Lucene
>> version and plug them individually into SolrCloud. However, there is
>> still the question of routing, sharding, etc.
>>
>> Any insight appreciated.
>>
>> Best,
>>
>> Michal Krajnansky
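To make the compositeId trade-off in Erick's and Michael's replies concrete, here is a toy model of prefix-based routing. With the compositeId router, an ID of the form `shardKey!docId` is hashed so that the upper 16 bits come from the shard key and the lower 16 bits from the rest of the ID. All docs sharing a key therefore land on one shard. That is exactly the "no parallelism" effect Erick describes. This is a minimal sketch, not Solr's code. CRC32 stands in for the MurmurHash3 Solr uses, shard ranges are simplified to equal slices of an unsigned 32-bit space, and the `lang_yyyymm` key scheme and shard count of 4 are hypothetical.

```python
import zlib

# Hypothetical shard count. A power of two keeps the range edges on
# 16-bit boundaries, so a shard-key block never straddles two shards.
NUM_SHARDS = 4


def hash16(s: str) -> int:
    # Stand-in hash: CRC32 truncated to 16 bits (Solr actually uses MurmurHash3).
    return zlib.crc32(s.encode("utf-8")) & 0xFFFF


def composite_hash(doc_id: str) -> int:
    """32-bit routing hash in the spirit of compositeId.

    For "shardKey!docId", the upper 16 bits come from the shard key and the
    lower 16 bits from the rest, so all IDs sharing a key fall into one
    contiguous 65,536-value block of the hash space.
    """
    if "!" in doc_id:
        key, rest = doc_id.split("!", 1)
        return (hash16(key) << 16) | hash16(rest)
    # No shard key: hash the whole ID across the full 32-bit space.
    return zlib.crc32(doc_id.encode("utf-8")) & 0xFFFFFFFF


def shard_for(doc_id: str, num_shards: int = NUM_SHARDS) -> int:
    # Each shard owns a roughly equal, contiguous slice of the 32-bit hash space.
    return (composite_hash(doc_id) * num_shards) >> 32


if __name__ == "__main__":
    keyed = {shard_for(f"es_201410!doc{i}") for i in range(1000)}
    plain = {shard_for(f"doc{i}") for i in range(1000)}
    print(len(keyed))     # 1: every "October 2014, Spanish" doc hits the same shard
    print(sorted(plain))  # unkeyed IDs spread over several shards
```

The design choice is the one Erick points out: a shard key buys co-location (useful when queries always target one date/language slice) at the cost of serving all such queries from a single shard.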
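Michael's alias suggestion can be sketched as follows. Queries go to one stable alias name, while the data lives in several physical collections. One of these collections is periodically "closed off", and a new one is created and added to the alias. The sketch below only builds the Collections API `CREATEALIAS` request (its `name` and `collections` parameters); it does not send it. The host URL, the `docs_<lang>_<yyyymm>` naming scheme, and the one-collection-per-language-per-month split are all assumptions for illustration.

```python
from urllib.parse import urlencode

SOLR_BASE = "http://localhost:8983/solr"  # assumption: a local SolrCloud node


def monthly_collection(lang: str, year: int, month: int) -> str:
    # Hypothetical naming scheme: one physical collection per language per month.
    return f"docs_{lang}_{year:04d}{month:02d}"


def create_alias_url(alias: str, collections: list) -> str:
    # Collections API CREATEALIAS request. Re-issuing it with a new list
    # re-points an existing alias, e.g. after starting a new month's collection.
    params = {
        "action": "CREATEALIAS",
        "name": alias,
        "collections": ",".join(collections),
    }
    return f"{SOLR_BASE}/admin/collections?{urlencode(params)}"


if __name__ == "__main__":
    months = [(2014, 9), (2014, 10), (2014, 11)]
    members = [monthly_collection("es", y, m) for y, m in months]
    # Send with any HTTP client (curl, urllib.request, ...) against a live cluster.
    print(create_alias_url("docs_es", members))
```

Because older collections stop receiving writes, their caches stay warm and they stop taking part in large merges, which is the benefit Michael mentions.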