Thanks Erick, sounds about right.
BTW, how long can I keep adding collections, i.e. can I keep 5/10 years of data like this?
Also, what do you think of bullet 2), having collection-specific configurations in ZooKeeper?

On Fri, Apr 25, 2014 at 11:44 PM, Erick Erickson <erickerick...@gmail.com> wrote:

> So you're talking about 700 or so collections. That should be do-able,
> especially as Solr is rapidly evolving to handle more and more
> collections and there's two years for that to happen.
>
> The aging out bit is manual (well, you'd script it I suppose). So
> every day there'd be a script that ran and "just knew" the right
> collection to change the alias on, there's nothing automatic yet.
>
> Best,
> Erick
>
> On Fri, Apr 25, 2014 at 9:37 AM, Mukesh Jha <me.mukesh....@gmail.com> wrote:
> > Thanks for the quick reply Erick,
> >
> > I want to keep my collections till I run out of hardware, which is at
> > least a couple of years' worth of data.
> > I'd like to know more on ageing out aliases, did a quick search but
> > didn't find much.
> >
> > On Fri, Apr 25, 2014 at 9:45 PM, Erick Erickson <erickerick...@gmail.com> wrote:
> >
> >> Hmmm, tell us a little more about your use-case. In particular, how
> >> long do you need to keep the data around? Days? Months? Years?
> >>
> >> Because if you only need to keep the data for a specified period, you
> >> can use the collection aliasing process to age-out collections and
> >> keep the number of cores from growing too large.
> >>
> >> Best,
> >> Erick
> >>
> >> On Fri, Apr 25, 2014 at 6:49 AM, Mukesh Jha <me.mukesh....@gmail.com> wrote:
> >> > Hi Experts,
> >> >
> >> > I need to divide my indexes based on hour/day, with each index having
> >> > ~50-80 GB of data & ~50-80 million docs, so I'm planning to create daily
> >> > collections with names like *sample_collection_yyyy_mm_dd_hh*.
> >> > I'll also create an alias *sample_collection* and update it whenever I
> >> > create a new collection, so that the entire data set is searchable.
> >> >
> >> > I've a couple of questions on the above design:
> >> > 1) How far can it scale? As my collections increase (and so will the
> >> > shards & replicas), is there a breaking point where adding more
> >> > collections or searching becomes an issue?
> >> > 2) As my cluster grows because of the huge number of collections, the
> >> > clusterstate.json file in ZooKeeper will grow too. Won't this be a
> >> > limiting factor? If so, instead of storing all this info in one
> >> > clusterstate.json file, shouldn't Solr keep only cluster-specific
> >> > details in this file & have collection-specific config files in
> >> > ZooKeeper?
> >> > 3) How can I easily manage all these collections? Do we have Java
> >> > CoreAdmin APIs available? I cannot find much documentation on them.
> >> >
> >> > --
> >> > Txz,
> >> >
> >> > *Mukesh Jha <me.mukesh....@gmail.com>*
> >
> > --
> >
> > Thanks & Regards,
> >
> > *Mukesh Jha <me.mukesh....@gmail.com>*

--
Thanks & Regards,

*Mukesh Jha <me.mukesh....@gmail.com>*
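
P.S. For the "script that just knows the right collection" part, here is a rough sketch of what the daily job might compute. It only builds the Collections API URLs (CREATEALIAS to repoint the alias, DELETE to drop the oldest collection) and does not send them. The host, the two-year retention window and the daily `yyyy_mm_dd` suffix are my assumptions, not anything Solr prescribes, and the helper names are made up for illustration.

```python
from datetime import date, timedelta
from urllib.parse import urlencode

# Assumptions for illustration only: adjust to the real cluster.
SOLR_BASE = "http://localhost:8983/solr"   # hypothetical host/port
ALIAS = "sample_collection"                # alias name from the thread
PREFIX = "sample_collection_"              # daily collections: PREFIX + yyyy_mm_dd
RETENTION_DAYS = 730                       # "a couple of years" of data


def collection_name(day):
    """Name of the daily collection for a given date."""
    return PREFIX + day.strftime("%Y_%m_%d")


def live_collections(today, retention_days=RETENTION_DAYS):
    """Collections the alias should cover, newest first."""
    return [collection_name(today - timedelta(days=n))
            for n in range(retention_days)]


def create_alias_url(today, retention_days=RETENTION_DAYS):
    """CREATEALIAS request that points the alias at the live collections."""
    params = {
        "action": "CREATEALIAS",
        "name": ALIAS,
        "collections": ",".join(live_collections(today, retention_days)),
    }
    # safe="," keeps the comma-separated list readable in the query string.
    return SOLR_BASE + "/admin/collections?" + urlencode(params, safe=",")


def delete_expired_url(today, retention_days=RETENTION_DAYS):
    """DELETE request for the collection that just fell out of the window."""
    expired = collection_name(today - timedelta(days=retention_days))
    params = {"action": "DELETE", "name": expired}
    return SOLR_BASE + "/admin/collections?" + urlencode(params)


if __name__ == "__main__":
    today = date.today()
    print(create_alias_url(today, retention_days=3))
    print(delete_expired_url(today, retention_days=3))
```

As far as I understand, issuing CREATEALIAS again with the same alias name replaces the old definition, so the job would create the new day's collection first, then repoint the alias, then delete the expired collection. With ~700 names the URL gets long, so the parameters may need to go in a POST body instead.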