The other thing to keep in the back of your mind as you go through this process is that search is addictive for most organizations, meaning your Solr solution may quickly become a victim of its own success. The queries we tested before going into production 5+ months ago and the queries we handle today are very different beasts. We're now dealing with much more complexity because, when we started out, the business side didn't fully appreciate what was possible. Now that they've seen Solr in action (pun intended), my team can't keep up with all the great ideas our PMs have for leveraging Solr in many places that were unforeseen during initial planning.
We're entering our third phase of adoption and are having to increase node count and RAM significantly. The bottom line is to do all the important things Otis and Jack have suggested, but also to realize that what you design today may only be valid for six months or so. Of course I can't speak to your business situation, but we've just accepted that we need to revisit our infrastructure decisions frequently. Admittedly, this is much easier in a cloud like Amazon than if you're buying your own hardware.

Cheers,
Tim

On Tue, Apr 23, 2013 at 8:10 AM, Jack Krupansky <j...@basetechnology.com> wrote:

> Another aspect I neglected to mention: think about distinguishing between "development", "test", and "production" systems, all separately. Your development system is where you try out ideas and experiment: your proof of concept. Your "test" or "pre-production" system is where you verify that your ideas are really ready to go; the "test" system should parallel the production system and approximate real load. And finally, your "production" system is where you don't have the liberty to "just try stuff out."
>
> For real "cloud" systems, it's all about scaling out with commodity boxes. Pick a reasonably sized box and put a reasonable amount of data on that box; then you can calculate how many boxes you will need for scaling (shards). Your HA (High Availability) and query load requirements will then drive how many replicas you will need for each shard.
>
> -- Jack Krupansky
>
> -----Original Message-----
> From: Jack Krupansky
> Sent: Tuesday, April 23, 2013 9:54 AM
> To: solr-user@lucene.apache.org
> Subject: Re: What to test, calculate, measure for a pre-production version of SolrCloud?
>
> To be clear, there are no solid and reliable prediction rules for Solr, for the simple reason that there are too many non-linear variables. You need to stand up a "proof of concept" system, load it with representative data, execute representative queries, and then measure that system. You can then use those numbers to size your production system.
>
> I don't want to give you the impression that this notion of "predicting" or "calculating" the size of a production Solr system is a viable option. Sure, you can try, and maybe you will get lucky and maybe you won't. Flip a coin. But what sane manager would want to "plan" production based on flipping a coin?
>
> -- Jack Krupansky
>
> -----Original Message-----
> From: Furkan KAMACI
> Sent: Tuesday, April 23, 2013 5:48 AM
> To: solr-user@lucene.apache.org
> Subject: What to test, calculate, measure for a pre-production version of SolrCloud?
>
> Hi Folks,
>
> This week we will set up a pre-production version of our system. I've been asking some questions for a while and I got really good responses from the mailing list. At the pre-production and test stage:
>
> * I want to measure how much RAM I should allocate to my Solr instances.
> * I will try to make some predictions about how much disk space I will need in production.
> * Maybe I will check my answer to this question: which RAID level to use (or whether to use RAID at all), etc.
>
> I got answers to those questions from the mailing list and I have some approximations. I also know that such questions are not easy to answer and that I should test to get more accurate answers. My question is:
>
> What do you suggest I do at the pre-production and test stage? For example:
>
> * give Solr instances a much larger heap to calculate RAM needs
> * use solrmeter to test QPS for your cluster
> * use Sematext or anything else for performance monitoring, etc.
>
> I need some advice on what to test, calculate, measure, etc. There was also a question about Codahale metrics and Graphite; any advice on that is welcome too.
>
> PS: I use Solr 4.2.1 for tests, but if Solr 4.3 becomes ready (i.e., if it is tagged in the repository) I will use it.