I bet that while there are no specific numbers, there are indicators that everybody who knows what they are doing looks at to decide which particular aspect of configuration is hurting most.

So perhaps a good article would be not so much the concrete numbers as the indicators to check. I think I saw people throwing around cache utilization as one of them. Any others?
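For example, something like this quick (untested) Python sketch could pull the cache stats for one core from the mbeans handler; the host, core name, and exact stat keys below are assumptions and differ between Solr versions, so treat it as a starting point rather than a recipe:

    # Untested sketch: dump cache stats for one core via the mbeans handler.
    # Host/core are placeholders; stat key names vary across Solr versions.
    import json
    from urllib.request import urlopen

    CORE_URL = "http://localhost:8983/solr/collection1"  # adjust to your core
    url = CORE_URL + "/admin/mbeans?stats=true&cat=CACHE&wt=json"

    data = json.load(urlopen(url))

    # The default JSON writer returns solr-mbeans as a flat alternating list:
    # ["CACHE", {...per-cache beans...}]
    mbeans = data["solr-mbeans"]
    caches = dict(zip(mbeans[0::2], mbeans[1::2])).get("CACHE", {})

    for name, bean in caches.items():
        stats = bean.get("stats", {})
        # A low hit ratio and/or steady evictions are the usual signs that a
        # cache is mis-sized for the query mix.
        print(name,
              "hitratio=", stats.get("hitratio"),
              "evictions=", stats.get("evictions"),
              "size=", stats.get("size"))

Watching those numbers over time, or across different cache size/autowarm settings, is the kind of indicator-level check I imagine such an article listing.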
Regards,
   Alex.
----
Sign up for my Solr resources newsletter at http://www.solr-start.com/

On 30 December 2014 at 11:24, Jack Krupansky <jack.krupan...@gmail.com> wrote:
> If people are so gung-ho to go down the "lots of endless pain" rabbit-hole route by heavily under-configuring their clusters, I guess that's their choice, but I would strongly advise against it. Sure, a small band of "the few and the proud" warhorses can proudly proclaim how they "did it", and a small number of elite young Turks can probably do it as well, but it's quite the fool's errand for average developers to try to replicate the "heroic efforts" of the few.
>
> Rather, "average developers" are well-advised to simply seek "the easy path" and cease and desist from trying to configure Solr clusters with a billion documents or more per node, or even 500 million for that matter. "Just say no" to any demands that you run Solr on so-called "fat nodes".
>
> Go with relatively commodity hardware (e.g., 16-32 GB per node), even if that means you need a lot more nodes. Or virtualize fat nodes into a bunch of skinny nodes if that's all you have to work with.
>
> My bottom-line advice: use 100 million documents per node as your baseline target, and make sure your index fits entirely in memory, with a proof-of-concept implementation to validate whether the sweet spot for your particular data, data model, and application access patterns may be well above or even below that.
>
> Yes, indeed, sing praises for heroes, but don't kill yourself and drag down others trying to be one yourself.
>
> </sermon>
>
> -- Jack Krupansky
>
> On Tue, Dec 30, 2014 at 11:03 AM, Erick Erickson <erickerick...@gmail.com> wrote:
>> bq: I did at some point try to write a long blog entry on Solr hardware and setup for non-small corpuses, but had to give up:
>>
>> Man, this makes me laugh! Oh the memories!
>>
>> A common question from sales, quite a reasonable one at that: "Can we have a checklist that we can use to give clients an idea how much hardware to buy?" And do note that sales folks are talking to clients of all different types and sizes.
>>
>> I sat down and tried to do this... three separate times. Pretty soon I'd get to the point of realizing that the doc was worthless exactly because of all the "if this then that" phrases. I guess I can take some comfort from the fact that it only took me about an hour the third time to remember that it was hopeless, and after that I remembered to not even try.
>>
>> I think that it would be _extremely_ helpful to have a bunch of "war stories" to reference. In my experience, people dealing with large numbers of documents really are most concerned with whether what they're doing is _possible_, and are mostly looking to see if someone else has "been there and done that". Of course they'd like all the specificity possible, but there's a lot of comfort in knowing something similar has been done before.
>> Best,
>> Erick
>>
>> On Tue, Dec 30, 2014 at 4:43 AM, Toke Eskildsen <t...@statsbiblioteket.dk> wrote:
>> > Shawn Heisey [apa...@elyograg.org] wrote:
>> >> I believe it would be useful to organize a session at Lucene Revolution, possibly more interactive than a straight presentation, where users with very large indexes are encouraged to attend. The point of this session would be to exchange war stories, configuration requirements, hardware requirements, and observations.
>> >
>> > From the perspective of the conference it might tie up a lot of time: if we were to get down to the configuration level, one session would not be enough. Some sort of pre-conference bar camp might do it? Or maybe even a whole pre-conference day?
>> >
>> > (Side note to the side note: living in Europe, going to Lucene/Solr Revolution means spending more time on travel than at the actual conference - extending the activities to 3 days would increase the odds of me going next year.)
>> >
>> >> Better documentation for extreme scaling is also a possible outcome.
>> >
>> > I did at some point try to write a long blog entry on Solr hardware and setup for non-small corpuses, but had to give up: there were just too many "but if you need to scale X, you might be better off by choosing Y, unless your usage is Z". I think multiple detailed descriptions of setups are a great starting point. If we get enough of them, some pattern will hopefully emerge, although I am afraid that the pattern will be "to get this to work, we had to write custom code".
>> >
>> >> Another idea, not sure if it would be good as an alternate idea or supplemental, is a less formal gathering, perhaps over a meal or three.
>> >
>> > Outside of Lucene/Solr Revolution? How would that work geographically?
>> >
>> > - Toke Eskildsen
>>