I bet that while there are no specific numbers, there are indicators
that everybody who knows what they are doing looks at to decide
which particular aspect of the configuration is hurting most.

So perhaps a good article would be not so much the concrete numbers
as the indicators to check. I think I saw people throwing around
cache utilization as one of them. Any others?
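
For example, here is a rough sketch (Python; the host, core name, and
stat field names are assumptions on my part and vary by Solr version)
of pulling cache hit ratios and evictions from the admin mbeans handler:

import json
import urllib.request

# Assumed host and core name -- adjust for your own setup.
url = ("http://localhost:8983/solr/collection1/admin/mbeans"
       "?stats=true&cat=CACHE&wt=json")

with urllib.request.urlopen(url) as resp:
    data = json.load(resp)

# "solr-mbeans" is a flat list alternating category name and bean map.
mbeans = data["solr-mbeans"]
caches = mbeans[mbeans.index("CACHE") + 1]

for name, bean in caches.items():
    stats = bean.get("stats", {})
    print(name,
          "hitratio:", stats.get("cumulative_hitratio"),
          "evictions:", stats.get("cumulative_evictions"))

A low hit ratio or a high eviction count on filterCache or
queryResultCache would be the kind of signal I have in mind.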

Regards,
   Alex.
----
Sign up for my Solr resources newsletter at http://www.solr-start.com/


On 30 December 2014 at 11:24, Jack Krupansky <jack.krupan...@gmail.com> wrote:
> If people are so gung-ho to go down the "lots of endless pain" rabbit-hole
> route by heavily under-configuring their clusters, I guess that's their
> choice, but I would strongly advise against it. Sure, a small band of "the
> few and the proud" warhorses can proudly proclaim how they "did it", and a
> small number of elite young Turks can probably do it as well, but it's
> quite the fool's errand for average developers to try to replicate the
> "heroic efforts" of the few.
>
> Rather, "average developers" are well-advised to simply seek "the easy
> path" and cease and desist from trying to configure Solr clusters with a
> billion documents or more per node, or even 500 million for that matter.
> "Just say no" to any demands that you run Solr on so-called "fat nodes".
>
> Go with relatively commodity hardware (e.g., 16-32 GB per node), even if
> that means you need a lot more nodes. Or virtualize fat nodes into a
> bunch of skinny nodes if that's all you have to work with.
>
> My bottom line advice: use 100 million documents per node as your baseline
> target, and make sure your index fits entirely in memory, with a proof of
> concept implementation to validate whether the sweet spot for your
> particular data, data model, and application access patterns may be well
> above or even below that.
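>
> As a rough back-of-envelope illustration (every figure below is a
> hypothetical placeholder, not a measurement -- only a proof of concept
> gives you real numbers), the "fits entirely in memory" arithmetic looks
> something like this:
>
> # Hypothetical sizing sketch -- all numbers are placeholders.
> docs_per_node = 100 * 1000 * 1000
> index_bytes_per_doc = 500              # assume ~0.5 KB of index per doc
> index_size_gb = docs_per_node * index_bytes_per_doc / 1024.0**3
> jvm_heap_gb = 8                        # heap for Solr itself
> # Leave RAM beyond the heap so the OS can cache the whole index.
> ram_per_node_gb = index_size_gb + jvm_heap_gb
> print("index ~%.0f GB, want >= %.0f GB RAM per node"
>       % (index_size_gb, ram_per_node_gb))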
>
> Yes, indeed, sing praises for heroes, but don't kill yourself and drag down
> others trying to be one yourself.
>
> </sermon>
>
> -- Jack Krupansky
>
>
> On Tue, Dec 30, 2014 at 11:03 AM, Erick Erickson <erickerick...@gmail.com>
> wrote:
>
>> bq: I did at some point try to write a long blog entry on Solr
>> hardware and setup for non-small corpuses, but had to give up:
>>
>> Man, this makes me laugh! Oh the memories!
>>
>> A common question from sales, quite a reasonable one at that: "can we
>> have a checklist that we can use to give clients an idea of how much
>> hardware to buy?" And do note that sales folks are talking to clients
>> of all different types and sizes.
>>
>> I sat down and tried to do this... three separate times. Pretty soon
>> I'd get to the point of realizing that the doc was worthless exactly
>> because of all the "if this then that" phrases. I guess I can take
>> some comfort from the fact that it only took me about an hour the
>> third time to remember that it was hopeless, and after that I
>> remembered not to even try.
>>
>> I think that it would be _extremely_ helpful to have a bunch of "war
>> stories" to reference. In my experience, people dealing with large
>> numbers of documents really are most concerned with whether what
>> they're doing is _possible_, and are mostly looking to see if someone
>> else has "been there and done that". Of course they'd like all the
>> specificity possible, but there's a lot of comfort in knowing
>> something similar has been done before.
>>
>> Best,
>> Erick
>>
>> On Tue, Dec 30, 2014 at 4:43 AM, Toke Eskildsen <t...@statsbiblioteket.dk>
>> wrote:
>> > Shawn Heisey [apa...@elyograg.org] wrote:
>> >> I believe it would be useful to organize a session at Lucene Revolution,
>> >> possibly more interactive than a straight presentation, where users with
>> >> very large indexes are encouraged to attend.  The point of this session
>> >> would be to exchange war stories, configuration requirements, hardware
>> >> requirements, and observations.
>> >
>> > From the perspective of the conference it might tie up a lot of time: If
>> we were to get down to the configuration level, one session would not be
>> enough. Some sort of pre-conference bar camp might do it? Or maybe even a
>> whole pre-conference day?
>> >
>> > (side-note to the side-note: Living in Europe, going to Lucene/Solr
>> Revolution means spending more time on travel than at the actual conference -
>> extending the activities to 3 days would increase the odds of me going next
>> year)
>> >
>> >> Better documentation for extreme scaling is also a possible outcome.
>> >
>> > I did at some point try to write a long blog entry on Solr hardware and
>> setup for non-small corpuses, but had to give up: there were just too many
>> "but if you need to scale X, you might be better off choosing Y, unless
>> your usage is Z" cases. I think multiple detailed descriptions of setups are
>> a great starting point. If we get enough of them, some pattern will
>> hopefully emerge, although I am afraid that the pattern will be "to get this
>> to work, we had to write custom code".
>> >
>> >> Another idea, not sure if it would be good as an alternate idea or
>> >> supplemental, is a less formal gathering, perhaps over a meal or three.
>> >
>> > Outside of Lucene/Solr Revolution? How would that work geographically?
>> >
>> > - Toke Eskildsen
>>
