That's a laudable goal: to support low-latency queries, including
faceting, over "hundreds of millions" of documents, using Solr "out of
the box" on a random commodity box selected by IT, with just a dozen or
two fields added to the default schema that are both indexed and stored,
without any "expert" tuning, by an "average" developer. The reality
doesn't seem to be there today. 50 to 100 million documents, yes, but
beyond that it takes some kind of "heroic" effort: a much beefier box,
very careful and limited data modeling, limited query capabilities,
tolerance of higher latency, expert tuning, etc.
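
For what it's worth, the schema part of that scenario is not the hard
part. On a recent Solr with the managed schema enabled (the default in
the stock configsets), adding those indexed-and-stored fields is just a
few Schema API calls. The core name and field names below are made up
for illustration; this is a minimal sketch, not a recommendation:

import requests

SOLR = "http://localhost:8983/solr/collection1"   # assumed core name

# Hypothetical fields, all indexed and stored, added via the Schema API.
# The type names come from the stock configset and may differ in yours.
fields = [
    {"name": "title_t",    "type": "text_general", "indexed": True, "stored": True},
    {"name": "category_s", "type": "string",       "indexed": True, "stored": True},
]
for f in fields:
    requests.post(SOLR + "/schema", json={"add-field": f}).raise_for_status()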

The proof is always in the pudding: pick a box, install Solr, set up the
schema, load 20 or 50 or 100 or 250 or 350 million documents, try some
queries with the features you need, and you get what you get.
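
To make that concrete, a smoke test along those lines can be as simple
as the sketch below. The core name, field names, and document count are
assumptions, and the *_s/*_i dynamic fields of the stock example schema
are doing the schema work here; treat it as a rough sketch, not a
benchmark harness.

import random
import time

import requests

SOLR = "http://localhost:8983/solr/collection1"   # assumed core name
NUM_DOCS = 20 * 1000 * 1000                       # pick your target size

# Load synthetic documents in batches. The *_s and *_i dynamic fields are
# indexed and stored in the stock example schema (assumption: check yours).
batch = []
for i in range(NUM_DOCS):
    batch.append({
        "id": str(i),
        "category_s": "cat%03d" % random.randint(0, 999),
        "price_i": random.randint(1, 10000),
    })
    if len(batch) == 10000:
        requests.post(SOLR + "/update", json=batch).raise_for_status()
        batch = []
if batch:
    requests.post(SOLR + "/update", json=batch).raise_for_status()
requests.get(SOLR + "/update", params={"commit": "true"}).raise_for_status()

# Try the kind of query being discussed: match-all plus a facet, and see
# what latency you actually get on the box you actually have.
start = time.time()
resp = requests.get(SOLR + "/select", params={
    "q": "*:*",
    "rows": 10,
    "wt": "json",
    "facet": "true",
    "facet.field": "category_s",
})
elapsed_ms = int((time.time() - start) * 1000)
print("HTTP round trip: %d ms, Solr QTime: %d ms"
      % (elapsed_ms, resp.json()["responseHeader"]["QTime"]))

Rerun the query step at successively larger document counts and it
becomes clear fairly quickly where faceting latency stops being
acceptable on that particular box.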

But I agree that it would be highly desirable to push that 100 million
number up to 350 million or even 500 million ASAP, since the pain of
unnecessary sharding is excessive.

I wonder what changes will have to occur in Lucene, or... what evolution in
commodity hardware will be necessary to get there.

-- Jack Krupansky

On Sat, Jan 3, 2015 at 6:11 PM, Toke Eskildsen <t...@statsbiblioteket.dk>
wrote:

> Erick Erickson [erickerick...@gmail.com] wrote:
> > I can't disagree. You bring up some of the points that make me
> _extremely_
> > reluctant to try to get this in to 5.x though. 6.0 at the earliest I
> should
> > think.
>
> Ignoring the magic 2b number for a moment, I think the overall question is
> whether or not single shards should perform well in the hundreds of
> millions of documents range. The alternative is more shards, but it is
> quite an explicit process to handle shard-juggling. From an end-user
> perspective, the underlying technology matters little: Whatever the choice,
> it should be possible to install "something" on a machine and expect it to
> scale within the hardware limitations without much ado.
>
> - Toke Eskildsen
>
