On 01/04/2015 02:22 AM, Jack Krupansky wrote:
> The reality doesn't seem to be there today. 50 to 100 million documents, yes, but beyond that takes some kind of "heroic" effort, whether a much beefier box, very careful and limited data modeling or limiting of query capabilities or tolerance of higher latency, expert tuning, etc.
I disagree, on the scale at least. Up to 500M documents, Solr performs "well" (read: well enough considering the scale) in a single shard on a single box of commodity hardware, without any tuning or heroic efforts. Sure, some queries aren't as snappy as you'd like, and sure, indexing and querying at the same time will be somewhat unpleasant, but it will work, and it will work well enough.
Will it work for thousands of concurrent users? Of course not. Anyone who is after that sort of thing won't find themselves in this scenario -- they will throw hardware at the problem.
There is something to be said for making sharding less painful. It would be nice if, for instance, Solr would automagically create a new shard once some magic number was reached (at the latest at Lucene's hard limit of 2^31 - 1 documents per index, I guess). But then that'll break some query features ... :-(
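For what it's worth, a poor man's version of that automagic can be scripted against the Collections API today. A minimal sketch in Python (host, collection and shard names are made up; SPLITSHARD is the real Collections API action, the watchdog around it is my own assumption):

    # Hypothetical watchdog: split a shard before it approaches
    # Lucene's hard ceiling of 2^31 - 1 documents per index.
    import requests

    SOLR = "http://localhost:8983/solr"   # assumed host
    COLLECTION = "mycollection"           # assumed collection name
    SHARD = "shard1"                      # assumed shard to watch
    LIMIT = 2000000000                    # stay well under 2,147,483,647

    # distrib=false keeps the count local to the core we hit
    # instead of fanning out across the whole collection.
    resp = requests.get(SOLR + "/" + COLLECTION + "/select",
                        params={"q": "*:*", "rows": 0,
                                "distrib": "false", "wt": "json"})
    num_docs = resp.json()["response"]["numFound"]

    if num_docs > LIMIT:
        # SPLITSHARD divides an existing shard into two sub-shards,
        # each holding roughly half the documents.
        requests.get(SOLR + "/admin/collections",
                     params={"action": "SPLITSHARD",
                             "collection": COLLECTION,
                             "shard": SHARD})

Run that from cron and you have a crude approximation, though of course it does nothing about the query features that break once you're sharded.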
The reason we're using single large instances (sometimes on beefy hardware) is that SolrCloud is a pain. Not just from an administrative point of view (though that seems to be getting better, kudos for that!), but mostly because some queries cannot be executed distributed (distrib=true). Our users, at least, prefer a slow query over an impossible query.
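To give one concrete example of the kind of query I mean (field and collection names invented, and I'm picking the join query parser as a well-known case, not necessarily the one biting you): {!join} works against a single core, but SolrCloud rejects it once the joined documents live on different shards.

    # Illustration only: this join runs fine on a single core, but a
    # multi-shard join is not supported and the request errors out.
    import requests

    params = {
        "q": "{!join from=parent_id to=id}color:red",  # assumed fields
        "rows": 10,
        "wt": "json",
    }
    r = requests.get("http://localhost:8983/solr/mycollection/select",
                     params=params)
    print(r.json()["response"]["numFound"])

On one big unsharded index this just works, slowly perhaps; on a sharded collection it doesn't work at all, which is exactly the "slow beats impossible" trade-off.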
Actually, this 2B limit is a good thing. It'll help me convince $management to donate some of our time to Solr :-)
- Bram