Also, total index file size: at 200-300 GB, managing an index becomes a pain.
Lance
On 07/08/2013 07:28 AM, Jack Krupansky wrote:
Other than the per-node/per-collection limit of 2 billion documents
per Lucene index, most of the limits of Solr are performance-based
limits - Solr can handle it, but the performance may not be
acceptable. Dynamic fields are a great example. Nothing prevents you
from creating a document with, say, 50,000 dynamic fields, but you are
likely to find the performance less than acceptable. Or facets. Sure,
Solr will let you have 5,000 faceted fields, but the performance is
likely to be... you get the picture.
What is acceptable performance? That's for you to decide.
What will the performance of 5,000 dynamic fields or 500 faceted
fields or 500 million documents on a node be? It all depends on your
data, especially the cardinality (unique values) of each individual
field.
How can you determine the performance? Only one way: Proof of concept.
You need to do your own proof of concept implementation, with your own
representative data, with your own representative data model, with
your own representative hardware, with your own representative client
software, with your own representative user query load. That testing
will give you all the answers you need.
There are no magic answers. Don't believe any magic spreadsheet or
magic wizard - it's a coin flip whether they will work for your situation.
Some simple, common sense limits:
1. No more than 50 to 100 million documents per node.
2. No more than 250 fields per document.
3. No more than 250K characters per document.
4. No more than 25 faceted fields.
5. No more than 32 nodes in your SolrCloud cluster.
6. Don't return more than 250 results on a query.
None of those is a hard limit, but don't go beyond them unless your
Proof of Concept testing proves that performance is acceptable for
your situation.
Start with a simple 4-node, 2-shard, 2-replica cluster for preliminary
tests and then scale as needed.
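In SolrCloud, a collection like that is created through the Collections API. Here is a minimal sketch in Python that only builds the CREATE request URL, so you can see which knobs map to shards and replicas - the host, port, and collection name are my own placeholders, not anything from your setup:

```python
from urllib.parse import urlencode

def create_collection_url(host, name, num_shards, replication_factor):
    """Build a Collections API CREATE request URL for a SolrCloud collection."""
    params = urlencode({
        "action": "CREATE",
        "name": name,
        "numShards": num_shards,
        "replicationFactor": replication_factor,
    })
    return "http://%s/solr/admin/collections?%s" % (host, params)

# 2 shards x 2 replicas = 4 cores, one per node on a 4-node cluster
print(create_collection_url("localhost:8983", "poc", 2, 2))
```

Issue that request (e.g., with curl or a browser) against one of your own nodes; Solr places the shard replicas across the live nodes for you.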
Dynamic and multivalued fields? Try to stay away from them - except
for the simplest cases, they are usually an indicator of a weak data
model. Sure, it's fine to store a relatively small number of values in
a multivalued field (say, dozens of values), but be aware that you
can't directly access individual values, you can't tell which was
matched on a query, and you can't coordinate values between multiple
multivalued fields. Except for very simple cases, multivalued fields
should be flattened into multiple documents with a parent ID.
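That flattening step can be sketched in a few lines of Python - the field names ("id", "parent_id", "author") are made up for illustration, not a prescribed schema:

```python
def flatten(parent, field):
    """Turn one doc with a multivalued field into one child doc per value,
    each carrying a parent ID so the values can be re-associated later."""
    return [
        {
            "id": "%s-%d" % (parent["id"], i),   # synthetic unique key per value
            "parent_id": parent["id"],           # back-reference to the parent doc
            field: value,                        # now a plain single-valued field
        }
        for i, value in enumerate(parent[field])
    ]

docs = flatten({"id": "book1", "author": ["Smith", "Jones"]}, "author")
```

Each child document now has a single-valued field you can match, sort, and highlight on individually, and the parent_id lets you group results back together (or join on it).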
Since you brought up the topic of dynamic fields, I am curious how you
got the impression that they were a good technique to use as a
starting point. They're fine for prototyping and hacking, and fine
when used in moderation, but not when used to excess. The whole point
of Solr is searching and searching is optimized within fields, not
across fields, so having lots of dynamic fields is counter to the
primary strengths of Lucene and Solr. And... schemas with lots of
dynamic fields tend to be difficult to maintain. For example, if you
wanted to ask a support question here, one of the first things we'd
want to know is what your schema looks like - but with lots of dynamic
fields, there's no simple way to describe your schema.
Sure, there is something called "schemaless design" (and Solr supports
that in 4.4), but that's very different from heavy reliance on dynamic
fields in the traditional sense. Schemaless design is A-OK, but using
dynamic fields for "arrays" of data in a single document is a poor
match for the search features of Solr (e.g., Edismax searching across
multiple fields).
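To make that concrete, here is a rough sketch of the shape of an Edismax request: the qf parameter names a small, fixed set of fields with boosts, which is exactly what an open-ended family of dynamic fields cannot give you. The field names and boosts below are invented for the example:

```python
from urllib.parse import urlencode

params = urlencode({
    "defType": "edismax",
    "q": "solr limits",
    "qf": "title^2 body",  # a fixed, known set of fields with boosts
    "rows": 20,
})
print("/select?" + params)
```

With dozens or hundreds of dynamic field names, there is no sensible qf to write, and the query-side tuning that Edismax exists for becomes impossible.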
One other tidbit: Although Solr does not enforce naming conventions
for field names, and you can put special characters in them, there are
plenty of features in Solr, such as the common "fl" parameter, where
field names are expected to adhere to Java naming rules. When people
start "going wild" with dynamic fields, it is common that they start
"going wild" with their names as well, using spaces, colons, slashes,
etc. that cannot be parsed in the "fl" and "qf" parameters, for
example. Please don't go there!
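If you generate field names programmatically, it is worth validating them before they ever reach the index. The regex below is my rough approximation of "Java-identifier-like" - deliberately stricter than what Solr will technically accept at index time, because the goal is names that parse cleanly in fl and qf:

```python
import re

# Letters, digits, and underscores only, not starting with a digit.
# Names like "price_usd" parse cleanly in fl/qf; "my field" or "a:b" do not.
SAFE_NAME = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")

def is_parseable_field_name(name):
    """True if the name is safe to use in fl/qf-style parameters."""
    return bool(SAFE_NAME.match(name))
```

Rejecting bad names at ingest time is far cheaper than discovering, months later, that half your fields cannot be referenced in a query.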
In short, put up a small cluster and start doing a Proof of Concept.
Stay within my suggested guidelines and you should do okay.
-- Jack Krupansky
-----Original Message----- From: Marcelo Elias Del Valle
Sent: Monday, July 08, 2013 9:46 AM
To: solr-user@lucene.apache.org
Subject: Solr limitations
Hello everyone,
I am trying to search information about possible solr limitations I
should consider in my architecture. Things like max number of dynamic
fields, max number of documents in SolrCloud, etc.
Does anyone know where I can find this info?
Best regards,