Thanks, Shawn, for your response. Based on that:

1) Can you please point me to where I can find more information about cold shards vs. hot shards?
2) "That 10GB number assumes there's no other software on the machine, like a database server or a webserver."

Yes, the machine is dedicated to Solr.

3) "How much index data is on the machine?"

I have 3 collections: 2 for testing (together they do not exceed 1M documents) and the main collection that I am querying now, which contains around 69M documents. I have distributed each collection across 2 shards, each with 2 replicas. Disk usage is about 40GB.

4) "A memory size of 14GB would be unusual for a physical machine, and makes me wonder if you're using virtual machines."

Yes, I am using a virtual machine. Bare metal would be difficult in my case because our whole data center is in the cloud. I can increase the VM's capacity, though. While testing some edge cases on Solr, I noticed in the Solr admin UI that memory sometimes reaches its limit (14GB RAM, and 4GB JVM).

5) Just to confirm, I have combined the lessons from

http://www.slideshare.net/lucidworks/high-performance-solr-and-jvm-tuning-strategies-used-for-map-quests-search-ahead-darren-spehr

and

https://wiki.apache.org/solr/SolrPerformanceProblems#OS_Disk_Cache

to come up with the following settings:

FilterCache

    <filterCache class="solr.FastLRUCache" size="16384" initialSize="4096" autowarmCount="4096"/>

DocumentCache

    <documentCache class="solr.LRUCache" size="16384" initialSize="16384" autowarmCount="0"/>

NewSearcher and FirstSearcher

    <listener event="newSearcher" class="solr.QuerySenderListener">
      <arr name="queries">
        <lst><str name="q">*</str><str name="sort">score desc id desc</str></lst>
      </arr>
    </listener>
    <listener event="firstSearcher" class="solr.QuerySenderListener">
      <arr name="queries">
        <lst>
          <str name="q">*</str>
          <str name="sort">score desc id desc</str>
        </lst>
        <!-- seed common facets and filter queries -->
        <lst>
          <str name="q">*</str>
          <str name="facet.field">category</str>
        </lst>
      </arr>
    </listener>

Will these settings make Solr use more of its caches and prepopulate them?

Regards,
Salman

On Sat, Oct 10, 2015 at 5:10 AM, Shawn Heisey <apa...@elyograg.org> wrote:

> On 10/9/2015 1:39 PM, Salman Ansari wrote:
>
> > INFO - 2015-10-09 18:46:17.953; [c:sabr102 s:shard1 r:core_node2
> > x:sabr102_shard1_replica1] org.apache.solr.core.SolrCore;
> > [sabr102_shard1_replica1] webapp=/solr path=/select
> > params={start=0&q=(content_text:Football)&rows=10} hits=24408 status=0
> > QTime=3391
>
> Over 3 seconds for a query like this definitely sounds like there's a
> problem.
>
> > INFO - 2015-10-09 18:47:04.727; [c:sabr102 s:shard1 r:core_node2
> > x:sabr102_shard1_replica1] org.apache.solr.core.SolrCore;
> > [sabr102_shard1_replica1] webapp=/solr path=/select
> > params={start=1000&q=(content_text:Football)&rows=10} hits=24408 status=0
> > QTime=21569
>
> Adding a start value of 1000 increases QTime by a factor of more than
> 6? Even more evidence of a performance problem.
>
> For comparison purposes, I did a couple of simple queries on a large
> index of mine. Here are the response headers showing the QTime value
> and all the parameters (except my shard URLs) for each query:
>
>   "responseHeader": {
>     "status": 0,
>     "QTime": 1253,
>     "params": {
>       "df": "catchall",
>       "spellcheck.maxCollationEvaluations": "2",
>       "spellcheck.dictionary": "default",
>       "echoParams": "all",
>       "spellcheck.maxCollations": "5",
>       "q.op": "AND",
>       "shards.info": "true",
>       "spellcheck.maxCollationTries": "2",
>       "rows": "70",
>       "spellcheck.extendedResults": "false",
>       "shards": "REDACTED SEVEN SHARD URLS",
>       "shards.tolerant": "true",
>       "spellcheck.onlyMorePopular": "false",
>       "facet.method": "enum",
>       "spellcheck.count": "9",
>       "q": "catchall:carriage",
>       "indent": "true",
>       "wt": "json",
>       "_": "1444420900498"
>     }
>
>
>   "responseHeader": {
>     "status": 0,
>     "QTime": 176,
>     "params": {
>       "df": "catchall",
>       "spellcheck.maxCollationEvaluations": "2",
>       "spellcheck.dictionary": "default",
>       "echoParams": "all",
>       "spellcheck.maxCollations": "5",
>       "q.op": "AND",
>       "shards.info": "true",
>       "spellcheck.maxCollationTries": "2",
>       "rows": "70",
>       "spellcheck.extendedResults": "false",
>       "shards": "REDACTED SEVEN SHARD URLS",
>       "shards.tolerant": "true",
>       "spellcheck.onlyMorePopular": "false",
>       "facet.method": "enum",
>       "spellcheck.count": "9",
>       "q": "catchall:wibble",
>       "indent": "true",
>       "wt": "json",
>       "_": "1444421001024"
>     }
>
> The first query had a numFound of 120906, the second a numFound of 32.
> When I re-executed the first query (the one with a QTime of 1253) so it
> would use the Solr caches, QTime was 17.
>
> This is an index that has six cold shards with 38.8 million documents
> each and a hot shard with 1.5 million documents. Total document count
> for the index is over 234 million documents, and the total size of the
> index is about 272GB. Each copy of the index has its shards split
> between two servers that each have 64GB of RAM, with an 8GB max Java
> heap. I do not have enough memory to cache all the index contents in
> RAM, but I can get a little less than half of it in the cache -- each
> machine has about 56GB of cache available and contains around 135GB of
> index data. The index data is stored on a RAID10 array with six SATA
> disks, so it's fairly fast, but nowhere near as fast as SSD.
>
> You've already mentioned the SolrPerformanceProblems wiki page that I
> wrote, which is where I would normally send you for more information.
> You said that your machine has 14GB of RAM and 4GB is allocated to Solr,
> leaving about 10GB for caching. That 10GB number assumes there's no
> other software on the machine, like a database server or a webserver.
> How much index data is on the machine? You need to count all the Solr
> cores. If the "10GB for caching" figure is accurate, then more than
> about 20GB of index data means you might need more memory. If it's more
> than about 40GB of index data, you definitely need more memory.
>
> A memory size of 14GB would be unusual for a physical machine, and makes
> me wonder if you're using virtual machines. Bare metal is always going
> to offer better performance than a VM. Another potential problem with
> VMs is that the host system might have its memory oversubscribed -- the
> total amount of memory in the host machine might be less than the total
> amount of memory allocated to all the running virtual machines. Solr
> performance will be terrible if VM memory is oversubscribed.
>
> Thanks,
> Shawn
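
For reference, the sizing arithmetic in the quoted reply (14GB RAM minus a 4GB heap leaves about 10GB for the OS disk cache, compared against about 40GB of index data) can be sketched in a few lines of Python. This is only a rough illustration: the function name and parameters are hypothetical, and the ratio thresholds of 2 and 4 are an assumption obtained by generalizing the "20GB" and "40GB" figures Shawn gives for a 10GB cache. It is not an official Solr rule.

```python
def assess_disk_cache(total_ram_gb, heap_gb, index_gb, other_software_gb=0.0):
    """Estimate OS disk cache and compare it with the on-disk index size.

    Hypothetical helper. The thresholds assume the guidance from the thread
    scales linearly: index > 2x cache -> "might need more memory",
    index > 4x cache -> "definitely need more memory".
    """
    # Memory left for the OS disk cache once the JVM heap and any other
    # software on the machine are accounted for.
    cache_gb = total_ram_gb - heap_gb - other_software_gb
    if cache_gb <= 0:
        raise ValueError("no memory left over for the OS disk cache")

    ratio = index_gb / cache_gb
    if ratio > 4:
        verdict = "definitely need more memory"
    elif ratio > 2:
        verdict = "might need more memory"
    else:
        verdict = "probably enough memory"
    return cache_gb, verdict


if __name__ == "__main__":
    # Figures from this thread: 14GB RAM, 4GB heap, about 40GB of index data.
    cache, verdict = assess_disk_cache(14, 4, 40)
    print(f"about {cache:.0f}GB for caching: {verdict}")
```

With the figures from this thread the index sits right at the upper boundary, so any growth beyond about 40GB would move it into the "definitely" range under these assumed thresholds.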