Thanks, Shawn. The filter queries are not complex. Below is the query I'm running against the corresponding schema entries:
q=*:*&fq=PARENT_DOC_ID:100&fq=MODIFY_TS:[1970-01-01T00:00:00Z TO *]&fq=PHY_KEY2:"HQ012206"&fq=PHY_KEY1:"JACK"&rows=1000&sort=MODIFY_TS desc,LOGICAL_SECT_NAME asc,TRACK_ID desc,TRACK_INTER_ID asc,PHY_KEY1 asc,PHY_KEY2 asc,PHY_KEY3 asc,PHY_KEY4 asc,PHY_KEY5 asc,PHY_KEY6 asc,PHY_KEY7 asc,PHY_KEY8 asc,PHY_KEY9 asc,PHY_KEY10 asc,FIELD_NAME asc

This was the original query. Since it sorts on many fields, we decided not to sort on the Solr side; instead we fetch the query response and sort outside Solr. Every time we ran the original query, Solr would crash by exceeding the JVM heap, so this eliminated the need for the extra heap we had allocated. Now we run only the filter queries.

Regarding the filter cache, it is the default setup (we are using the default solrconfig.xml; we have only added the request handler for DIH):

<filterCache class="solr.FastLRUCache" size="512" initialSize="512" autowarmCount="0"/>

Now that you're aware of the sizes and numbers, can you please let me know which values/sizes I need to increase? Is there an advantage to moving this single core to SolrCloud? If so, can you let us know how many shards/replicas we would require for this core, considering that we allow it to grow as users transact? The updates to this core are not done through DIH delta import; rather, we use SolrJ to push the changes.
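The client-side sorting described above could look something like the sketch below, which replicates the first few clauses of the original Solr sort parameter with a chained Comparator. The Doc class and the subset of fields shown are illustrative assumptions, not our actual code; the remaining PHY_KEY*/FIELD_NAME clauses would be chained the same way.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Illustrative sketch: sorting fetched documents on the client instead of
// asking Solr to sort. Mirrors "sort=MODIFY_TS desc,LOGICAL_SECT_NAME asc,
// TRACK_ID desc,..." from the original query.
public class ClientSideSort {
    // Hypothetical POJO holding fields copied from a SolrDocument.
    static class Doc {
        final String modifyTs;        // ISO-8601 timestamps compare correctly as strings
        final String logicalSectName;
        final long trackId;
        Doc(String modifyTs, String logicalSectName, long trackId) {
            this.modifyTs = modifyTs;
            this.logicalSectName = logicalSectName;
            this.trackId = trackId;
        }
    }

    // desc fields use reversed(); asc fields use plain thenComparing().
    static final Comparator<Doc> SORT_ORDER =
            Comparator.comparing((Doc d) -> d.modifyTs).reversed()
                      .thenComparing(d -> d.logicalSectName)
                      .thenComparing(Comparator.comparingLong((Doc d) -> d.trackId).reversed());

    public static void main(String[] args) {
        List<Doc> docs = new ArrayList<>(List.of(
                new Doc("2020-06-01T00:00:00Z", "B", 1),
                new Doc("2020-06-02T00:00:00Z", "A", 2),
                new Doc("2020-06-02T00:00:00Z", "A", 5)));
        docs.sort(SORT_ORDER);
        // Newest MODIFY_TS first; ties broken by section asc, then TRACK_ID desc.
        System.out.println(docs.get(0).trackId);
    }
}
```

Note that this only helps if the result set fetched from Solr is small enough to hold in the client; with rows=1000 that should be fine.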
<schema.xml>
<field name="PARENT_DOC_ID" type="string" indexed="true" stored="true" omitTermFreqAndPositions="true" />
<field name="MODIFY_TS" type="date" indexed="true" stored="true" omitTermFreqAndPositions="true" />
<field name="PHY_KEY1" type="string" indexed="true" stored="true" omitTermFreqAndPositions="true" />
<field name="PHY_KEY2" type="string" indexed="true" stored="true" omitTermFreqAndPositions="true" />
<field name="PHY_KEY3" type="string" indexed="true" stored="true" omitTermFreqAndPositions="true" />
<field name="PHY_KEY4" type="string" indexed="true" stored="true" omitTermFreqAndPositions="true" />
<field name="PHY_KEY5" type="string" indexed="true" stored="true" omitTermFreqAndPositions="true" />
<field name="PHY_KEY6" type="string" indexed="true" stored="true" omitTermFreqAndPositions="true" />
<field name="PHY_KEY7" type="string" indexed="true" stored="true" omitTermFreqAndPositions="true" />
<field name="PHY_KEY8" type="string" indexed="true" stored="true" omitTermFreqAndPositions="true" />
<field name="PHY_KEY9" type="string" indexed="true" stored="true" omitTermFreqAndPositions="true" />
<field name="PHY_KEY10" type="string" indexed="true" stored="true" omitTermFreqAndPositions="true" />

Thanks,
Srinivas

On 6/4/2020 9:51 PM, Srinivas Kashyap wrote:
> We are on Solr 8.4.1 in standalone server mode. We have a core with
> 497,767,038 records indexed. It took around 32 hours to load data through DIH.
>
> The disk occupancy is shown below:
>
> 82G /var/solr/data/<corename>/data/index
>
> When I restarted the Solr instance and went to this core to query on the Solr
> admin GUI, it hangs and shows "Connection to Solr lost. Please check the
> Solr instance". But when I go back to the dashboard, the instance is up and
> I'm able to query other cores.
>
> Also, querying on this core eats up the allocated JVM memory (24GB heap on
> 32GB RAM). A query (*:*) with filter queries overshoots the memory with an OOM.
You're going to want a lot more than 8GB of available memory for disk caching with an 82GB index. That's a performance thing... with so little caching memory, Solr will be slow, but functional. That aspect of your setup will NOT lead to out of memory.

If you are experiencing Java "OutOfMemoryError" exceptions, you will need to figure out which resource is running out. It might be heap memory, but it also might be that you're hitting the process/thread limit of your operating system. There are other possible causes for that exception too. Do you have the text of the exception available? It is absolutely critical that you determine which resource is running out, or you might focus your efforts on the wrong thing.

If it's heap memory (something I can't really assume), then Solr is requiring more than the 24GB heap you've allocated. Do you have faceting or grouping on those queries? Are any of your filters really large or complex? Those are the things I would expect to require lots of heap memory.

What is the size of your filterCache? With about 500 million documents in the core, each entry in the filterCache will consume nearly 60 megabytes of memory. If your filterCache has the default example size of 512, and it actually gets that big, then that single cache will require nearly 30 gigabytes of heap memory (on top of the other things in Solr that require heap)... and you only have 24GB. That could cause OOME exceptions.

Does the server run anything other than Solr?

Look here for some valuable info about performance and memory:
https://cwiki.apache.org/confluence/display/solr/SolrPerformanceProblems

Thanks,
Shawn
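The filterCache arithmetic in Shawn's reply can be verified with a short sketch. It rests on the fact that each filterCache entry is, at worst, a bitset with one bit per document in the core (the exact per-entry representation can vary, so treat this as an upper-bound estimate):

```java
// Upper-bound estimate of filterCache heap use: one bit per document
// per cache entry, times the configured cache size.
public class FilterCacheMath {
    public static void main(String[] args) {
        long numDocs = 497_767_038L;          // documents in the core (from the thread)
        long bytesPerEntry = numDocs / 8;     // one bit per doc -> ~62 million bytes
        double mbPerEntry = bytesPerEntry / (1024.0 * 1024.0);
        double gbAtFullCache = 512 * bytesPerEntry / (1024.0 * 1024.0 * 1024.0);
        System.out.printf("~%.0f MB per entry, ~%.0f GB for 512 entries%n",
                mbPerEntry, gbAtFullCache);
    }
}
```

This is why shrinking the `size` attribute of the filterCache (or keeping the number of distinct fq strings small, since each distinct filter gets its own entry) directly bounds heap consumption on a core this large.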