Thanks, Shawn. The filter queries are not complex. Below is the query I'm running against the corresponding schema entries:
q=*:*&fq=PARENT_DOC_ID:100&fq=MODIFY_TS:[1970-01-01T00:00:00Z TO *]&fq=PHY_KEY2:"HQ012206"&fq=PHY_KEY1:"JACK"&rows=1000&sort=MODIFY_TS desc,LOGICAL_SECT_NAME asc,TRACK_ID desc,TRACK_INTER_ID asc,PHY_KEY1 asc,PHY_KEY2 asc,PHY_KEY3 asc,PHY_KEY4 asc,PHY_KEY5 asc,PHY_KEY6 asc,PHY_KEY7 asc,PHY_KEY8 asc,PHY_KEY9 asc,PHY_KEY10 asc,FIELD_NAME asc

This was the original query. Since it sorts on many fields, we decided not to sort on the Solr side; instead we fetch the query response and sort outside Solr. Every time we ran the original query, Solr would crash by exceeding the JVM heap, so this eliminated the need for the extra heap we had allocated. Now we run only the filter queries.

Regarding the filter cache, it is the default setup (we are using the default solrconfig.xml; we have only added the request handler for DIH):

<filterCache class="solr.FastLRUCache" size="512" initialSize="512" autowarmCount="0"/>

Now that you're aware of the sizes and numbers, can you please let me know which values/sizes I need to increase? Is there an advantage to moving this single core to SolrCloud? If so, can you let us know how many shards/replicas we would require for this core, considering that we allow it to grow as users transact? The updates to this core are not done through DIH delta import; rather, we use SolrJ to push the changes.
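The client-side sorting described above could look something like the sketch below, which replicates the first few clauses of the original Solr sort parameter with a chained Comparator. The Doc class and the subset of fields shown are illustrative assumptions, not our actual code; the remaining PHY_KEY*/FIELD_NAME clauses would be chained the same way.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Illustrative sketch: sorting fetched documents on the client instead of
// asking Solr to sort. Mirrors "sort=MODIFY_TS desc,LOGICAL_SECT_NAME asc,
// TRACK_ID desc,..." from the original query.
public class ClientSideSort {
    // Hypothetical POJO holding fields copied from a SolrDocument.
    static class Doc {
        final String modifyTs;        // ISO-8601 timestamps compare correctly as strings
        final String logicalSectName;
        final long trackId;
        Doc(String modifyTs, String logicalSectName, long trackId) {
            this.modifyTs = modifyTs;
            this.logicalSectName = logicalSectName;
            this.trackId = trackId;
        }
    }

    // desc fields use reversed(); asc fields use plain thenComparing().
    static final Comparator<Doc> SORT_ORDER =
            Comparator.comparing((Doc d) -> d.modifyTs).reversed()
                      .thenComparing(d -> d.logicalSectName)
                      .thenComparing(Comparator.comparingLong((Doc d) -> d.trackId).reversed());

    public static void main(String[] args) {
        List<Doc> docs = new ArrayList<>(List.of(
                new Doc("2020-06-01T00:00:00Z", "B", 1),
                new Doc("2020-06-02T00:00:00Z", "A", 2),
                new Doc("2020-06-02T00:00:00Z", "A", 5)));
        docs.sort(SORT_ORDER);
        // Newest MODIFY_TS first; ties broken by section asc, then TRACK_ID desc.
        System.out.println(docs.get(0).trackId);
    }
}
```

Note that this only helps if the result set fetched from Solr is small enough to hold in the client; with rows=1000 that should be fine.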
<schema.xml>
<field name="PARENT_DOC_ID" type="string" indexed="true" stored="true" omitTermFreqAndPositions="true" />
<field name="MODIFY_TS" type="date" indexed="true" stored="true" omitTermFreqAndPositions="true" />
<field name="PHY_KEY1" type="string" indexed="true" stored="true" omitTermFreqAndPositions="true" />
<field name="PHY_KEY2" type="string" indexed="true" stored="true" omitTermFreqAndPositions="true" />
<field name="PHY_KEY3" type="string" indexed="true" stored="true" omitTermFreqAndPositions="true" />
<field name="PHY_KEY4" type="string" indexed="true" stored="true" omitTermFreqAndPositions="true" />
<field name="PHY_KEY5" type="string" indexed="true" stored="true" omitTermFreqAndPositions="true" />
<field name="PHY_KEY6" type="string" indexed="true" stored="true" omitTermFreqAndPositions="true" />
<field name="PHY_KEY7" type="string" indexed="true" stored="true" omitTermFreqAndPositions="true" />
<field name="PHY_KEY8" type="string" indexed="true" stored="true" omitTermFreqAndPositions="true" />
<field name="PHY_KEY9" type="string" indexed="true" stored="true" omitTermFreqAndPositions="true" />
<field name="PHY_KEY10" type="string" indexed="true" stored="true" omitTermFreqAndPositions="true" />

Thanks,
Srinivas

On 6/4/2020 9:51 PM, Srinivas Kashyap wrote:
> We are on Solr 8.4.1 in standalone server mode. We have a core with
> 497,767,038 records indexed. It took around 32 hours to load data through DIH.
>
> The disk occupancy is shown below:
>
> 82G /var/solr/data/<corename>/data/index
>
> When I restarted the Solr instance and went to this core to query on the Solr
> admin GUI, it hangs and shows "Connection to Solr lost. Please check the
> Solr instance". But when I go back to the dashboard, the instance is up and
> I'm able to query other cores.
>
> Also, querying on this core eats up the allocated JVM memory (24GB heap on
> 32GB RAM). A query (*:*) with filter queries overshoots the memory with an OOM.
You're going to want a lot more than 8GB of available memory for disk caching with an 82GB index. That's a performance thing... with so little caching memory, Solr will be slow, but functional. That aspect of your setup will NOT lead to out of memory.

If you are experiencing Java "OutOfMemoryError" exceptions, you will need to figure out which resource is running out. It might be heap memory, but it also might be that you're hitting the process/thread limit of your operating system. There are other possible causes for that exception too. Do you have the text of the exception available? It is absolutely critical that you determine which resource is running out, or you might focus your efforts on the wrong thing.

If it's heap memory (something I can't really assume), then Solr is requiring more than the 24GB heap you've allocated. Do you have faceting or grouping on those queries? Are any of your filters really large or complex? Those are the things I would expect to require lots of heap memory.

What is the size of your filterCache? With about 500 million documents in the core, each entry in the filterCache will consume nearly 60 megabytes of memory. If your filterCache has the default example size of 512, and it actually gets that big, then that single cache will require nearly 30 gigabytes of heap memory (on top of the other things in Solr that require heap)... and you only have 24GB. That could cause OOME exceptions.

Does the server run anything other than Solr?

Look here for some valuable info about performance and memory:
https://cwiki.apache.org/confluence/display/solr/SolrPerformanceProblems

Thanks,
Shawn
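The filterCache arithmetic in Shawn's reply can be verified with a short sketch. It rests on the fact that each filterCache entry is, at worst, a bitset with one bit per document in the core (the exact per-entry representation can vary, so treat this as an upper-bound estimate):

```java
// Upper-bound estimate of filterCache heap use: one bit per document
// per cache entry, times the configured cache size.
public class FilterCacheMath {
    public static void main(String[] args) {
        long numDocs = 497_767_038L;          // documents in the core (from the thread)
        long bytesPerEntry = numDocs / 8;     // one bit per doc -> ~62 million bytes
        double mbPerEntry = bytesPerEntry / (1024.0 * 1024.0);
        double gbAtFullCache = 512 * bytesPerEntry / (1024.0 * 1024.0 * 1024.0);
        System.out.printf("~%.0f MB per entry, ~%.0f GB for 512 entries%n",
                mbPerEntry, gbAtFullCache);
    }
}
```

This is why shrinking the `size` attribute of the filterCache (or keeping the number of distinct fq strings small, since each distinct filter gets its own entry) directly bounds heap consumption on a core this large.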