Hi Erick,

The collection has almost 13 billion documents, each around 5KB in
size, and all of the roughly 150 fields are indexed. Do you think the
number of documents in the collection is causing this issue? I
appreciate your response.

Regards,
Madhava 

> On 3 Jul 2020, at 12:42, Erick Erickson <erickerick...@gmail.com> wrote:
> 
> If you’re seeing low CPU utilization at the same time, you probably
> just have too much data on too little hardware. Check your
> swapping: how much of your I/O is just because Lucene can’t
> hold all the parts of the index it needs in memory at once? Lucene
> uses MMapDirectory to hold the index, and you may well be
> swapping; see:
> 
> https://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html
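> 
> As a concrete illustration of what Uwe describes (just a sketch, and
> the index path here is made up): on a 64-bit JVM, Lucene’s
> FSDirectory.open() already resolves to MMapDirectory, so the index
> lives in the OS page cache rather than on the JVM heap:
> 
>   import java.nio.file.Paths;
>   import org.apache.lucene.store.FSDirectory;
> 
>   public class DirCheck {
>     public static void main(String[] args) throws Exception {
>       // FSDirectory.open() picks the best implementation for the
>       // platform; on 64-bit JVMs that is MMapDirectory.
>       FSDirectory dir = FSDirectory.open(Paths.get("/var/solr/data/index"));
>       System.out.println(dir.getClass().getSimpleName()); // MMapDirectory
>     }
>   }
> 
> That’s why adding RAM helps even without a bigger heap: the mapped
> index pages are cached by the operating system, not the JVM.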
> 
> But my guess is that you’ve just reached a tipping point. You say:
> 
> "From last 2-3 weeks we have been noticing either slow indexing or timeout 
> errors while indexing”
> 
> So have you been continually adding more documents to your
> collections for more than the 2-3 weeks? If so you may have just
> put so much data on the same boxes that you’ve gone over
> the capacity of your hardware. As Toke says, adding physical
> memory for the OS to use to hold relevant parts of the index may
> alleviate the problem (again, refer to Uwe’s article for why).
> 
> All that said, if you’re going to keep adding documents you need to
> seriously think about adding new machines and moving some of
> your replicas to them.
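> 
> If it comes to that, here’s a sketch of placing a replica on a new
> box with SolrJ (the collection, shard, ZK and node names are all
> made up):
> 
>   import java.util.Collections;
>   import java.util.Optional;
>   import org.apache.solr.client.solrj.impl.CloudSolrClient;
>   import org.apache.solr.client.solrj.request.CollectionAdminRequest;
> 
>   public class AddReplicaOnNewNode {
>     public static void main(String[] args) throws Exception {
>       try (CloudSolrClient client = new CloudSolrClient.Builder(
>           Collections.singletonList("zkhost:2181"), Optional.empty()).build()) {
>         // Place a new replica of shard1 on the freshly added node;
>         // once it is active, drop one from an overloaded node with
>         // CollectionAdminRequest.deleteReplica().
>         CollectionAdminRequest.addReplicaToShard("collection1", "shard1")
>             .setNode("newhost:8983_solr")
>             .process(client);
>       }
>     }
>   }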
> 
> Best,
> Erick
> 
>> On Jul 3, 2020, at 7:14 AM, Toke Eskildsen <t...@kb.dk> wrote:
>> 
>>> On Thu, 2020-07-02 at 11:16 +0000, Kommu, Vinodh K. wrote:
>>> We are performing QA performance testing on a couple of collections
>>> which hold 2 billion and 3.5 billion docs respectively.
>> 
>> How many shards?
>> 
>>> 1.  Our performance team noticed that read operations outnumber
>>> write operations by roughly 100:1. Is this expected during indexing,
>>> or are the Solr nodes doing other operations like syncing?
>> 
>> Are you saying that there are 100 times more read operations when you
>> are indexing? That does not sound too unrealistic as the disk cache
>> might be filled with the data that the writers are flushing.
>> 
>> In that case, more RAM would help. Okay, more RAM nearly always helps,
>> but such massive difference in IO-utilization does indicate that you
>> are starved for cache.
>> 
>> I noticed you have at least 18 replicas. That's a lot. Just to sanity
>> check: how many replicas is each physical box handling? If they are
>> sharing resources, fewer replicas would probably be better.
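>> 
>> A sketch of that sanity check with SolrJ, reading the cluster state
>> from ZooKeeper (the ZK address is made up):
>> 
>>   import java.util.Collections;
>>   import java.util.HashMap;
>>   import java.util.Map;
>>   import java.util.Optional;
>>   import org.apache.solr.client.solrj.impl.CloudSolrClient;
>>   import org.apache.solr.common.cloud.ClusterState;
>>   import org.apache.solr.common.cloud.DocCollection;
>> 
>>   public class ReplicasPerNode {
>>     public static void main(String[] args) throws Exception {
>>       try (CloudSolrClient client = new CloudSolrClient.Builder(
>>           Collections.singletonList("zkhost:2181"), Optional.empty()).build()) {
>>         client.connect();
>>         ClusterState state = client.getZkStateReader().getClusterState();
>>         // Count replicas per Solr node across all collections.
>>         Map<String, Integer> perNode = new HashMap<>();
>>         for (DocCollection coll : state.getCollectionsMap().values()) {
>>           coll.getReplicas().forEach(
>>               r -> perNode.merge(r.getNodeName(), 1, Integer::sum));
>>         }
>>         perNode.forEach((node, n) -> System.out.println(node + ": " + n));
>>       }
>>     }
>>   }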
>> 
>>> 3.  Our client timeout is set to 2 minutes; can it be increased
>>> further? Would that help or create any other problems?
>> 
>> It does not hurt the server to increase the client timeout as the
>> initiated query will keep running until it is finished, independent of
>> whether or not there is a client to receive the result.
>> 
>> If you want a better max time for query processing, you should look at 
>> 
>> https://lucene.apache.org/solr/guide/7_7/common-query-parameters.html#timeallowed-parameter
>> but due to its inherent limitations it might not help in your
>> situation.
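>> 
>> For completeness, a sketch of setting it from SolrJ (the URL,
>> collection name and two-minute value are just examples):
>> 
>>   import org.apache.solr.client.solrj.SolrQuery;
>>   import org.apache.solr.client.solrj.impl.HttpSolrClient;
>>   import org.apache.solr.client.solrj.response.QueryResponse;
>>   import org.apache.solr.common.params.CommonParams;
>> 
>>   public class TimeAllowedExample {
>>     public static void main(String[] args) throws Exception {
>>       try (HttpSolrClient client = new HttpSolrClient.Builder(
>>           "http://solrhost:8983/solr").build()) {
>>         SolrQuery q = new SolrQuery("*:*");
>>         q.set(CommonParams.TIME_ALLOWED, 120000); // give up after ~2 min
>>         QueryResponse rsp = client.query("collection1", q);
>>         // timeAllowed may return partial results; the header flags it.
>>         System.out.println(rsp.getResponseHeader().get("partialResults"));
>>       }
>>     }
>>   }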
>> 
>>> 4.  When we created an empty collection and loaded the same data
>>> file, it loaded fine without any issues. So does having more
>>> documents in a collection create such problems?
>> 
>> Solr 7 does have a problem with sparse DocValues and many documents,
>> leading to excessive IO-activity, which might be what you are seeing. I
>> can see from an earlier post that you were using streaming expressions
>> for another collection; this is one of the things affected by
>> the Solr 7 DocValues issue.
>> 
>> More info about DocValues and streaming:
>> https://issues.apache.org/jira/browse/SOLR-13013
>> 
>> Fairly in-depth info on the problem with Solr 7 docValues:
>> https://issues.apache.org/jira/browse/LUCENE-8374
>> 
>> If this is your problem, upgrading to Solr 8 and indexing the
>> collection from scratch should fix it. 
>> 
>> Alternatively, you can port the LUCENE-8374 patch from Solr 7.3 to
>> 7.7, or you can ensure that there are values defined for all
>> DocValues fields in all your documents.
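>> 
>> The last option can be as simple as always writing an explicit value
>> (even a sentinel) at index time. A sketch, with a made-up field name:
>> 
>>   import org.apache.solr.client.solrj.impl.HttpSolrClient;
>>   import org.apache.solr.common.SolrInputDocument;
>> 
>>   public class DenseDocValues {
>>     public static void main(String[] args) throws Exception {
>>       try (HttpSolrClient client = new HttpSolrClient.Builder(
>>           "http://solrhost:8983/solr").build()) {
>>         SolrInputDocument doc = new SolrInputDocument();
>>         doc.addField("id", "doc-1");
>>         // "price_d" stands in for any docValues field: write a value
>>         // for every document so the field never ends up sparse.
>>         doc.addField("price_d", 0.0d);
>>         client.add("collection1", doc);
>>         client.commit("collection1");
>>       }
>>     }
>>   }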
>> 
>>> java.net.SocketTimeoutException: Read timed out
>>>       at java.net.SocketInputStream.socketRead0(Native Method) 
>> ...
>>> Remote error message: java.util.concurrent.TimeoutException: Idle
>>> timeout expired: 600000/600000 ms
>> 
>> There is a default timeout of 10 minutes (distribUpdateSoTimeout?). You
>> should be able to change it in solr.xml.
>> https://lucene.apache.org/solr/guide/8_5/format-of-solr-xml.html
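>> 
>> For example (these are the stock solr.xml settings; the values shown
>> are the defaults, in milliseconds):
>> 
>>   <solrcloud>
>>     <int name="distribUpdateConnTimeout">${distribUpdateConnTimeout:60000}</int>
>>     <int name="distribUpdateSoTimeout">${distribUpdateSoTimeout:600000}</int>
>>   </solrcloud>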
>> 
>> BUT if an update takes > 10 minutes to be processed, it indicates that
>> the cluster is overloaded.  Increasing the timeout is just a band-aid.
>> 
>> - Toke Eskildsen, Royal Danish Library
>> 
>> 
> 
