One possibility is index segment merging. When this happens, are you actively
indexing? And are these NRT replicas, or TLOG/PULL? If the latter, are your TLOG
leaders on the affected machines?

Best,
Erick

> On Jun 3, 2020, at 3:57 AM, Marvin Bredal Lillehaug <marvin.lilleh...@gmail.com> wrote:
> 
> Hi,
> We have a cluster with five Solr (8.5.1, Java 11) nodes, and sometimes one
> or two nodes have Solr running at 100% CPU on all cores, a load average over
> 400, and high IO. It usually lasts five to ten minutes, and during that time
> the node barely responds.
> Has anyone seen this kind of behaviour? Is there any logging, other than
> InfoStream, that could give more information?
> 
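On other diagnostics: InfoStream tells you what IndexWriter is doing (flushes,
merges), but not which threads are actually eating the CPU. Here's a rough
sketch of how to find that out with the standard java.lang.management API. It
samples per-thread CPU time over a short window and lists the busiest threads
with their top stack frame. It only inspects the JVM it runs in, so for a live
Solr node you'd have to do the equivalent over remote JMX, or simply take a few
jstack dumps a few seconds apart and compare them. HotThreads, Sample and
sample() are made-up names for illustration, not anything shipped with Solr.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Solr: finds the threads that burned the most
// CPU during a sampling window in the current JVM.
public class HotThreads {

    /** One sampled thread: its name, CPU time used in the window, and top frame. */
    public static final class Sample {
        public final String name;
        public final long cpuNanos;
        public final String topFrame;

        Sample(String name, long cpuNanos, String topFrame) {
            this.name = name;
            this.cpuNanos = cpuNanos;
            this.topFrame = topFrame;
        }
    }

    /** Reads per-thread CPU time twice, windowMillis apart; busiest threads first. */
    public static List<Sample> sample(long windowMillis) throws InterruptedException {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        if (!mx.isThreadCpuTimeSupported()) {
            throw new UnsupportedOperationException("thread CPU time not supported");
        }
        mx.setThreadCpuTimeEnabled(true);

        long[] ids = mx.getAllThreadIds();
        long[] before = new long[ids.length];
        for (int i = 0; i < ids.length; i++) {
            before[i] = mx.getThreadCpuTime(ids[i]); // -1 if the thread is no longer alive
        }

        Thread.sleep(windowMillis);

        List<Sample> out = new ArrayList<>();
        for (int i = 0; i < ids.length; i++) {
            long after = mx.getThreadCpuTime(ids[i]);
            if (before[i] < 0 || after < 0) {
                continue; // thread died (or was not measurable) during the window
            }
            ThreadInfo info = mx.getThreadInfo(ids[i], 1); // top stack frame only
            if (info == null) {
                continue;
            }
            StackTraceElement[] stack = info.getStackTrace();
            String top = stack.length > 0 ? stack[0].toString() : "<no frames>";
            out.add(new Sample(info.getThreadName(), after - before[i], top));
        }
        // Sort descending by CPU consumed in the window.
        out.sort((a, b) -> Long.compare(b.cpuNanos, a.cpuNanos));
        return out;
    }

    public static void main(String[] args) throws InterruptedException {
        List<Sample> samples = sample(1000);
        for (int i = 0; i < Math.min(10, samples.size()); i++) {
            Sample s = samples.get(i);
            System.out.printf("%8.1f ms  %-40s %s%n", s.cpuNanos / 1e6, s.name, s.topFrame);
        }
    }
}
```

If the top entries during an incident are merge threads (Lucene's
ConcurrentMergeScheduler names them "Lucene Merge Thread #N"), that points at
merging; if they're recovery or commit threads, that points somewhere else.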
> We managed to capture a thread dump:
> 
>> java.base@11.0.6/java.nio.channels.spi.AbstractInterruptibleChannel.close(AbstractInterruptibleChannel.java:112)
>> org.apache.lucene.util.IOUtils.fsync(IOUtils.java:483)
>> org.apache.lucene.store.FSDirectory.fsync(FSDirectory.java:331)
>> org.apache.lucene.store.FSDirectory.sync(FSDirectory.java:286)
>> org.apache.lucene.store.NRTCachingDirectory.sync(NRTCachingDirectory.java:158)
>> org.apache.lucene.store.LockValidatingDirectoryWrapper.sync(LockValidatingDirectoryWrapper.java:68)
>> org.apache.lucene.index.IndexWriter.startCommit(IndexWriter.java:4805)
>> org.apache.lucene.index.IndexWriter.prepareCommitInternal(IndexWriter.java:3277)
>> org.apache.lucene.index.IndexWriter.commitInternal(IndexWriter.java:3445)
>> org.apache.lucene.index.IndexWriter.commit(IndexWriter.java:3410)
>> org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:678)
>> org.apache.solr.cloud.RecoveryStrategy.doSyncOrReplicateRecovery(RecoveryStrategy.java:636)
>> org.apache.solr.cloud.RecoveryStrategy.doRecovery(RecoveryStrategy.java:337)
>> org.apache.solr.cloud.RecoveryStrategy.run(RecoveryStrategy.java:318)
> 
> But we're not sure whether this is from the incident itself or from just
> after it. It seems strange that an fsync would behave like this.
> 
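On the fsync question: the trace bottoms out in Lucene's IOUtils.fsync, which as
far as I know opens a FileChannel on each file and calls force(true). One way to
check whether the disk itself is the slow part is to time that call directly.
This is only a sketch (FsyncTimer and timeFsync are hypothetical names, and the
8 MiB probe size is an arbitrary choice): it writes a probe file into a
directory you choose, times force(true), and deletes the file. Run it against
the filesystem holding the Solr data dir, once while the node is healthy and
once during an incident; the difference matters more than the absolute numbers.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical probe, not part of Solr or Lucene: measures how long an fsync
// (FileChannel.force(true)) takes on a given filesystem.
public class FsyncTimer {

    /** Writes sizeBytes to a temp file in dir, times force(true), returns nanoseconds. */
    public static long timeFsync(Path dir, int sizeBytes) throws IOException {
        Path file = Files.createTempFile(dir, "fsync-probe", ".bin");
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.WRITE)) {
            ByteBuffer buf = ByteBuffer.allocate(64 * 1024); // zero-filled chunk
            int written = 0;
            while (written < sizeBytes) {
                buf.clear();
                buf.limit(Math.min(buf.capacity(), sizeBytes - written));
                while (buf.hasRemaining()) {
                    written += ch.write(buf);
                }
            }
            long start = System.nanoTime();
            ch.force(true); // flush file data and metadata to the storage device
            return System.nanoTime() - start;
        } finally {
            Files.deleteIfExists(file); // leave nothing behind in the target dir
        }
    }

    public static void main(String[] args) throws IOException {
        // Pass a directory on the same filesystem as the Solr data dir.
        Path dir = Path.of(args.length > 0 ? args[0] : ".");
        for (int i = 0; i < 5; i++) {
            long ns = timeFsync(dir, 8 * 1024 * 1024);
            System.out.printf("fsync of 8 MiB took %.1f ms%n", ns / 1e6);
        }
    }
}
```

If fsync goes from milliseconds to seconds while the node is struggling, the
bottleneck is probably IO (writeback, swapping, or a merge competing for the
disk) rather than Solr's own CPU use.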
> Swappiness is set to the RHEL 7 default (Ops have resisted turning it off).
> 
> -- 
> Kind regards,
> Marvin B. Lillehaug
