I have switched between the Solr and Lucene user lists while debugging this
issue (details in the following thread). My current hypothesis is that, since a
large number of indexing threads are being created (the maxIndexingThreads
config is now obsolete), each output segment is really small. Reference:

Is there any config in Solr 6.6 to control this?
If not, why was the current config considered useless?
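
As a rough sanity check of this hypothesis, here is a back-of-envelope sketch using only the numbers quoted in the thread below (85G of data, 3000 segments, a 10000MB RAM buffer, 334 thread states). It assumes that 85G is about 85 * 1024 MB and that the RAM buffer is shared evenly across all thread states; the class and method names are made up for illustration, and on-disk segment sizes will not match in-memory sizes exactly.

```java
public class SegmentSizeEstimate {
    // Average size in MB if totalMB is split evenly across `parts` pieces.
    static double averageMB(double totalMB, int parts) {
        return totalMB / parts;
    }

    public static void main(String[] args) {
        double indexMB = 85 * 1024;   // assumption: "85G" treated as 85 * 1024 MB
        double ramBufferMB = 10000;   // <ramBufferSizeMB> from the config in the thread
        int threadStates = 334;       // "334 in-use non-flushing threads states"

        // Average segment size with the 3000 segments observed today.
        System.out.printf("avg segment now:      %.1f MB%n", averageMB(indexMB, 3000));
        // Average segment size if only ~30 segments were produced.
        System.out.printf("avg segment wanted:   %.1f MB%n", averageMB(indexMB, 30));
        // RAM available per thread state if the buffer is shared evenly (assumption).
        System.out.printf("RAM per thread state: %.1f MB%n", averageMB(ramBufferMB, threadStates));
    }
}
```

Under these assumptions, both the observed average segment size and the per-thread share of the RAM buffer come out at roughly 30 MB, which would be consistent with the hypothesis but does not prove it.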


---------- Forwarded message ---------
From: Nawab Zada Asad Iqbal <khi...@gmail.com>
Date: Sun, Aug 6, 2017 at 8:25 AM
Subject: Re: Understanding flush and DocumentsWriterPerThread
To: <java-u...@lucene.apache.org>

I think I am hitting this problem. Since maxIndexingThreads is not used
anymore, I see 330+ indexing threads (in the attached log: "334 in-use
non-flushing threads states").

The bugfix recommends using custom code to control concurrency in
IndexWriter; how can I configure it using Solr 6.6?
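
For context, here is a minimal sketch of what such custom concurrency control could look like, using only the JDK. It is not Solr or Lucene API: the class BoundedIndexing, its methods, and the limit of 10 are hypothetical, and the empty Runnable stands in for the real add-document call. For Solr the assumption would be that the same gate is applied on the client side, around the code that sends update requests.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class BoundedIndexing {
    private final Semaphore permits;                           // caps concurrent indexing calls
    private final AtomicInteger inFlight = new AtomicInteger();
    private final AtomicInteger peak = new AtomicInteger();    // highest concurrency observed

    BoundedIndexing(int maxConcurrent) {
        this.permits = new Semaphore(maxConcurrent);
    }

    // Runs one indexing action while holding a permit; blocks if the limit is reached.
    void index(Runnable addDocument) throws InterruptedException {
        permits.acquire();
        try {
            int now = inFlight.incrementAndGet();
            peak.accumulateAndGet(now, Math::max);
            addDocument.run();
        } finally {
            inFlight.decrementAndGet();
            permits.release();
        }
    }

    int peakConcurrency() {
        return peak.get();
    }

    public static void main(String[] args) throws Exception {
        BoundedIndexing gate = new BoundedIndexing(10);          // hypothetical limit
        ExecutorService pool = Executors.newFixedThreadPool(64); // many caller threads
        for (int i = 0; i < 1000; i++) {
            pool.submit(() -> {
                try {
                    gate.index(() -> { /* placeholder for the real add-document call */ });
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);
        System.out.println("peak concurrent indexing calls: " + gate.peakConcurrency());
    }
}
```

However many caller threads exist, at most maxConcurrent of them are inside the indexing call at any moment; whether that translates into fewer, larger flushed segments in Solr 6.6 is exactly what I am trying to confirm.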

On Sat, Aug 5, 2017 at 12:59 PM, Nawab Zada Asad Iqbal <khi...@gmail.com> wrote:

> Hi,
> I am debugging a bulk indexing performance issue while upgrading to 6.6
> from 4.5.0. I have commits disabled while indexing a total of 85G of data
> over 7 hours. At the end of it, I want some 30 or so big segments, but I
> am getting 3000 segments.
> I deleted the index and enabled infoStream logging; I have attached the
> log from when the first segment is flushed. Here are a few questions:
> 1. When a segment is flushed, is it permanent, or can more documents
> be written to it (besides the merge scenario)?
> 2. It seems that 330+ threads are writing in parallel. Will each one of
> them become one segment when written to the disk? In which case, I should
> probably decrease concurrency?
> 3. One possibility is to delay flushing. The flush is getting triggered at
> 10000MB, probably coming from <ramBufferSizeMB>10000</ramBufferSizeMB>;
> however, the segment which is flushed is only 115MB. Is this limit for the
> combined size of all in-memory segments? In which case, is it OK to
> increase it further to use more of my heap (48GB)?
> 4. How can I decrease the concurrency? Maybe the solution is to use fewer
> in-memory segments?
> In the previous run, there were 110k files in the index folder after I
> stopped indexing. Before doing a commit, I noticed that the file count
> continued to decrease every few minutes, until it reduced to 27k or so. (I
> committed after it stabilized.)
> My indexConfig is this:
>   <indexConfig>
>     <writeLockTimeout>1000</writeLockTimeout>
>     <commitLockTimeout>10000</commitLockTimeout>
>     <maxIndexingThreads>10</maxIndexingThreads>
>     <useCompoundFile>false</useCompoundFile>
>     <ramBufferSizeMB>10000</ramBufferSizeMB>
>     <mergePolicyFactory class="org.apache.solr.index.TieredMergePolicyFactory">
>       <int name="maxMergeAtOnce">5</int>
>       <int name="segmentsPerTier">3000</int>
>       <int name="maxMergeAtOnceExplicit">10</int>
>       <int name="floorSegmentMB">16</int>
>       <!-- 200 gb since we want few big segments during full indexing -->
>       <double name="maxMergedSegmentMB">200000</double>
>       <double name="forceMergeDeletesPctAllowed">1</double>
>     </mergePolicyFactory>
>     <mergeScheduler class="org.apache.lucene.index.ConcurrentMergeScheduler">
>       <int name="maxThreadCount">10</int>
>       <int name="maxMergeCount">10</int>
>     </mergeScheduler>
>     <lockType>${solr.lock.type:native}</lockType>
>     <reopenReaders>true</reopenReaders>
>     <deletionPolicy class="solr.SolrDeletionPolicy">
>       <str name="maxCommitsToKeep">1</str>
>       <str name="maxOptimizedCommitsToKeep">0</str>
>     </deletionPolicy>
>     <infoStream>true</infoStream>
>     <applyAllDeletesOnFlush>false</applyAllDeletesOnFlush>
>   </indexConfig>
> Thanks
> Nawab
