Hi,

I have switched between the solr and lucene user lists while debugging this issue (details in the thread below). My current hypothesis is that since a large number of indexing threads are being created (the maxIndexingThreads config is now obsolete), each output segment ends up very small. Reference: https://issues.apache.org/jira/plugins/servlet/mobile#issue/LUCENE-6659
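For intuition, here is a back-of-the-envelope sketch of that hypothesis (plain Python; the thread count comes from the infoStream log quoted below, and the even-spread-of-RAM assumption is mine, not something the log confirms):

```python
# Rough model: Lucene's default flush policy triggers a flush when the *total*
# RAM across all DocumentsWriterPerThread (DWPT) buffers reaches
# ramBufferSizeMB, and each DWPT always flushes as its own segment.
# Assumption (mine): indexing load is spread roughly evenly across DWPTs.

ram_buffer_mb = 10000    # <ramBufferSizeMB> from the config below
active_dwpts = 334       # "334 in-use non-flushing threads states" from the log

avg_segment_mb = ram_buffer_mb / active_dwpts
print(f"~{avg_segment_mb:.0f} MB per flushed segment")  # ~30 MB
```

Under this model, even a 10 GB buffer shared by hundreds of concurrent DWPTs yields tiny segments, which would be consistent with a 115 MB flush instead of multi-GB segments.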
Is there any config in Solr 6.6 to control this? If not, why was the previous config considered useless?

Thanks
Nawab

---------- Forwarded message ---------
From: Nawab Zada Asad Iqbal <khi...@gmail.com>
Date: Sun, Aug 6, 2017 at 8:25 AM
Subject: Re: Understanding flush and DocumentsWriterPerThread
To: <java-u...@lucene.apache.org>

I think I am hitting this problem. Since maxIndexingThreads is not used anymore, I see 330+ indexing threads (in the attached log: "334 in-use non-flushing threads states"). The bugfix recommends using custom code to control concurrency in IndexWriter; how can I configure that through Solr 6.6?

On Sat, Aug 5, 2017 at 12:59 PM, Nawab Zada Asad Iqbal <khi...@gmail.com> wrote:
> Hi,
>
> I am debugging a bulk-indexing performance issue while upgrading from
> 4.5.0 to 6.6. I have commits disabled while indexing a total of 85G of
> data over 7 hours. At the end of it, I want some 30 or so big segments,
> but I am getting 3000 segments.
> I deleted the index and enabled infoStream logging; I have attached the
> log from when the first segment is flushed. Here are a few questions:
>
> 1. When a segment is flushed, is it permanent, or can more documents
> be written to it (besides the merge scenario)?
> 2. It seems that 330+ threads are writing in parallel. Will each one of
> them become one segment when written to disk? In that case, should I
> decrease concurrency?
> 3. One possibility is to delay flushing. The flush is getting triggered
> at 10000MB, probably coming from <ramBufferSizeMB>10000</ramBufferSizeMB>;
> however, the segment which is flushed is only 115MB. Is this limit for
> the combined size of all in-memory segments? In that case, is it OK to
> increase it further to use more of my heap (48GB)?
> 4. How can I decrease the concurrency? Maybe the solution is to use
> fewer in-memory segments?
>
> In the previous run, there were 110k files in the index folder after I
> stopped indexing.
> Before doing a commit, I noticed that the file count
> continued to decrease every few minutes, until it dropped to 27k or so.
> (I committed after it stabilized.)
>
> My indexConfig is this:
>
> <indexConfig>
>   <writeLockTimeout>1000</writeLockTimeout>
>   <commitLockTimeout>10000</commitLockTimeout>
>   <maxIndexingThreads>10</maxIndexingThreads>
>   <useCompoundFile>false</useCompoundFile>
>   <ramBufferSizeMB>10000</ramBufferSizeMB>
>   <mergePolicyFactory class="org.apache.solr.index.TieredMergePolicyFactory">
>     <int name="maxMergeAtOnce">5</int>
>     <int name="segmentsPerTier">3000</int>
>     <int name="maxMergeAtOnceExplicit">10</int>
>     <int name="floorSegmentMB">16</int>
>     <!-- 200 GB, since we want a few big segments during full indexing -->
>     <double name="maxMergedSegmentMB">200000</double>
>     <double name="forceMergeDeletesPctAllowed">1</double>
>   </mergePolicyFactory>
>   <mergeScheduler class="org.apache.lucene.index.ConcurrentMergeScheduler">
>     <int name="maxThreadCount">10</int>
>     <int name="maxMergeCount">10</int>
>   </mergeScheduler>
>   <lockType>${solr.lock.type:native}</lockType>
>   <reopenReaders>true</reopenReaders>
>   <deletionPolicy class="solr.SolrDeletionPolicy">
>     <str name="maxCommitsToKeep">1</str>
>     <str name="maxOptimizedCommitsToKeep">0</str>
>   </deletionPolicy>
>   <infoStream>true</infoStream>
>   <applyAllDeletesOnFlush>false</applyAllDeletesOnFlush>
> </indexConfig>
>
> Thanks
> Nawab
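Regarding question 4 above: since Solr 6.6 no longer honors maxIndexingThreads, one possible workaround (my suggestion, not an official Solr mechanism) is to cap concurrency on the client side, so fewer DocumentsWriterPerThread instances are active at once. A minimal sketch in plain Python; send_batch and its endpoint are hypothetical placeholders, not part of any Solr client API:

```python
# Sketch: cap the number of simultaneous update requests on the client side.
# Each concurrent update request to Solr occupies one DocumentsWriterPerThread,
# so bounding client concurrency roughly bounds the number of in-flight DWPT
# buffers (and hence the number of small flushed segments).
from concurrent.futures import ThreadPoolExecutor

MAX_CONCURRENT_REQUESTS = 16  # tune: fewer threads -> fewer, larger segments

def send_batch(batch):
    # Hypothetical placeholder: in real code, POST the docs to
    # /solr/<core>/update here and return the response.
    return len(batch)

def index_all(batches):
    # A bounded pool guarantees at most MAX_CONCURRENT_REQUESTS requests
    # (and therefore roughly that many active DWPTs) at any moment.
    with ThreadPoolExecutor(max_workers=MAX_CONCURRENT_REQUESTS) as pool:
        return list(pool.map(send_batch, batches))

if __name__ == "__main__":
    print(index_all([["doc"] * n for n in (1, 2, 3)]))  # [1, 2, 3]
```

The same effect can be had with any HTTP client and a fixed-size worker pool; the key point is that the server-side knob is gone, so the cap has to live in the feeder.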