Hi Edwin,
Quick googling suggests that this is an NTFS issue related to a large number
of file fragments, which can happen when a single directory holds a large
number of huge files. Are you running this merge on a Windows machine?
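
If it is Windows, the usual cause of "The requested operation could not be
completed due to a file system limitation" is NTFS running out of room in a
file's attribute list because the file has too many fragments. A rough way to
check, and the workaround that is usually suggested (the D: drive here is just
a placeholder for wherever your index lives):

    fsutil fsinfo ntfsinfo D:
    rem "Bytes Per FileRecord Segment" is 1024 on a default-formatted volume

    format D: /FS:NTFS /L
    rem /L formats with large (4KB) file record segments, which raises the
    rem fragment limit. Note that format wipes the volume - back up first.

Defragmenting the index files first (e.g. with Sysinternals contig.exe) might
also get the merge through without reformatting.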

Emir
--
Monitoring - Log Management - Alerting - Anomaly Detection
Solr & Elasticsearch Consulting Support Training - http://sematext.com/



> On 22 Nov 2017, at 02:33, Zheng Lin Edwin Yeo <edwinye...@gmail.com> wrote:
> 
> Hi,
> 
> I have encountered this error during the merging of the 3.5TB of index.
> What could be the cause that led to this?
> 
> Exception in thread "main" Exception in thread "Lucene Merge Thread #8"
> java.io.IOException: background merge hit exception: _6f(6.5.1):C7256757
> _6e(6.5.1):C6462072 _6d(6.5.1):C3750777 _6c(6.5.1):C2243594 _6b(6.5.1):C1015431
> _6a(6.5.1):C1050220 _69(6.5.1):c273879 _28(6.4.1):c79011/84:delGen=84
> _26(6.4.1):c44960/8149:delGen=100 _29(6.4.1):c73855/68:delGen=68
> _5(6.4.1):C46672/31:delGen=31 _68(6.5.1):c66 into _6g [maxNumSegments=1]
>         at org.apache.lucene.index.IndexWriter.forceMerge(IndexWriter.java:1931)
>         at org.apache.lucene.index.IndexWriter.forceMerge(IndexWriter.java:1871)
>         at org.apache.lucene.misc.IndexMergeTool.main(IndexMergeTool.java:57)
> Caused by: java.io.IOException: The requested operation could not be completed
> due to a file system limitation
>         at sun.nio.ch.FileDispatcherImpl.write0(Native Method)
>         at sun.nio.ch.FileDispatcherImpl.write(Unknown Source)
>         at sun.nio.ch.IOUtil.writeFromNativeBuffer(Unknown Source)
>         at sun.nio.ch.IOUtil.write(Unknown Source)
>         at sun.nio.ch.FileChannelImpl.write(Unknown Source)
>         at java.nio.channels.Channels.writeFullyImpl(Unknown Source)
>         at java.nio.channels.Channels.writeFully(Unknown Source)
>         at java.nio.channels.Channels.access$000(Unknown Source)
>         at java.nio.channels.Channels$1.write(Unknown Source)
>         at org.apache.lucene.store.FSDirectory$FSIndexOutput$1.write(FSDirectory.java:419)
>         at java.util.zip.CheckedOutputStream.write(Unknown Source)
>         at java.io.BufferedOutputStream.flushBuffer(Unknown Source)
>         at java.io.BufferedOutputStream.write(Unknown Source)
>         at org.apache.lucene.store.OutputStreamIndexOutput.writeBytes(OutputStreamIndexOutput.java:53)
>         at org.apache.lucene.store.RateLimitedIndexOutput.writeBytes(RateLimitedIndexOutput.java:73)
>         at org.apache.lucene.store.DataOutput.writeBytes(DataOutput.java:52)
>         at org.apache.lucene.codecs.lucene50.ForUtil.writeBlock(ForUtil.java:175)
>         at org.apache.lucene.codecs.lucene50.Lucene50PostingsWriter.addPosition(Lucene50PostingsWriter.java:286)
>         at org.apache.lucene.codecs.PushPostingsWriterBase.writeTerm(PushPostingsWriterBase.java:156)
>         at org.apache.lucene.codecs.blocktree.BlockTreeTermsWriter$TermsWriter.write(BlockTreeTermsWriter.java:866)
>         at org.apache.lucene.codecs.blocktree.BlockTreeTermsWriter.write(BlockTreeTermsWriter.java:344)
>         at org.apache.lucene.codecs.FieldsConsumer.merge(FieldsConsumer.java:105)
>         at org.apache.lucene.codecs.perfield.PerFieldPostingsFormat$FieldsWriter.merge(PerFieldPostingsFormat.java:164)
>         at org.apache.lucene.index.SegmentMerger.mergeTerms(SegmentMerger.java:216)
>         at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:101)
>         at org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:4353)
>         at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:3928)
>         at org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:624)
>         at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:661)
> 
> org.apache.lucene.index.MergePolicy$MergeException: java.io.IOException: The
> requested operation could not be completed due to a file system limitation
>         at org.apache.lucene.index.ConcurrentMergeScheduler.handleMergeException(ConcurrentMergeScheduler.java:703)
>         at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:683)
> Caused by: java.io.IOException: The requested operation could not be completed
> due to a file system limitation
>         at sun.nio.ch.FileDispatcherImpl.write0(Native Method)
>         at sun.nio.ch.FileDispatcherImpl.write(Unknown Source)
>         at sun.nio.ch.IOUtil.writeFromNativeBuffer(Unknown Source)
>         at sun.nio.ch.IOUtil.write(Unknown Source)
>         at sun.nio.ch.FileChannelImpl.write(Unknown Source)
>         at java.nio.channels.Channels.writeFullyImpl(Unknown Source)
>         at java.nio.channels.Channels.writeFully(Unknown Source)
>         at java.nio.channels.Channels.access$000(Unknown Source)
>         at java.nio.channels.Channels$1.write(Unknown Source)
> 
> Regards,
> Edwin
> 
> On 22 November 2017 at 00:10, Zheng Lin Edwin Yeo <edwinye...@gmail.com>
> wrote:
> 
>> I am using the IndexMergeTool from Solr, from the command below:
>> 
>> java -classpath lucene-core-6.5.1.jar;lucene-misc-6.5.1.jar
>> org.apache.lucene.misc.IndexMergeTool
>> 
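>> For reference, IndexMergeTool expects the destination index directory as
>> the first argument, followed by the source index directories. A full
>> invocation would look something like this (the paths are placeholders, not
>> my actual ones):
>> 
>>     java -classpath lucene-core-6.5.1.jar;lucene-misc-6.5.1.jar ^
>>       org.apache.lucene.misc.IndexMergeTool ^
>>       C:\merged\index C:\core1\data\index C:\core2\data\index
>> 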
>> The heap size is 32GB. There are more than 20 million documents in the two
>> cores.
>> 
>> Regards,
>> Edwin
>> 
>> 
>> 
>> On 21 November 2017 at 21:54, Shawn Heisey <apa...@elyograg.org> wrote:
>> 
>>> On 11/20/2017 9:35 AM, Zheng Lin Edwin Yeo wrote:
>>> 
>>>> Does anyone know how long the merging in Solr usually takes?
>>>> 
>>>> I am currently merging about 3.5TB of data, and it has been running for
>>>> more than 28 hours and is not yet complete. The merging is running on an
>>>> SSD disk.
>>>> 
>>> 
>>> The following will apply if you mean Solr's "optimize" feature when you
>>> say "merging".
>>> 
>>> In my experience, merging proceeds at about 20 to 30 megabytes per second
>>> -- even if the disks are capable of far faster data transfer.  Merging is
>>> not just copying the data. Lucene is completely rebuilding very large data
>>> structures, and *not* including data from deleted documents as it does so.
>>> It takes a lot of CPU power and time.
>>> 
>>> If we average the data rates I've seen to 25 MB/s, then that would indicate
>>> that an optimize on a 3.5TB index is going to take about 39 hours, and might
>>> take as long as 48 hours.  And if you're running SolrCloud with multiple
>>> replicas, multiply that by the number of copies of the 3.5TB index.  An
>>> optimize on a SolrCloud collection handles one shard replica at a time and
>>> works its way through the entire collection.
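>>> 
>>> As a quick back-of-envelope check of that estimate (a jshell-style snippet;
>>> the 25 MB/s figure is an observed average, not a guarantee):
>>> 
>>>     // size and rate are assumptions taken from the numbers above
>>>     long indexBytes = 3_500_000_000_000L;   // 3.5 TB index
>>>     long bytesPerSec = 25_000_000L;         // ~25 MB/s rewrite rate
>>>     double hours = indexBytes / (double) bytesPerSec / 3600.0;
>>>     System.out.printf("~%.0f hours%n", hours);   // prints ~39 hours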
>>> 
>>> If you are merging different indexes *together*, which a later message
>>> seems to state, then the actual Lucene operation is probably nearly
>>> identical, but I'm not really familiar with it, so I cannot say for sure.
>>> 
>>> Thanks,
>>> Shawn
>>> 
>>> 
>> 
