All,
We're puzzled why we're still unable to optimize a 192GB index on an LVM
volume that has 406GB available. We are not using Solr distribution
(replication), and there is no snapshooter in the picture. We run out of
disk capacity with df showing 100% used but du showing just 379GB of files.
Restarting Tomcat causes space to be recovered and many segments to be
deleted, leaving just 3 of the original 33. Issuing another optimize at
that point causes Solr to run for a while and then show no further
activity (CPU, memory consumption) in jconsole. The 3 segments do not
merge into one.
% df -h .
Filesystem                          Size  Used Avail Use% Mounted on
/dev/mapper/internal-solr--build--2 406G  402G   30M 100% /l/solrs/build-2
Also suspicious: Size 406G vs. Used 402G vs. Avail 30M; used plus
available accounts for roughly 4GB less than the size.
Yesterday, after our 2nd try, lsof showed several "(deleted)" files that
were still open and apparently consuming space; the four entries below
sum to almost 178GB:
jsvc 8381 tomcat 377u REG 253,6  13369098240 1982471 /l/solrs/build-2/data/index/_1j37.tis (deleted)
jsvc 8381 tomcat 378u REG 253,6    184778752 1982472 /l/solrs/build-2/data/index/_1j37.tii (deleted)
jsvc 8381 tomcat 379u REG 253,6  34053685248 1982473 /l/solrs/build-2/data/index/_1j37.frq (deleted)
jsvc 8381 tomcat 380u REG 253,6 130411978752 1982474 /l/solrs/build-2/data/index/_1j37.prx (deleted)
That theory didn't hold up, though: the error log showed that Solr was
still trying to merge into the _1j37 segment files (the ones listed as
deleted in the lsof output above) when it ran out of space, so those
files are a symptom, not a cause, of the lost space:
SEVERE: java.io.IOException: background merge hit exception:
_ojl:C151080 _169w:C141302 _1j36:C80405 _1j35:C2043 _1j34:C192 into
_1j37 [optimize]
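For anyone chasing a similar df/du gap: the space pinned by
deleted-but-open files can be totalled by summing lsof's SIZE column.
A sketch, not a polished tool; field 7 matches the lsof layout quoted
above, but the column position can vary with lsof version and options:

```shell
# In practice, against the live process:  lsof -p 8381 | awk '/\(deleted\)/ {sum += $7} END {print sum}'
# Demo below runs the same awk over two of the lines quoted above.
sample='jsvc 8381 tomcat 377u REG 253,6 13369098240 1982471 /l/solrs/build-2/data/index/_1j37.tis (deleted)
jsvc 8381 tomcat 378u REG 253,6 184778752 1982472 /l/solrs/build-2/data/index/_1j37.tii (deleted)'
# Sum field 7 (SIZE in bytes) of every "(deleted)" entry.
printf '%s\n' "$sample" | awk '/\(deleted\)/ {sum += $7} END {printf "%.0f\n", sum}'
```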
We restored the pre-optimized index again, restarted Tomcat, and tried to
optimize using SerialMergeScheduler instead of the default
ConcurrentMergeScheduler, under the theory that concurrent merges could
somehow take more than 2x disk space.
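For reference, the scheduler swap is done in solrconfig.xml; a sketch,
assuming Solr 1.4's <indexDefaults> element and the Lucene 2.9 class
names:

```xml
<!-- solrconfig.xml: sketch of the merge scheduler swap (Solr 1.4) -->
<indexDefaults>
  <!-- default is org.apache.lucene.index.ConcurrentMergeScheduler -->
  <mergeScheduler>org.apache.lucene.index.SerialMergeScheduler</mergeScheduler>
</indexDefaults>
```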
The optimize failed again with an "out of space" error. This time there
were no "(deleted)" files in the lsof output.
This is one shard out of 10. A couple of the shards were around 192GB and
merged successfully. Any suggestions on how to debug this would be
greatly appreciated.
Thanks!
Phil
hathitrust.org
University of Michigan
Shalin Shekhar Mangar wrote:
Not sure but a quick search turned up:
http://www.walkernews.net/2007/07/13/df-and-du-command-show-different-used-disk-space/
Using up to 2x the index size can happen. Also check whether there is a
snapshooter script running through cron that is making hard links to files
while a merge is in progress.
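To check for that, files with a link count above 1 can be listed with
find. The real check would just be find over the index directory; the
demo below is self-contained, with made-up file names:

```shell
# Real check:  find /l/solrs/build-2/data/index -type f -links +1
# Demo: a hard link keeps the inode (and its disk blocks) alive even if
# one of the names is later unlinked, e.g. by a merge.
tmp=$(mktemp -d)
echo "segment data" > "$tmp/_1j37.tis"
ln "$tmp/_1j37.tis" "$tmp/snapshot-20091006.tis"   # as a snapshot script might do
# Both names share one inode with link count 2, so find reports both.
linked=$(( $(find "$tmp" -type f -links +1 | wc -l) ))
echo "$linked"
rm -rf "$tmp"
```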
Do let us know if you make any progress. This is interesting.
On Tue, Oct 6, 2009 at 5:28 PM, Phillip Farber <pfar...@umich.edu> wrote:
I am attempting to optimize a large shard on Solr 1.4 and repeatedly get
java.io.IOException: No space left on device. The shard, after a final
commit before the optimize, shows a size of about 192GB on a 400GB volume.
I have successfully optimized 2 other similarly large shards on identical
hardware without this problem.
Before the optimize I see:
% df -B1 .
Filesystem                             1B-blocks         Used    Available Use% Mounted on
/dev/mapper/internal-solr--build--2 435440427008 205681356800 225335255040  48% /l/solrs/build-2

slurm-4:/l/solrs/build-2/data/index % du -B1
205441486848    .
There's a slight discrepancy between du and df, which appears to be
orphaned inodes. But du says there should be enough space to handle the
doubling in size during optimization. However, for the second time we ran
out of space; at that point du and df were wildly different and the
volume was at 100%:
% df -B1 .
Filesystem                             1B-blocks         Used Available Use% Mounted on
/dev/mapper/internal-solr--build--2 435440427008 430985760768  30851072 100% /l/solrs/build-2

slurm-4:/l/solrs/build-2/data/index % du -B1
252552298496    .
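Subtracting the du total from df's Used figure gives the space that
seems to be held by unlinked-but-open inodes; with the numbers above:

```shell
# Post-failure figures from the df/du output above (bytes).
df_used=430985760768     # df "Used" after the failed optimize
du_total=252552298496    # du total for the index directory
# The difference is space counted by the filesystem but invisible to du,
# i.e. roughly what orphaned inodes are pinning (~178GB here).
echo $(( df_used - du_total ))
```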
At this point it appears orphaned inodes are consuming space and not being
freed up. Any clue as to whether this is a Lucene bug, a Solr bug, or some
other problem? Error traces follow.
Thanks!
Phil
---
Oct 6, 2009 2:12:37 AM org.apache.solr.update.processor.LogUpdateProcessor finish
INFO: {} 0 9110523
Oct 6, 2009 2:12:37 AM org.apache.solr.common.SolrException log
SEVERE: java.io.IOException: background merge hit exception: _ojl:C151080 _169w:C141302 _1j36:C80405 _1j35:C2043 _1j34:C192 into _1j37 [optimize]
        at org.apache.lucene.index.IndexWriter.optimize(IndexWriter.java:2737)
        at org.apache.lucene.index.IndexWriter.optimize(IndexWriter.java:2658)
        at org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:401)
        at org.apache.solr.update.processor.RunUpdateProcessor.processCommit(RunUpdateProcessorFactory.java:85)
        at org.apache.solr.handler.XMLLoader.processUpdate(XMLLoader.java:168)
        at org.apache.solr.handler.XMLLoader.load(XMLLoader.java:69)
        at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:54)
        at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
        at org.apache.solr.core.SolrCore.execute(SolrCore.java:1299)
        at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
        at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241)
        at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:215)
        at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:188)
        at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:213)
        at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:172)
        at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:548)
        at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
        at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:117)
        at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:108)
        at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:174)
        at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:875)
        at org.apache.coyote.http11.Http11BaseProtocol$Http11ConnectionHandler.processConnection(Http11BaseProtocol.java:665)
        at org.apache.tomcat.util.net.PoolTcpEndpoint.processSocket(PoolTcpEndpoint.java:528)
        at org.apache.tomcat.util.net.LeaderFollowerWorkerThread.runIt(LeaderFollowerWorkerThread.java:81)
        at org.apache.tomcat.util.threads.ThreadPool$ControlRunnable.run(ThreadPool.java:689)
        at java.lang.Thread.run(Thread.java:619)
Caused by: java.io.IOException: No space left on device
        at java.io.RandomAccessFile.writeBytes(Native Method)
        at java.io.RandomAccessFile.write(RandomAccessFile.java:466)
        at org.apache.lucene.store.FSDirectory$FSIndexOutput.flushBuffer(FSDirectory.java:719)
        at org.apache.lucene.store.BufferedIndexOutput.flushBuffer(BufferedIndexOutput.java:96)
        at org.apache.lucene.store.BufferedIndexOutput.flush(BufferedIndexOutput.java:85)
        at org.apache.lucene.store.BufferedIndexOutput.seek(BufferedIndexOutput.java:124)
        at org.apache.lucene.store.FSDirectory$FSIndexOutput.seek(FSDirectory.java:744)
        at org.apache.lucene.index.TermInfosWriter.close(TermInfosWriter.java:220)
        at org.apache.lucene.index.FormatPostingsFieldsWriter.finish(FormatPostingsFieldsWriter.java:70)
        at org.apache.lucene.index.SegmentMerger.mergeTerms(SegmentMerger.java:493)
        at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:140)
        at org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:4838)
        at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:4429)
        at org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:235)
        at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:291)
Oct 6, 2009 2:12:37 AM org.apache.solr.core.SolrCore execute
INFO: [] webapp=/build-2 path=/update params={} status=500 QTime=9110523