All,

We're puzzled as to why we're still unable to optimize a 192GB index on a 406GB LVM volume. We are not using Solr distribution (replication), so there is no snapshooter in the picture. We run out of disk space with df showing 100% used but du showing only 379GB of files.

Restarting tomcat recovers the space and deletes many segments, leaving just 3 of the original 33. Issuing another optimize at that point causes Solr to run for a while and then show no further activity (CPU or memory consumption) in jconsole. The 3 segments do not merge into one.
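When it goes quiet like that it may be worth taking a thread dump to see whether a merge thread is still alive; as far as I know ConcurrentMergeScheduler names its threads along the lines of "Lucene Merge Thread #N", so something like this (with the jsvc/tomcat pid) should show them:

% jstack <tomcat-pid> | grep -A 3 "Lucene Merge Thread"

If nothing turns up, the merge has presumably aborted rather than hung.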

% df -h .
Filesystem            Size  Used Avail Use% Mounted on
/dev/mapper/internal-solr--build--2
                      406G  402G   30M 100% /l/solrs/build-2

Also suspicious: with a 406G size and 402G used, df reports only 30M available rather than the ~4G you would expect.
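The gap between size and used+avail is often just the filesystem's reserved blocks (space held back for root, which df never reports as available). Assuming this is ext3, something like this would show how much is reserved:

% tune2fs -l /dev/mapper/internal-solr--build--2 | grep -i "reserved block"

That might explain the odd-looking avail figure, but not the 100% full condition itself.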

Yesterday, after our 2nd try, lsof showed several "deleted" files that were still open and apparently consuming almost 134GB of space.

jsvc 8381 tomcat 377u REG 253,6  13369098240 1982471 /l/solrs/build-2/data/index/_1j37.tis (deleted)
jsvc 8381 tomcat 378u REG 253,6    184778752 1982472 /l/solrs/build-2/data/index/_1j37.tii (deleted)
jsvc 8381 tomcat 379u REG 253,6  34053685248 1982473 /l/solrs/build-2/data/index/_1j37.frq (deleted)
jsvc 8381 tomcat 380u REG 253,6 130411978752 1982474 /l/solrs/build-2/data/index/_1j37.prx (deleted)
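To total up the space pinned by unlinked-but-still-open files, something along these lines should work (lsof's +L1 selects files whose link count has dropped to zero, and the size is the 7th column):

% lsof +L1 /l/solrs/build-2 | awk 'NR > 1 { sum += $7 } END { printf "%.1f GB held by deleted-but-open files\n", sum / 1e9 }'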

That theory did not hold up: the error log shows that Solr was merging into the very _1j37 segment files that appear as deleted in the lsof output above when it ran out of space, so those files are a symptom of the failed merge, not the cause of the lost space:

SEVERE: java.io.IOException: background merge hit exception: _ojl:C151080 _169w:C141302 _1j36:C80405 _1j35:C2043 _1j34:C192 into _1j37 [optimize]

We restored the pre-optimize index again, restarted tomcat, and tried the optimize using SerialMergeScheduler instead of the default ConcurrentMergeScheduler, on the theory that concurrent merges could somehow take more than 2X disk space.

The optimize failed again with an "out of space" error. This time there were no "deleted" files in the lsof output.

This is one shard out of 10. A couple of the other shards were also around 192GB and optimized successfully. Any suggestions on how to debug this would be greatly appreciated.
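For the next attempt, one thing we could do is log df and du once a minute while the optimize runs, e.g. (log path is just an example):

% while true; do date; df -B1 /l/solrs/build-2; du -sB1 /l/solrs/build-2/data/index; sleep 60; done >> /tmp/optimize-disk.log

That should at least show whether usage really peaks above 2X the index size.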

Thanks!

Phil
hathitrust.org
University of Michigan


Shalin Shekhar Mangar wrote:
Not sure but a quick search turned up:
http://www.walkernews.net/2007/07/13/df-and-du-command-show-different-used-disk-space/

Using up to 2x the index size during an optimize can happen. Also check whether there is a
snapshooter script running via cron that makes hard links to the index files
while a merge is in progress.
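Something along these lines would show both, assuming tomcat owns the cron jobs and the index lives where your df output says (adjust to your setup):

% crontab -l -u tomcat | grep -i snap
% find /l/solrs/build-2/data/index -type f -links +1 -ls

Any index file with a link count above 1 is being pinned by a hard link somewhere.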

Do let us know if you make any progress. This is interesting.

On Tue, Oct 6, 2009 at 5:28 PM, Phillip Farber <pfar...@umich.edu> wrote:

I am attempting to optimize a large shard on Solr 1.4 and repeatedly get
java.io.IOException: No space left on device. The shard, after a final
commit before the optimize, is about 192GB on a 400GB volume. I have
successfully optimized 2 other, similarly large shards on identical
hardware without hitting this problem.

Before the optimize I see:

% df -B1 .
Filesystem 1B-blocks Used Available Use% Mounted on
/dev/mapper/internal-solr--build--2
                     435440427008 205681356800 225335255040  48% /l/solrs/build-2

slurm-4:/l/solrs/build-2/data/index % du -B1
205441486848 .
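As a rough sanity check of the 2x-during-optimize rule of thumb (assuming the optimize needs roughly one extra full copy of the index):

    index size before optimize:   205,441,486,848 bytes (~205 GB)
    available before optimize:    225,335,255,040 bytes (~225 GB)
    extra copy needed (~1x more): ~205 GB  ->  should fit, with ~20 GB to spare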

There's a slight discrepancy between the du and the df, which appears to be
orphaned inodes. Still, the du says there should be enough space to handle the
doubling in size during optimization. However, for the second time we ran
out of space; at that point the du and df are wildly different and the
volume is at 100%:


% df -B1 .

Filesystem           1B-blocks      Used Available Use% Mounted on
/dev/mapper/internal-solr--build--2
                     435440427008 430985760768     30851072 100% /l/solrs/build-2

slurm-4:/l/solrs/build-2/data/index % du -B1
252552298496    .

At this point it appears orphaned inodes are consuming space and not being
freed up. Any clue as to whether this is a Lucene bug, a Solr bug, or some
other problem? Error traces follow.

Thanks!

Phil

---

Oct 6, 2009 2:12:37 AM org.apache.solr.update.processor.LogUpdateProcessor finish
INFO: {} 0 9110523
Oct 6, 2009 2:12:37 AM org.apache.solr.common.SolrException log
SEVERE: java.io.IOException: background merge hit exception: _ojl:C151080 _169w:C141302 _1j36:C80405 _1j35:C2043 _1j34:C192 into _1j37 [optimize]
  at org.apache.lucene.index.IndexWriter.optimize(IndexWriter.java:2737)
  at org.apache.lucene.index.IndexWriter.optimize(IndexWriter.java:2658)
  at org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:401)
  at org.apache.solr.update.processor.RunUpdateProcessor.processCommit(RunUpdateProcessorFactory.java:85)
  at org.apache.solr.handler.XMLLoader.processUpdate(XMLLoader.java:168)
  at org.apache.solr.handler.XMLLoader.load(XMLLoader.java:69)
  at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:54)
  at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
  at org.apache.solr.core.SolrCore.execute(SolrCore.java:1299)
  at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
  at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241)
  at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:215)
  at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:188)
  at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:213)
  at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:172)
  at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:548)
  at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
  at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:117)
  at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:108)
  at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:174)
  at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:875)
  at org.apache.coyote.http11.Http11BaseProtocol$Http11ConnectionHandler.processConnection(Http11BaseProtocol.java:665)
  at org.apache.tomcat.util.net.PoolTcpEndpoint.processSocket(PoolTcpEndpoint.java:528)
  at org.apache.tomcat.util.net.LeaderFollowerWorkerThread.runIt(LeaderFollowerWorkerThread.java:81)
  at org.apache.tomcat.util.threads.ThreadPool$ControlRunnable.run(ThreadPool.java:689)
  at java.lang.Thread.run(Thread.java:619)
Caused by: java.io.IOException: No space left on device
  at java.io.RandomAccessFile.writeBytes(Native Method)
  at java.io.RandomAccessFile.write(RandomAccessFile.java:466)
  at org.apache.lucene.store.FSDirectory$FSIndexOutput.flushBuffer(FSDirectory.java:719)
  at org.apache.lucene.store.BufferedIndexOutput.flushBuffer(BufferedIndexOutput.java:96)
  at org.apache.lucene.store.BufferedIndexOutput.flush(BufferedIndexOutput.java:85)
  at org.apache.lucene.store.BufferedIndexOutput.seek(BufferedIndexOutput.java:124)
  at org.apache.lucene.store.FSDirectory$FSIndexOutput.seek(FSDirectory.java:744)
  at org.apache.lucene.index.TermInfosWriter.close(TermInfosWriter.java:220)
  at org.apache.lucene.index.FormatPostingsFieldsWriter.finish(FormatPostingsFieldsWriter.java:70)
  at org.apache.lucene.index.SegmentMerger.mergeTerms(SegmentMerger.java:493)
  at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:140)
  at org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:4838)
  at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:4429)
  at org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:235)
  at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:291)

Oct 6, 2009 2:12:37 AM org.apache.solr.core.SolrCore execute
INFO: [] webapp=/build-2 path=/update params={} status=500 QTime=9110523




