Re: indexWriter.addIndexes, Disk space, and open files
On Mon, Jun 7, 2010 at 7:19 AM, Regan Heath regan.he...@bridgeheadsoftware.com wrote:

>>> That's pretty much exactly what I suspected was happening. I've had the same problem myself on another occasion... out of interest, is there any way to force the file closed without flushing?
>>
>> No, IndexOutput has no such method. We could consider adding one...
>
> That sounds useful in general. In our case what we actually want is to abort the merge and delete all the new files created.

This is in fact what Lucene will do if the disk full is hit during a merge. If the disk full is hit during flush(), Lucene discards the docs that were in the RAM buffer.

> But then, our usage may be slightly unusual in that we merge an existing 'master' index and a number of 'temporary' indices into a new master index. On success we delete the old master and rename the new master into its place.
>
> We're doing disk space checks prior to merge, based on the docs here:
> http://lucene.apache.org/java/2_3_2/api/org/apache/lucene/index/IndexWriter.html#optimize()
> but I disabled these to test this out-of-disk-space case, as it is possible something else could use up the required space during the merge.
>
>>> From memory I tried everything I could think of at the time but couldn't manage it. Best I could do was catch and swallow the expected exception from close and carry on.
>>
>> I think that's the best to do w/ today's API; but you should save the first IOE you hit, then force close the remaining files, then throw that IOE.
>
> When you say 'force' close, do you just mean wrapping the close calls in try/catch(IOException) where the catch block is empty (swallows the exception)? Or is there a specific call to force a file closed?

The former. There is no method.

>>> So, the only option for us is to upgrade the version of lucene we're using to the current trunk? Is there no existing stable release version containing the fix? If not, when do you estimate the next stable release with the fix will be available?
>> I don't think any release of Lucene will have fixed all of these cases, yet. Patches welcome :)
>
> I would if I had the time, or sufficient understanding of the existing code; sadly I've only looked at it for 5 mins. :(

Start small then iterate... add that static method somewhere and call it from a place or two :)

>> Actually, the best fix is something Earwin created but is not yet committed (nor in a patch yet, I think), which adds a nice API for closing multiple IndexOutputs safely. Earwin, maybe you could pull out just this part of your patch and open a separate issue? Then we can fix all places in Lucene that need to close multiple IndexOutputs to use this API.
>
> That sounds great.. I'm not sure if something like this is useful to you..
>
> public class Safe {
>     /**
>      * Safely closes any object that implements Closeable
>      *
>      * @param closeable The object to close
>      */
>     public static void close(Closeable closeable) {
>         try {
>             closeable.close();
>         } catch (Exception e) {
>             // ignore
>         } finally {
>             closeable = null;
>         }
>     }
> }
>
> We use this in catch and finally blocks where we do not want to raise an exception.

That looks great! I think Earwin's version took multiple Closeables and closed them all, which would be useful. We'd also want a way to close N Closeables but, if any exception is hit on closing any of them, throw that exception while still force-closing the remaining ones.

Mike

- To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
  For additional commands, e-mail: java-user-h...@lucene.apache.org
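For what it's worth, the "close N Closeables, but remember the first exception and rethrow it after force-closing the rest" behaviour described above can be sketched as a small standalone utility. The class and method names below are invented for illustration; this is not the API from Earwin's patch, just one plausible shape for it:

```java
import java.io.Closeable;
import java.io.IOException;

public class IOUtils {
    /**
     * Closes all the given resources. Every close() is attempted even if an
     * earlier one throws; the first IOException encountered is rethrown once
     * all closes have been tried, so the original cause is preserved.
     */
    public static void closeAll(Closeable... resources) throws IOException {
        IOException first = null;
        for (Closeable c : resources) {
            if (c == null) continue;  // tolerate resources that were never opened
            try {
                c.close();
            } catch (IOException e) {
                if (first == null) first = e;  // keep the original failure
            }
        }
        if (first != null) throw first;
    }
}
```

A caller could then replace a cascade of closes in a finally clause with a single `closeAll(freqOutput, proxOutput, termInfosWriter)` call, so one failed flush can no longer leave the other files open.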
Re: indexWriter.addIndexes, Disk space, and open files
If you don't want to use the ImDisk software, a small flash drive will do just as well...

Regan Heath wrote:
> Windows XP. The problem occurs on the local file system, but to replicate it more easily I am using http://www.ltr-data.se/opencode.html#ImDisk to mount a virtual 10mb disk on F:\. It is formatted as an NTFS file system.
>
> The files can be removed normally (delete from explorer or command prompt) after program shut down. In fact, the program cleans them up itself on restart (an interim solution). Process Explorer shows the program has handles to these three files open.
>
> Erick Erickson wrote:
>> What op system and what file system are you using? Is the file system local or networked? What does it take to remove the files. That is, can you do it manually after the program shuts down?
>>
>> Best
>> Erick

--
View this message in context: http://lucene.472066.n3.nabble.com/indexWriter-addIndexes-Disk-space-and-open-files-tp841735p875713.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.
Re: indexWriter.addIndexes, Disk space, and open files
This is a bug in how Lucene handles IOException while closing files. Look at SegmentMerger's sources for 2.3.2:

https://svn.apache.org/repos/asf/lucene/java/tags/lucene_2_3_2/src/java/org/apache/lucene/index/SegmentMerger.java

Look at the finally clause in mergeTerms:

    } finally {
      if (freqOutput != null) freqOutput.close();
      if (proxOutput != null) proxOutput.close();
      if (termInfosWriter != null) termInfosWriter.close();
      if (queue != null) queue.close();
    }

You are hitting an exception in that freqOutput.close(), which means the proxOutput (*.prx) and termInfosWriter (*.tii, *.tis) are not successfully closed.

It looks like the bug is still present to some degree through 3.x, but fixed (at least specifically for segment merging, though likely not in other places) in trunk.

Likely what happened is you hit a disk full inside the try part, so the finally clause went to close the files, but close then tries to flush the pending buffer, which also hits disk full.

Mike

On Mon, Jun 7, 2010 at 4:52 AM, Regan Heath regan.he...@bridgeheadsoftware.com wrote:

> If you don't want to use the ImDisk software, a small flash drive will do just as well...
>
> Regan Heath wrote:
>> Windows XP. The problem occurs on the local file system, but to replicate it more easily I am using http://www.ltr-data.se/opencode.html#ImDisk to mount a virtual 10mb disk on F:\. It is formatted as an NTFS file system.
>>
>> The files can be removed normally (delete from explorer or command prompt) after program shut down. In fact, the program cleans them up itself on restart (an interim solution). Process Explorer shows the program has handles to these three files open.
>>
>> Erick Erickson wrote:
>>> What op system and what file system are you using? Is the file system local or networked? What does it take to remove the files. That is, can you do it manually after the program shuts down?
>>> Best
>>> Erick
>
> --
> View this message in context: http://lucene.472066.n3.nabble.com/indexWriter-addIndexes-Disk-space-and-open-files-tp841735p875713.html
> Sent from the Lucene - Java Users mailing list archive at Nabble.com.
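The failure mode in that finally clause can be reproduced without Lucene at all. Below is a minimal sketch with mock objects (TrackedOutput and the class around it are invented stand-ins, not Lucene code): once the first close() throws, the remaining closes never execute and those file handles leak:

```java
import java.io.Closeable;
import java.io.IOException;

public class CascadingCloseDemo {
    /** Stand-in for a buffered IndexOutput whose close() hits "disk full" on flush. */
    static class TrackedOutput implements Closeable {
        final boolean failOnClose;
        boolean closed = false;
        TrackedOutput(boolean failOnClose) { this.failOnClose = failOnClose; }
        @Override public void close() throws IOException {
            if (failOnClose) throw new IOException("There is not enough space on the disk");
            closed = true;
        }
    }

    public static void main(String[] args) {
        TrackedOutput freqOutput = new TrackedOutput(true);        // close() will throw
        TrackedOutput proxOutput = new TrackedOutput(false);       // plays *.prx
        TrackedOutput termInfosWriter = new TrackedOutput(false);  // plays *.tii / *.tis
        try {
            // Same shape as the finally clause in SegmentMerger.mergeTerms (2.3.2):
            try {
                // ... merge work that ran out of disk space would be here ...
            } finally {
                if (freqOutput != null) freqOutput.close();            // throws, so...
                if (proxOutput != null) proxOutput.close();            // ...this and
                if (termInfosWriter != null) termInfosWriter.close();  // ...this never run
            }
        } catch (IOException e) {
            System.out.println("caught: " + e.getMessage());
        }
        // Both remain unclosed: these are the leaked handles seen in Process Explorer.
        System.out.println("prox closed? " + proxOutput.closed
                + ", termInfos closed? " + termInfosWriter.closed);
    }
}
```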
Re: indexWriter.addIndexes, Disk space, and open files
That's pretty much exactly what I suspected was happening. I've had the same problem myself on another occasion... out of interest, is there any way to force the file closed without flushing? From memory I tried everything I could think of at the time but couldn't manage it. Best I could do was catch and swallow the expected exception from close and carry on.

So, the only option for us is to upgrade the version of lucene we're using to the current trunk? Is there no existing stable release version containing the fix? If not, when do you estimate the next stable release with the fix will be available?

Thanks,
Regan

Michael McCandless-2 wrote:
> This is a bug in how Lucene handles IOException while closing files. Look at SegmentMerger's sources for 2.3.2:
>
> https://svn.apache.org/repos/asf/lucene/java/tags/lucene_2_3_2/src/java/org/apache/lucene/index/SegmentMerger.java
>
> Look at the finally clause in mergeTerms:
>
>     } finally {
>       if (freqOutput != null) freqOutput.close();
>       if (proxOutput != null) proxOutput.close();
>       if (termInfosWriter != null) termInfosWriter.close();
>       if (queue != null) queue.close();
>     }
>
> You are hitting an exception in that freqOutput.close(), which means the proxOutput (*.prx) and termInfosWriter (*.tii, *.tis) are not successfully closed.
>
> It looks like the bug is still present to some degree through 3.x, but fixed (at least specifically for segment merging, though likely not in other places) in trunk.
>
> Likely what happened is you hit a disk full inside the try part, so the finally clause went to close the files, but close then tries to flush the pending buffer, which also hits disk full.
>
> Mike
>
> On Mon, Jun 7, 2010 at 4:52 AM, Regan Heath regan.he...@bridgeheadsoftware.com wrote:
>> If you don't want to use the ImDisk software, a small flash drive will do just as well...
>>
>> Regan Heath wrote:
>>> Windows XP. The problem occurs on the local file system, but to replicate it more easily I am using http://www.ltr-data.se/opencode.html#ImDisk to mount a virtual 10mb disk on F:\.
>>> It is formatted as an NTFS file system.
>>>
>>> The files can be removed normally (delete from explorer or command prompt) after program shut down. In fact, the program cleans them up itself on restart (an interim solution). Process Explorer shows the program has handles to these three files open.
>>>
>>> Erick Erickson wrote:
>>>> What op system and what file system are you using? Is the file system local or networked? What does it take to remove the files. That is, can you do it manually after the program shuts down?
>>>>
>>>> Best
>>>> Erick

--
View this message in context: http://lucene.472066.n3.nabble.com/indexWriter-addIndexes-Disk-space-and-open-files-tp841735p875884.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.
Re: indexWriter.addIndexes, Disk space, and open files
On Mon, Jun 7, 2010 at 6:18 AM, Regan Heath regan.he...@bridgeheadsoftware.com wrote:

> That's pretty much exactly what I suspected was happening. I've had the same problem myself on another occasion... out of interest is there any way to force the file closed without flushing?

No, IndexOutput has no such method. We could consider adding one...

> From memory I tried everything I could think of at the time but couldn't manage it. Best I could do was catch and swallow the expected exception from close and carry on.

I think that's the best to do w/ today's API; but you should save the first IOE you hit, then force close the remaining files, then throw that IOE.

> So, the only option for us is to upgrade the version of lucene we're using to the current trunk? Is there no existing stable release version containing the fix? If not, when do you estimate the next stable release with the fix will be available?

I don't think any release of Lucene will have fixed all of these cases, yet. Patches welcome :)

Actually, the best fix is something Earwin created but is not yet committed (nor in a patch yet, I think), which adds a nice API for closing multiple IndexOutputs safely. Earwin, maybe you could pull out just this part of your patch and open a separate issue? Then we can fix all places in Lucene that need to close multiple IndexOutputs to use this API.

Mike
Re: indexWriter.addIndexes, Disk space, and open files
>> That's pretty much exactly what I suspected was happening. I've had the same problem myself on another occasion... out of interest is there any way to force the file closed without flushing?
>
> No, IndexOutput has no such method. We could consider adding one...

That sounds useful in general. In our case what we actually want is to abort the merge and delete all the new files created.

But then, our usage may be slightly unusual in that we merge an existing 'master' index and a number of 'temporary' indices into a new master index. On success we delete the old master and rename the new master into its place.

We're doing disk space checks prior to merge, based on the docs here:
http://lucene.apache.org/java/2_3_2/api/org/apache/lucene/index/IndexWriter.html#optimize()
but I disabled these to test this out-of-disk-space case, as it is possible something else could use up the required space during the merge.

>> From memory I tried everything I could think of at the time but couldn't manage it. Best I could do was catch and swallow the expected exception from close and carry on.
>
> I think that's the best to do w/ today's API; but you should save the first IOE you hit, then force close the remaining files, then throw that IOE.

When you say 'force' close, do you just mean wrapping the close calls in try/catch(IOException) where the catch block is empty (swallows the exception)? Or is there a specific call to force a file closed?

>> So, the only option for us is to upgrade the version of lucene we're using to the current trunk? Is there no existing stable release version containing the fix? If not, when do you estimate the next stable release with the fix will be available?
>
> I don't think any release of Lucene will have fixed all of these cases, yet. Patches welcome :)

I would if I had the time, or sufficient understanding of the existing code; sadly I've only looked at it for 5 mins. :(

> Actually, the best fix is something Earwin created but is not yet committed (nor in a patch yet, I think), which adds a nice API for closing multiple IndexOutputs safely. Earwin, maybe you could pull out just this part of your patch and open a separate issue? Then we can fix all places in Lucene that need to close multiple IndexOutputs to use this API.

That sounds great.. I'm not sure if something like this is useful to you..

public class Safe {
    /**
     * Safely closes any object that implements Closeable
     *
     * @param closeable The object to close
     */
    public static void close(Closeable closeable) {
        try {
            closeable.close();
        } catch (Exception e) {
            // ignore
        } finally {
            closeable = null;
        }
    }
}

We use this in catch and finally blocks where we do not want to raise an exception.

--
View this message in context: http://lucene.472066.n3.nabble.com/indexWriter-addIndexes-Disk-space-and-open-files-tp841735p876022.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.
Re: indexWriter.addIndexes, Disk space, and open files
What op system and what file system are you using? Is the file system local or networked? What does it take to remove the files? That is, can you do it manually after the program shuts down?

Best
Erick

On Tue, May 25, 2010 at 5:42 AM, Regan Heath regan.he...@bridgeheadsoftware.com wrote:

> Hi,
>
> Apologies if this is the wrong place to post this, or if it has been answered somewhere (I have searched and failed to find anything matching my case exactly).
>
> We're using Lucene 2.3.2 (an old version, I know). We have a system where we use a number of master indexes and a number of temp indexes. At some point we will decide to perform a merge, where we select a single master index and all (or as many temp indexes as will fit, based on a given max size) temp indexes, opening IndexReaders for each.
>
> // code condensed for clarity
> for (String tempIndexPath : tempIndexPaths) {
>     reader = IndexReader.open(tempIndexPath);
>     readers.add(reader);
> }
> IndexReader[] result = new IndexReader[readers.size()];
> return readers.toArray(result);
>
> We then create a new index with IndexWriter and call addIndexes, passing the array of IndexReaders.
>
> // code condensed for clarity
> File mergeMasterIndex = ...
> ...
> indexWriter = new IndexWriter(mergeMasterIndex, new StandardAnalyzer(), true);
> indexWriter.setMaxBufferedDocs(-1);
> indexWriter.setMaxMergeDocs(2147483647);
> indexWriter.setMergeFactor(10);
> indexWriter.setMaxFieldLength(1);
> indexWriter.addIndexes(indexReaders);
> indexWriter.optimize();
>
> This throws an IOException, due to lack of disk space (testing with small indexes and a virtual 10mb disk.. http://www.ltr-data.se/opencode.html#ImDisk)
>
> At this point we close all the readers, and the writer, and attempt to clean up/delete the 'failed' new index directory and files. The problem is that there are some files being held open, specifically _0.prx, _0.tii, and _0.tis. There are no other readers or searchers open to this 'new' index, as it has just been created.
> There are no readers/searchers open to the temp indexes being merged (we never search temp indexes); there may be a searcher open to the master index selected for the merge.
>
> So.. I am hoping someone can give me a clue as to why there are files being held open, whether this is a known bug fixed in a specific version of lucene, or if there is something I can do to force these files closed.
>
> I have tried the writer.close(); IndexReader.isLocked(directory); IndexReader.unlock(directory); trick; isLocked returns false, and even ignoring that and calling unlock anyway made no difference (I think it threw an AlreadyClosedException or similar).
>
> The exception stack trace...
>
> java.io.IOException: There is not enough space on the disk
>     at java.io.RandomAccessFile.writeBytes(Native Method)
>     at java.io.RandomAccessFile.write(Unknown Source)
>     at org.apache.lucene.store.FSDirectory$FSIndexOutput.flushBuffer(FSDirectory.java:599)
>     at org.apache.lucene.store.BufferedIndexOutput.flushBuffer(BufferedIndexOutput.java:96)
>     at org.apache.lucene.store.BufferedIndexOutput.flush(BufferedIndexOutput.java:85)
>     at org.apache.lucene.store.BufferedIndexOutput.close(BufferedIndexOutput.java:109)
>     at org.apache.lucene.store.FSDirectory$FSIndexOutput.close(FSDirectory.java:606)
>     at org.apache.lucene.index.SegmentMerger.mergeTerms(SegmentMerger.java:398)
>     at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:134)
>     at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:110)
>     at org.apache.lucene.index.IndexWriter.addIndexes(IndexWriter.java:2428)
>     at com.bridgehead.index.ApplicationIndex.merge(ApplicationIndex.java:506)
>     at com.bridgehead.index.ServerThread.serviceMergeIndex(ServerThread.java:918)
>     at com.bridgehead.index.ServerThread.run(ServerThread.java:266)
>
> Info stream..
> IFD [ServerThread:/10.193.221.75:3821]: setInfoStream deletionpolicy=org.apache.lucene.index.keeponlylastcommitdeletionpol...@133796
> IW 0 [ServerThread:/10.193.221.75:3821]: setInfoStream: dir=org.apache.lucene.store.fsdirect...@f:\Master\14.merge autoCommit=true mergepolicy=org.apache.lucene.index.logbytesizemergepol...@1a679b7 mergescheduler=org.apache.lucene.index.concurrentmergeschedu...@80f4cb ramBufferSizeMB=16.0 maxBuffereDocs=-1 maxBuffereDeleteTerms=-1 maxFieldLength=1 index=
> IW 0 [ServerThread:/10.193.221.75:3821]: optimize: index now
> IW 0 [ServerThread:/10.193.221.75:3821]: flush: segment=null docStoreSegment=null docStoreOffset=0 flushDocs=false flushDeletes=false flushDocStores=false numDocs=0 numBufDelTerms=0
> IW 0 [ServerThread:/10.193.221.75:3821]: index before flush
> IW 0 [ServerThread:/10.193.221.75:3821]: CMS: now merge
> IW 0 [ServerThread:/10.193.221.75:3821]: CMS: index:
> IW 0 [ServerThread:/10.193.221.75:3821]: CMS: no more merges pending; now return
> IW 0 [ServerThread:/10.193.221.75:3821]: now
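The cleanup step described in the quoted message (close everything, then delete the failed merge directory) can be sketched as two small helpers. These are invented for illustration, not Lucene API or the poster's actual code; the idea is to attempt every close independently, swallowing the secondary disk-full exceptions, then report which files the OS still refuses to delete (those are the leaked handles):

```java
import java.io.Closeable;
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

public class MergeCleanup {
    /** Closes each resource in its own try/catch so one failed close cannot skip the rest. */
    public static void closeQuietly(Closeable... resources) {
        for (Closeable c : resources) {
            if (c == null) continue;
            try {
                c.close();
            } catch (IOException e) {
                // best effort: a disk-full on flush during close is expected here
            }
        }
    }

    /**
     * Deletes the contents of a flat directory (Lucene index dirs have no
     * subdirectories) and then the directory itself. Returns whatever could
     * not be deleted; on Windows these are typically still-open files.
     */
    public static List<File> deleteIndexDir(File dir) {
        List<File> leftovers = new ArrayList<File>();
        File[] files = dir.listFiles();
        if (files != null) {
            for (File f : files) {
                if (!f.delete()) leftovers.add(f);
            }
        }
        if (!dir.delete()) leftovers.add(dir);
        return leftovers;
    }
}
```

After a failed addIndexes, the caller would closeQuietly the writer and all readers first, then call deleteIndexDir on the new master directory and log any leftovers instead of failing.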
Re: indexWriter.addIndexes, Disk space, and open files
Windows XP. The problem occurs on the local file system, but to replicate it more easily I am using http://www.ltr-data.se/opencode.html#ImDisk to mount a virtual 10mb disk on F:\. It is formatted as an NTFS file system.

The files can be removed normally (delete from explorer or command prompt) after program shut down. In fact, the program cleans them up itself on restart (an interim solution). Process Explorer shows the program has handles to these three files open.

Erick Erickson wrote:
> What op system and what file system are you using? Is the file system local or networked? What does it take to remove the files. That is, can you do it manually after the program shuts down?
>
> Best
> Erick
>
> On Tue, May 25, 2010 at 5:42 AM, Regan Heath regan.he...@bridgeheadsoftware.com wrote:
>> Hi,
>>
>> Apologies if this is the wrong place to post this, or if it has been answered somewhere (I have searched and failed to find anything matching my case exactly).
>>
>> We're using Lucene 2.3.2 (an old version, I know). We have a system where we use a number of master indexes and a number of temp indexes. At some point we will decide to perform a merge, where we select a single master index and all (or as many temp indexes as will fit, based on a given max size) temp indexes, opening IndexReaders for each.
>>
>> // code condensed for clarity
>> for (String tempIndexPath : tempIndexPaths) {
>>     reader = IndexReader.open(tempIndexPath);
>>     readers.add(reader);
>> }
>> IndexReader[] result = new IndexReader[readers.size()];
>> return readers.toArray(result);
>>
>> We then create a new index with IndexWriter and call addIndexes, passing the array of IndexReaders.
>>
>> // code condensed for clarity
>> File mergeMasterIndex = ...
>> ...
>> indexWriter = new IndexWriter(mergeMasterIndex, new StandardAnalyzer(), true);
>> indexWriter.setMaxBufferedDocs(-1);
>> indexWriter.setMaxMergeDocs(2147483647);
>> indexWriter.setMergeFactor(10);
>> indexWriter.setMaxFieldLength(1);
>> indexWriter.addIndexes(indexReaders);
>> indexWriter.optimize();
>>
>> This throws an IOException, due to lack of disk space (testing with small indexes and a virtual 10mb disk.. http://www.ltr-data.se/opencode.html#ImDisk)
>>
>> At this point we close all the readers, and the writer, and attempt to clean up/delete the 'failed' new index directory and files. The problem is that there are some files being held open, specifically _0.prx, _0.tii, and _0.tis. There are no other readers or searchers open to this 'new' index, as it has just been created.
>>
>> There are no readers/searchers open to the temp indexes being merged (we never search temp indexes); there may be a searcher open to the master index selected for the merge.
>>
>> So.. I am hoping someone can give me a clue as to why there are files being held open, whether this is a known bug fixed in a specific version of lucene, or if there is something I can do to force these files closed.
>>
>> I have tried the writer.close(); IndexReader.isLocked(directory); IndexReader.unlock(directory); trick; isLocked returns false, and even ignoring that and calling unlock anyway made no difference (I think it threw an AlreadyClosedException or similar).
>>
>> The exception stack trace...
>> java.io.IOException: There is not enough space on the disk
>>     at java.io.RandomAccessFile.writeBytes(Native Method)
>>     at java.io.RandomAccessFile.write(Unknown Source)
>>     at org.apache.lucene.store.FSDirectory$FSIndexOutput.flushBuffer(FSDirectory.java:599)
>>     at org.apache.lucene.store.BufferedIndexOutput.flushBuffer(BufferedIndexOutput.java:96)
>>     at org.apache.lucene.store.BufferedIndexOutput.flush(BufferedIndexOutput.java:85)
>>     at org.apache.lucene.store.BufferedIndexOutput.close(BufferedIndexOutput.java:109)
>>     at org.apache.lucene.store.FSDirectory$FSIndexOutput.close(FSDirectory.java:606)
>>     at org.apache.lucene.index.SegmentMerger.mergeTerms(SegmentMerger.java:398)
>>     at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:134)
>>     at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:110)
>>     at org.apache.lucene.index.IndexWriter.addIndexes(IndexWriter.java:2428)
>>     at com.bridgehead.index.ApplicationIndex.merge(ApplicationIndex.java:506)
>>     at com.bridgehead.index.ServerThread.serviceMergeIndex(ServerThread.java:918)
>>     at com.bridgehead.index.ServerThread.run(ServerThread.java:266)
>>
>> Info stream..
>>
>> IFD [ServerThread:/10.193.221.75:3821]: setInfoStream deletionpolicy=org.apache.lucene.index.keeponlylastcommitdeletionpol...@133796
>> IW 0