NioFile cache performance
I finally got around to writing a testcase to verify the numbers I presented. The following testcase and results are for the lowest-level disk operations. On my machine, reading from the cache vs. going to disk (even when the data is in the OS cache) is 30%-40% faster. Since Lucene makes extensive use of disk IO and often reads the same data (e.g. reading the terms), a localized user-level cache can provide significant performance benefits.

Using a 4mb file (so I could "guarantee" the disk data would be in the OS cache as well), the test shows the following results. Most of the CPU time is actually used during the synchronization with multiple threads. I hacked together a version of MemoryLRUCache that used a ConcurrentHashMap from JDK 1.5, and it was another 50% faster! At a minimum, if the ReadWriteLock class was modified to use the 1.5 facilities, some significant additional performance gains should be realized.

filesize is 4194304
non-cached time = 10578, avg = 0.010578
non-cached threaded (3 threads) time = 32094, avg = 0.010698
cached time = 6125, avg = 0.006125, cache hits 996365, cache misses 3635
cached threaded (3 threads) time = 20734, avg = 0.0069116, cache hits 3989089, cache misses 10911

When using the shared test (which is more like the Lucene usage, since a single "file" is shared by multiple threads), the difference is even more dramatic with multiple threads (since the cache size is effectively reduced by the number of threads). This test also shows the value of using multiple file handles when using multiple threads to read a single file (rather than using a shared file handle).
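As a sanity check on the claim above, the speedup and hit rate implied by the reported figures can be computed directly. A small sketch (not part of the original testcase) using the single-threaded numbers:

```java
public class CacheStats {
    // Percentage speedup of the cached run over the non-cached run.
    static double speedupPercent(double nonCachedAvg, double cachedAvg) {
        return (nonCachedAvg - cachedAvg) / nonCachedAvg * 100.0;
    }

    // Cache hit rate as a fraction of all lookups.
    static double hitRate(long hits, long misses) {
        return (double) hits / (hits + misses);
    }

    public static void main(String[] args) {
        // Figures from the single-threaded run above.
        System.out.printf("speedup: %.1f%%%n", speedupPercent(0.010578, 0.006125)); // ~42%
        System.out.printf("hit rate: %.4f%n", hitRate(996365, 3635));               // ~0.9964
    }
}
```

The single-threaded speedup works out to roughly 42%, consistent with (slightly above) the "30%-40%" claim, and the hit rate is over 99.6%, which is why the cache pays off despite the lock overhead.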
filesize is 4194304
non-cached time = 10594, avg = 0.010594
non-cached threaded (3 threads) time = 42110, avg = 0.014036
cached time = 6047, avg = 0.006047, cache hits 996827, cache misses 3173
cached threaded (3 threads) time = 20079, avg = 0.006693, cache hits 3995776, cache misses 4224

package org.apache.lucene.util;

import java.io.*;
import java.util.Random;

import junit.framework.TestCase;

public class TestNioFilePerf extends TestCase {
    static final String FILENAME = "testfile.dat";
    static final int BLOCKSIZE = 2048;
    static final int NBLOCKS = 2048; // 4 mb file
    static final int NREADS = 50;
    static final int NTHREADS = 3;

    static {
        System.setProperty("org.apache.lucene.CachePercent", "90");
    }

    public void setUp() throws Exception {
        FileOutputStream f = new FileOutputStream(FILENAME);
        Random r = new Random();
        byte[] block = new byte[BLOCKSIZE];
        // the loop body was cut off in the archive; presumably it writes NBLOCKS random blocks
        for (int i = 0; i < NBLOCKS; i++) {
            r.nextBytes(block);
            f.write(block);
        }
        f.close();
    }
    // [remainder of the test class truncated in the archive]
}

package org.apache.lucene.util;

/**
 * A read/write lock. Allows unlimited simultaneous readers, or a single writer.
 * A thread with the "write" lock implicitly owns a read lock as well.
 */
public class ReadWriteLock {
    int readlocks = 0;
    int writelocks = 0;
    Thread writethread = null;

    public synchronized void readLock() {
        while (true) {
            if (writelocks == 0 || Thread.currentThread() == writethread) {
                readlocks++;
                return;
            }
            try {
                wait();
            } catch (InterruptedException e) {
            }
        }
    }

    public synchronized void readUnlock() {
        readlocks--;
        notifyAll();
    }

    public synchronized void writeLock() {
        while (true) {
            if (tryWriteLock())
                return;
            try {
                wait();
            } catch (InterruptedException e) {
            }
        }
    }

    /**
     * Try to get the write lock.
     *
     * @return true if the write lock could be acquired, else false
     */
    public synchronized boolean tryWriteLock() {
        if (readlocks == 0 && (writelocks == 0 || writethread == Thread.currentThread())) {
            writethread = Thread.currentThread();
            writelocks++;
            return true;
        }
        return false;
    }

    public synchronized void writeUnlock() {
        if (writelocks == 0)
            throw new IllegalStateException("caller does not own write lock");
        if (--writelocks == 0)
            writethread = null;
        notifyAll();
    }

    /**
     * Checks if the calling thread owns the write lock.
     *
     * @return true if the calling thread owns the write lock
     */
    public synchronized boolean ownsWriteLock() {
        return Thread.currentThread() == writethread;
    }
}

package org.apache.lucene.util;

import java.io.*;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;

/**
 * Wrapper for NIO FileChannel in order to circumvent problems with multiple threads reading the
 * same FileChannel, and to provide a local cache. The current Windows implementation of FileChannel
 * has some synchronization even when performing positioned reads. See JDK bug #6265734.
 *
 * The NioFile contains internal caching to red[remainder truncated in the archive]
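The "1.5 facilities" suggested above map directly onto java.util.concurrent.locks.ReentrantReadWriteLock. A minimal sketch of the same usage pattern (modern Java syntax; like the class above, the standard lock lets a write-lock holder also take the read lock, but it does not support upgrading a read lock to a write lock):

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class RWLockDemo {
    static final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    static int counter = 0;

    static void increment() {
        lock.writeLock().lock();   // exclusive: blocks readers and other writers
        try {
            counter++;
        } finally {
            lock.writeLock().unlock();
        }
    }

    static int read() {
        lock.readLock().lock();    // shared: any number of concurrent readers
        try {
            return counter;
        } finally {
            lock.readLock().unlock();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        Thread[] ts = new Thread[3];
        for (int i = 0; i < ts.length; i++) {
            ts[i] = new Thread(() -> { for (int j = 0; j < 1000; j++) increment(); });
            ts[i].start();
        }
        for (Thread t : ts) t.join();
        System.out.println(read()); // 3000
    }
}
```

The standard class avoids the single object monitor and notifyAll() wakeups of the hand-rolled version, which is where the measured contention comes from.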
[jira] Created: (LUCENE-480) NullPointerException during IndexWriter.mergeSegments
NullPointerException during IndexWriter.mergeSegments
-
Key: LUCENE-480
URL: http://issues.apache.org/jira/browse/LUCENE-480
Project: Lucene - Java
Type: Bug
Components: Index
Versions: CVS Nightly - Specify date in submission, 1.9
Environment: 64bit, ubuntu, Java 5 SE
Reporter: Jeremy Calvert

Last commit on culprit org.apache.lucene.index.FieldsReader: Sun Oct 30 05:38:46 2005.
-
Offending code in FieldsReader.java:

...
final Document doc(int n) throws IOException {
    indexStream.seek(n * 8L);
    long position = indexStream.readLong();
    fieldsStream.seek(position);
    Document doc = new Document();
    int numFields = fieldsStream.readVInt();
    for (int i = 0; i < numFields; i++) {
        int fieldNumber = fieldsStream.readVInt();
        FieldInfo fi = fieldInfos.fieldInfo(fieldNumber);
        //
        // This apparently returns null, presumably either as a result of:
        //   catch (IndexOutOfBoundsException ioobe) {
        //     return null;
        //   }
        // in fieldInfos.fieldInfo(int fieldNumber)
        // - or -
        // because there's a null member of member ArrayList byNumber of FieldInfos
        byte bits = fieldsStream.readByte();

        boolean compressed = (bits & FieldsWriter.FIELD_IS_COMPRESSED) != 0;

        Field.Store store = Field.Store.YES;
        //
        // Here --v is where the NPE is thrown.
        if (fi.isIndexed && tokenize)
            index = Field.Index.TOKENIZED;
...
-
Proposed Patch:
I'm not sure what the behavior should be in this case, but if it's no big deal that there's null field info for an index and we should just ignore that index, an obvious patch could be:

In FieldsReader.java:
...
    for (int i = 0; i < numFields; i++) {
        int fieldNumber = fieldsStream.readVInt();
        FieldInfo fi = fieldInfos.fieldInfo(fieldNumber);
        //vvvPatchvvv
        if (fi == null) { continue; }
        byte bits = fieldsStream.readByte();
...
-
Other observations:
In my search prior to submitting this issue, I found LUCENE-168, which looks similar, and is perhaps related, but if so, I'm not sure exactly how.

--
This message is automatically generated by JIRA.
- If you think it was sent incorrectly contact one of the administrators: http://issues.apache.org/jira/secure/Administrators.jspa - For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
RE: NioFile cache performance
As a follow-up... The real performance benefit comes in a shared server environment, where the Lucene process runs alongside other processes - i.e. competes for the use of the OS file cache. Since the Lucene process can be configured with a dedicated memory pool, using facilities like NioFile allows for a large dedicated application cache - similar to how databases buffer data/index blocks and don't rely on the OS to do so. If the Lucene process (we wrap Lucene in a server "process") is the "only" process on the server, the OS cache will likely perform well enough for most applications. I will attempt to get some performance numbers using/not using NioFile performing actual Lucene queries.
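The MemoryLRUCache at the heart of this thread is not shown in the archive. One plausible sketch of such a cache, using a LinkedHashMap in access order (the class name, capacity parameter, and eviction details are assumptions, not the actual Lucene code), also makes visible the single coarse lock whose contention the ConcurrentHashMap experiment removes:

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class MemoryLRUCacheSketch<K, V> {
    private final Map<K, V> map;
    private long hits = 0, misses = 0;

    public MemoryLRUCacheSketch(final int capacity) {
        // accessOrder=true makes LinkedHashMap track access order,
        // so the eldest entry is the least recently used one.
        map = new LinkedHashMap<K, V>(16, 0.75f, true) {
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                return size() > capacity; // evict LRU entry past capacity
            }
        };
    }

    // Every operation funnels through one monitor - this is the
    // synchronization cost the benchmark numbers above are measuring.
    public synchronized V get(K key) {
        V v = map.get(key);
        if (v == null) misses++; else hits++;
        return v;
    }

    public synchronized void put(K key, V value) {
        map.put(key, value);
    }

    public synchronized long hits()   { return hits; }
    public synchronized long misses() { return misses; }
}
```

Replacing the synchronized wrapper with a ConcurrentHashMap trades the LRU eviction guarantee for lock-striped access, which is where the reported extra 50% came from.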
[jira] Commented: (LUCENE-480) NullPointerException during IndexWriter.mergeSegments
[ http://issues.apache.org/jira/browse/LUCENE-480?page=comments#action_12359750 ]

Yonik Seeley commented on LUCENE-480:
-

Is it possible to reproduce this in a testcase you can add here? FieldInfo should never be null AFAIK, so I'd rather get to the root cause of the problem than cover it up.
[jira] Commented: (LUCENE-480) NullPointerException during IndexWriter.mergeSegments
[ http://issues.apache.org/jira/browse/LUCENE-480?page=comments#action_12359752 ]

Jeremy Calvert commented on LUCENE-480:
-

Sure, let me try to put that together.
[jira] Commented: (LUCENE-480) NullPointerException during IndexWriter.mergeSegments
[ http://issues.apache.org/jira/browse/LUCENE-480?page=comments#action_12359755 ]

Jeremy Calvert commented on LUCENE-480:
-

A little more data:

    int fieldNumber = fieldsStream.readVInt();

on line 68 of FieldsReader.java results in fieldNumber = 221997 for my particular fieldsStream, so it would seem that my proposed patch would indeed just gloss over a larger problem wherein the fieldsStream is getting corrupted. On the other hand, having this cause an NPE seems less than ideal. Is there some way to throw an exception that's more indicative of the stream corruption?

In any case, I'm tracing back how this happened in the first place. I would simply give you the code and data to reproduce it, but the data is ~500M worth. Stay tuned!
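An exception "more indicative of the stream corruption", as asked above, could replace the silent null before it is dereferenced. A hedged sketch (the helper, the stand-in field table, and the choice of IOException are illustrative only - this is not the actual Lucene fix):

```java
import java.io.IOException;

public class FieldStreamCheck {
    // Illustrative stand-in for FieldInfos.fieldInfo(int),
    // which returns null for an out-of-range field number.
    static String fieldInfo(String[] byNumber, int fieldNumber) {
        return (fieldNumber >= 0 && fieldNumber < byNumber.length)
                ? byNumber[fieldNumber] : null;
    }

    // Fail fast with a descriptive error instead of letting a
    // later dereference surface as a bare NullPointerException.
    static String requireFieldInfo(String[] byNumber, int fieldNumber) throws IOException {
        String fi = fieldInfo(byNumber, fieldNumber);
        if (fi == null)
            throw new IOException("invalid field number " + fieldNumber
                    + " read from fields stream (index may be corrupt)");
        return fi;
    }
}
```

With a check like this, a bogus value such as the 221997 seen above produces an error that points at stream corruption rather than at an apparently unrelated line of FieldsReader.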
Re: NioFile cache performance
> Most of the CPU time is actually used during the synchronization with
> multiple threads. I hacked together a version of MemoryLRUCache that used a
> ConcurrentHashMap from JDK 1.5, and it was another 50% faster! At a minimum,
> if the ReadWriteLock class was modified to use the 1.5 facilities some
> significant additional performance gains should be realized.

Would you be able to run the same test in JDK 1.4 but use the util.concurrent compatibility pack? (supposedly the same classes as in Java 5) It would be nice to verify whether the gain is the result of the different ConcurrentHashMap vs. the different JDK itself.

Paul Smith
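The comparison Paul proposes can be sketched as a micro-benchmark that varies only the map implementation, holding everything else constant (timings are illustrative and machine-dependent; this isolates the map choice, not JDK differences, which need the separate runs shown later in the thread):

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Random;
import java.util.concurrent.ConcurrentHashMap;

public class MapReadBench {
    // Times nReads random get() calls from nThreads threads against the given map.
    static long time(final Map<Integer, Integer> map, int nThreads, final int nReads)
            throws InterruptedException {
        for (int i = 0; i < 1024; i++) map.put(i, i); // pre-populate
        Thread[] ts = new Thread[nThreads];
        long start = System.nanoTime();
        for (int t = 0; t < nThreads; t++) {
            ts[t] = new Thread(() -> {
                Random r = new Random(42);
                for (int i = 0; i < nReads; i++) map.get(r.nextInt(1024));
            });
            ts[t].start();
        }
        for (Thread t : ts) t.join();
        return System.nanoTime() - start;
    }

    public static void main(String[] args) throws InterruptedException {
        // synchronizedMap: one monitor, readers serialize.
        long sync = time(Collections.synchronizedMap(new HashMap<>()), 3, 1_000_000);
        // ConcurrentHashMap: reads usually proceed without locking.
        long conc = time(new ConcurrentHashMap<>(), 3, 1_000_000);
        System.out.println("synchronizedMap:   " + sync / 1_000_000 + " ms");
        System.out.println("ConcurrentHashMap: " + conc / 1_000_000 + " ms");
    }
}
```

Running both maps inside the same JVM answers the "map vs. JDK" question for the map half; the JDK half still requires the 1.4/1.5 runs Robert reports below.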
[jira] Resolved: (LUCENE-479) MultiReader.numDocs incorrect after undeleteAll
[ http://issues.apache.org/jira/browse/LUCENE-479?page=all ]

Doug Cutting resolved LUCENE-479:
-
Fix Version: 1.9
Resolution: Fixed

I committed this. Thanks!

> MultiReader.numDocs incorrect after undeleteAll
> ---
>
> Key: LUCENE-479
> URL: http://issues.apache.org/jira/browse/LUCENE-479
> Project: Lucene - Java
> Type: Bug
> Components: Index
> Versions: CVS Nightly - Specify date in submission
> Reporter: Robert Kirchgessner (JIRA)
> Priority: Minor
> Fix For: 1.9
> Attachments: undeleteAll.patch
>
> Calling MultiReader.undeleteAll does not clear cached numDocs value. So the
> subsequent numDocs() call returns a wrong value if there were deleted
> documents in the index. Following patch fixes the bug and adds a test showing
> the issue.
>
> Index: src/test/org/apache/lucene/index/TestMultiReader.java
> ===
> --- src/test/org/apache/lucene/index/TestMultiReader.java (revision 354923)
> +++ src/test/org/apache/lucene/index/TestMultiReader.java (working copy)
> @@ -69,6 +69,18 @@
>      assertTrue(vector != null);
>      TestSegmentReader.checkNorms(reader);
>    }
> +
> +  public void testUndeleteAll() throws IOException {
> +    sis.read(dir);
> +    MultiReader reader = new MultiReader(dir, sis, false, readers);
> +    assertTrue(reader != null);
> +    assertEquals( 2, reader.numDocs() );
> +    reader.delete(0);
> +    assertEquals( 1, reader.numDocs() );
> +    reader.undeleteAll();
> +    assertEquals( 2, reader.numDocs() );
> +  }
> +
>    public void testTermVectors() {
>      MultiReader reader = new MultiReader(dir, sis, false, readers);
>
> Index: src/java/org/apache/lucene/index/MultiReader.java
> ===
> --- src/java/org/apache/lucene/index/MultiReader.java (revision 354923)
> +++ src/java/org/apache/lucene/index/MultiReader.java (working copy)
> @@ -122,6 +122,7 @@
>      for (int i = 0; i < subReaders.length; i++)
>        subReaders[i].undeleteAll();
>      hasDeletions = false;
> +    numDocs = -1; // invalidate cache
>    }
>
>    private int readerIndex(int n) {  // find reader for doc n:

--
This message is automatically generated by JIRA.
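The patch above follows a standard cached-derived-value pattern: a sentinel value (-1) marks the cache as stale, and every mutating path must reset it - undeleteAll simply missed that reset. The pattern in isolation, with illustrative names (this is a sketch, not the MultiReader code):

```java
import java.util.Arrays;

public class CachedCount {
    private final boolean[] deleted;
    private int numDocs = -1; // -1 means "not yet computed"

    public CachedCount(int size) {
        deleted = new boolean[size];
    }

    public void delete(int n) {
        deleted[n] = true;
        numDocs = -1; // invalidate cache
    }

    public void undeleteAll() {
        Arrays.fill(deleted, false);
        numDocs = -1; // the missing invalidation the patch adds
    }

    public int numDocs() {
        if (numDocs == -1) { // recompute only when invalidated
            int n = 0;
            for (boolean d : deleted) if (!d) n++;
            numDocs = n;
        }
        return numDocs;
    }
}
```

The delete/undeleteAll/numDocs sequence mirrors the testUndeleteAll case in the patch: without the reset in undeleteAll, numDocs() would keep returning the stale post-delete count.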
RE: NioFile cache performance
I modified MemoryLRUCache to use the attached ConcurrentHashMap.java and ran under 1.4.2_10:

filesize is 4194304
non-cached time = 11140, avg = 0.01114
non-cached threaded (3 threads) time = 35485, avg = 0.011828
cached time = 6109, avg = 0.006109, cache hits 996138, cache misses 3862
cached threaded (3 threads) time = 17281, avg = 0.0057605, cache hits 3985911, cache misses 14089

with the shared test:

filesize is 4194304
non-cached time = 11266, avg = 0.011266
non-cached threaded (3 threads) time = 46734, avg = 0.015578
cached time = 6094, avg = 0.006094, cache hits 996133, cache misses 3867
cached threaded (3 threads) time = 16500, avg = 0.0055, cache hits 3994999, cache misses 5001

I then ran the tests using JDK 1.5.0_06 with the built-in ConcurrentHashMap:

filesize is 4194304
non-cached time = 10515, avg = 0.010515
non-cached threaded (3 threads) time = 30688, avg = 0.010229
cached time = 7031, avg = 0.007031, cache hits 996742, cache misses 3258
cached threaded (3 threads) time = 17468, avg = 0.0058226667, cache hits 3989122, cache misses 10878

with the shared test:

filesize is 4194304
non-cached time = 10187, avg = 0.010187
non-cached threaded (3 threads) time = 44000, avg = 0.014666
cached time = 6234, avg = 0.006234, cache hits 996315, cache misses 3685
cached threaded (3 threads) time = 16766, avg = 0.005588, cache hits 3995081, cache misses 4919

Surprisingly, the 1.4.2_10 version performed as well as (if not better than) the JDK 1.5 version. Also, I am only running on a single-processor box (non-hyper-threaded), so it would be interesting to see the numbers on a true multi-processor box. My thinking is that the cached version will be MUCH faster than the non-cached one, as many more context switches into the OS will be avoided.

[Attachment: ConcurrentHashMap.java - Doug Lea's ConcurrentHashMap backport ("Written by Doug Lea. Adapted and released, under explicit permission, from JDK1.2 HashMap.java and Hashtable.java"), repackaged as org.apache.lucene.util; source truncated in the archive.]
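The "multiple file handles vs. a shared handle" observation earlier in the thread comes down to FileChannel's positioned read, which takes an explicit offset and leaves the channel's own position untouched, so several threads can read the same channel without seek coordination. A sketch (JDK bug #6265734, cited in the NioFile javadoc, concerns Windows still synchronizing internally even on such reads):

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.RandomAccessFile;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;

public class PositionedRead {
    // Reads up to 'len' bytes at absolute offset 'pos' without touching the
    // channel's position, so concurrent readers need no shared seek state.
    static byte[] readAt(FileChannel ch, long pos, int len) throws Exception {
        ByteBuffer buf = ByteBuffer.allocate(len);
        while (buf.hasRemaining()) {
            int n = ch.read(buf, pos + buf.position()); // positioned read
            if (n < 0) break; // EOF
        }
        buf.flip();
        byte[] out = new byte[buf.remaining()];
        buf.get(out);
        return out;
    }

    public static void main(String[] args) throws Exception {
        File f = File.createTempFile("niotest", ".dat");
        FileOutputStream out = new FileOutputStream(f);
        out.write("hello positioned read".getBytes("US-ASCII"));
        out.close();
        RandomAccessFile raf = new RandomAccessFile(f, "r");
        try {
            System.out.println(new String(readAt(raf.getChannel(), 6, 10), "US-ASCII"));
        } finally {
            raf.close();
            f.delete();
        }
    }
}
```

On platforms where the channel serializes positioned reads anyway (the Windows behavior noted above), opening one handle per thread sidesteps that lock, which is what the shared-test numbers show.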
[jira] Commented: (LUCENE-480) NullPointerException during IndexWriter.mergeSegments
[ http://issues.apache.org/jira/browse/LUCENE-480?page=comments#action_12359810 ]

Jeremy Calvert commented on LUCENE-480:
-

Apparently my hardware or filesystem is having some difficulties, which could be the reason the fieldsStream is corrupt. I apologize for the false alarm and sincerely appreciate the quick feedback.

# dmesg
...
PCI-DMA: Out of IOMMU space for 180224 bytes at device :00:07.0
end_request: I/O error, dev sda, sector 52463038
printk: 1014 messages suppressed.
Buffer I/O error on device md0, logical block 21106784
[jira] Closed: (LUCENE-480) NullPointerException during IndexWriter.mergeSegments
[ http://issues.apache.org/jira/browse/LUCENE-480?page=all ]

Yonik Seeley closed LUCENE-480:
-
Resolution: Invalid
Assign To: Yonik Seeley

No problem... glad to hear it will be an easy fix :-)