NioFile cache performance

2005-12-08 Thread Robert Engels



I finally got around to writing a testcase to verify the numbers I presented. The following testcase and results are for the lowest-level disk operations. On my machine, reading from the cache vs. going to disk (even when the data is in the OS cache) is 30%-40% faster. Since Lucene makes extensive use of disk IO and often reads the same data (e.g. reading the terms), a localized user-level cache can provide significant performance benefits.

Using a 4mb file (so I could "guarantee" the disk data would be in the OS cache as well), the test shows the following results.
 
Most of the CPU time is actually used during synchronization with multiple threads. I hacked together a version of MemoryLRUCache that used a ConcurrentHashMap from JDK 1.5, and it was another 50% faster! At a minimum, if the ReadWriteLock class were modified to use the 1.5 facilities, some significant additional performance gains should be realized.
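The 1.5 facilities mentioned above can be sketched roughly like this (a hypothetical illustration, not the actual MemoryLRUCache patch; the class and method names are invented):

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Sketch: guarding a position-keyed block cache with the JDK 1.5
// ReentrantReadWriteLock instead of a hand-rolled wait/notify lock.
public class CacheLockSketch {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    private final Map<Long, byte[]> blocks = new HashMap<Long, byte[]>();

    public byte[] getBlock(long position) {
        lock.readLock().lock();          // many readers may hold this at once
        try {
            return blocks.get(position);
        } finally {
            lock.readLock().unlock();
        }
    }

    public void putBlock(long position, byte[] data) {
        lock.writeLock().lock();         // exclusive; waits for readers to drain
        try {
            blocks.put(position, data);
        } finally {
            lock.writeLock().unlock();
        }
    }
}
```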
 

filesize is 4194304
non-cached time = 10578, avg = 0.010578
non-cached threaded (3 threads) time = 32094, avg = 0.010698
cached time = 6125, avg = 0.006125
cache hits 996365
cache misses 3635
cached threaded (3 threads) time = 20734, avg = 0.0069116
cache hits 3989089
cache misses 10911
When using the shared test (which is more like the Lucene usage, since a single "file" is shared by multiple threads), the difference is even more dramatic with multiple threads (since the cache size is effectively reduced by the number of threads). This test also shows the value of using multiple file handles when multiple threads read a single file (rather than using a shared file handle).
filesize is 4194304
non-cached time = 10594, avg = 0.010594
non-cached threaded (3 threads) time = 42110, avg = 0.014036
cached time = 6047, avg = 0.006047
cache hits 996827
cache misses 3173
cached threaded (3 threads) time = 20079, avg = 0.006693
cache hits 3995776
cache misses 4224
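The multiple-file-handles idea above can be sketched as follows (a hedged illustration, not the test code itself; the class and method names are made up). Each thread opens its own channel and uses positioned reads, so no seek is shared between threads:

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;

// Sketch: each reader thread opens its own FileChannel so positioned
// reads do not contend on a single shared file handle.
public class PerThreadReader {
    public static byte[] readBlock(String file, long pos, int len) throws IOException {
        RandomAccessFile raf = new RandomAccessFile(file, "r");
        try {
            FileChannel ch = raf.getChannel();
            ByteBuffer buf = ByteBuffer.allocate(len);
            while (buf.hasRemaining()) {
                // positioned read: no seek, no shared file pointer
                int n = ch.read(buf, pos + buf.position());
                if (n < 0) break; // hit EOF; tail of the array stays zeroed
            }
            return buf.array();
        } finally {
            raf.close();
        }
    }
}
```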
package org.apache.lucene.util;

import java.io.*;
import java.util.Random;

import junit.framework.TestCase;

public class TestNioFilePerf extends TestCase {
static final String FILENAME = "testfile.dat";
static final int BLOCKSIZE = 2048;
static final int NBLOCKS = 2048; // 4 mb file
static final int NREADS = 50;
static final int NTHREADS = 3;

static {
System.setProperty("org.apache.lucene.CachePercent","90");
}

public void setUp() throws Exception {
FileOutputStream f = new FileOutputStream(FILENAME);
Random r = new Random();

byte[] block = new byte[BLOCKSIZE]; 
for(int i=0;i<NBLOCKS;i++) {
r.nextBytes(block);
f.write(block);
}
f.close();
}

// ... remainder of the testcase truncated in the archive ...
}

package org.apache.lucene.util;

/**
 * a read/write lock. allows unlimited simultaneous readers, or a single writer. A thread with
 * the "write" lock implicitly owns a read lock as well.
 */
public class ReadWriteLock {
int readlocks = 0;
int writelocks = 0;
Thread writethread = null;

public synchronized void readLock() {
while(true) {
if(writelocks==0 || (Thread.currentThread()==writethread) ) {
readlocks++;
return;
} else {
try {
wait();
} catch (InterruptedException e) {
}
}
}
}

public synchronized void readUnlock() {
readlocks--;
notifyAll();
}

public synchronized void writeLock() {
while(true) {
if(tryWriteLock())
return;
try {
wait();
} catch (InterruptedException e) {
}
}
}

/**
 * try to get the write lock
 *  
 * @return true if the write lock could be acquired, else false
 */
public synchronized boolean tryWriteLock() {
if(readlocks==0 && (writelocks==0 || writethread == Thread.currentThread())) {
writethread = Thread.currentThread();
writelocks++;
return true;
}
return false;
}

public synchronized void writeUnlock() {
if(writelocks==0)
throw new IllegalStateException("caller does not own write lock");
if(--writelocks==0)
writethread=null;
notifyAll();
}

/**
 * checks if the calling thread owns the write lock
 * 
 * @return true if the calling thread owns the write lock
 */
public synchronized boolean ownsWriteLock() {
return Thread.currentThread()==writethread;
}
}
package org.apache.lucene.util;

import java.io.*;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;

/**
 * wrapper for NIO FileChannel in order to circumvent problems with multiple threads reading the
 * same FileChannel, and to provide local cache. The current Windows implementation of FileChannel
 * has some synchronization even when performing positioned reads. See JDK bug #6265734.
 * 
 * The NioFile contains internal caching to red

[jira] Created: (LUCENE-480) NullPointerException during IndexWriter.mergeSegments

2005-12-08 Thread Jeremy Calvert (JIRA)
NullPointerException during IndexWriter.mergeSegments
-

 Key: LUCENE-480
 URL: http://issues.apache.org/jira/browse/LUCENE-480
 Project: Lucene - Java
Type: Bug
  Components: Index  
Versions: CVS Nightly - Specify date in submission, 1.9
 Environment: 64bit, ubuntu, Java 5 SE
Reporter: Jeremy Calvert


Last commit on culprit org.apache.lucene.index.FieldsReader: Sun Oct 30 
05:38:46 2005.

-
Offending code in FieldsReader.java:

...
  final Document doc(int n) throws IOException {
indexStream.seek(n * 8L);
long position = indexStream.readLong();
fieldsStream.seek(position);

Document doc = new Document();
int numFields = fieldsStream.readVInt();
for (int i = 0; i < numFields; i++) {
  int fieldNumber = fieldsStream.readVInt();
  FieldInfo fi = fieldInfos.fieldInfo(fieldNumber); 
//
// This apparently returns null, presumably either as a result of:
//   catch (IndexOutOfBoundsException ioobe) {
//  return null;
//}
// in fieldInfos.fieldInfo(int fieldNumber)
//  - or -
// because there's a null member of member ArrayList byNumber of FieldInfos

  byte bits = fieldsStream.readByte();
  
  boolean compressed = (bits & FieldsWriter.FIELD_IS_COMPRESSED) != 0;



Field.Store store = Field.Store.YES;
//
// Here --v is where the NPE is thrown.
if (fi.isIndexed && tokenize)
  index = Field.Index.TOKENIZED;
...

-

Proposed Patch:
I'm not sure what the behavior should be in this case, but if it's no big deal 
that there's null field info for an index and we should just ignore that index, 
an obvious patch could be:

In FieldsReader.java:

...
for (int i = 0; i < numFields; i++) {
  int fieldNumber = fieldsStream.readVInt();
  FieldInfo fi = fieldInfos.fieldInfo(fieldNumber); 
//vvvPatchvvv
  if(fi == null) {continue;}

  byte bits = fieldsStream.readByte();
...

-

Other observations:
In my search prior to submitting this issue, I found LUCENE-168, which looks 
similar, and is perhaps related, but if so, I'm not sure exactly how.

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
   http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see:
   http://www.atlassian.com/software/jira


-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



RE: NioFile cache performance

2005-12-08 Thread Robert Engels
As a follow-up...

The real performance benefit comes in a shared server environment, where the
Lucene process runs alongside other processes - i.e. competes for the use
of the OS file cache. Since the Lucene process can be configured with a
dedicated memory pool, using facilities like NioFile allows for a large
dedicated application cache - similar to how databases buffer data/index
blocks and don't rely on the OS to do so.

If the Lucene process (we wrap Lucene in a server "process") is the "only"
process on the server, the OS cache will likely perform well enough for most
applications.

I will attempt to get some performance numbers using/not using NioFile
performing actual Lucene queries.
  -Original Message-
  From: Robert Engels [mailto:[EMAIL PROTECTED]
  Sent: Thursday, December 08, 2005 10:37 AM
  To: Lucene-Dev
  Subject: NioFile cache performance




[jira] Commented: (LUCENE-480) NullPointerException during IndexWriter.mergeSegments

2005-12-08 Thread Yonik Seeley (JIRA)
[ 
http://issues.apache.org/jira/browse/LUCENE-480?page=comments#action_12359750 ] 

Yonik Seeley commented on LUCENE-480:
-

Is this possible to reproduce in a testcase you can add here?
FieldInfo should never be null AFAIK, so  I'd rather get to the root cause of 
the problem rather than covering it up.





[jira] Commented: (LUCENE-480) NullPointerException during IndexWriter.mergeSegments

2005-12-08 Thread Jeremy Calvert (JIRA)
[ 
http://issues.apache.org/jira/browse/LUCENE-480?page=comments#action_12359752 ] 

Jeremy Calvert commented on LUCENE-480:
---

Sure, let me try and put that together.


For additional commands, e-mail: [EMAIL PROTECTED]



[jira] Commented: (LUCENE-480) NullPointerException during IndexWriter.mergeSegments

2005-12-08 Thread Jeremy Calvert (JIRA)
[ 
http://issues.apache.org/jira/browse/LUCENE-480?page=comments#action_12359755 ] 

Jeremy Calvert commented on LUCENE-480:
---

A little more data: 

  int fieldNumber = fieldsStream.readVInt();

on line 68 of FieldsReader.java results in fieldNumber = 221997 for my 
particular fieldsStream, so it would seem that my proposed patch would indeed 
just gloss over a larger problem wherein the fieldsStream is getting corrupted.

On the other hand, having this cause an NPE seems less than ideal.  Is there 
some way to throw an exception that's more indicative of the stream corruption?
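One hedged way to do that (hypothetical, not committed Lucene code; the helper class and message are invented) is to fail fast with a descriptive IOException as soon as the lookup comes back null:

```java
import java.io.IOException;

// Hypothetical sketch: surface a descriptive IOException when a field
// number read from the stream has no matching FieldInfo, instead of
// letting a later dereference throw a bare NullPointerException.
public class FieldCheckSketch {
    public static void checkField(Object fi, int fieldNumber) throws IOException {
        if (fi == null) {
            throw new IOException(
                "fields stream appears corrupt: no FieldInfo for field number " + fieldNumber);
        }
    }
}
```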

In any case, I'm tracing back how this happened in the first place.   I would 
simply give you the code and data to reproduce it, but the data is ~500M worth.

Stay tuned!



For additional commands, e-mail: [EMAIL PROTECTED]



Re: NioFile cache performance

2005-12-08 Thread Paul Smith
> Most of the CPU time is actually used during the synchronization with multiple threads. I hacked together a version of MemoryLRUCache that used a ConcurrentHashMap from JDK 1.5, and it was another 50% faster! At a minimum, if the ReadWriteLock class was modified to use the 1.5 facilities some significant additional performance gains should be realized.

Would you be able to run the same test in JDK 1.4 but use the util.concurrent compatibility pack? (supposedly the same classes in Java5) It would be nice to verify whether the gain is the result of the different ConcurrentHashMap vs the different JDK itself.

Paul Smith



[jira] Resolved: (LUCENE-479) MultiReader.numDocs incorrect after undeleteAll

2005-12-08 Thread Doug Cutting (JIRA)
 [ http://issues.apache.org/jira/browse/LUCENE-479?page=all ]
 
Doug Cutting resolved LUCENE-479:
-

Fix Version: 1.9
 Resolution: Fixed

I committed this.  Thanks!

> MultiReader.numDocs incorrect after undeleteAll
> ---
>
>  Key: LUCENE-479
>  URL: http://issues.apache.org/jira/browse/LUCENE-479
>  Project: Lucene - Java
> Type: Bug
>   Components: Index
> Versions: CVS Nightly - Specify date in submission
> Reporter: Robert Kirchgessner (JIRA)
> Priority: Minor
>  Fix For: 1.9
>  Attachments: undeleteAll.patch
>
> Calling MultiReader.undeleteAll does not clear cached numDocs value. So the 
> subsequent numDocs() call returns a wrong value if there were deleted 
> documents in the index. Following patch fixes the bug and adds a test showing 
> the issue.
> Index: src/test/org/apache/lucene/index/TestMultiReader.java
> ===
> --- src/test/org/apache/lucene/index/TestMultiReader.java   (revision 
> 354923)
> +++ src/test/org/apache/lucene/index/TestMultiReader.java   (working copy)
> @@ -69,6 +69,18 @@
>  assertTrue(vector != null);
>  TestSegmentReader.checkNorms(reader);
>}
> +
> +  public void testUndeleteAll() throws IOException {
> +sis.read(dir);
> +MultiReader reader = new MultiReader(dir, sis, false, readers);
> +assertTrue(reader != null);
> +assertEquals( 2, reader.numDocs() );
> +reader.delete(0);
> +assertEquals( 1, reader.numDocs() );
> +reader.undeleteAll();
> +assertEquals( 2, reader.numDocs() );
> +  }
> +
>public void testTermVectors() {
>  MultiReader reader = new MultiReader(dir, sis, false, readers);
> Index: src/java/org/apache/lucene/index/MultiReader.java
> ===
> --- src/java/org/apache/lucene/index/MultiReader.java   (revision 354923)
> +++ src/java/org/apache/lucene/index/MultiReader.java   (working copy)
> @@ -122,6 +122,7 @@
>  for (int i = 0; i < subReaders.length; i++)
>subReaders[i].undeleteAll();
>  hasDeletions = false;
> +numDocs = -1;  // invalidate cache
>}
>private int readerIndex(int n) {// find reader for doc n:

For additional commands, e-mail: [EMAIL PROTECTED]



RE: NioFile cache performance

2005-12-08 Thread Robert Engels



I modified MemoryLRUCache to use the attached ConcurrentHashMap.java and ran under 1.4.2_10:

filesize is 4194304
non-cached time = 11140, avg = 0.01114
non-cached threaded (3 threads) time = 35485, avg = 0.011828
cached time = 6109, avg = 0.006109
cache hits 996138
cache misses 3862
cached threaded (3 threads) time = 17281, avg = 0.0057605
cache hits 3985911
cache misses 14089

With the shared test:

filesize is 4194304
non-cached time = 11266, avg = 0.011266
non-cached threaded (3 threads) time = 46734, avg = 0.015578
cached time = 6094, avg = 0.006094
cache hits 996133
cache misses 3867
cached threaded (3 threads) time = 16500, avg = 0.0055
cache hits 3994999
cache misses 5001

I then ran the tests using JDK 1.5.0_06 with the built-in ConcurrentHashMap:

filesize is 4194304
non-cached time = 10515, avg = 0.010515
non-cached threaded (3 threads) time = 30688, avg = 0.010229
cached time = 7031, avg = 0.007031
cache hits 996742
cache misses 3258
cached threaded (3 threads) time = 17468, avg = 0.0058226667
cache hits 3989122
cache misses 10878

With the shared test:

filesize is 4194304
non-cached time = 10187, avg = 0.010187
non-cached threaded (3 threads) time = 44000, avg = 0.014666
cached time = 6234, avg = 0.006234
cache hits 996315
cache misses 3685
cached threaded (3 threads) time = 16766, avg = 0.005588
cache hits 3995081
cache misses 4919

Surprisingly, the 1.4.2_10 version performed as well as (if not better than) the JDK 1.5 version.

Also, I am only running on a single-processor box (non-hyperthreaded), so it would be interesting to see the numbers on a true multi-processor box. My thinking is that the cached version will be MUCH faster than the non-cached one, as many more context switches into the OS will be avoided.
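The MemoryLRUCache change being measured can be sketched roughly like this (a hypothetical sketch only; the real MemoryLRUCache is not shown in this thread, and real LRU eviction is omitted):

```java
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of a block cache keyed by file position, backed by
// ConcurrentHashMap so concurrent readers do not serialize on one monitor.
public class ConcurrentBlockCache {
    private final ConcurrentHashMap<Long, byte[]> map =
        new ConcurrentHashMap<Long, byte[]>();
    private final int maxEntries;

    public ConcurrentBlockCache(int maxEntries) {
        this.maxEntries = maxEntries;
    }

    public byte[] get(long pos) {
        return map.get(pos); // lock-free read in the common case
    }

    public void put(long pos, byte[] block) {
        if (map.size() >= maxEntries) {
            map.clear(); // crude stand-in for real LRU eviction
        }
        map.put(pos, block);
    }
}
```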

  -Original Message-
  From: Paul Smith [mailto:[EMAIL PROTECTED]]
  Sent: Thursday, December 08, 2005 1:54 PM
  To: java-dev@lucene.apache.org
  Subject: Re: NioFile cache performance

  Would you be able to run the same test in JDK 1.4 but use the
  util.concurrent compatibility pack? (supposedly the same classes in Java5) It
  would be nice to verify whether the gain is the result of the different
  ConcurrentHashMap vs the different JDK itself.

  Paul Smith
/*
  File: ConcurrentHashMap

  Written by Doug Lea. Adapted and released, under explicit
  permission, from JDK1.2 HashMap.java and Hashtable.java which
  carries the following copyright:

 * Copyright 1997 by Sun Microsystems, Inc.,
 * 901 San Antonio Road, Palo Alto, California, 94303, U.S.A.
 * All rights reserved.
 *
 * This software is the confidential and proprietary information
 * of Sun Microsystems, Inc. ("Confidential Information").  You
 * shall not disclose such Confidential Information and shall use
 * it only in accordance with the terms of the license agreement
 * you entered into with Sun.

  History:
  Date   WhoWhat
  26nov2000  dl   Created, based on ConcurrentReaderHashMap
  12jan2001  dl   public release
  17nov2001  dl   Minor tunings
  24oct2003  dl   Segment implements Serializable
  23jun2004  dl   Avoid bad array sizings in view toArray methods
*/

package org.apache.lucene.util;

import java.util.Map;
import java.util.AbstractMap;
import java.util.AbstractSet;
import java.util.AbstractCollection;
import java.util.Collection;
import java.util.Set;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.Enumeration;
import java.util.NoSuchElementException;

import java.io.Serializable;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;


/**
 * A version of Hashtable supporting 
 * concurrency for both retrievals and updates:
 *
 *  
 *  Retrievals
 *
 *  Retrievals may overlap updates.  (This is the same policy as
 * ConcurrentReaderHashMap.)  Successful retrievals using get(key) and
 * containsKey(key) usually run without locking. Unsuccessful
 * retrievals (i.e., when the key is not present) do involve brief
 * synchronization (locking).  Because retrieval operations can
 * ordinarily overlap with update operations (i.e., put, remove, and
 * their derivatives), retrievals can only be guaranteed to return the
 * results of the most recently completed operations holding
 * upon their onset. Retrieval operations may or may not return
 * results reflecting in-progress writing operations.  However, the
 * retrieval operations do always return consistent r

[jira] Commented: (LUCENE-480) NullPointerException during IndexWriter.mergeSegments

2005-12-08 Thread Jeremy Calvert (JIRA)
[ 
http://issues.apache.org/jira/browse/LUCENE-480?page=comments#action_12359810 ] 

Jeremy Calvert commented on LUCENE-480:
---

Apparently my hardware or filesystem is having some difficulties, which could 
be the reason the fieldsStream is corrupt. I apologize for the false alarm and 
sincerely appreciate the quick feedback.

# dmesg
...
PCI-DMA: Out of IOMMU space for 180224 bytes at device :00:07.0
end_request: I/O error, dev sda, sector 52463038
printk: 1014 messages suppressed.
Buffer I/O error on device md0, logical block 21106784 


For additional commands, e-mail: [EMAIL PROTECTED]



[jira] Closed: (LUCENE-480) NullPointerException during IndexWriter.mergeSegments

2005-12-08 Thread Yonik Seeley (JIRA)
 [ http://issues.apache.org/jira/browse/LUCENE-480?page=all ]
 
Yonik Seeley closed LUCENE-480:
---

Resolution: Invalid
 Assign To: Yonik Seeley

No problem... glad to hear it will be an easy fix :-)

