[jira] Commented: (LUCENE-500) Lucene 2.0 requirements - Remove all deprecated code

2006-03-22 Thread paul.elschot (JIRA)
[ 
http://issues.apache.org/jira/browse/LUCENE-500?page=comments#action_12371389 ] 

paul.elschot commented on LUCENE-500:
-

Here, at revision 387786, the target common.compile-test fails because a few
previously deprecated methods are still being used in test code (Field
creation and BooleanQuery.add() in TestKipping* and TestRewriting).
Is there someone looking into this, or shall I provide a patch?




> Lucene 2.0 requirements - Remove all deprecated code
> 
>
>  Key: LUCENE-500
>  URL: http://issues.apache.org/jira/browse/LUCENE-500
>  Project: Lucene - Java
> Type: Task
> Versions: 1.9
> Reporter: Grant Ingersoll
>  Attachments: deprecation.txt, deprecation2.txt
>
> Per the move to Lucene 2.0 from 1.9, remove all deprecated code and update 
> documentation, etc.
> Patch to follow shortly.

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
   http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see:
   http://www.atlassian.com/software/jira


-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



[jira] Commented: (LUCENE-500) Lucene 2.0 requirements - Remove all deprecated code

2006-03-22 Thread paul.elschot (JIRA)
[ 
http://issues.apache.org/jira/browse/LUCENE-500?page=comments#action_12371392 ] 

paul.elschot commented on LUCENE-500:
-

Oops, I checked svn status for the test code, and the two tests that cause
these compiler errors never made it into the trunk, so please ignore this.




[jira] Created: (LUCENE-529) TermInfosReader and other + instance ThreadLocal => transient/odd memory leaks => OutOfMemoryException

2006-03-22 Thread Andy Hind (JIRA)
TermInfosReader and other + instance ThreadLocal => transient/odd memory leaks 
=>  OutOfMemoryException 


 Key: LUCENE-529
 URL: http://issues.apache.org/jira/browse/LUCENE-529
 Project: Lucene - Java
Type: Bug
  Components: Index  
Versions: 1.9
 Environment: Lucene 1.4.3 with 1.5.0_04 JVM or newer; will apply to 1.9 code
Reporter: Andy Hind


TermInfosReader uses an instance-level ThreadLocal for enumerators.
This is a transient/odd memory leak in Lucene 1.4.3-1.9 and applies to current
JVMs, not just an old JVM issue as described in the finalizer of the 1.9 code.

There is also an instance-level thread local in SegmentReader which will
have the same issue.
There may be other uses which also need to be fixed.

I don't understand the intended use of these variables; however:

Each ThreadLocal has its own hashcode used for lookup (see the ThreadLocal
source code), and each TermInfosReader instance creates its own ThreadLocal
instance. All this does is create a per-thread value on each thread that
accesses the thread local. Setting it to null in the finalizer sets it to null
on only one thread, the finalizer thread, where it was never created. There is
no point to this :-(
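A minimal sketch of the finalizer point, assuming only the standard java.lang.ThreadLocal API (the class name FinalizerThreadDemo is hypothetical): a value set on one thread survives when a second thread, standing in for the finalizer thread, sets the same ThreadLocal to null, while remove() on the owning thread does clear it.

```java
// Hypothetical demo: clearing a ThreadLocal from another thread (as finalize()
// would, on the finalizer thread) does not clear the owning thread's value.
public class FinalizerThreadDemo {

    /** Returns {value after a foreign-thread clear, value after remove() here}. */
    static String[] demo() {
        final ThreadLocal<String> enumerator = new ThreadLocal<String>();
        enumerator.set("enumerator for the calling thread");

        // Stand-in for the finalizer thread running enumerator.set(null).
        Thread finalizerLike = new Thread(new Runnable() {
            public void run() {
                enumerator.set(null); // touches only this thread's own entry
            }
        });
        finalizerLike.start();
        try {
            finalizerLike.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }

        String afterForeignClear = enumerator.get(); // value is still there

        // Clearing on the owning thread (e.g. from close()) does remove it.
        enumerator.remove();
        String afterOwnRemove = enumerator.get();

        return new String[] { afterForeignClear, afterOwnRemove };
    }

    public static void main(String[] args) {
        String[] result = demo();
        System.out.println(result[0]); // enumerator for the calling thread
        System.out.println(result[1]); // null
    }
}
```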

I assume there is a good concurrency reason why an instance variable cannot be
used...

I have not used multi-threaded searching, but I have used a lot of threads,
each creating searchers and searching.
1.4.3 has a clear memory leak caused by this thread local. The use case above
is definitely solved by setting the thread local to null in close(), which at
least has a chance of running on the correct thread :-)
I know reusing Searchers would help, but that is my choice and I will get to
that later.

Now you want to know why.

Thread locals are stored in a per-thread table of entries. Each entry holds a
*weak reference* to the key (here the ThreadLocal owned by the TermInfosReader
instance) and a *strong reference* to the thread local value. When the
TermInfosReader instance, and with it the ThreadLocal, is garbage collected,
the entry's key becomes null and the entry is now stale.
Stale entries are cleared up in an ad hoc way, and until they are cleared up
the value will not be garbage collected.
Until the key is collected it is a valid entry, and its presence may cause the
table to expand.
See the ThreadLocal code.

So if you have lots of threads, all creating thread locals rapidly, each thread
can end up holding a large table of thread locals with many stale entries,
which prevents some objects from being garbage collected.
The limited cleanup of the thread local table is not enough to save you from
running out of memory.

Summary:

- remove finalize()
- set the thread local to null in close()
  - values will then be available for GC
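A minimal sketch of the summary's proposal, using a hypothetical PerThreadReader in place of TermInfosReader (its Enumerator class is likewise a placeholder, not Lucene's own enumerator type). It assumes Java 5's ThreadLocal.remove() in close(); set(null) would also drop the value, while remove() discards the entry itself.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for TermInfosReader: shared by many threads,
// caching one enumerator per thread in an instance-level ThreadLocal.
public class PerThreadReader {

    // Hypothetical placeholder for the per-thread term enumerator.
    static final class Enumerator {
        final int id;
        Enumerator(int id) { this.id = id; }
    }

    private static final AtomicInteger created = new AtomicInteger();

    private final ThreadLocal<Enumerator> enumerators = new ThreadLocal<Enumerator>();

    /** Returns this thread's cached enumerator, creating it on first use. */
    Enumerator getEnumerator() {
        Enumerator e = enumerators.get();
        if (e == null) {
            e = new Enumerator(created.incrementAndGet());
            enumerators.set(e);
        }
        return e;
    }

    /**
     * Clears the calling thread's entry. Unlike a finalizer, close() usually
     * runs on a thread that actually used the reader. No finalize() is defined.
     */
    void close() {
        enumerators.remove();
    }

    public static void main(String[] args) {
        PerThreadReader reader = new PerThreadReader();
        Enumerator first = reader.getEnumerator();
        System.out.println(first == reader.getEnumerator()); // true: reused
        reader.close();
        System.out.println(first == reader.getEnumerator()); // false: recreated
    }
}
```

close() only clears the entry of the thread that calls it; entries held by other threads still wait for ThreadLocal's own stale-entry cleanup, which is the residual case described above.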




RE: [jira] Created: (LUCENE-529) TermInfosReader and other + instance ThreadLocal => transient/odd memory leaks => OutOfMemoryException

2006-03-22 Thread Robert Engels
There is only a single TermInfosReader per index. In order to share this
instance among multiple threads, and avoid the overhead of creating new
enumerators for each request, each thread's enumerator is stored in a
thread local. Normally, in a server application, threads are pooled, so new
threads are not constantly created and destroyed, and the memory leak is
insignificant.

The same reasoning holds true for the SegmentReader class.
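The trade-off described here can be sketched as follows, assuming a hypothetical SharedReader whose per-thread enumerator is created lazily by ThreadLocal.initialValue(): on a fixed-size pool the number of enumerators is bounded by the pool size, while starting a new thread per request creates one enumerator per request.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch: count the per-thread enumerators one shared reader creates.
public class PooledEnumeratorCount {

    // Stand-in for a shared TermInfosReader: one enumerator per thread, made lazily.
    static final class SharedReader {
        final AtomicInteger enumeratorsCreated = new AtomicInteger();
        final ThreadLocal<Object> enumerator = new ThreadLocal<Object>() {
            @Override
            protected Object initialValue() {
                enumeratorsCreated.incrementAndGet();
                return new Object(); // placeholder for an expensive enumerator
            }
        };

        void search() {
            enumerator.get(); // reuses this thread's enumerator if it exists
        }
    }

    /** Runs the requests on a fixed-size pool; returns enumerators created. */
    static int pooled(int poolSize, int requests) {
        final SharedReader reader = new SharedReader();
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        for (int i = 0; i < requests; i++) {
            pool.execute(new Runnable() {
                public void run() { reader.search(); }
            });
        }
        pool.shutdown();
        try {
            pool.awaitTermination(1, TimeUnit.MINUTES);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return reader.enumeratorsCreated.get();
    }

    /** Starts a fresh thread for every request; returns enumerators created. */
    static int threadPerRequest(int requests) {
        final SharedReader reader = new SharedReader();
        for (int i = 0; i < requests; i++) {
            Thread t = new Thread(new Runnable() {
                public void run() { reader.search(); }
            });
            t.start();
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                break;
            }
        }
        return reader.enumeratorsCreated.get();
    }

    public static void main(String[] args) {
        System.out.println(pooled(4, 1000));       // at most 4
        System.out.println(threadPerRequest(200)); // 200
    }
}
```

The sketch keeps one long-lived reader. Opening a new reader per search, even on pooled threads, adds that reader's entries to each pooled thread's table, so each table grows until stale entries are expunged.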


-Original Message-
From: Andy Hind (JIRA) [mailto:[EMAIL PROTECTED]
Sent: Wednesday, March 22, 2006 11:07 AM
To: java-dev@lucene.apache.org
Subject: [jira] Created: (LUCENE-529) TermInfosReader and other +
instance ThreadLocal => transient/odd memory leaks =>
OutOfMemoryException




RE: [jira] Created: (LUCENE-529) TermInfosReader and other + instance ThreadLocal => transient/odd memory leaks => OutOfMemoryException

2006-03-22 Thread Robert Engels
There was a small mistake: there is a single TermInfosReader per segment.

-Original Message-
From: Robert Engels [mailto:[EMAIL PROTECTED]
Sent: Wednesday, March 22, 2006 11:37 AM
To: java-dev@lucene.apache.org
Subject: RE: [jira] Created: (LUCENE-529) TermInfosReader and other +
instance ThreadLocal => transient/odd memory leaks =>
OutOfMemoryException





query parsing

2006-03-22 Thread Robert Engels
Using Lucene 1.4.3, if I use the query

+cat AND -dog

it parses to

+cat -dog

and works correctly.

If I use

(+cat) AND (-dog)

it parses to

+(+cat) +(-dog)

and returns no results.

Is this a known issue?
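One plausible explanation of the empty result, sketched as a simplified model rather than Lucene's classes: a BooleanQuery whose only clause is prohibited matches no documents, so requiring the subquery (-dog) leaves nothing for the outer query to match. The BooleanModel class, its Occur enum and its helpers are hypothetical.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Simplified, hypothetical model of BooleanQuery matching (not Lucene's API).
public class BooleanModel {

    enum Occur { MUST, SHOULD, MUST_NOT }

    interface Query {
        boolean matches(Set<String> doc);
    }

    static final class Clause {
        final Occur occur;
        final Query query;
        Clause(Occur occur, Query query) { this.occur = occur; this.query = query; }
    }

    static Query term(final String text) {
        return new Query() {
            public boolean matches(Set<String> doc) { return doc.contains(text); }
        };
    }

    static Clause must(Query q)    { return new Clause(Occur.MUST, q); }
    static Clause should(Query q)  { return new Clause(Occur.SHOULD, q); }
    static Clause mustNot(Query q) { return new Clause(Occur.MUST_NOT, q); }

    static Query bool(final Clause... clauses) {
        return new Query() {
            public boolean matches(Set<String> doc) {
                boolean hasMust = false;
                boolean shouldMatched = false;
                for (Clause c : clauses) {
                    boolean m = c.query.matches(doc);
                    switch (c.occur) {
                        case MUST:
                            hasMust = true;
                            if (!m) return false; // every required clause must match
                            break;
                        case MUST_NOT:
                            if (m) return false;  // no prohibited clause may match
                            break;
                        case SHOULD:
                            shouldMatched |= m;
                            break;
                    }
                }
                // With no MUST clause, at least one SHOULD clause has to match,
                // so a purely prohibited query such as (-dog) matches nothing.
                return hasMust || shouldMatched;
            }
        };
    }

    static Set<String> doc(String... terms) {
        return new HashSet<String>(Arrays.asList(terms));
    }

    public static void main(String[] args) {
        Set<String> catDoc = doc("cat");
        Query flat = bool(must(term("cat")), mustNot(term("dog"))); // +cat -dog
        Query nested = bool(must(bool(must(term("cat")))),          // +(+cat)
                            must(bool(mustNot(term("dog")))));      // +(-dog)
        System.out.println(flat.matches(catDoc));   // true
        System.out.println(nested.matches(catDoc)); // false
    }
}
```

In this model the flat form +cat -dog, which the message reports as working, keeps the prohibited clause beside a required one, so it still selects documents.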

[jira] Created: (LUCENE-530) Extend NumberTools to support int/long/float/double to string

2006-03-22 Thread Andy Hind (JIRA)
Extend NumberTools to support int/long/float/double to string 
--

 Key: LUCENE-530
 URL: http://issues.apache.org/jira/browse/LUCENE-530
 Project: Lucene - Java
Type: Improvement
  Components: Analysis  
Versions: 1.9
Reporter: Andy Hind
Priority: Minor


Extend NumberTools to support int/long/float/double to string

So you can search using range queries on int/long/float/double, if you want.

Here is the basis for how NumberTools could be extended to support
int/long/double/float.
As I only write these values to the index and fix tokenisation in searches, I
was not so fussed about the reverse transformations back to Strings.



public class NumericEncoder
{
/*
 * Constants for integer encoding
 */

static final int INTEGER_SIGN_MASK = 0x80000000;

/*
 * Constants for long encoding
 */

static final long LONG_SIGN_MASK = 0x8000000000000000L;

/*
 * Constants for float encoding
 */

static final int FLOAT_SIGN_MASK = 0x80000000;

static final int FLOAT_EXPONENT_MASK = 0x7F800000;

static final int FLOAT_MANTISSA_MASK = 0x007FFFFF;

/*
 * Constants for double encoding
 */

static final long DOUBLE_SIGN_MASK = 0x8000000000000000L;

static final long DOUBLE_EXPONENT_MASK = 0x7FF0000000000000L;

static final long DOUBLE_MANTISSA_MASK = 0x000FFFFFFFFFFFFFL;

private NumericEncoder()
{
super();
}

/**
 * Encode an integer into a string that orders correctly using string
 * comparison. Integer.MIN_VALUE encodes as 00000000 and MAX_VALUE as
 * ffffffff.
 * 
 * @param intToEncode
 * @return
 */
public static String encode(int intToEncode)
{
int replacement = intToEncode ^ INTEGER_SIGN_MASK;
return encodeToHex(replacement);
}

/**
 * Encode a long into a string that orders correctly using string comparison.
 * Long.MIN_VALUE encodes as 0000000000000000 and MAX_VALUE as
 * ffffffffffffffff.
 * 
 * @param longToEncode
 * @return
 */
public static String encode(long longToEncode)
{
long replacement = longToEncode ^ LONG_SIGN_MASK;
return encodeToHex(replacement);
}

/**
 * Encode a float into a string that orders correctly according to string
 * comparison. Note that there is no negative NaN but there are encodings that
 * imply this. So NaN and -Infinity may not compare as expected.
 * 
 * @param floatToEncode
 * @return
 */
public static String encode(float floatToEncode)
{
int bits = Float.floatToIntBits(floatToEncode);
int sign = bits & FLOAT_SIGN_MASK;
int exponent = bits & FLOAT_EXPONENT_MASK;
int mantissa = bits & FLOAT_MANTISSA_MASK;
if (sign != 0)
{
exponent ^= FLOAT_EXPONENT_MASK;
mantissa ^= FLOAT_MANTISSA_MASK;
}
sign ^= FLOAT_SIGN_MASK;
int replacement = sign | exponent | mantissa;
return encodeToHex(replacement);
}

/**
 * Encode a double into a string that orders correctly according to string
 * comparison. Note that there is no negative NaN but there are encodings that
 * imply this. So NaN and -Infinity may not compare as expected.
 * 
 * @param doubleToEncode
 * @return
 */
public static String encode(double doubleToEncode)
{
long bits = Double.doubleToLongBits(doubleToEncode);
long sign = bits & DOUBLE_SIGN_MASK;
long exponent = bits & DOUBLE_EXPONENT_MASK;
long mantissa = bits & DOUBLE_MANTISSA_MASK;
if (sign != 0)
{
exponent ^= DOUBLE_EXPONENT_MASK;
mantissa ^= DOUBLE_MANTISSA_MASK;
}
sign ^= DOUBLE_SIGN_MASK;
long replacement = sign | exponent | mantissa;
return encodeToHex(replacement);
}

private static String encodeToHex(int i)
{
char[] buf = new char[] { '0', '0', '0', '0', '0', '0', '0', '0' };
int charPos = 8;
do
{
buf[--charPos] = DIGITS[i & MASK];
i >>>= 4;
}
while (i != 0);
return new String(buf);
}

private static String encodeToHex(long l)
{
char[] buf = new char[] { '0', '0', '0', '0', '0', '0', '0', '0', '0', 
'0', '0', '0', '0', '0', '0', '0' };
int charPos = 16;
do
{
buf[--charPos] = DIGITS[(int) l & MASK];
l >>>= 4;
}
while (l != 0);
return new String(buf);
}

private static final char[] DIGITS = { '0', '1', '2', '3', '4', '5', '6', 
'7', '8', '9', 'a', 'b', 'c', 'd', 'e',
'f' };

private static final int MASK = (1 << 4) - 1;
}
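A compact, self-contained sketch of the same sign-bit transformation, written with String.format instead of the hand-rolled hex loop, plus a check that the lexicographic order of the encodings matches numeric order. The class name SortableHexSketch is hypothetical and not part of Lucene's NumberTools.

```java
import java.util.Arrays;

// Hypothetical sketch of the encoding above: fixed-width lowercase hex
// whose String order matches numeric order.
public class SortableHexSketch {

    // Flipping the sign bit maps MIN_VALUE..MAX_VALUE onto 0..2^32-1 unsigned.
    static String encode(int i) {
        return String.format("%08x", i ^ Integer.MIN_VALUE);
    }

    static String encode(long l) {
        return String.format("%016x", l ^ Long.MIN_VALUE);
    }

    // Negative floats: invert exponent and mantissa so larger magnitudes sort
    // lower, then flip the sign bit so all negatives sort before positives.
    static String encode(float f) {
        int bits = Float.floatToIntBits(f);
        if (bits < 0) {
            bits ^= 0x7FFFFFFF;
        }
        return String.format("%08x", bits ^ Integer.MIN_VALUE);
    }

    static String encode(double d) {
        long bits = Double.doubleToLongBits(d);
        if (bits < 0) {
            bits ^= 0x7FFFFFFFFFFFFFFFL;
        }
        return String.format("%016x", bits ^ Long.MIN_VALUE);
    }

    /** True if sorting the encodings gives the same order as sorting the floats. */
    static boolean ordersLikeNumbers(float[] values) {
        float[] sorted = values.clone();
        Arrays.sort(sorted); // total order: -0.0f before 0.0f, NaN last
        String[] encoded = new String[values.length];
        for (int i = 0; i < values.length; i++) {
            encoded[i] = encode(values[i]);
        }
        Arrays.sort(encoded);
        for (int i = 0; i < sorted.length; i++) {
            if (!encoded[i].equals(encode(sorted[i]))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(encode(Integer.MIN_VALUE)); // 00000000
        System.out.println(encode(0));                 // 80000000
        System.out.println(encode(Integer.MAX_VALUE)); // ffffffff
        System.out.println(ordersLikeNumbers(new float[] {
            3.5f, -1.0f, 0.0f, Float.NEGATIVE_INFINITY, -2.25f, 1e-30f, Float.MAX_VALUE
        })); // true
    }
}
```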

public class NumericEncodingTest extends TestCase
{

public NumericEncodingTest()
{
super();
}

public NumericEncodingTest(String arg0)
{
sup

[jira] Commented: (LUCENE-530) Extend NumberTools to support int/long/float/double to string

2006-03-22 Thread Yonik Seeley (JIRA)
[ 
http://issues.apache.org/jira/browse/LUCENE-530?page=comments#action_12371446 ] 

Yonik Seeley commented on LUCENE-530:
-

Here is how Solr did it:
http://svn.apache.org/viewcvs.cgi/incubator/solr/trunk/src/java/org/apache/solr/util/NumberUtils.java?rev=382610&view=markup

It's a binary representation transformed to sort correctly and fit into chars.
A 4-byte int or float is transformed into 3 Java chars.
An 8-byte long or double is transformed into 5 Java chars.
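To illustrate packing a sortable binary value into a few Java chars, here is a sketch under stated assumptions: it splits the sign-flipped int into 8 + 12 + 12 bits and the long into 4 + 4 x 15 bits, which keeps every char below the surrogate range (0xD800), an assumed design goal. This split is an assumption for illustration and not necessarily the exact layout in Solr's NumberUtils; the class name CharPackingSketch is hypothetical.

```java
import java.util.Arrays;

// Hypothetical sketch (not Solr's NumberUtils): pack a sign-flipped value into
// a few chars whose String order matches numeric order.
public class CharPackingSketch {

    // int -> 3 chars: top 8 bits, then two 12-bit chunks, most significant first.
    static String encode(int val) {
        int u = val ^ Integer.MIN_VALUE; // flip sign bit: unsigned order == signed order
        return new String(new char[] {
            (char) (u >>> 24),
            (char) ((u >>> 12) & 0xFFF),
            (char) (u & 0xFFF)
        });
    }

    // long -> 5 chars: top 4 bits, then four 15-bit chunks, most significant first.
    static String encode(long val) {
        long u = val ^ Long.MIN_VALUE;
        return new String(new char[] {
            (char) (u >>> 60),
            (char) ((u >>> 45) & 0x7FFF),
            (char) ((u >>> 30) & 0x7FFF),
            (char) ((u >>> 15) & 0x7FFF),
            (char) (u & 0x7FFF)
        });
    }

    // float -> int that sorts like the float (negatives: invert all but the sign bit).
    static String encode(float f) {
        int bits = Float.floatToIntBits(f);
        return encode(bits < 0 ? bits ^ 0x7FFFFFFF : bits);
    }

    public static void main(String[] args) {
        int[] ints = { 42, Integer.MIN_VALUE, -7, 0, Integer.MAX_VALUE, -100000 };
        String[] encoded = new String[ints.length];
        for (int i = 0; i < ints.length; i++) {
            encoded[i] = encode(ints[i]);
        }
        Arrays.sort(ints);
        Arrays.sort(encoded); // String order compares UTF-16 code units
        boolean sameOrder = true;
        for (int i = 0; i < ints.length; i++) {
            sameOrder &= encoded[i].equals(encode(ints[i]));
        }
        System.out.println(sameOrder);           // true
        System.out.println(encode(0).length());  // 3
        System.out.println(encode(0L).length()); // 5
    }
}
```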


RE: [jira] Created: (LUCENE-529) TermInfosReader and other + instance ThreadLocal => transient/odd memory leaks => OutOfMemoryException

2006-03-22 Thread Andy Hind

For every IndexReader that is opened
- there is one SegmentReader for every segment in the index 
   - with its thread local
   - for each of these there is a TermInfosReader + its thread local.

So I get 2 * (number of index segments) thread locals per IndexReader.

I am creating index readers for a main index and for transactional updates,
and layering the two. At the moment this is an issue: under stress testing
(Tomcat with thread pooling, a pretty big changing index, left running for a
few hours) it blows up.

Thread locals are also used in other areas of the app.

It would be better if threads were created and destroyed!

It is certainly not insignificant for me and gives a JVM that creeps up
in size pretty steadily over time.

I have fixed this issue locally in the code and it works.

Regards

Andy

 


-Original Message-
From: Robert Engels [mailto:[EMAIL PROTECTED] 
Sent: 22 March 2006 17:46
To: java-dev@lucene.apache.org
Subject: RE: [jira] Created: (LUCENE-529) TermInfosReader and other +
instance ThreadLocal => transient/odd memory leaks =>
OutOfMemoryException


Re: query parsing

2006-03-22 Thread Daniel Naber
On Wednesday 22 March 2006 18:49, Robert Engels wrote:

> If I use
>
> (+cat) AND (-dog)
>
> it parses to
>
> +(+cat) +(-dog)
>
> and returns no results.
>
> Is this a known issue?

Basically yes. QueryParser is known to exhibit strange behavior when 
combining +/- and AND/OR/NOT.

Regards
 Daniel

-- 
http://www.danielnaber.de




RE: [jira] Created: (LUCENE-529) TermInfosReader and other + instance ThreadLocal => transient/odd memory leaks => OutOfMemoryException

2006-03-22 Thread Robert Engels
Creating and destroying threads is one of the worst performing operations,
and should be avoided at ALMOST all costs.

I do not see this problem in my server impl of Lucene, internally
multithreaded, and accessed via multiple threads from a Tomcat server. I
have to assume many (most?) users of Lucene are doing so in a multithreaded
server environment.

I reviewed the bugs at java.sun related to memory leaks with ThreadLocals.
I don't think any of them apply in this case.

Maybe you could provide a simplified ThreadLocal test case that demonstrates
the 'out of memory' condition?

Are you sure that you do not have a modified version of Lucene that somehow
maintains a reference back to the ThreadLocal from the ThreadLocal's value?
That is a known JDK issue:
http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6254531
I don't see this bug as being applicable to the 1.9.1 or 1.4.3 code.

Did you try running your server using 1.4.3? (our server code is based off
the 1.4.3 codeset at this time).


-Original Message-
From: Andy Hind [mailto:[EMAIL PROTECTED]
Sent: Wednesday, March 22, 2006 12:48 PM
To: java-dev@lucene.apache.org; [EMAIL PROTECTED]
Subject: RE: [jira] Created: (LUCENE-529) TermInfosReader and other +
instance ThreadLocal => transient/odd memory leaks =>
OutOfMemoryException



For every IndexReader that is opened
- there is one SegmentReader for every segment in the index
   - with its thread local
   - for each of these there is a TermInfosReader + its thread local.

So I get 2 * (no of index segments) thread locals.

I am creating index readers for a main index and transactional updates
and layering the two. At the moment this is an issue, under stress
testing, using tomcat, with thread pooling, with a pretty big changing
index, left running for a few hours, it blows up.

Thread locals are also used in other areas of the app.

It would be better if threads were created and destroyed!

It is certainly not insignificant for me and gives a JVM that creeps up
in size pretty steadily over time.

I have fixed this issue locally in the code and it works.

Regards

Andy




-Original Message-
From: Robert Engels [mailto:[EMAIL PROTECTED]
Sent: 22 March 2006 17:46
To: java-dev@lucene.apache.org
Subject: RE: [jira] Created: (LUCENE-529) TermInfosReader and other +
instance ThreadLocal => transient/odd memory leaks =>
OutOfMemoryException

There was a small mistake - there is a single TermInfoReader per
segment.

-Original Message-
From: Robert Engels [mailto:[EMAIL PROTECTED]
Sent: Wednesday, March 22, 2006 11:37 AM
To: java-dev@lucene.apache.org
Subject: RE: [jira] Created: (LUCENE-529) TermInfosReader and other +
instance ThreadLocal => transient/odd memory leaks =>
OutOfMemoryException


There is only a single TermInfoReader per index. In order to share this
instance with multiple threads, and avoid the overhead of creating new
enumerators for each request, the enumerator for the thread is stored in
a thread local. Normally, in a server application, threads are pooled,
so new threads are not constantly created and destroyed, so the memory
leak is insiginificant.

The same reasoning holds true for the SegmentReader class.


-Original Message-
From: Andy Hind (JIRA) [mailto:[EMAIL PROTECTED]
Sent: Wednesday, March 22, 2006 11:07 AM
To: java-dev@lucene.apache.org
Subject: [jira] Created: (LUCENE-529) TermInfosReader and other +
instance ThreadLocal => transient/odd memory leaks =>
OutOfMemoryException


TermInfosReader and other + instance ThreadLocal => transient/odd memory
leaks =>  OutOfMemoryException



 Key: LUCENE-529
 URL: http://issues.apache.org/jira/browse/LUCENE-529
 Project: Lucene - Java
Type: Bug
  Components: Index
Versions: 1.9
 Environment: Lucene 1.4.3 with 1.5.0_04 JVM or newer; will apply to
1.9 code
Reporter: Andy Hind


TermInfosReader uses an instance-level ThreadLocal for enumerators.
This is a transient/odd memory leak in Lucene 1.4.3-1.9 and applies to
current JVMs; it is not just the old-JVM issue described in the finalizer
of the 1.9 code.

There is also an instance-level thread local in SegmentReader which
will have the same issue.
There may be other uses which also need to be fixed.

I don't understand the intended use for these variables. However:

Each ThreadLocal has its own hashcode used for lookup (see the
ThreadLocal source code). Each instance of TermInfosReader creates its own
thread local. All this does is create a per-thread value on each thread
that accesses the thread local. Setting it to null in the finalizer will set
it to null on one thread only, the finalizer thread, where it was never
created. There is no point to this :-(
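The point about the finalizer can be checked with a small sketch that assumes nothing about Lucene itself: a `ThreadLocal` value belongs to the thread that set it, so `set(null)` called from another thread (as the finalizer thread would) leaves the owning thread's value in place, while clearing on the owning thread, for instance from `close()`, does work. The class and method names are hypothetical, a static `ThreadLocal` is used only for brevity, and the lambda assumes Java 8+ (`ThreadLocal.remove()` itself exists from Java 5).

```java
/**
 * Sketch: a ThreadLocal value is only visible to, and only clearable by,
 * the thread that set it. Names are illustrative, not Lucene's.
 */
public class ThreadLocalOwnership {

    static final ThreadLocal<String> ENUM = new ThreadLocal<>();

    /** Runs the task on a fresh thread (standing in for the finalizer thread) and waits. */
    static void runOnOtherThread(Runnable task) {
        Thread t = new Thread(task, "pretend-finalizer");
        t.start();
        try {
            t.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    /** Clears from a different thread, then reports what the calling thread still sees. */
    static String clearFromOtherThread() {
        ENUM.set("enumerator for " + Thread.currentThread().getName());
        runOnOtherThread(() -> ENUM.set(null)); // only touches the other thread's slot
        return ENUM.get(); // still non-null on the calling thread
    }

    /** Clears on the same thread that created the value, e.g. from close(). */
    static String clearOnOwningThread() {
        ENUM.set("enumerator");
        ENUM.remove(); // drops this thread's entry entirely
        return ENUM.get();
    }

    public static void main(String[] args) {
        System.out.println(clearFromOtherThread() != null); // true
        System.out.println(clearOnOwningThread());         // null
    }
}
```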

I assume there is a good concurrency reason why an instance variable cannot
be used...

I have not used multi-threaded searching, but I

[jira] Commented: (LUCENE-529) TermInfosReader and other + instance ThreadLocal => transient/odd memory leaks => OutOfMemoryException

2006-03-22 Thread Otis Gospodnetic (JIRA)
[ 
http://issues.apache.org/jira/browse/LUCENE-529?page=comments#action_12371463 ] 

Otis Gospodnetic commented on LUCENE-529:
-

This sounds like something that should be reproducible as a JUnit 
test.  Could you write one and attach it to this issue?  We could then clearly see 
the OOM problems and how your changes (patch?) fix them.

Also, you said you made your changes locally and things work now.  Have you 
been running your locally-modified code in a serious production environment for 
days/weeks, and have you observed any side-effects?

> TermInfosReader and other + instance ThreadLocal => transient/odd memory 
> leaks =>  OutOfMemoryException
> ---
>
>  Key: LUCENE-529
>  URL: http://issues.apache.org/jira/browse/LUCENE-529
>  Project: Lucene - Java
> Type: Bug
>   Components: Index
> Versions: 1.9
>  Environment: Lucene 1.4.3 with 1.5.0_04 JVM or newer; will apply to 1.9 
> code 
> Reporter: Andy Hind

>
> TermInfosReader uses an instance-level ThreadLocal for enumerators.
> This is a transient/odd memory leak in Lucene 1.4.3-1.9 and applies to 
> current JVMs; it is not just the old-JVM issue described in the finalizer
> of the 1.9 code.
> There is also an instance-level thread local in SegmentReader which will 
> have the same issue.
> There may be other uses which also need to be fixed.
> I don't understand the intended use for these variables. However:
> Each ThreadLocal has its own hashcode used for lookup (see the ThreadLocal 
> source code). Each instance of TermInfosReader creates its own thread local.
> All this does is create a per-thread value on each thread that accesses the
> thread local. Setting it to null in the finalizer will set it to null on one
> thread only, the finalizer thread, where it was never created.  There is no
> point to this :-(
> I assume there is a good concurrency reason why an instance variable cannot 
> be used...
> I have not used multi-threaded searching, but I have used a lot of threads 
> each making searchers and searching.
> 1.4.3 has a clear memory leak caused by this thread local. The use case 
> above is definitely solved by setting the thread local to null in 
> close(). This at least has a chance of being on the correct thread :-) 
> I know reusing Searchers would help, but that is my choice and I will get to 
> that later.
> Now you want to know why...
> Thread locals are stored in a table of entries. Each entry holds a *weak 
> reference* to the key (here the ThreadLocal held by the TermInfosReader 
> instance) and a *strong reference* to the thread-local value. When the 
> instance is GCed, the entry's key becomes null.
> This is now a stale entry in the table.
> Stale entries are cleared up in an ad hoc way and until they are cleared up 
> the value will not be garbage collected.
> Until the instance is GCed it is a valid key and its presence may cause the 
> table to expand.
> See the ThreadLocal code.
> So if you have lots of threads, all creating thread locals rapidly, you can 
> get each thread holding a large table of thread locals which all contain many 
> stale entries and preventing some objects from being garbage collected. 
> The limited GC of the thread local table is not enough to save you from 
> running out of memory.  
> Summary:
> 
> - remove finalize()
> - set the thread local to null in close() 
>   - values will be available for gc 
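The stale-entry mechanism described in the quoted report can be modelled in a few lines. This is a deliberately simplified model, not the JDK's `ThreadLocalMap`: there, the weak key is the `ThreadLocal` object itself and stale entries are expunged opportunistically as the table is accessed. Here a garbage-collected key is simulated deterministically with `WeakReference.clear()`, and `StaleEntryModel`, `Entry` and `expungeStale` are hypothetical names.

```java
import java.lang.ref.WeakReference;
import java.util.ArrayList;
import java.util.List;

/**
 * Simplified model (NOT the JDK's ThreadLocalMap) of a per-thread table whose
 * entries hold a weak reference to the key and a strong reference to the value.
 * GC of a key is simulated deterministically with WeakReference.clear().
 */
public class StaleEntryModel {

    static final class Entry extends WeakReference<Object> {
        Object value; // strong: stays reachable until the entry itself is expunged

        Entry(Object key, Object value) {
            super(key);
            this.value = value;
        }
    }

    final List<Entry> table = new ArrayList<>();

    Entry put(Object key, Object value) {
        Entry e = new Entry(key, value);
        table.add(e);
        return e;
    }

    /** Counts entries whose key is gone but whose value is still held. */
    int staleValuesHeld() {
        int n = 0;
        for (Entry e : table) {
            if (e.get() == null && e.value != null) n++;
        }
        return n;
    }

    /** Cleanup step; in the JDK this happens only now and then, as a side effect. */
    void expungeStale() {
        table.removeIf(e -> e.get() == null);
    }

    public static void main(String[] args) {
        StaleEntryModel perThread = new StaleEntryModel();
        for (int i = 0; i < 3; i++) {
            // Each "reader" contributes its own key; pretend each one is then GCed.
            perThread.put(new Object(), new byte[1024]).clear();
        }
        System.out.println(perThread.staleValuesHeld()); // 3
        perThread.expungeStale();
        System.out.println(perThread.staleValuesHeld()); // 0
    }
}
```

Until an entry is expunged, its value stays strongly reachable from the thread, which is why clearing the value explicitly in `close()` on the owning thread, as the summary proposes, lets it be collected.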




RE: query parsing

2006-03-22 Thread Robert Engels
Any suggestions on what to do then, as the following query exhibits the same 
behavior

(+cat) (-dog)

This is due to the implied AND. Removing the parentheses allows it to work. It
doesn't seem that adding parentheses in this case should cause the query to fail???

Doesn't this suggest a bug: the BooleanQuery scorer is not handling the case
of a REQUIRED clause that is itself a BooleanQuery consisting of a single
prohibited boolean clause?
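Why `+(+cat) +(-dog)` returns nothing can be seen with a toy model of Boolean clause semantics. This is an illustration written for this note, not Lucene's scorer or API (the `Occur` names only mirror Lucene's): a clause list with no positive clause matches no documents, mirroring the long-standing Lucene behaviour that a purely negative BooleanQuery matches nothing, so the required subquery `(-dog)` can never be satisfied.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

/**
 * Toy model of Boolean clause semantics, for illustration only
 * (not Lucene's scorer or API). A document matches when every MUST clause
 * matches, no MUST_NOT clause matches, and, if there is no MUST clause,
 * at least one SHOULD clause matches. A purely negative list matches nothing.
 */
public class PureNegativeClause {

    enum Occur { MUST, SHOULD, MUST_NOT }

    /** A query is just a predicate over a document's set of terms. */
    interface Query { boolean matches(Set<String> doc); }

    static final class Clause {
        final Query query;
        final Occur occur;

        Clause(Query query, Occur occur) { this.query = query; this.occur = occur; }
    }

    static Clause c(Query q, Occur o) { return new Clause(q, o); }

    static Query term(String t) { return doc -> doc.contains(t); }

    static Query bool(Clause... clauses) {
        return doc -> {
            boolean anyMust = false;
            boolean anyShouldMatched = false;
            for (Clause cl : clauses) {
                boolean m = cl.query.matches(doc);
                switch (cl.occur) {
                    case MUST:
                        if (!m) return false;
                        anyMust = true;
                        break;
                    case MUST_NOT:
                        if (m) return false;
                        break;
                    case SHOULD:
                        anyShouldMatched |= m;
                        break;
                }
            }
            // True only if some positive clause (MUST or SHOULD) was satisfied.
            return anyMust || anyShouldMatched;
        };
    }

    /** +(+cat) +(-dog), i.e. what "(+cat) AND (-dog)" parses to. */
    static Query parenthesized() {
        return bool(c(bool(c(term("cat"), Occur.MUST)), Occur.MUST),
                    c(bool(c(term("dog"), Occur.MUST_NOT)), Occur.MUST));
    }

    /** +cat -dog, the form without parentheses. */
    static Query flat() {
        return bool(c(term("cat"), Occur.MUST), c(term("dog"), Occur.MUST_NOT));
    }

    public static void main(String[] args) {
        Set<String> catOnly = new HashSet<>(Arrays.asList("cat"));
        System.out.println(parenthesized().matches(catOnly)); // false
        System.out.println(flat().matches(catOnly));          // true
    }
}
```

In this model the parenthesized form fails because the inner `(-dog)` has nothing positive to subtract from; the usual remedy is to give such a negative subquery a positive clause, or to keep the prohibited clause at the top level as in the flat form.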

-Original Message-
From: Daniel Naber [mailto:[EMAIL PROTECTED]
Sent: Wednesday, March 22, 2006 1:03 PM
To: java-dev@lucene.apache.org
Subject: Re: query parsing


On Wednesday 22 March 2006 18:49, Robert Engels wrote:

> If I use
>
> (+cat) AND (-dog)
>
> it parses to
>
> +(+cat) +(-dog)
>
> and returns no results.
>
> Is this a known issue?

Basically yes. QueryParser is known to exhibit strange behavior when 
combining +/- and AND/OR/NOT.

Regards
 Daniel

-- 
http://www.danielnaber.de
