[jira] Commented: (LUCENE-500) Lucene 2.0 requirements - Remove all deprecated code
[ http://issues.apache.org/jira/browse/LUCENE-500?page=comments#action_12371389 ] paul.elschot commented on LUCENE-500: - Here, at revision 387786, the target common.compile-test fails because a few previously deprecated methods are still being used in test code (Field creation and BooleanQuery.add() in TestKipping* and TestRewriting). Is there someone looking into this, or shall I provide a patch? > Lucene 2.0 requirements - Remove all deprecated code > > > Key: LUCENE-500 > URL: http://issues.apache.org/jira/browse/LUCENE-500 > Project: Lucene - Java > Type: Task > Versions: 1.9 > Reporter: Grant Ingersoll > Attachments: deprecation.txt, deprecation2.txt > > Per the move to Lucene 2.0 from 1.9, remove all deprecated code and update > documentation, etc. > Patch to follow shortly. -- This message is automatically generated by JIRA. - If you think it was sent incorrectly contact one of the administrators: http://issues.apache.org/jira/secure/Administrators.jspa - For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
[jira] Commented: (LUCENE-500) Lucene 2.0 requirements - Remove all deprecated code
[ http://issues.apache.org/jira/browse/LUCENE-500?page=comments#action_12371392 ] paul.elschot commented on LUCENE-500: - Oops, I checked svn status for the test code, and the two tests that cause these compiler errors never made it into the trunk, so please ignore this.
[jira] Created: (LUCENE-529) TermInfosReader and other + instance ThreadLocal => transient/odd memory leaks => OutOfMemoryException
TermInfosReader and other + instance ThreadLocal => transient/odd memory leaks => OutOfMemoryException
Key: LUCENE-529
URL: http://issues.apache.org/jira/browse/LUCENE-529
Project: Lucene - Java
Type: Bug
Components: Index
Versions: 1.9
Environment: Lucene 1.4.3 with 1.5.0_04 JVM or newer; will apply to 1.9 code
Reporter: Andy Hind

TermInfosReader uses an instance-level ThreadLocal for enumerators. This causes a transient/odd memory leak in Lucene 1.4.3-1.9 and applies to current JVMs; it is not just an old JVM issue, as the finalizer of the 1.9 code describes it. There is also an instance-level thread local in SegmentReader, which will have the same issue. There may be other uses which also need to be fixed.

I don't understand the intended use for these variables. However, each ThreadLocal has its own hashcode used for lookup; see the ThreadLocal source code. Each instance of TermInfosReader creates its own instance of the thread local. All this does is create a per-thread variable on each thread when it accesses the thread local. Setting it to null in the finalizer sets it to null on one thread only, the finalizer thread, where it has never been created. There is no point to this :-( I assume there is a good concurrency reason why an instance variable cannot be used...

I have not used multi-threaded searching, but I have used a lot of threads, each making searchers and searching. 1.4.3 has a clear memory leak caused by this thread local. This use case is definitely solved by setting the thread local to null in close(). This at least has a chance of being on the correct thread :-) I know reusing Searchers would help, but that is my choice and I will get to that later.

Now you want to know why. Thread locals are stored in a table of entries. Each entry is a *weak reference* to the key (here the ThreadLocal owned by the TermInfosReader instance) and a *simple reference* to the thread local value. When the instance is GCed, its key becomes null. This is now a stale entry in the table. Stale entries are cleared up in an ad hoc way, and until they are cleared up the value will not be garbage collected. Until the instance is GCed it is a valid key and its presence may cause the table to expand. See the ThreadLocal code.

So if you have lots of threads, all creating thread locals rapidly, each thread can end up holding a large table of thread locals containing many stale entries, which prevents some objects from being garbage collected. The limited GC of the thread local table is not enough to save you from running out of memory.

Summary:
- remove the finalize() method
- set the thread local to null in close()
- values will be available for gc
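The point about the finalizer thread can be checked with a small, self-contained sketch. It assumes a hypothetical DemoReader class standing in for TermInfosReader (it is not the Lucene code, and it uses modern Java syntax for brevity). It shows that clearing an instance-level ThreadLocal from another thread leaves the owning thread's value in place, while clearing it in close() on the owning thread releases it.

```java
// Hypothetical stand-in for a reader that caches per-thread state in an
// instance-level ThreadLocal. Illustration only, not the Lucene implementation.
final class DemoReader {
    private final ThreadLocal<Object> enumerators = new ThreadLocal<>();

    /** Returns the calling thread's cached enumerator, creating it on first use. */
    Object enumerator() {
        Object e = enumerators.get();
        if (e == null) {
            e = new Object();
            enumerators.set(e);
        }
        return e;
    }

    /** True if the calling thread currently has a cached value. */
    boolean hasValueOnThisThread() {
        return enumerators.get() != null;
    }

    /** Clears the cached value for the calling thread only. */
    void close() {
        enumerators.set(null);
    }
}

final class ThreadLocalCloseDemo {
    /** Returns {valueSurvivesForeignClear, valueGoneAfterOwnClose}. */
    static boolean[] run() {
        final DemoReader reader = new DemoReader();
        final boolean[] result = new boolean[2];
        Thread worker = new Thread(() -> {
            reader.enumerator();                        // worker thread caches a value
            Thread foreign = new Thread(reader::close); // plays the role of the finalizer thread
            foreign.start();
            joinQuietly(foreign);
            result[0] = reader.hasValueOnThisThread();  // the worker's value is untouched
            reader.close();                             // close() on the owning thread
            result[1] = !reader.hasValueOnThisThread(); // now the value is released
        });
        worker.start();
        joinQuietly(worker);
        return result;
    }

    private static void joinQuietly(Thread t) {
        try {
            t.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        boolean[] r = run();
        System.out.println("survives clear from another thread: " + r[0]);
        System.out.println("cleared by close() on owning thread: " + r[1]);
    }
}
```

Note that close() only helps the thread that calls it; any other thread that used the reader still holds its own value until its table entry is cleaned up.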
RE: [jira] Created: (LUCENE-529) TermInfosReader and other + instance ThreadLocal => transient/odd memory leaks => OutOfMemoryException
There is only a single TermInfosReader per index. In order to share this instance with multiple threads, and avoid the overhead of creating new enumerators for each request, the enumerator for the thread is stored in a thread local. Normally, in a server application, threads are pooled, so new threads are not constantly created and destroyed, and the memory leak is insignificant. The same reasoning holds true for the SegmentReader class.

-----Original Message-----
From: Andy Hind (JIRA) [mailto:[EMAIL PROTECTED]
Sent: Wednesday, March 22, 2006 11:07 AM
To: java-dev@lucene.apache.org
Subject: [jira] Created: (LUCENE-529) TermInfosReader and other + instance ThreadLocal => transient/odd memory leaks => OutOfMemoryException
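The per-thread caching described in the reply above can be sketched as follows. SharedTermReader and TermCursor are hypothetical names chosen for illustration and do not come from Lucene. The sketch only shows the general pattern: one shared reader instance, with each thread lazily getting and then reusing its own mutable cursor.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical per-thread cursor: mutable state that is unsafe to share.
final class TermCursor {
    int position;
}

// Hypothetical shared reader: one instance serves every thread, and each
// thread gets its own cursor the first time it asks for one.
final class SharedTermReader {
    private final AtomicInteger created = new AtomicInteger();

    private final ThreadLocal<TermCursor> cursor = new ThreadLocal<TermCursor>() {
        @Override
        protected TermCursor initialValue() {
            created.incrementAndGet(); // count how many cursors were ever built
            return new TermCursor();
        }
    };

    TermCursor cursorForThisThread() {
        return cursor.get();
    }

    int cursorsCreated() {
        return created.get();
    }
}

final class SharedReaderDemo {
    /** Runs two threads against one reader and returns the number of cursors built. */
    static int run() {
        final SharedTermReader reader = new SharedTermReader();
        Runnable work = () -> {
            for (int i = 0; i < 1000; i++) {
                reader.cursorForThisThread().position++; // reuses this thread's cursor
            }
        };
        Thread a = new Thread(work);
        Thread b = new Thread(work);
        a.start();
        b.start();
        try {
            a.join();
            b.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return reader.cursorsCreated();
    }

    public static void main(String[] args) {
        System.out.println("cursors created: " + run());
    }
}
```

Two thousand lookups across two threads build only two cursors, which is the saving the pattern is after. The cost is that each cursor stays reachable from its thread for as long as the thread-local entry lives.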
RE: [jira] Created: (LUCENE-529) TermInfosReader and other + instance ThreadLocal => transient/odd memory leaks => OutOfMemoryException
There was a small mistake: there is a single TermInfosReader per segment.

-----Original Message-----
From: Robert Engels [mailto:[EMAIL PROTECTED]
Sent: Wednesday, March 22, 2006 11:37 AM
To: java-dev@lucene.apache.org
Subject: RE: [jira] Created: (LUCENE-529) TermInfosReader and other + instance ThreadLocal => transient/odd memory leaks => OutOfMemoryException
query parsing
Using lucene 1.4.3, if I use the query +cat AND -dog it parses to +cat -dog and works correctly. If I use (+cat) AND (-dog) it parses to +(+cat) +(-dog) and returns no results. Is this a known issue?
[jira] Created: (LUCENE-530) Extend NumberTools to support int/long/float/double to string
Extend NumberTools to support int/long/float/double to string
Key: LUCENE-530
URL: http://issues.apache.org/jira/browse/LUCENE-530
Project: Lucene - Java
Type: Improvement
Components: Analysis
Versions: 1.9
Reporter: Andy Hind
Priority: Minor

Extend NumberTools to support int/long/float/double to string, so you can search using range queries on int/long/float/double, if you want.

Here is the basis for how NumberTools could be extended to support int/long/double/float. As I only write these values to the index and fix tokenisation in searches, I was not so fussed about the reverse transformations back to Strings.

public class NumericEncoder
{
    /*
     * Constants for integer encoding
     */
    static int INTEGER_SIGN_MASK = 0x80000000;

    /*
     * Constants for long encoding
     */
    static long LONG_SIGN_MASK = 0x8000000000000000L;

    /*
     * Constants for float encoding
     */
    static int FLOAT_SIGN_MASK = 0x80000000;

    static int FLOAT_EXPONENT_MASK = 0x7F800000;

    static int FLOAT_MANTISSA_MASK = 0x007FFFFF;

    /*
     * Constants for double encoding
     */
    static long DOUBLE_SIGN_MASK = 0x8000000000000000L;

    static long DOUBLE_EXPONENT_MASK = 0x7FF0000000000000L;

    static long DOUBLE_MANTISSA_MASK = 0x000FFFFFFFFFFFFFL;

    private NumericEncoder()
    {
        super();
    }

    /**
     * Encode an integer into a string that orders correctly using string
     * comparison. Integer.MIN_VALUE encodes as 00000000 and MAX_VALUE as
     * ffffffff.
     *
     * @param intToEncode
     * @return
     */
    public static String encode(int intToEncode)
    {
        // Flipping the sign bit makes negative values sort before positive ones.
        int replacement = intToEncode ^ INTEGER_SIGN_MASK;
        return encodeToHex(replacement);
    }

    /**
     * Encode a long into a string that orders correctly using string comparison.
     * Long.MIN_VALUE encodes as 0000000000000000 and MAX_VALUE as
     * ffffffffffffffff.
     *
     * @param longToEncode
     * @return
     */
    public static String encode(long longToEncode)
    {
        long replacement = longToEncode ^ LONG_SIGN_MASK;
        return encodeToHex(replacement);
    }

    /**
     * Encode a float into a string that orders correctly according to string
     * comparison. Note that there is no negative NaN but there are codings that
     * imply this. So NaN and -Infinity may not compare as expected.
     *
     * @param floatToEncode
     * @return
     */
    public static String encode(float floatToEncode)
    {
        int bits = Float.floatToIntBits(floatToEncode);
        int sign = bits & FLOAT_SIGN_MASK;
        int exponent = bits & FLOAT_EXPONENT_MASK;
        int mantissa = bits & FLOAT_MANTISSA_MASK;
        if (sign != 0)
        {
            // Negative values: invert magnitude so larger magnitudes sort first.
            exponent ^= FLOAT_EXPONENT_MASK;
            mantissa ^= FLOAT_MANTISSA_MASK;
        }
        sign ^= FLOAT_SIGN_MASK;
        int replacement = sign | exponent | mantissa;
        return encodeToHex(replacement);
    }

    /**
     * Encode a double into a string that orders correctly according to string
     * comparison. Note that there is no negative NaN but there are codings that
     * imply this. So NaN and -Infinity may not compare as expected.
     *
     * @param doubleToEncode
     * @return
     */
    public static String encode(double doubleToEncode)
    {
        long bits = Double.doubleToLongBits(doubleToEncode);
        long sign = bits & DOUBLE_SIGN_MASK;
        long exponent = bits & DOUBLE_EXPONENT_MASK;
        long mantissa = bits & DOUBLE_MANTISSA_MASK;
        if (sign != 0)
        {
            exponent ^= DOUBLE_EXPONENT_MASK;
            mantissa ^= DOUBLE_MANTISSA_MASK;
        }
        sign ^= DOUBLE_SIGN_MASK;
        long replacement = sign | exponent | mantissa;
        return encodeToHex(replacement);
    }

    private static String encodeToHex(int i)
    {
        // Fixed width of 8 hex digits, left-padded with '0'.
        char[] buf = new char[] { '0', '0', '0', '0', '0', '0', '0', '0' };
        int charPos = 8;
        do
        {
            buf[--charPos] = DIGITS[i & MASK];
            i >>>= 4;
        }
        while (i != 0);
        return new String(buf);
    }

    private static String encodeToHex(long l)
    {
        // Fixed width of 16 hex digits, left-padded with '0'.
        char[] buf = new char[] { '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0' };
        int charPos = 16;
        do
        {
            buf[--charPos] = DIGITS[(int) l & MASK];
            l >>>= 4;
        }
        while (l != 0);
        return new String(buf);
    }

    private static final char[] DIGITS = { '0', '1', '2', '3', '4', '5', '6', '7', '8', '9', 'a', 'b', 'c', 'd', 'e', 'f' };

    private static final int MASK = (1 << 4) - 1;
}

public class NumericEncodingTest extends TestCase
{
    public NumericEncodingTest()
    {
        super();
    }

    public NumericEncodingTest(String arg0)
    {
        sup
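As a quick check of the idea in the proposal above, here is a compact sketch of the same sign-flip encoding for int and float. SortableHex is a hypothetical class name, and the full-width masks (0x80000000 for the sign bit) are assumed from the IEEE 754 layout rather than copied from the message. For a negative float, flipping exponent and mantissa and then the sign is the same as inverting all 32 bits, which is what the sketch does.

```java
// Hypothetical helper illustrating order-preserving hex encoding.
// Not part of Lucene's NumberTools.
final class SortableHex {
    private SortableHex() {
    }

    /** Flip the sign bit so that string order matches signed int order. */
    static String encode(int value) {
        return toHex8(value ^ 0x80000000);
    }

    /**
     * Non-negative floats: flip only the sign bit.
     * Negative floats: invert every bit, so larger magnitudes sort first.
     */
    static String encode(float value) {
        int bits = Float.floatToIntBits(value);
        bits = (bits < 0) ? ~bits : (bits ^ 0x80000000);
        return toHex8(bits);
    }

    /** Fixed-width, zero-padded, lower-case hex so that all strings have 8 chars. */
    private static String toHex8(int bits) {
        String hex = Integer.toHexString(bits);
        return "00000000".substring(hex.length()) + hex;
    }

    public static void main(String[] args) {
        int[] ints = { Integer.MIN_VALUE, -1, 0, 1, Integer.MAX_VALUE };
        for (int i : ints) {
            System.out.println(i + " -> " + encode(i));
        }
        float[] floats = { -1.5f, -0.5f, 0f, 0.5f, 2f };
        for (float f : floats) {
            System.out.println(f + " -> " + encode(f));
        }
    }
}
```

The fixed width matters as much as the bit flip: without zero padding, "9" would sort after "10" under string comparison.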
[jira] Commented: (LUCENE-530) Extend NumberTools to support int/long/float/double to string
[ http://issues.apache.org/jira/browse/LUCENE-530?page=comments#action_12371446 ] Yonik Seeley commented on LUCENE-530: - Here is how Solr did it: http://svn.apache.org/viewcvs.cgi/incubator/solr/trunk/src/java/org/apache/solr/util/NumberUtils.java?rev=382610&view=markup It's a binary representation transformed to sort correctly and fit into chars. A 4 byte int or float is transformed into 3 java chars. An 8 byte long or double is transformed into 5 java chars.
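To make the "4 bytes into 3 java chars" idea concrete, here is a minimal sketch. CompactSortable is a hypothetical class written for this illustration; the 8/12/12 bit split is an assumption and is not claimed to be what Solr's NumberUtils does. Every char stays at or below 0x0fff, so no surrogate values appear and String.compareTo orders the results like the original ints.

```java
// Hypothetical illustration of packing a 32-bit int into 3 chars that sort
// correctly. The bit split (8 + 12 + 12) is an assumption of this sketch.
final class CompactSortable {
    private CompactSortable() {
    }

    static String encode(int value) {
        int v = value ^ 0x80000000;       // flip the sign bit: unsigned order now equals signed order
        char[] out = {
            (char) (v >>> 24),            // top 8 bits
            (char) ((v >>> 12) & 0x0fff), // middle 12 bits
            (char) (v & 0x0fff)           // low 12 bits
        };
        return new String(out);
    }

    static int decode(String s) {
        int v = (s.charAt(0) << 24) | (s.charAt(1) << 12) | s.charAt(2);
        return v ^ 0x80000000;            // undo the sign-bit flip
    }

    public static void main(String[] args) {
        int[] samples = { -5, 0, 3 };
        for (int sample : samples) {
            String encoded = encode(sample);
            System.out.println(sample + " -> " + encoded.length() + " chars, decodes to " + decode(encoded));
        }
    }
}
```

Compared with the 8-character hex form, this trades readability for shorter terms; the encoded strings contain control characters, including U+0000, so they are not suitable for display.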
RE: [jira] Created: (LUCENE-529) TermInfosReader and other + instance ThreadLocal => transient/odd memory leaks => OutOfMemoryException
For every IndexReader that is opened, there is one SegmentReader for every segment in the index, each with its own thread local, and for each of these there is a TermInfosReader plus its thread local. So I get 2 * (number of index segments) thread locals. I am creating index readers for a main index and transactional updates and layering the two. At the moment this is an issue: under stress testing, using Tomcat, with thread pooling, with a pretty big changing index, left running for a few hours, it blows up. Thread locals are also used in other areas of the app. It would be better if threads were created and destroyed! It is certainly not insignificant for me and gives a JVM that creeps up in size pretty steadily over time. I have fixed this issue locally in the code and it works.

Regards
Andy

-----Original Message-----
From: Robert Engels [mailto:[EMAIL PROTECTED]
Sent: 22 March 2006 17:46
To: java-dev@lucene.apache.org
Subject: RE: [jira] Created: (LUCENE-529) TermInfosReader and other + instance ThreadLocal => transient/odd memory leaks => OutOfMemoryException
Re: query parsing
On Wednesday 22 March 2006 18:49, Robert Engels wrote:
> If I use
>
> (+cat) AND (-dog)
>
> it parses to
>
> +(+cat) +(-dog)
>
> and returns no results.
>
> Is this a known issue?

Basically yes. QueryParser is known to exhibit strange behavior when combining +/- and AND/OR/NOT.

Regards
Daniel

--
http://www.danielnaber.de
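One way to see why the parenthesised form returns nothing is a toy model of boolean matching. The classes below (ToyQuery, ToyTerm, ToyBoolean) are hypothetical and are not Lucene's API; the sketch ignores optional clauses and scoring, and it assumes the rule that a boolean query with only prohibited clauses matches no documents. Under that assumption the sub-query (-dog) selects nothing, so requiring it with + empties the whole result.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Toy query node: decides whether a document (a set of terms) matches. */
interface ToyQuery {
    boolean matches(Set<String> doc);
}

final class ToyTerm implements ToyQuery {
    private final String term;

    ToyTerm(String term) {
        this.term = term;
    }

    public boolean matches(Set<String> doc) {
        return doc.contains(term);
    }
}

/** Toy boolean node with required (+) and prohibited (-) clauses only. */
final class ToyBoolean implements ToyQuery {
    private final List<ToyQuery> required = new ArrayList<>();
    private final List<ToyQuery> prohibited = new ArrayList<>();

    ToyBoolean must(ToyQuery q) {
        required.add(q);
        return this;
    }

    ToyBoolean mustNot(ToyQuery q) {
        prohibited.add(q);
        return this;
    }

    public boolean matches(Set<String> doc) {
        // Assumption of this sketch: prohibited clauses only filter what the
        // positive clauses selected, so with no positive clause nothing matches.
        if (required.isEmpty()) {
            return false;
        }
        for (ToyQuery q : required) {
            if (!q.matches(doc)) {
                return false;
            }
        }
        for (ToyQuery q : prohibited) {
            if (q.matches(doc)) {
                return false;
            }
        }
        return true;
    }
}

final class ToyQueryDemo {
    static Set<String> doc(String... terms) {
        return new HashSet<>(Arrays.asList(terms));
    }

    /** Models the parsed form: +cat -dog */
    static ToyQuery flat() {
        return new ToyBoolean().must(new ToyTerm("cat")).mustNot(new ToyTerm("dog"));
    }

    /** Models the parsed form: +(+cat) +(-dog) */
    static ToyQuery nested() {
        ToyQuery catOnly = new ToyBoolean().must(new ToyTerm("cat"));
        ToyQuery notDog = new ToyBoolean().mustNot(new ToyTerm("dog"));
        return new ToyBoolean().must(catOnly).must(notDog);
    }

    public static void main(String[] args) {
        Set<String> catDoc = doc("cat");
        System.out.println("flat matches cat-only doc:   " + flat().matches(catDoc));
        System.out.println("nested matches cat-only doc: " + nested().matches(catDoc));
    }
}
```

In this model the flat form behaves as intended, while the nested form rejects even a document that contains only "cat".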
RE: [jira] Created: (LUCENE-529) TermInfosReader and other + instance ThreadLocal => transient/odd memory leaks => OutOfMemoryException
Creating and destroying threads is one of the worst-performing operations, and should be avoided at ALMOST all costs. I do not see this problem in my server implementation of Lucene, which is internally multithreaded and accessed via multiple threads from a Tomcat server. I have to assume many (most?) users of Lucene are doing so in a multithreaded server environment. I reviewed the bugs in java.sun related to memory leaks with ThreadLocals. I don't think any of them apply in this case. Maybe you could provide a simplified ThreadLocal test case that demonstrates the 'out of memory' condition? Are you sure that you do not have a modified version of Lucene that somehow maintains a reference back to the ThreadLocal from the ThreadLocal's value? This is a known JDK issue: http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6254531 I don't see this bug as being applicable to the 1.9.1 or 1.4.3 code. Did you try running your server using 1.4.3? (Our server code is based off the 1.4.3 codeset at this time.)

-----Original Message-----
From: Andy Hind [mailto:[EMAIL PROTECTED]
Sent: Wednesday, March 22, 2006 12:48 PM
To: java-dev@lucene.apache.org; [EMAIL PROTECTED]
Subject: RE: [jira] Created: (LUCENE-529) TermInfosReader and other + instance ThreadLocal => transient/odd memory leaks => OutOfMemoryException
[jira] Commented: (LUCENE-529) TermInfosReader and other + instance ThreadLocal => transient/odd memory leaks => OutOfMemoryException
[ http://issues.apache.org/jira/browse/LUCENE-529?page=comments#action_12371463 ]

Otis Gospodnetic commented on LUCENE-529:

This sounds like something that should be reproducible when written as a JUnit test. Could you write one and attach it to this issue? We could then clearly see the OOM problems and how your changes (patch?) fix them.

Also, you said you made your changes locally and things work now. Have you been running your locally-modified code in a serious production environment for days/weeks, and have you observed any side-effects?

> TermInfosReader and other + instance ThreadLocal => transient/odd memory leaks => OutOfMemoryException
> ---
>
> Key: LUCENE-529
> URL: http://issues.apache.org/jira/browse/LUCENE-529
> Project: Lucene - Java
> Type: Bug
> Components: Index
> Versions: 1.9
> Environment: Lucene 1.4.3 with 1.5.0_04 JVM or newer; will apply to 1.9 code
> Reporter: Andy Hind
>
> TermInfosReader uses an instance level ThreadLocal for enumerators.
> This is a transient/odd memory leak in Lucene 1.4.3-1.9 and applies to current JVMs,
> not just an old JVM issue as described in the finalizer of the 1.9 code.
> There is also an instance level thread local in SegmentReader which will have the same issue.
> There may be other uses which also need to be fixed.
>
> I don't understand the intended use for these variables. However,
> each ThreadLocal has its own hashcode used for look up; see the ThreadLocal source code. Each instance of TermInfosReader will be creating an instance of the thread local. All this does is create an instance variable on each thread when it accesses the thread local. Setting it to null in the finalizer will set it to null on one thread, the finalizer thread, where it has never been created. There is no point to this :-(
>
> I assume there is a good concurrency reason why an instance variable can not be used...
> I have not used multi-threaded searching, but I have used a lot of threads each making searchers and searching.
> 1.4.3 has a clear memory leak caused by this thread local. The use case above is definitely solved by setting the thread local to null in close(). This at least has a chance of being on the correct thread :-)
> I know reusing Searchers would help, but that is my choice and I will get to that later.
>
> Now you want to know why:
>
> Thread locals are stored in a table of entries. Each entry is a *weak reference* to the key (here the ThreadLocal owned by the TermInfosReader instance) and a *simple reference* to the thread local value. When the instance, and with it its ThreadLocal, is GCed the key becomes null.
> This is now a stale entry in the table.
> Stale entries are cleared up in an ad hoc way, and until they are cleared up the value will not be garbage collected.
> Until the instance is GCed it is a valid key and its presence may cause the table to expand.
> See the ThreadLocal code.
>
> So if you have lots of threads, all creating thread locals rapidly, you can get each thread holding a large table of thread locals which all contain many stale entries, preventing some objects from being garbage collected.
> The limited GC of the thread local table is not enough to save you from running out of memory.
>
> Summary:
>
> - remove finalizer()
> - set the thread local to null in close()
> - values will be available for gc
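The fix summarised in the issue can be sketched roughly as follows. This is only an illustration of the pattern, not the patch itself: ReaderSketch and its methods are hypothetical names, and an int array stands in for the per-thread enumerator. It uses ThreadLocal.remove(), available since Java 5; set(null), as proposed in the issue, likewise drops the reference to the value.

```java
// Hypothetical stand-in for a reader that keeps per-thread state; not a Lucene class.
class ReaderSketch {
    // Instance-level ThreadLocal, as described for TermInfosReader.
    private final ThreadLocal<int[]> enumerators = new ThreadLocal<>();

    /** Returns this thread's enumerator, creating it on first use. */
    int[] enumerator() {
        int[] e = enumerators.get();
        if (e == null) {
            e = new int[1024]; // placeholder for a real enumerator
            enumerators.set(e);
        }
        return e;
    }

    /** True if the calling thread currently holds an enumerator for this reader. */
    boolean hasEnumeratorOnThisThread() {
        return enumerators.get() != null;
    }

    /**
     * Called by the thread that used the reader, so it clears that thread's
     * slot and the value becomes eligible for GC straight away.
     */
    void close() {
        enumerators.remove();
    }

    public static void main(String[] args) {
        ReaderSketch reader = new ReaderSketch();
        reader.enumerator();
        System.out.println("before close: " + reader.hasEnumeratorOnThisThread());
        reader.close();
        System.out.println("after close:  " + reader.hasEnumeratorOnThisThread());
    }
}
```

Note that close() only clears the slot of the thread that calls it, so this matches the reported use case (each thread creating, using and closing its own searcher) rather than a reader shared across many threads.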
RE: query parsing
Any suggestions on what to do then, as the following query exhibits the same behavior

(+cat) (-dog)

due to the implied AND. Removing the parentheses allows it to work. It doesn't seem that adding parentheses in this case should cause the query to fail.

Doesn't it suggest that there is a bug, in that the BooleanQuery scorer is not handling the case of a REQUIRED clause that is a BooleanQuery consisting of a single prohibited boolean clause?

-Original Message-
From: Daniel Naber [mailto:[EMAIL PROTECTED]
Sent: Wednesday, March 22, 2006 1:03 PM
To: java-dev@lucene.apache.org
Subject: Re: query parsing

On Wednesday 22 March 2006 18:49, Robert Engels wrote:
> If I use
>
> (+cat) AND (-dog)
>
> it parses to
>
> +(+cat) +(-dog)
>
> and returns no results.
>
> Is this a known issue?

Basically yes. QueryParser is known to exhibit strange behavior when combining +/- and AND/OR/NOT.

Regards
Daniel

--
http://www.danielnaber.de
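A toy model may help show why the parenthesised form returns nothing. The types below are hypothetical and are not Lucene classes; the sketch assumes the usual boolean-query rule that a query made up only of prohibited clauses matches no documents, so a required sub-query of that shape can never be satisfied.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Toy model only: these types are hypothetical and are not Lucene classes.
class BooleanQuerySketch {
    enum Occur { MUST, SHOULD, MUST_NOT }

    interface Query {
        boolean matches(Set<String> doc);
    }

    static final class Clause {
        final Occur occur;
        final Query query;

        Clause(Occur occur, Query query) {
            this.occur = occur;
            this.query = query;
        }
    }

    static Query term(String text) {
        return doc -> doc.contains(text);
    }

    /** Matches only if no clause vetoes the doc AND at least one non-prohibited clause hits. */
    static Query bool(Clause... clauses) {
        return doc -> {
            boolean positiveMatched = false;
            for (Clause c : clauses) {
                boolean hit = c.query.matches(doc);
                if (c.occur == Occur.MUST && !hit) return false;
                if (c.occur == Occur.MUST_NOT && hit) return false;
                if (c.occur != Occur.MUST_NOT && hit) positiveMatched = true;
            }
            return positiveMatched; // purely prohibited queries select nothing
        };
    }

    /** Models +cat -dog. */
    static Query flat() {
        return bool(new Clause(Occur.MUST, term("cat")),
                    new Clause(Occur.MUST_NOT, term("dog")));
    }

    /** Models +(+cat) +(-dog). */
    static Query nested() {
        Query onlyCat = bool(new Clause(Occur.MUST, term("cat")));
        Query onlyNotDog = bool(new Clause(Occur.MUST_NOT, term("dog")));
        return bool(new Clause(Occur.MUST, onlyCat),
                    new Clause(Occur.MUST, onlyNotDog));
    }

    public static void main(String[] args) {
        Set<String> doc = new HashSet<>(Arrays.asList("cat", "fish"));
        System.out.println("+cat -dog       matches: " + flat().matches(doc));
        System.out.println("+(+cat) +(-dog) matches: " + nested().matches(doc));
    }
}
```

For a document containing "cat" but not "dog", the flat query matches and the nested one does not, because the inner (-dog) query has nothing positive to select from.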