Re: [jira] Commented: (LUCENE-1519) Change Primitive Data Types from int to long in class SegmentMerger.java

2009-01-13 Thread robert engels
Fairly certain it belongs in the dev-list because it is a bug... The index length is a long, but the left size will be truncated in int math before it is converted to a long and compared. On Jan 13, 2009, at 11:36 PM, Otis Gospodnetic (JIRA) wrote: [
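
The truncation described in this message can be reproduced in isolation. The following is only a sketch: the names `start`, `length` and `indexLength` are hypothetical and are not taken from SegmentMerger.java.

```java
// Hypothetical illustration; none of these names come from SegmentMerger.java.
class IntOverflowDemo {
    // The sum is computed in 32-bit int arithmetic and can wrap around
    // before it is widened to long for the comparison.
    static boolean fitsBuggy(int start, int length, long indexLength) {
        return start + length <= indexLength;
    }

    // Widening one operand first forces the sum to be computed as a long.
    static boolean fitsFixed(int start, int length, long indexLength) {
        return (long) start + length <= indexLength;
    }

    public static void main(String[] args) {
        int start = Integer.MAX_VALUE;
        int length = 10;
        long indexLength = 100L;
        System.out.println(fitsBuggy(start, length, indexLength)); // true: the int sum wrapped negative
        System.out.println(fitsFixed(start, length, indexLength)); // false
    }
}
```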

Re: Filesystem based bitset

2009-01-10 Thread robert engels
times, or redo his crap all the time. In fact, I would work hard to see that the latter did not work there very long. On Jan 10, 2009, at 7:53 AM, Grant Ingersoll wrote: On Jan 9, 2009, at 8:06 PM, robert engels wrote: Luckily there are entrepreneurs and other managers/owners that value

Re: Filesystem based bitset

2009-01-10 Thread robert engels
possible - different people have different capacities. And if the shoe is on the other foot, it wastes my time, unless the person being talked to has demonstrated a willingness and ability to learn. You can't get blood from a turnip. On Jan 10, 2009, at 11:57 AM, robert engels wrote: You

Re: [jira] Commented: (LUCENE-1482) Replace infoSteram by a logging framework (SLF4J)

2009-01-09 Thread robert engels
This is not really true these days. Dynamic class instrumentation/ byte modification can remove the calls entirely (for loggers not enabled). They can be enabled during startup (or a reload from a different class loader). See the paper at

Re: [jira] Commented: (LUCENE-1482) Replace infoSteram by a logging framework (SLF4J)

2009-01-09 Thread robert engels
. Normally this is done like this: if(Logger.isEnabled(loggername)) { Logger.log(loggername,xxx); } The runtime loader can detect the Logger.isEnabled() byte code, and remove the entire if statement during class loading. On Jan 9, 2009, at 1:11 PM, robert engels wrote
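
The guard pattern in this message, sketched with `java.util.logging` as a stand-in (the `Logger.isEnabled` call in the message is a generic placeholder, and the logger name `demo.merge` is made up). The class-load-time removal of the `if` block that the message describes is not shown; this only demonstrates that the guard skips building the message when the logger is disabled.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

class GuardedLogging {
    // "demo.merge" is an invented logger name for this sketch.
    private static final Logger LOG = Logger.getLogger("demo.merge");

    // Counts how often the (potentially expensive) message is built.
    static int messagesBuilt = 0;

    static void setEnabled(boolean on) {
        LOG.setLevel(on ? Level.FINE : Level.OFF);
    }

    static String expensiveMessage(int segments) {
        messagesBuilt++;
        return "merging " + segments + " segments";
    }

    static void merge(int segments) {
        // The guard keeps the message from being built when FINE is disabled.
        if (LOG.isLoggable(Level.FINE)) {
            LOG.fine(expensiveMessage(segments));
        }
    }
}
```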

Re: [jira] Commented: (LUCENE-1482) Replace infoSteram by a logging framework (SLF4J)

2009-01-09 Thread robert engels
Mangar wrote: On Sat, Jan 10, 2009 at 12:41 AM, robert engels reng...@ix.netcom.com wrote: This is not really true these days. Dynamic class instrumentation/ byte modification can remove the calls entirely (for loggers not enabled). They can be enabled during startup (or a reload from

Re: Filesystem based bitset

2009-01-09 Thread robert engels
If your index can fit in the IO cache, you should be using a completely different implementation... You should be writing a sequential transaction log for add/update/delete operations, and storing the entire index in memory (RAMDirectory) - with periodic background flushes of the log. If

Re: Filesystem based bitset

2009-01-09 Thread robert engels
35PM -0600, robert engels wrote: If your index can fit in the IO cache, you should be using a completely different implementation... You should be writing a sequential transaction log for add/update/delete operations, and storing the entire index in memory (RAMDirectory) - with periodic background flushes

Re: Filesystem based bitset

2009-01-09 Thread robert engels
Can something be offensive if it's a statement of fact? If you believe it is (under definition #3), then his remarks to me were just as offensive - as they caused me much displeasure and resentment. So please dress him down as well. Main Entry: 1of·fen·sive Pronunciation: \ə-ˈfen(t)-siv,

Re: Filesystem based bitset

2009-01-09 Thread robert engels
that your approach would work better for them. I've even created a space for you on Google-Code for you to show them:- http://code.google.com/p/roberts-search/ Sincerely Ian. robert engels wrote: I have better things to do than read a 10,000 word incident that discusses about 100

Re: Filesystem based bitset

2009-01-09 Thread robert engels
It was not ad hominem. It was an indirect critique of the value of the answer provided. Ad hominem would be if I called him ugly. On Jan 9, 2009, at 6:34 PM, Doug Cutting wrote: robert engels wrote: Can something be offensive if it's a statement of fact? If you believe it is (under

Re: [jira] Commented: (LUCENE-1476) BitVector implement DocIdSet

2009-01-08 Thread robert engels
The way we've simplified this is that every document has an OID. It simplifies updates and delete tracking (in the transaction log). On Jan 8, 2009, at 2:28 PM, Marvin Humphrey (JIRA) wrote: [ https://issues.apache.org/jira/browse/LUCENE-1476?

Re: [jira] Commented: (LUCENE-1476) BitVector implement DocIdSet

2009-01-07 Thread robert engels
Why not just write the first byte as 0 for a bitset, and 1 for a sparse bit set (compressed), and make the determination when writing based on the segment size and/or number of set bits? On Jan 7, 2009, at 8:38 PM, Marvin Humphrey (JIRA) wrote: [
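
A minimal sketch of the format-byte idea in this message. Everything here is an assumption made for illustration: the class name, the constants, the size heuristic, and the sparse encoding (a plain list of ints, not a compressed one). It is not the format discussed in LUCENE-1476, and it assumes no bit is set at or beyond `maxDoc`.

```java
import java.nio.ByteBuffer;
import java.util.BitSet;

// Hypothetical writer; names and layout are invented for this sketch.
class DeletedDocsWriter {
    static final byte DENSE = 0;   // first byte 0: plain bitset follows
    static final byte SPARSE = 1;  // first byte 1: list of set-bit positions follows

    static byte[] write(BitSet bits, int maxDoc) {
        int denseSize = (maxDoc + 7) / 8;   // one bit per document
        int setBits = bits.cardinality();
        int sparseSize = 4 + 4 * setBits;   // count, then one int per set bit

        // Decide at write time which representation is smaller.
        if (sparseSize < denseSize) {
            ByteBuffer buf = ByteBuffer.allocate(1 + sparseSize);
            buf.put(SPARSE).putInt(setBits);
            for (int i = bits.nextSetBit(0); i >= 0; i = bits.nextSetBit(i + 1)) {
                buf.putInt(i);
            }
            return buf.array();
        }
        ByteBuffer buf = ByteBuffer.allocate(1 + denseSize);
        // toByteArray() omits trailing zero bytes; the buffer is already zero-filled.
        buf.put(DENSE).put(bits.toByteArray());
        return buf.array();
    }
}
```

A reader would inspect the first byte and pick the matching decoder, so old and new segments could coexist.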

Re: [jira] Commented: (LUCENE-1476) BitVector implement DocIdSet

2009-01-07 Thread robert engels
be less than the size of a standard disk block (which is probably 32k these days, meaning 256k documents). On Jan 7, 2009, at 10:28 PM, Marvin Humphrey wrote: On Wed, Jan 07, 2009 at 09:28:40PM -0600, robert engels wrote: Why not just write the first byte as 0 for a bitset, and 1 for a sparse bit

Re: [jira] Commented: (LUCENE-1476) BitVector implement DocIdSet

2009-01-07 Thread robert engels
. Or the policy can be pluggable and the shared can use the old bitset method. On Jan 8, 2009, at 12:04 AM, Marvin Humphrey wrote: On Wed, Jan 07, 2009 at 10:36:01PM -0600, robert engels wrote: Yes, and I don't think the worst-case is correct. When you go to write that segment, you determine

Re: [jira] Commented: (LUCENE-1513) fastss fuzzyquery

2009-01-06 Thread robert engels
Why not just create a new field for this? That is, if you have FieldA, create field FieldAFuzzy and put the various permutations there. The fuzzy scorer/parser can be changed to automatically use the Fuzzy field when required. You could also store positions, and allow that the first

Re: [jira] Commented: (LUCENE-1513) fastss fuzzyquery

2009-01-06 Thread robert engels
I don't think that is the case. You will have single deletion neighborhood. The number of unique terms in the field is going to be the union of the deletion dictionaries of each source term. For example, given the following documents A which have field 'X' with value best, and document B
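
The "union of the deletion dictionaries" can be sketched as follows. The message is cut off before its second example term, so the term `rest` used in the usage note is an assumption, as are the class and method names. The neighborhood here is the term itself plus every string obtained by deleting exactly one character.

```java
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper; not code from the LUCENE-1513 patch.
class DeletionNeighborhood {
    // The term itself plus every variant with exactly one character deleted.
    static Set<String> of(String term) {
        Set<String> out = new TreeSet<>();
        out.add(term);
        for (int i = 0; i < term.length(); i++) {
            out.add(term.substring(0, i) + term.substring(i + 1));
        }
        return out;
    }

    // Unique terms the fuzzy field would hold for the given source terms.
    static Set<String> union(String... terms) {
        Set<String> all = new TreeSet<>();
        for (String t : terms) {
            all.addAll(of(t));
        }
        return all;
    }
}
```

For example, `of("best")` yields five entries (best, bes, bet, bst, est), and `union("best", "rest")` yields nine rather than ten, because `est` is shared by both neighborhoods.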

Re: [jira] Commented: (LUCENE-1513) fastss fuzzyquery

2009-01-06 Thread robert engels
neighborhood term... i think it's worth investigating, maybe performance would actually be better, just curious. i think i boxed myself in to auxiliary index because of some other irrelevant things i am doing. On Tue, Jan 6, 2009 at 4:42 PM, robert engels reng...@ix.netcom.com wrote: I don't think

Re: [jira] Commented: (LUCENE-1513) fastss fuzzyquery

2009-01-06 Thread robert engels
for 'robrt engels' works. So does 'obert engels', so does 'robt engels', all ask me if I meant 'robert engels', but searching for 'obrt engels' does not. On Jan 6, 2009, at 4:15 PM, robert engels wrote: It is definitely going to increase the index size, but not any more than the external one

Re: [jira] Commented: (LUCENE-1513) fastss fuzzyquery

2009-01-06 Thread robert engels
6, 2009, at 4:29 PM, Robert Muir wrote: On Tue, Jan 6, 2009 at 5:15 PM, robert engels reng...@ix.netcom.com wrote: It is definitely going to increase the index size, but not any more than the external one would (if my understanding is correct). The nice thing is that you don't have

Re: Realtime Search

2009-01-05 Thread robert engels
Then your comments are misdirected. On Jan 5, 2009, at 1:19 PM, Doug Cutting wrote: Robert Engels wrote: Do what you like. You obviously will. This is the problem with the Lucene managers - the problems are only the ones they see - same with the solutions. If the solution (or questions

Re: Realtime Search

2008-12-26 Thread Robert Engels
To: java-dev@lucene.apache.org Subject: Re: Realtime Search On Wed, Dec 24, 2008 at 12:02:24PM -0600, robert engels wrote: As I understood this discussion though, it was an attempt to remove the in memory 'skip to' index, to avoid the reading of this during index open/reopen. No. That idea

Re: Realtime Search

2008-12-26 Thread Robert Engels
to be significantly smaller (improving the write time, and the cache efficiency). -Original Message- From: Robert Engels reng...@ix.netcom.com Sent: Dec 26, 2008 11:30 AM To: java-dev@lucene.apache.org, java-dev@lucene.apache.org Subject: Re: Realtime Search That could very well be, but I

Re: Realtime Search

2008-12-26 Thread Robert Engels
This is what we mostly do, but we serialize the documents to a log file first, so if server crashes before the background merge of the RAM segments into the disk segments completes, we can replay the operations on server restart. Since the serialize is a sequential write to an already open
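
A rough sketch of the log-then-apply approach described in this message, using plain Java rather than Lucene. The class name, the tab-separated record format, and the `HashMap` standing in for the RAM segments are all assumptions. Unlike the setup in the message, this sketch reopens the log for each append and does not force the write to disk.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.HashMap;
import java.util.Map;

// Hypothetical store; ids and bodies must not contain tabs or newlines.
class LoggedStore {
    private final Path log;
    private final Map<String, String> docs = new HashMap<>(); // stand-in for RAM segments

    LoggedStore(Path log) {
        this.log = log;
        replay(); // rebuild in-memory state after a restart or crash
    }

    void add(String id, String body) {
        append("ADD\t" + id + "\t" + body); // log first, then apply in memory
        docs.put(id, body);
    }

    void delete(String id) {
        append("DEL\t" + id);
        docs.remove(id);
    }

    String get(String id) {
        return docs.get(id);
    }

    private void append(String record) {
        try {
            Files.write(log, (record + "\n").getBytes(StandardCharsets.UTF_8),
                    StandardOpenOption.CREATE, StandardOpenOption.APPEND);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    private void replay() {
        if (!Files.exists(log)) {
            return;
        }
        try {
            for (String line : Files.readAllLines(log, StandardCharsets.UTF_8)) {
                String[] f = line.split("\t", 3);
                if (f.length == 3 && f[0].equals("ADD")) {
                    docs.put(f[1], f[2]);
                } else if (f.length == 2 && f[0].equals("DEL")) {
                    docs.remove(f[1]);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Once the background merge to disk has completed, the replayed portion of the log could be truncated; that step is omitted here.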

Re: Realtime Search

2008-12-26 Thread Robert Engels
If you move to the either embedded, or server model, the post reopen is trivial, as the structures can be created as the segment is written. It is the networked shared access model that causes a lot of these optimizations to be far more complex than needed. Would it maybe be simpler to move

Re: Realtime Search

2008-12-26 Thread Robert Engels
There is also the distributed model - but in that case each node is running some sort of server anyway (as in Hadoop). It seems that the distributed model would be easier to develop using Hadoop over the embedded model. -Original Message- From: Robert Engels reng...@ix.netcom.com Sent

Re: Realtime Search

2008-12-26 Thread Robert Engels
, they are ignored or dismissed in a tone that is designed to limit any further questions (especially those that might question their ability and/or understanding). -Original Message- From: Marvin Humphrey mar...@rectangular.com Sent: Dec 26, 2008 3:53 PM To: java-dev@lucene.apache.org, Robert

Re: Realtime Search

2008-12-24 Thread robert engels
! On Dec 23, 2008, at 11:02 PM, robert engels wrote: Seems doubtful you will be able to do this without increasing the index size dramatically. Since it will need to be stored unpacked (in order to have random access), yet the terms are variable length - leading to using a maximum=minimum size

Re: Realtime Search

2008-12-24 Thread robert engels
December 2008 17:51:04 wrote robert engels: Thinking about this some more, you could use fixed length pages for the term index, with a page header containing a count of entries, and use key compression (to avoid the constant entry size). The problem with this is that you still have to decode
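
One common form of key compression is front coding, where each key stores only the length of the prefix it shares with the previous key plus the remaining suffix. The sketch below assumes that is the kind of compression meant; the class and its layout are invented, and the fixed-length page and its header are left out. It also shows the decoding cost the message goes on to mention: reaching entry i means walking the page from its first entry.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical front-coding sketch; not a Lucene term index format.
class FrontCoding {
    static final class Entry {
        final int shared;     // characters shared with the previous key
        final String suffix;  // remainder of this key

        Entry(int shared, String suffix) {
            this.shared = shared;
            this.suffix = suffix;
        }
    }

    // Keys must already be sorted for the shared prefixes to be long.
    static List<Entry> encode(List<String> sortedKeys) {
        List<Entry> page = new ArrayList<>();
        String prev = "";
        for (String key : sortedKeys) {
            int max = Math.min(prev.length(), key.length());
            int n = 0;
            while (n < max && prev.charAt(n) == key.charAt(n)) {
                n++;
            }
            page.add(new Entry(n, key.substring(n)));
            prev = key;
        }
        return page;
    }

    // No random access: every earlier entry in the page has to be decoded first.
    static String decode(List<Entry> page, int index) {
        String key = "";
        for (int i = 0; i <= index; i++) {
            Entry e = page.get(i);
            key = key.substring(0, e.shared) + e.suffix;
        }
        return key;
    }
}
```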

Re: Realtime Search

2008-12-24 Thread robert engels
On Dec 24, 2008, at 12:23 PM, Jason Rutherglen wrote: Also, what are the requirements? Must a document be visible to search within 10ms of being added? 0-5ms. Otherwise it's not realtime, it's batch indexing. The realtime system can support small batches by encoding them into

Re: Realtime Search

2008-12-23 Thread robert engels
Is there something that I am missing? I see lots of references to using memory mapped files to dramatically improve performance. I don't think this is the case at all. At the lowest levels, it is somewhat more efficient from a CPU standpoint, but with a decent OS cache the IO performance

Re: Realtime Search

2008-12-23 Thread robert engels
: On Tue, Dec 23, 2008 at 08:36:24PM -0600, robert engels wrote: Is there something that I am missing? Yes. I see lots of references to using memory mapped files to dramatically improve performance. There have been substantial discussions about this design in JIRA, notably LUCENE-1458

Re: Realtime Search

2008-12-23 Thread robert engels
on every invocation. On Dec 23, 2008, at 9:20 PM, Marvin Humphrey wrote: On Tue, Dec 23, 2008 at 08:36:24PM -0600, robert engels wrote: Is there something that I am missing? Yes. I see lots of references to using memory mapped files to dramatically improve performance. There have

Re: [jira] Created: (LUCENE-1486) Wildcards, ORs etc inside Phrase queries

2008-12-10 Thread robert engels
If wildcards and fuzzies are supported, why not range? We have a custom range in phrase parser, and it works really well, but we would like to use standard Lucene if possible. On Dec 10, 2008, at 12:18 PM, Mark Harwood (JIRA) wrote: Wildcards, ORs etc inside Phrase queries

[jira] Commented: (LUCENE-1475) Expose sub-IndexReaders from MultiReader or MultiSegmentReader

2008-12-09 Thread robert engels (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12654924#action_12654924 ] robert engels commented on LUCENE-1475: --- That is not correct. By returning a non

[jira] Commented: (LUCENE-1475) Expose sub-IndexReaders from MultiReader or MultiSegmentReader

2008-12-08 Thread robert engels (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12654518#action_12654518 ] robert engels commented on LUCENE-1475: --- I think the API is wrong. The method

Re: [jira] Commented: (LUCENE-1473) Implement standard Serialization across Lucene versions

2008-12-08 Thread robert engels
I think an important piece to make this work is the query parser/syntax. We already have a system similar to what is outlined below. We made changes to the query syntax to support our various query extensions. The nice thing, is that persisting queries is a simple string. It also makes

Re: [jira] Commented: (LUCENE-1473) Implement standard Serialization across Lucene versions

2008-12-08 Thread robert engels
sophisticated and extensible XML query parser in contrib. I've still only scratched the surface of it, but it meets the specs you mentioned. Erik On Dec 8, 2008, at 4:51 PM, robert engels wrote: I think an important piece to make this work is the query parser/ syntax. We

Re: [jira] Commented: (LUCENE-1473) Implement standard Serialization across Lucene versions

2008-12-08 Thread robert engels
insane parsing performance. Is there any reason to worry about library-bundled parsers if you're making something more complex then a college project? On Tue, Dec 9, 2008 at 01:49, robert engels [EMAIL PROTECTED] wrote: The problem with that is that in most cases you still need a string based

Re: [jira] Commented: (LUCENE-831) Complete overhaul of FieldCache API/Implementation

2008-12-07 Thread robert engels
One thing to keep in mind about using the field cache for filter caching. The filter bitset cache at worst holds 8 documents per byte (and with bitset compression this can be even more efficient). Using the field cache is going to rather be bytes per document, most likely at least an
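
The size difference in this message can be put into numbers. The message is cut off before it states how many bytes per document it expects for the field cache, so the 4 bytes (one int per document) and the 10 million document index used below are assumptions.

```java
// Back-of-the-envelope estimate; the index size and bytes per document are assumed.
class FilterMemoryEstimate {
    // A bitset needs one bit per document, rounded up to whole bytes.
    static long bitsetBytes(long maxDoc) {
        return (maxDoc + 7) / 8;
    }

    // A per-document value array needs bytesPerDoc bytes for each document.
    static long perDocumentBytes(long maxDoc, int bytesPerDoc) {
        return maxDoc * bytesPerDoc;
    }

    public static void main(String[] args) {
        long maxDoc = 10_000_000L;
        System.out.println(bitsetBytes(maxDoc));          // 1250000
        System.out.println(perDocumentBytes(maxDoc, 4));  // 40000000
    }
}
```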

[jira] Commented: (LUCENE-1476) BitVector implement DocIdSet

2008-12-05 Thread robert engels (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12653793#action_12653793 ] robert engels commented on LUCENE-1476: --- I don't think you can change

[jira] Commented: (LUCENE-1476) BitVector implement DocIdSet

2008-12-05 Thread robert engels (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12653912#action_12653912 ] robert engels commented on LUCENE-1476: --- but IndexReader.document(n) throws

[jira] Commented: (LUCENE-1476) BitVector implement DocIdSet

2008-12-05 Thread robert engels (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12653954#action_12653954 ] robert engels commented on LUCENE-1476: --- That's my point, in complex multi-threaded

[jira] Commented: (LUCENE-1473) Implement standard Serialization across Lucene versions

2008-12-04 Thread robert engels (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12653421#action_12653421 ] robert engels commented on LUCENE-1473: --- Even if you changed SUIDs based on version

jira attachments ?

2008-12-04 Thread robert engels
I am having a problem posting an attachment to Jira. Just spins, and spins... Everything else seems to work fine (comments, etc.). Anyone else experiencing this? Thanks.

Re: [jira] Commented: (LUCENE-855) MemoryCachedRangeFilter to boost performance of Range queries

2008-12-04 Thread robert engels
I can't seem to post to Jira, so I am attaching here... I attached QueryFilter.java. In reading this patch, and other similar ones, the problem seems to be that if the index is modified, the cache is invalidated, causing a complete reload of the cache. Do I have this correct? The attached patch

Re: jira attachments ?

2008-12-04 Thread robert engels
Dear God, I've been blocked ! What will the Lucene community do ! :) On Dec 4, 2008, at 3:27 PM, Uwe Schindler wrote: Hi Robert, two minutes ago I uploaded a patch... Uwe - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: [EMAIL PROTECTED] From: robert

Re: [jira] Commented: (LUCENE-855) MemoryCachedRangeFilter to boost performance of Range queries

2008-12-04 Thread robert engels
] From: robert engels [mailto:[EMAIL PROTECTED] Sent: Thursday, December 04, 2008 9:39 PM To: java-dev@lucene.apache.org Subject: Re: [jira] Commented: (LUCENE-855) MemoryCachedRangeFilter to boost performance of Range queries I can't seem to post to Jira, so I am

Re: [jira] Commented: (LUCENE-855) MemoryCachedRangeFilter to boost performance of Range queries

2008-12-04 Thread robert engels
more memory, as every field used needs to be cached. With my code you would only have a single bitset for the filter. On Dec 4, 2008, at 4:00 PM, robert engels wrote: Lucene-831 is far more comprehensive. I also think that by exposing access to the sub-readers it can be far simpler (closer

Re: jira attachments ?

2008-12-04 Thread robert engels
I am using Safari 3.2 (on OSX Tiger). On Dec 4, 2008, at 5:38 PM, Michael McCandless wrote: Robert which browser are you using? Mike robert engels wrote: Dear God, I've been blocked ! What will the Lucene community do ! :) On Dec 4, 2008, at 3:27 PM, Uwe Schindler wrote: Hi Robert

[jira] Commented: (LUCENE-1473) Implement Externalizable in main top level searcher classes

2008-12-03 Thread robert engels (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12652843#action_12652843 ] robert engels commented on LUCENE-1473: --- I don't see why you can't just break

[jira] Commented: (LUCENE-1473) Implement Externalizable in main top level searcher classes

2008-12-03 Thread robert engels (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12652892#action_12652892 ] robert engels commented on LUCENE-1473: --- In regards to Doug's comment about

[jira] Commented: (LUCENE-1473) Implement Externalizable in main top level searcher classes

2008-12-03 Thread robert engels (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12652940#action_12652940 ] robert engels commented on LUCENE-1473: --- Jason, you are only partially correct

[jira] Commented: (LUCENE-1473) Implement Externalizable in main top level searcher classes

2008-12-03 Thread robert engels (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12652962#action_12652962 ] robert engels commented on LUCENE-1473: --- The reason the XML is not needed

[jira] Commented: (LUCENE-1473) Implement standard Serialization across Lucene versions

2008-12-03 Thread robert engels (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12653058#action_12653058 ] robert engels commented on LUCENE-1473: --- Even better. Thanks Mark. Implement

[jira] Commented: (LUCENE-1476) BitVector implement DocIdSet

2008-12-03 Thread robert engels (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12653076#action_12653076 ] robert engels commented on LUCENE-1476: --- BitSet is already random access, DocIdSet

Re: [jira] Commented: (LUCENE-1473) Implement Externalizable in main top level searcher classes

2008-12-03 Thread robert engels
, not Lucene's Making something protected is very different than making it public. Robert Engels On Dec 3, 2008, at 11:36 PM, John Wang wrote: Grant: I am sorry that I disagree with some points: 1) I think it's a sign that Lucene is pretty stable. - While lucene is a great project

[jira] Commented: (LUCENE-1472) DateTools.stringToDate() can cause lock contention under load

2008-12-01 Thread robert engels (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1472?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12652220#action_12652220 ] robert engels commented on LUCENE-1472: --- If you review the source

[jira] Commented: (LUCENE-1472) DateTools.stringToDate() can cause lock contention under load

2008-12-01 Thread robert engels (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1472?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12652221#action_12652221 ] robert engels commented on LUCENE-1472: --- The last comment was tested using Java 5

Re: Option to fsync files

2008-11-19 Thread robert engels
I would really like to see some PROOF of these drives lying. If that were the case, no database system would ever be reliable on these drives ! And data corruption would be happening all over the place ! On Nov 19, 2008, at 10:56 AM, Mark Miller wrote: Michael McCandless wrote: Mark

Re: Option to fsync files

2008-11-19 Thread robert engels
to you, there is more testing of it. I have also seen bits or pieces about it elsewhere. I choose to believe myself, but I will admit I was 100% wrong about Santa Claus, so take it for what it's worth. I haven't tested it at all. robert engels wrote: I would really like to see some PROOF

Re: Option to fsync files

2008-11-19 Thread robert engels
The utility referenced no longer exists... and it's no wonder. It is most likely that the tester did not have the drives configured properly. In almost all cases, if the drive did this, you could not run a database system with any resiliency. They would also have problems with shutdown -

Re: Option to fsync files

2008-11-19 Thread robert engels
I don't believe it - unless the older drives had no cache, in which case it wouldn't matter. It is also doubtful at the OS level, as system integrity would be hopelessly compromised... On Nov 19, 2008, at 12:11 PM, Mark Miller wrote: robert engels wrote: There is an option on some hard

Re: Option to fsync files

2008-11-19 Thread robert engels
It is not just databases by the way, any journaling file system would be pointless... On Nov 19, 2008, at 12:55 PM, robert engels wrote: The utility referenced no longer exists... and it's no wonder. It is most likely that the tester did not have the drives configured properly. In almost

Re: Allow IndexReader to take ownership of Directory

2008-11-18 Thread robert engels
Why not create new lightweight references to the directory, use WeakReferences and ReferenceQueues, and avoid the need to manually use incRef and decRef? Tracking state like this almost always leads to problems - this is why Java has GC in the first place - because it is very
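
A minimal sketch of the WeakReference/ReferenceQueue approach suggested in this message. The `HandleTracker` class and its method names are hypothetical and are not a Lucene API. Callers hold a lightweight handle; once the garbage collector clears it, the weak reference shows up on the queue and the associated cleanup (for example, closing the underlying directory) can run. When that happens is up to the collector, so this trades manual incRef/decRef bookkeeping for non-deterministic release.

```java
import java.lang.ref.Reference;
import java.lang.ref.ReferenceQueue;
import java.lang.ref.WeakReference;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical tracker; not part of Lucene.
class HandleTracker<T> {
    private final ReferenceQueue<T> queue = new ReferenceQueue<>();
    private final Map<Reference<? extends T>, Runnable> cleanups = new ConcurrentHashMap<>();

    // The cleanup must not reference the handle, or the handle can never be collected.
    void register(T handle, Runnable cleanup) {
        WeakReference<T> ref = new WeakReference<>(handle, queue);
        cleanups.put(ref, cleanup);
    }

    // Runs the cleanup for every handle the collector has cleared so far.
    int drain() {
        int released = 0;
        Reference<? extends T> ref;
        while ((ref = queue.poll()) != null) {
            Runnable cleanup = cleanups.remove(ref);
            if (cleanup != null) {
                cleanup.run();
                released++;
            }
        }
        return released;
    }

    int tracked() {
        return cleanups.size();
    }
}
```

`drain()` would typically be called from a background thread or at the start of each open call.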

[jira] Commented: (LUCENE-1383) Work around ThreadLocal's leak

2008-10-01 Thread robert engels (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12636075#action_12636075 ] robert engels commented on LUCENE-1383: --- It doesn't need to be fixed. It works fine

[jira] Commented: (LUCENE-1383) Work around ThreadLocal's leak

2008-10-01 Thread robert engels (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12636095#action_12636095 ] robert engels commented on LUCENE-1383: --- You cannot control this 'externally, since

Re: [jira] Updated: (LUCENE-1381) Hanging while indexing/digesting on multiple threads

2008-09-11 Thread robert engels
By the stacktraces, I think there may be a bug in MethodUtils. By its name it would appear to be static, with a weak hash map of names to methods, but it appears that multiple threads are accessing the same map without synchronization. This may be wreaking havoc with the WeakReference

Re: ThreadLocal causing memory leak with J2EE applications

2008-09-11 Thread robert engels
out from under the ThreadLocal even before the ThreadLocal purges its stale entries. Mike robert engels wrote: You can't hold the ThreadLocal value in a WeakReference, because there is no hard reference between enumeration calls (so it would be cleared out from under you while enumerating

Re: ThreadLocal causing memory leak with J2EE applications

2008-09-11 Thread robert engels
. That's its only purpose. Then, when SegmentReader is closed this list is cleared and GC is free to reclaim all SegmentTermEnums. Mike robert engels wrote: But you need it by thread, so it can't be a list. You could have a HashMap of Thread,ThreadState in FieldsReader, and when

Re: ThreadLocal causing memory leak with J2EE applications

2008-09-11 Thread robert engels
. Retrieving an existing thread has no sync. Mike robert engels wrote: You still need to sync access to the list, and how would it be removed from the list prior to close? That is you need one per thread, but you can have the reader shared across all threads. So if threads were created

Re: ThreadLocal causing memory leak with J2EE applications

2008-09-11 Thread robert engels
. Mike robert engels wrote: You still need to sync access to the list, and how would it be removed from the list prior to close? That is you need one per thread, but you can have the reader shared across all threads. So if threads were created and destroyed without ever closing

Re: ThreadLocal causing memory leak with J2EE applications

2008-09-10 Thread robert engels
Lucene Database Search in 3 minutes: http://wiki.dbsight.com/ index.php?title=Create_Lucene_Database_Search_in_3_minutes DBSight customer, a shopping comparison site, (anonymous per request) got 2.6 Million Euro funding! On Tue, Sep 9, 2008 at 10:43 PM, robert engels [EMAIL PROTECTED] wrote

Re: ThreadLocal causing memory leak with J2EE applications

2008-09-10 Thread robert engels
=Create_Lucene_Database_Search_in_3_minutes DBSight customer, a shopping comparison site, (anonymous per request) got 2.6 Million Euro funding! On Wed, Sep 10, 2008 at 7:12 AM, robert engels [EMAIL PROTECTED] wrote: Sorry, but I am fairly certain you are mistaken. If you only have a single IndexReader, the RAMDirectory

Re: ThreadLocal causing memory leak with J2EE applications

2008-09-10 Thread robert engels
, at 10:34 AM, robert engels wrote: It is basic Java. Threads are not guaranteed to run on any sort of schedule. If you create lots of large objects in one thread, releasing them in another, there is a good chance you will get an OOM (since the releasing thread may not run before the OOM occurs

Re: ThreadLocal causing memory leak with J2EE applications

2008-09-10 Thread robert engels
, and no idea when it'll be released. Of course ThreadLocal is not Lucene's problem... Chris On Wed, Sep 10, 2008 at 8:34 AM, robert engels [EMAIL PROTECTED] wrote: It is basic Java. Threads are not guaranteed to run on any sort of schedule. If you create lots of large objects in one thread

Re: ThreadLocal causing memory leak with J2EE applications

2008-09-10 Thread robert engels
, a shopping comparison site, (anonymous per request) got 2.6 Million Euro funding! On Tue, Sep 9, 2008 at 10:43 PM, robert engels [EMAIL PROTECTED] wrote: You do not need a pool of IndexReaders... It does not matter what class it is, what matters is the class that ultimately holds

Re: ThreadLocal causing memory leak with J2EE applications

2008-09-10 Thread robert engels
always clear. But it will be far more deterministic. If someone is interested I can post the class, but I think it is well within the understanding of the core Lucene developers. On Sep 10, 2008, at 11:10 AM, robert engels wrote: You do not need to create a new RAMDirectory - just write

Re: ThreadLocal causing memory leak with J2EE applications

2008-09-10 Thread robert engels
request) got 2.6 Million Euro funding! On Wed, Sep 10, 2008 at 9:10 AM, robert engels [EMAIL PROTECTED] wrote: You do not need to create a new RAMDirectory - just write to the existing one, and then reopen() the IndexReader using it. This will prevent lots of big objects being created

Re: ThreadLocal causing memory leak with J2EE applications

2008-09-10 Thread robert engels
, a shopping comparison site, (anonymous per request) got 2.6 Million Euro funding! On Wed, Sep 10, 2008 at 10:39 AM, robert engels [EMAIL PROTECTED] wrote: Close() does work - it is just that the memory may not be freed until much later... When working with VERY LARGE objects, this can

Re: ThreadLocal causing memory leak with J2EE applications

2008-09-10 Thread robert engels
=Create_Lucene_Database_Search_in_3_minutes DBSight customer, a shopping comparison site, (anonymous per request) got 2.6 Million Euro funding! On Wed, Sep 10, 2008 at 10:39 AM, robert engels [EMAIL PROTECTED] wrote: Close() does work - it is just that the memory may not be freed until much later... When working with VERY

[jira] Updated: (LUCENE-1195) Performance improvement for TermInfosReader

2008-09-10 Thread robert engels (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1195?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] robert engels updated LUCENE-1195: -- Attachment: SafeThreadLocal.java A safe ThreadLocal that can be used for more deterministic

Re: ThreadLocal causing memory leak with J2EE applications

2008-09-10 Thread robert engels
You can't hold the ThreadLocal value in a WeakReference, because there is no hard reference between enumeration calls (so it would be cleared out from under you while enumerating). All of this occurs because you have some objects (readers/segments etc.) that are shared across all threads,

[jira] Commented: (LUCENE-1195) Performance improvement for TermInfosReader

2008-09-10 Thread robert engels (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12630091#action_12630091 ] robert engels commented on LUCENE-1195: --- Also, SafeThreadLocal can be trivially

Re: ThreadLocal causing memory leak with J2EE applications

2008-09-09 Thread robert engels
Your code is not correct. You cannot release it on another thread - the first thread may be creating hundreds/thousands of instances before the other thread ever runs... On Sep 9, 2008, at 10:10 PM, Chris Lu wrote: If I release it on the thread that's creating the searcher, by setting

Re: ThreadLocal causing memory leak with J2EE applications

2008-09-09 Thread robert engels
demo: http://search.dbsight.com Lucene Database Search in 3 minutes: http://wiki.dbsight.com/ index.php?title=Create_Lucene_Database_Search_in_3_minutes DBSight customer, a shopping comparison site, (anonymous per request) got 2.6 Million Euro funding! On Tue, Sep 9, 2008 at 8:14 PM, robert

Re: ThreadLocal causing memory leak with J2EE applications

2008-09-09 Thread robert engels
: http://wiki.dbsight.com/ index.php?title=Create_Lucene_Database_Search_in_3_minutes DBSight customer, a shopping comparison site, (anonymous per request) got 2.6 Million Euro funding! On Tue, Sep 9, 2008 at 9:03 PM, robert engels [EMAIL PROTECTED] wrote: You need to close the searcher within

Re: ThreadLocal causing memory leak with J2EE applications

2008-09-09 Thread robert engels
://search.dbsight.com Lucene Database Search in 3 minutes: http://wiki.dbsight.com/index.php?title=Create_Lucene_Database_Search_in_3_minutes DBSight customer, a shopping comparison site, (anonymous per request) got 2.6 Million Euro funding! On Tue, Sep 9, 2008 at 10:14 PM, robert engels [EMAIL

Re: ThreadLocal causing memory leak with J2EE applications

2008-09-09 Thread robert engels
As a follow-up, the SegmentTermEnum does contain an IndexInput and, depending on your configuration (e.g. buffer sizes), this could be a large object, so you do need to be careful! On Sep 10, 2008, at 12:14 AM, robert engels wrote: A searcher uses an IndexReader - the IndexReader is slow to open

[jira] Commented: (LUCENE-753) Use NIO positional read to avoid synchronization in FSIndexInput

2008-08-31 Thread robert engels (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-753?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12627313#action_12627313 ] robert engels commented on LUCENE-753: -- SUN is accepting outside bug fixes to the Open

performance optimizations

2008-07-23 Thread robert engels
I hope this doesn't offend anyone, but I think this is an excellent article that the Lucene development team might find helpful. I have often been dismayed at complex code being written to achieve negligible performance improvements. Most often, a micro benchmark is used to justify the

Re: performance optimizations

2008-07-23 Thread robert engels
performance is one of the primary features. -Yonik On Wed, Jul 23, 2008 at 4:01 PM, robert engels [EMAIL PROTECTED] wrote: I hope this doesn't offend anyone, but I think this is an excellent article that the Lucene development team might find helpful. I have often been dismayed at complex

Re: performance optimizations

2008-07-23 Thread robert engels
The bug/issue I was referring to is the pread/multiple file descriptors. This is a clear issue in the JVM, and has been for a long time. Count the hours spent discussing/devising/debugging/implementing this issue, instead of just having it fixed in the JVM. Not worth the work IMO

Re: ThreadLocal in SegmentReader

2008-07-14 Thread robert engels
to be removed only when the table starts running out of space. How do you suggest to force the removal of stale entries to the Lucene user? Robert Engels wrote: You are mistaken - Yonik's comment in that thread is correct (although it is not just at table resize - any time a ThreadLocal

Re: ThreadLocal in SegmentReader

2008-07-14 Thread robert engels
is webapp itself and may redeploy other applications). Here a bunch of threads deploy webapp, and they are all different. 2. What if a user just wants to undeploy the webapp, without the redeploy? He expects the memory to be released, but it will not be. Robert Engels wrote: If you are attempting

Re: ThreadLocal in SegmentReader

2008-07-12 Thread robert engels
collected. So I think we continue to use non-static ThreadLocals in Lucene... Mike robert engels wrote: Once again, these are static thread locals. A completely different issue. Since the object is available statically, the weak reference cannot be cleared so stale entries will never

Re: ThreadLocal in SegmentReader

2008-07-12 Thread robert engels
is that they are referenced by ThreadLocals map in the thread which is still alive. Robert Engels wrote: This is only an issue for static ThreadLocals ... On Jul 11, 2008, at 11:32 PM, Roman Puchkovskiy wrote: The problem here is not because ThreadLocal instances are not GC'd (they are GC'd

Re: ThreadLocal in SegmentReader

2008-07-11 Thread robert engels
in Lucene... Mike ThreadTest.java robert engels wrote: Once again, these are static thread locals. A completely different issue. Since the object is available statically, the weak reference cannot be cleared so stale entries will never be cleared as long as the thread is alive. On Jul 9

Re: [jira] Created: (LUCENE-1328) FileNotFoundException in

2008-07-08 Thread robert engels
On most systems, /tmp is a link to /var/tmp, so it is cleared on reboot. On Jul 7, 2008, at 12:44 PM, Yajun wrote: My bad, we don't use /tmp explicitly. We use /var/tmp/ snapshot_timestamp which is not deleted by OS when reboot. --Yajun Robert Engels wrote: If your automatic recycle

Re: ThreadLocal in SegmentReader

2008-07-08 Thread robert engels
Using synchronization is a poor/invalid substitute for thread locals in many cases. The point of the thread local in these referenced cases is to allow streaming reads on a file descriptor. If you use a shared file descriptor/buffer you are going to continually invalidate the buffer. On
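A minimal sketch of the per-thread buffer idea, assuming a plain byte array as the buffer: each thread lazily gets its own instance, so one thread's sequential reads cannot invalidate another's. PerThreadBuffer and the 1024-byte size are hypothetical and are not Lucene's implementation.

```java
import java.util.concurrent.atomic.AtomicReference;

public class PerThreadBuffer {
    // Hypothetical 1 KB read buffer; created on first use by each thread.
    private static final ThreadLocal<byte[]> BUFFER =
            ThreadLocal.withInitial(() -> new byte[1024]);

    /** Returns the calling thread's private buffer. */
    public static byte[] buffer() {
        return BUFFER.get();
    }

    /** True if repeated calls on one thread return the same buffer. */
    public static boolean sameThreadReusesBuffer() {
        return buffer() == buffer();
    }

    /** True if a second thread receives a different buffer instance. */
    public static boolean otherThreadGetsOwnBuffer() {
        byte[] mine = buffer();
        AtomicReference<byte[]> theirs = new AtomicReference<>();
        Thread t = new Thread(() -> theirs.set(buffer()));
        t.start();
        try {
            t.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return theirs.get() != mine;
    }

    public static void main(String[] args) {
        System.out.println(sameThreadReusesBuffer());
        System.out.println(otherThreadGetsOwnBuffer());
    }
}
```

A single synchronized buffer would be safe, but every thread switching to a different file position would refill it, which is the invalidation cost described above.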
