Multiple threads searching in Lucene and the synchronized issue. -- solution attached.

2006-05-09 Thread yueyu lin
Hi, we're building a Lucene-based server. When running a multi-threaded performance test, we found a minor synchronization issue. We found that using 2 IndexSearchers gave us a 10% performance benefit. But if we increased the number of IndexSearchers beyond 2, the performance improvement b

Re: Changing Lucene scoring?

2006-05-09 Thread eks dev
Hi Otis, "I often need just yes/no (matches/doesn't match) answers,... " Not sure if you meant this: "how I could implement pure boolean model, completely avoiding scoring?". If yes, what comes to my mind is Filtering, ChainedFilter, ConstantScore* and all these discussions about implementing n

Re: Multiple threads searching in Lucene and the synchronized issue. -- solution attached.

2006-05-09 Thread Chris Hostetter
: We found if we were using 2 IndexSearcher, we would get 10% performance : benefit. : But if we increased the number of IndexSearcher from 2, the performance : improvement became slight even worse. Why use more than 2 IndexSearchers? Typically 1 is all you need, except for when you want to

Re: Multiple threads searching in Lucene and the synchronized issue. -- solution attached.

2006-05-09 Thread yueyu lin
Please trace the code into Lucene when searching. Here is a table showing how the invocations are called. The trace log: *Steps* *ClassName* *Functions* *Description* 1. org.apache.lucene.search.Searcher public final Hits search(Query query) It will call another search function. 2. org.apac

Re: Multiple threads searching in Lucene and the synchronized issue. -- solution attached.

2006-05-09 Thread yueyu lin
ooops, the table seems twisted. Can you see that clearly? On 5/9/06, yueyu lin <[EMAIL PROTECTED]> wrote: Please trace the code into Lucene when searching. Here is a table showing how the invocations are called. The trace log: *Steps* *ClassName* *Functions* *Description* 1. org.apache.l

Re: Multiple threads searching in Lucene and the synchronized issue. -- solution attached.

2006-05-09 Thread yueyu lin
One IndexSearcher is one IndexSearcher instance. The instance has a lot of functions. Unfortunately they will call another synchronized function in other class's instance (TermInfosReader). That's the point why we need two IndexSearchers. But two searchers will cost double cache memory. It's not w

Re: Supporting deleteDocuments in IndexWriter (Code and Performance Results Provided)

2006-05-09 Thread Nadav Har'El
Ning Li <[EMAIL PROTECTED]> wrote on 09/05/2006 02:07:26 AM: > Today, applications have to open/close an IndexWriter and open/close an > IndexReader directly or indirectly (via IndexModifier) in order to handle a > mix of inserts and deletes. This performs well when inserts and deletes > come in fa

Re: Vector

2006-05-09 Thread karl wettin
On Sat, 2006-05-06 at 09:40 +0200, karl wettin wrote: > > There are a couple of Vector:s in the code. Is it really > necessary to use this expensive thread safe artifact from the dark > ages? > The question is what needs and not needs to be synchronized. I take it > nothing needs to, but I'm not

RE: weird behavior of IndexReader.indexExists()

2006-05-09 Thread Andy Hind
Hi, I think I have discovered this too. It is on my list of issues to raise. The index-exists test looks for the segments file. When the index is committing and you are unlucky, this file may not be found, as the new segments file replaces the old one. The result is that the index appears not to exi

Re: Changing Lucene scoring?

2006-05-09 Thread karl wettin
On Mon, 2006-05-08 at 18:34 -0700, Otis Gospodnetic wrote: > Hi, > > Not sure if people caught my question over on java-user@ > about the possibility of eliminating floating point > calculations from Lucene's scoring. Before I embark on this, > I thought I'd ask: > > - Am I crazy? I'm all for

RE: weird behavior of IndexReader.indexExists()

2006-05-09 Thread Vanlerberghe, Luc
Make sure both instances are using the same lock directory. The segments file should only be read or written while holding the commit lock. If the lock directories don't match, you'll get more 'strange' errors... In Lucene 1.4.2 some methods did not use the lock, this has been patched a couple of

Re: Changing Lucene scoring?

2006-05-09 Thread Murat . Yakici
Hi, I'm not happy with how scoring works either, though it might be efficient. I have been investigating the code for a while. Everything comes down to -Query (TermQuery, BooleanQuery, PhraseQuery etc.), -Inner class Weight in Query, -Similarity, -Scorer (TermScorer, BooleanScorer etc.)

Re: weird behavior of IndexReader.indexExists()

2006-05-09 Thread wenjie zheng
Thanks, I guess my question is how to make sure both instances are using the same lock directory. Wenjie On 5/9/06, Vanlerberghe, Luc <[EMAIL PROTECTED]> wrote: Make sure both instances are using the same lock directory. The segments file should only be read or written while holding the commit

Re: weird behavior of IndexReader.indexExists()

2006-05-09 Thread Otis Gospodnetic
If you don't explicitly change the lock directory and do not disable locking, the same directory should be used. I'm assuming this is all done on a single server sharing the same file system. The locks are stored in the system's default temporary directory. That's typically /tmp under UNIX/OS
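To check the point about the default lock location, each process can print the directory it would use. The sketch below is stdlib-only and rests on an assumption: that the lock directory can be overridden with a system property named `org.apache.lucene.lockDir` and otherwise falls back to `java.io.tmpdir`. Verify the property name against the FSDirectory source of the Lucene version in use. The class name `LockDirCheck` is hypothetical.

```java
import java.io.File;

class LockDirCheck {
    // Assumed name of the override property; confirm it for your Lucene version.
    static final String LOCK_DIR_PROPERTY = "org.apache.lucene.lockDir";

    /** Returns the override directory if the property is set, else the JVM temp directory. */
    static File effectiveLockDir() {
        String tmp = System.getProperty("java.io.tmpdir");
        return new File(System.getProperty(LOCK_DIR_PROPERTY, tmp));
    }

    public static void main(String[] args) {
        // Run this in both processes; the two printed paths should be identical.
        System.out.println("Lock directory: " + effectiveLockDir().getAbsolutePath());
    }
}
```

If the two processes print different paths (for example because they run with different `java.io.tmpdir` settings), they would not see each other's locks.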

Re: Supporting deleteDocuments in IndexWriter (Code and Performance Results Provided)

2006-05-09 Thread Otis Gospodnetic
I agree - a delete (typically for a Term that represents a "primary key" for a Document in an index) followed by re-add of a Document is a very common scenario, and I'd love to see the numbers for that. Thanks, Otis > We experimented with three workloads: > - Insert only. 1.6M documents were

Re: Multiple threads searching in Lucene and the synchronized issue. -- solution attached.

2006-05-09 Thread Otis Gospodnetic
Yueyu Lin, From what I can tell from a quick look at the method, that method needs to remain synchronized, so multiple threads don't accidentally re-read that 'indexTerms' (Term[] type). Even though the method is synchronized, it looks like only the first invocation would enter that try/cat

Re: weird behavior of IndexReader.indexExists()

2006-05-09 Thread wenjie zheng
I am a little bit confused here. Since I didn't change the lock directory or disable locking and it is on the single server sharing same fs, does it mean I am not supposed to get the error? But I did get the error. What should I do? Wenjie On 5/9/06, Otis Gospodnetic <[EMAIL PROTECTED]> wrote:

Re: weird behavior of IndexReader.indexExists()

2006-05-09 Thread Otis Gospodnetic
Somebody (ah, Andy) already mentioned that you may be getting unlucky and calling that IndexReader.indexExists method right when the 'segments' file is being renamed. It looks like there is no lock in that method, but it looks like we may have to add it. Take a look at the IndexReader.open(...

InstanciatedIndex

2006-05-09 Thread karl wettin
I'm about to release a new version. Is there a specific corpus you want me to use for the test case?

Re: Changing Lucene scoring?

2006-05-09 Thread Yonik Seeley
On 5/8/06, Otis Gospodnetic <[EMAIL PROTECTED]> wrote: Not sure if people caught my question over on java-user@ about the possibility of eliminating floating point calculations from Lucene's scoring. Before I embark on this, I thought I'd ask: - Am I crazy? Is this at all doable? Do you ha

Re: Multiple threads searching in Lucene and the synchronized issue. -- solution attached.

2006-05-09 Thread Doug Cutting
The best search performance is achieved using a single IndexSearcher shared by multiple threads. Peter Keegan has demonstrated rates of up to 400 searches per second on eight-CPU machines using this approach: http://www.mail-archive.com/java-user@lucene.apache.org/msg05074.html So the synchro
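The "one searcher shared by many threads" arrangement can be sketched without Lucene on the classpath. In the example below, `StandInSearcher` is a hypothetical placeholder for a thread-safe searcher (it is not a Lucene class), and the "hit count" it returns is made up. The only point is the shape: the searcher is created once and every worker thread calls the same instance.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical stand-in for a thread-safe searcher; not part of Lucene. */
class StandInSearcher {
    int search(String query) {
        return query.length(); // placeholder result, not a real hit count
    }
}

class SharedSearcherDemo {
    /** Runs all queries on a fixed thread pool; every task uses the same searcher instance. */
    static List<Integer> searchAll(StandInSearcher shared, List<String> queries, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Integer>> futures = new ArrayList<>();
            for (String q : queries) {
                Callable<Integer> task = () -> shared.search(q);
                futures.add(pool.submit(task));
            }
            List<Integer> hits = new ArrayList<>();
            for (Future<Integer> f : futures) {
                hits.add(f.get()); // results come back in submission order
            }
            return hits;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        StandInSearcher shared = new StandInSearcher(); // created once, reused by all threads
        System.out.println(searchAll(shared, Arrays.asList("lucene", "index"), 4));
    }
}
```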

Re: Vector

2006-05-09 Thread Yonik Seeley
On 5/9/06, karl wettin <[EMAIL PROTECTED]> wrote: Did anybody know what needs to be synchronized and what does not need to be synchronized? Needs to be considered on a case-by-case basis IMO. Should I summarize the uses and post it here for discussion? Sure! -Yonik http://incubator.apache.

[jira] Updated: (LUCENE-565) Supporting deleteDocuments in IndexWriter (Code and Performance Results Provided)

2006-05-09 Thread Ning Li (JIRA)
[ http://issues.apache.org/jira/browse/LUCENE-565?page=all ] Ning Li updated LUCENE-565: --- Attachment: IndexWriter.patch Here is the diff file of IndexWriter.java. > Supporting deleteDocuments in IndexWriter (Code and Performance Results > Provided) > ---

[jira] Resolved: (LUCENE-383) ConstantScoreRangeQuery - fixes "too many clauses" exception

2006-05-09 Thread Doug Cutting (JIRA)
[ http://issues.apache.org/jira/browse/LUCENE-383?page=all ] Doug Cutting resolved LUCENE-383: - Fix Version: 2.0 1.9 Resolution: Fixed Assign To: Yonik Seeley (was: Lucene Developers) This has been fixed. > ConstantScor

[jira] Commented: (LUCENE-438) add Token.setTermText(), remove final

2006-05-09 Thread Doug Cutting (JIRA)
[ http://issues.apache.org/jira/browse/LUCENE-438?page=comments#action_12378700 ] Doug Cutting commented on LUCENE-438: - +1 This sounds like a good change. > add Token.setTermText(), remove final > - > > Key

[jira] Created: (LUCENE-566) Esperanto Analyzer

2006-05-09 Thread Otis Gospodnetic (JIRA)
Esperanto Analyzer -- Key: LUCENE-566 URL: http://issues.apache.org/jira/browse/LUCENE-566 Project: Lucene - Java Type: New Feature Components: Analysis Reporter: Otis Gospodnetic Priority: Minor Esperanto stemmer and analyzer from Brio

[jira] Updated: (LUCENE-550) InstanciatedIndex - faster but memory consuming index

2006-05-09 Thread Karl Wettin (JIRA)
[ http://issues.apache.org/jira/browse/LUCENE-550?page=all ] Karl Wettin updated LUCENE-550: --- Attachment: src_20060509.tar.gz Some new statistics. * A corpus of 500 documents, 1-5K text per document. * Placed 150 000 term and boolean queries. * Retrieved

[jira] Commented: (LUCENE-550) InstanciatedIndex - faster but memory consuming index

2006-05-09 Thread Karl Wettin (JIRA)
[ http://issues.apache.org/jira/browse/LUCENE-550?page=comments#action_12378776 ] Karl Wettin commented on LUCENE-550: Oups InstanciatedIndex: Corpus creation took 14011 ms. Term queries took 33608 ms. RAMDirectory: Corpus creation took 9144 ms. Term q

Re: Supporting deleteDocuments in IndexWriter (Code and Performance Results Provided)

2006-05-09 Thread Ning Li
The machine is swamped with tests. I will run the experiment when the machine is free. Regards, Ning Ning Li Search Technologies IBM Almaden Research Center 650 Harry Road San Jose, CA 95120

[jira] Created: (LUCENE-567) BooleanQuery Does Not Work With One Query indicated as MUST_NOT

2006-05-09 Thread Nicholaus Shupe (JIRA)
BooleanQuery Does Not Work With One Query indicated as MUST_NOT --- Key: LUCENE-567 URL: http://issues.apache.org/jira/browse/LUCENE-567 Project: Lucene - Java Type: Bug Components: Search Versions:

Re: Multiple threads searching in Lucene and the synchronized issue. -- solution attached.

2006-05-09 Thread yueyu lin
Yes, the modification is still synchronized and the first thread will be responsible for reading first. And then other threads will not read and the synchronization is unnecessary. private void ensureIndexIsRead() throws IOException { if (indexTerms != null) // index alrea

Re: Multiple threads searching in Lucene and the synchronized issue. -- solution attached.

2006-05-09 Thread yueyu lin
My assumption is that every query is relatively quick. If the times lapsed in other process when querying, the ensureIndexIsRead() function will not cause a lot of problems. If not, the ensureIndexIsRead() function will be a bottle neck. I could understand that a lot of systems' queries are quiet

Re: Multiple threads searching in Lucene and the synchronized issue. -- solution attached.

2006-05-09 Thread yueyu lin
Oh, please believe me: I forced the JVM to print a thread dump, and the threads were indeed waiting here. I'll try to post the patch to JIRA. I don't want to modify this code myself because that would break the Lucene code. So I hope you can do me the favor of checking this code and making it available i

RE: Multiple threads searching in Lucene and the synchronized issue. -- solution attached.

2006-05-09 Thread Robert Engels
I am interested in the exact performance difference in ms per query removing the synchronized block? I can see that after a while when using your code, the JIT will probably inline the 'non-reading' path. Even then... I would not think that 2 lines of synchronized code would contribute much when

RE: Multiple threads searching in Lucene and the synchronized issue. -- solution attached.

2006-05-09 Thread Robert Engels
I think your basic problem is that you are using multiple IndexSearchers? And creating new instances during runtime? If so, you will be reading the index information far too often. This is not a good configuration. -Original Message- From: yueyu lin [mailto:[EMAIL PROTECTED] Sent: Tuesday,

Re: Multiple threads searching in Lucene and the synchronized issue. -- solution attached.

2006-05-09 Thread yueyu lin
No, I think I didn't express it clearly. First, I have only one IndexSearcher, and multiple threads share it. Then I found the performance was not as good as I expected on a dual-CPU machine. So I forced the JVM to print a thread dump and found the threads were waiting here. After that, I trace

Re: Multiple threads searching in Lucene and the synchronized issue. -- solution attached.

2006-05-09 Thread Yonik Seeley
Yueyu Lin, Your patch below looks suspiciously like the double-checked locking anti-pattern, and is not guaranteed to work. There really isn't a way to safely lazily initialize without using synchronized or volatile. -Yonik http://incubator.apache.org/solr Solr, the open-source Lucene search ser
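For readers unfamiliar with the anti-pattern named above, here is a minimal sketch of its general shape next to the plain synchronized form. It is not the patch from this thread; `LazyIndex`, `brokenGet`, `safeGet`, and `load` are hypothetical names, and an `int[]` stands in for the lazily read index data.

```java
class LazyIndex {
    private int[] terms; // plain field: reads outside the lock have no visibility guarantee

    /**
     * Double-checked locking on a non-volatile field. The first, unsynchronized
     * check may observe a non-null reference before the array contents written
     * by another thread are visible, so this form is not guaranteed to work.
     */
    int[] brokenGet() {
        if (terms == null) {              // check 1, no lock held
            synchronized (this) {
                if (terms == null) {      // check 2, lock held
                    terms = load();
                }
            }
        }
        return terms;
    }

    /** Safe form: every caller takes the lock, so initialization is fully visible. */
    synchronized int[] safeGet() {
        if (terms == null) {
            terms = load();
        }
        return terms;
    }

    private int[] load() {
        return new int[] {1, 2, 3}; // placeholder for reading the index
    }
}
```

The broken variant usually appears to work in testing, which is part of why it is hard to rule out by experiment alone.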

Re: Multiple threads searching in Lucene and the synchronized issue. -- solution attached.

2006-05-09 Thread yueyu lin
In Java, if you call a synchronized function inside a synchronized block and both use the same mutex object, nothing will happen. If they use different mutex objects, something may be screwed up. On 5/10/06, Yonik Seeley <[EMAIL PROTECTED]> wrote: Yueyu Lin, Your patch below looks suspiciously like the
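The same-mutex case works because Java monitors are reentrant: a thread that already holds an object's lock can acquire it again without blocking. A small sketch, with the hypothetical class name `ReentrantDemo`:

```java
class ReentrantDemo {
    /** Holds this object's monitor, then calls another method guarded by the same monitor. */
    synchronized int outer() {
        return inner() + 1; // re-enters the monitor already held by this thread
    }

    synchronized int inner() {
        return 1;
    }

    public static void main(String[] args) {
        // Completes without deadlock because the lock is reentrant.
        System.out.println(new ReentrantDemo().outer());
    }
}
```

Reentrancy only explains why the nested call does not deadlock. It says nothing about whether an unsynchronized first check is safe, which is the separate concern raised in the quoted reply.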

[jira] Created: (LUCENE-568) [PATCH]Multiple threads performance enhancement when querying.

2006-05-09 Thread Yueyu Lin (JIRA)
[PATCH]Multiple threads performance enhancement when querying. -- Key: LUCENE-568 URL: http://issues.apache.org/jira/browse/LUCENE-568 Project: Lucene - Java Type: Improvement Components: Search Ver

[jira] Updated: (LUCENE-568) [PATCH]Multiple threads performance enhancement when querying.

2006-05-09 Thread Yueyu Lin (JIRA)
[ http://issues.apache.org/jira/browse/LUCENE-568?page=all ] Yueyu Lin updated LUCENE-568: - Attachment: TermInfosReader.java That attachment is the patched file. > [PATCH]Multiple threads performance enhancement when querying. >

RE: Multiple threads searching in Lucene and the synchronized issue. -- solution attached.

2006-05-09 Thread Robert Engels
I am fairly certain his code is ok, since it rechecks the initialized state in the synchronized block before initializing. Worst case, during the initial checks when the initialization is occurring there may be some unneeded checking, but after that, the code should perform better since it will ne

Re: Multiple threads searching in Lucene and the synchronized issue. -- solution attached.

2006-05-09 Thread Yonik Seeley
Here is a reference to double-checked locking. Many people have tried to get around synchronization during lazy initialization - AFAIK, none have succeeded. With the new memory model in Java5, you can get away with just volatile, which is like half a synchronization (a read barrier + a write bar
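As a hedged illustration of the Java 5 remark: under the revised memory model, double-checked locking becomes valid when the lazily initialized field is declared `volatile`. The sketch below assumes Java 5 or later; `VolatileLazyIndex`, `load`, and `loadCount` are hypothetical names, and this is a generic idiom rather than anything proposed for TermInfosReader in this thread.

```java
class VolatileLazyIndex {
    // volatile: a thread that reads a non-null reference also sees the array contents
    private volatile int[] terms;
    private int loads; // guarded by synchronized (this)

    int[] get() {
        int[] local = terms;          // one volatile read on the fast path
        if (local == null) {
            synchronized (this) {
                local = terms;        // re-check while holding the lock
                if (local == null) {
                    local = load();
                    terms = local;    // volatile write publishes the initialized array
                }
            }
        }
        return local;
    }

    /** Number of times the data was actually loaded; should stay at 1. */
    synchronized int loadCount() {
        return loads;
    }

    private int[] load() {
        loads++;                      // only called with the lock held
        return new int[] {1, 2, 3};   // placeholder for reading the index
    }
}
```

On pre-Java 5 virtual machines, `volatile` did not carry this guarantee, so the idiom should not be relied on there.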

Re: Multiple threads searching in Lucene and the synchronized issue. -- solution attached.

2006-05-09 Thread Yonik Seeley
On 5/9/06, Robert Engels <[EMAIL PROTECTED]> wrote: I am fairly certain his code is ok, since it rechecks the initialized state in the synchronized block before initializing. That "recheck" is why the pattern (or anti-pattern) is called double-checked locking :-) -Yonik http://incubator.apache

[jira] Commented: (LUCENE-568) [PATCH]Multiple threads performance enhancement when querying.

2006-05-09 Thread Otis Gospodnetic (JIRA)
[ http://issues.apache.org/jira/browse/LUCENE-568?page=comments#action_12378819 ] Otis Gospodnetic commented on LUCENE-568: - Please provide a patch instead of the whole file, so your changes can be clearly seen. Here is how to do it: http://wiki.apa

Re: Multiple threads searching in Lucene and the synchronized issue. -- solution attached.

2006-05-09 Thread Chris Hostetter
: > I am fairly certain his code is ok, since it rechecks the initialized state : > in the synchronized block before initializing. : : That "recheck" is why the pattern (or anti-pattern) is called : double-checked locking :-) More specifically, this is functionally halfway between example labeled

Re: Multiple threads searching in Lucene and the synchronized issue. -- solution attached.

2006-05-09 Thread Otis Gospodnetic
Yueyu Lin, Sorry, I don't follow this part: "To resolve the problem, first I try to modify the codes and rebuild another Lucene jar. That's a bad idea, I didn't want to maintain my custom Lucene package." Are you saying you _did_ make the code changes and _did_ run your application with a modifi

[jira] Commented: (LUCENE-567) BooleanQuery Does Not Work With One Query indicated as MUST_NOT

2006-05-09 Thread Otis Gospodnetic (JIRA)
[ http://issues.apache.org/jira/browse/LUCENE-567?page=comments#action_12378820 ] Otis Gospodnetic commented on LUCENE-567: - That is by design. Purely negative queries are not supported, which is why you had to add that MatchAllDocsQuery to get thi

[jira] Closed: (LUCENE-567) BooleanQuery Does Not Work With One Query indicated as MUST_NOT

2006-05-09 Thread Otis Gospodnetic (JIRA)
[ http://issues.apache.org/jira/browse/LUCENE-567?page=all ] Otis Gospodnetic closed LUCENE-567: --- Resolution: Won't Fix > BooleanQuery Does Not Work With One Query indicated as MUST_NOT > ---

Re: Multiple threads searching in Lucene and the synchronized issue. -- solution attached.

2006-05-09 Thread yueyu lin
I have indeed met this problem before. The compiler applied an optimization that turned out badly for me, as I saw in the bytecode. When I use a function-local variable, m_indexTerms, on JDK1.5.06, it seems OK. Whether it will break in other environments, I still don't know. On 5/10/06, Y

[jira] Closed: (LUCENE-568) [PATCH]Multiple threads performance enhancement when querying.

2006-05-09 Thread Yonik Seeley (JIRA)
[ http://issues.apache.org/jira/browse/LUCENE-568?page=all ] Yonik Seeley closed LUCENE-568: --- Resolution: Invalid Assign To: Yonik Seeley I'm closing this improvement without committing since the proposed change is in the form of a well known ant

RE: Multiple threads searching in Lucene and the synchronized issue. -- solution attached.

2006-05-09 Thread Robert Engels
I wrote a test case to test the performance (assuming that it worked, but based on reading the double-checked articles I understand the dilemma). Using 30,000,000 simple iterations and 2 threads: (note this is on a single processor machine). sync time = 39532 unsync time = 2250 diff time = 37282
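The test case itself is not shown in the message, so the following is only a guess at the kind of measurement described: two threads each running a fixed number of trivial iterations, once through a synchronized method and once through a plain one. All names (`SyncCostBench`, `syncStep`, `plainStep`, `timeMillis`) are hypothetical. Naive timings like this are sensitive to JIT warm-up and to the hardware, so the numbers it prints should not be compared with the figures quoted above.

```java
class SyncCostBench {
    private long syncCount;
    private long plainCount; // racy on purpose; its final value is not meaningful

    synchronized void syncStep() { syncCount++; }

    void plainStep() { plainCount++; }

    synchronized long syncCount() { return syncCount; }

    /** Runs body on the given number of threads and returns elapsed wall-clock milliseconds. */
    static long timeMillis(int threads, Runnable body) {
        Thread[] workers = new Thread[threads];
        long start = System.nanoTime();
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(body);
            workers[i].start();
        }
        for (Thread t : workers) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) {
        final int iterations = 30_000_000; // per thread
        SyncCostBench bench = new SyncCostBench();
        long sync = timeMillis(2, () -> {
            for (int i = 0; i < iterations; i++) bench.syncStep();
        });
        long unsync = timeMillis(2, () -> {
            for (int i = 0; i < iterations; i++) bench.plainStep();
        });
        System.out.println("sync time = " + sync + " ms, unsync time = " + unsync + " ms");
    }
}
```

Dividing the difference by the total number of iterations gives a per-call cost, which is the figure that matters when weighing it against the time a whole query takes.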

RE: Multiple threads searching in Lucene and the synchronized issue. -- solution attached.

2006-05-09 Thread Robert Engels
I think you could use a volatile primitive boolean to control whether or not the index needs to be read, and also mark the index data volatile and it SHOULD PROBABLY work. But as stated, I don't think the performance difference is worth it. -Original Message- From: yueyu lin [mailto:[EMA

RE: Multiple threads searching in Lucene and the synchronized issue. -- solution attached.

2006-05-09 Thread Chris Hostetter
: I think you could use a volatile primitive boolean to control whether or not : the index needs to be read, and also mark the index data volatile and it : SHOULD PROBABLY work. : : But as stated, I don't think the performance difference is worth it. My understanding is: 1) volatile will only h

Re: Multiple threads searching in Lucene and the synchronized issue. -- solution attached.

2006-05-09 Thread yueyu lin
I understand that. Thanks for all. I will still use the original Lucene jar and will continue to dig Lucene. Wish I would find something useful for all of you. :) On 5/10/06, Chris Hostetter <[EMAIL PROTECTED]> wrote: : I think you could use a volatile primitive boolean to control whether or