Hi, We're building a server based on Lucene. When doing a multi-threaded performance test, we found a minor synchronization issue. We found that if we used two IndexSearchers, we would get a 10% performance benefit.
But if we increased the number of IndexSearchers beyond 2, the performance improvement b
Hi Otis,
"I often need just yes/no (matches/doesn't match) answers,... "
Not sure if you meant this: "how I could implement pure boolean model,
completely avoiding scoring?".
If yes, what comes to my mind is Filtering, ChainedFilter, ConstantScore* and
all these discussions about implementing n
: We found if we were using 2 IndexSearcher, we would get 10% performance
: benefit.
: But if we increased the number of IndexSearcher from 2, the performance
: improvement became slight even worse.
Why use more than 2 IndexSearchers?
Typically 1 is all you need, except for when you want to
Please trace the code into Lucene when searching.
Here is a table of how the invocations are called.
The trace log:
*Steps* | *ClassName* | *Functions* | *Description*
1. | org.apache.lucene.search.Searcher | public final Hits search(Query query) | It will call another search function.
2. | org.apac
Oops, the table seems twisted.
Can you see that clearly?
On 5/9/06, yueyu lin <[EMAIL PROTECTED]> wrote:
Please trace the code into Lucene when searching.
Here is a table of how the invocations are called.
The trace log:
*Steps* | *ClassName* | *Functions* | *Description*
1. | org.apache.l
One IndexSearcher is one IndexSearcher instance. The instance has a lot of
functions. Unfortunately, they call a synchronized function in
another class's instance (TermInfosReader). That's why we need two
IndexSearchers. But two searchers will cost double the cache memory. It's not
w
Ning Li <[EMAIL PROTECTED]> wrote on 09/05/2006 02:07:26 AM:
> Today, applications have to open/close an IndexWriter and open/close an
> IndexReader directly or indirectly (via IndexModifier) in order to handle
a
> mix of inserts and deletes. This performs well when inserts and deletes
> come in fa
On Sat, 2006-05-06 at 09:40 +0200, karl wettin wrote:
>
> There are a couple of Vector:s in the code. Is it really
> necessary to use this expensive thread safe artifact from the dark
> ages?
> The question is what needs and not needs to be synchronized. I take it
> nothing needs to, but I'm not
Hi
I think I have discovered this too.
It is on my list of issues to raise.
The index-exists test looks for the segments file.
When the index is committing, and you are unlucky, this file may not be
found as the new segments file replaces the old one. The result is the
index appears not to exi
On Mon, 2006-05-08 at 18:34 -0700, Otis Gospodnetic wrote:
> Hi,
>
> Not sure if people caught my question over on java-user@
> about the possibility of eliminating floating point
> calculations from Lucene's scoring. Before I embark on this,
> I thought I'd ask:
>
> - Am I crazy?
I'm all for
Make sure both instances are using the same lock directory.
The segments file should only be read or written while holding the
commit lock.
If the lock directories don't match, you'll get more 'strange' errors...
In Lucene 1.4.2 some methods did not use the lock, this has been patched
a couple of
Hi,
I'm not happy with how scoring works either, though it might be efficient.
I have been investigating the code for a while. Everything comes
down to
-Query (TermQuery, BooleanQuery, PhraseQuery etc.),
-Inner class Weight in Query,
-Similarity,
-Scorer (TermScorer, BooleanScorer etc.)
Thanks,
I guess my question is how to make sure both instances are using the same
lock directory.
Wenjie
On 5/9/06, Vanlerberghe, Luc <[EMAIL PROTECTED]> wrote:
Make sure both instances are using the same lock directory.
The segments file should only be read or written while holding the
commit
If you don't explicitly change the lock directory and do not disable locking,
the same directory should be used. I'm assuming this is all done on a single
server sharing the same file system. The locks are stored in the system's
default temporary directory. That's typically /tmp under UNIX/OS
I agree - a delete (typically for a Term that represents a "primary key" for a
Document in an index) followed by re-add of a Document is a very common
scenario, and I'd love to see the numbers for that.
Thanks,
Otis
> We experimented with three workloads:
> - Insert only. 1.6M documents were
Yueyu Lin,
From what I can tell from a quick look at the method, that method needs to
remain synchronized, so multiple threads don't accidentally re-read that
'indexTerms' (Term[] type). Even though the method is synchronized, it looks
like only the first invocation would enter that try/cat
I am a little bit confused here.
Since I didn't change the lock directory or disable locking and it is on the
single server sharing same fs, does it mean I am not supposed to get the
error?
But I did get the error. What should I do?
Wenjie
On 5/9/06, Otis Gospodnetic <[EMAIL PROTECTED]> wrote:
Somebody (ah, Andy) already mentioned that you may be getting unlucky and
calling that IndexReader.indexExists method right when the 'segments' file is
being renamed. It looks like there is no lock in that method, but it looks
like we may have to add it.
Take a look at the IndexReader.open(...
I'm about to release a new version. Is there a specific corpus you want
me to use for the test case?
On 5/8/06, Otis Gospodnetic <[EMAIL PROTECTED]> wrote:
Not sure if people caught my question over on java-user@ about the possibility
of eliminating floating point calculations from Lucene's scoring. Before I
embark on this, I thought I'd ask:
- Am I crazy? Is this at all doable?
Do you ha
The best search performance is achieved using a single IndexSearcher
shared by multiple threads. Peter Keegan has demonstrated rates of up
to 400 searches per second on eight-CPU machines using this approach:
http://www.mail-archive.com/java-user@lucene.apache.org/msg05074.html
So the synchro
On 5/9/06, karl wettin <[EMAIL PROTECTED]> wrote:
Did anybody know what needs to be synchronized and what does not need to
be synchronized?
Needs to be considered on a case-by-case basis IMO.
Should I summarize the uses and post it here for
discussion?
Sure!
-Yonik
http://incubator.apache.
[ http://issues.apache.org/jira/browse/LUCENE-565?page=all ]
Ning Li updated LUCENE-565:
---
Attachment: IndexWriter.patch
Here is the diff file of IndexWriter.java.
> Supporting deleteDocuments in IndexWriter (Code and Performance Results
> Provided)
> ---
[ http://issues.apache.org/jira/browse/LUCENE-383?page=all ]
Doug Cutting resolved LUCENE-383:
-
Fix Version: 2.0
1.9
Resolution: Fixed
Assign To: Yonik Seeley (was: Lucene Developers)
This has been fixed.
> ConstantScor
[ http://issues.apache.org/jira/browse/LUCENE-438?page=comments#action_12378700 ]
Doug Cutting commented on LUCENE-438:
-
+1 This sounds like a good change.
> add Token.setTermText(), remove final
> -
>
> Key
Esperanto Analyzer
--
Key: LUCENE-566
URL: http://issues.apache.org/jira/browse/LUCENE-566
Project: Lucene - Java
Type: New Feature
Components: Analysis
Reporter: Otis Gospodnetic
Priority: Minor
Esperanto stemmer and analyzer from Brio
[ http://issues.apache.org/jira/browse/LUCENE-550?page=all ]
Karl Wettin updated LUCENE-550:
---
Attachment: src_20060509.tar.gz
Some new statistics.
* A corpus of 500 documents, 1-5K text per document.
* Placed 150 000 term and boolean queries.
* Retrieved
[ http://issues.apache.org/jira/browse/LUCENE-550?page=comments#action_12378776 ]
Karl Wettin commented on LUCENE-550:
Oops
InstanciatedIndex:
Corpus creation took 14011 ms.
Term queries took 33608 ms.
RAMDirectory:
Corpus creation took 9144 ms.
Term q
The machine is swamped with tests. I will run the experiment when the
machine is free.
Regards,
Ning
Ning Li
Search Technologies
IBM Almaden Research Center
650 Harry Road
San Jose, CA 95120
BooleanQuery Does Not Work With One Query indicated as MUST_NOT
---
Key: LUCENE-567
URL: http://issues.apache.org/jira/browse/LUCENE-567
Project: Lucene - Java
Type: Bug
Components: Search
Versions:
Yes, the modified version is still synchronized, and the first thread is
responsible for reading the index. After that, other threads will not read it,
so the synchronization is unnecessary for them.
private void ensureIndexIsRead() throws IOException {
if (indexTerms != null) // index alrea
My assumption is that every query is relatively quick. If most of the time is
spent in other processing when querying, the ensureIndexIsRead() function will
not cause many problems. If not, the ensureIndexIsRead() function will be a
bottleneck.
I could understand that a lot of systems' queries are quiet
Oh, please believe me that I forced the JVM to print the thread dump.
It was indeed waiting here.
I'll try to post the patch to JIRA.
I don't want to modify this code myself because that would break the
Lucene code. So I hope you can do me the favor of checking this code and
making it available i
I am interested in the exact performance difference, in ms per query, from
removing the synchronized block.
I can see that after a while when using your code, the JIT will probably
inline the 'non-reading' path.
Even then...
I would not think that 2 lines of synchronized code would contribute much
when
I think your basic problem is that you are using multiple IndexSearchers?
And creating new instances during runtime? If so, you will be reading the
index information far too often. This is not a good configuration.
-Original Message-
From: yueyu lin [mailto:[EMAIL PROTECTED]
Sent: Tuesday,
No, I think I didn't express it clearly.
First, I only have one IndexSearcher, and multiple threads share it.
Then I found the performance was not as good as I expected on a dual-CPU
machine.
So I forced the JVM to print a thread dump and found the threads were waiting
here.
After that, I trace
Yueyu Lin,
Your patch below looks suspiciously like the double-checked locking
anti-pattern, and is not guaranteed to work.
There really isn't a way to safely lazily initialize without using
synchronized or volatile.
-Yonik
http://incubator.apache.org/solr Solr, the open-source Lucene search ser
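For readers following the thread, here is a minimal sketch of the point above: lazy initialization that rechecks under a lock is only safe when the field is volatile (under the Java 5 or later memory model) or when every access is synchronized. The class and method names (`LazyTermIndex`, `loadTerms`) and the sample terms are hypothetical illustrations, not Lucene's TermInfosReader code.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical stand-in for a lazily loaded term index; not Lucene code. */
class LazyTermIndex {
    // volatile is what makes the unsynchronized first check safe (Java 5+ memory model).
    private volatile String[] indexTerms;

    // Counts how many times the (pretend) expensive load actually ran.
    final AtomicInteger loads = new AtomicInteger();

    String[] terms() {
        String[] local = indexTerms;        // single volatile read on the fast path
        if (local == null) {
            synchronized (this) {
                local = indexTerms;         // recheck while holding the lock
                if (local == null) {
                    local = loadTerms();
                    indexTerms = local;     // volatile write publishes the finished array
                }
            }
        }
        return local;
    }

    // Placeholder for reading the index from disk; the terms are made up.
    private String[] loadTerms() {
        loads.incrementAndGet();
        return new String[] {"apache", "lucene", "solr"};
    }
}
```

Without the volatile modifier, the same code is the broken form of double-checked locking discussed in this thread; declaring the whole method synchronized is the simpler alternative that works on any JVM.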
In Java, if you call a synchronized function from inside a synchronized block
and both use the same mutex object, nothing bad will happen.
If they use different mutex objects, something may be screwed up.
On 5/10/06, Yonik Seeley <[EMAIL PROTECTED]> wrote:
Yueyu Lin,
Your patch below looks suspiciously like the
[PATCH]Multiple threads performance enhancement when querying.
--
Key: LUCENE-568
URL: http://issues.apache.org/jira/browse/LUCENE-568
Project: Lucene - Java
Type: Improvement
Components: Search
Ver
[ http://issues.apache.org/jira/browse/LUCENE-568?page=all ]
Yueyu Lin updated LUCENE-568:
-
Attachment: TermInfosReader.java
That attachment is the patched file.
> [PATCH]Multiple threads performance enhancement when querying.
>
I am fairly certain his code is ok, since it rechecks the initialized state
in the synchronized block before initializing.
Worst case, during the initial checks when the initialization is occurring
there may be some unneeded checking, but after that, the code should perform
better since it will ne
Here is a reference to double-checked locking. Many people have tried
to get around synchronization during lazy initialization - AFAIK, none
have succeeded. With the new memory model in Java5, you can get away
with just volatile, which is like half a synchronization (a read
barrier + a write bar
On 5/9/06, Robert Engels <[EMAIL PROTECTED]> wrote:
I am fairly certain his code is ok, since it rechecks the initialized state
in the synchronized block before initializing.
That "recheck" is why the pattern (or anti-pattern) is called
double-checked locking :-)
-Yonik
http://incubator.apache
[ http://issues.apache.org/jira/browse/LUCENE-568?page=comments#action_12378819 ]
Otis Gospodnetic commented on LUCENE-568:
-
Please provide a patch instead of the whole file, so your changes can be
clearly seen.
Here is how to do it: http://wiki.apa
: > I am fairly certain his code is ok, since it rechecks the initialized state
: > in the synchronized block before initializing.
:
: That "recheck" is why the pattern (or anti-pattern) is called
: double-checked locking :-)
More specifically, this is functionally halfway between example labeled
Yueyu Lin,
Sorry, I don't follow this part:
"To resolve the problem, first I try to modify the codes and rebuild another
Lucene jar.
That's a bad idea, I didn't want to maintain my custom Lucene package."
Are you saying you _did_ make the code changes and _did_ run your application
with a modifi
[ http://issues.apache.org/jira/browse/LUCENE-567?page=comments#action_12378820 ]
Otis Gospodnetic commented on LUCENE-567:
-
That is by design. Purely negative queries are not supported, which is why you
had to add that MatchAllDocsQuery to get thi
[ http://issues.apache.org/jira/browse/LUCENE-567?page=all ]
Otis Gospodnetic closed LUCENE-567:
---
Resolution: Won't Fix
> BooleanQuery Does Not Work With One Query indicated as MUST_NOT
> ---
I have indeed met this problem before. When I looked at the bytecode, the
compiler had applied an optimization that was bad for me.
When I use a function-local variable, m_indexTerms, on JDK1.5.06, it
seems OK.
Whether it will break in other environments, I still don't know.
On 5/10/06, Y
[ http://issues.apache.org/jira/browse/LUCENE-568?page=all ]
Yonik Seeley closed LUCENE-568:
---
Resolution: Invalid
Assign To: Yonik Seeley
I'm closing this improvement without committing since the proposed change is in
the form of a well known ant
I wrote a test case to test the performance (assuming that it worked, but
based on reading the double-checked articles I understand the dilemma).
Using 30,000,000 simple iterations and 2 threads: (note this is on a single
processor machine).
sync time = 39532
unsync time = 2250
diff time = 37282
I think you could use a volatile primitive boolean to control whether or not
the index needs to be read, and also mark the index data volatile and it
SHOULD PROBABLY work.
But as stated, I don't think the performance difference is worth it.
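A rough sketch of how such a comparison could be set up, for anyone who wants to repeat it. This is not the test case used for the numbers above; the class name `SyncCostSketch` and its methods are hypothetical, and the sketch assumes Java 8 or later for the lambda syntax. Timings from a loop like this depend heavily on the JVM, the JIT, and the hardware, so treat them as indicative only.

```java
/** Hypothetical micro-benchmark sketch; not the test case described in the message. */
class SyncCostSketch {
    private long syncCount;   // only touched while holding the monitor
    private long plainCount;  // updated without locking, so its final value is unreliable

    synchronized void syncStep() { syncCount++; }

    void plainStep() { plainCount++; }

    synchronized long syncCount() { return syncCount; }

    /** Runs the same body on several threads and returns elapsed wall-clock milliseconds. */
    static long timeMillis(Runnable body, int threads) {
        Thread[] workers = new Thread[threads];
        long start = System.nanoTime();
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(body);
            workers[i].start();
        }
        try {
            for (Thread t : workers) {
                t.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("benchmark interrupted", e);
        }
        return (System.nanoTime() - start) / 1_000_000L;
    }

    /** Returns {sync time, unsync time, difference} in milliseconds. */
    long[] run(final int iterations, int threads) {
        long sync = timeMillis(() -> {
            for (int i = 0; i < iterations; i++) syncStep();
        }, threads);
        long unsync = timeMillis(() -> {
            for (int i = 0; i < iterations; i++) plainStep();
        }, threads);
        return new long[] {sync, unsync, sync - unsync};
    }
}
```

Calling `new SyncCostSketch().run(30_000_000, 2)` mirrors the iteration and thread counts quoted above. Dividing the difference by the total number of calls gives a per-call cost, which is the figure that matters when judging whether the lock is worth removing.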
-Original Message-
From: yueyu lin [mailto:[EMA
: I think you could use a volatile primitive boolean to control whether or not
: the index needs to be read, and also mark the index data volatile and it
: SHOULD PROBABLY work.
:
: But as stated, I don't think the performance difference is worth it.
My understanding is:
1) volatile will only h
I understand that. Thanks to all.
I will keep using the original Lucene jar and will continue to dig into Lucene.
I hope I will find something useful for all of you.
:)
On 5/10/06, Chris Hostetter <[EMAIL PROTECTED]> wrote:
: I think you could use a volatile primitive boolean to control whether or