Applied!!
Erik
On May 3, 2005, at 1:31 PM, Wolfgang Hoschek wrote:
Here's a performance patch for MemoryIndex.MemoryIndexReader that
caches the norms for a given field, avoiding repeated recomputation
of the norms. Recall that, depending on the query, norms() can be
called over and over a
Here's a performance patch for MemoryIndex.MemoryIndexReader that
caches the norms for a given field, avoiding repeated recomputation of
the norms. Recall that, depending on the query, norms() can be called
over and over again with mostly the same parameters. Thus, replace
public byte[] norms(S
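The caching idea described in this message can be sketched as follows. This is only an illustration under stated assumptions, not the patch itself: the class `CachedNormsSketch`, the `computeNorms` placeholder, and the `invalidate` hook are all hypothetical names, and nothing below is Lucene API.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a reader that caches norms per field.
// Nothing here is Lucene API; computeNorms() is a placeholder.
class CachedNormsSketch {
    private final Map<String, byte[]> cache = new HashMap<>();
    int computations = 0; // counts how often the expensive path runs

    // Placeholder for the expensive per-field computation (value is made up).
    private byte[] computeNorms(String fieldName) {
        computations++;
        // One document in the index, so a single norm byte.
        return new byte[] { (byte) (fieldName.length() & 0x7f) };
    }

    // Returns the cached array for a field, computing it on first use only.
    byte[] norms(String fieldName) {
        byte[] cached = cache.get(fieldName);
        if (cached == null) {
            cached = computeNorms(fieldName);
            cache.put(fieldName, cached);
        }
        return cached;
    }

    // Would need to be called whenever the field's content changes.
    void invalidate(String fieldName) {
        cache.remove(fieldName);
    }
}
```

Repeated calls with the same field name then return the same array without recomputation; the cache has to be invalidated whenever the field is modified, otherwise stale norms would be returned.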
Thanks!
Wolfgang.
I've committed this change after it successfully worked for me.
Thanks!
Erik
-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
On May 2, 2005, at 5:21 PM, Wolfgang Hoschek wrote:
Finally found and fixed the bug!
The fix is simply to replace MemoryIndex.MemoryIndexReader skipTo()
with the following:
public boolean skipTo(int target) {
if (DEBUG) System.err.println(".skipTo: " +
targe
The version I sent returns in O(1), if performance was your concern.
Or did you mean something else?
Since 0 is the only document number in the index, a
return target == 0;
might be nice for skipTo(). It doesn't really help performance, though,
and the next() works just as well.
Regards,
Paul Elsc
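A minimal sketch of the point made above, assuming the skipTo() contract is "advance at least once, then stop at the first document number at or beyond the target". The class `SingleDocIterator` is hypothetical and is not the MemoryIndex code; it only shows why a next()-based loop and `return target == 0` agree on a fresh iterator over an index whose only document number is 0.

```java
// Hypothetical iterator over an index holding exactly one document (number 0).
class SingleDocIterator {
    private boolean hasNext = true; // document 0 has not been returned yet
    private int doc = -1;           // -1 means "before the first document"

    // Moves to the next document; there is only one.
    boolean next() {
        if (!hasNext) return false;
        hasNext = false;
        doc = 0;
        return true;
    }

    int doc() {
        return doc;
    }

    // Advances at least once, then keeps going until doc() >= target.
    boolean skipTo(int target) {
        do {
            if (!next()) return false;
        } while (target > doc());
        return true;
    }
}
```

On a fresh iterator, skipTo(0) succeeds and any larger target fails, which is the same answer `target == 0` gives for non-negative targets. Either way the work is constant, since there is at most one document to step over.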
On Monday 02 May 2005 23:38, Wolfgang Hoschek wrote:
> > Yes, the svn trunk uses skipTo more often than 1.4.3.
> >
> > However, your implementation of skipTo() needs some improvement.
> > See the javadoc of skipTo of class Scorer:
> >
> > http://lucene.apache.org/java/docs/api/org/apache/lucene/sea
Yes, the svn trunk uses skipTo more often than 1.4.3.
However, your implementation of skipTo() needs some improvement.
See the javadoc of skipTo of class Scorer:
http://lucene.apache.org/java/docs/api/org/apache/lucene/search/
Scorer.html#skipTo(int)
What's wrong with the version I sent? Remember t
Wolfgang,
On Monday 02 May 2005 23:21, Wolfgang Hoschek wrote:
> Finally found and fixed the bug!
> The fix is simply to replace MemoryIndex.MemoryIndexReader skipTo()
> with the following:
>
> public boolean skipTo(int target) {
>
Finally found and fixed the bug!
The fix is simply to replace MemoryIndex.MemoryIndexReader skipTo()
with the following:
public boolean skipTo(int target) {
    if (DEBUG) System.err.println(".skipTo: " + target);
This is what I have as the scoring calculation, and it seems to do exactly
what lucene-1.4.3 does because the tests pass.
public byte[] norms(String fieldName) {
    if (DEBUG) System.err.println("MemoryIndexReader.norms: " + fieldName);
    Info info = getInfo(fieldName);
    int numTokens = info
I'm looking at it right now. The tests pass fine when you put
lucene-1.4.3.jar instead of the current lucene onto the classpath which
is what I've been doing so far. Something seems to have changed in the
scoring calculation. No idea what that might be. I'll see if I can find
out.
Wolfgang
On May 1, 2005, at 10:20 PM, Wolfgang Hoschek wrote:
I've uploaded code that now runs against the current SVN, plus JUnit
test cases, plus some minor internal updates to the functionality
itself. For details see
http://issues.apache.org/bugzilla/show_bug.cgi?id=34585
Be prepared for the test
I've uploaded code that now runs against the current SVN, plus JUnit
test cases, plus some minor internal updates to the functionality
itself. For details see
http://issues.apache.org/bugzilla/show_bug.cgi?id=34585
Be prepared for the test cases to take some minutes to complete - don't
hit CTRL
OK. I'll send an update as soon as I get round to it...
Wolfgang.
On Apr 27, 2005, at 12:22 PM, Doug Cutting wrote:
Erik Hatcher wrote:
I'm not quite sure where to put MemoryIndex - maybe it deserves to
stand on its own in a new contrib area?
That sounds good to me.
Ok... once Wolfgang gives me
On Apr 27, 2005, at 12:22 PM, Doug Cutting wrote:
Erik Hatcher wrote:
I'm not quite sure where to put MemoryIndex - maybe it deserves to
stand on its own in a new contrib area?
That sounds good to me.
Ok... once Wolfgang gives me one last round of updates (JUnit tests
instead of main() and upgr
Whichever place you settle on is fine with me.
[In case it might make a difference: Just note that MemoryIndex has a
small auxiliary dependency on PatternAnalyzer in addField() because the
Analyzer superclass doesn't have a tokenStream(String fieldName, String
text) method. And PatternAnalyzer r
Erik Hatcher wrote:
I'm not quite sure
where to put MemoryIndex - maybe it deserves to stand on its own in a
new contrib area?
That sounds good to me.
Or does it make sense to put this into misc (still
in sandbox/misc)? Or where?
Isn't the goal for sandbox/ to go away, replaced with contrib/
Wolfgang,
You have provided a superb set of patches! I'm in awe of the extensive
documentation you've done.
There is nothing further you need to do, but be patient while we
incorporate it into the contrib area somewhere. Your PatternAnalyzer
could fit into the contrib/analyzers area nicely
I've uploaded slightly improved versions of the fast MemoryIndex
contribution to http://issues.apache.org/bugzilla/show_bug.cgi?id=34585
along with another contrib - PatternAnalyzer.
For a quick overview without downloading code, there's javadoc for it
all at
http://dsd.lbl.gov/nux/api/o
I've now got the contrib code cleaned up, tested and documented into a
decent state, ready for your review and comments.
Consider this a formal contrib (Apache license is attached).
The relevant files are attached to the following bug ID:
http://issues.apache.org/bugzilla/show_bug.cgi?id
On Apr 20, 2005, at 9:22 AM, Erik Hatcher wrote:
On Apr 20, 2005, at 12:11 PM, Wolfgang Hoschek wrote:
By the way, by now I have a version against 1.4.3 that is 10-100
times faster (i.e. 3 - 20 index+query steps/sec) than the
simplistic RAMDirectory approach, depending on the nature of th
On Apr 20, 2005, at 12:11 PM, Wolfgang Hoschek wrote:
By the way, by now I have a version against 1.4.3 that is 10-100 times
faster (i.e. 3 - 20 index+query steps/sec) than the simplistic
RAMDirectory approach, depending on the nature of the input data and
query. From some preliminary te
eveloper to debug and find the reason in
the first place!)
Luc
-----Original Message-----
From: Erik Hatcher [mailto:[EMAIL PROTECTED]
Sent: Saturday, April 16, 2005 2:09 AM
To: java-dev@lucene.apache.org
Subject: Re: [Performance] Streaming main memory indexing of single
strings
On Apr 15, 2005,
or the developer to debug and find the reason in
the first place!)
Luc
-----Original Message-----
From: Erik Hatcher [mailto:[EMAIL PROTECTED]
Sent: Saturday, April 16, 2005 2:09 AM
To: java-dev@lucene.apache.org
Subject: Re: [Performance] Streaming main memory indexing of single
strings
On
On Apr 16, 2005, at 1:17 PM, Wolfgang Hoschek wrote:
Note that "fish*~" is not a valid query expression :)
Perhaps the Lucene QueryParser should throw an exception then.
Currently 1.4.3 accepts the expression as is without grumbling...
Several minor QueryParser weirdnesses like this have turned u
On Apr 16, 2005, at 1:17 PM, Wolfgang Hoschek wrote:
Note that "fish*~" is not a valid query expression :)
Perhaps the Lucene QueryParser should throw an exception then.
Currently 1.4.3 accepts the expression as is without grumbling...
Several minor QueryParser weirdnesses like this have turned up
On Apr 16, 2005, at 2:58 AM, Erik Hatcher wrote:
On Apr 15, 2005, at 9:50 PM, Wolfgang Hoschek wrote:
So, all the text analyzed is in a given field... that means that
anything in the Query not associated with that field has no bearing
on whether the text matches or not, correct?
Right, it has no
On Apr 15, 2005, at 9:50 PM, Wolfgang Hoschek wrote:
So, all the text analyzed is in a given field... that means that
anything in the Query not associated with that field has no bearing
on whether the text matches or not, correct?
Right, it has no bearing. A query wouldn't specify any fields, it
cument arrives matching the saved queries?
Erik
-----Original Message-----
From: Wolfgang Hoschek [mailto:[EMAIL PROTECTED]
Sent: Thursday, April 14, 2005 4:04 PM
To: java-dev@lucene.apache.org
Subject: Re: [Performance] Streaming main memory indexing of
single
strings
This seems to
system and need to be alerted whenever a
new document arrives matching the saved queries?
Erik
-----Original Message-----
From: Wolfgang Hoschek [mailto:[EMAIL PROTECTED]
Sent: Thursday, April 14, 2005 4:04 PM
To: java-dev@lucene.apache.org
Subject: Re: [Performance] Streaming main
tuations where users
have queries saved in a system and need to be alerted whenever a new
document arrives matching the saved queries?
Erik
-----Original Message-----
From: Wolfgang Hoschek [mailto:[EMAIL PROTECTED]
Sent: Thursday, April 14, 2005 4:04 PM
To: java-dev@lucene.apache.org
Subje
s?
Erik
-----Original Message-----
From: Wolfgang Hoschek [mailto:[EMAIL PROTECTED]
Sent: Thursday, April 14, 2005 4:04 PM
To: java-dev@lucene.apache.org
Subject: Re: [Performance] Streaming main memory indexing of single
strings
This seems to be a promising avenue worth exploring. My
nd the core would be moved into AbstractIndexReader so
projects like this would be much easier).
Robert
-----Original Message-----
From: Wolfgang Hoschek [mailto:[EMAIL PROTECTED]
Sent: Friday, April 15, 2005 5:58 PM
To: java-dev@lucene.apache.org
Subject: Re: [Performance] Streaming main memory indexi
On Apr 15, 2005, at 4:15 PM, Doug Cutting wrote:
Wolfgang Hoschek wrote:
The classic fuzzy fulltext search and similarity matching that Lucene
is good for :-)
So you need a score that can be compared to other matches? This will
be based on nothing but term frequency, which a regex can compute.
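To make the remark about regular expressions concrete, here is a small sketch that counts whole-word occurrences of a term using `java.util.regex`. The class and method names are hypothetical, and the word-boundary matching rule is an assumption chosen for illustration; it does no stemming, fuzzy matching, or analysis of the kind Lucene offers.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: raw term frequency of one term in one string.
class RegexTermFrequency {
    // Counts case-insensitive whole-word occurrences of term in text.
    static int termFrequency(String text, String term) {
        // Pattern.quote keeps any regex metacharacters in the term literal.
        Pattern p = Pattern.compile("\\b" + Pattern.quote(term) + "\\b",
                Pattern.CASE_INSENSITIVE);
        Matcher m = p.matcher(text);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }
}
```

For example, `RegexTermFrequency.termFrequency("Fish and more fish, but not fishing", "fish")` counts "Fish" and "fish" but not "fishing", giving 2.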
@lucene.apache.org
Subject: Re: [Performance] Streaming main memory indexing of single
strings
Wolfgang Hoschek wrote:
> The classic fuzzy fulltext search and similarity matching that Lucene is
> good for :-)
So you need a score that can be compared to other matches? This will be
based on nothi
Wolfgang Hoschek wrote:
The classic fuzzy fulltext search and similarity matching that Lucene is
good for :-)
So you need a score that can be compared to other matches? This will be
based on nothing but term frequency, which a regex can compute. With a
single document there'll be no IDFs, so y
xReader so
projects like this would be much easier).
Robert
-----Original Message-----
From: Wolfgang Hoschek [mailto:[EMAIL PROTECTED]
Sent: Friday, April 15, 2005 5:58 PM
To: java-dev@lucene.apache.org
Subject: Re: [Performance] Streaming main memory indexing of single
strings
A primary reason for the
On Apr 15, 2005, at 4:00 PM, Doug Cutting wrote:
Erik Hatcher wrote:
I think something like this would make a handy addition to our
contrib area at least.
Perhaps.
What use cases cannot be met by regular expression matching?
Doug
The classic fuzzy fulltext search and similarity matching that Lucen
Erik Hatcher wrote:
I think something like this would make a handy addition to our contrib
area at least.
Perhaps.
What use cases cannot be met by regular expression matching?
Doug
Erik
-----Original Message-----
From: Wolfgang Hoschek [mailto:[EMAIL PROTECTED]
Sent: Thursday, April 14, 2005 4:04 PM
To: java-dev@lucene.apache.org
Subject: Re: [Performance] Streaming main memory indexing of single
strings
This seems to be a promising avenue worth exploring. My gut feeling is
th
Original Message-
From: Wolfgang Hoschek [mailto:[EMAIL PROTECTED]
Sent: Thursday, April 14, 2005 4:04 PM
To: java-dev@lucene.apache.org
Subject: Re: [Performance] Streaming main memory indexing of single
strings
This seems to be a promising avenue worth exploring. My gut feeling is
that thi
document arrives matching the saved queries?
Erik
-----Original Message-----
From: Wolfgang Hoschek [mailto:[EMAIL PROTECTED]
Sent: Thursday, April 14, 2005 4:04 PM
To: java-dev@lucene.apache.org
Subject: Re: [Performance] Streaming main memory indexing of single
strings
This seems to be a pr
ement termDocs() and termPositions() to use the structures from
> above.
>
> run searches.
>
> start again with next document.
>
>
>
> -----Original Message-----
> From: Wolfgang Hoschek [mailto:[EMAIL PROTECTED]
> Sent: Thursday, April 14, 2005 2:56 PM
> To: java
earches.
start again with next document.
-----Original Message-----
From: Wolfgang Hoschek [mailto:[EMAIL PROTECTED]
Sent: Thursday, April 14, 2005 2:56 PM
To: java-dev@lucene.apache.org
Subject: Re: [Performance] Streaming main memory indexing of single
strings
Otis, this might be a misunderstandi
to use the structures from above.
run searches.
start again with next document.
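The per-document cycle outlined above (build in-memory structures, run searches, start again with the next document) might look roughly like the sketch below. It is an assumption-laden illustration, not the proposed reader: `OneDocTermIndex`, its tokenization rule (lowercase, split on anything that is not a letter or digit), and `countMatches` are all hypothetical, and no Lucene classes are involved.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical throwaway index over a single string: term -> token positions.
class OneDocTermIndex {
    private final Map<String, List<Integer>> positions = new HashMap<>();

    OneDocTermIndex(String text) {
        // Assumed tokenization: lowercase, split on non-letter/digit runs.
        String[] tokens = text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+");
        int pos = 0;
        for (String token : tokens) {
            if (token.isEmpty()) continue; // split can yield a leading empty token
            positions.computeIfAbsent(token, k -> new ArrayList<>()).add(pos++);
        }
    }

    // Token positions of a term, or an empty list if the term is absent.
    List<Integer> positions(String term) {
        List<Integer> p = positions.get(term);
        return p == null ? Collections.<Integer>emptyList() : p;
    }

    // Number of occurrences of a term in this one document.
    int freq(String term) {
        return positions(term).size();
    }

    // Streaming loop: index one string, query it, discard it, move on.
    static int countMatches(Iterable<String> stream, String term) {
        int matches = 0;
        for (String text : stream) {
            OneDocTermIndex index = new OneDocTermIndex(text);
            if (index.freq(term) > 0) matches++;
        }
        return matches;
    }
}
```

Each index lives only for the duration of one loop iteration, so nothing is written to a directory, merged, or optimized between documents.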
-----Original Message-----
From: Wolfgang Hoschek [mailto:[EMAIL PROTECTED]
Sent: Thursday, April 14, 2005 2:56 PM
To: java-dev@lucene.apache.org
Subject: Re: [Performance] Streaming main memory indexing of single
stri
Otis, this might be a misunderstanding.
- I'm not calling optimize(). That piece is commented out if you look
again at the code.
- The *streaming* use case requires that for each query I add one (and
only one) document (aka string) to an empty index:
repeat N times (where N is millions or billio
It looks like you are calling that IndexWriter code in some loops,
opening it and closing it in every iteration of the loop and also
calling optimize. All of those things could be improved.
Keep your IndexWriter open, don't close it, and optimize the index only
once you are done adding documents t
Hi,
I'm wondering if anyone could let me know how to improve Lucene
performance for "streaming main memory indexing of single strings".
This would help to effectively integrate Lucene with the Nux XQuery
engine.
Below is a small microbenchmark simulating STREAMING XQuery fulltext
search as typ
49 matches