[jira] Commented: (LUCENE-129) Finalizers are non-canonical

2005-11-16 Thread Sam Hough (JIRA)
[ http://issues.apache.org/jira/browse/LUCENE-129?page=comments#action_12357779 ] Sam Hough commented on LUCENE-129: -- I think FSDirectory needs a finalize method adding to remove its reference from FSDirectory.DIRECTORIES otherwise, through normal garbage c

[jira] Commented: (LUCENE-129) Finalizers are non-canonical

2005-11-16 Thread Sam Hough (JIRA)
[ http://issues.apache.org/jira/browse/LUCENE-129?page=comments#action_12357780 ] Sam Hough commented on LUCENE-129: -- Doh. Sorry. Been a long day. Finalize won't be called if DIRECTORIES still points at it :( Think twice, post once. Does this mean that cli
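
The point about finalizers and a static registry can be illustrated with a minimal sketch. This is not Lucene code: `RegistryLeakDemo`, `REGISTRY`, `register` and `stillReachable` are hypothetical names, and the map only stands in for a static cache such as FSDirectory.DIRECTORIES. While the static map holds a strong reference, the object stays reachable, so the collector never clears a weak reference to it and never runs its finalizer.

```java
import java.lang.ref.WeakReference;
import java.util.HashMap;
import java.util.Map;

final class RegistryLeakDemo {
    // Hypothetical stand-in for a static cache of open directories.
    static final Map<String, Object> REGISTRY = new HashMap<>();

    // Creates an object, stores it in the static registry, and returns only
    // a weak reference, so the registry is the sole strong holder.
    static WeakReference<Object> register(String key) {
        Object dir = new Object();
        REGISTRY.put(key, dir);
        return new WeakReference<>(dir);
    }

    // Requests a collection (only a hint to the JVM) and reports whether
    // the referent is still reachable.
    static boolean stillReachable(WeakReference<Object> ref) {
        System.gc();
        return ref.get() != null;
    }

    public static void main(String[] args) {
        WeakReference<Object> ref = register("index-a");
        System.out.println("reachable while registered: " + stillReachable(ref));

        // Once the entry is removed the object becomes eligible for collection;
        // whether a given System.gc() call actually reclaims it is JVM-dependent.
        REGISTRY.remove("index-a");
        System.out.println("reachable after removal: " + stillReachable(ref));
    }
}
```

A registry that should not pin its entries would have to hold them weakly (for example through WeakReference values) or be cleaned up explicitly on close; which of those fits FSDirectory is a design question this sketch does not settle.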

Re: svn commit: r332747 - in /lucene/java/trunk: ./ src/java/org/apache/lucene/search/regex/ src/test/org/apache/lucene/search/regex/

2005-11-16 Thread Yonik Seeley
If that's the way to go, we should do it by default so the user doesn't have to. Unless the scores between two types of queries are compatible, it's a bad idea to transparently switch between them since it will cause relevancy to unpredictably change in the future (triggered by either a query chan

Re: Lucene Index backboned by DB

2005-11-16 Thread Robert Kirchgessner
Hi, > 1) It might be OK to implement retrieving field values separately for a > document. However, I think from a simplicity point of view, it might be > better to have the application code do this drudgery. Adding this feature > could complicate the nice and simple design of Lucene without much

[jira] Resolved: (LUCENE-395) CoordConstrainedBooleanQuery + QueryParser support

2005-11-16 Thread Yonik Seeley (JIRA)
[ http://issues.apache.org/jira/browse/LUCENE-395?page=all ] Yonik Seeley resolved LUCENE-395: - Resolution: Fixed Assign To: Yonik Seeley (was: Lucene Developers) fixed BooleanQuery hashCode/equals and committed patches. > CoordConstrainedBoo

[jira] Created: (LUCENE-466) Need QueryParser support for BooleanQuery.minNrShouldMatch

2005-11-16 Thread Yonik Seeley (JIRA)
Need QueryParser support for BooleanQuery.minNrShouldMatch -- Key: LUCENE-466 URL: http://issues.apache.org/jira/browse/LUCENE-466 Project: Lucene - Java Type: Improvement Components: Search Versions: un

Re: svn commit: r332747 - in /lucene/java/trunk: ./ src/java/org/apache/lucene/search/regex/ src/test/org/apache/lucene/search/regex/

2005-11-16 Thread Chris Hostetter
: > Should we dynamically decide to switch to FieldNormQuery when : > BooleanQuery.maxClauseCount is exceeded? That way queries that : Why not leave that decision to the program using the query? : Something like this: : - catch the TooManyClauses exception, : - adapt (the offending parts of) th

[jira] Commented: (LUCENE-323) [PATCH] MultiFieldQueryParser and BooleanQuery do not provide adequate support for queries across multiple fields

2005-11-16 Thread Yonik Seeley (JIRA)
[ http://issues.apache.org/jira/browse/LUCENE-323?page=comments#action_12357806 ] Yonik Seeley commented on LUCENE-323: - Added Iterable to DisjunctionMaxQuery as a semi Java5 friendly way to iterate over the disjuncts. Added ability to add all disjunct

Re: svn commit: r332747 - in /lucene/java/trunk: ./ src/java/org/apache/lucene/search/regex/ src/test/org/apache/lucene/search/regex/

2005-11-16 Thread Doug Cutting
Yonik Seeley wrote: Totally untested, but here is a hack at what the scorer might look like when the number of terms is large. Looks plausible to me. You could instead use a byte[maxDoc] and encode/decode floats as you store and read them, to use a lot less RAM. // could also use a bitse

Re: svn commit: r332747 - in /lucene/java/trunk: ./ src/java/org/apache/lucene/search/regex/ src/test/org/apache/lucene/search/regex/

2005-11-16 Thread Yonik Seeley
On 11/16/05, Doug Cutting <[EMAIL PROTECTED]> wrote: > You could instead use a byte[maxDoc] and encode/decode floats as you > store and read them, to use a lot less RAM. Hmmm, very interesting idea. Less than one decimal digit of precision might be hard to swallow when you have to add scores toget

Float.floatToRawIntBits

2005-11-16 Thread Yonik Seeley
Float.floatToRawIntBits (in Java1.4) gives the raw float bits without normalization (like *(int*)&floatvar would in C). Since it doesn't do normalization of NaN values, it's faster (and hopefully optimized to a simple inline machine instruction by the JVM). On my Pentium4, using floatToRawIntBits
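
A small sketch of the difference between the two calls, using only java.lang.Float. The class name `RawBitsDemo`, its helper methods and the constant `ODD_NAN_BITS` are made up for illustration. For ordinary values both methods return the same bits; for NaN, floatToIntBits always returns the canonical pattern 0x7fc00000, while floatToRawIntBits skips that normalization step.

```java
final class RawBitsDemo {
    // A NaN bit pattern with a non-canonical payload (the canonical NaN is 0x7fc00000).
    static final int ODD_NAN_BITS = 0x7fc00001;

    // Collapses every NaN to the canonical pattern.
    static int canonical(float f) {
        return Float.floatToIntBits(f);
    }

    // Returns the bits as stored, without the NaN check.
    static int raw(float f) {
        return Float.floatToRawIntBits(f);
    }

    public static void main(String[] args) {
        float ordinary = 1.5f;
        System.out.println(Integer.toHexString(canonical(ordinary)));
        System.out.println(Integer.toHexString(raw(ordinary)));

        float nan = Float.intBitsToFloat(ODD_NAN_BITS);
        // canonical(nan) is always 0x7fc00000; raw(nan) normally keeps the
        // payload, although the platform is not required to preserve it.
        System.out.println(Integer.toHexString(canonical(nan)));
        System.out.println(Integer.toHexString(raw(nan)));
    }
}
```

Code that only ever encodes non-NaN values gets identical results from either method, which is why the swap is safe in that case.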

Issues while doing ant on lucene source

2005-11-16 Thread Pol, Parikshit
Hi Folks. I downloaded Lucene and tried to run ant. It initially gave me the following error: BUILD FAILED file:/home/parikpol/downloads/lucene-1.4.3/build.xml:11: Unexpected element "tstamp" I commented out the tstamp tag from build.xml, and now it gives me the following errors: compile-

Re: svn commit: r332747 - in /lucene/java/trunk: ./ src/java/org/apache/lucene/search/regex/ src/test/org/apache/lucene/search/regex/

2005-11-16 Thread Doug Cutting
Yonik Seeley wrote: Hmmm, very interesting idea. Less than one decimal digit of precision might be hard to swallow when you have to add scores together though: smallfloat(score1) + smallfloat(score2) + smallfloat(score3) Do you think that the 5/3 exponent/mantissa split is right for this, or wo
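
To make the precision concern concrete, here is a rough sketch of a one-byte float with a 5-bit exponent and a 3-bit mantissa. It is not the code from this thread or from Lucene: the class `TinyFloat`, the bias of 15, truncation instead of rounding, and the treatment of zero, negatives and overflow are all assumptions chosen for illustration.

```java
final class TinyFloat {
    static final int BIAS = 15; // assumed zero-exponent point for the 5-bit exponent

    // Encodes a float into one byte: 5 exponent bits, 3 mantissa bits, no sign.
    static byte encode(float f) {
        if (!(f > 0.0f)) return 0;                        // zero, negatives and NaN
        int bits = Float.floatToRawIntBits(f);
        int exp = ((bits >>> 23) & 0xff) - 127 + BIAS;    // re-biased exponent
        int mant = (bits >>> 20) & 0x7;                   // top 3 mantissa bits, truncated
        int code = (exp << 3) | mant;
        if (code <= 0) return 1;                          // underflow: smallest positive code
        if (exp > 31) return (byte) 0xff;                 // overflow: largest code
        return (byte) code;
    }

    // Decodes a byte produced by encode() back into a float.
    static float decode(byte b) {
        int code = b & 0xff;
        if (code == 0) return 0.0f;
        int exp = (code >>> 3) - BIAS + 127;
        int mant = code & 0x7;
        return Float.intBitsToFloat((exp << 23) | (mant << 20));
    }

    public static void main(String[] args) {
        float score = 0.3f;
        float stored = decode(encode(score));
        System.out.println("exact sum of three scores:   " + (score + score + score));
        System.out.println("sum after byte round trips:  " + (stored + stored + stored));
    }
}
```

With three mantissa bits each stored value can be low by up to 12.5% under truncation, and because truncation always errs in the same direction the error does not cancel when scores are added. A different exponent/mantissa split trades range for precision within the same eight bits.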

Re: svn commit: r332747 - in /lucene/java/trunk: ./ src/java/org/apache/lucene/search/regex/ src/test/org/apache/lucene/search/regex/

2005-11-16 Thread Paul Elschot
On Tuesday 15 November 2005 23:45, Yonik Seeley wrote: > Totally untested, but here is a hack at what the scorer might look > like when the number of terms is large. > > -Yonik > > > package org.apache.lucene.search; > > import org.apache.lucene.index.TermEnum; > import org.apache.lucene.index.

Re: Float.floatToRawIntBits

2005-11-16 Thread Paul Smith
I can confirm this takes ~ 20% of an overall Indexing operation (see attached link from YourKit). http://people.apache.org/~psmith/luceneYourkit.jpg Mind you, the whole "signalling via IOException" in the FastCharStream is a way bigger overhead, although I agree much harder to fix. Paul

Re: Float.floatToRawIntBits

2005-11-16 Thread Yonik Seeley
Wow! A much larger gain than I expected! Thanks for the profile Paul! -Yonik On 11/16/05, Paul Smith <[EMAIL PROTECTED]> wrote: > I can confirm this takes ~ 20% of an overall Indexing operation (see > attached link from YourKit). > > http://peopl

[jira] Created: (LUCENE-467) Use Float.floatToRawIntBits over Float.floatToIntBits

2005-11-16 Thread Yonik Seeley (JIRA)
Use Float.floatToRawIntBits over Float.floatToIntBits -- Key: LUCENE-467 URL: http://issues.apache.org/jira/browse/LUCENE-467 Project: Lucene - Java Type: Improvement Components: Other Versions: 1.9

[jira] Commented: (LUCENE-467) Use Float.floatToRawIntBits over Float.floatToIntBits

2005-11-16 Thread Yonik Seeley (JIRA)
[ http://issues.apache.org/jira/browse/LUCENE-467?page=comments#action_12357827 ] Yonik Seeley commented on LUCENE-467: - Paul Smith's profiling shows encodeNorm() taking 20% of the total indexing time, with floatToIntBits registering all of th

Re: Float.floatToRawIntBits

2005-11-16 Thread Doug Cutting
In general I would not take this sort of profiler output too literally. If floatToRawIntBits is 5x faster, then you'd expect a 16% improvement from using it, but my guess is you'll see far less. Still, it's probably worth switching & measuring as it might be significant. Doug Paul Smith wro
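
The 16% figure follows from simple arithmetic: if a step takes 20% of the run time and becomes 5 times faster, it then takes 4%, so 16% of the total is saved. A tiny sketch of that calculation, assuming the profiler's 20% share is accurate (which is exactly what is being questioned); `SpeedupEstimate` and `timeSaved` are made-up names.

```java
final class SpeedupEstimate {
    // Fraction of total run time saved when a step that takes `fraction`
    // of the time becomes `factor` times faster.
    static double timeSaved(double fraction, double factor) {
        return fraction * (1.0 - 1.0 / factor);
    }

    public static void main(String[] args) {
        // 20% share, 5x faster
        System.out.printf("5x faster: %.1f%% saved%n", 100 * timeSaved(0.20, 5.0));
        // 20% share, 3x faster (the -server figure reported on LUCENE-467)
        System.out.printf("3x faster: %.1f%% saved%n", 100 * timeSaved(0.20, 3.0));
    }
}
```

The estimate is an upper bound on what the change can deliver: if the sampled 20% share is inflated, the real saving shrinks in proportion.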

Re: Float.floatToRawIntBits

2005-11-16 Thread Paul Smith
On 17/11/2005, at 9:24 AM, Doug Cutting wrote: In general I would not take this sort of profiler output too literally. If floatToRawIntBits is 5x faster, then you'd expect a 16% improvement from using it, but my guess is you'll see far less. Still, it's probably worth switching & measuri

Re: Float.floatToRawIntBits

2005-11-16 Thread Chris Lamprecht
1. Run profiler 2. Sort methods by CPU time spent 3. Optimize 4. Repeat :) On 11/16/05, Paul Smith <[EMAIL PROTECTED]> wrote: > > On 17/11/2005, at 9:24 AM, Doug Cutting wrote: > > > In general I would not take this sort of profiler output too > > literally. If floatToRawIntBits is 5x faster, th

Re: Float.floatToRawIntBits

2005-11-16 Thread Paul Smith
On 17/11/2005, at 10:21 AM, Chris Lamprecht wrote: 1. Run profiler 2. Sort methods by CPU time spent 3. Optimize 4. Repeat :) Umm, well I know I could make it quicker, it's just whether it still _works_ as expected Maintaining the contract means I'll need to develop some good junit

[jira] Commented: (LUCENE-467) Use Float.floatToRawIntBits over Float.floatToIntBits

2005-11-16 Thread Yonik Seeley (JIRA)
[ http://issues.apache.org/jira/browse/LUCENE-467?page=comments#action_12357838 ] Yonik Seeley commented on LUCENE-467: - With -server mode, it's only 3 times as fast, and both are really fairly fast. I do wonder if the profiler had its numbers right, or

[jira] Commented: (LUCENE-467) Use Float.floatToRawIntBits over Float.floatToIntBits

2005-11-16 Thread Paul Smith (JIRA)
[ http://issues.apache.org/jira/browse/LUCENE-467?page=comments#action_12357839 ] Paul Smith commented on LUCENE-467: --- I probably didn't make my testing framework as clear as I should. YourKit was set up to use method sampling (waking up every X milliseco

[jira] Commented: (LUCENE-467) Use Float.floatToRawIntBits over Float.floatToIntBits

2005-11-16 Thread Yonik Seeley (JIRA)
[ http://issues.apache.org/jira/browse/LUCENE-467?page=comments#action_12357851 ] Yonik Seeley commented on LUCENE-467: - Fun with premature optimization! I know this isn't a bottleneck, but here is the fastest floatToByte() that I could come up with: