Hudson Upgrade Dec 19

2007-12-18 Thread Nigel Daley
I'd like to upgrade Hudson (http://lucene.zones.apache.org:8080/hudson/) from 1.136 to 1.161 tomorrow (Dec 19). I'll also be upgrading some existing plugins and installing a new plugin: http://hudson.gotdns.com/wiki/display/HUDSON/SCP+plugin The changelog for Hudson is at https://hudson.

[jira] Commented: (LUCENE-1093) SpanFirstQuery modification to aid term boosting based on position.

2007-12-18 Thread Hoss Man (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12553232 ] Hoss Man commented on LUCENE-1093: -- NOTE: this should be an option, not a change to the default behavior (right now

[jira] Commented: (LUCENE-150) [PATCH] DBDirectory implementation

2007-12-18 Thread Grant Ingersoll (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12553222 ] Grant Ingersoll commented on LUCENE-150: Is this still useful in light of the Oracle and Berkeley implementat

[jira] Closed: (LUCENE-99) ability to retrieve the number of occurrences for a phrase

2007-12-18 Thread Grant Ingersoll (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-99?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Grant Ingersoll closed LUCENE-99. - Resolution: Won't Fix Assignee: (was: Lucene Developers) SpanNearQuery should handle thi

[jira] Closed: (LUCENE-50) Directory implementation that uses ZIP

2007-12-18 Thread Grant Ingersoll (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-50?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Grant Ingersoll closed LUCENE-50. - Resolution: Won't Fix Assignee: (was: Lucene Developers) The CompoundFile format pretty

[jira] Issue Comment Edited: (LUCENE-25) QueryParser produces empty BooleanQueries when all clauses are stop words

2007-12-18 Thread Grant Ingersoll (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-25?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12553215 ] gsingers edited comment on LUCENE-25 at 12/18/07 7:27 PM: - Attached patch adds a test

[jira] Updated: (LUCENE-25) QueryParser produces empty BooleanQueries when all clauses are stop words

2007-12-18 Thread Grant Ingersoll (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-25?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Grant Ingersoll updated LUCENE-25: -- Attachment: LUCENE-25.patch Attached patch adds a test to QP for this. It now seems like QP is

[jira] Commented: (LUCENE-666) TERM1 OR NOT TERM2 does not perform as expected

2007-12-18 Thread Dejan Nenov (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12553213 ] Dejan Nenov commented on LUCENE-666: For Dejan Nenov - please use this new email address: d [at] panaton [dot] c

[jira] Updated: (LUCENE-666) TERM1 OR NOT TERM2 does not perform as expected

2007-12-18 Thread Grant Ingersoll (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-666?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Grant Ingersoll updated LUCENE-666: --- Comment: was deleted > TERM1 OR NOT TERM2 does not perform as expected >

[jira] Updated: (LUCENE-666) TERM1 OR NOT TERM2 does not perform as expected

2007-12-18 Thread Grant Ingersoll (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-666?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Grant Ingersoll updated LUCENE-666: --- Comment: was deleted > TERM1 OR NOT TERM2 does not perform as expected >

[jira] Updated: (LUCENE-1068) Invalid behavior of StandardTokenizerImpl

2007-12-18 Thread Grant Ingersoll (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Grant Ingersoll updated LUCENE-1068: Attachment: LUCENE-1068.patch Applied patch. Updated some documentation. Changed it to u

Re: O/S Search Comparisons

2007-12-18 Thread Grant Ingersoll
My testing experience has shown around 100 to be good for things like Wikipedia, etc. That is an interesting point to think about in regards to paying the cost once optimize is undertaken and may be worth exploring more. I also wonder how partial optimizes may help. The Javadocs say: Det

[jira] Issue Comment Edited: (LUCENE-1095) StopFilter should have option to incr positionIncrement after stop word

2007-12-18 Thread Steven Rowe (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1095?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12552916 ] steve_rowe edited comment on LUCENE-1095 at 12/18/07 2:29 PM: --- {quote} Is there a good

[jira] Commented: (LUCENE-1095) StopFilter should have option to incr positionIncrement after stop word

2007-12-18 Thread Steven Rowe (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1095?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12552916 ] Steven Rowe commented on LUCENE-1095: - {quote} Is there a good reason *not* to change QP to *always* pass the po

[jira] Commented: (LUCENE-1095) StopFilter should have option to incr positionIncrement after stop word

2007-12-18 Thread Doron Cohen (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1095?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12552911 ] Doron Cohen commented on LUCENE-1095: - I checked further. Query parser currently ignores the position incremen

[jira] Commented: (LUCENE-1095) StopFilter should have option to incr positionIncrement after stop word

2007-12-18 Thread Hoss Man (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1095?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12552906 ] Hoss Man commented on LUCENE-1095: -- I knew there must have been a good reason why it hadn't been done before, but

[jira] Commented: (LUCENE-1095) StopFilter should have option to incr positionIncrement after stop word

2007-12-18 Thread Erik Hatcher (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1095?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12552905 ] Erik Hatcher commented on LUCENE-1095: -- I believe QueryParser has been fixed since that first change I made men

[jira] Commented: (LUCENE-1095) StopFilter should have option to incr positionIncrement after stop word

2007-12-18 Thread Doron Cohen (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1095?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12552903 ] Doron Cohen commented on LUCENE-1095: - The problem Doug refers to is the effect of this on a Phrase Query. I bel

[jira] Commented: (LUCENE-1095) StopFilter should have option to incr positionIncrement after stop word

2007-12-18 Thread Steven Rowe (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1095?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12552901 ] Steven Rowe commented on LUCENE-1095: - From >

Re: Background Merges

2007-12-18 Thread Grant Ingersoll
On Dec 18, 2007, at 2:22 PM, Michael McCandless wrote: Grant Ingersoll wrote: The field that is causing the problem in the stack trace is neither binary nor compressed, nor is it even stored. This would also be possible with the one bug I found on hitting an exception in DocumentsWriter

[jira] Updated: (LUCENE-1094) Exception in DocumentsWriter.addDocument can corrupt stored fields file (fdt)

2007-12-18 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1094?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless updated LUCENE-1094: --- Attachment: LUCENE-1094.patch Attached patch. I plan to commit in a day or two. I

[jira] Created: (LUCENE-1095) StopFilter should have option to incr positionIncrement after stop word

2007-12-18 Thread Hoss Man (JIRA)
StopFilter should have option to incr positionIncrement after stop word --- Key: LUCENE-1095 URL: https://issues.apache.org/jira/browse/LUCENE-1095 Project: Lucene - Java Is

[jira] Created: (LUCENE-1094) Exception in DocumentsWriter.addDocument can corrupt stored fields file (fdt)

2007-12-18 Thread Michael McCandless (JIRA)
Exception in DocumentsWriter.addDocument can corrupt stored fields file (fdt) - Key: LUCENE-1094 URL: https://issues.apache.org/jira/browse/LUCENE-1094 Project: Lucene - Java

[jira] Created: (LUCENE-1093) SpanFirstQuery modification to aid term boosting based on position.

2007-12-18 Thread Peter Keegan (JIRA)
SpanFirstQuery modification to aid term boosting based on position. --- Key: LUCENE-1093 URL: https://issues.apache.org/jira/browse/LUCENE-1093 Project: Lucene - Java Issue Type

Re: Background Merges

2007-12-18 Thread Michael McCandless
Grant Ingersoll wrote: The field that is causing the problem in the stack trace is neither binary nor compressed, nor is it even stored. This would also be possible with the one bug I found on hitting an exception in DocumentsWriter.addDocument. Basically the bug can cause only a subset

Re: Background Merges

2007-12-18 Thread Grant Ingersoll
I don't think I did, but I wasn't really thinking too much about it at the time. Like I said, let's hold off on it and at least we have a record of it for now. Sorry for the noise. On Dec 18, 2007, at 1:30 PM, Yonik Seeley wrote: On Dec 18, 2007 1:09 PM, Grant Ingersoll <[EMAIL PROTECTED

Re: Background Merges

2007-12-18 Thread Yonik Seeley
On Dec 18, 2007 1:09 PM, Grant Ingersoll <[EMAIL PROTECTED]> wrote: > Based on the comment in the if condition, I am assuming the field > numbers are not identical in this clause, which would explain the fact > that the Fields info is being misinterpreted. Did you change the schema then add some m

Re: Background Merges

2007-12-18 Thread Grant Ingersoll
I think the issue is my fault, but I am not exactly sure how it happened. I deleted my index and have not been able to reproduce the problem since. However, here's what I can tell from some debugging I did before that: The field that is causing the problem in the stack trace is neither bi

Re: TeeTokenFilter performance testing

2007-12-18 Thread Karl Wettin
On 18 Dec 2007, at 13:26, Grant Ingersoll wrote: I might be missing something here, but why do you clone? Because the Token is changing and I am not saving all Tokens, just the ones changed. Aha! The first thing to note is that TeeTokenFilter (TTF) is much _slower_ in the case that all toke

[jira] Resolved: (LUCENE-1091) Big IndexWriter memory leak: when Field.Index.TOKENIZED

2007-12-18 Thread Doron Cohen (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1091?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Doron Cohen resolved LUCENE-1091. - Resolution: Invalid Not a Lucene bug... > Big IndexWriter memory leak: when Field.Index.TOKENIZ

[jira] Reopened: (LUCENE-1091) Big IndexWriter memory leak: when Field.Index.TOKENIZED

2007-12-18 Thread Doron Cohen (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1091?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Doron Cohen reopened LUCENE-1091: - Reopening just to close with "Invalid" - "Won't fix" suggests a known issue that we are not going t

[jira] Resolved: (LUCENE-1091) Big IndexWriter memory leak: when Field.Index.TOKENIZED

2007-12-18 Thread Mirza Hadzic (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1091?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mirza Hadzic resolved LUCENE-1091. -- Resolution: Won't Fix This is (probably) a bug in the JVM running in debug mode; Lucene only expose

[jira] Commented: (LUCENE-1091) Big IndexWriter memory leak: when Field.Index.TOKENIZED

2007-12-18 Thread Doron Cohen (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12552786 ] Doron Cohen commented on LUCENE-1091: - I tried on XP with Java 1.6: {noformat} > java -version java version

[jira] Commented: (LUCENE-1091) Big IndexWriter memory leak: when Field.Index.TOKENIZED

2007-12-18 Thread Mirza Hadzic (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12552784 ] Mirza Hadzic commented on LUCENE-1091: -- I found the reason: Only when running in *debug* mode (NetBeans) JVM ta

[jira] Resolved: (LUCENE-1045) SortField.AUTO doesn't work with long

2007-12-18 Thread Grant Ingersoll (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1045?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Grant Ingersoll resolved LUCENE-1045. - Resolution: Fixed Lucene Fields: [New, Patch Available] (was: [Patch Available, N

[jira] Updated: (LUCENE-1091) Big IndexWriter memory leak: when Field.Index.TOKENIZED

2007-12-18 Thread Doron Cohen (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1091?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Doron Cohen updated LUCENE-1091: Attachment: screenshot-1.jpg > Big IndexWriter memory leak: when Field.Index.TOKENIZED > -

[jira] Commented: (LUCENE-1091) Big IndexWriter memory leak: when Field.Index.TOKENIZED

2007-12-18 Thread Grant Ingersoll (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12552772 ] Grant Ingersoll commented on LUCENE-1091: - My understanding of virtual memory is this is just the OS being s

[jira] Updated: (LUCENE-1091) Big IndexWriter memory leak: when Field.Index.TOKENIZED

2007-12-18 Thread Mirza Hadzic (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1091?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mirza Hadzic updated LUCENE-1091: - Attachment: LuceneOOM.PNG > Big IndexWriter memory leak: when Field.Index.TOKENIZED > --

[jira] Commented: (LUCENE-1091) Big IndexWriter memory leak: when Field.Index.TOKENIZED

2007-12-18 Thread Mirza Hadzic (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12552764 ] Mirza Hadzic commented on LUCENE-1091: -- I am inclined to think this is a JVM 6 issue. When running TestOOM with J

[jira] Commented: (LUCENE-1091) Big IndexWriter memory leak: when Field.Index.TOKENIZED

2007-12-18 Thread Grant Ingersoll (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12552744 ] Grant Ingersoll commented on LUCENE-1091: - What are your settings for heap size? Are you actually getting a

Re: Background Merges

2007-12-18 Thread Grant Ingersoll
No, there were not any exceptions during indexing. I am still trying to work up some test cases using open documents (i.e. wikipedia) -Grant On Dec 18, 2007, at 6:09 AM, Michael McCandless wrote: Grant, Do you know whether you hit any exceptions while adding docs, before you hit those m

Re: TeeTokenFilter performance testing

2007-12-18 Thread Grant Ingersoll
On Dec 18, 2007, at 2:55 AM, Karl Wettin wrote: On 17 Dec 2007, at 05:40, Grant Ingersoll wrote: a somewhat common case whereby two or more fields share a fair number of common analysis steps. Right. For the smaller token counts, any performance difference is negligible. However, even at

[jira] Commented: (LUCENE-1091) Big IndexWriter memory leak: when Field.Index.TOKENIZED

2007-12-18 Thread Mirza Hadzic (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12552725 ] Mirza Hadzic commented on LUCENE-1091: -- yes, I am getting info from process viewer/task manager on both Linux a

[jira] Commented: (LUCENE-1091) Big IndexWriter memory leak: when Field.Index.TOKENIZED

2007-12-18 Thread Doron Cohen (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12552707 ] Doron Cohen commented on LUCENE-1091: - Hi Mirza, The log you attached indicates that Java's total memory consump

[jira] Commented: (LUCENE-1091) Big IndexWriter memory leak: when Field.Index.TOKENIZED

2007-12-18 Thread Mirza Hadzic (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12552704 ] Mirza Hadzic commented on LUCENE-1091: -- I tested Windows XP / JDK 6, but the problem is the same as on Linux. When pr

Re: Background Merges

2007-12-18 Thread Michael McCandless
Grant, Do you know whether you hit any exceptions while adding docs, before you hit those merge exceptions? I have found one case where an exception that runs back through DocumentsWriter (during addDocument()) can produce a corrupt fdt (stored field) file. I have a test case that shows

[jira] Updated: (LUCENE-1091) Big IndexWriter memory leak: when Field.Index.TOKENIZED

2007-12-18 Thread Mirza Hadzic (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1091?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mirza Hadzic updated LUCENE-1091: - Attachment: lucene.txt > Big IndexWriter memory leak: when Field.Index.TOKENIZED > -

Re: [Lucene-java Wiki] Update of "PoweredBy" by PietSchmidt

2007-12-18 Thread Chris Hostetter
: I don't claim that this is spam, but more and more of the Wiki "PoweredBy" : links look like someone just wants a link from the Lucene project, : probably to boost their Google ranking. We cannot tell whether these FYI: MoinMoin automatically adds... ...to every page, so no one is ever g

[jira] Resolved: (LUCENE-1092) KeywordTokenizer/Analyzer cannot be re-used

2007-12-18 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1092?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless resolved LUCENE-1092. Resolution: Fixed I just committed this. Thanks Hideaki! > KeywordTokenizer/Anal

[jira] Created: (LUCENE-1092) KeywordTokenizer/Analyzer cannot be re-used

2007-12-18 Thread Michael McCandless (JIRA)
KeywordTokenizer/Analyzer cannot be re-used --- Key: LUCENE-1092 URL: https://issues.apache.org/jira/browse/LUCENE-1092 Project: Lucene - Java Issue Type: Bug Affects Versions: 2.3 Repo

Re: KeywordTokenizer isn't reusable

2007-12-18 Thread Michael McCandless
Awesome, thanks! I'll commit this. Mike TAKAHASHI hideaki wrote: Hi, Here is the patch for KeywordAnalyzer, KeywordTokenizer, TestKeywordAnalyzer. Thanks, Hideaki, On Dec 17, 2007 6:49 PM, Michael McCandless <[EMAIL PROTECTED]> wrote: Yes please do! Thanks. Mike TAKAHASHI hideak