Re: TeeTokenFilter performance testing

2007-12-17 Thread Karl Wettin
On 17 Dec 2007, at 05:40, Grant Ingersoll wrote: a somewhat common case whereby two or more fields share a fair number of common analysis steps. Right. For the smaller token counts, any performance difference is negligible. However, even at 500 tokens, one starts to see a difference. The
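The thread above is about sharing one pass of analysis between fields. As a minimal self-contained sketch of the "tee" idea (this is illustrative Java, not the actual TeeTokenFilter/SinkTokenizer API): each token produced by the shared analysis chain is handed to the primary consumer and also copied into a sink that a second field can replay, so the text is analyzed only once.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical sketch of the "tee" pattern behind TeeTokenFilter: run
// the shared analysis steps once, then copy each token into a sink for
// a second field, instead of re-analyzing the text per field.
public class TeeSketch {

    // Pushes every token to the primary consumer and records a copy in
    // the returned sink list for later reuse by another field.
    static List<String> tee(List<String> tokens, Consumer<String> primary) {
        List<String> sink = new ArrayList<>();
        for (String token : tokens) {
            primary.accept(token); // main field sees the token stream
            sink.add(token);       // secondary field replays the sink
        }
        return sink;
    }

    public static void main(String[] args) {
        List<String> analyzed = List.of("shared", "analysis", "steps");
        List<String> fieldA = new ArrayList<>();
        List<String> sink = tee(analyzed, fieldA::add);
        System.out.println(fieldA.equals(sink)); // both fields saw one pass
    }
}
```

The win is proportional to the cost of the shared steps, which matches the observation above that the difference only becomes visible at larger token counts.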

[jira] Assigned: (LUCENE-588) Escaped wildcard character in wildcard term not handled correctly

2007-12-17 Thread Michael Busch (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-588?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Busch reassigned LUCENE-588: Assignee: Michael Busch > Escaped wildcard character in wildcard term not handled correctly

[jira] Updated: (LUCENE-588) Escaped wildcard character in wildcard term not handled correctly

2007-12-17 Thread Michael Busch (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-588?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Busch updated LUCENE-588: - Lucene Fields: [Patch Available] > Escaped wildcard character in wildcard term not handled correc

[jira] Updated: (LUCENE-588) Escaped wildcard character in wildcard term not handled correctly

2007-12-17 Thread Michael Busch (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-588?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Busch updated LUCENE-588: - Priority: Minor (was: Major) > Escaped wildcard character in wildcard term not handled correctly

[jira] Issue Comment Edited: (LUCENE-1091) Big IndexWriter memory leak: when Field.Index.TOKENIZED

2007-12-17 Thread Doron Cohen (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12552634 ] doronc edited comment on LUCENE-1091 at 12/17/07 10:09 PM: I was not able to recreat

[jira] Updated: (LUCENE-1091) Big IndexWriter memory leak: when Field.Index.TOKENIZED

2007-12-17 Thread Doron Cohen (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1091?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Doron Cohen updated LUCENE-1091: Attachment: TestOOM.java Attached TestOOM, not reproducing the problem on XP, JRE 1.5 > Big Index

[jira] Commented: (LUCENE-1091) Big IndexWriter memory leak: when Field.Index.TOKENIZED

2007-12-17 Thread Doron Cohen (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12552634 ] Doron Cohen commented on LUCENE-1091: - I was not able to recreate this. Can you run the attached TestOOM (it ex

Re: O/S Search Comparisons

2007-12-17 Thread Doron Cohen
On Dec 18, 2007 2:38 AM, Mark Miller <[EMAIL PROTECTED]> wrote: > For the data that I normally work with (short articles), I found that > the sweet spot was around 80-120. I actually saw a slight decrease going > above that...not sure if that held forever though. That was testing on > an earlier r

Re: KeywordTokenizer isn't reusable

2007-12-17 Thread TAKAHASHI hideaki
Hi, Here is the patch for KeywordAnalyzer, KeywordTokenizer, TestKeywordAnalyzer. Thanks, Hideaki, On Dec 17, 2007 6:49 PM, Michael McCandless <[EMAIL PROTECTED]> wrote: > > Yes please do! Thanks. > > Mike > > > TAKAHASHI hideaki wrote: > > > Hi, all > > > > I found KeywordAnalyzer/KeywordTokeni

Re: O/S Search Comparisons

2007-12-17 Thread Mark Miller
For the data that I normally work with (short articles), I found that the sweet spot was around 80-120. I actually saw a slight decrease going above that...not sure if that held forever though. That was testing on an earlier release (I think 2.1?). However, if you want to test searching it wou

Re: O/S Search Comparisons

2007-12-17 Thread Grant Ingersoll
I did hear back from the authors. Some of the issues were based on values chosen for mergeFactor (10,000) I think, but there also seemed to be some questions about parsing the TREC collection. It was split out into individual files, as opposed to trying to stream in the documents like we

[jira] Created: (LUCENE-1091) Big IndexWriter memory leak: when Field.Index.TOKENIZED

2007-12-17 Thread Mirza Hadzic (JIRA)
Big IndexWriter memory leak: when Field.Index.TOKENIZED --- Key: LUCENE-1091 URL: https://issues.apache.org/jira/browse/LUCENE-1091 Project: Lucene - Java Issue Type: Bug Componen

Re: Background Merges

2007-12-17 Thread Grant Ingersoll
I will try to work up a test case that I can share and will double check that I have all the right pieces in place. -Grant On Dec 17, 2007, at 2:50 PM, Michael McCandless wrote: Yonik Seeley wrote: On Dec 17, 2007 2:15 PM, Michael McCandless <[EMAIL PROTECTED] > wrote: Not good! It's a

Re: Background Merges

2007-12-17 Thread Michael McCandless
Yonik Seeley wrote: On Dec 17, 2007 2:15 PM, Michael McCandless <[EMAIL PROTECTED]> wrote: Not good! It's almost certainly a bug with Lucene, I think, because Solr is just a consumer of Lucene's API, which shouldn't ever cause something like this. Yeah... a solr level commit should just t

Re: Background Merges

2007-12-17 Thread Yonik Seeley
On Dec 17, 2007 2:15 PM, Michael McCandless <[EMAIL PROTECTED]> wrote: > > Not good! > > It's almost certainly a bug with Lucene, I think, because Solr is > just a consumer of Lucene's API, which shouldn't ever cause something > like this. Yeah... a solr level commit should just translate into wri

Re: [Lucene-java Wiki] Update of "PoweredBy" by PietSchmidt

2007-12-17 Thread Daniel Naber
On Monday, 17 December 2007, Apache Wiki wrote: > +  * [http://frauen-kennenlernen.com/ Frauen kennenlernen] - Search > engine using Lucene I don't claim that this is spam, but more and more of the Wiki "PoweredBy" links look like someone just wants a link from the Lucene project, probably to

Re: Background Merges

2007-12-17 Thread Michael McCandless
Not good! It's almost certainly a bug with Lucene, I think, because Solr is just a consumer of Lucene's API, which shouldn't ever cause something like this. Apparently, while merging stored fields, SegmentMerger tried to read too far. Is this easily repeatable? Mike Grant Ingersoll w

Background Merges

2007-12-17 Thread Grant Ingersoll
I am running Lucene trunk with Solr and am getting the exception below when I call Solr's optimize. I will see if I can isolate it to a test case, but thought I would throw it out there if anyone sees anything obvious. In this case, I am adding documents sequentially and then at the end

[jira] Resolved: (LUCENE-1089) Add insertWithOverflow to PriorityQueue

2007-12-17 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1089?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless resolved LUCENE-1089. Resolution: Fixed Lucene Fields: [New, Patch Available] (was: [Patch Availa
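LUCENE-1089 adds insertWithOverflow to Lucene's PriorityQueue. A hedged, self-contained sketch of the idea (class and field names here are illustrative, not Lucene's): a bounded queue that, instead of silently dropping entries, hands back the element that fell out, so the caller can reuse that object rather than allocate a new one.

```java
import java.util.PriorityQueue;

// Illustrative sketch of the insertWithOverflow idea from LUCENE-1089:
// a size-bounded min-heap that returns the overflowed element to the
// caller (either the evicted smallest entry or the rejected input).
public class BoundedQueue {
    private final int maxSize;
    private final PriorityQueue<Integer> heap = new PriorityQueue<>(); // min-heap

    public BoundedQueue(int maxSize) {
        this.maxSize = maxSize;
    }

    // Returns null if the element was simply added; otherwise returns
    // the element that overflowed, ready for reuse by the caller.
    public Integer insertWithOverflow(Integer element) {
        if (heap.size() < maxSize) {
            heap.add(element);
            return null;
        }
        if (element.compareTo(heap.peek()) > 0) {
            Integer evicted = heap.poll(); // drop the current smallest
            heap.add(element);
            return evicted;
        }
        return element; // too small to enter the full queue
    }

    public Integer top() {
        return heap.peek();
    }
}
```

Returning the displaced object is what makes the method attractive for hot loops like top-N hit collection, where per-insert allocation would otherwise dominate.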

[jira] Updated: (LUCENE-588) Escaped wildcard character in wildcard term not handled correctly

2007-12-17 Thread Terry Yang (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-588?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Terry Yang updated LUCENE-588: -- Attachment: LUCENE-588.patch > Escaped wildcard character in wildcard term not handled correctly >

[jira] Commented: (LUCENE-588) Escaped wildcard character in wildcard term not handled correctly

2007-12-17 Thread Terry Yang (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12552434 ] Terry Yang commented on LUCENE-588: --- I wrote my first patch to this issue. If QueryParser knows the query is wildc
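The check at the heart of LUCENE-588 can be sketched in a few lines. This helper is illustrative only, not the actual QueryParser code: a `*` or `?` preceded by an odd run of backslashes is escaped and should be treated as a literal character, not a wildcard.

```java
// Hedged sketch of the escaped-wildcard check discussed in LUCENE-588.
// Names are illustrative; this is not the QueryParser implementation.
public class WildcardEscape {

    // True if the character at position pos is escaped, i.e. preceded
    // by an odd number of consecutive backslashes.
    static boolean isEscaped(String term, int pos) {
        int backslashes = 0;
        for (int i = pos - 1; i >= 0 && term.charAt(i) == '\\'; i--) {
            backslashes++;
        }
        return backslashes % 2 == 1;
    }

    // True if the term contains at least one unescaped wildcard.
    static boolean hasWildcard(String term) {
        for (int i = 0; i < term.length(); i++) {
            char c = term.charAt(i);
            if ((c == '*' || c == '?') && !isEscaped(term, i)) {
                return true;
            }
        }
        return false;
    }
}
```

Counting the backslash run (rather than checking only the immediately preceding character) matters because `\\*` is an escaped backslash followed by a live wildcard.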

Re: KeywordTokenizer isn't reusable

2007-12-17 Thread Michael McCandless
Yes please do! Thanks. Mike TAKAHASHI hideaki wrote: Hi, all I found KeywordAnalyzer/KeywordTokenizer on trunk has a problem. These have a condition(tokenStreams in Analyzer and done in KeywordTokenizer), but these don't reset the condition. So KeywordAnalyzer can't analyze a field more
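The reuse bug described above can be shown with a minimal self-contained sketch (illustrative names, not the actual Lucene classes): a single-token tokenizer guarded by a `done` flag. Without a reset, the second text fed to the same instance yields nothing; clearing the flag makes the instance reusable.

```java
// Hedged sketch of the KeywordTokenizer reuse problem: the "done"
// condition is set after the single token is emitted and, without a
// reset, never cleared, so a reused instance produces no tokens.
public class OneTokenTokenizer {
    private String text;
    private boolean done = false; // set once the single token is emitted

    public OneTokenTokenizer(String text) {
        this.text = text;
    }

    // Emits the whole input as one token, then null on later calls.
    public String next() {
        if (done) {
            return null;
        }
        done = true;
        return text;
    }

    // The missing piece in the trunk code: clear the state so the same
    // instance can tokenize new input.
    public void reset(String newText) {
        this.text = newText;
        this.done = false;
    }
}
```

The attached patch presumably resets the analogous state (`tokenStreams` in the Analyzer, `done` in the tokenizer) so a field can be analyzed more than once by the same instance.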