[jira] Updated: (LUCENE-840) contrib/benchmark unit tests

2007-03-20 Thread Doron Cohen (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-840?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Doron Cohen updated LUCENE-840: Attachment: 840-benchmark-tests.patch Attached 840-benchmark-tests.patch adds unit tests. No bugs ex

[jira] Created: (LUCENE-840) contrib/benchmark unit tests

2007-03-20 Thread Doron Cohen (JIRA)
contrib/benchmark unit tests Key: LUCENE-840 URL: https://issues.apache.org/jira/browse/LUCENE-840 Project: Lucene - Java Issue Type: Test Components: contrib/benchmark Reporter: Doron Cohen

Re: [jira] Updated: (LUCENE-725) NovelAnalyzer - wraps your choice of Lucene Analyzer and filters out all "boilerplate" text

2007-03-20 Thread jian chen
Also, how about this scenario. 1) The Analyzer processes 100 documents, each with a copyright notice inside. I guess in this case, the copyright notices will be removed when indexing. 2) The Analyzer processes another 50 documents, each without any copyright notice inside. 3) Then, the Analyzer runs int

Re: [jira] Updated: (LUCENE-725) NovelAnalyzer - wraps your choice of Lucene Analyzer and filters out all "boilerplate" text

2007-03-20 Thread jian chen
Hi, Mark, Your program is very helpful. I am trying to understand your code, but it seems it would take longer to do that than simply asking you some questions. 1) What is the sliding window used for? Is it that the Analyzer remembers the previously seen N tokens, and N is the window size? 2) As th
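
The sliding window as jian chen reads it (the Analyzer remembering the previously seen N tokens, with N as the window size) can be illustrated with a fixed-size token buffer. This is a minimal sketch of that interpretation only, not Mark Harwood's NovelAnalyzer code; the class name `TokenWindow` is hypothetical.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical illustration: remembers only the most recent N tokens seen.
final class TokenWindow {
    private final int size;                          // N, the window size
    private final Deque<String> window = new ArrayDeque<>();

    TokenWindow(int size) {
        if (size <= 0) {
            throw new IllegalArgumentException("size must be positive");
        }
        this.size = size;
    }

    // Adds a token; once the window is full, the oldest token is evicted first.
    void add(String token) {
        if (window.size() == size) {
            window.removeFirst();
        }
        window.addLast(token);
    }

    // Snapshot of the current window, oldest token first.
    List<String> contents() {
        return new ArrayList<>(window);
    }
}
```

Because every new token evicts the oldest once the buffer is full, memory stays bounded at N tokens no matter how many documents pass through.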

[jira] Updated: (LUCENE-725) NovelAnalyzer - wraps your choice of Lucene Analyzer and filters out all "boilerplate" text

2007-03-20 Thread Mark Harwood (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-725?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mark Harwood updated LUCENE-725: Attachment: NovelAnalyzer.java Updated version can now process any number of documents and remove

[jira] Resolved: (LUCENE-839) WildcardQuery do not find documents if leading and trailing * is used

2007-03-20 Thread Doron Cohen (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-839?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Doron Cohen resolved LUCENE-839. Resolution: Cannot Reproduce Working correctly in trunk. Checked by modifying TestWildcard.testPar

Re: [jira] Updated: (LUCENE-837) contrib/benchmark QueryMaker and Task Refactorings

2007-03-20 Thread Doron Cohen
Grant Ingersoll <[EMAIL PROTECTED]> wrote on 20/03/2007 05:10:47: > Thanks, Doron. If you're good w/ all the changes, I will commit tonight. Yes please. > We might want to start thinking about Unit Tests... :-) Seems kind of weird to have tests for tests, but this is becoming sufficiently

[jira] Assigned: (LUCENE-839) WildcardQuery do not find documents if leading and trailing * is used

2007-03-20 Thread Doron Cohen (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-839?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Doron Cohen reassigned LUCENE-839: Assignee: Doron Cohen > WildcardQuery do not find documents if leading and trailing * is used

[jira] Commented: (LUCENE-839) WildcardQuery do not find documents if leading and trailing * is used

2007-03-20 Thread Doron Cohen (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-839?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12482471 ] Doron Cohen commented on LUCENE-839: I checked - all these queries do work correctly in Lucene trunk. There were
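
The queries discussed in LUCENE-838 and LUCENE-839 use leading and trailing `*`. As a rough illustration of the per-term semantics a pattern such as `*ene*` is expected to have, here is a minimal wildcard matcher in which `*` matches any run of characters (including none) and `?` matches exactly one. It is a hypothetical helper, not Lucene's own matching code, and it compares characters exactly, so the pattern's case must agree with the terms as they were indexed.

```java
// Hypothetical helper illustrating wildcard semantics on a single term:
// '*' matches any run of characters (including none), '?' matches exactly one.
final class WildcardMatch {
    private WildcardMatch() {}

    static boolean matches(String pattern, String term) {
        int p = 0;       // position in pattern
        int t = 0;       // position in term
        int starP = -1;  // index of the most recent '*' in the pattern
        int starT = -1;  // term index where that '*' started absorbing characters
        while (t < term.length()) {
            if (p < pattern.length() && pattern.charAt(p) == '*') {
                // Remember the '*' and first try letting it match nothing.
                starP = p++;
                starT = t;
            } else if (p < pattern.length()
                    && (pattern.charAt(p) == '?' || pattern.charAt(p) == term.charAt(t))) {
                p++;
                t++;
            } else if (starP >= 0) {
                // Mismatch: let the last '*' absorb one more term character and retry.
                p = starP + 1;
                t = ++starT;
            } else {
                return false;
            }
        }
        // Whatever remains of the pattern must be '*'s, which can match nothing.
        while (p < pattern.length() && pattern.charAt(p) == '*') {
            p++;
        }
        return p == pattern.length();
    }
}
```

For example, `matches("*ene*", "lucene")` is true, while `matches("*ene*", "lucent")` is false.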

[jira] Resolved: (LUCENE-838) WildcardQuery do not find documents

2007-03-20 Thread Hoss Man (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-838?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hoss Man resolved LUCENE-838. Resolution: Cannot Reproduce > WildcardQuery do not find documents

[jira] Commented: (LUCENE-838) WildcardQuery do not find documents

2007-03-20 Thread Hoss Man (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12482461 ] Hoss Man commented on LUCENE-838: please do not open Jira bugs without first consulting the java-user list to ensur

[jira] Created: (LUCENE-838) WildcardQuery do not find documents

2007-03-20 Thread Michael Schlegel (JIRA)
WildcardQuery do not find documents Key: LUCENE-838 URL: https://issues.apache.org/jira/browse/LUCENE-838 Project: Lucene - Java Issue Type: Bug Components: Search Affects Versions: 2.1, 2.0.

[jira] Updated: (LUCENE-838) WildcardQuery do not find documents

2007-03-20 Thread Michael Schlegel (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-838?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Schlegel updated LUCENE-838: Summary: WildcardQuery do not find documents (was: WildcardQuery do not find documents if

[jira] Updated: (LUCENE-838) WildcardQuery do not find documents if different analyzer will be used.

2007-03-20 Thread Michael Schlegel (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-838?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Schlegel updated LUCENE-838: Summary: WildcardQuery do not find documents if different analyzer will be used. (was: Wil

[jira] Created: (LUCENE-839) WildcardQuery do not find documents if leading and trailing * is used

2007-03-20 Thread Michael Schlegel (JIRA)
WildcardQuery do not find documents if leading and trailing * is used Key: LUCENE-839 URL: https://issues.apache.org/jira/browse/LUCENE-839 Project: Lucene - Java Issue Ty

Re: future releases: Append Function for Indexing

2007-03-20 Thread robert engels
If you have "unique ids" available to you, I think the best solution to accomplish a lot of this would be to use a very simple embedded db to store the documents (we use a version of JDBM). Just store the key as a stored field in the Lucene document, and the document in JDBM. This has the a
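
robert engels' suggestion (keep only the unique id as a stored field in the Lucene document and the document body in an embedded store) turns the "append" use case from the original post into a read, extend, re-index cycle, without going back to the original source. The sketch below assumes a `HashMap` in place of JDBM and a hypothetical `Reindexer` callback in place of the Lucene side, which in practice would delete the old document by its id term and add the rebuilt one; `AppendableStore` is likewise a made-up name.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the index side: in Lucene this would be a
// delete by the id term followed by an add of the rebuilt document.
interface Reindexer {
    void replace(String id, String fullText);
}

// Sketch of the suggested split: the index keeps only the unique id as a
// stored field, while full bodies live in a key-value store
// (a HashMap here, standing in for an embedded store such as JDBM).
final class AppendableStore {
    private final Map<String, String> bodies = new HashMap<>();
    private final Reindexer index;

    AppendableStore(Reindexer index) {
        this.index = index;
    }

    void add(String id, String text) {
        bodies.put(id, text);
        index.replace(id, text);
    }

    // "Append" is emulated: read the stored body, extend it, re-index it whole.
    void append(String id, String extra) {
        String current = bodies.get(id);
        if (current == null) {
            throw new IllegalArgumentException("unknown id: " + id);
        }
        String updated = current + " " + extra;
        bodies.put(id, updated);
        index.replace(id, updated);
    }

    String body(String id) {
        return bodies.get(id);
    }
}
```

The index still sees a full delete and add on every append; the store only saves re-fetching and re-assembling the document from its original source.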

future releases: Append Function for Indexing

2007-03-20 Thread Alexander Kern
As of now, whenever the index of a document needs to be updated, the complete document needs to be deleted, then newly indexed & finally added to the index repository. If, however, information merely needed to be added to the existing document (i.e., appended), the described procedure creates a grea

[jira] Assigned: (LUCENE-707) Lucene Java Site docs

2007-03-20 Thread Erik Hatcher (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-707?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Erik Hatcher reassigned LUCENE-707: Assignee: Erik Hatcher (was: Grant Ingersoll) > Lucene Java Site docs

[jira] Closed: (LUCENE-707) Lucene Java Site docs

2007-03-20 Thread Erik Hatcher (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-707?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Erik Hatcher closed LUCENE-707. Applied, thanks George! > Lucene Java Site docs > Key: LUCENE-

Re: [jira] Commented: (LUCENE-836) Benchmarks Enhancements (precision/recall, TREC, Wikipedia)

2007-03-20 Thread Grant Ingersoll
I think the Reuters corpus is pretty good and is pretty well known in the community. Probably the most important part would be to build up a set of judgments. I don't think it is too hard to come up w/ 50-100 questions/queries, but creating the relevance pool will be more difficult. I su
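
Once a set of queries and relevance judgments exists, precision and recall for each query follow directly from the ranked results. A minimal sketch, assuming the judgments for a query are available as a set of relevant document ids; `RelevanceMetrics` and its methods are hypothetical names, not part of contrib/benchmark.

```java
import java.util.List;
import java.util.Set;

// Sketch: precision and recall at cutoff k for one query, given a ranked
// result list and the set of document ids judged relevant for that query.
final class RelevanceMetrics {
    private RelevanceMetrics() {}

    // Divides by k even if fewer than k results were returned,
    // so missing ranks count as non-relevant.
    static double precisionAtK(List<String> ranked, Set<String> relevant, int k) {
        if (k <= 0) {
            throw new IllegalArgumentException("k must be positive");
        }
        return (double) hitsInTopK(ranked, relevant, k) / k;
    }

    static double recallAtK(List<String> ranked, Set<String> relevant, int k) {
        if (relevant.isEmpty()) {
            return 0.0; // undefined without judged-relevant docs; reported as 0 here
        }
        return (double) hitsInTopK(ranked, relevant, k) / relevant.size();
    }

    // Counts relevant documents among the first k ranked results.
    private static int hitsInTopK(List<String> ranked, Set<String> relevant, int k) {
        int hits = 0;
        int limit = Math.min(k, ranked.size());
        for (int i = 0; i < limit; i++) {
            if (relevant.contains(ranked.get(i))) {
                hits++;
            }
        }
        return hits;
    }
}
```

Averaging these per-query values over the 50-100 queries gives a single figure to compare runs; the hard part, as noted above, remains building the relevance pool itself.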

Re: [jira] Updated: (LUCENE-837) contrib/benchmark QueryMaker and Task Refactorings

2007-03-20 Thread Grant Ingersoll
Thanks, Doron. If you're good w/ all the changes, I will commit tonight. We might want to start thinking about Unit Tests... :-) Seems kind of weird to have tests for tests, but this is becoming sufficiently complex that it should have some tests. Also, +1 for deprecating and eventually remo

[jira] Commented: (LUCENE-836) Benchmarks Enhancements (precision/recall, TREC, Wikipedia)

2007-03-20 Thread Karl Wettin (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12482367 ] Karl Wettin commented on LUCENE-836: Regarding data and user queries, I have a 150 000 document corpus with 4 000

[jira] Updated: (LUCENE-837) contrib/benchmark QueryMaker and Task Refactorings

2007-03-20 Thread Doron Cohen (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-837?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Doron Cohen updated LUCENE-837: Attachment: benchmark-more-updates.patch The changes look good. I integrated in my changes: - Modi