[jira] Assigned: (LUCENE-848) Add supported for Wikipedia English as a corpus in the benchmarker stuff
[ https://issues.apache.org/jira/browse/LUCENE-848?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Grant Ingersoll reassigned LUCENE-848:

    Assignee: Grant Ingersoll (was: Steven Parkes)

> Add supported for Wikipedia English as a corpus in the benchmarker stuff
>
> Key: LUCENE-848
> URL: https://issues.apache.org/jira/browse/LUCENE-848
> Project: Lucene - Java
> Issue Type: New Feature
> Components: contrib/benchmark
> Reporter: Steven Parkes
> Assigned To: Grant Ingersoll
> Priority: Minor
> Fix For: 2.2
>
> Attachments: LUCENE-848.txt, WikipediaHarvester.java
>
> Add support for using Wikipedia for benchmarking.

--
This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.

To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
[jira] Created: (LUCENE-863) Deprecate StandardBenchmarker and "old" benchmarker code in favor of the Task based approach
Deprecate StandardBenchmarker and "old" benchmarker code in favor of the Task based approach

Key: LUCENE-863
URL: https://issues.apache.org/jira/browse/LUCENE-863
Project: Lucene - Java
Issue Type: Task
Components: contrib/benchmark
Reporter: Grant Ingersoll
Priority: Minor

We should deprecate the StandardBenchmarker code, which was the start of the benchmark contribution, in favor of the byTask benchmark code, which is much easier to use and extend.
[jira] Commented: (LUCENE-736) Sloppy Phrase Scoring Misbehavior
[ https://issues.apache.org/jira/browse/LUCENE-736?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12489187 ]

Otis Gospodnetic commented on LUCENE-736:

Doron, this sounds ripe for a commit now, which would take care of both this issue and LUCENE-697.

> Sloppy Phrase Scoring Misbehavior
>
> Key: LUCENE-736
> URL: https://issues.apache.org/jira/browse/LUCENE-736
> Project: Lucene - Java
> Issue Type: Bug
> Components: Search
> Reporter: Doron Cohen
> Assigned To: Doron Cohen
> Priority: Minor
> Attachments: perf-search-new.log, perf-search-orig.log, res-search-new2.log, res-search-orig2.log, sloppy_phrase.patch2.txt, sloppy_phrase.patch3.txt, sloppy_phrase_java.patch.txt, sloppy_phrase_tests.patch.txt
>
> This is an extension of https://issues.apache.org/jira/browse/LUCENE-697
>
> In addition to the abnormalities Yonik pointed out in 697, there seem to be other
> issues with sloppy phrase search and scoring.
>
> 1) A phrase with a repeated word would be detected in a document although it is not there.
> E.g. for document = A B D C E, query "B C B" would not find this document (as
> expected), but query "B C B"~2 would find it.
> I think that no matter how large the slop is, this document should not be a match.
>
> 2) A document containing both orders of a query, symmetrically, would score
> differently for the query and for its reversed form.
> E.g. document = A B C B A would score differently for queries "B C"~2 and "C B"~2,
> although it is symmetric to both.
>
> I will attach test cases that show both these problems and the one reported by Yonik in 697.
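To make the two expected behaviours in the issue description concrete, here is a brute-force Java sketch of sloppy phrase matching. It is not Lucene's SloppyPhraseScorer, and the class and method names are hypothetical. It assumes (1) every query term must be matched at a distinct document position, which is what rules out "B C B"~2 on "A B D C E", and (2) the match length of one occurrence is the largest minus the smallest value of (document position minus query offset) across the query terms, so a phrase in exact order has length 0.

```java
/**
 * Hypothetical brute-force model of sloppy phrase matching (not Lucene's SloppyPhraseScorer).
 * Assumptions: each query term uses a distinct document position, and the match length is
 * max(pos - offset) - min(pos - offset) over the query terms.
 */
public class SloppyPhraseSketch {

    /** Smallest match length over all valid assignments of positions, or -1 if there is none. */
    static int minMatchLength(String[] doc, String[] query) {
        return search(doc, query, 0, new boolean[doc.length], new int[query.length]);
    }

    private static int search(String[] doc, String[] query, int qi, boolean[] used, int[] adjusted) {
        if (qi == query.length) {
            int min = Integer.MAX_VALUE;
            int max = Integer.MIN_VALUE;
            for (int a : adjusted) {
                min = Math.min(min, a);
                max = Math.max(max, a);
            }
            return max - min;
        }
        int best = -1;
        for (int pos = 0; pos < doc.length; pos++) {
            if (used[pos] || !doc[pos].equals(query[qi])) {
                continue;
            }
            used[pos] = true;          // a document position may satisfy only one query term
            adjusted[qi] = pos - qi;   // position relative to the term's offset in the query
            int len = search(doc, query, qi + 1, used, adjusted);
            used[pos] = false;
            if (len >= 0 && (best < 0 || len < best)) {
                best = len;
            }
        }
        return best;
    }

    /** True if some occurrence of the query fits within the given slop. */
    static boolean matches(String[] doc, String[] query, int slop) {
        int len = minMatchLength(doc, query);
        return len >= 0 && len <= slop;
    }

    public static void main(String[] args) {
        String[] doc1 = "A B D C E".split(" ");
        String[] bcb = "B C B".split(" ");
        System.out.println(matches(doc1, bcb, 2));    // false: the document has only one B
        System.out.println(matches(doc1, bcb, 100));  // false, however large the slop
        String[] doc2 = "A B C B A".split(" ");
        System.out.println(minMatchLength(doc2, "B C".split(" ")));  // 0
        System.out.println(minMatchLength(doc2, "C B".split(" ")));  // 0
    }
}
```

Under these assumptions the repeated-term query never matches, and "B C" and "C B" get the same best match length on the symmetric document. The search is exponential in query length, so a model like this is only useful as a reference for expected results in small test cases, not as a scorer.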
[jira] Commented: (LUCENE-730) Restore top level disjunction performance
[ https://issues.apache.org/jira/browse/LUCENE-730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12489199 ]

Otis Gospodnetic commented on LUCENE-730:

Paul, what is special about the number 32 here (BooleanScorer2):

+    if ((requiredScorers.size() == 0) &&
+        prohibitedScorers.size() < 32) {
+      // fall back to BooleanScorer, scores documents somewhat out of order
+      BooleanScorer bs = new BooleanScorer(getSimilarity(), minNrShouldMatch);

Why can we use BooleanScorer if there are fewer than 32 prohibited clauses, but not otherwise?

Thanks.

> Restore top level disjunction performance
>
> Key: LUCENE-730
> URL: https://issues.apache.org/jira/browse/LUCENE-730
> Project: Lucene - Java
> Issue Type: Improvement
> Components: Search
> Reporter: Paul Elschot
> Priority: Minor
> Attachments: TopLevelDisjunction20061127.patch
>
> This patch restores the performance of top level disjunctions.
> The introduction of BooleanScorer2 had impacted this as reported
> on java-user on 21 Nov 2006 by Stanislav Jordanov.
[jira] Commented: (LUCENE-730) Restore top level disjunction performance
[ https://issues.apache.org/jira/browse/LUCENE-730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12489206 ]

Yonik Seeley commented on LUCENE-730:

32 is the maximum number of required + prohibited clauses in the original BooleanScorer (because it uses an int as a bitfield for each document in the current id range being considered).
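Yonik's explanation of the 32-clause limit can be illustrated with a small hypothetical Java sketch (the class and member names are made up; this is not the actual BooleanScorer source). Each required or prohibited clause is given one bit of an int, a document's bucket records which of those clauses matched it, and accepting the document is a pair of mask tests. Because bits are handed out by shifting a mask left, the 33rd clause finds the mask already shifted out to 0.

```java
/**
 * Hypothetical illustration of an int bitfield that tracks required and prohibited
 * clauses per document; an int has 32 bits, hence at most 32 such clauses.
 */
public class ClauseBitfieldSketch {
    private int nextMask = 1;   // bit to hand out to the next required/prohibited clause
    int requiredMask = 0;       // union of the bits of all required clauses
    int prohibitedMask = 0;     // union of the bits of all prohibited clauses

    /** Allocates a bit for a clause; fails once all 32 bits of the int are taken. */
    int addClause(boolean required) {
        if (nextMask == 0) {
            throw new IllegalStateException("More than 32 required/prohibited clauses");
        }
        int mask = nextMask;
        nextMask <<= 1;         // after bit 31 is handed out, this shifts to 0
        if (required) {
            requiredMask |= mask;
        } else {
            prohibitedMask |= mask;
        }
        return mask;
    }

    /** bits = OR of the masks of the clauses that matched one document. */
    boolean accepts(int bits) {
        return (bits & requiredMask) == requiredMask && (bits & prohibitedMask) == 0;
    }

    public static void main(String[] args) {
        ClauseBitfieldSketch s = new ClauseBitfieldSketch();
        int req = s.addClause(true);
        int prohib = s.addClause(false);
        System.out.println(s.accepts(req));           // true: required matched, nothing prohibited
        System.out.println(s.accepts(req | prohib));  // false: a prohibited clause matched
        for (int i = 2; i < 32; i++) {
            s.addClause(false);                       // uses up bits 2..31
        }
        try {
            s.addClause(false);                       // the 33rd clause has no bit left
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Using a long would raise the limit to 64 in this model, but any fixed-width bitfield has some cap, which is why a scorer has to fall back to a different strategy (such as BooleanScorer2) beyond it.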
[jira] Commented: (LUCENE-730) Restore top level disjunction performance
[ https://issues.apache.org/jira/browse/LUCENE-730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12489229 ]

Paul Elschot commented on LUCENE-730:

Further to Yonik's answer, I have not done any tests with prohibited scorers comparing BooleanScorer and BooleanScorer2. It is quite possible that using skipTo() on any prohibited scorer (via BooleanScorer2) is generally faster than using BooleanScorer.

Prohibited clauses in queries are quite rare, so it is going to be difficult to find out whether a value smaller than 32 would be generally optimal.
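Paul's point about skipTo() can be sketched as follows: instead of recording prohibited clauses in a per-document bitfield, the prohibited postings are advanced with skipTo() only as far as the current candidate document, and the candidate is dropped if they land on it. This is a hypothetical illustration: DocCursor and exclude() are made-up names, postings are modeled as sorted int arrays, the prohibited clauses are assumed to be already merged into one list, and the linear skipTo() stands in for the skip data a real index would use to jump ahead.

```java
import java.util.ArrayList;
import java.util.List;

public class SkipToExclusionSketch {

    /** Hypothetical cursor over a sorted array of document ids (not a Lucene class). */
    static final class DocCursor {
        private final int[] docs;
        private int i = -1;

        DocCursor(int[] docs) { this.docs = docs; }

        int doc() { return docs[i]; }

        boolean next() { return ++i < docs.length; }

        /** Moves forward to the first doc >= target; false once the postings run out.
         *  Linear here; a real index would jump ahead using skip data. */
        boolean skipTo(int target) {
            while (i < docs.length && docs[i] < target) i++;
            return i < docs.length;
        }
    }

    /** Returns the candidate docs (sorted ascending) that do not appear in the prohibited list. */
    static List<Integer> exclude(int[] candidates, int[] prohibited) {
        List<Integer> result = new ArrayList<>();
        DocCursor excl = new DocCursor(prohibited);
        boolean exclLeft = excl.next();
        for (int doc : candidates) {
            if (exclLeft && excl.doc() < doc) {
                exclLeft = excl.skipTo(doc);   // only advance as far as the current candidate
            }
            if (exclLeft && excl.doc() == doc) {
                continue;                      // doc is prohibited
            }
            result.add(doc);
        }
        return result;
    }

    public static void main(String[] args) {
        int[] candidates = {1, 3, 5, 7, 9};
        int[] prohibited = {3, 4, 9};
        System.out.println(exclude(candidates, prohibited)); // [1, 5, 7]
    }
}
```

The cost of this approach depends on how many candidates there are and how far each skipTo() has to move, rather than on the number of prohibited clauses, which is consistent with Paul's remark that only measurements would show where the crossover point really is.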
[jira] Created: (LUCENE-864) contrib/benchmark files need eol-style set
contrib/benchmark files need eol-style set

Key: LUCENE-864
URL: https://issues.apache.org/jira/browse/LUCENE-864
Project: Lucene - Java
Issue Type: Improvement
Components: contrib/benchmark
Affects Versions: 2.1
Reporter: Steven Parkes
Priority: Minor

The following files in contrib/benchmark don't have eol-style set to native, so their line endings are not converted when they are checked out:

./build.xml
./CHANGES.txt
./conf/sample.alg
./conf/standard.alg
./conf/sloppy-phrase.alg
./conf/deletes.alg
./conf/micro-standard.alg
./conf/compound-penalty.alg
[jira] Updated: (LUCENE-848) Add supported for Wikipedia English as a corpus in the benchmarker stuff
[ https://issues.apache.org/jira/browse/LUCENE-848?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Steven Parkes updated LUCENE-848:

    Attachment: LUCENE-848.txt

Update of the previous patch. I used Doron's suggestion for the variable name and cleaned up a little (I reverted the eol style on build.txt so the diff makes sense; see LUCENE-864 for fixing the eol-styles in contrib/benchmark).

Right now the test algorithm is wikipedia.alg, but I think the idea is to create specific benchmarks, so maybe this should be named something like ingest-enwiki, meaning a test of ingest rate against Wikipedia.

> Add supported for Wikipedia English as a corpus in the benchmarker stuff
>
> Key: LUCENE-848
> URL: https://issues.apache.org/jira/browse/LUCENE-848
> Project: Lucene - Java
> Issue Type: New Feature
> Components: contrib/benchmark
> Reporter: Steven Parkes
> Assigned To: Grant Ingersoll
> Priority: Minor
> Fix For: 2.2
>
> Attachments: LUCENE-848.txt, LUCENE-848.txt, WikipediaHarvester.java
>
> Add support for using Wikipedia for benchmarking.
Re: Packaging Lucene 2.1.0 for Debian; found 2 junit errors
Sami Siren wrote:
> I also saw those when I did my Maven trials. I didn't dig any deeper.

I fixed the highlighter problem in this report; see the change here: http://svn.apache.org/viewvc?view=rev&revision=529417

Cheers,
Mark
[jira] Assigned: (LUCENE-864) contrib/benchmark files need eol-style set
[ https://issues.apache.org/jira/browse/LUCENE-864?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Doron Cohen reassigned LUCENE-864:

    Assignee: Doron Cohen

> contrib/benchmark files need eol-style set
>
> Key: LUCENE-864
> URL: https://issues.apache.org/jira/browse/LUCENE-864
> Project: Lucene - Java
> Issue Type: Improvement
> Components: contrib/benchmark
> Affects Versions: 2.1
> Reporter: Steven Parkes
> Assigned To: Doron Cohen
> Priority: Minor
>
> The following files in contrib/benchmark don't have eol-style set to native,
> so when they are checked out, they don't get converted.
> ./build.xml
> ./CHANGES.txt
> ./conf/sample.alg
> ./conf/standard.alg
> ./conf/sloppy-phrase.alg
> ./conf/deletes.alg
> ./conf/micro-standard.alg
> ./conf/compound-penalty.alg
Fwd: Call for Papers Opens for ApacheCon US 2007
The one valid use of cross-posting...

Begin forwarded message:

From: Rich Bowen <[EMAIL PROTECTED]>
Date: April 16, 2007 10:50:54 AM EDT
To: [EMAIL PROTECTED]
Subject: Call for Papers Opens for ApacheCon US 2007
Reply-To: [EMAIL PROTECTED]
Reply-To: [EMAIL PROTECTED]

PMCs, please send this announcement to your various users@ and devs@ mailing lists, as appropriate for your particular community. Remember, your project can only be represented at ApacheCon if your community submits talk proposals.

Call for Papers Opens for ApacheCon US 2007

The Call for Papers is now open for ApacheCon US, to be held November 12-16 at the Peachtree Westin, Atlanta. The conference will consist of two days of tutorials (November 12-13) and three days of regular conference sessions (November 14-16).

Please log in to the website at http://apachecon.com/html/login.html to submit your proposal. Further details about fees and are available on the CFP form.

Topics appropriate for submission to this conference are manifold, and may include but are not restricted to:

* ASF projects
* ASF-Incubated projects
* Scripting languages and dynamic content such as Java, Perl, Python, Ruby, XSL, and PHP
* New technologies and broader initiatives such as Web Services and Web 2.0
* Security and e-commerce, performance tuning, load balancing, and high availability
* Business and community issues surrounding the ASF and Open Source

The paper submission deadline is Monday, 28 April 2007, Midnight GMT.

Thanks, and we hope to hear from you, and to see you in Atlanta.

--
The ApacheCon Planners
[EMAIL PROTECTED]
[jira] Commented: (LUCENE-848) Add supported for Wikipedia English as a corpus in the benchmarker stuff
[ https://issues.apache.org/jira/browse/LUCENE-848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12489283 ]

Steven Parkes commented on LUCENE-848:

Blah. This patch doesn't work quite right with Java 1.4. My intention was (and is) to use Xerces to do the XML parsing, but the setup doesn't work quite right under 1.4, which has some Crimson stuff in rt.jar that I don't (yet) understand.
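For the Wikipedia corpus work, a SAX handler along these lines is one way to pull article titles and bodies out of an XML dump using only the JAXP API bundled with the JDK. It is a minimal sketch, assuming the MediaWiki dump layout of page elements that contain a title and a revision/text element; WikiDumpSaxSketch and PageHandler are hypothetical names, not the classes in the attached patch. Which parser SAXParserFactory.newInstance() returns can be overridden with the javax.xml.parsers.SAXParserFactory system property (for example, to select Xerces when it is on the classpath), which may help when the JDK's bundled parser, Crimson on 1.4, is picked up instead.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

public class WikiDumpSaxSketch {

    /** Collects {title, text} pairs; element names assume the MediaWiki dump layout. */
    static final class PageHandler extends DefaultHandler {
        final List<String[]> pages = new ArrayList<>();
        private final StringBuilder buf = new StringBuilder();
        private String title;
        private String text;

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            buf.setLength(0);                  // start collecting character data afresh
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            buf.append(ch, start, length);     // may be called several times per element
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            if (qName.equals("title")) {
                title = buf.toString();
            } else if (qName.equals("text")) {
                text = buf.toString();
            } else if (qName.equals("page")) {
                pages.add(new String[] {title, text});
                title = null;
                text = null;
            }
        }
    }

    /** Parses a dump held in a string; a real harvester would stream from a file instead. */
    static List<String[]> parse(String xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        SAXParser parser = factory.newSAXParser();
        PageHandler handler = new PageHandler();
        parser.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), handler);
        return handler.pages;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<mediawiki><page><title>Lucene</title><revision>"
                + "<text xml:space=\"preserve\">Lucene is a search library.</text>"
                + "</revision></page></mediawiki>";
        for (String[] page : parse(xml)) {
            System.out.println(page[0] + " -> " + page[1]);
        }
    }
}
```

A full English Wikipedia dump is far too large to hold in memory, so a real harvester would more likely hand each page to the indexer from endElement() rather than collecting a list; the list here only keeps the example small and testable.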