[jira] Assigned: (LUCENE-848) Add supported for Wikipedia English as a corpus in the benchmarker stuff

2007-04-16 Thread Grant Ingersoll (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-848?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Ingersoll reassigned LUCENE-848:
--

Assignee: Grant Ingersoll  (was: Steven Parkes)

> Add supported for Wikipedia English as a corpus in the benchmarker stuff
> 
>
> Key: LUCENE-848
> URL: https://issues.apache.org/jira/browse/LUCENE-848
> Project: Lucene - Java
> Issue Type: New Feature
> Components: contrib/benchmark
> Reporter: Steven Parkes
> Assigned To: Grant Ingersoll
> Priority: Minor
> Fix For: 2.2
>
> Attachments: LUCENE-848.txt, WikipediaHarvester.java
>
>
> Add support for using Wikipedia for benchmarking.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



[jira] Created: (LUCENE-863) Deprecate StandardBenchmarker and "old" benchmarker code in favor of the Task based approach

2007-04-16 Thread Grant Ingersoll (JIRA)
Deprecate StandardBenchmarker and "old" benchmarker code in favor of the Task 
based approach


 Key: LUCENE-863
 URL: https://issues.apache.org/jira/browse/LUCENE-863
 Project: Lucene - Java
  Issue Type: Task
  Components: contrib/benchmark
Reporter: Grant Ingersoll
Priority: Minor


We should deprecate the StandardBenchmarker code that was the start of the 
benchmark contribution in favor of the byTask benchmark code, which is much 
easier to use and extend.




[jira] Commented: (LUCENE-736) Sloppy Phrase Scoring Misbehavior

2007-04-16 Thread Otis Gospodnetic (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-736?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12489187
 ] 

Otis Gospodnetic commented on LUCENE-736:
-

Doron, sounds like this is ripe for a commit now to take care of both this and 
LUCENE-697.


> Sloppy Phrase Scoring Misbehavior
> -
>
> Key: LUCENE-736
> URL: https://issues.apache.org/jira/browse/LUCENE-736
> Project: Lucene - Java
> Issue Type: Bug
> Components: Search
> Reporter: Doron Cohen
> Assigned To: Doron Cohen
> Priority: Minor
> Attachments: perf-search-new.log, perf-search-orig.log, 
> res-search-new2.log, res-search-orig2.log, sloppy_phrase.patch2.txt, 
> sloppy_phrase.patch3.txt, sloppy_phrase_java.patch.txt, 
> sloppy_phrase_tests.patch.txt
>
>
> This is an extension of https://issues.apache.org/jira/browse/LUCENE-697
> In addition to abnormalities Yonik pointed out in 697, there seem to be other 
> issues with sloppy phrase search and scoring.
> 1) A phrase with a repeated word would be detected in a document although it 
> is not there.
> I.e. document = A B D C E , query = "B C B" would not find this document (as 
> expected), but query "B C B"~2 would find it. 
> I think that no matter how large the slop is, this document should not be a 
> match.
> 2) A document containing both orders of a query, symmetrically, would score 
> differently for the query and for its reversed form.
> I.e. document = A B C B A would score differently for queries "B C"~2 and "C 
> B"~2, although it is symmetric to both.
> I will attach test cases that show both these problems and the one reported 
> by Yonik in 697. 
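
Point 1 above implies a simple necessary condition that does not depend on the
slop: a phrase that repeats a term can only match a document that contains that
term at least as many times. A minimal Java sketch of that check follows. The
class and method names are hypothetical, and this is not Lucene's sloppy phrase
scorer, only an illustration of the expected behavior.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical illustration only; not part of Lucene. */
final class RepeatedTermCheckSketch {

    /** Counts how often each term occurs in the given sequence. */
    static Map<String, Integer> countTerms(String[] terms) {
        Map<String, Integer> counts = new HashMap<>();
        for (String term : terms) {
            counts.merge(term, 1, Integer::sum);
        }
        return counts;
    }

    /**
     * Returns true if the document contains every phrase term at least as
     * often as the phrase does. This is necessary, not sufficient, for a match.
     */
    static boolean hasEnoughOccurrences(String[] docTerms, String[] phraseTerms) {
        Map<String, Integer> docCounts = countTerms(docTerms);
        for (Map.Entry<String, Integer> e : countTerms(phraseTerms).entrySet()) {
            if (docCounts.getOrDefault(e.getKey(), 0) < e.getValue()) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        String[] doc = {"A", "B", "D", "C", "E"};
        // "B C B" needs two positions holding B, but the document has only one.
        System.out.println(hasEnoughOccurrences(doc, new String[] {"B", "C", "B"})); // false
        System.out.println(hasEnoughOccurrences(doc, new String[] {"B", "C"}));      // true
    }
}
```

A check like this says nothing about positions or slop; it only rules out
documents such as A B D C E for the query "B C B", whatever the slop value.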




[jira] Commented: (LUCENE-730) Restore top level disjunction performance

2007-04-16 Thread Otis Gospodnetic (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12489199
 ] 

Otis Gospodnetic commented on LUCENE-730:
-

Paul, what is special about the number 32 here (BooleanScorer2):

+if ((requiredScorers.size() == 0) &&
+prohibitedScorers.size() < 32) {
+  // fall back to BooleanScorer, scores documents somewhat out of order
+  BooleanScorer bs = new BooleanScorer(getSimilarity(), minNrShouldMatch);

Why can we use BooleanScorer if there are fewer than 32 prohibited clauses, but 
not otherwise?  Thanks.


> Restore top level disjunction performance
> -
>
> Key: LUCENE-730
> URL: https://issues.apache.org/jira/browse/LUCENE-730
> Project: Lucene - Java
> Issue Type: Improvement
> Components: Search
> Reporter: Paul Elschot
> Priority: Minor
> Attachments: TopLevelDisjunction20061127.patch
>
>
> This patch restores the performance of top level disjunctions. 
> The introduction of BooleanScorer2 had impacted this as reported
> on java-user on 21 Nov 2006 by Stanislav Jordanov.




[jira] Commented: (LUCENE-730) Restore top level disjunction performance

2007-04-16 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12489206
 ] 

Yonik Seeley commented on LUCENE-730:
-

32 is the max number of required + prohibited clauses in the original 
BooleanScorer (because it uses an int as a bitfield for each document in the 
current id range being considered).
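
To make the limit concrete, here is a rough Java sketch of the idea: one bit of
an int per clause, so at most 32 clauses can be tracked per document. The class
and method names are hypothetical and the real BooleanScorer internals may well
differ; this only illustrates why an int bitfield caps the clause count.

```java
/**
 * Hypothetical sketch, not Lucene's BooleanScorer: shows why one int per
 * document limits the number of tracked clauses to 32.
 */
final class ClauseBitmaskSketch {

    /** Number of clause flags a single int can carry. */
    static final int MAX_CLAUSES = Integer.SIZE; // 32 bits in a Java int

    /** Returns the single-bit mask for the given clause index (0..31). */
    static int maskFor(int clauseIndex) {
        if (clauseIndex < 0 || clauseIndex >= MAX_CLAUSES) {
            throw new IllegalArgumentException("clause index out of range: " + clauseIndex);
        }
        return 1 << clauseIndex;
    }

    /** Records that the given clause matched the document with these bits. */
    static int mark(int docBits, int clauseIndex) {
        return docBits | maskFor(clauseIndex);
    }

    /** A document is accepted if all required bits are set and no prohibited bit is set. */
    static boolean accepts(int docBits, int requiredMask, int prohibitedMask) {
        return (docBits & requiredMask) == requiredMask
                && (docBits & prohibitedMask) == 0;
    }

    public static void main(String[] args) {
        int prohibited = maskFor(0); // clause 0 is prohibited
        int required = maskFor(1);   // clause 1 is required
        int doc = mark(0, 1);        // document matched clause 1 only
        System.out.println(accepts(doc, required, prohibited));          // true
        System.out.println(accepts(mark(doc, 0), required, prohibited)); // false
        // In Java, 1 << 32 wraps around to 1 << 0, so a 33rd clause
        // would collide with the first one.
        System.out.println((1 << 32) == (1 << 0));                       // true
    }
}
```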





[jira] Commented: (LUCENE-730) Restore top level disjunction performance

2007-04-16 Thread Paul Elschot (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12489229
 ] 

Paul Elschot commented on LUCENE-730:
-

Further to Yonik's answer, I have not done any tests with prohibited scorers 
comparing BooleanScorer and BooleanScorer2.

It is quite possible that using skipTo() on any prohibited scorer (via 
BooleanScorer2) is generally faster than using BooleanScorer. Prohibited 
clauses in queries are quite rare, so it is going to be difficult to find out 
whether a smaller value than 32 would be generally optimal.








[jira] Created: (LUCENE-864) contrib/benchmark files need eol-style set

2007-04-16 Thread Steven Parkes (JIRA)
contrib/benchmark files need eol-style set
--

 Key: LUCENE-864
 URL: https://issues.apache.org/jira/browse/LUCENE-864
 Project: Lucene - Java
  Issue Type: Improvement
  Components: contrib/benchmark
Affects Versions: 2.1
Reporter: Steven Parkes
Priority: Minor


The following files in contrib/benchmark don't have eol-style set to native, so 
when they are checked out, they don't get converted.

./build.xml
./CHANGES.txt
./conf/sample.alg
./conf/standard.alg
./conf/sloppy-phrase.alg
./conf/deletes.alg
./conf/micro-standard.alg
./conf/compound-penalty.alg
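
To see whether a checked-out file was actually converted, one can inspect the
line endings it contains. The following is a rough Java sketch under that
assumption; LineEndingSketch and its methods are hypothetical helpers, not part
of contrib/benchmark, and the fix itself is a Subversion property change rather
than code.

```java
/** Hypothetical helper for inspecting line endings; not part of contrib/benchmark. */
final class LineEndingSketch {

    /** Counts CRLF (Windows-style) line endings in the text. */
    static int countCrlf(String text) {
        int count = 0;
        for (int i = 0; i + 1 < text.length(); i++) {
            if (text.charAt(i) == '\r' && text.charAt(i + 1) == '\n') {
                count++;
            }
        }
        return count;
    }

    /** Counts bare LF (Unix-style) line endings, i.e. LF not preceded by CR. */
    static int countBareLf(String text) {
        int count = 0;
        for (int i = 0; i < text.length(); i++) {
            if (text.charAt(i) == '\n' && (i == 0 || text.charAt(i - 1) != '\r')) {
                count++;
            }
        }
        return count;
    }

    /** Text containing both kinds of endings was likely never normalized. */
    static boolean hasMixedEndings(String text) {
        return countCrlf(text) > 0 && countBareLf(text) > 0;
    }
}
```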
  





[jira] Updated: (LUCENE-848) Add supported for Wikipedia English as a corpus in the benchmarker stuff

2007-04-16 Thread Steven Parkes (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-848?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steven Parkes updated LUCENE-848:
-

Attachment: LUCENE-848.txt

Update of the previous patch. Used Doron's suggestion for variable name. 
Cleaned up a little (reverted the eol style on build.txt so the diff makes 
sense; see LUCENE-864 for fixing the eol-styles in contrib/benchmark).

Right now the test algorithm is wikipedia.alg, but I think the idea is to create 
specific benchmarks, so maybe this should be something like ingest-enwiki, 
meaning a test of ingest rate against Wikipedia.

> Add supported for Wikipedia English as a corpus in the benchmarker stuff
> 
>
> Key: LUCENE-848
> URL: https://issues.apache.org/jira/browse/LUCENE-848
> Project: Lucene - Java
> Issue Type: New Feature
> Components: contrib/benchmark
> Reporter: Steven Parkes
> Assigned To: Grant Ingersoll
> Priority: Minor
> Fix For: 2.2
>
> Attachments: LUCENE-848.txt, LUCENE-848.txt, WikipediaHarvester.java
>
>
> Add support for using Wikipedia for benchmarking.




Re: Packaging Lucene 2.1.0 for Debian; found 2 junit errors

2007-04-16 Thread markharw00d

Sami Siren wrote:

> I also saw those when I did my maven trials. I didn't dig any deeper.

Fixed the highlighter problem in this report - see change here: 
http://svn.apache.org/viewvc?view=rev&revision=529417.


Cheers,
Mark





[jira] Assigned: (LUCENE-864) contrib/benchmark files need eol-style set

2007-04-16 Thread Doron Cohen (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-864?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Doron Cohen reassigned LUCENE-864:
--

Assignee: Doron Cohen

> contrib/benchmark files need eol-style set
> --
>
> Key: LUCENE-864
> URL: https://issues.apache.org/jira/browse/LUCENE-864
> Project: Lucene - Java
> Issue Type: Improvement
> Components: contrib/benchmark
> Affects Versions: 2.1
> Reporter: Steven Parkes
> Assigned To: Doron Cohen
> Priority: Minor
>
> The following files in contrib/benchmark don't have eol-style set to native, 
> so when they are checked out, they don't get converted.
> ./build.xml
> ./CHANGES.txt
> ./conf/sample.alg
> ./conf/standard.alg
> ./conf/sloppy-phrase.alg
> ./conf/deletes.alg
> ./conf/micro-standard.alg
> ./conf/compound-penalty.alg




Fwd: Call for Papers Opens for ApacheCon US 2007

2007-04-16 Thread Erik Hatcher

The one valid use of cross-posting...

Begin forwarded message:


From: Rich Bowen <[EMAIL PROTECTED]>
Date: April 16, 2007 10:50:54 AM EDT
To: [EMAIL PROTECTED]
Subject: Call for Papers Opens for ApacheCon US 2007
Reply-To: [EMAIL PROTECTED]

PMCs, please send this announcement to your various users@ and
devs@ mailing lists, as appropriate for your particular community.
Remember, your project can only be represented at ApacheCon if your
community submits talk proposals:






Call for Papers Opens for ApacheCon US 2007

The Call for Papers is now open for ApacheCon US, to be held
November 12-16 at the Peachtree Westin, Atlanta. The conference
will consist of two days of tutorials (November 12-13) and three
days of regular conference sessions (November 14-16).


Please log in to the website at http://apachecon.com/html/login.html
to submit your proposal. Further details about fees and
are available on the CFP form.


Topics appropriate for submission to this conference are manifold,  
and may include but are not restricted to:


* ASF projects
* ASF-Incubated projects
* Scripting languages and dynamic content such as Java, Perl,  
Python, Ruby, XSL, and PHP
* New technologies and broader initiatives such as Web Services and  
Web 2.0
* Security and e-commerce, performance tuning, load balancing, and  
high availability

* Business and community issues surrounding the ASF and Open Source

The paper submission deadline is Monday, 28 April 2007, Midnight GMT.

Thanks, and we hope to hear from you, and to see you in Atlanta.
--
The ApacheCon Planners
[EMAIL PROTECTED]






[jira] Commented: (LUCENE-848) Add supported for Wikipedia English as a corpus in the benchmarker stuff

2007-04-16 Thread Steven Parkes (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12489283
 ] 

Steven Parkes commented on LUCENE-848:
--

Blah. This patch doesn't work quite right with Java 1.4. My intention was/is to 
use Xerces to do the XML parsing, but the setup doesn't work quite right under 
1.4, which has some Crimson stuff in rt.jar that I don't (yet) understand.
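
One way to see which parser a given JVM actually picks up is to ask JAXP for its
factory and print the implementation class. The sketch below assumes a SAX-style
parser, which may not be what the patch uses; SaxParserProbe is a hypothetical
name, and the system property in the comment only helps if a Xerces jar is on
the classpath.

```java
import javax.xml.parsers.SAXParserFactory;

/** Hypothetical probe; prints which SAX parser implementation JAXP resolved. */
final class SaxParserProbe {

    /** Returns the class name of the SAXParserFactory implementation in use. */
    static String factoryClassName() {
        return SAXParserFactory.newInstance().getClass().getName();
    }

    public static void main(String[] args) {
        // To request Xerces explicitly (assumes Xerces is on the classpath), run with:
        //   -Djavax.xml.parsers.SAXParserFactory=org.apache.xerces.jaxp.SAXParserFactoryImpl
        System.out.println("SAX parser factory in use: " + factoryClassName());
    }
}
```

Running this under both 1.4 and a later JVM should show whether the bundled
parser or Xerces is being selected.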

