[jira] [Commented] (LUCENE-3220) Implement various ranking models as Similarities

2011-08-12 Thread David Mark Nemeskey (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13084098#comment-13084098
 ] 

David Mark Nemeskey commented on LUCENE-3220:
-

Robert: Since we use 
[LUCENE-3357|https://issues.apache.org/jira/browse/LUCENE-3357] for testing and 
bug fixing, I propose we close this issue. If we decide to implement other 
methods as well, we can do it under a new issue. Or do you have something else 
in mind (such as renaming EasySimilarity to SimilarityBase)?

 Implement various ranking models as Similarities
 

 Key: LUCENE-3220
 URL: https://issues.apache.org/jira/browse/LUCENE-3220
 Project: Lucene - Java
  Issue Type: Sub-task
  Components: core/query/scoring, core/search
Affects Versions: flexscoring branch
Reporter: David Mark Nemeskey
Assignee: David Mark Nemeskey
  Labels: gsoc, gsoc2011
 Attachments: LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, 
 LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, 
 LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, 
 LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, 
 LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, 
 LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, 
 LUCENE-3220.patch

   Original Estimate: 336h
  Remaining Estimate: 336h

 With [LUCENE-3174|https://issues.apache.org/jira/browse/LUCENE-3174] done, we 
 can finally work on implementing the standard ranking models. Currently DFR, 
 BM25 and LM are on the menu.
 Done:
  * {{EasyStats}}: contains all statistics that might be relevant for a 
 ranking algorithm
  * {{EasySimilarity}}: the ancestor of all the other similarities. Hides the 
 DocScorers and as much implementation detail as possible
  * _BM25_: the current mock implementation might be OK
  * _LM_
  * _DFR_
  * The so-called _Information-Based Models_

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-3220) Implement various ranking models as Similarities

2011-08-09 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13081895#comment-13081895
 ] 

Robert Muir commented on LUCENE-3220:
-

Thanks David: I committed this.




[jira] [Commented] (LUCENE-3220) Implement various ranking models as Similarities

2011-08-02 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13076171#comment-13076171
 ] 

Robert Muir commented on LUCENE-3220:
-

Hi David, I was thinking that for the norm, we could store it the way 
DefaultSimilarity does. This would be especially convenient, as you could 
easily use these similarities with the exact same index as one using Lucene's 
default scoring. Also I think (not sure!) that by using 1/sqrt we will get 
better quantization from SmallFloat?

{noformat}
  public byte computeNorm(FieldInvertState state) {
    final int numTerms;
    if (discountOverlaps)
      numTerms = state.getLength() - state.getNumOverlap();
    else
      numTerms = state.getLength();
    return encodeNormValue(state.getBoost() * ((float) (1.0 / Math.sqrt(numTerms))));
  }
{noformat}

For computations, you have to 'undo' the sqrt() to get the quantized length, 
but that's OK since it's only done up front a single time and put in a table, 
so it won't slow anything down.
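
As a rough illustration of the 'undo the sqrt()' step, here is a minimal sketch of a pre-built table that maps each norm byte back to a quantized field length. The encode() and decode() helpers are local stand-ins modeled on a byte float with 3 mantissa bits, in the spirit of SmallFloat; they and all class and method names are assumptions for illustration, not the code in the patch.

```java
// Sketch only: encode/decode are local stand-ins for a byte-sized float
// (3 mantissa bits, 5 exponent bits). They are assumptions made for this
// example, not the library's actual implementation.
final class NormTableSketch {

  /** Encodes a float into one byte, clamping on underflow and overflow. */
  static byte encode(float f) {
    int bits = Float.floatToRawIntBits(f);
    int small = bits >> (24 - 3);
    if (small <= ((63 - 15) << 3)) {
      return (bits <= 0) ? (byte) 0 : (byte) 1; // zero/negative or too small
    }
    if (small >= ((63 - 15) << 3) + 0x100) {
      return -1; // too large: clamp to the biggest representable value
    }
    return (byte) (small - ((63 - 15) << 3));
  }

  /** Decodes a byte produced by encode() back into a float. */
  static float decode(byte b) {
    if (b == 0) {
      return 0.0f;
    }
    int bits = (b & 0xff) << (24 - 3);
    bits += (63 - 15) << 24;
    return Float.intBitsToFloat(bits);
  }

  /** LENGTH_TABLE[b] holds 1 / decode(b)^2, the quantized length for norm byte b. */
  static final float[] LENGTH_TABLE = new float[256];

  static {
    for (int i = 0; i < 256; i++) {
      float f = decode((byte) i);
      // Byte 0 decodes to 0; guard the division (the value used here is arbitrary).
      LENGTH_TABLE[i] = (f == 0.0f) ? 0.0f : 1.0f / (f * f);
    }
  }

  /** Stores boost * 1/sqrt(numTerms), as in the computeNorm() snippet above. */
  static byte computeNorm(int numTerms, float boost) {
    return encode(boost * (float) (1.0 / Math.sqrt(numTerms)));
  }

  /** Table lookup done at scoring time; no sqrt or division per document. */
  static float quantizedLength(byte norm) {
    return LENGTH_TABLE[norm & 0xff];
  }

  public static void main(String[] args) {
    byte norm = computeNorm(100, 1.0f);
    System.out.println("quantized length for 100 terms: " + quantizedLength(norm));
  }
}
```

The table is filled once in a static initializer, so the per-document cost at search time is a single array lookup. The recovered length is only approximate because of the one-byte quantization, and an index-time boost other than 1 would also distort it.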





[jira] [Commented] (LUCENE-3220) Implement various ranking models as Similarities

2011-07-25 Thread David Mark Nemeskey (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13070654#comment-13070654
 ] 

David Mark Nemeskey commented on LUCENE-3220:
-

I think I realized what I wanted with numberOfFieldTokens. I was afraid that 
sumTotalTermFreq is affected by norms / index-time boost / etc., and I wanted 
to make numberOfFieldTokens unaffected by those (I don't know now how); only I 
forgot to do so.

But if sumTotalTermFreq is really just the number of tokens in the field, I 
will delete one of them. Not sure which, because for me numberOfFieldTokens 
seems a more descriptive name than sumTotalTermFreq, but the latter is used 
everywhere in Lucene. May I ask your opinion on this question?




[jira] [Commented] (LUCENE-3220) Implement various ranking models as Similarities

2011-07-25 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13070656#comment-13070656
 ] 

Robert Muir commented on LUCENE-3220:
-

{quote}
Not sure which, because for me numberOfFieldTokens seems a more descriptive 
name than sumTotalTermFreq
{quote}

I think I agree with you: in the context of stats for scoring this might be the 
way to go, as it's easier to understand.




[jira] [Commented] (LUCENE-3220) Implement various ranking models as Similarities

2011-07-20 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13068303#comment-13068303
 ] 

Robert Muir commented on LUCENE-3220:
-

Thanks David: i committed this.




[jira] [Commented] (LUCENE-3220) Implement various ranking models as Similarities

2011-07-15 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13065896#comment-13065896
 ] 

Robert Muir commented on LUCENE-3220:
-

Hi David, this is looking really good! The patch is quite large, so what I did 
was:
# re-sync the flexscoring branch to trunk
# commit your patch as is (I did a tiny tweak for LUCENE-3299)

I saw a couple of things we should address (a full review will really mean I 
have to take quite a bit of time for each model!).
But we can take care of some of the easy stuff first!

* numberOfFieldTokens seems to be the same as sumOfTotalTF? We should only have 
one name for this stat, I think.
* I like the idea of NoAfterEffect/NoNormalization in DFR. Maybe we should make 
these ordinary classes, and in DFR we just don't allow null for any of the 
components? Just thought it might look cleaner.
* Some of the files in .similarities need the Apache license header.
* Because we don't need the norm for averaging, maybe we should use Lucene's 
encoding? We can pre-build the decode table like TF-IDF similarity, except our 
decode table is basically 1 / decode(float)^2 to give us the quantized doc 
length. From a practical perspective, this would mean someone could use this 
stuff with existing Lucene indexes (once they upgrade their segments to 4.0's 
format), and easily switch between things without reindexing.
 
If you want, you can do these things on this issue or open separate issues, 
whichever is easiest. But I think looking at smaller patches at this point will 
make iteration easier!
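
To make the "ordinary classes, no null components" suggestion concrete, here is a minimal sketch of that design. All names (DfrSketch, AfterEffectSketch, and so on) are hypothetical, and the basic model is reduced to a plain function; this only illustrates the structure, not the DFR code in the patch.

```java
import java.util.Objects;
import java.util.function.DoubleUnaryOperator;

// Hypothetical names throughout; only the "no null components" design matters.
final class DfrDemo {
  public static void main(String[] args) {
    DfrSketch dfr =
        new DfrSketch(tfn -> 2.0 * tfn, new NoAfterEffectSketch(), new NoNormalizationSketch());
    System.out.println(dfr.score(3.0, 10.0, 10.0));
  }
}

abstract class AfterEffectSketch {
  abstract double score(double tfn);
}

/** "No after effect" is an ordinary component that returns the neutral factor. */
final class NoAfterEffectSketch extends AfterEffectSketch {
  @Override
  double score(double tfn) {
    return 1.0;
  }
}

abstract class NormalizationSketch {
  abstract double tfn(double tf, double docLen, double avgDocLen);
}

/** "No normalization" is an ordinary component that passes tf through unchanged. */
final class NoNormalizationSketch extends NormalizationSketch {
  @Override
  double tfn(double tf, double docLen, double avgDocLen) {
    return tf;
  }
}

final class DfrSketch {
  private final DoubleUnaryOperator basicModel; // stand-in for a real basic model
  private final AfterEffectSketch afterEffect;
  private final NormalizationSketch normalization;

  DfrSketch(
      DoubleUnaryOperator basicModel,
      AfterEffectSketch afterEffect,
      NormalizationSketch normalization) {
    // Reject null up front so score() never needs a null check.
    this.basicModel = Objects.requireNonNull(basicModel, "basicModel");
    this.afterEffect = Objects.requireNonNull(afterEffect, "afterEffect");
    this.normalization = Objects.requireNonNull(normalization, "normalization");
  }

  double score(double tf, double docLen, double avgDocLen) {
    double tfn = normalization.tfn(tf, docLen, avgDocLen);
    return basicModel.applyAsDouble(tfn) * afterEffect.score(tfn);
  }
}
```

With the neutral components expressed as classes, the scoring path has a single shape regardless of which components are configured.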




[jira] [Commented] (LUCENE-3220) Implement various ranking models as Similarities

2011-07-06 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13060680#comment-13060680
 ] 

Robert Muir commented on LUCENE-3220:
-

Hi David: I had some ideas on stats to simplify some of these sims:
# I think we can use an easier way to compute average document length: 
sumTotalTermFreq() / maxDoc(). This way the average is 'exact' and not skewed 
by index-time boosts, SmallFloat quantization, or anything like that.
# To support pivoted unique normalization like lnu.ltc, I think we can solve 
this in a similar way: add sumDocFreq(), which is just a single long, and 
divide this by maxDoc(). This gives us the average number of unique terms. I 
think Terrier might have a similar stat (#postings or #pointers or something)?

So I think this could make for nice simplifications, especially for switching 
norms completely over to docvalues: we should be able to do #1 immediately, 
changing the way we compute avgdoclen for e.g. BM25 and DFR.

Then in a separate issue I could revert this norm summation stuff to make the 
docvalues integration simpler, and open a new issue for sumDocFreq().
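
The two averages proposed above are simple ratios. A minimal sketch, assuming the three statistics are already available as plain numbers (the class and method names are hypothetical, and the zero-document guard is an added assumption):

```java
// Hypothetical helper; the inputs stand for the index statistics named above.
final class CollectionAveragesSketch {

  /** Average document length: total tokens in the field / number of documents. */
  static double avgDocLength(long sumTotalTermFreq, int maxDoc) {
    return maxDoc == 0 ? 0.0 : (double) sumTotalTermFreq / maxDoc;
  }

  /** Average number of unique terms per document: total postings / number of documents. */
  static double avgUniqueTerms(long sumDocFreq, int maxDoc) {
    return maxDoc == 0 ? 0.0 : (double) sumDocFreq / maxDoc;
  }

  public static void main(String[] args) {
    System.out.println(avgDocLength(1000L, 10));
    System.out.println(avgUniqueTerms(450L, 10));
  }
}
```

Because both inputs are exact counts, the averages are unaffected by boosts or norm quantization, which is the point made in item 1.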


 Implement various ranking models as Similarities
 

 Key: LUCENE-3220
 URL: https://issues.apache.org/jira/browse/LUCENE-3220
 Project: Lucene - Java
  Issue Type: Sub-task
  Components: core/search
Affects Versions: flexscoring branch
Reporter: David Mark Nemeskey
Assignee: David Mark Nemeskey
  Labels: gsoc
 Attachments: LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, 
 LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, 
 LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch

   Original Estimate: 336h
  Remaining Estimate: 336h

 With [LUCENE-3174|https://issues.apache.org/jira/browse/LUCENE-3174] done, we 
 can finally work on implementing the standard ranking models. Currently DFR, 
 BM25 and LM are on the menu.
 TODO:
  * {{EasyStats}}: contains all statistics that might be relevant for a 
 ranking algorithm
  * {{EasySimilarity}}: the ancestor of all the other similarities. Hides the 
 DocScorers and as much implementation detail as possible
  * _BM25_: the current mock implementation might be OK
  * _LM_
  * _DFR_
 Done:




[jira] [Commented] (LUCENE-3220) Implement various ranking models as Similarities

2011-06-22 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13053329#comment-13053329
 ] 

Robert Muir commented on LUCENE-3220:
-

Just took a look, a few things that might help:

* Yes, maxDoc does not reflect deletions, but neither do things like 
totalTermFreq or docFreq... so it's best not to worry about deletions in 
the scoring, and to be consistent and use the stats (e.g. maxDoc, not numDocs) 
that do not take deletions into account.

* For computeStats(TermContext... termContexts), it's weird to sum the DF 
across the different terms in this case? But I honestly don't have any 
suggestions here... maybe in this case we should make an EasyPhraseStats that 
computes the EasyStats for each term, so it's not hiding anything or limiting 
anyone? And you could then do an instanceof check and have a different method 
like scorePhrase() that it forwards to in case it's an EasyPhraseStats? In 
general I'm not sure how other ranking systems tend to handle this case; the 
phrase estimation for IDF in Lucene's formula is done by summing the IDFs.
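
One way the EasyPhraseStats idea could look is sketched below. Every class here is a hypothetical stand-in with a "Sketch" suffix, the per-term score is a toy IDF-like formula, and the default scorePhrase() simply sums per-term scores; none of this is taken from the patch.

```java
import java.util.Arrays;
import java.util.List;

// All names are hypothetical stand-ins for the classes discussed above.
final class PhraseStatsDemo {
  public static void main(String[] args) {
    EasyStatsSketch a = new EasyStatsSketch(100L, 10L);
    EasyStatsSketch b = new EasyStatsSketch(100L, 25L);
    EasySimilaritySketch sim = new IdfLikeSketch();
    System.out.println(sim.score(a, 1.0));
    System.out.println(sim.score(new EasyPhraseStatsSketch(100L, Arrays.asList(a, b)), 1.0));
  }
}

class EasyStatsSketch {
  final long numberOfDocuments;
  final long docFreq;

  EasyStatsSketch(long numberOfDocuments, long docFreq) {
    this.numberOfDocuments = numberOfDocuments;
    this.docFreq = docFreq;
  }
}

/** Keeps the stats of every term in the phrase instead of one summed value. */
final class EasyPhraseStatsSketch extends EasyStatsSketch {
  final List<EasyStatsSketch> perTerm;

  EasyPhraseStatsSketch(long numberOfDocuments, List<EasyStatsSketch> perTerm) {
    super(numberOfDocuments, 0L); // no single docFreq is meaningful for a phrase
    this.perTerm = perTerm;
  }
}

abstract class EasySimilaritySketch {
  /** Forwards to scorePhrase() when the stats describe a phrase. */
  final double score(EasyStatsSketch stats, double freq) {
    if (stats instanceof EasyPhraseStatsSketch) {
      return scorePhrase((EasyPhraseStatsSketch) stats, freq);
    }
    return scoreTerm(stats, freq);
  }

  abstract double scoreTerm(EasyStatsSketch stats, double freq);

  /** Default: sum the per-term scores; subclasses may override with a real phrase model. */
  double scorePhrase(EasyPhraseStatsSketch stats, double freq) {
    double sum = 0.0;
    for (EasyStatsSketch term : stats.perTerm) {
      sum += scoreTerm(term, freq);
    }
    return sum;
  }
}

/** Toy model used only to exercise the dispatch. */
final class IdfLikeSketch extends EasySimilaritySketch {
  @Override
  double scoreTerm(EasyStatsSketch stats, double freq) {
    return freq * Math.log(1.0 + (double) stats.numberOfDocuments / stats.docFreq);
  }
}
```

The per-term statistics stay visible to the similarity, so a model that wants something other than a sum can override scorePhrase() without changing callers.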





[jira] [Commented] (LUCENE-3220) Implement various ranking models as Similarities

2011-06-20 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13052019#comment-13052019
 ] 

Robert Muir commented on LUCENE-3220:
-

A few comments (it generally looks close to me):
* Maybe we should use 'numberOfDocuments' instead of 'docNo', and the same with 
'numberOfFieldTokens'? This might make the naming clearer.
* I'm worried about 'uniqueTermCount'. Do you know which implementations 
require this? This number is not accurate if the index has more than one 
segment.
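
A toy model of why a unique term count breaks down across segments: if each segment only knows its own distinct terms, adding the per-segment counts double-counts every term that appears in more than one segment. The sketch below uses plain sets of strings in place of real segments; the names are hypothetical and no Lucene API is involved.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy model: each "segment" is just the set of distinct terms it contains.
final class UniqueTermCountSketch {

  /** What you get by adding up per-segment unique counts. */
  static int sumOfPerSegmentCounts(List<Set<String>> segments) {
    int sum = 0;
    for (Set<String> segment : segments) {
      sum += segment.size();
    }
    return sum;
  }

  /** The true index-wide count, which needs the union of all term sets. */
  static int trueUniqueCount(List<Set<String>> segments) {
    Set<String> union = new HashSet<>();
    for (Set<String> segment : segments) {
      union.addAll(segment);
    }
    return union.size();
  }

  public static void main(String[] args) {
    List<Set<String>> segments = Arrays.asList(
        new HashSet<>(Arrays.asList("lucene", "search")),
        new HashSet<>(Arrays.asList("search", "ranking")));
    System.out.println("summed:  " + sumOfPerSegmentCounts(segments));
    System.out.println("correct: " + trueUniqueCount(segments));
  }
}
```

Getting the correct figure requires merging the term sets of all segments, which is a much heavier operation than adding up a few per-segment numbers.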





[jira] [Commented] (LUCENE-3220) Implement various ranking models as Similarities

2011-06-20 Thread David Mark Nemeskey (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13052025#comment-13052025
 ] 

David Mark Nemeskey commented on LUCENE-3220:
-

 * I was wondering about that too -- actually docNo is a mistake, it should 
have been noDocs or noOfDocs anyway, but I guess I'll just go with 
numberOfDocuments.
 * I'll put a nocommit there for the time being, and if no sims use it, I'll 
just remove it from the Stats. Terrier has it, though, so I guess there should 
be at least one method that depends on it.




[jira] [Commented] (LUCENE-3220) Implement various ranking models as Similarities

2011-06-20 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13052029#comment-13052029
 ] 

Robert Muir commented on LUCENE-3220:
-

bq. I'll put a nocommit there for the time being, and if no sims use it, I'll 
just remove it from the Stats. Terrier has it, though, so I guess there should 
be at least one method that depends on it.

I've never seen one that did... I don't imagine us ever implementing this 
efficiently given that we support incremental indexing.





[jira] [Commented] (LUCENE-3220) Implement various ranking models as Similarities

2011-06-20 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13052032#comment-13052032
 ] 

Robert Muir commented on LUCENE-3220:
-

Oh, a few more nitpicky comments: 
* Can you update the patch to use two spaces instead of tabs? If you use 
Eclipse, you can download this and configure it as your default codestyle: 
http://people.apache.org/~rmuir/Eclipse-Lucene-Codestyle.xml
* Can you also remove the @author? For legal reasons (I think actually for your 
protection!) we omit these from new files.
* It might be a good idea to also use the tag @lucene.experimental for new 
classes: this is a template that 'ant-javadocs' replaces with "WARNING: This 
API is experimental and might change in incompatible ways in the next release." 
to tell users that it's very new and not to expect precise backwards 
compatibility.
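
Put together, a new file following these three conventions might look roughly like the sketch below. The class itself is a hypothetical placeholder; only the two-space indentation, the missing @author tag, and the @lucene.experimental tag are the point.

```java
/**
 * Hypothetical placeholder class; only the Javadoc and formatting conventions matter here.
 *
 * @lucene.experimental
 */
final class ExampleSimilaritySketch {

  // Two-space indentation, no tabs, and no @author tag in the class comment.
  float score(float freq) {
    return freq;
  }

  public static void main(String[] args) {
    System.out.println(new ExampleSimilaritySketch().score(2.0f));
  }
}
```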





[jira] [Commented] (LUCENE-3220) Implement various ranking models as Similarities

2011-06-20 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13052052#comment-13052052
 ] 

Robert Muir commented on LUCENE-3220:
-

One last thing: can we do 'numberOfFieldTokens' instead of noFieldTokens? 

Then I think we can commit this as a step. It should make things a lot easier 
for experimentation; if you are new to Lucene it will make life much easier.

