[jira] [Updated] (LUCENE-3220) Implement various ranking models as Similarities
[ https://issues.apache.org/jira/browse/LUCENE-3220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Mark Nemeskey updated LUCENE-3220: Attachment: LUCENE-3220.patch Got rid of all but one nocommits. Implement various ranking models as Similarities Key: LUCENE-3220 URL: https://issues.apache.org/jira/browse/LUCENE-3220 Project: Lucene - Java Issue Type: Sub-task Components: core/query/scoring, core/search Affects Versions: flexscoring branch Reporter: David Mark Nemeskey Assignee: David Mark Nemeskey Labels: gsoc, gsoc2011 Attachments: LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch Original Estimate: 336h Remaining Estimate: 336h With [LUCENE-3174|https://issues.apache.org/jira/browse/LUCENE-3174] done, we can finally work on implementing the standard ranking models. Currently DFR, BM25 and LM are on the menu. Done: * {{EasyStats}}: contains all statistics that might be relevant for a ranking algorithm * {{EasySimilarity}}: the ancestor of all the other similarities. Hides the DocScorers and as much implementation detail as possible * _BM25_: the current mock implementation might be OK * _LM_ * _DFR_ * The so-called _Information-Based Models_ -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-3220) Implement various ranking models as Similarities
[ https://issues.apache.org/jira/browse/LUCENE-3220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Mark Nemeskey updated LUCENE-3220: Attachment: LUCENE-3220.patch Added discountOverlaps to EasySimilarity. Implement various ranking models as Similarities Key: LUCENE-3220 URL: https://issues.apache.org/jira/browse/LUCENE-3220 Project: Lucene - Java Issue Type: Sub-task Components: core/query/scoring, core/search Affects Versions: flexscoring branch Reporter: David Mark Nemeskey Assignee: David Mark Nemeskey Labels: gsoc, gsoc2011 Attachments: LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch Original Estimate: 336h Remaining Estimate: 336h With [LUCENE-3174|https://issues.apache.org/jira/browse/LUCENE-3174] done, we can finally work on implementing the standard ranking models. Currently DFR, BM25 and LM are on the menu. Done: * {{EasyStats}}: contains all statistics that might be relevant for a ranking algorithm * {{EasySimilarity}}: the ancestor of all the other similarities. Hides the DocScorers and as much implementation detail as possible * _BM25_: the current mock implementation might be OK * _LM_ * _DFR_ * The so-called _Information-Based Models_ -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-3220) Implement various ranking models as Similarities
[ https://issues.apache.org/jira/browse/LUCENE-3220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Mark Nemeskey updated LUCENE-3220: Attachment: LUCENE-3220.patch Done. Actually, I wanted to implement the norm table in the way you said, but somehow forgot about it. Two questions remain on my side: * the one about discountOverlaps (see above) * what kind of index-time boosts do people usually use? Too big a boost might cause problems if we just divide the length with it. Maybe we should take the logarithm or sth like that? Implement various ranking models as Similarities Key: LUCENE-3220 URL: https://issues.apache.org/jira/browse/LUCENE-3220 Project: Lucene - Java Issue Type: Sub-task Components: core/query/scoring, core/search Affects Versions: flexscoring branch Reporter: David Mark Nemeskey Assignee: David Mark Nemeskey Labels: gsoc, gsoc2011 Attachments: LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch Original Estimate: 336h Remaining Estimate: 336h With [LUCENE-3174|https://issues.apache.org/jira/browse/LUCENE-3174] done, we can finally work on implementing the standard ranking models. Currently DFR, BM25 and LM are on the menu. Done: * {{EasyStats}}: contains all statistics that might be relevant for a ranking algorithm * {{EasySimilarity}}: the ancestor of all the other similarities. Hides the DocScorers and as much implementation detail as possible * _BM25_: the current mock implementation might be OK * _LM_ * _DFR_ * The so-called _Information-Based Models_ -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-3220) Implement various ranking models as Similarities
[ https://issues.apache.org/jira/browse/LUCENE-3220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Mark Nemeskey updated LUCENE-3220: Attachment: LUCENE-3220.patch Added a short explanation on the parameter for the Jelinek-Mercer method. Implement various ranking models as Similarities Key: LUCENE-3220 URL: https://issues.apache.org/jira/browse/LUCENE-3220 Project: Lucene - Java Issue Type: Sub-task Components: core/query/scoring, core/search Affects Versions: flexscoring branch Reporter: David Mark Nemeskey Assignee: David Mark Nemeskey Labels: gsoc, gsoc2011 Attachments: LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch Original Estimate: 336h Remaining Estimate: 336h With [LUCENE-3174|https://issues.apache.org/jira/browse/LUCENE-3174] done, we can finally work on implementing the standard ranking models. Currently DFR, BM25 and LM are on the menu. Done: * {{EasyStats}}: contains all statistics that might be relevant for a ranking algorithm * {{EasySimilarity}}: the ancestor of all the other similarities. Hides the DocScorers and as much implementation detail as possible * _BM25_: the current mock implementation might be OK * _LM_ * _DFR_ * The so-called _Information-Based Models_ -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-3220) Implement various ranking models as Similarities
[ https://issues.apache.org/jira/browse/LUCENE-3220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Mark Nemeskey updated LUCENE-3220: Attachment: LUCENE-3220.patch Deleted the accidentally forgot abstract modifier from the Distribution classes. Implement various ranking models as Similarities Key: LUCENE-3220 URL: https://issues.apache.org/jira/browse/LUCENE-3220 Project: Lucene - Java Issue Type: Sub-task Components: core/query/scoring, core/search Affects Versions: flexscoring branch Reporter: David Mark Nemeskey Assignee: David Mark Nemeskey Labels: gsoc, gsoc2011 Attachments: LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch Original Estimate: 336h Remaining Estimate: 336h With [LUCENE-3174|https://issues.apache.org/jira/browse/LUCENE-3174] done, we can finally work on implementing the standard ranking models. Currently DFR, BM25 and LM are on the menu. Done: * {{EasyStats}}: contains all statistics that might be relevant for a ranking algorithm * {{EasySimilarity}}: the ancestor of all the other similarities. Hides the DocScorers and as much implementation detail as possible * _BM25_: the current mock implementation might be OK * _LM_ * _DFR_ * The so-called _Information-Based Models_ -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-3220) Implement various ranking models as Similarities
[ https://issues.apache.org/jira/browse/LUCENE-3220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Mark Nemeskey updated LUCENE-3220: Attachment: LUCENE-3220.patch Removed reflection from IBSimilarity. Implement various ranking models as Similarities Key: LUCENE-3220 URL: https://issues.apache.org/jira/browse/LUCENE-3220 Project: Lucene - Java Issue Type: Sub-task Components: core/query/scoring, core/search Affects Versions: flexscoring branch Reporter: David Mark Nemeskey Assignee: David Mark Nemeskey Labels: gsoc, gsoc2011 Attachments: LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch Original Estimate: 336h Remaining Estimate: 336h With [LUCENE-3174|https://issues.apache.org/jira/browse/LUCENE-3174] done, we can finally work on implementing the standard ranking models. Currently DFR, BM25 and LM are on the menu. Done: * {{EasyStats}}: contains all statistics that might be relevant for a ranking algorithm * {{EasySimilarity}}: the ancestor of all the other similarities. Hides the DocScorers and as much implementation detail as possible * _BM25_: the current mock implementation might be OK * _LM_ * _DFR_ * The so-called _Information-Based Models_ -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-3220) Implement various ranking models as Similarities
[ https://issues.apache.org/jira/browse/LUCENE-3220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Mark Nemeskey updated LUCENE-3220: Attachment: LUCENE-3220.patch Added norm decoding table to EasySimilarity, and removed sumTotalFreq. Sorry I could only upload this patch now but I didn't have time to work on Lucene the last week. As I see, all the problems you mentioned have been corrected, so maybe we can go on with the review? Implement various ranking models as Similarities Key: LUCENE-3220 URL: https://issues.apache.org/jira/browse/LUCENE-3220 Project: Lucene - Java Issue Type: Sub-task Components: core/search Affects Versions: flexscoring branch Reporter: David Mark Nemeskey Assignee: David Mark Nemeskey Labels: gsoc Attachments: LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch Original Estimate: 336h Remaining Estimate: 336h With [LUCENE-3174|https://issues.apache.org/jira/browse/LUCENE-3174] done, we can finally work on implementing the standard ranking models. Currently DFR, BM25 and LM are on the menu. Done: * {{EasyStats}}: contains all statistics that might be relevant for a ranking algorithm * {{EasySimilarity}}: the ancestor of all the other similarities. Hides the DocScorers and as much implementation detail as possible * _BM25_: the current mock implementation might be OK * _LM_ * _DFR_ * The so-called _Information-Based Models_ -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-3220) Implement various ranking models as Similarities
[ https://issues.apache.org/jira/browse/LUCENE-3220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Mark Nemeskey updated LUCENE-3220: Component/s: core/query/scoring Labels: gsoc gsoc2011 (was: gsoc) Implement various ranking models as Similarities Key: LUCENE-3220 URL: https://issues.apache.org/jira/browse/LUCENE-3220 Project: Lucene - Java Issue Type: Sub-task Components: core/query/scoring, core/search Affects Versions: flexscoring branch Reporter: David Mark Nemeskey Assignee: David Mark Nemeskey Labels: gsoc, gsoc2011 Attachments: LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch Original Estimate: 336h Remaining Estimate: 336h With [LUCENE-3174|https://issues.apache.org/jira/browse/LUCENE-3174] done, we can finally work on implementing the standard ranking models. Currently DFR, BM25 and LM are on the menu. Done: * {{EasyStats}}: contains all statistics that might be relevant for a ranking algorithm * {{EasySimilarity}}: the ancestor of all the other similarities. Hides the DocScorers and as much implementation detail as possible * _BM25_: the current mock implementation might be OK * _LM_ * _DFR_ * The so-called _Information-Based Models_ -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-3220) Implement various ranking models as Similarities
[ https://issues.apache.org/jira/browse/LUCENE-3220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Mark Nemeskey updated LUCENE-3220: Attachment: LUCENE-3220.patch EasySimilarity now computes norms in the same way as DefaultSimilarity. Actually not exactly the same way, as I have not yet added the discountOverlaps property. I think it would be a good idea for EasySimilarity as well (it is for phrases, right), what do you reckon? I also wrote a quick test to see which norm (length directly or 1/sqrt) is closer to the original value and it seems that the direct one is usually much closer (RMSE is 0.09689688608375747 vs 0.23787634482532286). Of course, I know it is much more important that the new Similarities can use existing indices. Implement various ranking models as Similarities Key: LUCENE-3220 URL: https://issues.apache.org/jira/browse/LUCENE-3220 Project: Lucene - Java Issue Type: Sub-task Components: core/query/scoring, core/search Affects Versions: flexscoring branch Reporter: David Mark Nemeskey Assignee: David Mark Nemeskey Labels: gsoc, gsoc2011 Attachments: LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch Original Estimate: 336h Remaining Estimate: 336h With [LUCENE-3174|https://issues.apache.org/jira/browse/LUCENE-3174] done, we can finally work on implementing the standard ranking models. Currently DFR, BM25 and LM are on the menu. Done: * {{EasyStats}}: contains all statistics that might be relevant for a ranking algorithm * {{EasySimilarity}}: the ancestor of all the other similarities. Hides the DocScorers and as much implementation detail as possible * _BM25_: the current mock implementation might be OK * _LM_ * _DFR_ * The so-called _Information-Based Models_ -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-3220) Implement various ranking models as Similarities
[ https://issues.apache.org/jira/browse/LUCENE-3220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Mark Nemeskey updated LUCENE-3220: Attachment: LUCENE-3220.patch Fixed two of the issues you mentioned: * Apache license header added to all files in the similarities package; * cleaned up the constructor of DFRSimilarity and added a few new ones. I have not yet moved the NoNormalization and NoAfterEffect classes to their own files, because I feel a bit uncomfortable about the naming, since it's different from that of the other classes, e.g. NormalizationH2 vs NoNormalization. Implement various ranking models as Similarities Key: LUCENE-3220 URL: https://issues.apache.org/jira/browse/LUCENE-3220 Project: Lucene - Java Issue Type: Sub-task Components: core/search Affects Versions: flexscoring branch Reporter: David Mark Nemeskey Assignee: David Mark Nemeskey Labels: gsoc Attachments: LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch Original Estimate: 336h Remaining Estimate: 336h With [LUCENE-3174|https://issues.apache.org/jira/browse/LUCENE-3174] done, we can finally work on implementing the standard ranking models. Currently DFR, BM25 and LM are on the menu. Done: * {{EasyStats}}: contains all statistics that might be relevant for a ranking algorithm * {{EasySimilarity}}: the ancestor of all the other similarities. Hides the DocScorers and as much implementation detail as possible * _BM25_: the current mock implementation might be OK * _LM_ * _DFR_ * The so-called _Information-Based Models_ -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-3220) Implement various ranking models as Similarities
[ https://issues.apache.org/jira/browse/LUCENE-3220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Mark Nemeskey updated LUCENE-3220: Attachment: LUCENE-3220.patch Made the score() and explain() methods in Similarity components final. Implement various ranking models as Similarities Key: LUCENE-3220 URL: https://issues.apache.org/jira/browse/LUCENE-3220 Project: Lucene - Java Issue Type: Sub-task Components: core/search Affects Versions: flexscoring branch Reporter: David Mark Nemeskey Assignee: David Mark Nemeskey Labels: gsoc Attachments: LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch Original Estimate: 336h Remaining Estimate: 336h With [LUCENE-3174|https://issues.apache.org/jira/browse/LUCENE-3174] done, we can finally work on implementing the standard ranking models. Currently DFR, BM25 and LM are on the menu. Done: * {{EasyStats}}: contains all statistics that might be relevant for a ranking algorithm * {{EasySimilarity}}: the ancestor of all the other similarities. Hides the DocScorers and as much implementation detail as possible * _BM25_: the current mock implementation might be OK * _LM_ * _DFR_ * The so-called _Information-Based Models_ -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-3220) Implement various ranking models as Similarities
[ https://issues.apache.org/jira/browse/LUCENE-3220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Mark Nemeskey updated LUCENE-3220: Attachment: LUCENE-3220.patch * log2() moved from DFRSimilarity to EasySimilarity, * changed DFRSimilarity so that it constructor does not use reflection. Implement various ranking models as Similarities Key: LUCENE-3220 URL: https://issues.apache.org/jira/browse/LUCENE-3220 Project: Lucene - Java Issue Type: Sub-task Components: core/search Affects Versions: flexscoring branch Reporter: David Mark Nemeskey Assignee: David Mark Nemeskey Labels: gsoc Attachments: LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch Original Estimate: 336h Remaining Estimate: 336h With [LUCENE-3174|https://issues.apache.org/jira/browse/LUCENE-3174] done, we can finally work on implementing the standard ranking models. Currently DFR, BM25 and LM are on the menu. TODO: * {{EasyStats}}: contains all statistics that might be relevant for a ranking algorithm * {{EasySimilarity}}: the ancestor of all the other similarities. Hides the DocScorers and as much implementation detail as possible * _BM25_: the current mock implementation might be OK * _LM_ * _DFR_ Done: -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-3220) Implement various ranking models as Similarities
[ https://issues.apache.org/jira/browse/LUCENE-3220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Mark Nemeskey updated LUCENE-3220: Attachment: LUCENE-3220.patch Fixed a few things in MockBM25Similarity. Implement various ranking models as Similarities Key: LUCENE-3220 URL: https://issues.apache.org/jira/browse/LUCENE-3220 Project: Lucene - Java Issue Type: Sub-task Components: core/search Affects Versions: flexscoring branch Reporter: David Mark Nemeskey Assignee: David Mark Nemeskey Labels: gsoc Attachments: LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch Original Estimate: 336h Remaining Estimate: 336h With [LUCENE-3174|https://issues.apache.org/jira/browse/LUCENE-3174] done, we can finally work on implementing the standard ranking models. Currently DFR, BM25 and LM are on the menu. TODO: * {{EasyStats}}: contains all statistics that might be relevant for a ranking algorithm * {{EasySimilarity}}: the ancestor of all the other similarities. Hides the DocScorers and as much implementation detail as possible * _BM25_: the current mock implementation might be OK * _LM_ * _DFR_ Done: -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-3220) Implement various ranking models as Similarities
[ https://issues.apache.org/jira/browse/LUCENE-3220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Mark Nemeskey updated LUCENE-3220: Attachment: LUCENE-3220.patch Information-based model framework due to Clinchant and Gaussier added. Implement various ranking models as Similarities Key: LUCENE-3220 URL: https://issues.apache.org/jira/browse/LUCENE-3220 Project: Lucene - Java Issue Type: Sub-task Components: core/search Affects Versions: flexscoring branch Reporter: David Mark Nemeskey Assignee: David Mark Nemeskey Labels: gsoc Attachments: LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch Original Estimate: 336h Remaining Estimate: 336h With [LUCENE-3174|https://issues.apache.org/jira/browse/LUCENE-3174] done, we can finally work on implementing the standard ranking models. Currently DFR, BM25 and LM are on the menu. TODO: * {{EasyStats}}: contains all statistics that might be relevant for a ranking algorithm * {{EasySimilarity}}: the ancestor of all the other similarities. Hides the DocScorers and as much implementation detail as possible * _BM25_: the current mock implementation might be OK * _LM_ * _DFR_ Done: -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-3220) Implement various ranking models as Similarities
[ https://issues.apache.org/jira/browse/LUCENE-3220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Mark Nemeskey updated LUCENE-3220: Attachment: LUCENE-3220.patch Made the signature of EasySimilarity.score() a bit saner. Implement various ranking models as Similarities Key: LUCENE-3220 URL: https://issues.apache.org/jira/browse/LUCENE-3220 Project: Lucene - Java Issue Type: Sub-task Components: core/search Affects Versions: flexscoring branch Reporter: David Mark Nemeskey Assignee: David Mark Nemeskey Labels: gsoc Attachments: LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch Original Estimate: 336h Remaining Estimate: 336h With [LUCENE-3174|https://issues.apache.org/jira/browse/LUCENE-3174] done, we can finally work on implementing the standard ranking models. Currently DFR, BM25 and LM are on the menu. TODO: * {{EasyStats}}: contains all statistics that might be relevant for a ranking algorithm * {{EasySimilarity}}: the ancestor of all the other similarities. Hides the DocScorers and as much implementation detail as possible * _BM25_: the current mock implementation might be OK * _LM_ * _DFR_ Done: -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-3220) Implement various ranking models as Similarities
[ https://issues.apache.org/jira/browse/LUCENE-3220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Mark Nemeskey updated LUCENE-3220: Attachment: LUCENE-3220.patch Implementation of the DFR framework added. Lots of nocommits, though. I things to think about: * lots of (float) conversions. Maybe the inner API (BasicModel, etc.) could use doubles? According to my experience, double is faster anyway, at least on 64bit architectures * I am not overly happy about the naming scheme, i.e. BasicModelBE, etc. Maybe we should do it the same way as in Terrier, with a basicmodel package and class names like BE? * A regular SimilarityProvider implementation won't play well with DFRSimilarity, in case the user wants to use several different setups. Actually, this is a problem for all similarities that have parameters (e.g. BM25 with b and k). Also, I think we need that NormConverter we talked earlier on irc, so that the Similarities can run on any index. Implement various ranking models as Similarities Key: LUCENE-3220 URL: https://issues.apache.org/jira/browse/LUCENE-3220 Project: Lucene - Java Issue Type: Sub-task Components: core/search Affects Versions: flexscoring branch Reporter: David Mark Nemeskey Assignee: David Mark Nemeskey Labels: gsoc Attachments: LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch Original Estimate: 336h Remaining Estimate: 336h With [LUCENE-3174|https://issues.apache.org/jira/browse/LUCENE-3174] done, we can finally work on implementing the standard ranking models. Currently DFR, BM25 and LM are on the menu. TODO: * {{EasyStats}}: contains all statistics that might be relevant for a ranking algorithm * {{EasySimilarity}}: the ancestor of all the other similarities. Hides the DocScorers and as much implementation detail as possible * _BM25_: the current mock implementation might be OK * _LM_ * _DFR_ Done: -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-3220) Implement various ranking models as Similarities
[ https://issues.apache.org/jira/browse/LUCENE-3220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Mark Nemeskey updated LUCENE-3220: Attachment: LUCENE-3220.patch EasySimilarity added. Lots of questions and nocommit in the code. Implement various ranking models as Similarities Key: LUCENE-3220 URL: https://issues.apache.org/jira/browse/LUCENE-3220 Project: Lucene - Java Issue Type: Sub-task Components: core/search Affects Versions: flexscoring branch Reporter: David Mark Nemeskey Assignee: David Mark Nemeskey Labels: gsoc Attachments: LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch Original Estimate: 336h Remaining Estimate: 336h With [LUCENE-3174|https://issues.apache.org/jira/browse/LUCENE-3174] done, we can finally work on implementing the standard ranking models. Currently DFR, BM25 and LM are on the menu. TODO: * {{EasyStats}}: contains all statistics that might be relevant for a ranking algorithm * {{EasySimilarity}}: the ancestor of all the other similarities. Hides the DocScorers and as much implementation detail as possible * _BM25_: the current mock implementation might be OK * _LM_ * _DFR_ Done: -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-3220) Implement various ranking models as Similarities
[ https://issues.apache.org/jira/browse/LUCENE-3220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Mark Nemeskey updated LUCENE-3220: Attachment: LUCENE-3220.patch Done. Implement various ranking models as Similarities Key: LUCENE-3220 URL: https://issues.apache.org/jira/browse/LUCENE-3220 Project: Lucene - Java Issue Type: Sub-task Components: core/search Affects Versions: flexscoring branch Reporter: David Mark Nemeskey Assignee: David Mark Nemeskey Labels: gsoc Attachments: LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch Original Estimate: 336h Remaining Estimate: 336h With [LUCENE-3174|https://issues.apache.org/jira/browse/LUCENE-3174] done, we can finally work on implementing the standard ranking models. Currently DFR, BM25 and LM are on the menu. TODO: * {{EasyStats}}: contains all statistics that might be relevant for a ranking algorithm * {{EasySimilarity}}: the ancestor of all the other similarities. Hides the DocScorers and as much implementation detail as possible * _BM25_: the current mock implementation might be OK * _LM_ * _DFR_ Done: -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-3220) Implement various ranking models as Similarities
[ https://issues.apache.org/jira/browse/LUCENE-3220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Mark Nemeskey updated LUCENE-3220: Attachment: (was: LUCENE-3220.patch) Implement various ranking models as Similarities Key: LUCENE-3220 URL: https://issues.apache.org/jira/browse/LUCENE-3220 Project: Lucene - Java Issue Type: Sub-task Components: core/search Affects Versions: flexscoring branch Reporter: David Mark Nemeskey Assignee: David Mark Nemeskey Labels: gsoc Attachments: LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch Original Estimate: 336h Remaining Estimate: 336h With [LUCENE-3174|https://issues.apache.org/jira/browse/LUCENE-3174] done, we can finally work on implementing the standard ranking models. Currently DFR, BM25 and LM are on the menu. TODO: * {{EasyStats}}: contains all statistics that might be relevant for a ranking algorithm * {{EasySimilarity}}: the ancestor of all the other similarities. Hides the DocScorers and as much implementation detail as possible * _BM25_: the current mock implementation might be OK * _LM_ * _DFR_ Done: -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-3220) Implement various ranking models as Similarities
[ https://issues.apache.org/jira/browse/LUCENE-3220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Mark Nemeskey updated LUCENE-3220: Attachment: LUCENE-3220.patch Done. Implement various ranking models as Similarities Key: LUCENE-3220 URL: https://issues.apache.org/jira/browse/LUCENE-3220 Project: Lucene - Java Issue Type: Sub-task Components: core/search Affects Versions: flexscoring branch Reporter: David Mark Nemeskey Assignee: David Mark Nemeskey Labels: gsoc Attachments: LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch Original Estimate: 336h Remaining Estimate: 336h With [LUCENE-3174|https://issues.apache.org/jira/browse/LUCENE-3174] done, we can finally work on implementing the standard ranking models. Currently DFR, BM25 and LM are on the menu. TODO: * {{EasyStats}}: contains all statistics that might be relevant for a ranking algorithm * {{EasySimilarity}}: the ancestor of all the other similarities. Hides the DocScorers and as much implementation detail as possible * _BM25_: the current mock implementation might be OK * _LM_ * _DFR_ Done: -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-3220) Implement various ranking models as Similarities
[ https://issues.apache.org/jira/browse/LUCENE-3220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Mark Nemeskey updated LUCENE-3220: Comment: was deleted (was: Done.) Implement various ranking models as Similarities Key: LUCENE-3220 URL: https://issues.apache.org/jira/browse/LUCENE-3220 Project: Lucene - Java Issue Type: Sub-task Components: core/search Affects Versions: flexscoring branch Reporter: David Mark Nemeskey Assignee: David Mark Nemeskey Labels: gsoc Attachments: LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch Original Estimate: 336h Remaining Estimate: 336h With [LUCENE-3174|https://issues.apache.org/jira/browse/LUCENE-3174] done, we can finally work on implementing the standard ranking models. Currently DFR, BM25 and LM are on the menu. TODO: * {{EasyStats}}: contains all statistics that might be relevant for a ranking algorithm * {{EasySimilarity}}: the ancestor of all the other similarities. Hides the DocScorers and as much implementation detail as possible * _BM25_: the current mock implementation might be OK * _LM_ * _DFR_ Done: -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-3220) Implement various ranking models as Similarities
[ https://issues.apache.org/jira/browse/LUCENE-3220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Mark Nemeskey updated LUCENE-3220: Description: With [LUCENE-3174|https://issues.apache.org/jira/browse/LUCENE-3174] done, we can finally work on implementing the standard ranking models. Currently DFR, BM25 and LM are on the menu. TODO: * {{EasyStats}}: contains all statistics that might be relevant for a ranking algorithm * {{EasySimilarity}}: the ancestor of all the other similarities. Hides the DocScorers and as much implementation detail as possible * _BM25_: the current mock implementation might be OK * _LM_ * _DFR_ Done: was: With [LUCENE-3174|https://issues.apache.org/jira/browse/LUCENE-3174] done, we can finally work on implementing the standard ranking models. Currently DFR, BM25 and LM are on the menu. TODO: * `EasyStats`: contains all statistics that might be relevant for a ranking algorithm * `EasySimilarity`: the ancestor of all the other similarities. Hides the DocScorers and as much implementation detail as possible * _BM25_: the current mock implementation might be OK * _LM_ * _DFR_ Done: Implement various ranking models as Similarities Key: LUCENE-3220 URL: https://issues.apache.org/jira/browse/LUCENE-3220 Project: Lucene - Java Issue Type: New Feature Components: core/search Affects Versions: flexscoring branch Reporter: David Mark Nemeskey Assignee: David Mark Nemeskey Labels: gsoc Original Estimate: 336h Remaining Estimate: 336h With [LUCENE-3174|https://issues.apache.org/jira/browse/LUCENE-3174] done, we can finally work on implementing the standard ranking models. Currently DFR, BM25 and LM are on the menu. TODO: * {{EasyStats}}: contains all statistics that might be relevant for a ranking algorithm * {{EasySimilarity}}: the ancestor of all the other similarities. Hides the DocScorers and as much implementation detail as possible * _BM25_: the current mock implementation might be OK * _LM_ * _DFR_ Done: -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-3220) Implement various ranking models as Similarities
[ https://issues.apache.org/jira/browse/LUCENE-3220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Mark Nemeskey updated LUCENE-3220: Issue Type: Sub-task (was: New Feature) Parent: LUCENE-2959 Implement various ranking models as Similarities Key: LUCENE-3220 URL: https://issues.apache.org/jira/browse/LUCENE-3220 Project: Lucene - Java Issue Type: Sub-task Components: core/search Affects Versions: flexscoring branch Reporter: David Mark Nemeskey Assignee: David Mark Nemeskey Labels: gsoc Original Estimate: 336h Remaining Estimate: 336h With [LUCENE-3174|https://issues.apache.org/jira/browse/LUCENE-3174] done, we can finally work on implementing the standard ranking models. Currently DFR, BM25 and LM are on the menu. TODO: * {{EasyStats}}: contains all statistics that might be relevant for a ranking algorithm * {{EasySimilarity}}: the ancestor of all the other similarities. Hides the DocScorers and as much implementation detail as possible * _BM25_: the current mock implementation might be OK * _LM_ * _DFR_ Done: -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-3220) Implement various ranking models as Similarities
[ https://issues.apache.org/jira/browse/LUCENE-3220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Mark Nemeskey updated LUCENE-3220: Attachment: LUCENE-3220.patch EasyStats object added. Implement various ranking models as Similarities Key: LUCENE-3220 URL: https://issues.apache.org/jira/browse/LUCENE-3220 Project: Lucene - Java Issue Type: Sub-task Components: core/search Affects Versions: flexscoring branch Reporter: David Mark Nemeskey Assignee: David Mark Nemeskey Labels: gsoc Attachments: LUCENE-3220.patch Original Estimate: 336h Remaining Estimate: 336h With [LUCENE-3174|https://issues.apache.org/jira/browse/LUCENE-3174] done, we can finally work on implementing the standard ranking models. Currently DFR, BM25 and LM are on the menu. TODO: * {{EasyStats}}: contains all statistics that might be relevant for a ranking algorithm * {{EasySimilarity}}: the ancestor of all the other similarities. Hides the DocScorers and as much implementation detail as possible * _BM25_: the current mock implementation might be OK * _LM_ * _DFR_ Done: -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-3220) Implement various ranking models as Similarities
[ https://issues.apache.org/jira/browse/LUCENE-3220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Mark Nemeskey updated LUCENE-3220: Attachment: LUCENE-3220.patch Implement various ranking models as Similarities Key: LUCENE-3220 URL: https://issues.apache.org/jira/browse/LUCENE-3220 Project: Lucene - Java Issue Type: Sub-task Components: core/search Affects Versions: flexscoring branch Reporter: David Mark Nemeskey Assignee: David Mark Nemeskey Labels: gsoc Attachments: LUCENE-3220.patch, LUCENE-3220.patch Original Estimate: 336h Remaining Estimate: 336h With [LUCENE-3174|https://issues.apache.org/jira/browse/LUCENE-3174] done, we can finally work on implementing the standard ranking models. Currently DFR, BM25 and LM are on the menu. TODO: * {{EasyStats}}: contains all statistics that might be relevant for a ranking algorithm * {{EasySimilarity}}: the ancestor of all the other similarities. Hides the DocScorers and as much implementation detail as possible * _BM25_: the current mock implementation might be OK * _LM_ * _DFR_ Done: -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-3220) Implement various ranking models as Similarities
[ https://issues.apache.org/jira/browse/LUCENE-3220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Mark Nemeskey updated LUCENE-3220: Attachment: LUCENE-3220.patch Oh, sorry, how lame of me :( Actually I am working now on a different machine than the one I usually do, so that's why I made those mistakes. Anyhow, I have fixed them. Implement various ranking models as Similarities Key: LUCENE-3220 URL: https://issues.apache.org/jira/browse/LUCENE-3220 Project: Lucene - Java Issue Type: Sub-task Components: core/search Affects Versions: flexscoring branch Reporter: David Mark Nemeskey Assignee: David Mark Nemeskey Labels: gsoc Attachments: LUCENE-3220.patch, LUCENE-3220.patch, LUCENE-3220.patch Original Estimate: 336h Remaining Estimate: 336h With [LUCENE-3174|https://issues.apache.org/jira/browse/LUCENE-3174] done, we can finally work on implementing the standard ranking models. Currently DFR, BM25 and LM are on the menu. TODO: * {{EasyStats}}: contains all statistics that might be relevant for a ranking algorithm * {{EasySimilarity}}: the ancestor of all the other similarities. Hides the DocScorers and as much implementation detail as possible * _BM25_: the current mock implementation might be OK * _LM_ * _DFR_ Done: -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org