[jira] [Commented] (SOLR-2754) create Solr similarity factories for new ranking algorithms
[ https://issues.apache.org/jira/browse/SOLR-2754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13107489#comment-13107489 ] Robert Muir commented on SOLR-2754: --- Thanks David, I took your patch and I'm adding tests right now, will upload a new patch soon. create Solr similarity factories for new ranking algorithms --- Key: SOLR-2754 URL: https://issues.apache.org/jira/browse/SOLR-2754 Project: Solr Issue Type: New Feature Affects Versions: 4.0 Reporter: Robert Muir Assignee: Robert Muir Attachments: SOLR-2754.patch, SOLR-2754.patch To make it easy to use some of the new ranking algorithms, we should add factories to solr: * for parametric models like LM and BM25 so that parameters can be set from schema.xml * for framework models like DFR and IB, so that different basic models/normalizations/lambdas can be chosen -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-2754) create Solr similarity factories for new ranking algorithms
[ https://issues.apache.org/jira/browse/SOLR-2754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13107069#comment-13107069 ] David Mark Nemeskey commented on SOLR-2754: --- bq. Well, we can do both: we can provide these basic parameters as default values to be friendly, but at the same time in the test or example xml configurations that use these, our examples can have the parameters set. That's a good idea. I could modify the patch if you want to, and also break the long lines into two in the meantime. create Solr similarity factories for new ranking algorithms --- Key: SOLR-2754 URL: https://issues.apache.org/jira/browse/SOLR-2754 Project: Solr Issue Type: New Feature Affects Versions: 4.0 Reporter: Robert Muir Assignee: Robert Muir Attachments: SOLR-2754.patch To make it easy to use some of the new ranking algorithms, we should add factories to solr: * for parametric models like LM and BM25 so that parameters can be set from schema.xml * for framework models like DFR and IB, so that different basic models/normalizations/lambdas can be chosen -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-2754) create Solr similarity factories for new ranking algorithms
[ https://issues.apache.org/jira/browse/SOLR-2754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13107116#comment-13107116 ] Robert Muir commented on SOLR-2754: --- +1 create Solr similarity factories for new ranking algorithms --- Key: SOLR-2754 URL: https://issues.apache.org/jira/browse/SOLR-2754 Project: Solr Issue Type: New Feature Affects Versions: 4.0 Reporter: Robert Muir Assignee: Robert Muir Attachments: SOLR-2754.patch To make it easy to use some of the new ranking algorithms, we should add factories to solr: * for parametric models like LM and BM25 so that parameters can be set from schema.xml * for framework models like DFR and IB, so that different basic models/normalizations/lambdas can be chosen -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-2754) create Solr similarity factories for new ranking algorithms
[ https://issues.apache.org/jira/browse/SOLR-2754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13106683#comment-13106683 ] David Mark Nemeskey commented on SOLR-2754: --- Robert, I've reviewed the patch. Even though I don't have any experience with Solr, the code is very clear, well documented and easy to understand. I have the following observations (or questions, more like): 1. {{LMDirichletSimilarity}} has a mu-less constructor. Maybe we could avoid defining a constant in two places if we used that? E.g. {code} mu = params.getFloat(mu); ... LMDirichletSimilarity sim = (mu != null) ? new LMDirichletSimilarity(mu) : new LMDirichletSimilarity(); {code} Same goes for H3 and Z. 2. I think it is a nice feature of the new framework that the user can create new basic models, normalizations, distributions, etc. and just plug them in to {{DFRSimilarity}} or {{IBSimilarity}}. However, these factories can only handle those that we have defined ourselves. Wouldn't it be good if we could instantiate custom classes via reflection? It could work similarily as in Terrier: keep the current code for core models, and use reflection if the user specifies a (fully specified) classname. 3. I don't know the Lucene/Solr conventions for line length. There are some rather long lines in IB and DFR, but maybe its not a problem? create Solr similarity factories for new ranking algorithms --- Key: SOLR-2754 URL: https://issues.apache.org/jira/browse/SOLR-2754 Project: Solr Issue Type: New Feature Affects Versions: 4.0 Reporter: Robert Muir Assignee: Robert Muir Attachments: SOLR-2754.patch To make it easy to use some of the new ranking algorithms, we should add factories to solr: * for parametric models like LM and BM25 so that parameters can be set from schema.xml * for framework models like DFR and IB, so that different basic models/normalizations/lambdas can be chosen -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-2754) create Solr similarity factories for new ranking algorithms
[ https://issues.apache.org/jira/browse/SOLR-2754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13106707#comment-13106707 ] Robert Muir commented on SOLR-2754: --- Thanks for the review David! {quote} LMDirichletSimilarity has a mu-less constructor. Maybe we could avoid defining a constant in two places if we used that? E.g. Same goes for H3 and Z. {quote} +1, I think it would be good (though probably unavoidable for e.g. BM25's (k1, b) to do it this way if we want to provide default parameters. Alternative, another idea would be for all 'parametric' models to require the parameter? Then in the still-to-be-written test config that tests all these sims, we would just have good default parameters specified? Part of me likes this solution: if you are using a parametric model then it requires you to think about it? {quote} Wouldn't it be good if we could instantiate custom classes via reflection? {quote} We could add this, e.g. if the parameter doesnt match any of the supplied names. But i started thinking about this, say I created NormalizationRob, and it wants a bunch of parameters... at the end of the day for practical purposes a user could just make their own simple factory that uses I(F)BRob(2.3, 4.5, 6.99) or whatever they wanted, because I think the intent here is to support all of lucene-core's capabilities? Its still pluggable in the sense someone can always make their own factory for their custom stuff. {quote} I don't know the Lucene/Solr conventions for line length. There are some rather long lines in IB and DFR, but maybe its not a problem? {quote} Yeah maybe especially for the javadocs, but we can probably re-arrange these. create Solr similarity factories for new ranking algorithms --- Key: SOLR-2754 URL: https://issues.apache.org/jira/browse/SOLR-2754 Project: Solr Issue Type: New Feature Affects Versions: 4.0 Reporter: Robert Muir Assignee: Robert Muir Attachments: SOLR-2754.patch To make it easy to use some of the new ranking algorithms, we should add factories to solr: * for parametric models like LM and BM25 so that parameters can be set from schema.xml * for framework models like DFR and IB, so that different basic models/normalizations/lambdas can be chosen -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-2754) create Solr similarity factories for new ranking algorithms
[ https://issues.apache.org/jira/browse/SOLR-2754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13106795#comment-13106795 ] David Mark Nemeskey commented on SOLR-2754: --- bq. Alternative, another idea would be for all 'parametric' models to require the parameter? ... Part of me likes this solution: if you are using a parametric model then it requires you to think about it? I can understand the reasoning behind this idea. On the other hand, for some models, the parameter has a value that's optimal in a wide range of cases. In such cases, I think it we could make the life of the user easier by falling back to this value. (Actually, that's why {{LMJelinekMercerSimilarity}} does not have a default constructor; there is no single parameter value that is kind-of-optimal in all cases). bq. But i started thinking about this, say I created NormalizationRob, and it wants a bunch of parameters... Yes, I know, it'd be a bit difficult to support that... maybe if all Similarities and models had a constructor with a map as a parameter? I'm not sure we want that, though. bq. I think the intent here is to support all of lucene-core's capabilities? In that case let's forget reflection for now. create Solr similarity factories for new ranking algorithms --- Key: SOLR-2754 URL: https://issues.apache.org/jira/browse/SOLR-2754 Project: Solr Issue Type: New Feature Affects Versions: 4.0 Reporter: Robert Muir Assignee: Robert Muir Attachments: SOLR-2754.patch To make it easy to use some of the new ranking algorithms, we should add factories to solr: * for parametric models like LM and BM25 so that parameters can be set from schema.xml * for framework models like DFR and IB, so that different basic models/normalizations/lambdas can be chosen -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-2754) create Solr similarity factories for new ranking algorithms
[ https://issues.apache.org/jira/browse/SOLR-2754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13106924#comment-13106924 ] Robert Muir commented on SOLR-2754: --- {quote} I can understand the reasoning behind this idea. On the other hand, for some models, the parameter has a value that's optimal in a wide range of cases. In such cases, I think it we could make the life of the user easier by falling back to this value. (Actually, that's why LMJelinekMercerSimilarity does not have a default constructor; there is no single parameter value that is kind-of-optimal in all cases). {quote} Well, we can do both: we can provide these basic parameters as default values to be friendly, but at the same time in the test or example xml configurations that use these, our examples can have the parameters set. Even in the JelinekMercer case, our example can also be set to 0.7, because thats the default for long queries and you typically don't use this smoothing for short queries (you would usually use Dirichlet instead), at least that was my reasoning with the default. {quote} Yes, I know, it'd be a bit difficult to support that... maybe if all Similarities and models had a constructor with a map as a parameter? I'm not sure we want that, though. {quote} Yeah, I think we want to have hard type-safe apis for the sims themselves, and part of my line of thinking is the case of I'm going to plug in a custom normalization into DFR is a pretty expert case for a Solr user at this moment, if you are that expert you could also write a 3 LOC sim factory that sets up your sim with your custom normalization method. create Solr similarity factories for new ranking algorithms --- Key: SOLR-2754 URL: https://issues.apache.org/jira/browse/SOLR-2754 Project: Solr Issue Type: New Feature Affects Versions: 4.0 Reporter: Robert Muir Assignee: Robert Muir Attachments: SOLR-2754.patch To make it easy to use some of the new ranking algorithms, we should add factories to solr: * for parametric models like LM and BM25 so that parameters can be set from schema.xml * for framework models like DFR and IB, so that different basic models/normalizations/lambdas can be chosen -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org