[jira] [Commented] (SOLR-2754) create Solr similarity factories for new ranking algorithms

2011-09-18 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13107489#comment-13107489
 ] 

Robert Muir commented on SOLR-2754:
---

Thanks David, I took your patch and I'm adding tests right now, will upload a 
new patch soon.

 create Solr similarity factories for new ranking algorithms
 ---

 Key: SOLR-2754
 URL: https://issues.apache.org/jira/browse/SOLR-2754
 Project: Solr
  Issue Type: New Feature
Affects Versions: 4.0
Reporter: Robert Muir
Assignee: Robert Muir
 Attachments: SOLR-2754.patch, SOLR-2754.patch


 To make it easy to use some of the new ranking algorithms, we should add 
 factories to solr:
 * for parametric models like LM and BM25 so that parameters can be set from 
 schema.xml
 * for framework models like DFR and IB, so that different basic 
 models/normalizations/lambdas can be chosen

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-2754) create Solr similarity factories for new ranking algorithms

2011-09-17 Thread David Mark Nemeskey (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13107069#comment-13107069
 ] 

David Mark Nemeskey commented on SOLR-2754:
---

bq. Well, we can do both: we can provide these basic parameters as default 
values to be friendly, but at the same time in the test or example xml 
configurations that use these, our examples can have the parameters set.

That's a good idea. I could modify the patch if you want me to, and also break 
the long lines in two while I'm at it.




[jira] [Commented] (SOLR-2754) create Solr similarity factories for new ranking algorithms

2011-09-17 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13107116#comment-13107116
 ] 

Robert Muir commented on SOLR-2754:
---

+1




[jira] [Commented] (SOLR-2754) create Solr similarity factories for new ranking algorithms

2011-09-16 Thread David Mark Nemeskey (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13106683#comment-13106683
 ] 

David Mark Nemeskey commented on SOLR-2754:
---

Robert, I've reviewed the patch. Even though I don't have any experience with 
Solr, the code is very clear, well documented and easy to understand. I have 
the following observations (or questions, more like):

1. {{LMDirichletSimilarity}} has a mu-less constructor. Maybe we could avoid 
defining a constant in two places if we used that? E.g.
{code}
// assuming params is a SolrParams: getFloat(String) returns null if "mu" is not set
Float mu = params.getFloat("mu");
...

LMDirichletSimilarity sim = (mu != null) ? new LMDirichletSimilarity(mu)
                                         : new LMDirichletSimilarity();
{code}
Same goes for H3 and Z.

2. I think it is a nice feature of the new framework that the user can create 
new basic models, normalizations, distributions, etc. and just plug them in to 
{{DFRSimilarity}} or {{IBSimilarity}}. However, these factories can only handle 
those that we have defined ourselves. Wouldn't it be good if we could 
instantiate custom classes via reflection? It could work similarly to 
Terrier: keep the current code for core models, and use reflection if the user 
specifies a fully qualified class name.

3. I don't know the Lucene/Solr conventions for line length. There are some 
rather long lines in IB and DFR, but maybe it's not a problem?
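
A minimal, self-contained sketch of point 1 above (optional parameter, falling back to the no-arg constructor so the default is defined in one place only). The classes {{DirichletSketch}} and {{DirichletSketchFactory}} are hypothetical stand-ins, not the real Lucene or Solr classes, and the default value of 2000 is just an assumed placeholder:

```java
import java.util.Map;

// Hypothetical stand-in for a parametric similarity; NOT the real Lucene class.
final class DirichletSketch {
  private final float mu;

  // The default lives here and only here (2000 is an assumed placeholder value).
  DirichletSketch() {
    this(2000f);
  }

  DirichletSketch(float mu) {
    this.mu = mu;
  }

  float getMu() {
    return mu;
  }
}

// Hypothetical factory: reads an optional "mu" from string parameters.
final class DirichletSketchFactory {
  static DirichletSketch create(Map<String, String> params) {
    String raw = params.get("mu");
    Float mu = (raw == null) ? null : Float.valueOf(raw);
    // No constant is repeated in the factory: if mu is absent, defer to the no-arg constructor.
    return (mu != null) ? new DirichletSketch(mu) : new DirichletSketch();
  }
}
```

Calling {{DirichletSketchFactory.create}} with an empty map yields the constructor's default, and with a "mu" entry yields the configured value, so the factory never needs to know what the default is.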




[jira] [Commented] (SOLR-2754) create Solr similarity factories for new ranking algorithms

2011-09-16 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13106707#comment-13106707
 ] 

Robert Muir commented on SOLR-2754:
---

Thanks for the review David!

{quote}
LMDirichletSimilarity has a mu-less constructor. Maybe we could avoid defining 
a constant in two places if we used that? E.g.

Same goes for H3 and Z.
{quote}

+1, I think it would be good to do it this way (though duplicating the 
constants is probably unavoidable for e.g. BM25's (k1, b)) if we want to 
provide default parameters.

Alternatively, another idea would be for all 'parametric' models to require the 
parameter? Then in the still-to-be-written test config that tests all these 
sims, we would just have good default parameters specified? Part of me likes 
this solution: if you are using a parametric model then it requires you to 
think about it?

{quote}
Wouldn't it be good if we could instantiate custom classes via reflection?
{quote}

We could add this, e.g. if the parameter doesn't match any of the supplied 
names. But I started thinking about this: say I created NormalizationRob, 
and it wants a bunch of parameters... At the end of the day, for practical 
purposes a user could just make their own simple factory that uses 
I(F)BRob(2.3, 4.5, 6.99) or whatever they wanted, because I think the intent 
here is to support all of lucene-core's capabilities? 
It's still pluggable in the sense that someone can always make their own 
factory for their custom stuff.
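
For illustration, a rough sketch of the "known names first, reflection as fallback" lookup being discussed. Everything here is a hypothetical stand-in ({{NormalizationSketch}}, {{NormalizationResolver}} and the two sample classes are not Lucene or Solr types), and it only handles classes with a no-arg constructor, which is exactly the limitation raised above:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical stand-in for a pluggable model component.
interface NormalizationSketch {
  String name();
}

final class NormalizationH1Sketch implements NormalizationSketch {
  public String name() { return "H1"; }
}

final class NormalizationH2Sketch implements NormalizationSketch {
  public String name() { return "H2"; }
}

final class NormalizationResolver {
  // Short names for the "core" components the factory knows about.
  private static final Map<String, Supplier<NormalizationSketch>> CORE = new HashMap<>();
  static {
    CORE.put("H1", NormalizationH1Sketch::new);
    CORE.put("H2", NormalizationH2Sketch::new);
  }

  static NormalizationSketch resolve(String nameOrClass) {
    Supplier<NormalizationSketch> core = CORE.get(nameOrClass);
    if (core != null) {
      return core.get();
    }
    // Fallback: treat the value as a class name and call its no-arg constructor.
    try {
      return Class.forName(nameOrClass)
          .asSubclass(NormalizationSketch.class)
          .getDeclaredConstructor()
          .newInstance();
    } catch (ReflectiveOperationException e) {
      throw new IllegalArgumentException("Cannot instantiate normalization: " + nameOrClass, e);
    }
  }
}
```

A component that needs constructor arguments cannot be created this way without some extra convention for passing them, which is why a small custom factory may be the simpler route for such cases.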

{quote}
I don't know the Lucene/Solr conventions for line length. There are some rather 
long lines in IB and DFR, but maybe it's not a problem?
{quote}

Yeah maybe especially for the javadocs, but we can probably re-arrange these.





[jira] [Commented] (SOLR-2754) create Solr similarity factories for new ranking algorithms

2011-09-16 Thread David Mark Nemeskey (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13106795#comment-13106795
 ] 

David Mark Nemeskey commented on SOLR-2754:
---

bq. Alternatively, another idea would be for all 'parametric' models to require 
the parameter? ... Part of me likes this solution: if you are using a 
parametric model then it requires you to think about it?

I can understand the reasoning behind this idea. On the other hand, for some 
models, the parameter has a value that's optimal in a wide range of cases. In 
such cases, I think we could make the life of the user easier by falling 
back to this value. (Actually, that's why {{LMJelinekMercerSimilarity}} does 
not have a default constructor; there is no single parameter value that is 
kind-of-optimal in all cases).

bq. But I started thinking about this, say I created NormalizationRob, and it 
wants a bunch of parameters...

Yes, I know, it'd be a bit difficult to support that... maybe if all 
Similarities and models had a constructor with a map as a parameter? I'm not 
sure we want that, though.

bq. I think the intent here is to support all of lucene-core's capabilities?

In that case let's forget reflection for now.




[jira] [Commented] (SOLR-2754) create Solr similarity factories for new ranking algorithms

2011-09-16 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13106924#comment-13106924
 ] 

Robert Muir commented on SOLR-2754:
---

{quote}
I can understand the reasoning behind this idea. On the other hand, for some 
models, the parameter has a value that's optimal in a wide range of cases. In 
such cases, I think we could make the life of the user easier by falling 
back to this value. (Actually, that's why LMJelinekMercerSimilarity does not 
have a default constructor; there is no single parameter value that is 
kind-of-optimal in all cases).
{quote}

Well, we can do both: we can provide these basic parameters as default values 
to be friendly, but at the same time, in the test or example xml configurations 
that use these, our examples can have the parameters set. Even in the 
JelinekMercer case, our example can also be set to 0.7, because that's the 
default for long queries, and you typically don't use this smoothing for short 
queries (you would usually use Dirichlet instead). At least that was my 
reasoning with the default.

{quote}
Yes, I know, it'd be a bit difficult to support that... maybe if all 
Similarities and models had a constructor with a map as a parameter? I'm not 
sure we want that, though.
{quote}

Yeah, I think we want to have hard type-safe APIs for the sims themselves. Part 
of my line of thinking is that plugging a custom normalization into DFR is a 
pretty expert case for a Solr user at this moment; if you are that expert, you 
could also write a 3 LOC sim factory that sets up your sim with your custom 
normalization method.
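
Roughly what such a small factory could look like, sketched with hypothetical stand-in types so that it compiles on its own. {{SimFactorySketch}}, {{DfrSketch}} and {{NormSketch}} are not the real Solr/Lucene classes, and the formula inside {{NormalizationRob}} is made up purely for illustration:

```java
// Hypothetical stand-ins; a real factory would extend Solr's factory base class instead.
interface NormSketch {
  float tfn(float tf, float docLen, float avgDocLen);
}

final class DfrSketch {
  private final NormSketch normalization;

  DfrSketch(NormSketch normalization) {
    this.normalization = normalization;
  }

  float normalizedTf(float tf, float docLen, float avgDocLen) {
    return normalization.tfn(tf, docLen, avgDocLen);
  }
}

abstract class SimFactorySketch {
  abstract DfrSketch getSimilarity();
}

// The user's custom, type-safe normalization with its own parameter (invented formula).
final class NormalizationRob implements NormSketch {
  private final float c;

  NormalizationRob(float c) {
    this.c = c;
  }

  public float tfn(float tf, float docLen, float avgDocLen) {
    return tf * c * avgDocLen / docLen;
  }
}

// The "3 LOC" part: the factory only wires the custom piece into the sim.
final class RobSimilarityFactory extends SimFactorySketch {
  DfrSketch getSimilarity() {
    return new DfrSketch(new NormalizationRob(2.3f));
  }
}
```

The custom class keeps an ordinary typed constructor, and the only glue code is the one-line {{getSimilarity()}} body.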

