[jira] Commented: (SOLR-1365) Add configurable Sweetspot Similarity factory

Hoss Man (JIRA) Wed, 17 Feb 2010 11:16:51 -0800

    [ 
https://issues.apache.org/jira/browse/SOLR-1365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12834938#action_12834938
 ]


Hoss Man commented on SOLR-1365:
--------------------------------

The constraints on what can be SolrCoreAware exist for two main reasons:

 # To keep initialization sane. One of the main reasons the SolrCoreAware
interface was needed in the first place was that some plugins wanted to use
the SolrCore to get access to other plugins during their own initialization,
but those other components weren't necessarily initialized yet. With the
inform(SolrCore) method, SolrCoreAware plugins know that all other components
have been initialized, but those components haven't necessarily been informed
about the SolrCore yet, so they might not be "ready" to deal with other plugins.
It's generally a big initialization tangle, so the fewer classes involved, the
better.
 # To prevent too much pollution of the SolrCore API. Having direct access to
the SolrCore is "a big deal": once you have a reference to the core, you can
get to pretty much anything, which opens us (i.e., the Solr maintainers) up to
a lot of crazy code paths to worry about. So the fewer plugin types we need to
consider when making changes to SolrCore, the better.
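
The ordering problem in the first point can be sketched as a two-phase loader:
every plugin is initialized from its own configuration first, and only then are
the opted-in plugins handed the core. All class and interface names here are
hypothetical stand-ins, not Solr's real plugin API. The sketch also shows the
caveat described above: when the first plugin is informed, its peer has been
initialized but not yet informed.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of two-phase plugin initialization; not Solr's actual loader.
public class TwoPhaseInit {

    /** Stand-in for the core: plugins can reach each other through it. */
    static class Core {
        final List<Plugin> plugins = new ArrayList<>();
    }

    /** Phase 1: every plugin is initialized from its own configuration only. */
    interface Plugin {
        void init(String config);
    }

    /** Phase 2: only plugins that opt in receive the core, after all init() calls. */
    interface CoreAware {
        void inform(Core core);
    }

    /** Records what state its peers were in at the moment it was informed. */
    static class Recorder implements Plugin, CoreAware {
        boolean initialized = false;
        boolean informed = false;
        int initializedPeers = -1;
        int informedPeers = -1;

        public void init(String config) {
            initialized = true;
        }

        public void inform(Core core) {
            initializedPeers = 0;
            informedPeers = 0;
            for (Plugin p : core.plugins) {
                if (p != this && p instanceof Recorder) {
                    Recorder peer = (Recorder) p;
                    if (peer.initialized) initializedPeers++;
                    if (peer.informed) informedPeers++;
                }
            }
            informed = true;
        }
    }

    static Core load(List<? extends Plugin> plugins) {
        Core core = new Core();
        for (Plugin p : plugins) {      // phase 1: init everything
            p.init("<plugin config>");
            core.plugins.add(p);
        }
        for (Plugin p : plugins) {      // phase 2: inform the CoreAware ones, in order
            if (p instanceof CoreAware) {
                ((CoreAware) p).inform(core);
            }
        }
        return core;
    }

    public static void main(String[] args) {
        Recorder a = new Recorder();
        Recorder b = new Recorder();
        load(List.of(a, b));
        // a sees b initialized but not yet informed; b sees a both initialized and informed.
        System.out.println("a: initialized=" + a.initializedPeers + " informed=" + a.informedPeers);
        System.out.println("b: initialized=" + b.initializedPeers + " informed=" + b.informedPeers);
    }
}
```

The point of the split is that init() never needs the core, so nothing can
observe a half-built peer during phase 1; phase 2 still has an ordering
dependency, which is why limiting the set of CoreAware plugin types helps.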

In the case of SimilarityFactory, I'm not entirely sure how I feel about making
it SolrCoreAware ... we have tried really, REALLY hard to make sure nothing
initialized as part of the IndexSchema can be SolrCoreAware, because that opens
up the possibility of plugin behavior being affected by SolrCore configuration,
which might be different between master and slave machines and could have
disastrous results. A schema.xml needs to be internally consistent regardless
of which solrconfig.xml references it.

In this case the real issue isn't that we have a use case where
SimilarityFactory _needs_ access to SolrCore; what it wants access to is the
IndexSchema, so it might make sense to provide access to that in some way
without exposing the entire SolrCore.
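
One way to read that suggestion is a narrower opt-in hook that hands a factory
a read-only view of the schema instead of the core. A minimal sketch follows;
SchemaView, SchemaAware, and DemoSimilarityFactory are hypothetical names
invented for illustration, not existing Solr types, and a plain map of field
name to type name stands in for the real IndexSchema (Map.copyOf needs Java 10+).

```java
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: SchemaView and SchemaAware are illustrative names, not Solr types.
public class NarrowSchemaAccess {

    /** Read-only slice of the schema; offers no path back to the core or its config. */
    interface SchemaView {
        Set<String> fieldNames();
        boolean hasField(String name);
    }

    /** A plugin opts in to receive only the schema view. */
    interface SchemaAware {
        void inform(SchemaView schema);
    }

    /** Builds a view over an immutable copy of field name -> field type name. */
    static SchemaView viewOf(Map<String, String> fieldTypes) {
        final Map<String, String> copy = Map.copyOf(fieldTypes);
        return new SchemaView() {
            public Set<String> fieldNames() { return copy.keySet(); }
            public boolean hasField(String name) { return copy.containsKey(name); }
        };
    }

    /** Example consumer: it can inspect fields, but cannot reach anything else. */
    static class DemoSimilarityFactory implements SchemaAware {
        boolean sawTitle = false;
        int fieldCount = 0;

        public void inform(SchemaView schema) {
            fieldCount = schema.fieldNames().size();
            sawTitle = schema.hasField("en_title");
        }
    }

    public static void main(String[] args) {
        DemoSimilarityFactory factory = new DemoSimilarityFactory();
        factory.inform(viewOf(Map.of("id", "string", "en_title", "text_en")));
        System.out.println(factory.fieldCount + " fields, en_title present: " + factory.sawTitle);
    }
}
```

Because the view is built only from schema data, a factory using it behaves
the same no matter which solrconfig.xml loads the schema.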

Practically speaking, after re-skimming the patch, I'm not even convinced that
would really add anything. Refactoring/reusing some of the *code* that
IndexSchema uses to manage dynamicFields might be handy for the
SweetSpotSimilarityFactory, but I don't actually see how being able to inspect
the IndexSchema to get the list of dynamicFields (or find out whether a field
is dynamic) would make it any better or easier to use. We'd still want people
to configure it with field names and field name globs directly, because there
won't necessarily be a one-to-one correspondence between which fields are
dynamic in the schema and how you want the sweetspots defined. You might have a
generic "en_*" dynamicField in your schema for English text and an "fr_*"
dynamicField for French text, but that doesn't mean the sweetspot for all
"fr_*" fields will be the same. You are just as likely to want some very
specific field names to have their own sweetspot, or to have the sweetspot be
suffix-based (i.e., "*_title" could have one sweetspot even if the resulting
field names are fr_title and en_title).
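
A sketch of the kind of name/glob lookup this example calls for, so that a
specific field name, a language prefix like "fr_*", and a suffix like
"*_title" can each carry their own sweetspot. The precedence rule (exact name
first, then the longest matching glob) and the single leading-or-trailing "*"
syntax are assumptions, modeled loosely on how dynamicField names look; the
class, method, and numeric values are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: class and method names are illustrative only.
// Assumed precedence: an exact field name wins; otherwise the longest matching glob wins,
// and ties keep the first glob in the map's iteration order.
public class SweetspotResolver {

    /** Globs may have one leading or one trailing '*', like dynamicField names. */
    static boolean matches(String glob, String field) {
        if (glob.startsWith("*")) {
            return field.endsWith(glob.substring(1));
        }
        if (glob.endsWith("*")) {
            return field.startsWith(glob.substring(0, glob.length() - 1));
        }
        return glob.equals(field);
    }

    /** Returns the value configured for field, or dflt when no name or glob matches. */
    static <T> T resolve(Map<String, T> byPattern, String field, T dflt) {
        if (byPattern.containsKey(field)) {
            return byPattern.get(field); // exact name beats any glob
        }
        String best = null;
        for (String glob : byPattern.keySet()) {
            boolean longer = best == null || glob.length() > best.length();
            if (glob.contains("*") && matches(glob, field) && longer) {
                best = glob;
            }
        }
        return best == null ? dflt : byPattern.get(best);
    }

    public static void main(String[] args) {
        // Hypothetical sweetspot "max" values keyed by field name or glob.
        Map<String, Integer> maxTerms = new LinkedHashMap<>();
        maxTerms.put("fr_*", 50);       // language-based glob
        maxTerms.put("*_title", 8);     // suffix-based glob
        maxTerms.put("en_title", 12);   // one specific field overrides everything
        for (String f : new String[] {"fr_title", "en_title", "fr_body", "de_body"}) {
            System.out.println(f + " -> " + resolve(maxTerms, f, 20));
        }
    }
}
```

With these values, fr_title picks up the "*_title" sweetspot even though it is
also covered by "fr_*", which is exactly the case where schema dynamicFields
and sweetspot definitions diverge.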

I think the patch could be improved, and I think there is definitely some code
reuse possibility for parsing the field name globs, but I don't know that it
really needs runtime access to the IndexSchema (and it definitely doesn't need
access to the SolrCore).
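
Glob parsing for this factory starts with splitting each configured key into a
base parameter and an optional field name or glob, following the
"<param>_<field>" convention in the patch's example config. The sketch
hard-codes three base names from that example; the class and method names are
hypothetical, and values are left as strings.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of parsing the "<param>" / "<param>_<fieldGlob>" naming scheme
// from the issue's schema.xml example; class and method names are illustrative only.
public class SweetspotParamParser {

    // Base parameter names taken from the example config.
    static final List<String> PARAMS = List.of(
            "lengthNormFactorsMin", "lengthNormFactorsMax", "lengthNormFactorsSteepness");

    /**
     * Groups raw settings as param -> (field glob, or "" for the default) -> value.
     * Known base names are matched as prefixes, so field globs may themselves
     * contain '_' (e.g. "supplierDescription_*"). Keys outside PARAMS are skipped.
     */
    static Map<String, Map<String, String>> parse(Map<String, String> raw) {
        Map<String, Map<String, String>> out = new TreeMap<>();
        for (Map.Entry<String, String> e : raw.entrySet()) {
            String key = e.getKey();
            for (String param : PARAMS) {
                String glob;
                if (key.equals(param)) {
                    glob = "";
                } else if (key.startsWith(param + "_")) {
                    glob = key.substring(param.length() + 1);
                } else {
                    continue;
                }
                out.computeIfAbsent(param, k -> new TreeMap<>()).put(glob, e.getValue());
                break;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, String> raw = Map.of(
                "lengthNormFactorsMin", "1",
                "lengthNormFactorsMin_description", "2",
                "lengthNormFactorsMin_supplierDescription_*", "2",
                "lengthNormFactorsSteepness_description", "0.2",
                "useHyperbolicTf", "true");
        System.out.println(parse(raw));
    }
}
```

The per-param maps this produces are the natural input for a name/glob lookup,
which is where reusing existing glob-matching code would pay off.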

> Add configurable Sweetspot Similarity factory
> ---------------------------------------------
>
>                 Key: SOLR-1365
>                 URL: https://issues.apache.org/jira/browse/SOLR-1365
>             Project: Solr
>          Issue Type: New Feature
>    Affects Versions: 1.3
>            Reporter: Kevin Osborn
>            Priority: Minor
>             Fix For: 1.5
>
>         Attachments: SOLR-1365.patch
>
>
> This is some code that I wrote a while back.
> Normally, if you use SweetSpotSimilarity, you are going to make it do 
> something useful by extending SweetSpotSimilarity. So, instead, I made a 
> factory class and a configurable SweetSpotSimilarity. There are two classes. 
> SweetSpotSimilarityFactory reads the parameters from schema.xml. It then 
> creates an instance of VariableSweetSpotSimilarity, which is my custom 
> SweetSpotSimilarity class. In addition to the standard functions, it also 
> handles dynamic fields.
> So, in schema.xml, you could have something like this:
> <similarity class="org.apache.solr.schema.SweetSpotSimilarityFactory">
>   <bool name="useHyperbolicTf">true</bool>
>   <float name="hyperbolicTfFactorsMin">1.0</float>
>   <float name="hyperbolicTfFactorsMax">1.5</float>
>   <float name="hyperbolicTfFactorsBase">1.3</float>
>   <float name="hyperbolicTfFactorsXOffset">2.0</float>
>   <int name="lengthNormFactorsMin">1</int>
>   <int name="lengthNormFactorsMax">1</int>
>   <float name="lengthNormFactorsSteepness">0.5</float>
>   <int name="lengthNormFactorsMin_description">2</int>
>   <int name="lengthNormFactorsMax_description">9</int>
>   <float name="lengthNormFactorsSteepness_description">0.2</float>
>   <int name="lengthNormFactorsMin_supplierDescription_*">2</int>
>   <int name="lengthNormFactorsMax_supplierDescription_*">7</int>
>   <float name="lengthNormFactorsSteepness_supplierDescription_*">0.4</float>
> </similarity>
> So, now everything is in a config file instead of having to create your own 
> subclass.
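
For reference, Lucene's SweetSpotSimilarity documents its length norm as
1/sqrt(steepness * (|x - min| + |x - max| - (max - min)) + 1), where x is the
number of terms in the field. This sketch evaluates that curve with the
per-field values from the example config above, showing that lengths inside
[min, max] keep a norm of 1.0 while shorter or longer fields are penalized at a
rate set by the steepness. The class name is hypothetical, and it uses doubles
rather than the float arithmetic in Lucene.

```java
// Hypothetical helper evaluating the length-norm curve documented for Lucene's
// SweetSpotSimilarity, using doubles instead of floats.
public class SweetSpotLengthNorm {

    /** 1 / sqrt(steepness * (|x - min| + |x - max| - (max - min)) + 1); 1.0 inside [min, max]. */
    static double lengthNorm(int numTerms, int min, int max, double steepness) {
        // Zero inside the plateau, grows by 2 per term of distance outside it.
        double outside = Math.abs(numTerms - min) + Math.abs(numTerms - max) - (max - min);
        return 1.0 / Math.sqrt(steepness * outside + 1.0);
    }

    public static void main(String[] args) {
        // "description" settings from the example: min=2, max=9, steepness=0.2
        for (int n : new int[] {1, 2, 5, 9, 14, 40}) {
            System.out.printf("description, %2d terms -> %.3f%n", n, lengthNorm(n, 2, 9, 0.2));
        }
        // Default settings from the example: min=1, max=1, steepness=0.5
        System.out.printf("default, 5 terms -> %.3f%n", lengthNorm(5, 1, 1, 0.5));
    }
}
```

Since min=max=1 in the defaults, every field without its own settings is
penalized for any length above one term, which is why the per-field
overrides matter.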

-- 
This message is automatically generated by JIRA.