[
https://issues.apache.org/jira/browse/SOLR-1365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12834938#action_12834938
]
Hoss Man commented on SOLR-1365:
--------------------------------
The constraints on what can be SolrCoreAware exist for two main reasons:
# to ensure some sanity in initialization .. one of the main reasons the
SolrCoreAware interface was needed in the first place was because some plugins
wanted to use the SolrCore to get access to other plugins during their
initialization -- but those other components weren't necessarily initialized
yet. with the inform(SolrCore) method SolrCoreAware plugins know that all
other components have been initialized, but they haven't necessarily been
informed about the SolrCore, so they might not be "ready" to deal with other
plugins yet ... it's generally just a big initialization-cluster-fuck, so the
fewer classes involved the better
# prevent too much pollution of the SolrCore API. having direct access to the
SolrCore is "a big deal" -- once you have a reference to the core, you can get
to pretty much anything, which opens us (ie: Solr maintainers) up to a lot of
crazy code paths to worry about -- so the fewer plugin types that we need to
consider when making changes to SolrCore the better.
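The ordering issue in the first point can be illustrated with a toy two-phase lifecycle. This is only a sketch: `Core`, `Plugin`, `CoreAware`, `SimplePlugin` and `DependentPlugin` are hypothetical stand-ins rather than the real Solr classes, and only the `inform` callback mirrors the `inform(SolrCore)` method mentioned above.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class TwoPhaseInit {
    /** Hypothetical plugin contract: init() runs first, with no access to the core. */
    interface Plugin {
        void init();
        boolean isInitialized();
    }

    /** Hypothetical stand-in for SolrCoreAware: inform() runs only after every init(). */
    interface CoreAware {
        void inform(Core core);
    }

    /** Hypothetical stand-in for SolrCore: a registry that drives both phases. */
    static class Core {
        private final Map<String, Plugin> plugins = new LinkedHashMap<>();

        void register(String name, Plugin plugin) {
            plugins.put(name, plugin);
        }

        Plugin getPlugin(String name) {
            return plugins.get(name);
        }

        void start() {
            // Phase 1: initialize every plugin, in registration order.
            for (Plugin p : plugins.values()) {
                p.init();
            }
            // Phase 2: only now hand the core to the plugins that asked for it.
            for (Plugin p : plugins.values()) {
                if (p instanceof CoreAware) {
                    ((CoreAware) p).inform(this);
                }
            }
        }
    }

    static class SimplePlugin implements Plugin {
        private boolean initialized;

        @Override
        public void init() {
            initialized = true;
        }

        @Override
        public boolean isInitialized() {
            return initialized;
        }
    }

    static class DependentPlugin extends SimplePlugin implements CoreAware {
        boolean sawInitializedDependency;

        @Override
        public void inform(Core core) {
            // "other" has been initialized by now, although it may not have been informed yet.
            sawInitializedDependency = core.getPlugin("other").isInitialized();
        }
    }

    public static void main(String[] args) {
        Core core = new Core();
        DependentPlugin dependent = new DependentPlugin();
        core.register("dependent", dependent); // registered before its dependency
        core.register("other", new SimplePlugin());
        core.start();
        System.out.println(dependent.sawInitializedDependency);
    }
}
```

Registration order stops mattering for anything done inside `inform`, but a plugin still cannot assume that its neighbours have been informed themselves, which is the remaining gap described above.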
In the case of SimilarityFactory, i'm not entirely sure how i feel about making
it SolrCoreAware(able) ... we have tried really, REALLY hard to make sure
nothing initialized as part of the IndexSchema can be SolrCore aware because it
opens up the possibility of plugin behavior being affected by SolrCore
configuration which might be different between master and slave machines --
which could produce disastrous results. a schema.xml needs to be internally
consistent regardless of what solrconfig.xml might reference it.
In this case the real issue isn't that we have a use case where
SimilarityFactory _needs_ access to SolrCore -- what it wants access to is the
IndexSchema, so it might make sense to just provide access to that in some way
w/o having to expose the entire SolrCore.
Practically speaking, after re-skimming the patch: I'm not even convinced that
would really add anything. refactoring/reusing some of the *code* that
IndexSchema uses to manage dynamicFields might be handy for the
SweetSpotSimilarityFactory, but i don't actually see how being able to inspect
the IndexSchema to get the list of dynamicFields (or find out if a field is
dynamic) would make it any better or easier to use. We'd still want people to
configure it with field names and field name globs directly because there won't
necessarily be a one to one correspondence between what fields are dynamic in
the schema and how you want the sweetspots defined ... you might have a generic
"en_*" dynamicField in your schema for english text, and an "fr_*" dynamicField
for french text, but that doesn't mean the sweetspot for all "fr_*" fields will
be the same ... you are just as likely to want some very specific field names
to have their own sweetspot, or to have the sweetspot be suffix based (ie:
"*_title" could have one sweetspot even if the resulting field names are
fr_title and en_title).
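A minimal sketch of that kind of configuration, resolving a field name against exact names and field name globs independently of the schema. Everything here is an assumption made for illustration and not taken from the patch: the class name `SweetSpotGlobs`, the `LengthNorm` holder, and the precedence rule (exact names first, then globs in the order they were added, then a default).

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

public class SweetSpotGlobs {
    /** Hypothetical holder for one set of length norm settings. */
    static final class LengthNorm {
        final int min;
        final int max;
        final float steepness;

        LengthNorm(int min, int max, float steepness) {
            this.min = min;
            this.max = max;
            this.steepness = steepness;
        }
    }

    /** A compiled glob paired with the settings it selects. */
    private static final class GlobRule {
        final Pattern pattern;
        final LengthNorm norm;

        GlobRule(Pattern pattern, LengthNorm norm) {
            this.pattern = pattern;
            this.norm = norm;
        }
    }

    private final Map<String, LengthNorm> exact = new HashMap<>();
    private final List<GlobRule> globs = new ArrayList<>();
    private final LengthNorm fallback;

    SweetSpotGlobs(LengthNorm fallback) {
        this.fallback = fallback;
    }

    /** Registers settings for a literal field name or for a glob containing '*'. */
    void add(String nameOrGlob, LengthNorm norm) {
        if (nameOrGlob.indexOf('*') >= 0) {
            globs.add(new GlobRule(toPattern(nameOrGlob), norm));
        } else {
            exact.put(nameOrGlob, norm);
        }
    }

    /** Converts a glob to a regex: '*' matches any run of characters, the rest is literal. */
    static Pattern toPattern(String glob) {
        List<String> quoted = new ArrayList<>();
        for (String literal : glob.split("\\*", -1)) {
            quoted.add(Pattern.quote(literal));
        }
        return Pattern.compile(String.join(".*", quoted));
    }

    /** Assumed precedence: exact name, then the first matching glob, then the default. */
    LengthNorm lookup(String fieldName) {
        LengthNorm norm = exact.get(fieldName);
        if (norm != null) {
            return norm;
        }
        for (GlobRule rule : globs) {
            if (rule.pattern.matcher(fieldName).matches()) {
                return rule.norm;
            }
        }
        return fallback;
    }

    public static void main(String[] args) {
        SweetSpotGlobs config = new SweetSpotGlobs(new LengthNorm(1, 1, 0.5f));
        config.add("*_title", new LengthNorm(2, 9, 0.2f)); // suffix based
        config.add("fr_*", new LengthNorm(2, 7, 0.4f));    // prefix based
        for (String field : new String[] {"fr_title", "en_title", "fr_summary", "id"}) {
            LengthNorm n = config.lookup(field);
            System.out.println(field + " -> min=" + n.min + " max=" + n.max);
        }
    }
}
```

With these rules fr_title and en_title share the "*_title" settings even though fr_title also matches "fr_*", and nothing in the lookup needs to know which fields the schema declares as dynamic.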
I think the patch could be improved, and i think there is definitely some code
reuse possibility for parsing the field name globs, but i don't know that it
really needs run time access to the IndexSchema (and it definitely doesn't need
access to the SolrCore)
> Add configurable Sweetspot Similarity factory
> ---------------------------------------------
>
> Key: SOLR-1365
> URL: https://issues.apache.org/jira/browse/SOLR-1365
> Project: Solr
> Issue Type: New Feature
> Affects Versions: 1.3
> Reporter: Kevin Osborn
> Priority: Minor
> Fix For: 1.5
>
> Attachments: SOLR-1365.patch
>
>
> This is some code that I wrote a while back.
> Normally, if you use SweetSpotSimilarity, you are going to make it do
> something useful by extending SweetSpotSimilarity. So, instead, I made a
> factory class and a configurable SweetSpotSimilarity. There are two classes.
> SweetSpotSimilarityFactory reads the parameters from schema.xml. It then
> creates an instance of VariableSweetSpotSimilarity, which is my custom
> SweetSpotSimilarity class. In addition to the standard functions, it also
> handles dynamic fields.
> So, in schema.xml, you could have something like this:
> <similarity class="org.apache.solr.schema.SweetSpotSimilarityFactory">
>   <bool name="useHyperbolicTf">true</bool>
>   <float name="hyperbolicTfFactorsMin">1.0</float>
>   <float name="hyperbolicTfFactorsMax">1.5</float>
>   <float name="hyperbolicTfFactorsBase">1.3</float>
>   <float name="hyperbolicTfFactorsXOffset">2.0</float>
>   <int name="lengthNormFactorsMin">1</int>
>   <int name="lengthNormFactorsMax">1</int>
>   <float name="lengthNormFactorsSteepness">0.5</float>
>   <int name="lengthNormFactorsMin_description">2</int>
>   <int name="lengthNormFactorsMax_description">9</int>
>   <float name="lengthNormFactorsSteepness_description">0.2</float>
>   <int name="lengthNormFactorsMin_supplierDescription_*">2</int>
>   <int name="lengthNormFactorsMax_supplierDescription_*">7</int>
>   <float name="lengthNormFactorsSteepness_supplierDescription_*">0.4</float>
> </similarity>
> So, now everything is in a config file instead of having to create your own
> subclass.