Also, the TermsComponent can now export the docFreq for a list of terms and the numDocs for the index. This can be used as a general-purpose mechanism for scoring facets with a callback.
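A minimal client-side sketch of what such a scoring callback could look like, assuming the docFreq for each facet term and the numDocs for the index have already been fetched. The function name `score_facets`, the sample counts, and the smoothed IDF formula are illustrative assumptions and not part of Solr:

```python
import math


def score_facets(bucket_counts, doc_freqs, num_docs):
    """Rank facet terms by bucket count weighted by a smoothed IDF.

    bucket_counts: term -> facet count in the current result set
    doc_freqs:     term -> docFreq in the whole index
    num_docs:      numDocs for the index
    """
    scored = {}
    for term, count in bucket_counts.items():
        df = doc_freqs.get(term, 0)
        # Smoothed IDF (an assumption): avoids division by zero for unseen terms.
        idf = math.log((num_docs + 1) / (df + 1)) + 1.0
        scored[term] = count * idf
    # Highest score first.
    return sorted(scored.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical numbers: facet counts for a "horror" query vs. index-wide docFreq.
bucket_counts = {"blood": 40, "night": 45, "love": 50}
doc_freqs = {"blood": 60, "night": 400, "love": 900}
ranked = score_facets(bucket_counts, doc_freqs, num_docs=1000)
for term, score in ranked:
    print(f"{term}: {score:.1f}")
```

With these made-up counts, "love" has the highest raw facet count but ranks last once the index-wide frequency is taken into account.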
https://issues.apache.org/jira/browse/SOLR-9243

Joel Bernstein
http://joelsolr.blogspot.com/

On Wed, Aug 3, 2016 at 8:52 AM, Joel Bernstein <joels...@gmail.com> wrote:

> What you're describing is implemented with Graph aggregations in this
> ticket using tf-idf. Other scoring methods can be implemented as well.
>
> https://issues.apache.org/jira/browse/SOLR-9193
>
> I'll update this thread with a description of how this can be used with
> the facet() streaming expression as well as with graph queries later today.
>
> Joel Bernstein
> http://joelsolr.blogspot.com/
>
> On Wed, Aug 3, 2016 at 8:18 AM, <heuw...@uni-hildesheim.de> wrote:
>
>> Dear everybody,
>>
>> as the JSON-API now makes configuration of facets and sub-facets easier,
>> there appears to be a lot of potential to enable instant calculation of
>> facet-recommendations for a query, that is, to sort facets by their
>> relative importance/interestingness/significance for a current query
>> relative to the complete collection or relative to a result set defined
>> by a different query.
>>
>> An example would be to show the most typical terms which are used in
>> descriptions of horror-movies, in contrast to the most popular ones for
>> this query, as these may include terms that occur as often in other genres.
>>
>> This feature has been discussed earlier in the context of Solr:
>> * http://stackoverflow.duapp.com/questions/26399264/how-can-i-sort-facets-by-their-tf-idf-score-rather-than-popularity
>> * http://lucene.472066.n3.nabble.com/Facets-with-an-IDF-concept-td504070.html
>>
>> In Elasticsearch, the specific feature that I am looking for is called
>> Significant Terms Aggregation:
>> https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-bucket-significantterms-aggregation.html#search-aggregations-bucket-significantterms-aggregation
>>
>> As of now, I have two questions:
>>
>> a) Are there workarounds in the current Solr implementation or known
>> patches that implement such a sort-option for fields with a large number of
>> possible values, e.g. text-fields? (For smaller vocabularies it is easy to
>> do this client-side with two queries.)
>> b) Are there plans to implement this in facet.pivot or in the
>> facet.json-API?
>>
>> The first step could be to define "interestingness" as a sort-option for
>> facets and to define interestingness as facet-count in the result-set as
>> compared to the complete collection: documentfrequency_termX(bucket) *
>> inverse_documentfrequency_termX(collection)
>>
>> As an extension, the JSON-API could be used to change the domain used as
>> base for the comparison. Another interesting option would be to compare
>> facet-counts against a current parent-facet for nested facets, e.g. the 5
>> most interesting terms by genre for a query on 70s movies, returning the
>> terms specific to horror, comedy, action etc. compared to all terminology
>> at the time (i.e. in the parent-query).
>>
>> A call-back-function could be used to define other measures of
>> interestingness such as the log-likelihood-ratio
>> (http://tdunning.blogspot.de/2008/03/surprise-and-coincidence.html). Most
>> measures need at least the following 4 values: document-frequency for a
>> term for the result-set, document-frequency for the result-set,
>> document-frequency for a term in the index (or base-domain),
>> document-frequency in the index (or base-domain).
>>
>> I guess this feature might be of interest for those who want to do some
>> small-scale term-analysis in addition to search, e.g. as in my case in
>> digital humanities projects. But it might also be an interesting navigation
>> device, e.g. when searching on job-offers to show the skills that are most
>> distinctive for a category.
>>
>> It would be great to know if others are interested in this feature. If
>> there are any implementations out there or if anybody else is working on
>> this, a pointer would be a great start. In the absence of existing
>> solutions: Perhaps somebody has some idea on where and how to start
>> implementing this?
>>
>> Best regards,
>>
>> Ben
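The four values named in the original question are enough to compute a log-likelihood ratio on the client side. The following is only a sketch under assumptions: it uses the entropy form of the G² statistic on a 2x2 contingency table, it assumes the result set is a subset of the index (or base domain), and all function and parameter names are hypothetical rather than any Solr API.

```python
import math


def _x_log_x(x):
    # Convention: 0 * log(0) is treated as 0.
    return 0.0 if x == 0 else x * math.log(x)


def _entropy(*counts):
    # Unnormalized entropy of a set of counts.
    return _x_log_x(sum(counts)) - sum(_x_log_x(c) for c in counts)


def log_likelihood_ratio(df_term_result, n_result, df_term_index, n_index):
    """G^2 statistic for a term, built from the four document counts.

    df_term_result: docs in the result set containing the term
    n_result:       docs in the result set
    df_term_index:  docs in the index (or base domain) containing the term
    n_index:        docs in the index (or base domain)
    """
    k11 = df_term_result                  # in result set, has term
    k12 = df_term_index - df_term_result  # outside result set, has term
    k21 = n_result - df_term_result       # in result set, lacks term
    k22 = n_index - n_result - k12        # outside result set, lacks term
    if min(k11, k12, k21, k22) < 0:
        raise ValueError("counts are inconsistent with a result set inside the index")
    row = _entropy(k11 + k12, k21 + k22)
    col = _entropy(k11 + k21, k12 + k22)
    mat = _entropy(k11, k12, k21, k22)
    # Clamp tiny negative values caused by floating-point rounding.
    return max(0.0, 2.0 * (row + col - mat))


# Hypothetical counts: 100 matching docs in an index of 1000.
typical = log_likelihood_ratio(40, 100, 60, 1000)    # term concentrated in the result set
ordinary = log_likelihood_ratio(12, 100, 100, 1000)  # term close to its index-wide rate
print(typical > ordinary)
```

Note that this statistic measures how surprising the counts are, not the direction: a term that is strongly under-represented in the result set also scores high, so a caller may want to check that the term's rate in the result set exceeds its rate in the index before treating it as "interesting".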