[jira] [Commented] (SOLR-12581) add a "min_popularity" option to relatedness() aggregation that forces scores to -Inf if fg/bg pops don't meet a threshold
[ https://issues.apache.org/jira/browse/SOLR-12581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16556036#comment-16556036 ]

ASF subversion and git services commented on SOLR-12581:

Commit 71c0bddd149b7c0364fbba8d31494dcd9f57f1ef in lucene-solr's branch refs/heads/master from Chris Hostetter
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=71c0bdd ]

SOLR-12581: the JSON Facet 'relatedness()' aggregate function now supports a 'min_popularity' option using the extended type:func syntax

> add a "min_popularity" option to relatedness() aggregation that forces scores
> to -Inf if fg/bg pops don't meet a threshold
>
> Key: SOLR-12581
> URL: https://issues.apache.org/jira/browse/SOLR-12581
> Project: Solr
> Issue Type: Improvement
> Security Level: Public (Default Security Level. Issues are Public)
> Components: Facet Module
> Reporter: Hoss Man
> Assignee: Hoss Man
> Priority: Major
> Attachments: SOLR-12581.patch, SOLR-12581.patch
>
> as discussed in SOLR-9480 and noted in TODO comments, the original SKG code
> base had a "min_pop" option that would completely ignore "terms" if the fg/bg
> popularities weren't above a user specified threshold. with the
> implementation of SKG as a {{relatedness()}} aggregation function, we need to
> leave any actual filtering of buckets by an aggregation result to a future
> generalized JSON facet enhancement, but what we can do today is implement an
> optional {{min_popularity}} option on {{relatedness()}} that forces the
> {{relatedness}} score to -Infinity so buckets that don't meet the threshold
> at least score "as low as possible"

--
This message was sent by Atlassian JIRA (v7.6.3#76005)

To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
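The option described in the issue above can be illustrated with a small sketch. It builds a `relatedness()` aggregation in the extended type:func form named in the commit message, and models the score-forcing behaviour from the issue description. The helper names (`relatedness_facet`, `apply_min_popularity`), the `$fore`/`$back` parameter references, the `r1` key, and the 0.001 threshold are hypothetical and chosen only for illustration. The rule that *either* popularity falling strictly below the threshold triggers -Infinity is an assumption; the issue text only says the fg/bg popularities must meet the threshold.

```python
import json
import math


def relatedness_facet(fore_param="$fore", back_param="$back", min_popularity=None):
    """Build a relatedness() aggregation using the extended type:func form.

    The parameter references ($fore, $back) are placeholders for request
    parameters holding the foreground and background queries.
    """
    agg = {"type": "func", "func": f"relatedness({fore_param},{back_param})"}
    if min_popularity is not None:
        # Only emit the option when a threshold was actually requested.
        agg["min_popularity"] = min_popularity
    return agg


def apply_min_popularity(score, fg_pop, bg_pop, min_popularity):
    """Illustrative model only, not Solr's code.

    Assumption: the score is forced to -Infinity when either popularity
    is strictly below the threshold; otherwise it is left unchanged.
    """
    if fg_pop < min_popularity or bg_pop < min_popularity:
        return -math.inf
    return score


if __name__ == "__main__":
    # "r1" is an arbitrary name for the aggregation in the facet block.
    facet = {"r1": relatedness_facet(min_popularity=0.001)}
    print(json.dumps(facet, indent=2))
    # A bucket whose foreground popularity misses the threshold sorts last.
    print(apply_min_popularity(0.42, fg_pop=0.0005, bg_pop=0.02, min_popularity=0.001))
```

Because the bucket is still returned, just with the lowest possible score, this matches the issue's point that actual filtering of buckets is left to a future JSON facet enhancement.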
[jira] [Commented] (SOLR-12581) add a "min_popularity" option to relatedness() aggregation that forces scores to -Inf if fg/bg pops don't meet a threshold
[ https://issues.apache.org/jira/browse/SOLR-12581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16556035#comment-16556035 ]

ASF subversion and git services commented on SOLR-12581:

Commit cf9c3c11a28deff188f4edb5ee5cdd0637cdb958 in lucene-solr's branch refs/heads/branch_7x from Chris Hostetter
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=cf9c3c1 ]

SOLR-12581: the JSON Facet 'relatedness()' aggregate function now supports a 'min_popularity' option using the extended type:func syntax

(cherry picked from commit 71c0bddd149b7c0364fbba8d31494dcd9f57f1ef)
[jira] [Commented] (SOLR-12581) add a "min_popularity" option to relatedness() aggregation that forces scores to -Inf if fg/bg pops don't meet a threshold
[ https://issues.apache.org/jira/browse/SOLR-12581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16553510#comment-16553510 ]

Hoss Man commented on SOLR-12581:

bq. Sort of a side-question, but this work seems to overlap/complement the significantTerms ..

That does seem very tangential, but important to consider – so i spun it off into SOLR-12582.

One key aspect of my comments there that is directly significant to this issue is the question of the name "min_popularity" and whether it makes sense to re-think that...

{quote}as far as trying to maintain the same option names – i don't know that that is feasible or really makes sense – at least in so much as adding new options to relatedness() using the same names as the existing options on significantTerms. notably the existing {{minDocFreq}} option on significantTerms is similar _in concept_ to the {{min_popularity}} option proposed in SOLR-12581 for relatedness(), but it would not really make sense to use {{minDocFreq}} as the option name in SOLR-12581 since the relatedness() function isn't tied to "terms" the way significantTerms is – so "docFreq" has no real meaning, and the more general "popularity" makes more sense (i suppose we could change the option name in significantTerms – but even then significantTerms doesn't produce the same concept of "popularity" that relatedness() does, and even if it did, because that expression focuses exclusively on "terms" the concept of "DocFreq" is very appropriate).
{quote}
[jira] [Commented] (SOLR-12581) add a "min_popularity" option to relatedness() aggregation that forces scores to -Inf if fg/bg pops don't meet a threshold
[ https://issues.apache.org/jira/browse/SOLR-12581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16553241#comment-16553241 ]

Alexandre Rafalovitch commented on SOLR-12581:

Sort of a side-question, but this work seems to overlap/complement the significantTerms work done for streaming/QueryParser: [http://lucene.apache.org/solr/guide/7_4/stream-source-reference.html#significantterms]

Are we saying SignificantTerms is for simpler use cases (as fore/back queries are corpus-wide) and then go into relatedness() for more complex analysis? Should the options be roughly compatible where it makes sense and/or similarly named?

Just wondering because I could see this confusing newbies trying to see when to use which option.