On 4/9/2019 7:03 AM, Erie Data Systems wrote:
Solr 8.0.0, I have a HASHTAG string field I am trying to facet on to get
the most popular hashtags (top 100) across many sources. (SITE field is
string)

/select?facet.field=hashtag&facet=on&rows=0&q=%2Bhashtag:*%20%2BDT:[" .
date('Y-m-d') . "T00:00:00Z+TO+" . date('Y-m-d')  .
"T23:59:59Z]&facet.limit=100&facet.mincount=1&facet.method=fc

It works, but not the way I feel it should... For example, if one site
has 1000 rows on today's date and they all have a HASHTAG in common, that
HASHTAG automatically rises to the top simply because one SITE has 1000
pages with the same HASHTAG.

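That is exactly what faceting is designed to do. Facet counts are counts of matching documents, so 1000 documents from one site weigh the same as 1000 documents spread across many sites.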

Is there a way to get a more even distribution of top HASHTAGS for a
given date, i.e. facet by a grouping, distinct, or filter of some sort?
I'm more interested in knowing if a HASHTAG is used frequently among SITEs,
not just by one.

If you use pivot facets, first on the field you want to classify on, then on HASHTAG, that MIGHT get you what you want.
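A rough sketch of the pivot idea, in Python. It assumes the site field is literally named SITE, that the date field is DT as in your query, and that the response is requested as JSON; the function names are made up for illustration. The first function builds the request parameters, the second counts how many distinct sites use each hashtag instead of how many documents do.

```python
from collections import Counter


def pivot_params(day):
    """Parameters for a SITE -> hashtag pivot facet on one day (YYYY-MM-DD).

    Field names SITE, hashtag and DT are assumptions taken from the question.
    """
    return {
        "q": "*:*",
        "fq": [
            f"DT:[{day}T00:00:00Z TO {day}T23:59:59Z]",
            "hashtag:[* TO *]",
        ],
        "rows": 0,
        "facet": "on",
        "facet.pivot": "SITE,hashtag",
        "facet.mincount": 1,
        # -1 asks for every bucket; this can be expensive on large indexes.
        "facet.limit": -1,
        "wt": "json",
    }


def sites_per_hashtag(response, pivot_key="SITE,hashtag"):
    """Count the number of distinct sites in which each hashtag appears."""
    counts = Counter()
    for site_bucket in response["facet_counts"]["facet_pivot"][pivot_key]:
        # Each hashtag appears at most once under a given site bucket.
        for tag_bucket in site_bucket.get("pivot", []):
            counts[tag_bucket["value"]] += 1
    return counts
```

Calling sites_per_hashtag(parsed_json).most_common(100) would then rank hashtags by how many sites use them, so a single site with 1000 pages contributes only 1 to its hashtag.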

You could also try running many different facet queries, each one with a specific query and/or filter that achieves the results you want.
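One possible reading of this, sketched in Python: send one facet request per site, each restricted with a filter query, then merge the results on the client. The field names (SITE, hashtag, DT) are assumptions based on the question, the function names are hypothetical, and the parsing assumes the default flat JSON layout for field facets (term, count, term, count, ...).

```python
from collections import Counter


def per_site_params(site, day, limit=100):
    """Parameters for one facet request restricted to a single site."""
    # Escape backslashes and quotes so the site name is safe inside a phrase.
    escaped = site.replace("\\", "\\\\").replace('"', '\\"')
    return {
        "q": "*:*",
        "fq": [
            f'SITE:"{escaped}"',
            f"DT:[{day}T00:00:00Z TO {day}T23:59:59Z]",
        ],
        "rows": 0,
        "facet": "on",
        "facet.field": "hashtag",
        "facet.limit": limit,
        "facet.mincount": 1,
        "wt": "json",
    }


def merge_site_facets(responses):
    """Given one parsed response per site, count sites per hashtag."""
    site_counts = Counter()
    for response in responses:
        flat = response["facet_counts"]["facet_fields"]["hashtag"]
        # Even positions hold the terms, odd positions hold the counts.
        site_counts.update(flat[0::2])
    return site_counts
```

This costs one request per site, so it is probably only practical when the list of sites is small and known in advance.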

FYI: Including "hashtag:*" in your query makes it a wildcard query, which is most likely VERY slow. If you are trying to match all possible values in the hashtag field, take it out; it is unnecessary. If you are trying to match only documents where hashtag contains a value, replace it with this for a performance improvement:

hashtag:[* TO *]

Range queries are almost always faster than wildcards.
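For illustration, here is your original request rebuilt in Python with the range clause in place of the wildcard. The function name is made up, and the parameters are simply the ones from your URL; urlencode takes care of escaping the +, brackets and colons.

```python
from urllib.parse import urlencode


def hashtag_facet_url(day, limit=100):
    """Build the /select URL for one day (YYYY-MM-DD), using a range clause
    to require a value in the hashtag field instead of hashtag:*."""
    params = {
        "q": f"+hashtag:[* TO *] +DT:[{day}T00:00:00Z TO {day}T23:59:59Z]",
        "rows": 0,
        "facet": "on",
        "facet.field": "hashtag",
        "facet.limit": limit,
        "facet.mincount": 1,
        "facet.method": "fc",
    }
    return "/select?" + urlencode(params)


print(hashtag_facet_url("2019-04-09"))
```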

Thanks,
Shawn
