Re: Interesting Grouping/Facet issue

2019-04-09 Thread Shawn Heisey


On 4/9/2019 7:03 AM, Erie Data Systems wrote:

Solr 8.0.0, I have a HASHTAG string field I am trying to facet on to get
the most popular hashtags (top 100) across many sources. (SITE field is
string)

/select?facet.field=hashtag=on=0=%2Bhashtag:*%20%2BDT:[" .
date('Y-m-d') . "T00:00:00Z+TO+" . date('Y-m-d')  .
"T23:59:59Z]=100=1=fc

It works, but not the way I feel it should. For example, if one site
has 1000 rows on today's date and they all have a HASHTAG in common, that
HASHTAG automatically rises to the top simply because one SITE has 1000
pages with the same HASHTAG.


That is exactly what faceting is designed to do.  It is behaving exactly 
as designed.



Is there a way to get a better, more even distribution of top HASHTAGS for a
given date, i.e. facet by a grouping, a distinct, or a filter of some sort?
I'm more interested in knowing if a HASHTAG is used frequently among SITEs,
not just one.


If you use pivot facets, first on the field you want to classify on, 
then on HASHTAG, that MIGHT get you what you want.
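
As a rough illustration of the pivot idea, here is a minimal Python sketch. It assumes the fields are named `site`, `hashtag`, and `DT` (the lowercase `site` name is a guess), and that the response uses Solr's standard `facet_pivot` JSON layout. The helper names `pivot_params` and `hashtags_by_site_count` are hypothetical. It ranks each hashtag by how many distinct sites use it rather than by raw document count.

```python
from collections import Counter
from urllib.parse import urlencode


def pivot_params(day):
    """Build parameters for a site -> hashtag pivot facet on one day.

    `day` is a 'YYYY-MM-DD' string. Field names are assumptions.
    """
    return {
        "q": "*:*",
        "fq": f"DT:[{day}T00:00:00Z TO {day}T23:59:59Z]",
        "rows": 0,
        "facet": "on",
        "facet.pivot": "site,hashtag",
        # -1 asks for every bucket; this can be expensive on large indexes.
        "facet.limit": -1,
        "wt": "json",
    }


def hashtags_by_site_count(response, top=100):
    """Count, for each hashtag, how many site buckets contain it."""
    pivots = response["facet_counts"]["facet_pivot"]["site,hashtag"]
    sites_per_tag = Counter()
    for site_bucket in pivots:
        # Each site bucket carries its own nested list of hashtag buckets.
        for tag_bucket in site_bucket.get("pivot", []):
            sites_per_tag[tag_bucket["value"]] += 1
    return sites_per_tag.most_common(top)


query_string = urlencode(pivot_params("2019-04-09"))
```

The query string would be appended to `/select?`, and the parsed JSON response passed to `hashtags_by_site_count`. A hashtag that appears 1000 times on one site then counts once, the same as a hashtag that appears once on that site.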


You could also try running many different facet queries, each one with a 
specific query and/or filter that achieves the results you want.
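
One possible reading of the multi-query approach, sketched in Python: send one facet request per site, each restricted by a filter, then merge the per-site results on the client. The field names (`site`, `hashtag`) and the helper names are assumptions for illustration, and the merge step relies on Solr's default flat JSON layout for `facet_fields`.

```python
from collections import Counter


def site_params(site, day_filter, limit=100):
    """Parameters for one request: top hashtags within a single site.

    Assumes `site` contains no double quotes that would need escaping.
    """
    return {
        "q": "*:*",
        "fq": [f'site:"{site}"', day_filter],
        "rows": 0,
        "facet": "on",
        "facet.field": "hashtag",
        "facet.limit": limit,
        "facet.mincount": 1,
    }


def parse_flat_facet(flat):
    """Turn [term, count, term, count, ...] into a {term: count} dict."""
    return dict(zip(flat[0::2], flat[1::2]))


def merge_site_results(responses, top=100):
    """Rank hashtags by the number of per-site responses they appear in."""
    sites_per_tag = Counter()
    for response in responses:
        flat = response["facet_counts"]["facet_fields"]["hashtag"]
        sites_per_tag.update(parse_flat_facet(flat).keys())
    return sites_per_tag.most_common(top)
```

This costs one request per site, so it only makes sense when the number of sites is small.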


FYI:  Including "hashtag:*" in your query makes it a wildcard query. 
This is most likely VERY slow.  If you are trying to match all possible 
values in the hashtag field, then take it out, it's unnecessary.  If you 
are trying to match only documents where hashtag contains a value, then 
replace it with this for a performance improvement:


hashtag:[* TO *]

Range queries are almost always faster than wildcards.
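
For illustration, a small Python sketch of how the filters might be built with the range clause in place of the wildcard. The function name `day_filters` is hypothetical, and passing the clauses as separate `fq` parameters is a choice made here, not something the original request did.

```python
from datetime import date


def day_filters(day=None):
    """Return filter clauses for documents that have a hashtag on `day`.

    `day` is a datetime.date and defaults to today.
    """
    d = (day or date.today()).isoformat()
    return [
        # Range form instead of the wildcard form hashtag:*
        "hashtag:[* TO *]",
        f"DT:[{d}T00:00:00Z TO {d}T23:59:59Z]",
    ]
```

Each string in the returned list would go into its own `fq` parameter, with `q=*:*` as the main query.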

Thanks,
Shawn

Interesting Grouping/Facet issue

2019-04-09 Thread Erie Data Systems

Solr 8.0.0, I have a HASHTAG string field I am trying to facet on to get
the most popular hashtags (top 100) across many sources. (SITE field is
string)

/select?facet.field=hashtag=on=0=%2Bhashtag:*%20%2BDT:[" .
date('Y-m-d') . "T00:00:00Z+TO+" . date('Y-m-d')  .
"T23:59:59Z]=100=1=fc

It works, but not the way I feel it should. For example, if one site
has 1000 rows on today's date and they all have a HASHTAG in common, that
HASHTAG automatically rises to the top simply because one SITE has 1000
pages with the same HASHTAG.

Is there a way to get a better, more even distribution of top HASHTAGS for a
given date, i.e. facet by a grouping, a distinct, or a filter of some sort?
I'm more interested in knowing if a HASHTAG is used frequently among SITEs,
not just one.

Hope this makes sense... any recommendations welcomed.

Thank you in advance,
-Craig
