Thank you. But as I showed in my example, we already use refine, and overrequest should not strictly be needed because we request all the buckets anyway. Neither of those can explain an error of 60%, right?
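For reference, the percentages in my original message (quoted below) can be reproduced with a short script. This is only a sketch of the arithmetic: the counts are copied from that message, and the names relative_error and report are made up for illustration. It assumes the error is measured as (parent count - sum of nested counts) / parent count.

```python
# Counts copied from the original message: the Gender bucket counts from the
# plain facet, and the per-bucket sums of the nested sub-facet counts.
parent_counts = [16226, 282854, 40489, 36672]
country_sums = [18424, 464387, 47902, 49749]
status_sums = [16273, 435974, 49925, 54019]


def relative_error(parent, nested_sum):
    """Signed difference as a percentage of the parent bucket count."""
    return (parent - nested_sum) / parent * 100.0


def report(parents, sums):
    """Relative error per bucket, rounded to two decimals."""
    return [round(relative_error(p, s), 2) for p, s in zip(parents, sums)]


country_errors = report(parent_counts, country_sums)
status_errors = report(parent_counts, status_sums)
print("Gender x Country:", country_errors)
print("Gender x Status: ", status_errors)
```

The second bucket is off by more than 50% in both combinations, which is far beyond the roughly 1% differences we see elsewhere.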

On 10 Nov 2017 19:29, "Amrit Sarkar" <sarkaramr...@gmail.com> wrote:

> Kenny,
>
> This is a known behavior in multi-sharded collections, where the field
> values belonging to the same facet don't reside in the same shard. Yonik
> Seeley has improved the JSON Facet feature by introducing the
> "overrequest" and "refine" parameters.
>
> Kindly check out Jira:
> https://issues.apache.org/jira/browse/SOLR-7452
> https://issues.apache.org/jira/browse/SOLR-9432
>
> Relevant blog: https://medium.com/@abb67cbb46b/1acfa77cd90c
>
> On 10 Nov 2017 10:02 p.m., "kenny" <ke...@ontoforce.com> wrote:
>
> > Hi all,
> >
> > We are doing some tests in Solr 6.6 with the JSON Facet API and we get
> > completely wrong counts for some combinations of facets.
> >
> > Setting: We have a set of fields for 376k documents in our query (total
> > 120M documents). We work with 2 shards. We first facet over the first
> > facet and keep these numbers; we subsequently do a nested faceting over
> > both facets.
> >
> > Then we add up the sub-facet counts and expect to get (approximately)
> > the same numbers back. Sometimes we get rounding errors of about 1%
> > difference. But on other occasions it seems to be way off.
> >
> > For example:
> >
> > Gender (3 values) Country (211 values)
> > 16226 - 18424 = -2198 (-13.5461604832%)
> > 282854 - 464387 = -181533 (-64.1790464338%)
> > 40489 - 47902 = -7413 (-18.3086764306%)
> > 36672 - 49749 = -13077 (-35.6593586387%)
> >
> > Gender (3 values) Status (17 values)
> > 16226 - 16273 = -47 (-0.289658572661%)
> > 282854 - 435974 = -153120 (-54.1339348215%)
> > 40489 - 49925 = -9436 (-23.305095211%)
> > 36672 - 54019 = -17347 (-47.3031195462%)
> >
> > ...
> >
> > These are the typical requests we submit. Note that we have refine and
> > an overrequest, but in the case of Gender vs Request we should query all
> > the buckets anyway.
> >
> > {"wt":"json","rows":0,"json.facet":"{\"Status_sfhll\":\"hll(Status_sf)\",\"Status_sf\":{\"type\":\"terms\",\"field\":\"Status_sf\",\"missing\":true,\"refine\":true,\"overrequest\":50,\"limit\":50,\"offset\":0}}","q":"*:*","fq":["type:\"something\""]}
> >
> > {"wt":"json","rows":0,"json.facet":"{\"Gender_sf\":{\"type\":\"terms\",\"field\":\"Gender_sf\",\"missing\":true,\"refine\":true,\"overrequest\":10,\"limit\":10,\"offset\":0,\"facet\":{\"Status_sf\":{\"type\":\"terms\",\"field\":\"Status_sf\",\"missing\":true,\"refine\":true,\"overrequest\":50,\"limit\":50,\"offset\":0}}},\"Gender_sfhll\":\"hll(Gender_sf)\"}","q":"*:*","fq":["type:\"something\""]}
> >
> > Is this a known bug? Would switching to the old facet API resolve this?
> > Are there other parameters we are missing?
> >
> > Thanks
> >
> > kenny
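For readability, the nested Gender/Status request quoted above can also be built from plain dictionaries instead of a hand-escaped string. This is only a sketch: the helper name terms_facet is hypothetical, the parameter values are the ones from the quoted request, and actually sending the parameters to Solr is left out.

```python
import json


def terms_facet(field, limit, overrequest, sub_facets=None):
    """Build one JSON Facet 'terms' facet with the options used in the thread."""
    facet = {
        "type": "terms",
        "field": field,
        "missing": True,
        "refine": True,
        "overrequest": overrequest,
        "limit": limit,
        "offset": 0,
    }
    if sub_facets:
        # Nested facets are computed per bucket of the parent facet.
        facet["facet"] = sub_facets
    return facet


json_facet = {
    "Gender_sf": terms_facet(
        "Gender_sf",
        limit=10,
        overrequest=10,
        sub_facets={"Status_sf": terms_facet("Status_sf", limit=50, overrequest=50)},
    ),
    "Gender_sfhll": "hll(Gender_sf)",
}

# Request parameters as in the quoted message; json.facet is passed as a string.
params = {
    "wt": "json",
    "rows": 0,
    "json.facet": json.dumps(json_facet),
    "q": "*:*",
    "fq": ['type:"something"'],
}
print(json.dumps(params, indent=2))
```

Building the request this way avoids the manual escaping and makes it easier to check that refine and overrequest are set on both the parent and the nested facet.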