Hello,

I've encountered two issues while trying to apply the unique()/hll() functions to a string field inside a range facet:

1. Results are incorrect for a single-valued string field.
2. I'm getting an ArrayIndexOutOfBoundsException for a multi-valued string field.

How to reproduce:

1. Create a core based on the default configSet.

2. Add several simple documents to the core, like these:

    [
      { "id": "14790", "int_i": 2010, "date_dt": "2010-01-01T00:00:00Z", "string_s": "a", "string_ss": ["a", "b"] },
      { "id": "12254", "int_i": 2014, "date_dt": "2014-01-01T00:00:00Z", "string_s": "e", "string_ss": ["b", "c"] },
      { "id": "12937", "int_i": 2008, "date_dt": "2008-01-01T00:00:00Z", "string_s": "c", "string_ss": ["c", "d"] },
      { "id": "10575", "int_i": 2008, "date_dt": "2008-01-01T00:00:00Z", "string_s": "b", "string_ss": ["d", "e"] },
      { "id": "13644", "int_i": 2014, "date_dt": "2014-01-01T00:00:00Z", "string_s": "e", "string_ss": ["e", "a"] },
      { "id": "8405", "int_i": 2014, "date_dt": "2014-01-01T00:00:00Z", "string_s": "d", "string_ss": ["a", "b"] },
      { "id": "6128", "int_i": 2008, "date_dt": "2008-01-01T00:00:00Z", "string_s": "a", "string_ss": ["b", "c"] },
      { "id": "5220", "int_i": 2015, "date_dt": "2015-01-01T00:00:00Z", "string_s": "d", "string_ss": ["c", "d"] },
      { "id": "6850", "int_i": 2012, "date_dt": "2012-01-01T00:00:00Z", "string_s": "b", "string_ss": ["d", "e"] },
      { "id": "5748", "int_i": 2014, "date_dt": "2014-01-01T00:00:00Z", "string_s": "e", "string_ss": ["e", "a"] }
    ]

3. Try queries like the following for a single-valued string field:

    q=*:*&rows=0&json={"facet":{"histogram":{"include":"lower,edge","other":"all","field":"int_i","gap":1,"missing":false,"start":2008,"end":2016,"type":"range","facet":{"distinct_count":"unique(string_s)"}}}}

    q=*:*&rows=0&json={"facet":{"histogram":{"include":"lower,edge","other":"all","field":"date_dt","gap":"%2B1YEAR","missing":false,"start":"2008-01-01T00:00:00Z","end":"2016-01-01T00:00:00Z","type":"range","facet":{"distinct_count":"unique(string_s)"}}}}

   The distinct counts returned are generally incorrect. For example, for the set of documents above, the response will contain:

    { "val": 2010, "count": 1, "distinct_count": 0 }

   and

    "between": { "count": 10, "distinct_count": 1 }

   (there should be 5 distinct values). Note that the result depends on the order in which the documents are added.

4. Try queries like the following for a multi-valued string field:

    q=*:*&rows=0&json={"facet":{"histogram":{"include":"lower,edge","other":"all","field":"int_i","gap":1,"missing":false,"start":2008,"end":2016,"type":"range","facet":{"distinct_count":"unique(string_ss)"}}}}

    q=*:*&rows=0&json={"facet":{"histogram":{"include":"lower,edge","other":"all","field":"date_dt","gap":"%2B1YEAR","missing":false,"start":"2008-01-01T00:00:00Z","end":"2016-01-01T00:00:00Z","type":"range","facet":{"distinct_count":"unique(string_ss)"}}}}

   I'm getting an ArrayIndexOutOfBoundsException for such queries.

Note that everything looks OK for other field types (I tried single- and multi-valued ints, doubles and dates), when the enclosing facet is a terms facet, or when there is no enclosing facet at all.

I can reproduce these issues on both Solr 7.0.1 and 7.1.0. Solr 6.x and 5.x do not seem to have these issues.

Is this a bug? Or maybe I've missed something?

Thanks,
Volodymyr
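P.S. For reference, the distinct counts one would expect from the queries above can be worked out offline from the sample documents. The following is only a minimal sketch in plain Python, not anything Solr provides: the helper name expected_distinct is made up for illustration, and date_dt is left out because every date falls on January 1st of the int_i year, so the +1YEAR buckets coincide with the int_i buckets.

```python
from collections import defaultdict

# The ten sample documents from step 2 (date_dt omitted; see note above).
DOCS = [
    {"id": "14790", "int_i": 2010, "string_s": "a", "string_ss": ["a", "b"]},
    {"id": "12254", "int_i": 2014, "string_s": "e", "string_ss": ["b", "c"]},
    {"id": "12937", "int_i": 2008, "string_s": "c", "string_ss": ["c", "d"]},
    {"id": "10575", "int_i": 2008, "string_s": "b", "string_ss": ["d", "e"]},
    {"id": "13644", "int_i": 2014, "string_s": "e", "string_ss": ["e", "a"]},
    {"id": "8405", "int_i": 2014, "string_s": "d", "string_ss": ["a", "b"]},
    {"id": "6128", "int_i": 2008, "string_s": "a", "string_ss": ["b", "c"]},
    {"id": "5220", "int_i": 2015, "string_s": "d", "string_ss": ["c", "d"]},
    {"id": "6850", "int_i": 2012, "string_s": "b", "string_ss": ["d", "e"]},
    {"id": "5748", "int_i": 2014, "string_s": "e", "string_ss": ["e", "a"]},
]


def expected_distinct(docs, value_field, bucket_field="int_i"):
    """Hypothetical helper: distinct values of value_field per bucket and overall."""
    per_bucket = defaultdict(set)
    overall = set()
    for doc in docs:
        values = doc[value_field]
        if isinstance(values, str):
            # Treat a single-valued field as a one-element list.
            values = [values]
        per_bucket[doc[bucket_field]].update(values)
        overall.update(values)
    counts = {bucket: len(vals) for bucket, vals in sorted(per_bucket.items())}
    return counts, len(overall)


if __name__ == "__main__":
    for field in ("string_s", "string_ss"):
        counts, between = expected_distinct(DOCS, field)
        print(field, counts, "between:", between)
```

With these documents the sketch gives 3, 1, 1, 2 and 1 distinct string_s values for 2008, 2010, 2012, 2014 and 2015, and 5 for the whole 2008-2016 range, which is what I would expect the "distinct_count" values in the range facet to be.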