Hello,

I've encountered two issues while applying the unique()/hll() functions to
a string field inside a range facet:

   1. Results are incorrect for a single-valued string field.
   2. I’m getting an ArrayIndexOutOfBoundsException for a multi-valued string
   field.


How to reproduce:

   1. Create a core based on the default configSet.
   2. Add several simple documents to the core, like these:

[
  {
    "id": "14790",
    "int_i": 2010,
    "date_dt": "2010-01-01T00:00:00Z",
    "string_s": "a",
    "string_ss": ["a", "b"]
  },
  {
    "id": "12254",
    "int_i": 2014,
    "date_dt": "2014-01-01T00:00:00Z",
    "string_s": "e",
    "string_ss": ["b", "c"]
  },
  {
    "id": "12937",
    "int_i": 2008,
    "date_dt": "2008-01-01T00:00:00Z",
    "string_s": "c",
    "string_ss": ["c", "d"]
  },
  {
    "id": "10575",
    "int_i": 2008,
    "date_dt": "2008-01-01T00:00:00Z",
    "string_s": "b",
    "string_ss": ["d", "e"]
  },
  {
    "id": "13644",
    "int_i": 2014,
    "date_dt": "2014-01-01T00:00:00Z",
    "string_s": "e",
    "string_ss": ["e", "a"]
  },
  {
    "id": "8405",
    "int_i": 2014,
    "date_dt": "2014-01-01T00:00:00Z",
    "string_s": "d",
    "string_ss": ["a", "b"]
  },
  {
    "id": "6128",
    "int_i": 2008,
    "date_dt": "2008-01-01T00:00:00Z",
    "string_s": "a",
    "string_ss": ["b", "c"]
  },
  {
    "id": "5220",
    "int_i": 2015,
    "date_dt": "2015-01-01T00:00:00Z",
    "string_s": "d",
    "string_ss": ["c", "d"]
  },
  {
    "id": "6850",
    "int_i": 2012,
    "date_dt": "2012-01-01T00:00:00Z",
    "string_s": "b",
    "string_ss": ["d", "e"]
  },
  {
    "id": "5748",
    "int_i": 2014,
    "date_dt": "2014-01-01T00:00:00Z",
    "string_s": "e",
    "string_ss": ["e", "a"]
  }
]

   3. Try queries like the following for a single-valued string field:

q=*:*&rows=0&json={"facet":{"histogram":{"include":"lower,edge","other":"all","field":"int_i","gap":1,"missing":false,"start":2008,"end":2016,"type":"range","facet":{"distinct_count":"unique(string_s)"}}}}

q=*:*&rows=0&json={"facet":{"histogram":{"include":"lower,edge","other":"all","field":"date_dt","gap":"%2B1YEAR","missing":false,"start":"2008-01-01T00:00:00Z","end":"2016-01-01T00:00:00Z","type":"range","facet":{"distinct_count":"unique(string_s)"}}}}
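For convenience, the two requests above can be generated with a short script. This is only a sketch: the helper name build_facet_params is made up for illustration, and the resulting query string still has to be appended to the select URL of whatever core you created (for example http://localhost:8983/solr/<core>/select, assuming a default local install).

```python
import json
from urllib.parse import urlencode


def build_facet_params(range_field, start, end, gap, unique_field):
    """Build the query string for a range facet with a nested unique() stat.

    Hypothetical helper; it mirrors the parameters used in the queries above.
    """
    facet = {
        "histogram": {
            "type": "range",
            "field": range_field,
            "start": start,
            "end": end,
            "gap": gap,
            "include": "lower,edge",
            "other": "all",
            "missing": False,
            "facet": {"distinct_count": f"unique({unique_field})"},
        }
    }
    # urlencode percent-encodes the JSON, so "+1YEAR" becomes "%2B1YEAR".
    body = json.dumps({"facet": facet}, separators=(",", ":"))
    return urlencode({"q": "*:*", "rows": 0, "json": body})


int_query = build_facet_params("int_i", 2008, 2016, 1, "string_s")
date_query = build_facet_params(
    "date_dt", "2008-01-01T00:00:00Z", "2016-01-01T00:00:00Z", "+1YEAR", "string_s"
)
print(int_query)
print(date_query)
```

Passing "string_ss" as the last argument produces the multi-valued variants of the same requests.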

The distinct counts returned are generally incorrect. For example, for the set
of documents above, the response contains:

{
    "val": 2010,
    "count": 1,
    "distinct_count": 0
}

and

"between": {
    "count": 10,
    "distinct_count": 1
}

(there should be 5 distinct values in the "between" bucket, and 1 in the
2010 bucket).

Note that the result depends on the order in which the documents are added.
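As a cross-check, the expected distinct counts of string_s can be computed directly from the ten documents listed earlier. The script below is a plain Python sketch that does not involve Solr; it assumes one bucket per int_i value, which matches gap=1 in the int_i query.

```python
from collections import defaultdict

# (int_i, string_s) pairs copied from the ten sample documents, in order.
DOCS = [
    (2010, "a"), (2014, "e"), (2008, "c"), (2008, "b"), (2014, "e"),
    (2014, "d"), (2008, "a"), (2015, "d"), (2012, "b"), (2014, "e"),
]


def expected_distinct(docs):
    """Return {bucket: number of distinct values} for (bucket, value) pairs."""
    values_per_bucket = defaultdict(set)
    for bucket, value in docs:
        values_per_bucket[bucket].add(value)
    return {b: len(v) for b, v in sorted(values_per_bucket.items())}


per_year = expected_distinct(DOCS)
overall = len({value for _, value in DOCS})

print(per_year)  # {2008: 3, 2010: 1, 2012: 1, 2014: 2, 2015: 1}
print(overall)   # 5
```

So the 2010 bucket should report a distinct_count of 1 rather than 0, and "between" should report 5 rather than 1.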

   4. Try queries like the following for a multi-valued string field:

q=*:*&rows=0&json={"facet":{"histogram":{"include":"lower,edge","other":"all","field":"int_i","gap":1,"missing":false,"start":2008,"end":2016,"type":"range","facet":{"distinct_count":"unique(string_ss)"}}}}

q=*:*&rows=0&json={"facet":{"histogram":{"include":"lower,edge","other":"all","field":"date_dt","gap":"%2B1YEAR","missing":false,"start":"2008-01-01T00:00:00Z","end":"2016-01-01T00:00:00Z","type":"range","facet":{"distinct_count":"unique(string_ss)"}}}}

I’m getting an ArrayIndexOutOfBoundsException for such queries.

Note that everything looks OK for other field types (I tried single- and
multi-valued ints, doubles and dates), when the enclosing facet is a
terms facet, or when there is no enclosing facet at all.

I can reproduce these issues on both Solr 7.0.1 and 7.1.0. Solr 6.x and
5.x do not seem to have these issues.

Is this a bug? Or maybe I’ve missed something?

Thanks,

Volodymyr

Attachment: docs_1-10.json
Description: application/json