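Hello, Artur. Thanks for your interest. Perhaps we can amend the documentation to mention this effect, namely that numBuckets:true adds noticeable time even when limit:-1 already returns every bucket. In the long term, it can be optimized by adding a proper condition for that case. Patches for both are welcome.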
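Until such a condition exists, a client-side workaround might look like the sketch below. It assumes the request keeps limit:-1 and mincount:1, so the returned bucket list is already complete and its length can stand in for numBuckets. The helper names build_facet and bucket_count and the sample response are hypothetical, for illustration only; the field and facet names are taken from the quoted query.

```python
import json


def build_facet(num_buckets: bool = False) -> str:
    """Build the json.facet parameter for the simplified terms facet.

    numBuckets defaults to False so Solr skips the separate bucket count.
    """
    facet = {
        "Chart_01_Bins": {
            "type": "terms",
            "field": "groupIds",
            "mincount": 1,
            "limit": -1,  # all buckets are returned
            "numBuckets": num_buckets,
            "missing": False,
            "refine": True,
        }
    }
    return json.dumps(facet)


def bucket_count(facets: dict, name: str = "Chart_01_Bins") -> int:
    """Count buckets on the client.

    Only equivalent to numBuckets when limit is -1, because then the
    response carries every bucket that passed mincount.
    """
    return len(facets.get(name, {}).get("buckets", []))


# Hypothetical response fragment shaped like a JSON Facet API reply.
sample_response = {
    "facets": {
        "count": 42,
        "Chart_01_Bins": {
            "buckets": [
                {"val": 101, "count": 20},
                {"val": 102, "count": 15},
                {"val": 103, "count": 7},
            ]
        },
    }
}

print(build_facet())
print(bucket_count(sample_response["facets"]))
```

This only sidesteps the extra cost on the client side; it does not change what Solr computes when numBuckets:true is requested.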

On Wed, Feb 12, 2020 at 10:48 PM Rudenko, Artur <artur.rude...@verint.com> wrote:

> Hello everyone,
> I am currently investigating a performance issue in our environment, and
> it looks like we found a performance bug.
> Our environment:
> 20M large PARENT documents and 800M nested small CHILD documents.
> The system inserts about 400K PARENT documents and 16M CHILD documents per
> day. (Currently we stopped the calls insertion to investigate the
> performance issue)
> This is a Solr Cloud 8.3 environment with 7 servers (64 VCPU 128 GB RAM
> each, 24GB allocated to Solr) with a single collection (32 shards and
> replication factor 2).
>
> The query below runs in about 14-16 seconds (we have to use limit:-1 due
> to a business case - cardinality is 1K values).
>
> fq=channel:345133
> &fq=content_type:PARENT
> &fq=Meta_is_organizationIds:(344996998 344594999 345000001.... total of int 562 values)
> &q=*:*
> &json.facet={
>   "Chart_01_Bins":{
>     type:terms,
>     field:groupIds,
>     mincount:1,
>     limit:-1,
>     numBuckets:true,
>     missing:false,
>     refine:true,
>     facet:{
>       min_score_avg:"avg(min_score)",
>       max_score_avg:"avg(max_score)",
>       avg_score_avg:"avg(avg_score)"
>     }
>   },
>   "Chart_01_FIELD_NOT_EXISTS":{
>     type:query,
>     q:"-groupIds:[* TO *]",
>     facet:{
>       min_score_avg:"avg(min_score)",
>       max_score_avg:"avg(max_score)",
>       avg_score_avg:"avg(avg_score)"
>     }
>   }
> }
> &rows=0
>
> Also, when the facet is simplified, it takes about 4-6 seconds:
>
> fq=channel:345133
> &fq=content_type:PARENT
> &fq=Meta_is_organizationIds:(344996998 344594999 345000001.... total of int 562 values)
> &q=*:*
> &json.facet={
>   "Chart_01_Bins":{
>     type:terms,
>     field:groupIds,
>     mincount:1,
>     limit:-1,
>     numBuckets:true,
>     missing:false,
>     refine:true
>   }
> }
> &rows=0
>
> Schema relevant fields:
>
> <fieldType name="pfloat" class="solr.FloatPointField" docValues="true"/>
> <fieldType name="pint" class="solr.IntPointField" docValues="true"/>
>
> <!-- Currently only 1 value, in the future we expect to have about 25 different values -->
> <field name="channel" type="string" indexed="true" stored="true" required="true" multiValued="false" />
>
> <!-- 2 Possible values (PARENT\CHILD) -->
> <field name="content_type" type="string" indexed="true" stored="true" required="true" multiValued="false" />
>
> <!-- Cardinality of 1K values, document may have 0 to all possible values -->
> <field name="groupIds" type="pint" indexed="true" stored="true" required="false" multiValued="true" />
>
> <!-- Float value between -2 to 2, all documents have this field (applied for the below 3 fields) -->
> <field name="min_score" type="pfloat" indexed="true" stored="true" required="false" multiValued="false" />
> <field name="avg_score" type="pfloat" indexed="true" stored="true" required="false" multiValued="false" />
> <field name="max_score" type="pfloat" indexed="true" stored="true" required="false" multiValued="false" />
>
> <!-- Cardinality of about a few thousand values, currently only 1 dynamic field exists with this prefix, document may have 1 to all possible values -->
> <dynamicField name="Meta_is_*" type="pint" indexed="true" stored="true" multiValued="true" />
>
> I noticed that when we set numBuckets:false, the result returns faster
> (1.5-3.5 seconds less) - that sounds like a performance bug:
> The limit is -1, which means all buckets, so adding significant time
> to the overall time just to get the number of buckets when we will get all of
> them anyway doesn't seem to be right.
>
> Any thoughts?
>
> Thanks
> Artur Rudenko

--
Sincerely yours
Mikhail Khludnev