Hello, Artur.

Thanks for your interest.
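Perhaps we can amend the documentation to mention this effect. In the long
term, it could be optimized by adding a proper condition (e.g. skipping the
separate numBuckets computation when limit:-1 already returns every bucket).
Both patches are welcome.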
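In the meantime, a client-side workaround is possible. Below is a minimal Python sketch, not part of Solr itself. Because limit:-1 already returns every bucket that meets mincount, the bucket count can be taken from the length of the returned "buckets" list, and numBuckets can be dropped from the request. The sketch assumes the usual JSON Facet response shape ("facets" -> facet name -> "buckets"). The helper name count_buckets and the sample response values are hypothetical.

```python
import json

# Facet request mirroring the simplified query from the thread, but without
# numBuckets; the bucket count is derived on the client instead.
facet_request = {
    "Chart_01_Bins": {
        "type": "terms",
        "field": "groupIds",
        "mincount": 1,
        "limit": -1,       # every bucket meeting mincount is returned
        "missing": False,  # no extra "missing" bucket in the response
        "refine": True,
    }
}
params = {"q": "*:*", "rows": 0, "json.facet": json.dumps(facet_request)}


def count_buckets(response: dict, facet_name: str) -> int:
    """Hypothetical helper: bucket count of a terms facet requested with limit:-1."""
    facet = response.get("facets", {}).get(facet_name)
    if facet is None:
        # The facet may be absent, e.g. when no documents match the filters.
        return 0
    return len(facet.get("buckets", []))


# Made-up response, shaped like a JSON Facet API result.
sample = {
    "facets": {
        "count": 3,
        "Chart_01_Bins": {
            "buckets": [
                {"val": 101, "count": 2},
                {"val": 202, "count": 1},
            ]
        },
    }
}
print(count_buckets(sample, "Chart_01_Bins"))  # 2
```

This only holds while limit is -1 and neither missing nor allBuckets is requested; with a finite limit, len(buckets) is just the page size, and numBuckets is still needed.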

On Wed, Feb 12, 2020 at 10:48 PM Rudenko, Artur <artur.rude...@verint.com>
wrote:

> Hello everyone,
> I am currently investigating a performance issue in our environment, and
> it looks like we have found a performance bug.
> Our environment:
> 20M large PARENT documents and 800M nested small CHILD documents.
> The system inserts about 400K PARENT documents and 16M CHILD documents per
> day. (We have currently stopped inserting calls while we investigate the
> performance issue.)
> This is a SolrCloud 8.3 environment with 7 servers (64 vCPUs and 128 GB RAM
> each, 24 GB allocated to Solr) and a single collection (32 shards,
> replication factor 2).
>
> The query below runs in about 14-16 seconds (we have to use limit:-1
> because of a business requirement; the field cardinality is 1K values).
>
> fq=channel:345133
> &fq=content_type:PARENT
> &fq=Meta_is_organizationIds:(344996998 344594999 345000001.... a total of
> 562 int values)
> &q=*:*
> &json.facet={
>   "Chart_01_Bins":{
>     type:terms,
>     field:groupIds,
>     mincount:1,
>     limit:-1,
>     numBuckets:true,
>     missing:false,
>     refine:true,
>     facet:{
>       min_score_avg:"avg(min_score)",
>       max_score_avg:"avg(max_score)",
>       avg_score_avg:"avg(avg_score)"
>     }
>   },
>   "Chart_01_FIELD_NOT_EXISTS":{
>     type:query,
>     q:"-groupIds:[* TO *]",
>     facet:{
>       min_score_avg:"avg(min_score)",
>       max_score_avg:"avg(max_score)",
>       avg_score_avg:"avg(avg_score)"
>     }
>   }
> }
> &rows=0
>
> When the facet is simplified (no sub-facets and no query facet), the query
> takes about 4-6 seconds:
>
> fq=channel:345133
> &fq=content_type:PARENT
> &fq=Meta_is_organizationIds:(344996998 344594999 345000001.... a total of
> 562 int values)
> &q=*:*
> &json.facet={
>   "Chart_01_Bins":{
>     type:terms,
>     field:groupIds,
>     mincount:1,
>     limit:-1,
>     numBuckets:true,
>     missing:false,
>     refine:true
>   }
> }
> &rows=0
>
> Relevant schema fields:
>
> <fieldType name="pfloat" class="solr.FloatPointField" docValues="true"/>
> <fieldType name="pint" class="solr.IntPointField" docValues="true"/>
>
> <!-- Currently only 1 value; in the future we expect about 25
> different values -->
> <field name="channel" type="string" indexed="true" stored="true"
> required="true" multiValued="false" />
>
> <!-- 2 possible values (PARENT/CHILD) -->
> <field name="content_type" type="string" indexed="true" stored="true"
> required="true" multiValued="false" />
>
> <!-- Cardinality of 1K values; a document may have anywhere from 0 to all
> possible values -->
> <field name="groupIds" type="pint" indexed="true" stored="true"
> required="false" multiValued="true" />
>
> <!-- Float value between -2 and 2; all documents have this field (applies
> to the 3 fields below) -->
> <field name="min_score" type="pfloat" indexed="true" stored="true"
> required="false" multiValued="false" />
> <field name="avg_score" type="pfloat" indexed="true" stored="true"
> required="false" multiValued="false" />
> <field name="max_score" type="pfloat" indexed="true" stored="true"
> required="false" multiValued="false" />
>
> <!-- Cardinality of a few thousand values; currently only 1 dynamic field
> exists with this prefix; a document may have from 1 to all possible values
> -->
> <dynamicField name="Meta_is_*" type="pint" indexed="true" stored="true"
> multiValued="true" />
>
>
>
> I noticed that when we set numBuckets:false, the query returns faster
> (1.5-3.5 seconds less). That looks like a performance bug: the limit is -1,
> which means all buckets are returned, so spending significant extra time
> just to compute the number of buckets, when we get all of them anyway,
> doesn't seem right.
>
> Any thoughts?
>
>
> Thanks
> Artur Rudenko
>
>


-- 
Sincerely yours
Mikhail Khludnev
