Hardware-related issue
Hi guys,

We were planning on using 7 physical servers for our Solr nodes, each with 64 vCPUs at 2 GHz and 128 GB RAM, but due to some constraints we had to use a virtual environment that does not have the same number of CPUs. We were advised to use fewer CPUs with a higher clock speed. Do we need to look at L1-L3 cache metrics? Do we need to increase RAM/IOPS proportionally to fill the "gap" left by the smaller number of CPU cores? Any particular suggestions?

Artur Rudenko

This electronic message may contain proprietary and confidential information of Verint Systems Inc., its affiliates and/or subsidiaries. The information is intended to be for the use of the individual(s) or entity(ies) named above. If you are not the intended recipient (or authorized to receive this e-mail for the intended recipient), you may not use, copy, disclose or distribute to anyone this message or any information contained in this message. If you have received this electronic message in error, please notify us by replying to this e-mail.
RE: Filtering large amount of values
Hi Mikhail,

Thank you for the help; with your suggestion we actually managed to improve the results. We now get and store the docValues in this method instead of inside the collect() method:

@Override
protected void doSetNextReader(LeafReaderContext context) throws IOException {
    super.doSetNextReader(context);
    sortedDocValues = DocValues.getSorted(context.reader(), FileFilterPostQuery.this.metaField);
}

We see a big improvement. Is this the most efficient way? Since it's a post filter, we have to return false from the getCache() method. Is there a way to implement it with caching?

Thanks,
Artur Rudenko

-----Original Message-----
From: Mikhail Khludnev
Sent: Thursday, May 14, 2020 2:57 PM
To: solr-user
Subject: Re: Filtering large amount of values

Hi, Artur.
Please don't tell me that you obtain docValues for every doc? That's deadly slow; see https://issues.apache.org/jira/browse/LUCENE-9328 for a related problem. Make sure you obtain them once per segment, when the leaf reader is injected. Recently there are some new method(s) for {!terms}; I'm wondering if any of them might solve the problem.

On Thu, May 14, 2020 at 2:36 PM Rudenko, Artur wrote:
> Hi,
> We have a requirement to implement a boolean filter with up to 500k
> values.
>
> We took the approach of a post filter.
>
> Our environment has 7 servers with 128 GB RAM and 64 CPUs each. We
> have 20-40M very large documents. Each Solr instance has 64 shards
> with 2 replicas, and the JVM Xms and Xmx are set to 31 GB.
>
> We are seeing that a single post filter with 1000 values on 20M
> documents takes about 4.5 seconds.
>
> Logic in our collect() method:
>
> numericDocValues =
>     reader.getNumericDocValues(FileFilterPostQuery.this.metaField);
> if (numericDocValues != null && numericDocValues.advanceExact(docNumber)) {
>     longVal = numericDocValues.longValue();
> } else {
>     return;
> }
> if (numericValuesSet.contains(longVal)) {
>     super.collect(docNumber);
> }
>
> Is it the best we can get?
>
> Thanks,
> Artur Rudenko

--
Sincerely yours
Mikhail Khludnev
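Mikhail's mention of new method(s) for {!terms} refers to Solr's terms query parser, which can express a large value-set filter as an ordinary fq clause, and ordinary fq clauses are cacheable in the filterCache (unlike a post filter that returns false from getCache()). A hypothetical request fragment, reusing the metaField name from above (the choice of method and the example values are assumptions, not something tested in this thread; docValuesTermsEnum evaluates the set against docValues, much like the custom filter does):

```
fq={!terms f=metaField method=docValuesTermsEnum}344996998,344594999,34501
```

Whether this beats the hand-written post filter at 500k values would need to be measured; this is a sketch of an alternative, not a confirmed fix.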
Filtering large amount of values
Hi,

We have a requirement to implement a boolean filter with up to 500k values. We took the approach of a post filter.

Our environment has 7 servers with 128 GB RAM and 64 CPUs each. We have 20-40M very large documents. Each Solr instance has 64 shards with 2 replicas, and the JVM Xms and Xmx are set to 31 GB.

We are seeing that a single post filter with 1000 values on 20M documents takes about 4.5 seconds.

Logic in our collect() method:

numericDocValues =
    reader.getNumericDocValues(FileFilterPostQuery.this.metaField);
if (numericDocValues != null && numericDocValues.advanceExact(docNumber)) {
    longVal = numericDocValues.longValue();
} else {
    return;
}
if (numericValuesSet.contains(longVal)) {
    super.collect(docNumber);
}

Is this the best we can get?

Thanks,
Artur Rudenko
RE: Possible performance bug - JSON facet - numBuckets:true
Update: I started working on a fix for this issue, and I found that the "numBuckets" result in the original implementation is not accurate.

Query using my fix with limit:-1:

{
  "responseHeader":{
    "zkConnected":true,
    "status":0,
    "QTime":31,
    "params":{
      "q":"*:*",
      "json.facet":"{\"Chart_01_Bins\":{type:terms, field:date, mincount:1, limit:-1, numBuckets:true, missing:false, refine:true }}",
      "rows":"0"}},
  "response":{"numFound":170500,"start":0,"maxScore":1.0,"docs":[]},
  "facets":{
    "count":170500,
    "Chart_01":{
      "numBuckets":2660,
      "buckets":[
        {"val":"2019-01-16T15:17:03Z","count":749},
        {"val":"2019-01-23T21:46:44Z","count":742},
        {"val":"2019-01-04T11:06:22Z","count":603},
        {"val":"2019-01-08T01:08:58Z","count":484},
        ...
        {"val":"2019-01-26T06:30:33Z","count":3}]}}}

Query with a high limit that should include all buckets, using the current Solr implementation:

{
  "responseHeader":{
    "zkConnected":true,
    "status":0,
    "QTime":29,
    "params":{
      "q":"*:*",
      "json.facet":"{\"Chart_01_Bins\":{type:terms, field:date, mincount:1, limit:5000, numBuckets:true, missing:false, refine:true }}",
      "rows":"0"}},
  "response":{"numFound":170500,"start":0,"maxScore":1.0,"docs":[]},
  "facets":{
    "count":170500,
    "Chart_01_Bins":{
      "numBuckets":2671,
      "buckets":[
        {"val":"2019-01-16T15:17:03Z","count":749},
        {"val":"2019-01-23T21:46:44Z","count":742},
        {"val":"2019-01-04T11:06:22Z","count":603},
        {"val":"2019-01-08T01:08:58Z","count":484},
        ...
        {"val":"2019-01-26T06:30:33Z","count":3}]}}}

There are 2660 buckets (the result of my fix), while the original Solr implementation claims there are 2671 buckets (11 more). The two responses were compared with a diff tool, and apart from QTime, the different limit value and the numBuckets value, they were identical (I decided not to paste all of the buckets, but both responses contained the same 2660 buckets, not 2671). I also could not find in the docs that numBuckets is an estimation. For low-cardinality fields, the result was accurate.
Is this the expected behavior?

Artur Rudenko

-----Original Message-----
From: Mikhail Khludnev
Sent: Tuesday, March 10, 2020 8:46 AM
To: solr-user
Subject: Re: Possible performance bug - JSON facet - numBuckets:true

Hello, Artur.
Thanks for your interest. Perhaps we can amend the doc to mention this effect. In the long term it can be optimized by adding a proper condition. Both patches are welcome.

On Wed, Feb 12, 2020 at 10:48 PM Rudenko, Artur wrote:
> Hello everyone,
> I am currently investigating a performance issue in our environment,
> and it looks like we found a performance bug.
> Our environment:
> 20M large PARENT documents and 800M nested small CHILD documents.
> The system inserts about 400K PARENT documents and 16M CHILD documents
> per day. (Currently we have stopped the call insertion to investigate the
> performance issue.) This is a Solr Cloud 8.3 environment with 7 servers
> (64 vCPUs and 128 GB RAM each, 24 GB allocated to Solr) with a single
> collection (32 shards and replication factor 2).
>
> The below query runs in about 14-16 seconds (we have to use limit:-1
> due to a business case - cardinality is 1K values).
>
> fq=channel:345133
> &fq=content_type:PARENT
> &fq=Meta_is_organizationIds:(344996998 344594999 34501 ... a total of 562 int values)
> &q=*:*
> &json.facet={
>   "Chart_01_Bins":{
>     type:terms,
>     field:groupIds,
>     mincount:1,
>     limit:-1,
>     numBuckets:true,
>     missing:false,
>
RE: Possible performance bug - JSON facet - numBuckets:true
Guys?

Artur Rudenko

-----Original Message-----
From: Rudenko, Artur
Sent: Saturday, February 15, 2020 12:50 PM
To: solr-user@lucene.apache.org
Subject: RE: Possible performance bug - JSON facet - numBuckets:true

Promoting my question

Thanks,
Artur Rudenko

From: Rudenko, Artur
Sent: Wednesday, February 12, 2020 9:48 PM
To: solr-user@lucene.apache.org
Subject: Possible performance bug - JSON facet - numBuckets:true

Hello everyone,
I am currently investigating a performance issue in our environment, and it looks like we found a performance bug.

Our environment: 20M large PARENT documents and 800M nested small CHILD documents. The system inserts about 400K PARENT documents and 16M CHILD documents per day. (Currently we have stopped the call insertion to investigate the performance issue.) This is a Solr Cloud 8.3 environment with 7 servers (64 vCPUs and 128 GB RAM each, 24 GB allocated to Solr) with a single collection (32 shards and replication factor 2).

The below query runs in about 14-16 seconds (we have to use limit:-1 due to a business case - cardinality is 1K values).
fq=channel:345133
&fq=content_type:PARENT
&fq=Meta_is_organizationIds:(344996998 344594999 34501 ... a total of 562 int values)
&q=*:*
&json.facet={
  "Chart_01_Bins":{
    type:terms,
    field:groupIds,
    mincount:1,
    limit:-1,
    numBuckets:true,
    missing:false,
    refine:true,
    facet:{
      min_score_avg:"avg(min_score)",
      max_score_avg:"avg(max_score)",
      avg_score_avg:"avg(avg_score)"
    }
  },
  "Chart_01_FIELD_NOT_EXISTS":{
    type:query,
    q:"-groupIds:[* TO *]",
    facet:{
      min_score_avg:"avg(min_score)",
      max_score_avg:"avg(max_score)",
      avg_score_avg:"avg(avg_score)"
    }
  }
}
&rows=0

Also, when the facet is simplified, it takes about 4-6 seconds:

fq=channel:345133
&fq=content_type:PARENT
&fq=Meta_is_organizationIds:(344996998 344594999 34501 ... a total of 562 int values)
&q=*:*
&json.facet={
  "Chart_01_Bins":{
    type:terms,
    field:groupIds,
    mincount:1,
    limit:-1,
    numBuckets:true,
    missing:false,
    refine:true
  }
}
&rows=0

Schema relevant fields:

I noticed that when we set numBuckets:false, the result returns faster (1.5-3.5 seconds less) - that sounds like a performance bug: the limit is -1, which means all buckets, so adding significant time to the overall query just to get the number of buckets, when we will get all of them anyway, doesn't seem right.

Any thoughts?

Thanks,
Artur Rudenko
RE: Slow queries and facets
Promoting my question

Artur Rudenko

-----Original Message-----
From: Rudenko, Artur
Sent: Wednesday, February 12, 2020 10:33 PM
To: solr-user@lucene.apache.org
Subject: Slow queries and facets

Hello everyone,
I am currently investigating a performance issue in our environment: 20M large PARENT documents and 800M nested small CHILD documents. The system inserts about 400K PARENT documents and 16M CHILD documents per day. (Currently we have stopped the call insertion to investigate the performance issue.) This is a Solr Cloud 8.3 environment with 7 servers (64 vCPUs and 128 GB RAM each, 24 GB allocated to Solr) with a single collection (32 shards and replication factor 2).

We experience generally slow query times (about 4-7 seconds) and facet times. The below query runs in about 14-16 seconds (we have to use limit:-1 due to a business case - cardinality is 1K values).

fq=channel:345133
&fq=content_type:PARENT
&fq=Meta_is_organizationIds:(344996998 344594999 34501 ... a total of 562 int values)
&q=*:*
&json.facet={
  "Chart_01_Bins":{
    type:terms,
    field:groupIds,
    mincount:1,
    limit:-1,
    numBuckets:true,
    missing:false,
    refine:true,
    facet:{
      min_score_avg:"avg(min_score)",
      max_score_avg:"avg(max_score)",
      avg_score_avg:"avg(avg_score)"
    }
  },
  "Chart_01_FIELD_NOT_EXISTS":{
    type:query,
    q:"-groupIds:[* TO *]",
    facet:{
      min_score_avg:"avg(min_score)",
      max_score_avg:"avg(max_score)",
      avg_score_avg:"avg(avg_score)"
    }
  }
}
&rows=0

Also, when the facet is simplified, it takes about 4-6 seconds:

fq=channel:345133
&fq=content_type:PARENT
&fq=Meta_is_organizationIds:(344996998 344594999 34501 ... a total of 562 int values)
&q=*:*
&json.facet={
  "Chart_01_Bins":{
    type:terms,
    field:groupIds,
    mincount:1,
    limit:-1,
    numBuckets:true,
    missing:false,
    refine:true
  }
}
&rows=0

Schema relevant fields:

Any suggestions on how to proceed with the investigation? Right now we are trying to figure out if using a single shard on each machine will help.
Artur Rudenko
Analytics Developer
Customer Engagement Solutions, VERINT
T +972.74.747.2536 | M +972.52.425.4686
RE: Possible performance bug - JSON facet - numBuckets:true
Promoting my question

Thanks,
Artur Rudenko

From: Rudenko, Artur
Sent: Wednesday, February 12, 2020 9:48 PM
To: solr-user@lucene.apache.org
Subject: Possible performance bug - JSON facet - numBuckets:true

Hello everyone,
I am currently investigating a performance issue in our environment, and it looks like we found a performance bug.

Our environment: 20M large PARENT documents and 800M nested small CHILD documents. The system inserts about 400K PARENT documents and 16M CHILD documents per day. (Currently we have stopped the call insertion to investigate the performance issue.) This is a Solr Cloud 8.3 environment with 7 servers (64 vCPUs and 128 GB RAM each, 24 GB allocated to Solr) with a single collection (32 shards and replication factor 2).

The below query runs in about 14-16 seconds (we have to use limit:-1 due to a business case - cardinality is 1K values).

fq=channel:345133
&fq=content_type:PARENT
&fq=Meta_is_organizationIds:(344996998 344594999 34501 ... a total of 562 int values)
&q=*:*
&json.facet={
  "Chart_01_Bins":{
    type:terms,
    field:groupIds,
    mincount:1,
    limit:-1,
    numBuckets:true,
    missing:false,
    refine:true,
    facet:{
      min_score_avg:"avg(min_score)",
      max_score_avg:"avg(max_score)",
      avg_score_avg:"avg(avg_score)"
    }
  },
  "Chart_01_FIELD_NOT_EXISTS":{
    type:query,
    q:"-groupIds:[* TO *]",
    facet:{
      min_score_avg:"avg(min_score)",
      max_score_avg:"avg(max_score)",
      avg_score_avg:"avg(avg_score)"
    }
  }
}
&rows=0

Also, when the facet is simplified, it takes about 4-6 seconds:

fq=channel:345133
&fq=content_type:PARENT
&fq=Meta_is_organizationIds:(344996998 344594999 34501 ... a total of 562 int values)
&q=*:*
&json.facet={
  "Chart_01_Bins":{
    type:terms,
    field:groupIds,
    mincount:1,
    limit:-1,
    numBuckets:true,
    missing:false,
    refine:true
  }
}
&rows=0

Schema relevant fields:

I noticed that when we set numBuckets:false, the result returns faster (1.5-3.5 seconds less) - that sounds like a performance bug: the limit is -1, which means all buckets, so adding significant time to the overall query just to get the number of buckets, when we will get all of them anyway, doesn't seem right.

Any thoughts?

Thanks,
Artur Rudenko
Slow queries and facets
Hello everyone,
I am currently investigating a performance issue in our environment: 20M large PARENT documents and 800M nested small CHILD documents. The system inserts about 400K PARENT documents and 16M CHILD documents per day. (Currently we have stopped the call insertion to investigate the performance issue.) This is a Solr Cloud 8.3 environment with 7 servers (64 vCPUs and 128 GB RAM each, 24 GB allocated to Solr) with a single collection (32 shards and replication factor 2).

We experience generally slow query times (about 4-7 seconds) and facet times. The below query runs in about 14-16 seconds (we have to use limit:-1 due to a business case - cardinality is 1K values).

fq=channel:345133
&fq=content_type:PARENT
&fq=Meta_is_organizationIds:(344996998 344594999 34501 ... a total of 562 int values)
&q=*:*
&json.facet={
  "Chart_01_Bins":{
    type:terms,
    field:groupIds,
    mincount:1,
    limit:-1,
    numBuckets:true,
    missing:false,
    refine:true,
    facet:{
      min_score_avg:"avg(min_score)",
      max_score_avg:"avg(max_score)",
      avg_score_avg:"avg(avg_score)"
    }
  },
  "Chart_01_FIELD_NOT_EXISTS":{
    type:query,
    q:"-groupIds:[* TO *]",
    facet:{
      min_score_avg:"avg(min_score)",
      max_score_avg:"avg(max_score)",
      avg_score_avg:"avg(avg_score)"
    }
  }
}
&rows=0

Also, when the facet is simplified, it takes about 4-6 seconds:

fq=channel:345133
&fq=content_type:PARENT
&fq=Meta_is_organizationIds:(344996998 344594999 34501 ... a total of 562 int values)
&q=*:*
&json.facet={
  "Chart_01_Bins":{
    type:terms,
    field:groupIds,
    mincount:1,
    limit:-1,
    numBuckets:true,
    missing:false,
    refine:true
  }
}
&rows=0

Schema relevant fields:

Any suggestions on how to proceed with the investigation? Right now we are trying to figure out if using a single shard on each machine will help.

Artur Rudenko
Possible performance bug - JSON facet - numBuckets:true
Hello everyone,
I am currently investigating a performance issue in our environment, and it looks like we found a performance bug.

Our environment: 20M large PARENT documents and 800M nested small CHILD documents. The system inserts about 400K PARENT documents and 16M CHILD documents per day. (Currently we have stopped the call insertion to investigate the performance issue.) This is a Solr Cloud 8.3 environment with 7 servers (64 vCPUs and 128 GB RAM each, 24 GB allocated to Solr) with a single collection (32 shards and replication factor 2).

The below query runs in about 14-16 seconds (we have to use limit:-1 due to a business case - cardinality is 1K values).

fq=channel:345133
&fq=content_type:PARENT
&fq=Meta_is_organizationIds:(344996998 344594999 34501 ... a total of 562 int values)
&q=*:*
&json.facet={
  "Chart_01_Bins":{
    type:terms,
    field:groupIds,
    mincount:1,
    limit:-1,
    numBuckets:true,
    missing:false,
    refine:true,
    facet:{
      min_score_avg:"avg(min_score)",
      max_score_avg:"avg(max_score)",
      avg_score_avg:"avg(avg_score)"
    }
  },
  "Chart_01_FIELD_NOT_EXISTS":{
    type:query,
    q:"-groupIds:[* TO *]",
    facet:{
      min_score_avg:"avg(min_score)",
      max_score_avg:"avg(max_score)",
      avg_score_avg:"avg(avg_score)"
    }
  }
}
&rows=0

Also, when the facet is simplified, it takes about 4-6 seconds:

fq=channel:345133
&fq=content_type:PARENT
&fq=Meta_is_organizationIds:(344996998 344594999 34501 ... a total of 562 int values)
&q=*:*
&json.facet={
  "Chart_01_Bins":{
    type:terms,
    field:groupIds,
    mincount:1,
    limit:-1,
    numBuckets:true,
    missing:false,
    refine:true
  }
}
&rows=0

Schema relevant fields:

I noticed that when we set numBuckets:false, the result returns faster (1.5-3.5 seconds less) - that sounds like a performance bug: the limit is -1, which means all buckets, so adding significant time to the overall query just to get the number of buckets, when we will get all of them anyway, doesn't seem right.

Any thoughts?
Thanks,
Artur Rudenko
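A side note on the observation above: since limit:-1 returns every bucket anyway, one possible workaround (our own sketch, not something proposed in the thread) is to drop numBuckets:true from the request and derive the count client-side from the returned bucket list:

```java
import java.util.List;

public class BucketCount {
    // With limit:-1 the facet response already contains every bucket,
    // so the bucket count is simply the size of the returned list and
    // numBuckets:true adds server-side work for no extra information.
    static int numBuckets(List<?> buckets) {
        return buckets.size();
    }

    public static void main(String[] args) {
        // Hypothetical parsed "buckets" values from a facet response.
        List<String> buckets = List.of("2019-01-16T15:17:03Z", "2019-01-23T21:46:44Z");
        System.out.println(numBuckets(buckets)); // 2
    }
}
```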
RE: Possible performance issue in my environment setup
Thanks for helping, I will keep investigating. Just a note: we did stop indexing, and we did not see any significant change.

Artur Rudenko

-----Original Message-----
From: Erick Erickson
Sent: Tuesday, February 11, 2020 4:16 PM
To: solr-user@lucene.apache.org
Subject: Re: Possible performance issue in my environment setup

My first bit of advice would be to fix your autocommit intervals. There's not much point in having openSearcher set to true _and_ having your soft commit times also set; all a soft commit does is open a searcher, and your autoCommit already does that. I'd also reduce the time for autoCommit. You're _probably_ being saved by the maxDocs entry. The fix here is to set openSearcher=false in autoCommit and reduce the time, and let soft commit handle opening searchers. Here's more than you want to know about how all this works:
https://lucidworks.com/post/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/

Given your observation that you see a new searcher being opened 65K times, my bet is that you're somehow committing far, far too often. What's the rate of opening new searchers? Do those 65K entries span an hour? 10 days? Either you're sending 50K docs very frequently or your client is sending commits.

So here's what I'd do as a quick-n-dirty triage of where to look first:
- First, turn off indexing. Does your query performance improve? If so, consider autowarming and tuning your commit interval.
- Next, add &debug=timing to some of your queries. That'll tell you if a particular _component_ is taking a long time, something like faceting, say.
- If nothing jumps out, throw a profiler at Solr to see where it's spending its time.

Best,
Erick

> On Feb 11, 2020, at 6:17 AM, Rudenko, Artur wrote:
>
> I am currently investigating a performance issue in our environment (20M
> large PARENT documents and 800M nested small CHILD documents).
> The system inserts about 400K PARENT documents and 16M CHILD documents per day.
> This is a Solr Cloud 8.3 environment with 7 servers (64 vCPUs and 128 GB RAM
> each, 24 GB allocated to Solr) with a single collection (32 shards and
> replication factor 2).
>
> Solr config related info:
>
> <autoCommit>
>   <maxTime>${solr.autoCommit.maxTime:360}</maxTime>
>   <maxDocs>${solr.autoCommit.maxDocs:5}</maxDocs>
>   <openSearcher>true</openSearcher>
> </autoCommit>
>
> <autoSoftCommit>
>   <maxTime>${solr.autoSoftCommit.maxTime:30}</maxTime>
> </autoSoftCommit>
>
> I found the following line in the Solr log:
>
> [2020-02-10T00:01:00.522] INFO [qtp1686100174-100525]
> org.apache.solr.search.SolrIndexSearcher Opening
> [Searcher@37c9205b[0_shard29_replica_n112] realtime]
>
> In a log with 100K records, the above log record appears 65K times.
>
> We are experiencing extremely slow query times, while the indexing time is
> fast and sufficient.
>
> Is this a possible direction to keep investigating? If so, any advice?
>
> Thanks,
> Artur Rudenko
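Erick's advice above, rendered as a solrconfig.xml fragment, might look roughly like the following. This is a sketch under assumptions: the interval values are illustrative placeholders, not recommendations from the thread, and the right numbers depend on indexing rate and visibility requirements.

```xml
<!-- Hard commit: flush/fsync the index regularly, but do NOT open a searcher -->
<autoCommit>
  <maxTime>${solr.autoCommit.maxTime:60000}</maxTime>
  <openSearcher>false</openSearcher>
</autoCommit>

<!-- Soft commit: the only place new searchers are opened, controlling
     how fresh the documents visible to queries are -->
<autoSoftCommit>
  <maxTime>${solr.autoSoftCommit.maxTime:300000}</maxTime>
</autoSoftCommit>
```

With openSearcher=false on the hard commit, a new searcher is opened only at each soft commit interval, so searcher churn (and cache/autowarming pressure) is bounded regardless of how often segments are flushed.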
Possible performance issue in my environment setup
I am currently investigating a performance issue in our environment (20M large PARENT documents and 800M nested small CHILD documents). The system inserts about 400K PARENT documents and 16M CHILD documents per day. This is a Solr Cloud 8.3 environment with 7 servers (64 vCPUs and 128 GB RAM each, 24 GB allocated to Solr) with a single collection (32 shards and replication factor 2).

Solr config related info:

<autoCommit>
  <maxTime>${solr.autoCommit.maxTime:360}</maxTime>
  <maxDocs>${solr.autoCommit.maxDocs:5}</maxDocs>
  <openSearcher>true</openSearcher>
</autoCommit>

<autoSoftCommit>
  <maxTime>${solr.autoSoftCommit.maxTime:30}</maxTime>
</autoSoftCommit>

I found the following line in the Solr log:

[2020-02-10T00:01:00.522] INFO [qtp1686100174-100525] org.apache.solr.search.SolrIndexSearcher Opening [Searcher@37c9205b[0_shard29_replica_n112] realtime]

In a log with 100K records, the above log record appears 65K times.

We are experiencing extremely slow query times, while the indexing time is fast and sufficient.

Is this a possible direction to keep investigating? If so, any advice?

Thanks,
Artur Rudenko
Solr facet response strange behaviour
I'm trying to parse a facet response, but sometimes the count is returned as a Long and sometimes as an Integer (on different environments). The error is:

"java.lang.ClassCastException: java.lang.Integer cannot be cast to java.lang.Long"

Can you please explain why this happens? Why is it not consistent? I know the workaround of using the Number class and its longValue() method, but I want to understand the root cause before applying it.

Artur Rudenko
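For reference, a minimal sketch of the Number-based workaround mentioned above. The surrounding names are hypothetical (in a real client the value would come from the parsed facet response, e.g. a Map or NamedList); the point is that a count that fits in an int may deserialize as Integer while a larger one arrives as Long, so casting to the common supertype Number and widening explicitly is safe either way:

```java
import java.util.HashMap;
import java.util.Map;

public class FacetCountDemo {
    // Deserialization may box a numeric count as Integer or Long depending
    // on its magnitude, so cast to Number and widen instead of casting to Long.
    static long countOf(Object rawCount) {
        return ((Number) rawCount).longValue();
    }

    public static void main(String[] args) {
        // Hypothetical facet entries as a parser might hand them back.
        Map<String, Object> facet = new HashMap<>();
        facet.put("small", 42);             // fits in an int -> boxed as Integer
        facet.put("large", 5_000_000_000L); // exceeds int range -> boxed as Long
        System.out.println(countOf(facet.get("small")));  // 42
        System.out.println(countOf(facet.get("large")));  // 5000000000
    }
}
```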
Type of auto suggest feature
Hi,

I am quite new to Solr, and I am interested in implementing a sort of auto term suggestion (not auto-complete) feature based on the user's query. Users build a query (on multiple fields), and I am trying to help them refine it by suggesting additional terms based on the current query. The suggestions should contain synonyms and different word forms (query: close; results: closed, closing) and also some other "interesting" (hard to define what "interesting" is) terms and phrases based on that search.

The queries are performed on a text field of about 1000 words, over document sets of about 20-50M.

So far I have come up with a solution that uses the Suggester component over the 1000-word text field (a copy field), as shown below, and I am trying to find how to add more "interesting" terms and phrases based on the text field.

Thanks,
Artur Rudenko
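The suggester configuration referenced as "shown below" appears to have been stripped from the archive. A typical SuggestComponent definition over a copy field might look roughly like the following sketch (the component name, field names, and choice of lookup implementation are placeholder assumptions, not the original configuration):

```xml
<searchComponent name="suggest" class="solr.SuggestComponent">
  <lst name="suggester">
    <str name="name">termSuggester</str>
    <!-- FreeTextLookupFactory suggests terms and short phrases drawn
         from the indexed text rather than whole field values -->
    <str name="lookupImpl">FreeTextLookupFactory</str>
    <str name="dictionaryImpl">DocumentDictionaryFactory</str>
    <str name="field">text_all</str>
    <str name="suggestAnalyzerFieldType">text_general</str>
  </lst>
</searchComponent>
```

Synonym and word-form suggestions (close -> closed, closing) are usually handled by the analysis chain of the suggest field (synonym filter, stemming) rather than by the lookup implementation itself.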