ElevateIds - should I remove those that might be filtered out by the underlying query?
Hi,

Suppose I have, say, 50 elevateIds, and I have a way to identify which of them would get filtered out of the query by predefined fqs. Those documents would in reality never even be in the results, and hence would never be elevated. Is there any advantage to leaving them out of elevateIds at the time I build the list, i.e. would I gain performance, or does leaving them in make no performance difference?

Thanks!
Mark
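The pruning idea in the question can be sketched as follows (a minimal sketch, not from the original message: `is_filtered_out` is a hypothetical caller-supplied predicate standing in for "this id would be excluded by the predefined fqs"; the `elevateIds` request parameter itself is a comma-separated list of document ids):

```python
def prune_elevate_ids(elevate_ids, is_filtered_out):
    """Drop ids that the predefined fq filters would exclude anyway.

    Assumption (from the question, not verified here): these documents
    can never appear in the result set, so removing them mainly keeps
    the request parameter small rather than changing what gets elevated.
    """
    return [doc_id for doc_id in elevate_ids if not is_filtered_out(doc_id)]


def elevate_ids_param(elevate_ids, is_filtered_out):
    """Build the comma-separated value for the elevateIds request parameter."""
    return ",".join(prune_elevate_ids(elevate_ids, is_filtered_out))


# Example: "b" is known to be excluded by an fq, so it is pruned.
print(elevate_ids_param(["a", "b", "c"], lambda i: i == "b"))  # a,c
```

Whether this pruning is worth doing is exactly the question being asked; the sketch only shows where such a step would sit in request construction.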
Re: Need help in understanding the below error message when running solr-exporter
Can someone help with the above, please?

On Sat, Oct 17, 2020 at 6:22 AM yaswanth kumar wrote:
> Using Solr 8.2; ZooKeeper 3.4; Solr mode: Cloud with multiple collections; Basic Authentication: enabled.
>
> I am trying to run:
>
> export JAVA_OPTS="-Djavax.net.ssl.trustStore=etc/solr-keystore.jks -Djavax.net.ssl.trustStorePassword=solrssl -Dsolr.httpclient.builder.factory=org.apache.solr.client.solrj.impl.PreemptiveBasicAuthClientBuilderFactory -Dbasicauth=solrrocks:"
> export CLASSPATH_PREFIX="../../server/solr-webapp/webapp/WEB-INF/lib/commons-codec-1.11.jar"
> /bin/solr-exporter -p 8085 -z localhost:2181/solr -f ./conf/solr-exporter-config.xml -n 16
>
> and I am seeing the messages below. On the Grafana Solr dashboard I do see the panels coming in, but data is not populating in them.
>
> Can someone tell me if I am missing something in terms of configuration?
>
> WARN - 2020-10-17 11:17:59.687; org.apache.solr.prometheus.scraper.Async; Error occurred during metrics collection =>
> java.util.concurrent.ExecutionException: java.lang.RuntimeException: java.lang.NullPointerException
>   at java.base/java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:395)
> java.util.concurrent.ExecutionException: java.lang.RuntimeException: java.lang.NullPointerException
>   at java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:395) ~[?:?]
>   at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1999) ~[?:?]
>   at org.apache.solr.prometheus.scraper.Async.lambda$null$1(Async.java:45) [solr-prometheus-exporter-8.2.0.jar:8.2.0 31d7ec7bbfdcd2c4cc61d9d35e962165410b65fe - ivera - 2019-07-19 15:10:57]
>   at org.apache.solr.prometheus.scraper.Async$$Lambda$190/.accept(Unknown Source) [solr-prometheus-exporter-8.2.0.jar:8.2.0 31d7ec7bbfdcd2c4cc61d9d35e962165410b65fe - ivera - 2019-07-19 15:10:57]
>   at java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:183) [?:?]
>   at java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:177) [?:?]
>   at java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1654) [?:?]
>   at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:497) [?:?]
>   at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:487) [?:?]
>   at java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:150) [?:?]
>   at java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:173) [?:?]
>   at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:239) [?:?]
>   at java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:497) [?:?]
>   at org.apache.solr.prometheus.scraper.Async.lambda$waitForAllSuccessfulResponses$3(Async.java:43) [solr-prometheus-exporter-8.2.0.jar:8.2.0 31d7ec7bbfdcd2c4cc61d9d35e962165410b65fe - ivera - 2019-07-19 15:10:57]
>   at org.apache.solr.prometheus.scraper.Async$$Lambda$165/.apply(Unknown Source) [solr-prometheus-exporter-8.2.0.jar:8.2.0 31d7ec7bbfdcd2c4cc61d9d35e962165410b65fe - ivera - 2019-07-19 15:10:57]
>   at java.util.concurrent.CompletableFuture.uniExceptionally(CompletableFuture.java:986) [?:?]
>   at java.util.concurrent.CompletableFuture$UniExceptionally.tryFire(CompletableFuture.java:970) [?:?]
>   at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:506) [?:?]
>   at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1705) [?:?]
>   at org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:209) [solr-solrj-8.2.0.jar:8.2.0 31d7ec7bbfdcd2c4cc61d9d35e962165410b65fe - ivera - 2019-07-19 15:11:07]
>   at org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor$$Lambda$142/.run(Unknown Source) [solr-solrj-8.2.0.jar:8.2.0 31d7ec7bbfdcd2c4cc61d9d35e962165410b65fe - ivera - 2019-07-19 15:11:07]
>   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]
>   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]
>   at java.lang.Thread.run(Thread.java:834) [?:?]
> Caused by: java.lang.RuntimeException: java.lang.NullPointerException
>   at org.apache.solr.prometheus.collector.SchedulerMetricsCollector.lambda$collectMetrics$0(SchedulerMetricsCollector.java:92) ~[solr-prometheus-exporter-8.2.0.jar:8.2.0 31d7ec7bbfdcd2c4cc61d9d35e962165410b65fe - ivera - 2019-07-19 15:10:57]
>   at org.apache.solr.prometheus.collector.SchedulerMetricsCollector$$Lambda$163/.get(Unknown Source) ~[?:?]
>   at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1700) >
s3 or other cloud hosted storage options?
Hi all,

Hopefully someone can provide insight. We are looking to see whether there are any viable options for S3 or similar cloud-hosted storage for index/data storage, preferably (if possible) shared between nodes for dynamic scalability needs.

-Mike/NewsRx
security.json help
Hey,

I'm new to configuring Solr. I'm trying to configure Solr with Rule-Based Authorization:
https://lucene.apache.org/solr/guide/8_6/rule-based-authorization-plugin.html

I have permissions working if I allow everything with "all", but I want to limit access so that a site can only access its own collection, in addition to a server ping path, so I'm trying to add the collection-specific permission at the top:

"permissions": [
  { "name": "custom-example", "collection": "example", "path": "*", "role": [ "admin", "example" ] },
  { "name": "custom-collection", "collection": "*", "path": [ "/admin/luke", "/admin/mbeans", "/admin/system" ], "role": "*" },
  { "name": "custom-ping", "collection": null, "path": [ "/admin/info/system" ], "role": "*" },
  { "name": "all", "role": "admin" }
]

The rule "custom-ping" works, and "all" works. But when the above permissions are used, access is denied to the "example" user role for collection "example" at the path "/solr/example/select". If I specify paths explicitly, the permissions work, but I can't get permissions to work with path wildcards for a specific collection.

I also had to declare "custom-collection" with the specific paths needed to get collection info in order for those paths to work. I would've expected these paths to be included in the collection-specific paths and covered by the first rule, but they aren't. For example, the call to "/solr/example/admin/luke" will fail if that path is removed from the rule. I don't really want to specify every single path I might need to use.

Am I using the path wildcard wrong somehow? Is there a better way to do collection-specific authorization for a collection "example"?

Thanks.
- M
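For context, the permissions list above would sit inside a full security.json roughly like this (a hedged sketch, not from the original message: the user names, credential placeholders, and role assignments are assumptions added to make the fragment self-contained):

```json
{
  "authentication": {
    "blockUnknown": true,
    "class": "solr.BasicAuthPlugin",
    "credentials": { "admin": "<hash> <salt>", "example": "<hash> <salt>" }
  },
  "authorization": {
    "class": "solr.RuleBasedAuthorizationPlugin",
    "user-role": { "admin": "admin", "example": "example" },
    "permissions": [
      { "name": "custom-example", "collection": "example",
        "path": "*", "role": [ "admin", "example" ] },
      { "name": "custom-collection", "collection": "*",
        "path": [ "/admin/luke", "/admin/mbeans", "/admin/system" ], "role": "*" },
      { "name": "custom-ping", "collection": null,
        "path": [ "/admin/info/system" ], "role": "*" },
      { "name": "all", "role": "admin" }
    ]
  }
}
```

One relevant behavior from the Rule-Based Authorization docs: Solr evaluates permissions in order and applies the first rule that matches the request, so the position of the collection-specific rule at the top matters.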
Re: Faceting on indexed=false stored=false docValues=true fields
Sorry, correction: taking "the" time.

--
uyilmaz
Re: Faceting on indexed=false stored=false docValues=true fields
Thanks for taking time to write a detailed answer.

We use Solr both to store our data and to perform aggregations, using faceting or streaming expressions. When the required analysis is too complex to do in Solr, we export large query results from Solr to a more capable analysis tool.

So I guess all fields need to be docValues="true", because the export handler and streaming both require fields to have docValues, and even if I won't use a field in queries or facets, it should be available to read in the result set. Fields that won't be searched or faceted can be (indexed=false stored=false docValues=true), right?

--
uyilmaz
Re: Faceting on indexed=false stored=false docValues=true fields
Hmm. Fields used for faceting will also be used for filtering, which is a kind of search. Are docValues OK for filtering? I expect they might be slow the first time, then cached.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/ (my blog)
Re: Faceting on indexed=false stored=false docValues=true fields
uyilmaz:

Hmm, that _is_ confusing. And inaccurate.

In this context, it should read something like:

The Text field should have indexed="true" docValues="false" if used for searching but not faceting, and the String field should have indexed="false" docValues="true" if used for faceting but not searching.

I'll fix this, thanks for pointing this out.

Erick
Re: Faceting on indexed=false stored=false docValues=true fields
As you've observed, it is indeed possible to facet on fields with docValues=true, indexed=false; but in almost all cases you should probably set indexed=true. 1. For distributed facet count refinement, the "indexed" approach is used to look up counts by value. 2. Assuming you want to do something usual, e.g. allow users to apply filters based on facet counts, the filter application would use the "indexed" approach as well. Where indexed=false, if either filtering or distributed refinement is attempted, I'm not 100% sure what happens. It might fail, lead to inconsistent results, or attempt to look up results via the equivalent of a "table scan" over docValues (I think the last of these is what actually happens, fwiw) ... but none of these options is likely desirable.

Michael
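Michael's recommendation can be written as a schema sketch (hedged: the field names are hypothetical; `string` and `plong` are standard Solr field types assumed to be defined in the schema):

```xml
<!-- Faceted/filtered field: index it AND give it docValues, per the advice above -->
<field name="category" type="string" indexed="true" stored="false" docValues="true"/>

<!-- Field used only via the export handler / streaming expressions:
     docValues alone is enough, no index needed -->
<field name="view_count" type="plong" indexed="false" stored="false" docValues="true"/>
```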
Re: Faceting on indexed=false stored=false docValues=true fields
Thanks! This also contributed to my confusion:

https://lucene.apache.org/solr/guide/8_4/faceting.html#field-value-faceting-parameters

"If you want Solr to perform both analysis (for searching) and faceting on the full literal strings, use the copyField directive in your Schema to create two versions of the field: one Text and one String. Make sure both are indexed="true"."

--
uyilmaz
Re: Faceting on indexed=false stored=false docValues=true fields
I think this is all explained quite well in the Ref Guide:
https://lucene.apache.org/solr/guide/8_6/docvalues.html

DocValues is a different way to index/store values. Faceting is a primary use case where docValues are better than what 'indexed=true' gives you.

Regards,
Alex.

On Mon, 19 Oct 2020 at 12:51, uyilmaz wrote:
>
> Hey all,
>
> From my little experiments, I see that (if I didn't make a stupid mistake) we can facet on fields marked with both indexed and stored being false:
>
> <field ... indexed="false" stored="false" docValues="true"/>
>
> I'm surprised by this; I thought I would need to index it. Can you confirm this?
>
> Regards
>
> --
> uyilmaz
Faceting on indexed=false stored=false docValues=true fields
Hey all,

From my little experiments, I see that (if I didn't make a stupid mistake) we can facet on fields marked with both indexed and stored being false:

<field ... indexed="false" stored="false" docValues="true"/>

I'm surprised by this; I thought I would need to index it. Can you confirm this?

Regards

--
uyilmaz
Re: SolrCloud 6.6.2 suddenly crash due to slow queries and Log4j issue
Shawn,

According to the Log4j bug report (https://bz.apache.org/bugzilla/show_bug.cgi?id=57714), the issue is related to a lock taken during iteration of the appenders collection. In addition to the CONSOLE and file appenders in the default log4j.properties, my customer added two extra FileAppenders dedicated to all requests and to slow requests. I suggested removing these two extra appenders.

Regards,
Dominique
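The change Dominique describes amounts to cutting log4j.properties back toward the stock Solr 6.x shape. A hedged sketch of what that might look like (the layout patterns approximate the Solr defaults, and the commented-out appender names are hypothetical stand-ins for whatever the customer's config actually called them):

```properties
# Stock-style Solr 6.x log4j 1.2 config: console + one rolling file appender
log4j.rootLogger=INFO, file, CONSOLE

log4j.appender.CONSOLE=org.apache.log4j.ConsoleAppender
log4j.appender.CONSOLE.layout=org.apache.log4j.PatternLayout
log4j.appender.CONSOLE.layout.ConversionPattern=%-5p - %d{yyyy-MM-dd HH:mm:ss.SSS}; %c; %m%n

log4j.appender.file=org.apache.log4j.RollingFileAppender
log4j.appender.file.File=${solr.log}/solr.log
log4j.appender.file.MaxFileSize=4MB
log4j.appender.file.MaxBackupIndex=9
log4j.appender.file.layout=org.apache.log4j.PatternLayout
log4j.appender.file.layout.ConversionPattern=%-5p - %d{yyyy-MM-dd HH:mm:ss.SSS}; %c; %m%n

# The two extra per-request FileAppenders are removed, e.g. (hypothetical names):
# log4j.appender.allRequests=org.apache.log4j.FileAppender    -- removed
# log4j.appender.slowRequests=org.apache.log4j.FileAppender   -- removed
```

Fewer appenders means fewer targets inside the synchronized appender iteration that the linked bug report describes.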
Re: SolrCloud 6.6.2 suddenly crash due to slow queries and Log4j issue
Hi Shawn,

Thank you for your response. You are confirming my diagnosis.

This is in fact an 8-node cluster with one single collection with 4 shards and 1 replica (8 cores): 4 GB heap and 90 GB RAM. When no issue occurs, nearly 50% of the heap is used.

Num docs in collection: 10,000,000
Num docs per core: more or less 2,500,000
Max doc per core: more or less 3,000,000
Core data size: more or less 70 GB

Here are the JVM settings:

-DSTOP.KEY=solrrocks
-DSTOP.PORT=7983
-Dcom.sun.management.jmxremote
-Dcom.sun.management.jmxremote.authenticate=false
-Dcom.sun.management.jmxremote.local.only=false
-Dcom.sun.management.jmxremote.port=18983
-Dcom.sun.management.jmxremote.rmi.port=18983
-Dcom.sun.management.jmxremote.ssl=false
-Dhost=
-Djava.rmi.server.hostname=XXX
-Djetty.home=/x/server
-Djetty.port=8983
-Dlog4j.configuration=file:/xx/log4j.properties
-Dsolr.install.dir=/xx/solr
-Dsolr.jetty.request.header.size=32768
-Dsolr.log.dir=/xxx/Logs
-Dsolr.log.muteconsole
-Dsolr.solr.home=//data
-Duser.timezone=Europe/Paris
-DzkClientTimeout=3
-DzkHost=xxx
-XX:+CMSParallelRemarkEnabled
-XX:+CMSScavengeBeforeRemark
-XX:+ParallelRefProcEnabled
-XX:+PrintGCApplicationStoppedTime
-XX:+PrintGCDateStamps
-XX:+PrintGCDetails
-XX:+PrintGCTimeStamps
-XX:+PrintHeapAtGC
-XX:+PrintTenuringDistribution
-XX:+UseCMSInitiatingOccupancyOnly
-XX:+UseConcMarkSweepGC
-XX:+UseGCLogFileRotation
-XX:+UseGCLogFileRotation
-XX:+UseParNewGC
-XX:-OmitStackTraceInFastThrow
-XX:CMSInitiatingOccupancyFraction=50
-XX:CMSMaxAbortablePrecleanTime=6000
-XX:ConcGCThreads=4
-XX:GCLogFileSize=20M
-XX:MaxTenuringThreshold=8
-XX:NewRatio=3
-XX:NumberOfGCLogFiles=9
-XX:OnOutOfMemoryError=/xxx/solr/bin/oom_solr.sh 8983 /xx/Logs
-XX:ParallelGCThreads=4
-XX:PretenureSizeThreshold=64m
-XX:SurvivorRatio=4
-XX:TargetSurvivorRatio=90
-Xloggc:/xx/solr_gc.log
-Xloggc:/xx/solr_gc.log
-Xms4g
-Xmx4g
-Xss256k
-verbose:gc

Here is one screenshot of the top command for the node that failed last week.

[image: 2020-10-19 15_48_06-Photos.png]

Regards,
Dominique

On Sun, Oct 18, 2020 at 22:03, Shawn Heisey wrote:
> On 10/18/2020 3:22 AM, Dominique Bejean wrote:
> > A few months ago, I reported an issue with Solr nodes crashing due to the old generation heap growing suddenly and generating OOM. This problem occurred again this week. I have thread dumps for each minute during the 3 minutes the problem occurred. I am using fastthread.io in order to analyse these dumps.
> > * The Log4j issue starts (https://blog.fastthread.io/2020/01/24/log4j-bug-slows-down-your-app/)
>
> If the log4j bug is the root cause here, then the only way you can fix this is to upgrade to at least Solr 7.4. That is the Solr version where we first upgraded from log4j 1.2.x to log4j2. You cannot upgrade log4j in Solr 6.6.2 without changing Solr code. The code changes required were extensive. Note that I did not do anything to confirm whether the log4j bug is responsible here. You seem pretty confident that this is the case.
>
> Note that if you upgrade to 8.x, you will need to reindex from scratch. Upgrading an existing index is possible with one major version bump, but if your index has ever been touched by a release that's two major versions back, it won't work. In 8.x, that is enforced -- 8.x will not even try to read an old index touched by 6.x or earlier.
>
> In the following wiki page, I provided instructions for getting a screenshot of the process listing.
>
> https://cwiki.apache.org/confluence/display/solr/SolrPerformanceProblems
>
> In addition to that screenshot, I would like to know the on-disk size of all the cores running on the problem node, along with a document count from those cores. It might be possible to work around the OOM just by increasing the size of the heap. That won't do anything about problems with log4j.
>
> Thanks,
> Shawn
Re: Improve results/relevance
Hi Jayadevan,

There are a couple of ways to achieve the result you want! Two things you could do, off the top of my head: you could sort the results based on some field, or boost some fields so that they get a higher score.

> On 17 Oct 2020, at 05:51, Jayadevan Maymala wrote:
>
> Hi all,
>
> We have a catalogue of many products, including smart phones. We use the *edismax* query parser. If someone types in iPhone 11, we are getting the correct results, but iPhone 11 Pro is coming before iPhone 11. What options can be used to improve this?
>
> Regards,
> Jayadevan

==
Konstantinos Koukouvis
konstantinos.koukou...@mecenat.com

Using Golang and Solr? Try this: https://github.com/mecenat/solr
Re: Improve results/relevance
Hi,

A few strategies you can use:

1. First you need to know why the result has matched. Solr provides detailed debug info, but it's not easy to interpret. Consider using something like www.splainer.io to give you better visibility (disclaimer: this is something we maintain; there are other alternatives, including a cool Chrome plugin). You can now see where scores are being calculated.

2. Next you should read up on how Lucene/Solr edismax scoring works (remember it's a 'winner takes all' strategy). Here's a great blog by Doug on this: https://opensourceconnections.com/blog/2013/07/02/getting-dissed-by-dismax-why-your-incorrect-assumptions-about-dismax-are-hurting-search-relevancy/ . Now you should know why your results are being ordered as they are.

3. You've now got lots of options. You should set up some tests (perhaps use Quepid? www.quepid.com disclaimer: yes, that's us too :) to monitor what happens as you try each one, and to check for side effects. You could boost exact phrase matches (here's one way to do this: http://everydaydeveloper.blogspot.com/2012/02/solr-improve-relevancy-by-boosting.html), or you could use Querqy, which gives you much more flexibility: https://querqy.org/ (check out SMUI too, as this is a great way to manage Querqy rules).

4. What you're doing is active search tuning for e-commerce, and this won't be the first example you'll come across. You should also implement a system for tracking these kinds of issues, what you do to fix them, and the tests carried out: it's analogous to a bug tracker, and something we call a 'Relevancy Register'. Otherwise you'll end up with a huge pile of hacks and will swiftly forget why they were implemented and what problem they were trying to solve!

5. We're running a blog series about e-commerce search which you might want to follow: https://opensourceconnections.com/blog/2020/07/07/meet-pete-the-e-commerce-search-product-manager/

HTH

Charlie

--
Charlie Hull
OpenSource Connections, previously Flax

tel/fax: +44 (0)8700 118334
mobile: +44 (0)7767 825828
web: www.o19s.com
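To make the exact-phrase boosting idea concrete, here is one common way it is wired up with edismax (a hedged sketch, not from Jayadevan's setup: the field names `name` and `name_exact` and the boost factor are assumptions; `name_exact` would be a copyField of the product name into a String-type field, along the lines of the blog post linked above):

```
q=iPhone 11
&defType=edismax
&qf=name
&bq=name_exact:"iPhone 11"^100
```

Because `name_exact` is an un-tokenized String field, only a document whose name is exactly "iPhone 11" matches the bq clause and gets the boost; "iPhone 11 Pro" matches only the looser qf clause, so the exact product can rank first. Verify the effect on scores with debug output or a tool like Splainer before shipping.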