ElevateIds - should I remove those that might be filtered off in the underlying query

2020-10-19 Thread Mark Robinson
Hi,

Suppose I have, say, 50 elevateIds, and I have a way to identify those that
would get filtered out of the query by predefined fqs. So in reality they
would never even appear in the results, and hence would never be elevated.

Is there any advantage to leaving them out when building the elevateIds?
That is, can I gain performance by omitting them, or does keeping them in
the elevateIds make no performance difference?

Thanks!
Mark
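
The idea in the question can be sketched in code. This is only an illustration: the predicate `would_survive_fqs` is a hypothetical stand-in for whatever mechanism identifies the IDs the predefined fqs would exclude; it is not part of any Solr client API.

```python
# Sketch: drop elevate IDs that a predefined fq would exclude anyway,
# before building the elevateIds parameter. would_survive_fqs() is a
# hypothetical predicate standing in for however those IDs are identified.

def build_elevate_param(elevate_ids, would_survive_fqs):
    """Return the elevateIds parameter value, keeping only IDs that can
    actually appear in the filtered result set."""
    kept = [doc_id for doc_id in elevate_ids if would_survive_fqs(doc_id)]
    return ",".join(kept)

# Example: IDs "b" and "d" are known to be excluded by the fqs.
ids = ["a", "b", "c", "d"]
param = build_elevate_param(ids, lambda i: i not in {"b", "d"})
print(param)  # a,c
```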


Re: Need help in understanding the below error message when running solr-exporter

2020-10-19 Thread yaswanth kumar
Can someone help with the above, please?

On Sat, Oct 17, 2020 at 6:22 AM yaswanth kumar 
wrote:

> Using Solr 8.2; Zoo 3.4; Solr mode: Cloud with multiple collections; Basic
> Authentication: Enabled
>
> I am trying to run the
>
> export JAVA_OPTS="-Djavax.net.ssl.trustStore=etc/solr-keystore.jks
> -Djavax.net.ssl.trustStorePassword=solrssl
> -Dsolr.httpclient.builder.factory=org.apache.solr.client.solrj.impl.PreemptiveBasicAuthClientBuilderFactory
> -Dbasicauth=solrrocks:"
>
> export
> CLASSPATH_PREFIX="../../server/solr-webapp/webapp/WEB-INF/lib/commons-codec-1.11.jar"
>
> /bin/solr-exporter -p 8085 -z localhost:2181/solr -f
> ./conf/solr-exporter-config.xml -n 16
>
> and I am seeing the messages below. On the Grafana Solr dashboard I do see
> panels coming up, but data is not populating in them.
>
> Can someone tell me if I am missing something in terms of configuration?
>
> WARN  - 2020-10-17 11:17:59.687; org.apache.solr.prometheus.scraper.Async;
> Error occurred during metrics collection =>
> java.util.concurrent.ExecutionException: java.lang.RuntimeException:
> java.lang.NullPointerException
> at
> java.base/java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:395)
> java.util.concurrent.ExecutionException: java.lang.RuntimeException:
> java.lang.NullPointerException
> at
> java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:395)
> ~[?:?]
> at
> java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1999)
> ~[?:?]
> at
> org.apache.solr.prometheus.scraper.Async.lambda$null$1(Async.java:45)
> [solr-prometheus-exporter-8.2.0.jar:8.2.0
> 31d7ec7bbfdcd2c4cc61d9d35e962165410b65fe - ivera - 2019-07-19 15:10:57]
> at
> org.apache.solr.prometheus.scraper.Async$$Lambda$190/.accept(Unknown
> Source) [solr-prometheus-exporter-8.2.0.jar:8.2.0
> 31d7ec7bbfdcd2c4cc61d9d35e962165410b65fe - ivera - 2019-07-19 15:10:57]
> at
> java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:183)
> [?:?]
> at
> java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:177)
> [?:?]
> at
> java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1654)
> [?:?]
> at
> java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:497) [?:?]
> at
> java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:487)
> [?:?]
> at
> java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:150)
> [?:?]
> at
> java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:173)
> [?:?]
> at
> java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:239) [?:?]
> at
> java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:497) [?:?]
> at
> org.apache.solr.prometheus.scraper.Async.lambda$waitForAllSuccessfulResponses$3(Async.java:43)
> [solr-prometheus-exporter-8.2.0.jar:8.2.0
> 31d7ec7bbfdcd2c4cc61d9d35e962165410b65fe - ivera - 2019-07-19 15:10:57]
> at
> org.apache.solr.prometheus.scraper.Async$$Lambda$165/.apply(Unknown
> Source) [solr-prometheus-exporter-8.2.0.jar:8.2.0
> 31d7ec7bbfdcd2c4cc61d9d35e962165410b65fe - ivera - 2019-07-19 15:10:57]
> at
> java.util.concurrent.CompletableFuture.uniExceptionally(CompletableFuture.java:986)
> [?:?]
> at
> java.util.concurrent.CompletableFuture$UniExceptionally.tryFire(CompletableFuture.java:970)
> [?:?]
> at
> java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:506)
> [?:?]
> at
> java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1705)
> [?:?]
> at
> org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:209)
> [solr-solrj-8.2.0.jar:8.2.0 31d7ec7bbfdcd2c4cc61d9d35e962165410b65fe -
> ivera - 2019-07-19 15:11:07]
> at
> org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor$$Lambda$142/.run(Unknown
> Source) [solr-solrj-8.2.0.jar:8.2.0
> 31d7ec7bbfdcd2c4cc61d9d35e962165410b65fe - ivera - 2019-07-19 15:11:07]
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
> [?:?]
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
> [?:?]
> at java.lang.Thread.run(Thread.java:834) [?:?]
> Caused by: java.lang.RuntimeException: java.lang.NullPointerException
> at
> org.apache.solr.prometheus.collector.SchedulerMetricsCollector.lambda$collectMetrics$0(SchedulerMetricsCollector.java:92)
> ~[solr-prometheus-exporter-8.2.0.jar:8.2.0
> 31d7ec7bbfdcd2c4cc61d9d35e962165410b65fe - ivera - 2019-07-19 15:10:57]
> at
> org.apache.solr.prometheus.collector.SchedulerMetricsCollector$$Lambda$163/.get(Unknown
> Source) ~[?:?]
> at
> java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1700)
> 

s3 or other cloud hosted storage options?

2020-10-19 Thread Michael Conrad

Hi all,

Hopefully someone can provide insight.

We are looking to see if there are any viable options for S3 or similar 
for index/data storage.


Preferably (if possible) shared between nodes for dynamic scalability needs.

-Mike/NewsRx


security.json help

2020-10-19 Thread Mark Dadisman
Hey, I'm new to configuring Solr. I'm trying to configure Solr with Rule Based 
Authorization. 
https://lucene.apache.org/solr/guide/8_6/rule-based-authorization-plugin.html

I have permissions working if I allow everything with "all", but I want to 
limit access so that a site can only access its own collection, in addition to 
a server ping path, so I'm trying to add the collection-specific permission at 
the top:

"permissions": [
  {
    "name": "custom-example",
    "collection": "example",
    "path": "*",
    "role": [
      "admin",
      "example"
    ]
  },
  {
    "name": "custom-collection",
    "collection": "*",
    "path": [
      "/admin/luke",
      "/admin/mbeans",
      "/admin/system"
    ],
    "role": "*"
  },
  {
    "name": "custom-ping",
    "collection": null,
    "path": [
      "/admin/info/system"
    ],
    "role": "*"
  },
  {
    "name": "all",
    "role": "admin"
  }
]

The rule "custom-ping" works, and "all" works. But when the above permissions 
are used, access is denied to the "example" user-role for collection "example" 
at the path "/solr/example/select". If I specify paths explicitly, the 
permissions work, but I can't get permissions to work with path wildcards for a 
specific collection.
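
The explicit-path workaround described above can be sketched as follows. This is only an illustration of assembling the permissions list in order (rules are evaluated top to bottom, so specific rules must precede the catch-all "all" rule); the path list shown is an assumption and would need to match the handlers your clients actually call.

```python
import json

# Hypothetical explicit path list standing in for the wildcard that did
# not behave as hoped; adjust to the handlers actually used.
explicit_paths = ["/select", "/query", "/admin/luke", "/admin/mbeans",
                  "/admin/system"]

permissions = [
    # Specific rules first: rules are checked in order, first match wins.
    {"name": "custom-example",
     "collection": "example",
     "path": explicit_paths,
     "role": ["admin", "example"]},
    {"name": "custom-ping",
     "collection": None,  # serialized as null: node-level (non-collection) paths
     "path": ["/admin/info/system"],
     "role": "*"},
    {"name": "all", "role": "admin"},
]

fragment = json.dumps({"permissions": permissions}, indent=2)
print(fragment)
```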

I also had to declare "custom-collection" with the specific paths needed to get 
collection info in order for those paths to work. I would've expected that 
these paths would be included in the collection-specific paths and be covered 
by the first rule, but they aren't. For example, the call to 
"/solr/example/admin/luke" will fail if the path is removed from this rule.

I don't really want to specify every single path I might need to use. Am I 
using the path wildcard wrong somehow? Is there a better way to do 
collection-specific authorizations for a collection "example"?

Thanks.
- M



Re: Faceting on indexed=false stored=false docValues=true fields

2020-10-19 Thread uyilmaz
Sorry, correction, taking "the" time

On Mon, 19 Oct 2020 22:18:30 +0300
uyilmaz  wrote:

> Thanks for taking time to write a detailed answer.
> 
> We use Solr to both store our data and to perform aggregations, using 
> faceting or streaming expressions. When required analysis is too complex to 
> do in Solr, we export large query results from Solr to a more capable 
> analysis tool.
> 
> So I guess all fields need to be docValues="true", because the export handler
> and streaming both require fields to have docValues, and even if I won't use a
> field in queries or facets, it should still be available to read in the result
> set. Fields that won't be searched or faceted can be (indexed=false
> stored=false docValues=true), right?
> 
> --uyilmaz
> 
> 
> On Mon, 19 Oct 2020 14:14:27 -0400
> Michael Gibney  wrote:
> 
> > As you've observed, it is indeed possible to facet on fields with
> > docValues=true, indexed=false; but in almost all cases you should
> > probably set indexed=true. 1. for distributed facet count refinement,
> > the "indexed" approach is used to look up counts by value; 2. assuming
> > you're wanting to do something usual, e.g. allow users to apply
> > filters based on facet counts, the filter application would use the
> > "indexed" approach as well. Where indexed=false, if either filtering
> > or distributed refinement is attempted, I'm not 100% sure what
> > happens. It might fail, or lead to inconsistent results, or attempt to
> > look up results via the equivalent of a "table scan" over docValues (I
> > think the last of these is what actually happens, fwiw) ... but none
> > of these options is likely desirable.
> > 
> > Michael
> > 
> > On Mon, Oct 19, 2020 at 1:42 PM uyilmaz  wrote:
> > >
> > > Thanks! This also contributed to my confusion:
> > >
> > > https://lucene.apache.org/solr/guide/8_4/faceting.html#field-value-faceting-parameters
> > >
> > > "If you want Solr to perform both analysis (for searching) and faceting 
> > > on the full literal strings, use the copyField directive in your Schema 
> > > to create two versions of the field: one Text and one String. Make sure 
> > > both are indexed="true"."
> > >
> > > On Mon, 19 Oct 2020 13:08:00 -0400
> > > Alexandre Rafalovitch  wrote:
> > >
> > > > I think this is all explained quite well in the Ref Guide:
> > > > https://lucene.apache.org/solr/guide/8_6/docvalues.html
> > > >
> > > > DocValues is a different way to index/store values. Faceting is a
> > > > primary use case where docValues are better than what 'indexed=true'
> > > > gives you.
> > > >
> > > > Regards,
> > > >Alex.
> > > >
> > > > On Mon, 19 Oct 2020 at 12:51, uyilmaz  
> > > > wrote:
> > > > >
> > > > >
> > > > > Hey all,
> > > > >
> > > > > From my little experiments, I see that (if I didn't make a stupid 
> > > > > mistake) we can facet on fields marked as both indexed and stored 
> > > > > being false:
> > > > >
> > > > > <field ... indexed="false" stored="false" docValues="true"/>
> > > > >
> > > > > I'm surprised by this, I thought I would need to index it. Can you 
> > > > > confirm this?
> > > > >
> > > > > Regards
> > > > >
> > > > > --
> > > > > uyilmaz 
> > >
> > >
> > > --
> > > uyilmaz 
> 
> 
> -- 
> uyilmaz 


-- 
uyilmaz 


Re: Faceting on indexed=false stored=false docValues=true fields

2020-10-19 Thread uyilmaz
Thanks for taking time to write a detailed answer.

We use Solr to both store our data and to perform aggregations, using faceting 
or streaming expressions. When required analysis is too complex to do in Solr, 
we export large query results from Solr to a more capable analysis tool.

So I guess all fields need to be docValues="true", because the export handler
and streaming both require fields to have docValues, and even if I won't use a
field in queries or facets, it should still be available to read in the result
set. Fields that won't be searched or faceted can be (indexed=false
stored=false docValues=true), right?
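
A minimal schema sketch of that arrangement (field names and types here are illustrative, not taken from the thread):

```xml
<!-- Fields read only via /export or streaming expressions:
     no index, no stored value, just docValues. -->
<field name="payload_s" type="string" indexed="false" stored="false"
       docValues="true"/>
<!-- Fields that are also searched/filtered or faceted on: -->
<field name="category_s" type="string" indexed="true" stored="false"
       docValues="true"/>
```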

--uyilmaz


On Mon, 19 Oct 2020 14:14:27 -0400
Michael Gibney  wrote:

> As you've observed, it is indeed possible to facet on fields with
> docValues=true, indexed=false; but in almost all cases you should
> probably set indexed=true. 1. for distributed facet count refinement,
> the "indexed" approach is used to look up counts by value; 2. assuming
> you're wanting to do something usual, e.g. allow users to apply
> filters based on facet counts, the filter application would use the
> "indexed" approach as well. Where indexed=false, if either filtering
> or distributed refinement is attempted, I'm not 100% sure what
> happens. It might fail, or lead to inconsistent results, or attempt to
> look up results via the equivalent of a "table scan" over docValues (I
> think the last of these is what actually happens, fwiw) ... but none
> of these options is likely desirable.
> 
> Michael
> 
> On Mon, Oct 19, 2020 at 1:42 PM uyilmaz  wrote:
> >
> > Thanks! This also contributed to my confusion:
> >
> > https://lucene.apache.org/solr/guide/8_4/faceting.html#field-value-faceting-parameters
> >
> > "If you want Solr to perform both analysis (for searching) and faceting on 
> > the full literal strings, use the copyField directive in your Schema to 
> > create two versions of the field: one Text and one String. Make sure both 
> > are indexed="true"."
> >
> > On Mon, 19 Oct 2020 13:08:00 -0400
> > Alexandre Rafalovitch  wrote:
> >
> > > I think this is all explained quite well in the Ref Guide:
> > > https://lucene.apache.org/solr/guide/8_6/docvalues.html
> > >
> > > DocValues is a different way to index/store values. Faceting is a
> > > primary use case where docValues are better than what 'indexed=true'
> > > gives you.
> > >
> > > Regards,
> > >Alex.
> > >
> > > On Mon, 19 Oct 2020 at 12:51, uyilmaz  wrote:
> > > >
> > > >
> > > > Hey all,
> > > >
> > > > From my little experiments, I see that (if I didn't make a stupid 
> > > > mistake) we can facet on fields marked as both indexed and stored being 
> > > > false:
> > > >
> > > > <field ... indexed="false" stored="false" docValues="true"/>
> > > >
> > > > I'm surprised by this, I thought I would need to index it. Can you confirm 
> > > > confirm this?
> > > >
> > > > Regards
> > > >
> > > > --
> > > > uyilmaz 
> >
> >
> > --
> > uyilmaz 


-- 
uyilmaz 


Re: Faceting on indexed=false stored=false docValues=true fields

2020-10-19 Thread Walter Underwood
Hmm. Fields used for faceting will also be used for filtering, which is a kind
of search. Are docValues OK for filtering? I expect they might be slow the
first time, then cached.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)

> On Oct 19, 2020, at 11:15 AM, Erick Erickson  wrote:
> 
> uyilmaz:
> 
> Hmm, that _is_ confusing. And inaccurate.
> 
> In this context, it should read something like
> 
> The Text field should have indexed="true" docValues="false" if used for
> searching but not faceting, and the String field should have
> indexed="false" docValues="true" if used for faceting but not searching.
> 
> I’ll fix this, thanks for pointing this out.
> 
> Erick
> 
>> On Oct 19, 2020, at 1:42 PM, uyilmaz  wrote:
>> 
>> Thanks! This also contributed to my confusion:
>> 
>> https://lucene.apache.org/solr/guide/8_4/faceting.html#field-value-faceting-parameters
>> 
>> "If you want Solr to perform both analysis (for searching) and faceting on 
>> the full literal strings, use the copyField directive in your Schema to 
>> create two versions of the field: one Text and one String. Make sure both 
>> are indexed="true"."
>> 
>> On Mon, 19 Oct 2020 13:08:00 -0400
>> Alexandre Rafalovitch  wrote:
>> 
>>> I think this is all explained quite well in the Ref Guide:
>>> https://lucene.apache.org/solr/guide/8_6/docvalues.html
>>> 
>>> DocValues is a different way to index/store values. Faceting is a
>>> primary use case where docValues are better than what 'indexed=true'
>>> gives you.
>>> 
>>> Regards,
>>>  Alex.
>>> 
>>> On Mon, 19 Oct 2020 at 12:51, uyilmaz  wrote:
 
 
 Hey all,
 
 From my little experiments, I see that (if I didn't make a stupid mistake) 
 we can facet on fields marked as both indexed and stored being false:
 
 <field ... indexed="false" stored="false" docValues="true"/>
 
 I'm surprised by this, I thought I would need to index it. Can you confirm 
 this?
 
 Regards
 
 --
 uyilmaz 
>> 
>> 
>> -- 
>> uyilmaz 
> 



Re: Faceting on indexed=false stored=false docValues=true fields

2020-10-19 Thread Erick Erickson
uyilmaz:

Hmm, that _is_ confusing. And inaccurate.

In this context, it should read something like

The Text field should have indexed="true" docValues="false" if used for
searching but not faceting, and the String field should have
indexed="false" docValues="true" if used for faceting but not searching.

I’ll fix this, thanks for pointing this out.

Erick

> On Oct 19, 2020, at 1:42 PM, uyilmaz  wrote:
> 
> Thanks! This also contributed to my confusion:
> 
> https://lucene.apache.org/solr/guide/8_4/faceting.html#field-value-faceting-parameters
> 
> "If you want Solr to perform both analysis (for searching) and faceting on 
> the full literal strings, use the copyField directive in your Schema to 
> create two versions of the field: one Text and one String. Make sure both are 
> indexed="true"."
> 
> On Mon, 19 Oct 2020 13:08:00 -0400
> Alexandre Rafalovitch  wrote:
> 
>> I think this is all explained quite well in the Ref Guide:
>> https://lucene.apache.org/solr/guide/8_6/docvalues.html
>> 
>> DocValues is a different way to index/store values. Faceting is a
>> primary use case where docValues are better than what 'indexed=true'
>> gives you.
>> 
>> Regards,
>>   Alex.
>> 
>> On Mon, 19 Oct 2020 at 12:51, uyilmaz  wrote:
>>> 
>>> 
>>> Hey all,
>>> 
>>> From my little experiments, I see that (if I didn't make a stupid mistake) 
>>> we can facet on fields marked as both indexed and stored being false:
>>> 
>>> <field ... indexed="false" stored="false" docValues="true"/>
>>> 
>>> I'm surprised by this, I thought I would need to index it. Can you confirm 
>>> this?
>>> 
>>> Regards
>>> 
>>> --
>>> uyilmaz 
> 
> 
> -- 
> uyilmaz 



Re: Faceting on indexed=false stored=false docValues=true fields

2020-10-19 Thread Michael Gibney
As you've observed, it is indeed possible to facet on fields with
docValues=true, indexed=false; but in almost all cases you should probably
set indexed=true: 1. distributed facet count refinement uses the "indexed"
approach to look up counts by value; 2. assuming you want to do something
usual, e.g. allow users to apply filters based on facet counts, the filter
application would use the "indexed" approach as well. Where indexed=false,
if either filtering or distributed refinement is attempted, I'm not 100%
sure what happens. It might fail, lead to inconsistent results, or attempt
to look up results via the equivalent of a "table scan" over docValues (I
think the last of these is what actually happens, fwiw) ... but none of
these options is likely desirable.

Michael

On Mon, Oct 19, 2020 at 1:42 PM uyilmaz  wrote:
>
> Thanks! This also contributed to my confusion:
>
> https://lucene.apache.org/solr/guide/8_4/faceting.html#field-value-faceting-parameters
>
> "If you want Solr to perform both analysis (for searching) and faceting on 
> the full literal strings, use the copyField directive in your Schema to 
> create two versions of the field: one Text and one String. Make sure both are 
> indexed="true"."
>
> On Mon, 19 Oct 2020 13:08:00 -0400
> Alexandre Rafalovitch  wrote:
>
> > I think this is all explained quite well in the Ref Guide:
> > https://lucene.apache.org/solr/guide/8_6/docvalues.html
> >
> > DocValues is a different way to index/store values. Faceting is a
> > primary use case where docValues are better than what 'indexed=true'
> > gives you.
> >
> > Regards,
> >Alex.
> >
> > On Mon, 19 Oct 2020 at 12:51, uyilmaz  wrote:
> > >
> > >
> > > Hey all,
> > >
> > > From my little experiments, I see that (if I didn't make a stupid 
> > > mistake) we can facet on fields marked as both indexed and stored being 
> > > false:
> > >
> > > <field ... indexed="false" stored="false" docValues="true"/>
> > >
> > > I'm surprised by this, I thought I would need to index it. Can you confirm 
> > > this?
> > >
> > > Regards
> > >
> > > --
> > > uyilmaz 
>
>
> --
> uyilmaz 


Re: Faceting on indexed=false stored=false docValues=true fields

2020-10-19 Thread uyilmaz
Thanks! This also contributed to my confusion:

https://lucene.apache.org/solr/guide/8_4/faceting.html#field-value-faceting-parameters

"If you want Solr to perform both analysis (for searching) and faceting on the 
full literal strings, use the copyField directive in your Schema to create two 
versions of the field: one Text and one String. Make sure both are 
indexed="true"."

On Mon, 19 Oct 2020 13:08:00 -0400
Alexandre Rafalovitch  wrote:

> I think this is all explained quite well in the Ref Guide:
> https://lucene.apache.org/solr/guide/8_6/docvalues.html
> 
> DocValues is a different way to index/store values. Faceting is a
> primary use case where docValues are better than what 'indexed=true'
> gives you.
> 
> Regards,
>Alex.
> 
> On Mon, 19 Oct 2020 at 12:51, uyilmaz  wrote:
> >
> >
> > Hey all,
> >
> > From my little experiments, I see that (if I didn't make a stupid mistake) 
> > we can facet on fields marked as both indexed and stored being false:
> >
> > <field ... indexed="false" stored="false" docValues="true"/>
> >
> > I'm surprised by this, I thought I would need to index it. Can you confirm 
> > this?
> >
> > Regards
> >
> > --
> > uyilmaz 


-- 
uyilmaz 


Re: Faceting on indexed=false stored=false docValues=true fields

2020-10-19 Thread Alexandre Rafalovitch
I think this is all explained quite well in the Ref Guide:
https://lucene.apache.org/solr/guide/8_6/docvalues.html

DocValues is a different way to index/store values. Faceting is a
primary use case where docValues are better than what 'indexed=true'
gives you.

Regards,
   Alex.

On Mon, 19 Oct 2020 at 12:51, uyilmaz  wrote:
>
>
> Hey all,
>
> From my little experiments, I see that (if I didn't make a stupid mistake) we 
> can facet on fields marked as both indexed and stored being false:
>
> <field ... indexed="false" stored="false" docValues="true"/>
>
> I'm surprised by this, I thought I would need to index it. Can you confirm 
> this?
>
> Regards
>
> --
> uyilmaz 


Faceting on indexed=false stored=false docValues=true fields

2020-10-19 Thread uyilmaz


Hey all,

From my little experiments, I see that (if I didn't make a stupid mistake) we 
can facet on fields marked as both indexed and stored being false:

<field ... indexed="false" stored="false" docValues="true"/>

I'm surprised by this, I thought I would need to index it. Can you confirm this?

Regards

-- 
uyilmaz 


Re: SolrCloud 6.6.2 suddenly crash due to slow queries and Log4j issue

2020-10-19 Thread Dominique Bejean
Shawn,

According to the log4j description (
https://bz.apache.org/bugzilla/show_bug.cgi?id=57714), the issue is related
to lock during appenders collection process.

In addition to CONSOLE and file appenders in the default log4j.properties,
my customer added 2 extra FileAppender dedicated to all requests and slow
requests. I suggested removing these two extra appenders.
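
As a rough sketch of that change (the extra appender names below are hypothetical; only the rootLogger line follows Solr's stock log4j.properties), the trimmed configuration would keep just the default appenders:

```properties
# Default appenders from Solr's stock log4j.properties:
log4j.rootLogger=INFO, file, CONSOLE

# The two extra per-request FileAppenders (illustrative names) were
# removed, i.e. lines like the following were deleted:
# log4j.appender.allRequests=org.apache.log4j.FileAppender
# log4j.appender.slowRequests=org.apache.log4j.FileAppender
```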

Regards

Dominique



On Mon, Oct 19, 2020 at 3:48 PM Dominique Bejean 
wrote:

> Hi Shawn,
>
> Thank you for your response.
>
> You are confirming my diagnosis.
>
> This is in fact an 8-node cluster with one single collection with 4 shards
> and 1 replica (8 cores).
>
> 4 Gb heap and 90 Gb Ram
>
>
> When no issue occurs nearly 50% of the heap is used.
>
> Num Docs in collection : 10.000.000
>
> Num Docs per core is more or less 2.500.000
>
> Max Doc per core is more or less 3.000.000
>
> Core Data size is more or less 70 Gb
>
> Here are the JVM settings
>
> -DSTOP.KEY=solrrocks
>
> -DSTOP.PORT=7983
>
> -Dcom.sun.management.jmxremote
>
> -Dcom.sun.management.jmxremote.authenticate=false
>
> -Dcom.sun.management.jmxremote.local.only=false
>
> -Dcom.sun.management.jmxremote.port=18983
>
> -Dcom.sun.management.jmxremote.rmi.port=18983
>
> -Dcom.sun.management.jmxremote.ssl=false
>
> -Dhost=
>
> -Djava.rmi.server.hostname=XXX
>
> -Djetty.home=/x/server
>
> -Djetty.port=8983
>
> -Dlog4j.configuration=file:/xx/log4j.properties
>
> -Dsolr.install.dir=/xx/solr
>
> -Dsolr.jetty.request.header.size=32768
>
> -Dsolr.log.dir=/xxx/Logs
>
> -Dsolr.log.muteconsole
>
> -Dsolr.solr.home=//data
>
> -Duser.timezone=Europe/Paris
>
> -DzkClientTimeout=3
>
> -DzkHost=xxx
>
> -XX:+CMSParallelRemarkEnabled
>
> -XX:+CMSScavengeBeforeRemark
>
> -XX:+ParallelRefProcEnabled
>
> -XX:+PrintGCApplicationStoppedTime
>
> -XX:+PrintGCDateStamps
>
> -XX:+PrintGCDetails
>
> -XX:+PrintGCTimeStamps
>
> -XX:+PrintHeapAtGC
>
> -XX:+PrintTenuringDistribution
>
> -XX:+UseCMSInitiatingOccupancyOnly
>
> -XX:+UseConcMarkSweepGC
>
> -XX:+UseGCLogFileRotation
>
> -XX:+UseGCLogFileRotation
>
> -XX:+UseParNewGC
>
> -XX:-OmitStackTraceInFastThrow
>
> -XX:CMSInitiatingOccupancyFraction=50
>
> -XX:CMSMaxAbortablePrecleanTime=6000
>
> -XX:ConcGCThreads=4
>
> -XX:GCLogFileSize=20M
>
> -XX:MaxTenuringThreshold=8
>
> -XX:NewRatio=3
>
> -XX:NumberOfGCLogFiles=9
>
> -XX:OnOutOfMemoryError=/xxx/solr/bin/oom_solr.sh
>
> 8983
>
> /xx/Logs
>
> -XX:ParallelGCThreads=4
>
> -XX:PretenureSizeThreshold=64m
>
> -XX:SurvivorRatio=4
>
> -XX:TargetSurvivorRatio=90
>
> -Xloggc:/xx/solr_gc.log
>
> -Xloggc:/xx/solr_gc.log
>
> -Xms4g
>
> -Xmx4g
>
> -Xss256k
>
> -verbose:gc
>
>
>
> Here is one screenshot of top command for the node that failed last week.
>
> [image: 2020-10-19 15_48_06-Photos.png]
>
> Regards
>
> Dominique
>
>
>
> On Sun, Oct 18, 2020 at 10:03 PM Shawn Heisey  wrote:
>
>> On 10/18/2020 3:22 AM, Dominique Bejean wrote:
>> > A few months ago, I reported an issue with Solr nodes crashing due to
>> the
>> > old generation heap growing suddenly and generating OOM. This problem
>> > occurred again this week. I have threads dumps for each minute during
>> the 3
>> > minutes the problem occured. I am using fastthread.io in order to
>> analyse
>> > these dumps.
>>
>> 
>>
>> > * The Log4j issue starts (
>> > https://blog.fastthread.io/2020/01/24/log4j-bug-slows-down-your-app/)
>>
>> If the log4j bug is the root cause here, then the only way you can fix
>> this is to upgrade to at least Solr 7.4.  That is the Solr version where
>> we first upgraded from log4j 1.2.x to log4j2.  You cannot upgrade log4j
>> in Solr 6.6.2 without changing Solr code.  The code changes required
>> were extensive.  Note that I did not do anything to confirm whether the
>> log4j bug is responsible here.  You seem pretty confident that this is
>> the case.
>>
>> Note that if you upgrade to 8.x, you will need to reindex from scratch.
>> Upgrading an existing index is possible with one major version bump, but
>> if your index has ever been touched by a release that's two major
>> versions back, it won't work.  In 8.x, that is enforced -- 8.x will not
>> even try to read an old index touched by 6.x or earlier.
>>
>> In the following wiki page, I provided instructions for getting a
>> screenshot of the process listing.
>>
>> https://cwiki.apache.org/confluence/display/solr/SolrPerformanceProblems
>>
>> In addition to that screenshot, I would like to know the on-disk size of
>> all the cores running on the problem node, along with a document count
>> from those cores.  It might be possible to work around the OOM just by
>> increasing the size of the heap.  That won't do anything about problems
>> with log4j.
>>
>> Thanks,
>> Shawn
>>
>


Re: SolrCloud 6.6.2 suddenly crash due to slow queries and Log4j issue

2020-10-19 Thread Dominique Bejean
Hi Shawn,

Thank you for your response.

You are confirming my diagnosis.

This is in fact an 8-node cluster with one single collection with 4 shards
and 1 replica (8 cores).

4 Gb heap and 90 Gb Ram


When no issue occurs nearly 50% of the heap is used.

Num Docs in collection : 10.000.000

Num Docs per core is more or less 2.500.000

Max Doc per core is more or less 3.000.000

Core Data size is more or less 70 Gb

Here are the JVM settings

-DSTOP.KEY=solrrocks

-DSTOP.PORT=7983

-Dcom.sun.management.jmxremote

-Dcom.sun.management.jmxremote.authenticate=false

-Dcom.sun.management.jmxremote.local.only=false

-Dcom.sun.management.jmxremote.port=18983

-Dcom.sun.management.jmxremote.rmi.port=18983

-Dcom.sun.management.jmxremote.ssl=false

-Dhost=

-Djava.rmi.server.hostname=XXX

-Djetty.home=/x/server

-Djetty.port=8983

-Dlog4j.configuration=file:/xx/log4j.properties

-Dsolr.install.dir=/xx/solr

-Dsolr.jetty.request.header.size=32768

-Dsolr.log.dir=/xxx/Logs

-Dsolr.log.muteconsole

-Dsolr.solr.home=//data

-Duser.timezone=Europe/Paris

-DzkClientTimeout=3

-DzkHost=xxx

-XX:+CMSParallelRemarkEnabled

-XX:+CMSScavengeBeforeRemark

-XX:+ParallelRefProcEnabled

-XX:+PrintGCApplicationStoppedTime

-XX:+PrintGCDateStamps

-XX:+PrintGCDetails

-XX:+PrintGCTimeStamps

-XX:+PrintHeapAtGC

-XX:+PrintTenuringDistribution

-XX:+UseCMSInitiatingOccupancyOnly

-XX:+UseConcMarkSweepGC

-XX:+UseGCLogFileRotation

-XX:+UseGCLogFileRotation

-XX:+UseParNewGC

-XX:-OmitStackTraceInFastThrow

-XX:CMSInitiatingOccupancyFraction=50

-XX:CMSMaxAbortablePrecleanTime=6000

-XX:ConcGCThreads=4

-XX:GCLogFileSize=20M

-XX:MaxTenuringThreshold=8

-XX:NewRatio=3

-XX:NumberOfGCLogFiles=9

-XX:OnOutOfMemoryError=/xxx/solr/bin/oom_solr.sh

8983

/xx/Logs

-XX:ParallelGCThreads=4

-XX:PretenureSizeThreshold=64m

-XX:SurvivorRatio=4

-XX:TargetSurvivorRatio=90

-Xloggc:/xx/solr_gc.log

-Xloggc:/xx/solr_gc.log

-Xms4g

-Xmx4g

-Xss256k

-verbose:gc



Here is one screenshot of top command for the node that failed last week.

[image: 2020-10-19 15_48_06-Photos.png]

Regards

Dominique



On Sun, Oct 18, 2020 at 10:03 PM Shawn Heisey  wrote:

> On 10/18/2020 3:22 AM, Dominique Bejean wrote:
> > A few months ago, I reported an issue with Solr nodes crashing due to the
> > old generation heap growing suddenly and generating OOM. This problem
> > occurred again this week. I have threads dumps for each minute during
> the 3
> > minutes the problem occured. I am using fastthread.io in order to
> analyse
> > these dumps.
>
> 
>
> > * The Log4j issue starts (
> > https://blog.fastthread.io/2020/01/24/log4j-bug-slows-down-your-app/)
>
> If the log4j bug is the root cause here, then the only way you can fix
> this is to upgrade to at least Solr 7.4.  That is the Solr version where
> we first upgraded from log4j 1.2.x to log4j2.  You cannot upgrade log4j
> in Solr 6.6.2 without changing Solr code.  The code changes required
> were extensive.  Note that I did not do anything to confirm whether the
> log4j bug is responsible here.  You seem pretty confident that this is
> the case.
>
> Note that if you upgrade to 8.x, you will need to reindex from scratch.
> Upgrading an existing index is possible with one major version bump, but
> if your index has ever been touched by a release that's two major
> versions back, it won't work.  In 8.x, that is enforced -- 8.x will not
> even try to read an old index touched by 6.x or earlier.
>
> In the following wiki page, I provided instructions for getting a
> screenshot of the process listing.
>
> https://cwiki.apache.org/confluence/display/solr/SolrPerformanceProblems
>
> In addition to that screenshot, I would like to know the on-disk size of
> all the cores running on the problem node, along with a document count
> from those cores.  It might be possible to work around the OOM just by
> increasing the size of the heap.  That won't do anything about problems
> with log4j.
>
> Thanks,
> Shawn
>


Re: Improve results/relevance

2020-10-19 Thread Konstantinos Koukouvis
Hi Jayadevan,

There are a couple of ways to achieve the result you want! Two things you
could do, off the top of my head: you could sort the results based on some
field, or boost some fields so that matches on them get a higher score.
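
As a hedged illustration of the boosting idea (the field names and weights below are assumptions, not from the thread), an edismax request could add a boost query that rewards an exact title match, so "iPhone 11" outranks "iPhone 11 Pro" for that query:

```python
from urllib.parse import urlencode

# Illustrative edismax parameters; name_txt, description_txt and
# name_exact_s are hypothetical field names, not from the original message.
params = {
    "defType": "edismax",
    "q": "iPhone 11",
    "qf": "name_txt^2 description_txt",
    # Boost documents whose exact (string) title equals the query:
    "bq": 'name_exact_s:"iPhone 11"^10',
}
query_string = urlencode(params)
print(query_string)
```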

> On 17 Oct 2020, at 05:51, Jayadevan Maymala  wrote:
> 
> Hi all,
> 
> We have a catalogue of many products, including smart phones.  We use
> *edismax* query parser. If someone types in iPhone 11, we are getting the
> correct results. But iPhone 11 Pro is coming before iPhone 11. What options
> can be used to improve this?
> 
> Regards,
> Jayadevan

==
Konstantinos Koukouvis
konstantinos.koukou...@mecenat.com

Using Golang and Solr? Try this: https://github.com/mecenat/solr







Re: Improve results/relevance

2020-10-19 Thread Charlie Hull

Hi,

A few strategies you can use:

1. First you need to know why the result has matched. Solr provides 
detailed debug info but it's not easy to interpret. Consider using 
something like www.splainer.io to give you better visibility 
(disclaimer: this is something we maintain, there are other alternatives 
including a cool Chrome plugin). You can now see where scores are being 
calculated.


2. Next you should read up on how Lucene/Solr edismax scoring works - 
remember it's a 'winner takes all' strategy. Here's a great blog by Doug 
on this 
https://opensourceconnections.com/blog/2013/07/02/getting-dissed-by-dismax-why-your-incorrect-assumptions-about-dismax-are-hurting-search-relevancy/ 
. Now you should know why your results are being ordered as they are.


3. You've now got lots of options: you should set up some tests (perhaps 
use Quepid? www.quepid.com disclaimer: yes that's us too :) to monitor 
what happens as you try each and to check for side-effects. You could 
boost exact phrase matches - here's one way to do this 
http://everydaydeveloper.blogspot.com/2012/02/solr-improve-relevancy-by-boosting.html 
or you could use Querqy which gives you much more flexibility 
https://querqy.org/ (check out SMUI too as this is a great way to manage 
Querqy rules).


4. What you're doing is active search tuning for ecommerce, and this 
won't be the first example you'll come across. You should also implement 
a system for tracking these kinds of issues, what you do to fix them and 
the tests carried out: it's analogous to a bug tracker and something we 
call a 'Relevancy Register'. Otherwise you'll end up with a huge pile of 
hacks and will swiftly forget why they were implemented and what problem 
they were trying to solve!


5. We're running a blog series about ecommerce search which you might 
want to follow: 
https://opensourceconnections.com/blog/2020/07/07/meet-pete-the-e-commerce-search-product-manager/


HTH

Charlie

On 17/10/2020 04:51, Jayadevan Maymala wrote:

Hi all,

We have a catalogue of many products, including smart phones.  We use
*edismax* query parser. If someone types in iPhone 11, we are getting the
correct results. But iPhone 11 Pro is coming before iPhone 11. What options
can be used to improve this?

Regards,
Jayadevan



--
Charlie Hull
OpenSource Connections, previously Flax

tel/fax: +44 (0)8700 118334
mobile:  +44 (0)7767 825828
web: www.o19s.com