Re: Solrcloud create collection ignores createNodeSet parameter

2020-10-27 Thread Erick Erickson
You’re confusing replicas and shards a bit. Solr tries its best to put multiple 
replicas _of the same shard_ on different nodes. You have two shards, though, 
each with _one_ replica. This is a bit of a nit, but important to keep in mind when 
your replicationFactor increases. So from an HA perspective, this isn’t 
catastrophic, since both shards must be up for the collection to function anyway.

That said, it does seem reasonable to use all the nodes in your case. If you 
omit the createNodeSet, what happens? I’m curious if that’s confusing things 
somehow. And can you totally guarantee that both nodes are accessible when the 
collection is created?

BTW, I’ve always disliked the parameter name “maxShardsPerNode”; shards aren’t 
what it’s actually about. But I suppose 
“maxReplicasOfAnyIndividualShardOnASingleNode” is a little verbose...
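
For reference, the same CREATE issued through SolrJ rather than raw HTTP (a
minimal sketch assuming SolrJ 8.x, with the names taken from the request quoted
below):

import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.request.CollectionAdminRequest;

public class CreateOnNodes {
  public static void main(String[] args) throws Exception {
    try (HttpSolrClient client =
        new HttpSolrClient.Builder("http://uc1a-ecomdev-msc01:8983/solr").build()) {
      CollectionAdminRequest.Create create = CollectionAdminRequest.createCollection(
          "sial-catalog-product-20201027",   // collection name
          "sial-catalog-product-20200808",   // configset name
          2,                                 // numShards
          1);                                // replicationFactor
      // Restrict placement to these nodes; the format matches live_nodes entries
      create.setCreateNodeSet("uc1a-ecomdev-msc02:8983_solr,uc1a-ecomdev-msc01:8983_solr");
      create.setMaxShardsPerNode(1);
      System.out.println(create.process(client));
    }
  }
}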

> On Oct 27, 2020, at 2:17 PM, Webster Homer  
> wrote:
> 
> We have a solrcloud set up with 2 nodes and 1 zookeeper, running Solr 7.7.2. 
> This cloud is used for development purposes. Collections are sharded across 
> the 2 nodes.
> 
> Recently we noticed that one of the main collections we use had both replicas 
> running on the same node. Normally we don't see collections created where the 
> replicas run on the same node.
> 
> I tried to create a new version of the collection forcing it to use both 
> nodes. However, that doesn't work; both replicas are created on the same node:
> /solr/admin/collections?action=CREATE&name=sial-catalog-product-20201027&collection.configName=sial-catalog-product-20200808&numShards=2&replicationFactor=1&maxShardsPerNode=1&createNodeSet=uc1a-ecomdev-msc02:8983_solr,uc1a-ecomdev-msc01:8983_solr
> The call returns this:
> {
>"responseHeader": {
>"status": 0,
>"QTime": 4659
>},
>"success": {
>"uc1a-ecomdev-msc01:8983_solr": {
>"responseHeader": {
>"status": 0,
>"QTime": 3900
>},
>"core": "sial-catalog-product-20201027_shard2_replica_n2"
>},
>"uc1a-ecomdev-msc01:8983_solr": {
>"responseHeader": {
>"status": 0,
>"QTime": 4012
>},
>"core": "sial-catalog-product-20201027_shard1_replica_n1"
>}
>}
> }
> 
> Both replicas are created on the same node. Why is this happening?
> 
> How do we force the replicas to be placed on different nodes?



Re: Solr dependency update at Apache Beam - which versions should be supported

2020-10-27 Thread Mike Drob
Piotr,

Based on the questions that we've seen over the past month on this list,
there are still users with Solr on 6, 7, and 8. I suspect there are still
Solr 5 users out there too, although they don't appear to be asking for
help - likely they are in set it and forget it mode.

Solr 7 may not be officially deprecated on our site, but it's pretty old at
this point and we're not doing any development on it outside of maybe a
very high-profile security fix. Even then, we might acknowledge it and
recommend users update to 8.x anyway.

The index files generated by Lucene and consumed by Solr are backwards
compatible up to one major version. Some of the API remains compatible as
well: a client issuing simple queries to Solr 5 would probably work fine even
against Solr 9 when it comes out eventually. A client doing admin
operations is on less certain ground. I don't know enough about Beam to tell you
where on that spectrum your use will fall.
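
To illustrate the "simple queries" end of that spectrum, here is about the
smallest SolrJ query client there is (a sketch with placeholder host and
collection names; this shape has been essentially stable across 5.x-8.x):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.response.QueryResponse;

public class SimpleQuery {
  public static void main(String[] args) throws Exception {
    // Point the client at a single collection/core
    try (HttpSolrClient client =
        new HttpSolrClient.Builder("http://localhost:8983/solr/mycollection").build()) {
      QueryResponse rsp = client.query(new SolrQuery("*:*").setRows(10));
      System.out.println("numFound=" + rsp.getResults().getNumFound());
    }
  }
}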

I'm not sure if this was helpful or not, but maybe it is a nudge in the
right direction.

Good luck,
Mike


On Tue, Oct 27, 2020 at 11:09 AM Piotr Szuberski <
piotr.szuber...@polidea.com> wrote:

> Hi,
>
> We are working on dependency updates at Apache Beam and I would like to
> consult which versions should be supported so we don't break any existing
> users.
>
> Previously the supported Solr version was 5.5.4.
>
> Versions 8.x.y and 7.x.y naturally come to mind as they are the only ones not
> deprecated. But maybe there are users that use some earlier versions?
>
> Are these versions backwards-compatible, or are there things to be aware of?
>
> Regards
>


Solrcloud create collection ignores createNodeSet parameter

2020-10-27 Thread Webster Homer
We have a solrcloud set up with 2 nodes and 1 zookeeper, running Solr 7.7.2. 
This cloud is used for development purposes. Collections are sharded across the 
2 nodes.

Recently we noticed that one of the main collections we use had both replicas 
running on the same node. Normally we don't see collections created where the 
replicas run on the same node.

I tried to create a new version of the collection forcing it to use both nodes. 
However, that doesn't work; both replicas are created on the same node:
/solr/admin/collections?action=CREATE&name=sial-catalog-product-20201027&collection.configName=sial-catalog-product-20200808&numShards=2&replicationFactor=1&maxShardsPerNode=1&createNodeSet=uc1a-ecomdev-msc02:8983_solr,uc1a-ecomdev-msc01:8983_solr
The call returns this:
{
    "responseHeader": {
        "status": 0,
        "QTime": 4659
    },
    "success": {
        "uc1a-ecomdev-msc01:8983_solr": {
            "responseHeader": {
                "status": 0,
                "QTime": 3900
            },
            "core": "sial-catalog-product-20201027_shard2_replica_n2"
        },
        "uc1a-ecomdev-msc01:8983_solr": {
            "responseHeader": {
                "status": 0,
                "QTime": 4012
            },
            "core": "sial-catalog-product-20201027_shard1_replica_n1"
        }
    }
}

Both replicas are created on the same node. Why is this happening?

How do we force the replicas to be placed on different nodes?





Re: Backup fails despite allowPaths=* being set

2020-10-27 Thread Philipp Trulson
Thanks for the answer! We added the line and now everything is working as
expected. Sorry for not reading the manual properly :)
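
For reference, the line in question, quoted from the stock 8.6.2 solr.xml that
Jan links below, is:

<str name="allowPaths">${solr.allowPaths:}</str>

It goes inside the top-level <solr> element; without it, the
-Dsolr.allowPaths=* system property is never picked up.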

Philipp

On Mon, 26 Oct 2020 at 09:27, Jan Høydahl <jan@cominvent.com> wrote:

> According to the source code here
>
>
> https://github.com/apache/lucene-solr/blob/releases/lucene-solr/8.6.2/solr/core/src/java/org/apache/solr/core/SolrPaths.java#L134
>
> your allowPaths value is NOT equal to «*» (which is stored as _ALL_)
> (parsed here
> https://github.com/apache/lucene-solr/blob/releases/lucene-solr/8.6.2/solr/core/src/java/org/apache/solr/core/SolrXmlConfig.java#L311
> )
>
>
> Please check your solr.xml file, it needs to contain this line
>
> https://github.com/apache/lucene-solr/blob/releases/lucene-solr/8.6.2/solr/server/solr/solr.xml#L33
>
> Jan
>
> > On 22 Oct 2020 at 15:57, Philipp Trulson wrote:
> >
> > I'm sure that this is not the case. On the Java Properties page it says
> > "solr.allowPaths  *", on the dashboard I can verify that the
> > "-Dsolr.allowPaths=*" option is present.
> >
> > On Wed, 21 Oct 2020 at 19:10, Jan Høydahl <jan@cominvent.com> wrote:
> >
> >> Are you sure the * is not eaten by the shell since it’s a special char?
> >> You can view the sys props in admin UI to check.
> >>
> >> Jan Høydahl
> >>
> >>> On 16 Oct 2020 at 19:39, Philipp Trulson wrote:
> >>>
> >>> Hello everyone,
> >>>
> >>> we are having problems with our backup script since we upgraded to Solr
> >>> 8.6.2 on kubernetes. To be more precise the message is
> >>> *Path /data/backup/2020-10-16/collection must be relative to SOLR_HOME,
> >>> SOLR_DATA_HOME coreRootDirectory. Set system property 'solr.allowPaths' to
> >>> add other allowed paths.*
> >>>
> >>> I executed the script by calling this endpoint:
> >>> curl 'http://solr.default.svc.cluster.local/solr/admin/collections?action=BACKUP&name=collection&collection=collection&location=/data/backup/2020-10-16&async=1114'
> >>>
> >>> The strange thing is that all 5 nodes are started with -Dsolr.allowPaths=*,
> >>> so in theory it should work. The folder is an AWS EFS share; that's the
> >>> only cause I can imagine. Or can I check any other options?
> >>>
> >>> Thank you for your help!
> >>> Philipp
> >>>
> >>
> >
> >
> > --
> >
> > Philipp Trulson
> >
> > Platform Engineer
> > mail: p.trul...@rebuy.com · web: www.reBuy.de 
> >
>
>

-- 

Philipp Trulson

Platform Engineer
mail: p.trul...@rebuy.com · web: www.reBuy.de 



Re: Massively unbalanced CPU by different SOLR Nodes

2020-10-27 Thread Shalin Shekhar Mangar
Good to hear that. Thanks for closing the loop!

On Tue, Oct 27, 2020 at 11:14 AM Jonathan Tan  wrote:

> Hi Shalin,
>
> Moving to 8.6.3 fixed it!
>
> Thank you very much for that. :)
> We'd considered an upgrade - just because - but we wouldn't have done it so
> quickly without your information.
>
> Cheers
>
> On Sat, Oct 24, 2020 at 11:37 PM Shalin Shekhar Mangar <
> shalinman...@gmail.com> wrote:
>
> > Hi Jonathan,
> >
> > Are you using the "shards.preference" parameter by any chance? There is a
> > bug that causes uneven request distribution during fan-out. Can you check
> > the number of requests using the /admin/metrics API? Look for the /select
> > handler's distrib and local request times for each core in the node.
> > Compare those across different nodes.
> >
> > The bug I refer to is https://issues.apache.org/jira/browse/SOLR-14471
> > and it is fixed in Solr 8.5.2
> >
> > On Fri, Oct 23, 2020 at 9:05 AM Jonathan Tan  wrote:
> >
> > > Hi,
> > >
> > > We've got a 3 node SolrCloud cluster running on GKE, each on their own
> > > kube node (which is in itself, relatively empty of other things).
> > >
> > > Our collection has ~18m documents of 36gb in size, split into 6 shards
> > > with 2 replicas each, and they are evenly distributed across the 3 nodes.
> > > Our JVMs are currently sized to ~14gb min & max, and they are running on
> > > SSDs.
> > >
> > >
> > > [image: Screen Shot 2020-10-23 at 2.15.48 pm.png]
> > >
> > > Graph also available here: https://pasteboard.co/JwUQ98M.png
> > >
> > > Under perf testing of ~30 requests per second, we start seeing really bad
> > > response times (around 3s at the 90th percentile), and *one* of the nodes
> > > would be fully maxed out on CPU. At about 15 requests per second, our
> > > response times are reasonable enough for our purposes (~0.8-1.1s), but as
> > > is visible in the graph, it's definitely *not* an even distribution of the
> > > CPU load. One of the nodes is running at around 13 cores, whilst the other
> > > 2 are running at ~8 cores and ~6 cores respectively.
> > >
> > > We've tracked in our monitoring tools that the 3 nodes *are* getting an
> > > even distribution of requests, and we're using a Kube service, which is in
> > > itself a fairly well-known tool for load balancing pods. We've also used
> > > kube services heaps for load balancing of other apps and haven't seen such
> > > a problem, so we doubt it's the load balancer that is the problem.
> > >
> > > All 3 nodes are built from the same kubernetes statefulset deployment, so
> > > they'd all have the same configuration & setup. Additionally, over the
> > > course of the day, it may suddenly change so that an entirely different
> > > node is the one that is majorly overloaded on CPU.
> > >
> > > All this is happening only under queries, and we are doing no indexing at
> > > that time.
> > >
> > > We'd initially thought it might be the overseer that is being majorly
> > > overloaded when under queries (although we were surprised) until we did
> > > more testing and found that even the nodes that weren't overseer would
> > > sometimes have that disparity. We'd also tried using the `ADDROLE` API to
> > > force an overseer change in the middle of a test, and whilst the tree
> > > updated to show that the overseer had changed, it made no difference to the
> > > highest CPU load.
> > >
> > > Directing queries directly to the non-busy nodes does actually give us
> > > back decent response times.
> > >
> > > We're quite puzzled by this and would really like some help figuring out
> > > *why* the CPU on one is so much higher. I did try to get the jaeger tracing
> > > working (we already have jaeger in our cluster), but we just kept getting
> > > errors on startup with solr not being able to load the main function...
> > >
> > >
> > > Thank you in advance!
> > > Cheers
> > > Jonathan
> > >
> > >
> > >
> > >
> >
> > --
> > Regards,
> > Shalin Shekhar Mangar.
> >
>


-- 
Regards,
Shalin Shekhar Mangar.
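
Shalin's suggestion above (comparing the /select handler's request counts per
core across the nodes via the metrics API) can be scripted. A rough SolrJ
sketch; the node URLs are placeholders, and group/prefix are documented
/admin/metrics parameters:

import java.util.Map;
import org.apache.solr.client.solrj.SolrRequest;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.request.GenericSolrRequest;
import org.apache.solr.common.params.MapSolrParams;

public class CompareSelectCounts {
  public static void main(String[] args) throws Exception {
    String[] nodes = {"http://node1:8983/solr", "http://node2:8983/solr",
                      "http://node3:8983/solr"};
    for (String node : nodes) {
      try (HttpSolrClient client = new HttpSolrClient.Builder(node).build()) {
        // Core-level query metrics only, filtered to the /select handler.
        // QUERY./select.requestTimes is the local timer; the [shard] variant,
        // where present, covers the fan-out subrequests.
        GenericSolrRequest req = new GenericSolrRequest(
            SolrRequest.METHOD.GET, "/admin/metrics",
            new MapSolrParams(Map.of("group", "core", "prefix", "QUERY./select")));
        System.out.println(node + " -> " + client.request(req));
      }
    }
  }
}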


Re: SOLR uses too much CPU and GC is also weird on Windows server

2020-10-27 Thread Walter Underwood
That first graph shows a JVM that does not have enough heap for the 
program it is running. Look at the bottom of the dips. That is the amount
of memory still in use after a full GC.

You want those dips to drop to about half of the available heap, so I’d 
immediately increase that heap to 4G. That might not be enough, so 
you’ll need to watch that graph after the increase.

I’ve been using 8G heaps with Solr since version 1.2. We run this config
with Java 8 on over 100 machines. We do not do any faceting, which
can take more memory.

SOLR_HEAP=8g
# Use G1 GC  -- wunder 2017-01-23
# Settings from https://wiki.apache.org/solr/ShawnHeisey
GC_TUNE=" \
-XX:+UseG1GC \
-XX:+ParallelRefProcEnabled \
-XX:G1HeapRegionSize=8m \
-XX:MaxGCPauseMillis=200 \
-XX:+UseLargePages \
-XX:+AggressiveOpts \
"
wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)

> On Oct 27, 2020, at 12:48 AM, Jaan Arjasepp  wrote:
> 
> Hello,
> 
> We have been using SOLR for quite some time. We used 6.0 and now we did a 
> little upgrade to our system and servers and we started to use 8.6.1.
> We use it on a Windows Server 2019.
> Java version is 11
> Basically using it in a default setting, except giving SOLR 2G of heap. It 
> used 512MB, but it ran out of memory and stopped responding; not sure if that 
> was the issue. The older version managed fine with 512MB.
> SOLR is not in a cloud mode, but in solo mode as we use it internally, and it 
> does not get too many requests nor much indexing actually.
> Document sizes are not big, I guess. We only use one core.
> Document stats are here:
> Num Docs: 3627341
> Max Doc: 4981019
> Heap Memory Usage: 434400
> Deleted Docs: 1353678
> Version: 15999036
> Segment Count: 30
> 
> The size of index is 2.66GB
> 
> While making the upgrade we had to modify one field and a bit of code that uses 
> it. That's basically it. It works.
> If needed more information about background of the system, I am happy to help.
> 
> 
> But now to the issue I am having.
> If SOLR is started, for the first 40-60 minutes it works just fine. CPU is not 
> high, heap usage seems normal. All is good, but then suddenly the heap usage 
> goes crazy, going up and down, up and down, and CPU rises to 50-60% of the 
> usage. Also I noticed over the weekend, when there is no write activity, the 
> CPU remains low and decent. I can try it this weekend again to see if and how 
> this works out.
> Also it seems to me that after 4-5 days of working like this, it stops 
> responding, but that needs to be confirmed with more heap also.
> 
> Heap memory usage via JMX and jconsole -> 
> https://drive.google.com/file/d/1Zo3B_xFsrrt-WRaxW-0A0QMXDNscXYih/view?usp=sharing
> As you can see, it starts off normal, but then goes crazy and it has been like 
> this overnight.
> 
> This is overall monitoring graphs, as you can see CPU is working hard or 
> hardly working. -> 
> https://drive.google.com/file/d/1_Gtz-Bi7LUrj8UZvKfmNMr-8gF_lM2Ra/view?usp=sharing
> VM summary can be found here -> 
> https://drive.google.com/file/d/1FvdCz0N5pFG1fmX_5OQ2855MVkaL048w/view?usp=sharing
> And finally to have better and quick overview of the SOLR executing 
> parameters that I have -> 
> https://drive.google.com/file/d/10VCtYDxflJcvb1aOoxt0u3Nb5JzTjrAI/view?usp=sharing
> 
> If you can point me to what I have to do to make it work, then I appreciate it a 
> lot.
> 
> Thank you in advance.
> 
> Best regards,
> Jaan
> 
> 



Re: Avoiding single digit and single character ONLY query by putting them in stopwords list

2020-10-27 Thread Mark Robinson
Thanks!

Mark

On Tue, Oct 27, 2020 at 11:56 AM Dave  wrote:

> Agreed. Just a JavaScript check on the input box would work fine for 99%
> of cases, unless something automatic is running them, in which case just do a
> server-side redirect back to the form.
>
> > On Oct 27, 2020, at 11:54 AM, Mark Robinson 
> wrote:
> >
> > Hi  Konstantinos ,
> >
> > Thanks for the reply.
> > I too feel the same. Wanted to find what others also in the Solr world
> > thought about it.
> >
> > Thanks!
> > Mark.
> >
> >> On Tue, Oct 27, 2020 at 11:45 AM Konstantinos Koukouvis <
> >> konstantinos.koukou...@mecenat.com> wrote:
> >>
> >> Oh hi Mark!
> >>
> >> Why would you wanna do such a thing in the solr end. Imho it would be
> much
> >> more clean and easy to do it on the client side
> >>
> >> Regards,
> >> Konstantinos
> >>
> >>
>  On 27 Oct 2020, at 16:42, Mark Robinson 
> wrote:
> >>>
> >>> Hello,
> >>>
> >>> I want to block queries having only a digit like "1" or "2" ,... or
> >>> just a letter like "a" or "b" ...
> >>>
> >>> Is it a good idea to block them ... ie just single digits 0 - 9 and  a
> -
> >> z
> >>> by putting them as a stop word? The problem with this I can anticipate
> >> is a
> >>> query like "1 inch screw" can have the important information "1"
> stripped
> >>> out if I tokenize it.
> >>>
> >>> So what would be a good way to avoid  single digit only and single
> letter
> >>> only queries, from the Solr end?
> >>> Or should I not do this at the Solr end at all?
> >>>
> >>> Could someone please share your thoughts?
> >>>
> >>> Thanks!
> >>> Mark
> >>
> >> ==
> >> Konstantinos Koukouvis
> >> konstantinos.koukou...@mecenat.com
> >>
> >> Using Golang and Solr? Try this: https://github.com/mecenat/solr
> >>
> >>
> >>
> >>
> >>
> >>
>


Solr dependency update at Apache Beam - which versions should be supported

2020-10-27 Thread Piotr Szuberski
Hi,

We are working on dependency updates at Apache Beam and I would like to
consult which versions should be supported so we don't break any existing
users.

Previously the supported Solr version was 5.5.4.

Versions 8.x.y and 7.x.y naturally come to mind as they are the only ones not
deprecated. But maybe there are users that use some earlier versions?

Are these versions backwards-compatible, or are there things to be aware of?

Regards


Re: Avoiding single digit and single character ONLY query by putting them in stopwords list

2020-10-27 Thread Dave
Agreed. Just a JavaScript check on the input box would work fine for 99% of 
cases, unless something automatic is running them, in which case just do a 
server-side redirect back to the form. 
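
If the check has to live on the application side instead, a minimal sketch in
Java (framework-agnostic, names hypothetical) that rejects single-character
queries before they ever reach Solr:

import java.util.regex.Pattern;

public class QueryGate {
  // Matches a query that is exactly one ASCII letter or digit,
  // optionally padded with whitespace
  private static final Pattern SINGLE_CHAR =
      Pattern.compile("^\\s*[A-Za-z0-9]\\s*$");

  static boolean isSearchable(String q) {
    return q != null && !SINGLE_CHAR.matcher(q).matches();
  }

  public static void main(String[] args) {
    System.out.println(isSearchable("1"));            // false: blocked
    System.out.println(isSearchable("1 inch screw")); // true: "1" survives in a phrase
  }
}

This sidesteps the stopword approach entirely, so "1 inch screw" keeps its "1".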

> On Oct 27, 2020, at 11:54 AM, Mark Robinson  wrote:
> 
> Hi  Konstantinos ,
> 
> Thanks for the reply.
> I too feel the same. Wanted to find what others also in the Solr world
> thought about it.
> 
> Thanks!
> Mark.
> 
>> On Tue, Oct 27, 2020 at 11:45 AM Konstantinos Koukouvis <
>> konstantinos.koukou...@mecenat.com> wrote:
>> 
>> Oh hi Mark!
>> 
>> Why would you wanna do such a thing in the solr end. Imho it would be much
>> more clean and easy to do it on the client side
>> 
>> Regards,
>> Konstantinos
>> 
>> 
 On 27 Oct 2020, at 16:42, Mark Robinson  wrote:
>>> 
>>> Hello,
>>> 
>>> I want to block queries having only a digit like "1" or "2" ,... or
>>> just a letter like "a" or "b" ...
>>> 
>>> Is it a good idea to block them ... ie just single digits 0 - 9 and  a -
>> z
>>> by putting them as a stop word? The problem with this I can anticipate
>> is a
>>> query like "1 inch screw" can have the important information "1" stripped
>>> out if I tokenize it.
>>> 
>>> So what would be a good way to avoid  single digit only and single letter
>>> only queries, from the Solr end?
>>> Or should I not do this at the Solr end at all?
>>> 
>>> Could someone please share your thoughts?
>>> 
>>> Thanks!
>>> Mark
>> 
>> ==
>> Konstantinos Koukouvis
>> konstantinos.koukou...@mecenat.com
>> 
>> Using Golang and Solr? Try this: https://github.com/mecenat/solr
>> 
>> 
>> 
>> 
>> 
>> 


Re: Avoiding single digit and single character ONLY query by putting them in stopwords list

2020-10-27 Thread Mark Robinson
Hi Konstantinos,

Thanks for the reply.
I too feel the same. I wanted to find out what others in the Solr world
thought about it.

Thanks!
Mark.

On Tue, Oct 27, 2020 at 11:45 AM Konstantinos Koukouvis <
konstantinos.koukou...@mecenat.com> wrote:

> Oh hi Mark!
>
> Why would you wanna do such a thing on the Solr end? IMHO it would be much
> cleaner and easier to do it on the client side.
>
> Regards,
> Konstantinos
>
>
> > On 27 Oct 2020, at 16:42, Mark Robinson  wrote:
> >
> > Hello,
> >
> > I want to block queries having only a digit like "1" or "2" ,... or
> > just a letter like "a" or "b" ...
> >
> > Is it a good idea to block them ... ie just single digits 0 - 9 and  a -
> z
> > by putting them as a stop word? The problem with this I can anticipate
> is a
> > query like "1 inch screw" can have the important information "1" stripped
> > out if I tokenize it.
> >
> > So what would be a good way to avoid  single digit only and single letter
> > only queries, from the Solr end?
> > Or should I not do this at the Solr end at all?
> >
> > Could someone please share your thoughts?
> >
> > Thanks!
> > Mark
>
> ==
> Konstantinos Koukouvis
> konstantinos.koukou...@mecenat.com
>
> Using Golang and Solr? Try this: https://github.com/mecenat/solr
>
>
>
>
>
>


Re: Avoiding single digit and single character ONLY query by putting them in stopwords list

2020-10-27 Thread Konstantinos Koukouvis
Oh hi Mark!

Why would you wanna do such a thing on the Solr end? IMHO it would be much 
cleaner and easier to do it on the client side.

Regards,
Konstantinos


> On 27 Oct 2020, at 16:42, Mark Robinson  wrote:
> 
> Hello,
> 
> I want to block queries having only a digit like "1" or "2" ,... or
> just a letter like "a" or "b" ...
> 
> Is it a good idea to block them ... ie just single digits 0 - 9 and  a - z
> by putting them as a stop word? The problem with this I can anticipate is a
> query like "1 inch screw" can have the important information "1" stripped
> out if I tokenize it.
> 
> So what would be a good way to avoid  single digit only and single letter
> only queries, from the Solr end?
> Or should I not do this at the Solr end at all?
> 
> Could someone please share your thoughts?
> 
> Thanks!
> Mark

==
Konstantinos Koukouvis
konstantinos.koukou...@mecenat.com

Using Golang and Solr? Try this: https://github.com/mecenat/solr







Avoiding single digit and single character ONLY query by putting them in stopwords list

2020-10-27 Thread Mark Robinson
Hello,

I want to block queries having only a digit like "1" or "2" ,... or
just a letter like "a" or "b" ...

Is it a good idea to block them ... i.e. just single digits 0-9 and a-z,
by putting them in as stop words? The problem I can anticipate with this is that a
query like "1 inch screw" can have the important information "1" stripped
out when I tokenize it.

So what would be a good way to avoid  single digit only and single letter
only queries, from the Solr end?
Or should I not do this at the Solr end at all?

Could someone please share your thoughts?

Thanks!
Mark


Re: SOLR uses too much CPU and GC is also weird on Windows server

2020-10-27 Thread Emir Arnautović
Hi Jaan,
You can also check in admin console in caches the sizes of field* caches. That 
will tell you if some field needs docValues=true.

Regards,
Emir
--
Monitoring - Log Management - Alerting - Anomaly Detection
Solr & Elasticsearch Consulting Support Training - http://sematext.com/



> On 27 Oct 2020, at 14:36, Jaan Arjasepp  wrote:
> 
> Hi Erick,
> 
> Thanks for this information, I will look into it.
> Main changes were regarding parsing the JSON results we get from Solr, not the 
> queries or updates.
> 
> Jaan
> 
> P.S. configuration change about requestParser was not it.
> 
> 
> -Original Message-
> From: Erick Erickson  > 
> Sent: 27 October 2020 15:03
> To: solr-user@lucene.apache.org 
> Subject: Re: SOLR uses too much CPU and GC is also weird on Windows server
> 
> Jaan:
> 
> The basic search uses an “inverted index”, which is basically a list of terms 
> and the documents they appear in, e.g.
> my - 1, 4, 9, 12
> dog - 4, 8, 10
> 
> So the word “my” appears in docs 1, 4, 9 and 12, and “dog” appears in 4, 8, 
> 10. Makes it easy to search for my AND dog for instance, obviously both 
> appear in doc 4.
> 
> But that’s a lousy structure for faceting, where you have a list of documents 
> and are trying to find the terms each one has in order to count them up. For that, you want 
> to “uninvert” the above structure,
> 1 - my
> 4 - my dog
> 8 - dog
> 9 - my
> 10 - dog
> 12 - my
> 
> From there, it’s easy to say “count the distinct terms for docs 1 and 4 and 
> put them in a bucket”, giving facet counts like 
> 
> my (2)
> dog (1)
> 
> If docValues=true, then the second structure is built at index time and 
> occupies memory at run time out in MMapDirectory space, i.e. _not_ on the 
> heap. 
> 
> If docValues=false, the second structure is built _on_ the heap when it’s 
> needed, adding to GC, memory pressure, CPU utilization etc.
> 
> So one theory is that when you upgraded your system (and you did completely 
> rebuild your corpus, right?) you inadvertently changed the docValues property 
> for one or more fields that you facet, group, sort, or use function queries 
> on and Solr is doing all the extra work of uninverting the field that it 
> didn’t have to before.
> 
> To answer that, you need to go through your schema and ensure that 
> docValues=true is set for any field you facet, group, sort, or use function 
> queries on. If you do change this value, you need to blow away your index so 
> there are no segments and index all your documents again.
> 
> But that theory has problems:
> 1> why should Solr run for a while and then go crazy? It’d have to be 
> 1> that the query that
>triggers uninversion is uncommon.
> 2> docValues defaults to true for simple types in recent schemas. 
> 2> Perhaps you pulled
>  over an old definition from your former schema?
> 
> 
> One other thing: you mention a bit of custom code you needed to change. I 
> always try to investigate that first. Is it possible to
> 1> reproduce the problem on a non-prod system
> 2> see what happens if you take the custom code out?
> 
> Best,
> Erick
> 
> 
>> On Oct 27, 2020, at 4:42 AM, Emir Arnautović  
>> wrote:
>> 
>> Hi Jaan,
>> It can be several things:
> >> - caches
> >>   - fieldCache/fieldValueCache - it can be that you are missing doc values
> >>     on some fields that are used for faceting/sorting/functions and that
> >>     uninverted field structures are eating your memory.
> >>   - filterCache - you've changed the setting for filter caches and set it
> >>     to some large value
> >> - heavy queries
> >>   - return a lot of documents
> >>   - facet on high cardinality fields
> >>   - deep pagination
>> 
>> HTH,
>> Emir
>> --
>> Monitoring - Log Management - Alerting - Anomaly Detection Solr & 
>> Elasticsearch Consulting Support Training - http://sematext.com/
>> 
>> 
>> 
>>> On 27 Oct 2020, at 08:48, Jaan Arjasepp  wrote:
>>> 
>>> Hello,
>>> 
>>> We have been using SOLR for quite some time. We used 6.0 and now we did a 
>>> little upgrade to our system and servers and we started to use 8.6.1.
>>> We use it on a Windows Server 2019.
>>> Java version is 11
>>> Basically using it in a default setting, except giving SOLR 2G of heap. It 
>>> used 512, but it ran out of memory and stopped responding. Not sure if it 
>>> was the issue. When older version, it managed fine with 512MB.
>>> SOLR is not in a cloud mode, but in solo mode as we use it internally and 
> >>> it does not get too many requests nor much indexing actually.
>>> Document sizes are not big, I guess. We only use one core.
>>> Document stats are here:
>>> Num Docs: 3627341
>>> Max Doc: 4981019
>>> Heap Memory Usage: 434400
>>> Deleted Docs: 1353678
>>> Version: 15999036
>>> Segment Count: 30
>>> 
>>> The size of index is 2.66GB
>>> 
>>> While making upgrade we had to modify one field and a bit of code that uses 
> >>> it. That's basically it. It works.
>>> If needed more information about background of the system, I am happy to 
>>> help.
>>> 
>>> 
>>> But now to 

RE: SOLR uses too much CPU and GC is also weird on Windows server

2020-10-27 Thread Jaan Arjasepp
Hi Erick,

Thanks for this information, I will look into it.
Main changes were regarding parsing the JSON results we get from Solr, not the 
queries or updates.

Jaan

P.S. The configuration change about requestParser was not it.


-Original Message-
From: Erick Erickson  
Sent: 27 October 2020 15:03
To: solr-user@lucene.apache.org
Subject: Re: SOLR uses too much CPU and GC is also weird on Windows server

Jaan:

The basic search uses an “inverted index”, which is basically a list of terms 
and the documents they appear in, e.g.
my - 1, 4, 9, 12
dog - 4, 8, 10

So the word “my” appears in docs 1, 4, 9 and 12, and “dog” appears in 4, 8, 10. 
Makes it easy to search for my AND dog for instance, obviously both appear in 
doc 4.

But that’s a lousy structure for faceting, where you have a list of documents 
and are trying to find the terms each one has in order to count them up. For that, you want to 
“uninvert” the above structure,
1 - my
4 - my dog
8 - dog
9 - my
10 - dog
12 - my

From there, it’s easy to say “count the distinct terms for docs 1 and 4 and put 
them in a bucket”, giving facet counts like 

my (2)
dog (1)

If docValues=true, then the second structure is built at index time and 
occupies memory at run time out in MMapDirectory space, i.e. _not_ on the heap. 

If docValues=false, the second structure is built _on_ the heap when it’s 
needed, adding to GC, memory pressure, CPU utilization etc.

So one theory is that when you upgraded your system (and you did completely 
rebuild your corpus, right?) you inadvertently changed the docValues property 
for one or more fields that you facet, group, sort, or use function queries on 
and Solr is doing all the extra work of uninverting the field that it didn’t 
have to before.

To answer that, you need to go through your schema and ensure that 
docValues=true is set for any field you facet, group, sort, or use function 
queries on. If you do change this value, you need to blow away your index so 
there are no segments and index all your documents again.

But that theory has problems:
1> why should Solr run for a while and then go crazy? It’d have to be 
1> that the query that
triggers uninversion is uncommon.
2> docValues defaults to true for simple types in recent schemas. 
2> Perhaps you pulled
  over an old definition from your former schema?


One other thing: you mention a bit of custom code you needed to change. I 
always try to investigate that first. Is it possible to
1> reproduce the problem on a non-prod system
2> see what happens if you take the custom code out?

Best,
Erick


> On Oct 27, 2020, at 4:42 AM, Emir Arnautović  
> wrote:
> 
> Hi Jaan,
> It can be several things:
> - caches
>   - fieldCache/fieldValueCache - it can be that you are missing doc values on
>     some fields that are used for faceting/sorting/functions and that
>     uninverted field structures are eating your memory.
>   - filterCache - you've changed the setting for filter caches and set it to
>     some large value
> - heavy queries
>   - return a lot of documents
>   - facet on high cardinality fields
>   - deep pagination
> 
> HTH,
> Emir
> --
> Monitoring - Log Management - Alerting - Anomaly Detection Solr & 
> Elasticsearch Consulting Support Training - http://sematext.com/
> 
> 
> 
>> On 27 Oct 2020, at 08:48, Jaan Arjasepp  wrote:
>> 
>> Hello,
>> 
>> We have been using SOLR for quite some time. We used 6.0 and now we did a 
>> little upgrade to our system and servers and we started to use 8.6.1.
>> We use it on a Windows Server 2019.
>> Java version is 11
>> Basically using it in a default setting, except giving SOLR 2G of heap. It 
>> used 512, but it ran out of memory and stopped responding. Not sure if it 
>> was the issue. When older version, it managed fine with 512MB.
>> SOLR is not in a cloud mode, but in solo mode as we use it internally and it 
> >> does not get too many requests nor much indexing actually.
>> Document sizes are not big, I guess. We only use one core.
>> Document stats are here:
>> Num Docs: 3627341
>> Max Doc: 4981019
>> Heap Memory Usage: 434400
>> Deleted Docs: 1353678
>> Version: 15999036
>> Segment Count: 30
>> 
>> The size of index is 2.66GB
>> 
>> While making upgrade we had to modify one field and a bit of code that uses 
> >> it. That's basically it. It works.
>> If needed more information about background of the system, I am happy to 
>> help.
>> 
>> 
>> But now to the issue I am having.
>> If SOLR is started, at first 40-60 minutes it works just fine. CPU is not 
>> high, heap usage seem normal. All is good, but then suddenly, the heap usage 
>> goes crazy, going up and down, up and down and CPU rises to 50-60% of the 
>> usage. Also I noticed over the weekend, when there are no writing usage, the 
>> CPU remains low and decent. I can try it this weekend again to see if and 
>> how this works out.
>> Also it seems to me, that after 4-5 days of working like this, it stops 
>> responding, but needs to be confirmed with more heap also.
>> 
>> Heap memory usage via JMX and jconsole 

Re: SOLR uses too much CPU and GC is also weird on Windows server

2020-10-27 Thread Erick Erickson
Jaan:

The basic search uses an “inverted index”, which is basically a list of terms 
and the documents they appear in, e.g.
my - 1, 4, 9, 12
dog - 4, 8, 10

So the word “my” appears in docs 1, 4, 9 and 12, and “dog” appears in 4, 8, 10. 
Makes
it easy to search for 
my AND dog
for instance, obviously both appear in doc 4.

But that’s a lousy structure for faceting, where you have a list of documents 
and are trying to find the terms each one has in order to count them up. For 
that, you want to “uninvert” the above structure,
1 - my
4 - my dog
8 - dog
9 - my
10 - dog
12 - my

From there, it’s easy to say “count the distinct terms for docs 1 and 4 and put 
them in a bucket”,
giving facet counts like 

my (2)
dog (1)

If docValues=true, then the second structure is built at index time and 
occupies memory at
run time out in MMapDirectory space, i.e. _not_ on the heap. 

If docValues=false, the second structure is built _on_ the heap when it’s 
needed, adding to
GC, memory pressure, CPU utilization etc.
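
As a toy illustration of the two structures (a sketch for intuition only; real
docValues are columnar on-disk structures, not Java maps):

import java.util.*;

public class InvertToy {
  public static void main(String[] args) {
    // Inverted index: term -> docs (what keyword search uses)
    Map<String, List<Integer>> inverted = Map.of(
        "my",  List.of(1, 4, 9, 12),
        "dog", List.of(4, 8, 10));

    // "Uninverting": doc -> terms (what faceting needs)
    Map<Integer, List<String>> uninverted = new TreeMap<>();
    inverted.forEach((term, docs) ->
        docs.forEach(d ->
            uninverted.computeIfAbsent(d, k -> new ArrayList<>()).add(term)));

    // Facet over docs 1 and 4: count the terms per bucket
    Map<String, Integer> counts = new TreeMap<>();
    for (int doc : List.of(1, 4)) {
      for (String t : uninverted.getOrDefault(doc, List.of())) {
        counts.merge(t, 1, Integer::sum);
      }
    }
    System.out.println(counts); // prints {dog=1, my=2}
  }
}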

So one theory is that when you upgraded your system (and you did completely 
rebuild your
corpus, right?) you inadvertently changed the docValues property for one or 
more fields that you 
facet, group, sort, or use function queries on and Solr is doing all the extra 
work of
uninverting the field that it didn’t have to before.

To answer that, you need to go through your schema and ensure that 
docValues=true is
set for any field you facet, group, sort, or use function queries on. If you do 
change
this value, you need to blow away your index so there are no segments and index
all your documents again.

But that theory has problems:
1> why should Solr run for a while and then go crazy? It’d have to be that the 
query that
triggers uninversion is uncommon.
2> docValues defaults to true for simple types in recent schemas. Perhaps you 
pulled
  over an old definition from your former schema?


One other thing: you mention a bit of custom code you needed to change. I 
always try to
investigate that first. Is it possible to
1> reproduce the problem on a non-prod system
2> see what happens if you take the custom code out?

Best,
Erick


> On Oct 27, 2020, at 4:42 AM, Emir Arnautović  
> wrote:
> 
> Hi Jaan,
> It can be several things:
> caches
> fieldCache/fieldValueCache - it can be that you are missing doc values on 
> some fields that are used for faceting/sorting/functions and that uninverted 
> field structures are eating your memory. 
> filterCache - you’ve changed setting for filter caches and set it to some 
> large value
> heavy queries
> return a lot of documents
> facet on high cardinality fields
> deep pagination
> 
> HTH,
> Emir
> --
> Monitoring - Log Management - Alerting - Anomaly Detection
> Solr & Elasticsearch Consulting Support Training - http://sematext.com/
> 
> 
> 
>> On 27 Oct 2020, at 08:48, Jaan Arjasepp  wrote:
>> 
>> Hello,
>> 
>> We have been using SOLR for quite some time. We used 6.0 and now we did a 
>> little upgrade to our system and servers and we started to use 8.6.1.
>> We use it on a Windows Server 2019.
>> Java version is 11
>> Basically using it in a default setting, except giving SOLR 2G of heap. It 
>> used 512, but it ran out of memory and stopped responding. Not sure if it 
>> was the issue. When older version, it managed fine with 512MB.
>> SOLR is not in a cloud mode, but in solo mode as we use it internally and it 
> >> does not get too many requests nor much indexing actually.
>> Document sizes are not big, I guess. We only use one core.
>> Document stats are here:
>> Num Docs: 3627341
>> Max Doc: 4981019
>> Heap Memory Usage: 434400
>> Deleted Docs: 1353678
>> Version: 15999036
>> Segment Count: 30
>> 
>> The size of index is 2.66GB
>> 
>> While making upgrade we had to modify one field and a bit of code that uses 
> >> it. That's basically it. It works.
>> If needed more information about background of the system, I am happy to 
>> help.
>> 
>> 
>> But now to the issue I am having.
>> If SOLR is started, at first 40-60 minutes it works just fine. CPU is not 
>> high, heap usage seem normal. All is good, but then suddenly, the heap usage 
>> goes crazy, going up and down, up and down and CPU rises to 50-60% of the 
>> usage. Also I noticed over the weekend, when there are no writing usage, the 
>> CPU remains low and decent. I can try it this weekend again to see if and 
>> how this works out.
>> Also it seems to me, that after 4-5 days of working like this, it stops 
>> responding, but needs to be confirmed with more heap also.
>> 
>> Heap memory usage via JMX and jconsole -> 
>> https://drive.google.com/file/d/1Zo3B_xFsrrt-WRaxW-0A0QMXDNscXYih/view?usp=sharing
> >> As you can see, it starts off normal, but then goes crazy and it has been 
>> like this over night.
>> 
>> This is overall monitoring graphs, as you can see CPU is working hard or 
>> hardly working. -> 
>> https://drive.google.com/file/d/1_Gtz-Bi7LUrj8UZvKfmNMr-8gF_lM2Ra/view?usp=sharing
>> VM summary can be found here 

RE: SOLR uses too much CPU and GC is also weird on Windows server

2020-10-27 Thread Jaan Arjasepp
I found one little difference between the old solrconfig and the new one.
It is in the requestDispatcher section.
The new one does not have this, but we had it in the old configuration. Maybe it 
helps, I will see.



Jaan

-Original Message-
From: Jaan Arjasepp  
Sent: 27 October 2020 14:05
To: solr-user@lucene.apache.org
Subject: RE: SOLR uses too much CPU and GC is also weird on Windows server

Hi Emir,

I checked the solrconfig.xml file and we don't even use fieldValueCache. Also, 
are you saying I should check the schema and all the fields in the old Solr 
and the new one to see if they match or contain similar settings? What does 
this uninverted value mean? How do I check this?
As for filterCache, it is the default setting; should I change it?
I mean, if you are referring to these issues, then I guess the fix is either 
changing the configuration or the schema?

I will add my solrconfig.xml without comments so there is less data here; this is 
pretty much default, nothing changed here:



[solrconfig.xml omitted: the mailing-list archive stripped the XML tags, leaving 
only element values. What survives is consistent with a stock 8.6.1 config: 
luceneMatchVersion 8.6.1, native lockType, autoCommit maxTime 15000 with 
openSearcher false, autoSoftCommit maxTime -1, maxBooleanClauses 1024, the 
default /select and /query handlers, spellcheck/terms/highlighting components, 
and the add-unknown-fields update processor chain.]



-Original Message-
From: Emir Arnautović 
Sent: 27 October 2020 10:42
To: solr-user@lucene.apache.org
Subject: Re: SOLR uses too much CPU and GC is also weird on Windows server

Hi Jaan,
It can be several things:
- caches
  - fieldCache/fieldValueCache - it can be that you are missing doc values on
    some fields that are used for faceting/sorting/functions and that
    uninverted field structures are eating your memory.
  - filterCache - you've changed the setting for filter caches and set it to
    some large value
- heavy queries
  - return a lot of documents
  - facet on high cardinality fields
  - deep pagination

HTH,
Emir
--
Monitoring - Log Management - Alerting - Anomaly Detection Solr & Elasticsearch 
Consulting Support Training - http://sematext.com/



> On 27 Oct 2020, at 08:48, Jaan Arjasepp  wrote:
> 
> Hello,
> 
> We have been using SOLR for quite some time. We used 6.0 and now we did a 
> little upgrade to our system and servers and we started to use 8.6.1.
> We use it on a Windows Server 2019.
> Java version is 11
> Basically using it in a default setting, except giving SOLR 2G of heap. It 
> used 512MB, but it ran out of memory and stopped responding; not sure if it was 
> the issue. The older version managed fine with 512MB.
> SOLR is not in a cloud mode, but in solo mode as we use it internally and it 
> does not get too many requests nor much indexing actually.
> Document sizes are not big, I guess. We only use one core.
> Document stats are here:
> Num Docs: 3627341
> Max Doc: 4981019
> Heap Memory Usage: 434400
> Deleted Docs: 1353678
> Version: 15999036
> Segment Count: 30
> 
> The size of index is 2.66GB
> 
> While making upgrade we had to modify one field and a bit of code that uses 
> it. That's basically it. It works.
> If needed more information about background of the system, I am happy to help.
> 
> 
> But now to the issue I am having.
> If SOLR is started, at first 40-60 minutes it works just fine. CPU is not 
> high, heap usage seem normal. All is good, but 

RE: SOLR uses too much CPU and GC is also weird on Windows server

2020-10-27 Thread Jaan Arjasepp
Hi Emir,

I checked the solrconfig.xml file and we don't even use fieldValueCache. Also, 
are you saying I should check the schema and all the fields in the old Solr 
and the new one to see if they match or contain similar settings? What does 
this uninverted value mean? How do I check this?
As for filterCache, it is the default setting; should I change it?
I mean, if you are referring to these issues, then I guess the fix is either 
changing the configuration or the schema?

I will add my solrconfig.xml without comments so there is less data here; this is 
pretty much default, nothing changed here:



[solrconfig.xml omitted: the mailing-list archive stripped the XML tags, leaving 
only element values. What survives is consistent with a stock 8.6.1 config: 
luceneMatchVersion 8.6.1, native lockType, autoCommit maxTime 15000 with 
openSearcher false, autoSoftCommit maxTime -1, maxBooleanClauses 1024, the 
default /select and /query handlers, spellcheck/terms/highlighting components, 
and the add-unknown-fields update processor chain.]



-Original Message-
From: Emir Arnautović  
Sent: 27 October 2020 10:42
To: solr-user@lucene.apache.org
Subject: Re: SOLR uses too much CPU and GC is also weird on Windows server

Hi Jaan,
It can be several things:
- caches
  - fieldCache/fieldValueCache - it can be that you are missing doc values on
    some fields that are used for faceting/sorting/functions and that
    uninverted field structures are eating your memory.
  - filterCache - you've changed the setting for filter caches and set it to
    some large value
- heavy queries
  - return a lot of documents
  - facet on high cardinality fields
  - deep pagination

HTH,
Emir
--
Monitoring - Log Management - Alerting - Anomaly Detection Solr & Elasticsearch 
Consulting Support Training - http://sematext.com/



> On 27 Oct 2020, at 08:48, Jaan Arjasepp  wrote:
> 
> Hello,
> 
> We have been using SOLR for quite some time. We used 6.0 and now we did a 
> little upgrade to our system and servers and we started to use 8.6.1.
> We use it on a Windows Server 2019.
> Java version is 11
> Basically using it in a default setting, except giving SOLR 2G of heap. It 
> used 512MB, but it ran out of memory and stopped responding; not sure if it was 
> the issue. The older version managed fine with 512MB.
> SOLR is not in a cloud mode, but in solo mode as we use it internally and it 
> does not get too many requests nor much indexing actually.
> Document sizes are not big, I guess. We only use one core.
> Document stats are here:
> Num Docs: 3627341
> Max Doc: 4981019
> Heap Memory Usage: 434400
> Deleted Docs: 1353678
> Version: 15999036
> Segment Count: 30
> 
> The size of index is 2.66GB
> 
> While making upgrade we had to modify one field and a bit of code that uses 
> it. That's basically it. It works.
> If needed more information about background of the system, I am happy to help.
> 
> 
> But now to the issue I am having.
> If SOLR is started, at first 40-60 minutes it works just fine. CPU is not 
> high, heap usage seem normal. All is good, but then suddenly, the heap usage 
> goes crazy, going up and down, up and down and CPU rises to 50-60% of the 
> usage. Also I noticed over the weekend, when there are no writing usage, the 
> CPU remains low and decent. I can try it this weekend again to see if and how 
> this works out.
> Also it seems to me, that after 4-5 days of working like this, it stops 
> responding, but 

Solr LockObtainFailedException and NPEs for CoreAdmin STATUS

2020-10-27 Thread Andreas Hubold

Hi,

we're running tests against a stand-alone Solr instance; the tests create Solr 
cores from multiple applications using CoreAdmin (via SolrJ).


Lately, we upgraded from 8.4.1 to 8.6.3, and sometimes we now see a 
LockObtainFailedException for a lock held by the same JVM, after which 
Solr is broken and runs into NullPointerExceptions for simple CoreAdmin 
STATUS requests. We have to restart Solr then. I've never seen this with 
8.4.1 or previous releases.


This bug is quite severe for us because it breaks our system tests with 
Solr, and we fear that it may also happen in production. Is this a known 
bug?


Our applications use a CoreAdmin STATUS request to check whether a core 
exists, followed by a CREATE request if the core does not exist. With 
multiple applications, and bad timing, two concurrent CREATE requests 
for the same core are of course still possible. Solr 8.4.1 rejected 
duplicate requests and logged ERRORs but kept working correctly [1]. I 
can still see the same log messages in 8.6.3 ("Core with name ... 
already exists" or "Error CREATEing SolrCore ... Could not create a new 
core in ... as another core is already defined there") - but sometimes 
also the following error, after which Solr is broken:


2020-10-27 00:29:25.350 ERROR (qtp2029754983-24) [   ] 
o.a.s.h.RequestHandlerBase org.apache.solr.common.SolrException: Error 
CREATEing SolrCore 'blueprint_acgqqafsogyc_comments': Unable to create core 
[blueprint_acgqqafsogyc_comments] Caused by: Lock held by this virtual machine: 
/var/solr/data/blueprint_acgqqafsogyc_comments/data/index/write.lock

    at org.apache.solr.core.CoreContainer.create(CoreContainer.java:1312)
    at 
org.apache.solr.handler.admin.CoreAdminOperation.lambda$static$0(CoreAdminOperation.java:95)
    at 
org.apache.solr.handler.admin.CoreAdminOperation.execute(CoreAdminOperation.java:367)
...
Caused by: org.apache.solr.common.SolrException: Unable to create core 
[blueprint_acgqqafsogyc_comments]
    at 
org.apache.solr.core.CoreContainer.createFromDescriptor(CoreContainer.java:1408)
    at org.apache.solr.core.CoreContainer.create(CoreContainer.java:1273)
    ... 47 more
Caused by: org.apache.solr.common.SolrException: Error opening new searcher
    at org.apache.solr.core.SolrCore.<init>(SolrCore.java:1071)
    at org.apache.solr.core.SolrCore.<init>(SolrCore.java:906)
    at 
org.apache.solr.core.CoreContainer.createFromDescriptor(CoreContainer.java:1387)
    ... 48 more
Caused by: org.apache.solr.common.SolrException: Error opening new searcher
    at org.apache.solr.core.SolrCore.openNewSearcher(SolrCore.java:2184)
    at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:2308)
    at org.apache.solr.core.SolrCore.initSearcher(SolrCore.java:1130)
    at org.apache.solr.core.SolrCore.<init>(SolrCore.java:1012)
    ... 50 more
Caused by: org.apache.lucene.store.LockObtainFailedException: Lock held by this 
virtual machine: 
/var/solr/data/blueprint_acgqqafsogyc_comments/data/index/write.lock
    at 
org.apache.lucene.store.NativeFSLockFactory.obtainFSLock(NativeFSLockFactory.java:139)
    at 
org.apache.lucene.store.FSLockFactory.obtainLock(FSLockFactory.java:41)
    at 
org.apache.lucene.store.BaseDirectory.obtainLock(BaseDirectory.java:45)
    at 
org.apache.lucene.store.FilterDirectory.obtainLock(FilterDirectory.java:105)
    at org.apache.lucene.index.IndexWriter.<init>(IndexWriter.java:785)
    at 
org.apache.solr.update.SolrIndexWriter.<init>(SolrIndexWriter.java:126)
    at 
org.apache.solr.update.SolrIndexWriter.create(SolrIndexWriter.java:100)
    at 
org.apache.solr.update.DefaultSolrCoreState.createMainIndexWriter(DefaultSolrCoreState.java:261)
    at 
org.apache.solr.update.DefaultSolrCoreState.getIndexWriter(DefaultSolrCoreState.java:135)
    at org.apache.solr.core.SolrCore.openNewSearcher(SolrCore.java:2145)

2020-10-27 00:29:25.353 INFO  (qtp2029754983-19) [   ] o.a.s.s.HttpSolrCall [admin] webapp=null 
path=/admin/cores 
params={core=blueprint_acgqqafsogyc_comments&action=STATUS&indexInfo=false&wt=javabin&version=2}
 status=500 QTime=0
2020-10-27 00:29:25.353 ERROR (qtp2029754983-19) [   ] o.a.s.s.HttpSolrCall 
null:org.apache.solr.common.SolrException: Error handling 'STATUS' action
    at 
org.apache.solr.handler.admin.CoreAdminOperation.execute(CoreAdminOperation.java:372)
    at 
org.apache.solr.handler.admin.CoreAdminHandler$CallInfo.call(CoreAdminHandler.java:397)
    at 
org.apache.solr.handler.admin.CoreAdminHandler.handleRequestBody(CoreAdminHandler.java:181)
...
Caused by: java.lang.NullPointerException
    at org.apache.solr.core.SolrCore.getInstancePath(SolrCore.java:333)
    at 
org.apache.solr.handler.admin.CoreAdminOperation.getCoreStatus(CoreAdminOperation.java:329)
    at org.apache.solr.handler.admin.StatusOp.execute(StatusOp.java:54)
    at 
org.apache.solr.handler.admin.CoreAdminOperation.execute(CoreAdminOperation.java:367)


Any ideas? Were there 
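
For context, the check-then-create pattern described above looks roughly like
this in SolrJ (a sketch using the core name from the logs; note this is the
racy pattern itself, not a fix, since another application can CREATE between
the two calls):

import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.request.CoreAdminRequest;
import org.apache.solr.client.solrj.response.CoreAdminResponse;

public class EnsureCore {
  public static void main(String[] args) throws Exception {
    String core = "blueprint_acgqqafsogyc_comments";
    try (HttpSolrClient client =
        new HttpSolrClient.Builder("http://localhost:8983/solr").build()) {
      // STATUS for a single core; an empty status section means it does not exist
      CoreAdminResponse status = CoreAdminRequest.getStatus(core, client);
      if (status.getCoreStatus(core).get("instanceDir") == null) {
        // Window for the race: a concurrent CREATE can land right here
        CoreAdminRequest.createCore(core, core /* instanceDir */, client);
      }
    }
  }
}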

Re: SOLR uses too much CPU and GC is also weird on Windows server

2020-10-27 Thread Emir Arnautović
Hi Jaan,
It can be several things:
caches
fieldCache/fieldValueCache - it can be that you you are missing doc values on 
some fields that are used for faceting/sorting/functions and that uninverted 
field structures are eating your memory. 
filterCache - you’ve changed setting for filter caches and set it to some large 
value
heavy queries
return a lot of documents
facet on high cardinality fields
deep pagination

HTH,
Emir
--
Monitoring - Log Management - Alerting - Anomaly Detection
Solr & Elasticsearch Consulting Support Training - http://sematext.com/



> On 27 Oct 2020, at 08:48, Jaan Arjasepp  wrote:
> 
> Hello,
> 
> We have been using SOLR for quite some time. We used 6.0 and now we did a 
> little upgrade to our system and servers and we started to use 8.6.1.
> We use it on a Windows Server 2019.
> Java version is 11
> Basically using it in a default setting, except giving SOLR 2G of heap. It 
> used 512MB, but it ran out of memory and stopped responding; not sure if it was 
> the issue. The older version managed fine with 512MB.
> SOLR is not in a cloud mode, but in solo mode as we use it internally and it 
> does not get too many requests nor much indexing actually.
> Document sizes are not big, I guess. We only use one core.
> Document stats are here:
> Num Docs: 3627341
> Max Doc: 4981019
> Heap Memory Usage: 434400
> Deleted Docs: 1353678
> Version: 15999036
> Segment Count: 30
> 
> The size of index is 2.66GB
> 
> While making upgrade we had to modify one field and a bit of code that uses 
> it. That's basically it. It works.
> If needed more information about background of the system, I am happy to help.
> 
> 
> But now to the issue I am having.
> If SOLR is started, at first 40-60 minutes it works just fine. CPU is not 
> high, heap usage seem normal. All is good, but then suddenly, the heap usage 
> goes crazy, going up and down, up and down and CPU rises to 50-60% of the 
> usage. Also I noticed over the weekend, when there are no writing usage, the 
> CPU remains low and decent. I can try it this weekend again to see if and how 
> this works out.
> Also it seems to me, that after 4-5 days of working like this, it stops 
> responding, but needs to be confirmed with more heap also.
> 
> Heap memory usage via JMX and jconsole -> 
> https://drive.google.com/file/d/1Zo3B_xFsrrt-WRaxW-0A0QMXDNscXYih/view?usp=sharing
> As you can see, it starts off normal, but then goes crazy and it has been like 
> this over night.
> 
> This is overall monitoring graphs, as you can see CPU is working hard or 
> hardly working. -> 
> https://drive.google.com/file/d/1_Gtz-Bi7LUrj8UZvKfmNMr-8gF_lM2Ra/view?usp=sharing
> VM summary can be found here -> 
> https://drive.google.com/file/d/1FvdCz0N5pFG1fmX_5OQ2855MVkaL048w/view?usp=sharing
> And finally to have better and quick overview of the SOLR executing 
> parameters that I have -> 
> https://drive.google.com/file/d/10VCtYDxflJcvb1aOoxt0u3Nb5JzTjrAI/view?usp=sharing
> 
> If you can point me what I have to do to make it work, then I appreciate it a 
> lot.
> 
> Thank you in advance.
> 
> Best regards,
> Jaan
> 
> 



Re: Question on solr metrics

2020-10-27 Thread Emir Arnautović
Hi,
In order to see time-range metrics, you'll need to collect metrics periodically, 
send them to some storage, and then query/visualise them. Solr has exporters for 
some popular backends, or you can use a cloud-based solution. One such 
solution is ours: https://sematext.com/integrations/solr-monitoring/ and we've 
also just added a Solr logs integration, so you can collect/visualise/alert on 
both metrics and logs.

HTH,
Emir
--
Monitoring - Log Management - Alerting - Anomaly Detection
Solr & Elasticsearch Consulting Support Training - http://sematext.com/
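
A minimal polling loop of the kind Emir describes, sketched with SolrJ (the
group/prefix values are documented /admin/metrics parameters; the host and
sample interval are placeholders):

import java.time.Instant;
import java.util.Map;
import org.apache.solr.client.solrj.SolrRequest;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.request.GenericSolrRequest;
import org.apache.solr.common.params.MapSolrParams;

public class PollCpu {
  public static void main(String[] args) throws Exception {
    try (HttpSolrClient client =
        new HttpSolrClient.Builder("http://localhost:8983/solr").build()) {
      while (true) {
        // The jvm group exposes os.* gauges such as systemCpuLoad
        GenericSolrRequest req = new GenericSolrRequest(
            SolrRequest.METHOD.GET, "/admin/metrics",
            new MapSolrParams(Map.of("group", "jvm", "prefix", "os.")));
        // Timestamp each sample; ship these pairs to your storage of choice
        System.out.println(Instant.now() + " " + client.request(req).get("metrics"));
        Thread.sleep(60_000); // sample once a minute
      }
    }
  }
}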



> On 26 Oct 2020, at 22:08, yaswanth kumar  wrote:
> 
> Can we get the metrics for a particular time range? I know metrics history
> was not enabled, so I will only have data from when the Solr node last
> came up, but even from that, can we do a date range, for example to see
> CPU usage over a particular time range?
> 
> Note: Solr version: 8.2
> 
> -- 
> Thanks & Regards,
> Yaswanth Kumar Konathala.
> yaswanth...@gmail.com



SolrJ NestableJsonFacet ordering of query facet

2020-10-27 Thread Shivam Jha
Hi folks,

Doing some faceted queries using the 'json.facet' param and SolrJ, the results
of which I am processing using the SolrJ NestableJsonFacet class,
basically as queryResponse.getJsonFacetingResponse(), which returns a
NestableJsonFacet object.

But I have noticed it does not maintain the facet-query order in which it
was given in json.facet. Direct queries to Solr do maintain that order, but
not after it comes to the Java layer in SolrJ.

Is there a way to make it maintain that order?
Hopefully the question makes sense; if not, please let me know and I can
clarify further.

Thanks,
Shivam
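
One possible client-side workaround, sketched (hedged: it assumes
NestableJsonFacet.getQueryFacet(String) and getCount(), and keeps the ordering
in your own list of facet names rather than relying on the response):

import java.util.List;
import org.apache.solr.client.solrj.response.json.NestableJsonFacet;

public class OrderedFacets {
  // Iterate sub-facets in the order they were declared in json.facet;
  // 'declaredOrder' is the list of facet names the request was built from
  static void printInDeclaredOrder(NestableJsonFacet top, List<String> declaredOrder) {
    for (String name : declaredOrder) {
      NestableJsonFacet f = top.getQueryFacet(name); // look up each facet by name
      if (f != null) {
        System.out.println(name + " -> count=" + f.getCount());
      }
    }
  }
}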


SOLR uses too much CPU and GC is also weird on Windows server

2020-10-27 Thread Jaan Arjasepp
Hello,

We have been using SOLR for quite some time. We used 6.0 and now we did a 
little upgrade to our system and servers and we started to use 8.6.1.
We use it on a Windows Server 2019.
Java version is 11
Basically using it in a default setting, except giving SOLR 2G of heap. It used 
512MB, but it ran out of memory and stopped responding; not sure if it was the 
issue. The older version managed fine with 512MB.
SOLR is not in a cloud mode, but in solo mode as we use it internally, and it 
does not get too many requests nor much indexing actually.
Document sizes are not big, I guess. We only use one core.
Document stats are here:
Num Docs: 3627341
Max Doc: 4981019
Heap Memory Usage: 434400
Deleted Docs: 1353678
Version: 15999036
Segment Count: 30

The size of index is 2.66GB

While making the upgrade we had to modify one field and a bit of code that uses it. 
That's basically it. It works.
If needed more information about background of the system, I am happy to help.


But now to the issue I am having.
If SOLR is started, for the first 40-60 minutes it works just fine. CPU is not high, 
heap usage seems normal. All is good, but then suddenly the heap usage goes 
crazy, going up and down, up and down, and CPU rises to 50-60% of the usage. 
Also I noticed over the weekend, when there is no write activity, the CPU 
remains low and decent. I can try it this weekend again to see if and how this 
works out.
Also it seems to me that after 4-5 days of working like this, it stops 
responding, but that needs to be confirmed with more heap also.

Heap memory usage via JMX and jconsole -> 
https://drive.google.com/file/d/1Zo3B_xFsrrt-WRaxW-0A0QMXDNscXYih/view?usp=sharing
As you can see, it starts off normal, but then goes crazy and it has been like 
this overnight.

This is overall monitoring graphs, as you can see CPU is working hard or hardly 
working. -> 
https://drive.google.com/file/d/1_Gtz-Bi7LUrj8UZvKfmNMr-8gF_lM2Ra/view?usp=sharing
VM summary can be found here -> 
https://drive.google.com/file/d/1FvdCz0N5pFG1fmX_5OQ2855MVkaL048w/view?usp=sharing
And finally to have better and quick overview of the SOLR executing parameters 
that I have -> 
https://drive.google.com/file/d/10VCtYDxflJcvb1aOoxt0u3Nb5JzTjrAI/view?usp=sharing

If you can point me to what I have to do to make it work, then I appreciate it a 
lot.

Thank you in advance.

Best regards,
Jaan