Re: Solr 6.3 Can not talk to ZK Updates are disabled

2018-04-03 Thread Erick Erickson
With beefy machines, one strategy is to create multiple JVMs. For
example, if you have one JVM hosting 32 replicas, you could split that
up into 4 JVMs hosting 8 replicas each. That can allow you to drop down
the heap allocated to each.
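
A rough sketch of that arithmetic, for illustration only: it splits a
replica count evenly across several JVMs and prints one bin/solr start
line per instance. The ports, Solr home directories, ZooKeeper address
and the per-JVM heap figure are placeholders made up for the example;
none of them come from this thread.

```python
def plan_jvms(total_replicas, jvm_count, heap_per_jvm_gb, base_port=8983,
              zk_host="zk1:2181/solr", home_prefix="/data0/solr"):
    """Return one dict per JVM: port, Solr home, replica share, start command.

    All defaults are hypothetical placeholders, not recommendations.
    """
    # Spread replicas as evenly as possible; the first `extra` JVMs get one more.
    base, extra = divmod(total_replicas, jvm_count)
    plan = []
    for i in range(jvm_count):
        replicas = base + (1 if i < extra else 0)
        port = base_port + i
        home = f"{home_prefix}{i + 1}"
        # -c: SolrCloud mode, -p: port, -s: Solr home,
        # -m: sets both -Xms and -Xmx, -z: ZooKeeper connection string
        cmd = (f"bin/solr start -c -p {port} -s {home} "
               f"-m {heap_per_jvm_gb}g -z {zk_host}")
        plan.append({"port": port, "home": home,
                     "replicas": replicas, "command": cmd})
    return plan


if __name__ == "__main__":
    # 32 replicas over 4 JVMs; the 16 GB heap is an assumed figure.
    for jvm in plan_jvms(32, 4, 16):
        print(jvm["replicas"], "replicas:", jvm["command"])
```

Each instance needs its own port and Solr home; how far the heap can
actually drop per JVM has to be measured rather than assumed.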

Managing memory is always "exciting" at scale. If you're sorting,
faceting, or grouping on a field that does _not_ have docValues
enabled, that can be a major memory hog. If you enable docValues you
need to re-index completely BTW...
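
For reference, one way to turn docValues on for a field is the Schema
API's replace-field command. A minimal sketch, assuming the collection
uses a managed schema; the field name "category" and type "string" are
hypothetical, and the code only builds the request body without sending
it. replace-field expects the full field definition, and the complete
re-index is still required afterwards.

```python
import json


def docvalues_payload(field_name, field_type, stored=True, indexed=True):
    """Build a Schema API 'replace-field' body with docValues enabled."""
    return {
        "replace-field": {
            "name": field_name,
            "type": field_type,
            "indexed": indexed,
            "stored": stored,
            "docValues": True,
        }
    }


if __name__ == "__main__":
    # The JSON would be POSTed to http://<host>:<port>/solr/<collection>/schema
    print(json.dumps(docvalues_payload("category", "string"), indent=2))
```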

From there, it's a matter of trying to figure out where the memory is
being used and see what can be done about that.

Best,
Erick

On Mon, Apr 2, 2018 at 2:57 PM, Shawn Heisey wrote:
> On 4/2/2018 2:43 PM, murugesh karmegam wrote:
>> So given all of that, I am wondering whether there are any options
>> like G1 GC tuning?
>
> Targeted reply.
>
> I've put some G1 information out there for Solr.
>
> https://wiki.apache.org/solr/ShawnHeisey
>
> Thanks,
> Shawn
>


Re: Solr 6.3 Can not talk to ZK Updates are disabled

2018-04-02 Thread Shawn Heisey
On 4/2/2018 2:43 PM, murugesh karmegam wrote:
> So given all of that, I am wondering whether there are any options
> like G1 GC tuning?

Targeted reply.

I've put some G1 information out there for Solr.

https://wiki.apache.org/solr/ShawnHeisey

Thanks,
Shawn



Re: Solr 6.3 Can not talk to ZK Updates are disabled

2018-04-02 Thread murugesh karmegam
Thanks, Erick, for the reply. We even ran with a 92G heap for some time. We
have been able to run and survive with 64G for the last several months,
although with some issues, mainly this one: "Can not talk to ZK Updates are
disabled". We have a dedicated zk quorum. When we reduced the heap to 32G we
ran into some other issues. So given all of that, I am wondering whether
there are any options like G1 GC tuning?

We are running on 256 GB boxes. The OS cache is quite large too.

/usr/java/latest/bin/java -server -Xms32g -Xmx64g
-DsharedLib=/opt/mw/solrlib -XX:+UseG1GC -XX:MaxGCPauseMillis=5000
-XX:ParallelGCThreads=30 -XX:ConcGCThreads=10 -Djute.maxbuffer=41943040
-XX:G1HeapWastePercent=20 -XX:InitiatingHeapOccupancyPercent=45
-XX:+UnlockExperimentalVMOptions -XX:NewSize=10g -XX:MaxNewSize=32g
-XX:SurvivorRatio=2 -XX:-ResizePLAB   



--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Re: Solr 6.3 Can not talk to ZK Updates are disabled

2018-04-02 Thread Erick Erickson
Actually, 64G is on the high side; GC pauses can kill you pretty
easily in that range.

If it's at all possible to cut that down, it would be A Good Thing.

Best,
Erick

On Mon, Apr 2, 2018 at 12:56 PM, murugesh karmegam wrote:
> Hi Yago Riveiro,
>
> Thanks for the reply. We have a 64G heap. Any more is not recommended,
> right? Except for one occasion, I was not able to correlate "updates
> disabled" with a GC pause. Also, the zk timeout is 120 seconds, so even with
> a long GC pause (normally more than 10 seconds) we should recover, right?
>
> JVM settings
>
>  /usr/java/latest/bin/java -server -Xms32g -Xmx64g
> -DsharedLib=/opt/mw/solrlib -XX:+UseG1GC -XX:MaxGCPauseMillis=5000
> -XX:ParallelGCThreads=30 -XX:ConcGCThreads=10 -Djute.maxbuffer=41943040
> -XX:G1HeapWastePercent=20 -XX:InitiatingHeapOccupancyPercent=45
> -XX:+UnlockExperimentalVMOptions -XX:NewSize=10g -XX:MaxNewSize=32g
> -XX:SurvivorRatio=2 -XX:-ResizePLAB -XX:+AlwaysPreTouch
> -XX:+ParallelRefProcEnabled -server
> -Xloggc:/var/log/solr/gc-solr2018-03-27-19-16.log -verbose:gc
> -XX:+PrintHeapAtGC -XX:+PrintGCDetails -XX:+PrintGCDateStamps
> -XX:+PrintGCTimeStamps -XX:+PrintTenuringDistribution
> -XX:+PrintGCApplicationStoppedTime -XX:+UseGCLogFileRotation
> -XX:NumberOfGCLogFiles=10 -XX:GCLogFileSize=64m
> -XX:+PrintSafepointStatistics -XX:PrintSafepointStatisticsCount=1
> -Xloggc:/var/log/solr/solr_gc.log -XX:+UseGCLogFileRotation
> -XX:NumberOfGCLogFiles=9 -XX:GCLogFileSize=20M
> -Dcom.sun.management.jmxremote
> -Dcom.sun.management.jmxremote.local.only=false
> -Dcom.sun.management.jmxremote.ssl=false
> -Dcom.sun.management.jmxremote.authenticate=false
> -Dcom.sun.management.jmxremote.port=18983
> -Dcom.sun.management.jmxremote.rmi.port=18983
> -Djava.rmi.server.hostname=tr3slr3dn27 -DzkClientTimeout=12 -Dzkhost
> .../solr -Dsolr.log.dir=/var/log/solr -Djetty.port=8983 -DSTOP.PORT=7983
> -DSTOP.KEY=solrrocks -Dhost=tr3slr3dn27 -Duser.timezone=EST
> -Djetty.home=/opt/solr/server -Dsolr.solr.home=/data0/solr
> -Dsolr.install.dir=/opt/solr
> -Dlog4j.configuration=file:/etc/solr/conf/log4j.properties -Xss256k
> -Dsolr.autoSoftCommit.maxTime=30 -Dsolr.autoCommit.maxTime=60
> -Dsolr.clustering.enabled=false -DsharedLib=/opt/mw/solrlib
> -Dsolr.lock.type=native -XX:MetaspaceSize=1024m -XX:MaxMetaspaceSize=1024m
> -XX:MinMetaspaceExpansion=16m -XX:MaxMetaspaceExpansion=32m -Xss256k
> -Dsolr.log.muteconsole -XX:OnOutOfMemoryError=/opt/solr/bin/oom_solr.sh 8983
> /var/log/solr -jar start.jar --module=http
>
>
>
> Just to give more of an idea:
> we are a 48-node cluster, with each node holding indexes (many together)
> totaling 900 GB to 1 TB. One major index has 48 shards of 80-85 GB each,
> approximately 4 TB in total.
>
>
>


Re: Solr 6.3 Can not talk to ZK Updates are disabled

2018-04-02 Thread murugesh karmegam
Hi Yago Riveiro,

Thanks for the reply. We have a 64G heap. Any more is not recommended,
right? Except for one occasion, I was not able to correlate "updates
disabled" with a GC pause. Also, the zk timeout is 120 seconds, so even with
a long GC pause (normally more than 10 seconds) we should recover, right?
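
One way to check that correlation, sketched under some assumptions: the
JVM flags in this message include -XX:+PrintGCApplicationStoppedTime and
-XX:+PrintGCDateStamps, so on a Java 8 JVM the GC log should contain
"Total time for which application threads were stopped" lines prefixed
with a date stamp. The helper below (a hypothetical name, not part of
Solr) pulls out every pause at or above a threshold so it can be compared
with whatever ZooKeeper session timeout is really in effect. The sample
lines are invented to show the expected shape and are not taken from a
real log.

```python
import re

# Java 8 style safepoint line, e.g.
# 2018-03-27T19:20:02.001-0500: 226.445: Total time for which application
# threads were stopped: 12.5000000 seconds, Stopping threads took: ...
STOPPED = re.compile(
    r"^(?P<stamp>\S+): .*"
    r"Total time for which application threads were stopped: "
    r"(?P<secs>\d+\.\d+) seconds"
)


def long_pauses(lines, threshold_secs):
    """Yield (datestamp, seconds) for each stopped-time entry >= threshold."""
    for line in lines:
        m = STOPPED.search(line)
        if m and float(m.group("secs")) >= threshold_secs:
            yield m.group("stamp"), float(m.group("secs"))


if __name__ == "__main__":
    # Made-up sample input; point this at the real GC log instead.
    sample = [
        "2018-03-27T19:16:30.123-0500: 14.567: Total time for which "
        "application threads were stopped: 0.0003520 seconds, "
        "Stopping threads took: 0.0000610 seconds",
        "2018-03-27T19:20:02.001-0500: 226.445: Total time for which "
        "application threads were stopped: 12.5000000 seconds, "
        "Stopping threads took: 0.0001000 seconds",
    ]
    for stamp, secs in long_pauses(sample, threshold_secs=10.0):
        print(stamp, secs)
```

If no pause comes anywhere near the session timeout around the time
updates were disabled, GC is probably not the whole story.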

JVM settings 

 /usr/java/latest/bin/java -server -Xms32g -Xmx64g
-DsharedLib=/opt/mw/solrlib -XX:+UseG1GC -XX:MaxGCPauseMillis=5000
-XX:ParallelGCThreads=30 -XX:ConcGCThreads=10 -Djute.maxbuffer=41943040
-XX:G1HeapWastePercent=20 -XX:InitiatingHeapOccupancyPercent=45
-XX:+UnlockExperimentalVMOptions -XX:NewSize=10g -XX:MaxNewSize=32g
-XX:SurvivorRatio=2 -XX:-ResizePLAB -XX:+AlwaysPreTouch
-XX:+ParallelRefProcEnabled -server
-Xloggc:/var/log/solr/gc-solr2018-03-27-19-16.log -verbose:gc
-XX:+PrintHeapAtGC -XX:+PrintGCDetails -XX:+PrintGCDateStamps
-XX:+PrintGCTimeStamps -XX:+PrintTenuringDistribution
-XX:+PrintGCApplicationStoppedTime -XX:+UseGCLogFileRotation
-XX:NumberOfGCLogFiles=10 -XX:GCLogFileSize=64m
-XX:+PrintSafepointStatistics -XX:PrintSafepointStatisticsCount=1
-Xloggc:/var/log/solr/solr_gc.log -XX:+UseGCLogFileRotation
-XX:NumberOfGCLogFiles=9 -XX:GCLogFileSize=20M
-Dcom.sun.management.jmxremote
-Dcom.sun.management.jmxremote.local.only=false
-Dcom.sun.management.jmxremote.ssl=false
-Dcom.sun.management.jmxremote.authenticate=false
-Dcom.sun.management.jmxremote.port=18983
-Dcom.sun.management.jmxremote.rmi.port=18983
-Djava.rmi.server.hostname=tr3slr3dn27 -DzkClientTimeout=12 -Dzkhost
.../solr -Dsolr.log.dir=/var/log/solr -Djetty.port=8983 -DSTOP.PORT=7983
-DSTOP.KEY=solrrocks -Dhost=tr3slr3dn27 -Duser.timezone=EST
-Djetty.home=/opt/solr/server -Dsolr.solr.home=/data0/solr
-Dsolr.install.dir=/opt/solr
-Dlog4j.configuration=file:/etc/solr/conf/log4j.properties -Xss256k
-Dsolr.autoSoftCommit.maxTime=30 -Dsolr.autoCommit.maxTime=60
-Dsolr.clustering.enabled=false -DsharedLib=/opt/mw/solrlib
-Dsolr.lock.type=native -XX:MetaspaceSize=1024m -XX:MaxMetaspaceSize=1024m
-XX:MinMetaspaceExpansion=16m -XX:MaxMetaspaceExpansion=32m -Xss256k
-Dsolr.log.muteconsole -XX:OnOutOfMemoryError=/opt/solr/bin/oom_solr.sh 8983
/var/log/solr -jar start.jar --module=http



Just to give more of an idea:
we are a 48-node cluster, with each node holding indexes (many together)
totaling 900 GB to 1 TB. One major index has 48 shards of 80-85 GB each,
approximately 4 TB in total.





Re: Solr 6.3 Can not talk to ZK Updates are disabled

2018-04-02 Thread Yago Riveiro
Hi murugesh,

This error normally happens during long GC pauses. Try raising the heap
memory.

The only way to recover from this is to restart the affected node.

Regards,

--

Yago Riveiro

On 2 Apr 2018 15:39 +0100, murugesh karmegam wrote:
> We notice this issue in our Solr clusters either right after the cluster is
> restarted or after it has been live for some time. Based on my research so
> far, I am not seeing ZooKeeper connection issues on the zk server side; it
> seems to be on the Solr (zk client) side. The issue recurs fairly
> regularly.
>
> Error 1 Solr:
>
> WARN - 2018-02-06 17:35:04.742;
> org.apache.solr.common.cloud.ConnectionManager; Our previous ZooKeeper
> session was expired. Attempting to reconnect to recover relationship with
> ZooKeeper...
> ERROR - 2018-02-06 17:35:04.743; org.apache.solr.common.SolrException;
> org.apache.solr.common.SolrException: Cannot talk to ZooKeeper - Updates are
> disabled.
> at
> org.apache.solr.update.processor.DistributedUpdateProcessor.zkCheck(DistributedUpdateProcessor.java:1508)
> at
> org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(DistributedUpdateProcessor.java:696)
> at
> org.apache.solr.handler.loader.JavabinLoader$1.update(JavabinLoader.java:97)
>
>
> Error 2:
>
> From ingestor log:
> /var/log/mwired/core-ingestors/app.log.9:2018-03-30 05:44:52,616 [-38] ERROR
> org.apache.solr.client.solrj.impl.CloudSolrClient - Request to collection
> failed due to (503)
> org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: Error
> from server at : Cannot talk to ZooKeeper - Updates are disabled., retry? 0
> /var/log/mwired/core-ingestors/app.log.9:com.mwired.grid.commons.exception.PersistenceException:
> Failed to add 11 docs to solr0 collection , cachedDocs=118; because Error
> from server at : Cannot talk to ZooKeeper - Updates are disabled.
>
>
> I am wondering whether there is any fix. I'd appreciate any input.
>
> http://lucene.472066.n3.nabble.com/Cannot-talk-to-ZooKeeper-Updates-are-disabled-Solr-6-3-0-td4311582.html
> http://lucene.472066.n3.nabble.com/6-6-Cannot-talk-to-ZooKeeper-Updates-are-disabled-td4352917.html
> https://issues.apache.org/jira/browse/SOLR-3274
>
> Thanks in advance.
> Murux
>
>
>


Solr 6.3 Can not talk to ZK Updates are disabled

2018-04-02 Thread murugesh karmegam
We notice this issue in our Solr clusters either right after the cluster is
restarted or after it has been live for some time. Based on my research so
far, I am not seeing ZooKeeper connection issues on the zk server side; it
seems to be on the Solr (zk client) side. The issue recurs fairly
regularly.

Error 1 Solr:

WARN  - 2018-02-06 17:35:04.742;
org.apache.solr.common.cloud.ConnectionManager; Our previous ZooKeeper
session was expired. Attempting to reconnect to recover relationship with
ZooKeeper...
ERROR - 2018-02-06 17:35:04.743; org.apache.solr.common.SolrException;
org.apache.solr.common.SolrException: Cannot talk to ZooKeeper - Updates are
disabled.
at
org.apache.solr.update.processor.DistributedUpdateProcessor.zkCheck(DistributedUpdateProcessor.java:1508)
at
org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(DistributedUpdateProcessor.java:696)
at
org.apache.solr.handler.loader.JavabinLoader$1.update(JavabinLoader.java:97)
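
To line these events up against GC activity, the session-expiry
timestamps can be pulled out of the Solr log. A small sketch, assuming
each log entry sits on a single line in the file (the wrapping above
comes from the mail client) and uses the "WARN - <timestamp>;" layout
shown in Error 1; the function name is hypothetical.

```python
import re
from datetime import datetime

# Matches e.g.
# WARN  - 2018-02-06 17:35:04.742; org.apache.solr.common.cloud.ConnectionManager;
#   Our previous ZooKeeper session was expired. ...
EXPIRED = re.compile(
    r"^WARN\s+-\s+(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3});.*"
    r"Our previous ZooKeeper session was expired"
)


def session_expiries(lines):
    """Return a datetime for each ZooKeeper session-expiry warning."""
    found = []
    for line in lines:
        m = EXPIRED.search(line)
        if m:
            found.append(
                datetime.strptime(m.group("ts"), "%Y-%m-%d %H:%M:%S.%f"))
    return found


if __name__ == "__main__":
    import sys
    # Usage: python expiries.py /path/to/solr.log
    with open(sys.argv[1], encoding="utf-8", errors="replace") as log:
        for when in session_expiries(log):
            print(when.isoformat())
```

The resulting timestamps are in the log's local time, so any comparison
with GC log date stamps has to account for the time zone offset.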


Error 2:

From ingestor log:
/var/log/mwired/core-ingestors/app.log.9:2018-03-30 05:44:52,616 [-38] ERROR
org.apache.solr.client.solrj.impl.CloudSolrClient - Request to collection 
failed due to (503)
org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: Error
from server at : Cannot talk to ZooKeeper - Updates are disabled., retry? 0
/var/log/mwired/core-ingestors/app.log.9:com.mwired.grid.commons.exception.PersistenceException:
Failed to add 11 docs to solr0 collection , cachedDocs=118; because Error
from server at : Cannot talk to ZooKeeper - Updates are disabled.


I am wondering whether there is any fix. I'd appreciate any input.

http://lucene.472066.n3.nabble.com/Cannot-talk-to-ZooKeeper-Updates-are-disabled-Solr-6-3-0-td4311582.html
http://lucene.472066.n3.nabble.com/6-6-Cannot-talk-to-ZooKeeper-Updates-are-disabled-td4352917.html
https://issues.apache.org/jira/browse/SOLR-3274

Thanks in advance.
Murux


