Re: SolrCloud not able to view cloud page - Loading of "/solr/zookeeper?wt=json" failed (HTTP-Status 500)
@Shawn Heisey, my client is facing the same issue. However, I have not worked with the ZkCli script in ZooKeeper. Could you please help me with the steps? If you could tell me where to find the ZkCli script that ships with ZooKeeper and the exact commands to run on each node, that would be great. Thanks in advance.
Re: SolrCloud not able to view cloud page - Loading of "/solr/zookeeper?wt=json" failed (HTTP-Status 500)
I am also getting the same error. I am not sure where exactly we need to pass the -Djute.maxbuffer argument. In the C:\zookeeper\bin path I can find a zkCli.cmd file with the following content (Apache license header omitted):

    @echo off
    setlocal
    call "%~dp0zkEnv.cmd"
    set ZOOMAIN=org.apache.zookeeper.ZooKeeperMain
    java "-Dzookeeper.log.dir=%ZOO_LOG_DIR%" "-Dzookeeper.root.logger=%ZOO_LOG4J_PROP%" -cp "%CLASSPATH%" %ZOOMAIN% %*
    endlocal
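The zkCli.cmd shown above assembles the java command line itself and does not read any extra JVM flags from the environment, so a client-side jute.maxbuffer setting has to be added to that java line directly. A sketch of the edited line (the buffer value is illustrative; it just needs to exceed the packet length reported in the error):

```shell
REM In C:\zookeeper\bin\zkCli.cmd, extend the java invocation with -Djute.maxbuffer:
java "-Dzookeeper.log.dir=%ZOO_LOG_DIR%" "-Dzookeeper.root.logger=%ZOO_LOG4J_PROP%" -Djute.maxbuffer=50000000 -cp "%CLASSPATH%" %ZOOMAIN% %*
```

With that in place, "rmr /overseer/queue" can be run from the zkCli prompt as Shawn described.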
Re: SolrCloud not able to view cloud page - Loading of "/solr/zookeeper?wt=json" failed (HTTP-Status 500)
We are passing -DhostPort=4040 and -DzkClientTimeout=2 in the Apache Tomcat service batch file. Will passing the argument below there fix the issue? -Djute.maxbuffer=5291220
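A JVM system property like jute.maxbuffer only takes effect if it reaches the java command line, so in a Tomcat-hosted Solr it would go next to the existing -D arguments. One caveat: the proposed value 5291220 is smaller than the packet length reported in the error (len30829010), so a larger value would likely be needed. A hypothetical sketch of the service batch file change (the exact variable depends on how the Tomcat service was installed, and this raises the limit only on the Solr/client side; the ZooKeeper servers may need the same override):

```shell
REM Hypothetical excerpt of the Tomcat service batch file; existing options kept as-is,
REM buffer value chosen to exceed the reported packet length.
set JAVA_OPTS=%JAVA_OPTS% -DhostPort=4040 -DzkClientTimeout=2 -Djute.maxbuffer=50000000
```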
Re: SolrCloud not able to view cloud page - Loading of "/solr/zookeeper?wt=json" failed (HTTP-Status 500)
On the Solr side there's at least https://issues.apache.org/jira/browse/SOLR-9818 which may cause trouble with the queue. I once had the core reload command in the admin UI add more than 200k entries to the overseer queue.

--Ere

Shawn Heisey wrote on 25.10.2017 at 15:57:
> Combining the last part of what I quoted above with the image you shared later, I am pretty sure I know what is happening.
>
> The overseer queue in zookeeper (at the ZK path of /overseer/queue) has a lot of entries in it. Based on the fact that you are seeing a packet length beyond 30 million bytes, I am betting that the number of entries in the queue is between 1.5 million and 2 million. ZK cannot handle that packet size without a special startup argument. The value of the special parameter defaults to a little over one million bytes.
>
> [rest of quoted reply trimmed; see Shawn's full message below]

-- Ere Maijala, Kansalliskirjasto / The National Library of Finland
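Shawn's 1.5-2 million estimate can be sanity-checked with quick arithmetic: a getChildren reply serializes each child name ("qn-" plus a 10-digit sequence number, 13 characters) plus a few bytes of per-entry framing. Assuming roughly 17 bytes per entry (the framing overhead is an assumption):

```shell
# Rough estimate of overseer queue entries from the reported packet length.
# 13 bytes per child name ("qn-0000000000") + ~4 bytes framing = ~17 bytes/entry.
packet_len=30829010
bytes_per_entry=17
awk -v p="$packet_len" -v b="$bytes_per_entry" 'BEGIN { printf "~%d entries\n", p / b }'
# → ~1813471 entries
```

That lands at about 1.8 million entries, inside the 1.5-2 million range Shawn guessed.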
Re: [External] Re: SolrCloud not able to view cloud page - Loading of "/solr/zookeeper?wt=json" failed (HTTP-Status 500)
On 10/25/2017 6:44 PM, Tarjono, C. A. wrote:
> Thanks so much for your input! We will try your suggestion and hope it will resolve the issue.
>
> On the side note, would you know if this is an existing bug? If yes, has it been resolved in a later version? I.e. zk allows adding nodes when it exceeds the buffer.
>
> We are currently using ZK 3.4.6 with SolrCloud 5.1.0.

The ZOOKEEPER-1162 issue has not been fixed. It is a very old bug -- opened six years ago. They probably aren't going to fix it.

If you find that restarting a single Solr instance ends up filling the queue with too many entries, you may need to increase the jute.maxbuffer setting on both Solr and ZK so that a large queue won't cause everything to break.

There has been some effort in recent 6.x versions to improve this situation, as Erick mentioned in his reply. There's nothing that can be done for problems like this in 5.x versions.

Thanks,
Shawn
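Shawn's suggestion to raise jute.maxbuffer on both sides can be sketched as below; the variable names differ between ZooKeeper versions and Solr deployments, so treat these as illustrative rather than exact:

```shell
# ZooKeeper server side: ZK 3.4.x's zkServer.sh passes $JVMFLAGS to the JVM,
# so it can be exported (or set in conf/zookeeper-env.sh) before starting:
export JVMFLAGS="-Djute.maxbuffer=50000000"

# Solr side: for bin/solr startups this could go in solr.in.sh; a Tomcat-hosted
# Solr would instead add it to the container's JVM arguments:
SOLR_OPTS="$SOLR_OPTS -Djute.maxbuffer=50000000"
```

Both sides need a value larger than the serialized queue, and they should agree so that neither end rejects the other's packets.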
Re: [External] Re: SolrCloud not able to view cloud page - Loading of "/solr/zookeeper?wt=json" failed (HTTP-Status 500)
Later versions of Solr have been changed in two ways:
1> changes have been made to not put so many items in the overseer queue in the first place
2> changes have been made to process the messages that do get there much more quickly.

Meanwhile, my guess is you have a lot of replicas out there. I've seen this happen when there are lots of collections and/or replicas and people try to start many of them up at once.

One strategy to get by is to start your Solr nodes a few at a time, wait for the overseer queue to get processed, then start a few more. Unsatisfactory, but if the precursor to this was starting all your Solr instances and you have a lot of replicas, it may help until you can upgrade.

Best,
Erick

On Wed, Oct 25, 2017 at 5:44 PM, Tarjono, C. A. wrote:
> @Shawn Heisey,
>
> Thanks so much for your input! We will try your suggestion and hope it will resolve the issue.
>
> On the side note, would you know if this is an existing bug? If yes, has it been resolved in a later version? I.e. zk allows adding nodes when it exceeds the buffer.
>
> We are currently using ZK 3.4.6 with SolrCloud 5.1.0.
>
> [rest of quoted thread trimmed]
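Erick's start-a-few-and-wait strategy could be scripted along these lines; hostnames, paths, ports and the drain threshold are all placeholders, and the numChildren parsing assumes the "stat" output format of ZooKeeper's zkCli.sh:

```shell
# Sketch: start Solr nodes one at a time, waiting for the overseer queue to
# drain between starts. All names and values are illustrative.
for node in solr1 solr2 solr3 solr4; do
  ssh "$node" '/opt/solr/bin/solr start -c'
  # Block until /overseer/queue has few enough children to continue.
  until ./zkCli.sh -server zk1:2281 stat /overseer/queue 2>/dev/null |
        awk '/numChildren/ { ok = ($3 <= 10); found = 1 } END { exit !(found && ok) }'; do
    sleep 10
  done
done
```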
Re: [External] Re: SolrCloud not able to view cloud page - Loading of "/solr/zookeeper?wt=json" failed (HTTP-Status 500)
@Shawn Heisey,

Thanks so much for your input! We will try your suggestion and hope it will resolve the issue.

On the side note, would you know if this is an existing bug? If yes, has it been resolved in a later version? I.e. zk allows adding nodes when it exceeds the buffer.

We are currently using ZK 3.4.6 with SolrCloud 5.1.0.

Thanks again!

Best Regards,
Christopher Tarjono
Accenture Pte Ltd
+65 9347 2484
c.a.tarj...@accenture.com

From: Shawn Heisey
Sent: 25 October 2017 20:57:30
To: solr-user@lucene.apache.org
Subject: [External] Re: SolrCloud not able to view cloud page - Loading of "/solr/zookeeper?wt=json" failed (HTTP-Status 500)

[quoted reply trimmed; see Shawn's full message below]
Re: SolrCloud not able to view cloud page - Loading of "/solr/zookeeper?wt=json" failed (HTTP-Status 500)
On 10/24/2017 8:11 AM, Tarjono, C. A. wrote:
> Would like to check if anyone has seen this issue before, we started
> having this a few days ago:
>
> The only error I can see in the solr console is below:
>
> 5960847 [main-SendThread(172.16.130.132:2281)] WARN
> org.apache.zookeeper.ClientCnxn [ ] – Session 0x65f4e28b7370001 for
> server 172.16.130.132/172.16.130.132:2281, unexpected error, closing
> socket connection and attempting reconnect java.io.IOException: Packet
> len30829010 is out of range!

Combining the last part of what I quoted above with the image you shared later, I am pretty sure I know what is happening.

The overseer queue in zookeeper (at the ZK path of /overseer/queue) has a lot of entries in it. Based on the fact that you are seeing a packet length beyond 30 million bytes, I am betting that the number of entries in the queue is between 1.5 million and 2 million. ZK cannot handle that packet size without a special startup argument. The value of the special parameter defaults to a little over one million bytes.

To fix this, you're going to need to wipe out the overseer queue. ZK includes a script named ZkCli. Note that Solr includes a script called zkcli as well, which does very different things. You need the one included with zookeeper.

Wiping out the queue when it is that large is not straightforward. You need to start the ZkCli script included with zookeeper with a -Djute.maxbuffer=3100 argument and the same zkHost value used by Solr, and then use a command like "rmr /overseer/queue" in that command shell to completely remove the /overseer/queue path. Then you can restart the ZK servers without the jute.maxbuffer setting. You may need to restart Solr. Running this procedure might also require temporarily restarting the ZK servers with the same jute.maxbuffer argument, but I am not sure whether that is required.

The basic underlying problem here is that ZK allows adding new nodes even when the size of the parent node exceeds the default buffer size. That issue is documented here:

https://issues.apache.org/jira/browse/ZOOKEEPER-1162

I can't be sure why your cloud is adding so many entries to the overseer queue. I have seen this problem happen when restarting a server in the cloud, particularly when there are a large number of collections or shard replicas in the cloud. Restarting multiple servers or restarting the same server multiple times without waiting for the overseer queue to empty could also cause the issue.

Thanks,
Shawn
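On Unix-ish systems the procedure above can be sketched with ZooKeeper's own CLI. Host and buffer value are illustrative; zkCli.sh in 3.4.x passes $CLIENT_JVMFLAGS through to the JVM, and if a given copy does not, the property can be added to its java line directly:

```shell
# Sketch of the queue wipe Shawn describes. Use ZooKeeper's zkCli.sh, not Solr's zkcli.sh.
# The buffer must exceed the reported packet length (len30829010 in this thread).
export CLIENT_JVMFLAGS="-Djute.maxbuffer=50000000"
./zkCli.sh -server 172.16.130.132:2281 <<'EOF'
rmr /overseer/queue
EOF
```

Afterwards the ZK servers can be restarted without the override, and Solr restarted if needed.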
RE: [External] Re: SolrCloud not able to view cloud page - Loading of "/solr/zookeeper?wt=json" failed (HTTP-Status 500)
Thanks Erick for your response, please see the link below for an image of our SolrCloud dashboard that shows the error: https://imgur.com/QCn9BCl

Best Regards,
Christopher Tarjono
Accenture Pte Ltd
+65 9347 2484
c.a.tarj...@accenture.com

-----Original Message-----
From: Erick Erickson [mailto:erickerick...@gmail.com]
Sent: Tuesday, October 24, 2017 11:32 PM
To: solr-user
Subject: [External] Re: SolrCloud not able to view cloud page - Loading of "/solr/zookeeper?wt=json" failed (HTTP-Status 500)

The mail server aggressively removes attachments and the like, you'll have to put it somewhere and provide a link. Did anything change in that time frame?

Best,
Erick

[rest of quoted thread trimmed]
Re: SolrCloud not able to view cloud page - Loading of "/solr/zookeeper?wt=json" failed (HTTP-Status 500)
The mail server aggressively removes attachments and the like, you'll have to put it somewhere and provide a link. Did anything change in that time frame?

Best,
Erick

On Tue, Oct 24, 2017 at 7:11 AM, Tarjono, C. A. wrote:
> Hi All,
>
> Would like to check if anyone has seen this issue before, we started
> having this a few days ago:
>
> The only error I can see in the solr console is below:
>
> 5960847 [main-SendThread(172.16.130.132:2281)] WARN
> org.apache.zookeeper.ClientCnxn [ ] – Session 0x65f4e28b7370001 for
> server 172.16.130.132/172.16.130.132:2281, unexpected error, closing
> socket connection and attempting reconnect java.io.IOException: Packet
> len30829010 is out of range!
>
> [rest of quoted message trimmed]
SolrCloud not able to view cloud page - Loading of "/solr/zookeeper?wt=json" failed (HTTP-Status 500)
Hi All,

Would like to check if anyone has seen this issue before, we started having this a few days ago:

[inline screenshot stripped by the mailing list]

The only error I can see in the solr console is below:

5960847 [main-SendThread(172.16.130.132:2281)] WARN org.apache.zookeeper.ClientCnxn [ ] - Session 0x65f4e28b7370001 for server 172.16.130.132/172.16.130.132:2281, unexpected error, closing socket connection and attempting reconnect
java.io.IOException: Packet len30829010 is out of range!
    at org.apache.zookeeper.ClientCnxnSocket.readLength(ClientCnxnSocket.java:112)
    at org.apache.zookeeper.ClientCnxnSocketNIO.doIO(ClientCnxnSocketNIO.java:79)
    at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:366)
    at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1081)
5960947 [zkCallback-2-thread-120] INFO org.apache.solr.common.cloud.ConnectionManager [ ] - Watcher org.apache.solr.common.cloud.ConnectionManager@4cf4d11e name:ZooKeeperConnection Watcher:172.16.129.132:2281,172.16.129.133:2281,172.16.129.134:2281,172.16.130.132:2281,172.16.130.133:2281,172.16.130.134:2281 got event WatchedEvent state:Disconnected type:None path:null
5960947 [zkCallback-2-thread-120] INFO org.apache.solr.common.cloud.ConnectionManager [ ] - zkClient has disconnected

We can't find any corresponding error in the zookeeper log.

Appreciate any input, thanks!

Best Regards,
Christopher Tarjono
Accenture Pte Ltd
+65 9347 2484
c.a.tarj...@accenture.com