Re: Is the current max packet length available via the API?

2018-04-05 Thread Shawn Heisey

On 4/5/2018 3:44 AM, Andor Molnar wrote:
> You can get the current jute.maxbuffer setting from a running
> ZooKeeper instance by querying ZooKeeperServerBean via JMX.


I'm not sure how I would do that in a client program.  It might be 
trivial, but it's not something I've ever done.
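Andor's JMX suggestion can be sketched from a client program with the standard javax.management remote API. This is a minimal sketch under stated assumptions: the service URL and port 9999 are hypothetical, the ZooKeeper JVM must have remote JMX enabled, and "org.apache.ZooKeeperService" is assumed to be the JMX domain ZooKeeper registers under. Rather than guessing an attribute name, it lists any ZooKeeper MBean attribute whose name mentions "jute" or "buffer" (a 3.4.x server may expose none). It reports the server's setting, not the value the client JVM is using.

```java
import java.util.ArrayList;
import java.util.List;
import javax.management.MBeanAttributeInfo;
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class ZkJmxProbe {

    // Returns "bean attribute = value" for every readable attribute of a
    // ZooKeeper MBean whose name mentions "jute" or "buffer". Nothing
    // version-specific is hard-coded, so an empty list just means no
    // matching attribute is exposed.
    static List<String> findBufferAttributes(MBeanServerConnection conn) throws Exception {
        List<String> found = new ArrayList<>();
        // Assumed JMX domain for ZooKeeper's beans.
        ObjectName pattern = new ObjectName("org.apache.ZooKeeperService:*");
        for (ObjectName name : conn.queryNames(pattern, null)) {
            for (MBeanAttributeInfo attr : conn.getMBeanInfo(name).getAttributes()) {
                String lower = attr.getName().toLowerCase();
                if (attr.isReadable() && (lower.contains("jute") || lower.contains("buffer"))) {
                    found.add(name + " " + attr.getName() + " = "
                            + conn.getAttribute(name, attr.getName()));
                }
            }
        }
        return found;
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical endpoint: the server JVM needs remote JMX enabled,
        // e.g. -Dcom.sun.management.jmxremote.port=9999 plus auth/SSL settings.
        JMXServiceURL url = new JMXServiceURL("service:jmx:rmi:///jndi/rmi://localhost:9999/jmxrmi");
        try (JMXConnector connector = JMXConnectorFactory.connect(url)) {
            for (String line : findBufferAttributes(connector.getMBeanServerConnection())) {
                System.out.println(line);
            }
        }
    }
}
```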


> Currently there are 2 usages of the setting in ZK: 1) server-client
> communication which is by default 4MB, 2) server-server communication
> which is by default 1MB. They can't be set individually, but can be
> overridden with the jute.maxbuffer system property.


I'm looking for a way to ask the ZK client to give me the value it is 
currently using as its max packet length.  I'm only going to be logging 
a warning to inform the user about which file may have caused a problem 
due to size, not preventing the attempt at uploading the file, so I'm 
not opposed to falling back to a hard-coded value if I can't figure it 
out.  I can look for the jute.maxbuffer sysprop, but if ZK will tell me 
what it's actually using, I'd prefer that.
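The fallback plan described here can be sketched as a small helper. It is hypothetical, not Solr or ZooKeeper code: it reads jute.maxbuffer from this JVM's system properties, falls back to a caller-supplied default, and logs a warning naming the file without blocking the upload. The 4MB fallback in main is only the client default mentioned in this thread; the server may enforce a different limit.

```java
import java.io.File;
import java.util.logging.Logger;

public class UploadSizeCheck {
    private static final Logger LOG = Logger.getLogger(UploadSizeCheck.class.getName());

    // Reads jute.maxbuffer from this JVM's system properties. Integer.getInteger
    // returns the fallback when the property is missing or not a valid integer.
    static int maxPacketLength(int fallback) {
        return Integer.getInteger("jute.maxbuffer", fallback);
    }

    // Logs a warning naming the file when it is larger than the limit.
    // It does not stop the upload; it only adds context to a later failure.
    static boolean warnIfTooLarge(String fileName, long sizeBytes, int limit) {
        if (sizeBytes <= limit) {
            return false;
        }
        LOG.warning("File " + fileName + " is " + sizeBytes
                + " bytes, larger than the ZooKeeper max packet length of " + limit
                + " bytes (jute.maxbuffer); uploading it will probably fail.");
        return true;
    }

    public static void main(String[] args) {
        // Hypothetical fallback: the 4MB client default mentioned in the thread.
        int limit = maxPacketLength(4 * 1024 * 1024);
        for (String path : args) {
            File file = new File(path);
            warnIfTooLarge(file.getPath(), file.length(), limit);
        }
    }
}
```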


Does the max packet length cover ONLY the size of the znode data, or 
does the znode name get included in that?  Asked another way: Should I 
subtract a little bit from the max packet length (maybe 128 or 256) 
before I compare the file size, or just compare the unchanged value?
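A rough way to reason about the margin is to estimate the serialized request. This sketch assumes the limit applies to the whole jute-encoded request as the client sends it (RequestHeader followed by the request record, excluding the 4-byte length prefix) and uses the jute encodings for ints, ustrings and buffers. The path and the single world:anyone ACL are hypothetical examples.

```java
import java.nio.charset.StandardCharsets;

public class RequestSizeEstimate {
    // Size of a jute int, also used as the length prefix of strings and buffers.
    static final int INT = 4;

    // RequestHeader: xid (int) + type (int).
    static final int REQUEST_HEADER = INT + INT;

    // ustring: 4-byte length followed by the UTF-8 bytes.
    static int ustring(String s) {
        return INT + s.getBytes(StandardCharsets.UTF_8).length;
    }

    // buffer: 4-byte length followed by the raw bytes.
    static int buffer(int dataLength) {
        return INT + dataLength;
    }

    // SetDataRequest: path (ustring), data (buffer), version (int).
    static int setDataRequest(String path, int dataLength) {
        return REQUEST_HEADER + ustring(path) + buffer(dataLength) + INT;
    }

    // CreateRequest: path, data, acl (vector<ACL>), flags (int).
    // Assumes exactly one world:anyone ACL; each ACL is perms (int) + scheme + id.
    static int createRequestWorldAcl(String path, int dataLength) {
        int acl = INT /* vector count */ + INT /* perms */ + ustring("world") + ustring("anyone");
        return REQUEST_HEADER + ustring(path) + buffer(dataLength) + acl + INT /* flags */;
    }

    public static void main(String[] args) {
        // Hypothetical znode path; the path actually sent includes any chroot prefix.
        String path = "/configs/myconf/solrconfig.xml";
        int dataLength = 1_000_000;
        System.out.println("setData: " + setDataRequest(path, dataLength));
        System.out.println("create:  " + createRequestWorldAcl(path, dataLength));
    }
}
```

For that 30-character path the estimates are 1,000,050 and 1,000,077 bytes. If these assumptions hold, the path does count, and the fixed overhead on top of path and data is small (20 bytes for setData, 47 for create in this sketch), so a 128 or 256 byte margin leaves room as long as the path is not unusually long.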


I did discover that the ZkClientConfig.CLIENT_MAX_PACKET_LENGTH_DEFAULT
field I mentioned before is not available in 3.4.x; it seems to have
been added in a 3.5 version.  Since Solr uses 3.4.x and won't upgrade
until there is a stable 3.5 release, I can't use that.
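One way to pick up the 3.5 constant when it exists, while still compiling against 3.4.x, is a reflective lookup with a hard-coded fallback. The fully qualified name `org.apache.zookeeper.client.ZKClientConfig` is an assumption about where the 3.5 field lives and should be checked against the jar actually in use; any failure (missing class, missing field, wrong type) simply returns the fallback, which is also a hypothetical value.

```java
public class MaxPacketLengthDefault {
    // Hypothetical fallback for 3.4.x clients, where the 3.5 constant does not exist.
    static final int FALLBACK = 4 * 1024 * 1024;

    // Reads a public static int field by reflection, so the calling code has
    // no compile-time dependency on a class that only exists in newer versions.
    static int staticIntOrDefault(String className, String fieldName, int fallback) {
        try {
            return Class.forName(className).getField(fieldName).getInt(null);
        } catch (ReflectiveOperationException | RuntimeException e) {
            // Class or field missing, not accessible, or not an int.
            return fallback;
        }
    }

    public static void main(String[] args) {
        // Assumed class and field names from the 3.5 client API.
        int def = staticIntOrDefault("org.apache.zookeeper.client.ZKClientConfig",
                "CLIENT_MAX_PACKET_LENGTH_DEFAULT", FALLBACK);
        System.out.println(def);
    }
}
```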


I do think that the ZK client should log something useful when the max 
packet length is exceeded -- if that's even possible.  The user in this 
scenario is running the latest version of Solr that was available at the 
time, which includes ZK 3.4.10 for its client.  The error message 
indicated socket problems, but didn't have any information about the cause.


When running under Java 9, they got this as the error:

WARN - 2018-04-04 09:05:28.194; 
org.apache.zookeeper.ClientCnxn$SendThread; Session 0x100244e8ffb0004 
for server localhost/127.0.0.1:2181, unexpected error, closing socket 
connection and attempting reconnect java.io.IOException: Connection 
reset by peer


With Java 8, they got this:

WARN - 2018-04-04 09:10:11.879; 
org.apache.zookeeper.ClientCnxn$SendThread; Session 0x10024db7e280002 
for server localhost/0:0:0:0:0:0:0:1:2181, unexpected error, closing 
socket connection and attempting reconnect java.io.IOException: Protocol 
wrong type for socket


In both cases, the stacktrace listed a bunch of sun classes and then a
couple of methods in ZooKeeper's ClientCnxnSocketNIO class.


When I asked them what their ZK server log said, that's when I figured 
out the problem:


2018-04-04 14:06:01,361 [myid:] - WARN [NIOServerCxn.Factory: 
0.0.0.0/0.0.0.0:2181:NIOServerCnxn@383] - Exception causing close of 
session 0x10024db7e280006: Len error 5327937



> Do I understand correctly that Solr uploads files to ZooKeeper?


Solr *itself* won't typically be uploading data to ZK that can exceed 
the max packet size.  It is typically done either with a separate 
commandline program (the ZkCLI class the commandline program uses is 
included in Solr), or by a client program using the SolrJ library (which 
is part of Solr like ZkCLI, but usable by itself).  The action being 
performed is an upload of a configuration for a Solr index.


Solr does sometimes run into the problem described in ZOOKEEPER-1162, 
but this is due to the number of children in a znode, where each one has 
minimal data.
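To illustrate how many children with minimal data can still exceed the limit, here is a rough size estimate for a getChildren reply. It assumes the jute encoding of a ReplyHeader (xid int, zxid long, err int) followed by a vector of ustrings, and the plain getChildren response without a Stat; the child names are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class ChildrenResponseSize {
    // ReplyHeader: xid (int) + zxid (long) + err (int).
    static final int REPLY_HEADER = 4 + 8 + 4;

    // GetChildrenResponse as a jute vector<ustring>: a 4-byte count, then each
    // name as a 4-byte length followed by its UTF-8 bytes.
    static long getChildrenResponseSize(List<String> children) {
        long size = REPLY_HEADER + 4;
        for (String child : children) {
            size += 4 + child.getBytes(StandardCharsets.UTF_8).length;
        }
        return size;
    }

    public static void main(String[] args) {
        // Hypothetical example: 100,000 children with 20-character names.
        List<String> names = new ArrayList<>();
        for (int i = 1; i <= 100_000; i++) {
            names.add(String.format(Locale.ROOT, "child-%014d", i));
        }
        System.out.println(getChildrenResponseSize(names));
    }
}
```

Under these assumptions the example prints 2400020: each name costs its length plus 4 bytes, so the reply grows with the number of children no matter how little data each child holds.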


Thanks,
Shawn



Re: Which metrics to monitor?

2018-04-05 Thread Nikhil Bafna
I'm also interested in the answer to this.

On Thu 5 Apr, 2018, 7:30 PM Mark Bonetti, 
wrote:

> Hi,
> I'm building a monitoring system for Zookeeper and want to set up default
> alerts (threshold or anomaly) on 2-3 key metrics everyone who uses
> Zookeeper typically wants to alert on.
> Importantly, alert rules have to be generally useful, so can't be on
> metrics whose values vary wildly based on the size of deployment.
>
> In other words, which metrics would be most significant indicators that
> something went wrong with your ZK deployment?
>
> I thought the best place to find experienced ZK users would be here.
>
> Thanks very much,
> Mark Scott
>


Which metrics to monitor?

2018-04-05 Thread Mark Bonetti
Hi,
I'm building a monitoring system for Zookeeper and want to set up default
alerts (threshold or anomaly) on 2-3 key metrics everyone who uses
Zookeeper typically wants to alert on.
Importantly, alert rules have to be generally useful, so can't be on
metrics whose values vary wildly based on the size of deployment.

In other words, which metrics would be most significant indicators that
something went wrong with your ZK deployment?

I thought the best place to find experienced ZK users would be here.

Thanks very much,
Mark Scott
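Whichever metrics end up being chosen, one common way to collect them from a 3.4.x server is the `mntr` four-letter-word command, which answers with one tab-separated key/value pair per line (keys such as `zk_avg_latency`, `zk_outstanding_requests` or `zk_server_state`). A minimal sketch; host and port are placeholders, and depending on version the command may need to be allowed via `4lw.commands.whitelist`.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.net.Socket;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

public class MntrClient {
    // Sends a four-letter-word command and returns the whole reply; the server
    // closes the connection after answering, which ends the read loop.
    static String fourLetterWord(String host, int port, String command) throws IOException {
        try (Socket socket = new Socket(host, port)) {
            socket.setSoTimeout(5000); // don't hang forever on a stuck server
            OutputStream out = socket.getOutputStream();
            out.write(command.getBytes(StandardCharsets.US_ASCII));
            out.flush();
            BufferedReader in = new BufferedReader(
                    new InputStreamReader(socket.getInputStream(), StandardCharsets.UTF_8));
            StringBuilder reply = new StringBuilder();
            String line;
            while ((line = in.readLine()) != null) {
                reply.append(line).append('\n');
            }
            return reply.toString();
        }
    }

    // Turns "key<TAB>value" lines into an ordered map; other lines are skipped.
    static Map<String, String> parseMntr(String text) {
        Map<String, String> metrics = new LinkedHashMap<>();
        for (String line : text.split("\n")) {
            int tab = line.indexOf('\t');
            if (tab > 0) {
                metrics.put(line.substring(0, tab).trim(), line.substring(tab + 1).trim());
            }
        }
        return metrics;
    }

    public static void main(String[] args) throws IOException {
        // Placeholder address of one ensemble member.
        Map<String, String> metrics = parseMntr(fourLetterWord("localhost", 2181, "mntr"));
        for (Map.Entry<String, String> e : metrics.entrySet()) {
            System.out.println(e.getKey() + " = " + e.getValue());
        }
    }
}
```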


RE: zookeeper node can't join the cluster

2018-04-05 Thread Rashwan, Abderahman
> Hi,
>
> Could it be something to do with Proxmox containers?

Could be, but I tried VMs as well and got the same error.

> Which ZooKeeper version are you running?

Zookeeper version: 3.4.10-39d3a4f269333c922ed3db283be479f9deacaa0f, built on 03/23/2017 10:13 GMT

> Looks like you restarted zk01 and it was trying to connect to itself.
> (zk001/172.31.254.56:3888)
>
> Would you please attach your Zk config files too?

dataDir=/var/lib/zookeeper
clientPort=2181
maxClientCnxns=0
tickTime=2000
initLimit=5
syncLimit=2
server.1=zk001:2888:3888
server.2=zk002:2888:3888
server.3=zk003:2888:3888

/etc/hosts file:
172.31.254.57 zk002
172.31.254.56 zk001
172.31.254.10 zk003
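A small diagnostic for this kind of setup: parse the server.N lines from the config above, show what each hostname resolves to on the machine running the check, and test whether anything is listening on its election port. This is a hypothetical helper, not a ZooKeeper tool, and it does not explain the failure; it only makes it easy to see, for example, whether zk001's own election port is up yet right after boot.

```java
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.net.UnknownHostException;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class QuorumConfigCheck {
    // One "server.N=host:quorumPort:electionPort" entry from the config.
    static final class ServerEntry {
        final int id;
        final String host;
        final int quorumPort;
        final int electionPort;

        ServerEntry(int id, String host, int quorumPort, int electionPort) {
            this.id = id;
            this.host = host;
            this.quorumPort = quorumPort;
            this.electionPort = electionPort;
        }
    }

    private static final Pattern SERVER_LINE =
            Pattern.compile("^server\\.(\\d+)=([^:]+):(\\d+):(\\d+)");

    // Extracts server.N entries; all other config lines are ignored.
    static List<ServerEntry> parseServers(String config) {
        List<ServerEntry> servers = new ArrayList<>();
        for (String line : config.split("\\r?\\n")) {
            Matcher m = SERVER_LINE.matcher(line.trim());
            if (m.find()) {
                servers.add(new ServerEntry(Integer.parseInt(m.group(1)), m.group(2),
                        Integer.parseInt(m.group(3)), Integer.parseInt(m.group(4))));
            }
        }
        return servers;
    }

    // True if something accepts TCP connections on host:port within the timeout.
    static boolean isListening(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (Exception e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String config = "server.1=zk001:2888:3888\n"
                + "server.2=zk002:2888:3888\n"
                + "server.3=zk003:2888:3888\n";
        for (ServerEntry s : parseServers(config)) {
            String resolved;
            try {
                resolved = InetAddress.getByName(s.host).getHostAddress();
            } catch (UnknownHostException e) {
                resolved = "unresolved";
            }
            System.out.println("server." + s.id + " " + s.host + " -> " + resolved
                    + ", election port " + s.electionPort + " listening: "
                    + isListening(s.host, s.electionPort, 2000));
        }
    }
}
```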

> Regards,
> Andor




On Wed, Apr 4, 2018 at 10:51 PM, Rashwan, Abderahman < 
abderahman.rash...@bell.ca> wrote:

> Hello,
>
> I have 2 servers. I installed Proxmox on both and created a cluster
> containing 6 Kafka nodes and 3 ZooKeeper nodes:
>
> Server1: kafka1, kafka2, kafka3, zk1
> Server2: kafka4, kafka5, kafka6, zk2
> VM: zk3
>
> When I shut down one server, for example server1 (kafka1, kafka2,
> kafka3, zk1), and then power it back up, zk01 gives me an error and can't
> join the cluster. This is the error I get:
>
> [2018-04-03 10:22:04,370] WARN Cannot open channel to 1 at election address zk001/172.31.254.56:3888 (org.apache.zookeeper.server.quorum.QuorumCnxManager)
> java.net.ConnectException: Connection refused (Connection refused)
>     at java.net.PlainSocketImpl.socketConnect(Native Method)
>     at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
>     at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
>     at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
>     at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
>     at java.net.Socket.connect(Socket.java:589)
>     at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:562)
>     at org.apache.zookeeper.server.quorum.QuorumCnxManager.handleConnection(QuorumCnxManager.java:479)
>     at org.apache.zookeeper.server.quorum.QuorumCnxManager.receiveConnection(QuorumCnxManager.java:379)
>     at org.apache.zookeeper.server.quorum.QuorumCnxManager$Listener.run(QuorumCnxManager.java:757)
> [2018-04-03 10:22:04,370] INFO Resolved hostname: zk001 to address: zk001/172.31.254.56 (org.apache.zookeeper.server.quorum.QuorumPeer)
> [2018-04-03 10:22:17,171] INFO Received connection request /172.31.254.56:58322 (org.apache.zookeeper.server.quorum.QuorumCnxManager)
> [2018-04-03 10:22:17,172] WARN Cannot open channel to 1 at election address zk001/172.31.254.56:3888 (org.apache.zookeeper.server.quorum.QuorumCnxManager)
> java.net.ConnectException: Connection refused (Connection refused)
>
> When I restarted the ZooKeeper service, it joined the cluster.
>
> Also, when I started the ZooKeeper service 10 seconds after boot, it
> worked.
>
> What could be the cause?
>
> Abderahman Rashwan
> Bell Network | SOC
> Network Security Engineering|Cyber Security Analyst
> T: (514) 870-7001 M: (514) 443-5820
> C: abderahman.rash...@bell.ca


Re: zookeeper node can't join the cluster

2018-04-05 Thread Andor Molnar
Hi,

Could it be something to do with Proxmox containers?

Which ZooKeeper version are you running?
Looks like you restarted zk01 and it was trying to connect to itself.
(zk001/172.31.254.56:3888)

Would you please attach your Zk config files too?

Regards,
Andor


Re: Is the current max packet length available via the API?

2018-04-05 Thread Andor Molnar
Hi Shawn,

You can get the current jute.maxbuffer setting from a running ZooKeeper
instance by querying ZooKeeperServerBean via JMX.

Currently there are 2 usages of the setting in ZK:
1) server-client communication which is by default 4MB,
2) server-server communication which is by default 1MB.

They can't be set individually, but can be overridden with the
jute.maxbuffer system property.

Do I understand correctly that Solr uploads files to ZooKeeper?

Regards,
Andor



On Wed, Apr 4, 2018 at 9:50 PM, Shawn Heisey  wrote:

> Is it possible to get the current max packet length from the API?
> (version 3.4.x)
>
> If not, I'm guessing that I need to look for the jute.maxbuffer system
> property and fall back to ZkClientConfig.CLIENT_MAX_PACKET_LENGTH_DEFAULT
> if it's not defined.
>
> What I'm trying to do is log a useful error message in Solr if somebody
> tries to upload a file that's too big for what's allowed.  The error
> that they get currently is not helpful, and figuring out what went wrong
> seems to require looking at the server log.
>
> Side note:  I can see in current code (and the 3.5.2 programmer's guide)
> that the default max packet length is 4MB, but the administrator's guide
> (even the 3.5.3 version) still says 1MB.
>
> Thanks,
> Shawn
>
>