how to debug zookeeper session failure

2017-05-22 Thread prashantkumar dhotre
Hi
Many of apps are reporting zookeeper session failure.
>From the app logs and zookeeper logs it is not clear what is the root cause.
Could you please inform how to debug zookeeper session failures.
What debug prints I can print in apps when this issue happens
and what zookeeper related info  (using zkcli) i can collect if this happens etc
Regards


core in netlink with "Unexpected error 9 on netlink descriptor 20" error

2018-03-08 Thread prashantkumar dhotre
Hi
Could you please let me know if this issue is already resolved or if we
have an exisiting bug already for below issue
issue:core in netlink with "Unexpected error 9 on netlink descriptor 20"
error
https://issues.apache.org/jira/browse/ZOOKEEPER-2996

Thanks


what does it mean to have exit-code of 251

2019-03-04 Thread prashantkumar dhotre
Hi,

In my journal log, I see ;

1199473 Mar 01 15:46:03 evo-qfx-01 systemd[1]: ifmand.service: Main process
exited, code=exited, status=251/n/a

In my app, I don't explicitly call exit(251).
I use zookeeper lib.
Fro my logs,I see that after a call to zookeeper lib API zookeeper_close()
, I see that my service exited with 251.
Is 251 a std exit code or app/lib specific custom exit code ?
I want to know what does 251 means.

Can you  please  let me know where can I see the exit-code to meaning mapping
?

Thanks

Prashant


client session expired after timeout after errors and warning logs in zk server logs

2019-06-20 Thread prashantkumar dhotre
Hi
We use 3.5.1-alpha version.
We are seeing session expiry issue in VM set up.
This is running in replicated more (two servers + node mastership as one
vote for quorum).
we see client session expired after session timeout (of 10 sec).
This connection was to local zk server.
session timeout is 10 sec.

This session got established at 17:40:18 and ZK server expired this at
17:40:57, after 39 seconds of establishment.

in between this time, i see few errors and warnings in zookeeper server
logs (as shown below).

I see below errors/warning in between this time before session expiry.

This issue is not very easy to replicate , so far we have seen only twice.

Could you please help me identify root cause and let me know if this is
fixed in later release ?

Thanks,

Prashant


Logs:



ZK server errors and warnings in ZK server logs:



2711:2019-06-05 17:40:38,766 [myid:2147483652] - INFO  [
0.0.0.0/0.0.0.0:61864:QuorumCnxManager$Listener@637] - Received connection
request /128.0.0.5:38020

2724:2019-06-05 17:40:38,931 [myid:2147483652] - WARN
[RecvWorker:2147483653:QuorumCnxManager$RecvWorker@917] - Connection broken
for id 2147483653, my id = 2147483652, error =

2725:java.io.EOFException

2726:   at java.io.DataInputStream.readInt(Unknown Source)

2727:   at
org.apache.zookeeper.server.quorum.QuorumCnxManager$RecvWorker.run(QuorumCnxManager.java:902)

2728:2019-06-05 17:40:38,943 [myid:2147483652] - WARN
[RecvWorker:2147483653:QuorumCnxManager$RecvWorker@920] - Interrupting
SendWorker

2730:2019-06-05 17:40:38,950 [myid:2147483652] - WARN
[SendWorker:2147483653:QuorumCnxManager$SendWorker@834] - Interrupted while
waiting for message on queue

2731:java.lang.InterruptedException

2732:   at
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.reportInterruptAfterWait(Unknown
Source)

2733:   at
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(Unknown
Source)

2734:   at java.util.concurrent.ArrayBlockingQueue.poll(Unknown Source)

2735:   at
org.apache.zookeeper.server.quorum.QuorumCnxManager.pollSendQueue(QuorumCnxManager.java:986)

2736:   at
org.apache.zookeeper.server.quorum.QuorumCnxManager.access$500(QuorumCnxManager.java:63)

2737:   at
org.apache.zookeeper.server.quorum.QuorumCnxManager$SendWorker.run(QuorumCnxManager.java:822)

2738:2019-06-05 17:40:38,965 [myid:2147483652] - WARN
[SendWorker:2147483653:QuorumCnxManager$SendWorker@843] - Send worker
leaving thread  id 2147483653 my id = 2147483652

2740:2019-06-05 17:40:38,978 [myid:2147483652] - INFO  [
0.0.0.0/0.0.0.0:61864:QuorumCnxManager$Listener@637] - Received connection
request /128.0.0.5:38022

2741:2019-06-05 17:40:38,986 [myid:2147483652] - INFO  [
0.0.0.0/0.0.0.0:61864:QuorumCnxManager$Listener@637] - Received connection
request /128.0.0.5:38024

2742:2019-06-05 17:40:38,987 [myid:2147483652] - WARN
[RecvWorker:2147483653:QuorumCnxManager$RecvWorker@920] - Interrupting
SendWorker

2743:2019-06-05 17:40:39,017 [myid:2147483652] - ERROR
[SendWorker:2147483653:QuorumCnxManager$SendWorker@810] - Failed to send
last message. Shutting down thread.

2744:java.net.SocketException: Socket closed

2745:   at java.net.SocketOutputStream.socketWrite(Unknown Source)

2746:   at java.net.SocketOutputStream.write(Unknown Source)

2747:   at java.io.DataOutputStream.writeInt(Unknown Source)

2748:   at
org.apache.zookeeper.server.quorum.QuorumCnxManager$SendWorker.send(QuorumCnxManager.java:779)

2749:   at
org.apache.zookeeper.server.quorum.QuorumCnxManager$SendWorker.run(QuorumCnxManager.java:806)

2750:2019-06-05 17:40:39,018 [myid:2147483652] - WARN
[SendWorker:2147483653:QuorumCnxManager$SendWorker@843] - Send worker
leaving thread  id 2147483653 my id = 2147483652

2758:2019-06-05 17:40:39,125 [myid:2147483652] - INFO
[LearnerHandler-/128.0.0.5:49888:LearnerHandler@385] - Follower sid:
2147483653 not in the current config 1

2759:2019-06-05 17:40:39,130 [myid:2147483652] - INFO
[LearnerHandler-/128.0.0.5:49888:LearnerHandler@683] - Synchronizing with
Follower sid: 2147483653 maxCommittedLog=0x10e04 minCommittedLog=0x1000

00c10 lastProcessedZxid=0x10e04 peerLastZxid=0x0

2760:2019-06-05 17:40:39,132 [myid:2147483652] - WARN
[LearnerHandler-/128.0.0.5:49888:ZKDatabase@310] - Unable to find proposals
from txnlog for zxid: 0

2762:2019-06-05 17:40:39,134 [myid:2147483652] - INFO
[LearnerHandler-/128.0.0.5:49888:LearnerHandler@455] - Sending snapshot
last zxid of peer is 0x0, zxid of leader is 0x10e0b, send zxid of db as
0

x10e05, 1 concurrent snapshots, snapshot was exempt from throttle

2814:2019-06-05 17:40:39,438 [myid:2147483652] - INFO
[ProcessThread(sid:2147483652 cport:-1)::PrepRequestProcessor@512] -
Incremental reconfig

2815:2019-06-05 17:40:39,454 [myid:2147483652] - WARN
[LearnerHandler-/128.0.0.5:49888:QuorumPeer@1370] - Restarting Leader
Election

2816:2019-06-05 17:40:39,474 [myid:2147483652] - INFO  [
0.0.0.0/0.0.0.0:61864:QuorumCnxManager$Listener@659] - 

'zkServer.sh status' connects only to local node ?

2019-12-12 Thread prashantkumar dhotre
Hi,
Could you please help me with a basic query regarding 'zkServer.sh status' ?
>From zkServer.sh, it appears that this cmd tries to connect to server at
local node and get info.
If the local server is down, does it connect to next available server (as
specified in config file )  and returns status ?
Or is this cmd intended to only report status of local server and not
status of zookeeper quorum ?
My use case is, my client app needs to connect to zookeeper and create
znode.
connect routine ( which does zookeeper_init2) has retries and in case of
connect failure, it sleeps and retries for number of times.
But if zookeeper quorum is down (zookeeper not running ) then I don't want
to even attempt connection ( hence avoid wait delay in connect api)
Regards,
Prashant