Re: Port 3888 closed on Leader

2018-08-23 Thread harish lohar
Thanks Everyone.

Yes, even I was wondering how a node can be the leader without port 3888
being open; we just did a restart and everything became fine.

Port 3888 must be in LISTEN mode on all ZK nodes, while 2888 only opens on the
leader and the other nodes connect to it.
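
For reference, one way to sanity-check this from a monitoring job is to probe
each node's election port. A minimal Java sketch (the host names, port and
timeout below are illustrative, not from any particular deployment):

import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

public class ElectionPortCheck {
    // On a healthy ensemble, every member should accept connections on its
    // election port (3888 in the example configs).
    public static void main(String[] args) {
        String[] hosts = {"zk1.example.com", "zk2.example.com", "zk3.example.com"};
        for (String host : hosts) {
            try (Socket s = new Socket()) {
                s.connect(new InetSocketAddress(host, 3888), 2000);
                System.out.println(host + ":3888 reachable");
            } catch (IOException e) {
                System.out.println(host + ":3888 NOT reachable: " + e.getMessage());
            }
        }
    }
}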


Thanks
Harish



On Thu, Aug 23, 2018 at 6:47 AM Shawn Heisey  wrote:

> On 8/15/2018 7:46 AM, harish lohar wrote:
> > In a deployment of a 3-node ZK cluster we have seen that sometimes port 3888
> > is absent after the cluster is formed; this causes follower nodes to be
> > unable to connect to the leader if they restart.
> >
> > Shouldn't the leader itself drop out of the cluster if this happens?
>
> I'm not well-versed in how ZK works internally, and no longer have access
> to systems I can check, but I seem to remember when looking at
> a live ensemble that not every ZK instance will bind to all three ports
> (2181, 2888, and 3888 if using the example configs).  It surprised me when
> I noticed it, but I didn't worry about it too much since ZK seemed to be
> working correctly.
>
> Thanks,
> Shawn
>
>


Port 3888 closed on Leader

2018-08-15 Thread harish lohar
Hi,

In a deployment of a 3-node ZK cluster we have seen that sometimes port 3888
is absent after the cluster is formed; this causes follower nodes to be
unable to connect to the leader if they restart.

Shouldn't the leader itself drop out of the cluster if this happens?

Thanks
Harish


Re: Port :3888 Bind failure

2018-07-23 Thread harish lohar
Another question along the same lines: is there a specific number of retries
configured in ZooKeeper in case there is an intermittent problem with the
interface, or are they infinite?


On Mon, Jul 23, 2018 at 2:37 PM harish lohar  wrote:

> Thanks Andor,
> We found an issue with our interface configuration, and correcting it
> has solved the problem.
>
> Thanks
> Harish
>
> On Mon, Jul 23, 2018 at 11:22 AM Andor Molnar 
> wrote:
>
>> Hi,
>>
>> Is the IP address that you're trying to bind the server to valid?
>> Please tell me some info about your environment: cloud? docker?
>> kubernetes?
>> It would also be beneficial to take a look at your ZooKeeper config files.
>>
>> Regards,
>> Andor
>>
>>
>>
>>
>> On Mon, Jul 23, 2018 at 5:02 PM, harish lohar  wrote:
>>
>> > Hi ,
>> >
>> > I am seeing bind failures for ZooKeeper ports; these are random and not
>> > easily reproducible.
>> > There was nothing else listening on these ports.
>> >
>> > We recently upgraded to 3.5.4-beta; earlier I never saw this issue.
>> >
>> > 2018-07-22 00:08:59,409 [myid:] - WARN [main:QuorumPeerConfig@644] -
>> > Non-optimial configuration, consider an odd number of servers.
>> > 2018-07-22 00:08:59,707 [myid:181] - ERROR
>> > [/xx.xx.xxx.xxx:3888:QuorumCnxManager$Listener@878] - Exception while
>> > listening
>> > java.net.BindException: Cannot assign requested address (Bind failed)
>> > at java.net.PlainSocketImpl.socketBind(Native Method)
>> > at java.net.AbstractPlainSocketImpl.bind(AbstractPlainSocketImpl.java:387)
>> > at java.net.ServerSocket.bind(ServerSocket.java:375)
>> > at java.net.ServerSocket.bind(ServerSocket.java:329)
>> > at org.apache.zookeeper.server.quorum.QuorumCnxManager$Listener.run(QuorumCnxManager.java:856)
>> >
>>
>


Re: Port :3888 Bind failure

2018-07-23 Thread harish lohar
Thanks Andor,
We found an issue with our interface configuration, and correcting it
has solved the problem.

Thanks
Harish
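
A side note for anyone hitting the same error: "Cannot assign requested
address" typically means the configured IP is not present on any local
interface at bind time. A minimal Java sketch to check (the IP below is
illustrative):

import java.net.InetAddress;
import java.net.NetworkInterface;

public class AddressCheck {
    public static void main(String[] args) throws Exception {
        // Returns null when the address is not bound to any local interface,
        // which is the situation in which the ServerSocket bind in this
        // thread's stack trace fails.
        InetAddress addr = InetAddress.getByName("10.60.11.240"); // illustrative
        System.out.println("Local interface for " + addr + ": "
                + NetworkInterface.getByInetAddress(addr));
    }
}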

On Mon, Jul 23, 2018 at 11:22 AM Andor Molnar 
wrote:

> Hi,
>
> Is the IP address that you're trying to bind the server to valid?
> Please tell me some info about your environment: cloud? docker? kubernetes?
> It would also be beneficial to take a look at your ZooKeeper config files.
>
> Regards,
> Andor
>
>
>
>
> On Mon, Jul 23, 2018 at 5:02 PM, harish lohar  wrote:
>
> > Hi ,
> >
> > I am seeing bind failures for ZooKeeper ports; these are random and not
> > easily reproducible.
> > There was nothing else listening on these ports.
> >
> > We recently upgraded to 3.5.4-beta; earlier I never saw this issue.
> >
> > 2018-07-22 00:08:59,409 [myid:] - WARN [main:QuorumPeerConfig@644] -
> > Non-optimial configuration, consider an odd number of servers.
> > 2018-07-22 00:08:59,707 [myid:181] - ERROR
> > [/xx.xx.xxx.xxx:3888:QuorumCnxManager$Listener@878] - Exception while
> > listening
> > java.net.BindException: Cannot assign requested address (Bind failed)
> > at java.net.PlainSocketImpl.socketBind(Native Method)
> > at java.net.AbstractPlainSocketImpl.bind(AbstractPlainSocketImpl.java:387)
> > at java.net.ServerSocket.bind(ServerSocket.java:375)
> > at java.net.ServerSocket.bind(ServerSocket.java:329)
> > at org.apache.zookeeper.server.quorum.QuorumCnxManager$Listener.run(QuorumCnxManager.java:856)
> >
>


Port :3888 Bind failure

2018-07-23 Thread harish lohar
Hi ,

I am seeing bind failures for ZooKeeper ports; these are random and not
easily reproducible.
There was nothing else listening on these ports.

We recently upgraded to 3.5.4-beta; earlier I never saw this issue.

2018-07-22 00:08:59,409 [myid:] - WARN [main:QuorumPeerConfig@644] -
Non-optimial configuration, consider an odd number of servers.
2018-07-22 00:08:59,707 [myid:181] - ERROR
[/xx.xx.xxx.xxx:3888:QuorumCnxManager$Listener@878] - Exception while
listening
java.net.BindException: Cannot assign requested address (Bind failed)
at java.net.PlainSocketImpl.socketBind(Native Method)
at java.net.AbstractPlainSocketImpl.bind(AbstractPlainSocketImpl.java:387)
at java.net.ServerSocket.bind(ServerSocket.java:375)
at java.net.ServerSocket.bind(ServerSocket.java:329)
at org.apache.zookeeper.server.quorum.QuorumCnxManager$Listener.run(QuorumCnxManager.java:856)


Re: ZooKeeper Cluster Health Checking

2018-07-17 Thread harish lohar
We did it via a Java monitoring app, using the ZooKeeper Java API, which sends
4lw (four-letter-word) commands to ZooKeeper and returns the output.
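
For anyone looking for a starting point, here is a minimal sketch of sending a
4lw command over a plain socket, which is all the wire protocol requires (host,
port and class name are illustrative; this is not necessarily the exact code we
used):

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

public class FourLetterWord {
    // Send a four-letter-word command (e.g. "ruok", "stat", "mntr") to a
    // ZooKeeper server's client port and return its full reply.
    static String send(String host, int port, String cmd) throws Exception {
        try (Socket s = new Socket(host, port)) {
            OutputStream out = s.getOutputStream();
            out.write(cmd.getBytes(StandardCharsets.US_ASCII));
            out.flush();
            BufferedReader in = new BufferedReader(
                new InputStreamReader(s.getInputStream(), StandardCharsets.US_ASCII));
            StringBuilder reply = new StringBuilder();
            String line;
            while ((line = in.readLine()) != null) {
                reply.append(line).append('\n');
            }
            return reply.toString();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.print(send("localhost", 2181, "stat"));
    }
}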


Thanks
Harish

On Tue, Jul 17, 2018 at 2:00 AM adrien ruffie 
wrote:

> Hi Harish,
>
>
> thank you very much for this advise and explanation !
>
> Do you think a simple shell script checking all these metrics would be
> enough? Or would it be better to do it in Java with a simple monitoring
> application?
>
>
> Thanks again,
>
>
> Best regards,
>
>
> Adrien
>
> 
> From: harish lohar 
> Sent: Tuesday, July 17, 2018 04:13:51
> To: user@zookeeper.apache.org
> Subject: Re: ZooKeeper Cluster Health Checking
>
> Hi Adrien,
> The ZooKeeper commands below are generally used to check the health of a
> ZooKeeper cluster.
>
> stat
>
> Lists brief details for the server and connected clients.
>
> usage: echo stat | nc server port
>
> This tells whether the cluster is up or down. If it is down, the command
> reports that the ZooKeeper instance is currently not serving any requests,
> which means either leader election is failing or at least half of the
> ZooKeeper nodes in the cluster are down.
>
>
> mntr
>
> *New in 3.4.0:* Outputs a list of variables that could be used for
> monitoring the health of the cluster.
>
> $ echo mntr | nc localhost 2185
>
> zk_version  3.4.0
> zk_avg_latency  0
> zk_max_latency  0
> zk_min_latency  0
> zk_packets_received  70
> zk_packets_sent  69
> zk_outstanding_requests  0
> zk_server_state  leader
> zk_znode_count  4
> zk_watch_count  0
> zk_ephemerals_count  0
> zk_approximate_data_size  27
> zk_followers  4  - only exposed by the Leader
> zk_synced_followers  4  - only exposed by the Leader
> zk_pending_syncs  0  - only exposed by the Leader
> zk_open_file_descriptor_count  23  - only available on Unix platforms
> zk_max_file_descriptor_count  1024  - only available on Unix platforms
>
> The output is compatible with java properties format and the content may
> change over time (new keys added). Your scripts should expect changes.
>
> ATTENTION: Some of the keys are platform specific and some of the keys are
> only exported by the Leader.
>
> The output contains multiple lines with the following format: key \t value
>
>
> On Mon, Jul 16, 2018 at 10:13 AM adrien ruffie 
> wrote:
>
> > Hello all,
> >
> >
> > In my company we have a Zookeeper production cluster.
> >
> >
> > But we don't really know how we can check the health of our cluster...
> >
> >
> > Can you advise us on this topic?
> >
> >
> > I know this topic may have been cropping up for a while, but I haven't
> > really found any concrete solution.
> >
> >
> > Do you use any monitoring tools? Ones which can raise alerts?
> >
> > What metrics/properties/anything can indicate that our cluster
> > isn't in good health?
> >
> >
> > Thank you very much and best regards
> >
> >
> > Adrien
> >
>


Re: ZooKeeper Cluster Health Checking

2018-07-16 Thread harish lohar
Hi Adrien,
The ZooKeeper commands below are generally used to check the health of a
ZooKeeper cluster.

stat

Lists brief details for the server and connected clients.

usage: echo stat | nc server port

This tells whether the cluster is up or down. If it is down, the command
reports that the ZooKeeper instance is currently not serving any requests,
which means either leader election is failing or at least half of the
ZooKeeper nodes in the cluster are down.


mntr

*New in 3.4.0:* Outputs a list of variables that could be used for
monitoring the health of the cluster.

$ echo mntr | nc localhost 2185

zk_version  3.4.0
zk_avg_latency  0
zk_max_latency  0
zk_min_latency  0
zk_packets_received  70
zk_packets_sent  69
zk_outstanding_requests  0
zk_server_state  leader
zk_znode_count  4
zk_watch_count  0
zk_ephemerals_count  0
zk_approximate_data_size  27
zk_followers  4  - only exposed by the Leader
zk_synced_followers  4  - only exposed by the Leader
zk_pending_syncs  0  - only exposed by the Leader
zk_open_file_descriptor_count  23  - only available on Unix platforms
zk_max_file_descriptor_count  1024  - only available on Unix platforms

The output is compatible with java properties format and the content may
change over time (new keys added). Your scripts should expect changes.

ATTENTION: Some of the keys are platform specific and some of the keys are
only exported by the Leader.

The output contains multiple lines with the following format: key \t value
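
Since the output is properties-compatible, a monitoring check can load it
directly from the socket. A minimal Java sketch (host and port are
illustrative, and it reads keys defensively since they may change over time):

import java.io.OutputStream;
import java.net.Socket;
import java.nio.charset.StandardCharsets;
import java.util.Properties;

public class MntrCheck {
    public static void main(String[] args) throws Exception {
        try (Socket s = new Socket("localhost", 2181)) {
            OutputStream out = s.getOutputStream();
            out.write("mntr".getBytes(StandardCharsets.US_ASCII));
            out.flush();
            // Each "key <tab> value" line parses as a java.util.Properties entry.
            Properties p = new Properties();
            p.load(s.getInputStream());
            System.out.println("state       = " + p.getProperty("zk_server_state", "unknown"));
            System.out.println("avg latency = " + p.getProperty("zk_avg_latency", "n/a"));
            System.out.println("outstanding = " + p.getProperty("zk_outstanding_requests", "n/a"));
        }
    }
}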


On Mon, Jul 16, 2018 at 10:13 AM adrien ruffie 
wrote:

> Hello all,
>
>
> In my company we have a Zookeeper production cluster.
>
>
> But we don't really know how we can check the health of our cluster...
>
>
> Can you advise us on this topic?
>
>
> I know this topic may have been cropping up for a while, but I haven't really
> found any concrete solution.
>
>
> Do you use any monitoring tools? Ones which can raise alerts?
>
> What metrics/properties/anything can indicate that our cluster
> isn't in good health?
>
>
> Thank you very much and best regards
>
>
> Adrien
>


Re: Kafka Failing to start due to existing ID

2018-06-18 Thread harish lohar
Just to update everyone: I was finally able to root-cause the issue, and it
seems to be

https://issues.apache.org/jira/browse/ZOOKEEPER-2901

which is related to the node id being > 127.

It's fixed in 3.5.4-beta and works fine.

Thanks
Harish


On Wed, Jun 13, 2018 at 7:42 AM Andor Molnar  wrote:

> Hi Harish,
>
> I see 2 things which need to be clarified here:
>
> 1. A ZooKeeper session dies in 2 cases only: when the client explicitly closes
> the session (which is *not* equivalent to disconnection) or the session
> timeout expires,
> 2. If quorum is not present, no updates will be committed and client
> connections are rejected, so Kafka shouldn't be able to use the cluster.
>
> Similarly, when quorum comes back online, ZooKeeper will continue operating
> normally: it receives client connections, performs updates and expires
> sessions if necessary.
>
> I still believe therefore that your Kafka setup doesn't properly clean up
> znodes for some reason, but I'm not a Kafka expert.
>
> Regards,
> Andor
>
>
>
>
> On Wed, Jun 13, 2018 at 12:34 AM, harish lohar  wrote:
>
> > Exactly. So in a case where there is no quorum and no updates can be made,
> > is there a way to stop Kafka failing to start?
> >
> > One way is to clean up Kafka-related znodes after bringing up the quorum
> > and then starting Kafka.
> >
> > I was looking to avoid this.
> >
> >
> > On Tue, Jun 12, 2018 at 4:59 PM Brian Lininger  >
> > wrote:
> >
> > > Hi Harish,
> > > I think I see what may be the problem for you.  Based on your initial
> > > description (6 ZK nodes, 3 down) I think the problem is that you no
> > longer
> > > have a quorum.  When a Zookeeper cluster is running, updates (i.e.
> > removing
> > > znodes) can only occur when Zookeeper has a quorum, which 50.1% of the
> > > configured Zookeeper nodes.  If I understand correctly, then in your
> case
> > > you have 6 Zookeeper nodes configured but 3 are down.  This means that
> > you
> > > only have 50.0% of the Zookeeper cluster working, and thus Zookeeper
> does
> > > not have a quorum so no updates can be made.  I don't know much about
> the
> > > new TTL feature in 3.5, but my assumption is that it works on this same
> > > principle which is that no updates can be made to the cluster's znodes
> > when
> > > there is no quorum.  The same applies to the 3 Zookeeper node cluster,
> > you
> > > must have 2 nodes running to form a quorum and allow any updates to
> > occur.
> > >
> > > Please correct me if I missed something
> > >
> > > Thanks,
> > > Brian
> > >
> > >
> > > On Tue, Jun 12, 2018 at 1:33 PM, harish lohar 
> wrote:
> > >
> > >> -- Forwarded message -
> > >> From: harish lohar 
> > >> Date: Tue, Jun 12, 2018 at 3:26 PM
> > >> Subject: Re: Kafka Failing to start due to existing ID
> > >> To: 
> > >>
> > >>
> > >> Hi Andor,
> > >>
> > >> Thanks for your reply.
> > >>
> > >> This issue is irrespective of the number of nodes; it should be seen
> > >> with a 3-node cluster as well.
> > >>
> > >> Actually Kafka has a session_timeout config, but that seems to be in
> > >> effect only if the ZooKeeper cluster is up, i.e. if Kafka goes down
> > >> while the ZooKeeper cluster is up.
> > >>
> > >> Now let's say 2 nodes of the ZooKeeper cluster are down, and then the
> > >> Kafka connected to the 3rd ZooKeeper node goes down: the ZooKeeper
> > >> cluster doesn't refresh the session for the Kafka connected to the
> > >> 3rd node.
> > >>
> > >> So when the other nodes come up and the ZooKeeper cluster becomes
> > >> available, it doesn't delete the id of the Kafka which went down while
> > >> the ZooKeeper cluster was down.
> > >>
> > >> Regarding TTL, I have already enquired on the Kafka forum and am
> > >> awaiting a reply.
> > >>
> > >> Ideally, once the ZooKeeper cluster is up, it should delete the Kafka
> > >> broker ids which are not connected, which doesn't seem to be happening.
> > >>
> > >> I hope I am making some sense :)
> > >>
> > >> Thanks
> > >> harish
> > >>
> > >>
> > >>
> > >> On Tue, Jun 12, 2018 at 2:59 PM Andor Molnár 
> wrote:
> > >>
> > >> > Hi Harish,
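
For reference on the quorum arithmetic discussed in this thread, a minimal
sketch (majority rule for an ensemble of n voting members):

public class QuorumMath {
    // Majority quorum: more than half of the voting members must be up.
    static int quorumSize(int n) { return n / 2 + 1; }

    public static void main(String[] args) {
        // 6-node ensemble: quorum is 4, so with 3 nodes down (3 up) no
        // updates can be committed.
        System.out.println("quorum of 6 = " + quorumSize(6)); // 4
        // 3-node ensemble: quorum is 2, so it tolerates exactly one failure.
        System.out.println("quorum of 3 = " + quorumSize(3)); // 2
    }
}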

How to find the quorum leader in ZK Cluster (except from stat command)

2018-06-18 Thread harish lohar
Hi,

Is there a way to query any follower node and find out which node is the
leader of the ZK cluster?


Thanks
Harish
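
One approach that follows from the mntr output shown in the health-checking
thread above is to ask each server for its zk_server_state. A minimal Java
sketch (hosts are illustrative; note it polls every node rather than learning
the leader from a single follower):

import java.net.Socket;
import java.nio.charset.StandardCharsets;
import java.util.Properties;

public class FindLeader {
    public static void main(String[] args) throws Exception {
        String[] hosts = {"zk1.example.com", "zk2.example.com", "zk3.example.com"};
        for (String host : hosts) {
            try (Socket s = new Socket(host, 2181)) {
                s.getOutputStream().write("mntr".getBytes(StandardCharsets.US_ASCII));
                s.getOutputStream().flush();
                Properties p = new Properties();
                p.load(s.getInputStream());
                // Reports "leader", "follower" or "standalone" per node.
                System.out.println(host + " -> " + p.getProperty("zk_server_state", "unknown"));
            } catch (Exception e) {
                System.out.println(host + " -> unreachable (" + e.getMessage() + ")");
            }
        }
    }
}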


Kafka Failing to start due to existing ID

2018-06-12 Thread harish lohar
Hi All,

Need help regarding the scenario below, if any configuration is available to
help.

I have a cluster of 6 nodes.
3 nodes are stopped and brought up again; Kafka fails to restart since the
broker IDs are still present in the ZooKeeper znode /broker/ids/

Since the cluster goes down after removing 3 nodes, the session timeout
doesn't happen.

Though I am aware of the TTL feature in ZooKeeper, how do I make sure
Kafka creates znodes with TTL?
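
For context on the TTL feature itself: with the ZooKeeper 3.5.x Java API a
client creates a TTL znode roughly as sketched below. Making Kafka use it
would require a change on the Kafka side; the path, timeout and TTL values
here are illustrative, and TTL nodes also require extendedTypesEnabled=true
on the servers.

import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;
import org.apache.zookeeper.data.Stat;

public class TtlNodeExample {
    public static void main(String[] args) throws Exception {
        ZooKeeper zk = new ZooKeeper("localhost:2181", 30000, event -> { });
        Stat stat = new Stat();
        // A persistent znode that the server may delete once it has had no
        // modifications and no children for ~60 seconds (TTL in milliseconds).
        zk.create("/example-ttl", new byte[0], ZooDefs.Ids.OPEN_ACL_UNSAFE,
                  CreateMode.PERSISTENT_WITH_TTL, stat, 60_000L);
        zk.close();
    }
}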

Thanks
Harish


Non-incremental reconfig failing while trying to bind to same local client port

2018-05-09 Thread harish lohar
Hi All,

Need help resolving below issue:

2018-05-10 00:59:16,584 [myid:1] - WARN [RecvWorker:3:QuorumCnxManager$RecvWorker@922] - Interrupting SendWorker
2018-05-10 00:59:16,584 [myid:1] - INFO [QuorumPeerListener:QuorumCnxManager$Listener@636] - My election bind port: /10.60.11.240:3888
2018-05-10 00:59:16,584 [myid:1] - INFO [QuorumPeer[myid=1](plain=/127.0.0.1:2181)(secure=disabled):NIOServerCnxnFactory@706] - binding to port localhost/127.0.0.1:2181
2018-05-10 00:59:16,585 [myid:1] - ERROR [QuorumPeer[myid=1](plain=/127.0.0.1:2181)(secure=disabled):NIOServerCnxnFactory@722] - Error reconfiguring client port to localhost/127.0.0.1:2181 Address already in use


Re: removing ZK installation

2018-05-08 Thread harish lohar
Could someone please let me know where to get an RPM of ZooKeeper for CentOS.

Thanks
Harish

On Tue, May 8, 2018 at 1:57 PM, Washko, Daniel  wrote:

> Steve, how was zookeeper installed? That should be the method with which
> you remove it.
>
> If you are not sure how it was installed, you can do:
>
> rpm -qa |grep zookeeper
>
> To determine whether it was installed via an RPM package. If that does not
> unearth a matching RPM then it was probably installed some other way. More
> than likely it was a binary archive extracted to, maybe,
> /opt/zookeeper.
>
> If you look at the running zookeeper process it should give you an idea of
> where zookeeper is installed and where the data directory is:
>
> ps -ef |grep zookeeper
>
> How zookeeper is starting is dependent on which version of Centos you are
> running. Centos 6 uses upstart and service command. More than likely you
> will find the zookeeper init script in /etc/init.d. If this is Centos 7
> then it's systemd. As root you can run systemctl by itself to get a list of
> service scripts. Hit the "/" key and type in zookeeper. It will take you to
> any service script with zookeeper in the name. This will help you determine
> how to stop zookeeper.
>
> If systemd is not showing a zookeeper service and you don't see a service
> script in /etc/init.d (or if service zookeeper stop doesn't work), then it
> would appear that zookeeper was started in some other way, maybe manually
> without a service or systemd script.
>
> You'll want to figure this out because if you have to manually remove
> zookeeper, instead of using a package manager like RPM, you'll want to
> disable any startup scripts from running and throwing errors once Zookeeper
> is removed.
>
> On 5/8/18, 10:32 AM, "Steph van Schalkwyk" 
> wrote:
>
> Find where it is installed - typically /opt/zookeeper.
> Also do a which zookeeper to see if it is linked to /usr/bin or some
> such
> place.
> Make sure zookeeper is stopped.
> Far as I recall, Centos has Upstart, so sudo stop zookeeper and sudo
> disable zookeeper. Or sudo systemctl stop zookeeper and sudo systemctl
> disable zookeeper.
> Then cat the /opt/zookeeper/conf/zoo.cfg to see where the data
> directories
> and logs are. Delete the data and log directories.
> Then delete /opt/zookeeper.
> Steph
>
>
>
> On Tue, May 8, 2018 at 9:07 AM, Steve Pruitt 
> wrote:
>
> > Hi,
> >
> > I need to remove ZooKeeper from a Centos machine.  I tried yum
> remove to
> > no avail using instructions I found online.
> >
> > Thanks.
> >
> > -S
> >
> >
>
>
>


Does 3.4.11 support the Reconfig feature?

2018-05-04 Thread harish lohar
Hi,

Could anyone please clarify whether the 3.4.11 release supports the reconfig
feature.


Getting Authentication Not valid while running reconfig Command

2018-03-01 Thread harish lohar
I am connecting from ./zkCli.sh and trying to add a server to the ZooKeeper
ensemble.

I see I am authenticated at the prompt:



2018-03-01 11:21:41,716 [myid:localhost:2181] - INFO
[main-SendThread(localhost:2181):ZooKeeperSaslClient@274] - Client will use
DIGEST-MD5 as SASL mechanism.
2018-03-01 11:21:41,770 [myid:localhost:2181] - INFO
[main-SendThread(localhost:2181):ClientCnxn$SendThread@1113] - Opening
socket connection to server localhost/127.0.0.1:2181. Will attempt to
SASL-authenticate using Login Context section 'Client'
WatchedEvent state:SaslAuthenticated type:None path:null

Even setAcl doesn't work:

[zk: localhost:2181(CONNECTED) 1] setAcl /zookeeper/config
world:anyone:cdrwa
Authentication is not valid : /zookeeper/config

The same issue happens with the "reconfig" command as well.

I am using zookeeper-3.5.3-beta release

Appreciate your quick response.

Thanks
harish