Thanks for sharing the insights. Moving the hbase mailing list to cc.
Sorry, I forgot to mention that we are using Phoenix 4.7 (HDP 2.6.3). This
cluster is mostly queried via Phoenix, apart from a few pure NoSQL cases
that use the raw HBase APIs.

I looked further into the zk logs and found that only 6/15 RS are
constantly running into max connection problems (no other IPs/hosts of our
client apps show up). One of those RS is getting 3-4x the connection errors
compared to the others; this RS is hosting hbase:meta
<http://ip-10-74-10-228.us-west-2.compute.internal:16030/region.jsp?name=1588230740>,
regions of Phoenix secondary indexes, and regions of Phoenix and HBase
tables. I also looked into the other 5 RS that are getting max connection
errors, but nothing really stands out to me since all of them are likewise
hosting regions of Phoenix secondary indexes and regions of Phoenix and
HBase tables.

I also tried running netstat and tcpdump on the zk host to spot an anomaly,
but couldn't find anything beyond the analysis above. I also ran hbck and
it reported that things are fine. I am still unable to pinpoint the exact
problem (maybe something with Phoenix secondary indexes?). Any other
pointers for debugging this further would be appreciated.
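
For anyone who wants to reproduce the per-IP connection counts, here is a
minimal sketch (plain Java; the ZK host argument and the default
"localhost"/2181 are just placeholders) that queries ZooKeeper's "cons"
four-letter command and tallies connections by client IP:

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.Socket;
import java.util.Map;
import java.util.TreeMap;

public class ZkConnCount {
    public static void main(String[] args) throws IOException {
        // Assumption for illustration: first arg is a ZK server host, default port 2181.
        String zkHost = args.length > 0 ? args[0] : "localhost";
        Map<String, Integer> perIp = new TreeMap<>();
        try (Socket sock = new Socket(zkHost, 2181)) {
            // "cons" is ZooKeeper's four-letter command that dumps all client connections.
            sock.getOutputStream().write("cons".getBytes());
            sock.getOutputStream().flush();
            BufferedReader in = new BufferedReader(new InputStreamReader(sock.getInputStream()));
            String line;
            while ((line = in.readLine()) != null) {
                // Connection lines look like: " /10.74.10.228:60012[1](queued=0,recved=...,sent=...)"
                int slash = line.indexOf('/');
                int colon = line.indexOf(':', slash);
                if (slash >= 0 && colon > slash) {
                    String ip = line.substring(slash + 1, colon);
                    perIp.merge(ip, 1, Integer::sum);
                }
            }
        }
        // Print connection counts per client IP; the biggest offenders are easy to spot.
        perIp.forEach((ip, count) -> System.out.println(ip + " -> " + count + " connections"));
    }
}

Running that against each ZK server shows which RS IPs are sitting close to
the 60-connection limit.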

Lastly, I constantly see the following zk connection loss logs on the
above-mentioned 6 RS:

2020-06-03 06:40:30,859 WARN [RpcServer.FifoWFPBQ.default.handler=123,queue=3,port=16020-SendThread(ip-10-74-0-120.us-west-2.compute.internal:2181)] zookeeper.ClientCnxn: Session 0x0 for server ip-10-74-0-120.us-west-2.compute.internal/10.74.0.120:2181, unexpected error, closing socket connection and attempting reconnect
java.io.IOException: Connection reset by peer
        at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
        at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
        at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
        at sun.nio.ch.IOUtil.read(IOUtil.java:192)
        at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:380)
        at org.apache.zookeeper.ClientCnxnSocketNIO.doIO(ClientCnxnSocketNIO.java:68)
        at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:366)
        at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1125)
2020-06-03 06:40:30,861 INFO [RpcServer.FifoWFPBQ.default.handler=137,queue=17,port=16020-SendThread(ip-10-74-9-182.us-west-2.compute.internal:2181)] zookeeper.ClientCnxn: Opening socket connection to server ip-10-74-9-182.us-west-2.compute.internal/10.74.9.182:2181. Will not attempt to authenticate using SASL (unknown error)
2020-06-03 06:40:30,861 INFO [RpcServer.FifoWFPBQ.default.handler=137,queue=17,port=16020-SendThread(ip-10-74-9-182.us-west-2.compute.internal:2181)] zookeeper.ClientCnxn: Socket connection established, initiating session, client: /10.74.10.228:60012, server: ip-10-74-9-182.us-west-2.compute.internal/10.74.9.182:2181
2020-06-03 06:40:30,861 WARN [RpcServer.FifoWFPBQ.default.handler=137,queue=17,port=16020-SendThread(ip-10-74-9-182.us-west-2.compute.internal:2181)] zookeeper.ClientCnxn: Session 0x0 for server ip-10-74-9-182.us-west-2.compute.internal/10.74.9.182:2181, unexpected error, closing socket connection and attempting reconnect
java.io.IOException: Connection reset by peer
        at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
        at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
        at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
        at sun.nio.ch.IOUtil.read(IOUtil.java:192)
        at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:380)
        at org.apache.zookeeper.ClientCnxnSocketNIO.doIO(ClientCnxnSocketNIO.java:68)
        at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:366)
        at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1125)

Thanks!

On Tue, Jun 2, 2020 at 6:57 AM Josh Elser <els...@apache.org> wrote:

> HBase (daemons) try to use a single connection for themselves. A RS also
> does not need to mutate state in ZK to handle things like gets and puts.
>
> Phoenix is probably the thing you need to look at more closely
> (especially if you're using an old version of Phoenix that matches the
> old HBase 1.1 version). Internally, Phoenix acts like an HBase client
> which results in a new ZK connection. There have certainly been bugs
> like that in the past (speaking generally, not specifically).
>
> On 6/1/20 5:59 PM, anil gupta wrote:
> > Hi Folks,
> >
> > We are running in HBase problems due to hitting the limit of ZK
> > connections. This cluster is running HBase 1.1.x and ZK 3.4.6.x on I3en
> ec2
> > instance type in AWS. Almost all our Region server are listed in zk logs
> > with "Too many connections from /<IP> - max is 60".
> > 2020-06-01 21:42:08,375 - WARN  [NIOServerCxn.Factory:
> > 0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@193] - Too many connections
> from
> > /<ip> - max is 60
> >
> >   On a average each RegionServer has ~250 regions. We are also running
> > Phoenix on this cluster. Most of the queries are short range scans but
> > sometimes we are doing full table scans too.
> >
> >    It seems like one of the simple fix is to increase maxClientCnxns
> > property in zoo.cfg to 300, 500, 700, etc. I will probably do that. But,
> i
> > am just curious to know In what scenarios these connections are
> > created/used(Scans/Puts/Delete or during other RegionServer operations)?
> > Are these also created by hbase clients/apps(my guess is NO)? How can i
> > calculate optimal value of maxClientCnxns for my cluster/usage?
> >
>


-- 
Thanks & Regards,
Anil Gupta
