Re: Too many connections from / - max is 60

Sukumar Maddineni Tue, 09 Jun 2020 00:10:19 -0700

Hi Anil,

I think if you create the missing HBase table (the index table) with dummy
metadata (using the hbase shell) and then do a Phoenix drop index and recreate
the index, it should work.


Phoenix 4.7 indexing code might have issues related to cross-RPC calls between
RS, which can cause zk connection leaks. I would recommend upgrading to 4.14.3
if possible, which has a lot of indexing improvements related to consistency
and also performance.
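
To check whether the rejections stop once the index is repaired, a small
script can tally the ZooKeeper server warnings per client IP. This is only a
minimal sketch, not part of Phoenix, HBase, or ZooKeeper: the function name
count_rejections is made up for illustration, and it assumes each "Too many
connections from /<ip> - max is N" warning sits on a single line of the ZK
server log, in the format quoted further down this thread.

```python
import re
from collections import Counter

# Matches the ZooKeeper server warning quoted in this thread, e.g.
# "... - Too many connections from /10.74.10.228 - max is 60"
TOO_MANY = re.compile(r"Too many connections from /(\S+) - max is (\d+)")


def count_rejections(lines):
    """Return a Counter mapping client IP -> number of rejection warnings.

    `lines` is any iterable of log lines (an open file works).
    Lines that do not contain the warning are ignored.
    """
    counts = Counter()
    for line in lines:
        match = TOO_MANY.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts

# Hypothetical usage against a ZK server log (the path is an assumption):
#   with open("/var/log/zookeeper/zookeeper.log", errors="replace") as log:
#       for ip, n in count_rejections(log).most_common():
#           print(n, ip)
```

Running it before and after the index fix, over the same length of time,
should show whether the RS with the broken index stops dominating the list.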

--
Sukumar

On Mon, Jun 8, 2020, 10:15 PM anil gupta <[email protected]> wrote:

> You were right from the beginning. It is a problem with phoenix secondary
> index!
> I tried the 4LW zk commands after enabling them, but they didn't really
> provide me much extra information.
> Then I took a heap dump and a thread dump of an RS that was throwing a lot
> of max connection errors. Most of the RPCs were busy with:
> *RpcServer.FifoWFPBQ.default.handler=129,queue=9,port=16020*
> *org.apache.phoenix.hbase.index.exception.SingleIndexWriteFailureException
> @ 0x63d837320*
> *org.apache.hadoop.hbase.client.RetriesExhaustedWithDetailsException @
> 0x41f9a0270*
>
> *Failed 499 actions: Table 'DE:TABLE_IDX_NEW' was not found, got:
> DE:TABLE_IDX.: 499 times, *
>
> After finding the above error, I queried the SYSTEM.CATALOG table. DE:TABLE
> has two secondary indexes but only one secondary index table in
> HBase (DE:TABLE_IDX). The other table, DE:TABLE_IDX_NEW, is missing from
> HBase. I am not really sure how this could happen.
> DE:TABLE_IDX_NEW is listed with index_state='i' in the catalog table. Can you
> tell me what this means? (Incomplete?)
> Now I am trying to delete the primary table to get rid of the index, and then
> we can recreate the primary table, since this is a small table, but I am
> unable to do so via Phoenix. Can you please tell me how I can
> *delete this table? (Restart of the cluster? Or doing an upsert in the catalog?)*
>
> *Last one: if table_type='u', is it a user-defined table? If
> table_type='i', is it an index table?*
>
> *Thanks a lot for your help!*
>
> *~Anil *
>
> On Wed, Jun 3, 2020 at 8:14 AM Josh Elser <[email protected]> wrote:
>
>> The RegionServer hosting hbase:meta will certainly have "load" placed
>> onto it, commensurate to the size of your cluster and the number of
>> clients you're running. However, this shouldn't be increasing the amount
>> of connections to ZK from a RegionServer.
>>
>> The RegionServer hosting system.catalog would be unique WRT other
>> Phoenix-table Regions. I don't recall off of the top of my head if there
>> is anything specific in the RegionServer code that runs alongside
>> system.catalog (the MetaDataEndpoint protocol) that reaches out to
>> ZooKeeper.
>>
>> If you're using HDP 2.6.3, I wouldn't be surprised if you're running
>> into known and fixed issues where ZooKeeper connections are not cleaned
>> up. That's multiple-years old code.
>>
>> netstat and tcpdump aren't really going to tell you anything you don't
>> already know. From a thread dump or a heap dump, you'll be able to see the
>> number of ZooKeeper connections from a RegionServer. The 4LW commands
>> from ZK will be able to tell you which clients (i.e. RegionServers) have
>> the most connections. These numbers should match (X connections from a
>> RS to a ZK, and X connections in the Java RS process). The focus would
>> need to be on what opens a new connection and what is not properly
>> closing that connection (in every case).
>>
>> On 6/3/20 4:57 AM, anil gupta wrote:
>> > Thanks for sharing insights. Moving hbase mailing list to cc.
>> > Sorry, forgot to mention that we are using Phoenix 4.7 (HDP 2.6.3). This
>> > cluster is mostly being queried via Phoenix, apart from a few pure NoSQL
>> > cases that use the raw HBase APIs.
>> >
>> > I looked further into the zk logs and found that only 6/15 RS are
>> > constantly running into max connection problems (no other IPs/hosts of
>> > our client apps were found). One of those RS is getting 3-4x the
>> > connection errors compared to the others; this RS is hosting hbase:meta
>> > <http://ip-10-74-10-228.us-west-2.compute.internal:16030/region.jsp?name=1588230740>,
>> > regions of Phoenix secondary indexes, and regions of Phoenix and HBase
>> > tables. I also looked into the other 5 RS that are getting max connection
>> > errors; nothing really stands out to me, since all of them are hosting
>> > regions of Phoenix secondary indexes and regions of Phoenix and HBase
>> > tables.
>> >
>> > I also tried to run netstat and tcpdump on the zk host to find any
>> > anomaly but couldn't find anything apart from the above-mentioned
>> > analysis. I also ran hbck, and it reported that things are fine. I am
>> > still unable to pinpoint the exact problem (maybe something with Phoenix
>> > secondary indexes?). Any other pointers to further debug the problem
>> > will be appreciated.
>> >
>> > Lastly, I constantly see following zk connection loss logs in above
>> > mentioned 6 RS:
>> > 2020-06-03 06:40:30,859 WARN [RpcServer.FifoWFPBQ.default.handler=123,queue=3,port=16020-SendThread(ip-10-74-0-120.us-west-2.compute.internal:2181)] zookeeper.ClientCnxn: Session 0x0 for server ip-10-74-0-120.us-west-2.compute.internal/10.74.0.120:2181, unexpected error, closing socket connection and attempting reconnect
>> > java.io.IOException: Connection reset by peer
>> >          at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
>> >          at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
>> >          at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
>> >          at sun.nio.ch.IOUtil.read(IOUtil.java:192)
>> >          at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:380)
>> >          at org.apache.zookeeper.ClientCnxnSocketNIO.doIO(ClientCnxnSocketNIO.java:68)
>> >          at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:366)
>> >          at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1125)
>> > 2020-06-03 06:40:30,861 INFO [RpcServer.FifoWFPBQ.default.handler=137,queue=17,port=16020-SendThread(ip-10-74-9-182.us-west-2.compute.internal:2181)] zookeeper.ClientCnxn: Opening socket connection to server ip-10-74-9-182.us-west-2.compute.internal/10.74.9.182:2181. Will not attempt to authenticate using SASL (unknown error)
>> > 2020-06-03 06:40:30,861 INFO [RpcServer.FifoWFPBQ.default.handler=137,queue=17,port=16020-SendThread(ip-10-74-9-182.us-west-2.compute.internal:2181)] zookeeper.ClientCnxn: Socket connection established, initiating session, client: /10.74.10.228:60012, server: ip-10-74-9-182.us-west-2.compute.internal/10.74.9.182:2181
>> > 2020-06-03 06:40:30,861 WARN [RpcServer.FifoWFPBQ.default.handler=137,queue=17,port=16020-SendThread(ip-10-74-9-182.us-west-2.compute.internal:2181)] zookeeper.ClientCnxn: Session 0x0 for server ip-10-74-9-182.us-west-2.compute.internal/10.74.9.182:2181, unexpected error, closing socket connection and attempting reconnect
>> > java.io.IOException: Connection reset by peer
>> >          at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
>> >          at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
>> >          at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
>> >          at sun.nio.ch.IOUtil.read(IOUtil.java:192)
>> >          at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:380)
>> >          at org.apache.zookeeper.ClientCnxnSocketNIO.doIO(ClientCnxnSocketNIO.java:68)
>> >          at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:366)
>> >          at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1125)
>> >
>> > Thanks!
>> >
>> > On Tue, Jun 2, 2020 at 6:57 AM Josh Elser <[email protected]
>> > <mailto:[email protected]>> wrote:
>> >
>> >     HBase (daemons) try to use a single connection for themselves. A RS
>> >     also does not need to mutate state in ZK to handle things like gets
>> >     and puts.
>> >
>> >     Phoenix is probably the thing you need to look at more closely
>> >     (especially if you're using an old version of Phoenix that matches
>> >     the old HBase 1.1 version). Internally, Phoenix acts like an HBase
>> >     client which results in a new ZK connection. There have certainly
>> >     been bugs like that in the past (speaking generally, not specifically).
>> >
>> >     On 6/1/20 5:59 PM, anil gupta wrote:
>> >      > Hi Folks,
>> >      >
>> >      > We are running into HBase problems due to hitting the limit of ZK
>> >      > connections. This cluster is running HBase 1.1.x and ZK 3.4.6.x on
>> >      > I3en EC2 instance types in AWS. Almost all of our RegionServers are
>> >      > listed in the zk logs with "Too many connections from /<IP> - max is 60".
>> >      > 2020-06-01 21:42:08,375 - WARN  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@193] - Too many connections from /<ip> - max is 60
>> >      >
>> >      > On average, each RegionServer has ~250 regions. We are also running
>> >      > Phoenix on this cluster. Most of the queries are short range scans,
>> >      > but sometimes we do full table scans too.
>> >      >
>> >      > It seems like one simple fix is to increase the maxClientCnxns
>> >      > property in zoo.cfg to 300, 500, 700, etc. I will probably do that.
>> >      > But I am just curious to know in what scenarios these connections
>> >      > are created/used (Scans/Puts/Deletes or during other RegionServer
>> >      > operations)? Are these also created by HBase clients/apps (my guess
>> >      > is NO)? How can I calculate the optimal value of maxClientCnxns for
>> >      > my cluster/usage?
>> >      >
>> >
>> >
>> >
>> > --
>> > Thanks & Regards,
>> > Anil Gupta
>>
>
>
> --
> Thanks & Regards,
> Anil Gupta
>
