Hi Marcel,

I really appreciate your input. We will try to find out what is wrong in connmgr_get() that is causing the port leak.

In the meantime, we want to change clnt_cots_do_bindresvport from 1 to 0 through mdb so that new connections will start using non-reserved ports:
http://src.illumos.org/source/xref/illumos-gate/usr/src/uts/common/rpc/clnt_cots.c#508

Will changing this variable dynamically have any side effects?

Thanks,
-Youzhong

On Mon, Nov 3, 2014 at 4:50 PM, Marcel Telka <mar...@telka.sk> wrote:
> Hi Youzhong,
>
> On Mon, Nov 03, 2014 at 04:14:55PM -0500, Youzhong Yang via
> illumos-developer wrote:
> > Hello,
> >
> > We are having a very strange issue on one of our servers: fcntl locking
> > over NFS returns 'no locks available' immediately.
> >
> > dtrace shows that bindresvport() returns error code 125 (EADDRINUSE):
> >
> > # dtrace -n 'fbt:rpcmod:bindresvport:return /arg1 != 0/ {stack(); printf("ret = %d", arg1);}'
> >   9  52692                bindresvport:return
> >               rpcmod`connmgr_get+0x560
> >               rpcmod`connmgr_wrapget+0x63
> >               rpcmod`clnt_cots_kcallit+0x198
> >               rpcmod`rpcbind_getaddr+0x245
> >               klmmod`update_host_rpcbinding+0x4f
> >               klmmod`nlm_host_get_rpc+0x6d
> >               klmmod`nlm_do_lock+0x10d
> >               klmmod`nlm4_lock_4_svc+0x2a
> >               klmmod`nlm_dispatch+0xe6
> >               klmmod`nlm_prog_4+0x34
> >               rpcmod`svc_getreq+0x1c1
> >               rpcmod`svc_run+0x146
> >               rpcmod`svc_do_run+0x8e
> >               nfs`nfssys+0xf1
> >               unix`_sys_sysenter_post_swapgs+0x149
> > ret = 125
> >
> > netstat shows that 501 reserved ports are in the BOUND state:
> >
> > # netstat -an | grep BOUND
> > *.935     *.*     0      0 1049740      0 BOUND
> > *.801     *.*     0      0 1049740      0 BOUND
> > *.798     *.*     0      0 1049740      0 BOUND
> > *.561     *.*     0      0 1049740      0 BOUND
> > *.613     *.*     0      0 1049740      0 BOUND
> > ....
> > # netstat -an | grep BOUND | wc -l
> > 501
> >
> > Has anyone seen a similar issue? Is it possible to unbind those reserved
> > ports? Rebooting the server is our last resort.
> >
> > Any advice would be very much appreciated.
>
> I faced a similar issue in connmgr_get().
> It is filed as #1616, and the problem is that the dead connection is not
> properly closed (there seems to be a connmgr_cancelconn() call missing
> somewhere), so the client cannot reconnect properly.
> Unfortunately, I had no time to finish the analysis of this bug.
>
>
> HTH
>
> --
> +-------------------------------------------+
> | Marcel Telka   e-mail:   mar...@telka.sk  |
> |                homepage: http://telka.sk/ |
> |                jabber:   mar...@jabber.sk |
> +-------------------------------------------+

-------------------------------------------
smartos-discuss
Archives: https://www.listbox.com/member/archive/184463/=now
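
A minimal shell sketch of the two steps discussed in this thread: counting sockets in the BOUND state on privileged ports (below 1024) in netstat -an output, and building the mdb write that flips clnt_cots_do_bindresvport from 1 to 0 on the running kernel. The helper names count_reserved_bound and mdb_write_int_cmd are hypothetical. The sketch also assumes clnt_cots_do_bindresvport is a 4-byte int, which the /W write requires; check its declaration in the linked clnt_cots.c first. This only helps confirm port exhaustion and apply the workaround. It does not fix the underlying leak in connmgr_get().

```shell
# Hypothetical helper: count BOUND sockets whose local port is privileged.
# Reads `netstat -an` output on stdin. The local address is the first field
# (e.g. "*.935") and the port is the part after its last ".".
count_reserved_bound() {
    awk '$NF == "BOUND" {
        n = split($1, part, ".")
        port = part[n] + 0
        if (port > 0 && port < 1024) count++
    }
    END { print count + 0 }'
}

# Hypothetical helper: print the mdb(1) command that writes a decimal value
# into a kernel variable. "/W" writes 4 bytes; "0t" marks a decimal number.
# Assumes the target variable is a 4-byte int.
mdb_write_int_cmd() {
    printf '%s/W 0t%s\n' "$1" "$2"
}

# Example usage on the affected host (mdb -kw modifies the live kernel):
#   netstat -an | count_reserved_bound
#   echo 'clnt_cots_do_bindresvport/D' | mdb -k        # read current value
#   mdb_write_int_cmd clnt_cots_do_bindresvport 0 | mdb -kw
```

Keeping the mdb command in a separate print step lets you inspect exactly what will be written before piping it into mdb -kw.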