Re: [Openstack] Horizon and open connections

2013-01-31 Thread Gabriel Hurley
Even though I don't experience this problem (and prefer nginx to apache), I can 
help diagnose:

Connections ending up in CLOSE_WAIT means that the socket isn't being fully 
closed, which is controlled by the client lib (in this case 
python-keystoneclient), which in turn uses httplib2 under the hood.  When 
requests complete successfully httplib2 *does* close the connections just 
fine, so I'm wondering if you're actually triggering some kind of unhandled 
exception in keystoneclient. Are you seeing any errors in your logs anywhere? 
It's also worth noting that httplib2 has some very peculiar retry behaviors 
and other vagaries that come into play when the remote endpoint is 
unresponsive, etc.
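
For what it's worth, here is a minimal sketch of the failure mode I have in 
mind, using plain httplib2 rather than keystoneclient's actual plumbing (the 
keystone URL below is just a placeholder):

    import httplib2

    h = httplib2.Http()
    try:
        # placeholder endpoint; substitute your real keystone URL
        resp, content = h.request("http://keystone.example.com:5000/", "GET")
        # ... an unhandled exception here leaves the cached connection behind ...
    finally:
        # httplib2 caches one open connection per scheme:host in
        # h.connections.  If nothing closes or reuses them, the server
        # eventually times out the idle keep-alive connection and sends a
        # FIN, and the unclosed local socket sits in CLOSE_WAIT.
        for conn in h.connections.values():
            conn.close()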

Another potential problem: if you're running a proxy layer (such as haproxy) 
in the middle, there are various configuration options which can cause the 
connection to remain open even after the backend has sent a complete response 
(adding inappropriate keep-alive headers, stripping Connection: close, 
filtering packets, etc.). The same is true of any other middleware you might be 
running that could get between the python process opening the socket and the 
remote end returning a response.
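
A quick way to rule the proxy in or out is to look at the headers the client 
actually gets back through it; a small sketch (again, the URL is only a 
placeholder for whatever sits in front of keystone in your deployment):

    import httplib2

    h = httplib2.Http()
    # placeholder URL; point it at the endpoint behind your proxy
    resp, _ = h.request("http://keystone.example.com:5000/", "GET")
    # httplib2 returns headers as a dict-like object with lowercase keys
    print(resp.status)
    print(resp.get("connection"))   # 'close' vs 'keep-alive'
    print(resp.get("keep-alive"))   # any keep-alive parameters added en route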

Hope something in there helps,

- Gabriel

 -----Original Message-----
 From: openstack-bounces+gabriel.hurley=nebula@lists.launchpad.net
 [mailto:openstack-
 bounces+gabriel.hurley=nebula@lists.launchpad.net] On Behalf Of Sam
 Morrison
 Sent: Wednesday, January 30, 2013 7:36 PM
 To: openstack@lists.launchpad.net list
 Subject: [Openstack] Horizon and open connections
 
 We have horizon running based on the Ubuntu Folsom Cloud Archive
 packages.
 
 What I notice is that after a while we have thousands of connections in the
 CLOSE_WAIT state to keystone and our nova api servers.
 The host also uses up all its available memory (2GB).
 
 After a restart of apache, all the connections are cleaned up and the memory
 used drops down to about 200MB.
 
 Just wondering if this is supposed to happen or if there is a bug. It seems to me
 that horizon isn't closing connections or something.
 
 Anyone have a similar issue/solution?
 
 Cheers,
 Sam
 
 
 





Re: [Openstack] Horizon and open connections

2013-01-31 Thread Rick Jones

On 01/31/2013 11:59 AM, Gabriel Hurley wrote:

Even though I don't experience this problem (and prefer nginx to
apache), I can help diagnose:

Connections ending up in CLOSE_WAIT means that the socket isn't being
fully closed, which is controlled by the client lib (in this case
python-keystoneclient) which uses httplib2 under the hood.


Expanding on that a bit. CLOSE_WAIT is the state a TCP endpoint will 
enter upon receiving a FINished segment from the remote TCP.  When the 
FIN arrives, the local application will receive notification of this via 
the classic read return of zero on a receive/read call against the 
socket.  The FIN segment means "I will be sending you no more data."


Meanwhile, the local TCP will have ACKed the FIN segment, and the remote 
TCP will transition to FIN_WAIT_2 upon receipt of that ACK (until then 
it will be in FIN_WAIT_1).


Depending on how the remote application triggered the sending of the 
FIN, the TCP connection is now in a perfectly valid simplex state 
wherein the side in CLOSE_WAIT can continue sending data to the side 
which will now be in FIN_WAIT_2.  It is exceedingly rare for 
applications to want a simplex TCP connection.


If such a unidirectional TCP connection is not of any use to an 
application (the common case), then that application should/must also 
close the connection upon the read return of zero.
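
In generic Python socket terms (this is just an illustration, not code from 
Horizon or keystoneclient), that obligation looks roughly like this:

    import socket

    def drain_and_close(sock):
        """Read until the peer half-closes, then close our side as well."""
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:
                # the classic read return of zero: the peer has sent its FIN
                break
            chunks.append(data)
        # without this close(), this end of the connection stays in
        # CLOSE_WAIT until the socket is garbage-collected or the
        # process exits
        sock.close()
        return b"".join(chunks)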


Thus, seeing lots of connections stuck in CLOSE_WAIT is an indication 
of an application-level (relative to TCP) bug wherein the application on 
the CLOSE_WAIT side is ignoring the read return of zero.


Such bugs in applications may be masked by a few things:

1) If the remote side called close() rather than shutdown(SHUT_WR) (the 
difference is sketched below, after this list) then an attempt on the 
CLOSE_WAIT side to send data to the remote will cause the remote TCP to 
return a RST segment (reset) because there is no longer anything above 
TCP to receive the data.  This will then cause the local TCP to 
terminate the connection.  This may also happen if the local application 
set SO_KEEPALIVE to enable TCP keepalives.


2) If the local side doesn't send anything, and doesn't have TCP 
keepalives set, but the remote TCP has a FIN_WAIT_2 timer of some sort 
going (long story involving a hole in the TCP specification and 
implementation workarounds, email if you want to hear it), then when that 
FIN_WAIT_2 timer expires the remote TCP may send a RST segment.
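
A minimal sketch of that close() vs shutdown(SHUT_WR) difference, again with 
generic Python sockets rather than anything from the OpenStack code:

    import socket

    def finish_with_close(sock):
        # Full close: sends our FIN and discards the endpoint.  Anything
        # the peer sends afterwards is answered with a RST, because there
        # is no longer anything above TCP to receive it.
        sock.close()

    def finish_with_half_close(sock):
        # Half close: sends our FIN but keeps the receive side open,
        # giving the rare-but-valid simplex connection described above;
        # the peer (now in CLOSE_WAIT) can keep sending and we keep reading.
        sock.shutdown(socket.SHUT_WR)
        while True:
            data = sock.recv(4096)
            if not data:
                break
        sock.close()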


RST segments are best effort in sending - they don't get retransmitted 
explicitly.  In case 1 if the RST segment doesn't make it back, the 
local TCP will retransmit the data it was sending (because it will not 
have received an ACKnowledgement either).  It will then either receive 
the RST triggered by that retransmission, or if no RSTs ever make it 
back, the local TCP will at some point reach its retransmission limit 
and terminate the connection.  In case 2, if that one RST is lost, 
that's it, and the CLOSE_WAIT may remain forever.


Again though, given the rarity of actual application use of a simplex 
TCP connection, 99 times out of 10 seeing lots of CLOSE_WAIT 
connections building up implies a bug in the application or in the 
libraries doing work on its behalf.


rick jones



[Openstack] Horizon and open connections

2013-01-30 Thread Sam Morrison
We have horizon running based on the Ubuntu Folsom Cloud Archive packages.

What I notice is that after a while we have thousands of connections in the 
CLOSE_WAIT state to keystone and our nova api servers. 
The host also uses up all its available memory (2GB).

After a restart of apache, all the connections are cleaned up and the memory 
used drops down to about 200MB.

Just wondering if this is supposed to happen or if there is a bug. It seems to me 
that horizon isn't closing connections or something.

Anyone have a similar issue/solution?

Cheers,
Sam



___
Mailing list: https://launchpad.net/~openstack
Post to : openstack@lists.launchpad.net
Unsubscribe : https://launchpad.net/~openstack
More help   : https://help.launchpad.net/ListHelp