I don't believe we've seen this issue since 15:55 UTC. If you've been seeing
hangs more often, you may be experiencing a different problem. Or, equally
likely, a separate set of things is going wrong.

I've restarted the cluster twice today, but that should only cause closed
connections, not hangs.


On Mon, Feb 22, 2010 at 1:53 PM, Scott Wilcox <sc...@tig.gr> wrote:

> Same issue here, and more than ever over the past six hours.
> On 22 Feb 2010, at 19:13, John Kalucki wrote:
> One further note: a reasonable workaround for the moment, if your client
> allows ioctls, is to set a socket timeout of about 90 seconds on your
> Streaming API connection. The servers currently send a newline every 30
> seconds. If you stop receiving newlines for 90 seconds, your connection is
> probably high and dry, and you should reconnect. Please be sure to continue
> honoring the reconnection policies described in the wiki, however.
> -John Kalucki
> http://twitter.com/jkalucki
> Infrastructure, Twitter Inc.
> On Mon, Feb 22, 2010 at 10:51 AM, John Kalucki <j...@twitter.com> wrote:
>> A number of developers have reported abandoned connection issues on the
>> Streaming API starting, perhaps, about two weeks ago. The symptoms include a
>> long-established TCP connection to stream.twitter.com going quiet, with
>> the connection mysteriously held open for perhaps hours afterward. After
>> sorting through a lot of conflicting data and chasing a few wild geese, I
>> finally reproduced this problem at Feb 22 15:55 UTC (7:55am PST). I'd
>> imagine that a number of streams were abandoned at this time. If you saw a
>> corresponding failure within a minute or so of 15:55 UTC, please reply to
>> this message.
>> We currently suspect an infrequent hardware load balancer issue, perhaps
>> related to a recent configuration change. It appears that the load
>> balancer is, for whatever reason, dropping valid connections: it closes the
>> connection to the Streaming API servers but never sends a TCP FIN or TCP
>> RST to the client. This is bad. We're treating this as a critical production
>> issue and working through the details with network operations. I'll follow
>> up as we learn more.
>> -John Kalucki
>> http://twitter.com/jkalucki
>> Infrastructure, Twitter Inc.
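
The socket-timeout workaround suggested earlier in the thread could look roughly like the sketch below. It assumes a plain Python socket that is already connected to the stream, and it leaves out connection setup, authentication, and HTTP parsing. Blank lines are treated as the 30-second keep-alive newlines. If no bytes at all arrive for 90 seconds, the function raises instead of waiting forever on a connection the load balancer may have silently dropped. `read_stream`, `StreamStalled`, and `STALL_TIMEOUT_SECONDS` are hypothetical names and are not part of any Twitter library.

```python
import socket

# Assumption: servers send a keep-alive newline roughly every 30 seconds,
# so 90 seconds of total silence (about three missed keep-alives) is
# treated as an abandoned connection.
STALL_TIMEOUT_SECONDS = 90.0


class StreamStalled(Exception):
    """Hypothetical error: no bytes, not even a keep-alive, arrived in time."""


def read_stream(sock, handle_line, timeout=STALL_TIMEOUT_SECONDS):
    """Read newline-delimited data from sock and pass each non-blank line
    (as bytes) to handle_line. Blank lines are keep-alives and are skipped.

    Returns normally if the peer closes the connection (FIN received).
    Raises StreamStalled if a single recv() waits longer than `timeout`.
    """
    sock.settimeout(timeout)  # applies to each individual recv() call
    buffer = b""
    while True:
        try:
            chunk = sock.recv(4096)
        except socket.timeout as exc:
            raise StreamStalled(f"no data for {timeout} seconds") from exc
        if not chunk:
            return  # orderly close by the peer
        buffer += chunk
        # Split out every complete line; keep any partial line buffered.
        while b"\n" in buffer:
            line, buffer = buffer.split(b"\n", 1)
            line = line.strip()  # also drops a trailing "\r"
            if line:
                handle_line(line)
```

When `StreamStalled` is raised, close the socket and reconnect, while still following the reconnection policies in the wiki. The sketch deliberately leaves backoff timing to that policy. Because the timeout applies to each `recv()` call, any received bytes, including a bare keep-alive newline, reset the 90-second clock.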
