Thanks, John and the network team! ;-)
On Feb 24, 8:48 pm, John Kalucki <j...@twitter.com> wrote:
> Network operations believes it has found and fixed the proximate cause of
> the connection abandonment issue: a periodically overwhelmed load balancer
> (LB) CPU. The LB should close connections in this case, but for some reason
> it wasn't closing them over the last two weeks. Load will be managed more
> carefully on this particular LB pair to avoid this problem in the future. If
> you see abandoned connections, let's dig into the issue, but for now, I think
> the system is in a good state.
> -John Kalucki
> http://twitter.com/jkalucki
> Infrastructure, Twitter Inc.
> On Tue, Feb 23, 2010 at 9:37 AM, John Kalucki <j...@twitter.com> wrote:
> > Judging by some additional data found yesterday, this drop apparently
> > happens most often around 14:30 and 16:00 UTC, the period when we also
> > climb steeply into our daily peak traffic. By our monitoring, we have not
> > experienced a connection drop so far this morning. But I have no confidence
> > that all streams are dropped during every event, and it's possible that our
> > monitoring streams were just lucky while real client drops were lost in the
> > noise of organic connection churn.
> > If you have data to the contrary between, say, 14:00 and 18:00 UTC today,
> > please let us know. Otherwise, we're going to keep watching and waiting for
> > this to happen again. Once a drop occurs, we have a team of networking
> > engineers ready to run through a pre-planned sequence of investigative
> > steps. With any luck, we'll identify the issue.
> > -John Kalucki
> > Infrastructure, Twitter Inc.
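For anyone checking their own logs against the 14:00 to 18:00 UTC window John mentions, here is a minimal sketch in Python. It assumes logs in the Phirehose idle-timeout format Sergi quotes further down this thread, and assumes the log timestamps are already in UTC (Phirehose writes local server time, so adjust if yours differs). The helper names `stall_starts` and `in_window` are hypothetical, not part of Phirehose.

```python
import re
from datetime import datetime, timedelta, timezone

# Matches an idle-timeout line like the one quoted in this thread, e.g.
# "[22-Feb-2010 16:01:22] Phirehose: Idle timeout: No statuses received for > 300 seconds. Reconnecting."
IDLE_RE = re.compile(
    r"^\[(\d{2}-[A-Za-z]{3}-\d{4} \d{2}:\d{2}:\d{2})\] Phirehose: Idle timeout: "
    r"No statuses received for > (\d+) seconds"
)


def stall_starts(lines):
    """Yield the latest time (UTC) at which each stall could have begun.

    ASSUMPTION: log timestamps are UTC. "No statuses for > N seconds" means
    the last status arrived at least N seconds before the timeout fired, so
    (fire time - N) is an upper bound on when the stream went quiet.
    """
    for line in lines:
        m = IDLE_RE.match(line)
        if m:
            fired = datetime.strptime(m.group(1), "%d-%b-%Y %H:%M:%S")
            fired = fired.replace(tzinfo=timezone.utc)
            yield fired - timedelta(seconds=int(m.group(2)))


def in_window(ts, start_hour=14, end_hour=18):
    """True if ts falls in [start_hour:00, end_hour:00) UTC on its day."""
    return start_hour <= ts.hour < end_hour
```

Fed Sergi's 16:01:22 line, this reports a stall that began no later than 15:56:22 UTC, inside the window. Treat the result as a rough bound, since Phirehose only notices a stall once its idle timeout fires.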
> > On Mon, Feb 22, 2010 at 4:30 PM, Sergi <sdepab...@gmail.com> wrote:
> >> I last experienced the problem today (well, yesterday by now) at 15:55,
> >> and Phirehose reconnected after 5 minutes.
> >> [22-Feb-2010 15:55:25] Phirehose: Consume rate: 0 status/sec (1 total), avg enqueueStatus(): 0.05ms, avg checkFilterPredicates(): 0.01ms (3 total) over 60 seconds.
> >> [22-Feb-2010 16:01:22] Phirehose: Idle timeout: No statuses received for > 300 seconds. Reconnecting.
> >> Sergi
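Sergi's log shows the client-side defence that recovered his stream: an idle timeout that reconnects when nothing has arrived for 300 seconds. The sketch below illustrates that idea in Python; it is not Phirehose's actual implementation, and the `IdleWatchdog` class is a hypothetical helper. The 300-second default simply mirrors the value in the log above.

```python
import time


class IdleWatchdog:
    """Report when a stream has been silent for longer than `timeout` seconds.

    Hypothetical helper illustrating an idle timeout like the one in the
    Phirehose log above; not Phirehose's code.
    """

    def __init__(self, timeout=300.0, clock=time.monotonic):
        self.timeout = timeout
        self.clock = clock  # injectable so the logic can be tested without sleeping
        self.last_activity = clock()

    def touch(self):
        """Record activity; call this whenever data arrives on the stream."""
        self.last_activity = self.clock()

    def expired(self):
        """True once the stream has been quiet for more than `timeout` seconds."""
        return self.clock() - self.last_activity > self.timeout
```

In a read loop, give the socket a short read timeout (a second or so) so the loop wakes regularly even when no data arrives, call `touch()` on every successful read, and tear down and reopen the connection as soon as `expired()` returns True. A monotonic clock is used so wall-clock adjustments cannot trigger or mask a timeout.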
> >> On Feb 22, 7:51 pm, John Kalucki <j...@twitter.com> wrote:
> >> > A number of developers have reported abandoned connection issues on the
> >> > Streaming API starting, perhaps, about two weeks ago. The symptoms include
> >> > a long-established TCP connection to stream.twitter.com going quiet, with
> >> > the connection mysteriously held open for perhaps hours afterward. After
> >> > sorting through a lot of conflicting data and chasing a few wild geese, I
> >> > finally reproduced this problem on Feb 22 at 15:55 UTC (7:55am PST). I'd
> >> > imagine that a number of streams were abandoned at this time. If you had a
> >> > correlative experience within a minute or so of 15:55 UTC, please respond
> >> > to this message.
> >> > We currently suspect an infrequent hardware load balancer issue, perhaps
> >> > related to a recent configuration change. The load balancer appears to be
> >> > dropping valid connections for some reason: it closes the connection to
> >> > the Streaming API servers but sends neither a TCP FIN nor a TCP RST to the
> >> > client. This is bad. We're treating this as a critical production issue
> >> > and working through the details with network operations. I'll follow up
> >> > as we learn more.
> >> > -John Kalucki
> >> > http://twitter.com/jkalucki
> >> > Infrastructure, Twitter Inc.
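Why a missing FIN or RST leaves clients hanging: without a FIN the client never sees end-of-file, and without an RST it never sees an error, so a plain blocking read simply waits forever. A minimal Python sketch of the usual guard, assuming a standard socket: set a read timeout so silence becomes a distinct, detectable outcome. The `read_or_timeout` helper is hypothetical; `socket.socketpair()` is used below only to simulate a peer that goes quiet.

```python
import socket


def read_or_timeout(sock, timeout_s, bufsize=4096):
    """Read from sock without blocking forever.

    Returns the bytes read, b"" if the peer closed cleanly (FIN -> EOF),
    or None if nothing arrived within timeout_s. A peer that vanishes
    without FIN or RST, as described above, only ever yields None.
    """
    sock.settimeout(timeout_s)
    try:
        return sock.recv(bufsize)
    except socket.timeout:
        return None
```

A `None` result is the cue to count toward an idle timeout and, once that limit is reached, reconnect; the timeout value should be well above the longest gap you expect between messages on a healthy stream.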