Hi,

I've been reading a lot in the Twitter Streaming API doc and in this
group about techniques to handle filter updates. I've got a good
picture of the best practices, but I'm having a hard time applying them to
my particular situation.

In my case, I've got a filtered stream where the filters will be
updated based on the current user activities on my site. The filter
updates won't happen that frequently, but when they do, they have to
happen with as little latency as possible. This isn't a big problem in
probably 80% of the cases where the last filter update happened more
than 2 minutes ago, as I can happily disconnect and reconnect
immediately and stay within the rules.

However, when multiple filter updates arrive within the same 2-minute
window, I have an issue. The unlucky user whose request came in just
after a previous update gets stuck waiting up to the full 2 minutes
before anything happens for them. They'll get bored and walk away!
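
To make the problem concrete, here is a minimal sketch of the wait
calculation, assuming the 2-minute minimum between reconnects described
above. The function and constant names are my own for illustration and
are not part of any Twitter API.

```python
MIN_RECONNECT_INTERVAL = 120.0  # seconds; the 2-minute minimum described above


def seconds_until_reconnect(last_reconnect, now,
                            min_interval=MIN_RECONNECT_INTERVAL):
    """Return how long a filter update arriving at `now` has to wait
    before the stream may reconnect with the new filter."""
    elapsed = now - last_reconnect
    return max(0.0, min_interval - elapsed)


# Update arriving long after the last reconnect: no wait.
print(seconds_until_reconnect(last_reconnect=0.0, now=300.0))  # 0.0
# Update arriving 5 seconds after the last reconnect: the unlucky user.
print(seconds_until_reconnect(last_reconnect=0.0, now=5.0))    # 115.0
```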

The approaches to filter updates in the doc and in this group mainly
talk about two concurrent streams: one primary stream with an
elevated role and a second interim stream with default elevation.
This approach works well for making filter changes with minimal
interruption to the high-volume stream, but it does little or nothing
to reduce the update latency. The worst-case update latency is still
2 minutes for the poor sucker who came in just after a reconnect on
the second (default elevation) stream.

Some of the ideas I'm considering are:

1. Running four concurrent streams under four different Twitter
accounts and spreading the overall filter criteria between them
(without predicate overlap, to avoid wastage). I'd round-robin any
filter changes across the streams, so I should be able to cut the
average latency to roughly a quarter. This seems within the rules
since I'm using four different accounts, but I'm concerned that unless
I connect from four different IPs it'll be seen as a grey area and I
risk being banned.

2. Bending the rules a little and bringing my minimum time before
reconnect down to 30 seconds, hoping that if 80% or more of the time I
respect the 2 minute minimum reconnect interval (and actually stay
connected a LOT longer in most cases), I can get away with
reconnecting a little more often during edge cases.

3. Running a single stream, and when filter changes are needed and I'm
still within the 2 minute reconnect window, faking a stream with
multiple queries until the reconnect is allowed, at which point I
transition to the reconnected stream. While this might be strictly
within the rules, I'm convinced that the multiple query hits while
waiting for the reconnect window to open would have a higher impact on
Twitter than an extra reconnect within the 2 minute window every now
and then.
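
For comparing ideas 1 and 2, here's a rough sketch of how the waits
work out. It is only a model under my own assumptions: each stream has
its own minimum reconnect interval, filter changes are handed to
streams in round-robin order, and a change that lands on a stream with
a reconnect already pending simply joins that reconnect. The class and
method names are made up for illustration and nothing here talks to
Twitter.

```python
class RoundRobinScheduler:
    """Model the wait per filter update across several streams."""

    def __init__(self, num_streams, min_interval=120.0):
        self.min_interval = min_interval
        # Time of each stream's most recent reconnect (may be in the future
        # if a reconnect is already pending). -inf means "never reconnected".
        self.last_reconnect = [float("-inf")] * num_streams
        self._next = 0

    def schedule(self, now):
        """Assign an update arriving at `now`; return (stream_index, wait)."""
        i = self._next
        self._next = (self._next + 1) % len(self.last_reconnect)
        if self.last_reconnect[i] > now:
            # A reconnect is already pending on this stream: piggyback on it.
            when = self.last_reconnect[i]
        else:
            # Reconnect as soon as this stream's minimum interval allows.
            when = max(now, self.last_reconnect[i] + self.min_interval)
            self.last_reconnect[i] = when
        return i, when - now


arrivals = [0.0, 5.0, 10.0, 15.0, 20.0]  # a burst of updates, in seconds

single = RoundRobinScheduler(num_streams=1)
print([single.schedule(t)[1] for t in arrivals])
# [0.0, 115.0, 110.0, 105.0, 100.0]

four = RoundRobinScheduler(num_streams=4)
print([four.schedule(t)[1] for t in arrivals])
# [0.0, 0.0, 0.0, 0.0, 100.0]
```

Passing min_interval=30.0 with a single stream models idea 2 in the
same way. Note that the fifth update in a burst still waits on the
four-stream setup, so the worst case only improves if bursts stay
smaller than the number of streams.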

Can anyone shed some light on which of these approaches is preferable,
or propose a different/better one? The goal for me is being able to
adapt the stream criteria to my current user load with the change
taking effect as quickly as possible - I can probably wait 30 seconds
for an update, but 2 minutes will be tough!

Thanks,
Toby.
