So, I've got a nice bunch of Bayesian filters doing spam detection, tweet categorization, and link canonicalization and classification. The stuff runs great on http://stltweet.com now, but I'm looking to share the load with other properties I'm developing for other locations/verticals. In this load sharing, I want one processor doing the link work and one processor doing the tweet-processing work across all properties. (Of course there will be N machines doing the work, but I only want to do the work once per...)
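The "do the work once across all properties" part could be sketched as a dispatcher that routes each incoming item to the right processor's queue and dedupes so nothing gets classified twice. The queue names, item shape, and dedup key here are all assumptions for illustration, not my actual code:

```python
import queue

# Hypothetical sketch: one queue per processor type, with a seen-set so
# an item that shows up via multiple properties is only classified once.
link_queue = queue.Queue()   # consumed by the link-work processor
tweet_queue = queue.Queue()  # consumed by the tweet-processing processor
_seen = set()

def dispatch(item):
    """Route an item to the right processor's queue, skipping duplicates.

    `item` is assumed to be a dict like {"kind": "link", "id": canonical_url}
    or {"kind": "tweet", "id": tweet_id}. Returns True if enqueued,
    False if this item was already handled.
    """
    key = (item["kind"], item["id"])
    if key in _seen:
        return False  # already classified once; don't repeat the work
    _seen.add(key)
    if item["kind"] == "link":
        link_queue.put(item)
    else:
        tweet_queue.put(item)
    return True
```

In a real multi-machine setup the seen-set would live somewhere shared (a database or cache) rather than in-process, but the routing idea is the same.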
So the ideal thing would be some way to emit the applicable metadata as annotations in a new tweet in the tweet stream, placing the new "classification, typing & labeling" information on the NEW tweet. When I create that tweet, I would make it "in reply to" the original tweet being classified, to easily link the two.

It SEEMS like this is the ideal solution, in general, to the post-mutability of tweet annotations: just tweet another tweet with the annotations you want to apply to the original tweet, set the in-reply-to-tweet-id, and go about business. When that new tweet is seen by the "in the know" application, it knows to apply the metadata retroactively to the original tweet in whatever manner it wishes. Think things like a "read" flag, a star rating, a sentiment analysis, etc. Heck, you could even track triggered trouble-ticket numbers like this.

I don't mind someone else seeing all these tweets (honest, don't care), but I wonder how Twitter will feel about what are essentially just automated tweets used as a broadcast communication channel. These tweets would not be very interesting in themselves, because the tweet message would be essentially irrelevant. How do I keep from triggering spam filters, and how do I get past the tweets-per-day limits for this sort of work?

So, any ideas?
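For what it's worth, the annotation-tweet scheme I'm describing could look something like this: pack the metadata into the status text, post it with the reply-to id pointing at the original tweet, and have the "in the know" consumer parse it back out. The "@anno" prefix and the JSON payload format are just assumptions to make the sketch concrete, not a real convention:

```python
import json

# Hypothetical payload convention for an annotation tweet.
ANNO_PREFIX = "@anno "

def build_annotation_status(original_tweet_id, metadata):
    """Build the (status_text, in_reply_to_tweet_id) pair to post.

    The metadata dict (e.g. a read flag, star rating, sentiment) is
    serialized into the tweet body; the reply-to id links it back to
    the tweet being classified.
    """
    text = ANNO_PREFIX + json.dumps(metadata, separators=(",", ":"))
    if len(text) > 140:  # the per-tweet length limit caps the payload
        raise ValueError("annotation metadata too large for one tweet")
    return text, original_tweet_id

def parse_annotation_status(status_text):
    """Recover the metadata dict from an annotation tweet, or None
    if this is an ordinary tweet the consumer should ignore."""
    if not status_text.startswith(ANNO_PREFIX):
        return None
    return json.loads(status_text[len(ANNO_PREFIX):])
```

The consuming application would watch the stream, call `parse_annotation_status` on each tweet, and when it gets a dict back, apply it retroactively to the tweet named by the in-reply-to id.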