On Thu, Mar 5, 2009 at 8:09 PM, TjL <[email protected]> wrote:
>
> Specifically
>
> 1) There are WAY too many "trending topic" bots which fill search
> results with useless clutter
>
> 2) I'd love to see a "trending topics" list that does NOT include hash
> tags, you know, to find out what ordinary people are talking about :-)
>
> I know this is the wrong place for it (sorry) but I'm not sure where else
> to go.

Wrong place? I don't think so...

I've got code that's doing a fairly good job of identifying robots --
essentially any user who posts an unusually high number of tweets in a
short time and doesn't follow many people. (My goal is to analyze tweets
by real people, not automated systems.)

The code is relatively simple -- it looks at how many updates the user
has made over time, short-term and long-term. My long-term period is 12
days, but that misses the new bots that seem to pop up every day, so I
also look at the last 12 hours. The long-term check also looks at the
following count: if the user is following fewer than 100 people, that
seems to be a strong clue that it's a bot. I'm sure I can refine the
rules further, but it hasn't been a priority, since this seems to be
working reasonably well.

There are a bit more than 1,800 users in the list, which isn't just
trending bots, but all sorts of automated users. I've added it as a blog
page here:

http://www.twurlednews.com/twitter-bots/

Grab everything between the pre tags and you'll have a CSV list of
screen names and ids. I assume it would be more useful as a plain CSV
file; if there's interest in that, I'll make it available and keep it
updated. The current update schedule for TwURLed News is 15 minutes, so
it should stay fairly fresh.

Other than an API, which I don't really have time to create right now,
is there another form this data could take that would be useful? I could
create a Twitter user and send a tweet every time I identify a new
possible bot. But if this is sufficient, I have other things I'd like to
work on... ;-)

I've been thinking about adding a page that lists them, which would give
you a filter list you could use to remove them from results.
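For the curious, the rules above can be sketched roughly like this. The time windows (12 days and 12 hours) and the following-count threshold (100) are the ones I described; the exact tweet-volume cutoffs here are made-up placeholders, since the real values are still being tuned:

```python
from dataclasses import dataclass

@dataclass
class UserActivity:
    tweets_last_12_days: int   # long-term update count
    tweets_last_12_hours: int  # short-term update count, catches new bots
    following_count: int       # how many accounts the user follows

# Illustrative cutoffs only -- tune against real data.
LONG_TERM_MAX = 2400   # ~200 tweets/day sustained over 12 days
SHORT_TERM_MAX = 100   # burst of updates in the last 12 hours
MIN_FOLLOWING = 100    # bots tend to follow very few people

def looks_like_bot(user: UserActivity) -> bool:
    # Long-term rule: sustained high volume combined with a low
    # following count.
    long_term = (user.tweets_last_12_days > LONG_TERM_MAX
                 and user.following_count < MIN_FOLLOWING)
    # Short-term rule: catches newly created bots that the 12-day
    # window would miss.
    short_term = user.tweets_last_12_hours > SHORT_TERM_MAX
    return long_term or short_term

print(looks_like_bot(UserActivity(3000, 150, 20)))  # True: bursty, few follows
print(looks_like_bot(UserActivity(50, 2, 500)))     # False: ordinary user
```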
One of these days, I might even put that in an API. This is something that's been done for years for user-agents on the web, so that advertisers and others can isolate robots from web analytics.
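In the meantime, using the published list as a filter is straightforward. Here's a sketch that assumes the CSV is one "screen_name,id" pair per line, as described above; in practice you'd fetch the page and pull out the text between the pre tags first:

```python
import csv
import io

# Stand-in for the text extracted from between the <pre> tags at
# http://www.twurlednews.com/twitter-bots/ (names and ids invented here).
sample_csv = """trendingbot,12345
newsbot99,67890
"""

# Build a set of known-bot screen names for fast lookup.
bot_names = {row[0] for row in csv.reader(io.StringIO(sample_csv))}

tweets = [
    {"screen_name": "trendingbot", "text": "#trending #spam #topics"},
    {"screen_name": "alice", "text": "lunch downtown was great"},
]

# Keep only tweets from accounts not on the bot list.
human_tweets = [t for t in tweets if t["screen_name"] not in bot_names]
print([t["screen_name"] for t in human_tweets])  # ['alice']
```

Since the list refreshes every 15 minutes, re-fetching it on roughly that schedule would keep a filter like this current.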
