On Thu, Mar 5, 2009 at 8:09 PM, TjL <[email protected]> wrote:

>
> Specifically
>
> 1) There are WAY too many "trending topic" bots which fill search
> results with useless clutter


>
> 2) I'd love to see a "trending topics" list that does NOT include hash
> tags, you know, to find out what ordinary people are talking about :-)
>
> I know this is the wrong place for it (sorry) but I'm not sure where else
> to go.


Wrong place?  I don't think so...

I've got code that's doing a fairly good job of identifying robots --
essentially any user who posts an unusually high number of tweets in a short
time and doesn't follow many people.  (My goal is to analyze tweets by real
people, not automated systems.)  The code is relatively simple -- it looks at
how many updates the user has made over time, both short-term and long-term.
My long-term period is 12 days, but that misses the new bots that seem to pop
up every day, so I also look at the last 12 hours.  The long-term check also
looks at the following count.  If the user is following fewer than 100
people, that seems to be a strong clue that it's a bot.  I'm sure I can
refine the rules further, but it hasn't been a priority, since this seems to
be working reasonably well.
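For anyone who wants to try something similar, the heuristic above might be
sketched like this.  The two window lengths and the "following < 100" clue
come straight from the description; the field names and the exact rate
cutoffs are my own placeholder assumptions, not the real rules:

```python
from dataclasses import dataclass

@dataclass
class UserStats:
    """Per-user counters over the two windows described above."""
    tweets_last_12_days: int
    tweets_last_12_hours: int
    following_count: int

# Assumed cutoffs -- tune these against real data; only the window
# lengths (12 days / 12 hours) and the following-count threshold of 100
# come from the description above.
LONG_TERM_LIMIT = 600    # ~50 tweets/day sustained over 12 days
SHORT_TERM_LIMIT = 50    # a burst in the last 12 hours
MIN_FOLLOWING = 100

def looks_like_bot(u: UserStats) -> bool:
    # Long-term check: sustained high volume plus a low following count.
    if u.tweets_last_12_days > LONG_TERM_LIMIT and u.following_count < MIN_FOLLOWING:
        return True
    # Short-term check: catches newly created bots the 12-day window misses.
    if u.tweets_last_12_hours > SHORT_TERM_LIMIT:
        return True
    return False
```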

There are a bit more than 1,800 users in the list, which covers not just
trending bots but all sorts of automated accounts.

I've added it as a blog page here: http://www.twurlednews.com/twitter-bots/

Grab everything between the pre tags and you'll have a CSV list of screen
names and ids.
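Scraping that should only take a few lines.  Here's a rough sketch that
assumes the page wraps the whole CSV in a single pre block with
"screen_name,id" rows, as described; the exact page markup is an assumption:

```python
import csv
import io
import re
import urllib.request

def parse_bot_list(html: str) -> list[list[str]]:
    """Extract the screen_name,id CSV rows from between the pre tags."""
    match = re.search(r"<pre[^>]*>(.*?)</pre>", html, re.S | re.I)
    if not match:
        return []
    return list(csv.reader(io.StringIO(match.group(1).strip())))

def fetch_bot_list(url: str = "http://www.twurlednews.com/twitter-bots/") -> list[list[str]]:
    """Download the blog page and parse out the bot list."""
    html = urllib.request.urlopen(url).read().decode("utf-8", errors="replace")
    return parse_bot_list(html)
```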

I assume it would be more useful as a plain CSV file.  If there's interest
in that, I'll make it available and keep it updated.  The current update
schedule for TwURLed News is 15 minutes, so it should stay fairly fresh.
Other than an API, which I don't really have time to create right now, is
there another form this data could take that would be useful?  I could
create a Twitter user and send a tweet every time I identify a new possible
bot.  But if this is sufficient, I have other things I'd like to work on...
;-)

I've been thinking about adding a page that lists them, which would give you
a filter list you could use to remove them from results.  One of these
days, I might even put that in an API. This is something that's been done
for years for user-agents on the web, so that advertisers and others can
isolate robots from web analytics.
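Using the list as a filter would be straightforward.  A minimal sketch,
assuming tweets are dicts with a "screen_name" key (the tweet structure here
is my assumption, not anything from the list format itself):

```python
def filter_bots(tweets: list[dict], bot_screen_names: list[str]) -> list[dict]:
    """Drop any tweet whose author appears in the bot list.

    Screen names are compared case-insensitively, since Twitter screen
    names are not case-sensitive.
    """
    bots = {name.lower() for name in bot_screen_names}
    return [t for t in tweets if t["screen_name"].lower() not in bots]
```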
