Re: [twitter-dev] need twitter spam for a research project

M. Edward (Ed) Borasky Sun, 03 Apr 2011 20:05:26 -0700

On Sun, 3 Apr 2011 18:19:38 -0700 (PDT), Jeff Tucker <[email protected]> wrote:

I'm conducting a research project involving proactively identifying
twitter spam accounts before they actually start spamming.  I've
observed that some spammers attempt to create tweets that look like
they're a legitimate account prior to actually sending spam and my
project is to be able to identify those accounts as soon as they pop
up.


Unfortunately (I can't believe that I'm writing this) I am having a
hard time getting spammers to actually spam me.  Is there any way that
I can somehow get access to the tweets of several dozen spam accounts
(prior to when they're shut down) so that I can see what they're
posting?  Is this possible somehow?

Also, if anyone gets spammed regularly, are you interested in helping
me out with my research?  No guarantee that I'll actually publish
this, but anyone interested will be credited in my paper in the
acknowledgements.  Thanks
-Jeff Tucker
Lecturer, DigiPen Institute of Technology
www.digipen.edu

I don't know how rapidly Twitter detects and shuts spam accounts down these days. I imagine there's a priority scheme, with accounts linking to malware and pr0n shut down more aggressively than those that are just "selling stuff" and being annoying about it. Here's a bit of pseudo-code that will get you one class of spammers:

1. Poll the Trending Topics periodically. IIRC if you do it every ten minutes for all the localities you won't use up all your API calls per hour.

2. Do a search for each trending topic - take the first 100 tweets for each. This doesn't cost you any API calls, since it's a search.

3. Now use a relational database to find tweets that match more than one trending topic. There's a high probability those are spam. Quite a few of the other tweets will be spam too, but those that match multiple trends are much more likely to be spam.

4. Now you have a list of accounts - pull their most recent 3200 tweets and test your algorithm. You'll probably have to manually go through them to find the boundary where the account started spamming, but then you should have a nice dataset for training a classifier.
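
Step 3 is the only part that needs real logic, so here is a rough sketch of it in Python using an in-memory SQLite database. Everything in it is an assumption for illustration: the `hits` table, its column names, the `multi_trend_tweets` function, and the sample rows are all made up, and it doesn't call the Twitter API at all. You'd fill the table with one row per (trend, tweet) pair from your own search results in step 2.

```python
import sqlite3


def multi_trend_tweets(hits, min_trends=2):
    """Return (screen_name, tweet_id, n_trends) for tweets that showed up
    in the search results of at least `min_trends` distinct trends.

    `hits` is an iterable of (trend, tweet_id, screen_name, text) tuples,
    one per search hit. The layout is hypothetical, not a Twitter format.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE hits ("
        "trend TEXT, tweet_id INTEGER, screen_name TEXT, text TEXT)"
    )
    conn.executemany("INSERT INTO hits VALUES (?, ?, ?, ?)", hits)
    rows = conn.execute(
        """
        SELECT screen_name, tweet_id, COUNT(DISTINCT trend) AS n_trends
        FROM hits
        GROUP BY tweet_id, screen_name
        HAVING COUNT(DISTINCT trend) >= ?
        ORDER BY n_trends DESC, tweet_id
        """,
        (min_trends,),
    ).fetchall()
    conn.close()
    return rows


# Made-up sample data: tweet 2 was returned by three different trend searches.
sample_hits = [
    ("#trendA", 1, "alice", "enjoying #trendA today"),
    ("#trendA", 2, "spam_bot", "#trendA #trendB #trendC cheap stuff"),
    ("#trendB", 2, "spam_bot", "#trendA #trendB #trendC cheap stuff"),
    ("#trendC", 2, "spam_bot", "#trendA #trendB #trendC cheap stuff"),
    ("#trendB", 3, "bob", "what is #trendB about?"),
]

suspects = multi_trend_tweets(sample_hits)
print(suspects)
```

The distinct screen names in the result are the candidate accounts for step 4. Raising `min_trends` should trade recall for precision, but I haven't measured that on real data.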



--
http://twitter.com/znmeb http://borasky-research.net

"A mathematician is a device for turning coffee into theorems." -- Paul Erdős


--
Twitter developer documentation and resources: http://dev.twitter.com/doc
API updates via Twitter: http://twitter.com/twitterapi
Issues/Enhancements Tracker: http://code.google.com/p/twitter-api/issues/list
Change your membership to this group: 
http://groups.google.com/group/twitter-development-talk
