All of your sample spam tweets are from suspended accounts, yet the
tweets were only sent yesterday. That means that the spammers' behavior
was so aggressive that they were suspended quickly by a Twitter
algorithm. I doubt that a human at Twitter read your email and went
through each tweet suspending the accounts. Have you checked to see
how quickly these spam accounts get suspended for other spam tweets?
You could hold back tweets from unknown users for 24 hours, and then
check all new users through the API to see if they are suspended. If
they aren't suspended, you can whitelist them in your system.
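
Here is a minimal sketch of that hold-back idea in Python. The
is_suspended callable is a placeholder for whatever API lookup you use
(it is not a real Twitter API function), and the class and method names
are made up for illustration.

```python
import time
from collections import deque

HOLD_SECONDS = 24 * 60 * 60  # hold tweets from unknown users for 24 hours


class HoldbackQueue:
    """Quarantine tweets from unknown users until their account is checked."""

    def __init__(self, is_suspended, hold_seconds=HOLD_SECONDS, clock=time.time):
        self.is_suspended = is_suspended  # hypothetical: screen_name -> bool
        self.hold_seconds = hold_seconds
        self.clock = clock
        self.whitelist = set()
        self.blacklist = set()
        self.held = deque()  # (arrival_time, screen_name, tweet), oldest first

    def submit(self, screen_name, tweet):
        """Return the tweets that can be saved right away (zero or one)."""
        if screen_name in self.blacklist:
            return []
        if screen_name in self.whitelist:
            return [tweet]
        self.held.append((self.clock(), screen_name, tweet))
        return []

    def release_due(self):
        """Check users whose hold period has expired; return tweets to save."""
        released = []
        now = self.clock()
        while self.held and now - self.held[0][0] >= self.hold_seconds:
            _, screen_name, tweet = self.held.popleft()
            if screen_name in self.blacklist:
                continue
            if screen_name not in self.whitelist:
                if self.is_suspended(screen_name):
                    self.blacklist.add(screen_name)
                    continue
                self.whitelist.add(screen_name)
            released.append(tweet)
        return released
```

You would call submit() from the tweet parsing loop and release_due()
from a periodic job. Each unknown user is looked up at most once, when
their oldest held tweet comes due.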

What is really weird is that I also checked the URLs in these tweets
and they resolve to an empty page. They return a header with an HTTP
code of 200, and no content at all. That can't be an accident. Either
they are sending empty responses to everyone, or they could tell from
my IP that they didn't want to send anything to me. Why would a
spammer do that? They only benefit if someone clicks on their links
and buys something, or gets infected somehow. Could you be the subject
of some kind of attack? You use the word "community." Would anyone
want to disrupt your community? Is this a community that is in one
geographic area that can be detected by IP? Very interesting...

Anyway, you can use URL resolution to test new users. When you get a
tweet from a new user with a URL, check the URL, and blacklist them if
it resolves to an empty page. If you only have to do this for new
users, it won't be too processor intensive.
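
A rough sketch of that check in Python, using only the standard
library. The function names are my own, pulling the URLs out of the
tweet text is left to the caller, and "empty" is assumed to mean an
HTTP 200 whose body is blank or only whitespace.

```python
import urllib.error
import urllib.request


def fetch(url, timeout=10):
    """Fetch a URL, following redirects; return (status_code, body_bytes)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.status, resp.read()


def is_empty_page(status, body):
    """True for the pattern described above: HTTP 200 with no content."""
    return status == 200 and len(body.strip()) == 0


def should_blacklist(urls, fetcher=fetch):
    """Return True if any URL in a new user's tweet resolves to an empty page."""
    for url in urls:
        try:
            status, body = fetcher(url)
        except (urllib.error.URLError, OSError, ValueError):
            continue  # unreachable or malformed URL: no verdict either way
        if is_empty_page(status, body):
            return True
    return False
```

If the empty responses really do depend on the requesting IP, your
server may see something different from what your users see, so it is
probably safer to treat this as one signal among several.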


On Sat, Nov 27, 2010 at 5:20 AM, Furkan Kuru <furkank...@gmail.com> wrote:
> The text in these spam tweets is not easy to recognize.
> It does not repeat. The tweets are a mix of different words and contain a
> link.
> They seem to be sent via web.
>
> Ranking and discarding some mentions will not completely resolve the
> problem,
> because our mention data and trending words data were both affected. We
> do not want to eliminate tweets from innocent people who have few followers.
>
> The simplest way seems to be just ignoring the tweets coming from outside of
> the community.
> But those tweets were helping us to extend our network.
>
>
>
> On Fri, Nov 26, 2010 at 6:42 PM, Adam Green <140...@gmail.com> wrote:
>>
>> As long as you aren't trying to capture and deliver *all* tweets,
>> there are a couple of good ways to cut out spammers. One thing I do is
>> save all mentions for all users in a database of tweets. When a tweet
>> comes in from the streaming API, I collect @mentions, and store them
>> with the screen name of the tweet's author and the screen name
>> mentioned. Then I can rank users based on the number of different
>> accounts that mention them. If you only use the tweets from the top N%
>> of users, the quality improves a lot. I find that the top 80% is
>> usually enough of a screen to get good quality.
>>
>> Another trick is blocking duplicates from each user. The API only
>> blocks duplicates that repeat immediately, but if a spammer has a list
>> of tweets, and cycles through them, all the tweets get through. I
>> compare all new tweets with the other tweets from that user. This is
>> very expensive if you have a big database. This can be made less
>> intensive by limiting the comparison to just the tweets from that user
>> in the last few days. You can also run this with a separate process
>> that doesn't slow down your main tweet parsing loop. Most spammers are
>> so simplistic that they just repeat the same tweet over and over. In a
>> real spammy set of keywords, if I find more than a few duplicates from
>> a user, I just stop saving their tweets.
>>
>>
>> On Fri, Nov 26, 2010 at 11:26 AM, Furkan Kuru <furkank...@gmail.com>
>> wrote:
>> >
>> > The word "lol" is the most common in these spam tweets. We receive 400 spam
>> > tweets per hour now, tracking 100K people.
>> >
>> > We plan to delete all of the tweets containing the word "lol". It is also
>> > used by our users (Turkish people) writing in English, though.
>> >
>> > Any better suggestions?
>> >
>>
>> --
>> Adam Green
>> Twitter API Consultant and Trainer
>> http://140dev.com
>> @140dev
>>
>
>
>
> --
> Furkan Kuru
>



-- 
Adam Green
Twitter API Consultant and Trainer
http://140dev.com
@140dev

-- 
Twitter developer documentation and resources: http://dev.twitter.com/doc
API updates via Twitter: http://twitter.com/twitterapi
Issues/Enhancements Tracker: http://code.google.com/p/twitter-api/issues/list
Change your membership to this group: 
http://groups.google.com/group/twitter-development-talk
