An empty URL? It may resolve only if a user clicks. I'm sure there is backend code
running; that is the only purpose of even returning a 200.

On Nov 27, 2010, at 8:33 AM, Adam Green wrote:

> All of your sample spam tweets are from suspended accounts, yet the
> tweets were only sent yesterday. That means that the spammers' behavior
> was so aggressive that they were suspended quickly by a Twitter
> algorithm. I doubt that a human at Twitter read your email and went
> through each tweet suspending the accounts. Have you checked to see
> how quickly these spam accounts get canceled for other spam tweets?
> You could hold back tweets from unknown users for 24 hours, and then
> check all new users through the API to see if they are suspended. If
> they aren't suspended, you can whitelist them in your system.
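
The hold-and-check idea above can be sketched roughly as follows. This is only a minimal illustration, not anyone's production code: `Quarantine`, `HOLD_PERIOD`, and the `is_suspended` callable are hypothetical names, and `is_suspended` stands in for whatever API lookup you use to find out whether an account has been suspended.

```python
from datetime import datetime, timedelta

# Assumption: hold tweets from unknown users for 24 hours, as suggested above.
HOLD_PERIOD = timedelta(hours=24)


class Quarantine:
    """Hold tweets from unknown users, then release or drop them."""

    def __init__(self, is_suspended):
        # Hypothetical callable: screen_name -> bool (True if suspended).
        self.is_suspended = is_suspended
        self.held = {}          # screen_name -> (first_seen, [tweets])
        self.whitelist = set()  # users who survived the hold period

    def submit(self, screen_name, tweet, now):
        """Return the tweets that may be processed right away."""
        if screen_name in self.whitelist:
            return [tweet]
        _, tweets = self.held.setdefault(screen_name, (now, []))
        tweets.append(tweet)
        return []

    def review(self, now):
        """Check users whose hold period is over; return released tweets."""
        released = []
        for name in list(self.held):
            first_seen, tweets = self.held[name]
            if now - first_seen < HOLD_PERIOD:
                continue
            del self.held[name]
            if not self.is_suspended(name):
                self.whitelist.add(name)
                released.extend(tweets)
            # Tweets from suspended accounts are simply dropped.
        return released
```

Calling `review()` from a periodic job keeps the suspension lookups out of the main tweet-handling path.
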
> 
> What is really weird is that I also checked the URLs in these tweets
> and they resolve to an empty page. They return a header with an HTTP
> code of 200, and no content at all. That can't be an accident. Either
> they are sending empty responses to everyone, or they could tell from
> my IP that they didn't want to send anything to me. Why would a
> spammer do that? They only benefit if someone clicks on their links
> and buys something, or gets infected somehow. Could you be the subject
> of some kind of attack? You use the word "community." Would anyone
> want to disrupt your community? Is this a community that is in one
> geographic area that can be detected by IP? Very interesting...
> 
> Anyway, you can use URL resolution to test new users. When you get a
> tweet from a new user with a URL, check the URL, and blacklist them if
> it resolves to an empty page. If you only have to do this for new
> users, it won't be too processor intensive.
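
A rough sketch of the URL test described above, using only the Python standard library. The function names are made up for illustration, and treating a whitespace-only body as "empty" is an assumption on my part. The fetcher is passed in as a parameter so the decision logic can be tried without touching the network.

```python
import urllib.request


def fetch(url, timeout=10):
    """Fetch a URL (following redirects) and return (status, body bytes)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.status, resp.read()


def looks_like_empty_page(status, body):
    """True for an HTTP 200 response with no content at all."""
    return status == 200 and len(body.strip()) == 0


def should_blacklist(url, fetcher=fetch):
    """Decide whether a new user's link resolves to an empty page."""
    try:
        status, body = fetcher(url)
    except (OSError, ValueError):
        # Network errors, HTTP error codes, and malformed URLs are not
        # the "empty 200" pattern, so they are not treated as spam here.
        return False
    return looks_like_empty_page(status, body)
```

As noted above, the response may depend on the requesting IP, so a non-empty page does not prove the link is harmless.
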
> 
> 
> On Sat, Nov 27, 2010 at 5:20 AM, Furkan Kuru <furkank...@gmail.com> wrote:
>> The text in these spam tweets is not easy to recognize.
>> They do not repeat. They are a mix of different words and they contain a
>> link.
>> They seem to be sent via web.
>> 
>> Ranking and discarding some mentions will not completely resolve the
>> problem, because our mention data and trending-words data were both
>> affected. We do not want to eliminate tweets from innocent people who
>> have few followers.
>> 
>> The simplest way seems to be just ignoring the tweets coming from outside of
>> the community.
>> But those tweets were helping us to extend our network.
>> 
>> 
>> 
>> On Fri, Nov 26, 2010 at 6:42 PM, Adam Green <140...@gmail.com> wrote:
>>> 
>>> As long as you aren't trying to capture and deliver *all* tweets,
>>> there are a couple of good ways to cut out spammers. One thing I do is
>>> save all mentions for all users in a database of tweets. When a tweet
>>> comes in from the streaming API, I collect @mentions, and store them
>>> with the screen name of the tweet's author and the screen name
>>> mentioned. Then I can rank users based on the number of different
>>> accounts that mention them. If you only use the tweets from the top N%
>>> of users, the quality improves a lot. I find that the top 80% is
>>> usually enough of a screen to get good quality.
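
The mention-ranking approach above might look something like this in outline. It is a sketch under assumptions: an in-memory dict stands in for the database of tweets, the `@(\w+)` pattern is a simplification of how screen names are matched, and `record_mentions` and `top_users` are hypothetical names.

```python
import re
from collections import defaultdict

# Simplified pattern for @mentions; an assumption, not Twitter's exact rules.
MENTION_RE = re.compile(r"@(\w+)")


def record_mentions(mentioners, author, text):
    """Store each (mentioned user -> distinct authors) pair from a tweet."""
    for name in MENTION_RE.findall(text):
        if name.lower() != author.lower():  # ignore self-mentions
            mentioners[name.lower()].add(author.lower())


def top_users(mentioners, top_fraction=0.8):
    """Rank users by the number of different accounts mentioning them
    and keep only the top fraction (ties broken alphabetically)."""
    ranked = sorted(mentioners, key=lambda u: (-len(mentioners[u]), u))
    keep = int(len(ranked) * top_fraction)
    return ranked[:keep]


mentioners = defaultdict(set)
record_mentions(mentioners, "alice", "hi @bob and @carol")
record_mentions(mentioners, "dave", "@bob hello")
record_mentions(mentioners, "erin", "@bob meet @frank")
record_mentions(mentioners, "alice", "@bob again")  # same author, not counted twice
```

Because a set of authors is stored per mentioned user, one account mentioning someone many times still counts once.
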
>>> 
>>> Another trick is blocking duplicates from each user. The API only
>>> blocks duplicates that repeat immediately, but if a spammer has a list
>>> of tweets, and cycles through them, all the tweets get through. I
>>> compare all new tweets with the other tweets from that user. This is
>>> very expensive if you have a big database. This can be made less
>>> intensive by limiting the comparison to just the tweets from that user
>>> in the last few days. You can also run this with a separate process
>>> that doesn't slow down your main tweet parsing loop. Most spammers are
>>> so simplistic that they just repeat the same tweet over and over. In a
>>> real spammy set of keywords, if I find more than a few duplicates from
>>> a user, I just stop saving their tweets.
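
A minimal sketch of the per-user duplicate check described above. The three-day window, the threshold of three duplicates, and the whitespace/case normalization are illustrative assumptions ("the last few days" and "more than a few" are not quantified above), and `DuplicateFilter` is a hypothetical name.

```python
from collections import defaultdict, deque
from datetime import datetime, timedelta

WINDOW = timedelta(days=3)  # assumption: "the last few days"
MAX_DUPLICATES = 3          # assumption: "more than a few duplicates"


class DuplicateFilter:
    """Reject tweets that repeat a user's own recent tweets."""

    def __init__(self):
        self.recent = defaultdict(deque)    # user -> deque of (time, text)
        self.dup_counts = defaultdict(int)  # user -> duplicates seen so far
        self.blocked = set()                # users we stopped saving

    @staticmethod
    def normalize(text):
        """Lower-case and collapse whitespace so trivial edits still match."""
        return " ".join(text.lower().split())

    def accept(self, user, text, now):
        """Return True if the tweet should be saved."""
        if user in self.blocked:
            return False
        history = self.recent[user]
        # Drop entries older than the window to keep comparisons cheap.
        while history and now - history[0][0] > WINDOW:
            history.popleft()
        key = self.normalize(text)
        is_dup = any(old == key for _, old in history)
        history.append((now, key))
        if is_dup:
            self.dup_counts[user] += 1
            if self.dup_counts[user] > MAX_DUPLICATES:
                self.blocked.add(user)
            return False
        return True
```

Only exact matches after normalization are caught, so this would not help against tweets built from mixed random words.
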
>>> 
>>> 
>>> On Fri, Nov 26, 2010 at 11:26 AM, Furkan Kuru <furkank...@gmail.com>
>>> wrote:
>>>> 
>>>> The word "lol" is the most common in these spam tweets. We now receive
>>>> 400 spam tweets per hour while tracking 100K people.
>>>>
>>>> We plan to delete all tweets containing the word "lol". However, it is
>>>> also used by our users (Turkish people) writing in English.
>>>> 
>>>> Any better suggestions?
>>>> 
>>> 
>>> --
>>> Adam Green
>>> Twitter API Consultant and Trainer
>>> http://140dev.com
>>> @140dev
>>> 
>>> --
>>> Twitter developer documentation and resources: http://dev.twitter.com/doc
>>> API updates via Twitter: http://twitter.com/twitterapi
>>> Issues/Enhancements Tracker:
>>> http://code.google.com/p/twitter-api/issues/list
>>> Change your membership to this group:
>>> http://groups.google.com/group/twitter-development-talk
>> 
>> 
>> 
>> --
>> Furkan Kuru
>> 
>> 
> 
> 
> 
> -- 
> Adam Green
> Twitter API Consultant and Trainer
> http://140dev.com
> @140dev
> 


Regards,

--------------------
Edward Hotchkiss
edw...@edwardhotchkiss.com
http://www.edwardhotchkiss.com/
--------------------


