
Re: [twitter-dev] Trying to get rid of twitter spammers

2010-12-11 Thread Furkan Kuru

Unfortunately, we do not have time to implement a spam filter or ranking
algorithm.

Besides, I think this issue should be resolved on Twitter's side.

Some accounts are sending replies to *all* Twitter users.
I think these spam accounts and their tweets should be analyzed.

The behaviour I see:

Open a new Twitter account.
Follow no one.
Send hundreds of spam messages as replies to other people.
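The pattern described above (new account, follows no one, floods replies) could be turned into a simple flag on our own side. This is a minimal sketch: `AccountSnapshot` is a hypothetical summary we would build from API data, and the 7-day age limit and 100-replies-per-day threshold are assumed example values, not numbers from this thread.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class AccountSnapshot:
    # Hypothetical per-account summary built from API fields
    # (creation date, following count) plus our own reply counts.
    created_at: datetime
    following_count: int
    replies_last_day: int


def looks_like_reply_spammer(acct, now, max_age=timedelta(days=7),
                             reply_threshold=100):
    """Flag a new account that follows nobody but sends many replies.

    The thresholds are illustrative guesses and would need tuning.
    """
    is_new = now - acct.created_at <= max_age
    follows_nobody = acct.following_count == 0
    floods_replies = acct.replies_last_day >= reply_threshold
    return is_new and follows_nobody and floods_replies
```

All three conditions must hold, so a quiet new user or an established account with many replies is not flagged.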

As I said earlier, the tweets all contain the word "lol".

Examples:

https://twitter.com/madiav_isBOMB
https://twitter.com/ddubplneandonly

For more of the ones caught by our system (sent as replies to Turkish Twitter users):
http://twitturk.com/tweet/search?q=lol

On Sun, Nov 28, 2010 at 12:10 AM, Adam Green 140...@gmail.com wrote:

My final suggestion is to rank users by something (age of account,
number of mentions/mentioners/followers/following) and cut out the
bottom N%.
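Adam's final suggestion could be sketched roughly as below. The thread names the signals (account age, mentions, followers) but not how to combine them, so the `UserStats` fields and the weights in `score` are assumptions for illustration only.

```python
from dataclasses import dataclass


@dataclass
class UserStats:
    # Hypothetical per-user signals gathered from the stream and the API.
    screen_name: str
    account_age_days: int
    distinct_mentioners: int
    followers: int


def score(u):
    # Assumed weighting; any monotonic combination of the signals would do.
    return u.account_age_days / 30 + 2 * u.distinct_mentioners + u.followers / 100


def drop_bottom_percent(users, percent):
    """Return the screen names left after cutting the lowest-scoring `percent`%."""
    ranked = sorted(users, key=score)  # ascending, lowest scores first
    cut = int(len(ranked) * percent / 100)
    return {u.screen_name for u in ranked[cut:]}
```

For example, with four users and `percent=25`, only the single lowest-scoring account is cut.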

On Sat, Nov 27, 2010 at 4:18 PM, Furkan Kuru furkank...@gmail.com wrote:

Another hosting account would be problematic to maintain.
I have looked at a few more short URLs. They redirect to a very wide range
of sites, not just Amazon.

I think Twitter could raise the priority of "Report for spam" for newly
opened accounts, and also take their number of tweets per hour into account.
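The tweets-per-hour signal could also be tracked on the receiving side. A minimal sketch of a per-author sliding one-hour window, assuming each author's timestamps arrive in order; the limit of 30 tweets per hour is an assumed example value, not something stated in the thread.

```python
from collections import defaultdict, deque
from datetime import timedelta


class HourlyRateTracker:
    """Count each author's tweets over a sliding one-hour window.

    Assumes timestamps for a given author arrive in non-decreasing order.
    """

    def __init__(self, limit=30, window=timedelta(hours=1)):
        self.limit = limit          # assumed threshold, tune as needed
        self.window = window
        self._times = defaultdict(deque)  # author -> recent timestamps

    def record(self, author, ts):
        """Record one tweet; return True if the author is now over the limit."""
        q = self._times[author]
        q.append(ts)
        # Drop timestamps that have fallen out of the window.
        while q and ts - q[0] >= self.window:
            q.popleft()
        return len(q) > self.limit
```

A deque keeps the per-tweet cost proportional to the number of expired entries rather than re-scanning the whole history.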

Here again is the link that shows the tweets sent as replies to Turkish
people; the word "lol" is common to them:
http://twitturk.com/tweet/search?q=lol

And an example account:
http://twitter.com/Bomuchellxee
All of its tweets are spam, and they all contain "lol".
It also has 0 following and 3 followers (real accounts, I guess).
Unbelievable!

On Sat, Nov 27, 2010 at 4:29 PM, Adam Green 140...@gmail.com wrote:

Now you know that it does resolve differently in different countries.
You could set up an account with a webhost in the US, and have a
script there that you can call with URLs in tweets from new users. If
the URL resolves to a blank page, blacklist that user. There are
plenty of good hosts that only charge $7 a month. Sounds extreme, but
these are very clever spammers.

Or you could just resolve URLs from new users, and blacklist them if
the URL points to Amazon. That will work as long as they still point
to Amazon.
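Adam's "resolve and check for Amazon" idea might look like the sketch below. The blocklist containing only `amazon.com` is an assumption taken from the single example in this thread; as noted elsewhere in the thread, the short URLs turned out to redirect to many different sites, so a fixed list like this will go stale quickly.

```python
from urllib.parse import urlparse
from urllib.request import urlopen

# Assumed blocklist: the only destination reported in this thread.
BLOCKED_DOMAINS = {"amazon.com"}


def final_url(short_url, timeout=10.0):
    """Fetch a short URL and return the URL reached after redirects.

    urlopen follows HTTP redirects by default; geturl() reports where
    the request finally landed.
    """
    with urlopen(short_url, timeout=timeout) as resp:
        return resp.geturl()


def points_to_blocked_domain(url, blocked=BLOCKED_DOMAINS):
    """True if the URL's host is a blocked domain or one of its subdomains."""
    host = (urlparse(url).hostname or "").lower()
    return any(host == d or host.endswith("." + d) for d in blocked)
```

In use, a link from a new user would be checked with `points_to_blocked_domain(final_url(link))`, wrapped in a `try`/`except OSError` so that a network failure does not blacklist anyone. Matching on the host suffix avoids false hits such as `notamazon.com`.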

On Sat, Nov 27, 2010 at 9:12 AM, Furkan Kuru furkank...@gmail.com wrote:
It returns a redirect to an amazon.com product page.

Example:

http://www.amazon.com/gp/product/B0041E16RC?ie=UTF8&tag=iphone403d-20&linkCode=as2&camp=1789&creative=9325&creativeASIN=B0041E16RC

On Sat, Nov 27, 2010 at 4:04 PM, Adam Green 140...@gmail.com wrote:

The URLs again return a code of 200 and nothing in the content. What
happens when you try getting one of the URLs with cURL? I'm curious if
it behaves differently for an IP in Turkey.

On Sat, Nov 27, 2010 at 8:56 AM, Furkan Kuru furkank...@gmail.com wrote:
Most of the tweets here are spam:

http://twitturk.com/tweet/search?q=lol

On Sat, Nov 27, 2010 at 3:33 PM, Adam Green 140...@gmail.com wrote:

All of your sample spam tweets are from suspended accounts, yet the
tweets were only sent yesterday. That means that the spammers' behavior
was so aggressive that they were suspended quickly by a Twitter
algorithm. I doubt that a human at Twitter read your email and went
through each tweet suspending the accounts. Have you checked to see
how quickly these spam accounts get canceled for other spam tweets?
You could hold back tweets from unknown users for 24 hours, and then
check all new users through the API to see if they are suspended. If
they aren't suspended, you can whitelist them in your system.
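The hold-back-then-whitelist idea above could be sketched as a small queue. This is a sketch under assumptions: `is_suspended` is a hypothetical callback standing in for whatever API lookup is used to check an account, and the in-memory lists stand in for a real database.

```python
from dataclasses import dataclass, field
from datetime import timedelta
from typing import Callable


@dataclass
class HoldBackQueue:
    """Hold tweets from unknown users, then release or drop them later."""

    is_suspended: Callable[[str], bool]  # hypothetical API-backed check
    delay: timedelta = timedelta(hours=24)
    whitelist: set = field(default_factory=set)
    _pending: list = field(default_factory=list)  # (ts, author, tweet)

    def submit(self, author, tweet, ts):
        """Return the tweet at once if the author is trusted, else queue it."""
        if author in self.whitelist:
            return tweet
        self._pending.append((ts, author, tweet))
        return None

    def release_due(self, now):
        """Check authors whose tweets have waited long enough."""
        released, still_waiting = [], []
        for ts, author, tweet in self._pending:
            if now - ts < self.delay:
                still_waiting.append((ts, author, tweet))
            elif author in self.whitelist or not self.is_suspended(author):
                self.whitelist.add(author)  # survived the delay: trust them
                released.append(tweet)
            # otherwise the author was suspended and the tweet is dropped
        self._pending = still_waiting
        return released
```

`release_due` would be called periodically (for example once an hour); once an author is whitelisted, later tweets pass straight through `submit`.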

What is really weird is that I also checked the URLs in these tweets
and they resolve to an empty page. They return a header with an HTTP
code of 200, and no content at all. That can't be an accident. Either
they are sending empty responses to everyone, or they could tell from
my IP that they didn't want to send anything to me. Why would a
spammer do that? They only benefit if someone clicks on their links
and buys something, or gets infected somehow. Could you be the subject
of some kind of attack? You use the word community. Would anyone
want to disrupt your community? Is this a community that is in one
geographic area that can be detected by IP? Very interesting...

Anyway, you can use URL resolution to test new users. When you get a
tweet from a new user with a URL, check the URL, and blacklist them if
it resolves to an empty page. If you only have to do this for new
users, it won't be too processor intensive.
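The empty-page test for new users could be sketched as follows. The function names are hypothetical, and since this thread found that the same URL behaved differently from the US and from Turkey, the result depends on where the check runs.

```python
from urllib.request import urlopen


def is_empty_200(status, body):
    """True for the suspicious pattern: HTTP 200 with no content."""
    return status == 200 and len(body.strip()) == 0


def fetch_status_and_body(url, timeout=10.0, max_bytes=65536):
    """Fetch a URL (following redirects) and return (status, body bytes)."""
    with urlopen(url, timeout=timeout) as resp:
        return resp.status, resp.read(max_bytes)


def should_blacklist_new_user(tweet_urls, fetch=fetch_status_and_body):
    """Blacklist a new user if any URL in their tweet resolves to an empty page."""
    for url in tweet_urls:
        try:
            status, body = fetch(url)
        except OSError:
            continue  # network or HTTP errors: don't blacklist on a failed fetch
        if is_empty_200(status, body):
            return True
    return False
```

The `fetch` parameter makes the decision logic testable without network access, and capping the read at `max_bytes` keeps the check cheap, in line with only running it for new users.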

On Sat, Nov 27, 2010 at 5:20 AM, Furkan Kuru furkank...@gmail.com wrote:
The text in these spam tweets is not easy to recognize.
The tweets do not repeat. They are a mix of different words and they contain a
link.
They seem to be sent via the web.
Re: [twitter-dev] Trying to get rid of twitter spammers

2010-11-27 Thread Furkan Kuru

The text in these spam tweets is not easy to recognize.
The tweets do not repeat. They are a mix of different words and they contain a
link.
They seem to be sent via the web.

Ranking users and discarding some mentions will not completely resolve the
problem, because both our mention data and our trending-words data were
affected. We do not want to eliminate tweets from innocent people who have
few followers.

The simplest way seems to be just ignoring tweets coming from outside
the community.
But those tweets were helping us to extend our network.



On Fri, Nov 26, 2010 at 6:42 PM, Adam Green 140...@gmail.com wrote:

 As long as you aren't trying to capture and deliver *all* tweets,
 there are a couple of good ways to cut out spammers. One thing I do is
 save all mentions for all users in a database of tweets. When a tweet
 comes in from the streaming API, I collect @mentions, and store them
 with the screen name of the tweet's author and the screen name
 mentioned. Then I can rank users based on the number of different
 accounts that mention them. If you only use the tweets from the top N%
 of users, the quality improves a lot. I find that the top 80% is
 usually enough of a screen to get good quality.
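 The mention-ranking approach above could be sketched in memory like this (Adam describes storing the pairs in a database). Two details are assumptions not taken from the message: the regex only approximates Twitter screen-name rules, and self-mentions are ignored. The 80% default follows the figure given above.

```python
import math
import re
from collections import defaultdict

# Approximate screen-name pattern: "@" not preceded by a word character.
MENTION_RE = re.compile(r"(?<!\w)@(\w{1,15})")


class MentionIndex:
    def __init__(self):
        # mentioned screen name -> set of distinct authors who mentioned it
        self._mentioners = defaultdict(set)

    def add_tweet(self, author, text):
        """Store (author, mentioned) pairs from one tweet."""
        for name in MENTION_RE.findall(text):
            name = name.lower()
            if name != author.lower():  # assumption: ignore self-mentions
                self._mentioners[name].add(author.lower())

    def distinct_mentioners(self, screen_name):
        return len(self._mentioners.get(screen_name.lower(), ()))

    def top_users(self, candidates, keep_percent=80):
        """Keep the top `keep_percent`% of candidates by distinct mentioners."""
        ranked = sorted(candidates, key=self.distinct_mentioners, reverse=True)
        keep = math.ceil(len(ranked) * keep_percent / 100)
        return set(ranked[:keep])
```

 Counting *distinct* mentioners rather than total mentions means a single spam account mentioning someone hundreds of times adds only one to that user's score.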

 Another trick is blocking duplicates from each user. The API only
 blocks duplicates that repeat immediately, but if a spammer has a list
 of tweets, and cycles through them, all the tweets get through. I
 compare all new tweets with the other tweets from that user. This is
 very expensive if you have a big database. This can be made less
 intensive by limiting the comparison to just the tweets from that user
 in the last few days. You can also run this with a separate process
 that doesn't slow down your main tweet parsing loop. Most spammers are
 so simplistic that they just repeat the same tweet over and over. In a
 real spammy set of keywords, if I find more than a few duplicates from
 a user, I just stop saving their tweets.
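 The per-user duplicate check above could be sketched as follows. The 3-day window and the limit of 3 duplicates are assumed values standing in for "the last few days" and "more than a few", and the whitespace/case normalization is an addition so that trivially edited repeats still match.

```python
from collections import defaultdict, deque
from datetime import timedelta


def normalize(text):
    """Lowercase and collapse whitespace so trivial edits still match."""
    return " ".join(text.lower().split())


class DuplicateFilter:
    def __init__(self, window=timedelta(days=3), max_duplicates=3):
        self.window = window                  # compare only recent tweets
        self.max_duplicates = max_duplicates  # block after more than this
        self._recent = defaultdict(deque)     # author -> (ts, normalized text)
        self._dup_counts = defaultdict(int)
        self.blocked = set()

    def accept(self, author, text, ts):
        """Return True if the tweet should be saved."""
        if author in self.blocked:
            return False
        recent = self._recent[author]
        # Forget this author's tweets that are older than the window.
        while recent and ts - recent[0][0] > self.window:
            recent.popleft()
        key = normalize(text)
        if any(prev == key for _, prev in recent):
            self._dup_counts[author] += 1
            if self._dup_counts[author] > self.max_duplicates:
                self.blocked.add(author)  # stop saving this user's tweets
            return False
        recent.append((ts, key))
        return True
```

 Because each author's history is trimmed to the window, the comparison cost stays bounded even with a large database, and this can run in a separate process from the main parsing loop as suggested.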


 On Fri, Nov 26, 2010 at 11:26 AM, Furkan Kuru furkank...@gmail.com
 wrote:
 
  The word lol is the most common one in these spam tweets. We receive 400 spam
  tweets per hour now, tracking 100K people.
 
  We plan to delete all of the tweets containing the word lol. It is also
  used by our users (Turkish people) writing in English, though.
 
  Any better suggestions?
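  One way to soften the planned "lol" filter, so that it does not also delete tweets from the community's own users, is to apply it only to unknown authors. A minimal sketch: `known_users` is a hypothetical set (for example the tracked or whitelisted accounts), and matching on word boundaries is an assumption so that words merely containing "lol" are kept.

```python
import re

# Whole-word, case-insensitive match for "lol".
LOL_RE = re.compile(r"\blol\b", re.IGNORECASE)


def should_drop(author, text, known_users):
    """Drop a tweet containing the word "lol" only if its author is unknown."""
    return author not in known_users and LOL_RE.search(text) is not None
```

  Known users who write "lol" in English are unaffected, while new accounts replying with "lol" spam are filtered.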
 

 --
 Adam Green
 Twitter API Consultant and Trainer
 http://140dev.com
 @140dev

 --
 Twitter developer documentation and resources: http://dev.twitter.com/doc
 API updates via Twitter: http://twitter.com/twitterapi
 Issues/Enhancements Tracker:
 http://code.google.com/p/twitter-api/issues/list
 Change your membership to this group:
 http://groups.google.com/group/twitter-development-talk




-- 
Furkan Kuru


Re: [twitter-dev] Trying to get rid of twitter spammers

2010-11-27 Thread Adam Green

All of your sample spam tweets are from suspended accounts, yet the
tweets were only sent yesterday. That means that the spammers' behavior
was so aggressive that they were suspended quickly by a Twitter
algorithm. I doubt that a human at Twitter read your email and went
through each tweet suspending the accounts. Have you checked to see
how quickly these spam accounts get canceled for other spam tweets?
You could hold back tweets from unknown users for 24 hours, and then
check all new users through the API to see if they are suspended. If
they aren't suspended, you can whitelist them in your system.

What is really weird is that I also checked the URLs in these tweets
and they resolve to an empty page. They return a header with an HTTP
code of 200, and no content at all. That can't be an accident. Either
they are sending empty responses to everyone, or they could tell from
my IP that they didn't want to send anything to me. Why would a
spammer do that? They only benefit if someone clicks on their links
and buys something, or gets infected somehow. Could you be the subject
of some kind of attack? You use the word community. Would anyone
want to disrupt your community? Is this a community that is in one
geographic area that can be detected by IP? Very interesting...

Anyway, you can use URL resolution to test new users. When you get a
tweet from a new user with a URL, check the URL, and blacklist them if
it resolves to an empty page. If you only have to do this for new
users, it won't be too processor intensive.



Re: [twitter-dev] Trying to get rid of twitter spammers

2010-11-27 Thread Edward Hotchkiss

Empty URL? I'm sure there is backend code running that resolves it only if
a user actually clicks; that is the only purpose of even returning a 200.


Re: [twitter-dev] Trying to get rid of twitter spammers

2010-11-27 Thread Furkan Kuru

Most of the tweets here are spam:

http://twitturk.com/tweet/search?q=lol




Re: [twitter-dev] Trying to get rid of twitter spammers

2010-11-27 Thread Adam Green

The URLs again return a code of 200 and nothing in the content. What
happens when you try getting one of the URLs with cURL? I'm curious if
it behaves differently for an IP in Turkey.


Re: [twitter-dev] Trying to get rid of twitter spammers

2010-11-27 Thread Furkan Kuru

It returns a redirect to an amazon.com product page.

Example:

http://www.amazon.com/gp/product/B0041E16RC?ie=UTF8&tag=iphone403d-20&linkCode=as2&camp=1789&creative=9325&creativeASIN=B0041E16RC

On Sat, Nov 27, 2010 at 4:04 PM, Adam Green 140...@gmail.com wrote:

The URLs again return a code of 200 and nothing in the content. What
happens when you try getting one of the URLs with cURL? I'm curious if
it behaves differently for an IP in Turkey.

On Sat, Nov 27, 2010 at 8:56 AM, Furkan Kuru furkank...@gmail.com wrote:
Most of the tweets here are spams:

http://twitturk.com/tweet/search?q=lol

On Sat, Nov 27, 2010 at 3:33 PM, Adam Green 140...@gmail.com wrote:

All of your sample spam tweets are from suspended accounts, yet the
tweets were only sent yesterday. That means that the spammers behavior
was so aggressive that they were suspended quickly by a Twitter
algorithm. I doubt that a human at Twitter read your email and went
through each tweet suspending the accounts. Have you checked to see
how quickly these spam accounts get canceled for other spam tweets?
You could hold back tweets from unknown users for 24 hours, and then
check all new users through the API to see if they are suspended. If
they aren't suspended, you can whitelist them in your system.

What is really weird is that I also checked the URLs in these tweets
and they resolve to an empty page. They return a header with an HTTP
code of 200, and no content at all. That can't be an accident. Either
they are sending empty responses to everyone, or they could tell from
my IP that they didn't want to send anything to me. Why would a
spammer do that? They only benefit if someone clicks on their links
and buys something, or gets infected somehow. Could you be the subject
of some kind of attack? You use the word community. Would anyone
want to disrupt your community? Is this a community that is in one
geographic area that can be detected by IP? Very interesting...

Anyway, you can use URL resolution to test new users. When you get a
tweet from a new user with a URL, check the URL, and blacklist them if
it resolves to an empty page. If you only have to do this for new
users, it won't be too processor intensive.

On Sat, Nov 27, 2010 at 5:20 AM, Furkan Kuru furkank...@gmail.com
wrote:
The text in these spam tweets are not easy to recognize.
They do not repeat. They are mixed of different words and they contain
a
link.
They seem to be sent via web.

The ranking and discarding some mentions will not completely resolve
the
problem.
Because our mention data and trending words data both were affected.
We
donot want to eliminate tweets from innocent people who have few
followers.

The simplest way seems to be just ignoring the tweets coming from
outside of
the community.
But those tweets were helping us to extend our network.

On Fri, Nov 26, 2010 at 6:42 PM, Adam Green 140...@gmail.com wrote:

As long as you aren't trying to capture and deliver *all* tweets,
there are a couple of good ways to cut out spammers. One thing I do
is
save all mentions for all users in a database of tweets. When a tweet
comes in from the streaming API, I collect @mentions, and store them
with the screen name of the tweet's author and the screen name
mentioned. Then I can rank users based on the number of different
accounts that mention them. If you only use the tweets from the top
N%
of users, the quality improves a lot. I find that the top 80% is
usually enough of a screen to get good quality.

Another trick is blocking duplicates from each user. The API only
blocks duplicates that repeat immediately, but if a spammer has a list
of tweets and cycles through them, all the tweets get through. I
compare all new tweets with the other tweets from that user. This is
very expensive if you have a big database, but it can be made less
intensive by limiting the comparison to that user's tweets from the
last few days. You can also run this in a separate process so it
doesn't slow down your main tweet-parsing loop. Most spammers are so
simplistic that they just repeat the same tweet over and over. For a
really spammy set of keywords, if I find more than a few duplicates
from a user, I just stop saving their tweets.
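One way to sketch the per-user duplicate check, assuming each tweet arrives with a timestamp in seconds. `DuplicateFilter` is a hypothetical class: the 3-day window stands in for "the last few days", `max_dupes` for "more than a few duplicates", and the case/whitespace normalization is an added assumption so that trivially altered copies still match. Because it keeps only recent history per user, it could also run in a separate worker process.

```python
import time
from collections import defaultdict, deque
from typing import Deque, Dict, Optional, Set, Tuple


def normalize(text: str) -> str:
    """Lower-case and collapse whitespace so trivial variations still match."""
    return " ".join(text.lower().split())


class DuplicateFilter:
    """Per-user duplicate check over a sliding time window (hypothetical helper)."""

    def __init__(self, window_seconds: float = 3 * 24 * 3600, max_dupes: int = 3):
        self.window = window_seconds
        self.max_dupes = max_dupes
        self.history: Dict[str, Deque[Tuple[float, str]]] = defaultdict(deque)
        self.dupe_counts: Dict[str, int] = defaultdict(int)
        self.blocked: Set[str] = set()

    def accept(self, user: str, text: str, now: Optional[float] = None) -> bool:
        """Return True if the tweet should be saved, False if it is dropped."""
        if user in self.blocked:
            return False
        now = time.time() if now is None else now
        recent = self.history[user]
        while recent and now - recent[0][0] > self.window:
            recent.popleft()  # forget this user's tweets older than the window
        key = normalize(text)
        if any(seen == key for _, seen in recent):
            self.dupe_counts[user] += 1
            if self.dupe_counts[user] >= self.max_dupes:
                self.blocked.add(user)  # stop saving this user's tweets
            return False
        recent.append((now, key))
        return True
```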

On Fri, Nov 26, 2010 at 11:26 AM, Furkan Kuru furkank...@gmail.com wrote:

The word "lol" is the most common word in these spam tweets. We now
receive 400 spam tweets per hour while tracking 100K people.

We plan to delete all tweets containing the word "lol", although our
users (Turkish people) also use it when writing in English.

Any better suggestions?
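If the "lol" filter is used, matching it as a whole word avoids dropping tweets that merely contain those letters, such as "lollipop". A small Python sketch (the names are hypothetical). It would still drop legitimate English-language tweets that use the word, as described above.

```python
import re

# Matches "lol" as a whole word, case-insensitively ("LOL", "lol!"),
# but not inside longer words such as "lollipop".
LOL_RE = re.compile(r"\blol\b", re.IGNORECASE)


def contains_lol(text: str) -> bool:
    """True if the tweet text contains "lol" as a separate word."""
    return LOL_RE.search(text) is not None
```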

--
Adam Green
Twitter API Consultant and Trainer
http://140dev.com
@140dev

--
Twitter developer documentation and resources:
http://dev.twitter.com/doc
API

Re: [twitter-dev] Trying to get rid of twitter spammers

2010-11-27 Thread Adam Green

Or you could just resolve URLs from new users, and blacklist them if
the URL points to Amazon. That will work as long as they still point
to Amazon.
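Checking where a shortened link finally lands could look roughly like this (Python 3 standard library; `resolve` and `points_to` are hypothetical helper names). `urlopen` follows HTTP redirects, and `geturl()` returns the URL it ended up at. The domain is a parameter because the spammers can switch targets at any time.

```python
import urllib.request
from urllib.parse import urlsplit


def resolve(url: str, timeout: float = 10.0) -> str:
    """Follow redirects (e.g. from a URL shortener) and return the final URL."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.geturl()


def points_to(final_url: str, domain: str = "amazon.com") -> bool:
    """True if final_url's host is `domain` or one of its subdomains."""
    host = (urlsplit(final_url).hostname or "").lower()
    return host == domain or host.endswith("." + domain)
```

For a tweet from a new user, `points_to(resolve(link))` being true would be the signal to blacklist that user.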

On Sat, Nov 27, 2010 at 9:12 AM, Furkan Kuru furkank...@gmail.com wrote:
It redirects to an amazon.com product page.

Example:

http://www.amazon.com/gp/product/B0041E16RC?ie=UTF8&tag=iphone403d-20&linkCode=as2&camp=1789&creative=9325&creativeASIN=B0041E16RC

On Sat, Nov 27, 2010 at 4:04 PM, Adam Green 140...@gmail.com wrote:

The URLs again return a code of 200 and nothing in the content. What
happens when you try getting one of the URLs with cURL? I'm curious if
it behaves differently for an IP in Turkey.

On Sat, Nov 27, 2010 at 8:56 AM, Furkan Kuru furkank...@gmail.com wrote:
Most of the tweets here are spam:

http://twitturk.com/tweet/search?q=lol

On Sat, Nov 27, 2010 at 3:33 PM, Adam Green 140...@gmail.com wrote:

All of your sample spam tweets are from suspended accounts, yet the
tweets were only sent yesterday. That means the spammers' behavior
was so aggressive that they were suspended quickly by a Twitter
algorithm. I doubt that a human at Twitter read your email and went
through each tweet suspending the accounts. Have you checked to see
how quickly these spam accounts get suspended for other spam tweets?
You could hold back tweets from unknown users for 24 hours, and then
check all new users through the API to see if they are suspended. If
they aren't suspended, you can whitelist them in your system.
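The hold-back approach could be sketched as below. `Quarantine` is a hypothetical class, and `is_suspended` is a caller-supplied function standing in for whatever API call reports a suspended account; it is not a real library function. Tweets from non-whitelisted users are held, and once a user's oldest held tweet is 24 hours old the account is checked: suspended users are discarded, others are whitelisted and their tweets released.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Set, Tuple

HOLD_SECONDS = 24 * 3600  # hold tweets from unknown users for 24 hours


@dataclass
class Quarantine:
    """Hold tweets from unknown users, then release or drop them."""

    is_suspended: Callable[[str], bool]  # hypothetical, supplied by the caller
    whitelist: Set[str] = field(default_factory=set)
    held: Dict[str, List[Tuple[float, str]]] = field(default_factory=dict)

    def submit(self, user: str, tweet: str, now: Optional[float] = None) -> bool:
        """Return True if the tweet can be used immediately, else hold it."""
        if user in self.whitelist:
            return True
        now = time.time() if now is None else now
        self.held.setdefault(user, []).append((now, tweet))
        return False

    def review(self, now: Optional[float] = None) -> List[str]:
        """Check users whose oldest held tweet has passed the hold period.

        Returns the released tweets; tweets from suspended users are dropped.
        """
        now = time.time() if now is None else now
        released: List[str] = []
        for user in list(self.held):
            oldest = self.held[user][0][0]
            if now - oldest < HOLD_SECONDS:
                continue  # still inside the hold period
            tweets = self.held.pop(user)
            if not self.is_suspended(user):
                self.whitelist.add(user)
                released.extend(text for _, text in tweets)
        return released
```

`review()` would be called periodically, for example from a scheduled job, rather than on every incoming tweet.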

What is really weird is that I also checked the URLs in these tweets
and they resolve to an empty page. They return a header with an HTTP
code of 200, and no content at all. That can't be an accident. Either
they are sending empty responses to everyone, or they could tell from
my IP that they didn't want to send anything to me. Why would a
spammer do that? They only benefit if someone clicks on their links
and buys something, or gets infected somehow. Could you be the subject
of some kind of attack? You use the word community. Would anyone
want to disrupt your community? Is this a community that is in one
geographic area that can be detected by IP? Very interesting...

Anyway, you can use URL resolution to test new users. When you get a
tweet from a new user with a URL, check the URL, and blacklist them if
it resolves to an empty page. If you only have to do this for new
users, it won't be too processor intensive.

Re: [twitter-dev] Trying to get rid of twitter spammers

2010-11-27 Thread Furkan Kuru

A separate host would be hard to maintain.
I have looked at a few more short URLs. They redirect to a very wide range of
sites, not just Amazon.

I think Twitter could raise the priority of "Report for spam" for newly
opened accounts, and limit the number of tweets per hour they can send.
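The per-hour idea can also be approximated on the receiving side. This Python sketch counts each account's tweets in a sliding one-hour window; `RateWatch` and the threshold of 50 tweets per hour are hypothetical values for illustration, and timestamps are assumed to arrive in non-decreasing order.

```python
from collections import defaultdict, deque
from typing import Deque, Dict

WINDOW = 3600          # one hour, in seconds
MAX_PER_HOUR = 50      # hypothetical threshold; would need tuning on real data


class RateWatch:
    """Flag accounts that post more than MAX_PER_HOUR tweets within an hour."""

    def __init__(self) -> None:
        self.times: Dict[str, Deque[float]] = defaultdict(deque)

    def record(self, user: str, ts: float) -> bool:
        """Record a tweet at time ts; return True if the user is over the limit."""
        q = self.times[user]
        q.append(ts)
        while q and ts - q[0] >= WINDOW:
            q.popleft()  # drop tweets that fell out of the one-hour window
        return len(q) > MAX_PER_HOUR
```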

Here again is the link showing the tweets sent as replies to Turkish
users; the word "lol" is what they have in common:
http://twitturk.com/tweet/search?q=lol

And an example account:
http://twitter.com/Bomuchellxee
All of its tweets are spam, and "lol" is common to them.
It also has 0 following and 3 followers (real accounts, I guess).
Unbelievable!

Re: [twitter-dev] Trying to get rid of twitter spammers

2010-11-27 Thread Edward Hotchkiss

The followers are probably bots. Create an account, and within about 5 minutes
or less you will generally have 2-3 followers that appear [real]. They iterate
over IDs. Someone is running a dating/hookup botnet with those user accounts.

On Nov 27, 2010, at 4:18 PM, Furkan Kuru wrote:

A separate host would be hard to maintain.
I have looked at a few more short URLs. They redirect to a very wide range of
sites, not just Amazon.

I think Twitter could raise the priority of "Report for spam" for newly
opened accounts, and limit the number of tweets per hour they can send.

Here again is the link showing the tweets sent as replies to Turkish
users; the word "lol" is what they have in common:
http://twitturk.com/tweet/search?q=lol

And an example account:
http://twitter.com/Bomuchellxee
All of its tweets are spam, and "lol" is common to them.
It also has 0 following and 3 followers (real accounts, I guess). Unbelievable!

On Sat, Nov 27, 2010 at 4:29 PM, Adam Green 140...@gmail.com wrote:
Now you know that it does resolve differently in different countries.
You could set up an account with a webhost in the US, and have a
script there that you can call with URLs in tweets from new users. If
the URL resolves to a blank page, blacklist that user. There are
plenty of good hosts that only charge $7 a month. Sounds extreme, but
these are very clever spammers.