Re: [twitter-dev] Trying to get rid of twitter spammers
Unfortunately, we do not have time to implement a spam filter or ranking algorithm. Besides, I think this issue should be resolved on Twitter's side. Some accounts are sending tweets in reply to *all* Twitter users. I think the spammer accounts and their tweets should be analyzed.

The behaviour I see: a new Twitter account is opened; it does not need to follow anyone; it then tweets spam messages as replies to people, as many as hundreds of them. As I said earlier, the tweets have the word "lol" in common.

Examples:
https://twitter.com/madiav_isBOMB
https://twitter.com/ddubplneandonly

More caught by our system (sent as replies to Turkish Twitter users):
http://twitturk.com/tweet/search?q=lol

On Sun, Nov 28, 2010 at 12:10 AM, Adam Green 140...@gmail.com wrote:

My final suggestion is to rank users by something (age of account, number of mentions/mentioners/followers/following) and cut out the bottom N%.

On Sat, Nov 27, 2010 at 4:18 PM, Furkan Kuru furkank...@gmail.com wrote:

Another host would be problematic to maintain. I have looked at a few more short URLs. They redirect to a very wide range of sites, not just Amazon. I think Twitter could change the priority level of "Report for spam" for newly opened accounts, and the number of tweets allowed per hour.

Here again is the link that shows the tweets written as replies to Turkish people, with the word "lol" in common:
http://twitturk.com/tweet/search?q=lol

And an example account:
http://twitter.com/Bomuchellxee

All of its tweets are spam and "lol" is common to them. It also has 0 following and 3 followers (real accounts, I guess). Unbelievable!
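The ranking idea discussed in this thread (count how many different accounts mention each user, then keep only the top N%) can be sketched roughly as follows. This is an illustration under assumptions, not code from the thread: the function names and the (author, mentioned) pair format are hypothetical, and the sketch ranks only by distinct mentioners, ignoring account age and follower counts.

```python
from collections import defaultdict


def rank_by_distinct_mentioners(mentions):
    """Return (screen_name, count) pairs, most widely mentioned first.

    `mentions` is an iterable of (author, mentioned) screen-name pairs,
    a hypothetical format for whatever the mention table stores.
    """
    mentioners = defaultdict(set)
    for author, mentioned in mentions:
        if author != mentioned:  # ignore self-mentions
            mentioners[mentioned].add(author)
    counts = {user: len(authors) for user, authors in mentioners.items()}
    # Sort by count descending, then by name so ties are deterministic.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


def keep_top_percent(ranked, percent):
    """Keep the top `percent` (0-100) of a ranked list and drop the rest."""
    keep = len(ranked) * percent // 100
    return [user for user, _ in ranked[:keep]]
```

A caller would build `ranked` once per batch and save tweets only from users returned by `keep_top_percent(ranked, 80)`. Note that an account nobody mentions never enters the ranking at all, which is exactly the trade-off raised above about innocent users with few followers.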
Re: [twitter-dev] Trying to get rid of twitter spammers
The text in these spam tweets is not easy to recognize. It does not repeat. The tweets are a mix of different words and contain a link. They seem to be sent via the web.

Ranking and discarding some mentions will not completely resolve the problem, because both our mention data and our trending-words data were affected. We do not want to eliminate tweets from innocent people who have few followers. The simplest way seems to be to ignore tweets coming from outside the community, but those tweets were helping us extend our network.

On Fri, Nov 26, 2010 at 6:42 PM, Adam Green 140...@gmail.com wrote:

As long as you aren't trying to capture and deliver *all* tweets, there are a couple of good ways to cut out spammers.

One thing I do is save all mentions for all users in a database of tweets. When a tweet comes in from the streaming API, I collect @mentions, and store them with the screen name of the tweet's author and the screen name mentioned. Then I can rank users based on the number of different accounts that mention them. If you only use the tweets from the top N% of users, the quality improves a lot. I find that the top 80% is usually enough of a screen to get good quality.

Another trick is blocking duplicates from each user. The API only blocks duplicates that repeat immediately, but if a spammer has a list of tweets and cycles through them, all the tweets get through. I compare all new tweets with the other tweets from that user. This is very expensive if you have a big database, but it can be made less intensive by limiting the comparison to just the tweets from that user in the last few days. You can also run this in a separate process that doesn't slow down your main tweet parsing loop. Most spammers are so simplistic that they just repeat the same tweet over and over. In a really spammy set of keywords, if I find more than a few duplicates from a user, I just stop saving their tweets.

On Fri, Nov 26, 2010 at 11:26 AM, Furkan Kuru furkank...@gmail.com wrote:

The word "lol" is the most common in these spam tweets. We now receive 400 spam tweets per hour while tracking 100K people. We plan to delete all tweets containing the word "lol". However, it is also used by our users (Turkish people) writing in English. Any better suggestions?

--
Adam Green
Twitter API Consultant and Trainer
http://140dev.com
@140dev

--
Furkan Kuru

--
Twitter developer documentation and resources: http://dev.twitter.com/doc
API updates via Twitter: http://twitter.com/twitterapi
Issues/Enhancements Tracker: http://code.google.com/p/twitter-api/issues/list
Change your membership to this group: http://groups.google.com/group/twitter-development-talk
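The per-user duplicate check described in this message (compare each new tweet only with that user's tweets from the last few days, and stop saving a user after more than a few duplicates) might look something like the sketch below. The class name, the three-day window, the duplicate limit, and the whitespace/case normalization are all assumptions chosen for illustration; the thread does not specify them.

```python
from collections import defaultdict, deque
from datetime import datetime, timedelta


class DuplicateFilter:
    """In-memory per-user duplicate detection over a sliding time window."""

    def __init__(self, window=timedelta(days=3), max_duplicates=3):
        self.window = window                  # assumed "last few days"
        self.max_duplicates = max_duplicates  # assumed "more than a few"
        self.recent = defaultdict(deque)      # user -> deque of (time, text)
        self.duplicates = defaultdict(int)    # user -> duplicates seen so far
        self.blocked = set()

    @staticmethod
    def normalize(text):
        # Collapse case and whitespace so trivial variations still match.
        return " ".join(text.lower().split())

    def accept(self, user, text, now):
        """Return True if the tweet should be saved, False otherwise."""
        if user in self.blocked:
            return False
        history = self.recent[user]
        # Forget tweets that have fallen out of the comparison window.
        while history and now - history[0][0] > self.window:
            history.popleft()
        key = self.normalize(text)
        if any(old == key for _, old in history):
            self.duplicates[user] += 1
            if self.duplicates[user] > self.max_duplicates:
                self.blocked.add(user)
            return False
        history.append((now, key))
        return True
```

Exact matching after normalization would not catch the spam described at the top of this message, where the text does not repeat; it only addresses spammers who cycle through a fixed list of tweets.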
Re: [twitter-dev] Trying to get rid of twitter spammers
All of your sample spam tweets are from suspended accounts, yet the tweets were only sent yesterday. That means that the spammers' behavior was so aggressive that they were suspended quickly by a Twitter algorithm. I doubt that a human at Twitter read your email and went through each tweet suspending the accounts. Have you checked to see how quickly these spam accounts get canceled for other spam tweets? You could hold back tweets from unknown users for 24 hours, and then check all new users through the API to see if they are suspended. If they aren't suspended, you can whitelist them in your system.

What is really weird is that I also checked the URLs in these tweets and they resolve to an empty page. They return a header with an HTTP code of 200, and no content at all. That can't be an accident. Either they are sending empty responses to everyone, or they could tell from my IP that they didn't want to send anything to me. Why would a spammer do that? They only benefit if someone clicks on their links and buys something, or gets infected somehow.

Could you be the subject of some kind of attack? You use the word community. Would anyone want to disrupt your community? Is this a community that is in one geographic area that can be detected by IP? Very interesting...

Anyway, you can use URL resolution to test new users. When you get a tweet from a new user with a URL, check the URL, and blacklist them if it resolves to an empty page. If you only have to do this for new users, it won't be too processor intensive.
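The hold-back idea above (keep tweets from unknown users for 24 hours, then check whether the account has been suspended, and whitelist it if not) could be sketched as a small queue. Everything named here is hypothetical: `is_suspended` stands in for whatever API lookup is used and is passed in as a plain callable, so no particular Twitter endpoint is assumed.

```python
from datetime import datetime, timedelta


class HoldBackQueue:
    """Hold tweets from unknown users until their account has been checked."""

    def __init__(self, is_suspended, delay=timedelta(hours=24)):
        self.is_suspended = is_suspended  # hypothetical: screen_name -> bool
        self.delay = delay
        self.pending = []                 # (received_at, user, text)
        self.whitelist = set()
        self.blacklist = set()

    def submit(self, user, text, now):
        """Return the tweet if it can be published right away, else None."""
        if user in self.blacklist:
            return None
        if user in self.whitelist:
            return (user, text)
        self.pending.append((now, user, text))
        return None

    def release(self, now):
        """Return held tweets that are old enough and whose author passed."""
        released, still_pending = [], []
        for received_at, user, text in self.pending:
            if now - received_at < self.delay:
                still_pending.append((received_at, user, text))
                continue
            # Check each unknown user once, then remember the verdict.
            if user not in self.whitelist and user not in self.blacklist:
                if self.is_suspended(user):
                    self.blacklist.add(user)
                else:
                    self.whitelist.add(user)
            if user in self.whitelist:
                released.append((user, text))
        self.pending = still_pending
        return released
```

Calling `release` from a periodic job keeps the suspension checks out of the main tweet parsing loop. The cost is a one-day delay for every genuinely new user, which only pays off if spam accounts really are suspended within that window.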
Re: [twitter-dev] Trying to get rid of twitter spammers
Empty URL? It would resolve if the user clicks. I'm sure there is backend code running; that is the only purpose of even returning a 200.
Re: [twitter-dev] Trying to get rid of twitter spammers
Most of the tweets here are spam:
http://twitturk.com/tweet/search?q=lol
Re: [twitter-dev] Trying to get rid of twitter spammers
The URLs again return a code of 200 and nothing in the content. What happens when you try getting one of the URLs with cURL? I'm curious if it behaves differently for an IP in Turkey.
Re: [twitter-dev] Trying to get rid of twitter spammers
It returns a redirect to an amazon.com product page. Example:
http://www.amazon.com/gp/product/B0041E16RC?ie=UTF8&tag=iphone403d-20&linkCode=as2&camp=1789&creative=9325&creativeASIN=B0041E16RC
Re: [twitter-dev] Trying to get rid of twitter spammers
Now you know that it does resolve differently in different countries. You could set up an account with a webhost in the US, and have a script there that you can call with URLs in tweets from new users. If the URL resolves to a blank page, blacklist that user. There are plenty of good hosts that only charge $7 a month. Sounds extreme, but these are very clever spammers. Or you could just resolve URLs from new users, and blacklist them if the URL points to Amazon. That will work as long as they still point to Amazon.

On Sat, Nov 27, 2010 at 9:12 AM, Furkan Kuru furkank...@gmail.com wrote:

It returns a redirection to an amazon.com product page. Example: http://www.amazon.com/gp/product/B0041E16RC?ie=UTF8&tag=iphone403d-20&linkCode=as2&camp=1789&creative=9325&creativeASIN=B0041E16RC

On Sat, Nov 27, 2010 at 4:04 PM, Adam Green 140...@gmail.com wrote:

The URLs again return a code of 200 and nothing in the content. What happens when you try getting one of the URLs with cURL? I'm curious if it behaves differently for an IP in Turkey.

On Sat, Nov 27, 2010 at 8:56 AM, Furkan Kuru furkank...@gmail.com wrote:

Most of the tweets here are spam: http://twitturk.com/tweet/search?q=lol

On Sat, Nov 27, 2010 at 3:33 PM, Adam Green 140...@gmail.com wrote:

All of your sample spam tweets are from suspended accounts, yet the tweets were only sent yesterday. That means that the spammers' behavior was so aggressive that they were suspended quickly by a Twitter algorithm. I doubt that a human at Twitter read your email and went through each tweet suspending the accounts. Have you checked to see how quickly these spam accounts get canceled for other spam tweets? You could hold back tweets from unknown users for 24 hours, and then check all new users through the API to see if they are suspended. If they aren't suspended, you can whitelist them in your system.

What is really weird is that I also checked the URLs in these tweets and they resolve to an empty page. They return a header with an HTTP code of 200, and no content at all. That can't be an accident. Either they are sending empty responses to everyone, or they could tell from my IP that they didn't want to send anything to me. Why would a spammer do that? They only benefit if someone clicks on their links and buys something, or gets infected somehow. Could you be the subject of some kind of attack? You use the word community. Would anyone want to disrupt your community? Is this a community that is in one geographic area that can be detected by IP? Very interesting...

Anyway, you can use URL resolution to test new users. When you get a tweet from a new user with a URL, check the URL, and blacklist them if it resolves to an empty page. If you only have to do this for new users, it won't be too processor intensive.

On Sat, Nov 27, 2010 at 5:20 AM, Furkan Kuru furkank...@gmail.com wrote:

The text in these spam tweets is not easy to recognize. It does not repeat. The tweets are a mix of different words and they contain a link. They seem to be sent via the web. Ranking and discarding some mentions will not completely resolve the problem, because our mention data and trending words data were both affected. We do not want to eliminate tweets from innocent people who have few followers. The simplest way seems to be just ignoring the tweets coming from outside of the community, but those tweets were helping us to extend our network.

On Fri, Nov 26, 2010 at 6:42 PM, Adam Green 140...@gmail.com wrote:

As long as you aren't trying to capture and deliver *all* tweets, there are a couple of good ways to cut out spammers. One thing I do is save all mentions for all users in a database of tweets. When a tweet comes in from the streaming API, I collect @mentions, and store them with the screen name of the tweet's author and the screen name mentioned. Then I can rank users based on the number of different accounts that mention them.
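A rough sketch of the hold-and-recheck idea from Adam's 3:33 PM message quoted above: hold tweets from unknown users for 24 hours, then check whether the account has been suspended, and whitelist it if not. The thread does not say which API call performs the suspension check, so `is_suspended` here is a hypothetical callable supplied by the caller; the class and method names are likewise assumptions.

```python
from datetime import datetime, timedelta


class HoldbackQueue:
    """Holds tweets from unknown users until their account has been re-checked."""

    def __init__(self, is_suspended, hold=timedelta(hours=24)):
        # is_suspended: hypothetical callable, screen_name -> bool.
        self.is_suspended = is_suspended
        self.hold = hold
        self.whitelist = set()
        self.blacklist = set()
        self.pending = {}  # user -> (first_seen, [held tweets])

    def submit(self, user, tweet, now):
        """Return the tweets that can be used right away (possibly none)."""
        if user in self.whitelist:
            return [tweet]
        if user in self.blacklist:
            return []
        _, tweets = self.pending.setdefault(user, (now, []))
        tweets.append(tweet)
        return []

    def release(self, now):
        """Re-check users whose hold period is over and return their tweets."""
        released = []
        for user in list(self.pending):
            first_seen, tweets = self.pending[user]
            if now - first_seen < self.hold:
                continue  # still inside the hold period
            del self.pending[user]
            if self.is_suspended(user):
                self.blacklist.add(user)  # held tweets are discarded
            else:
                self.whitelist.add(user)
                released.extend(tweets)
        return released
```

The cost of this approach is a 24-hour delay on the first tweets from every unknown account, which matters if, as Furkan says, tweets from outside the community are what extend the network.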
Re: [twitter-dev] Trying to get rid of twitter spammers
Another hosting account will be problematic to maintain. I have looked at a few more short URLs. They redirect to a very wide range of sites, not just Amazon. I think Twitter could change the priority level of Report for spam for newly opened accounts, and the number of tweets per hour. Here again is the link that shows the tweets written as replies to Turkish people, with the word lol in common: http://twitturk.com/tweet/search?q=lol And an example account: http://twitter.com/Bomuchellxee All of its tweets are spam and lol is common. It also has 0 following and 3 followers (real accounts I guess). Unbelievable!
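For reference, the URL test discussed in this thread (flag a link that returns HTTP 200 with no content, or one that ends up on Amazon) might be sketched as below, using only the Python standard library. The function names, the `blocked_hosts` default, the 64 KB read limit, and the User-Agent header are assumptions for illustration. As noted in the thread, the result can depend on the country the request comes from, and the short URLs redirect to many sites besides Amazon, so the host list alone is unlikely to be sufficient.

```python
from urllib.parse import urlparse
from urllib.request import Request, urlopen


def looks_like_spam_link(status, final_url, body, blocked_hosts=("amazon.com",)):
    """Apply the two heuristics to a response that has already been fetched.

    status: HTTP status code; final_url: URL after redirects; body: bytes.
    """
    if status == 200 and not body.strip():
        return True  # HTTP 200 with an empty body
    host = (urlparse(final_url).hostname or "").lower()
    # Match the blocked host itself or any of its subdomains.
    return any(host == h or host.endswith("." + h) for h in blocked_hosts)


def check_url(url, timeout=10):
    """Fetch url (urlopen follows redirects) and classify the final response.

    Raises urllib.error.URLError (or its subclass HTTPError) on failures;
    the caller decides how to treat those.
    """
    request = Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urlopen(request, timeout=timeout) as response:
        body = response.read(65536)  # the first 64 KB is enough to test for emptiness
        return looks_like_spam_link(response.status, response.geturl(), body)
```

Keeping the classification separate from the fetch means the heuristics can be tested without network access, and the fetch could be moved to a host in another country without changing the rules.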
Re: [twitter-dev] Trying to get rid of twitter spammers
The followers are probably bots. Create an account and within about 5 minutes or less you will generally have 2-3 followers that appear [real]. They iterate over IDs. Someone is running a dating/hookup botnet with those user accounts.
Re: [twitter-dev] Trying to get rid of twitter spammers
My final suggestion is to rank users by something (age of account, number of mentions/mentioners/followers/following) and cut out the bottom N%.
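One way to read the ranking suggestion at the top of this message, using the number of distinct mentioners as the "something": a sketch, not Adam's actual code. The function names and the input shape (pairs of author and mentioned screen names, as stored from the streaming API in his earlier description) are assumptions; the 80% default comes from his earlier remark that the top 80% is usually enough.

```python
from collections import defaultdict


def distinct_mentioner_counts(mentions):
    """mentions: iterable of (author, mentioned) screen-name pairs."""
    mentioners = defaultdict(set)
    for author, mentioned in mentions:
        if author != mentioned:  # ignore self-mentions
            mentioners[mentioned].add(author)
    return {user: len(authors) for user, authors in mentioners.items()}


def keep_top_percent(mentions, authors, keep_percent=80):
    """Return the authors ranked in the top keep_percent by distinct mentioners."""
    counts = distinct_mentioner_counts(mentions)
    # Authors nobody has mentioned get a score of 0.
    scores = {author: counts.get(author, 0) for author in authors}
    # Highest score first; ties are broken alphabetically, which is arbitrary.
    ranked = sorted(scores, key=lambda user: (-scores[user], user))
    keep = len(ranked) * keep_percent // 100
    return set(ranked[:keep])
```

For example:

```python
mentions = [
    ("alice", "bob"), ("carol", "bob"), ("alice", "bob"),
    ("bob", "alice"), ("spam1", "alice"), ("spam1", "bob"), ("spam1", "carol"),
]
authors = ["alice", "bob", "carol", "dave", "spam1"]
kept = keep_top_percent(mentions, authors)
# kept == {"alice", "bob", "carol", "dave"}
```

Here `dave` and `spam1` both have zero mentioners and only the arbitrary tie-break separates them, which illustrates Furkan's concern about dropping innocent users who have little activity.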
[twitter-dev] Trying to get rid of twitter spammers
Hello,

I think there is a spam operation that uses a large number of Twitter accounts to tweet at users by mentioning their usernames, so the tweets are sent as replies. We receive thousands of similar spam tweets, written as replies to the users we follow, through the streaming API. It spoils our data. The tweets seem to be sent from the web, not via a Twitter app. Here are a few examples.

@kaanalay http://twitter.com/kaanalay JobsCDFSales forevertravis RT ITS_NEL Discover lies from RonnieMo I'll come visit you ..lol http://bit.isff.com/3PoCt
26/11/10 12:49:01 http://twitter.com/P_Lobrayy/status/8109946705027073

@serkan_cakmak http://twitter.com/serkan_cakmak FREE!! before i have be mean/rude lol RT dreaontv: odotjdot *slides the Wrap it Up button ur way* http://fplk.c2.my/Yl4qz
26/11/10 12:49:01 http://twitter.com/ivtaathjathra/status/8109939918639105

@aralgamze http://twitter.com/aralgamze thiagomaciell mey2734 RT KokaMoe88: i wanna have sex .. right now at this moment || let's go lol http://wbx.c4.ee/v5QtU
26/11/10 12:49:01 http://twitter.com/qoorgeees/status/8109930166878208

@kkocaerkek http://twitter.com/kkocaerkek huh lol RT XxLovinJessixX: HELLL NOOO!!! I THATS POISON! RT :YUCKK -__- how about chipotle:) evebayby http://wmfi.l.to/VPkw5
26/11/10 12:49:01 http://twitter.com/fuaneledes/status/8109920641617920

@salihturan http://twitter.com/salihturan Niekstra 333TtJJ Fleegz RT PoetryNMoshun: SimplyMilele lol even the conscious got to love f*cking.. http://xllo.6p.ro/JPfIL
26/11/10 12:49:01 http://twitter.com/rahaelrilt/status/8109887489839104

@nlyshn http://twitter.com/nlyshn carynfust5 Bieberbananzaaa LOL!! RT firstlady47: FAMU= Nene's old nose, bcc= Nene's new clothespin nose http://tlny.1k.ru/IbUpy
26/11/10 12:49:01 http://twitter.com/brafh/status/8109862101716992

@zehra_ozcan http://twitter.com/zehra_ozcan D88Miller GibsGaldino RT I_DOLLA: Kim lol RT BigHomie_: Nicki Minaj or Lil Kim in a fight WhoYaGot http://oyu.iz.rs/fGwaG
26/11/10 12:49:01 http://twitter.com/YrnbAdi_Dhaama/status/8109813330345984

@I5IL http://twitter.com/I5IL sexspeaking a shit. So... If ya can't beat 'em, join 'em. RT The100KShow: LadyBlogga lol you endorsing that! http://nofj.hn.cx/r1jvr
26/11/10 12:49:01 http://twitter.com/dqbajBSB/status/8109804488753152

@Melek_Ulker http://twitter.com/Melek_Ulker nciku honeku Pompam1016 RT KnockOWTdiva: Rhianna sounds like a lamb$$ lol on what song? http://gux.ah.sg/xlzaw
26/11/10 12:49:01 http://twitter.com/ManiSvitheick/status/8109799736614912

-- Furkan Kuru

-- Twitter developer documentation and resources: http://dev.twitter.com/doc
API updates via Twitter: http://twitter.com/twitterapi
Issues/Enhancements Tracker: http://code.google.com/p/twitter-api/issues/list
Change your membership to this group: http://groups.google.com/group/twitter-development-talk
RE: [twitter-dev] Trying to get rid of twitter spammers
What I don't understand is that apart from possibly generating clicks why are people doing this? Are enough clicks converting into some kind of ROI interaction that makes them money? I keep expecting SPAM to take some kind of evolutionary leap (customized to your location/interests/cookies etc) but it seems to be the same old click requests.

Cheers, Dean

From: twitter-development-talk@googlegroups.com [mailto:twitter-development-t...@googlegroups.com] On Behalf Of Furkan Kuru
Sent: Friday, 26 November 2010 6:02 AM
To: twitter-development-talk@googlegroups.com
Subject: [twitter-dev] Trying to get rid of twitter spammers
Re: [twitter-dev] Trying to get rid of twitter spammers
The word lol is the most common in these spam tweets. We receive 400 spam tweets per hour now, tracking 100K people. We plan to delete all of the tweets containing the word lol. It is also used by our users (Turkish people) writing in English, though. Any better suggestions?

On Fri, Nov 26, 2010 at 5:15 PM, Dean Collins d...@cognation.net wrote:

What I don't understand is that apart from possibly generating clicks why are people doing this? Are enough clicks converting into some kind of ROI interaction that makes them money? I keep expecting SPAM to take some kind of evolutionary leap (customized to your location/interests/cookies etc) but it seems to be the same old click requests.

Cheers, Dean

-- Furkan Kuru
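Since legitimate users also write lol, one hedged alternative to deleting every tweet that contains it is to treat the word as one signal among the traits reported in this thread: the spam tweets contain lol and a link, are sent as replies, and come from accounts that follow nobody. The sketch below counts those signals; the function names, the input fields, and the threshold of four are assumptions for illustration, not something proposed by any poster.

```python
import re

LOL = re.compile(r"\blol\b", re.IGNORECASE)  # whole word only, any case
URL = re.compile(r"https?://\S+")


def spam_signals(text, following_count, is_reply):
    """Count how many of the four traits reported in the thread a tweet shows."""
    signals = 0
    if LOL.search(text):
        signals += 1  # contains the word lol
    if URL.search(text):
        signals += 1  # contains a link
    if following_count == 0:
        signals += 1  # the account follows nobody
    if is_reply:
        signals += 1  # sent as a reply/mention
    return signals


def is_probable_spam(text, following_count, is_reply, threshold=4):
    """Flag a tweet only when at least `threshold` signals are present."""
    return spam_signals(text, following_count, is_reply) >= threshold
```

With the default threshold, a tweet that merely contains lol is kept, and only the full combination is dropped. A spammer who follows a single account would slip through, so the threshold is a trade-off between missed spam and false positives.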
RE: [twitter-dev] Trying to get rid of twitter spammers
Hmmm I don't think that would work - it type lol in my @DeanCollins personal posts a lot :-) Cheers, Dean From: twitter-development-talk@googlegroups.com [mailto:twitter-development-t...@googlegroups.com] On Behalf Of Furkan Kuru Sent: Friday, 26 November 2010 11:27 AM To: twitter-development-talk@googlegroups.com Subject: Re: [twitter-dev] Trying to get rid of twitter spammers Word lol is the most common in these spam tweets. We receive 400 spam tweets per hour now tracking 100K people. We plan to delete all of the tweets containing lol word. It is also used by our users (Turkish people) writing in English though. Any better suggestions? On Fri, Nov 26, 2010 at 5:15 PM, Dean Collins d...@cognation.net wrote: What I don't understand is that apart from possibly generating clicks why are people doing this? Are enough clicks converting into some kind of ROI interaction that makes them money? I keep expecting SPAM to take some kind of evolutionary leap (customized to your location/interests/cookies etc) but it seems to be the same old click requests. Cheers, Dean From: twitter-development-talk@googlegroups.com [mailto:twitter-development-t...@googlegroups.com] On Behalf Of Furkan Kuru Sent: Friday, 26 November 2010 6:02 AM To: twitter-development-talk@googlegroups.com Subject: [twitter-dev] Trying to get rid of twitter spammers Hello, I think there is a spamming action that uses too many twitter accounts and tweet by mentioning usernames and send as a reply. We receive thousands of similar spam tweets that are written as a reply to our followed users through streaming api. It spoils our data. The tweets seem to be sent from web not via a twitter app. Here are a few examples. @kaanalay http://twitter.com/kaanalay JobsCDFSales forevertravis RT ITS_NEL Discover lies from RonnieMo I'll come visit you ..lol http://bit.isff.com/3PoCt 26/11/10 12:49:01 http://twitter.com/P_Lobrayy/status/8109946705027073 @serkan_cakmak http://twitter.com/serkan_cakmak FREE!! 
before i have be mean/rude lol RT dreaontv: odotjdot *slides the Wrap it Up button ur way* http://fplk.c2.my/Yl4qz 26/11/10 12:49:01 http://twitter.com/ivtaathjathra/status/8109939918639105 @aralgamze http://twitter.com/aralgamze thiagomaciell mey2734 RT KokaMoe88: i wanna have sex .. right now at this moment || let's go lol http://wbx.c4.ee/v5QtU 26/11/10 12:49:01 http://twitter.com/qoorgeees/status/8109930166878208 @kkocaerkek http://twitter.com/kkocaerkek huh lol RT XxLovinJessixX: HELLL NOOO!!! I THATS POISON! RT :YUCKK -__- how about chipotle:) evebayby http://wmfi.l.to/VPkw5 26/11/10 12:49:01 http://twitter.com/fuaneledes/status/8109920641617920 @salihturan http://twitter.com/salihturan Niekstra 333TtJJ Fleegz RT PoetryNMoshun: SimplyMilele lol even the conscious got to love f*cking.. http://xllo.6p.ro/JPfIL http://twitter.com/rahaelrilt/status/8109887489839104 26/11/10 12:49:01 http://twitter.com/rahaelrilt/status/8109887489839104 http://twitturk.com/tweet/search?q=lol @nlyshn http://twitter.com/nlyshn carynfust5 Bieberbananzaaa LOL!! RT firstlady47: FAMU= Nene's old nose, bcc= Nene's new clothespin nose http://tlny.1k.ru/IbUpy http://twitter.com/brafh/status/8109862101716992 http://twitturk.com/tweet/search?q=lol 26/11/10 12:49:01 http://twitter.com/brafh/status/8109862101716992 @zehra_ozcan http://twitter.com/zehra_ozcan D88Miller GibsGaldino RT I_DOLLA: Kim lol RT BigHomie_: Nicki Minaj or Lil Kim in a fight WhoYaGot http://oyu.iz.rs/fGwaG http://twitter.com/YrnbAdi_Dhaama/status/8109813330345984 http://twitturk.com/tweet/search?q=lol 26/11/10 12:49:01 http://twitter.com/YrnbAdi_Dhaama/status/8109813330345984 @I5IL http://twitter.com/I5IL sexspeaking a shit. So... If ya can't beat 'em, join 'em. RT The100KShow: LadyBlogga lol you endorsing that! 
http://nofj.hn.cx/r1jvr http://twitter.com/dqbajBSB/status/8109804488753152 http://twitturk.com/tweet/search?q=lol 26/11/10 12:49:01 http://twitter.com/dqbajBSB/status/8109804488753152 @Melek_Ulker http://twitter.com/Melek_Ulker nciku honeku Pompam1016 RT KnockOWTdiva: Rhianna sounds like a lamb$$ lol on what song? http://gux.ah.sg/xlzaw 26/11/10 12:49:01 http://twitter.com/ManiSvitheick/status/8109799736614912 -- Furkan Kuru -- Twitter developer documentation and resources: http://dev.twitter.com/doc API updates via Twitter: http://twitter.com/twitterapi Issues/Enhancements Tracker: http://code.google.com/p/twitter-api/issues/list Change your membership to this group: http://groups.google.com/group/twitter-development-talk
Re: [twitter-dev] Trying to get rid of twitter spammers
As long as you aren't trying to capture and deliver *all* tweets, there are a couple of good ways to cut out spammers. One thing I do is save all mentions for all users in a database of tweets. When a tweet comes in from the streaming API, I collect the @mentions and store each one with the screen name of the tweet's author and the screen name mentioned. Then I can rank users by the number of different accounts that mention them. If you only use the tweets from the top N% of users, the quality improves a lot. I find that keeping the top 80% is usually enough of a screen to get good quality. Another trick is blocking duplicates from each user. The API only blocks duplicates that repeat immediately, so if a spammer has a list of tweets and cycles through them, all the tweets get through. I compare each new tweet with the other tweets from that user. This is very expensive if you have a big database. It can be made less intensive by limiting the comparison to the tweets from that user in the last few days. You can also run it in a separate process that doesn't slow down your main tweet parsing loop. Most spammers are so simplistic that they just repeat the same tweet over and over. In a really spammy set of keywords, if I find more than a few duplicates from a user, I just stop saving their tweets. On Fri, Nov 26, 2010 at 11:26 AM, Furkan Kuru furkank...@gmail.com wrote: The word lol is the most common in these spam tweets. We now receive 400 spam tweets per hour while tracking 100K people. We plan to delete all tweets containing the word lol, although it is also used by our users (Turkish people) writing in English. Any better suggestions?
-- Adam Green Twitter API Consultant and Trainer http://140dev.com @140dev
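A rough sketch of the two filters described in the message above, for anyone who wants a starting point. This is not Adam Green's actual code: the class names MentionRanker and DuplicateFilter, the in-memory dictionaries (standing in for the database), the 3-day window, and the threshold of 3 duplicates are all assumptions chosen for illustration. The 0.8 default mirrors the "top 80%" figure from the message.

```python
from collections import defaultdict
from datetime import datetime, timedelta


class MentionRanker:
    """Rank users by how many *different* accounts have mentioned them."""

    def __init__(self):
        # mentioned screen name -> set of authors who mentioned that user
        self.mentioners = defaultdict(set)

    def record(self, author, mentioned):
        """Store one @mention taken from an incoming tweet."""
        author, mentioned = author.lower(), mentioned.lower()
        if author != mentioned:  # ignore self-mentions
            self.mentioners[mentioned].add(author)

    def score(self, user):
        return len(self.mentioners.get(user.lower(), ()))

    def top_fraction(self, users, fraction=0.8):
        """Return the top `fraction` of `users`, ranked by distinct mentioners."""
        ranked = sorted(users, key=self.score, reverse=True)
        return set(ranked[: int(len(ranked) * fraction)])


class DuplicateFilter:
    """Drop repeated tweets per author; block authors who repeat too often."""

    def __init__(self, window=timedelta(days=3), max_duplicates=3):
        self.window = window                  # assumed "last few days"
        self.max_duplicates = max_duplicates  # assumed "more than a few"
        self.recent = defaultdict(list)       # author -> [(time, text)]
        self.dup_counts = defaultdict(int)
        self.blocked = set()

    def accept(self, author, text, when):
        """Return True if the tweet should be saved."""
        if author in self.blocked:
            return False
        # Only compare against this author's tweets inside the window.
        cutoff = when - self.window
        history = [(t, s) for t, s in self.recent[author] if t >= cutoff]
        normalized = " ".join(text.lower().split())
        is_dup = any(s == normalized for _, s in history)
        history.append((when, normalized))
        self.recent[author] = history
        if is_dup:
            self.dup_counts[author] += 1
            if self.dup_counts[author] > self.max_duplicates:
                self.blocked.add(author)
            return False
        return True
```

In use, record() would be called for every @mention seen on the stream, accept() for every incoming tweet, and top_fraction() run periodically to decide whose tweets to keep. Exact-match comparison after lowercasing and whitespace collapsing is the simplest possible duplicate test; spam that varies the short URL in each tweet would need something looser, such as stripping URLs before comparing.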
Re: [twitter-dev] Trying to get rid of twitter spammers
Hmmm ... Twitter has a user quality filter that's supposed to weed out spammers from Search and Streaming. At about 450,000 new user IDs created every day, it might take a while for Twitter's spambot detectors to flag them all, but I'd think between algorithms and crowdsourced block / report, eventually they'd get taken out. -- M. Edward (Ed) Borasky http://borasky-research.net http://twitter.com/znmeb A mathematician is a device for turning coffee into theorems. - Paul Erdos