Re: [twitter-dev] Trying to get rid of twitter spammers

2010-12-11 Thread Furkan Kuru
Unfortunately, we do not have time to implement a spam filter/ranking
algorithm.

Besides, I think this issue should be resolved on Twitter's side.

Some people are sending tweets in reply to *all* Twitter users.
I think the spammer Twitter accounts and their tweets should be analyzed.

The behaviour I see:

Open a new Twitter account.
Don't follow anyone.
Instead, send spam messages as replies to other people, hundreds of them.

As I said earlier, the tweets have the word "lol" in common.

Examples:

https://twitter.com/madiav_isBOMB
https://twitter.com/ddubplneandonly

For more, caught by our system (sent as replies to Turkish Twitter users):
http://twitturk.com/tweet/search?q=lol
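
A minimal sketch of flagging the pattern described above, combining the traits reported in this thread (new account, following nobody, sent as a reply, contains "lol", contains a link). The `Tweet` record, its field names, the 7-day "new account" cutoff and the 4-trait threshold are all hypothetical choices for illustration, not Twitter API fields or values from the thread.

```python
import re
from dataclasses import dataclass
from typing import List


# Hypothetical tweet record; these field names are illustrative, not Twitter API fields.
@dataclass
class Tweet:
    text: str
    is_reply: bool
    author_age_days: float
    author_following: int


# Whole-word match so that, e.g., "lollipop" does not count as "lol".
LOL_RE = re.compile(r"\blol\b", re.IGNORECASE)
LINK_RE = re.compile(r"https?://\S+")


def spam_signals(tweet: Tweet, max_age_days: float = 7.0) -> List[str]:
    """List which of the reported spam traits this tweet shows."""
    signals = []
    if tweet.author_age_days < max_age_days:
        signals.append("new_account")
    if tweet.author_following == 0:
        signals.append("follows_nobody")
    if tweet.is_reply:
        signals.append("reply")
    if LOL_RE.search(tweet.text):
        signals.append("lol")
    if LINK_RE.search(tweet.text):
        signals.append("link")
    return signals


def looks_like_reply_spam(tweet: Tweet, min_signals: int = 4) -> bool:
    # Require several traits at once; each one alone is common in normal tweets.
    return len(spam_signals(tweet)) >= min_signals
```

Requiring several traits together is gentler on ordinary users who happen to write "lol" than deleting every tweet that contains it.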



On Sun, Nov 28, 2010 at 12:10 AM, Adam Green <140...@gmail.com> wrote:

> My final suggestion is to rank users by something (age of account,
> number of mentions/mentioners/followers/following) and cut out the
> bottom N%.

Re: [twitter-dev] Trying to get rid of twitter spammers

2010-11-27 Thread Adam Green
My final suggestion is to rank users by something (age of account,
number of mentions/mentioners/followers/following) and cut out the
bottom N%.
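
A minimal sketch of the ranking step, using as the score the number of distinct accounts that mention each user (one of the signals listed above, and the one described in the Nov 26 message quoted further down the thread). The input format, a list of (author, mentioned) screen-name pairs, is an assumption; the default of dropping the bottom 20% follows the "top 80%" remark in that message. Other signals such as account age could be folded into the score the same way.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def distinct_mentioners(pairs: Iterable[Tuple[str, str]]) -> Dict[str, int]:
    """Map every screen name seen to the number of different authors who mention it."""
    mentioned_by: Dict[str, Set[str]] = defaultdict(set)
    users: Set[str] = set()
    for author, mentioned in pairs:
        a, m = author.lower(), mentioned.lower()  # screen names are case-insensitive
        users.update((a, m))
        if a != m:  # a self-mention says nothing about reputation
            mentioned_by[m].add(a)
    # Authors nobody mentions (typical of fresh spam accounts) get a score of 0.
    return {user: len(mentioned_by.get(user, ())) for user in users}


def keep_top_users(scores: Dict[str, float], drop_fraction: float = 0.2) -> Set[str]:
    """Drop the lowest-scoring drop_fraction of users; ties are ordered by name."""
    ranked = sorted(scores, key=lambda user: (-scores[user], user))
    keep_count = len(ranked) - int(len(ranked) * drop_fraction)
    return set(ranked[:keep_count])
```

Tweets whose author is not in the returned set would then simply be left out of mention and trending counts.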

On Sat, Nov 27, 2010 at 4:18 PM, Furkan Kuru  wrote:
>
> Another hosting account would be problematic to maintain.
> I have looked at a few more short URLs. They redirect to a very wide range
> of sites, not just Amazon.
>
> I think Twitter could give "Report for spam" a higher priority for newly
> opened accounts, and also take into account their number of tweets per hour.
>
> Here again is the link that shows the tweets sent as replies to Turkish
> users; the word "lol" is what they have in common:
> http://twitturk.com/tweet/search?q=lol
>
> And an example account:
> http://twitter.com/Bomuchellxee
> All of its tweets are spam, and "lol" is common among them.
> It also has 0 following and 3 followers (real accounts, I guess).
> Unbelievable!

Re: [twitter-dev] Trying to get rid of twitter spammers

2010-11-27 Thread Edward Hotchkiss
The followers are probably bots. Create an account and within about 5 minutes
or less you will generally have 2-3 followers that appear to be real. They
iterate over IDs. Someone is running a dating/hookup botnet with those user
accounts.

On Nov 27, 2010, at 4:18 PM, Furkan Kuru wrote:

> 
> Another hosting account would be problematic to maintain.
> I have looked at a few more short URLs. They redirect to a very wide range
> of sites, not just Amazon.
>
> I think Twitter could give "Report for spam" a higher priority for newly
> opened accounts, and also take into account their number of tweets per hour.
>
> Here again is the link that shows the tweets sent as replies to Turkish
> users; the word "lol" is what they have in common:
> http://twitturk.com/tweet/search?q=lol
>
> And an example account:
> http://twitter.com/Bomuchellxee
> All of its tweets are spam, and "lol" is common among them.
> It also has 0 following and 3 followers (real accounts, I guess). Unbelievable!

Re: [twitter-dev] Trying to get rid of twitter spammers

2010-11-27 Thread Furkan Kuru
Another hosting account would be problematic to maintain.
I have looked at a few more short URLs. They redirect to a very wide range of
sites, not just Amazon.

I think Twitter could give "Report for spam" a higher priority for newly
opened accounts, and also take into account their number of tweets per hour.
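
Twitter's own limits are not something a client controls, but the tweets-per-hour signal can be computed locally. A minimal sketch of a per-author sliding one-hour window, assuming each tweet arrives with an author screen name and a Unix timestamp in non-decreasing order; the limit of 20 tweets per hour is a hypothetical value, not one from the thread.

```python
from collections import defaultdict, deque
from typing import Deque, Dict


class HourlyRateTracker:
    """Count each author's tweets inside a sliding time window (one hour by default)."""

    def __init__(self, window_seconds: float = 3600.0, max_per_window: int = 20):
        self.window_seconds = window_seconds
        self.max_per_window = max_per_window  # hypothetical threshold
        self._times: Dict[str, Deque[float]] = defaultdict(deque)

    def record(self, author: str, timestamp: float) -> int:
        """Record one tweet and return how many the author sent within the window."""
        times = self._times[author]
        times.append(timestamp)
        # Discard timestamps that are a full window or more older than this tweet.
        while times and times[0] <= timestamp - self.window_seconds:
            times.popleft()
        return len(times)

    def exceeds_limit(self, author: str, timestamp: float) -> bool:
        """Record the tweet, then report whether the author is over the limit."""
        return self.record(author, timestamp) > self.max_per_window
```

Combined with an account-age check, this would single out new accounts that reply hundreds of times in a short burst.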

Here again is the link that shows the tweets sent as replies to Turkish
users; the word "lol" is what they have in common:
http://twitturk.com/tweet/search?q=lol

And an example account:
http://twitter.com/Bomuchellxee
All of its tweets are spam, and "lol" is common among them.
It also has 0 following and 3 followers (real accounts, I guess).
Unbelievable!



On Sat, Nov 27, 2010 at 4:29 PM, Adam Green <140...@gmail.com> wrote:

> Now you know that it does resolve differently in different countries.
> You could set up an account with a webhost in the US, and have a
> script there that you can call with URLs in tweets from new users. If
> the URL resolves to a blank page, blacklist that user. There are
> plenty of good hosts that only charge $7 a month. Sounds extreme, but
> these are very clever spammers.
>
> Or you could just resolve URLs from new users, and blacklist them if
> the URL points to Amazon. That will work as long as they still point
> to Amazon.

Re: [twitter-dev] Trying to get rid of twitter spammers

2010-11-27 Thread Adam Green
Now you know that it does resolve differently in different countries.
You could set up an account with a webhost in the US, and have a
script there that you can call with URLs in tweets from new users. If
the URL resolves to a blank page, blacklist that user. There are
plenty of good hosts that only charge $7 a month. Sounds extreme, but
these are very clever spammers.

Or you could just resolve URLs from new users, and blacklist them if
the URL points to Amazon. That will work as long as they still point
to Amazon.
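
A minimal sketch of the URL check using only Python's standard library. Because the responses apparently differ by country, this would run on the US-hosted script suggested above. It applies the two rules from the thread: an HTTP 200 with an empty body, or a final redirect target on a blacklisted host. The blacklist contents, the timeout and the function names are assumptions for illustration.

```python
import urllib.request
from typing import Optional, Tuple
from urllib.parse import urlparse

# Hypothetical blacklist of redirect targets; the thread notes targets vary widely.
BLACKLISTED_HOSTS = ("amazon.com",)


def resolve(url: str, timeout: float = 10.0) -> Optional[Tuple[str, int, bool]]:
    """Follow redirects and return (final_url, status, body_is_empty), or None on failure."""
    try:
        # urlopen follows HTTP redirects by default, so geturl() is the final address.
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            # One byte is enough to tell an empty body from a non-empty one.
            return resp.geturl(), resp.status, resp.read(1) == b""
    except (OSError, ValueError):
        # URLError/HTTPError and timeouts are OSError subclasses; bad URLs raise ValueError.
        return None


def is_suspicious(final_url: str, status: int, body_is_empty: bool) -> bool:
    """Flag an empty 200 page, or a redirect that ends on a blacklisted host."""
    if status == 200 and body_is_empty:
        return True
    host = (urlparse(final_url).hostname or "").lower()
    return any(host == h or host.endswith("." + h) for h in BLACKLISTED_HOSTS)
```

A caller would blacklist a new user when `resolve(...)` returns a result for which `is_suspicious(*result)` is true; a `None` result (timeout, error page) is probably better retried later than treated as spam. As noted elsewhere in the thread, the redirect targets turn out to vary widely, so a fixed host list will likely miss many.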

On Sat, Nov 27, 2010 at 9:12 AM, Furkan Kuru  wrote:
> It returns a redirect to an amazon.com product page.
>
> Example:
>
> http://www.amazon.com/gp/product/B0041E16RC?ie=UTF8&tag=iphone403d-20&linkCode=as2&camp=1789&creative=9325&creativeASIN=B0041E16RC

Re: [twitter-dev] Trying to get rid of twitter spammers

2010-11-27 Thread Furkan Kuru
It returns a redirect to an amazon.com product page.

Example:

http://www.amazon.com/gp/product/B0041E16RC?ie=UTF8&tag=iphone403d-20&linkCode=as2&camp=1789&creative=9325&creativeASIN=B0041E16RC


On Sat, Nov 27, 2010 at 4:04 PM, Adam Green <140...@gmail.com> wrote:

> The URLs again return a code of 200 and nothing in the content. What
> happens when you try getting one of the URLs with cURL? I'm curious if
> it behaves differently for an IP in Turkey.
>
> On Sat, Nov 27, 2010 at 8:56 AM, Furkan Kuru  wrote:
> > Most of the tweets here are spam:
> >
> > http://twitturk.com/tweet/search?q=lol
> >
> >
> >
> > On Sat, Nov 27, 2010 at 3:33 PM, Adam Green <140...@gmail.com> wrote:
> >>
> >> All of your sample spam tweets are from suspended accounts, yet the
> >> tweets were only sent yesterday. That means that the spammers' behavior
> >> was so aggressive that they were suspended quickly by a Twitter
> >> algorithm. I doubt that a human at Twitter read your email and went
> >> through each tweet suspending the accounts. Have you checked to see
> >> how quickly these spam accounts get canceled for other spam tweets?
> >> You could hold back tweets from unknown users for 24 hours, and then
> >> check all new users through the API to see if they are suspended. If
> >> they aren't suspended, you can whitelist them in your system.
> >>
> >> What is really weird is that I also checked the URLs in these tweets
> >> and they resolve to an empty page. They return a header with an HTTP
> >> code of 200, and no content at all. That can't be an accident. Either
> >> they are sending empty responses to everyone, or they could tell from
> >> my IP that they didn't want to send anything to me. Why would a
> >> spammer do that? They only benefit if someone clicks on their links
> >> and buys something, or gets infected somehow. Could you be the subject
> >> of some kind of attack? You use the word "community." Would anyone
> >> want to disrupt your community? Is this a community that is in one
> >> geographic area that can be detected by IP? Very interesting...
> >>
> >> Anyway, you can use URL resolution to test new users. When you get a
> >> tweet from a new user with a URL, check the URL, and blacklist them if
> >> it resolves to an empty page. If you only have to do this for new
> >> users, it won't be too processor intensive.
> >>
> >>
> >> On Sat, Nov 27, 2010 at 5:20 AM, Furkan Kuru wrote:
> >> > The text in these spam tweets is not easy to recognize.
> >> > The tweets do not repeat. They are a mix of different words, and they
> >> > contain a link.
> >> > They seem to be sent via the web.
> >> >
> >> > Ranking users and discarding some mentions will not completely resolve
> >> > the problem, because both our mention data and our trending-words data
> >> > were affected. We do not want to eliminate tweets from innocent people
> >> > who have few followers.
> >> >
> >> > The simplest way seems to be just ignoring the tweets coming from
> >> > outside the community.
> >> > But those tweets were helping us to extend our network.
> >> >
> >> >
> >> >
> >> > On Fri, Nov 26, 2010 at 6:42 PM, Adam Green <140...@gmail.com> wrote:
> >> >>
> >> >> As long as you aren't trying to capture and deliver *all* tweets,
> >> >> there are a couple of good ways to cut out spammers. One thing I do is
> >> >> save all mentions for all users in a database of tweets. When a tweet
> >> >> comes in from the streaming API, I collect @mentions, and store them
> >> >> with the screen name of the tweet's author and the screen name
> >> >> mentioned. Then I can rank users based on the number of different
> >> >> accounts that mention them. If you only use the tweets from the top N%
> >> >> of users, the quality improves a lot. I find that the top 80% is
> >> >> usually enough of a screen to get good quality.
> >> >>
> >> >> Another trick is blocking duplicates from each user. The API only
> >> >> blocks duplicates that repeat immediately, but if a spammer has a list
> >> >> of tweets, and cycles through them, all the tweets get through. I
> >> >> compare all new tweets with the other tweets from that user. This is
> >> >> very expensive if you have a big database. This can be made less
> >> >> intensive by limiting the comparison to just the tweets from that user
> >> >> in the last few days. You can also run this with a separate process
> >> >> that doesn't slow down your main tweet parsing loop. Most spammers are
> >> >> so simplistic that they just repeat the same tweet over and over. In a
> >> >> real spammy set of keywords, if I find more than a few duplicates from
> >> >> a user, I just stop saving their tweets.
> >> >>
> >> >>
> >> >> On Fri, Nov 26, 2010 at 11:26 AM, Furkan Kuru 
> >> >> wrote:
> >> >> >
> >> >> > The word "lol" is the most common one in these spam tweets. We now
> >> >> > receive 400 spam tweets per hour while tracking 100K people.
> >> >> >
> >> >> > We plan to delete all of the tweets containing the word "lol".
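
The per-user duplicate check described in the quoted Nov 26 message can be sketched as follows, assuming tweets arrive in time order with a Unix timestamp. It compares each new tweet only with that user's tweets from the last few days (3 here, a hypothetical value) after normalizing case and whitespace, and stops saving a user's tweets once they exceed a small duplicate budget (also hypothetical). Exact matching after normalization is a simplification: it catches a spammer cycling through a fixed list, but not the ever-changing word mixes described in the Nov 27 5:20 AM message.

```python
import re
from collections import defaultdict, deque
from typing import Deque, Dict, Set, Tuple

SECONDS_PER_DAY = 86400


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial variations still match."""
    return re.sub(r"\s+", " ", text.strip().lower())


class DuplicateFilter:
    def __init__(self, window_days: float = 3.0, max_duplicates: int = 3):
        self.window_seconds = window_days * SECONDS_PER_DAY
        self.max_duplicates = max_duplicates
        self._recent: Dict[str, Deque[Tuple[float, str]]] = defaultdict(deque)
        self._dup_counts: Dict[str, int] = defaultdict(int)
        self.blocked: Set[str] = set()

    def accept(self, user: str, text: str, timestamp: float) -> bool:
        """Return True if the tweet should be saved, False if it is dropped."""
        if user in self.blocked:
            return False
        recent = self._recent[user]
        # Forget this user's tweets that fall outside the comparison window.
        while recent and recent[0][0] < timestamp - self.window_seconds:
            recent.popleft()
        key = normalize(text)
        if any(previous == key for _, previous in recent):
            self._dup_counts[user] += 1
            if self._dup_counts[user] > self.max_duplicates:
                self.blocked.add(user)  # stop saving this user's tweets entirely
            return False
        recent.append((timestamp, key))
        return True
```

Keeping only a few days of history per user is what bounds the cost of the comparison; the same filter could also run in a separate process so it does not slow down the main parsing loop.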

Re: [twitter-dev] Trying to get rid of twitter spammers

2010-11-27 Thread Adam Green
The URLs again return a code of 200 and nothing in the content. What
happens when you try getting one of the URLs with cURL? I'm curious if
it behaves differently for an IP in Turkey.

On Sat, Nov 27, 2010 at 8:56 AM, Furkan Kuru  wrote:
> Most of the tweets here are spam:
>
> http://twitturk.com/tweet/search?q=lol

Re: [twitter-dev] Trying to get rid of twitter spammers

2010-11-27 Thread Furkan Kuru
Most of the tweets here are spam:

http://twitturk.com/tweet/search?q=lol



On Sat, Nov 27, 2010 at 3:33 PM, Adam Green <140...@gmail.com> wrote:

> All of your sample spam tweets are from suspended accounts, yet the
> tweets were only sent yesterday. That means that the spammers' behavior
> was so aggressive that they were suspended quickly by a Twitter
> algorithm. I doubt that a human at Twitter read your email and went
> through each tweet suspending the accounts. Have you checked to see
> how quickly these spam accounts get canceled for other spam tweets?
> You could hold back tweets from unknown users for 24 hours, and then
> check all new users through the API to see if they are suspended. If
> they aren't suspended, you can whitelist them in your system.
>
> What is really weird is that I also checked the URLs in these tweets
> and they resolve to an empty page. They return a header with an HTTP
> code of 200, and no content at all. That can't be an accident. Either
> they are sending empty responses to everyone, or they could tell from
> my IP that they didn't want to send anything to me. Why would a
> spammer do that? They only benefit if someone clicks on their links
> and buys something, or gets infected somehow. Could you be the subject
> of some kind of attack? You use the word "community." Would anyone
> want to disrupt your community? Is this a community that is in one
> geographic area that can be detected by IP? Very interesting...
>
> Anyway, you can use URL resolution to test new users. When you get a
> tweet from a new user with a URL, check the URL, and blacklist them if
> it resolves to an empty page. If you only have to do this for new
> users, it won't be too processor intensive.

Re: [twitter-dev] Trying to get rid of twitter spammers

2010-11-27 Thread Edward Hotchkiss
Empty URL? It resolves if the user actually clicks; I'm sure there is backend
code running. That's the only purpose of even returning a 200.

On Nov 27, 2010, at 8:33 AM, Adam Green wrote:

> All of your sample spam tweets are from suspended accounts, yet the
> tweets were only sent yesterday. That means that the spammers' behavior
> was so aggressive that they were suspended quickly by a Twitter
> algorithm. I doubt that a human at Twitter read your email and went
> through each tweet suspending the accounts. Have you checked to see
> how quickly these spam accounts get canceled for other spam tweets?
> You could hold back tweets from unknown users for 24 hours, and then
> check all new users through the API to see if they are suspended. If
> they aren't suspended, you can whitelist them in your system.
> 
> What is really weird is that I also checked the URLs in these tweets
> and they resolve to an empty page. They return a header with an HTTP
> code of 200, and no content at all. That can't be an accident. Either
> they are sending empty responses to everyone, or they could tell from
> my IP that they didn't want to send anything to me. Why would a
> spammer do that? They only benefit if someone clicks on their links
> and buys something, or gets infected somehow. Could you be the subject
> of some kind of attack? You use the word "community." Would anyone
> want to disrupt your community? Is this a community that is in one
> geographic area that can be detected by IP? Very interesting...
> 
> Anyway, you can use URL resolution to test new users. When you get a
> tweet from a new user with a URL, check the URL, and blacklist them if
> it resolves to an empty page. If you only have to do this for new
> users, it won't be too processor intensive.

Re: [twitter-dev] Trying to get rid of twitter spammers

2010-11-27 Thread Adam Green
All of your sample spam tweets are from suspended accounts, yet the
tweets were only sent yesterday. That means that the spammers' behavior
was so aggressive that they were suspended quickly by a Twitter
algorithm. I doubt that a human at Twitter read your email and went
through each tweet suspending the accounts. Have you checked to see
how quickly these spam accounts get canceled for other spam tweets?
You could hold back tweets from unknown users for 24 hours, and then
check all new users through the API to see if they are suspended. If
they aren't suspended, you can whitelist them in your system.
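A minimal sketch of the 24-hour hold-back idea, in Python. `Quarantine`, `is_suspended` and `HOLD_SECONDS` are hypothetical names: `is_suspended` stands in for whatever API user lookup reports a suspended account, and the in-memory sets would be database tables in a real system.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set, Tuple

HOLD_SECONDS = 24 * 60 * 60  # the 24-hour hold suggested above


@dataclass
class Quarantine:
    # Hypothetical hook: returns True if the account is suspended.
    # In practice this would wrap an API user lookup.
    is_suspended: Callable[[str], bool]
    whitelist: Set[str] = field(default_factory=set)
    blacklist: Set[str] = field(default_factory=set)
    # unknown user -> list of (arrival time in seconds, tweet text)
    held: Dict[str, List[Tuple[float, str]]] = field(default_factory=dict)

    def submit(self, user: str, tweet: str, now: float) -> List[str]:
        """Return the tweets that can be used right away."""
        if user in self.whitelist:
            return [tweet]
        if user in self.blacklist:
            return []
        self.held.setdefault(user, []).append((now, tweet))
        return []

    def release(self, now: float) -> List[str]:
        """Check every user whose oldest held tweet has waited HOLD_SECONDS."""
        released: List[str] = []
        for user in list(self.held):
            tweets = self.held[user]
            if now - tweets[0][0] < HOLD_SECONDS:
                continue  # not old enough yet
            del self.held[user]
            if self.is_suspended(user):
                self.blacklist.add(user)  # discard everything they sent
            else:
                self.whitelist.add(user)  # trusted from now on
                released.extend(text for _, text in tweets)
        return released


if __name__ == "__main__":
    q = Quarantine(is_suspended=lambda u: u == "spambot")
    q.submit("spambot", "lol http://example.com/x", now=0)
    q.submit("newfriend", "merhaba", now=0)
    print(q.release(now=HOLD_SECONDS))  # ['merhaba']
```

The `release` step would run periodically (for example from a cron job); only users still being held are looked up, so the API cost is limited to genuinely new accounts.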

What is really weird is that I also checked the URLs in these tweets
and they resolve to an empty page. They return a header with an HTTP
code of 200, and no content at all. That can't be an accident. Either
they are sending empty responses to everyone, or they could tell from
my IP that they didn't want to send anything to me. Why would a
spammer do that? They only benefit if someone clicks on their links
and buys something, or gets infected somehow. Could you be the subject
of some kind of attack? You use the word "community." Would anyone
want to disrupt your community? Is this a community that is in one
geographic area that can be detected by IP? Very interesting...

Anyway, you can use URL resolution to test new users. When you get a
tweet from a new user with a URL, check the URL, and blacklist them if
it resolves to an empty page. If you only have to do this for new
users, it won't be too processor intensive.
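A rough sketch of the URL check for new users, assuming Python 3's standard `urllib.request`. The function names are hypothetical, and the fetcher is injectable so the logic can be exercised offline. As noted above, the response may depend on the requester's IP, so an empty 200 seen from one host is a heuristic for this particular campaign, not proof of spam.

```python
from http.client import HTTPException
from typing import Callable, Iterable, Set, Tuple
from urllib.request import Request, urlopen

Fetcher = Callable[[str], Tuple[int, bytes]]


def http_fetch(url: str, timeout: float = 10.0) -> Tuple[int, bytes]:
    """GET the URL (urlopen follows redirects) and return (status, body)."""
    req = Request(url, headers={"User-Agent": "spam-check-sketch"})
    with urlopen(req, timeout=timeout) as resp:
        # The first 64 KiB are plenty to tell "empty" from "not empty".
        return resp.status, resp.read(65536)


def looks_like_empty_landing(url: str, fetch: Fetcher = http_fetch) -> bool:
    """True for the pattern described above: HTTP 200 and no content."""
    try:
        status, body = fetch(url)
    except (OSError, ValueError, HTTPException):
        # URLError, HTTPError and timeouts subclass OSError; a malformed URL
        # raises ValueError. Treat all of these as inconclusive, not as spam.
        return False
    return status == 200 and not body.strip()


def screen_new_user(user: str, urls: Iterable[str], blacklist: Set[str],
                    fetch: Fetcher = http_fetch) -> bool:
    """Blacklist a new user if any of their links resolves to an empty page."""
    if any(looks_like_empty_landing(u, fetch) for u in urls):
        blacklist.add(user)
        return False
    return True
```

Failures are deliberately not counted against the user, so a flaky shortener or a timeout does not blacklist a real person.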


On Sat, Nov 27, 2010 at 5:20 AM, Furkan Kuru  wrote:
> The text in these spam tweets is not easy to recognize.
> They do not repeat. They are mixed of different words and they contain a
> link.
> They seem to be sent via web.
>
> The ranking and discarding some mentions will not completely resolve the
> problem.
> Because our mention data and trending words data both were affected. We
> do not want to eliminate tweets from innocent people who have few followers.
>
> The simplest way seems to be just ignoring the tweets coming from outside of
> the community.
> But those tweets were helping us to extend our network.



-- 
Adam Green
Twitter API Consultant and Trainer
http://140dev.com
@140dev

-- 
Twitter developer documentation and resources: http://dev.twitter.com/doc
API updates via Twitter: http://twitter.com/twitterapi
Issues/

Re: [twitter-dev] Trying to get rid of twitter spammers

2010-11-27 Thread Furkan Kuru
The text in these spam tweets is not easy to recognize.
They do not repeat. They are made up of a mix of different words, and they
contain a link.
They seem to be sent via the web.

Ranking and discarding some mentions will not completely resolve the
problem, because both our mention data and our trending-words data were
affected. We do not want to eliminate tweets from innocent people who have
few followers.

The simplest way seems to be just ignoring the tweets coming from outside of
the community.
But those tweets were helping us to extend our network.



On Fri, Nov 26, 2010 at 6:42 PM, Adam Green <140...@gmail.com> wrote:

> As long as you aren't trying to capture and deliver *all* tweets,
> there are a couple of good ways to cut out spammers. One thing I do is
> save all mentions for all users in a database of tweets. When a tweet
> comes in from the streaming API, I collect @mentions, and store them
> with the screen name of the tweet's author and the screen name
> mentioned. Then I can rank users based on the number of different
> accounts that mention them. If you only use the tweets from the top N%
> of users, the quality improves a lot. I find that the top 80% is
> usually enough of a screen to get good quality.
>
> Another trick is blocking duplicates from each user. The API only
> blocks duplicates that repeat immediately, but if a spammer has a list
> of tweets, and cycles through them, all the tweets get through. I
> compare all new tweets with the other tweets from that user. This is
> very expensive if you have a big database. This can be made less
> intensive by limiting the comparison to just the tweets from that user
> in the last few days. You can also run this with a separate process
> that doesn't slow down your main tweet parsing loop. Most spammers are
> so simplistic that they just repeat the same tweet over and over. In a
> real spammy set of keywords, if I find more than a few duplicates from
> a user, I just stop saving their tweets.



-- 
Furkan Kuru



Re: [twitter-dev] Trying to get rid of twitter spammers

2010-11-26 Thread M. Edward (Ed) Borasky
Hmmm ... Twitter has a "user quality filter" that's supposed to weed
out spammers from Search and Streaming. With about 450,000 new user IDs
created every day, it might take a while for Twitter's spambot
detectors to flag them all, but I'd think that between the algorithms and
crowdsourced block / report, they'd eventually get taken out.


--
M. Edward (Ed) Borasky
http://borasky-research.net http://twitter.com/znmeb

"A mathematician is a device for turning coffee into theorems." - Paul Erdos


Quoting Adam Green <140...@gmail.com>:


As long as you aren't trying to capture and deliver *all* tweets,
there are a couple of good ways to cut out spammers. One thing I do is
save all mentions for all users in a database of tweets. When a tweet
comes in from the streaming API, I collect @mentions, and store them
with the screen name of the tweet's author and the screen name
mentioned. Then I can rank users based on the number of different
accounts that mention them. If you only use the tweets from the top N%
of users, the quality improves a lot. I find that the top 80% is
usually enough of a screen to get good quality.

Another trick is blocking duplicates from each user. The API only
blocks duplicates that repeat immediately, but if a spammer has a list
of tweets, and cycles through them, all the tweets get through. I
compare all new tweets with the other tweets from that user. This is
very expensive if you have a big database. This can be made less
intensive by limiting the comparison to just the tweets from that user
in the last few days. You can also run this with a separate process
that doesn't slow down your main tweet parsing loop. Most spammers are
so simplistic that they just repeat the same tweet over and over. In a
real spammy set of keywords, if I find more than a few duplicates from
a user, I just stop saving their tweets.




Re: [twitter-dev] Trying to get rid of twitter spammers

2010-11-26 Thread Adam Green
As long as you aren't trying to capture and deliver *all* tweets,
there are a couple of good ways to cut out spammers. One thing I do is
save all mentions for all users in a database of tweets. When a tweet
comes in from the streaming API, I collect @mentions, and store them
with the screen name of the tweet's author and the screen name
mentioned. Then I can rank users based on the number of different
accounts that mention them. If you only use the tweets from the top N%
of users, the quality improves a lot. I find that the top 80% is
usually enough of a screen to get good quality.
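One way to sketch the mention ranking described above, assuming mentions are pulled from tweet text with a regular expression (screen names of 1 to 15 letters, digits or underscores). `MentionRanker`, `extract_mentions` and `keep_fraction` are illustrative names, not part of any Twitter API, and a real system would keep the author/mentioned pairs in a database rather than in memory.

```python
import math
import re
from collections import defaultdict
from typing import Dict, Iterable, List, Set

# The lookbehind skips things like e-mail addresses ("foo@bar.com").
MENTION_RE = re.compile(r"(?<![A-Za-z0-9_])@([A-Za-z0-9_]{1,15})")


def extract_mentions(text: str) -> List[str]:
    return [m.lower() for m in MENTION_RE.findall(text)]


class MentionRanker:
    def __init__(self) -> None:
        # mentioned screen name -> distinct authors who have mentioned it
        self.mentioners: Dict[str, Set[str]] = defaultdict(set)

    def record(self, author: str, mentioned: Iterable[str]) -> None:
        author = author.lower()
        for name in mentioned:
            name = name.lower()
            if name != author:  # self-mentions do not count
                self.mentioners[name].add(author)

    def score(self, user: str) -> int:
        """Number of different accounts that have mentioned this user."""
        return len(self.mentioners.get(user.lower(), ()))

    def top_users(self, users: Iterable[str], keep_fraction: float = 0.8) -> Set[str]:
        """Keep the best-mentioned keep_fraction of users (0.8 = top 80%)."""
        ranked = sorted({u.lower() for u in users},
                        key=lambda u: (-self.score(u), u))  # ties broken by name
        return set(ranked[:math.ceil(len(ranked) * keep_fraction)])
```

Counting distinct mentioners rather than raw mentions means one spam account replying to the same person a hundred times adds only one point, while a brand-new spam account that nobody mentions sits at the bottom.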

Another trick is blocking duplicates from each user. The API only
blocks duplicates that repeat immediately, but if a spammer has a list
of tweets, and cycles through them, all the tweets get through. I
compare all new tweets with the other tweets from that user. This is
very expensive if you have a big database. This can be made less
intensive by limiting the comparison to just the tweets from that user
in the last few days. You can also run this with a separate process
that doesn't slow down your main tweet parsing loop. Most spammers are
so simplistic that they just repeat the same tweet over and over. In a
real spammy set of keywords, if I find more than a few duplicates from
a user, I just stop saving their tweets.
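A sketch of the per-user duplicate check, with the comparison limited to a sliding window of each user's recent tweets. The 3-day window, the limit of 3 duplicates, the choice to strip links before comparing, and all the names are assumptions. It only catches exact repeats after normalization, so it would not stop the non-repeating spam described elsewhere in this thread.

```python
import re
from collections import defaultdict, deque
from typing import Deque, Dict, Set, Tuple

WINDOW_SECONDS = 3 * 24 * 3600  # "the last few days": 3 days is an assumption
MAX_DUPLICATES = 3              # "more than a few": also an assumption
URL_RE = re.compile(r"https?://\S+")


def normalize(text: str) -> str:
    """Lowercase, drop links (assumed to rotate between copies), squash spaces."""
    return " ".join(URL_RE.sub("", text.lower()).split())


class DuplicateFilter:
    def __init__(self) -> None:
        # per user: recent (timestamp, normalized text), oldest first
        self.recent: Dict[str, Deque[Tuple[float, str]]] = defaultdict(deque)
        self.dup_counts: Dict[str, int] = defaultdict(int)
        self.muted: Set[str] = set()

    def accept(self, user: str, text: str, now: float) -> bool:
        """Return True if the tweet should be saved."""
        if user in self.muted:
            return False
        history = self.recent[user]
        # Only compare against this user's tweets inside the window; this is
        # what keeps the comparison cheap.
        while history and now - history[0][0] > WINDOW_SECONDS:
            history.popleft()
        key = normalize(text)
        if any(seen == key for _, seen in history):
            self.dup_counts[user] += 1
            if self.dup_counts[user] > MAX_DUPLICATES:
                self.muted.add(user)  # stop saving this user's tweets
            return False
        history.append((now, key))
        return True
```

Because `accept` only touches one user's window, it could also run in a separate worker process fed from a queue, as suggested above, without slowing the main parsing loop.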


On Fri, Nov 26, 2010 at 11:26 AM, Furkan Kuru  wrote:
>
> Word "lol" is the most common in these spam tweets. We receive 400 spam
> tweets per hour now tracking 100K people.
>
> We plan to delete all of the tweets containing "lol" word. It is also used
> by our users (Turkish people) writing in English though.
>
> Any better suggestions?
>

-- 
Adam Green
Twitter API Consultant and Trainer
http://140dev.com
@140dev



RE: [twitter-dev] Trying to get rid of twitter spammers

2010-11-26 Thread Dean Collins
Hmmm, I don't think that would work: I type lol in my @DeanCollins
personal posts a lot :-)

Cheers,

Dean

From: twitter-development-talk@googlegroups.com
[mailto:twitter-development-t...@googlegroups.com] On Behalf Of Furkan
Kuru
Sent: Friday, 26 November 2010 11:27 AM
To: twitter-development-talk@googlegroups.com
Subject: Re: [twitter-dev] Trying to get rid of twitter spammers

Word "lol" is the most common in these spam tweets. We receive 400 spam
tweets per hour now tracking 100K people.

We plan to delete all of the tweets containing "lol" word. It is also
used by our users (Turkish people) writing in English though.

Any better suggestions?



On Fri, Nov 26, 2010 at 5:15 PM, Dean Collins 
wrote:

What I don't understand is why, apart from possibly generating clicks,
people are doing this. Are enough clicks converting into some kind of ROI
interaction that makes them money?

I keep expecting SPAM to take some kind of evolutionary leap (customized
to your location/interests/cookies etc) but it seems to be the same old
click requests.

Cheers,

Dean

From: twitter-development-talk@googlegroups.com
[mailto:twitter-development-t...@googlegroups.com] On Behalf Of Furkan
Kuru
Sent: Friday, 26 November 2010 6:02 AM
To: twitter-development-talk@googlegroups.com
Subject: [twitter-dev] Trying to get rid of twitter spammers

Hello,

I think there is a spam campaign that uses a large number of Twitter
accounts to tweet at people by mentioning their usernames in replies.

We receive thousands of similar spam tweets through the streaming API,
written as replies to the users we follow.
It spoils our data.

The tweets seem to be sent from the web, not via a Twitter app.

Here are a few examples.

@kaanalay JobsCDFSales forevertravis RT ITS_NEL Discover lies from
RonnieMo "I'll come visit you" ..lol http://bit.isff.com/3PoCt
26/11/10 12:49:01 http://twitter.com/P_Lobrayy/status/8109946705027073

@serkan_cakmak FREE!! before i have be mean/rude lol RT dreaontv:
odotjdot *slides the Wrap it Up button ur way* http://fplk.c2.my/Yl4qz
26/11/10 12:49:01 http://twitter.com/ivtaathjathra/status/8109939918639105

@aralgamze thiagomaciell mey2734 RT KokaMoe88: i wanna have sex .. right
now at this moment || let's go lol http://wbx.c4.ee/v5QtU
26/11/10 12:49:01 http://twitter.com/qoorgeees/status/8109930166878208

@kkocaerkek huh lol RT XxLovinJessixX: HELLL NOOO!!! I THATS POISON! RT
:YUCKK -__- how about chipotle:) evebayby http://wmfi.l.to/VPkw5
26/11/10 12:49:01 http://twitter.com/fuaneledes/status/8109920641617920

@salihturan Niekstra 333TtJJ Fleegz RT PoetryNMoshun: SimplyMilele lol
even the conscious got to love f*cking.. http://xllo.6p.ro/JPfIL
26/11/10 12:49:01 http://twitter.com/rahaelrilt/status/8109887489839104

@nlyshn carynfust5 Bieberbananzaaa LOL!! RT firstlady47: FAMU= Nene's old
nose, bcc= Nene's new clothespin nose http://tlny.1k.ru/IbUpy
26/11/10 12:49:01 http://twitter.com/brafh/status/8109862101716992

@zehra_ozcan D88Miller GibsGaldino RT I_DOLLA: Kim lol RT BigHomie_: Nicki
Minaj or Lil Kim in a fight WhoYaGot http://oyu.iz.rs/fGwaG
26/11/10 12:49:01 http://twitter.com/YrnbAdi_Dhaama/status/8109813330345984

@I5IL sexspeaking a shit. So... If ya can't beat 'em, join 'em. RT
The100KShow: LadyBlogga lol you endorsing that! http://nofj.hn.cx/r1jvr
26/11/10 12:49:01 http://twitter.com/dqbajBSB/status/8109804488753152

@Melek_Ulker nciku honeku Pompam1016 RT KnockOWTdiva: Rhianna sounds like
a lamb$$ lol on what song? http://gux.ah.sg/xlzaw
26/11/10 12:49:01 http://twitter.com/ManiSvitheick/status/8109799736614912



-- 
Furkan Kuru


Re: [twitter-dev] Trying to get rid of twitter spammers

2010-11-26 Thread Furkan Kuru
The word "lol" is the most common word in these spam tweets. We now receive
400 spam tweets per hour while tracking 100K people.

We plan to delete all tweets containing the word "lol", although our own
users (Turkish people) also use it when writing in English.

Any better suggestions?
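Rather than deleting every tweet that contains "lol", a narrower rule could require "lol" to co-occur with the template that every example in this thread follows: a leading @reply, an embedded "RT" of someone else's text, and a link as the last token. Below is a minimal Python sketch, assuming these four features (taken only from the examples posted here, not from any Twitter-side signal) are enough to separate the spam; the function name `looks_like_lol_spam` is hypothetical.

```python
import re

# Hypothetical heuristic: every pattern below comes from the example spam
# tweets in this thread, not from any classifier Twitter provides.
LOL_RE = re.compile(r"\blol\b", re.IGNORECASE)   # "lol", "LOL!!", "..lol"
LEADING_MENTION_RE = re.compile(r"^@\w+")        # tweet starts as an @reply
RETWEET_RE = re.compile(r"\bRT\b")               # embeds someone else's text
TRAILING_URL_RE = re.compile(r"https?://\S+\s*$")  # link is the last token


def looks_like_lol_spam(text: str) -> bool:
    """Flag a tweet only when "lol" appears together with the whole spam
    template, so ordinary replies that merely say "lol" are kept."""
    return all(
        pattern.search(text)
        for pattern in (LOL_RE, LEADING_MENTION_RE, RETWEET_RE, TRAILING_URL_RE)
    )


if __name__ == "__main__":
    spam = ('@kaanalay JobsCDFSales forevertravis RT ITS_NEL Discover lies '
            'from RonnieMo "I\'ll come visit you" ..lol http://bit.isff.com/3PoCt')
    print(looks_like_lol_spam(spam))                           # True
    print(looks_like_lol_spam("@friend lol that was great"))   # False
```

The design choice is to trade recall for precision: a spammer who drops "RT" or the trailing link would slip through, but a Turkish user replying "lol" in English would no longer be deleted.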


On Fri, Nov 26, 2010 at 5:15 PM, Dean Collins  wrote:

>  What I don’t understand is this: apart from possibly generating clicks,
> why are people doing this? Are enough clicks converting into some kind of
> ROI interaction that makes them money?
>
> I keep expecting spam to take some kind of evolutionary leap (customized to
> your location/interests/cookies, etc.), but it seems to be the same old
> click requests.
>
> Cheers,
>
> Dean
>



-- 
Furkan Kuru


RE: [twitter-dev] Trying to get rid of twitter spammers

2010-11-26 Thread Dean Collins
What I don't understand is this: apart from possibly generating clicks,
why are people doing this? Are enough clicks converting into some kind
of ROI interaction that makes them money?

I keep expecting spam to take some kind of evolutionary leap (customized
to your location/interests/cookies, etc.), but it seems to be the same
old click requests.

Cheers,

Dean

From: twitter-development-talk@googlegroups.com
[mailto:twitter-development-t...@googlegroups.com] On Behalf Of Furkan Kuru
Sent: Friday, 26 November 2010 6:02 AM
To: twitter-development-talk@googlegroups.com
Subject: [twitter-dev] Trying to get rid of twitter spammers

Hello,

I think there is a spam campaign that uses a large number of Twitter
accounts to tweet at people by mentioning their usernames, sending the
tweets as replies.

Through the Streaming API we receive thousands of similar spam tweets
written as replies to the users we follow. They spoil our data.

The tweets seem to be sent from the web, not via a Twitter app.
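The two observations above (replies aimed at tracked users, posted from the web) can be checked against tweet metadata instead of tweet text. Below is a minimal Python sketch, assuming each `status` is one decoded tweet from the v1 Streaming API JSON with its `in_reply_to_user_id`, `source` and `user.friends_count` fields, and that web-posted tweets carry the source value "web". The "follows nobody" condition comes from the later observation in this thread that the spam accounts follow no one; the function name and the exact conditions are hypothetical.

```python
from typing import Any, Mapping, Set


def is_suspicious_reply(status: Mapping[str, Any], followed_ids: Set[int]) -> bool:
    """Hypothetical filter: True for a reply to one of our tracked users that
    was posted from the Twitter website by an account following nobody."""
    # Reply aimed at a user we track through the Streaming API.
    is_reply_to_tracked = status.get("in_reply_to_user_id") in followed_ids
    # Assumption: tweets posted on twitter.com report their source as "web".
    posted_from_web = status.get("source") == "web"
    # Assumption: a friends_count of exactly 0 means the author follows nobody;
    # a missing count is treated as unknown, not as zero.
    user = status.get("user") or {}
    follows_nobody = user.get("friends_count") == 0
    return is_reply_to_tracked and posted_from_web and follows_nobody


if __name__ == "__main__":
    spam = {"in_reply_to_user_id": 42, "source": "web", "user": {"friends_count": 0}}
    print(is_suspicious_reply(spam, {42}))  # True
```

Requiring all three conditions keeps real users who reply from the website but follow other accounts out of the filter; it could be combined with a text rule or used only to down-rank rather than delete.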

Here are a few examples.

@kaanalay  JobsCDFSales forevertravis RT
ITS_NEL Discover lies from RonnieMo "I'll come visit you" ..lol
http://bit.isff.com/3PoCt
26/11/10 12:49:01

@serkan_cakmak  FREE!! before i have
be mean/rude lol RT dreaontv: odotjdot *slides the Wrap it Up button ur
way* http://fplk.c2.my/Yl4qz
26/11/10 12:49:01

@aralgamze  thiagomaciell mey2734 RT
KokaMoe88: i wanna have sex .. right now at this moment || let's go lol
http://wbx.c4.ee/v5QtU
26/11/10 12:49:01

@kkocaerkek  huh lol RT XxLovinJessixX:
HELLL NOOO!!! I THATS POISON! RT :YUCKK -__- how about chipotle:)
evebayby http://wmfi.l.to/VPkw5
26/11/10 12:49:01

@salihturan  Niekstra 333TtJJ Fleegz RT
PoetryNMoshun: SimplyMilele lol even the conscious got to love f*cking..
http://xllo.6p.ro/JPfIL
26/11/10 12:49:01

@nlyshn  carynfust5 Bieberbananzaaa LOL!! RT
firstlady47: FAMU= Nene's old nose, bcc= Nene's new clothespin nose
http://tlny.1k.ru/IbUpy
26/11/10 12:49:01

@zehra_ozcan  D88Miller GibsGaldino RT
I_DOLLA: Kim lol RT BigHomie_: Nicki Minaj or Lil Kim in a fight
WhoYaGot http://oyu.iz.rs/fGwaG
26/11/10 12:49:01

@I5IL  sexspeaking a shit. So... If ya can't
beat 'em, join 'em. RT The100KShow: LadyBlogga lol you endorsing that!
http://nofj.hn.cx/r1jvr
26/11/10 12:49:01

@Melek_Ulker  nciku honeku Pompam1016
RT KnockOWTdiva: Rhianna sounds like a lamb$$ lol on what song?
http://gux.ah.sg/xlzaw
26/11/10 12:49:01

-- 
Furkan Kuru
