Re: [Wikitech-l] suggestion: replace CAPTCHA with better approaches

2012-07-25 Thread Daniel Friesen
On Wed, 25 Jul 2012 17:55:06 -0700, John Vandenberg wrote:

> On Thu, Jul 26, 2012 at 5:00 AM, Derric Atzrott wrote:
>>> This way, if people feel motivated to cheat at the CAPTCHA, they will end up
>>> helping Wikipedia. It is up to us to try to balance things out.
>>>
>>> I'm pretty sure users will be less annoyed at solving CAPTCHAs that
>>> actually contribute some value.
>>
>> Obligatory XKCD: https://xkcd.com/810/
>
> ;-)
>
>> The best CAPTCHAs are the kind that do this. Look at how hard it is to beat
>> reCAPTCHA because they have taken this approach. One must be careful, though,
>> that the CAPTCHA is constructed so that it isn't as simple as a lookup
>> and actually requires some thought (which probably eliminates
>> the noun, verb, adjective idea).
>>
>> This idea has my support.
>
> We should use fewer CAPTCHAs.
>
> If the problem is spam, we should build better "new URL" review
> systems. There are externally managed spam lists that we could use to
> identify spammers.
>
> "New URLs" could be defined as domain names that have been in the
> external links table for less than 24 hours, if at all.
>
> Addition of these new URLs could be smartly throttled.
>
> Edits by non-autoconfirmed users which include new URLs could be throttled
> so that a new URL can only be added to a single article for the first 24
> hours. That allows a new user to make use of a new domain name
> unimpeded; however, they can only use it on one page for the first 24
> hours. If the new URL was spam, it will hopefully be removed within 24
> hours, which resets the clock for the spammer, i.e. they can only add
> the spam to one page every 24 hours.
>
> Another idea is for the wiki to ask the user who adds new URLs to
> review three recent edits that included new URLs and indicate whether
> or not each new URL was spam and should be removed.
> This may be unworkable because a spam-bot could use the linksearch
> tool to check whether a link is good or not.
>
> --
> John Vandenberg

Your proposal fails to account for two important facts:
- A lot of spam may not even add links to the page.
- Don't underestimate bot programming. I've seen bots in the wild that
  wait for autoconfirmed status and then spam. If there is some pattern that
  can be followed to gain the access needed to spam the wiki, bots will be
  programmed to use that pattern to bypass spam limits.


--
~Daniel Friesen (Dantman, Nadir-Seen-Fire) [http://daniel.friesen.name]

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


Re: [Wikitech-l] suggestion: replace CAPTCHA with better approaches

2012-07-25 Thread John Vandenberg
On Thu, Jul 26, 2012 at 5:00 AM, Derric Atzrott wrote:
>> This way, if people feel motivated to cheat at the CAPTCHA, they will end up
>> helping Wikipedia. It is up to us to try to balance things out.
>>
>> I'm pretty sure users will be less annoyed at solving CAPTCHAs that
>> actually contribute some value.
>
> Obligatory XKCD: https://xkcd.com/810/

;-)

> The best CAPTCHAs are the kind that do this. Look at how hard it is to beat
> reCAPTCHA because they have taken this approach. One must be careful, though,
> that the CAPTCHA is constructed so that it isn't as simple as a lookup
> and actually requires some thought (which probably eliminates
> the noun, verb, adjective idea).
>
> This idea has my support.

We should use fewer CAPTCHAs.

If the problem is spam, we should build better "new URL" review
systems. There are externally managed spam lists that we could use to
identify spammers.

"New URLs" could be defined as domain names that have been in the
external links table for less than 24 hours, if at all.

Addition of these new URLs could be smartly throttled.

Edits by non-autoconfirmed users which include new URLs could be throttled
so that a new URL can only be added to a single article for the first 24
hours. That allows a new user to make use of a new domain name
unimpeded; however, they can only use it on one page for the first 24
hours. If the new URL was spam, it will hopefully be removed within 24
hours, which resets the clock for the spammer, i.e. they can only add
the spam to one page every 24 hours.
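A minimal sketch of the throttle described above, assuming an in-memory record of when each domain first appeared in the external links table and which pages it has been added to. The class `NewUrlThrottle`, the `autoconfirmed` flag and the `remove_all` hook are hypothetical illustrations, not MediaWiki APIs, and the external spam lists are left out.

```python
from urllib.parse import urlparse

NEW_URL_WINDOW = 24 * 60 * 60  # seconds; the 24-hour window from the proposal


class NewUrlThrottle:
    """Hypothetical throttle: a domain counts as "new" until it has been in
    the external links table for more than NEW_URL_WINDOW seconds."""

    def __init__(self):
        self.first_seen = {}  # domain -> timestamp it first appeared
        self.pages = {}       # domain -> set of pages it was added to

    def is_new(self, domain, now):
        seen = self.first_seen.get(domain)
        return seen is None or now - seen <= NEW_URL_WINDOW

    def allow(self, url, page, autoconfirmed, now):
        domain = urlparse(url).hostname or ""
        if autoconfirmed or not self.is_new(domain, now):
            return True
        used_on = self.pages.get(domain, set())
        # While new, the domain may appear on at most one page.
        return not used_on or used_on == {page}

    def record(self, url, page, now):
        # Called after an edit adding the URL is saved.
        domain = urlparse(url).hostname or ""
        self.first_seen.setdefault(domain, now)
        self.pages.setdefault(domain, set()).add(page)

    def remove_all(self, url):
        # Hypothetical hook for when every copy of the link is removed:
        # forgetting the domain resets the spammer's clock.
        domain = urlparse(url).hostname or ""
        self.first_seen.pop(domain, None)
        self.pages.pop(domain, None)
```

Keying on the hostname rather than the full URL means a spammer cannot escape the throttle just by varying the path.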

Another idea is for the wiki to ask the user who adds new URLs to
review three recent edits that included new URLs and indicate whether
or not each new URL was spam and should be removed.
This may be unworkable because a spam-bot could use the linksearch
tool to check whether a link is good or not.

--
John Vandenberg



Re: [Wikitech-l] suggestion: replace CAPTCHA with better approaches

2012-07-25 Thread Derric Atzrott
>This way, if people feel motivated to cheat at the CAPTCHA, they will end up
>helping Wikipedia. It is up to us to try to balance things out.
>
>I'm pretty sure users will be less annoyed at solving CAPTCHAs that
>actually contribute some value.

Obligatory XKCD: https://xkcd.com/810/

The best CAPTCHAs are the kind that do this. Look at how hard it is to beat
reCAPTCHA because they have taken this approach. One must be careful, though,
that the CAPTCHA is constructed so that it isn't as simple as a lookup
and actually requires some thought (which probably eliminates
the noun, verb, adjective idea).

This idea has my support.

Thank you,
Derric Atzrott




Re: [Wikitech-l] suggestion: replace CAPTCHA with better approaches

2012-07-25 Thread Oren Bochman
Hi

Wikipedia's CAPTCHA is a great opportunity for getting '''useful''' work
done by humans.
This is now called a [[game with a purpose]].

I think we can ideally use it to help:
* OCR Wikisource text, like reCAPTCHA does
* translate article fragments, using the geo-location of editors
  Translate [xyz-known] [...]
  Translate [xyz-new] [...]
  check using the BLEU metric, etc.
* get more opinions on spam edits
  Is this diff [spam] [good faith edit] [ok]
* collect linguistic information on the different language editions
  Is XYZ a [verb] / [noun] / [adjective] ... [other]
* disambiguate
  Is [xyz-known] [xyz] ... [xyz] ... [xyz] ...
  Is [yzx-unknown] [yzx1] ... [yzx1] ... [yzx1] ...
etc.
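The translation check relies on the BLEU metric. Here is a minimal sentence-level sketch, under several assumptions: whitespace tokenisation, a single reference, only unigrams and bigrams (short CAPTCHA fragments rarely share 4-grams), and no smoothing. `sentence_bleu` is a hypothetical helper here, not a library function.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(candidate, reference, max_n=2):
    """Simplified BLEU: geometric mean of clipped n-gram precisions,
    multiplied by a brevity penalty. Returns a score in [0, 1]."""
    cand = candidate.lower().split()
    ref = reference.lower().split()
    if not cand:
        return 0.0
    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts = ngrams(cand, n)
        ref_counts = ngrams(ref, n)
        total = sum(cand_counts.values())
        # Clip each candidate n-gram count by its count in the reference.
        matched = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        if total == 0 or matched == 0:
            return 0.0  # no smoothing: any zero precision gives zero
        log_precisions.append(math.log(matched / total))
    # Brevity penalty: candidates shorter than the reference are penalised.
    bp = 1.0 if len(cand) > len(ref) else math.exp(1 - len(ref) / len(cand))
    return bp * math.exp(sum(log_precisions) / max_n)
```

For example, "the house of the granny" against "the house of the grandmother" scores about 0.77. A real check would need a threshold and ideally several references.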

This way, if people feel motivated to cheat at the CAPTCHA, they will end up
helping Wikipedia.
It is up to us to try to balance things out.

I'm pretty sure users will be less annoyed at solving CAPTCHAs that actually
contribute some value.



-----Original Message-----
From: wikitech-l-boun...@lists.wikimedia.org
[mailto:wikitech-l-boun...@lists.wikimedia.org] On Behalf Of matanya
Sent: Tuesday, July 24, 2012 4:12 PM
To: wikitech-l@lists.wikimedia.org
Subject: [Wikitech-l] suggestion: replace CAPTCHA with better approaches

For the last few months, the rate of spam that stewards deal with has been rising.
I suggest we implement a new mechanism:

Instead of giving the user a CAPTCHA to solve, give him an image from Commons
and ask him to add a brief description in his own language.

We can give him two images, one with a known description and the other with
an unknown one. After enough users translate the unknown one the same way, we
can use it as a verified translation. We use the known image's description to
decide whether to allow the user to create the account.

Is it possible to embed a file from Commons in the login page? Is it
possible to parse the entered text and store it?

Benefits:

A) It would be harder for bots to create automated accounts.

B) We will get translations into many languages with little effort from the
users signing up.

What do you think?





Re: [Wikitech-l] suggestion: replace CAPTCHA with better approaches

2012-07-24 Thread Peter Gehres
On Tue, Jul 24, 2012 at 4:18 PM, Steven Walling wrote:

> On Tue, Jul 24, 2012 at 2:52 PM, Leslie Carr  wrote:
>
> > I agree that better tools and non-CAPTCHA-based tech are the way to go.
> >
> > At a previously very-spammed company, we learned that no matter how
> > badly you distort the CAPTCHAs, it doesn't help: if it's human
> > readable, humans can pick out the text. Look how cheap it is for
> > spammers to get a human to do their CAPTCHAs!
> > http://decaptchablog.com/decaptcher-services
> >
> > Technical/social solutions, such as helping the community patrol and
> > catch spam and automated detection of spammy language, are the way to
> > go.
> >
> > Leslie
> >
> > P.S. This is my own personal opinion and not the opinion of the
> > Foundation.
> > P.P.S. I also vote for any proposal which increases the number of
> > kittens I get to view on a daily basis.
> >
>
> What about a honey pot?
>
> https://en.wikipedia.org/wiki/Honeypot_(computing)
>
> Steven


Tarpits are so much more fun.

http://en.wikipedia.org/wiki/Tarpit_(networking)
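A tarpit slows suspected abusers down instead of blocking them outright. Below is a minimal sketch of that idea applied to wiki requests. It assumes some upstream check already flags a request as suspicious. The `Tarpit` class, its base delay and its cap are illustrative values and do not describe an existing MediaWiki feature.

```python
import time


class Tarpit:
    """Hypothetical tarpit: each flagged request from a client doubles the
    delay imposed on that client's responses, up to a cap."""

    def __init__(self, base_delay=0.5, max_delay=60.0):
        self.base_delay = base_delay
        self.max_delay = max_delay
        self.strikes = {}  # client id (e.g. an IP address) -> flagged requests

    def flag(self, client):
        self.strikes[client] = self.strikes.get(client, 0) + 1

    def delay_for(self, client):
        n = self.strikes.get(client, 0)
        if n == 0:
            return 0.0
        return min(self.base_delay * 2 ** (n - 1), self.max_delay)

    def respond(self, client, handler, sleep=time.sleep):
        # Stall before serving; clients with no strikes pay nothing.
        sleep(self.delay_for(client))
        return handler()
```

The `sleep` parameter is injectable so the delay can be tested without waiting. A production tarpit would hold connections asynchronously rather than blocking a worker.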


Re: [Wikitech-l] suggestion: replace CAPTCHA with better approaches

2012-07-24 Thread Daniel Friesen
On Tue, 24 Jul 2012 16:18:43 -0700, Steven Walling wrote:

> On Tue, Jul 24, 2012 at 2:52 PM, Leslie Carr wrote:
>
>> I agree that better tools and non-CAPTCHA-based tech are the way to go.
>>
>> At a previously very-spammed company, we learned that no matter how
>> badly you distort the CAPTCHAs, it doesn't help: if it's human
>> readable, humans can pick out the text. Look how cheap it is for
>> spammers to get a human to do their CAPTCHAs!
>> http://decaptchablog.com/decaptcher-services
>>
>> Technical/social solutions, such as helping the community patrol and
>> catch spam and automated detection of spammy language, are the way to
>> go.
>>
>> Leslie
>>
>> P.S. This is my own personal opinion and not the opinion of the
>> Foundation.
>>
>> P.P.S. I also vote for any proposal which increases the number of
>> kittens I get to view on a daily basis.
>
> What about a honey pot?
>
> https://en.wikipedia.org/wiki/Honeypot_(computing)
>
> Steven

Time and time again that's something I've felt like trying to build.

--
~Daniel Friesen (Dantman, Nadir-Seen-Fire) [http://daniel.friesen.name]



Re: [Wikitech-l] suggestion: replace CAPTCHA with better approaches

2012-07-24 Thread Steven Walling
On Tue, Jul 24, 2012 at 2:52 PM, Leslie Carr  wrote:

> I agree that better tools and non-CAPTCHA-based tech are the way to go.
>
> At a previously very-spammed company, we learned that no matter how
> badly you distort the CAPTCHAs, it doesn't help: if it's human
> readable, humans can pick out the text. Look how cheap it is for
> spammers to get a human to do their CAPTCHAs!
> http://decaptchablog.com/decaptcher-services
>
> Technical/social solutions, such as helping the community patrol and
> catch spam and automated detection of spammy language, are the way to
> go.
>
> Leslie
>
> P.S. This is my own personal opinion and not the opinion of the Foundation.
> P.P.S. I also vote for any proposal which increases the number of
> kittens I get to view on a daily basis.
>

What about a honey pot?

https://en.wikipedia.org/wiki/Honeypot_(computing)
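On a signup or edit form, one common kind of honeypot is a field hidden from humans with CSS. Naive bots fill it in anyway. A second cheap signal is a form submitted faster than a person could type. Here is a minimal sketch, assuming hypothetical field names `website` and `form_rendered_at`. It is not an existing MediaWiki extension API.

```python
import time

HONEYPOT_FIELD = "website"  # hypothetical: rendered with display:none
MIN_FILL_SECONDS = 3.0      # hypothetical: faster than this looks automated


def looks_like_bot(form, now=None):
    """Return True if a submitted form (a dict of fields) trips the
    honeypot checks. `form_rendered_at` is assumed to be a timestamp
    embedded when the form was served; a real deployment would sign it
    so that bots cannot forge it."""
    now = time.time() if now is None else now
    if form.get(HONEYPOT_FIELD, "").strip():
        return True  # humans never see this field, so it should stay empty
    rendered = form.get("form_rendered_at")
    if rendered is None:
        return True  # scripted submission that skipped the served form
    return now - float(rendered) < MIN_FILL_SECONDS
```

Like CAPTCHAs, this only stops generic bots. A bot written specifically for the wiki would learn to leave the field empty.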

Steven


Re: [Wikitech-l] suggestion: replace CAPTCHA with better approaches

2012-07-24 Thread Leslie Carr

>> Can you provide references?
>> What is the basis of the spam/work to do? Maybe we could make their
>> lives easier by creating a new tool, or better anti-spam measures.
>>

> But lastly, there is a very important fact about CAPTCHA cracking you're
> missing: human-aided CAPTCHA cracking already exists. No matter how hard
> you make it for a computer to understand CAPTCHAs, there are already bots
> breaking them by sending the CAPTCHA back to some home server, either
> giving it to a badly paid Turk worker or tricking some person who wants
> to look at porn into solving it, then sending the solution back to the
> bot and breaking through the CAPTCHA.
>
> At this point, "Pick the kitten" CAPTCHAs will be less vulnerable than this
> proposal. (And even those can be broken with a human in the mix.)
>
>

I agree that better tools and non-CAPTCHA-based tech are the way to go.

At a previously very-spammed company, we learned that no matter how
badly you distort the CAPTCHAs, it doesn't help: if it's human
readable, humans can pick out the text. Look how cheap it is for
spammers to get a human to do their CAPTCHAs!
http://decaptchablog.com/decaptcher-services

Technical/social solutions, such as helping the community patrol and
catch spam and automated detection of spammy language, are the way to
go.
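One illustration of "automated detection of spammy language" is a multinomial naive Bayes classifier with add-one (Laplace) smoothing. The sketch below is minimal and assumes crude regex tokenisation. The training snippets in the usage example are made up; a real filter would need far more data.

```python
import math
import re
from collections import Counter


def tokenize(text):
    return re.findall(r"[a-z0-9']+", text.lower())


class NaiveBayesSpamFilter:
    """Multinomial naive Bayes over word counts with Laplace smoothing.
    Both classes must receive at least one training document."""

    def __init__(self):
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        self.doc_counts = Counter()

    def train(self, text, label):
        self.doc_counts[label] += 1
        self.word_counts[label].update(tokenize(text))

    def spam_log_odds(self, text):
        vocab = set(self.word_counts["spam"]) | set(self.word_counts["ham"])
        total_docs = sum(self.doc_counts.values())
        scores = {}
        for label in ("spam", "ham"):
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(vocab)
            # Log prior plus the smoothed log-likelihood of each token.
            score = math.log(self.doc_counts[label] / total_docs)
            for word in tokenize(text):
                score += math.log((counts[word] + 1) / denom)
            scores[label] = score
        return scores["spam"] - scores["ham"]

    def is_spam(self, text):
        return self.spam_log_odds(text) > 0
```

Usage: train on a couple of labelled edit summaries per class, then call `is_spam("buy cheap pills")`. Scores are kept in log space so that long texts do not underflow.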

Leslie

P.S. This is my own personal opinion and not the opinion of the Foundation.
P.P.S. I also vote for any proposal which increases the number of
kittens I get to view on a daily basis.



-- 
Leslie Carr
Wikimedia Foundation
AS 14907, 43821
http://as14907.peeringdb.com/



Re: [Wikitech-l] suggestion: replace CAPTCHA with better approaches

2012-07-24 Thread Daniel Friesen
On Tue, 24 Jul 2012 08:08:23 -0700, Platonides wrote:

> On 24/07/12 16:11, matanya wrote:
>> For the last few months, the rate of spam that stewards deal with has been rising.
>
> Can you provide references?
> What is the basis of the spam/work to do? Maybe we could make their
> lives easier by creating a new tool, or better anti-spam measures.
>
>> I suggest we implement a new mechanism:
>>
>> Instead of giving the user a CAPTCHA to solve, give him an image from Commons
>> and ask him to add a brief description in his own language.
>>
>> We can give him two images, one with a known description and the other with
>> an unknown one. After enough users translate the unknown one the same way, we
>> can use it as a verified translation. We use the known image's description to
>> decide whether to allow the user to create the account.
>
> What if the known image isn't described the same way?
> Even assuming that we provide them the English translation (so that they
> know what it is, e.g. it's not a "house" but the Royal Palace of XYZ!),
> and that all our users understand English well enough to make a
> translation, not all translations will be the same.
> Suppose we get these different results: Fairytale graphic illustration,
> House of the grandmother of Little Red Riding Hood, House of Little Red
> Riding Hood granny, Picture of Perrault fairytale about Little Red
> Riding Hood, Image of Little Red Riding Hood story from Grimms' Fairy Tales.
>
> It wouldn't be that bad to have differing _proposed descriptions_. But
> not so for the check-description, where you would need to guess how it
> was translated previously by others (even if all translations were fair
> and accurate, with no misspellings at all).
>
>> Is it possible to embed a file from Commons in the login page? Is it
>> possible to parse the entered text and store it?
>
> Yes and yes (dedicating some effort to make it happen, of course).
>
>> Benefits:
>>
>> A) It would be harder for bots to create automated accounts.
>>
>> B) We will get translations into many languages with little effort from
>> the users signing up.
>>
>> What do you think?
>
> I agree with (B) in that we would get many translations (although
> probably low-quality ones).
> I am not so sure about (A). If the accounts are being created by bots,
> the CAPTCHA should be changed to stop them and/or new mechanisms (such as
> throttles) created.
> If they are handmade, I see little difference from a spammer's POV. Making
> up a description is harder than typing a word, but we would need to dumb
> down the process, so not a big difference. And in little time they would
> learn how to game the system.
>
> As for moving it forward, I think learning from entered values should
> be done in a generic way, and then the "reCAPTCHA" proposal for helping
> Wikisource implemented. Your idea could be added later on (I see those
> flaws, though).
>
> Regards

I see flaws in this idea.

Firstly, all images are going to need pre-existing text we can match against
for this to work. We can't just go purely off of what is entered into the
CAPTCHA, because that means the first few users who are given an image will
have no text to match against, and we're going to annoy a lot of users by
giving false negatives and rejecting edits from valid users.


Given that we will already have the text out in the open, this CAPTCHA will
be trivial to attack. It will be a fairly trivial job to iterate over all of
the Commons images and save a hash of each image, along with a description
extracted from its image page, into a key/value database.
After that, when the bot is served an image, it would hash the image, look
up the value for that hash key in its own database, and then use that
description to bypass the CAPTCHA.
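The attack described above amounts to a precomputed lookup table. This minimal sketch assumes the bot has already downloaded each image's bytes and description; the sample bytes in the usage example are placeholders. It also shows the limit of an exact cryptographic hash: changing a single byte of the served image breaks the match.

```python
import hashlib


def build_table(images):
    """images: iterable of (image_bytes, description) pairs scraped beforehand."""
    return {hashlib.sha256(data).hexdigest(): desc for data, desc in images}


def solve(table, served_image):
    # Look up the served image's hash; None means the bot has no answer.
    return table.get(hashlib.sha256(served_image).hexdigest())
```

Because any distortion changes the exact digest, a determined attacker would switch to a perceptual fingerprint, as the next paragraph argues.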


Even if you were to try resizing and slightly distorting an image, I don't
think it would help much. Image comparison is much easier than the task of
extracting text from an image. Bots are liable to have access to some
algorithm for creating fingerprints of images that will match even after
you make slight changes to them.
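A well-known example of such a fingerprint is the "average hash". It shrinks the image to a tiny grayscale grid and records one bit per cell: whether that cell is brighter than the mean. Below is a minimal pure-Python sketch. It assumes the image is already decoded into a 2-D list of grayscale values whose height and width are multiples of the hash size; a real bot would decode and resize with an imaging library.

```python
def average_hash(pixels, size=8):
    """pixels: 2-D list of grayscale values (0-255) whose height and width
    are multiples of `size`. Returns a size*size-bit integer fingerprint."""
    h, w = len(pixels), len(pixels[0])
    bh, bw = h // size, w // size
    # Downscale by averaging each bh x bw block.
    cells = []
    for by in range(size):
        for bx in range(size):
            block = [pixels[y][x]
                     for y in range(by * bh, (by + 1) * bh)
                     for x in range(bx * bw, (bx + 1) * bw)]
            cells.append(sum(block) / len(block))
    mean = sum(cells) / len(cells)
    # One bit per cell: is it brighter than the average?
    bits = 0
    for value in cells:
        bits = (bits << 1) | (1 if value > mean else 0)
    return bits


def hamming(a, b):
    """Number of differing bits; small distances mean 'same image'."""
    return bin(a ^ b).count("1")
```

Usage: brightening every pixel of a test image by 3 and nudging a single pixel leaves the hash unchanged (distance 0), while the inverted image differs in all 64 bits. A bot would accept a lookup match below some small distance threshold.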


But lastly, there is a very important fact about CAPTCHA cracking you're
missing: human-aided CAPTCHA cracking already exists. No matter how hard
you make it for a computer to understand CAPTCHAs, there are already bots
breaking them by sending the CAPTCHA back to some home server, either
giving it to a badly paid Turk worker or tricking some person who wants
to look at porn into solving it, then sending the solution back to the
bot and breaking through the CAPTCHA.


At this point, "Pick the kitten" CAPTCHAs will be less vulnerable than this
proposal. (And even those can be broken with a human in the mix.)


--
~Daniel Friesen (Dantman, Nadir-Seen-Fire) [http://daniel.friesen.name]



Re: [Wikitech-l] suggestion: replace CAPTCHA with better approaches

2012-07-24 Thread Platonides
On 24/07/12 16:11, matanya wrote:
> For the last few months, the rate of spam that stewards deal with has been rising.

Can you provide references?
What is the basis of the spam/work to do? Maybe we could make their
lives easier by creating a new tool, or better anti-spam measures.


> I suggest we implement a new mechanism:
> 
> Instead of giving the user a CAPTCHA to solve, give him an image from Commons
> and ask him to add a brief description in his own language.
> 
> We can give him two images, one with a known description and the other with
> an unknown one. After enough users translate the unknown one the same way, we
> can use it as a verified translation. We use the known image's description to
> decide whether to allow the user to create the account.

What if the known image isn't described the same way?
Even assuming that we provide them the English translation (so that they
know what it is, e.g. it's not a "house" but the Royal Palace of XYZ!),
and that all our users understand English well enough to make a
translation, not all translations will be the same.
Suppose we get these different results: Fairytale graphic illustration,
House of the grandmother of Little Red Riding Hood, House of Little Red
Riding Hood granny, Picture of Perrault fairytale about Little Red
Riding Hood, Image of Little Red Riding Hood story from Grimms' Fairy Tales.

It wouldn't be that bad to have differing _proposed descriptions_. But
not so for the check-description, where you would need to guess how it
was translated previously by others (even if all translations were fair
and accurate, with no misspellings at all).
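The example answers above show why exact matching of a check-description is hopeless. A simple normalised token-overlap (Jaccard) score makes the spread visible. This minimal sketch assumes lowercase alphanumeric tokens. The `jaccard` helper is illustrative, and the thread suggests no acceptance threshold.

```python
import re


def tokens(text):
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a, b):
    """Size of the shared token set divided by the size of the union."""
    ta, tb = tokens(a), tokens(b)
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)


answers = [
    "Fairytale graphic illustration",
    "House of the grandmother of Little Red Riding Hood",
    "House of Little Red Riding Hood granny",
    "Picture of Perrault fairytale about Little Red Riding Hood",
    "Image of Little Red Riding Hood story from Grimms' Fairy Tales",
]

# Every answer is a fair description, yet no two are identical strings,
# and even token overlap ranges from none at all to partial.
for i, a in enumerate(answers):
    for b in answers[i + 1:]:
        print(f"{jaccard(a, b):.2f}  {a!r} vs {b!r}")
```

The first two answers share no tokens at all (score 0.00), so no fuzzy-matching threshold could accept both without also accepting nonsense.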



> Is it possible to embed a file from Commons in the login page? Is it
> possible to parse the entered text and store it?

Yes and yes (dedicating some effort to make it happen, of course).


> Benefits:
> 
> A) It would be harder for bots to create automated accounts.
> 
> B) We will get translations into many languages with little effort from
> the users signing up.
> 
> What do you think?

I agree with (B) in that we would get many translations (although
probably low-quality ones).
I am not so sure about (A). If the accounts are being created by bots,
the CAPTCHA should be changed to stop them and/or new mechanisms (such as
throttles) created.
If they are handmade, I see little difference from a spammer's POV. Making
up a description is harder than typing a word, but we would need to dumb
down the process, so not a big difference. And in little time they would
learn how to game the system.


As for moving it forward, I think learning from entered values should
be done in a generic way, and then the "reCAPTCHA" proposal for helping
Wikisource implemented. Your idea could be added later on (I see those
flaws, though).

Regards




[Wikitech-l] suggestion: replace CAPTCHA with better approaches

2012-07-24 Thread matanya
For the last few months, the rate of spam that stewards deal with has been rising.
I suggest we implement a new mechanism:

Instead of giving the user a CAPTCHA to solve, give him an image from Commons
and ask him to add a brief description in his own language.

We can give him two images, one with a known description and the other with
an unknown one. After enough users translate the unknown one the same way, we
can use it as a verified translation. We use the known image's description to
decide whether to allow the user to create the account.
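The two-image scheme above pairs a control image that has a known description with an unknown image whose description is learned by agreement. This is the pattern reCAPTCHA popularised. Below is a minimal sketch. It assumes exact matching after case and whitespace normalisation, plus a hypothetical agreement threshold of three users. As Platonides points out, exact matching of free-text descriptions is a weak assumption.

```python
from collections import Counter, defaultdict

AGREEMENT_THRESHOLD = 3  # hypothetical: users who must give the same answer


def normalise(text):
    return " ".join(text.lower().split())


class TwoImageChallenge:
    def __init__(self, known):
        # known: image name -> accepted description for the control image
        self.known = {img: normalise(d) for img, d in known.items()}
        self.votes = defaultdict(Counter)  # unknown image -> answer counts
        self.verified = {}                 # unknown image -> accepted answer

    def submit(self, known_img, known_answer, unknown_img, unknown_answer):
        """Return True if the user passes. Answers for the unknown image
        are only counted from users who got the control image right."""
        if normalise(known_answer) != self.known[known_img]:
            return False
        answer = normalise(unknown_answer)
        self.votes[unknown_img][answer] += 1
        if self.votes[unknown_img][answer] >= AGREEMENT_THRESHOLD:
            # Enough independent agreement: treat it as a verified description.
            self.verified.setdefault(unknown_img, answer)
        return True
```

A verified image could later be promoted to a control image. That promotion also publishes its answer, which is exactly the lookup weakness Daniel describes.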

Is it possible to embed a file from Commons in the login page? Is it
possible to parse the entered text and store it?

Benefits:

A) It would be harder for bots to create automated accounts.

B) We will get translations into many languages with little effort from
the users signing up.

What do you think?


