Re: [Wikitech-l] Ethical question regarding some code

2020-08-09 Thread Gergő Tisza
FWIW, the movement strategy included a recommendation* for having a technology ethics review process [1]; maybe this is a good opportunity to experiment with creating a precursory, unofficial version of that - make a wiki page for the sock puppet detection tool, and a proposal process for such

Re: [Wikitech-l] Ethical question regarding some code

2020-08-09 Thread Gergo Tisza
On Sun, Aug 9, 2020 at 2:18 AM Nathan wrote:
> I don't see how any part of it constitutes creating biometric identifiers,
> nor is it obvious to me how it must remove anonymity of users.
>
The GDPR for example defines biometric data as "personal data resulting from specific technical processing

Re: [Wikitech-l] Ethical question regarding some code

2020-08-09 Thread Gergő Tisza
On Fri, Aug 7, 2020 at 6:39 PM Ryan Kaldari wrote:
> Whatever danger is embodied in Amir's code, it's only a matter of time
> before this danger is ubiquitous. And for the worst-case
> scenario—governments using the technology to hunt down dissidents—I imagine
> this is already happening. So

Re: [Wikitech-l] Ethical question regarding some code

2020-08-09 Thread Gergő Tisza
On Sat, Aug 8, 2020 at 7:43 PM Amir Sarabadani wrote:
> * By closed source, I don't mean it will be only accessible to me, It's
> already accessible by another CU and one WMF staff, and I would gladly
> share the code with anyone who has signed NDA and they are of course more
> than welcome to

Re: [Wikitech-l] Ethical question regarding some code

2020-08-08 Thread Nathan
For my part, I think Amir is going way above and beyond to be so thoughtful and open about the future of his tool. I don't see how any part of it constitutes creating biometric identifiers, nor is it obvious to me how it must remove anonymity of users. John, perhaps you can elaborate on your

Re: [Wikitech-l] Ethical question regarding some code

2020-08-08 Thread bawolff
On Sat, Aug 8, 2020 at 9:44 PM John Erling Blad wrote:
> Please stop calling this an “AI” system, it is not. It is statistical
> learning.
>
So in other words, it is an AI system? AI is just a colloquial synonym for statistical learning at this point.
-- Brian

Re: [Wikitech-l] Ethical question regarding some code

2020-08-08 Thread John Erling Blad
Please stop calling this an “AI” system, it is not. It is statistical learning. This is probably not going to make me popular… In some jurisdictions you will need a permit to create, manage, and store biometric identifiers, no matter if the biometric identifier is for a known person or not. If

Re: [Wikitech-l] Ethical question regarding some code

2020-08-08 Thread Amir Sarabadani
Thank you all for the responses. I'll try to summarize my responses here.
* By closed source, I don't mean it will be only accessible to me. It's already accessible by another CU and one WMF staff, and I would gladly share the code with anyone who has signed an NDA, and they are of course more than

Re: [Wikitech-l] Ethical question regarding some code

2020-08-07 Thread Ryan Kaldari
For better or worse, it seems clear that the cat is out of the bag. Identity detection through stylometry is now an established technology and you can easily find code on GitHub or elsewhere (e.g. https://github.com/jabraunlin/reddit-user-id) to accomplish it (if you have the time and energy to
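To make "identity detection through stylometry" concrete, here is a minimal sketch of one common baseline: compare the character n-gram frequency profiles of two text samples with cosine similarity. This is an illustration only. It is not Amir's tool and not the method used by the repository linked above, and the function names (char_ngrams, cosine_similarity, style_similarity) and the choice of trigrams are assumptions made for the example.

```python
from collections import Counter
from math import sqrt


def char_ngrams(text: str, n: int = 3) -> Counter:
    """Count overlapping character n-grams in lowercased, whitespace-normalized text."""
    normalized = " ".join(text.lower().split())
    return Counter(normalized[i:i + n] for i in range(len(normalized) - n + 1))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors; 0.0 if either is empty."""
    # Counter returns 0 for missing keys, so grams absent from b contribute nothing.
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = sqrt(sum(c * c for c in a.values()))
    norm_b = sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def style_similarity(text_a: str, text_b: str, n: int = 3) -> float:
    """Hypothetical helper: score how alike two writing samples look, from 0.0 to 1.0."""
    return cosine_similarity(char_ngrams(text_a, n), char_ngrams(text_b, n))
```

A score near 1.0 only says two samples share surface habits (spelling, punctuation, common word fragments). On its own it is weak evidence, which is one reason the thread worries about misuse and false positives.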

Re: [Wikitech-l] Ethical question regarding some code

2020-08-07 Thread Federico Leva (Nemo)
Thanks Amir for having this conversation here. On Nathan's point: outside the Wikimedia projects, we of the free culture movement tend to argue for full transparency on the functioning of "automated decision making", "algorithmic tools", "forensic software" and so on, typically ensured by open

Re: [Wikitech-l] Ethical question regarding some code

2020-08-07 Thread Derk-Jan Hartman
As others, I see several problems:
1. If the code is public, someone can duplicate it and bypass our internal 'safekeeping', because it uses public data.
2. Risk of misuse by either incompetence or malice
3. Risk of accidentally exposing legitimate sockpuppets even in the most closed off

Re: [Wikitech-l] Ethical question regarding some code

2020-08-06 Thread Nathan
I appreciate that Amir is acknowledging that as neat as this tool sounds, its use is fraught with risk. The comparison that immediately jumped to my mind is predictive algorithms used in the criminal justice system to assess risk of bail jumping or criminal recidivism. These algorithms have

Re: [Wikitech-l] Ethical question regarding some code

2020-08-06 Thread QEDK
I think an important thing to note is that it's public information, so such a model, either better or worse, can easily be built by an AI enthusiast. The potential for misuse is limited, as it's relatively easy to game, and I don't think that the model's results will hold more water than behaviour

Re: [Wikitech-l] Ethical question regarding some code

2020-08-06 Thread John Erling Blad
For those interested: the best solution as far as I know for this kind of similarity detection is the Siamese network with RNNs in the first part. That implies you must extract fingerprints for all likely candidates (users) and then some to create a baseline. You cannot simply claim that two

Re: [Wikitech-l] Ethical question regarding some code

2020-08-06 Thread John Erling Blad
Nice idea! First time I wrote about this being possible was back in 2008-ish. The problem is quite trivial: you use some observable feature to fingerprint an adversary. The adversary can then game the system if the observable feature can be somehow changed or modified. To avoid this the

Re: [Wikitech-l] Ethical question regarding some code

2020-08-06 Thread Gergő Tisza
Technically, you can make the tool open-source and keep the source code secret. That solves the maintenance problem (others who get access can legally modify). Of course, you'd have to trust everyone with access to the files to not publish them which they would be technically entitled to (unless

Re: [Wikitech-l] Ethical question regarding some code

2020-08-06 Thread Thiemo Kreuz
I'm afraid I have to agree with what AntiCompositeNumber wrote. When you set up infrastructure to fight abuse – no matter if that infrastructure is a technical barrier like a captcha, a tool that "blames" people for being sock puppets, or a law – it will affect *all* users, not only the abusers.

Re: [Wikitech-l] Ethical question regarding some code

2020-08-05 Thread bawolff
That's a tough question, and I'm not sure what the answer is. There is a little bit of precedent with https://www.mediawiki.org/w/index.php?oldid=2533048&title=Extension:AntiBot When evaluating harm, I guess one of the questions is how does your approach compare in effectiveness to other publicly

Re: [Wikitech-l] Ethical question regarding some code

2020-08-05 Thread AntiCompositeNumber
Creating and promoting the use of a closed-source tool, especially one used to detect disruptive editing, runs counter to core Wikimedia community principles. Making such a tool closed-source prevents the Wikimedia editing community from auditing its use, contesting its decisions, making

[Wikitech-l] Ethical question regarding some code

2020-08-05 Thread Amir Sarabadani
Hey, I have an ethical question that I haven't been able to answer. I have been asking around but got no definite answer, so I'm asking a larger audience in the hope of a solution. For almost a year now, I have been developing an NLP-based AI system to be able to catch sock puppets (two users