Re: [Wikitech-l] Upgrading mailman (the software behind mailing lists)

2020-08-08 Thread ZI Jony
Great! It's really good and useful.

On Sun, Aug 9, 2020, 7:41 AM Denny Vrandečić wrote:

> Thank you so much!
>
> On Sat, Aug 8, 2020, 13:56 Amir Sarabadani  wrote:
>
> > [quoted announcement trimmed; the full message appears later in this digest]
> ___
> Wikitech-l mailing list
> Wikitech-l@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikitech-l
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] Upgrading mailman (the software behind mailing lists)

2020-08-08 Thread Denny Vrandečić
Thank you so much!

On Sat, Aug 8, 2020, 13:56 Amir Sarabadani  wrote:

> Hey,
> Mailman, the software that powers our mailing lists, is extremely old, by
> looking at https://lists.wikimedia.org/ you can guess how old it is.
>
> I would really like to upgrade it to mailman 3 which has these benefits:
> * Much better security (including but not limited to
> https://phabricator.wikimedia.org/T181803)
> * Much better UI and UX
> * Much easier moderation and maintaining mailing lists
> * Ability to send mail from the web
> * Ability to search in archives.
> * Ability to like/dislike an email
> * List admins will be able to delete emails, merge threads, and much more.
> * Admins won't need to store passwords for each mailing list separately,
> they just login as their account everywhere.
> * The current mailman stores everything as files (even mailing list
> settings), mailman3 actually uses a proper database for everything meaning
> proper backup and recovery, high availability and much more.
>
> I have already put up a test setup and humbly ask you (specially list
> admins) to test it (and its admin interface), if you want to become a list
> admin, drop me a message. Keep in mind that we don't maintain the software
> so the most I can do is to change configuration and can't handle a feature
> request or solve a bug (you are more than welcome to file it against
> upstream though)
>
> Here's the test setup:
> * https://lists.wmcloud.org
>
> Here's a mailing list:
> * https://lists.wmcloud.org/postorius/lists/test.lists.wmcloud.org/
>
> Here's an archive post:
> *
>
> https://lists.wmcloud.org/hyperkitty/list/t...@lists.wmcloud.org/thread/RMQPKSS4ID3WALFXAF636J2NGBVCN3UA/
>
> Issues that I haven't figured out yet:
> * This system has profile picture support but it's only gravatar which we
> can't enable due to our privacy policy but when you disable it, it shows
> empty squares and looks bad. Reported upstream [1] but also we can have a
> gravatar proxy in production. And in the worst case scenario we can just
> inject "$('.gravatar').remove();" and remove them. Feel free to chime in in
> the phabricator ticket in this regard:
> https://phabricator.wikimedia.org/T256541
>
> * Upgrade will break archive links, making it work forever is not trivial
> (you need write apache rewrite rule) (You can read about it in
> https://docs.mailman3.org/en/latest/migration.html#other-considerations)
>
> * Mailman allows us to upgrade mailing list by mailing list, that's good
> but we haven't found a way to keep the old version and the new ones in sync
> (archives, etc.). Maybe we migrate a mailing list and the archives for the
> old version will stop getting updated. Would that work for you? Feel free
> to chime in: https://phabricator.wikimedia.org/T256539
>
> * We don't know what would be the size of the database after upgrade
> because these two versions are so inherently different, one idea was to
> check the size of a fully public mailing list, then move the files to the
> test setup, upgrade it to the new version  and check how it changes, then
> extrapolate the size of the final database. The discussion around the
> database is happening in https://phabricator.wikimedia.org/T256538
>
> If you want to help in the upgrade (like puppetzining its configuration,
> etc.) just let me know and I add you to the project! It uses a stand-alone
> puppetmaster so you don't need to get your puppet patches merged to see its
> effects.
>
> The main ticket about the upgrade:
> https://phabricator.wikimedia.org/T52864
>
> [1] https://gitlab.com/mailman/hyperkitty/-/issues/303#note_365162201
>
> Hope that'll be useful for you :)
> --
> Amir (he/him)
> ___
> Wikitech-l mailing list
> Wikitech-l@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikitech-l
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] Ethical question regarding some code

2020-08-08 Thread Nathan
For my part, I think Amir is going way above and beyond to be so thoughtful
and open about the future of his tool.

I don't see how any part of it constitutes creating biometric identifiers,
nor is it obvious to me how it must remove anonymity of users.

John, perhaps you can elaborate on your reasoning there?

Ultimately I don't think community approval for this tool is technically
required. I appreciate the effort to solicit input and don't think it would
hurt to do that more broadly.

On Sat, Aug 8, 2020 at 5:44 PM John Erling Blad  wrote:

> Please stop calling this an “AI” system, it is not. It is statistical
> learning.
>
> This is probably not going to make me popular…
>
> In some jurisdictions you will need a permit to create, manage, and store
> biometric identifiers, no matter if the biometric identifier is for a known
> person or not. If you want to create biometric identifiers, and use them,
> make darn sure you follow every applicable law and rule. I'm not amused by
> the idea of having CUs using illegal tools to wet ordinary users.
>
> Any system that tries to remove anonymity og users on Wikipedia should have
> an RfC where the community can make their concerns heard. This is not the
> proper forum to get acceptance from Wikipedias community.
>
> And btw, systems for cleanup of prose exists for a whole bunch of
> languages, not only English. Grammarly is one, LanguageTool another, and
> there are a whole bunch other such tools.
>
> lør. 8. aug. 2020, 19.42 skrev Amir Sarabadani :
>
> > Thank you all for the responses, I try to summarize my responses here.
> >
> > * By closed source, I don't mean it will be only accessible to me, It's
> > already accessible by another CU and one WMF staff, and I would gladly
> > share the code with anyone who has signed NDA and they are of course more
> > than welcome to change it. Github has a really low limit for people who
> can
> > access a private repo but I would be fine with any means to fix this.
> >
> > * I have read that people say that there are already public tools to
> > analyze text. I disagree, 1- The tools you mentioned are for English and
> > not other languages (maybe I missed something) and even if we imagine
> there
> > would be such tools for big languages like German and/or French, they
> don't
> > cover lots of languages unlike my tool that's basically language agnostic
> > and depends on the volume of discussions happened in the wiki.
> >
> > * I also disagree that it's not hard to build. I have lots of experience
> > with NLP (with my favorite work being a tool that finds swear words in
> > every language based on history of vandalism in that Wikipedia [1]) and
> > still it took me more than a year (a couple of hours almost in every
> > weekend) to build this, analyzing a pure clean text is not hard, cleaning
> > up wikitext and templates and links to get only text people "spoke" is
> > doubly hard, analyzing user signatures brings only suffer and sorrow.
> >
> > * While in general I agree if a government wants to build this, they can
> > but reality is more complicated and this situation is similar to
> security.
> > You can never be 100% secure but you can increase the cost of hacking you
> > so much that it would be pointless for a major actor to do it.
> Governments
> > have a limited budget and dictatorships are by design corrupt and filled
> > with incompotent people [2] and sanctions put another restrain on such
> > governments too so I would not give them such opportunity for oppersion
> in
> > a silver plate for free, if they really want to, then they must pay for
> it
> > (which means they can't use that money/resources on oppersing some other
> > groups).
> >
> > * People have said this AI is easy to be gamed, while it's not that easy
> > and the tools you mentioned are limited to English, it's still a big win
> > for the integrity of our projects. It boils down again to increasing the
> > cost. If a major actor wants to spread disinformation, so far they only
> > need to fake their UA and IP which is a piece of cake and I already see
> > that (as a CU) but now they have to mess with UA/IP AND change their
> > methods of speaking (which is one order of magnitude harder than changing
> > IP). As I said, increasing this cost might not prevent it from happening
> > but at least it takes away the ability of oppressing other groups.
> >
> > * This tool never will be the only reason to block a sock. It's more than
> > anything a helper, if CU brings a large range and they are similar but
> the
> > result is not conclusive, this tool can help. Or when we are 90% sure
> it's
> > a WP:DUCK, this tool can help too but blocking just because this tool
> said
> > so would imply a "Minority report" situation and to be honest and I would
> > really like to avoid that. It is supposed to empower CUs.
> >
> > * Banning using this tool is not possible legally, the content of
> Wikipedia
> > is published under CC-BY-SA and this allows such ana

Re: [Wikitech-l] Ethical question regarding some code

2020-08-08 Thread bawolff
On Sat, Aug 8, 2020 at 9:44 PM John Erling Blad  wrote:

> Please stop calling this an “AI” system, it is not. It is statistical
> learning.
>
>
So in other words, it is an AI system? AI is just a colloquial synonym for
statistical learning at this point.

--
Brian
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] Ethical question regarding some code

2020-08-08 Thread John Erling Blad
Please stop calling this an “AI” system, it is not. It is statistical
learning.

This is probably not going to make me popular…

In some jurisdictions you will need a permit to create, manage, and store
biometric identifiers, whether or not the biometric identifier is for a known
person. If you want to create biometric identifiers and use them, make darn
sure you follow every applicable law and rule. I'm not amused by the idea of
CUs using illegal tools to vet ordinary users.

Any system that tries to remove the anonymity of users on Wikipedia should
have an RfC where the community can make their concerns heard. This is not
the proper forum to get acceptance from Wikipedia's community.

And by the way, systems for cleaning up prose exist for a whole bunch of
languages, not only English. Grammarly is one, LanguageTool another, and
there are a whole bunch of other such tools.

On Sat, Aug 8, 2020 at 19:42, Amir Sarabadani wrote:

> [quoted message trimmed; Amir's summary appears later in this digest]

[Wikitech-l] Upgrading mailman (the software behind mailing lists)

2020-08-08 Thread Amir Sarabadani
Hey,
Mailman, the software that powers our mailing lists, is extremely old; one
look at https://lists.wikimedia.org/ and you can guess how old it is.

I would really like to upgrade it to Mailman 3, which has these benefits:
* Much better security (including but not limited to
https://phabricator.wikimedia.org/T181803)
* Much better UI and UX
* Much easier moderation and mailing list maintenance
* Ability to send mail from the web
* Ability to search the archives
* Ability to like/dislike an email
* List admins will be able to delete emails, merge threads, and much more.
* Admins won't need to store a separate password for each mailing list;
they just log in with their own account everywhere.
* The current Mailman stores everything as files (even mailing list
settings); Mailman 3 uses a proper database for everything, which means
proper backup and recovery, high availability, and much more.

I have already put up a test setup and humbly ask you (especially list
admins) to test it (and its admin interface); if you want to become a list
admin, drop me a message. Keep in mind that we don't maintain the software
itself, so the most I can do is change its configuration; I can't implement
feature requests or fix bugs (though you are more than welcome to file them
against upstream).

Here's the test setup:
* https://lists.wmcloud.org

Here's a mailing list:
* https://lists.wmcloud.org/postorius/lists/test.lists.wmcloud.org/

Here's an archive post:
*
https://lists.wmcloud.org/hyperkitty/list/t...@lists.wmcloud.org/thread/RMQPKSS4ID3WALFXAF636J2NGBVCN3UA/

Issues that I haven't figured out yet:
* This system has profile picture support, but only via Gravatar, which we
can't enable due to our privacy policy; when you disable it, it shows
empty squares and looks bad. I reported this upstream [1], but we could also
run a Gravatar proxy in production. In the worst-case scenario we can just
inject "$('.gravatar').remove();" to remove them. Feel free to chime in
on the Phabricator ticket about this:
https://phabricator.wikimedia.org/T256541

* The upgrade will break archive links, and keeping them working forever is
not trivial (you need to write Apache rewrite rules; you can read about it in
https://docs.mailman3.org/en/latest/migration.html#other-considerations)
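To illustrate the kind of rewrite rule the migration guide refers to, here is a hypothetical sketch; the URL patterns and list-domain naming below are assumptions, and per-thread pipermail URLs cannot be mapped mechanically onto HyperKitty ones:

```apache
# Hypothetical sketch only, not the actual production configuration.
RewriteEngine On

# Old list info page -> Postorius list page
RewriteRule ^/mailman/listinfo/([^/]+)$ /postorius/lists/$1.lists.wikimedia.org/ [R=301,L]

# Old pipermail archive root -> HyperKitty archive index
RewriteRule ^/pipermail/([^/]+)/?$ /hyperkitty/list/$1@lists.wikimedia.org/ [R=301,L]
```

Rules like these cover only the entry points; individual message links would still break, which is why the guide calls long-term preservation non-trivial.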

* Mailman allows us to upgrade mailing list by mailing list, which is good,
but we haven't found a way to keep the old version and the new one in sync
(archives, etc.). One option is that once a mailing list is migrated, its
archives on the old version simply stop getting updated. Would that work for
you? Feel free to chime in: https://phabricator.wikimedia.org/T256539

* We don't know what the size of the database will be after the upgrade,
because the two versions are so inherently different. One idea is to check
the size of a fully public mailing list, move its files to the test setup,
upgrade it to the new version, check how the size changes, and then
extrapolate the size of the final database. The discussion around the
database is happening in https://phabricator.wikimedia.org/T256538
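The extrapolation idea above is simple arithmetic; as a sketch (the function name and every number below are made up, not measurements from the test setup):

```python
# Purely illustrative extrapolation helper: scale the file-store -> database
# growth ratio observed on one sample mailing list up to the whole installation.
def estimate_db_size(sample_files_mb, sample_db_mb, total_files_mb):
    if sample_files_mb <= 0:
        raise ValueError("sample size must be positive")
    growth_ratio = sample_db_mb / sample_files_mb
    return total_files_mb * growth_ratio

# e.g. a 200 MB pipermail archive that becomes a 300 MB database suggests
# an 80 GB file store would become roughly 120 GB of database.
print(estimate_db_size(200, 300, 80_000))  # -> 120000.0
```

Of course, the ratio likely varies between lists (attachment-heavy versus text-only), so sampling several lists would give a better estimate.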

If you want to help with the upgrade (like puppetizing its configuration,
etc.), just let me know and I'll add you to the project! It uses a stand-alone
puppetmaster, so you don't need to get your Puppet patches merged to see their
effects.

The main ticket about the upgrade: https://phabricator.wikimedia.org/T52864

[1] https://gitlab.com/mailman/hyperkitty/-/issues/303#note_365162201

Hope that'll be useful for you :)
-- 
Amir (he/him)
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] Ethical question regarding some code

2020-08-08 Thread Amir Sarabadani
Thank you all for the responses; I'll try to summarize my replies here.

* By closed source, I don't mean it will only be accessible to me. It's
already accessible to another CU and one WMF staff member, and I would gladly
share the code with anyone who has signed the NDA; they are of course more
than welcome to change it. GitHub has a really low limit on how many people
can access a private repo, but I would be fine with any means of fixing this.

* People have said that there are already public tools to analyze text. I
disagree: the tools you mentioned are for English and not other languages
(maybe I missed something), and even if such tools existed for big languages
like German and/or French, they wouldn't cover lots of languages, unlike my
tool, which is basically language agnostic and depends only on the volume of
discussion that has happened on the wiki.
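For context on what "language agnostic" can mean here, a generic stylometry baseline works on character n-gram frequencies, which need no language-specific resources. This is purely illustrative and not Amir's (private) model:

```python
from collections import Counter
from math import sqrt

# Generic, language-agnostic stylometry baseline: character trigram counts
# compared with cosine similarity. Works on any script; illustrative only.
def trigrams(text: str) -> Counter:
    text = text.lower()
    return Counter(text[i:i + 3] for i in range(len(text) - 2))

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(count * b[gram] for gram, count in a.items())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

base = trigrams("I strongly oppose this proposal.")
same = cosine(base, trigrams("I strongly oppose the idea."))
diff = cosine(base, trigrams("lol whatever works 4 me"))
print(same > diff)  # stylistically closer texts score higher -> True
```

A real system would add feature weighting, normalization, and significance testing on top of a baseline like this.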

* I also disagree that it's not hard to build. I have lots of experience
with NLP (my favorite work being a tool that finds swear words in every
language based on the history of vandalism in that Wikipedia [1]), and it
still took me more than a year (a couple of hours almost every weekend) to
build this. Analyzing pure clean text is not hard; cleaning up wikitext,
templates, and links to get only the text people "spoke" is doubly hard,
and analyzing user signatures brings only suffering and sorrow.
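To illustrate why that cleanup is hard, here is a much-simplified, hypothetical sketch (not Amir's actual code); real wikitext has nested templates, tables, and free-form user signatures that defeat naive regexes like these:

```python
import re

# Naive wikitext cleanup: strip simple markup so only the "spoken" text
# remains. Fails on nested templates, tables, and custom signatures.
def strip_wikitext(text: str) -> str:
    text = re.sub(r"\{\{[^{}]*\}\}", "", text)  # drop non-nested templates
    # keep only the visible label of [[target|label]] / [[target]] links
    text = re.sub(r"\[\[(?:[^|\]]*\|)?([^\]]*)\]\]", r"\1", text)
    text = re.sub(r"'{2,}", "", text)  # bold/italic quote markup
    return re.sub(r"\s+", " ", text).strip()

print(strip_wikitext("I {{agree}} with [[User:Foo|Foo]]'s ''second'' point."))
# -> "I with Foo's second point." (note the dropped template content)
```

Even this toy version shows the core difficulty: templates can carry meaning (here, "agree"), so blindly stripping them mangles the sentence.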

* While in general I agree that if a government wants to build this, they
can, reality is more complicated, and this situation is similar to security.
You can never be 100% secure, but you can increase the cost of hacking you
so much that it becomes pointless for a major actor to try. Governments have
limited budgets, dictatorships are by design corrupt and filled with
incompetent people [2], and sanctions put further restraints on such
governments, so I would not hand them such an opportunity for oppression on
a silver platter for free. If they really want it, they must pay for it
(which means they can't use that money and those resources on oppressing
some other group).

* People have said this AI is easy to game. It's not that easy, and even
though the tools you mentioned are limited to English, it's still a big win
for the integrity of our projects. Again, it boils down to increasing the
cost. If a major actor wants to spread disinformation, so far they only
need to fake their UA and IP, which is a piece of cake and something I
already see (as a CU); now they would have to mess with UA/IP AND change
their manner of speaking (which is an order of magnitude harder than
changing an IP). As I said, increasing this cost might not prevent it from
happening, but at least it takes away some of the ability to oppress other
groups.

* This tool will never be the only reason to block a sock. It's above all
a helper: if a CU check brings back a large range of similar users but the
result is not conclusive, this tool can help. Or when we are 90% sure it's
a WP:DUCK, this tool can help too. But blocking just because this tool said
so would imply a "Minority Report" situation, and to be honest I would
really like to avoid that. It is supposed to empower CUs.

* Banning use of this tool is not possible legally: the content of Wikipedia
is published under CC BY-SA, which allows such analysis, especially since you
can't ban an off-wiki action. Also, if a university professor can do it, I
don't see the point of banning its use by the most trusted group of users
(CUs). You could ban blocking based on this tool, but I don't think we
should block solely based on it anyway.

* People on the checkuser mailing list have pointed out that there's no
point in logging access to this tool: since the code is accessible to CUs
(if they want it), they can download and run it on their own computers
without any logging anyway.

* There is a huge difference between CU and this AI tool in matters of
privacy. While both are privacy sensitive, CU reveals much more. As a CU, I
know where lots of people live or study because they showed up in my checks,
and while I won't tell a soul about them, it makes me uncomfortable (I'm not
implying CUs are not trusted; it's just that we should respect people's
privacy and avoid "unreasonable search and seizure" [3]). This tool only
reveals a connection between accounts where one is linked to a public
identity and the other is not, which I wholeheartedly agree is not great,
but it's not on the same level as seeing people's IPs. So I even think that
in an ideal world where the AI model is more accurate than CU, we should
stop using CU and rely solely on the AI instead (important: I'm not implying
the current model is better; I'm saying if it were better). This would help
explain why, for example, fishing for sock puppets with CU is bad (and
banned by policy) but fishing for socks using this AI is not bad and can be
a good starting point. In other words, this tool, used right, can reduce
checkuser actions and protect people's privacy instead.

* People have been saying