Re: [Wikitech-l] Upgrading mailman (the software behind mailing lists)
Great! It's really good and useful.

On Sun, Aug 9, 2020, 7:41 AM Denny Vrandečić wrote:
> Thank you so much!
>
> On Sat, Aug 8, 2020, 13:56 Amir Sarabadani wrote:
> > Hey,
> > Mailman, the software that powers our mailing lists, is extremely old, by
> > looking at https://lists.wikimedia.org/ you can guess how old it is. [...]
> > --
> > Amir (he/him)

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Re: [Wikitech-l] Upgrading mailman (the software behind mailing lists)
Thank you so much!

On Sat, Aug 8, 2020, 13:56 Amir Sarabadani wrote:
> Hey,
> Mailman, the software that powers our mailing lists, is extremely old, by
> looking at https://lists.wikimedia.org/ you can guess how old it is. [...]
> --
> Amir (he/him)

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Re: [Wikitech-l] Ethical question regarding some code
For my part, I think Amir is going way above and beyond to be so thoughtful and open about the future of his tool. I don't see how any part of it constitutes creating biometric identifiers, nor is it obvious to me how it must remove anonymity of users. John, perhaps you can elaborate on your reasoning there? Ultimately I don't think community approval for this tool is technically required. I appreciate the effort to solicit input and don't think it would hurt to do that more broadly.

On Sat, Aug 8, 2020 at 5:44 PM John Erling Blad wrote:
> Please stop calling this an "AI" system, it is not. It is statistical
> learning. [...]
>
> On Sat, Aug 8, 2020, 19:42 Amir Sarabadani wrote:
> > Thank you all for the responses, I try to summarize my responses here. [...]
Re: [Wikitech-l] Ethical question regarding some code
On Sat, Aug 8, 2020 at 9:44 PM John Erling Blad wrote: > Please stop calling this an “AI” system, it is not. It is statistical > learning. > > So in other words, it is an AI system? AI is just a colloquial synonym for statistical learning at this point. -- Brian ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
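Whatever label one prefers for the tool under discussion, the core idea it relies on (comparing how accounts write) can be made concrete with a toy sketch. To be clear, this is not Amir's tool: the crude regex-based wikitext stripping and the character-trigram cosine similarity below are generic stand-ins chosen for illustration, not the actual methods described in this thread.

```python
# Toy stylometry sketch (NOT the tool discussed in this thread): strip a
# little wiki markup, then compare two texts via cosine similarity over
# character trigrams. Real cleanup of wikitext, templates, and signatures
# is, as the thread notes, far harder than these two regexes.
import math
import re
from collections import Counter

def strip_wikitext(text):
    # Drop simple, non-nested {{templates}} and keep link labels from [[links]].
    text = re.sub(r"\{\{[^{}]*\}\}", " ", text)
    text = re.sub(r"\[\[(?:[^|\]]*\|)?([^\]]*)\]\]", r"\1", text)
    return text

def trigram_profile(text):
    # Count overlapping character trigrams of the cleaned, lowercased text.
    text = strip_wikitext(text).lower()
    return Counter(text[i:i + 3] for i in range(len(text) - 2))

def cosine_similarity(a, b):
    # Standard cosine similarity between the two trigram count vectors.
    pa, pb = trigram_profile(a), trigram_profile(b)
    dot = sum(pa[g] * pb[g] for g in pa)
    na = math.sqrt(sum(v * v for v in pa.values()))
    nb = math.sqrt(sum(v * v for v in pb.values()))
    return dot / (na * nb) if na and nb else 0.0

same = cosine_similarity("I think this proposal is fine.",
                         "I think this idea is fine.")
diff = cosine_similarity("I think this proposal is fine.",
                         "zzzzqqqq xxxx wwww")
print(same > diff)  # prints True: similar texts score higher
```

Because it operates on raw characters rather than a dictionary, a profile like this is language-agnostic, which matches the thread's point that quality depends on the volume of discussion available rather than on the wiki's language.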
Re: [Wikitech-l] Ethical question regarding some code
Please stop calling this an "AI" system, it is not. It is statistical learning.

This is probably not going to make me popular…

In some jurisdictions you will need a permit to create, manage, and store biometric identifiers, no matter whether the biometric identifier is for a known person or not. If you want to create biometric identifiers, and use them, make darn sure you follow every applicable law and rule. I'm not amused by the idea of having CUs using illegal tools to vet ordinary users.

Any system that tries to remove the anonymity of users on Wikipedia should have an RfC where the community can make their concerns heard. This is not the proper forum to get acceptance from Wikipedia's community.

And btw, systems for cleanup of prose exist for a whole bunch of languages, not only English. Grammarly is one, LanguageTool another, and there are a whole bunch of other such tools.

On Sat, Aug 8, 2020, 19:42 Amir Sarabadani wrote:
> Thank you all for the responses, I try to summarize my responses here. [...]
[Wikitech-l] Upgrading mailman (the software behind mailing lists)
Hey,
Mailman, the software that powers our mailing lists, is extremely old; by looking at https://lists.wikimedia.org/ you can guess how old it is.

I would really like to upgrade it to Mailman 3, which has these benefits:
* Much better security (including but not limited to https://phabricator.wikimedia.org/T181803)
* Much better UI and UX
* Much easier moderation and maintenance of mailing lists
* Ability to send mail from the web
* Ability to search in archives
* Ability to like/dislike an email
* List admins will be able to delete emails, merge threads, and much more.
* Admins won't need to store passwords for each mailing list separately; they just log in with their account everywhere.
* The current Mailman stores everything as files (even mailing list settings); Mailman 3 uses a proper database for everything, meaning proper backup and recovery, high availability, and much more.

I have already put up a test setup and humbly ask you (especially list admins) to test it (and its admin interface); if you want to become a list admin, drop me a message. Keep in mind that we don't maintain the software, so the most I can do is change configuration; I can't handle a feature request or solve a bug (you are more than welcome to file it against upstream though).

Here's the test setup:
* https://lists.wmcloud.org

Here's a mailing list:
* https://lists.wmcloud.org/postorius/lists/test.lists.wmcloud.org/

Here's an archive post:
* https://lists.wmcloud.org/hyperkitty/list/t...@lists.wmcloud.org/thread/RMQPKSS4ID3WALFXAF636J2NGBVCN3UA/

Issues that I haven't figured out yet:
* This system has profile picture support, but it's only Gravatar, which we can't enable due to our privacy policy; when you disable it, it shows empty squares and looks bad. Reported upstream [1], but we could also have a Gravatar proxy in production. In the worst-case scenario we can just inject "$('.gravatar').remove();" and remove them. Feel free to chime in on the Phabricator ticket in this regard: https://phabricator.wikimedia.org/T256541

* The upgrade will break archive links, and making them work forever is not trivial (you need to write Apache rewrite rules). You can read about it in https://docs.mailman3.org/en/latest/migration.html#other-considerations

* Mailman allows us to upgrade mailing list by mailing list. That's good, but we haven't found a way to keep the old version and the new one in sync (archives, etc.). Maybe we migrate a mailing list and the archives on the old version stop getting updated. Would that work for you? Feel free to chime in: https://phabricator.wikimedia.org/T256539

* We don't know what the size of the database would be after the upgrade because these two versions are so inherently different. One idea was to check the size of a fully public mailing list, move the files to the test setup, upgrade it to the new version, check how the size changes, and then extrapolate the size of the final database. The discussion around the database is happening in https://phabricator.wikimedia.org/T256538

If you want to help with the upgrade (like puppetizing its configuration, etc.), just let me know and I'll add you to the project! It uses a stand-alone puppetmaster, so you don't need to get your Puppet patches merged to see their effects.

The main ticket about the upgrade: https://phabricator.wikimedia.org/T52864

[1] https://gitlab.com/mailman/hyperkitty/-/issues/303#note_365162201

Hope that'll be useful for you :)
--
Amir (he/him)

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l
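The broken-archive-links problem above boils down to mapping each old Pipermail URL onto the new HyperKitty scheme. The sketch below only illustrates the shape of that mapping: the URL layouts are assumptions for demonstration, and the real deployment would do this with Apache rewrite rules (and HyperKitty's own redirect handling) rather than application code.

```python
# Illustrative sketch only: map an old Pipermail archive path to a plausible
# HyperKitty URL. The URL shapes here are assumptions, not the actual rewrite
# rules such a migration would ship; in production this would be an Apache
# RewriteRule, not Python.
import re

def pipermail_to_hyperkitty(path, domain="lists.wikimedia.org"):
    # Old-style path: /pipermail/<list>/<YYYY-Month>/<number>.html
    m = re.match(r"^/pipermail/([^/]+)/\d{4}-[A-Za-z]+/\d+\.html$", path)
    if m:
        listname = m.group(1)
        # Without the message's Message-ID we can't address the exact post,
        # so the best generic fallback is the list's archive index page.
        return f"/hyperkitty/list/{listname}@{domain}/"
    return None

print(pipermail_to_hyperkitty("/pipermail/wikitech-l/2020-August/093650.html"))
```

The awkward part the migration docs warn about is visible in the fallback: Pipermail numbers posts sequentially per month, while HyperKitty addresses threads by Message-ID hash, so a faithful per-message redirect needs a lookup table built during migration, not just a pattern rewrite.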
Re: [Wikitech-l] Ethical question regarding some code
Thank you all for the responses; I'll try to summarize my responses here.

* By closed source, I don't mean it will be accessible only to me. It's already accessible by another CU and one WMF staff member, and I would gladly share the code with anyone who has signed the NDA; they are of course more than welcome to change it. GitHub has a really low limit on people who can access a private repo, but I would be fine with any means to fix this.

* I have read that people say there are already public tools to analyze text. I disagree: 1) the tools you mentioned are for English and not other languages (maybe I missed something), and even if we imagine there were such tools for big languages like German and/or French, they don't cover lots of languages, unlike my tool, which is basically language-agnostic and depends on the volume of discussions that happened in the wiki.

* I also disagree that it's not hard to build. I have lots of experience with NLP (with my favorite work being a tool that finds swear words in every language based on the history of vandalism in that Wikipedia [1]), and still it took me more than a year (a couple of hours almost every weekend) to build this. Analyzing pure clean text is not hard; cleaning up wikitext and templates and links to get only the text people "spoke" is doubly hard, and analyzing user signatures brings only suffering and sorrow.

* While in general I agree that if a government wants to build this, they can, reality is more complicated, and this situation is similar to security. You can never be 100% secure, but you can increase the cost of hacking you so much that it would be pointless for a major actor to do it. Governments have a limited budget, dictatorships are by design corrupt and filled with incompetent people [2], and sanctions put another restraint on such governments too, so I would not give them such an opportunity for oppression on a silver platter for free. If they really want to, then they must pay for it (which means they can't use that money/those resources on oppressing some other groups).

* People have said this AI is easy to game. While it's not that easy, and the tools you mentioned are limited to English, it's still a big win for the integrity of our projects. It boils down again to increasing the cost. If a major actor wants to spread disinformation, so far they only need to fake their UA and IP, which is a piece of cake, and I already see that (as a CU); but now they have to mess with the UA/IP AND change their way of speaking (which is one order of magnitude harder than changing an IP). As I said, increasing this cost might not prevent it from happening, but at least it takes away the ability to oppress other groups.

* This tool will never be the only reason to block a sock. It's more than anything a helper: if a CU brings up a large range and the accounts are similar but the result is not conclusive, this tool can help. Or when we are 90% sure it's a WP:DUCK, this tool can help too. But blocking just because this tool said so would imply a "Minority Report" situation, and to be honest, I would really like to avoid that. It is supposed to empower CUs.

* Banning the use of this tool is not possible legally: the content of Wikipedia is published under CC BY-SA, and this allows such analysis; especially since you can't ban an off-wiki action. Also, if a university professor can do it, I don't see the point of banning its use by the most trusted group of users (CUs). You can ban blocking based on this tool, but I don't think we should block solely based on it anyway.

* It has been pointed out by people on the checkuser mailing list that there's no point in logging access to this tool: since the code is accessible to CUs (if they want it), they can download and run it on their own computers without logging anyway.

* There is a huge difference between CU and this AI tool in matters of privacy. While both are privacy-sensitive, CU reveals much more: as a CU, I know where lots of people are living or studying because they showed up in my CUs, and while I won't tell a soul about them, it makes me uncomfortable (I'm also not implying CUs are not trusted; it's just that we should respect people's privacy and avoid "unreasonable search and seizure" [3]). This tool only reveals a connection between accounts if one of them is linked to a public identity and the other is not, which I wholeheartedly agree is not great, but it's not on the same level as seeing people's IPs. So I even think that in an ideal world where the AI model is more accurate than CU, we should stop using CU and rely solely on the AI instead (important: I'm not implying the current model is better; I'm saying if it were better). This would help us understand why, for example, fishing for sock puppets with CU is bad (and banned by the policy) but fishing for socks using this AI is not bad and can be a good starting point. In other words, this tool, used right, can reduce checkuser actions and protect people's privacy instead.

* People have been saying