Re: [Wikimedia-l] Responses to en-wp sourcing question
Hi Adam, Pine, Robert,

Thanks for the suggestions! In particular, Adam's link to Ford et al., where I read:

-- We used Apache's Map Reduce framework on Amazon's Elastic Map Reduce (EMR) cloud computing infrastructure to efficiently extract the history of references to all articles.

That sounds like power tools! I've been using more of a clunky bucket chain procedure which only captures part of what has "stuck" in the river. (They downloaded a corpus with all its deletion history.) We do reach some of the same conclusions, but the data are very different (Twitter & Facebook weren't quite as weighty back in 2012, for example). That said, I suspect it would be much wiser to work on a database dump as they have.

The classified version (linked below) is getting more interesting now. Left papers often do better than their circulation figures would suggest, with Brazil & Germany being the notable exceptions. In any case, what's very clear is that on en-wp, *Pitchfork* does much better than the *Poetry Foundation*.

>> http://www.creoliste.fr/docs/WikiInSources_cat.pdf <<

Not to worry, Robert, *Wikipediocracy* barely makes the list...

I'll have a look at the research mailing list once I've finished exploring Adam's suggestion, Pine. Thanks to the three of you for taking the time to respond!

sashi

___
Wikimedia-l mailing list, guidelines at: https://meta.wikimedia.org/wiki/Mailing_lists/Guidelines and https://meta.wikimedia.org/wiki/Wikimedia-l
New messages to: Wikimedia-l@lists.wikimedia.org
Unsubscribe: https://lists.wikimedia.org/mailman/listinfo/wikimedia-l, <mailto:wikimedia-l-requ...@lists.wikimedia.org?subject=unsubscribe>
[Wikimedia-l] Wikipedia's sourcing
Hello,

I thought I would ask if any of the junior or senior researchers here on this mailing list have conducted previous inquiries into Wikipedia's sourcing. I am currently working on a project to determine what proportion of Wikipedia is sourced to newspapers, the military, the Church, social media, etc.

The data I've compiled this month, along with a brief write-up, have been posted to Wikipediocracy: http://wikipediocracy.com/2018/08/26/wikipedia-sources-methods/

I imagine I'm reinventing the wheel... such studies have been done before, by the WMF, with power tools (bots), right?

Thanks for any corrections / suggestions,

sashi
[Wikimedia-l] Category: French Jews on en.wp / GDPR
Another follow-up:

== Benjamin Lees wrote: "No, French Christians are just tagged with subcategories of Category:French Christians. The "requiring diffusion" category that you complain of is in fact a way to tell editors that pages in the category should really be in subcategories instead." ==

Aha! You're right, I had not realized that "diffuse" (disseminate/spread widely) was being used as specialized en-wiki jargon for "subcategorize". It might be wise to give that hidden category a more descriptive name.

I looked into one of the many BLP entries with an unsourced Category:French Jews tag, and found a review of a book they wrote. In that book, the person stated that while they had a Jewish mother, they did not consider themselves Jewish. Given that the category French Jews contains more members than the category French Roman Catholics, and that there are living people included in both categories... I seriously wonder what it is that motivates folks to anonymously tag others in this way (i.e. whether they want to be tagged or not).

The Library of Congress, the BNF, Wikidata, etc. don't label people according to religion, unless their notability is due specifically to their religion (e.g. Alfred Dreyfus, Maimonides, etc.). On en.wp, people being labeled as Jewish/Catholic, etc. tend to be industrialists, politicians, journalists, bankers, etc. I don't think this is "best practice" and I'm afraid I do not agree that en.wp is mostly "getting it right" with regard to this specific question. Fr.WP and Wikidata are doing much better.

The relevant section on "data subject" privacy rights in the GDPR (in English) is based on the 1978 French law I cited earlier (though it has become more restrictive since -- see below). As David Gerard noted, it is quite likely that this affects not only Wikipedians (who can petition to have libel/slander concerning their *online identity* (cf. definition of data subject) removed from (inter alia) block logs), but also the *content* of biographies of living people in the encyclopedia.

== GDPR (Article 9) ==
*Processing* of personal data revealing racial or ethnic origin, political opinions, religious or philosophical beliefs, or trade union membership, and the processing of genetic data, biometric data for the purpose of uniquely identifying a natural person, data concerning health or data concerning a natural person's sex life or sexual orientation shall be prohibited.
==

As one who has contributed to the projects since 2006, I am posting this here not because I wish to sow dissent, but because I think some quick thinking and corrective action is needed.

sashi
Re: [Wikimedia-l] Category: French Jews on en.wp / GDPR
to its volunteers and should not facilitate their contributors (whose age they don't verify) falling afoul of their national laws. Simply excluding members of Category:BLP & Category:French Jews/Catholics/Muslims/Freemasons/etc. from the hidden Category "requiring diffusion" and adding them to the hidden Category "noindex" would go a long way towards protecting privacy rights (at least as far as Google is concerned).

Finally -- again -- how useful are these automatically generated lists towards advancing the "freedom of knowledge" (as Nathan put it)? To repeat: these categories make it seem that there are/have been 40 times more notable Jewish people and five times more notable Muslims in France than notable Christians. This (derived) "knowledge" is patently false.

Now, granted, the purpose of the automatically generated categories is not to come up with a comparative tally of noteworthy people; but I think what this tally shows is in itself revealing: Wikipedians are 40 times more likely to tag notable Jewish people as Jews and 5 times more likely to tag notable Muslims as Muslims than they are to tag notable Christians as Christians. This is worth thinking about for a minute...

Why would it be so hard to be humble and respect national laws by ensuring that category membership is not diffused for living people in countries where such lists are illegal? (As Yaroslav points out, there is no guarantee of the quality of the sourcing.) En.wp might be wise to learn from the conservative approach to this question taken by fr.wp and Wikidata.

I hope this helps to clarify the original post.

sashi

ps: *Correction*: Contrary to what I mistakenly wrote in my OP, there are 96 members of the category French Muslims (not 0).
[Wikimedia-l] Category: French Jews on en.wp / GDPR
Hello,

I am writing to ask if there are any plans to render the English Wikipedia compliant with French privacy laws.

Currently, if a French high school student goes to a French library, reserves a computer, and types "List of French Jews" into Google, DuckDuckGo, or Dogpile, an ad hoc en.wikipedia list of over 850 people (approximately half of them living) appears in the #2 position (Category: French Jews). In the first position is the English Wikipedia page "List of French Jews" containing the following text, originally added in 2010, showing that the en.wikipedia community is aware that they are breaking French law:

"The French nationality law itself, strongly secular, forbids any statistics or lists based on ethnic or religious membership."

A French person tagging biographies of living people in en.wp with the category "French Jews" is violating French privacy law, which would expose the Wikipedian to a penalty of €300,000 and/or 5 years imprisonment (translated from the French):

"The act, outside the cases provided for by law, of recording or storing in computerized memory, without the express consent of the person concerned, personal data which directly or indirectly reveal the racial or ethnic origins, the political, philosophical or religious opinions, or the trade union membership of persons, or which relate to their health or their sexual orientation or identity, is punishable by five years' imprisonment and a fine of €300,000." (source: https://www.cnil.fr/fr/les-sanctions-penales )

There is, to the best of my knowledge, no such category on fr.wp, as people in France are well aware of the law.

See also "List of West European Jews" / Category: French People of Jewish descent / Category: French People of Arab descent / Category: French Freemasons (167), Category: French Atheists (93 including a recent president), etc.

I noticed in researching the question that the Category "French rapists" (2 BLP) is associated with the hidden category "No indexed", whereas the category "French Jews" (100s of BLP) is associated with the hidden category "categories requiring diffusion". As a temporary measure (to avoid actively feeding this info into search engines), perhaps categories related to racial/ethnic origins, religious & philosophical opinions could be tagged "No indexed" rather than "requiring diffusion"?

The WMF hosts its servers in the US and the Netherlands, and will soon also be hosting off-shore in Singapore, which probably leads WMF legal to believe that this grants them immunity from French privacy laws. Nevertheless, I thought I would mention that this is a potentially significant problem going forward. Discussion leading to action correcting this potential avenue of abuse might help the WMF to avoid litigation, given that the current policies on English Wikipedia actively facilitate violation of French laws.

(data from petscan.wmflabs.org): French Christians (21 members), French Hindus (17 members), French Buddhists (9 members), French Muslims (0 members), French Jews (862 members).

Thank you for your time considering how best to address this problem.

sashi