[Wikimedia-l] Copy and Paste Detection Bot
The new and improved version of the copy and paste detection bot that we at [[WP:MED]] have been using for nearly a year [ https://en.wikipedia.org/wiki/User:EranBot/Copyright here] is nearly ready to be expanded to other topic areas. It can be found here [ https://en.wikipedia.org/wiki/User:EranBot/Copyright/rc]. If you install the common.js code, it will give you buttons to click to indicate follow-up of concerns. Additionally, one can sort the edits in question by WikiProject. We are working to set up auto-archiving so that once concerns are dealt with they will be removed from the main list. We also want to have automatic compilation of data such as the frequency of true positives and false positives generated by the bot. A blacklist of sites that are known mirrors of Wikipedia is here [ https://en.wikipedia.org/wiki/User:EranBot/Copyright/Blacklist]. As this list is improved / expanded, the accuracy of the bot will improve. Many thanks to [[User:ערן]] for his amazing work. The bot also has the potential to work in other languages. -- James Heilman MD, CCFP-EM, Wikipedian The Wikipedia Open Textbook of Medicine www.opentextbookofmedicine.com ___ Wikimedia-l mailing list, guidelines at: https://meta.wikimedia.org/wiki/Mailing_lists/Guidelines Wikimedia-l@lists.wikimedia.org Unsubscribe: https://lists.wikimedia.org/mailman/listinfo/wikimedia-l, mailto:wikimedia-l-requ...@lists.wikimedia.org?subject=unsubscribe
Re: [Wikimedia-l] Introducing Kourosh Karimkhany, Vice President of Strategic Partnerships
Hi Peter, The complete quote goes: There must be another way to work for the value of free knowledge for the people but to destroy net neutrality and the experience of an open web in the very beginning at the same time. When it comes to schools and other educational organisations in developing countries, the Wikipedia on a USB stick project was a good idea to start from, I think. Something equally usable on mobiles could be one direction to think in. But of course, like the walled Wikipedia of WP0, this project doesn't really give the full experience of an open and free Wikipedia. So it would be a practical alternative to WP0 (without the dealings with the access providers), but nothing more. Apart from this existing project, I described in an earlier discussion (and in talks with e.g. Jan-Bart and others) that a more political initiative, a public open knowledge project with development of a first framework, could be a mid-term approach. In short: the public knowledge project would define a standard framework for content which has to be provided free of charge to everybody for free use. This could include different knowledge-providing entities, from public via civil-society to even free content from commercial providers. Any content could be checked against the standards for open knowledge, and in different countries different content providers would create the mix. The system would be open, and so it would be independent of the access providers. It could be mandatory or non-mandatory for the access providers to offer access to the public open knowledge project (which in essence would be a list of registered websites you have fully functional access to), depending on what would be more appropriate for the actual market situation in the country or area. The government could provide subsidies for the costs the access providers incur - it would be seen as a cost of the cultural and intellectual infrastructure of your country (like libraries, museums, schools etc. today). 
It would be a mixture of public service and voluntary engagement by civil and commercial players, framed by standards which are discussed regularly in a possibly multi-stakeholder forum. Then Wikipedia could be an important node in a free public knowledge network secured by laws, international cooperation and civil engagement. This, of course, would first make the access providers cry out loud, because of - as they would describe it - unbearable duties for individual telecoms. And surely it would need support from the international community and governments, and cooperation between the individual access providers. Also, in an absolutist sense this would be a violation of net neutrality, but it would be a violation that isn't driven by the intent to develop a market with customers used to paying different prices for different data types, which is the clear intent for which WP0 is misused in reality. The market isn't a solution for everything. An open public knowledge project would establish an area of the web which could be experienced as true publicness, as a truly public place, created, operated and sustained by the triangle which makes up the public (state-people-business). It would be like a public web inside the internet. Considering the commercialisation of the internet and of access to it, that could be an important counterbalance to the ongoing development. Well, this is just a quick thought and surely as ambitious as WP0 is in its way, but it's not only about the ambition; it's also about the path you walk to reach the eventual version of what you thought was right in the beginning. This project would be a real piece of work in strategic multi-partnership and not some cheap play with some access providers looking to enrich their marketing bouquet with the beautiful Wikipedia flower. It would truly mean taking all our values seriously and working on a partnership that puts Wikipedia at the center of a network of free knowledge that would deserve that name. 
It would mean becoming a grown-up organisation taking strategic, professional care of the field it works and leads in - free knowledge. Apart from that quick idea, I'm also not the only one to whom this question should be put. And apart from all possible answers, WP0 still remains the wrong path. Some things are already wrong even before you learn that their numbers also don't work out. In the end WP0 is a tiny example of the ethos of WMF. Do you believe market and entrepreneurship are always good for your common target (like e.g. free knowledge), or does even something anarchistic like the web have some structural framework - even if unrecognized in its beginnings - that makes sure that openness is possible? Net neutrality isn't a religion (as some people here, having no good arguments of their own, try to phrase it), but net neutrality could be an important piece of the framework which is needed to balance a network structure which is ruled by the governments, by the companies and - happily - by the people at the same time. So far some
Re: [Wikimedia-l] [Wikimedia Announcements] New Wikimedia Foundation report on activities in 2014
Still, in my assessment it is lacking in concrete details. There are many terms that are coined and movements cited which are not definitively explained, in some cases with hints that the departments doing the reporting have not themselves yet arrived at a precise meaning. I suppose that, like the entree to the full-course meal, this is a limitation of the medium: something to digest ahead of the full-course annual plan. The overall sense is one of transition. On Thu, Apr 2, 2015 at 8:19 PM, Risker risker...@gmail.com wrote: On 2 April 2015 at 17:48, Andreas Kolbe jayen...@gmail.com wrote: On Thu, Apr 2, 2015 at 8:31 PM, Katherine Maher kma...@wikimedia.org wrote: Hi all, Today the Wikimedia Foundation published a report on its activities in calendar year 2014. [...] Although the information in the report was originally gathered in response to an internal Foundation need, we planned to make it public as a report from the very beginning. It is intended to be relatively candid, sharing insight into where teams feel they have strengths and where they feel there are development areas. [...] We hope you find it interesting, and welcome your feedback. Thanks, Katherine [1] https://en.wikipedia.org/wiki/Blind_men_and_an_elephant [2] Thanks to everyone at the Foundation who contributed so much great information to their various teams' sections. And a special thanks to Juliet Barbara and Heather Walls who wrote and produced the whole thing! Thanks. This looks indeed like a candid report. If it's an indication of a change in communication style, I like it. Good to have it available on Meta as well as in pdf format (I think the pdf is very nicely done). I agree, pretty much. This is probably the best 'big picture' look at the WMF I have seen: accomplishments, plans, honest assessments of challenges. Thanks very much! 
Risker/Anne
Re: [Wikimedia-l] Announcement: WMF to file suit against the NSA
Okay, but seriously, please stop resurrecting this thread. If you think it's important that something be done, start a new one, and *actually suggest something* rather than just copying articles from somewhere else. Austin On Fri, Apr 3, 2015 at 1:58 AM, Andreas Kolbe jayen...@gmail.com wrote: Article in Eurasianet today: Wikipedia Founder Distances Himself from Kazakhstan PR Machine http://www.eurasianet.org/node/72831 ---o0o--- [...] On March 20, Wikipedia founder Jimmy Wales hosted an Ask Me Anything http://www.reddit.com/r/IAmA/comments/2zpkxx/we_are_jameel_jaffer_of_the_aclu_wikipedia/cpl4maq conversation (AMA) on Reddit, a social-networking platform. Before long the audience was questioning Wales’s and Wikipedia’s roles in helping to improve Kazakhstan’s image. Back in 2011, Wales awarded http://www.eurasianet.org/node/66343 a once-and-future Kazakh government employee, Rauan Kenzhekhanuly, the inaugural “Wikipedian of the Year” for his work with WikiBilim, a Kazakh-language platform criticized both for receiving state funds and for publishing multiple articles toeing the authoritarian government’s line. At the time, Wales told EurasiaNet.org, “As far as I know, the WikiBilim organization is not politicized.” But during the AMA, Wales backpedaled on his decision to name Kenzhekhanuly the first Wikipedian of the Year. Wales was on the receiving end of a fresh round of criticism last year when Kenzhekhanuly was named deputy governor of Kazakhstan’s Kyzylorda region. During the AMA, a commenter asked Wales if he would have bestowed the award had he known Kenzhekhanuly would go on to serve as deputy governor. “If I had known in 2011 that someone would get a job that I disapprove of in 2014, would I refuse to give them an award in 2011?” Wales responded. “Yes, I would have refused to give that award.” Wales also clarified that Kenzhekhanuly “was not a government official” at the time of the award – which is, technically, true. 
However, according to Kenzhekhanuly’s LinkedIn profile https://www.linkedin.com/pub/rauan-kenzhekhanuly/24/8b7/b16, before receiving the award he had served both as a policy adviser to the governor in Kazakhstan’s Mangystau region, as well as first secretary at Kazakhstan’s embassy in Moscow. After the AMA, Wales said by email that he was “not aware” Kenzhekhanuly had held those positions. [...] ---o0o---
Re: [Wikimedia-l] Copy and Paste Detection Bot
Hi, James. Is the source code available anywhere? If you want to try your bot in other languages, I could help you with testing in Russian Wikipedia :) Best regards. rubin16 2015-04-03 12:07 GMT+03:00 James Heilman jmh...@gmail.com: [...]
Re: [Wikimedia-l] Copy and Paste Detection Bot
Hi James, I often suspect copy-paste and find exact matches of the text elsewhere. However, whereas one can painstakingly (unless there is a trick that I am not aware of) ascertain when text was entered into an article, it is not always possible to know when the other text first appeared on the internet, and so to know for sure who copied whom. From my limited knowledge, I believe that some trace of the date of upload must be retained somewhere in the code - will this bot be able to pick up on that and provide a date? Thanks and congratulations to all involved and for sharing. Regards, Rui 2015-04-03 11:07 GMT+02:00 James Heilman jmh...@gmail.com: [...]
-- Rui Correia Advocacy, Human Rights, Media and Language Work Consultant Bridge to Angola - Angola Liaison Consultant Mobile Number in South Africa +27 74 425 4186
[Wikimedia-l] [Wikimedia Announcements] The Signpost -- Volume 11, Issue 13 -- 01 April 2015
In focus: WMF's latest strategy document shows successes, vagueness, and the need for better data http://en.wikipedia.org/wiki/Wikipedia:Wikipedia_Signpost/2015-04-01/In_focus
In the media: Wiki-PR duo bulldoze a piñata store; Wifione arbitration case; French parliamentary plagiarism http://en.wikipedia.org/wiki/Wikipedia:Wikipedia_Signpost/2015-04-01/In_the_media
Traffic report: All over the place http://en.wikipedia.org/wiki/Wikipedia:Wikipedia_Signpost/2015-04-01/Traffic_report
Featured content: Stop Press. ''Marie Celeste'' Mystery Solved. Crew Found Hiding In Wardrobe. http://en.wikipedia.org/wiki/Wikipedia:Wikipedia_Signpost/2015-04-01/Featured_content
Single page view http://en.wikipedia.org/wiki/Wikipedia:Wikipedia_Signpost/Single/2015-04-01
PDF version http://en.wikipedia.org/wiki/Book:Wikipedia_Signpost/2015-04-01
https://www.facebook.com/wikisignpost / https://twitter.com/wikisignpost
-- Wikipedia Signpost Staff http://en.wikipedia.org/wiki/Wikipedia:Wikipedia_Signpost
___ Please note: all replies sent to this mailing list will be immediately directed to Wikimedia-l, the public mailing list of the Wikimedia community. For more information about Wikimedia-l: https://lists.wikimedia.org/mailman/listinfo/wikimedia-l ___ WikimediaAnnounce-l mailing list wikimediaannounc...@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikimediaannounce-l
[Wikimedia-l] Copy and Paste Detection Bot
1) Yes, the source code is available. User:Eran has posted it here: https://github.com/valhallasw/plagiabot
2) This bot ONLY works on new edits, within a couple of hours of them occurring. This reduces the number of false positives. It DOES NOT look at old edits.
3) This requires human follow-up and common sense. One needs to make sure that a) the source is not PD/CC BY-SA, b) it is not wiki text that has been moved around, c) the authors of both are not the same, etc.
4) The true positive rate is around 50%, which from my perspective is good / useful. This bot has flagged a lot of copyright issues that would have been missed otherwise.
-- James Heilman MD, CCFP-EM, Wikipedian The Wikipedia Open Textbook of Medicine www.opentextbookofmedicine.com
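For readers who want a feel for points 2 and 4 above, here is a minimal sketch in Python. It is not taken from the plagiabot source; the two-hour window, the example blacklist entry, and the function names `should_flag` and `confirmed_fraction` are all assumptions made for illustration. It only shows the general idea: ignore edits that are no longer fresh, ignore matches against known Wikipedia mirrors, and measure how many flags reviewers confirm.

```python
from datetime import datetime, timedelta, timezone
from urllib.parse import urlparse

# Hypothetical values for illustration only; the real bot's settings may differ.
MAX_EDIT_AGE = timedelta(hours=2)
MIRROR_BLACKLIST = {"mirror.example.org"}


def should_flag(edit_time, source_url, now,
                blacklist=MIRROR_BLACKLIST, max_age=MAX_EDIT_AGE):
    """Return True if a text match should be shown to a human reviewer."""
    # Old edits are skipped: the matching page may have copied Wikipedia.
    if now - edit_time > max_age:
        return False
    host = (urlparse(source_url).hostname or "").lower()
    # Skip known mirrors, including their subdomains.
    if any(host == d or host.endswith("." + d) for d in blacklist):
        return False
    return True


def confirmed_fraction(reviews):
    """Share of flagged edits that reviewers confirmed as real problems.

    `reviews` is an iterable of booleans (True = confirmed).
    """
    reviews = list(reviews)
    return sum(reviews) / len(reviews) if reviews else 0.0
```

A flag that passes `should_flag` still needs the human checks listed in point 3; the sketch does not attempt to automate those.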
Re: [Wikimedia-l] Announcing: The Wikipedia Prize!
Hi Brian, 2015-03-30 0:25 GMT+02:00 Brian reflect...@gmail.com: Although the initial goal of the Netflix Prize was to design a collaborative filtering algorithm, it became notorious when the data was used to de-anonymize Netflix users. Researchers proved that given just a user's movie ratings on one site, you can plug those ratings into another site, such as the IMDB. You can then take that information, and with some Google searches and optionally a bit of cash (for websites that sell user information, including, in some cases, their SSN) figure out who they are. You could even drive up to their house and take a selfie with them, or follow them to work and meet their boss and tell them about their views on the topics they were editing. Somewhat tangentially, and to bring this topic back to a more scientific setting, I would like to point out that there has already been research on this topic in the past. I highly recommend reading the following paper: Lieberman, Michael D., and Jimmy Lin. You Are Where You Edit: Locating Wikipedia Contributors through Edit Histories. ICWSM. 2009. (PDF http://www.pensivepuffin.com/dwmcphd/syllabi/infx598_wi12/papers/wikipedia/lieberman-lin.YouAreWhereYouEdit.ICWSM09.pdf) For those of you who don't want to read the whole paper, you can find a recap of the most relevant findings in this presentation by Maurizio Napolitano: http://www.slideshare.net/napo/social-geography-wikipedia-a-quick-overwiew The main idea is to associate spatial coordinates with Wikipedia articles when possible; these articles are called geopages. Then you extract from the history of the articles the users who have edited a geopage. If you plot the geopages edited by a given contributor, you can see that they tend to cluster, so you can define an edit area. The study finds that 30-35% of contributors concentrate their edits in an edit area smaller than 1 deg^2 (~12,362 km^2, approximately the area of Connecticut or Northern Ireland[1] (thanks, Wikipedia!)). 
For another free/libre project with a geographic focus like OpenStreetMap this is even more marked; check out, for example, the tool «“Your OSM Heat Map” (aka Where did you contribute?)»[2] by Pascal Neis. This, of course, is not a straightforward de-anonymization, but these methods work in principle for every contributor even if you obfuscate their IP or username (provided that you can still assign all the edits from a given user to a unique and unambiguous identifier). C [1] https://en.wikipedia.org/wiki/Square_degree [2a] http://yosmhm.neis-one.org/ [2b] http://neis-one.org/2011/08/yosmhm/
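The edit-area idea described above can be sketched roughly as follows. This is only an illustration, not the method of the Lieberman and Lin paper: it assumes a simple latitude/longitude bounding box as a stand-in for the edit area, ignores wrap-around at the 180° meridian, and the names `edit_areas` and `localized` are hypothetical.

```python
from collections import defaultdict


def edit_areas(edits):
    """Map each contributor to a bounding-box area in square degrees.

    `edits` is an iterable of (contributor, lat, lon) tuples, one per edit
    to a geopage. The bounding box is an assumed simplification.
    """
    # Per contributor: [min_lat, max_lat, min_lon, max_lon]
    boxes = defaultdict(lambda: [90.0, -90.0, 180.0, -180.0])
    for user, lat, lon in edits:
        box = boxes[user]
        box[0] = min(box[0], lat)
        box[1] = max(box[1], lat)
        box[2] = min(box[2], lon)
        box[3] = max(box[3], lon)
    return {u: (b[1] - b[0]) * (b[3] - b[2]) for u, b in boxes.items()}


def localized(areas, threshold=1.0):
    """Contributors whose edit area is smaller than `threshold` deg^2."""
    return sorted(u for u, area in areas.items() if area < threshold)
```

Note that the key in `edits` can be any stable identifier: this is why obfuscating the username or IP does not help as long as edits remain linkable to one contributor.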