[Wikimedia-l] [Reminder] Language Engineering IRC Office Hour on November 13, 2013 at 1500 UTC
Hello,

A quick reminder that the Wikimedia Language Engineering team will be hosting an IRC office hour from 1500 to 1600 UTC later today on #wikimedia-office (FreeNode). Please see below for the event details.

Thanks
Runa

-- Forwarded message --
From: Runa Bhattacharjee rbhattachar...@wikimedia.org
Date: Thu, Nov 7, 2013 at 11:40 AM
Subject: Language Engineering IRC Office Hour on November 13, 2013 at 1500 UTC
To: MediaWiki internationalisation mediawiki-i...@lists.wikimedia.org, Wikimedia Mailing List wikimedia-l@lists.wikimedia.org, Wikimedia developers wikitec...@lists.wikimedia.org, wikitech-ambassad...@lists.wikimedia.org

[x-posted]

Hello,

The Wikimedia Language Engineering team will be hosting an IRC office hour on Wednesday, November 13, 2013 between 15:00 - 16:00 UTC on #wikimedia-office. (See below for timezone conversion and other details.) We will be talking about some of our recent and upcoming projects and then taking questions for the remaining time. We also look forward to hearing about anything that needs our attention. Questions and other concerns can also be sent to me directly before the event.

See you there!

Thanks
Runa

=== Event Details ===
What: WMF Language Engineering Office hour
When: November 13, 2013 (Wednesday). 1500-1600 UTC http://www.timeanddate.com/worldclock/fixedtime.html?iso=20131113T1500
Where: IRC Channel #wikimedia-office on FreeNode

--
Language Engineering - Outreach and QA Coordinator
Wikimedia Foundation

___
Wikimedia-l mailing list Wikimedia-l@lists.wikimedia.org Unsubscribe: https://lists.wikimedia.org/mailman/listinfo/wikimedia-l, mailto:wikimedia-l-requ...@lists.wikimedia.org?subject=unsubscribe
Re: [Wikimedia-l] Copyright infringement - The real elephant in the room
Hoi,

Seriously, we should never ever be ruled by panic. What you see is bad, no doubt, but the notion that we should dump everything because of the latest issue to come along is way overboard.

- by stopping the flow on projects like Visual Editor you break dependencies for the work of many developers
- what you have noticed is for only one Wikipedia, not all of them
- we do need more mature discussion software; what we have is horrible
- such dramatics only have you go away and upset others; it does not solve things
- the dramatics distract me from your message
- my hobby horse needs more attention too and I think my argument is better ...

Anyway, it would be nice if someone looked at the tool with an eye to making it happen and making it scale. If it doesn't, it becomes a less attractive option to pursue.

Thanks,
GerardM

On 13 November 2013 08:40, James Heilman jmh...@gmail.com wrote:

The Wikimedia Foundation needs to wake up and deal with the real tech elephant in the room. Our primary issue is not a lack of FLOW, a lack of a visual editor, or a lack of a rapidly expanding education program. Our biggest issue is copyright infringement. We have had the Indian program, we have had issues with the Education program, and I have today come across a user who has made nearly 20,000 edits to 1,742 article since 2006 which appear to be nearly all copy and pasted from the sources he has used. https://en.wikipedia.org/wiki/User_talk:DrMicro#Copyright_infringement This has seriously shaken my faith in Wikipedia. This is especially devastating as there is a tech solution that would have prevented it. The efforts are being worked on by volunteers here https://en.wikipedia.org/wiki/Wikipedia:Turnitin and has been since at least March of 2012. We NEED all tech resource at the foundation thrown at this project. Other less important project like FLOW and the visual editor need to be put on hold to develop this tool.
-- James Heilman MD, CCFP-EM, Wikipedian The Wikipedia Open Textbook of Medicine www.opentextbookofmedicine.com
Re: [Wikimedia-l] Copyright infringement - The real elephant in the room
On 11/13/2013 02:40 AM, James Heilman wrote:

The Wikimedia Foundation needs to wake up and deal with the real tech elephant in the room. Our primary issue is not a lack of FLOW, a lack of a visual editor, or a lack of a rapidly expanding education program. Our biggest issue is copyright infringement.

I don't really agree with that. It is a serious issue, but I would put NPOV (in the face of active threats such as companies paying for publicity on Wikipedia) and growing the editor community higher.

We also have solutions to address it (not perfectly, true), both preventing the problem and dealing with it after the fact:

* MadmanBot (https://en.wikipedia.org/wiki/User:MadmanBot) (mentioned at Wikipedia:TurnItIn, and a major technical tool against copyright infringement).
* Clear policies against copyright infringement
* Dealing with copyright violations (https://en.wikipedia.org/wiki/Wikipedia:Text_Copyright_Violations_101)
* Finally, the DMCA ensures the foundation is not liable as long as they promptly respond to notifications (which of course we want them to anyway).

We have had the Indian program, we have had issues with the Education program, and I have today come across a user who has made nearly 20,000 edits to 1,742 article since 2006 which appear to be nearly all copy and pasted from the sources he has used. https://en.wikipedia.org/wiki/User_talk:DrMicro#Copyright_infringement This has seriously shaken my faith in Wikipedia.

That is indeed disturbing, and I'm glad you found it.

This is especially devastating as there is a tech solution that would have prevented it. The efforts are being worked on by volunteers here https://en.wikipedia.org/wiki/Wikipedia:Turnitin and has been since at least March of 2012. We NEED all tech resource at the foundation thrown at this project. Other less important project like FLOW and the visual editor need to be put on hold to develop this tool.

I don't agree that all tech resources should be used for this. However, there may be room for enhancing MadmanBot (e.g. as a GSOC or OPW project). A significant problem with TurnItIn is that it is proprietary and cannot be customized by anyone in the movement. The fact that it is proprietary also means it can never be part of the main infrastructure, nor run on Wikimedia Labs.

Matt Flaschen
[Wikimedia-l] Recovering wikipedia.it: top 1 trademark priority per it.wiki poll
Hello all (cc Yana, Michelle, Geoff, legal, board). In a formal poll[1] proposed by two admins, the it.wiki community has decided the following: «The Italian Wikipedia community considers that, among all the actions in defense of its name (as in public image and trademark) pertaining to the Wikimedia Foundation, the maximum priority should be given to recovering the domain wikipedia.it (and if possible its sisters) and therefore asks WMF to follow one or more of the legal paths suggested by the experts to that purpose, using the funds assigned by the WMF board for 2013-14.» https://it.wikipedia.org/wiki/Wikipedia:Sondaggi/Recupero_domini_a_nome_Wikipedia The decision has been taken 132:1:4 which seems to be the largest absolute margin ever reached by a poll on it.wiki; the funds in question are the $700K to upgrade the trademark portfolio.[2] Quick background: * the domain wikipedia.it has been registered by a commercial hosting provider in 2003; WMIT members and others have been in contact with him since 2004 but he never replied; * since 2006, the domain displays an ad banner hosted by Yepa on top (in 2006-9 it also trapped the user in it via a frame), of which the WMF is aware since 2006-12-14 (and has been reminded several times): this makes many users[3] who believe it our official domain think that Wikipedia is a for-profit effort. WMIT, to serve the community's concerns, has sent an official complaint to NIC.it in 2009 but the registration is formally correct (via their ad hoc Wikipedia Italy Association) so only the trademark owner (WMF) can proceed with one of the 3 remaining legal tools (including challenging procedure and arbitrage), as summarised by a document kindly provided to the WMF by .mau., one of the authors of the NIC.it rules. The last known concrete action taken by the WMF has been the extension of the trademark to Italy in 2007, by Florence (thanks Florence!). 
Nemo [1] The last resort decision-making official tool which on it.wiki overrides any decision past and future until revoked by another poll with same requirements. [2] https://meta.wikimedia.org/wiki/Wikimedia_Foundation_Plan_2013-14#2013-14_Plan_Finances_and_Staffing [3] About 43 visits per minute that we were able to count via a JavaScript trick by Pietrodn in april 2009. ___ Wikimedia-l mailing list Wikimedia-l@lists.wikimedia.org Unsubscribe: https://lists.wikimedia.org/mailman/listinfo/wikimedia-l, mailto:wikimedia-l-requ...@lists.wikimedia.org?subject=unsubscribe
Re: [Wikimedia-l] Copyright infringement - The real elephant in the room
On Tue, Nov 12, 2013 at 11:40 PM, James Heilman jmh...@gmail.com wrote:

The Wikimedia Foundation needs to wake up and deal with the real tech elephant in the room. Our primary issue is not a lack of FLOW, a lack of a visual editor, or a lack of a rapidly expanding education program. Our biggest issue is copyright infringement. We have had the Indian program, we have had issues with the Education program, and I have today come across a user who has made nearly 20,000 edits to 1,742 article since 2006 which appear to be nearly all copy and pasted from the sources he has used. https://en.wikipedia.org/wiki/User_talk:DrMicro#Copyright_infringement This has seriously shaken my faith in Wikipedia. This is especially devastating as there is a tech solution that would have prevented it. The efforts are being worked on by volunteers here https://en.wikipedia.org/wiki/Wikipedia:Turnitin and has been since at least March of 2012. We NEED all tech resource at the foundation thrown at this project. Other less important project like FLOW and the visual editor need to be put on hold to develop this tool.

Relevant info on the subject of copyvio is the recent plagiarism study by the Education Program team. They looked at different types of users (students, newbies, experienced editors, admins) and compared them. Results were published on Meta at https://meta.wikimedia.org/wiki/Research:Plagiarism_on_the_English_Wikipedia and also discussed in the last WMF Metrics and Activities meeting: https://meta.wikimedia.org/wiki/Metrics_and_activities_meetings/2013-11-07

AFAIK this is the best data we have about how often different kinds of editors close paraphrase or outright copy/paste.
Re: [Wikimedia-l] Copyright infringement - The real elephant in the room
On Wed, Nov 13, 2013 at 8:40 AM, James Heilman jmh...@gmail.com wrote: Our biggest issue is copyright infringement. We have had the Indian program, we have had issues with the Education program, and I have today come across a user who has made nearly 20,000 edits to 1,742 article since 2006 which appear to be nearly all copy and pasted from the sources he has used. https://en.wikipedia.org/wiki/User_talk:DrMicro#Copyright_infringement This has seriously shaken my faith in Wikipedia. Back in 2007 we found out a user on it.wp, a former sysop, with more than 40,000 edits that used to copy-paste from his sources, often outdated. He was banned, and the community made a great effort to cleanup the articles he contributed to (and damn it was hard, because those articles had a long history after his edits). And in the following years, we had other similar cases, you can find a selection here: https://it.wikipedia.org/wiki/Progetto:Cococo/Controlli_conclusi There are bots that go and look whether a newly inserted block of text is already present somewhere else, it doesn't find everything (of course it won't find things copied from a printed book), but sooner or later serial copyviolers get caught, and the fall from hero to zero is sooo quick. At the end of the day, I think copyvios have always been taken seriously, so that I don't remember big problems with that, while there have always been more problems with libel, privacy, and editor retention. Marco (Cruccone) ___ Wikimedia-l mailing list Wikimedia-l@lists.wikimedia.org Unsubscribe: https://lists.wikimedia.org/mailman/listinfo/wikimedia-l, mailto:wikimedia-l-requ...@lists.wikimedia.org?subject=unsubscribe
Re: [Wikimedia-l] Copyright infringement - The real elephant in the room
Marco: I agree, we had also issues on the Dutch Wikipedia - these have been around for ages, the English Wikipedia is just less aware of them. Often, copypasting in the same language is caught easily - between different languages is much harder and persistent. There are many people, including experienced editors, that think translating from random sources is OK. It is no new problem, and chapters have indeed been working on getting this understanding of what free licenses really mean more widely accepted in the general audience. Not something that is easily measured of course. Technical solutions sound great, but are only catching a small amount inside the same language. Steven: I understand this research was limited to the English Wikipedia (where most of the plagiarism will be in the same language). It would not strike me out of the realm of realism to assume this might be very different for other languages than English. It also says little about the problem in general of course. For those who don't want to click on links to get information, it basically says (simplification alert) that they don't have any indication that the US Canada education program makes the plagiarism problem on the English Wikipedia any worse than it already is. Anyway: I think this problem is more prominently there in non-English communities, and that technical solutions are not going to be the answer there. An educational answer is more likely to be successful, focusing on explaining people how Wikipedia works and doesn't work, and what are do's and don'ts. This doesn't have to be an education program like executed in the US, but basically all outreach programs as executed by chapters, user groups, thematic organizations or groups of volunteers can contribute to this. This is already happening in most countries. 
In some countries (like Germany ;-) ) politicians are doing the work for us, explaining how evil plagiarism is and how it works by firing government ministers over it :) Best, Lodewijk 2013/11/13 Marco Chiesa chiesa.ma...@gmail.com On Wed, Nov 13, 2013 at 8:40 AM, James Heilman jmh...@gmail.com wrote: Our biggest issue is copyright infringement. We have had the Indian program, we have had issues with the Education program, and I have today come across a user who has made nearly 20,000 edits to 1,742 article since 2006 which appear to be nearly all copy and pasted from the sources he has used. https://en.wikipedia.org/wiki/User_talk:DrMicro#Copyright_infringement This has seriously shaken my faith in Wikipedia. Back in 2007 we found out a user on it.wp, a former sysop, with more than 40,000 edits that used to copy-paste from his sources, often outdated. He was banned, and the community made a great effort to cleanup the articles he contributed to (and damn it was hard, because those articles had a long history after his edits). And in the following years, we had other similar cases, you can find a selection here: https://it.wikipedia.org/wiki/Progetto:Cococo/Controlli_conclusi There are bots that go and look whether a newly inserted block of text is already present somewhere else, it doesn't find everything (of course it won't find things copied from a printed book), but sooner or later serial copyviolers get caught, and the fall from hero to zero is sooo quick. At the end of the day, I think copyvios have always been taken seriously, so that I don't remember big problems with that, while there have always been more problems with libel, privacy, and editor retention. 
Marco (Cruccone)
Re: [Wikimedia-l] Copyright infringement - The real elephant in the room
Marco Chiesa, 13/11/2013 10:21:

There are bots that go and look whether a newly inserted block of text is already present somewhere else, [...]

Rectius: there *used* to be a bot (RevertBot, Lusumbot). The program https://www.mediawiki.org/wiki/Manual:Pywikibot/copyright.py was stopped when search engines changed their limits, and Lusum has been waiting for a while for the WMF's Yahoo! BOSS key, which is needed to run the bot.

Nemo
Re: [Wikimedia-l] Copyright infringement - The real elephant in the room
On Wed, Nov 13, 2013 at 2:37 AM, Matthew Flaschen matthew.flasc...@gatech.edu wrote:

A significant problem with TurnItIn is that is proprietary, and can not be customized by anyone in the movement. The fact that it is proprietary also means it can never be port of the main infrastructure, nor run on Wikimedia Labs.

Another significant issue is the False Positive factor that is created by our overwhelming popularity. Frankly, we're mirrored all over the place. And tools like Turnitin find the mirrors too. It's not an easy problem to solve. I was on the team that looked at this a couple of years back - it's just not simple, and there are complex challenges.

*Philippe Beaudette * \\ Director, Community Advocacy \\ Wikimedia Foundation, Inc. T: 1-415-839-6885 x6643 | phili...@wikimedia.org | @Philippewiki https://twitter.com/Philippewiki
Re: [Wikimedia-l] Copyright infringement - The real elephant in the room
On 11/13/2013 05:16 AM, Philippe Beaudette wrote: On Wed, Nov 13, 2013 at 2:37 AM, Matthew Flaschen matthew.flasc...@gatech.edu wrote: A significant problem with TurnItIn is that is proprietary, and can not be customized by anyone in the movement. The fact that it is proprietary also means it can never be port of the main infrastructure, nor run on Wikimedia Labs. Another significant issue is the False Positive factor that is created by our overwhelming popularity. Frankly, we're mirrored all over the place. And tools like Turnitin find the mirrors too. It's not an easy problem to solve. I was on the team that looked at this a couple of years back - it's just not simple, and there are complex challenges. Yes, an intelligent solution would take into account when the mirror was first indexed (or ideally first published), and when the Wikipedia article was edited, to reduce false positives requiring manual intervention. Matt Flaschen ___ Wikimedia-l mailing list Wikimedia-l@lists.wikimedia.org Unsubscribe: https://lists.wikimedia.org/mailman/listinfo/wikimedia-l, mailto:wikimedia-l-requ...@lists.wikimedia.org?subject=unsubscribe
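Editor's illustration: the date comparison Matt describes above can be sketched roughly as follows. This is only a minimal sketch under stated assumptions, not how MadmanBot or Turnitin work. The `Match` record, the `classify` function, and the three labels are hypothetical names, and the sketch assumes a first-indexed or first-published date is available for the external page, which in practice it often is not.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class Match:
    """Hypothetical record for an external page whose text matches an article."""
    url: str
    first_seen: Optional[datetime]  # earliest known index/publication date, if any


def classify(match: Match, text_added_at: datetime) -> str:
    """Compare the external page's first-seen date with the Wikipedia edit date."""
    if match.first_seen is None:
        # No date available: the comparison cannot be made, so a human decides.
        return "needs-review"
    if match.first_seen > text_added_at:
        # The external page appeared after the text was on Wikipedia,
        # so it is more likely a mirror than a source.
        return "likely-mirror"
    # The external page predates (or coincides with) the edit.
    return "possible-copyvio"
```

Even under these assumptions the result is a hint for a reviewer, not a verdict: a page can be older than the edit and still be freely licensed or written by the same author.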
Re: [Wikimedia-l] Copyright infringement - The real elephant in the room
Hoi,

I know several authors who publish and use their original text to publish on Wikipedia as well. This is another source of false positives because they have the copyright to the original source. To recognise this you have to be even more sophisticated.

The point I want to make is that having a tool that is KNOWN to be deficient in specific ways can still be a huge advantage over not having a tool at all. So PLEASE let's not make perfection the enemy of the good.

Thanks,
GerardM

On 13 November 2013 11:23, Matthew Flaschen matthew.flasc...@gatech.edu wrote:

On 11/13/2013 05:16 AM, Philippe Beaudette wrote:

On Wed, Nov 13, 2013 at 2:37 AM, Matthew Flaschen matthew.flasc...@gatech.edu wrote:

A significant problem with TurnItIn is that is proprietary, and can not be customized by anyone in the movement. The fact that it is proprietary also means it can never be port of the main infrastructure, nor run on Wikimedia Labs.

Another significant issue is the False Positive factor that is created by our overwhelming popularity. Frankly, we're mirrored all over the place. And tools like Turnitin find the mirrors too. It's not an easy problem to solve. I was on the team that looked at this a couple of years back - it's just not simple, and there are complex challenges.

Yes, an intelligent solution would take into account when the mirror was first indexed (or ideally first published), and when the Wikipedia article was edited, to reduce false positives requiring manual intervention.

Matt Flaschen
Re: [Wikimedia-l] Copyright infringement - The real elephant in the room
On Wed, Nov 13, 2013 at 11:44 AM, Gerard Meijssen gerard.meijs...@gmail.com wrote:

Hoi I know several authors who publish and use their original text to publish on Wikipedia as well.. This is another source of false positives because they have the copyright to the original source... To recognise this you have to be even more sophisticated.

Actually, we consider these as copyvios: we delete the text straight away, and we tell the editor "if you're the author, write to OTRS". Of course, if the text is already somewhere else under a compatible free license, we don't need this. As long as we can't be sure that User:MrX is actually the physical person MrX, we need to protect the author's rights.

Marco
Re: [Wikimedia-l] Copyright infringement - The real elephant in the room
On Wed, 13 Nov 2013, Marco Chiesa wrote: On Wed, Nov 13, 2013 at 11:44 AM, Gerard Meijssen gerard.meijs...@gmail.com wrote: Hoi I know several authors who publish and use their original text to publish on Wikipedia as well.. This is another source of false positives because they have the copyright to the original source... To recognise this you have to be even more sophisticated. Actually, we consider these as copyvios, we delete the text straight away, and we tell the editor if you're the author write to OTRS. Of course, if the text is already somewhere else under a compatible free-license, we don't need this. Until you can't be sure that User:MrX is actually the physical person MrX, we need to protect the author's right. But an automated tool can not know whether OTRS verification has happened or not. Chris McKenna cmcke...@sucs.org www.sucs.org/~cmckenna The essential things in life are seen not with the eyes, but with the heart Antoine de Saint Exupery ___ Wikimedia-l mailing list Wikimedia-l@lists.wikimedia.org Unsubscribe: https://lists.wikimedia.org/mailman/listinfo/wikimedia-l, mailto:wikimedia-l-requ...@lists.wikimedia.org?subject=unsubscribe
Re: [Wikimedia-l] Copyright infringement - The real elephant in the room
On Wed, 13 Nov 2013, Gerard Meijssen wrote:

The point I want to make is that having a tool that is KNOWN to be deficient in specific ways can still be a huge advantage over not having a tool at all. So PLEASE lets not make perfection the enemy of the good.

The problem isn't that we're waiting for perfection. We're waiting for the proportion of false positives and false negatives to fall to a level where they don't overwhelm the true positives.

Chris McKenna cmcke...@sucs.org www.sucs.org/~cmckenna
The essential things in life are seen not with the eyes, but with the heart
Antoine de Saint Exupery
Re: [Wikimedia-l] Copyright infringement - The real elephant in the room
On Wed, Nov 13, 2013 at 12:36 PM, Chris McKenna cmcke...@sucs.org wrote: But an automated tool can not know whether OTRS verification has happened or not. We put something like {{OTRS verified}} in the article's talk page, something saying: Part of the text comes from website X, ticket 1234567890. And if the author wants to use his work for many articles, we tell him/her to put the template in all his/her articles' talk page. Marco ___ Wikimedia-l mailing list Wikimedia-l@lists.wikimedia.org Unsubscribe: https://lists.wikimedia.org/mailman/listinfo/wikimedia-l, mailto:wikimedia-l-requ...@lists.wikimedia.org?subject=unsubscribe
Re: [Wikimedia-l] Copyright infringement - The real elephant in the room
On Wed, Nov 13, 2013 at 12:39 PM, Chris McKenna cmcke...@sucs.org wrote: The problem isn't that we're waiting for perfection. We're waiting for the proportion of false positives and false negatives to fall to a level where don't overwhelm the true positives. To avoid false positives from mirrors, the best option is to compare a text as soon as it is saved. Also, you exclude certain websites from the comparison because you know they're the mirrors, you exclude rollbacks, ... Then, it is better to have a human checking that it is really a copyvio (it could well be a public domain text, or another Wikipedia article). Marco ___ Wikimedia-l mailing list Wikimedia-l@lists.wikimedia.org Unsubscribe: https://lists.wikimedia.org/mailman/listinfo/wikimedia-l, mailto:wikimedia-l-requ...@lists.wikimedia.org?subject=unsubscribe
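Editor's illustration: the filtering Marco outlines above (skip rollbacks, drop known mirrors, then hand what is left to a human) might look something like this. It is a sketch under assumptions, not the logic of any existing bot. The function name `matches_for_review`, the `KNOWN_MIRRORS` set, and its single entry are hypothetical, and the matching URLs are assumed to come from some external text search run when the edit is saved.

```python
from urllib.parse import urlparse

# Hypothetical list of hosts already known to republish Wikipedia content.
KNOWN_MIRRORS = {"mirror.example.org"}


def matches_for_review(is_rollback, match_urls, known_mirrors=KNOWN_MIRRORS):
    """Return the matching URLs that a human should still look at."""
    if is_rollback:
        # A rollback only restores text that was already in the article.
        return []
    # Keep only matches whose host is not a known mirror.
    return [u for u in match_urls if urlparse(u).hostname not in known_mirrors]
```

The function deliberately returns candidates rather than a decision, since the remaining matches could still be public domain text or another Wikipedia article, as Marco notes.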
Re: [Wikimedia-l] Copyright infringement - The real elephant in the room
On 13 November 2013 07:40, James Heilman jmh...@gmail.com wrote:

... Our biggest issue is copyright infringement. ...

Thanks for raising this, James. Yes, this is an issue, but if you are gunning for elephants this month, I really don't think the copyright elephant is the biggest one in the herd.

As a practical example of the tools we already have in place, yesterday I was facilitating an edit-a-thon for women in science with King's College London and we had one of the example stubs we had created on the English Wikipedia up on a projector. Within literally *minutes* of creation it had been (correctly) flagged by a bot as a possible copyright violation, as some of the text had been cut and pasted from King's own website; one of the participants quickly re-wrote it using their own words. As the communications manager was sitting next to me at the time, no doubt she found this rather reassuring, even though in parallel she was asking about how best to officially release text. :-)

We have a more complex problem with how images uploaded to Wikimedia Commons can be flagged where they match images found elsewhere on the internet. This is something that may be done by a future bot, but we might need to partner with someone like Google Images or Tineye to make this truly effective. Having run my own experimental bots in this area, I would love to see this become a funded project.

PS with regard to OTRS verification, we could do with better standards for verification; at the moment volunteers like myself are left to use our own judgement about what checks to make. I tend to double check text or images being released with Google, just in case, as well as doing whois checks on email domains. These sorts of checks could become part of OTRS guidelines and would make the reliability of OTRS tickets a notch higher.
Cheers, Fae -- fae...@gmail.com http://j.mp/faewm
[Wikimedia-l] [Wikimedia Announcements] Wikimedia UK report, September 2013
Hello everyone, Please find below the Wikimedia UK monthly reporthttps://wikimedia.org.uk/wiki/Reports for the period 1st to 30th September 2013. If you want to keep up with the chapter's activities as they happen, please subscribe to our bloghttp://blog.wikimedia.org.uk/ , join a UK mailing listhttps://lists.wikimedia.org/mailman/listinfo/wikimediauk-l, and/or follow us on Twitter http://twitter.com/wikimediauk. If you have any questions or comments, please drop us a line on this report's talk pagehttps://wikimedia.org.uk/w/index.php?title=Talk:Reports/2013/Septemberaction=editredlink=1. If you prefer to read the page on wiki, you can find it at https://wikimedia.org.uk/wiki/Reports/2013/September Thanks and regards, Stevie Program activities Community - On 7 September 2013, Wikimedians and amateur photographers gathered in the Grade II-listed St Michael’s Church for Wikipedia Takes Chester, a day-long photo scavenger hunt, held to increase participation in Wiki Loves Monuments UK. At Wikipedia Takes Chester, and at Wikipedia Takes Coventry this time last year, many people attended who would not normally come along to Wikimedia events. A huge range of photographers attend these events, from the point-and-shoot-wielding amateur to the very-expensive-DSLR-toting professionals. GLAM activities - Wikimedia UK is pleased to announce its new partnership with York Museums Trust http://www.yorkmuseumstrust.org.uk/Page/Index.aspx. The partnership, which was confirmed in September, is to be supported by the recruitment of a Wikimedian in Residence who will promote open access to collections data across the trust. - We were also delighted to announce in September that The Royal Society is recruiting a Wikimedian in Residence. 
Education activities This summer Wikimedia UK embarked on a systematic campaign to raise awareness of the assistance the charity can offer to university students towards the creation of new student societies associated with Wikipedia and other Wikimedia projects. With support from Wikimedia UK, Wikipedia student societies have already been established at Imperial College London and Cambridge University. We are keen to see this sort of activity develop on other campuses across the UK. We’re in the process of discussing the possibility of new Wikipedia students’ societies developing at a number of universities in Cardiff, Dundee, Manchester, Hull/Scarborough, Swansea and London. Please help us spread this information across university campuses throughout the UK, or if you’re a university student and a Wikipedian just email educat...@wikimedia.org.uk and we’ll take it from there We were also preparing for the delivery of the EduWiki Conference 2013https://wikimedia.org.uk/wiki/EduWiki_Conference_2013 on 1-2 November. Technology In September, the WMUK wiki was migrated from the Wikimedia Foundation's datacentre to Wikimedia UK's. Details of the migration can be found on the WMUK website https://wikimedia.org.uk/wiki/WMUK_wiki_migration, which is now at the new address of wikimedia.org.uk Other activities 15 October is recognised as Ada Lovelace Day and is dedicated to celebrating the contributions of women in science, technology, engineering and mathematics (STEM). Wikimedia UK was proud to be a part of those celebrations. We have delivered many events about Women in Science in October. For example, along with Jisc, we have supported an editathon focusing on women in science which took place at the University of Oxford. As if to illustrate that the ongoing campaign to encourage greater recognition of women in STEM, BBC Radio 4′s Woman’s Hour show featured a discussion of this topic, featuring our very own Daria Cybulska. 
UK readers of this blog can listen to the show here http://www.bbc.co.uk/programmes/b03cmt4n. The section about Ada Lovelace Day begins after around 7:45 of the recording. Wiki Loves Monuments - September 2013 will always be the month the UK took part in Wiki Loves Monuments for the first time. The first few minutes of 1 September were nervous. Would everything work? Would we have long to wait for our first upload? What did the month ahead hold? Thirty days later we had 11,995 photos from 573 people - great success! To ensure participants could ask questions, it was possible for them to comment on the Wiki Loves Monuments UK website, get in touch via Twitter or Facebook, or send an email which would be picked up through OTRS. Microgrants Information about microgrants that are currently running, and how to submit a microgrant application of your own, is at Microgrants/Applications https://wikimedia.org.uk/wiki/Microgrants/Applications . UK press coverage (and coverage of UK projects activities) - Storming Wikipedia - Project tackles the site's 'women problem' - Huffington Post http://www.huffingtonpost.com/2013/08/26/wikipedia-women-storming-female-editors_n_3817138.html -
Re: [Wikimedia-l] next Wikidata office hour
On Sat, Nov 2, 2013 at 4:27 PM, Lydia Pintscher lydia.pintsc...@wikimedia.de wrote: Hi everyone, I'll be holding an office hour together with addshore on Wednesday, November 13 at 17:00 UTC. For your timezone see http://www.timeanddate.com/worldclock/fixedtime.html?hour=17&min=00&sec=0&day=13&month=11&year=2013 We'll be meeting in #wikimedia-office on freenode. I'll start with a short overview of the current state of Wikidata and then there will be time for all your Wikidata related questions. I hope to see many of you there. Reminder: This is in 20 minutes. Cheers Lydia -- Lydia Pintscher - http://about.me/lydia.pintscher Product Manager for Wikidata Wikimedia Deutschland e.V. Obentrautstr. 72 10963 Berlin www.wikimedia.de Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e. V. Eingetragen im Vereinsregister des Amtsgerichts Berlin-Charlottenburg unter der Nummer 23855 Nz. Als gemeinnützig anerkannt durch das Finanzamt für Körperschaften I Berlin, Steuernummer 27/681/51985.
Re: [Wikimedia-l] Copyright infringement - The real elephant in the room
On 11/13/2013 12:37 AM, Matthew Flaschen wrote: However, there may be room for enhancing MadmanBot (e.g. as a GSOC or OPW project). Any technical project able to identify small tasks and available mentors is welcome to join Wikimedia's Google Code-in team at https://www.mediawiki.org/wiki/Google_Code-In GCI will start next week and will last until the beginning of January. Hundreds of young students will scan our tasks and will eventually complete some of them. It is a program ideal for small projects, like the bots or gadgets used by editors. -- Quim Gil Technical Contributor Coordinator @ Wikimedia Foundation http://www.mediawiki.org/wiki/User:Qgil
Re: [Wikimedia-l] Copyright infringement - The real elephant in the room
On Wed, Nov 13, 2013 at 4:53 AM, Lodewijk lodew...@effeietsanders.org wrote: Marco: I agree, we had also issues on the Dutch Wikipedia - these have been around for ages, the English Wikipedia is just less aware of them. Not sure if you meant this how it sounds, but the English Wikipedia community is acutely aware of copyright problems and have undertaken many, many large and complicated cleanup tasks of the sort Marco described.
Re: [Wikimedia-l] Copyright infringement - The real elephant in the room
On Wed, Nov 13, 2013 at 3:48 AM, Fæ fae...@gmail.com wrote: ... PS with regard to OTRS verification, we could do with better standards for verification, We are not attempting to perform a complete and unassailable verification; imagining that we can is folly. The point is, we need someone who credibly is the author or rightsholder, and with whom we have an audit trail of their claims and identity (email address we corresponded with, etc). When it comes down to it, we have no idea if an email is associated with the given person, that the alleged sender of a certified letter really is that person, or that the John Doe that came in to the office and showed valid government issued ID with a claim of copyright violation is the same John Doe who wrote the original material. There's no way for us to confirm in any reasonable manner. If there is an attempt at identity theft that is discovered, that audit trail is available to investigators with proper legal authorization etc. -- -george william herbert george.herb...@gmail.com
Re: [Wikimedia-l] next Wikidata office hour
On Sat, Nov 2, 2013 at 4:27 PM, Lydia Pintscher lydia.pintsc...@wikimedia.de wrote: Hi everyone, I'll be holding an office hour together with addshore on Wednesday, November 13 at 17:00 UTC. For your timezone see http://www.timeanddate.com/worldclock/fixedtime.html?hour=17&min=00&sec=0&day=13&month=11&year=2013 We'll be meeting in #wikimedia-office on freenode. I'll start with a short overview of the current state of Wikidata and then there will be time for all your Wikidata related questions. I hope to see many of you there. And the log can now be found at https://meta.wikimedia.org/wiki/IRC_office_hours/Office_hours_2013-11-13b Cheers Lydia -- Lydia Pintscher - http://about.me/lydia.pintscher Product Manager for Wikidata Wikimedia Deutschland e.V. Obentrautstr. 72 10963 Berlin www.wikimedia.de Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e. V. Eingetragen im Vereinsregister des Amtsgerichts Berlin-Charlottenburg unter der Nummer 23855 Nz. Als gemeinnützig anerkannt durch das Finanzamt für Körperschaften I Berlin, Steuernummer 27/681/51985.
Re: [Wikimedia-l] Copyright infringement - The real elephant in the room
On 11/13/2013 10:39 AM, Nathan wrote: On Wed, Nov 13, 2013 at 4:53 AM, Lodewijk lodew...@effeietsanders.org wrote: Marco: I agree, we had also issues on the Dutch Wikipedia - these have been around for ages, the English Wikipedia is just less aware of them. Not sure if you meant this how it sounds, but the English Wikipedia community is acutely aware of copyright problems and have undertaken many, many large and complicated cleanup tasks of the sort Marco described. I think he meant that the English Wikipedia community is less aware of the fact that we face these sorts of large-scale challenges in many other languages as well. In other words, the antecedent to them is issues on the Dutch/Italian/etc. Wikipedia, rather than copyright issues generally. Most people participating in other languages are reasonably aware when major concerns surface from the English Wikipedia; people participating only in English often haven't a clue about the concerns being dealt with in other languages. --Michael Snow
Re: [Wikimedia-l] Copyright infringement - The real elephant in the room
On Wed, Nov 13, 2013 at 1:48 PM, Michael Snow wikipe...@frontier.com wrote: On 11/13/2013 10:39 AM, Nathan wrote: On Wed, Nov 13, 2013 at 4:53 AM, Lodewijk lodew...@effeietsanders.org wrote: Marco: I agree, we had also issues on the Dutch Wikipedia - these have been around for ages, the English Wikipedia is just less aware of them. Not sure if you meant this how it sounds, but the English Wikipedia community is acutely aware of copyright problems and have undertaken many, many large and complicated cleanup tasks of the sort Marco described. I think he meant that the English Wikipedia community is less aware of the fact that we face these sorts of large-scale challenges in many other languages as well. In other words, the antecedent to them is issues on the Dutch/Italian/etc. Wikipedia, rather than copyright issues generally. Most people participating in other languages are reasonably aware when major concerns surface from the English Wikipedia; people participating only in English often haven't a clue about the concerns being dealt with in other languages. --Michael Snow That makes sense, thanks for clearing that up for me.
Re: [Wikimedia-l] Copyright infringement - The real elephant in the room
On 11/13/2013 08:40 AM, James Heilman wrote: Our biggest issue is copyright infringement. When it comes to copyright infringement, among all community sites on the Internet, Wikipedia is one of the best to handle it. Many websites don't even bother with copyright unless they get a DMCA Takedown notice. We on the other hand have voluntary contributors checking pages and raising flags whenever there is even a suspicion of a copyright violation. This seems to be highly effective in many cases. A few days ago, I wrote an email to a photographer, whose photos had been uploaded to Commons. He said I was the third to ask him whether he really had uploaded those images (which he had). Unquestionably, there are also many instances where the system fails and where lots of copyrighted material gets uploaded. Back in 2005, we had a case similar to the one you described in German Wikipedia, where various IPs copied content from old books. It is a big mess to clean up, but it can be done. And luckily the cases of massive copyvios are quite rare. I think the community has done a very good job in the past 12 years when it comes to copyright. It is important to see that we are a community site – nothing is ever going to be perfect, and certainly we are not free of any copyright violations. But we are dealing with them in a very responsible way and I would say that our current efforts are sufficient. Tobias
Re: [Wikimedia-l] Copyright infringement - The real elephant in the room
Unquestionably, there are also many instances where the system fails and where lots of copyrighted material gets uploaded. Back in 2005, we had a case similar to the one you described in German Wikipedia, where various IPs copied content from old books. It is a big mess to clean up, but it can be done. And luckily the cases of massive copyvios are quite rare. For further information see https://de.wikipedia.org/wiki/Wikipedia:Archiv/DDR-URV/Presseinfo (German). Cheers Martin