Re: [Wikimedia-l] Copyright infringement - The real elephant in the room

2013-11-22 Thread Steven Walling
On Thu, Nov 21, 2013 at 12:37 AM, WereSpielChequers werespielchequ...@gmail.com wrote: Typo correction and vandalism reversion are certainly both entries to editing, and it isn't just anti-vandalism where the opportunities have declined in recent years. Typos are getting harder to find,

Re: [Wikimedia-l] Copyright infringement - The real elephant in the room

2013-11-22 Thread The Cunctator
Also, vandalism has always been a red herring, kind of like the terrorism that justifies the TSA security theater and NSA surveillance or the Red Scare. It's a wrong-headed obsession that weakens community. On Nov 22, 2013 2:06 PM, Steven Walling steven.wall...@gmail.com wrote: On Thu, Nov 21,

Re: [Wikimedia-l] Copyright infringement - The real elephant in the room

2013-11-20 Thread Fæ
On 19 November 2013 20:44, Samuel Klein meta...@gmail.com wrote: Aside @Fae: the tineye crew are curious and quite pro-freeculture, I bet they would be glad to help design a bot that uses their API to check image copyvios. This is an area that spins off from my little experiments with better

Re: [Wikimedia-l] Copyright infringement - The real elephant in the room

2013-11-20 Thread The Cunctator
Yes, let's keep on pushing for policies that drive away editors! On Nov 20, 2013 2:10 AM, Fæ fae...@gmail.com wrote: On 19 November 2013 20:44, Samuel Klein meta...@gmail.com wrote: Aside @Fae: the tineye crew are curious and quite pro-freeculture, I bet they would be glad to help design a bot

Re: [Wikimedia-l] Copyright infringement - The real elephant in the room

2013-11-20 Thread Martijn Hoekstra
On Nov 20, 2013 1:13 PM, The Cunctator cuncta...@gmail.com wrote: Yes, let's keep on pushing for policies that drive away editors! I'm not sure exactly what kind of policy you are getting at here. Could you elaborate a little? On Nov 20, 2013 2:10 AM, Fæ fae...@gmail.com wrote: On 19

Re: [Wikimedia-l] Copyright infringement - The real elephant in the room

2013-11-20 Thread Marc A. Pelletier
On 11/20/2013 07:13 AM, The Cunctator wrote: Yes, let's keep on pushing for policies that drive away editors! Let's be clear here: contributions that are copyright violations are not desirable to begin with. If someone is driven away because they cannot cut and paste from random websites

Re: [Wikimedia-l] Copyright infringement - The real elephant in the room

2013-11-20 Thread Michael Snow
On 11/20/2013 8:31 AM, Marc A. Pelletier wrote: On 11/20/2013 07:13 AM, The Cunctator wrote: Yes, let's keep on pushing for policies that drive away editors! Let's be clear here: contributions that are copyright violations are not desirable to begin with. If someone is driven away because

Re: [Wikimedia-l] Copyright infringement - The real elephant in the room

2013-11-20 Thread The Cunctator
There's also been discussion of automatically deleting content from contributors contributor from their own writing. On Nov 20, 2013 8:31 AM, Marc A. Pelletier m...@uberbox.org wrote: On 11/20/2013 07:13 AM, The Cunctator wrote: Yes, let's keep on pushing for policies that drive away editors!

Re: [Wikimedia-l] Copyright infringement - The real elephant in the room

2013-11-20 Thread Marc A. Pelletier
On 11/20/2013 11:59 AM, Michael Snow wrote: An essential part of collaboration is, after all, reviewing each other's work. From the terseness of the comment, it might be alluding to either aspect or both. That's actually an interesting question that has been lurking beneath all the "editing is

Re: [Wikimedia-l] Copyright infringement - The real elephant in the room

2013-11-20 Thread Richard Symonds
Not quite: I would argue that anti-vandalism work is a gateway drug to the rest of the project. Just a hunch, though. On Nov 20, 2013 5:21 PM, Marc A. Pelletier m...@uberbox.org wrote: On 11/20/2013 11:59 AM, Michael Snow wrote: An essential part of collaboration is, after all, reviewing each

Re: [Wikimedia-l] Copyright infringement - The real elephant in the room

2013-11-20 Thread Michael Snow
On 11/20/2013 9:20 AM, Marc A. Pelletier wrote: That's actually an interesting question that has been lurking beneath all the "editing is going down" nervousness. How much of that 'editing' was, in fact, busy work made immaterial by technical advantage (bots, extensions, abusefilter)? The number

Re: [Wikimedia-l] Copyright infringement - The real elephant in the room

2013-11-20 Thread Marc A. Pelletier
On 11/20/2013 01:06 PM, Richard Symonds wrote: Not quite: I would argue that anti-vandalism work is a gateway drug to the rest of the project. Just a hunch, though. I'm pretty sure that typo correction fills pretty much the same niche, though. -- Marc

Re: [Wikimedia-l] Copyright infringement - The real elephant in the room

2013-11-20 Thread Marc A. Pelletier
On 11/20/2013 01:13 PM, Michael Snow wrote: My general point is that opportunities for automation are best considered with our overall mission in mind, not just the speed or efficiency of a particular workflow. In certain situations, automation that creates more work rather than removing it

Re: [Wikimedia-l] Copyright infringement - The real elephant in the room

2013-11-20 Thread Michael Snow
On 11/20/2013 10:52 AM, Marc A. Pelletier wrote: Perhaps another way of putting it is to ask whether the encyclopedia-building community is the means or the ends. To my eyes, having more contributors is not valuable unless it has better encyclopedia as a direct consequence. I believe the

Re: [Wikimedia-l] Copyright infringement - The real elephant in the room

2013-11-19 Thread Andrew Gray
It could use abuse-filter tags, just not in an entirely standard way:
* Bot scans edit X
* Script flags it as a problem
* Bot makes edit X+1 to page (perhaps adding copyvio template?) which triggers an abusefilter rule for (if this bot and does such-and-such an edit) and tags it.
The offending
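The three steps Andrew Gray outlines can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not anything taken from the thread: the bot name, the template text, the tag name, and the substring-based check are all hypothetical stand-ins, and the "abuse filter" here is a plain function rather than the real AbuseFilter extension.

```python
from dataclasses import dataclass, field

# All names below are hypothetical, chosen only for illustration.
BOT_NAME = "CopyvioBot"
COPYVIO_TEMPLATE = "{{copyvio}}"
FILTER_TAG = "possible-copyvio"


@dataclass
class Edit:
    rev_id: int
    user: str
    added_text: str
    tags: list = field(default_factory=list)


def looks_like_copyvio(edit, known_sources):
    """Stand-in for the checking script: flag text found verbatim in a source."""
    return any(edit.added_text and edit.added_text in src for src in known_sources)


def abuse_filter(edit):
    """Stand-in for a filter rule: 'if this bot makes such-and-such an edit, tag it'."""
    if edit.user == BOT_NAME and COPYVIO_TEMPLATE in edit.added_text:
        edit.tags.append(FILTER_TAG)
    return edit


def process(edit, known_sources):
    """Scan edit X; if flagged, have the bot make edit X+1, which the filter tags."""
    if not looks_like_copyvio(edit, known_sources):
        return None
    follow_up = Edit(rev_id=edit.rev_id + 1, user=BOT_NAME,
                     added_text=COPYVIO_TEMPLATE)
    return abuse_filter(follow_up)
```

The point of the design is that the tag ends up on the bot's follow-up edit rather than on the original edit, which is why the approach is described as not entirely standard.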

Re: [Wikimedia-l] Copyright infringement - The real elephant in the room

2013-11-19 Thread Samuel Klein
Aside @Fae: the tineye crew are curious and quite pro-freeculture, I bet they would be glad to help design a bot that uses their API to check image copyvios. On Nov 13, 2013 6:48 AM, Fæ fae...@gmail.com wrote: On 13 November 2013 07:40, James Heilman jmh...@gmail.com wrote: ... Our biggest issue

Re: [Wikimedia-l] Copyright infringement - The real elephant in the room

2013-11-19 Thread Federico Leva (Nemo)
Samuel Klein, 19/11/2013 21:44: Aside @Fae: the tineye crew are curious and quite pro-freeculture, I bet they would be glad to help design a bot that uses their API to check image copyvios. How to make them include the whole Commons dataset into their own, to start with? Nemo

Re: [Wikimedia-l] Copyright infringement - The real elephant in the room

2013-11-19 Thread Matthew Flaschen
On 11/13/2013 04:57 AM, Federico Leva (Nemo) wrote: Marco Chiesa, 13/11/2013 10:21: There are bots that go and look whether a newly inserted block of text is already present somewhere else, [...] Rectius: there *used* to be a bot (RevertBot, Lusumbot). The program

Re: [Wikimedia-l] Copyright infringement - The real elephant in the room

2013-11-19 Thread Federico Leva (Nemo)
Matthew Flaschen, 20/11/2013 06:05: On 11/13/2013 04:57 AM, Federico Leva (Nemo) wrote: Marco Chiesa, 13/11/2013 10:21: There are bots that go and look whether a newly inserted block of text is already present somewhere else, [...] Rectius: there *used* to be a bot (RevertBot, Lusumbot). The

Re: [Wikimedia-l] Copyright infringement - The real elephant in the room

2013-11-18 Thread Matthew Flaschen
On 11/16/2013 09:04 AM, Anthony Cole wrote: The problem of false positives from mirrors doesn't exist if we scan edits as they are made. Agreed. However, that example is a legal, attributed (at least on the talk page) copy from a third-party freely licensed text, not a false positive copy

Re: [Wikimedia-l] Copyright infringement - The real elephant in the room

2013-11-16 Thread Anthony Cole
The problem of false positives from mirrors doesn't exist if we scan edits as they are made. Maggie says here https://en.wikipedia.org/wiki/Wikipedia:Administrators%27_noticeboard#Emergency_block_of_an_editor_with_which_I_have_been_previously_involved that copyright bots populate WP:SCV

Re: [Wikimedia-l] Copyright infringement - The real elephant in the room

2013-11-16 Thread Marc A. Pelletier
On 11/13/2013 04:57 AM, Federico Leva (Nemo) wrote: Rectius: there *used* to be a bot (RevertBot, Lusumbot). The program https://www.mediawiki.org/wiki/Manual:Pywikibot/copyright.py has been stopped when search engines changed their limits and Lusum has been waiting for the WMF's Yahoo! BOSS

Re: [Wikimedia-l] Copyright infringement - The real elephant in the room

2013-11-16 Thread Marc A. Pelletier
On 11/13/2013 04:41 PM, Tobias wrote: I think the community has done a very good job in the past 12 years when it comes to copyright. It is important to see that we are a community site – nothing is ever going to be perfect, and certainly we are not free of any copyright violations. But we are

Re: [Wikimedia-l] Copyright infringement - The real elephant in the room

2013-11-16 Thread Federico Leva (Nemo)
Marc A. Pelletier, 16/11/2013 16:34: On 11/13/2013 04:57 AM, Federico Leva (Nemo) wrote: Rectius: there *used* to be a bot (RevertBot, Lusumbot). The program https://www.mediawiki.org/wiki/Manual:Pywikibot/copyright.py has been stopped when search engines changed their limits and Lusum has been

Re: [Wikimedia-l] Copyright infringement - The real elephant in the room

2013-11-15 Thread Florence Devouard
Hi Rupert, The case you mention is unrelated to any copyright infringement (the book is explicitly published under cc by sa, so there is no copyvio). Its mention here is like hair falling in soup. Now, I think there is a developing personal feud between you and Iolenda. It sincerely

Re: [Wikimedia-l] Copyright infringement - The real elephant in the room

2013-11-15 Thread rupert THURNER
Salut florence, I obviously need to improve my English :) Marco suggested human checking to avoid false positives and some annotation that it happened. In my eyes the cited case is a verbatim copy of some compatible license text which could be used as an example to demonstrate what he meant. I did

Re: [Wikimedia-l] Copyright infringement - The real elephant in the room

2013-11-14 Thread rupert THURNER
There is such a case in http://en.m.wikipedia.org/wiki/Education_in_Cameroon, reference is on the talk page. Would you be so kind as to mark or refer to it correctly? rupert On 13.11.2013 12:46, Marco Chiesa chiesa.ma...@gmail.com wrote: On Wed, Nov 13, 2013 at 12:39 PM, Chris McKenna

Re: [Wikimedia-l] Copyright infringement - The real elephant in the room

2013-11-14 Thread Andrew Lih
FYI, on the last Wikipedia Weekly podcast, we talked with Sage Ross about the plagiarism issue, and he walked through the study with some very interesting insights. Video here, and the discussion started at 11 minutes, 30 seconds into the podcast. https://www.youtube.com/watch?v=IOgYytn2JRk

Re: [Wikimedia-l] Copyright infringement - The real elephant in the room

2013-11-14 Thread Laura Hale
On Thu, Nov 14, 2013 at 4:47 PM, Andrew Lih andrew@gmail.com wrote: FYI, on the last Wikipedia Weekly podcast, we talked with Sage Ross about the plagiarism issue, and he walked through the study with some very interesting insights. Video here, and the discussion started at 11 minutes, 30

Re: [Wikimedia-l] Copyright infringement - The real elephant in the room

2013-11-13 Thread Gerard Meijssen
Hoi, Seriously we should never ever be ruled by panic. What you see is bad, no doubt but the notion that we should dump everything because of the latest issue to come along is way overboard. - by stopping the flow on projects like Visual Editor you break dependencies for the work of many

Re: [Wikimedia-l] Copyright infringement - The real elephant in the room

2013-11-13 Thread Matthew Flaschen
On 11/13/2013 02:40 AM, James Heilman wrote: The Wikimedia Foundation needs to wake up and deal with the real tech elephant in the room. Our primary issue is not a lack of FLOW, a lack of a visual editor, or a lack of a rapidly expanding education program. Our biggest issue is copyright

Re: [Wikimedia-l] Copyright infringement - The real elephant in the room

2013-11-13 Thread Steven Walling
On Tue, Nov 12, 2013 at 11:40 PM, James Heilman jmh...@gmail.com wrote: The Wikimedia Foundation needs to wake up and deal with the real tech elephant in the room. Our primary issue is not a lack of FLOW, a lack of a visual editor, or a lack of a rapidly expanding education program. Our

Re: [Wikimedia-l] Copyright infringement - The real elephant in the room

2013-11-13 Thread Marco Chiesa
On Wed, Nov 13, 2013 at 8:40 AM, James Heilman jmh...@gmail.com wrote: Our biggest issue is copyright infringement. We have had the Indian program, we have had issues with the Education program, and I have today come across a user who has made nearly 20,000 edits to 1,742 articles since 2006

Re: [Wikimedia-l] Copyright infringement - The real elephant in the room

2013-11-13 Thread Lodewijk
Marco: I agree, we had also issues on the Dutch Wikipedia - these have been around for ages, the English Wikipedia is just less aware of them. Often, copypasting in the same language is caught easily - between different languages is much harder and persistent. There are many people, including

Re: [Wikimedia-l] Copyright infringement - The real elephant in the room

2013-11-13 Thread Federico Leva (Nemo)
Marco Chiesa, 13/11/2013 10:21: There are bots that go and look whether a newly inserted block of text is already present somewhere else, [...] Rectius: there *used* to be a bot (RevertBot, Lusumbot). The program https://www.mediawiki.org/wiki/Manual:Pywikibot/copyright.py has been stopped

Re: [Wikimedia-l] Copyright infringement - The real elephant in the room

2013-11-13 Thread Philippe Beaudette
On Wed, Nov 13, 2013 at 2:37 AM, Matthew Flaschen matthew.flasc...@gatech.edu wrote: A significant problem with TurnItIn is that it is proprietary, and can not be customized by anyone in the movement. The fact that it is proprietary also means it can never be part of the main infrastructure,

Re: [Wikimedia-l] Copyright infringement - The real elephant in the room

2013-11-13 Thread Matthew Flaschen
On 11/13/2013 05:16 AM, Philippe Beaudette wrote: On Wed, Nov 13, 2013 at 2:37 AM, Matthew Flaschen matthew.flasc...@gatech.edu wrote: A significant problem with TurnItIn is that it is proprietary, and can not be customized by anyone in the movement. The fact that it is proprietary also means

Re: [Wikimedia-l] Copyright infringement - The real elephant in the room

2013-11-13 Thread Gerard Meijssen
Hoi I know several authors who publish and use their original text to publish on Wikipedia as well. This is another source of false positives because they have the copyright to the original source... To recognise this you have to be even more sophisticated. The point I want to make is that

Re: [Wikimedia-l] Copyright infringement - The real elephant in the room

2013-11-13 Thread Marco Chiesa
On Wed, Nov 13, 2013 at 11:44 AM, Gerard Meijssen gerard.meijs...@gmail.com wrote: Hoi I know several authors who publish and use their original text to publish on Wikipedia as well. This is another source of false positives because they have the copyright to the original source... To

Re: [Wikimedia-l] Copyright infringement - The real elephant in the room

2013-11-13 Thread Chris McKenna
On Wed, 13 Nov 2013, Marco Chiesa wrote: On Wed, Nov 13, 2013 at 11:44 AM, Gerard Meijssen gerard.meijs...@gmail.com wrote: Hoi I know several authors who publish and use their original text to publish on Wikipedia as well. This is another source of false positives because they have the

Re: [Wikimedia-l] Copyright infringement - The real elephant in the room

2013-11-13 Thread Chris McKenna
On Wed, 13 Nov 2013, Gerard Meijssen wrote: The point I want to make is that having a tool that is KNOWN to be deficient in specific ways can still be a huge advantage over not having a tool at all. So PLEASE lets not make perfection the enemy of the good. The problem isn't that we're waiting

Re: [Wikimedia-l] Copyright infringement - The real elephant in the room

2013-11-13 Thread Marco Chiesa
On Wed, Nov 13, 2013 at 12:36 PM, Chris McKenna cmcke...@sucs.org wrote: But an automated tool can not know whether OTRS verification has happened or not. We put something like {{OTRS verified}} in the article's talk page, something saying: Part of the text comes from website X, ticket
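One way an automated tool could take such a talk-page marker into account is sketched below. This is a minimal illustration only: the template name {{OTRS verified}} comes from the message above, but the regular expression, the function names, and the idea of suppressing a report when the template is present are assumptions, and the template's actual parameters are not known from this thread.

```python
import re

# Matches {{OTRS verified}} with optional parameters, e.g. {{OTRS verified|...}}.
# The accepted spellings (space or underscore, any case) are an assumption.
OTRS_TEMPLATE = re.compile(
    r"\{\{\s*OTRS[ _]verified\s*(\|[^{}]*)?\}\}", re.IGNORECASE
)


def has_otrs_verification(talk_wikitext: str) -> bool:
    """Return True if the talk page wikitext carries the verification template."""
    return OTRS_TEMPLATE.search(talk_wikitext) is not None


def should_report(match_found: bool, talk_wikitext: str) -> bool:
    """Report a text match only when no verification is recorded on the talk page."""
    return match_found and not has_otrs_verification(talk_wikitext)
```

A check like this only reduces one class of false positives; it says nothing about whether the verification itself covers the specific text that matched.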

Re: [Wikimedia-l] Copyright infringement - The real elephant in the room

2013-11-13 Thread Marco Chiesa
On Wed, Nov 13, 2013 at 12:39 PM, Chris McKenna cmcke...@sucs.org wrote: The problem isn't that we're waiting for perfection. We're waiting for the proportion of false positives and false negatives to fall to a level where they don't overwhelm the true positives. To avoid false positives from

Re: [Wikimedia-l] Copyright infringement - The real elephant in the room

2013-11-13 Thread Fæ
On 13 November 2013 07:40, James Heilman jmh...@gmail.com wrote: ... Our biggest issue is copyright infringement. ... Thanks for raising this James. Yes, this is an issue but if you are gunning for elephants this month, I really don't think the copyright elephant is the biggest one in the herd.

Re: [Wikimedia-l] Copyright infringement - The real elephant in the room

2013-11-13 Thread Quim Gil
On 11/13/2013 12:37 AM, Matthew Flaschen wrote: However, there may be room for enhancing MadmanBot (e.g. as a GSOC or OPW project). Any technical project able to identify small tasks and mentors available are welcome to join Wikimedia's Google Code-in team at

Re: [Wikimedia-l] Copyright infringement - The real elephant in the room

2013-11-13 Thread Nathan
On Wed, Nov 13, 2013 at 4:53 AM, Lodewijk lodew...@effeietsanders.org wrote: Marco: I agree, we had also issues on the Dutch Wikipedia - these have been around for ages, the English Wikipedia is just less aware of them. Not sure if you meant this how it sounds, but the English Wikipedia

Re: [Wikimedia-l] Copyright infringement - The real elephant in the room

2013-11-13 Thread George Herbert
On Wed, Nov 13, 2013 at 3:48 AM, Fæ fae...@gmail.com wrote: ... PS with regard to OTRS verification, we could do with better standards for verification, We are not attempting to perform a complete and unassailable verification; imagining that we can is folly. The point is, we need someone

Re: [Wikimedia-l] Copyright infringement - The real elephant in the room

2013-11-13 Thread Michael Snow
On 11/13/2013 10:39 AM, Nathan wrote: On Wed, Nov 13, 2013 at 4:53 AM, Lodewijk lodew...@effeietsanders.org wrote: Marco: I agree, we had also issues on the Dutch Wikipedia - these have been around for ages, the English Wikipedia is just less aware of them. Not sure if you meant this how it

Re: [Wikimedia-l] Copyright infringement - The real elephant in the room

2013-11-13 Thread Nathan
On Wed, Nov 13, 2013 at 1:48 PM, Michael Snow wikipe...@frontier.com wrote: On 11/13/2013 10:39 AM, Nathan wrote: On Wed, Nov 13, 2013 at 4:53 AM, Lodewijk lodew...@effeietsanders.org wrote: Marco: I agree, we had also issues on the Dutch Wikipedia - these have been around for ages, the

Re: [Wikimedia-l] Copyright infringement - The real elephant in the room

2013-11-13 Thread Tobias
On 11/13/2013 08:40 AM, James Heilman wrote: Our biggest issue is copyright infringement. When it comes to copyright infringement, among all community sites on the Internet, Wikipedia is one of the best to handle it. Many websites don't even bother with copyright unless they get a DMCA

Re: [Wikimedia-l] Copyright infringement - The real elephant in the room

2013-11-13 Thread Martin Rulsch
Unquestionably, there are also many instances where the system fails and where lots of copyrighted material gets uploaded. Back in 2005, we had a case similar to the one you described in German Wikipedia, where various IPs copied content from old books. It is a big mess to clean up, but it