date:20131113

[Wikimedia-l] [Reminder] Language Engineering IRC Office Hour on November 13, 2013 at 1500 UTC

2013-11-13 Thread Runa Bhattacharjee

Hello,

A quick reminder that the Wikimedia Language Engineering team will be
hosting an IRC office hour from 1500 to 1600 UTC later today on
#wikimedia-office (FreeNode). Please see below for the event details.

Thanks
Runa

-- Forwarded message --
From: Runa Bhattacharjee rbhattachar...@wikimedia.org
Date: Thu, Nov 7, 2013 at 11:40 AM
Subject: Language Engineering IRC Office Hour on November 13, 2013 at 1500
UTC
To: MediaWiki internationalisation mediawiki-i...@lists.wikimedia.org,
Wikimedia Mailing List wikimedia-l@lists.wikimedia.org, Wikimedia
developers wikitec...@lists.wikimedia.org,
wikitech-ambassad...@lists.wikimedia.org


[x-posted]

Hello,

The Wikimedia Language Engineering team will be hosting an IRC office
hour on Wednesday, November 13, 2013 between 15:00 - 16:00 UTC on
#wikimedia-office. (See below for timezone conversion and other details.)
We will be talking about some of our recent and upcoming projects and then
taking questions for the remaining time.

We also look forward to hearing about anything that needs our attention.
Questions and other concerns can also be sent to me directly before the
event. See you there!

Thanks
Runa

=== Event Details ===

What: WMF Language Engineering Office hour
When: November 13, 2013 (Wednesday). 1500-1600 UTC
http://www.timeanddate.com/worldclock/fixedtime.html?iso=20131113T1500
Where: IRC Channel #wikimedia-office on FreeNode





-- 
Language Engineering - Outreach and QA Coordinator
Wikimedia Foundation
___
Wikimedia-l mailing list
Wikimedia-l@lists.wikimedia.org
Unsubscribe: https://lists.wikimedia.org/mailman/listinfo/wikimedia-l, 
mailto:wikimedia-l-requ...@lists.wikimedia.org?subject=unsubscribe

Re: [Wikimedia-l] Copyright infringement - The real elephant in the room

2013-11-13 Thread Gerard Meijssen

Hoi,
Seriously, we should never ever be ruled by panic. What you see is bad, no
doubt, but the notion that we should dump everything because of the latest
issue to come along is way overboard.

   - by stopping the flow on projects like Visual Editor you break
   dependencies for the work of many developers
   - what you have noticed is for only one Wikipedia, not all of them
   - we do need more mature discussion software; what we have is horrible
   - such dramatics only make you go away and upset others; they do not
   solve things
   - the dramatics distract me from your message
   - my hobby horse needs more attention too and I think my argument is
   better ...

Anyway, it would be nice if someone looked at the tool with an eye to
making it happen and making it scale. If it doesn't scale, it becomes a less
attractive option to pursue.
Thanks,
  GerardM


On 13 November 2013 08:40, James Heilman jmh...@gmail.com wrote:

 The Wikimedia Foundation needs to wake up and deal with the real tech
 elephant in the room. Our primary issue is not a lack of FLOW, a lack of a
 visual editor, or a lack of a rapidly expanding education program.

 Our biggest issue is copyright infringement. We have had the Indian
 program, we have had issues with the Education program, and I have today
 come across a user who has made nearly 20,000 edits to 1,742 articles since
 2006 which appear to be nearly all copied and pasted from the sources he has
 used.
 https://en.wikipedia.org/wiki/User_talk:DrMicro#Copyright_infringement
 This has seriously shaken my faith in Wikipedia.

 This is especially devastating as there is a tech solution that would have
 prevented it. The effort is being worked on by volunteers here
 https://en.wikipedia.org/wiki/Wikipedia:Turnitin and has been since at
 least March of 2012. We NEED all tech resources at the foundation thrown at
 this project. Other less important projects like FLOW and the visual editor
 need to be put on hold to develop this tool.

 --
 James Heilman
 MD, CCFP-EM, Wikipedian

 The Wikipedia Open Textbook of Medicine
 www.opentextbookofmedicine.com

Re: [Wikimedia-l] Copyright infringement - The real elephant in the room

2013-11-13 Thread Matthew Flaschen


On 11/13/2013 02:40 AM, James Heilman wrote:

The Wikimedia Foundation needs to wake up and deal with the real tech
elephant in the room. Our primary issue is not a lack of FLOW, a lack of a
visual editor, or a lack of a rapidly expanding education program.

Our biggest issue is copyright infringement.


I don't really agree with that.  It is a serious issue, but I would put 
NPOV (in the face of active threats such as companies paying for 
publicity on Wikipedia) and growing the editor community higher.


We also have solutions to address it (not perfectly, true), both 
preventing the problem and dealing with it after the fact:


* MadmanBot (https://en.wikipedia.org/wiki/User:MadmanBot) (mentioned at 
Wikipedia:TurnItIn, and a major technical tool against copyright 
infringement).

* Clear policies against copyright infringement
* Dealing with copyright violations 
(https://en.wikipedia.org/wiki/Wikipedia:Text_Copyright_Violations_101)
* Finally, the DMCA ensures the foundation is not liable as long as they 
promptly respond to notifications (which of course we want them to anyway).



We have had the Indian program, we have had issues with the Education
program, and I have today come across a user who has made nearly 20,000
edits to 1,742 articles since 2006 which appear to be nearly all copied and
pasted from the sources he has used.
https://en.wikipedia.org/wiki/User_talk:DrMicro#Copyright_infringement
This has seriously shaken my faith in Wikipedia.


That is indeed disturbing, and I'm glad you found it.


This is especially devastating as there is a tech solution that would have
prevented it. The effort is being worked on by volunteers here
https://en.wikipedia.org/wiki/Wikipedia:Turnitin and has been since at
least March of 2012. We NEED all tech resources at the foundation thrown at
this project. Other less important projects like FLOW and the visual editor
need to be put on hold to develop this tool.


I don't agree that all tech resources should be used for this.  However, 
there may be room for enhancing MadmanBot (e.g. as a GSOC or OPW project).


A significant problem with TurnItIn is that it is proprietary, and cannot 
be customized by anyone in the movement.  The fact that it is 
proprietary also means it can never be part of the main infrastructure, 
nor run on Wikimedia Labs.


Matt Flaschen


[Wikimedia-l] Recovering wikipedia.it: top 1 trademark priority per it.wiki poll

2013-11-13 Thread Federico Leva (Nemo)


Hello all (cc Yana, Michelle, Geoff, legal, board).
In a formal poll[1] proposed by two admins, the it.wiki community has 
decided the following:


«The Italian Wikipedia community considers that, among all the actions 
in defense of its name (as in public image and trademark) pertaining to 
the Wikimedia Foundation, the maximum priority should be given to 
recovering the domain wikipedia.it (and if possible its sisters) and 
therefore asks WMF to follow one or more of the legal paths suggested by 
the experts to that purpose, using the funds assigned by the WMF board 
for 2013-14.»

https://it.wikipedia.org/wiki/Wikipedia:Sondaggi/Recupero_domini_a_nome_Wikipedia

The decision was taken 132:1:4, which seems to be the largest 
absolute margin ever reached by a poll on it.wiki; the funds in question 
are the $700K to upgrade the trademark portfolio.[2]


Quick background:
* the domain wikipedia.it was registered by a commercial hosting 
provider in 2003; WMIT members and others have been trying to contact him 
since 2004 but he never replied;
* since 2006, the domain has displayed an ad banner hosted by Yepa on top 
(in 2006-9 it also trapped the user in it via a frame), of which the 
WMF has been aware since 2006-12-14 (and has been reminded several times): 
this makes many users[3] who believe it is our official domain think that 
Wikipedia is a for-profit effort.
WMIT, to address the community's concerns, sent an official complaint 
to NIC.it in 2009, but the registration is formally correct (via their ad 
hoc Wikipedia Italy Association), so only the trademark owner (WMF) can 
proceed with one of the 3 remaining legal tools (including the challenge 
procedure and arbitration), as summarised in a document kindly provided to 
the WMF by .mau., one of the authors of the NIC.it rules.
The last known concrete action taken by the WMF was the extension 
of the trademark to Italy in 2007, by Florence (thanks Florence!).


Nemo

[1] The last resort decision-making official tool which on it.wiki 
overrides any decision past and future until revoked by another poll 
with same requirements.
[2] 
https://meta.wikimedia.org/wiki/Wikimedia_Foundation_Plan_2013-14#2013-14_Plan_Finances_and_Staffing
[3] About 43 visits per minute that we were able to count via a 
JavaScript trick by Pietrodn in April 2009.



Re: [Wikimedia-l] Copyright infringement - The real elephant in the room

2013-11-13 Thread Steven Walling

On Tue, Nov 12, 2013 at 11:40 PM, James Heilman jmh...@gmail.com wrote:

 The Wikimedia Foundation needs to wake up and deal with the real tech
 elephant in the room. Our primary issue is not a lack of FLOW, a lack of a
 visual editor, or a lack of a rapidly expanding education program.

 Our biggest issue is copyright infringement. We have had the Indian
 program, we have had issues with the Education program, and I have today
 come across a user who has made nearly 20,000 edits to 1,742 articles since
 2006 which appear to be nearly all copied and pasted from the sources he has
 used.
 https://en.wikipedia.org/wiki/User_talk:DrMicro#Copyright_infringement
 This has seriously shaken my faith in Wikipedia.

 This is especially devastating as there is a tech solution that would have
 prevented it. The effort is being worked on by volunteers here
 https://en.wikipedia.org/wiki/Wikipedia:Turnitin and has been since at
 least March of 2012. We NEED all tech resources at the foundation thrown at
 this project. Other less important projects like FLOW and the visual editor
 need to be put on hold to develop this tool.


Relevant info on the subject of copyvio is the recent plagiarism study by
the Education Program team. They looked at different types of users (students,
newbies, experienced editors, admins) and compared them. Results were
published on Meta at
https://meta.wikimedia.org/wiki/Research:Plagiarism_on_the_English_Wikipedia
and also discussed in the last WMF Metrics & Activities meeting:
https://meta.wikimedia.org/wiki/Metrics_and_activities_meetings/2013-11-07

AFAIK this is the best data we have about how often different kinds of
editors close paraphrase or outright copy/paste.

Re: [Wikimedia-l] Copyright infringement - The real elephant in the room

2013-11-13 Thread Marco Chiesa

On Wed, Nov 13, 2013 at 8:40 AM, James Heilman jmh...@gmail.com wrote:


 Our biggest issue is copyright infringement. We have had the Indian
 program, we have had issues with the Education program, and I have today
 come across a user who has made nearly 20,000 edits to 1,742 articles since
 2006 which appear to be nearly all copied and pasted from the sources he has
 used.
 https://en.wikipedia.org/wiki/User_talk:DrMicro#Copyright_infringement
 This has seriously shaken my faith in Wikipedia.


Back in 2007 we discovered a user on it.wp, a former sysop with more than
40,000 edits, who used to copy-paste from his sources, which were often
outdated. He was banned, and the community made a great effort to clean up
the articles he contributed to (and damn it was hard, because those articles
had a long history after his edits). In the following years we had other
similar cases; you can find a selection here:
https://it.wikipedia.org/wiki/Progetto:Cococo/Controlli_conclusi
There are bots that check whether a newly inserted block of text is
already present somewhere else. They don't find everything (of course they
won't find things copied from a printed book), but sooner or later serial
copyright violators get caught, and the fall from hero to zero is sooo quick.
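
To illustrate the kind of check such a bot performs, here is a minimal
sketch, not the code of any actual bot. It assumes the bot already has the
newly inserted text and the text of a candidate source page; the function
names and the 5-word window are hypothetical choices for this example.

```python
import re


def shingles(text, n=5):
    """Return the set of n-word shingles (overlapping word windows) in text."""
    words = re.findall(r"\w+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def overlap_ratio(inserted, candidate, n=5):
    """Fraction of the inserted text's shingles that also occur in candidate.

    A value near 1.0 suggests the inserted block is already present
    in the candidate page; a value near 0.0 suggests it is not.
    """
    new = shingles(inserted, n)
    if not new:
        # Text shorter than n words: nothing meaningful to compare.
        return 0.0
    return len(new & shingles(candidate, n)) / len(new)
```

A high ratio only says that the two texts share long word sequences; it does
not say who copied from whom, so a match still needs a human look.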

At the end of the day, I think copyvios have always been taken seriously,
so I don't remember big problems with them, while there have always
been more problems with libel, privacy, and editor retention.


Marco (Cruccone)

Re: [Wikimedia-l] Copyright infringement - The real elephant in the room

2013-11-13 Thread Lodewijk

Marco: I agree, we have also had issues on the Dutch Wikipedia - these have
been around for ages, the English Wikipedia is just less aware of them. Often,
copy-pasting within the same language is caught easily - copying between
different languages is much harder to catch and more persistent. There are
many people, including experienced editors, who think translating from random
sources is OK. It is not a new problem, and chapters have indeed been working
on getting this understanding of what free licenses really mean more widely
accepted by the general audience. Not something that is easily measured of
course. Technical solutions sound great, but they only catch a small amount,
and only within the same language.

Steven: I understand this research was limited to the English Wikipedia
(where most of the plagiarism will be in the same language). It would not
strike me as unrealistic to assume this might be very
different for languages other than English. It also says little about the
problem in general of course.

For those who don't want to click on links to get information, it basically
says (simplification alert) that they don't have any indication that the US
& Canada education program makes the plagiarism problem on the English
Wikipedia any worse than it already is.

Anyway: I think this problem is more prominent in non-English
communities, and that technical solutions are not going to be the answer
there. An educational answer is more likely to be successful, focusing on
explaining to people how Wikipedia works and doesn't work, and what the do's
and don'ts are. This doesn't have to be an education program like the one run
in the US; basically all outreach programs run by chapters, user
groups, thematic organizations or groups of volunteers can contribute to
this. This is already happening in most countries.

In some countries (like Germany ;-) ) politicians are doing the work for
us, explaining how evil plagiarism is and how it works by firing government
ministers over it :)

Best,
Lodewijk





Re: [Wikimedia-l] Copyright infringement - The real elephant in the room

2013-11-13 Thread Federico Leva (Nemo)


Marco Chiesa, 13/11/2013 10:21:

There are bots that go and look whether a newly inserted block of text is
already present somewhere else, [...]


Rectius: there *used* to be a bot (RevertBot, Lusumbot). The program 
https://www.mediawiki.org/wiki/Manual:Pywikibot/copyright.py was 
stopped when search engines changed their limits, and Lusum has been 
waiting for a while for the WMF's Yahoo! BOSS key, needed to run the bot.


Nemo


Re: [Wikimedia-l] Copyright infringement - The real elephant in the room

2013-11-13 Thread Philippe Beaudette

On Wed, Nov 13, 2013 at 2:37 AM, Matthew Flaschen 
matthew.flasc...@gatech.edu wrote:

 A significant problem with TurnItIn is that it is proprietary, and cannot be
 customized by anyone in the movement.  The fact that it is proprietary also
 means it can never be part of the main infrastructure, nor run on Wikimedia
 Labs.


Another significant issue is the False Positive factor that is created by
our overwhelming popularity.  Frankly, we're mirrored all over the place.
And tools like Turnitin find the mirrors too.  It's not an easy problem to
solve.  I was on the team that looked at this a couple of years back - it's
just not simple, and there are complex challenges.


*Philippe Beaudette * \\  Director, Community Advocacy \\ Wikimedia
Foundation, Inc.
 T: 1-415-839-6885 x6643 | phili...@wikimedia.org |
@Philippewiki https://twitter.com/Philippewiki

Re: [Wikimedia-l] Copyright infringement - The real elephant in the room

2013-11-13 Thread Matthew Flaschen


On 11/13/2013 05:16 AM, Philippe Beaudette wrote:

On Wed, Nov 13, 2013 at 2:37 AM, Matthew Flaschen 
matthew.flasc...@gatech.edu wrote:


A significant problem with TurnItIn is that it is proprietary, and cannot be
customized by anyone in the movement.  The fact that it is proprietary also
means it can never be part of the main infrastructure, nor run on Wikimedia
Labs.



Another significant issue is the False Positive factor that is created by
our overwhelming popularity.  Frankly, we're mirrored all over the place.
And tools like Turnitin find the mirrors too.  It's not an easy problem to
solve.  I was on the team that looked at this a couple of years back - it's
just not simple, and there are complex challenges.


Yes, an intelligent solution would take into account when the mirror was 
first indexed (or ideally first published), and when the Wikipedia 
article was edited, to reduce false positives requiring manual intervention.
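
A minimal sketch of that date comparison, offered as an illustration only
and not as how MadmanBot or Turnitin works. It assumes the tool can obtain
the timestamp of the Wikipedia edit and a first-indexed (or first-published)
date for the matching page; the function and parameter names are
hypothetical.

```python
from datetime import datetime


def likely_mirror(edit_time, source_first_seen):
    """Return True if the matching page probably copied from Wikipedia.

    edit_time: when the text was added to the Wikipedia article.
    source_first_seen: when the external page was first indexed or
        published, or None if that date is unknown.
    """
    if source_first_seen is None:
        # Unknown date: do not guess, leave the match for a human to judge.
        return False
    # The external page appeared only after the Wikipedia edit,
    # so it is more likely a mirror than the original source.
    return source_first_seen > edit_time
```

For example, `likely_mirror(datetime(2010, 1, 1), datetime(2012, 6, 1))`
returns True, so that match could be given low priority instead of being
queued as a suspected copyvio.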


Matt Flaschen



Re: [Wikimedia-l] Copyright infringement - The real elephant in the room

2013-11-13 Thread Gerard Meijssen

Hoi
I know several authors who publish their original text elsewhere and also
use it on Wikipedia. This is another source of false positives, because
they hold the copyright to the original source. To recognise this you
have to be even more sophisticated.

The point I want to make is that having a tool that is KNOWN to be
deficient in specific ways can still be a huge advantage over not having a
tool at all. So PLEASE let's not make perfection the enemy of the good.
Thanks,
   GerardM



Re: [Wikimedia-l] Copyright infringement - The real elephant in the room

2013-11-13 Thread Marco Chiesa

On Wed, Nov 13, 2013 at 11:44 AM, Gerard Meijssen gerard.meijs...@gmail.com
 wrote:

 Hoi
 I know several authors who publish their original text elsewhere and also
 use it on Wikipedia. This is another source of false positives, because
 they hold the copyright to the original source. To recognise this you
 have to be even more sophisticated.


Actually, we consider these as copyvios: we delete the text straight away,
and we tell the editor "if you're the author, write to OTRS". Of course, if
the text is already available somewhere else under a compatible free license,
we don't need this. Until you can be sure that User:MrX is actually the
physical person MrX, we need to protect the author's rights.

Marco

Re: [Wikimedia-l] Copyright infringement - The real elephant in the room

2013-11-13 Thread Chris McKenna


On Wed, 13 Nov 2013, Marco Chiesa wrote:


On Wed, Nov 13, 2013 at 11:44 AM, Gerard Meijssen gerard.meijs...@gmail.com wrote:

Hoi
I know several authors who publish their original text elsewhere and also
use it on Wikipedia. This is another source of false positives, because
they hold the copyright to the original source. To recognise this you
have to be even more sophisticated.


Actually, we consider these as copyvios: we delete the text straight away,
and we tell the editor "if you're the author, write to OTRS". Of course, if
the text is already available somewhere else under a compatible free license,
we don't need this. Until you can be sure that User:MrX is actually the
physical person MrX, we need to protect the author's rights.



But an automated tool cannot know whether OTRS verification has happened 
or not.



Chris McKenna

cmcke...@sucs.org
www.sucs.org/~cmckenna


The essential things in life are seen not with the eyes,
but with the heart

Antoine de Saint Exupery



Re: [Wikimedia-l] Copyright infringement - The real elephant in the room

2013-11-13 Thread Chris McKenna


On Wed, 13 Nov 2013, Gerard Meijssen wrote:


The point I want to make is that having a tool that is KNOWN to be
deficient in specific ways can still be a huge advantage over not having a
tool at all. So PLEASE let's not make perfection the enemy of the good.


The problem isn't that we're waiting for perfection. We're waiting for the 
proportion of false positives and false negatives to fall to a level where 
they don't overwhelm the true positives.
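
These proportions are usually expressed as precision and recall. The sketch
below shows the arithmetic only; the counts used in the usage note are
invented for illustration and are not measurements of any tool.

```python
def precision(true_positives, false_positives):
    """Share of flagged edits that really are copyright violations."""
    flagged = true_positives + false_positives
    return true_positives / flagged if flagged else 0.0


def recall(true_positives, false_negatives):
    """Share of real copyright violations that the tool actually flagged."""
    actual = true_positives + false_negatives
    return true_positives / actual if actual else 0.0
```

With a hypothetical 20 real violations among 200 flagged edits,
`precision(20, 180)` is 0.1: reviewers would dismiss nine flags for every
one that matters, which is the "overwhelmed" situation described above.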



Chris McKenna

cmcke...@sucs.org
www.sucs.org/~cmckenna


The essential things in life are seen not with the eyes,
but with the heart

Antoine de Saint Exupery



Re: [Wikimedia-l] Copyright infringement - The real elephant in the room

2013-11-13 Thread Marco Chiesa

On Wed, Nov 13, 2013 at 12:36 PM, Chris McKenna cmcke...@sucs.org wrote:


 But an automated tool can not know whether OTRS verification has happened
 or not.

We put something like {{OTRS verified}} on the article's talk page,
saying something like: "Part of the text comes from website X, ticket 1234567890."
If the author wants to use his/her work in many articles, we tell him/her
to put the template on the talk page of each of those articles.
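
A tool could therefore look for that template before flagging a match. The
sketch below is a rough illustration under two assumptions: the template is
literally named "OTRS verified" (the message above only says "something
like" it), and the talk page wikitext has already been fetched by other
means. The function name is hypothetical.

```python
import re

# Matches {{OTRS verified}} with optional parameters, e.g.
# {{OTRS verified|ticket=1234567890}}. The template name is an assumption.
OTRS_TEMPLATE = re.compile(
    r"\{\{\s*OTRS[ _]verified\b[^{}]*\}\}", re.IGNORECASE
)


def has_otrs_verification(talk_page_wikitext):
    """Return True if the talk page wikitext carries the OTRS template."""
    return OTRS_TEMPLATE.search(talk_page_wikitext) is not None
```

A match only shows that someone placed the template; checking that the
ticket really covers the text in question would still be a manual step.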
Marco

Re: [Wikimedia-l] Copyright infringement - The real elephant in the room

2013-11-13 Thread Marco Chiesa

On Wed, Nov 13, 2013 at 12:39 PM, Chris McKenna cmcke...@sucs.org wrote:


 The problem isn't that we're waiting for perfection. We're waiting for the
 proportion of false positives and false negatives to fall to a level where
 they don't overwhelm the true positives.


To avoid false positives from mirrors, the best option is to compare a text
as soon as it is saved. Also, you exclude certain websites from the
comparison because you know they're the mirrors, you exclude rollbacks, ...
Then, it is better to have a human checking that it is really a copyvio (it
could well be a public domain text, or another Wikipedia article).
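
The filtering steps above can be sketched as follows. This is an
illustration under stated assumptions, not the logic of any existing bot:
the mirror list, the function name and its parameters are hypothetical, and
the URLs of pages matching the saved text are assumed to come from some
earlier search step.

```python
from urllib.parse import urlparse

# Hypothetical list of sites known to mirror Wikipedia content.
KNOWN_MIRRORS = {"mirror.example.org"}


def matches_for_review(match_urls, is_rollback, known_mirrors=KNOWN_MIRRORS):
    """Return the matching URLs that a human should check.

    match_urls: pages found to contain the text of the saved edit.
    is_rollback: True if the edit merely restores an earlier revision.
    """
    if is_rollback:
        # A rollback reintroduces old text, so matches are expected.
        return []
    # Drop matches hosted on known mirrors; keep everything else.
    return [
        url for url in match_urls
        if urlparse(url).hostname not in known_mirrors
    ]
```

Whatever remains in the returned list goes to a person, who decides whether
it is a real copyvio, a public domain text, or another Wikipedia article.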

Marco

Re: [Wikimedia-l] Copyright infringement - The real elephant in the room

2013-11-13 Thread Fæ

On 13 November 2013 07:40, James Heilman jmh...@gmail.com wrote:
...
 Our biggest issue is copyright infringement.
...

Thanks for raising this James.

Yes, this is an issue but if you are gunning for elephants this month,
I really don't think the copyright elephant is the biggest one in the
herd.

As a practical example of the tools we already have in place,
yesterday I was facilitating an edit-a-thon for women in science with
King's College London and we had one of the example stubs we had
created on the English Wikipedia up on a projector. Within literally
*minutes* of creation it had been (correctly) flagged by a bot as a
possible copyright violation, as some of the text had been cut and pasted
from King's own website; one of the participants quickly re-wrote it
using their own words. As the communications manager was sitting next
to me at the time, no doubt she found this rather reassuring, even
though in parallel she was asking about how best to officially
release text. :-)

We have a more complex problem with how images uploaded to Wikimedia
Commons can be flagged where they match images found elsewhere on the
internet. This is something that may be done by a future bot, but we
might need to partner with someone like Google Images or Tineye to
make this truly effective. Having run my own experimental bots in this
area, I would love to see this become a funded project.

PS with regard to OTRS verification, we could do with better standards
for verification; at the moment volunteers like myself are left to use
our own judgement about what checks to make. I tend to double check
text or images being released with Google, just in case, as well as
doing whois checks on email domains. These sorts of checks could
become part of OTRS guidelines and would make the reliability of OTRS
tickets a notch higher.

Cheers,
Fae
-- 
fae...@gmail.com http://j.mp/faewm


[Wikimedia-l] [Wikimedia Announcements] Wikimedia UK report, September 2013

2013-11-13 Thread Stevie Benton

Hello everyone,

Please find below the Wikimedia UK monthly report
https://wikimedia.org.uk/wiki/Reports for
the period 1st to 30th September 2013. If you want to keep up with the
chapter's activities as they happen, please subscribe to our blog
http://blog.wikimedia.org.uk/, join a UK mailing list
https://lists.wikimedia.org/mailman/listinfo/wikimediauk-l,
and/or follow us on Twitter http://twitter.com/wikimediauk. If you have
any questions or comments, please drop us a line on this report's talk page
https://wikimedia.org.uk/w/index.php?title=Talk:Reports/2013/September&action=edit&redlink=1.
If you prefer to read the page on wiki, you can find it at
https://wikimedia.org.uk/wiki/Reports/2013/September

Thanks and regards,

Stevie

Program activities
Community

- On 7 September 2013, Wikimedians and amateur photographers gathered in
the Grade II-listed St Michael’s Church for Wikipedia Takes Chester, a
day-long photo scavenger hunt, held to increase participation in Wiki Loves
Monuments UK. At Wikipedia Takes Chester, and at Wikipedia Takes Coventry
this time last year, many people attended who would not normally come along
to Wikimedia events. A huge range of photographers attend these events,
from the point-and-shoot-wielding amateur to the very-expensive-DSLR-toting
professionals.

GLAM activities

- Wikimedia UK is pleased to announce its new partnership with York
Museums Trust http://www.yorkmuseumstrust.org.uk/Page/Index.aspx. The
partnership, which was confirmed in September, is to be supported by the
recruitment of a Wikimedian in Residence who will promote open access to
collections data across the trust.
- We were also delighted to announce in September that The Royal Society
is recruiting a Wikimedian in Residence.

Education activities

This summer Wikimedia UK embarked on a systematic campaign to raise
awareness of the assistance the charity can offer to university students
towards the creation of new student societies associated with Wikipedia and
other Wikimedia projects. With support from Wikimedia UK, Wikipedia student
societies have already been established at Imperial College London and
Cambridge University. We are keen to see this sort of activity develop on
other campuses across the UK.

We’re in the process of discussing the possibility of new Wikipedia
students’ societies developing at a number of universities in Cardiff,
Dundee, Manchester, Hull/Scarborough, Swansea and London. Please help us
spread this information across university campuses throughout the UK, or if
you’re a university student and a Wikipedian, just email
educat...@wikimedia.org.uk and we’ll take it from there.

We were also preparing for the delivery of the EduWiki Conference
2013 https://wikimedia.org.uk/wiki/EduWiki_Conference_2013 on
1-2 November.

Technology

In September, the WMUK wiki was migrated from the Wikimedia Foundation's
datacentre to Wikimedia UK's. Details of the migration can be found on the WMUK
website https://wikimedia.org.uk/wiki/WMUK_wiki_migration, which is now
at the new address of wikimedia.org.uk.

Other activities

15 October is recognised as Ada Lovelace Day and is dedicated to
celebrating the contributions of women in science, technology, engineering
and mathematics (STEM).

Wikimedia UK was proud to be a part of those celebrations. We have
delivered many events about Women in Science in October. For example, along
with Jisc, we have supported an editathon focusing on women in science
which took place at the University of Oxford. As if to illustrate the
ongoing campaign to encourage greater recognition of women in STEM, BBC
Radio 4’s Woman’s Hour featured a discussion of this topic with
our very own Daria Cybulska. UK readers of this blog can listen to the show
here http://www.bbc.co.uk/programmes/b03cmt4n. The section about Ada
Lovelace Day begins around 7:45 into the recording.

Wiki Loves Monuments

- September 2013 will always be the month the UK took part in Wiki Loves
Monuments for the first time. The first few minutes of 1 September were
nervous. Would everything work? Would we have long to wait for our first
upload? What did the month ahead hold? Thirty days later we had 11,995
photos from 573 people - great success!

To ensure participants could ask questions, it was possible for them to
comment on the Wiki Loves Monuments UK website, get in touch via Twitter or
Facebook, or send an email which would be picked up through OTRS.

Microgrants

Information about microgrants that are currently running, and how to submit
a microgrant application of your own, is at
Microgrants/Applications https://wikimedia.org.uk/wiki/Microgrants/Applications.

UK press coverage (and coverage of UK projects activities)

- Storming Wikipedia - Project tackles the site's 'women problem' -
Huffington Post http://www.huffingtonpost.com/2013/08/26/wikipedia-women-storming-female-editors_n_3817138.html
-

Re: [Wikimedia-l] next Wikidata office hour

2013-11-13 Thread Lydia Pintscher

On Sat, Nov 2, 2013 at 4:27 PM, Lydia Pintscher
lydia.pintsc...@wikimedia.de wrote:
 Hi everyone,

 I'll be holding an office hour together with addshore on Wednesday,
 November 13 at 17:00 UTC. For your timezone see
 http://www.timeanddate.com/worldclock/fixedtime.html?hour=17&min=00&sec=0&day=13&month=11&year=2013
  We'll be meeting in #wikimedia-office on freenode. I'll start with a
 short overview of the current state of Wikidata and then there will be
 time for all your Wikidata related questions.
 I hope to see many of you there.

Reminder: This is in 20 minutes.


Cheers
Lydia

-- 
Lydia Pintscher - http://about.me/lydia.pintscher
Product Manager for Wikidata

Wikimedia Deutschland e.V.
Obentrautstr. 72
10963 Berlin
www.wikimedia.de

Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e. V.
(Society for the Promotion of Free Knowledge)

Registered in the register of associations of the Amtsgericht
Berlin-Charlottenburg under number 23855 Nz. Recognized as a charitable
organization by the Finanzamt für Körperschaften I Berlin, tax number
27/681/51985.


Re: [Wikimedia-l] Copyright infringement - The real elephant in the room

2013-11-13 Thread Quim Gil

On 11/13/2013 12:37 AM, Matthew Flaschen wrote:
 However,
 there may be room for enhancing MadmanBot (e.g. as a GSOC or OPW project).

Any technical project that can identify small tasks and available mentors
is welcome to join Wikimedia's Google Code-in team at

https://www.mediawiki.org/wiki/Google_Code-In

GCI will start next week and will last until the beginning of January.
Hundreds of young students will scan our tasks and will eventually
complete some of them.

It is a program ideal for small projects, like the bots or gadgets used
by editors.

-- 
Quim Gil
Technical Contributor Coordinator @ Wikimedia Foundation
http://www.mediawiki.org/wiki/User:Qgil


Re: [Wikimedia-l] Copyright infringement - The real elephant in the room

2013-11-13 Thread Nathan

On Wed, Nov 13, 2013 at 4:53 AM, Lodewijk lodew...@effeietsanders.org wrote:

 Marco: I agree, we had also issues on the Dutch Wikipedia - these have been
 around for ages, the English Wikipedia is just less aware of them.



Not sure if you meant this how it sounds, but the English Wikipedia
community is acutely aware of copyright problems and have undertaken many,
many large and complicated cleanup tasks of the sort Marco described.

Re: [Wikimedia-l] Copyright infringement - The real elephant in the room

2013-11-13 Thread George Herbert

On Wed, Nov 13, 2013 at 3:48 AM, Fæ fae...@gmail.com wrote:

 ...
 PS with regard to OTRS verification, we could do with better standards
 for verification,


We are not attempting to perform a complete and unassailable verification;
imagining that we can is folly.

The point is, we need someone who credibly is the author or rightsholder,
and with whom we have an audit trail of their claims and identity (email
address we corresponded with, etc).

When it comes down to it, we have no idea if an email is associated with
the given person, that the alleged sender of a certified letter really is
that person, or that the John Doe that came in to the office and showed
valid government issued ID with a claim of copyright violation is the same
John Doe who wrote the original material.  There's no way for us to confirm
in any reasonable manner.

If there is an attempt at identity theft that is discovered, that audit
trail is available to investigators with proper legal authorization etc.


-- 
-george william herbert
george.herb...@gmail.com

Re: [Wikimedia-l] next Wikidata office hour

2013-11-13 Thread Lydia Pintscher

On Sat, Nov 2, 2013 at 4:27 PM, Lydia Pintscher
lydia.pintsc...@wikimedia.de wrote:
 Hi everyone,

 I'll be holding an office hour together with addshore on Wednesday,
 November 13 at 17:00 UTC. For your timezone see
 http://www.timeanddate.com/worldclock/fixedtime.html?hour=17&min=00&sec=0&day=13&month=11&year=2013
  We'll be meeting in #wikimedia-office on freenode. I'll start with a
 short overview of the current state of Wikidata and then there will be
 time for all your Wikidata related questions.
 I hope to see many of you there.

And the log can now be found at
https://meta.wikimedia.org/wiki/IRC_office_hours/Office_hours_2013-11-13b


Cheers
Lydia

-- 
Lydia Pintscher - http://about.me/lydia.pintscher
Product Manager for Wikidata

Wikimedia Deutschland e.V.
Obentrautstr. 72
10963 Berlin
www.wikimedia.de

Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e. V.
(Society for the Promotion of Free Knowledge)

Registered in the register of associations of the Amtsgericht
Berlin-Charlottenburg under number 23855 Nz. Recognized as a charitable
organization by the Finanzamt für Körperschaften I Berlin, tax number
27/681/51985.


Re: [Wikimedia-l] Copyright infringement - The real elephant in the room

2013-11-13 Thread Michael Snow


On 11/13/2013 10:39 AM, Nathan wrote:

On Wed, Nov 13, 2013 at 4:53 AM, Lodewijk lodew...@effeietsanders.org wrote:

Marco: I agree, we had also issues on the Dutch Wikipedia - these have been
around for ages, the English Wikipedia is just less aware of them.

Not sure if you meant this how it sounds, but the English Wikipedia
community is acutely aware of copyright problems and have undertaken many,
many large and complicated cleanup tasks of the sort Marco described.

I think he meant that the English Wikipedia community is less aware of
the fact that we face these sorts of large-scale challenges in many 
other languages as well. In other words, the antecedent to them is 
issues on the Dutch/Italian/etc. Wikipedia, rather than copyright 
issues generally. Most people participating in other languages are 
reasonably aware when major concerns surface from the English Wikipedia; 
people participating only in English often haven't a clue about the 
concerns being dealt with in other languages.


--Michael Snow


Re: [Wikimedia-l] Copyright infringement - The real elephant in the room

2013-11-13 Thread Nathan

On Wed, Nov 13, 2013 at 1:48 PM, Michael Snow wikipe...@frontier.com wrote:

 On 11/13/2013 10:39 AM, Nathan wrote:

 On Wed, Nov 13, 2013 at 4:53 AM, Lodewijk lodew...@effeietsanders.org
 wrote:

 Marco: I agree, we had also issues on the Dutch Wikipedia - these have
 been
 around for ages, the English Wikipedia is just less aware of them.

 Not sure if you meant this how it sounds, but the English Wikipedia
 community is acutely aware of copyright problems and have undertaken many,
 many large and complicated cleanup tasks of the sort Marco described.

 I think he meant that the English Wikipedia community is less aware of the
 fact that we face these sorts of large-scale challenges in many other
 languages as well. In other words, the antecedent to them is issues on
 the Dutch/Italian/etc. Wikipedia, rather than copyright issues
 generally. Most people participating in other languages are reasonably
 aware when major concerns surface from the English Wikipedia; people
 participating only in English often haven't a clue about the concerns being
 dealt with in other languages.

 --Michael Snow


That makes sense, thanks for clearing that up for me.

Re: [Wikimedia-l] Copyright infringement - The real elephant in the room

2013-11-13 Thread Tobias


On 11/13/2013 08:40 AM, James Heilman wrote:

Our biggest issue is copyright infringement.


When it comes to copyright infringement, among all community sites on
the Internet, Wikipedia is one of the best at handling it. Many websites
don't even bother with copyright unless they get a DMCA Takedown notice. 
We on the other hand have voluntary contributors checking pages and 
raising flags whenever there is even a suspicion of a copyright violation.


This seems to be highly effective in many cases. A few days ago, I wrote 
an email to a photographer, whose photos had been uploaded to Commons. 
He said I was the third to ask him whether he really had uploaded those 
images (which he had).


Unquestionably, there are also many instances where the system fails
and where lots of copyrighted material gets uploaded. Back in 2005, we 
had a case similar to the one you described in German Wikipedia, where 
various IPs copied content from old books. It is a big mess to clean up, 
but it can be done. And luckily the cases of massive copyvios are quite 
rare.


I think the community has done a very good job in the past 12 years when 
it comes to copyright. It is important to see that we are a community 
site – nothing is ever going to be perfect, and certainly we are not 
free of any copyright violations. But we are dealing with them in a very 
responsible way and I would say that our current efforts are sufficient.



Tobias



Re: [Wikimedia-l] Copyright infringement - The real elephant in the room

2013-11-13 Thread Martin Rulsch

 Unquestionably, there are also many instances where the system fails and
 where lots of copyrighted material gets uploaded. Back in 2005, we had a
 case similar to the one you described in German Wikipedia, where various
 IPs copied content from old books. It is a big mess to clean up, but it can
 be done. And luckily the cases of massive copyvios are quite rare.


For further information see
https://de.wikipedia.org/wiki/Wikipedia:Archiv/DDR-URV/Presseinfo (German).

Cheers
Martin
