Re: [Wikimedia-l] Copyright issues

2019-06-18 Thread Asaf Bartov
(Doc James is too modest to give himself credit, so it falls to me to point
out that the tool was his own idea; I witnessed the birth of it at
Wikimania 2014, when I connected him to Eran, who implemented the first
version of the tool before the end of the conference.)

  A.

On Tue, Jun 18, 2019 at 2:27 AM James Heilman wrote:

> Clarifying one small bit, the "copypatrol" tool was initially developed by
> Eran (a Wikimedia volunteer from Israel). It was then further developed by
> the Wikimedia Foundation. Agree that it is a great success, not only with
> respect to the final result but also with respect to it being a
> successful collaborative project between the foundation and the community.
>
> James

Re: [Wikimedia-l] Copyright issues

2019-06-17 Thread James Heilman
Clarifying one small bit, the "copypatrol" tool was initially developed by
Eran (a Wikimedia volunteer from Israel). It was then further developed by
the Wikimedia Foundation. Agree that it is a great success, not only with
respect to the final result but also with respect to it being a successful
collaborative project between the foundation and the community.

James

On Mon, Jun 17, 2019 at 10:36 AM Yaroslav Blanter wrote:

> Actually, I am afraid that for CCI we will at some point have to use a
> bot to remove all of the added text. I do not see any other scalable
> solution.
>
> Cheers
> Yaroslav



-- 
James Heilman
MD, CCFP-EM, Wikipedian
___
Wikimedia-l mailing list, guidelines at: 
https://meta.wikimedia.org/wiki/Mailing_lists/Guidelines and 
https://meta.wikimedia.org/wiki/Wikimedia-l
New messages to: Wikimedia-l@lists.wikimedia.org
Unsubscribe: https://lists.wikimedia.org/mailman/listinfo/wikimedia-l, 


Re: [Wikimedia-l] Copyright issues

2019-06-17 Thread Yaroslav Blanter
Actually, I am afraid that for CCI we will at some point have to use a
bot to remove all of the added text. I do not see any other scalable
solution.

Cheers
Yaroslav



[Wikimedia-l] Copyright issues

2019-06-17 Thread Stephen Philbrick
I have seen a couple of comments on copyright issues in the last couple of
days, so I thought I'd share some information that may not be well known
to everyone.

Very roughly, copyright issues (text) can be viewed in three categories:
1. Addition of copyrighted material to articles in years past, not yet
removed (one-off)
2. Same as above, except by a serial violator
3. Close to real-time edits which may include copyrighted material

The reason for distinguishing these three categories is that our approach
and success rates are very different.

In case 1, an editor identifies what they believe to be a copyright issue
in an existing article. They can report it to Wikipedia:Copyright_problems.
In the case of a single issue or a very small handful of issues, those
items are identified and taken care of by volunteers. (I think this aspect
is handled adequately, though I used to be active there and haven't been
recently.)

The second case arises when a potential violation is identified and an
examination of the editor's contributions reveals many examples (typically
five or more). When this occurs, the case is referred to
Wikipedia:Contributor copyright investigations. A CCI is opened, and the
intent is to examine every single edit by that editor. This aspect is
extremely backlogged. I've spent many hours working on CCIs, but it isn't
easy, it isn't rewarding, and it is discouraging because I think the
backlog is increasing rather than decreasing. (This isn't due to newly
created copyright issues but to newly found ones.)

The third case is handled by Copy Patrol, a foundation-created tool that
examines all new edits in close to real time and generates a report, which
is handled by volunteers.

I want to emphasize this third aspect for multiple reasons. I think it is
one of the least-known tools; some of the prior emails on the subject leave
the impression that the authors are unaware of its existence. It also
works very well: almost all of the several hundred reports each week are
reviewed, most within 24 hours.

Good news:
* Copy Patrol is working, so my guess is that the growth in true copyright
issues is close to nonexistent.

Bad news:
* Copy Patrol is adequately staffed, but only just. One editor handles
far more than half of all of these reports (major kudos to Diannaa), but
that much reliance on a single volunteer is not good for the long-term
health of the project.

* The Copy Patrol tool is pretty good and was being improved for a while,
but I've identified some desirable improvements, and my sense is that
further enhancement is very much a back-burner project.

* CCI clearance is going to take many years.

Phil (Sphilbrick)