Re: [Wikimedia-l] Quality issues

2015-12-18 Thread Gerard Meijssen
Hoi,
The CC-0 license was set up with the express reason that everybody can use
our data without any impediment.  Our objective is to share in the sum of
all knowledge and we are more effective in that way.

We do not care about market dominance, we care about doing our utmost to
have the best data available. I could not care less about theoretical
what-ifs; I am interested in making a difference in our content, because
that is where we make a difference.
Thanks,
   GerardM

On 18 December 2015 at 09:05, Andreas Kolbe  wrote:

> Gerard,
>
> Of course you can't license or copyright facts, but as the WMF legal team's
> page on this topic[1] outlines, there are database and compilation rights
> that exist independently of copyright. IANAL, but as I read that page, if
> you simply go ahead and copy all the infobox, template etc. content from a
> Wikipedia, this "would likely be a violation" even under US law (not to
> mention EU law).
>
> I don't know why Wikipedia was set up with a CC BY-SA licence rather than a
> CC0 licence, and the attribution required under CC BY-SA is unduly
> cumbersome, but attribution has always seemed to me like a useful concept.
> The fact that people like VDM Publishing who sell Wikipedia articles as
> books are required to say that their material comes from Wikipedia is
> useful, for example.
>
> Naturally it fosters re-use if you make Wikidata CC0, but that's precisely
> the point: you end up with a level of "market dominance" that just ain't
> healthy.
>
> [1] https://meta.wikimedia.org/wiki/Wikilegal/Database_Rights
>
> On Thu, Dec 17, 2015 at 2:33 PM, Gerard Meijssen <
> gerard.meijs...@gmail.com>
> wrote:
>
> > Hoi,
> > Andreas, the law is an arse. However the law has it that you cannot
> license
> > facts. When in distributed processes data is retrieved from Wikipedia, it
> > is the authors who may contest their rights. There is no such thing as
> > collective rights for Wikipedia, all Wikipedias.
> >
> > You may not like this and that is fine.
> >
> > DBpedia has its license in the current way NOT because they care about
> the
> > license but because they are not interested in a row with Wikipedians on
> > the subject. They are quite happy to share their data with Wikidata and
> > make data retrieved in their processes with a CC-0.
> >
> > Thanks,
> >  GerardM
> >
> > On 17 December 2015 at 15:17, Andreas Kolbe  wrote:
> >
> > > On Wed, Dec 16, 2015 at 11:12 AM, Andrea Zanni <
> zanni.andre...@gmail.com
> > >
> > > wrote:
> > >
> > > > On Sun, Dec 13, 2015 at 9:35 PM, Jane Darnell 
> > wrote:
> > > >
> > > > > Andrea,
> > > > > I totally agree on the mission/vision thing, but am not sure what
> you
> > > > mean
> > > > > exactly by scale - do you mean that Wikidata shouldn't try to be so
> > > > > granular that it has a statement to cover each factoid in any
> > Wikipedia
> > > > > article, or do you mean we need to talk about what constitutes
> > > notability
> > > > > in order not to grow Wikidata exponentially to the point the
> servers
> > > > crash?
> > > > > Jane
> > > > >
> > > > >
> > > > Hi Jane, I explained myself poorly (sometime English is too difficult
> > :-)
> > > >
> > > > What I mean is that the scale of the error *could* be of another
> scale,
> > > > another order of magnitude.
> > > > The propagation of the error is multiplied, it's not just a single
> > error
> > > on
> > > > a wikipage: it's an error propagated in many wikipages, and then
> > Google,
> > > > etc.
> > > > A single point of failure.
> > > >
> > >
> > >
> > > Exactly: a single point of failure. A system where a single point of
> > > failure can have such consequences, potentially corrupting knowledge
> > > forever, is a bad system. It's not robust.
> > >
> > > In the op-ed, I mentioned the Brazilian aardvark hoax[1] as an example
> of
> > > error propagation (which happened entirely without Wikidata's and the
> > > Knowledge Graph's help). It took the New Yorker quite a bit of research
> > to
> > > piece together and confirm what happened, research which I understand
> > would
> > > not have happened if the originator of the hoax had not been willing to
> > > talk about his prank.
> > >
> > > It was the same with the fake Maurice Jarre quotes in Wikipedia[2] that
> > > made their way into mainstream press obituaries a few years ago. If the
> > > hoaxer had not come forward, no one would have been the wiser. The fake
> > > quotes would have remained a permanent part of the historical record.
> > >
> > > More recent cases include the widely repeated (including by Associated
> > > Press, for God's sake, to this day) claim that Joe Streater was
> involved
> > in
> > > the Boston College basketball point shaving scandal[3] and the Amelia
> > > Bedelia hoax.[4]
> > >
> > > If even things people insert as a joke propagate around the globe as a
> > > result of this vulnerability, then there is a clear and present
> potential

Re: [Wikimedia-l] Quality issues

2015-12-18 Thread Andreas Kolbe
Gerard,

Of course you can't license or copyright facts, but as the WMF legal team's
page on this topic[1] outlines, there are database and compilation rights
that exist independently of copyright. IANAL, but as I read that page, if
you simply go ahead and copy all the infobox, template etc. content from a
Wikipedia, this "would likely be a violation" even under US law (not to
mention EU law).

I don't know why Wikipedia was set up with a CC BY-SA licence rather than a
CC0 licence, and the attribution required under CC BY-SA is unduly
cumbersome, but attribution has always seemed to me like a useful concept.
The fact that people like VDM Publishing who sell Wikipedia articles as
books are required to say that their material comes from Wikipedia is
useful, for example.

Naturally it fosters re-use if you make Wikidata CC0, but that's precisely
the point: you end up with a level of "market dominance" that just ain't
healthy.

[1] https://meta.wikimedia.org/wiki/Wikilegal/Database_Rights

On Thu, Dec 17, 2015 at 2:33 PM, Gerard Meijssen 
wrote:

> Hoi,
> Andreas, the law is an arse. However the law has it that you cannot license
> facts. When in distributed processes data is retrieved from Wikipedia, it
> is the authors who may contest their rights. There is no such thing as
> collective rights for Wikipedia, all Wikipedias.
>
> You may not like this and that is fine.
>
> DBpedia has its license in the current way NOT because they care about the
> license but because they are not interested in a row with Wikipedians on
> the subject. They are quite happy to share their data with Wikidata and
> make data retrieved in their processes with a CC-0.
>
> Thanks,
>  GerardM
>
> On 17 December 2015 at 15:17, Andreas Kolbe  wrote:
>
> > On Wed, Dec 16, 2015 at 11:12 AM, Andrea Zanni  >
> > wrote:
> >
> > > On Sun, Dec 13, 2015 at 9:35 PM, Jane Darnell 
> wrote:
> > >
> > > > Andrea,
> > > > I totally agree on the mission/vision thing, but am not sure what you
> > > mean
> > > > exactly by scale - do you mean that Wikidata shouldn't try to be so
> > > > granular that it has a statement to cover each factoid in any
> Wikipedia
> > > > article, or do you mean we need to talk about what constitutes
> > notability
> > > > in order not to grow Wikidata exponentially to the point the servers
> > > crash?
> > > > Jane
> > > >
> > > >
> > > Hi Jane, I explained myself poorly (sometime English is too difficult
> :-)
> > >
> > > What I mean is that the scale of the error *could* be of another scale,
> > > another order of magnitude.
> > > The propagation of the error is multiplied, it's not just a single
> error
> > on
> > > a wikipage: it's an error propagated in many wikipages, and then
> Google,
> > > etc.
> > > A single point of failure.
> > >
> >
> >
> > Exactly: a single point of failure. A system where a single point of
> > failure can have such consequences, potentially corrupting knowledge
> > forever, is a bad system. It's not robust.
> >
> > In the op-ed, I mentioned the Brazilian aardvark hoax[1] as an example of
> > error propagation (which happened entirely without Wikidata's and the
> > Knowledge Graph's help). It took the New Yorker quite a bit of research
> to
> > piece together and confirm what happened, research which I understand
> would
> > not have happened if the originator of the hoax had not been willing to
> > talk about his prank.
> >
> > It was the same with the fake Maurice Jarre quotes in Wikipedia[2] that
> > made their way into mainstream press obituaries a few years ago. If the
> > hoaxer had not come forward, no one would have been the wiser. The fake
> > quotes would have remained a permanent part of the historical record.
> >
> > More recent cases include the widely repeated (including by Associated
> > Press, for God's sake, to this day) claim that Joe Streater was involved
> in
> > the Boston College basketball point shaving scandal[3] and the Amelia
> > Bedelia hoax.[4]
> >
> > If even things people insert as a joke propagate around the globe as a
> > result of this vulnerability, then there is a clear and present potential
> > for purposeful manipulation. We've seen enough cases of that, too.[5]
> >
> > This is not the sort of system the Wikimedia community should be helping
> to
> > build. The very values at the heart of the Wikimedia movement are about
> > transparency, accountability, multiple points of view, pluralism,
> > democracy, opposing dominance and control by vested interests, and so
> > forth.
> >
> > What is the way forward?
> >
> > Wikidata should, as a matter of urgency, rescind its decision to make its
> > content available under the CC0 licence. Global propagation without
> > attribution is a terrible idea.
> >
> > Quite apart from that, in my opinion Wikidata's CC0 licensing also
> > infringes Wikipedia contributors' rights as enshrined in Wikipedia's CC
> > BY-SA licence, a 

Re: [Wikimedia-l] Quality issues

2015-12-18 Thread Peter Southwood
Wikipedia is not about infoboxes, they are (and are intended to be) a small to 
very small part of the article in most cases. Similarly, Wikipedias are not 
databases, so also without being a lawyer, I think your interpretation is wrong.
Cheers,
Peter

-----Original Message-----
From: Wikimedia-l [mailto:wikimedia-l-boun...@lists.wikimedia.org] On Behalf Of 
Andreas Kolbe
Sent: Friday, 18 December 2015 10:06 AM
To: Wikimedia Mailing List
Subject: Re: [Wikimedia-l] Quality issues

Gerard,

Of course you can't license or copyright facts, but as the WMF legal team's 
page on this topic[1] outlines, there are database and compilation rights that 
exist independently of copyright. IANAL, but as I read that page, if you simply 
go ahead and copy all the infobox, template etc. content from a Wikipedia, this 
"would likely be a violation" even under US law (not to mention EU law).

I don't know why Wikipedia was set up with a CC BY-SA licence rather than a
CC0 licence, and the attribution required under CC BY-SA is unduly cumbersome, 
but attribution has always seemed to me like a useful concept.
The fact that people like VDM Publishing who sell Wikipedia articles as books 
are required to say that their material comes from Wikipedia is useful, for 
example.

Naturally it fosters re-use if you make Wikidata CC0, but that's precisely the 
point: you end up with a level of "market dominance" that just ain't healthy.

[1] https://meta.wikimedia.org/wiki/Wikilegal/Database_Rights

On Thu, Dec 17, 2015 at 2:33 PM, Gerard Meijssen 
wrote:

> Hoi,
> Andreas, the law is an arse. However the law has it that you cannot 
> license facts. When in distributed processes data is retrieved from 
> Wikipedia, it is the authors who may contest their rights. There is no 
> such thing as collective rights for Wikipedia, all Wikipedias.
>
> You may not like this and that is fine.
>
> DBpedia has its license in the current way NOT because they care about 
> the license but because they are not interested in a row with 
> Wikipedians on the subject. They are quite happy to share their data 
> with Wikidata and make data retrieved in their processes with a CC-0.
>
> Thanks,
>  GerardM
>
> On 17 December 2015 at 15:17, Andreas Kolbe  wrote:
>
> > On Wed, Dec 16, 2015 at 11:12 AM, Andrea Zanni 
> >  >
> > wrote:
> >
> > > On Sun, Dec 13, 2015 at 9:35 PM, Jane Darnell 
> wrote:
> > >
> > > > Andrea,
> > > > I totally agree on the mission/vision thing, but am not sure 
> > > > what you
> > > mean
> > > > exactly by scale - do you mean that Wikidata shouldn't try to be 
> > > > so granular that it has a statement to cover each factoid in any
> Wikipedia
> > > > article, or do you mean we need to talk about what constitutes
> > notability
> > > > in order not to grow Wikidata exponentially to the point the 
> > > > servers
> > > crash?
> > > > Jane
> > > >
> > > >
> > > Hi Jane, I explained myself poorly (sometime English is too 
> > > difficult
> :-)
> > >
> > > What I mean is that the scale of the error *could* be of another 
> > > scale, another order of magnitude.
> > > The propagation of the error is multiplied, it's not just a single
> error
> > on
> > > a wikipage: it's an error propagated in many wikipages, and then
> Google,
> > > etc.
> > > A single point of failure.
> > >
> >
> >
> > Exactly: a single point of failure. A system where a single point of 
> > failure can have such consequences, potentially corrupting knowledge 
> > forever, is a bad system. It's not robust.
> >
> > In the op-ed, I mentioned the Brazilian aardvark hoax[1] as an 
> > example of error propagation (which happened entirely without 
> > Wikidata's and the Knowledge Graph's help). It took the New Yorker 
> > quite a bit of research
> to
> > piece together and confirm what happened, research which I 
> > understand
> would
> > not have happened if the originator of the hoax had not been willing 
> > to talk about his prank.
> >
> > It was the same with the fake Maurice Jarre quotes in Wikipedia[2] 
> > that made their way into mainstream press obituaries a few years 
> > ago. If the hoaxer had not come forward, no one would have been the 
> > wiser. The fake quotes would have remained a permanent part of the 
> > historical record.
> >
> > More recent cases include the widely repeated (including by 
> > Associated Press, for God's sake, to this day) claim that Joe 
> > Streater was involved
> in
> > the Boston College basketball point shaving scandal[3] and the 
> > Amelia Bedelia hoax.[4]
> >
> > If even things people insert as a joke propagate around the globe as 
> > a result of this vulnerability, then there is a clear and present 
> > potential for purposeful manipulation. We've seen enough cases of 
> > that, too.[5]
> >
> > This is not the sort of system the Wikimedia community should be 
> > helping
> to
> > build. The very values at the heart of the 

Re: [Wikimedia-l] Quality issues

2015-12-18 Thread Jane Darnell
The only infoboxes I have touched on Wikipedia in relation to Wikidata are
the ones I created with data from Wikidata with PrepBio and not the other
way around. As far as I know there is no tool available to import Wikidata
statements from Wikipedia infoboxes. This is why it took so long to get rid
of the persondata infoboxes, because the data was not formatted in a way
that was easily importable into Wikidata. Eventually the persondata was
deleted because the birth/death data was updated in Wikidata, albeit in a
different way. Unfortunately we lost all of the alternate spellings that
could have been added to the aliases on Wikidata, but I was delighted that
Maarten Dammers was able to upload aliases for artists into Wikidata last
week from ULAN, which means we now have way more aliases per artist
available for searching than we ever had on Wikipedia.

On Fri, Dec 18, 2015 at 9:24 AM, Peter Southwood <
peter.southw...@telkomsa.net> wrote:

> Wikipedia is not about infoboxes, they are (and are intended to be) a
> small to very small part of the article in most cases. Similarly,
> Wikipedias are not databases, so also without being a lawyer, I think your
> interpretation is wrong.
> Cheers,
> Peter
>
> -Original Message-
> From: Wikimedia-l [mailto:wikimedia-l-boun...@lists.wikimedia.org] On
> Behalf Of Andreas Kolbe
> Sent: Friday, 18 December 2015 10:06 AM
> To: Wikimedia Mailing List
> Subject: Re: [Wikimedia-l] Quality issues
>
> Gerard,
>
> Of course you can't license or copyright facts, but as the WMF legal
> team's page on this topic[1] outlines, there are database and compilation
> rights that exist independently of copyright. IANAL, but as I read that
> page, if you simply go ahead and copy all the infobox, template etc.
> content from a Wikipedia, this "would likely be a violation" even under US
> law (not to mention EU law).
>
> I don't know why Wikipedia was set up with a CC BY-SA licence rather than a
> CC0 licence, and the attribution required under CC BY-SA is unduly
> cumbersome, but attribution has always seemed to me like a useful concept.
> The fact that people like VDM Publishing who sell Wikipedia articles as
> books are required to say that their material comes from Wikipedia is
> useful, for example.
>
> Naturally it fosters re-use if you make Wikidata CC0, but that's precisely
> the point: you end up with a level of "market dominance" that just ain't
> healthy.
>
> [1] https://meta.wikimedia.org/wiki/Wikilegal/Database_Rights
>
> On Thu, Dec 17, 2015 at 2:33 PM, Gerard Meijssen <
> gerard.meijs...@gmail.com>
> wrote:
>
> > Hoi,
> > Andreas, the law is an arse. However the law has it that you cannot
> > license facts. When in distributed processes data is retrieved from
> > Wikipedia, it is the authors who may contest their rights. There is no
> > such thing as collective rights for Wikipedia, all Wikipedias.
> >
> > You may not like this and that is fine.
> >
> > DBpedia has its license in the current way NOT because they care about
> > the license but because they are not interested in a row with
> > Wikipedians on the subject. They are quite happy to share their data
> > with Wikidata and make data retrieved in their processes with a CC-0.
> >
> > Thanks,
> >  GerardM
> >
> > On 17 December 2015 at 15:17, Andreas Kolbe  wrote:
> >
> > > On Wed, Dec 16, 2015 at 11:12 AM, Andrea Zanni
> > >  > >
> > > wrote:
> > >
> > > > On Sun, Dec 13, 2015 at 9:35 PM, Jane Darnell 
> > wrote:
> > > >
> > > > > Andrea,
> > > > > I totally agree on the mission/vision thing, but am not sure
> > > > > what you
> > > > mean
> > > > > exactly by scale - do you mean that Wikidata shouldn't try to be
> > > > > so granular that it has a statement to cover each factoid in any
> > Wikipedia
> > > > > article, or do you mean we need to talk about what constitutes
> > > notability
> > > > > in order not to grow Wikidata exponentially to the point the
> > > > > servers
> > > > crash?
> > > > > Jane
> > > > >
> > > > >
> > > > Hi Jane, I explained myself poorly (sometime English is too
> > > > difficult
> > :-)
> > > >
> > > > What I mean is that the scale of the error *could* be of another
> > > > scale, another order of magnitude.
> > > > The propagation of the error is multiplied, it's not just a single
> > error
> > > on
> > > > a wikipage: it's an error propagated in many wikipages, and then
> > Google,
> > > > etc.
> > > > A single point of failure.
> > > >
> > >
> > >
> > > Exactly: a single point of failure. A system where a single point of
> > > failure can have such consequences, potentially corrupting knowledge
> > > forever, is a bad system. It's not robust.
> > >
> > > In the op-ed, I mentioned the Brazilian aardvark hoax[1] as an
> > > example of error propagation (which happened entirely without
> > > Wikidata's and the Knowledge Graph's help). It took the New Yorker
> > > quite a bit of research
> > to
> > > piece 

Re: [Wikimedia-l] Quality issues

2015-12-18 Thread Andreas Kolbe
On Fri, Dec 18, 2015 at 8:24 AM, Peter Southwood <
peter.southw...@telkomsa.net> wrote:

> Wikipedia is not about infoboxes, they are (and are intended to be) a
> small to very small part of the article in most cases. Similarly,
> Wikipedias are not databases, so also without being a lawyer, I think your
> interpretation is wrong.



If you look at the Meta document I linked, you'll find that the definition
of a database provided there is quite broad:

---o0o---

From a legal perspective, a database is any organized collection of
materials — hard copy or electronic — that permits a user to search for and
access individual pieces of information contained within the materials. No
database software, as a programmer would understand it, is necessary. In
the US, for example, Black’s Law Dictionary defines a database as a
"compilation of information arranged in a systematic way and offering a
means of finding specific elements it contains, often today by electronic
means."[1] Databases may be protected by US copyright law as
"compilations." In the EU, databases are protected by the Database
Directive, which defines a database as "a collection of independent works,
data or other materials arranged in a systematic or methodical way and
individually accessible by electronic or other means."

---o0o---

You could argue that the sum of Wikipedia's harvestable infoboxes,
templates etc. constitutes a database, according to those definitions.

There is also the argument about the benefit of attribution, as opposed to
having data appear out of nowhere in a way that is completely opaque to end
users.


On Fri, Dec 18, 2015 at 10:21 AM, Gerard Meijssen  wrote:

> Hoi,
> The CC-0 license was set up with the express reason that everybody can use
> our data without any impediment.  Our objective is to share in the sum of
> all knowledge and we are more effective in that way.
>


> We do not care about market dominance, we care about doing our utmost to
> have the best data available.



Are these not just well-worn platitudes? If you cared so much about
quality, you or someone else would have fixed the Grasulf II of Friuli
entry by now.




> On 18 December 2015 at 09:05, Andreas Kolbe  wrote:
>
> > Gerard,
> >
> > Of course you can't license or copyright facts, but as the WMF legal
> team's
> > page on this topic[1] outlines, there are database and compilation rights
> > that exist independently of copyright. IANAL, but as I read that page, if
> > you simply go ahead and copy all the infobox, template etc. content from
> a
> > Wikipedia, this "would likely be a violation" even under US law (not to
> > mention EU law).
> >
> > I don't know why Wikipedia was set up with a CC BY-SA licence rather
> than a
> > CC0 licence, and the attribution required under CC BY-SA is unduly
> > cumbersome, but attribution has always seemed to me like a useful
> concept.
> > The fact that people like VDM Publishing who sell Wikipedia articles as
> > books are required to say that their material comes from Wikipedia is
> > useful, for example.
> >
> > Naturally it fosters re-use if you make Wikidata CC0, but that's
> precisely
> > the point: you end up with a level of "market dominance" that just ain't
> > healthy.
> >
> > [1] https://meta.wikimedia.org/wiki/Wikilegal/Database_Rights
>
>
___
Wikimedia-l mailing list, guidelines at: 
https://meta.wikimedia.org/wiki/Mailing_lists/Guidelines
New messages to: Wikimedia-l@lists.wikimedia.org
Unsubscribe: https://lists.wikimedia.org/mailman/listinfo/wikimedia-l


Re: [Wikimedia-l] Quality issues

2015-12-18 Thread Peter Southwood
Depending on how broadly you want to stretch it, that definition covers an
encyclopaedia or even a public library.
Not particularly helpful.
There is also the matter of how much is taken from it in the form of data:
there is likely to be much more data available in the articles than is or will
ever be used by Wikidata.
You could equally, possibly more convincingly, argue that the sum of
Wikipedia's infoboxes, templates etc. does not constitute a database,
particularly since that was not the intention, and they have not been applied
consistently and/or systematically to the whole project.
Cheers,
P

-----Original Message-----
From: Wikimedia-l [mailto:wikimedia-l-boun...@lists.wikimedia.org] On Behalf Of 
Andreas Kolbe
Sent: Friday, 18 December 2015 1:05 PM
To: Wikimedia Mailing List
Subject: Re: [Wikimedia-l] Quality issues

On Fri, Dec 18, 2015 at 8:24 AM, Peter Southwood < 
peter.southw...@telkomsa.net> wrote:

> Wikipedia is not about infoboxes, they are (and are intended to be) a 
> small to very small part of the article in most cases. Similarly, 
> Wikipedias are not databases, so also without being a lawyer, I think 
> your interpretation is wrong.



If you look at the Meta document I linked, you'll find that the definition of a 
database provided there is quite broad:

---o0o---

From a legal perspective, a database is any organized collection of materials — 
hard copy or electronic — that permits a user to search for and access 
individual pieces of information contained within the materials. No database 
software, as a programmer would understand it, is necessary. In the US, for 
example, Black’s Law Dictionary defines a database as a "compilation of 
information arranged in a systematic way and offering a means of finding 
specific elements it contains, often today by electronic means."[1] Databases 
may be protected by US copyright law as "compilations." In the EU, databases 
are protected by the Database Directive, which defines a database as "a 
collection of independent works, data or other materials arranged in a 
systematic or methodical way and individually accessible by electronic or other 
means."

---o0o---

You could argue that the sum of Wikipedia's harvestable infoboxes, templates 
etc. constitutes a database, according to those definitions.

There is also the argument about the benefit of attribution, as opposed to 
having data appear out of nowhere in a way that is completely opaque to end 
users.


On Fri, Dec 18, 2015 at 10:21 AM, Gerard Meijssen  wrote:

> Hoi,
> The CC-0 license was set up with the express reason that everybody can 
> use our data without any impediment.  Our objective is to share in the 
> sum of all knowledge and we are more effective in that way.
>


> We do not care about market dominance, we care about doing our utmost 
> to have the best data available.



Are these not just well-worn platitudes? If you cared so much about quality, 
you or someone else would have fixed the Grasulf II of Friuli entry by now.




> On 18 December 2015 at 09:05, Andreas Kolbe  wrote:
>
> > Gerard,
> >
> > Of course you can't license or copyright facts, but as the WMF legal
> team's
> > page on this topic[1] outlines, there are database and compilation 
> > rights that exist independently of copyright. IANAL, but as I read 
> > that page, if you simply go ahead and copy all the infobox, template 
> > etc. content from
> a
> > Wikipedia, this "would likely be a violation" even under US law (not 
> > to mention EU law).
> >
> > I don't know why Wikipedia was set up with a CC BY-SA licence rather
> than a
> > CC0 licence, and the attribution required under CC BY-SA is unduly 
> > cumbersome, but attribution has always seemed to me like a useful
> concept.
> > The fact that people like VDM Publishing who sell Wikipedia articles 
> > as books are required to say that their material comes from 
> > Wikipedia is useful, for example.
> >
> > Naturally it fosters re-use if you make Wikidata CC0, but that's
> precisely
> > the point: you end up with a level of "market dominance" that just 
> > ain't healthy.
> >
> > [1] https://meta.wikimedia.org/wiki/Wikilegal/Database_Rights
>
>

Re: [Wikimedia-l] Quality issues

2015-12-18 Thread Gerard Meijssen
Hoi,
I have made changes to Grasulf II and I believe it is better for
them. If you find fault, you can do what I often do: make a difference. Yes,
I do edit Wikipedia occasionally based on the info that I find.
Thanks,
  GerardM

On 18 December 2015 at 12:04, Andreas Kolbe  wrote:

> On Fri, Dec 18, 2015 at 8:24 AM, Peter Southwood <
> peter.southw...@telkomsa.net> wrote:
>
> > Wikipedia is not about infoboxes, they are (and are intended to be) a
> > small to very small part of the article in most cases. Similarly,
> > Wikipedias are not databases, so also without being a lawyer, I think
> your
> > interpretation is wrong.
>
>
>
> If you look at the Meta document I linked, you'll find that the definition
> of a database provided there is quite broad:
>
> ---o0o---
>
> From a legal perspective, a database is any organized collection of
> materials — hard copy or electronic — that permits a user to search for and
> access individual pieces of information contained within the materials. No
> database software, as a programmer would understand it, is necessary. In
> the US, for example, Black’s Law Dictionary defines a database as a
> "compilation of information arranged in a systematic way and offering a
> means of finding specific elements it contains, often today by electronic
> means."[1] Databases may be protected by US copyright law as
> "compilations." In the EU, databases are protected by the Database
> Directive, which defines a database as "a collection of independent works,
> data or other materials arranged in a systematic or methodical way and
> individually accessible by electronic or other means."
>
> ---o0o---
>
> You could argue that the sum of Wikipedia's harvestable infoboxes,
> templates etc. constitutes a database, according to those definitions.
>
> There is also the argument about the benefit of attribution, as opposed to
> having data appear out of nowhere in a way that is completely opaque to end
> users.
>
>
> On Fri, Dec 18, 2015 at 10:21 AM, Gerard Meijssen <
> gerard.meijs...@gmail.com
> > wrote:
>
> > Hoi,
> > The CC-0 license was set up with the express reason that everybody can
> use
> > our data without any impediment.  Our objective is to share in the sum of
> > all knowledge and we are more effective in that way.
> >
>
>
> > We do not care about market dominance, we care about doing our utmost to
> > have the best data available.
>
>
>
> Are these not just well-worn platitudes? If you cared so much about
> quality, you or someone else would have fixed the Grasulf II of Friuli
> entry by now.
>
>
>
>
> > On 18 December 2015 at 09:05, Andreas Kolbe  wrote:
> >
> > > Gerard,
> > >
> > > Of course you can't license or copyright facts, but as the WMF legal team's
> > > page on this topic[1] outlines, there are database and compilation rights
> > > that exist independently of copyright. IANAL, but as I read that page, if
> > > you simply go ahead and copy all the infobox, template etc. content from a
> > > Wikipedia, this "would likely be a violation" even under US law (not to
> > > mention EU law).
> > >
> > > I don't know why Wikipedia was set up with a CC BY-SA licence rather than a
> > > CC0 licence, and the attribution required under CC BY-SA is unduly
> > > cumbersome, but attribution has always seemed to me like a useful concept.
> > > The fact that people like VDM Publishing who sell Wikipedia articles as
> > > books are required to say that their material comes from Wikipedia is
> > > useful, for example.
> > >
> > > Naturally it fosters re-use if you make Wikidata CC0, but that's precisely
> > > the point: you end up with a level of "market dominance" that just ain't
> > > healthy.
> > >
> > > [1] https://meta.wikimedia.org/wiki/Wikilegal/Database_Rights
> >
> >
___
Wikimedia-l mailing list, guidelines at: 
https://meta.wikimedia.org/wiki/Mailing_lists/Guidelines
New messages to: Wikimedia-l@lists.wikimedia.org
Unsubscribe: https://lists.wikimedia.org/mailman/listinfo/wikimedia-l, 


Re: [Wikimedia-l] Quality issues

2015-12-18 Thread Andrea Zanni
On Thu, Dec 17, 2015 at 3:17 PM, Andreas Kolbe  wrote:

> > A single point of failure.
> >
>
> Exactly: a single point of failure. A system where a single point of
> failure can have such consequences, potentially corrupting knowledge
> forever, is a bad system. It's not robust.



Andreas, you apparently did not read the following sentence:
"Of course, the opposite is also true: it's a single point of openness,
correction, information. "

In the end, I agree with Gerard:
you seem not to accept other people's arguments and keep reiterating yours
again and again.
The problem, to me, is that you don't like wikis: you don't like that they
are open, prone to errors and vulnerable. Yet this is our greatest
weakness and our greatest strength at the same time.
The Wikimedia movement has believed in this for at least the last 15 years;
it is one of our pillars.
So, if you don't like it, maybe the Wikimedia movement is not suitable for
you; maybe you would prefer working on Citizendium or something. There's no
shame in it, and I really believe it: it's just a matter of choice.

I personally choose to believe in openness as a way to leverage people's
good will and willingness to share knowledge. I believe Wikidata is going in
the same direction, and I have not yet found evidence that the "power and
centralisation" of data makes openness a problem of a different magnitude
than it is on Wikipedia.

I'm happy to discuss this point specifically, as I think we can have a
reasonable and constructive debate on this.

But if you reiterate examples on Wikipedia, you lose me. We have already
made a choice: we believe that the trade-off between openness and control is
worth it.



>Are these not just well-worn platitudes? If you cared so much about
>quality, you or someone else would have fixed the Grasulf II of Friuli
>entry by now.

You are included in the set of "someone else": you found all the errors,
and you could have corrected them. You decided it was best to write a very
long mail instead of correcting them. It's your right, but it's not the
Wikimedia way.
The Wikimedia way is wonderfully explained in three magical words: so fix
it [1].


Aubrey

[1] https://en.wikipedia.org/wiki/Template:Sofixit


Re: [Wikimedia-l] Quality issues

2015-12-18 Thread Andreas Kolbe
On Fri, Dec 18, 2015 at 11:28 AM, Andrea Zanni 
wrote:

> Andreas, you apparently did not read the following sentence:
> "Of course, the opposite is also true: it's a single point of openness,
> correction, information. "
>

Andrea,

I understand and appreciate your point, but I would like you to consider
that what you say may be less true of Wikidata than it is for other
Wikimedia wikis, for several reasons:

Wikipedia, Wiktionary etc. are functionally open and correctable because
people by and large view their content on Wikipedia, Wiktionary etc. itself
(or in places where the provenance is clearly indicated, thanks to CC
BY-SA). The place where you read it is the same place where you can edit
it. There is an "Edit" tab, and it really *is* easy to change the content.
(It is certainly easy to correct a typo, which is how many of us started.)

With Wikidata, this is different. Wikidata, as a semantic wiki, is designed
to be read by machines. These machines don't edit, they *propagate*.
Wikidata is not a site that end users--human beings--will browse and
consult the way people consult Wikipedia, Wiktionary, Commons, etc.

Wikidata is, or will be, of interest mostly to re-users--search engines and
other intermediaries who will use its machine-readable data as an input to
build and design their own content. And when they use Wikidata as an input,
they don't have to acknowledge the source.

Allowing unattributed re-use may *seem* more open. But I contend that in
practice it makes Wikidata *less* open as a wiki: because when people don't
know where the information comes from, they are also unable to contribute
at source. The underlying Wikimedia project effectively becomes invisible
to them, a closed book.

That is not good for a crowdsourced project from multiple points of view.

Firstly, it impedes recruitment. Far fewer consumers of Wikidata
information will become Wikidata editors, because they will typically find
Wikidata content on other sites where Wikidata is not even mentioned.

Secondly, it reduces transparency. Data provenance is important, as Mark
Graham and Heather Ford have pointed out.

Thirdly, it fails to encourage appropriate vigilance in the consumer. (The
error propagation problems I've described in this thread all involved
unattributed re-use of Wikimedia content.)

There are other reasons why Wikidata is less open, besides CC0 and the lack
of attribution.

Wikidata is the least user-friendly Wikimedia wiki. The hurdle that
newbies--even experienced Wikimedians--have to overcome to contribute is an
order of magnitude higher than it is for other Wikimedia projects.

For a start, there is no Edit tab at the top of the page. When you go to
Barack Obama's entry in Wikidata[1] for example, the word "Edit" is not to
be found anywhere on the page. It does not look like a page you can edit
(and indeed, members of the public can't edit it).

It took me a while to figure out that the item is protected (just like the
Jerusalem item).

In other Wikimedia wikis that do have an "Edit" tab, that tab changes to
"View source" if the page is protected, giving a visual indication of the
page's status that people--Wikimedia insiders at least--can recognise.

Unprotected Wikidata items do have "edit" and "add" links, but they are
less prominent. (The "add" link for adding new properties is hidden away at
the very bottom of the page.) And when you do click "edit" or "add", it is
not obvious what you are supposed to do, the way it is in text-based wikis.

The learning curve involved in actually editing a Wikidata item is far
steeper than it is in other Wikimedia wikis. There is no Wikidata
equivalent of the "correcting a typo" edit in Wikipedia. You need to go
away and learn the syntax before you can do anything at all in Wikidata.

For all of these reasons I believe the systemic balance between information
delivery (output) and ease of contribution (input) is substantially
different for Wikidata than it is for any other Wikimedia wiki.



> So, if you don't like it, maybe the Wikimedia movement is not suitable for
> you; maybe you would prefer working on Citizendium or something. There's no
> shame in it, and I really believe it: it's just a matter of choice.
>


I have been contributing to Wikimedia projects for ten years now. I
consider it an important movement to be involved in, exactly per your
arguments about openness and public involvement above. If openness is a
strength, then it follows that Wikimedia as a movement is stronger for
debate and dissent.

On a more personal level, I find the idea of free knowledge inspiring. At
every Wikimedia event I have attended, that excitement and the joy of
creation are in the air and communicate themselves. I relate to it, and
share in it. There are many Wikimedia content creators whose work I
admire and respect, and who have become friends.

But I don't share the quasi-religious zeal that seems to suffuse some of
the public discourse in 

Re: [Wikimedia-l] Footer link fix request

2015-12-18 Thread Richard Symonds
I've redirected the link to go to
https://meta.wikimedia.org/wiki/Mailing_lists/Guidelines :-)

Richard Symonds
Wikimedia UK
0207 065 0992

Wikimedia UK is a Company Limited by Guarantee registered in England and
Wales, Registered No. 6741827. Registered Charity No.1144513. Registered
Office 4th Floor, Development House, 56-64 Leonard Street, London EC2A 4LT.
United Kingdom. Wikimedia UK is the UK chapter of a global Wikimedia
movement. The Wikimedia projects are run by the Wikimedia Foundation (who
operate Wikipedia, amongst other projects).

*Wikimedia UK is an independent non-profit charity with no legal control
over Wikipedia nor responsibility for its contents.*

On 17 December 2015 at 18:37, Tito Dutta  wrote:

> I was reading a mail from this list where I saw this footer:
>
> > Wikimedia-l mailing list, guidelines at:
> > https://meta.wikimedia.org/wiki/Mailing_lists/Guidelines
> > Wikimedia-l@lists.wikimedia.org
> > <
> https://meta.wikimedia.org/wiki/Mailing_lists/guidelineswikimedi...@lists.wikimedia.org
> >
>
> The mailing list guidelines link redirects to a non-existent page.
> Perhaps this link may be used:
> https://meta.wikimedia.org/wiki/Mailing_lists/Guidelines ?


Re: [Wikimedia-l] Quality issues

2015-12-18 Thread Gerard Meijssen
Hoi,
Andreas, you have a point. The point you make that Wikidata is only
considered for re-use is compelling. I edit a great deal, but I do NOT use
Wikidata itself to understand what data is there. It is a mess and not fit
for humans. This, however, need not be the case. Magnus created the
"Reasonator" and it provides me with an environment that helps me
understand what data is available. It makes information out of data, and it
is actionable in many ways.

It is not really hard to make a native Reasonator, and it will be usable in
any language as it is. It will make a big difference because it negates
the negative arguments that you make. It is imho the biggest hurdle for
Wikidata and it is totally unnecessary for the Wikidata team to persist in
their lack of a usable user interface. It is a matter of priority.

For your information, a German university is considering the use of
Wikidata, and for them a Reasonator-like interface that also allows them to
edit is what is missing for them to go ahead with Wikidata at this time.

They would use Wikidata for science, and for them the ability to link from
Wikidata to any and all other resources is a relevant consideration.

They are interested in sharing their data. They do not mind that it becomes
available under CC-0; what they look for is a best practice where their
data becomes available with a reference. We all agree that this IS a best
practice. They are equally interested in learning where Wikidata disagrees,
because to them it is a matter of quality to get things exactly right.
Thanks,
  GerardM


[Wikimedia-l] What Wikimedia Research is up to in the next quarter

2015-12-18 Thread Dario Taraborelli
Hey all,

I’m glad to announce that the Wikimedia Research team’s goals for the next
quarter (January - March 2016) are up on wiki.

The Research and Data team will continue to work with our volunteers and
collaborators on revision scoring as a service, adding support for 5 new
languages and prototyping new models (including an edit type classifier).
We will also continue to iterate on the design of article creation
recommendations, running a dedicated campaign in coordination with existing
editathons to improve the quality of these recommendations. Finally, we will
extend a research project we started in November aimed at understanding the
behavior of Wikipedia readers, by combining qualitative survey data with
behavioral analysis from our HTTP request logs.

The Design Research team will conduct an in-depth study of user needs
(particularly readers) on the ground in February. We will continue to work
with other Wikimedia Engineering teams throughout the quarter to ensure the
adoption of human-centered design principles and pragmatic personas in our
product development cycle. We’re also excited to start a collaboration with
students at the University of Washington to understand what free online
information resources (including, but not limited to, Wikimedia projects)
students use.

I am also glad to report that two papers on link and article
recommendations (the result of a formal collaboration with a team at
Stanford) were accepted for presentation at WSDM '16 and WWW ’16 (preprints
will be made available shortly). An overview on revision scoring as a
service was published a few weeks ago on the Wikimedia blog, and got some
good media coverage.

We're constantly looking for contributors and as usual we welcome feedback
on these projects via the corresponding talk pages on Meta. You can contact
us for any question on IRC via the #wikimedia-research channel and follow
@WikiResearch on Twitter for the latest
Wikipedia and Wikimedia research updates hot off the press.

Wishing you all happy holidays,

Dario and Abbey on behalf of the team


*Dario Taraborelli*
Head of Research, Wikimedia Foundation
wikimediafoundation.org • nitens.org • @readermeter
