Re: [Wikimedia-l] Quality issues

...and you seem to think one can live by an encyclopedia. I can assure you,
Wikipedia is a lot of things, but it is not a way of life. To answer your
fear which I read between the lines of what you are saying, in order to
create a Wikipedia project you need a basic list of 10,000 articles. The
list, as I am sure you are aware, is a pretty boring and strangely ordered
grouping of fairly dry, non-political subjects. I believe there are very
few articles on there that are worth firebombing someone over. [[Michael
Jackson]] is on the list, among other notable Americans. Granted, you could
get past the 10,000 article startup requirement somehow and then start
creating lots of POV articles, but once you do this you will soon be
discovered. There is just no way to hide it.

On Tue, Dec 29, 2015 at 3:18 PM, Andreas Kolbe wrote:

> On Tue, Dec 29, 2015 at 10:44 AM, Lilburne 
> wrote:
>
> > On 28/12/2015 18:00, Jane Darnell wrote:
> >
> >> All I said is that the wiki way works, that's all. You can't hide it
> when
> >> someone tries to take over a project, and that is the reason we
> shouldn't
> >> try to anticipate that with convoluted strategies. "Assume Good Faith"
> >> will
> >> always win out over any strange misguided takeover strategy, which is
> why
> >> governments that intend to do such things choose nowadays to just block
> >> wikimedia altogether. It is not our wake-up call to take, but that of
> the
> >> Kazakh people.
> >>
> >>
> > Facebook showed the other year that it could manipulate people by what it
> > showed them in their feeds.
> >
> >
> http://www.telegraph.co.uk/technology/facebook/10932534/Facebook-conducted-secret-psychology-experiment-on-users-emotions.html
> > http://www.bbc.co.uk/news/technology-28051930
> >
> > They didn't do this for fun, they did it to show their clients
> > (advertisers, governments) that they could manipulate millions of people.
> > You only need a small push in one direction or another to influence a
> large
> > population. Doesn't matter if the push is to buy a particular soap, vote
> > one way or another, or how you see a particular minority, or issue.
> >
> >
> http://www.networkworld.com/article/2450825/big-data-business-intelligence/facebooks-icky-psychology-experiment-is-actually-business-as-usual.html
> >
> > Do it to a naively trusted source and you have a triple word score
> > jackpot^H^H^Hboot.
>
>
>
> I thought Epstein's and Robertson's paper, "The search engine manipulation
> effect (SEME) and its possible impact on the outcomes of elections", was
> very interesting as well:
>
>
> http://www.politico.com/magazine/story/2015/08/how-google-could-rig-the-2016-election-121548
>
> http://www.pnas.org/content/112/33/E4512.abstract
>
>
> On Mon, Dec 28, 2015 at 7:43 PM, Jane Darnell  wrote:
>
> > Well the chances of me being firebombed while on vacation in the states
> are
> > probably higher than me being firebombed for editing Wikipedia, but that
> > still doesn't mean we need to worry about changing the wiki model. I
> guess
> > I have lost the thread of your point entirely now.
>
>
>
> To be honest, I don't think you had ever gotten hold of it in the first
> place. To me, you seem to live in a very sheltered and naive world.
>
> If we have reports of Wikipedians being tortured in Azerbaijan (and there
> seems to have been some truth to these reports, as the sysop named in them
> was globally blocked by the WMF a short while later[1]), you should be able
> to understand that it is not quite as easy to live the wiki way there as it
> is in your country, and that some of the assumptions you have formed based
> on your own experiences of the wiki model may not hold in other locales.
>
> [1]
>
> https://meta.wikimedia.org/w/index.php?title=User:Irada=12421543=7322889
> ___
> Wikimedia-l mailing list, guidelines at:
> https://meta.wikimedia.org/wiki/Mailing_lists/Guidelines
> New messages to: Wikimedia-l@lists.wikimedia.org
> Unsubscribe: https://lists.wikimedia.org/mailman/listinfo/wikimedia-l,
> 

Re: [Wikimedia-l] Quality issues

Interesting link, thanks Gerard! I was referring to a citation for this
quote however:
"and a significant selection of the information unsourced WikiDatas data
lacks the quality, integrity we all expect of ourselves when we add content
to any of the projects."

On Tue, Dec 29, 2015 at 1:35 PM, Gerard Meijssen wrote:

>
> http://www.amnesty.nl/sites/default/files/public/ainl_guidelines_use_of_force.pdf
>
> On 29 December 2015 at 13:30, Jane Darnell  wrote:
>
> > citation needed
> >
> > On Tue, Dec 29, 2015 at 1:27 PM, Gnangarra  wrote:
> >
> > > >
> > > > This is when sources truly become vital. But do
> > > > remember, the POV of the USA and many of its sources are as suspect
> as
> > > > those from Kazakhstan.
> > >
> > >
> > > And that is why regardless of the fact a citation  is so important, 
> > >
> > > because the person receiving the information must able to make their
> own
> > > assessment of the sources reliability with a CC0 license and a
> > significant
> > > selection of the information unsourced WikiDatas data lacks the
> quality,
> > > integrity we all expect of ourselves when we add content to any of the
> > > projects.
> > >
> > > On 29 December 2015 at 20:15, Gerard Meijssen <
> gerard.meijs...@gmail.com
> > >
> > > wrote:
> > >
> > > > Hoi,
> > > > So you have determined that people can be manipulated. Good, then
> what?
> > > >
> > > > If this is the tack that you take you will be grounded because there
> is
> > > no
> > > > plan. It is a negative attitude that only stifles. Quality is not
> only
> > in
> > > > sources, sources can be and are manipulations in their own right.
> Many
> > > > important subjects are woefully underrepresented. The argument has it
> > > that
> > > > > it is because of a lack of sources.
> > > >
> > > > Sources are relevant but we only are interested in particular
> subjects.
> > > We
> > > > > do not need to look at Kazakhstan to find fault. Amnesty (reliable
> > source)
> > > > indicates that all USA police forces are not in compliance with
> > > > international agreements on the use of force. NOW WHAT ??
> > > >
> > > > When quality is the subject, it is important to decide how we
> > effectively
> > > > improve quality. VIAF provided Wikidata with a list of issues they
> > found.
> > > > Tom checked it out and our quality is better as a result. It means
> that
> > > > more information is linked for people who visit a library. When
> awards
> > > are
> > > > known, adding known recipients in Wikidata based on info from
> multiple
> > > > Wikipedias improves the quality and in this way many incorrect links
> > are
> > > > exposed.
> > > >
> > > > When quality of our projects is the subject, decide how we can do a
> > > better
> > > > job. When Facebook invites companies to manipulate people, it is why
> > > > Facebook information is suspect. At most it is a reminder that
> > > manipulation
> > > > is an important issue. It does not mean that people cannot add data
> on
> > > > their hobby horse.
> > > >
> > > > Quality is important but quality is more than sources. When sources
> are
> > > > used as an argument that is detrimental to the quality of Wikidata,
> > then
> > > in
> > > > my opinion we have forgotten why Wikipedia was possible in the first
> > > place.
> > > > It was not because of sources, it was because of the web of
> information
> > > we
> > > > created, a web that is of a NPOV.
> > > >
> > > > Wikidata does not have a NPOV. It represents facts found in many
> > places.
> > > As
> > > > the information becomes more extended, it becomes possible to find
> > > > manipulations, errors. This is when sources truly become vital. But
> do
> > > > remember, the POV of the USA and many of its sources are as suspect
> as
> > > > those from Kazakhstan.
> > > > Thanks,
> > > >  GerardM
> > > >
> > > > On 29 December 2015 at 11:44, Lilburne  >
> > > > wrote:
> > > >
> > > > > On 28/12/2015 18:00, Jane Darnell wrote:
> > > > >
> > > > >> All I said is that the wiki way works, that's all. You can't hide
> it
> > > > when
> > > > >> someone tries to take over a project, and that is the reason we
> > > > shouldn't
> > > > >> try to anticipate that with convoluted strategies. "Assume Good
> > Faith"
> > > > >> will
> > > > >> always win out over any strange misguided takeover strategy, which
> > is
> > > > why
> > > > >> governments that intend to do such things choose nowadays to just
> > > block
> > > > >> wikimedia altogether. It is not our wake-up call to take, but that
> > of
> > > > the
> > > > >> Kazakh people.
> > > > >>
> > > > >>
> > > > > Facebook showed the other year that it could manipulate people by
> > what
> > > it
> > > > > showed them in their feeds.
> > > > >
> > > > >
> > > >
> > >
> >
> http://www.telegraph.co.uk/technology/facebook/10932534/Facebook-conducted-secret-psychology-experiment-on-users-emotions.html
> > > > >

Re: [Wikimedia-l] Quality issues

Well I may live in a fantasy world, but that is entirely beside the point.
When I say these things will be discovered, that's exactly what you are
saying happened years ago. These things will always be discovered, because
they are unhidable. In your example the Uzbek Wikipedians have learned to
stay off certain pages in order to coexist with Uzbek authorities. Similar
coping strategies exist on other projects. It doesn't mean the entire Uzbek
encyclopedia is untrustworthy or that the wiki model is at fault. The trail
of tears is in the talk pages. I don't see anything wrong with making such
concessions, since after discovery it becomes public record and everyone
knows it anyway. What I don't understand is what you are trying to say. If
you are proposing something, just come out and propose it instead of
complaining about what goes on in certain projects and jumping from one
scare tactic to another.

On Tue, Dec 29, 2015 at 9:06 PM, Andreas Kolbe wrote:

> On Tue, Dec 29, 2015 at 5:39 PM, Jane Darnell  wrote:
>
> > Granted, you could
> > get past the 10,000 article startup requirement somehow and then start
> > creating lots of POV articles, but once you do this you will soon be
> > discovered. There is just no way to hide it.
>
>
>
> Jane, you're living in a fantasy world. We already have Wikipedias with
> these POV articles. They've been "discovered" long ago, and it makes zero
> difference.
>
> See e.g. the hagiography of the Uzbek President in the Uzbek Wikipedia[1]
> (him of the boiled dissidents). It hails him as the best thing since sliced
> bread.
>
> Then see what Human Rights organisations have to say about his regime[2],
> or compare the English Wikipedia article.[3]
>
> That train left the station a long time ago. The wiki model does *not* work
> in these contexts.
>
> [1]
>
> https://translate.google.com/translate?hl=en=uz=en=https%3A%2F%2Fuz.wikipedia.org%2Fwiki%2FIslom_Karimov=1
> [2] https://www.hrw.org/europe/central-asia/uzbekistan
> [3]
> https://en.wikipedia.org/wiki/Islam_Karimov#Human_rights_and_press_freedom
>
>
> >
> > On Tue, Dec 29, 2015 at 3:18 PM, Andreas Kolbe 
> wrote:
> >
> > > On Tue, Dec 29, 2015 at 10:44 AM, Lilburne <
> lilbu...@tygers-of-wrath.net
> > >
> > > wrote:
> > >
> > > > On 28/12/2015 18:00, Jane Darnell wrote:
> > > >
> > > >> All I said is that the wiki way works, that's all. You can't hide it
> > > when
> > > >> someone tries to take over a project, and that is the reason we
> > > shouldn't
> > > >> try to anticipate that with convoluted strategies. "Assume Good
> Faith"
> > > >> will
> > > >> always win out over any strange misguided takeover strategy, which
> is
> > > why
> > > >> governments that intend to do such things choose nowadays to just
> > block
> > > >> wikimedia altogether. It is not our wake-up call to take, but that
> of
> > > the
> > > >> Kazakh people.
> > > >>
> > > >>
> > > > Facebook showed the other year that it could manipulate people by
> what
> > it
> > > > showed them in their feeds.
> > > >
> > > >
> > >
> >
> http://www.telegraph.co.uk/technology/facebook/10932534/Facebook-conducted-secret-psychology-experiment-on-users-emotions.html
> > > > http://www.bbc.co.uk/news/technology-28051930
> > > >
> > > > They didn't do this for fun, they did it to show their clients
> > > > (advertisers, governments) that they could manipulate millions of
> > people.
> > > > You only need a small push in one direction or another to influence a
> > > large
> > > > population. Doesn't matter if the push is to buy a particular soap,
> > vote
> > > > one way or another, or how you see a particular minority, or issue.
> > > >
> > > >
> > >
> >
> http://www.networkworld.com/article/2450825/big-data-business-intelligence/facebooks-icky-psychology-experiment-is-actually-business-as-usual.html
> > > >
> > > > Do it to a naively trusted source and you have a triple word score
> > > > jackpot^H^H^Hboot.
> > >
> > >
> > >
> > > I thought Epstein's and Robertson's paper, "The search engine
> > manipulation
> > > effect (SEME) and its possible impact on the outcomes of elections",
> was
> > > very interesting as well:
> > >
> > >
> > >
> >
> http://www.politico.com/magazine/story/2015/08/how-google-could-rig-the-2016-election-121548
> > >
> > > http://www.pnas.org/content/112/33/E4512.abstract
> > >
> > >
> > > On Mon, Dec 28, 2015 at 7:43 PM, Jane Darnell 
> wrote:
> > >
> > > > Well the chances of me being firebombed while on vacation in the
> states
> > > are
> > > > probably higher than me being firebombed for editing Wikipedia, but
> > that
> > > > still doesn't mean we need to worry about changing the wiki model. I
> > > guess
> > > > I have lost the thread of your point entirely now.
> > >
> > >
> > >
> > > To be honest, I don't think you had ever gotten hold of it in the first
> > > place. To me, you seem to live in a very sheltered and naive world.
> > >
> > > If we have

Re: [Wikimedia-l] Quality issues

2015-12-29 Thread Andreas Kolbe

On Tue, Dec 29, 2015 at 5:39 PM, Jane Darnell wrote:

> Granted, you could
> get past the 10,000 article startup requirement somehow and then start
> creating lots of POV articles, but once you do this you will soon be
> discovered. There is just no way to hide it.



Jane, you're living in a fantasy world. We already have Wikipedias with
these POV articles. They've been "discovered" long ago, and it makes zero
difference.

See e.g. the hagiography of the Uzbek President in the Uzbek Wikipedia[1]
(him of the boiled dissidents). It hails him as the best thing since sliced
bread.

Then see what Human Rights organisations have to say about his regime[2],
or compare the English Wikipedia article.[3]

That train left the station a long time ago. The wiki model does *not* work
in these contexts.

[1]
https://translate.google.com/translate?hl=en=uz=en=https%3A%2F%2Fuz.wikipedia.org%2Fwiki%2FIslom_Karimov=1
[2] https://www.hrw.org/europe/central-asia/uzbekistan
[3]
https://en.wikipedia.org/wiki/Islam_Karimov#Human_rights_and_press_freedom


>
> On Tue, Dec 29, 2015 at 3:18 PM, Andreas Kolbe  wrote:
>
> > On Tue, Dec 29, 2015 at 10:44 AM, Lilburne  >
> > wrote:
> >
> > > On 28/12/2015 18:00, Jane Darnell wrote:
> > >
> > >> All I said is that the wiki way works, that's all. You can't hide it
> > when
> > >> someone tries to take over a project, and that is the reason we
> > shouldn't
> > >> try to anticipate that with convoluted strategies. "Assume Good Faith"
> > >> will
> > >> always win out over any strange misguided takeover strategy, which is
> > why
> > >> governments that intend to do such things choose nowadays to just
> block
> > >> wikimedia altogether. It is not our wake-up call to take, but that of
> > the
> > >> Kazakh people.
> > >>
> > >>
> > > Facebook showed the other year that it could manipulate people by what
> it
> > > showed them in their feeds.
> > >
> > >
> >
> http://www.telegraph.co.uk/technology/facebook/10932534/Facebook-conducted-secret-psychology-experiment-on-users-emotions.html
> > > http://www.bbc.co.uk/news/technology-28051930
> > >
> > > They didn't do this for fun, they did it to show their clients
> > > (advertisers, governments) that they could manipulate millions of
> people.
> > > You only need a small push in one direction or another to influence a
> > large
> > > population. Doesn't matter if the push is to buy a particular soap,
> vote
> > > one way or another, or how you see a particular minority, or issue.
> > >
> > >
> >
> http://www.networkworld.com/article/2450825/big-data-business-intelligence/facebooks-icky-psychology-experiment-is-actually-business-as-usual.html
> > >
> > > Do it to a naively trusted source and you have a triple word score
> > > jackpot^H^H^Hboot.
> >
> >
> >
> > I thought Epstein's and Robertson's paper, "The search engine
> manipulation
> > effect (SEME) and its possible impact on the outcomes of elections", was
> > very interesting as well:
> >
> >
> >
> http://www.politico.com/magazine/story/2015/08/how-google-could-rig-the-2016-election-121548
> >
> > http://www.pnas.org/content/112/33/E4512.abstract
> >
> >
> > On Mon, Dec 28, 2015 at 7:43 PM, Jane Darnell  wrote:
> >
> > > Well the chances of me being firebombed while on vacation in the states
> > are
> > > probably higher than me being firebombed for editing Wikipedia, but
> that
> > > still doesn't mean we need to worry about changing the wiki model. I
> > guess
> > > I have lost the thread of your point entirely now.
> >
> >
> >
> > To be honest, I don't think you had ever gotten hold of it in the first
> > place. To me, you seem to live in a very sheltered and naive world.
> >
> > If we have reports of Wikipedians being tortured in Azerbaijan (and there
> > seems to have been some truth to these reports, as the sysop named in
> them
> > was globally blocked by the WMF a short while later[1]), you should be
> able
> > to understand that it is not quite as easy to live the wiki way there as
> it
> > is in your country, and that some of the assumptions you have formed
> based
> > on your own experiences of the wiki model may not hold in other locales.
> >
> > [1]
> >
> >
> https://meta.wikimedia.org/w/index.php?title=User:Irada=12421543=7322889

Re: [Wikimedia-l] Quality issues

2015-12-29 Thread Gnangarra

>
> This is when sources truly become vital. But do
> remember, the POV of the USA and many of its sources are as suspect as
> those from Kazakhstan.


And that is why, regardless of that fact, a citation is so important: the
person receiving the information must be able to make their own assessment
of the source's reliability. With a CC0 license and a significant selection
of the information unsourced, Wikidata's data lacks the quality and
integrity we all expect of ourselves when we add content to any of the
projects.

On 29 December 2015 at 20:15, Gerard Meijssen wrote:

> Hoi,
> So you have determined that people can be manipulated. Good, then what?
>
> If this is the tack that you take you will be grounded because there is no
> plan. It is a negative attitude that only stifles. Quality is not only in
> sources, sources can be and are manipulations in their own right. Many
> important subjects are woefully underrepresented. The argument has it that
> it is because of a lack of sources.
>
> Sources are relevant but we only are interested in particular subjects. We
> do not need to look at Kazakhstan to find fault. Amnesty (reliable source)
> indicates that all USA police forces are not in compliance with
> international agreements on the use of force. NOW WHAT ??
>
> When quality is the subject, it is important to decide how we effectively
> improve quality. VIAF provided Wikidata with a list of issues they found.
> Tom checked it out and our quality is better as a result. It means that
> more information is linked for people who visit a library. When awards are
> known, adding known recipients in Wikidata based on info from multiple
> Wikipedias improves the quality and in this way many incorrect links are
> exposed.
>
> When quality of our projects is the subject, decide how we can do a better
> job. When Facebook invites companies to manipulate people, it is why
> Facebook information is suspect. At most it is a reminder that manipulation
> is an important issue. It does not mean that people cannot add data on
> their hobby horse.
>
> Quality is important but quality is more than sources. When sources are
> used as an argument that is detrimental to the quality of Wikidata, then in
> my opinion we have forgotten why Wikipedia was possible in the first place.
> It was not because of sources, it was because of the web of information we
> created, a web that is of a NPOV.
>
> Wikidata does not have a NPOV. It represents facts found in many places. As
> the information becomes more extended, it becomes possible to find
> manipulations, errors. This is when sources truly become vital. But do
> remember, the POV of the USA and many of its sources are as suspect as
> those from Kazakhstan.
> Thanks,
>  GerardM
>
> On 29 December 2015 at 11:44, Lilburne 
> wrote:
>
> > On 28/12/2015 18:00, Jane Darnell wrote:
> >
> >> All I said is that the wiki way works, that's all. You can't hide it
> when
> >> someone tries to take over a project, and that is the reason we
> shouldn't
> >> try to anticipate that with convoluted strategies. "Assume Good Faith"
> >> will
> >> always win out over any strange misguided takeover strategy, which is
> why
> >> governments that intend to do such things choose nowadays to just block
> >> wikimedia altogether. It is not our wake-up call to take, but that of
> the
> >> Kazakh people.
> >>
> >>
> > Facebook showed the other year that it could manipulate people by what it
> > showed them in their feeds.
> >
> >
> http://www.telegraph.co.uk/technology/facebook/10932534/Facebook-conducted-secret-psychology-experiment-on-users-emotions.html
> > http://www.bbc.co.uk/news/technology-28051930
> >
> > They didn't do this for fun, they did it to show their clients
> > (advertisers, governments) that they could manipulate millions of people.
> > You only need a small push in one direction or another to influence a
> large
> > population. Doesn't matter if the push is to buy a particular soap, vote
> > one way or another, or how you see a particular minority, or issue.
> >
> >
> http://www.networkworld.com/article/2450825/big-data-business-intelligence/facebooks-icky-psychology-experiment-is-actually-business-as-usual.html
> >
> > Do it to a naively trusted source and you have a triple word score
> > jackpot^H^H^Hboot.
> >
> >
> >

Re: [Wikimedia-l] Quality issues

2015-12-29 Thread Gerard Meijssen

http://www.amnesty.nl/sites/default/files/public/ainl_guidelines_use_of_force.pdf

On 29 December 2015 at 13:30, Jane Darnell wrote:

> citation needed
>
> On Tue, Dec 29, 2015 at 1:27 PM, Gnangarra  wrote:
>
> > >
> > > This is when sources truly become vital. But do
> > > remember, the POV of the USA and many of its sources are as suspect as
> > > those from Kazakhstan.
> >
> >
> > And that is why regardless of the fact a citation  is so important, 
> >
> > because the person receiving the information must able to make their own
> > assessment of the sources reliability with a CC0 license and a
> significant
> > selection of the information unsourced WikiDatas data lacks the quality,
> > integrity we all expect of ourselves when we add content to any of the
> > projects.
> >
> > On 29 December 2015 at 20:15, Gerard Meijssen  >
> > wrote:
> >
> > > Hoi,
> > > So you have determined that people can be manipulated. Good, then what?
> > >
> > > If this is the tack that you take you will be grounded because there is
> > no
> > > plan. It is a negative attitude that only stifles. Quality is not only
> in
> > > sources, sources can be and are manipulations in their own right. Many
> > > important subjects are woefully underrepresented. The argument has it
> > that
> > > it is because of a lack of sources.
> > >
> > > Sources are relevant but we only are interested in particular subjects.
> > We
> > > do not need to look at Kazakhstan to find fault. Amnesty (reliable
> source)
> > > indicates that all USA police forces are not in compliance with
> > > international agreements on the use of force. NOW WHAT ??
> > >
> > > When quality is the subject, it is important to decide how we
> effectively
> > > improve quality. VIAF provided Wikidata with a list of issues they
> found.
> > > Tom checked it out and our quality is better as a result. It means that
> > > more information is linked for people who visit a library. When awards
> > are
> > > known, adding known recipients in Wikidata based on info from multiple
> > > Wikipedias improves the quality and in this way many incorrect links
> are
> > > exposed.
> > >
> > > When quality of our projects is the subject, decide how we can do a
> > better
> > > job. When Facebook invites companies to manipulate people, it is why
> > > Facebook information is suspect. At most it is a reminder that
> > manipulation
> > > is an important issue. It does not mean that people cannot add data on
> > > their hobby horse.
> > >
> > > Quality is important but quality is more than sources. When sources are
> > > used as an argument that is detrimental to the quality of Wikidata,
> then
> > in
> > > my opinion we have forgotten why Wikipedia was possible in the first
> > place.
> > > It was not because of sources, it was because of the web of information
> > we
> > > created, a web that is of a NPOV.
> > >
> > > Wikidata does not have a NPOV. It represents facts found in many
> places.
> > As
> > > the information becomes more extended, it becomes possible to find
> > > manipulations, errors. This is when sources truly become vital. But do
> > > remember, the POV of the USA and many of its sources are as suspect as
> > > those from Kazakhstan.
> > > Thanks,
> > >  GerardM
> > >
> > > On 29 December 2015 at 11:44, Lilburne 
> > > wrote:
> > >
> > > > On 28/12/2015 18:00, Jane Darnell wrote:
> > > >
> > > >> All I said is that the wiki way works, that's all. You can't hide it
> > > when
> > > >> someone tries to take over a project, and that is the reason we
> > > shouldn't
> > > >> try to anticipate that with convoluted strategies. "Assume Good
> Faith"
> > > >> will
> > > >> always win out over any strange misguided takeover strategy, which
> is
> > > why
> > > >> governments that intend to do such things choose nowadays to just
> > block
> > > >> wikimedia altogether. It is not our wake-up call to take, but that
> of
> > > the
> > > >> Kazakh people.
> > > >>
> > > >>
> > > > Facebook showed the other year that it could manipulate people by
> what
> > it
> > > > showed them in their feeds.
> > > >
> > > >
> > >
> >
> http://www.telegraph.co.uk/technology/facebook/10932534/Facebook-conducted-secret-psychology-experiment-on-users-emotions.html
> > > > http://www.bbc.co.uk/news/technology-28051930
> > > >
> > > > They didn't do this for fun, they did it to show their clients
> > > > (advertisers, governments) that they could manipulate millions of
> > people.
> > > > You only need a small push in one direction or another to influence a
> > > large
> > > > population. Doesn't matter if the push is to buy a particular soap,
> > vote
> > > > one way or another, or how you see a particular minority, or issue.
> > > >
> > > >
> > >
> >
> http://www.networkworld.com/article/2450825/big-data-business-intelligence/facebooks-icky-psychology-experiment-is-actually-business-as-usual.html
> >

Re: [Wikimedia-l] Quality issues

2015-12-29 Thread Gerard Meijssen

Hoi,
That is a circular argument.
Thanks,
 GerardM

On 29 December 2015 at 13:33, Gnangarra wrote:

> no I agree quality is more than just the sources, but without sources
> quality cannot be achieved
>
> On 29 December 2015 at 20:29, Gerard Meijssen 
> wrote:
>
> > Hoi,
> > You do not get the point or you deliberately distort it. The point is
> that
> > quality is not sources. Quality is more than that.
> > Thanks,
> >GerardM
> >
> > On 29 December 2015 at 13:27, Gnangarra  wrote:
> >
> > > >
> > > > This is when sources truly become vital. But do
> > > > remember, the POV of the USA and many of its sources are as suspect
> as
> > > > those from Kazakhstan.
> > >
> > >
> > > And that is why regardless of the fact a citation  is so important, 
> > >
> > > because the person receiving the information must able to make their
> own
> > > assessment of the sources reliability with a CC0 license and a
> > significant
> > > selection of the information unsourced WikiDatas data lacks the
> quality,
> > > integrity we all expect of ourselves when we add content to any of the
> > > projects.
> > >
> > > On 29 December 2015 at 20:15, Gerard Meijssen <
> gerard.meijs...@gmail.com
> > >
> > > wrote:
> > >
> > > > Hoi,
> > > > So you have determined that people can be manipulated. Good, then
> what?
> > > >
> > > > If this is the tack that you take you will be grounded because there
> is
> > > no
> > > > plan. It is a negative attitude that only stifles. Quality is not
> only
> > in
> > > > sources, sources can be and are manipulations in their own right.
> Many
> > > > important subjects are woefully underrepresented. The argument has it
> > > that
> > > > > it is because of a lack of sources.
> > > >
> > > > Sources are relevant but we only are interested in particular
> subjects.
> > > We
> > > > > do not need to look at Kazakhstan to find fault. Amnesty (reliable
> > source)
> > > > indicates that all USA police forces are not in compliance with
> > > > international agreements on the use of force. NOW WHAT ??
> > > >
> > > > When quality is the subject, it is important to decide how we
> > effectively
> > > > improve quality. VIAF provided Wikidata with a list of issues they
> > found.
> > > > Tom checked it out and our quality is better as a result. It means
> that
> > > > more information is linked for people who visit a library. When
> awards
> > > are
> > > > known, adding known recipients in Wikidata based on info from
> multiple
> > > > Wikipedias improves the quality and in this way many incorrect links
> > are
> > > > exposed.
> > > >
> > > > When quality of our projects is the subject, decide how we can do a
> > > better
> > > > job. When Facebook invites companies to manipulate people, it is why
> > > > Facebook information is suspect. At most it is a reminder that
> > > manipulation
> > > > is an important issue. It does not mean that people cannot add data
> on
> > > > their hobby horse.
> > > >
> > > > Quality is important but quality is more than sources. When sources
> are
> > > > used as an argument that is detrimental to the quality of Wikidata,
> > then
> > > in
> > > > my opinion we have forgotten why Wikipedia was possible in the first
> > > place.
> > > > It was not because of sources, it was because of the web of
> information
> > > we
> > > > created, a web that is of a NPOV.
> > > >
> > > > Wikidata does not have a NPOV. It represents facts found in many
> > places.
> > > As
> > > > the information becomes more extended, it becomes possible to find
> > > > manipulations, errors. This is when sources truly become vital. But
> do
> > > > remember, the POV of the USA and many of its sources are as suspect
> as
> > > > those from Kazakhstan.
> > > > Thanks,
> > > >  GerardM
> > > >
> > > > On 29 December 2015 at 11:44, Lilburne  >
> > > > wrote:
> > > >
> > > > > On 28/12/2015 18:00, Jane Darnell wrote:
> > > > >
> > > > >> All I said is that the wiki way works, that's all. You can't hide
> it
> > > > when
> > > > >> someone tries to take over a project, and that is the reason we
> > > > shouldn't
> > > > >> try to anticipate that with convoluted strategies. "Assume Good
> > Faith"
> > > > >> will
> > > > >> always win out over any strange misguided takeover strategy, which
> > is
> > > > why
> > > > >> governments that intend to do such things choose nowadays to just
> > > block
> > > > >> wikimedia altogether. It is not our wake-up call to take, but that
> > of
> > > > the
> > > > >> Kazakh people.
> > > > >>
> > > > >>
> > > > > Facebook showed the other year that it could manipulate people by
> > what
> > > it
> > > > > showed them in their feeds.
> > > > >
> > > > >
> > > >
> > >
> >
> http://www.telegraph.co.uk/technology/facebook/10932534/Facebook-conducted-secret-psychology-experiment-on-users-emotions.html
> > > > > http://www.bbc.co.uk/news/technology-28051930
> > > >

Re: [Wikimedia-l] Quality issues

2015-12-29 Thread Andreas Kolbe

On Tue, Dec 29, 2015 at 10:44 AM, Lilburne 
wrote:

> On 28/12/2015 18:00, Jane Darnell wrote:
>
>> All I said is that the wiki way works, that's all. You can't hide it when
>> someone tries to take over a project, and that is the reason we shouldn't
>> try to anticipate that with convoluted strategies. "Assume Good Faith"
>> will
>> always win out over any strange misguided takeover strategy, which is why
>> governments that intend to do such things choose nowadays to just block
>> wikimedia altogether. It is not our wake-up call to take, but that of the
>> Kazakh people.
>>
>>
> Facebook showed the other year that it could manipulate people by what it
> showed them in their feeds.
>
> http://www.telegraph.co.uk/technology/facebook/10932534/Facebook-conducted-secret-psychology-experiment-on-users-emotions.html
> http://www.bbc.co.uk/news/technology-28051930
>
> They didn't do this for fun, they did it to show their clients
> (advertisers, governments) that they could manipulate millions of people.
> You only need a small push in one direction or another to influence a large
> population. Doesn't matter if the push is to buy a particular soap, vote
> one way or another, or how you see a particular minority, or issue.
>
> http://www.networkworld.com/article/2450825/big-data-business-intelligence/facebooks-icky-psychology-experiment-is-actually-business-as-usual.html
>
> Do it to a naively trusted source and you have a triple word score
> jackpot^H^H^Hboot.



I thought Epstein's and Robertson's paper, "The search engine manipulation
effect (SEME) and its possible impact on the outcomes of elections", was
very interesting as well:

http://www.politico.com/magazine/story/2015/08/how-google-could-rig-the-2016-election-121548

http://www.pnas.org/content/112/33/E4512.abstract


On Mon, Dec 28, 2015 at 7:43 PM, Jane Darnell  wrote:

> Well the chances of me being firebombed while on vacation in the states are
> probably higher than me being firebombed for editing Wikipedia, but that
> still doesn't mean we need to worry about changing the wiki model. I guess
> I have lost the thread of your point entirely now.



To be honest, I don't think you had ever gotten hold of it in the first
place. To me, you seem to live in a very sheltered and naive world.

If we have reports of Wikipedians being tortured in Azerbaijan (and there
seems to have been some truth to these reports, as the sysop named in them
was globally blocked by the WMF a short while later[1]), you should be able
to understand that it is not quite as easy to live the wiki way there as it
is in your country, and that some of the assumptions you have formed based
on your own experiences of the wiki model may not hold in other locales.

[1]
https://meta.wikimedia.org/w/index.php?title=User:Irada=12421543=7322889
___
Wikimedia-l mailing list, guidelines at: 
https://meta.wikimedia.org/wiki/Mailing_lists/Guidelines
New messages to: Wikimedia-l@lists.wikimedia.org
Unsubscribe: https://lists.wikimedia.org/mailman/listinfo/wikimedia-l

Re: [Wikimedia-l] Quality issues

citation needed

On Tue, Dec 29, 2015 at 1:27 PM, Gnangarra  wrote:

> >
> > This is when sources truly become vital. But do
> > remember, the POV of the USA and many of its sources are as suspect as
> > those from Kazakhstan.
>
>
> And that is why regardless of the fact a citation  is so important, 
>
> because the person receiving the information must be able to make their own
> assessment of the source's reliability. With a CC0 license and a significant
> selection of the information unsourced, Wikidata's data lacks the quality and
> integrity we all expect of ourselves when we add content to any of the
> projects.
>

Re: [Wikimedia-l] Quality issues

2015-12-29 Thread Gnangarra

no I agree quality is more than just the sources, but without sources
quality cannot be achieved

On 29 December 2015 at 20:29, Gerard Meijssen 
wrote:

> Hoi,
> You do not get the point or you deliberately distort it. The point is that
> quality is not sources. Quality is more than that.
> Thanks,
>GerardM
>
> On 29 December 2015 at 13:27, Gnangarra  wrote:
>
> > >
> > > This is when sources truly become vital. But do
> > > remember, the POV of the USA and many of its sources are as suspect as
> > > those from Kazakhstan.
> >
> >
> > And that is why regardless of the fact a citation  is so important, 
> >
> > because the person receiving the information must be able to make their own
> > assessment of the source's reliability. With a CC0 license and a significant
> > selection of the information unsourced, Wikidata's data lacks the quality and
> > integrity we all expect of ourselves when we add content to any of the
> > projects.
> >

Re: [Wikimedia-l] Quality issues

2015-12-29 Thread Gerard Meijssen

Hoi,
So you have determined that people can be manipulated. Good, then what?

If this is the tack that you take you will be grounded because there is no
plan. It is a negative attitude that only stifles. Quality is not only in
sources, sources can be and are manipulations in their own right. Many
important subjects are woefully underrepresented. The argument has it that
it is because of a lack of sources.

Sources are relevant but we are only interested in particular subjects. We
do not need to look at Kazakhstan to find fault. Amnesty (reliable source)
indicates that all USA police forces are not in compliance with
international agreements on the use of force. NOW WHAT ??

When quality is the subject, it is important to decide how we effectively
improve quality. VIAF provided Wikidata with a list of issues they found.
Tom checked it out and our quality is better as a result. It means that
more information is linked for people who visit a library. When awards are
known, adding known recipients in Wikidata based on info from multiple
Wikipedias improves the quality and in this way many incorrect links are
exposed.

When quality of our projects is the subject, decide how we can do a better
job. When Facebook invites companies to manipulate people, it is why
Facebook information is suspect. At most it is a reminder that manipulation
is an important issue. It does not mean that people cannot add data on
their hobby horse.

Quality is important but quality is more than sources. When sources are
used as an argument that is detrimental to the quality of Wikidata, then in
my opinion we have forgotten why Wikipedia was possible in the first place.
It was not because of sources, it was because of the web of information we
created, a web that is of a NPOV.

Wikidata does not have a NPOV. It represents facts found in many places. As
the information becomes more extended, it becomes possible to find
manipulations, errors. This is when sources truly become vital. But do
remember, the POV of the USA and many of its sources are as suspect as
those from Kazakhstan.
Thanks,
 GerardM

On 29 December 2015 at 11:44, Lilburne  wrote:

> On 28/12/2015 18:00, Jane Darnell wrote:
>
>> All I said is that the wiki way works, that's all. You can't hide it when
>> someone tries to take over a project, and that is the reason we shouldn't
>> try to anticipate that with convoluted strategies. "Assume Good Faith"
>> will
>> always win out over any strange misguided takeover strategy, which is why
>> governments that intend to do such things choose nowadays to just block
>> wikimedia altogether. It is not our wake-up call to take, but that of the
>> Kazakh people.
>>
>>
> Facebook showed the other year that it could manipulate people by what it
> showed them in their feeds.
>
> http://www.telegraph.co.uk/technology/facebook/10932534/Facebook-conducted-secret-psychology-experiment-on-users-emotions.html
> http://www.bbc.co.uk/news/technology-28051930
>
> They didn't do this for fun, they did it to show their clients
> (advertisers, governments) that they could manipulate millions of people.
> You only need a small push in one direction or another to influence a large
> population. Doesn't matter if the push is to buy a particular soap, vote
> one way or another, or how you see a particular minority, or issue.
>
> http://www.networkworld.com/article/2450825/big-data-business-intelligence/facebooks-icky-psychology-experiment-is-actually-business-as-usual.html
>
> Do it to a naively trusted source and you have a triple word score
> jackpot^H^H^Hboot.

Re: [Wikimedia-l] Quality issues

2015-12-28 Thread Andreas Kolbe

Pete,

Thanks. Comments interspersed below.

On Sat, Dec 26, 2015 at 5:46 PM, Pete Forsyth  wrote:

>
> I'd say the better question, is "what legal or moral right would we call
> upon to *insist* on having the same for Wikidata?" If we had a clear answer
> to that one, it would really move forward; but I don't think we do, or if
> we do, it's not yet clear to me.
>

The same as in the case of Wikipedia.

Is Wikidata different because it aspires to listing machine-readable facts
only, rather than written expositions? Not to my mind, because facts are
frequently debatable, and their presentation and sourcing involves choice
and expertise.

Moreover, speaking somewhat less seriously for a moment, Wikidata doesn't
actually just contain non-copyrightable facts. As we've seen, it contains
some of the same hoaxes and errors Wikipedia contains, which are by
definition creative. It's an entertaining fact that dictionary publishers
would in the past (perhaps they still do it now) include a small number of
hoax entries -- made-up words -- in their dictionaries, so they would be
able to demonstrate that another dictionary publisher had simply copied
their work. The Wikidata project is (involuntarily of course) doing the
same.

> No, and I should have been clearer -- I do see the general advantage in a
> site providing information about the source of information (of course).
> What I don't see is the advantage of requiring them to do so in a certain
> way.
>

Personally, I wouldn't insist on it being done in a certain way. I only
feel, very strongly, that having no information at all about the source of
information is very much undesirable, for the reasons previously mentioned
(data provenance, providing a bridge to potential users, etc.).

> I don't think Google or Bing aspires to having the highest standard of
> credibility. If they are useful, their business interests have been served,
> and I would hope that no student or academic would be able to cite the
> Google Knowledge Graph in a formal paper, any more than they could cite
> Wikipedia. (caveat emptor)
>

The problem with free information is that it displaces non-free
information, much like a cheaper product displaces a more expensive one.
We've seen this with Wikipedia replacing professionally published
encyclopedias.

Free information tends to become pervasive. This pervasiveness creates a
steady drip effect – if a certain item of information becomes ubiquitous,
so you see it in Google, in Bing, and elsewhere, you don't question it any
more after a while. And once information becomes unquestioned, it enters
more credible sources, because the authors of those are human, too. People
cannot be on their guard 24/7, questioning everything they see. This is how
citogenesis happens.

I'm currently thinking about the Kazakh Wikipedia again, as the topic has
(rightly) reappeared on Jimmy Wales' talk page.[1] It provides a good
example. I believe the reason the Kazakh dictatorship embraced Creative
Commons, releasing its Kazakh National Encyclopedia under a free licence so
its articles could be imported *en masse* into the Kazakh Wikipedia (by
editors incentivised by the chance to win laptops etc.), was because that
encyclopedia reflected the regime's political views and censorship
criteria. If you make your information ubiquitous, ensuring it appears
under different brand names, with its real provenance obscured, eventually
it will not be questioned any more.

The WMF allowed itself to be used there, enthusiastically so. To me it's
one of the most shameful episodes in its history.

> I believe in the agency of multiple people and entities in curating
> knowledge. Individuals, and individual information projects, should have
> the ability to make their own judgment about how much, and what kind, of
> citation is required for their purposes. I don't believe that information
> curation can be perfected by anticipating all needs in policy and legal
> documents.
>
> If our users have a moral or legal right that needs to be defended, we
> should do so. But I don't see one in this case (perhaps a clear
> hypothetical example could help?)
>

Some users certainly feel very strongly that they have moral rights they
would like to see upheld. See the discussion at
https://meta.wikimedia.org/wiki/Talk:Wikidata#Is_CC_the_right_license_for_data.3F
for examples.

> > See
> > https://meta.wikimedia.org/wiki/Wikilegal/Database_Rights#The_legal_definition_of_.E2.80.9Cdatabase.E2.80.9D
> >
> > So according to that page, created by Wikimedia legal staff, databases may
> > be protected even by US copyright law as "compilations". In the EU (is
> > Wikidata currently based in the EU, given that it's a Wikimedia Deutschland
> > project?) the protections are still more stringent. As I understand it, the
> > community as a whole holds the copyright, but you'd have to check with
> > Foundation legal staff or some other lawyer to be sure.

Re: [Wikimedia-l] Quality issues

If anything, the Kazakh thing just proves that the wiki model works. No
shame in that. It's probably why the Chinese are blocking Wikipedia and not
embracing it. You can't hide your propaganda, even from your own people.

As far as the compilation of Christmas songs goes, the list of songs is not
copyrightable, because the sort order of a list is not creative (unless
it's something that becomes poetry when you read the titles as a list).

Re: [Wikimedia-l] Quality issues

2015-12-28 Thread Andreas Kolbe

On Mon, Dec 28, 2015 at 12:40 PM, Jane Darnell  wrote:

> If anything, the Kazakh thing just proves that the wiki model works. No
> shame in that. It's probably why the Chinese are blocking Wikipedia and not
> embracing it. You can't hide your propaganda, even from your own people.
>

Jane,

You don't seem to understand what's happening here. Kazakhstan is in the
process of replicating the Chinese "Great Firewall" for its own citizens,
using slightly different means. From a recent report in the New York
Times:[1]

---o0o---

Unlike with China, which filters data through an expensive and complex
digital infrastructure known as the Great Firewall, security experts say
Kazakhstan is trying to achieve the same effect at a lower cost. The
country is mandating that its citizens install a new "national security
certificate" on their computers and smartphones that will intercept
requests to and from foreign websites.

That gives officials the opportunity to read encrypted traffic between
Kazakh users and foreign servers, in what security experts call a "man in
the middle attack."

As a result, Kazakh telecom operators, and government officials, will be
privy to mobile and web traffic between Kazakh users and foreign servers,
bypassing encryption protections known as S.S.L., or Secure Sockets Layer,
and H.T.T.P.S., technology that encrypts browsing sessions and is familiar
to users by the tiny padlock icon that appears in browsers.

---o0o---

Do you understand what this means? The Kazakh government will be *able to
identify any Kazakh citizen who edits Wikipedia, and see what they did
there.* Even if you go into an Internet café in that country, you have to
give your name, and your activities will be monitored. That is a major
chilling effect.

So you now have a situation where the government-published encyclopedia,
with its own bent on the country's history and government, is in the Kazakh
Wikipedia, appearing under the Wikipedia brand name. It was put there by
volunteers who were promised laptops and other prizes for their work
transcribing these articles.

This was an effort that WMF board members went out of their way to praise
and reward, even though it's always been clear, since June 2011, when state
support was announced, that Wikibilim was a Kazakh government-sponsored
effort. Wikibilim's Kazakh Wikipedia project is publicly described as
"implemented under the auspices of the Prime Minister of Kazakhstan."[2]

Ting Chen, then chairman of the WMF board, even participated in a press
conference with Kazakh government representatives and functionaries. Yet
Wikibilim reportedly had a trademark licence agreement with the Wikimedia
Foundation within a month of the organisation's founding,[3] something I
believe most regular chapters have to wait a lot longer for, and was
immediately hailed as a future chapter.

At Wikimania 2011, this was followed by Wales' "Wikipedian of the Year"
award for Wikibilim, which was widely publicised by the Kazakh government.
What could be better PR for them than an endorsement by a free-speech
figure like Jimmy Wales?

Yet it's long been established that Wikibilim's leaders have been and are
part of the Kazakh government machine. One is now the vice-governor of a
major province in the country,[4] and the founding director of a
Brussels-based think tank that human rights organisations consider a PR
front for the regime.[5][6] Another went on to become Vice Chairman of the
company that runs the Kazakh Prime Minister's website; he is at the same
time an active editor and one of a small number of administrators in the
Kazakh Wikipedia.

The country's opposition press has been shut down. Even when remnants of it
still existed, it was clear that opposition papers would not be considered
"reliable sources" in the Kazakh Wikipedia.

If this proves that the "wiki model works", then it can only mean that it
"works" in the sense that dictatorships can very smartly exploit it for
their own ends--in this case, with apparent WMF connivance. (I would really
like to know who, if anyone, advised the WMF on this at the time.)

China has its own internet encyclopedias that it controls in a similar
manner. They have no need for Wikipedia. They have two crowdsourced
internet encyclopedias that are bigger even than the English Wikipedia, and
positively dwarf the Chinese Wikipedia.

However, there is significant government interest in Wikipedia in other
Asian countries.

What the WMF should do is to start examining to what extent these
Wikipedias are functionally censored, using the services of linguists and
political/human rights experts.

I have long advocated that there should be a Wikipedia Freedom Index[7]
indicating to the reader how free of censorship any Wikipedia is. Where a
Wikipedia is found to suffer from significant problems, the WMF should
place server-side banners on its pages, in the local language and English,
alerting readers to this fact and suggesting that they

Re: [Wikimedia-l] Quality issues

Anyone can exploit the content on WMF for their needs. What I mean by "it
works" is that you can't fool people when you try to change Wikipedia to
fit government policy. We can easily identify problematic edits. Never
underestimate the diaspora of any country. Wikimedia is always bigger than
any one government will ever estimate.

On Mon, Dec 28, 2015 at 4:30 PM, Andreas Kolbe  wrote:

> On Mon, Dec 28, 2015 at 12:40 PM, Jane Darnell  wrote:
>
> > If anything, the Kazakh thing just proves that the wiki model works. No
> > shame in that. It's probably why the Chinese are blocking Wikipedia and not
> > embracing it. You can't hide your propaganda, even from your own people.
> >
>
>
> Jane,
>
> You don't seem to understand what's happening here. Kazakhstan is in the
> process of replicating the Chinese "Great Firewall" for its own citizens,
> using slightly different means. From a recent report in the New York
> Times:[1]
>
> ---o0o---
>
> Unlike with China, which filters data through an expensive and complex
> digital infrastructure known as the Great Firewall, security experts say
> Kazakhstan is trying to achieve the same effect at a lower cost. The
> country is mandating that its citizens install a new "national security
> certificate" on their computers and smartphones that will intercept
> requests to and from foreign websites.
>
> That gives officials the opportunity to read encrypted traffic between
> Kazakh users and foreign servers, in what security experts call a "man in
> the middle attack."
>
> As a result, Kazakh telecom operators, and government officials, will be
> privy to mobile and web traffic between Kazakh users and foreign servers,
> bypassing encryption protections known as S.S.L., or Secure Sockets Layer,
> and H.T.T.P.S., technology that encrypts browsing sessions and is familiar
> to users by the tiny padlock icon that appears in browsers.
>
> ---o0o---
>
> Do you understand what this means? The Kazakh government will be *able to
> identify any Kazakh citizen who edits Wikipedia, and see what they did
> there.* Even if you go into an Internet café in that country, you have to
> give your name, and your activities will be monitored. That is a major
> chilling effect.
>
> So you now have a situation where the government-published encyclopedia,
> with its own bent on the country's history and government, is in the Kazakh
> Wikipedia, appearing under the Wikipedia brand name. It was put there by
> volunteers who were promised laptops and other prizes for their work
> transcribing these articles.
>
> This was an effort that WMF board members went out of their way to praise
> and reward, even though it's always been clear, since June 2011, when state
> support was announced, that Wikibilim was a Kazakh government-sponsored
> effort. Wikibilim's Kazakh Wikipedia project is publicly described as
> "implemented under the auspices of the Prime Minister of Kazakhstan."[2]
>
> Ting Chen, then chairman of the WMF board, even participated in a press
> conference with Kazakh government representatives and functionaries. Yet
> Wikibilim reportedly had a trademark licence agreement with the Wikimedia
> Foundation within a month of the organisation's founding,[3] something I
> believe most regular chapters have to wait a lot longer for, and was
> immediately hailed as a future chapter.
>
> At Wikimania 2011, this was followed by Wales' "Wikipedian of the Year"
> award for Wikibilim, which was widely publicised by the Kazakh government.
> What could be better PR for them than an endorsement by a free-speech
> figure like Jimmy Wales?
>
> Yet it's long been established that Wikibilim's leaders have been and are
> part of the Kazakh government machine. One is now the vice-governor of a
> major province in the country,[4] and the founding director of a
> Brussels-based think tank that human rights organisations consider a PR
> front for the regime.[5][6] Another went on to become Vice Chairman of the
> company that runs the Kazakh Prime Minister's website; he is at the same
> time an active editor and one of a small number of administrators in the
> Kazakh Wikipedia.
>
> The country's opposition press has been shut down. Even when remnants of it
> still existed, it was clear that opposition papers would not be considered
> "reliable sources" in the Kazakh Wikipedia.
>
> If this proves that the "wiki model works", then it can only mean that it
> "works" in the sense that dictatorships can very smartly exploit it for
> their own ends--in this case, with apparent WMF connivance. (I would really
> like to know who, if anyone, advised the WMF on this at the time.)
>
> China has its own internet encyclopedias that it controls in a similar
> manner. They have no need for Wikipedia. They have two crowdsourced
> internet encyclopedias that are bigger even than the English Wikipedia, and
> positively dwarf the Chinese Wikipedia.
>
> However, there is significant government

Re: [Wikimedia-l] Quality issues

2015-12-28 Thread Risker

On 28 December 2015 at 11:22, Jane Darnell  wrote:

> Anyone can exploit the content on WMF for their needs. What I mean by "it
> works" is that you can't fool people when you try to change Wikipedia to
> fit government policy. We can easily identify problematic edits. Never
> underestimate the diaspora of any country. Wikimedia is always bigger than
> any one government will ever estimate.
>
>

Well, yes, anyone can exploit the content of WMF projects; we don't usually
give them kudos for doing so, though.  And you most certainly CAN fool
people when you change Wikipedia to change government policy, if the
government overwhelms a small "traditional" Wikipedia community with
bribes, threats to well-being and good old fashioned paid editing.  The
Wikipedia brand is perceived to be independent from such influences; that
it isn't in this case (and who knows how many other cases) cannot be
perceived by readers who do not have any alternative resources.

Small communities with less than 50 active editors can be pretty easily
swamped; a university class adding valuable, well sourced and researched
content may have a positive effect, just as focused addition of heavily
biased material by "editing for reward" (rewards including payment, gifts,
or simply not being incarcerated) can turn a Wikipedia into a platform for
third parties. This particular project was an easy target, and there are
many others that could similarly be overwhelmed.  We need to recognize that
most of the world does not live under the conditions that encourage or even
permit the development of freely available information. As a global
community we need to stop pretending that the example of Kazakh Wikipedia
is not a major and significant bellwether that requires very serious review
of how we encourage and  develop projects centered in countries with
repressive regimes.  Many of these regions are areas with significant
potential for growth of our content - the major focus of the mission of the
Wikimedia Foundation.  Figuring out how to grow these projects within the
founding principles is not just important, it's necessary.

Risker/Anne
___
Wikimedia-l mailing list, guidelines at: 
https://meta.wikimedia.org/wiki/Mailing_lists/Guidelines
New messages to: Wikimedia-l@lists.wikimedia.org
Unsubscribe: https://lists.wikimedia.org/mailman/listinfo/wikimedia-l,

Re: [Wikimedia-l] Quality issues

All I said is that the wiki way works, that's all. You can't hide it when
someone tries to take over a project, and that is the reason we shouldn't
try to anticipate that with convoluted strategies. "Assume Good Faith" will
always win out over any strange misguided takeover strategy, which is why
governments that intend to do such things choose nowadays to just block
wikimedia altogether. It is not our wake-up call to take, but that of the
Kazakh people.

On Mon, Dec 28, 2015 at 5:58 PM, Risker  wrote:

> On 28 December 2015 at 11:22, Jane Darnell  wrote:
>
> > Anyone can exploit the content on WMF for their needs. What I mean by "it
> > works" is that you can't fool people when you try to change Wikipedia to
> > fit government policy. We can easily identify problematic edits. Never
> > underestimate the diaspora of any country. Wikimedia is always bigger than
> > any one government will ever estimate.
> >
> >
>
> Well, yes, anyone can exploit the content of WMF projects; we don't usually
> give them kudos for doing so, though.  And you most certainly CAN fool
> people when you change Wikipedia to change government policy, if the
> government overwhelms a small "traditional" Wikipedia community with
> bribes, threats to well-being and good old fashioned paid editing.  The
> Wikipedia brand is perceived to be independent from such influences; that
> it isn't in this case (and who knows how many other cases) cannot be
> perceived by readers who do not have any alternative resources.
>
> Small communities with less than 50 active editors can be pretty easily
> swamped; a university class adding valuable, well sourced and researched
> content may have a positive effect, just as focused addition of heavily
> biased material by "editing for reward" (rewards including payment, gifts,
> or simply not being incarcerated) can turn a Wikipedia into a platform for
> third parties. This particular project was an easy target, and there are
> many others that could similarly be overwhelmed.  We need to recognize that
> most of the world does not live under the conditions that encourage or even
> permit the development of freely available information. As a global
> community we need to stop pretending that the example of Kazakh Wikipedia
> is not a major and significant bellwether that requires very serious review
> of how we encourage and  develop projects centered in countries with
> repressive regimes.  Many of these regions are areas with significant
> potential for growth of our content - the major focus of the mission of the
> Wikimedia Foundation.  Figuring out how to grow these projects within the
> founding principles is not just important, it's necessary.
>
> Risker/Anne

Re: [Wikimedia-l] Quality issues

Well, the chances of me being firebombed while on vacation in the States are
probably higher than me being firebombed for editing Wikipedia, but that
still doesn't mean we need to worry about changing the wiki model. I guess
I have lost the thread of your point entirely now.

On Mon, Dec 28, 2015 at 8:13 PM, Andreas Kolbe  wrote:

> On Mon, Dec 28, 2015 at 6:00 PM, Jane Darnell  wrote:
>
> > All I said is that the wiki way works, that's all. You can't hide it when
> > someone tries to take over a project, and that is the reason we shouldn't
> > try to anticipate that with convoluted strategies. "Assume Good Faith" will
> > always win out over any strange misguided takeover strategy, which is why
> > governments that intend to do such things choose nowadays to just block
> > wikimedia altogether. It is not our wake-up call to take, but that of the
> > Kazakh people.
>
>
>
> Ah, I see. That's easy to say for people in the Western world.
>
> In Uzbekistan dissidents have been boiled alive.[1] In Kazakhstan,
> journalists are imprisoned and harassed; one was firebombed and had the
> decapitated carcass of a dog left outside her offices. (The dog's head
> later turned up at her home.)[2] In Azerbaijan, Wikipedians have been
> tortured and threatened with torture, according to posts on the WMCEE-l
> mailing list.[3]
>
> All respect to you if you run these risks in order to edit Wikipedia, and
> still do it regardless. But if you don't, please don't dispense blithe and
> jejune advice, and don't tell people who are concerned about remaining
> alive, preferably with their skin and fingernails intact, that they need a
> wake-up call.
>
> I'd rather you told the WMF not to reward the functionaries of such regimes
> with "Wikipedian of the Year" awards and trademark licence agreements.
>
> [1]
> http://www.rferl.org/content/uzbekistans-house-of-torture/24667200.html
> [2] https://en.wikipedia.org/wiki/Irina_Petrushova
> [3] http://listy.wikimedia.pl/pipermail/wmcee-l/2015-May/000839.html

Re: [Wikimedia-l] Quality issues

2015-12-28 Thread Risker

"Assume good faith" is actually what got Kazakh Wikipedia into the mess it
is in. Wikimedia projects have been blocked by governments practically
since their inception.  Perverting the content is the new way of doing
things. They've learned from the PR and SEO industries.

And that leads us back to Wikidata.  There has always been the recognized
potential for use of Wikidata to build articles for smaller projects, based
on properly-sourced, independently verified (or verifiable) data; it's one
of the reasons that Wikidata has been accepted into the Wikimedia family.
But the key problem with creating content in this way is that the contents
of Wikidata are currently mostly unsourced or so poorly sourced that they
can't be considered either verified or verifiable even in one's wildest
dreams.  My experience, based on reading about a hundred user talk pages on
Wikidata recently, is that Wikidatians do not consider sourcing to be
important or even desirable.  This is a major problem for any group that
wants to reuse the content, because bluntly put there's a fair amount of
junk that got transferred to Wikidata, and it's currently not possible to
sort the wheat from the chaff.  The absence of references on Wikidata is a
significant barrier to the reusability of its data. I despair every time
someone says "it's like Wikipedia, it will get better!"  Well, no.  Huge
swaths of existing Wikipedias have never improved despite being more than a
decade old.  Our first new Wikimedia project in years shouldn't be basing
its practices on principles that have already been proved insufficient to
maintain and curate major projects with thousands of active editors.

So - Wikidata could play an important role in the development of core
content on smaller-sized projects with a small editorial community.  (I say
"could" because we have seen unsuccessful experiments importing significant
quantities of information into smaller projects.  Swahili Wikipedia has
still not completely recovered from its experience.)  But without being
able to provide provenance, its data doesn't even meet the minimal criteria
for verifiability.

Risker/Anne

On 28 December 2015 at 13:00, Jane Darnell  wrote:

> All I said is that the wiki way works, that's all. You can't hide it when
> someone tries to take over a project, and that is the reason we shouldn't
> try to anticipate that with convoluted strategies. "Assume Good Faith" will
> always win out over any strange misguided takeover strategy, which is why
> governments that intend to do such things choose nowadays to just block
> wikimedia altogether. It is not our wake-up call to take, but that of the
> Kazakh people.
>
> On Mon, Dec 28, 2015 at 5:58 PM, Risker  wrote:
>
> > On 28 December 2015 at 11:22, Jane Darnell  wrote:
> >
> > > Anyone can exploit the content on WMF for their needs. What I mean by "it
> > > works" is that you can't fool people when you try to change Wikipedia to
> > > fit government policy. We can easily identify problematic edits. Never
> > > underestimate the diaspora of any country. Wikimedia is always bigger than
> > > any one government will ever estimate.
> > >
> > >
> >
> > Well, yes, anyone can exploit the content of WMF projects; we don't usually
> > give them kudos for doing so, though.  And you most certainly CAN fool
> > people when you change Wikipedia to change government policy, if the
> > government overwhelms a small "traditional" Wikipedia community with
> > bribes, threats to well-being and good old fashioned paid editing.  The
> > Wikipedia brand is perceived to be independent from such influences; that
> > it isn't in this case (and who knows how many other cases) cannot be
> > perceived by readers who do not have any alternative resources.
> >
> > Small communities with less than 50 active editors can be pretty easily
> > swamped; a university class adding valuable, well sourced and researched
> > content may have a positive effect, just as focused addition of heavily
> > biased material by "editing for reward" (rewards including payment, gifts,
> > or simply not being incarcerated) can turn a Wikipedia into a platform for
> > third parties. This particular project was an easy target, and there are
> > many others that could similarly be overwhelmed.  We need to recognize that
> > most of the world does not live under the conditions that encourage or even
> > permit the development of freely available information. As a global
> > community we need to stop pretending that the example of Kazakh Wikipedia
> > is not a major and significant bellwether that requires very serious review
> > of how we encourage and develop projects centered in countries with
> > repressive regimes.  Many of these regions are areas with significant
> > potential for growth of our content - the major focus of the mission of the
> > Wikimedia Foundation.  Figuring out how to grow these

Re: [Wikimedia-l] Quality issues

2015-12-28 Thread Andreas Kolbe

On Mon, Dec 28, 2015 at 6:00 PM, Jane Darnell  wrote:

> All I said is that the wiki way works, that's all. You can't hide it when
> someone tries to take over a project, and that is the reason we shouldn't
> try to anticipate that with convoluted strategies. "Assume Good Faith" will
> always win out over any strange misguided takeover strategy, which is why
> governments that intend to do such things choose nowadays to just block
> wikimedia altogether. It is not our wake-up call to take, but that of the
> Kazakh people.

Ah, I see. That's easy to say for people in the Western world.

In Uzbekistan dissidents have been boiled alive.[1] In Kazakhstan,
journalists are imprisoned and harassed; one was firebombed and had the
decapitated carcass of a dog left outside her offices. (The dog's head
later turned up at her home.)[2] In Azerbaijan, Wikipedians have been
tortured and threatened with torture, according to posts on the WMCEE-l
mailing list.[3]

All respect to you if you run these risks in order to edit Wikipedia, and
still do it regardless. But if you don't, please don't dispense blithe and
jejune advice, and don't tell people who are concerned about remaining
alive, preferably with their skin and fingernails intact, that they need a
wake-up call.

I'd rather you told the WMF not to reward the functionaries of such regimes
with "Wikipedian of the Year" awards and trademark licence agreements.

[1] http://www.rferl.org/content/uzbekistans-house-of-torture/24667200.html
[2] https://en.wikipedia.org/wiki/Irina_Petrushova
[3] http://listy.wikimedia.pl/pipermail/wmcee-l/2015-May/000839.html

Re: [Wikimedia-l] Quality issues

2015-12-26 Thread Pete Forsyth

Andreas,

Helpful questions and observations, thank you. My replies inline:

On Sat, Dec 26, 2015 at 1:37 AM, Andreas Kolbe  wrote:

>
> Pete,
>
> 
>
> Is anyone arguing that Google are in fact breaking CC
> BY-SA by restricting their attribution to a link to Wikipedia? Because if
> not, we can lay that one to rest.
>

No, I think we agree there.

> Now, if that works for Wikipedia, why can't we have the same for Wikidata?

I'd say the better question is "what legal or moral right would we call
upon to *insist* on having the same for Wikidata?" If we had a clear answer
to that one, it would really move things forward; but I don't think we do,
or if we do, it's not yet clear to me.

>
> > Requiring that reusers credit the *web site* would be new in the Wikimedia
> > world, and I don't see the advantage. (-Pete)
>
>
>
> The advantage is transparency about data provenance, as well as creating a
> path to Wikidata where users can contest, correct and refine the
> information.
>
> This is a benefit to the end user, and in line with Foundation values like
> transparency and user engagement. Do you disagree?
>

No, and I should have been clearer -- I do see the general advantage in a
site providing information about the source of information (of course).
What I don't see is the advantage of requiring them to do so in a certain
way.

>
>
>
> > Certainly, serious reusers who wish
> > to establish credibility should be transparent about the source of their
> > data; (-Pete)
>
>
>
> I have never seen Google credit Freebase (Bing does, probably because
> Freebase is a Google property), and I think neither Google nor Bing will
> credit Wikidata either.
>

I don't think Google or Bing aspires to having the highest standard of
credibility. If they are useful, their business interests have been served,
and I would hope that no student or academic would be able to cite the
Google Knowledge Graph in a formal paper, any more than they could cite
Wikipedia. (caveat emptor)

>
>
>
> > but it's not our proper role to compel them to do so. (-Pete)
>
>
> Could you explain why in your view it is not our proper role to do so?
>

I believe in the agency of multiple people and entities in curating
knowledge. Individuals, and individual information projects, should have
the ability to make their own judgment about how much, and what kind, of
citation is required for their purposes. I don't believe that information
curation can be perfected by anticipating all needs in policy and legal
documents.

If our users have a moral or legal right that needs to be defended, we
should do so. But I don't see one in this case (perhaps a clear
hypothetical example could help?)

> > Attribution requirements in CC licenses are about crediting the *copyright
> > holders*.
> >
> > Andreas, I realize this has been much discussed in this thread, but I don't
> > think I've seen this angle addressed directly: In order for any copyright
> > license to apply, somebody has to hold the copyright. Who do you imagine
> > has a legitimate claim to copyright over the emergent database that grows
> > as multiple individuals and automated processes add individual,
> > non-copyrightable claims/statements/facts? (-Pete)
> >
>
>
> See
>
> https://meta.wikimedia.org/wiki/Wikilegal/Database_Rights#The_legal_definition_of_.E2.80.9Cdatabase.E2.80.9D
>
>
> So according to that page, created by Wikimedia legal staff, databases may
> be protected even by US copyright law as "compilations". In the EU (is
> Wikidata currently based in the EU, given that it's a Wikimedia Deutschland
> project?) the protections are still more stringent. As I understand it, the
> community as a whole holds the copyright, but you'd have to check with
> Foundation legal staff or some other lawyer to be sure.

Helpful link, thank you. My eye is drawn to the word "may." If databases
MAY be protected, what conditions need to pertain in order for that to
happen? I'd be very interested in hearing from a legal expert about that.

My best guess is that a "database" like an edited compilation of papers
about biology, or a compilation of Christmas songs, would be protected by
copyright -- the people or organizations who curated the collection would
hold the copyright to the collection, while the individual authors/artists
would hold the copyright to the individual papers or songs. But the phone
book would not carry copyright, because there was no editorial or creative
judgment in assembling the list.

"The Wikimedia community as a whole" is certainly not a legal entity, and
I'm skeptical that it's an entity at all. How can something that is not a
legal entity hold a copyright?

Whose rights do you wish to protect?

Pete
[[User:Peteforsyth]]

Re: [Wikimedia-l] Quality issues

2015-12-26 Thread Andreas Kolbe

On Tue, Dec 22, 2015 at 6:15 PM, Pete Forsyth  wrote:

> On Tue, Dec 22, 2015 at 9:37 AM, geni  wrote:
>
> > On 22 December 2015 at 12:27, Andreas Kolbe  wrote:
> >
>
> > > It's surely not beyond human skill to devise a licence for Wikidata that
> > > requires re-users to include the three words above on their website, while
> > > placing no other duties or restrictions on them.
> >
> > You appear to be suggesting a homebrew license
>
>
> +1
>

Pete,

As I understand it, people here have raised the objection that in order to
follow the letter of CC BY-SA, re-users would have to list all
contributors, the way some of the Wikipedia-based books do for example. I
think we all agree that this would be completely impractical for something
like a Knowledge Graph box, and not in the end user's interest.

What would make sense is the sort of attribution Bing uses today to credit
Freebase and Wikipedia.

Anyone wishing to argue that CC BY-SA requires all re-users to list all
contributors has to realise that if that were true, Google, Bing and others
infringe Wikipedia's CC BY-SA licence billions of times a year.

As I've said before, I'm pretty sure that if you were to take them to court
for not listing all the contributors who participated in creating the
snippets and timelines they display in their SERPs' Knowledge
Graph/Snapshot boxes, you would not prevail.

As I understand it, the CC BY-SA licence only requires attribution that is
"reasonable to the medium or means You are utilizing". I think a court
would agree that given the inherent space limitations, Google and Bing are
being "reasonable" by providing a link to the Wikipedia article they're
excerpting, and providing no more attribution than that.

Do you disagree? Is anyone arguing that Google are in fact breaking CC
BY-SA by restricting their attribution to a link to Wikipedia? Because if
not, we can lay that one to rest.

Now, if that works for Wikipedia, why can't we have the same for Wikidata?

> Requiring that reusers credit the *web site* would be new in the Wikimedia
> world, and I don't see the advantage.

The advantage is transparency about data provenance, as well as creating a
path to Wikidata where users can contest, correct and refine the
information.

This is a benefit to the end user, and in line with Foundation values like
transparency and user engagement. Do you disagree?

> Certainly, serious reusers who wish
> to establish credibility should be transparent about the source of their
> data;

I have never seen Google credit Freebase (Bing does, probably because
Freebase is a Google property), and I think neither Google nor Bing will
credit Wikidata either.

> but it's not our proper role to compel them to do so.
>

Could you explain why in your view it is not our proper role to do so?

> Attribution requirements in CC licenses are about crediting the *copyright
> holders*.
>
> Andreas, I realize this has been much discussed in this thread, but I don't
> think I've seen this angle addressed directly: In order for any copyright
> license to apply, somebody has to hold the copyright. Who do you imagine
> has a legitimate claim to copyright over the emergent database that grows
> as multiple individuals and automated processes add individual,
> non-copyrightable claims/statements/facts?
>

See
https://meta.wikimedia.org/wiki/Wikilegal/Database_Rights#The_legal_definition_of_.E2.80.9Cdatabase.E2.80.9D

---o0o---

From a legal perspective, a database is any organized collection of
materials — hard copy or electronic — that permits a user to search for and
access individual pieces of information contained within the materials. No
database software, as a programmer would understand it, is necessary. In
the US, for example, Black’s Law Dictionary defines a database as a
"compilation of information arranged in a systematic way and offering a
means of finding specific elements it contains, often today by electronic
means."[1]

*Databases may be protected by US copyright law as "compilations."* In the EU,
databases are protected by the Database Directive, which defines a
database as "a collection of independent works, data or other materials
arranged in a systematic or methodical way and individually accessible by
electronic or other means."

---o0o---

So according to that page, created by Wikimedia legal staff, databases may
be protected even by US copyright law as "compilations". In the EU (is
Wikidata currently based in the EU, given that it's a Wikimedia Deutschland
project?) the protections are still more stringent. As I understand it, the
community as a whole holds the copyright, but you'd have to check with
Foundation legal staff or some other lawyer to be sure.

Best,
Andreas

Re: [Wikimedia-l] Quality issues

2015-12-22 Thread Pete Forsyth

On Tue, Dec 22, 2015 at 9:37 AM, geni  wrote:

> On 22 December 2015 at 12:27, Andreas Kolbe  wrote:
>

> > It's surely not beyond human skill to devise a licence for Wikidata that
> > requires re-users to include the three words above on their website, while
> > placing no other duties or restrictions on them.
>
> You appear to be suggesting a homebrew license

+1

Requiring that reusers credit the *web site* would be new in the Wikimedia
world, and I don't see the advantage. Certainly, serious reusers who wish
to establish credibility should be transparent about the source of their
data; but it's not our proper role to compel them to do so.

Attribution requirements in CC licenses are about crediting the *copyright
holders*.

Andreas, I realize this has been much discussed in this thread, but I don't
think I've seen this angle addressed directly: In order for any copyright
license to apply, somebody has to hold the copyright. Who do you imagine
has a legitimate claim to copyright over the emergent database that grows
as multiple individuals and automated processes add individual,
non-copyrightable claims/statements/facts?

-Pete
[[User:Peteforsyth]]
___
Wikimedia-l mailing list, guidelines at: 
https://meta.wikimedia.org/wiki/Mailing_lists/Guidelines
New messages to: Wikimedia-l@lists.wikimedia.org
Unsubscribe: https://lists.wikimedia.org/mailman/listinfo/wikimedia-l,

Re: [Wikimedia-l] Quality issues

2015-12-22 Thread geni

On 21 December 2015 at 15:25, Andreas Kolbe  wrote:

>
> Re-users are very, very unlikely indeed to spend "way too much of their
> time worrying" about, say, having to add the words "Source: Wikidata.
> (Disclaimer.)" to their websites -- hyperlinked to wikidata.org and the
> Wikidata disclaimer.
>
> It's a one-minute job.
>
>

You've broken, say, a CC-BY-SA license in at least two ways there.


-- 
geni

Re: [Wikimedia-l] Quality issues

2015-12-22 Thread Andreas Kolbe

On Tue, Dec 22, 2015 at 8:10 AM, geni  wrote:

> On 21 December 2015 at 15:25, Andreas Kolbe  wrote:
>
> >
> > Re-users are very, very unlikely indeed to spend "way too much of their
> > time worrying" about, say, having to add the words "Source: Wikidata.
> > (Disclaimer.)" to their websites -- hyperlinked to wikidata.org and the
> > Wikidata disclaimer.
> >
> > It's a one-minute job.
>
> You've broken, say, a CC-BY-SA license in at least two ways there.

I was unaware that you were in favour of CC BY-SA for Wikidata now.

It's surely not beyond human skill to devise a licence for Wikidata that
requires re-users to include the three words above on their website, while
placing no other duties or restrictions on them.

Re: [Wikimedia-l] Quality issues

2015-12-22 Thread geni

On 22 December 2015 at 12:27, Andreas Kolbe  wrote:

>
> I was unaware that you were in favour of CC BY-SA for Wikidata now.
>
>
I'm not, but you failed to specify a license, and CC-BY-SA is one you might
be vaguely familiar with.

> It's surely not beyond human skill to devise a licence for Wikidata that
> requires re-users to include the three words above on their website, while
> placing no other duties or restrictions on them.

You appear to be suggesting a homebrew license, so we are already above the
one-minute mark. Worse still, by talking about websites you are suffering
from the classic problem of failing to consider all use cases: for example
books, calendars, or indeed any form of data transmission that isn't the web.

-- 
geni

Re: [Wikimedia-l] Quality issues

2015-12-20 Thread Fabian Flöck

Not a contribution to the discussion at large, but I had the same problem
Andreas is mentioning a couple of days ago when doing some of my first WD edits
and adding a reference. I had no idea what to choose in that property field and
it only showed me “instance of” and “subclass of” as a cold start. (I guess
your average new editor might even wonder why to enter a property at all there
- they would probably expect a single field to enter the source.) So I just
went to some other items and checked how it was done there, which is not
optimal. A pre-selection of relevant properties (maybe most used in other
items?) in the type-ahead would be nice. And maybe a small explanation of what
the property for references means (something like “specifies type of reference”
?). I was also unsure if and when to ever use “imported from” in that field
(i.e., if I got the fact from a Wikipedia page, but no primary source exists)
or if that was reserved for machine imports.
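
To make the "most used in other items" idea concrete, here is a minimal
sketch in Python. It is only an illustration under stated assumptions: the
sample data and the function name suggest_reference_properties are
hypothetical, and nothing here reflects how the Wikidata suggester is
actually implemented. It simply counts which properties appear in existing
references and offers the most frequent ones first.

```python
from collections import Counter

# Hypothetical sample: the properties used in the references of a handful
# of existing statements. P854 = "reference URL", P248 = "stated in",
# P143 = "imported from".
SAMPLE_REFERENCES = [
    ["P854"],
    ["P248", "P854"],
    ["P143"],
    ["P854"],
    ["P248"],
]


def suggest_reference_properties(references, limit=3):
    """Return the most frequently used reference properties, most used first."""
    counts = Counter(prop for ref in references for prop in ref)
    return [prop for prop, _ in counts.most_common(limit)]


print(suggest_reference_properties(SAMPLE_REFERENCES))
```

With a ranking like this, the type-ahead could show "reference URL" and
"stated in" before anything else when someone clicks "add reference".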

Fabian


> Date: Sun, 20 Dec 2015 15:59:58 +
> From: Andreas Kolbe <jayen...@gmail.com>
> To: Wikimedia Mailing List <wikimedia-l@lists.wikimedia.org>
> Subject: Re: [Wikimedia-l] Quality issues
>
> Just try it, Lydia. Click "add" in subsidiaries in
> https://www.wikidata.org/wiki/Q37156 -- enter a company name, and then
> click "add reference". When I do that, the text field contains a greyed-out
> "property", and the drop-down shows the unhelpful items I mentioned above.
> 
> And it would be good if the help text actually *asked* people to cite a
> reference.

Re: [Wikimedia-l] Quality issues

2015-12-20 Thread Gnangarra

This is going nowhere. One of the big issues is a lack of understanding of
how Wikidata works and what its purpose is.


Wikimedia Australia's solution is to invest community money in bringing
someone who has contributed to Wikidata to Australia to do a series of
talks and workshops around the country over a three-week period.


Perhaps the Wikidata community could improve the wider Wikimedia community's
understanding by doing the same on a larger scale: get out to where the
contributors are, answer questions, explain how it works, and run workshops
aimed directly at existing WikiProject contributors. Obviously it's a lot
simpler and cheaper in places like Europe and the US, where there are
already active contributors who have the knowledge.

On 21 December 2015 at 08:00, Fabian Flöck <f.flo...@gmail.com> wrote:

> Not a contribution to the discussion at large, but I had the same problem
> Andreas is mentioning a couple of days ago when doing some of my first WD
> edits and adding a reference. I had no idea what to choose in that property
> field and it only showed me “instance of” and “subclass of” as a cold
> start. (I guess your average new editor might even wonder why to enter a
> property at all there - they would probably expect a single field to enter
> the source.)  So I just went to some other items and checked how it was
> done there, which is not optimal. A pre-selection of relevant properties
> (maybe most used in other items?) in the type-ahead would be nice. And
> maybe a small explanation of what the property for references means
> (something like “specifies type of reference” ?). I was also unsure if and
> when to ever use “imported from” in that field (i.e., if I got the fact
> from a Wikipedia page, but no primary source exists) or if that was
> reserved for machine imports.
>
> Fabian
>
>
> > Date: Sun, 20 Dec 2015 15:59:58 +
> > From: Andreas Kolbe <jayen...@gmail.com>
> > To: Wikimedia Mailing List <wikimedia-l@lists.wikimedia.org>
> > Subject: Re: [Wikimedia-l] Quality issues
> >
> > Just try it, Lydia. Click "add" in subsidiaries in
> > https://www.wikidata.org/wiki/Q37156 -- enter a company name, and then
> > click "add reference". When I do that, the text field contains a
> > greyed-out "property", and the drop-down shows the unhelpful items I
> > mentioned above.
> >
> > And it would be good if the help text actually *asked* people to cite a
> > reference.



-- 
GN.
President Wikimedia Australia
WMAU: http://www.wikimedia.org.au/wiki/User:Gnangarra
Photo Gallery: http://gnangarra.redbubble.com

Re: [Wikimedia-l] Quality issues

2015-12-20 Thread Lydia Pintscher

On Fri, Dec 18, 2015 at 4:06 PM, Andreas Kolbe  wrote:
> On Fri, Dec 18, 2015 at 11:28 AM, Andrea Zanni 
> wrote:
>
>> Andreas, you apparently did not read the following sentence:
>> "Of course, the opposite is also true: it's a single point of openness,
>> correction, information. "
>>
>
> Andrea,
>
> I understand and appreciate your point, but I would like you to consider
> that what you say may be less true of Wikidata than it is for other
> Wikimedia wikis, for several reasons:
>
> Wikipedia, Wiktionary etc. are functionally open and correctable because
> people by and large view their content on Wikipedia, Wiktionary etc. itself
> (or in places where the provenance is clearly indicated, thanks to CC
> BY-SA). The place where you read it is the same place where you can edit
> it. There is an "Edit" tab, and it really *is* easy to change the content.
> (It is certainly easy to correct a typo, which is how many of us started.)

You are used to the edit tab being there. Someone recently said on
Twitter this is the most displayed invisible link on the internet. All
a matter of perspective and what we are used to ;-)

> With Wikidata, this is different. Wikidata, as a semantic wiki, is designed
> to be read by machines. These machines don't edit, they *propagate*.
> Wikidata is not a site that end users--human beings--will browse and
> consult the way people consult Wikipedia, Wiktionary, Commons, etc.

Machines (with people behind them) _do_ edit Wikidata. Wikidata is
designed to be read and written by both humans and machines. And it is
used that way.

> Wikidata is, or will be, of interest mostly to re-users--search engines and
> other intermediaries who will use its machine-readable data as an input to
> build and design their own content. And when they use Wikidata as an input,
> they don't have to acknowledge the source.
>
> Allowing unattributed re-use may *seem* more open. But I contend that in
> practice it makes Wikidata *less* open as a wiki: because when people don't
> know where the information comes from, they are also unable to contribute
> at source. The underlying Wikimedia project effectively becomes invisible
> to them, a closed book.
>
> That is not good for a crowdsourced project from multiple points of view.
>
> Firstly, it impedes recruitment. Far fewer consumers of Wikidata
> information will become Wikidata editors, because they will typically find
> Wikidata content on other sites where Wikidata is not even mentioned.

That is why I am working with re-users of Wikidata's data on this.
They can link to Wikidata. They can build ways to let their users edit
in-place. inventaire and Histropedia are two projects that show the
start of this. As I wrote in my Signpost piece it needs work and
education that is ongoing.

> Secondly, it reduces transparency. Data provenance is important, as Mark
> Graham and Heather Ford have pointed out.
>
> Thirdly, it fails to encourage appropriate vigilance in the consumer. (The
> error propagation problems I've described in this thread all involved
> unattributed re-use of Wikimedia content.)
>
> There are other reasons why Wikidata is less open, besides CC0 and the lack
> of attribution.
>
> Wikidata is the least user-friendly Wikimedia wiki. The hurdle that
> newbies--even experienced Wikimedians--have to overcome to contribute is an
> order of magnitude higher than it is for other Wikimedia projects.

Granted, Wikidata isn't the most user-friendly at this point - which is
why we are working on improvements in that area. Some of them have
gone live just the other week. More will go live in January.

> For a start, there is no Edit tab at the top of the page. When you go to
> Barack Obama's entry in Wikidata[1] for example, the word "Edit" is not to
> be found anywhere on the page. It does not look like a page you can edit
> (and indeed, members of the public can't edit it).

Now please go to any other page that is not protected. It has edit
links plastered all over it. Editing there is much much more obvious
than on Wikipedia.
I really encourage you to actually go and edit on Wikidata for longer
than 2 minutes.

> It took me a while to figure out that the item is protected (just like the
> Jerusalem item).

We have a lock icon in the top right corner to indicate protected
items like this.

> In other Wikimedia wikis that do have an "Edit" tab, that tab changes to
> "View source" if the page is protected, giving a visual indication of the
> page's status that people--Wikimedia insiders at least--can recognise.
>
> Unprotected Wikidata items do have "edit" and "add" links, but they are
> less prominent. (The "add" link for adding new properties is hidden away at
> the very bottom of the page.) And when you do click "edit" or "add", it is
> not obvious what you are supposed to do, the way it is in text-based wikis.

It is not a text-based wiki. So yes some things work differently. That
doesn't necessarily mean

Re: [Wikimedia-l] Quality issues

2015-12-20 Thread Andrea Zanni

I second all Lydia's answers.
Also, I do think that there is a huge difference between usability/UX
issues and core, fundamental, systemic issues.
I personally think, Andreas, that you are pointing to usability issues,
which are solvable (not easy, and not trivial, but at least they can be
fixed).

Regarding the CC0 vs CC-BY-SA problem, I don't think a single switch of
license would solve the whole attribution problem: it hasn't solved
propagation of errors in the past with Wikipedia, and I don't really see how
it could solve propagation of errors for Wikidata (we do know, though, that
it would bring a hell of a lot of issues for Wikidata itself).

Aubrey

On Sun, Dec 20, 2015 at 12:25 PM, Lydia Pintscher <
lydia.pintsc...@wikimedia.de> wrote:

> On Fri, Dec 18, 2015 at 4:06 PM, Andreas Kolbe  wrote:
> > On Fri, Dec 18, 2015 at 11:28 AM, Andrea Zanni wrote:
> >
> >> Andreas, you apparently did not read the following sentence:
> >> "Of course, the opposite is also true: it's a single point of openness,
> >> correction, information. "
> >>
> >
> > Andrea,
> >
> > I understand and appreciate your point, but I would like you to consider
> > that what you say may be less true of Wikidata than it is for other
> > Wikimedia wikis, for several reasons:
> >
> > Wikipedia, Wiktionary etc. are functionally open and correctable because
> > people by and large view their content on Wikipedia, Wiktionary etc.
> > itself (or in places where the provenance is clearly indicated, thanks
> > to CC BY-SA). The place where you read it is the same place where you
> > can edit it. There is an "Edit" tab, and it really *is* easy to change
> > the content. (It is certainly easy to correct a typo, which is how many
> > of us started.)
>
> You are used to the edit tab being there. Someone recently said on
> Twitter this is the most displayed invisible link on the internet. All
> a matter of perspective and what we are used to ;-)
>
> > With Wikidata, this is different. Wikidata, as a semantic wiki, is
> > designed to be read by machines. These machines don't edit, they
> > *propagate*. Wikidata is not a site that end users--human beings--will
> > browse and consult the way people consult Wikipedia, Wiktionary,
> > Commons, etc.
>
> Machines (with people behind them) _do_ edit Wikidata. Wikidata is
> designed to be read and written by both humans and machines. And it is
> used that way.
>
> > Wikidata is, or will be, of interest mostly to re-users--search engines
> > and other intermediaries who will use its machine-readable data as an
> > input to build and design their own content. And when they use Wikidata
> > as an input, they don't have to acknowledge the source.
> >
> > Allowing unattributed re-use may *seem* more open. But I contend that in
> > practice it makes Wikidata *less* open as a wiki: because when people
> > don't know where the information comes from, they are also unable to
> > contribute at source. The underlying Wikimedia project effectively
> > becomes invisible to them, a closed book.
> >
> > That is not good for a crowdsourced project from multiple points of view.
> >
> > Firstly, it impedes recruitment. Far fewer consumers of Wikidata
> > information will become Wikidata editors, because they will typically
> > find Wikidata content on other sites where Wikidata is not even
> > mentioned.
>
> That is why I am working with re-users of Wikidata's data on this.
> They can link to Wikidata. They can build ways to let their users edit
> in-place. inventaire and Histropedia are two projects that show the
> start of this. As I wrote in my Signpost piece it needs work and
> education that is ongoing.
>
> > Secondly, it reduces transparency. Data provenance is important, as Mark
> > Graham and Heather Ford have pointed out.
> >
> > Thirdly, it fails to encourage appropriate vigilance in the consumer.
> > (The error propagation problems I've described in this thread all
> > involved unattributed re-use of Wikimedia content.)
> >
> > There are other reasons why Wikidata is less open, besides CC0 and the
> > lack of attribution.
> >
> > Wikidata is the least user-friendly Wikimedia wiki. The hurdle that
> > newbies--even experienced Wikimedians--have to overcome to contribute is
> > an order of magnitude higher than it is for other Wikimedia projects.
>
> Granted, Wikidata isn't the most user-friendly at this point - which is
> why we are working on improvements in that area. Some of them have
> gone live just the other week. More will go live in January.
>
> > For a start, there is no Edit tab at the top of the page. When you go to
> > Barack Obama's entry in Wikidata[1] for example, the word "Edit" is not
> > to be found anywhere on the page. It does not look like a page you can
> > edit (and indeed, members of the public can't edit it).
>
> Now please go to any other page that is not protected. It has edit
> links plastered all over it. Editing there is much much

Re: [Wikimedia-l] Quality issues

2015-12-20 Thread Andreas Kolbe

Lydia,

I can only relate my impressions to you. The first two items I looked at
(Jerusalem and Obama) happened to be protected, so on my first visit I was
completely non-plussed as to how to edit anything on Wikidata. I never
noticed the lock icon (whereas I would have noticed, say, a coloured box at
the top of the page informing me that the item is locked). If I had been
just a random user, I would not have been back.

Once I got over that one, I found the order in which statements are listed
completely confusing. I would have expected them to follow some logical
order, but it seems they are permanently *listed in the order in which they
were added to Wikidata*. So someone's date of birth can be the last
statement on a Wikidata page, or the first.

Compare for example the location of the date of birth for Angela Merkel in
https://www.wikidata.org/wiki/Q567 to the location of Barack Obama's date
of birth in https://www.wikidata.org/wiki/Q76

I tried to figure out a way to change the order, but couldn't find one.
Again, profoundly demotivating. Machines may not be bothered by this,
because they can instantly find what they are looking for, but people are.

It might help to establish a default order for statements that makes
logical sense to a human being, and that people can become used to.
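
A default order of this kind could be sketched as follows, in Python. This
is only an illustration, not a description of how Wikidata works: the
preferred order, the sample values and the function name sort_statements
are all my own assumptions. Properties on the preferred list come first, in
that order; everything else keeps the order in which it was added.

```python
# Hypothetical preferred order. P31 = "instance of", P21 = "sex or gender",
# P569 = "date of birth", P27 = "country of citizenship", P106 = "occupation".
PREFERRED_ORDER = ["P31", "P21", "P569", "P27", "P106"]


def sort_statements(statements, preferred=PREFERRED_ORDER):
    """Order (property_id, value) pairs by a fixed property ranking.

    Properties missing from the ranking are placed after the ranked ones
    and keep their original relative order, because sorted() is stable.
    """
    rank = {prop: i for i, prop in enumerate(preferred)}
    return sorted(statements, key=lambda s: rank.get(s[0], len(preferred)))


# Statements in the order they were added (values are illustrative only).
added = [
    ("P106", "politician"),
    ("P18", "portrait.jpg"),
    ("P569", "1970-01-01"),
    ("P31", "human"),
]

for prop, value in sort_statements(added):
    print(prop, value)
```

With such a ranking, the date of birth would appear in the same place on
every person's page, regardless of when it was entered.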

As for actual editing, a few weeks ago, figuring out how to add an IBM
subsidiary to the IBM item, with a reference, must have taken me something
like half an hour. I read Wikidata:Introduction, learned about properties,
and then checked Help:Editing, which contained *nothing* about adding
properties. The word is not even mentioned.

After clicking "add" in the *existing* subsidiaries statement for the IBM
item, I saw a question mark icon with a "help text" that reads,

---o0o---

Enter a value corresponding to the property named "subsidiaries". If the
property has no designated value or the actual value is not known, you may
choose an alternative to specifying a custom value by clicking the icon
next to the value input box.

---o0o---

I didn't find this text helpful at all. It could have simply said, "Enter
the name of the subsidiary in the text box, and then add a reference."

At any rate, this is what I did. After I clicked "add reference", I got a
new field that came with a "property" drop down menu pre-populated with
"sex or gender", "date of birth", "given name", "occupation", "country of
citizenship", "GND identifier" and "image", none of which are remotely
relevant to entering a reference.

The single property that would be most useful to list in that drop down
menu when people have said they want to add a reference is "reference URL".
But it's not included. If newbies don't know this property exists, how are
they supposed to discover it? Somehow I got there, but it was not enjoyable.
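
For readers who have not seen one, the reference that this drop-down is
meant to produce is small. The Python sketch below builds one as a plain
dictionary, modelled on the Wikibase JSON layout as I understand it. The
helper name make_reference_url and the example.org address are
hypothetical, and the sketch only builds the structure; it does not talk to
Wikidata.

```python
import json


def make_reference_url(url):
    """Build a reference holding a single "reference URL" (P854) snak."""
    return {
        "snaks": {
            "P854": [
                {
                    "snaktype": "value",
                    "property": "P854",
                    "datavalue": {"value": url, "type": "string"},
                }
            ]
        },
        # Order in which the properties of this reference are displayed.
        "snaks-order": ["P854"],
    }


reference = make_reference_url("https://example.org/annual-report")
print(json.dumps(reference, indent=2))
```

The point is that a basic reference is one property plus one URL, which is
why a single, obvious "reference URL" suggestion would cover the common case.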

These are indeed all user interface issues, and quite separate from the
other aspects we have been talking about. But they contribute to making
this wiki less attractive as a site that ordinary people might want to
contribute to manually, on a casual basis.

Yes, if you are sufficiently motivated, you can figure things out. But as
things stand, I didn't find it inviting.

On Sun, Dec 20, 2015 at 11:25 AM, Lydia Pintscher <
lydia.pintsc...@wikimedia.de> wrote:

> That is why I am working with re-users of Wikidata's data on this.
> They can link to Wikidata. They can build ways to let their users edit
> in-place. inventaire and Histropedia are two projects that show the
> start of this. As I wrote in my Signpost piece it needs work and
> education that is ongoing.
>

Use a licence that requires re-users to mention "Wikidata" on their sites,
ideally with a link to the Wikidata disclaimer, and you won't have to do
any education at all, and at the same time you'll have done a great thing
for transparency of data provenance on the internet.

Moreover, you will have ensured that hundreds of millions of Internet users
are told where they can find Wikidata and edit it. Surely, if you actually
*want* to have human beings visiting and editing your wiki, that's in your
interest?

Andreas

Re: [Wikimedia-l] Quality issues

2015-12-20 Thread Lydia Pintscher

On Sun, Dec 20, 2015 at 2:18 PM, Andreas Kolbe  wrote:
> Lydia,
>
> I can only relate my impressions to you. The first two items I looked at

Now we're getting somewhere ;-)

> (Jerusalem and Obama) happened to be protected, so on my first visit I was
> completely non-plussed as to how to edit anything on Wikidata. I never
> noticed the lock icon (whereas I would have noticed, say, a coloured box at
> the top of the page informing me that the item is locked). If I had been
> just a random user, I would not have been back.

Ok. I think we can make the icon more colorful for example to draw
more attention to it. Mind you the icon is in line with what you see
on Wikipedia as well. That is why we have it.

> Once I got over that one, I found the order in which statements are listed
> completely confusing. I would have expected them to follow some logical
> order, but it seems they are permanently *listed in the order in which they
> were added to Wikidata*. So someone's date of birth can be the last
> statement on a Wikidata page, or the first.
>
> Compare for example the location of the date of birth for Angela Merkel in
> https://www.wikidata.org/wiki/Q567 to the location of Barack Obama's date
> of birth in https://www.wikidata.org/wiki/Q76
>
> I tried to figure out a way to change the order, but couldn't find one.
> Again, profoundly demotivating. Machines may not be bothered by this,
> because they can instantly find what they are looking for, but people are.
>
> It might help to establish a default order for statements that makes
> logical sense to a human being, and that people can become used to.

Yes that is indeed one of the problems we have identified for quite
some time already. It is high on the list for 2016. I hope we get to
it in Q1.

> As for actual editing, a few weeks ago, figuring out how to add an IBM
> subsidiary to the IBM item, with a reference, must have taken me something
> like half an hour. I read Wikidata:Introduction, learned about properties,
> and then checked Help:Editing, which contained *nothing* about adding
> properties. The word is not even mentioned.

Ok so on-wiki documentation is not good enough. Point taken. It has
been written by editors who are familiar with Wikidata. Giving
feedback on the talk pages for those help pages would be valuable.

> After clicking "add" in the *existing* subsidiaries statement for the IBM
> item, I saw a question mark icon with a "help text" that reads,
>
> ---o0o---
>
> Enter a value corresponding to the property named "subsidiaries". If the
> property has no designated value or the actual value is not known, you may
> choose an alternative to specifying a custom value by clicking the icon
> next to the value input box.
>
> ---o0o---
>
> I didn't find this text helpful at all. It could have simply said, "Enter
> the name of the subsidiary in the text box, and then add a reference."

It is not as easy as that unfortunately. Potentially no item exists
for that subsidiary and then you need to create one. Also the
explanation for no-value and some-value in the text is important
(though we need to improve the UI for them). But point taken we can
improve this message.
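
To illustrate the three cases, here is a minimal Python sketch of the main
part of a statement, modelled on the Wikibase JSON layout as I understand
it. The helper name make_snak and the numeric item ID 12345 are
hypothetical, and P355 is used here as the "subsidiaries" property. A
statement either points at an existing item ("value"), says the value
exists but is unknown ("somevalue"), or says there is none ("novalue").

```python
VALID_SNAKTYPES = ("value", "somevalue", "novalue")


def make_snak(property_id, item_numeric_id=None, snaktype="value"):
    """Build the main part of a statement whose value is an item."""
    if snaktype not in VALID_SNAKTYPES:
        raise ValueError("unknown snaktype: %r" % (snaktype,))
    snak = {"snaktype": snaktype, "property": property_id}
    if snaktype == "value":
        # A custom value needs an existing item to point at; if none
        # exists yet, one has to be created first.
        if item_numeric_id is None:
            raise ValueError("a 'value' snak needs an item to point at")
        snak["datavalue"] = {
            "value": {"entity-type": "item", "numeric-id": item_numeric_id},
            "type": "wikibase-entityid",
        }
    return snak


known = make_snak("P355", item_numeric_id=12345)    # a specific subsidiary
unknown = make_snak("P355", snaktype="somevalue")   # exists, but not known
none = make_snak("P355", snaktype="novalue")        # no subsidiaries
print(known["snaktype"], unknown["snaktype"], none["snaktype"])
```

Only the first case carries a value, which is why the help text cannot
simply say "enter the name".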

> At any rate, this is what I did. After I clicked "add reference", I got a
> new field that came with a "property" drop down menu pre-populated with
> "sex or gender", "date of birth", "given name", "occupation", "country of
> citizenship", "GND identifier" and "image", none of which are remotely
> relevant to entering a reference.

Those should not have shown up for references and I am not aware of
issues with that. Which statement was this specifically? The
suggestions are not always perfect but at least the distinction
between properties in the main part of the statement and its
references should work very well.

> The single property that would be most useful to list in that drop down
> menu when people have said they want to add a reference is "reference URL".
> But it's not included. If newbies don't know this property exists, how are
> they supposed to discover it? Somehow I got there, but it was not enjoyable.

As above this should have shown up.

> These are indeed all user interface issues, and quite separate from the
> other aspects we have been talking about. But they contribute to making
> this wiki less attractive as a site that ordinary people might want to
> contribute to manually, on a casual basis.
>
> Yes, if you are sufficiently motivated, you can figure things out. But as
> things stand, I didn't find it inviting.

Sure. As I said we still have quite some work to do and feedback such
as the above is what will help us make it better.

> On Sun, Dec 20, 2015 at 11:25 AM, Lydia Pintscher <
> lydia.pintsc...@wikimedia.de> wrote:
>
>> That is why I am working with re-users of Wikidata's data on this.
>> They can link to Wikidata. They can build ways to let their users edit
>> in-place. inventaire and Histropedia are two projects that show the
>>

Re: [Wikimedia-l] Quality issues

2015-12-20 Thread Gnangarra

There is a compromise license: CC BY, without the SA.

On 20 December 2015 at 21:38, Lydia Pintscher 
wrote:

> On Sun, Dec 20, 2015 at 2:18 PM, Andreas Kolbe  wrote:
> > Lydia,
> >
> > I can only relate my impressions to you. The first two items I looked at
>
> Now we're getting somewhere ;-)
>
> > (Jerusalem and Obama) happened to be protected, so on my first visit I
> > was completely non-plussed as to how to edit anything on Wikidata. I
> > never noticed the lock icon (whereas I would have noticed, say, a
> > coloured box at the top of the page informing me that the item is
> > locked). If I had been just a random user, I would not have been back.
>
> Ok. I think we can make the icon more colorful for example to draw
> more attention to it. Mind you the icon is in line with what you see
> on Wikipedia as well. That is why we have it.
>
> > Once I got over that one, I found the order in which statements are
> > listed completely confusing. I would have expected them to follow some
> > logical order, but it seems they are permanently *listed in the order in
> > which they were added to Wikidata*. So someone's date of birth can be
> > the last statement on a Wikidata page, or the first.
> >
> > Compare for example the location of the date of birth for Angela Merkel
> > in https://www.wikidata.org/wiki/Q567 to the location of Barack Obama's
> > date of birth in https://www.wikidata.org/wiki/Q76
> >
> > I tried to figure out a way to change the order, but couldn't find one.
> > Again, profoundly demotivating. Machines may not be bothered by this,
> > because they can instantly find what they are looking for, but people
> > are.
> >
> > It might help to establish a default order for statements that makes
> > logical sense to a human being, and that people can become used to.
>
> Yes that is indeed one of the problems we have identified for quite
> some time already. It is high on the list for 2016. I hope we get to
> it in Q1.
>
> > As for actual editing, a few weeks ago, figuring out how to add an IBM
> > subsidiary to the IBM item, with a reference, must have taken me
> > something like half an hour. I read Wikidata:Introduction, learned about
> > properties, and then checked Help:Editing, which contained *nothing*
> > about adding properties. The word is not even mentioned.
>
> Ok so on-wiki documentation is not good enough. Point taken. It has
> been written by editors who are familiar with Wikidata. Giving
> feedback on the talk pages for those help pages would be valuable.
>
> > After clicking "add" in the *existing* subsidiaries statement for the
> > IBM item, I saw a question mark icon with a "help text" that reads,
> >
> > ---o0o---
> >
> > Enter a value corresponding to the property named "subsidiaries". If the
> > property has no designated value or the actual value is not known, you
> > may choose an alternative to specifying a custom value by clicking the
> > icon next to the value input box.
> >
> > ---o0o---
> >
> > I didn't find this text helpful at all. It could have simply said, "Enter
> > the name of the subsidiary in the text box, and then add a reference."
>
> It is not as easy as that unfortunately. Potentially no item exists
> for that subsidiary and then you need to create one. Also the
> explanation for no-value and some-value in the text is important
> (though we need to improve the UI for them). But point taken we can
> improve this message.
>
> > At any rate, this is what I did. After I clicked "add reference", I got a
> > new field that came with a "property" drop down menu pre-populated with
> > "sex or gender", "date of birth", "given name", "occupation", "country of
> > citizenship", "GND identifier" and "image", none of which are remotely
> > relevant to entering a reference.
>
> Those should not have shown up for references and I am not aware of
> issues with that. Which statement was this specifically? The
> suggestions are not always perfect but at least the distinction
> between properties in the main part of the statement and its
> references should work very well.
>
> > The single property that would be most useful to list in that drop down
> > menu when people have said they want to add a reference is "reference
> > URL". But it's not included. If newbies don't know this property exists,
> > how are they supposed to discover it? Somehow I got there, but it was
> > not enjoyable.
>
> As above this should have shown up.
>
> > These are indeed all user interface issues, and quite separate from the
> > other aspects we have been talking about. But they contribute to making
> > this wiki less attractive as a site that ordinary people might want to
> > contribute to manually, on a casual basis.
> >
> > Yes, if you are sufficiently motivated, you can figure things out. But as
> > things stand, I didn't find it inviting.
>
> Sure. As I said we still have quite some work to do and feedback such
> as the above is

Re: [Wikimedia-l] Quality issues

2015-12-20 Thread geni

On 20 December 2015 at 13:18, Andreas Kolbe  wrote:

> Lydia,
>
> I can only relate my impressions to you. The first two items I looked at
> (Jerusalem and Obama) happened to be protected, so on my first visit I was
> completely non-plussed as to how to edit anything on Wikidata.
>

Both are semi-protected on en. I think this mostly shows your ignorance of
protection patterns. The first things you think of will pretty much always
be protected, since they are the ones that attract a lot of vandalism. You
either use Special:Random or pick something closer to your personal
interests.

-- 
geni
___
Wikimedia-l mailing list, guidelines at: 
https://meta.wikimedia.org/wiki/Mailing_lists/Guidelines
New messages to: Wikimedia-l@lists.wikimedia.org
Unsubscribe: https://lists.wikimedia.org/mailman/listinfo/wikimedia-l,

Re: [Wikimedia-l] Quality issues

2015-12-20 Thread Andreas Kolbe

On Sun, Dec 20, 2015 at 1:38 PM, Lydia Pintscher <
lydia.pintsc...@wikimedia.de> wrote:

> > At any rate, this is what I did. After I clicked "add reference", I got a
> > new field that came with a "property" drop down menu pre-populated with
> > "sex or gender", "date of birth", "given name", "occupation", "country of
> > citizenship", "GND identifier" and "image", none of which are remotely
> > relevant to entering a reference.
>
> Those should not have shown up for references and I am not aware of
> issues with that. Which statement was this specifically? The
> suggestions are not always perfect but at least the distinction
> between properties in the main part of the statement and its
> references should work very well.
>

Just try it, Lydia. Click "add" in subsidiaries in
https://www.wikidata.org/wiki/Q37156 -- enter a company name, and then
click "add reference". When I do that, the text field contains a greyed-out
"property", and the drop-down shows the unhelpful items I mentioned above.

And it would be good if the help text actually *asked* people to cite a
reference.

> > Use a licence that requires re-users to mention "Wikidata" on their
> > sites, ideally with a link to the Wikidata disclaimer, and you won't have
> > to do any education at all, and at the same time you'll have done a great
> > thing for transparency of data provenance on the internet.
> >
> > Moreover, you will have ensured that hundreds of millions of Internet
> > users are told where they can find Wikidata and edit it. Surely, if you
> > actually *want* to have human beings visiting and editing your wiki,
> > that's in your interest?
>
> I think we have to agree to disagree on the licensing part and what is
> best for Wikidata there. Yes I do want people to come to Wikidata but
> I do not want the license to be our forceful stick to achieve this. We
> have to work to build a project that people want to come to and
> contribute to. And we can do it as the number of editors for example
> shows.

Can you tell me just whose interests it serves if re-users do not have to
indicate that the data they're showing their users come from Wikidata? Max
Klein mused that the big search engines might be paying for Wikidata "to
remove a blemish on their perceived omniscience", because they can present
Wikidata content as though they had compiled it themselves.[1] That is at
least a plausible line of thought; but whom else does it serve?

It does not serve the end user, because they are left in the dark about the
provenance of the data. Moreover, they may not understand that these are
crowdsourced data, to which certain caveats always apply.

It does not serve Wikidata's interests, because many consumers of Wikidata
content who might otherwise come to edit the wiki, correct errors, refine
information and so on, will lack the bridge that would take them there.

We are a non-profit. The public good, the benefit to society, should be our
only concern.

So, who in society benefits, other than (arguably) the big commercial
search engines? Please explain.

Andreas

[1] http://hblog.org/

Re: [Wikimedia-l] Quality issues

2015-12-18 Thread Gerard Meijssen

Hoi,
The CC-0 license was chosen for the express reason that everybody can use
our data without any impediment. Our objective is to share in the sum of
all knowledge and we are more effective in that way.

We do not care about market dominance, we care about doing our utmost to
have the best data available. Beyond that, I could not care less about
theoretical what-ifs; I am interested in improving our content, because
that is where we make a difference.
Thanks,
   GerardM

On 18 December 2015 at 09:05, Andreas Kolbe  wrote:

> Gerard,
>
> Of course you can't license or copyright facts, but as the WMF legal team's
> page on this topic[1] outlines, there are database and compilation rights
> that exist independently of copyright. IANAL, but as I read that page, if
> you simply go ahead and copy all the infobox, template etc. content from a
> Wikipedia, this "would likely be a violation" even under US law (not to
> mention EU law).
>
> I don't know why Wikipedia was set up with a CC BY-SA licence rather than a
> CC0 licence, and the attribution required under CC BY-SA is unduly
> cumbersome, but attribution has always seemed to me like a useful concept.
> The fact that people like VDM Publishing who sell Wikipedia articles as
> books are required to say that their material comes from Wikipedia is
> useful, for example.
>
> Naturally it fosters re-use if you make Wikidata CC0, but that's precisely
> the point: you end up with a level of "market dominance" that just ain't
> healthy.
>
> [1] https://meta.wikimedia.org/wiki/Wikilegal/Database_Rights

Re: [Wikimedia-l] Quality issues

2015-12-18 Thread Andreas Kolbe

Gerard,

Of course you can't license or copyright facts, but as the WMF legal team's
page on this topic[1] outlines, there are database and compilation rights
that exist independently of copyright. IANAL, but as I read that page, if
you simply go ahead and copy all the infobox, template etc. content from a
Wikipedia, this "would likely be a violation" even under US law (not to
mention EU law).

I don't know why Wikipedia was set up with a CC BY-SA licence rather than a
CC0 licence, and the attribution required under CC BY-SA is unduly
cumbersome, but attribution has always seemed to me like a useful concept.
The fact that people like VDM Publishing who sell Wikipedia articles as
books are required to say that their material comes from Wikipedia is
useful, for example.

Naturally it fosters re-use if you make Wikidata CC0, but that's precisely
the point: you end up with a level of "market dominance" that just ain't
healthy.

[1] https://meta.wikimedia.org/wiki/Wikilegal/Database_Rights

On Thu, Dec 17, 2015 at 2:33 PM, Gerard Meijssen 
wrote:

> Hoi,
> Andreas, the law is an arse. However the law has it that you cannot license
> facts. When in distributed processes data is retrieved from Wikipedia, it
> is the authors who may contest their rights. There is no such thing as
> collective rights for Wikipedia, all Wikipedias.
>
> You may not like this and that is fine.
>
> DBpedia has its license in the current way NOT because they care about the
> license but because they are not interested in a row with Wikipedians on
> the subject. They are quite happy to share their data with Wikidata and
> make data retrieved in their processes with a CC-0.
>
> Thanks,
>  GerardM
>
> On 17 December 2015 at 15:17, Andreas Kolbe  wrote:
>
> > On Wed, Dec 16, 2015 at 11:12 AM, Andrea Zanni wrote:
> >
> > > On Sun, Dec 13, 2015 at 9:35 PM, Jane Darnell wrote:
> > >
> > > > Andrea,
> > > > I totally agree on the mission/vision thing, but am not sure what you
> > > > mean exactly by scale - do you mean that Wikidata shouldn't try to be
> > > > so granular that it has a statement to cover each factoid in any
> > > > Wikipedia article, or do you mean we need to talk about what
> > > > constitutes notability in order not to grow Wikidata exponentially to
> > > > the point the servers crash?
> > > > Jane
> > > >
> > > >
> > > Hi Jane, I explained myself poorly (sometimes English is too difficult
> > > :-)
> > >
> > > What I mean is that the scale of the error *could* be of another scale,
> > > another order of magnitude.
> > > The propagation of the error is multiplied, it's not just a single
> > > error on a wikipage: it's an error propagated in many wikipages, and
> > > then Google, etc.
> > > A single point of failure.
> > >
> > >
> >
> >
> > Exactly: a single point of failure. A system where a single point of
> > failure can have such consequences, potentially corrupting knowledge
> > forever, is a bad system. It's not robust.
> >
> > In the op-ed, I mentioned the Brazilian aardvark hoax[1] as an example of
> > error propagation (which happened entirely without Wikidata's and the
> > Knowledge Graph's help). It took the New Yorker quite a bit of research
> > to piece together and confirm what happened, research which I understand
> > would not have happened if the originator of the hoax had not been
> > willing to talk about his prank.
> >
> > It was the same with the fake Maurice Jarre quotes in Wikipedia[2] that
> > made their way into mainstream press obituaries a few years ago. If the
> > hoaxer had not come forward, no one would have been the wiser. The fake
> > quotes would have remained a permanent part of the historical record.
> >
> > More recent cases include the widely repeated (including by Associated
> > Press, for God's sake, to this day) claim that Joe Streater was involved
> > in the Boston College basketball point shaving scandal[3] and the Amelia
> > Bedelia hoax.[4]
> >
> > If even things people insert as a joke propagate around the globe as a
> > result of this vulnerability, then there is a clear and present potential
> > for purposeful manipulation. We've seen enough cases of that, too.[5]
> >
> > This is not the sort of system the Wikimedia community should be helping
> > to build. The very values at the heart of the Wikimedia movement are
> > about transparency, accountability, multiple points of view, pluralism,
> > democracy, opposing dominance and control by vested interests, and so
> > forth.
> >
> > What is the way forward?
> >
> > Wikidata should, as a matter of urgency, rescind its decision to make its
> > content available under the CC0 licence. Global propagation without
> > attribution is a terrible idea.
> >
> > Quite apart from that, in my opinion Wikidata's CC0 licensing also
> > infringes Wikipedia contributors' rights as enshrined in Wikipedia's CC
> > BY-SA licence, a

Re: [Wikimedia-l] Quality issues

2015-12-18 Thread Peter Southwood

Wikipedia is not about infoboxes; they are (and are intended to be) a small
to very small part of the article in most cases. Similarly, Wikipedias are
not databases, so, also without being a lawyer, I think your interpretation
is wrong.
Cheers,
Peter

-Original Message-
From: Wikimedia-l [mailto:wikimedia-l-boun...@lists.wikimedia.org] On Behalf Of 
Andreas Kolbe
Sent: Friday, 18 December 2015 10:06 AM
To: Wikimedia Mailing List
Subject: Re: [Wikimedia-l] Quality issues

Gerard,

Of course you can't license or copyright facts, but as the WMF legal team's 
page on this topic[1] outlines, there are database and compilation rights that 
exist independently of copyright. IANAL, but as I read that page, if you simply 
go ahead and copy all the infobox, template etc. content from a Wikipedia, this 
"would likely be a violation" even under US law (not to mention EU law).

I don't know why Wikipedia was set up with a CC BY-SA licence rather than a
CC0 licence, and the attribution required under CC BY-SA is unduly cumbersome, 
but attribution has always seemed to me like a useful concept.
The fact that people like VDM Publishing who sell Wikipedia articles as books 
are required to say that their material comes from Wikipedia is useful, for 
example.

Naturally it fosters re-use if you make Wikidata CC0, but that's precisely the 
point: you end up with a level of "market dominance" that just ain't healthy.

[1] https://meta.wikimedia.org/wiki/Wikilegal/Database_Rights


Re: [Wikimedia-l] Quality issues

2015-12-18 Thread Jane Darnell

The only infoboxes I have touched on Wikipedia in relation to Wikidata are
the ones I created with data from Wikidata with PrepBio and not the other
way around. As far as I know there is no tool available to import Wikidata
statements from Wikipedia infoboxes. This is why it took so long to get rid
of the persondata infoboxes, because the data was not formatted in a way
that was easily importable into Wikidata. Eventually the persondata was
deleted because the birth/death data was updated in Wikidata, albeit in a
different way. Unfortunately we lost all of the alternate spellings that
could have been added to the aliases on Wikidata, but I was delighted that
Maarten Dammers was able to upload aliases for artists into Wikidata last
week from ULAN, which means we now have way more aliases per artist
available for searching than we ever had on Wikipedia.

On Fri, Dec 18, 2015 at 9:24 AM, Peter Southwood <
peter.southw...@telkomsa.net> wrote:

> Wikipedia is not about infoboxes, they are (and are intended to be) a
> small to very small part of the article in most cases. Similarly,
> Wikipedias are not databases, so also without being a lawyer, I think your
> interpretation is wrong.
> Cheers,
> Peter
>
> -Original Message-
> From: Wikimedia-l [mailto:wikimedia-l-boun...@lists.wikimedia.org] On
> Behalf Of Andreas Kolbe
> Sent: Friday, 18 December 2015 10:06 AM
> To: Wikimedia Mailing List
> Subject: Re: [Wikimedia-l] Quality issues
>
> Gerard,
>
> Of course you can't license or copyright facts, but as the WMF legal
> team's page on this topic[1] outlines, there are database and compilation
> rights that exist independently of copyright. IANAL, but as I read that
> page, if you simply go ahead and copy all the infobox, template etc.
> content from a Wikipedia, this "would likely be a violation" even under US
> law (not to mention EU law).
>
> I don't know why Wikipedia was set up with a CC BY-SA licence rather than a
> CC0 licence, and the attribution required under CC BY-SA is unduly
> cumbersome, but attribution has always seemed to me like a useful concept.
> The fact that people like VDM Publishing who sell Wikipedia articles as
> books are required to say that their material comes from Wikipedia is
> useful, for example.
>
> Naturally it fosters re-use if you make Wikidata CC0, but that's precisely
> the point: you end up with a level of "market dominance" that just ain't
> healthy.
>
> [1] https://meta.wikimedia.org/wiki/Wikilegal/Database_Rights

Re: [Wikimedia-l] Quality issues

2015-12-18 Thread Andreas Kolbe

On Fri, Dec 18, 2015 at 8:24 AM, Peter Southwood <
peter.southw...@telkomsa.net> wrote:

> Wikipedia is not about infoboxes, they are (and are intended to be) a
> small to very small part of the article in most cases. Similarly,
> Wikipedias are not databases, so also without being a lawyer, I think your
> interpretation is wrong.

If you look at the Meta document I linked, you'll find that the definition
of a database provided there is quite broad:

---o0o---

From a legal perspective, a database is any organized collection of
materials — hard copy or electronic — that permits a user to search for and
access individual pieces of information contained within the materials. No
database software, as a programmer would understand it, is necessary. In
the US, for example, Black’s Law Dictionary defines a database as a
"compilation of information arranged in a systematic way and offering a
means of finding specific elements it contains, often today by electronic
means."[1] Databases may be protected by US copyright law as
"compilations." In the EU, databases are protected by the Database
Directive, which defines a database as "a collection of independent works,
data or other materials arranged in a systematic or methodical way and
individually accessible by electronic or other means."

---o0o---

You could argue that the sum of Wikipedia's harvestable infoboxes,
templates etc. constitutes a database, according to those definitions.

There is also the argument about the benefit of attribution, as opposed to
having data appear out of nowhere in a way that is completely opaque to end
users.

On Fri, Dec 18, 2015 at 10:21 AM, Gerard Meijssen  wrote:

> Hoi,
> The CC-0 license was set up with the express reason that everybody can use
> our data without any impediment.  Our objective is to share in the sum of
> all knowledge and we are more effective in that way.
>

> We do not care about market dominance, we care about doing our utmost to
> have the best data available.

Are these not just well-worn platitudes? If you cared so much about
quality, you or someone else would have fixed the Grasulf II of Friuli
entry by now.


Re: [Wikimedia-l] Quality issues

2015-12-18 Thread Peter Southwood

Depending on how broadly you want to stretch it, that covers an
encyclopaedia or even a public library.
Not particularly helpful.
Also there is the matter of how much is taken from it in the form of data:
there is likely to be much more data available in the articles than is or
will ever be used by Wikidata.
You could equally, possibly more convincingly, argue that the sum of
Wikipedia's infoboxes, templates etc. does not constitute a database,
particularly since that was not the intention, and they have not been
applied consistently and/or systematically to the whole project.
Cheers,
P

-Original Message-
From: Wikimedia-l [mailto:wikimedia-l-boun...@lists.wikimedia.org] On Behalf Of 
Andreas Kolbe
Sent: Friday, 18 December 2015 1:05 PM
To: Wikimedia Mailing List
Subject: Re: [Wikimedia-l] Quality issues

On Fri, Dec 18, 2015 at 8:24 AM, Peter Southwood < 
peter.southw...@telkomsa.net> wrote:

> Wikipedia is not about infoboxes, they are (and are intended to be) a 
> small to very small part of the article in most cases. Similarly, 
> Wikipedias are not databases, so also without being a lawyer, I think 
> your interpretation is wrong.

If you look at the Meta document I linked, you'll find that the definition of a 
database provided there is quite broad:

---o0o---

From a legal perspective, a database is any organized collection of materials — 
hard copy or electronic — that permits a user to search for and access 
individual pieces of information contained within the materials. No database 
software, as a programmer would understand it, is necessary. In the US, for 
example, Black’s Law Dictionary defines a database as a "compilation of 
information arranged in a systematic way and offering a means of finding 
specific elements it contains, often today by electronic means."[1] Databases 
may be protected by US copyright law as "compilations." In the EU, databases 
are protected by the Database Directive, which defines a database as "a 
collection of independent works, data or other materials arranged in a 
systematic or methodical way and individually accessible by electronic or other 
means."

---o0o---

You could argue that the sum of Wikipedia's harvestable infoboxes, templates 
etc. constitutes a database, according to those definitions.

There is also the argument about the benefit of attribution, as opposed to 
having data appear out of nowhere in a way that is completely opaque to end 
users.

On Fri, Dec 18, 2015 at 10:21 AM, Gerard Meijssen <gerard.meijs...@gmail.com
> wrote:

> Hoi,
> The CC-0 license was set up with the express reason that everybody can 
> use our data without any impediment.  Our objective is to share in the 
> sum of all knowledge and we are more effective in that way.
>

> We do not care about market dominance, we care about doing our utmost 
> to have the best data available.

Are these not just well-worn platitudes? If you cared so much about quality, 
you or someone else would have fixed the Grasulf II of Friuli entry by now.

___
Wikimedia-l mailing list, guidelines at: 
https://meta.wikimedia.org/wiki/Mailing_lists/Guidelines
New messages to: Wikimedia-l@lists.wikimedia.org
Unsubscribe: https://lists.wikimedia.org/mailman/listinfo/wikimedia-l, 
<mailto:wikimedia-l-requ...@lists.wikimedia.org?subject=unsubscribe>


Re: [Wikimedia-l] Quality issues

2015-12-18 Thread Gerard Meijssen

Hoi,
I have made changes to Grasulf II and I believe it is better because of
them. If you find fault, you can do what I often do: make a difference. Yes,
I do edit Wikipedia occasionally based on the info that I find.
Thanks,
  GerardM

On 18 December 2015 at 12:04, Andreas Kolbe  wrote:

> On Fri, Dec 18, 2015 at 8:24 AM, Peter Southwood <
> peter.southw...@telkomsa.net> wrote:
>
> > Wikipedia is not about infoboxes, they are (and are intended to be) a
> > small to very small part of the article in most cases. Similarly,
> > Wikipedias are not databases, so also without being a lawyer, I think
> your
> > interpretation is wrong.
>
>
>
> If you look at the Meta document I linked, you'll find that the definition
> of a database provided there is quite broad:
>
> ---o0o---
>
> From a legal perspective, a database is any organized collection of
> materials — hard copy or electronic — that permits a user to search for and
> access individual pieces of information contained within the materials. No
> database software, as a programmer would understand it, is necessary. In
> the US, for example, Black’s Law Dictionary defines a database as a
> "compilation of information arranged in a systematic way and offering a
> means of finding specific elements it contains, often today by electronic
> means."[1] Databases may be protected by US copyright law as
> "compilations." In the EU, databases are protected by the Database
> Directive, which defines a database as "a collection of independent works,
> data or other materials arranged in a systematic or methodical way and
> individually accessible by electronic or other means."
>
> ---o0o---
>
> You could argue that the sum of Wikipedia's harvestable infoboxes,
> templates etc. constitutes a database, according to those definitions.
>
> There is also the argument about the benefit of attribution, as opposed to
> having data appear out of nowhere in a way that is completely opaque to end
> users.
>
>
> On Fri, Dec 18, 2015 at 10:21 AM, Gerard Meijssen <
> gerard.meijs...@gmail.com
> > wrote:
>
> > Hoi,
> > The CC-0 license was set up with the express reason that everybody can
> use
> > our data without any impediment.  Our objective is to share in the sum of
> > all knowledge and we are more effective in that way.
> >
>
>
> > We do not care about market dominance, we care about doing our utmost to
> > have the best data available.
>
>
>
> Are these not just well-worn platitudes? If you cared so much about
> quality, you or someone else would have fixed the Grasulf II of Friuli
> entry by now.
>
>
>
>
> > On 18 December 2015 at 09:05, Andreas Kolbe  wrote:
> >
> > > Gerard,
> > >
> > > Of course you can't license or copyright facts, but as the WMF legal
> > > team's page on this topic[1] outlines, there are database and
> > > compilation rights that exist independently of copyright. IANAL, but
> > > as I read that page, if you simply go ahead and copy all the infobox,
> > > template etc. content from a Wikipedia, this "would likely be a
> > > violation" even under US law (not to mention EU law).
> > >
> > > I don't know why Wikipedia was set up with a CC BY-SA licence rather
> > > than a CC0 licence, and the attribution required under CC BY-SA is
> > > unduly cumbersome, but attribution has always seemed to me like a
> > > useful concept. The fact that people like VDM Publishing who sell
> > > Wikipedia articles as books are required to say that their material
> > > comes from Wikipedia is useful, for example.
> > >
> > > Naturally it fosters re-use if you make Wikidata CC0, but that's
> > > precisely the point: you end up with a level of "market dominance"
> > > that just ain't healthy.
> > >
> > > [1] https://meta.wikimedia.org/wiki/Wikilegal/Database_Rights
> >
> >

Re: [Wikimedia-l] Quality issues

2015-12-18 Thread Andrea Zanni

On Thu, Dec 17, 2015 at 3:17 PM, Andreas Kolbe  wrote:

> > A single point of failure.
> >
>
> Exactly: a single point of failure. A system where a single point of
> failure can have such consequences, potentially corrupting knowledge
> forever, is a bad system. It's not robust.

Andreas, you apparently did not read the following sentence:
"Of course, the opposite is also true: it's a single point of openness,
correction, information. "

In the end, I agree with Gerard:
you seem not to accept other people's arguments and continue to reiterate
yours again and again.
The problem, to me, is that you don't like wikis: you don't like that they
are open, prone to errors and vulnerable. Yet this is our greatest
weakness and our greatest strength at the same time.
The Wikimedia movement has believed in this for at least the last 15 years;
it is one of our pillars.
So, if you don't like it, maybe the Wikimedia movement is not suitable for
you; maybe you'd prefer working on Citizendium or something. There's no
shame in it, and I really believe that: it's just a matter of choice.

I personally choose to believe in openness as a way to leverage people's
good will and willingness to share knowledge. I believe Wikidata is going in
the same direction, and I have not yet found evidence that the "power and
centralisation" of data makes openness a problem of a different
magnitude than it is on Wikipedia.

I'm happy to discuss this point specifically, as I think we can have a
reasonable and constructive debate on this.

But if you reiterate examples from Wikipedia, you lose me. We have already
made a choice: we believe that the trade-off between openness and control is
worth it.

>Are these not just well-worn platitudes? If you cared so much about
>quality, you or someone else would have fixed the Grasulf II of Friuli
>entry by now.

You are included in the set of "someone else": you found all the errors,
and you could have corrected them. You decided it was best to write a very
long mail instead of correcting them. It's your right, but it's not the
Wikimedia way.
The Wikimedia way is wonderfully explained in three magical words: so fix
it [1].

Aubrey

[1] https://en.wikipedia.org/wiki/Template:Sofixit

Re: [Wikimedia-l] Quality issues

2015-12-18 Thread Andreas Kolbe

On Fri, Dec 18, 2015 at 11:28 AM, Andrea Zanni 
wrote:

> Andreas, you apparently did not read the following sentence:
> "Of course, the opposite is also true: it's a single point of openness,
> correction, information. "
>

Andrea,

I understand and appreciate your point, but I would like you to consider
that what you say may be less true of Wikidata than it is for other
Wikimedia wikis, for several reasons:

Wikipedia, Wiktionary etc. are functionally open and correctable because
people by and large view their content on Wikipedia, Wiktionary etc. itself
(or in places where the provenance is clearly indicated, thanks to CC
BY-SA). The place where you read it is the same place where you can edit
it. There is an "Edit" tab, and it really *is* easy to change the content.
(It is certainly easy to correct a typo, which is how many of us started.)

With Wikidata, this is different. Wikidata, as a semantic wiki, is designed
to be read by machines. These machines don't edit, they *propagate*.
Wikidata is not a site that end users--human beings--will browse and
consult the way people consult Wikipedia, Wiktionary, Commons, etc.

Wikidata is, or will be, of interest mostly to re-users--search engines and
other intermediaries who will use its machine-readable data as an input to
build and design their own content. And when they use Wikidata as an input,
they don't have to acknowledge the source.

Allowing unattributed re-use may *seem* more open. But I contend that in
practice it makes Wikidata *less* open as a wiki: because when people don't
know where the information comes from, they are also unable to contribute
at source. The underlying Wikimedia project effectively becomes invisible
to them, a closed book.

That is not good for a crowdsourced project from multiple points of view.

Firstly, it impedes recruitment. Far fewer consumers of Wikidata
information will become Wikidata editors, because they will typically find
Wikidata content on other sites where Wikidata is not even mentioned.

Secondly, it reduces transparency. Data provenance is important, as Mark
Graham and Heather Ford have pointed out.

Thirdly, it fails to encourage appropriate vigilance in the consumer. (The
error propagation problems I've described in this thread all involved
unattributed re-use of Wikimedia content.)

There are other reasons why Wikidata is less open, besides CC0 and the lack
of attribution.

Wikidata is the least user-friendly Wikimedia wiki. The hurdle that
newbies--even experienced Wikimedians--have to overcome to contribute is an
order of magnitude higher than it is for other Wikimedia projects.

For a start, there is no Edit tab at the top of the page. When you go to
Barack Obama's entry in Wikidata[1] for example, the word "Edit" is not to
be found anywhere on the page. It does not look like a page you can edit
(and indeed, members of the public can't edit it).

It took me a while to figure out that the item is protected (just like the
Jerusalem item).

In other Wikimedia wikis that do have an "Edit" tab, that tab changes to
"View source" if the page is protected, giving a visual indication of the
page's status that people--Wikimedia insiders at least--can recognise.

Unprotected Wikidata items do have "edit" and "add" links, but they are
less prominent. (The "add" link for adding new properties is hidden away at
the very bottom of the page.) And when you do click "edit" or "add", it is
not obvious what you are supposed to do, the way it is in text-based wikis.

The learning curve involved in actually editing a Wikidata item is far
steeper than it is in other Wikimedia wikis. There is no Wikidata
equivalent of the "correcting a typo" edit in Wikipedia. You need to go
away and learn the syntax before you can do anything at all in Wikidata.

For all of these reasons I believe the systemic balance between information
delivery (output) and ease of contribution (input) is substantially
different for Wikidata than it is for any other Wikimedia wiki.

> So, if you don't like it, maybe the Wikimedia movements is not suitable for
> you, maybe you'd like more working in Citizendium or something. There's no
> shame in it, and I really believe it: it's just a matter of choice.
>

I have been contributing to Wikimedia projects for ten years now. I
consider it an important movement to be involved in, exactly per your
arguments about openness and public involvement above. If openness is a
strength, then it follows that Wikimedia as a movement is stronger for
debate and dissent.

On a more personal level, I find the idea of free knowledge inspiring. At
every Wikimedia event I have attended, that excitement and the joy of
creation are in the air and communicate themselves. I relate to it, and
share in it. There are many Wikimedia content creators whose work I
admire and respect, and who have become friends.

But I don't share the quasi-religious zeal that seems to suffuse some of
the public discourse in

Re: [Wikimedia-l] Quality issues

2015-12-18 Thread Gerard Meijssen

Hoi,
Andreas, you have a point. Your argument that Wikidata is only
considered for re-use is compelling. I edit a great deal, but I do NOT use
Wikidata itself to understand what data is there. It is a mess and not fit
for humans. This, however, need not be so. Magnus created the
"Reasonator" and it provides me with an environment that helps me
understand what data is available. It makes information out of data, and it
is actionable in many ways.

It is not really hard to make a native Reasonator, and it would be usable in
any language as it is. It would make a big difference because it would
negate the negative arguments that you make. The lack of a usable user
interface is, imho, the biggest hurdle for Wikidata, and it is totally
unnecessary for the Wikidata team to persist in it. It is a matter of
priority.

For your information, a German university is considering the use of
Wikidata, and a Reasonator-like interface that also allows them to edit
is what they are missing to go ahead with Wikidata at this time.

They would use Wikidata for science, and for them the ability to link from
Wikidata to any and all other resources is a relevant consideration.

They are interested in sharing their data. They do not mind that it becomes
available under CC-0; what they look for is a best practice where their
data becomes available with a reference. We all agree that this IS a best
practice. They are equally interested in learning where Wikidata disagrees,
because to them it is a matter of quality to get things exactly right.
Thanks,
  GerardM

On 18 December 2015 at 16:06, Andreas Kolbe  wrote:

> On Fri, Dec 18, 2015 at 11:28 AM, Andrea Zanni 
> wrote:
>
> > Andreas, you apparently did not read the following sentence:
> > "Of course, the opposite is also true: it's a single point of openness,
> > correction, information. "
> >
>
> Andrea,
>
> I understand and appreciate your point, but I would like you to consider
> that what you say may be less true of Wikidata than it is for other
> Wikimedia wikis, for several reasons:
>
> Wikipedia, Wiktionary etc. are functionally open and correctable because
> people by and large view their content on Wikipedia, Wiktionary etc. itself
> (or in places where the provenance is clearly indicated, thanks to CC
> BY-SA). The place where you read it is the same place where you can edit
> it. There is an "Edit" tab, and it really *is* easy to change the content.
> (It is certainly easy to correct a typo, which is how many of us started.)
>
> With Wikidata, this is different. Wikidata, as a semantic wiki, is designed
> to be read by machines. These machines don't edit, they *propagate*.
> Wikidata is not a site that end users--human beings--will browse and
> consult the way people consult Wikipedia, Wiktionary, Commons, etc.
>
> Wikidata is, or will be, of interest mostly to re-users--search engines and
> other intermediaries who will use its machine-readable data as an input to
> build and design their own content. And when they use Wikidata as an input,
> they don't have to acknowledge the source.
>
> Allowing unattributed re-use may *seem* more open. But I contend that in
> practice it makes Wikidata *less* open as a wiki: because when people don't
> know where the information comes from, they are also unable to contribute
> at source. The underlying Wikimedia project effectively becomes invisible
> to them, a closed book.
>
> That is not good for a crowdsourced project from multiple points of view.
>
> Firstly, it impedes recruitment. Far fewer consumers of Wikidata
> information will become Wikidata editors, because they will typically find
> Wikidata content on other sites where Wikidata is not even mentioned.
>
> Secondly, it reduces transparency. Data provenance is important, as Mark
> Graham and Heather Ford have pointed out.
>
> Thirdly, it fails to encourage appropriate vigilance in the consumer. (The
> error propagation problems I've described in this thread all involved
> unattributed re-use of Wikimedia content.)
>
> There are other reasons why Wikidata is less open, besides CC0 and the lack
> of attribution.
>
> Wikidata is the least user-friendly Wikimedia wiki. The hurdle that
> newbies--even experienced Wikimedians--have to overcome to contribute is an
> order of magnitude higher than it is for other Wikimedia projects.
>
> For a start, there is no Edit tab at the top of the page. When you go to
> Barack Obama's entry in Wikidata[1] for example, the word "Edit" is not to
> be found anywhere on the page. It does not look like a page you can edit
> (and indeed, members of the public can't edit it).
>
> It took me a while to figure out that the item is protected (just like the
> Jerusalem item).
>
> In other Wikimedia wikis that do have an "Edit" tab, that tab changes to
> "View source" if the page is protected, giving a visual indication of the
> page's status that people--Wikimedia insiders at least--can

Re: [Wikimedia-l] Quality issues

2015-12-17 Thread Andreas Kolbe

On Wed, Dec 16, 2015 at 11:12 AM, Andrea Zanni 
wrote:

> On Sun, Dec 13, 2015 at 9:35 PM, Jane Darnell  wrote:
>
> > Andrea,
> > I totally agree on the mission/vision thing, but am not sure what you
> mean
> > exactly by scale - do you mean that Wikidata shouldn't try to be so
> > granular that it has a statement to cover each factoid in any Wikipedia
> > article, or do you mean we need to talk about what constitutes notability
> > in order not to grow Wikidata exponentially to the point the servers
> crash?
> > Jane
> >
> >
> Hi Jane, I explained myself poorly (sometime English is too difficult :-)
>
> What I mean is that the scale of the error *could* be of another scale,
> another order of magnitude.
> The propagation of the error is multiplied, it's not just a single error on
> a wikipage: it's an error propagated in many wikipages, and then Google,
> etc.
> A single point of failure.
>

Exactly: a single point of failure. A system where a single point of
failure can have such consequences, potentially corrupting knowledge
forever, is a bad system. It's not robust.

In the op-ed, I mentioned the Brazilian aardvark hoax[1] as an example of
error propagation (which happened entirely without Wikidata's and the
Knowledge Graph's help). It took the New Yorker quite a bit of research to
piece together and confirm what happened, research which I understand would
not have happened if the originator of the hoax had not been willing to
talk about his prank.

It was the same with the fake Maurice Jarre quotes in Wikipedia[2] that
made their way into mainstream press obituaries a few years ago. If the
hoaxer had not come forward, no one would have been the wiser. The fake
quotes would have remained a permanent part of the historical record.

More recent cases include the widely repeated (including by Associated
Press, for God's sake, to this day) claim that Joe Streater was involved in
the Boston College basketball point shaving scandal[3] and the Amelia
Bedelia hoax.[4]

If even things people insert as a joke propagate around the globe as a
result of this vulnerability, then there is a clear and present potential
for purposeful manipulation. We've seen enough cases of that, too.[5]

This is not the sort of system the Wikimedia community should be helping to
build. The very values at the heart of the Wikimedia movement are about
transparency, accountability, multiple points of view, pluralism,
democracy, opposing dominance and control by vested interests, and so
forth.

What is the way forward?

Wikidata should, as a matter of urgency, rescind its decision to make its
content available under the CC0 licence. Global propagation without
attribution is a terrible idea.

Quite apart from that, in my opinion Wikidata's CC0 licensing also
infringes Wikipedia contributors' rights as enshrined in Wikipedia's CC
BY-SA licence, a point Lydia Pintscher did not even contest on the Signpost
talk page. As I understand her response,[6] she restricts herself to
asserting that the responsibility for any potential licence infringement
lies with Wikidata contributors rather than with her and Wikimedia
Deutschland. That's passing the buck.

If Wikidata is not prepared to follow CC BY-SA, the way DBpedia does[7],
the next step should be a DMCA takedown notice for material mass-imported
from Wikipedia.

And of course, Wikidata needs to step up its efforts to cite verifiable
sources.

[1] http://www.newyorker.com/tech/elements/how-a-raccoon-became-an-aardvark
[2]
http://www.theguardian.com/commentisfree/2009/may/04/journalism-obituaries-shane-fitzgerald
[3]
http://awfulannouncing.com/2014/guilt-wikipedia-joe-streater-became-falsely-attached-boston-college-point-shaving-scandal.html
Associated Press:
http://bigstory.ap.org/article/list-worst-scandals-college-sports
[4] http://www.dailydot.com/lol/amelia-bedelia-wikipedia-hoax/
[5]
http://www.newsweek.com/2015/04/03/manipulating-wikipedia-promote-bogus-business-school-316133.html
and
http://www.dailydot.com/lifestyle/wikipedia-plastic-surgery-otto-placik-labiaplasty/
and many others
[6]
https://en.wikipedia.org/w/index.php?title=Wikipedia_talk:Wikipedia_Signpost/2015-12-09/Op-ed=695228403=695228022
[7] http://wiki.dbpedia.org/terms-imprint

> Of course, the opposite is also true: it's a single point of openness,
> correction, information.
> I was just wondering if this different scale is a factor in making
> Wikipedia and Wikidata different enough to accept/reject Andreas arguments.
>
> Andrea
>
>
>
> > On Sun, Dec 13, 2015 at 7:10 PM, Andrea Zanni 
> > wrote:
> >
> > > I really feel we are drowning in a glass of water.
> > > The issue of "data quality" or "reliability" that Andreas raises is
> well
> > > known:
> > > what I don't understand if the "scale" of it is much bigger on Wikidata
> > > than Wikipedia,
> > > and if this different scale makes it much more important. The scale of
> > the
> > > issue is maybe

Re: [Wikimedia-l] Quality issues

2015-12-16 Thread Jane Darnell

OK I see now what you mean, and that is an interesting point. I think in
this context you need to see the objections to the "Bonnie and Clyde"
problem. Now that we have exploded the concepts of Wikipedia into items,
our interlinking (which is what Wikidata was built for) is a bit less
tightly knit than it was. Some would argue that it's a good thing because
we have fewer unresolvable interwiki links and others would argue it's a
bad thing because they have less opportunity to redirect readers to
material on other projects. Most recently this has come up in the
discussions around structured data for Commons, but early adopters noticed
it immediately in the interlanguage links. The only way forward (or
backward, depending on your point of view) is to explode the Wikipedias in
a similar way. So for example I like to work on 17th-century paintings and
sometimes they are interesting because of their subjects, and sometimes
they are interesting because of their provenance, but rarely both, so
Wikipedia articles generally deal with both. On Wikidata we will often have
items for both (the portrait and the portrayed; or a landscape and the
objects depicted in that landscape) and the interwikis link accordingly,
which means some interwikis disappear because one language Wikipedia
article is talking about the person while another language Wikipedia
article is talking about the painting, and so forth. I guess for Wikisource
it's similar with "Wikisource editions of biographies of people" vs. items
about actual people.

On Wed, Dec 16, 2015 at 12:12 PM, Andrea Zanni 
wrote:

> On Sun, Dec 13, 2015 at 9:35 PM, Jane Darnell  wrote:
>
> > Andrea,
> > I totally agree on the mission/vision thing, but am not sure what you
> mean
> > exactly by scale - do you mean that Wikidata shouldn't try to be so
> > granular that it has a statement to cover each factoid in any Wikipedia
> > article, or do you mean we need to talk about what constitutes notability
> > in order not to grow Wikidata exponentially to the point the servers
> crash?
> > Jane
> >
> >
> Hi Jane, I explained myself poorly (sometime English is too difficult :-)
>
> What I mean is that the scale of the error *could* be of another scale,
> another order of magnitude.
> The propagation of the error is multiplied, it's not just a single error on
> a wikipage: it's an error propagated in many wikipages, and then Google,
> etc.
> A single point of failure.
>
> Of course, the opposite is also true: it's a single point of openness,
> correction, information.
> I was just wondering if this different scale is a factor in making
> Wikipedia and Wikidata different enough to accept/reject Andreas arguments.
>
> Andrea
>
>
>
> > On Sun, Dec 13, 2015 at 7:10 PM, Andrea Zanni 
> > wrote:
> >
> > > I really feel we are drowning in a glass of water.
> > > The issue of "data quality" or "reliability" that Andreas raises is
> well
> > > known:
> > > what I don't understand if the "scale" of it is much bigger on Wikidata
> > > than Wikipedia,
> > > and if this different scale makes it much more important. The scale of
> > the
> > > issue is maybe something worth discussing, and not the issue itself? Is
> > the
> > > fact that Wikidata is centralised different from statements on
> > Wikipedia? I
> > > don't know, but to me this is a more neutral and interesting question.
> > >
> > > I often say that the Wikimedia world made quality an "heisemberghian"
> > > feature: you always have to check if it's there.
> > > The point is: it's been always like this.
> > > We always had to check for quality, even when we used Britannica or
> > > authority controls or whatever "reliable" sources we wanted. Wikipedia,
> > and
> > > now Wikidata, is made for everyone to contribute, it's open and honest
> in
> > > being open, vulnerable, prone to errors. But we are transparent, we say
> > > that in advance,  we can claim any statement to the smallest detail. Of
> > > course it's difficult, but we can do it. Wikidata, as Lydia said, can
> > > actually have conflicting statements in every item: we "just" have to
> put
> > > them there, as we did to Wikipedia.
> > >
> > > If Google uses our data and they are wrong, that's bad for them. If
> they
> > > correct the errors and do not give us the corrections, that's bad for
> us
> > > and not ethical from them. The point is: there is no license (for what
> I
> > > know) that can force them to contribute to Wikidata. That is, IMHO, the
> > > problem with "over-the-top" actors: they can harness collective
> > intelligent
> > > and "not give back." Even with CC-BY-SA, they could store (as they are
> > > probably already doing) all the data in their knowledge vault, which is
> > > secret as it is an incredible asset for them.
> > >
> > > I'd be happy to insert a new clause of "forced transparency" in
> CC-BY-SA
> > or
> > > CC0, but it's not there.
> > >
> > > So, as we are  working via GLAMs

Re: [Wikimedia-l] Quality issues

2015-12-16 Thread Andrea Zanni

On Sun, Dec 13, 2015 at 9:35 PM, Jane Darnell  wrote:

> Andrea,
> I totally agree on the mission/vision thing, but am not sure what you mean
> exactly by scale - do you mean that Wikidata shouldn't try to be so
> granular that it has a statement to cover each factoid in any Wikipedia
> article, or do you mean we need to talk about what constitutes notability
> in order not to grow Wikidata exponentially to the point the servers crash?
> Jane
>
>
Hi Jane, I explained myself poorly (sometimes English is too difficult :-)

What I mean is that the error *could* be of another scale,
another order of magnitude.
The propagation of the error is multiplied: it's not just a single error on
a wikipage, it's an error propagated to many wikipages, and then to Google,
etc.
A single point of failure.

Of course, the opposite is also true: it's a single point of openness,
correction, information.
I was just wondering if this different scale is a factor in making
Wikipedia and Wikidata different enough to accept/reject Andreas's arguments.

Andrea



> On Sun, Dec 13, 2015 at 7:10 PM, Andrea Zanni 
> wrote:
>
> > I really feel we are drowning in a glass of water.
> > The issue of "data quality" or "reliability" that Andreas raises is well
> > known:
> > what I don't understand is whether the "scale" of it is much bigger on Wikidata
> > than Wikipedia,
> > and if this different scale makes it much more important. The scale of
> the
> > issue is maybe something worth discussing, and not the issue itself? Is
> the
> > fact that Wikidata is centralised different from statements on
> Wikipedia? I
> > don't know, but to me this is a more neutral and interesting question.
> >
> > I often say that the Wikimedia world made quality a "Heisenbergian"
> > feature: you always have to check if it's there.
> > The point is: it's been always like this.
> > We always had to check for quality, even when we used Britannica or
> > authority controls or whatever "reliable" sources we wanted. Wikipedia,
> and
> > now Wikidata, is made for everyone to contribute, it's open and honest in
> > being open, vulnerable, prone to errors. But we are transparent, we say
> > that in advance,  we can claim any statement to the smallest detail. Of
> > course it's difficult, but we can do it. Wikidata, as Lydia said, can
> > actually have conflicting statements in every item: we "just" have to put
> > them there, as we did to Wikipedia.
> >
> > If Google uses our data and they are wrong, that's bad for them. If they
> > correct the errors and do not give us the corrections, that's bad for us
> > and not ethical from them. The point is: there is no license (for what I
> > know) that can force them to contribute to Wikidata. That is, IMHO, the
> > problem with "over-the-top" actors: they can harness collective
> > intelligence and "not give back." Even with CC-BY-SA, they could store (as they are
> > probably already doing) all the data in their knowledge vault, which is
> > secret as it is an incredible asset for them.
> >
> > I'd be happy to insert a new clause of "forced transparency" in CC-BY-SA
> or
> > CC0, but it's not there.
> >
> > So, as we are  working via GLAMs with Wikipedia for getting reliable
> > sources and content, we are working with them also for good statements
> and
> > data. Putting good data in Wikidata makes it better, and I don't
> understand
> > what is the problem here (I understand, again, the issue of putting too
> > much data and still having a small community).
> > For example: if we are importing different reliable databases, and the
> > institutions behind them find it useful and helpful to have an aggregator
> > of identifiers and authority controls, what is the issue? There is value
> in
> > aggregating data, because you can spot errors and inconsistencies. It's
> not
> > easy, of course, to find a good workflow, but, again, that is *another*
> > problem.
> >
> > So, in conclusion: I find many issues in Wikidata, but not on the
> > mission/vision, just in the complexity of the project, the size of the
> > dataset, the size of the community.
> >
> > Can we talk about those?
> >
> > Aubrey
> >
> >
> >
> > On Sun, Dec 13, 2015 at 6:40 PM, Andreas Kolbe 
> wrote:
> >
> > > On Sun, Dec 13, 2015 at 5:32 PM, geni  wrote:
> > >
> > > > On 13 December 2015 at 15:57, Andreas Kolbe 
> > wrote:
> > > >
> > > > > Jane,
> > > > >
> > > > > The issue is that you can't cite one Wikipedia article as a source
> in
> > > > > another.
> > > > >
> > > >
> > > >
> > > > However you can within the same article per [[WP:LEAD]].
> > > >
> > >
> > >
> > > Well, of course, if there are reliable sources cited in the body of the
> > > article that back up the statements made in the lead. You still need to
> > > cite a reliable source though; that's Wikipedia 101.
> > > ___
> > > Wikimedia-l mailing list,

Re: [Wikimedia-l] Quality issues

2015-12-16 Thread Gerard Meijssen

Hoi,
The one thing where Wikidata shines is in connecting sources through
identifiers. It connects all Wikipedias through the interwiki links and
improving these has been an ongoing process of the last three years. Every
week more external identifiers are added and it is in the mix-n-match tool
by Magnus that many of these connections are made.

As more sources are added, the opportunity grows to compare and curate.
Copyright law holds that you cannot use complete databases, but it allows
you to compare and curate. When values match, there is no obvious issue.
When they do not, it is a matter of signalling the difference and
evaluating the opposing values.
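
The compare-and-curate step described above could be sketched roughly as
follows. This is only an illustration in Python: the record layout, the
source names and the function `find_disagreements` are assumptions made for
the example, not an existing Wikidata tool.

```python
from collections import defaultdict


def find_disagreements(records):
    """Return the (item, property) pairs on which the sources disagree.

    `records` is an iterable of (source, item, property, value) tuples.
    This layout is hypothetical and chosen only for the example.
    """
    values = defaultdict(dict)
    for source, item, prop, value in records:
        values[(item, prop)][source] = value
    # Keep only the pairs where more than one distinct value was reported.
    return {
        key: by_source
        for key, by_source in values.items()
        if len(set(by_source.values())) > 1
    }


# Made-up sample data: two sources, two items.
records = [
    ("source_a", "person-1", "date of birth", "1850-03-02"),
    ("source_b", "person-1", "date of birth", "1850-03-02"),
    ("source_a", "person-2", "date of birth", "1901-07-14"),
    ("source_b", "person-2", "date of birth", "1901-07-15"),
]
disagreements = find_disagreements(records)
```

Matching values drop out, so only the second item is left for a human to
evaluate.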

What we should NOT do is accept any value as 100% correct. Sources are
known to be wrong, but where everybody agrees we can set those values aside
and concentrate on where there is a disagreement, where an investment in
time makes the most difference. In this way we make a positive difference
for our own content and, by signalling differences, at the other end as well.

The problem with Andreas's argument is that it does not provide any way
forward. It may be a problem, and then what? By concentrating on what we do
best, sharing in the sum of all available knowledge, we enable parties to
compare their content with all the other parties that have content. We
publish where we find a difference and it is then for us and others to do
the best we can.
Thanks,
  GerardM

On 16 December 2015 at 12:12, Andrea Zanni  wrote:

> On Sun, Dec 13, 2015 at 9:35 PM, Jane Darnell  wrote:
>
> > Andrea,
> > I totally agree on the mission/vision thing, but am not sure what you mean
> > exactly by scale - do you mean that Wikidata shouldn't try to be so
> > granular that it has a statement to cover each factoid in any Wikipedia
> > article, or do you mean we need to talk about what constitutes notability
> > in order not to grow Wikidata exponentially to the point the servers crash?
> > Jane
> >
> >
> Hi Jane, I explained myself poorly (sometimes English is too difficult :-)
>
> What I mean is that the scale of the error *could* be of another scale,
> another order of magnitude.
> The propagation of the error is multiplied, it's not just a single error on
> a wikipage: it's an error propagated in many wikipages, and then Google,
> etc.
> A single point of failure.
>
> Of course, the opposite is also true: it's a single point of openness,
> correction, information.
> I was just wondering if this different scale is a factor in making
> Wikipedia and Wikidata different enough to accept/reject Andreas's arguments.
>
> Andrea
>
>
>
> > On Sun, Dec 13, 2015 at 7:10 PM, Andrea Zanni 
> > wrote:
> >
> > > I really feel we are drowning in a glass of water.
> > > The issue of "data quality" or "reliability" that Andreas raises is
> well
> > > known:
> > > what I don't understand is whether the "scale" of it is much bigger on Wikidata
> > > than Wikipedia,
> > > and if this different scale makes it much more important. The scale of
> > the
> > > issue is maybe something worth discussing, and not the issue itself? Is
> > the
> > > fact that Wikidata is centralised different from statements on
> > Wikipedia? I
> > > don't know, but to me this is a more neutral and interesting question.
> > >
> > > I often say that the Wikimedia world made quality a "Heisenbergian"
> > > feature: you always have to check if it's there.
> > > The point is: it's been always like this.
> > > We always had to check for quality, even when we used Britannica or
> > > authority controls or whatever "reliable" sources we wanted. Wikipedia,
> > and
> > > now Wikidata, is made for everyone to contribute, it's open and honest
> in
> > > being open, vulnerable, prone to errors. But we are transparent, we say
> > > that in advance,  we can claim any statement to the smallest detail. Of
> > > course it's difficult, but we can do it. Wikidata, as Lydia said, can
> > > actually have conflicting statements in every item: we "just" have to
> put
> > > them there, as we did to Wikipedia.
> > >
> > > If Google uses our data and they are wrong, that's bad for them. If
> they
> > > correct the errors and do not give us the corrections, that's bad for
> us
> > > and not ethical from them. The point is: there is no license (for what
> I
> > > know) that can force them to contribute to Wikidata. That is, IMHO, the
> > > problem with "over-the-top" actors: they can harness collective intelligence
> > > and "not give back." Even with CC-BY-SA, they could store (as they are
> > > probably already doing) all the data in their knowledge vault, which is
> > > secret as it is an incredible asset for them.
> > >
> > > I'd be happy to insert a new clause of "forced transparency" in
> CC-BY-SA
> > or
> > > CC0, but it's not there.
> > >
> > > So, as we are  working via GLAMs with Wikipedia for getting reliable
> > > sources and content, we are working with them also for good

Re: [Wikimedia-l] Quality issues

2015-12-15 Thread Henning Schlottmann

On 08.12.2015 00:52, Craig Franklin wrote:
> In a database, we are limited to
> saying that Jerusalem either is or is not the capital of Israel.

We are not. Wikidata is a repository of statements. It can contain both a
statement that Jerusalem is the capital of Israel and another statement
that it is not. In a serious project both would exist, and both would have
other statements linked to them stating who thinks so.
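
As a rough illustration of that idea, here is a toy model in Python. It is
not Wikidata's actual data model: the `Claim` class, its fields and the
placeholder source names are all assumptions made for this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Claim:
    """One recorded view on a question (hypothetical structure)."""
    subject: str
    predicate: str
    value: str
    asserted: bool  # True: "is", False: "is not"
    stated_by: str  # who holds this view (placeholder names below)


claims = [
    Claim("Jerusalem", "capital of", "Israel", True, "placeholder source A"),
    Claim("Jerusalem", "capital of", "Israel", False, "placeholder source B"),
]


def views_on(claims, subject, predicate, value):
    """Return every recorded view on one question, keyed by who states it."""
    return {
        c.stated_by: c.asserted
        for c in claims
        if (c.subject, c.predicate, c.value) == (subject, predicate, value)
    }


views = views_on(claims, "Jerusalem", "capital of", "Israel")
```

Both views are stored side by side, each with its claimant, and neither
overwrites the other.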

Ciao Henning

___
Wikimedia-l mailing list, guidelines at: 
https://meta.wikimedia.org/wiki/Mailing_lists/Guidelines
Wikimedia-l@lists.wikimedia.org
Unsubscribe: https://lists.wikimedia.org/mailman/listinfo/wikimedia-l,

Re: [Wikimedia-l] Quality issues

2015-12-14 Thread Andreas Kolbe

On Mon, Dec 14, 2015 at 7:46 AM, Pete Forsyth  wrote:

> On Sun, Dec 13, 2015 at 4:02 PM, Andreas Kolbe  wrote:
>
> > But at the same time, Wikidata is supposed to inform the Wikipedias, as a
> > central data repository. This creates a mismatch between Wikidata's
> "early
> > days -- anything goes, let's just get content in, we'll sort it out
> later"
> > attitude and the relatively mature Wikipedias where editors insist on
> > sources for any new content added.
> >
> > This out-of-synch-ness is a real problem if you want Wikipedias to
> actually
> > use Wikidata content. Wikipedians will not accept content generation
> models
> > that take Wikipedia back to its bad old days where you could write
> anything
> > you liked without a source to back it up.
>
>
> Andreas, I think there's an important piece you're missing (or at least not
> explicitly acknowledging) here.
>
> Very few of the Wikipedias are "relatively mature." To the extent Wikidata
> is meant to help Wikipedia, I believe it is meant to help the less mature
> Wikipedias benefit from the more robust research into sources etc. that
> takes place at the big ones -- and help the big ones notice when they have
> out-of-sync information from one another, and make informed decisions about
> what to do about it.
>
> The analysis you offer here doesn't seem granular enough to capture this,
> and seems to miss the primary value of Wikidata when it comes to Wikipedia.
>
> Thoughts?
> Pete
>


Pete,

Yes, those are good points I missed.

Andreas

Re: [Wikimedia-l] Quality issues

2015-12-13 Thread Jane Darnell

Thanks for that essay, Lydia! You said it well, and I especially agree with
what you wrote about trust and believing in ourselves. I had to laugh at
some of the comments, because if you substitute "Wikipedia" for "Wikidata"
those comments could have been written 3 years ago before Wikidata came on
the scene.

On Sat, Dec 12, 2015 at 10:18 PM, Lydia Pintscher <
lydia.pintsc...@wikimedia.de> wrote:

> On Thu, Dec 10, 2015 at 9:27 AM, Lydia Pintscher
>  wrote:
> > That is actually not correct. We have built Wikidata from the very
> > beginning with some core beliefs. One of them is that Wikidata isn't
> > supposed to have the one truth but instead is able to represent
> > various different points of view and link to sources claiming these.
> > Look for example at the country statements for Jerusalem:
> > https://www.wikidata.org/wiki/Q1218
> > Now I am the first to say that this will not be able to capture the
> > full complexity of the world around us. But that's not what it is
> > meant to do. However please be aware that we have built more than just
> > a dumb database with Wikidata and have gone to great lengths to make it
> > possible to capture knowledge diversity.
>
> I've taken the time and written a longer piece about data quality and
> knowledge diversity on Wikidata for the current edition of the
> Signpost:
> https://en.wikipedia.org/wiki/Wikipedia:Wikipedia_Signpost/2015-12-09/Op-ed
>
>
> Cheers
> Lydia
>
> --
> Lydia Pintscher - http://about.me/lydia.pintscher
> Product Manager for Wikidata
>
> Wikimedia Deutschland e.V.
> Tempelhofer Ufer 23-24
> 10963 Berlin
> www.wikimedia.de
>
> Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e. V.
>
> Eingetragen im Vereinsregister des Amtsgerichts Berlin-Charlottenburg
> unter der Nummer 23855 Nz. Als gemeinnützig anerkannt durch das
> Finanzamt für Körperschaften I Berlin, Steuernummer 27/681/51985.
>

Re: [Wikimedia-l] Quality issues

2015-12-13 Thread Andreas Kolbe

Jane,

The issue is that you can't cite one Wikipedia article as a source in
another. If, as some envisage, you were to fill Wikipedia's infoboxes with
Wikidata content that's unsourced, or sourced only to a Wikipedia, you'd be
doing exactly that, and violating WP:V in the process:

"Do not use articles from Wikipedia as sources. Also, do not use *websites
that mirror Wikipedia content or publications that rely on material from
Wikipedia as sources*." (WP:CIRCULAR)

That includes Wikidata. As long as Wikidata doesn't provide external
sourcing, it's unusable in Wikipedia.
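
To make the distinction concrete, here is a minimal sketch in Python of the
check I have in mind. The statement layout and the `kind` labels are
hypothetical and only for illustration; they are not how Wikidata actually
stores references.

```python
def has_external_source(statement):
    """True if at least one reference points outside the Wikimedia projects."""
    return any(ref["kind"] == "external" for ref in statement["references"])


# Made-up statements: unsourced, sourced only to a Wikipedia, externally sourced.
statements = [
    {"property": "date of birth", "references": []},
    {"property": "place of birth", "references": [{"kind": "wikipedia"}]},
    {
        "property": "occupation",
        "references": [{"kind": "wikipedia"}, {"kind": "external"}],
    },
]

usable = [s["property"] for s in statements if has_external_source(s)]
```

Only the third statement would pass; the first two are unsourced or circular.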

Andreas

On Sun, Dec 13, 2015 at 9:15 AM, Jane Darnell  wrote:

> Thanks for that essay, Lydia! You said it well, and I especially agree with
> what you wrote about trust and believing in ourselves. I had to laugh at
> some of the comments, because if you substitute "Wikipedia" for "Wikidata"
> those comments could have been written 3 years ago before Wikidata came on
> the scene.
>
> On Sat, Dec 12, 2015 at 10:18 PM, Lydia Pintscher <
> lydia.pintsc...@wikimedia.de> wrote:
>
> > On Thu, Dec 10, 2015 at 9:27 AM, Lydia Pintscher
> >  wrote:
> > > That is actually not correct. We have built Wikidata from the very
> > > > beginning with some core beliefs. One of them is that Wikidata isn't
> > > supposed to have the one truth but instead is able to represent
> > > various different points of view and link to sources claiming these.
> > > Look for example at the country statements for Jerusalem:
> > > https://www.wikidata.org/wiki/Q1218
> > > Now I am the first to say that this will not be able to capture the
> > > full complexity of the world around us. But that's not what it is
> > > meant to do. However please be aware that we have built more than just
> > > > a dumb database with Wikidata and have gone to great lengths to make it
> > > possible to capture knowledge diversity.
> >
> > I've taken the time and written a longer piece about data quality and
> > knowledge diversity on Wikidata for the current edition of the
> > Signpost:
> >
> https://en.wikipedia.org/wiki/Wikipedia:Wikipedia_Signpost/2015-12-09/Op-ed
> >
> >
> > Cheers
> > Lydia
> >
> > --
> > Lydia Pintscher - http://about.me/lydia.pintscher
> > Product Manager for Wikidata
> >
> > Wikimedia Deutschland e.V.
> > Tempelhofer Ufer 23-24
> > 10963 Berlin
> > www.wikimedia.de
> >
> > Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e. V.
> >
> > Eingetragen im Vereinsregister des Amtsgerichts Berlin-Charlottenburg
> > unter der Nummer 23855 Nz. Als gemeinnützig anerkannt durch das
> > Finanzamt für Körperschaften I Berlin, Steuernummer 27/681/51985.
> >

Re: [Wikimedia-l] Quality issues

2015-12-13 Thread geni

On 13 December 2015 at 15:57, Andreas Kolbe  wrote:

> Jane,
>
> The issue is that you can't cite one Wikipedia article as a source in
> another.
>


However you can within the same article per [[WP:LEAD]].

-- 
geni

Re: [Wikimedia-l] Quality issues

2015-12-13 Thread Andreas Kolbe

On Sun, Dec 13, 2015 at 5:32 PM, geni  wrote:

> On 13 December 2015 at 15:57, Andreas Kolbe  wrote:
>
> > Jane,
> >
> > The issue is that you can't cite one Wikipedia article as a source in
> > another.
> >
>
>
> However you can within the same article per [[WP:LEAD]].
>

Well, of course, if there are reliable sources cited in the body of the
article that back up the statements made in the lead. You still need to
cite a reliable source though; that's Wikipedia 101.

Re: [Wikimedia-l] Quality issues

2015-12-13 Thread Gerard Meijssen

Hoi,
Wikidata is not Wikipedia. When data is imported from Wikipedia, it often
says so. That does not mean that all the related data comes from one
Wikipedia; consequently the composite data may differ in relevant ways.

Again you insist on your point of view. If you think that Wikidata is
inferior for the reasons that you give, fine. Never mind, move on.

In the meantime we will continually improve the quality of Wikidata and
when Wikipedians fail to take notice they will find slowly but surely that
the information in Wikidata is increasingly superior in the one area where
it is most obvious: the silly mistakes that come to light when it is not
only one Wikipedia that is the source of data.
Thanks,
 GerardM

On 13 December 2015 at 16:57, Andreas Kolbe  wrote:

> Jane,
>
> The issue is that you can't cite one Wikipedia article as a source in
> another. If, as some envisage, you were to fill Wikipedia's infoboxes with
> Wikidata content that's unsourced, or sourced only to a Wikipedia, you'd be
> doing exactly that, and violating WP:V in the process:
>
> "Do not use articles from Wikipedia as sources. Also, do not use *websites
> that mirror Wikipedia content or publications that rely on material from
> Wikipedia as sources*." (WP:CIRCULAR)
>
> That includes Wikidata. As long as Wikidata doesn't provide external
> sourcing, it's unusable in Wikipedia.
>
> Andreas
>
> On Sun, Dec 13, 2015 at 9:15 AM, Jane Darnell  wrote:
>
> > Thanks for that essay, Lydia! You said it well, and I especially agree with
> > what you wrote about trust and believing in ourselves. I had to laugh at
> > some of the comments, because if you substitute "Wikipedia" for "Wikidata"
> > those comments could have been written 3 years ago before Wikidata came on
> > the scene.
> >
> > On Sat, Dec 12, 2015 at 10:18 PM, Lydia Pintscher <
> > lydia.pintsc...@wikimedia.de> wrote:
> >
> > > On Thu, Dec 10, 2015 at 9:27 AM, Lydia Pintscher
> > >  wrote:
> > > > That is actually not correct. We have built Wikidata from the very
> > > > beginning with some core beliefs. One of them is that Wikidata isn't
> > > > supposed to have the one truth but instead is able to represent
> > > > various different points of view and link to sources claiming these.
> > > > Look for example at the country statements for Jerusalem:
> > > > https://www.wikidata.org/wiki/Q1218
> > > > Now I am the first to say that this will not be able to capture the
> > > > full complexity of the world around us. But that's not what it is
> > > > meant to do. However please be aware that we have built more than just
> > > > a dumb database with Wikidata and have gone to great lengths to make it
> > > > possible to capture knowledge diversity.
> > >
> > > I've taken the time and written a longer piece about data quality and
> > > knowledge diversity on Wikidata for the current edition of the
> > > Signpost:
> > >
> >
> https://en.wikipedia.org/wiki/Wikipedia:Wikipedia_Signpost/2015-12-09/Op-ed
> > >
> > >
> > > Cheers
> > > Lydia
> > >
> > > --
> > > Lydia Pintscher - http://about.me/lydia.pintscher
> > > Product Manager for Wikidata
> > >
> > > Wikimedia Deutschland e.V.
> > > Tempelhofer Ufer 23-24
> > > 10963 Berlin
> > > www.wikimedia.de
> > >
> > > Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e. V.
> > >
> > > Eingetragen im Vereinsregister des Amtsgerichts Berlin-Charlottenburg
> > > unter der Nummer 23855 Nz. Als gemeinnützig anerkannt durch das
> > > Finanzamt für Körperschaften I Berlin, Steuernummer 27/681/51985.
> > >

Re: [Wikimedia-l] Quality issues

2015-12-13 Thread Gerard Meijssen

Hoi,
Thank you for another approach. When Wikidata imports data from Wikipedia,
it essentially stands on the shoulders of giants. Yes, there are sources in
Wikipedia and it does not prevent occasional issues. Yes, we import a lot
of data from Wikipedia and this makes life at Wikidata easy and what we do
obvious. It all started with improving quality at Wikidata by making
interwiki links manageable and we are still often involved in fixing
wikilinks in Wikidata because the assumptions to link some articles are
"funny".

When you look at Wikipedia, a lot of the fixtures are essentially about
data. A category or a list can be replicated in many ways by querying
Wikidata. The inverse is that Wikidata can be populated from Wikipedia.
That is why we can say that we know about men and women in so many
Wikipedias. When Wikipedia is correct, Wikidata is. When Wikipedias do
not agree, you will find this expressed in Wikidata.

When people build tools and bots, and they have done so for a long time, it
is EXACTLY based on the assumption that Wikipedia is essentially correct,
and it is why the quality and quantity of Wikidata is already this good. When
you want to consider Wikidata and its complexity, it is important to look
at the statistics. The statistics by Magnus are the most relevant because
they help explain many of the issues of Wikidata.

One important point. No Wikipedia can claim Wikidata as it is a composite.
Wikipedia policies do not apply. When people insist that all the data in
Wikidata has to be 100% correct, forget it. Wikipedia is not 100% correct
either and that is what we build upon. It has never been this way and it is
impossible to do this any time soon.

What we can do is build upon existing qualities, compare and curate. It is
for instance fairly easy to improve on Wikipedia based upon the information
that is already there but shown to be problematic. It is easy when we
collaborate as it will improve the quality of what we offer. One problem is
that we are SO bad at collaboration. Wikipedians work on one article at a
time and when I work on awards there are easily 60 persons involved and I
trust Wikipedia to be right. The kind of issues I encounter I blog about
regularly. I am not involved in single items or they have to be of
relevance to me like Bassel, the only Wikipedian sentenced to death. So I
did add new items that exist as red links in the award he received and I
did ask Magnus to help me with a list for the award he received. I added
the website I used on the award and that is as far as I go.

When you want to talk about the issues, what is it that you want to
achieve? So far there has been little interest in Wikidata. When you want
to learn about issues, research the issues. Find methods to calculate the
error rate, find methods to compare Wikidata with the Wikipedias and with
other sources in a meaningful way. But do approach it like Magnus does. His
contributions help us make a positive difference. When you find numbers for
now that you cannot replicate with the next dump and the next, they are
essentially without much value because they do not enable us to improve on
what we have. They do not help us engage our minds to make a difference. I
ask Amir regularly to run a bot based on the statistics produced by Magnus;
we are not yet at the stage where we have such tasks automated...

Andrea, Wikidata is a wiki. It is young and it has already proven itself
for several applications. What can be done improves as our data improves.
We have a lack of data on many subjects because it is where Wikipedia is
lacking. How will we approach for instance the fact that we have fewer than
1000 Syrians and one of them is an emperor of the Roman empire and another
is Bassel?

Let us be bold and allow us to be a wiki. Let us work towards the quality
that is possible to achieve and do not burden us with the assumptions of
some Wikipedias. When you are serious, get involved.
Thanks,
  GerardM

On 13 December 2015 at 19:10, Andrea Zanni  wrote:

> I really feel we are drowning in a glass of water.
> The issue of "data quality" or "reliability" that Andreas raises is well
> known:
> what I don't understand is whether the "scale" of it is much bigger on Wikidata
> than Wikipedia,
> and if this different scale makes it much more important. The scale of the
> issue is maybe something worth discussing, and not the issue itself? Is the
> fact that Wikidata is centralised different from statements on Wikipedia? I
> don't know, but to me this is a more neutral and interesting question.
>
> I often say that the Wikimedia world made quality a "Heisenbergian"
> feature: you always have to check if it's there.
> The point is: it's been always like this.
> We always had to check for quality, even when we used Britannica or
> authority controls or whatever "reliable" sources we wanted. Wikipedia, and
> now Wikidata, is made for everyone to contribute, it's open

Re: [Wikimedia-l] Quality issues

2015-12-13 Thread Jane Darnell

Andreas,
That's just not true. You can re-use and remix Wikimedia content as much as
you like. When you say you "can't cite one Wikipedia article as a source in
another", this is also not true, as we see this done in translated articles
in the edit summary. Fortunately Wikipedia articles need sources, so those
are translated along with the rest of the content and are perfectly valid to
take from one project to another. In art history, when we are talking about
paintings, we are all mostly talking about the same sources anyway,
worldwide. This is probably true for most other disciplines as well.

As far as citing goes, the ratio of cited vs. uncited statements in
Wikipedia is probably much greater than in Wikidata, except we can't
measure that. All we measure is the "reference" statement, but there are
lots of sources in various properties and my guess is that most items with
zero statements are early imports that have just not had anyone click on
them yet. When we use images in Wikipedia articles, we do not "cite"
Wikimedia Commons. Indeed, this is exactly the problem we have when we talk
to GLAMs about image donations. The link itself is enough to allow the user
with a few clicks to get at the image information on Commons, where there
is more information, including sources. When I as a Wikipedian use images
of paintings from Commons in a Wikipedia article, I am using multiple
sources for that article, but some of those sources may be from the Commons
image itself, as some of these are particularly well-sourced. When I am
updating the associated Wikidata item, I add all of the sources that I have
found, and for the more famous paintings, others add links from their own
sources, making Wikidata much richer as a source of references than any
single project. As Lydia explained however, not every individual statement
in Wikidata is sourced, though each item may be sourced to multiple
references. This is partially because we lack the tools to easily source
each statement when we update multiple statements at a time, but it is also
because we don't *need* to source obvious statements.
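
The difference between per-statement and per-item sourcing can be shown with
a small Python sketch. The item layout, the sample data and the function
`reference_coverage` are assumptions for illustration only, not an existing
Wikidata statistic.

```python
def reference_coverage(items):
    """Compare statement-level and item-level reference coverage.

    `items` maps an item id to a list of statements, each a dict with a
    `references` list. This layout is hypothetical.
    """
    total = referenced = items_with_ref = 0
    for statements in items.values():
        flags = [bool(s["references"]) for s in statements]
        total += len(flags)
        referenced += sum(flags)
        items_with_ref += any(flags)
    return {
        "statement_share": referenced / total if total else 0.0,
        "item_share": items_with_ref / len(items) if items else 0.0,
    }


# Made-up sample: one painting with a single sourced statement, one with none.
items = {
    "painting-1": [
        {"property": "creator", "references": ["museum catalogue"]},
        {"property": "instance of", "references": []},
    ],
    "painting-2": [
        {"property": "creator", "references": []},
        {"property": "inception", "references": []},
    ],
}
coverage = reference_coverage(items)
```

In this sample a quarter of the statements carry a reference while half of
the items have at least one, so the two measures can diverge considerably.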

The point is that publishing on any Wikimedia project, whether it's
Wikipedia, Wikimedia Commons, or Wikidata, is a manually-driven complex
process done by volunteers. It is not and never will be automatic.

Jane

On Sun, Dec 13, 2015 at 4:57 PM, Andreas Kolbe  wrote:

> Jane,
>
> The issue is that you can't cite one Wikipedia article as a source in
> another. If, as some envisage, you were to fill Wikipedia's infoboxes with
> Wikidata content that's unsourced, or sourced only to a Wikipedia, you'd be
> doing exactly that, and violating WP:V in the process:
>
> "Do not use articles from Wikipedia as sources. Also, do not use *websites
> that mirror Wikipedia content or publications that rely on material from
> Wikipedia as sources*." (WP:CIRCULAR)
>
> That includes Wikidata. As long as Wikidata doesn't provide external
> sourcing, it's unusable in Wikipedia.
>
> Andreas
>
> On Sun, Dec 13, 2015 at 9:15 AM, Jane Darnell  wrote:
>
> > Thanks for that essay, Lydia! You said it well, and I especially agree with
> > what you wrote about trust and believing in ourselves. I had to laugh at
> > some of the comments, because if you substitute "Wikipedia" for "Wikidata"
> > those comments could have been written 3 years ago before Wikidata came on
> > the scene.
> >
> > On Sat, Dec 12, 2015 at 10:18 PM, Lydia Pintscher <
> > lydia.pintsc...@wikimedia.de> wrote:
> >
> > > On Thu, Dec 10, 2015 at 9:27 AM, Lydia Pintscher
> > >  wrote:
> > > > That is actually not correct. We have built Wikidata from the very
> > > > beginning with some core beliefs. One of them is that Wikidata isn't
> > > > supposed to have the one truth but instead is able to represent
> > > > various different points of view and link to sources claiming these.
> > > > Look for example at the country statements for Jerusalem:
> > > > https://www.wikidata.org/wiki/Q1218
> > > > Now I am the first to say that this will not be able to capture the
> > > > full complexity of the world around us. But that's not what it is
> > > > meant to do. However please be aware that we have built more than just
> > > > a dumb database with Wikidata and have gone to great lengths to make it
> > > > possible to capture knowledge diversity.
> > >
> > > I've taken the time and written a longer piece about data quality and
> > > knowledge diversity on Wikidata for the current edition of the
> > > Signpost:
> > >
> >
> https://en.wikipedia.org/wiki/Wikipedia:Wikipedia_Signpost/2015-12-09/Op-ed
> > >
> > >
> > > Cheers
> > > Lydia
> > >
> > > --
> > > Lydia Pintscher - http://about.me/lydia.pintscher
> > > Product Manager for Wikidata
> > >
> > > Wikimedia Deutschland e.V.
> > > Tempelhofer Ufer 23-24
> > > 10963 Berlin
> > > www.wikimedia.de
> > >
> > > Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens

Re: [Wikimedia-l] Quality issues

2015-12-13 Thread Jane Darnell

Andrea,
I totally agree on the mission/vision thing, but am not sure what you mean
exactly by scale - do you mean that Wikidata shouldn't try to be so
granular that it has a statement to cover each factoid in any Wikipedia
article, or do you mean we need to talk about what constitutes notability
in order not to grow Wikidata exponentially to the point the servers crash?
Jane

On Sun, Dec 13, 2015 at 7:10 PM, Andrea Zanni 
wrote:

> I really feel we are drowning in a glass of water.
> The issue of "data quality" or "reliability" that Andreas raises is well
> known:
> what I don't understand is whether the "scale" of it is much bigger on Wikidata
> than Wikipedia,
> and if this different scale makes it much more important. The scale of the
> issue is maybe something worth discussing, and not the issue itself? Is the
> fact that Wikidata is centralised different from statements on Wikipedia? I
> don't know, but to me this is a more neutral and interesting question.
>
> I often say that the Wikimedia world made quality a "Heisenbergian"
> feature: you always have to check if it's there.
> The point is: it's been always like this.
> We always had to check for quality, even when we used Britannica or
> authority controls or whatever "reliable" sources we wanted. Wikipedia, and
> now Wikidata, is made for everyone to contribute, it's open and honest in
> being open, vulnerable, prone to errors. But we are transparent, we say
> that in advance,  we can claim any statement to the smallest detail. Of
> course it's difficult, but we can do it. Wikidata, as Lydia said, can
> actually have conflicting statements in every item: we "just" have to put
> them there, as we did to Wikipedia.
>
> If Google uses our data and they are wrong, that's bad for them. If they
> correct the errors and do not give us the corrections, that's bad for us
> and not ethical from them. The point is: there is no license (for what I
> know) that can force them to contribute to Wikidata. That is, IMHO, the
> problem with "over-the-top" actors: they can harness collective intelligence
> and "not give back." Even with CC-BY-SA, they could store (as they are
> probably already doing) all the data in their knowledge vault, which is
> secret as it is an incredible asset for them.
>
> I'd be happy to insert a new clause of "forced transparency" in CC-BY-SA or
> CC0, but it's not there.
>
> So, as we are  working via GLAMs with Wikipedia for getting reliable
> sources and content, we are working with them also for good statements and
> data. Putting good data in Wikidata makes it better, and I don't understand
> what is the problem here (I understand, again, the issue of putting too
> much data and still having a small community).
> For example: if we are importing different reliable databases, andthe
> institutions behind them find it useful and helpful to have an aggregator
> of identifiers and authority controls, what is the issue? There is value in
> aggregating data, because you can spot errors and inconsistencies. It's not
> easy, of course, to find a good workflow, but, again, that is *another*
> problem.
>
> So, in conclusion: I find many issues in Wikidata, but not on the
> mission/vision, just in the complexity of the project, the size of the
> dataset, the size of the community.
>
> Can we talk about those?
>
> Aubrey
>
>
>
> On Sun, Dec 13, 2015 at 6:40 PM, Andreas Kolbe  wrote:
>
> > On Sun, Dec 13, 2015 at 5:32 PM, geni  wrote:
> >
> > > On 13 December 2015 at 15:57, Andreas Kolbe 
> wrote:
> > >
> > > > Jane,
> > > >
> > > > The issue is that you can't cite one Wikipedia article as a source in
> > > > another.
> > > >
> > >
> > >
> > > However you can within the same article per [[WP:LEAD]].
> > >
> >
> >
> > Well, of course, if there are reliable sources cited in the body of the
> > article that back up the statements made in the lead. You still need to
> > cite a reliable source though; that's Wikipedia 101.
> > ___
> > Wikimedia-l mailing list, guidelines at:
> > https://meta.wikimedia.org/wiki/Mailing_lists/Guidelines
> > Wikimedia-l@lists.wikimedia.org
> > Unsubscribe: https://lists.wikimedia.org/mailman/listinfo/wikimedia-l,
> > 
> >
> ___
> Wikimedia-l mailing list, guidelines at:
> https://meta.wikimedia.org/wiki/Mailing_lists/Guidelines
> Wikimedia-l@lists.wikimedia.org
> Unsubscribe: https://lists.wikimedia.org/mailman/listinfo/wikimedia-l,
> 
___
Wikimedia-l mailing list, guidelines at: 
https://meta.wikimedia.org/wiki/Mailing_lists/Guidelines
Wikimedia-l@lists.wikimedia.org
Unsubscribe: https://lists.wikimedia.org/mailman/listinfo/wikimedia-l,

Re: [Wikimedia-l] Quality issues

2015-12-13 Thread Andrea Zanni

I really feel we are drowning in a glass of water.
The issue of "data quality" or "reliability" that Andreas raises is well
known:
what I don't understand is whether the "scale" of it is much bigger on Wikidata
than Wikipedia,
and if this different scale makes it much more important. The scale of the
issue is maybe something worth discussing, and not the issue itself? Is the
fact that Wikidata is centralised different from statements on Wikipedia? I
don't know, but to me this is a more neutral and interesting question.

I often say that the Wikimedia world made quality a "Heisenbergian"
feature: you always have to check if it's there.
The point is: it has always been like this.
We always had to check for quality, even when we used Britannica or
authority controls or whatever "reliable" sources we wanted. Wikipedia, and
now Wikidata, is made for everyone to contribute, it's open and honest in
being open, vulnerable, prone to errors. But we are transparent, we say
that in advance,  we can claim any statement to the smallest detail. Of
course it's difficult, but we can do it. Wikidata, as Lydia said, can
actually have conflicting statements in every item: we "just" have to put
them there, as we did to Wikipedia.

If Google uses our data and they are wrong, that's bad for them. If they
correct the errors and do not give us the corrections, that's bad for us
and unethical of them. The point is: there is no license (as far as I
know) that can force them to contribute to Wikidata. That is, IMHO, the
problem with "over-the-top" actors: they can harness collective intelligence
and "not give back." Even with CC-BY-SA, they could store (as they are
probably already doing) all the data in their knowledge vault, which is
secret as it is an incredible asset for them.

I'd be happy to insert a new clause of "forced transparency" in CC-BY-SA or
CC0, but it's not there.

So, as we are working via GLAMs with Wikipedia for getting reliable
sources and content, we are working with them also for good statements and
data. Putting good data in Wikidata makes it better, and I don't understand
what the problem is here (I understand, again, the issue of putting too
much data and still having a small community).
For example: if we are importing different reliable databases, and the
institutions behind them find it useful and helpful to have an aggregator
of identifiers and authority controls, what is the issue? There is value in
aggregating data, because you can spot errors and inconsistencies. It's not
easy, of course, to find a good workflow, but, again, that is *another*
problem.
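
To make the point about aggregation concrete, here is a minimal sketch in
Python, assuming a handful of made-up records. The item IDs, the property
name, the dates and the "Library A"/"Library B" sources are all hypothetical
and are not taken from Wikidata or from any real authority file. It only
shows that once values from several databases sit side by side, the
disagreements are easy to list for a human to review.

```python
from collections import defaultdict

# Hypothetical records: (item, property, value, source).
# All identifiers and values are invented for illustration.
records = [
    ("Q_example_1", "date of birth", "1850-03-01", "Library A"),
    ("Q_example_1", "date of birth", "1850-03-01", "Library B"),
    ("Q_example_2", "date of birth", "1901-07-12", "Library A"),
    ("Q_example_2", "date of birth", "1901-07-21", "Library B"),
]


def find_conflicts(rows):
    """Group values by (item, property); keep groups where sources disagree."""
    values = defaultdict(lambda: defaultdict(list))
    for item, prop, value, source in rows:
        values[(item, prop)][value].append(source)
    # A group is a conflict when more than one distinct value was reported.
    return {
        key: dict(by_value)
        for key, by_value in values.items()
        if len(by_value) > 1
    }


conflicts = find_conflicts(records)
for (item, prop), by_value in conflicts.items():
    print(item, prop, by_value)
```

The sketch does not decide which value is right. It only flags the item
where the two sources differ, so both values can be kept with their sources
or checked by someone who knows the subject.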

So, in conclusion: I find many issues in Wikidata, but not on the
mission/vision, just in the complexity of the project, the size of the
dataset, the size of the community.

Can we talk about those?

Aubrey

On Sun, Dec 13, 2015 at 6:40 PM, Andreas Kolbe  wrote:

> On Sun, Dec 13, 2015 at 5:32 PM, geni  wrote:
>
> > On 13 December 2015 at 15:57, Andreas Kolbe  wrote:
> >
> > > Jane,
> > >
> > > The issue is that you can't cite one Wikipedia article as a source in
> > > another.
> > >
> >
> >
> > However you can within the same article per [[WP:LEAD]].
> >
>
>
> Well, of course, if there are reliable sources cited in the body of the
> article that back up the statements made in the lead. You still need to
> cite a reliable source though; that's Wikipedia 101.

Re: [Wikimedia-l] Quality issues

2015-12-13 Thread Gnangarra

This is the issue with quality:

> If Google uses our data and they are wrong, that's bad for them.

Under the CC0 license, when Google uses the information they don't need to
attribute Wikidata. If that "wrong" data went from WD --> Google --> news
source --> WP, not only has it been washed, it has now become a sourced fact
in Wikipedia, and there is no way to trace its origins to WD. Even if WD is
changed to another source, it's unlikely to be corrected in the rest of the
chain. The whole WMF community would have corrupted the data, and that is
something we should be very concerned about.



-- 
GN.
President Wikimedia Australia
WMAU:

Re: [Wikimedia-l] Quality issues

2015-12-13 Thread Andreas Kolbe

On Sun, Dec 13, 2015 at 6:10 PM, Andrea Zanni 
wrote:

> I really feel we are drowning in a glass of water.
> The issue of "data quality" or "reliability" that Andreas raises is well
> known:
> what I don't understand is whether the "scale" of it is much bigger on Wikidata
> than Wikipedia,
> and if this different scale makes it much more important. The scale of the
> issue is maybe something worth discussing, and not the issue itself? Is the
> fact that Wikidata is centralised different from statements on Wikipedia? I
> don't know, but to me this is a more neutral and interesting question.
>

Wikidata's (envisaged) centralised nature certainly makes a difference,
because the promise was that it would inform the Wikipedias.

Wikipedia started out with people just writing from their personal
knowledge. The early articles had no footnotes. Then after a while people
noticed problems like cranks filling pages with their abstruse theories
(hence the ban on original research), people adding material from their
blogs, etc. Over the course of a decade, Wikipedia developed the idea and
the culture that you have to cite a professionally published source for
everything you add to Wikipedia.

Wikidata is in its early stages. In a way it really is like Wikipedia in
2003. New content welcome! No references required!

But at the same time, Wikidata is supposed to inform the Wikipedias, as a
central data repository. This creates a mismatch between Wikidata's "early
days -- anything goes, let's just get content in, we'll sort it out later"
attitude and the relatively mature Wikipedias where editors insist on
sources for any new content added.

This out-of-synch-ness is a real problem if you want Wikipedias to actually
use Wikidata content. Wikipedians will not accept content generation models
that take Wikipedia back to its bad old days where you could write anything
you liked without a source to back it up.

Wikipedia is of course still a long way away from citing such sources for
all its content. There are vast amounts of legacy material left over from
the early days. But in the pages that are being created now (like
developing news stories, an area where the quality of Wikipedia's coverage
is often praised), pages that see a lot of traffic, pages that are
controversial, etc., it is well established that you have to cite sources
for any new assertions.

Unsourced content is unceremoniously deleted.

If Wikipedia's reputation for reliability has improved since 2003, that
change in culture from the early days is the reason.

The Age for example published an article the other day that is probably one
of the most celebratory articles ever written about Wikipedia.[1] If you're
a Wikipedian, you'll probably enjoy reading it.

Among the aspects that the author, Elizabeth Farrelly, said she liked most
about Wikipedia was "its ruthless commitment to the printed, demonstrable
source." She ended the article as follows:

---o0o---

But most interesting to me is the ban on primary research. The demand that
every input be traced to a published and authoritative source doesn't make
it true, necessarily, but does enable genuine crowd-sourcing of
scholarship. This is a revelation, and a revolution.

So yes, Wikipedia is flawed. Above all, it needs more female input. But the
obvious response, for you-and-me users who encounter something stupid or
biased or just plain wrong, is to hop in there and fix it. I'll see you
there, yes? Oh, and honey? Cite away!

---o0o---

Abandoning the principles that have elicited such praise -- traceability to
published sources, verifiable citations -- is not something Wikipedians
will entertain. To them, it would be a step back. If Wikidata wants to be
an input to Wikipedia, it will have to bear that in mind.

[1]
http://www.theage.com.au/comment/why-wikipedia-at-15-is-a-beautiful-exercise-in-scholarly-excellence-20151209-glj79f.html


Re: [Wikimedia-l] Quality issues

2015-12-13 Thread Pete Forsyth

On Sun, Dec 13, 2015 at 4:02 PM, Andreas Kolbe  wrote:

> But at the same time, Wikidata is supposed to inform the Wikipedias, as a
> central data repository. This creates a mismatch between Wikidata's "early
> days -- anything goes, let's just get content in, we'll sort it out later"
> attitude and the relatively mature Wikipedias where editors insist on
> sources for any new content added.
>
> This out-of-synch-ness is a real problem if you want Wikipedias to actually
> use Wikidata content. Wikipedians will not accept content generation models
> that take Wikipedia back to its bad old days where you could write anything
> you liked without a source to back it up.

Andreas, I think there's an important piece you're missing (or at least not
explicitly acknowledging) here.

Very few of the Wikipedias are "relatively mature." To the extent Wikidata
is meant to help Wikipedia, I believe it is meant to help the less mature
Wikipedias benefit from the more robust research into sources etc. that
takes place at the big ones -- and help the big ones notice when they have
out-of-sync information from one another, and make informed decisions about
what to do about it.

The analysis you offer here doesn't seem granular enough to capture this,
and seems to miss the primary value of Wikidata when it comes to Wikipedia.

Thoughts?
Pete
--
[[User:Peteforsyth]]

Re: [Wikimedia-l] Quality issues

2015-12-13 Thread Gerard Meijssen

Hoi,

When an error exists in Wikidata, I can change it. When an error exists in
Wikipedia, I may change it. When an error exists in the Google info thingie,
I can report it, and they DO change it.

What we can and should do is provide a two-way channel to compare issues
and work on improving the data. There is reason to be concerned, but it is
not that data will necessarily always be wrong because Wikipedia or
Wikidata or whoever said so. If anything, the problem is in our attitude. I
just found that one red link in the French Wikipedia could be a blue link.
Do I need to remedy this myself, or do we have ways to communicate or flag
it? As long as we do not consider such workflows, we depend on the whim of
people who see issues to fix them. I understand enough French, but what if
it is in Farsi?
Thanks,
  GerardM


Re: [Wikimedia-l] Quality issues

2015-12-12 Thread Gerard Meijssen

Andreas,

Why is it that Denny is to answer on your terms, and why is it that you have
not addressed any of the points I made on quality? Moreover, you deny his
argument because YOU are not willing to acknowledge his point, thereby
making him out to be a liar.

You have not acknowledged that Wikidata is a wiki and you do not appreciate
its implications. You are told that your notion of quality has the least
operational value in Wikidata. You have been told repeatedly why and how
considering these other definitions of quality contributes to improved
quality and participation, and it is as if this is totally irrelevant.
This all means nothing to you because you do not care; you are
intentionally not involved. You are like a Pharisee in the temple.

I have heard it said several times now that your attitude is the same as
that of the people who mocked Wikipedia when it was young. Given that you
stand for the Wikipedia Signpost, you degrade the appreciation of the
English Wikipedia considerably because you seem to be arguing the
antithesis of the wiki concept.

Get a life.
Thanks,
  GerardM

On 12 December 2015 at 07:01, Andreas Kolbe  wrote:

> Denny,
>
>
> I quoted your statement verbatim and in full in the op-ed. Moreover, your
> statement had a context. Alexrk2 had said,[1]
>
>
>
> ---o0o---
>
> Read the above.. at least under European Union law databases are protected
> by copyright. CC0 won't be compatible with other projects like
> OpenStreetMap *or Wikipedia*. This means a CC0-WikiData won't be
> allowed to *import
> content from Wikipedia*, OpenStreetMap or any other share-alike data
> source. The worst case IMO would be if WikiData *extracts content out of
> Wikipedia and release it as CC0*. Under EU law this would be illegal. As a
> contributor in DE Wikipedia I would feel like being expropriated somehow.
> This is not acceptable! --Alexrk2 (talk) 15:32, 16 June 2012 (UTC)
>
> ---o0o---
>
>
>
> Note Alexrk2's three (3) specific references to Wikipedia.
>
> Alexrk2 referred to imports of content from Wikipedia, and how it would
> make her or him feel expropriated if WikiData extracted content out of
> Wikipedia and released it under CC0.
>
> You replied,
>
>
>
> ---o0o---
>
> Alexrk2, it is true that Wikidata under CC0 would not be allowed to import
> content from a Share-Alike data source. *Wikidata does not plan to extract
> content out of Wikipedia at all*. Wikidata will *provide *data that can be
> reused in the Wikipedias. And a CC0 source can be used by a Share-Alike
> project, be it either Wikipedia or OSM. But not the other way around. Do we
> agree on this understanding? --Denny Vrandečić (WMDE) (talk) 12:39, 4 July
> 2012 (UTC)
>
> ---o0o---
>
>
>
> Alexrk2 specifically mentioned Wikipedia. So did you in your reply,
> assuring Alexrk2 that Wikidata did not in fact plan to extract content out
> of Wikipedia at all. Does this lend itself to the interpretation that you
> were talking only about databases, and not about Wikipedia?
>
> Alexrk2 then replied to you,
>
>
>
> ---o0o---
>
> @Denny Vrandečić: I agree. But I thought, the aim (or *one* aim) of
> WikiData would be to *draw all the data out of Wikipedia (infoboxes and
> such things)*.
>
> ---o0o---
>
>
>
> You did not respond to that post, or participate further in that section.
> And these bot imports of Wikipedia infobox contents etc. have happened and
> are ongoing. They have been mentioned in many discussions. There are
> millions of statements in Wikidata that are cited to Wikipedia.
>
> Just a few days ago, Jheald said on Project Chat,[2]
>
>
>
> ---o0o---
>
> But my own view is that we should very definitely be trying, as urgently as
> possible, to *capture as much as possible of the huge amount of data in
> infoboxes, templates, categorisations, etc on Wikipedia that is not yet in
> Wikidata* -- and that (at least in most subject areas) calls to restrict to
> only data from independent external sources are utterly utterly misguided,
> and typically bear no relation to either what is desirable, what is
> available, or what is still needed in order to utilise such sources
> effectively. Jheald (talk) 23:49, 8 December 2015 (UTC)
>
> ---o0o---
>
>
>
> It's not plausible to my understanding to argue that Wikipedia's templates,
> infoboxes etc. are not "data sources" when contributors speak of capturing
> "the huge amount of data" contained in them. Much of the existing content
> of Wikidata consists of data extracted from Wikipedias.
>
> If you feel I have misquoted you anywhere on-wiki, please point me to the
> corresponding place (here or via my talk page in that project), and I will
> do whatever is necessary.
>
>
>
> [1]
>
> https://meta.wikimedia.org/wiki/Talk:Wikidata#Is_CC_the_right_license_for_data.3F
> [2]
>
> https://www.wikidata.org/w/index.php?title=Wikidata:Project_chat=281930638=281906226
>
>
>
> On Sat, Dec 12, 2015 at 12:05 AM, Denny Vrandečić 
> wrote:
>
> > On Thu, Dec 10, 2015 at 4:18 AM

Re: [Wikimedia-l] Quality issues

2015-12-12 Thread Lydia Pintscher

On Thu, Dec 10, 2015 at 9:27 AM, Lydia Pintscher
 wrote:
> That is actually not correct. We have built Wikidata from the very
> beginning with some core beliefs. One of them is that Wikidata isn't
> supposed to have the one truth but instead is able to represent
> various different points of view and link to sources claiming these.
> Look for example at the country statements for Jerusalem:
> https://www.wikidata.org/wiki/Q1218
> Now I am the first to say that this will not be able to capture the
> full complexity of the world around us. But that's not what it is
> meant to do. However please be aware that we have built more than just
> a dumb database with Wikidata and have gone to great lengths to make it
> possible to capture knowledge diversity.

I've taken the time to write a longer piece about data quality and
knowledge diversity on Wikidata for the current edition of the
Signpost:
https://en.wikipedia.org/wiki/Wikipedia:Wikipedia_Signpost/2015-12-09/Op-ed


Cheers
Lydia

-- 
Lydia Pintscher - http://about.me/lydia.pintscher
Product Manager for Wikidata

Wikimedia Deutschland e.V.
Tempelhofer Ufer 23-24
10963 Berlin
www.wikimedia.de

Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e. V.

Registered in the register of associations of the Amtsgericht
Berlin-Charlottenburg under number 23855 Nz. Recognised as charitable by
the Finanzamt für Körperschaften I Berlin, tax number 27/681/51985.


Re: [Wikimedia-l] Quality issues

2015-12-11 Thread Denny Vrandečić

On Thu, Dec 10, 2015 at 4:18 AM Andreas Kolbe  wrote:

> According to Denny, Wikidata, under its CC0 licence, must not import data
> from Share-Alike sources. He reconfirmed this yesterday when I asked him
> whether he still stood by that.
>
> In practice though we have Wikidata importing massive amounts of data from
> Wikipedia, which was a Share-Alike source last time I looked. Isn't
> Wikidata then infringing Wikipedia contributors' rights?
>
> Why is it okay to import data from the CC BY-SA Wikipedia, but not from
> European CC BY-SA population statistics?
>
>
Andreas, what I said was that Wikidata must not import data from a data
source licensed under Share-Alike.

The important thing that differentiates what I said from what you think I
said is "import data from a data source". Wikipedia is not a data source,
but text. Extracting facts or data from a text is a very different thing
from taking data from one place and putting it in another place. There was
no database that contained the content of Wikipedia and that could be
queried. Indeed, that is the whole reason why Wikidata was started in the
first place.

In fact, extracting facts or data from one text and then writing a
Wikipedia article is what Wikipedians do all the time, and the license of
the original text we read has no effect on the license of the output text.

So, there is no such thing as an import of data from Wikipedia, because
Wikipedia is not a database.

I have repeatedly pointed you to
   http://simia.net/wiki/Free_data
and you yourself have repeatedly pointed to
   https://meta.wikimedia.org/wiki/Wikilegal/Database_Rights
so I would assume that you would have by now read these and developed an
understanding of these issues. I am not a lawyer, and my understanding of
these issues is also lacking, but I wanted at least to point out that you
are misquoting me.

Please, would you mind correcting your misquoting of me in the places where
you did so, or at least pointing to this email for further context?

Re: [Wikimedia-l] Quality issues

2015-12-11 Thread Andreas Kolbe

Denny,

I quoted your statement verbatim and in full in the op-ed. Moreover, your
statement had a context. Alexrk2 had said,[1]

---o0o---

Read the above.. at least under European Union law databases are protected
by copyright. CC0 won't be compatible with other projects like
OpenStreetMap *or Wikipedia*. This means a CC0-WikiData won't be allowed to
*import content from Wikipedia*, OpenStreetMap or any other share-alike data
source. The worst case IMO would be if WikiData *extracts content out of
Wikipedia and release it as CC0*. Under EU law this would be illegal. As a
contributor in DE Wikipedia I would feel like being expropriated somehow.
This is not acceptable! --Alexrk2 (talk) 15:32, 16 June 2012 (UTC)

---o0o---

Note Alexrk2's three (3) specific references to Wikipedia.

Alexrk2 referred to imports of content from Wikipedia, and how it would
make her or him feel expropriated if WikiData extracted content out of
Wikipedia and released it under CC0.

You replied,

---o0o---

Alexrk2, it is true that Wikidata under CC0 would not be allowed to import
content from a Share-Alike data source. *Wikidata does not plan to extract
content out of Wikipedia at all*. Wikidata will *provide* data that can be
reused in the Wikipedias. And a CC0 source can be used by a Share-Alike
project, be it either Wikipedia or OSM. But not the other way around. Do we
agree on this understanding? --Denny Vrandečić (WMDE) (talk) 12:39, 4 July
2012 (UTC)

---o0o---

Alexrk2 specifically mentioned Wikipedia. So did you in your reply,
assuring Alexrk2 that Wikidata did not in fact plan to extract content out
of Wikipedia at all. Does this lend itself to the interpretation that you
were talking only about databases, and not about Wikipedia?

Alexrk2 then replied to you,

---o0o---

@Denny Vrandečić: I agree. But I thought, the aim (or *one* aim) of
WikiData would be to *draw all the data out of Wikipedia (infoboxes and
such things)*.

---o0o---

You did not respond to that post, or participate further in that section.
And these bot imports of Wikipedia infobox contents etc. have happened and
are ongoing. They have been mentioned in many discussions. There are
millions of statements in Wikidata that are cited to Wikipedia.

Just a few days ago, Jheald said on Project Chat,[2]

---o0o---

But my own view is that we should very definitely be trying, as urgently as
possible, to *capture as much as possible of the huge amount of data in
infoboxes, templates, categorisations, etc on Wikipedia that is not yet in
Wikidata* -- and that (at least in most subject areas) calls to restrict to
only data from independent external sources are utterly utterly misguided,
and typically bear no relation to either what is desirable, what is
available, or what is still needed in order to utilise such sources
effectively. Jheald (talk) 23:49, 8 December 2015 (UTC)

---o0o---

To my understanding, it is not plausible to argue that Wikipedia's templates,
infoboxes etc. are not "data sources" when contributors speak of capturing
"the huge amount of data" contained in them. Much of the existing content
of Wikidata consists of data extracted from Wikipedias.

If you feel I have misquoted you anywhere on-wiki, please point me to the
corresponding place (here or via my talk page in that project), and I will
do whatever is necessary.

[1]
https://meta.wikimedia.org/wiki/Talk:Wikidata#Is_CC_the_right_license_for_data.3F
[2]
https://www.wikidata.org/w/index.php?title=Wikidata:Project_chat=281930638=281906226

On Sat, Dec 12, 2015 at 12:05 AM, Denny Vrandečić 
wrote:

> On Thu, Dec 10, 2015 at 4:18 AM Andreas Kolbe  wrote:
>
> > According to Denny, Wikidata, under its CC0 licence, must not import data
> > from Share-Alike sources. He reconfirmed this yesterday when I asked him
> > whether he still stood by that.
> >
> > In practice though we have Wikidata importing massive amounts of data
> from
> > Wikipedia, which was a Share-Alike source last time I looked. Isn't
> > Wikidata then infringing Wikipedia contributors' rights?
> >
> > Why is it okay to import data from the CC BY-SA Wikipedia, but not from
> > European CC BY-SA population statistics?
> >
> >
> Andreas, what I said was that Wikidata must not import data from a data
> source licensed under Share-Alike date source.
>
> The important thing that differentiates what I said from what you think I
> said is "import data from a data source". Wikipedia is not a data source,
> but text. Extracting facts or data from a text is a very different thing
> than taking data from one place and put it in another place. There was no
> database that contains the content of Wikipedia and that can be queried.
> Indeed, that is the whole reason why Wikidata has been started in the first
> place.
>
> In fact, extracting facts or data from one text and then writing a
> Wikipedia article is what Wikipedians do all the time, and the license of
> the original text we

Re: [Wikimedia-l] Quality issues

2015-12-10 Thread Gnangarra

I agree that getting bogged down on one item of data isn't helpful, but the
data does need to show that it's disputed, and the data item on Israel should
at least have Tel Aviv listed as its metonym.


> within the database because the data base
> is applying one truth where there is no one truth for everyone. This will
> always be the single biggest flaw of Wikidata no matter how data is
> presented it can never be the absolute truth

The Jerusalem/Israel example, where the data doesn't indicate that it's
disputed, means that it will be propagated as an absolute truth...


Then again, this is shifting away from the original concern over quality:
that the ability to verify the information isn't clear, combined with the
CC0 license the already established practice on other sources. Wikidata for
falsehoods being easily manipulated, it's going to have an impact.

On 10 December 2015 at 16:44, Jane Darnell  wrote:

> Amen to that! This discussion about Jerusalem reminds me of the discussion
> we had about the nationality of Anne Frank. For those interested, there
> have been some heated debates about whether Mobile should use the text in
> Wikidata "label descriptions" or rather some basic presentation of the P31
> property. Most descriptions are still blank anyway. Personally I think
> texts such as "capital of Israel" or "holocaust victim" are both better
> than blank, but many disagree with me.
>
> Both of these represent associated items that have a lot of eyes on them,
> but what about our more obscure items? Lots of these may be improved by the
> people who originally created a Wikipedia page for them. As a Wikipedia
> editor who has created over 2000 Wikipedia pages, I feel somewhat dismayed
> at the idea that I need to walk through this long list and add statements
> to their Wikidata items as the responsible party who introduced them to the
> Wikiverse in the first place. But if I had a gadget that would tell me
> which of my created Wikipedia articles had 0-3 statements, I would probably
> update those.
>
> On Thu, Dec 10, 2015 at 9:27 AM, Lydia Pintscher <
> lydia.pintsc...@wikimedia.de> wrote:
>
> > On Tue, Dec 8, 2015 at 1:15 AM, Gnangarra  wrote:
> > > Criag is right this cant be fixed within the database because the data
> > base
> > > is applying one truth where there is no one truth for everyone. This
> will
> > > always be the single biggest flaw of Wikidata no matter how data is
> > > presented it can never be the absolute truth unless its measurable
> > through
> > > some mathematical scientific process that can replicated by everyone,
> > > translated into any language.
> > >
> > > Wikipedia's answer is to present all considerations in an equal manor
> and
> > > not interpret the facts
> > >
> > > Wikidata defines what is fact, what is truth, what is right thats a big
> > > task and is something the community has never tackled before... should
> we
> > > even try, has the damage already been done or should we narrow the
> range
> > of
> > > recorded data, could we flag alternatives, could we give a measure of
> > > acceptance for each fact. are there alternative means
> >
> > That is actually not correct. We have built Wikidata from the very
> > beginning with some core believes. One of them is that Wikidata isn't
> > supposed to have the one truth but instead is able to represent
> > various different points of view and link to sources claiming these.
> > Look for example at the country statements for Jerusalem:
> > https://www.wikidata.org/wiki/Q1218
> > Now I am the first to say that this will not be able to capture the
> > full complexity of the world around us. But that's not what it is
> > meant to do. However please be aware that we have built more than just
> > a dumb database with Wikidata and have gone to great length to make it
> > possible to capture knowledge diversity.
> >
> >
> > Cheers
> > Lydia
> >
> > --
> > Lydia Pintscher - http://about.me/lydia.pintscher
> > Product Manager for Wikidata
> >
> > Wikimedia Deutschland e.V.
> > Tempelhofer Ufer 23-24
> > 10963 Berlin
> > www.wikimedia.de
> >
> > Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e. V.
> >
> > Eingetragen im Vereinsregister des Amtsgerichts Berlin-Charlottenburg
> > unter der Nummer 23855 Nz. Als gemeinnützig anerkannt durch das
> > Finanzamt für Körperschaften I Berlin, Steuernummer 27/681/51985.
> >

Re: [Wikimedia-l] Quality issues

2015-12-10 Thread Jane Darnell

Amen to that! This discussion about Jerusalem reminds me of the discussion
we had about the nationality of Anne Frank. For those interested, there
have been some heated debates about whether Mobile should use the text in
Wikidata "label descriptions" or rather some basic presentation of the P31
property. Most descriptions are still blank anyway. Personally I think
texts such as "capital of Israel" or "holocaust victim" are both better
than blank, but many disagree with me.

Both of these represent associated items that have a lot of eyes on them,
but what about our more obscure items? Lots of these may be improved by the
people who originally created a Wikipedia page for them. As a Wikipedia
editor who has created over 2000 Wikipedia pages, I feel somewhat dismayed
at the idea that I need to walk through this long list and add statements
to their Wikidata items as the responsible party who introduced them to the
Wikiverse in the first place. But if I had a gadget that would tell me
which of my created Wikipedia articles had 0-3 statements, I would probably
update those.
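
A rough sketch of what such a gadget might check, assuming each item is available as a dict shaped like Wikidata's JSON export, with "claims" mapping a property ID to a list of statements. The function names, item IDs, and sample data below are hypothetical, and fetching the items for a given editor is left out entirely.

```python
from typing import Dict, List


def count_statements(entity: dict) -> int:
    """Count all statements on an entity whose "claims" field maps
    a property ID to a list of statements."""
    return sum(len(statements) for statements in entity.get("claims", {}).values())


def sparse_items(entities: Dict[str, dict], max_statements: int = 3) -> List[str]:
    """Return the IDs of entities with at most `max_statements` statements."""
    return sorted(
        item_id
        for item_id, entity in entities.items()
        if count_statements(entity) <= max_statements
    )


# Hypothetical sample data: IDs and property keys other than P31 are made up,
# and each statement is an empty placeholder dict.
my_items = {
    "item-a": {"claims": {}},
    "item-b": {"claims": {"P31": [{}], "prop-x": [{}, {}]}},
    "item-c": {"claims": {"P31": [{}], "prop-x": [{}], "prop-y": [{}], "prop-z": [{}]}},
}

# Items with 0-3 statements are the ones worth revisiting first.
print(sparse_items(my_items))
```

The threshold of three follows the "0-3 statements" figure mentioned above; it is exposed as a parameter so that it can be adjusted.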

On Thu, Dec 10, 2015 at 9:27 AM, Lydia Pintscher <
lydia.pintsc...@wikimedia.de> wrote:

> On Tue, Dec 8, 2015 at 1:15 AM, Gnangarra  wrote:
> > Criag is right this cant be fixed within the database because the data
> base
> > is applying one truth where there is no one truth for everyone. This will
> > always be the single biggest flaw of Wikidata no matter how data is
> > presented it can never be the absolute truth unless its measurable
> through
> > some mathematical scientific process that can replicated by everyone,
> > translated into any language.
> >
> > Wikipedia's answer is to present all considerations in an equal manor and
> > not interpret the facts
> >
> > Wikidata defines what is fact, what is truth, what is right thats a big
> > task and is something the community has never tackled before... should we
> > even try, has the damage already been done or should we narrow the range
> of
> > recorded data, could we flag alternatives, could we give a measure of
> > acceptance for each fact. are there alternative means
>
> That is actually not correct. We have built Wikidata from the very
> beginning with some core believes. One of them is that Wikidata isn't
> supposed to have the one truth but instead is able to represent
> various different points of view and link to sources claiming these.
> Look for example at the country statements for Jerusalem:
> https://www.wikidata.org/wiki/Q1218
> Now I am the first to say that this will not be able to capture the
> full complexity of the world around us. But that's not what it is
> meant to do. However please be aware that we have built more than just
> a dumb database with Wikidata and have gone to great length to make it
> possible to capture knowledge diversity.
>
>
> Cheers
> Lydia
>
> --
> Lydia Pintscher - http://about.me/lydia.pintscher
> Product Manager for Wikidata
>
> Wikimedia Deutschland e.V.
> Tempelhofer Ufer 23-24
> 10963 Berlin
> www.wikimedia.de
>
> Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e. V.
>
> Eingetragen im Vereinsregister des Amtsgerichts Berlin-Charlottenburg
> unter der Nummer 23855 Nz. Als gemeinnützig anerkannt durch das
> Finanzamt für Körperschaften I Berlin, Steuernummer 27/681/51985.
>

Re: [Wikimedia-l] Quality issues

2015-12-10 Thread Gerard Meijssen

Hoi,
The other side of being easily manipulated is that it is easy to rectify.
The Signpost is FUD in so many ways and incorrect as well. Yes, you may
have a concern about falsehoods. However, this is not going to be helped
much by insisting that everything is to be sourced. It is also not the only
way to consider quality and arguably it is the least helpful way of
improving the quality at Wikidata.

Typically what has been established on other sources is acceptable as valid
for now. When we compare and find differences, it is of relevance to find
sources and even document the differences. When it is a falsehood we should
flag it as such. Sources can be wrong or considered to be wrong.

The case for the CC-0 license is so in line with what the WMF stands for.
Our aim is to share in the sum of all knowledge and it is the most obvious
way to do it. When Wikidata is found to document falsehoods or established
truths that are problematic, we gain a quality where people come to
Wikidata to learn what they need to learn.

So where some see a problem, there is opportunity.
Thanks,
  GerardM

On 10 December 2015 at 10:14, Gnangarra  wrote:

> I agree getting bogged down on one item of data isnt helpful but the data
> does need to show its disputed and the data item on Israel
>  should at least have Tel Aviv listed
> as its mentonym
>
>
> within the database because the data base
> > is applying one truth where there is no one truth for everyone. This will
> > always be the single biggest flaw of Wikidata no matter how data is
> > presented it can never be the absolute truth
>
> The Jerusalem/Israel example where the data doesnt indicate its disputed
> means that it will propagated as an absolute truth...
>
>
> Then again this is shifting away from the original concern over quality
> that the ability to verify the information  isnt clear combined with the
> CC0 license the already established practice on other sources. Wikidata for
> falsehoods being easily manipulated its going to have a impact.
>
> On 10 December 2015 at 16:44, Jane Darnell  wrote:
>
> > Amen to that! This discussion about Jerusalem reminds me of the
> discussion
> > we had about the nationality of Anne Frank. For those interested, there
> > have been some heated debates about whether Mobile should use the text in
> > Wikidata "label descriptions" or rather some basic presentation of the
> P31
> > property. Most descriptions are still blank anyway. Personally I think
> > texts such as "capital of Israel" or "holocaust victim" are both better
> > than blank, but many disagree with me.
> >
> > Both of these represent associated items that have a lot of eyes on them,
> > but what about our more obscure items? Lots of these may be improved by
> the
> > people who originally created a Wikipedia page for them. As a Wikipedia
> > editor who has created over 2000 Wikipedia pages, I feel somewhat
> dismayed
> > at the idea that I need to walk through this long list and add statements
> > to their Wikidata items as the responsible party who introduced them to
> the
> > Wikiverse in the first place. But if I had a gadget that would tell me
> > which of my created Wikipedia articles had 0-3 statements, I would
> probably
> > update those.
> >
> > On Thu, Dec 10, 2015 at 9:27 AM, Lydia Pintscher <
> > lydia.pintsc...@wikimedia.de> wrote:
> >
> > > On Tue, Dec 8, 2015 at 1:15 AM, Gnangarra  wrote:
> > > > Criag is right this cant be fixed within the database because the
> data
> > > base
> > > > is applying one truth where there is no one truth for everyone. This
> > will
> > > > always be the single biggest flaw of Wikidata no matter how data is
> > > > presented it can never be the absolute truth unless its measurable
> > > through
> > > > some mathematical scientific process that can replicated by everyone,
> > > > translated into any language.
> > > >
> > > > Wikipedia's answer is to present all considerations in an equal manor
> > and
> > > > not interpret the facts
> > > >
> > > > Wikidata defines what is fact, what is truth, what is right thats a
> big
> > > > task and is something the community has never tackled before...
> should
> > we
> > > > even try, has the damage already been done or should we narrow the
> > range
> > > of
> > > > recorded data, could we flag alternatives, could we give a measure of
> > > > acceptance for each fact. are there alternative means
> > >
> > > That is actually not correct. We have built Wikidata from the very
> > > beginning with some core believes. One of them is that Wikidata isn't
> > > supposed to have the one truth but instead is able to represent
> > > various different points of view and link to sources claiming these.
> > > Look for example at the country statements for Jerusalem:
> > > https://www.wikidata.org/wiki/Q1218
> > > Now I am the first to say that this will not be able to capture the
> >

Re: [Wikimedia-l] Quality issues

2015-12-10 Thread Lydia Pintscher

On Tue, Dec 8, 2015 at 1:15 AM, Gnangarra  wrote:
> Criag is right this cant be fixed within the database because the data base
> is applying one truth where there is no one truth for everyone. This will
> always be the single biggest flaw of Wikidata no matter how data is
> presented it can never be the absolute truth unless its measurable through
> some mathematical scientific process that can replicated by everyone,
> translated into any language.
>
> Wikipedia's answer is to present all considerations in an equal manor and
> not interpret the facts
>
> Wikidata defines what is fact, what is truth, what is right thats a big
> task and is something the community has never tackled before... should we
> even try, has the damage already been done or should we narrow the range of
> recorded data, could we flag alternatives, could we give a measure of
> acceptance for each fact. are there alternative means

That is actually not correct. We have built Wikidata from the very
beginning with some core beliefs. One of them is that Wikidata isn't
supposed to have the one truth but instead is able to represent
various different points of view and link to sources claiming these.
Look for example at the country statements for Jerusalem:
https://www.wikidata.org/wiki/Q1218
Now I am the first to say that this will not be able to capture the
full complexity of the world around us. But that's not what it is
meant to do. However, please be aware that we have built more than just
a dumb database with Wikidata and have gone to great lengths to make it
possible to capture knowledge diversity.
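
As a minimal illustration of the idea, not Wikidata's actual data model or API, the sketch below assumes an item may carry several statements for the same property, each with its own list of sources. All class names, values, and sources here are hypothetical placeholders.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Statement:
    prop: str                                  # property name, e.g. "country"
    value: str                                 # the claimed value
    sources: List[str] = field(default_factory=list)  # who claims it


@dataclass
class Item:
    label: str
    statements: List[Statement] = field(default_factory=list)

    def views(self, prop: str) -> List[Tuple[str, List[str]]]:
        """Return every claimed value for `prop` together with its sources."""
        return [(s.value, s.sources) for s in self.statements if s.prop == prop]

    def is_contested(self, prop: str) -> bool:
        """True when more than one distinct value is claimed for `prop`."""
        return len({s.value for s in self.statements if s.prop == prop}) > 1


# Placeholder data: two sources making different claims about the same property.
city = Item(
    label="example city",
    statements=[
        Statement("country", "State A", ["source 1"]),
        Statement("country", "State B", ["source 2"]),
    ],
)

print(city.views("country"))
print(city.is_contested("country"))
```

Keeping both statements side by side, rather than choosing one, is what lets a consumer of the data see that the value is disputed and who claims what.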

Cheers
Lydia

-- 
Lydia Pintscher - http://about.me/lydia.pintscher
Product Manager for Wikidata

Wikimedia Deutschland e.V.
Tempelhofer Ufer 23-24
10963 Berlin
www.wikimedia.de

Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e. V.

Eingetragen im Vereinsregister des Amtsgerichts Berlin-Charlottenburg
unter der Nummer 23855 Nz. Als gemeinnützig anerkannt durch das
Finanzamt für Körperschaften I Berlin, Steuernummer 27/681/51985.


Re: [Wikimedia-l] Quality issues

2015-12-10 Thread Jane Darnell

Just as this discussion shifts, so does Wikidata quality. Both, hopefully,
in a more constructive direction, which was Lydia's original point.

On Thu, Dec 10, 2015 at 10:14 AM, Gnangarra  wrote:

> I agree getting bogged down on one item of data isnt helpful but the data
> does need to show its disputed and the data item on Israel
>  should at least have Tel Aviv listed
> as its mentonym
>
>
> within the database because the data base
> > is applying one truth where there is no one truth for everyone. This will
> > always be the single biggest flaw of Wikidata no matter how data is
> > presented it can never be the absolute truth
>
> The Jerusalem/Israel example where the data doesnt indicate its disputed
> means that it will propagated as an absolute truth...
>
>
> Then again this is shifting away from the original concern over quality
> that the ability to verify the information  isnt clear combined with the
> CC0 license the already established practice on other sources. Wikidata for
> falsehoods being easily manipulated its going to have a impact.
>
> On 10 December 2015 at 16:44, Jane Darnell  wrote:
>
> > Amen to that! This discussion about Jerusalem reminds me of the
> discussion
> > we had about the nationality of Anne Frank. For those interested, there
> > have been some heated debates about whether Mobile should use the text in
> > Wikidata "label descriptions" or rather some basic presentation of the
> P31
> > property. Most descriptions are still blank anyway. Personally I think
> > texts such as "capital of Israel" or "holocaust victim" are both better
> > than blank, but many disagree with me.
> >
> > Both of these represent associated items that have a lot of eyes on them,
> > but what about our more obscure items? Lots of these may be improved by
> the
> > people who originally created a Wikipedia page for them. As a Wikipedia
> > editor who has created over 2000 Wikipedia pages, I feel somewhat
> dismayed
> > at the idea that I need to walk through this long list and add statements
> > to their Wikidata items as the responsible party who introduced them to
> the
> > Wikiverse in the first place. But if I had a gadget that would tell me
> > which of my created Wikipedia articles had 0-3 statements, I would
> probably
> > update those.
> >
> > On Thu, Dec 10, 2015 at 9:27 AM, Lydia Pintscher <
> > lydia.pintsc...@wikimedia.de> wrote:
> >
> > > On Tue, Dec 8, 2015 at 1:15 AM, Gnangarra  wrote:
> > > > Criag is right this cant be fixed within the database because the
> data
> > > base
> > > > is applying one truth where there is no one truth for everyone. This
> > will
> > > > always be the single biggest flaw of Wikidata no matter how data is
> > > > presented it can never be the absolute truth unless its measurable
> > > through
> > > > some mathematical scientific process that can replicated by everyone,
> > > > translated into any language.
> > > >
> > > > Wikipedia's answer is to present all considerations in an equal manor
> > and
> > > > not interpret the facts
> > > >
> > > > Wikidata defines what is fact, what is truth, what is right thats a
> big
> > > > task and is something the community has never tackled before...
> should
> > we
> > > > even try, has the damage already been done or should we narrow the
> > range
> > > of
> > > > recorded data, could we flag alternatives, could we give a measure of
> > > > acceptance for each fact. are there alternative means
> > >
> > > That is actually not correct. We have built Wikidata from the very
> > > beginning with some core believes. One of them is that Wikidata isn't
> > > supposed to have the one truth but instead is able to represent
> > > various different points of view and link to sources claiming these.
> > > Look for example at the country statements for Jerusalem:
> > > https://www.wikidata.org/wiki/Q1218
> > > Now I am the first to say that this will not be able to capture the
> > > full complexity of the world around us. But that's not what it is
> > > meant to do. However please be aware that we have built more than just
> > > a dumb database with Wikidata and have gone to great length to make it
> > > possible to capture knowledge diversity.
> > >
> > >
> > > Cheers
> > > Lydia
> > >
> > > --
> > > Lydia Pintscher - http://about.me/lydia.pintscher
> > > Product Manager for Wikidata
> > >
> > > Wikimedia Deutschland e.V.
> > > Tempelhofer Ufer 23-24
> > > 10963 Berlin
> > > www.wikimedia.de
> > >
> > > Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e. V.
> > >
> > > Eingetragen im Vereinsregister des Amtsgerichts Berlin-Charlottenburg
> > > unter der Nummer 23855 Nz. Als gemeinnützig anerkannt durch das
> > > Finanzamt für Körperschaften I Berlin, Steuernummer 27/681/51985.
> > >

Re: [Wikimedia-l] Quality issues

2015-12-10 Thread Gerard Meijssen

Hoi,
The other side of the coin of being easily manipulated is that it is easy
to rectify. The Signpost is FUD in so many ways and incorrect as well. Yes,
you may have a concern about falsehoods. However, this is not going to be
helped much by insisting that everything is to be sourced. It is also not
the only way to consider quality and arguably it is the least helpful way
of improving the quality at Wikidata.

Typically what has been established on other sources is acceptable as valid
for now. When we compare and find differences, it is of relevance to find
sources and even document the differences. When it is a falsehood we should
flag it as such. Sources can be wrong or considered to be wrong. The
point however is that by concentrating on differences first we make the
most effective use of people who like these kinds of puzzles.

The case for the CC-0 license is so in line with what the WMF stands for.
Our aim is to share in the sum of all knowledge and it is the most obvious
way to do it. When Wikidata is found to document falsehoods or established
truths that are problematic, we gain a quality where people come to
Wikidata to learn what they need to learn.

When you say it has an impact, OK. Let it have an impact, but let's consider
arguments, and that is exactly what the author of this article did not do.
It is the one reason why what he wrote is FUD. So do consider quality and
recognise that we have made enormous strides forward. When this recognition
sinks in, when people understand how quality actually works, the kind of
quality that makes a difference improving Wikidata, we can easily go on
doing what we do. We may be bold and should be bold, we may make mistakes
and we do learn as we go along.
Thanks,
  GerardM

On 10 December 2015 at 10:14, Gnangarra  wrote:

> I agree getting bogged down on one item of data isnt helpful but the data
> does need to show its disputed and the data item on Israel
>  should at least have Tel Aviv listed
> as its mentonym
>
>
> within the database because the data base
> > is applying one truth where there is no one truth for everyone. This will
> > always be the single biggest flaw of Wikidata no matter how data is
> > presented it can never be the absolute truth
>
> The Jerusalem/Israel example where the data doesnt indicate its disputed
> means that it will propagated as an absolute truth...
>
>
> Then again this is shifting away from the original concern over quality
> that the ability to verify the information  isnt clear combined with the
> CC0 license the already established practice on other sources. Wikidata for
> falsehoods being easily manipulated its going to have a impact.
>
> On 10 December 2015 at 16:44, Jane Darnell  wrote:
>
> > Amen to that! This discussion about Jerusalem reminds me of the
> discussion
> > we had about the nationality of Anne Frank. For those interested, there
> > have been some heated debates about whether Mobile should use the text in
> > Wikidata "label descriptions" or rather some basic presentation of the
> P31
> > property. Most descriptions are still blank anyway. Personally I think
> > texts such as "capital of Israel" or "holocaust victim" are both better
> > than blank, but many disagree with me.
> >
> > Both of these represent associated items that have a lot of eyes on them,
> > but what about our more obscure items? Lots of these may be improved by
> the
> > people who originally created a Wikipedia page for them. As a Wikipedia
> > editor who has created over 2000 Wikipedia pages, I feel somewhat
> dismayed
> > at the idea that I need to walk through this long list and add statements
> > to their Wikidata items as the responsible party who introduced them to
> the
> > Wikiverse in the first place. But if I had a gadget that would tell me
> > which of my created Wikipedia articles had 0-3 statements, I would
> probably
> > update those.
> >
> > On Thu, Dec 10, 2015 at 9:27 AM, Lydia Pintscher <
> > lydia.pintsc...@wikimedia.de> wrote:
> >
> > > On Tue, Dec 8, 2015 at 1:15 AM, Gnangarra  wrote:
> > > > Criag is right this cant be fixed within the database because the
> data
> > > base
> > > > is applying one truth where there is no one truth for everyone. This
> > will
> > > > always be the single biggest flaw of Wikidata no matter how data is
> > > > presented it can never be the absolute truth unless its measurable
> > > through
> > > > some mathematical scientific process that can replicated by everyone,
> > > > translated into any language.
> > > >
> > > > Wikipedia's answer is to present all considerations in an equal manor
> > and
> > > > not interpret the facts
> > > >
> > > > Wikidata defines what is fact, what is truth, what is right thats a
> big
> > > > task and is something the community has never tackled before...
> should
> > we
> > > > even try, has the damage already been done or should we narrow

Re: [Wikimedia-l] Quality issues

2015-12-10 Thread Andreas Kolbe

On Thu, Dec 10, 2015 at 10:27 AM, Gerard Meijssen  wrote:

> The case for the CC-0 license is so in line with what the WMF stands for.
> Our aim is to share in the sum of all knowledge and it is the most obvious
> way to do it. When Wikidata is found to document falsehoods or established
> truths that are problematic, we gain a quality where people come to
> Wikidata to learn what they need to learn.
>

According to Denny, Wikidata, under its CC0 licence, must not import data
from Share-Alike sources. He reconfirmed this yesterday when I asked him
whether he still stood by that.

In practice though we have Wikidata importing massive amounts of data from
Wikipedia, which was a Share-Alike source last time I looked. Isn't
Wikidata then infringing Wikipedia contributors' rights?

Why is it okay to import data from the CC BY-SA Wikipedia, but not from
European CC BY-SA population statistics?

There are inchoate and uncomfortable parallels to licence laundering here,
which I would hope is not something the WMF stands for. Could someone
please explain?

Re: [Wikimedia-l] Quality issues

2015-12-10 Thread Gerard Meijssen

Hoi,
What other people say is their choice. The law is simple. Facts cannot be
copyrighted and consequently the preference / the opinion of Denny is
simply that.

Typically statistics organisations are more than happy to share their data.
They do so in the Netherlands and it is only for a lack of organisation on
our end that it has not happened yet.

When I copy data from Wikipedia, it is unstructured in every sense. As a
follow up I often spend time to improve upon it further.

I do not care for your opinion. So far I have seen only FUD from you: you
present the preferences of people like Denny as grounds for compliance. They
are not, and there is not much positive in what I have seen from you so far.

What is your contribution, what is it that you hope to achieve?

You point to statistics organisations as if they were the ones not
interested in collaboration. They are ever so happy to collaborate,
and we are happy to acknowledge them as the source of information when
they do. By seeking collaboration, by seeking to bring data together and
achieve more, we are able to make a difference. This is not done by
publicly claiming, as you do, that you are not involved and do not want to
know. It is done by being involved, knowing what quality means and how we
can achieve it, and by walking the walk and talking the talk.
Thanks,
  GerardM

On 10 December 2015 at 13:17, Andreas Kolbe  wrote:

> On Thu, Dec 10, 2015 at 10:27 AM, Gerard Meijssen <
> gerard.meijs...@gmail.com
> > wrote:
>
> > The case for the CC-0 license is so in line with what the WMF stands for.
> > Our aim is to share in the sum of all knowledge and it is the most
> obvious
> > way to do it. When Wikidata is found to document falsehoods or
> established
> > truths that are problematic, we gain a quality where people come to
> > Wikidata to learn what they need to learn.
> >
>
>
> According to Denny, Wikidata, under its CC0 licence, must not import data
> from Share-Alike sources. He reconfirmed this yesterday when I asked him
> whether he still stood by that.
>
> In practice though we have Wikidata importing massive amounts of data from
> Wikipedia, which was a Share-Alike source last time I looked. Isn't
> Wikidata then infringing Wikipedia contributors' rights?
>
> Why is it okay to import data from the CC BY-SA Wikipedia, but not from
> European CC BY-SA population statistics?
>
> There are inchoate and uncomfortable parallels to licence laundering here,
> which I would hope is not something the WMF stands for. Could someone
> please explain?

Re: [Wikimedia-l] Quality issues

2015-12-07 Thread Yaroslav M. Blanter


On 2015-12-01 12:27, Andreas Kolbe wrote:

> Article by Mark Graham in Slate, Nov. 30, 2015:
>
> Why Does Google Say Jerusalem Is the Capital of Israel?
> It has to do with the fact that the Web is now optimized for machines, not
> people.
>
> Second, because of the stripping away of context, it can be challenging to
> represent important nuance. In the case of Jerusalem, the issue is less
> that particular viewpoints about the city’s status as a capital are true or
> false, but rather that there can be multiple truths, all of which are hard
> to fold into a single database entry. Finally, it’s difficult for users to
> challenge or contest representations that they deem to be unfair. Wikidata
> is, and Freebase used to be, built on user-generated content, but those
> users tend to be a highly specialized group—it’s not easy for lay users to
> participate in those platforms. And those platforms often aren’t the place
> in which their data is ultimately displayed, making it hard for some users
> to find them. Furthermore, because Google’s Knowledge Base is so opaque
> about where it pulls its information from, it is often unclear if those
> sites are even the origins of data in the first place.
>
> Jerusalem is just one example among many in which knowledge bases are
> increasingly distancing (and in some case cutting off) debate about
> contested knowledges of places. [followed by more examples]



The story with Jerusalem is very simple. I created the Wikidata item. 
The English description was "city in Israel". Then POV pushers came. 
Some of them wanted "city in Palestine", and others wanted "capital of 
Israel". Then one user, who later was elected to the board of Wikimedia 
Israel, canvassed a number of users in Hebrew Wikipedia. When there were 
too many POV pushers, I just unwatched the page, and it became "capital 
of Israel". Later on, someone managed to change it to something neutral. 
That's it. There is nothing automatic here.


Cheers
Yaroslav


Re: [Wikimedia-l] Quality issues

2015-12-07 Thread Andreas Kolbe

Hi Yaroslav,

Thanks for the background. The "POV pushing" you describe is of course what
Graham and Ford are examining in their paper.

For what it's worth, the Wikidata item for Jerusalem[1] still contains the
statement "capital of Israel" today.

As I understand it, the Knowledge Graph uses a number of sources to "guess"
whether something is factual or not. Whether Wikidata is one of them, and
what weight it has in this process, is something I suspect no one outside
Google knows.

The op-ed I mentioned writing last week is now out as part of the current
Signpost issue.[2]

Andreas

[1] https://www.wikidata.org/wiki/Q1218
[2]
https://en.wikipedia.org/wiki/Wikipedia:Wikipedia_Signpost/2015-12-02/Op-ed

On Mon, Dec 7, 2015 at 8:29 PM, Yaroslav M. Blanter 
wrote:

> The story with Jerusalem is very simple. I created the Wikidata item. The
> English description was "city in Israel". Then POV pushers came. Some of
> them wanted "city in Palestine", and others wanted "capital of Israel".
> Then one user, who later was elected to the board of Wikimedia Israel,
> canvassed a number of users in Hebrew Wikipedia. When there were too many
> POV pushers, I just unwatched the page, and it became "capital of Israel".
> Later on, someone managed to change it to something neutral. That's it. There is
> nothing automatic here.
>
> Cheers
> Yaroslav

Re: [Wikimedia-l] Quality issues

2015-12-07 Thread Andrea Zanni

On Mon, Dec 7, 2015 at 9:53 PM, Andreas Kolbe  wrote:

> Hi Yaroslav,
>
> Thanks for the background. The "POV pushing" you describe is of course what
> Graham and Ford are examining in their paper.
>
> For what it's worth, the Wikidata item for Jerusalem[1] still contains the
> statement "capital of Israel" today.
>

Really, I do not understand the difference between this kind of problem and
Wikipedia's edit wars or conflicts.
Wikidata represents knowledge in a structured, collaborative way: both
features define it, and it seems the op-ed just doesn't like them (either
one or both).

Aubrey

Re: [Wikimedia-l] Quality issues

2015-12-07 Thread Andreas Kolbe

Hi Markus,

On 1 December 2015 at 23:43, Markus Krötzsch wrote:

> [I continue cross-posting for this reply, but it would make sense to
> return the thread to the Wikidata list where it started, so as to avoid
> partial discussions happening in many places.]

Apologies for the late reply.

While you indicated that you had crossposted this reply to
Wikimedia-l, it didn't turn up in my inbox. I only saw it today, after
Atlasowa pointed it out on the Signpost op-ed's talk page.[1]

> On 27.11.2015 12:08, Andreas Kolbe wrote:
>
> > Wikipedia content is considered a reliable source in Wikidata, and
> > Wikidata content is used as a reliable source by Google, where it
> > appears without any indication of its provenance.
>
> This prompted me to reply. I wanted to write an email that merely says:
> "Really? Where did you get this from?" (Google using Wikidata content)

Multiple sources, including what appears to be your own research
group's writing:[2]

---o0o---

In December 2013, Google announced that their own collaboratively
edited knowledge base, Freebase, is to be discontinued in favour of
Wikidata, which gives Wikidata a prominent role as an in[p]ut for
Google Knowledge Graph. The research group Knowledge Systems is working
in close cooperation with the development team behind Wikidata, and
provides, e.g., the regular Wikidata RDF-Exports.

---o0o---

> But then I read the rest ... so here you go ...

> Your email mixes up many things and effects, some of which are important
> issues (e.g., the fact that VIAF is not a primary data source that
> should be used in citations). Many other of your remarks I find very
> hard to take serious, including but not limited to the following:

> * A rather bizarre connection between licensing models and
> accountability (as if it would make content more credible if you are
> legally required to say that you found it on Wikipedia, or even give a
> list of user names and IPs who contributed)

Both Freebase and Wikipedia have attribution licences. When Bing's
Snapshot displays information drawn from Freebase or Wikipedia, it's
indicated thus at the bottom of the infobox[3]:

---o0o---

Data from Freebase · Wikipedia

---o0o---

I take this as a token gesture to these sources' attribution licences.

Given the amount of space they have available, I would think most
people would agree that this form of attribution is sufficient. You
couldn't possibly expect them to list all contributors who have ever
contributed to the lead of the Wikipedia article, for example, as the
letter of the licence might require.

However, I think it's proper and important that those minimal
attributions are there. And given Wikidata's CC0 licence, I don't
expect re-users to continue attributing in this manner. This view is
shared by Max Klein for example, who is quoted to that effect in the
Signpost op-ed.[4]

> * Some stories that I think you really just made up for the sake of
> argument (Denny alone has picked the Wikidata license?

Denny led the development team. There are multiple public instances
and accounts of his having advocated this choice and convinced people
of the wisdom of it, in Wikidata talk pages and elsewhere, including a
recent post on the Wikidata mailing list.[5]

Interestingly, he originally said that this would mean there could be
no imports from Wikipedia, and that there was in fact no intention to
import data from Wikipedias (see op-ed).[6] He also said, higher up on
that page, that this was "for starters", and that that decision could
easily be changed later on by the community.[7]

> Google displays Wikidata content?

See above. If Wikidata plays "a prominent role as an in[p]ut for
Google Knowledge Graph" then I would expect there to be
correspondences between Knowledge Graph and Wikidata content.

> Bing is fuelled by Wikimedia?)

I spoke of "Wikimedia-fuelled search engines like Google and Bing" in
the context of the Google Knowledge Graph and Bing's Snapshot/Satori
equivalent.

We all know that in both cases, much of the content Google and Bing
display in these infoboxes comes from Wikimedia projects (Wikipedia,
Commons and now, apparently, Wikidata).

> * Some disjointed remarks about the history of capitalism
> * The assertion that content is worse just because the author who
> created it used a bot for editing

I spoke of "bot users mass-importing unreliable data". It's not the
bot method that makes the data unreliable: they are unreliable to
begin with (because they are unsourced, nobody verifies the source,
etc.).

As I pointed out in this week's op-ed, of the top fifteen hoaxes in
the English Wikipedia, six have active Wikidata items (or rather, had:
they were deleted this morning, after the op-ed appeared).

This is what I mean by

Re: [Wikimedia-l] Quality issues

2015-12-07 Thread Craig Franklin

Such issues are always going to crop up when you're attempting to describe
the world using Aristotelian propositions.  In a source like Wikipedia, we
can provide some nuance, explain both sides of the issue, the history of
both claims, and let the reader decide.  In a database, we are limited to
saying that Jerusalem either is or is not the capital of Israel.

To be fair, this is not a weakness that is implementation-specific to
Wikidata; it is always going to happen when you try to describe the world
in this way.  It's not something that can be fixed with adding sources, or
by bolting fancy new technical gadgets onto the side of the database.

Cheers,
Craig

On 8 December 2015 at 06:58, Andrea Zanni  wrote:

> On Mon, Dec 7, 2015 at 9:53 PM, Andreas Kolbe  wrote:
>
> > Hi Yaroslav,
> >
> > Thanks for the background. The "POV pushing" you describe is of course
> what
> > Graham and Ford are examining in their paper.
> >
> > For what it's worth, the Wikidata item for Jerusalem[1] still contains
> the
> > statement "capital of Israel" today.
> >
>
>
> Really, I do not understand the difference between this kind of problem and
> Wikipedia's edit wars or conflicts.
> Wikidata represents knowledge in a structured, collaborative way: both
> features define it, and it seems the op-ed just doesn't like them (either
> one or both).
>
> Aubrey

Re: [Wikimedia-l] Quality issues

2015-12-07 Thread Gnangarra

Craig is right: this can't be fixed within the database, because the database
is applying one truth where there is no one truth for everyone. This will
always be the single biggest flaw of Wikidata. No matter how data is
presented, it can never be the absolute truth unless it's measurable through
some mathematical or scientific process that can be replicated by everyone and
translated into any language.

Wikipedia's answer is to present all considerations in an equal manner and
not interpret the facts.

Wikidata defines what is fact, what is truth, what is right. That's a big
task and is something the community has never tackled before... Should we
even try? Has the damage already been done, or should we narrow the range of
recorded data? Could we flag alternatives? Could we give a measure of
acceptance for each fact? Are there alternative means?

Quality itself has many different measures and many different ways of being
measured all of which are the truth for the question being asked...

Are we even asking the questions we need to in the way we need to?



On 8 December 2015 at 07:52, Craig Franklin 
wrote:

> Such issues are always going to crop up when you're attempting to describe
> the world using Aristotelian propositions.  In a source like Wikipedia, we
> can provide some nuance, explain both sides of the issue, the history of
> both claims, and let the reader decide.  In a database, we are limited to
> saying that Jerusalem either is or is not the capital of Israel.
>
> To be fair, this is not a weakness that is implementation-specific to
> Wikidata; it is always going to happen when you try to describe the world
> in this way.  It's not something that can be fixed with adding sources, or
> by bolting fancy new technical gadgets onto the side of the database.
>
> Cheers,
> Craig
>
> On 8 December 2015 at 06:58, Andrea Zanni 
> wrote:
>
> > On Mon, Dec 7, 2015 at 9:53 PM, Andreas Kolbe 
> wrote:
> >
> > > Hi Yaroslav,
> > >
> > > Thanks for the background. The "POV pushing" you describe is of course
> > what
> > > Graham and Ford are examining in their paper.
> > >
> > > For what it's worth, the Wikidata item for Jerusalem[1] still contains
> > the
> > > statement "capital of Israel" today.
> > >
> >
> >
> > Really, I do not understand the difference between this kind of problem
> and
> > Wikipedia's edit wars or conflicts.
> > Wikidata represents knowledge in a structured, collaborative way: both
> > features define it, and it seems the op-ed just doesn't like them (either
> > one or both).
> >
> > Aubrey



-- 
GN.
President Wikimedia Australia
WMAU: http://www.wikimedia.org.au/wiki/User:Gnangarra
Photo Gallery: http://gnangarra.redbubble.com

Re: [Wikimedia-l] Quality issues

2015-12-01 Thread Andreas Kolbe

On Tue, Dec 1, 2015 at 4:16 PM, Gerard Meijssen 
wrote:

> In the mean time with your "I do not want to be involved attitude" you are
> the proverbial sailor who stays on shore.



Well, me and 99. percent of the global population. Not everyone has to
contribute to Wikidata. :)



> My arguments are plausible and I actively work towards getting them
> implemented. I do not need to convince people to do my work. The only thing
> I want to do is ask people for their support so that we get sooner to the
> stage where we will share in the sum of all available knowledge, something
> we do not really do at this stage.
>


Thanks for the spirited debate, and good luck to you, Gerard. May your
efforts be fruitful.

Andreas

Re: [Wikimedia-l] Quality issues

2015-12-01 Thread Gerard Meijssen

Hoi,
This thread is called "quality". There are ways to include multiple
truisms. Wikidata is the data project of the Wikimedia Foundation. It is a
wiki, so when you have issues, deal with them.

I prefer to quote what John Ruskin had to say: "Quality is never an
accident. It is always the result of intelligent effort". I am more
concerned with the fact that the Linguapax Prize does not have all of its
winners. I am more concerned that half of the items of Wikidata have fewer
than three statements.

These are issues that deal with the quality of Wikidata. As Magnus has
started to produce reports on issues between Mix'n Match and Wikidata, he
invites people to improve our quality. It is one way in which the quality
of our current data improves measurably.

When I blog about the Nansen Refugee award I report on the type of issues I
find in Wikipedia. It is easy to find fault. The point however is not that
Wikipedia is bad nor that Wikidata is good. The point is that in order to
achieve quality there is a lot of work to do.
Thanks,
  GerardM

On 1 December 2015 at 12:27, Andreas Kolbe  wrote:

> Article by Mark Graham in Slate, Nov. 30, 2015:
>
> Why Does Google Say Jerusalem Is the Capital of Israel?
> It has to do with the fact that the Web is now optimized for machines, not
> people.
>
>
> http://www.slate.com/articles/technology/future_tense/2015/11/why_does_google_say_jerusalem_is_the_capital_of_israel.html
>
> Excerpt:
>
> [...] because of the ease of separating content from containers, the
> provenance of data is often obscured. Contexts are stripped away, and
> sources vanish into Google’s black box. For instance, most of the
> information in Google’s infoboxes on cities doesn’t tell us where the data
> is sourced from.
>
> Second, because of the stripping away of context, it can be challenging to
> represent important nuance. In the case of Jerusalem, the issue is less
> that particular viewpoints about the city’s status as a capital are true or
> false, but rather that there can be multiple truths, all of which are hard
> to fold into a single database entry. Finally, it’s difficult for users to
> challenge or contest representations that they deem to be unfair. Wikidata
> is, and Freebase used to be, built on user-generated content, but those
> users tend to be a highly specialized group—it’s not easy for lay users to
> participate in those platforms. And those platforms often aren’t the place
> in which their data is ultimately displayed, making it hard for some users
> to find them. Furthermore, because Google’s Knowledge Base is so opaque
> about where it pulls its information from, it is often unclear if those
> sites are even the origins of data in the first place.
>
> Jerusalem is just one example among many in which knowledge bases are
> increasingly distancing (and in some case cutting off) debate about
> contested knowledges of places. [followed by more examples]
>
> My point is not that any of these positions are right or wrong. It is
> instead that the move to linked data and the semantic Web means that many
> decisions about how places are represented are increasingly being made by
> people and processes far from, and invisible to, people living under the
> digital shadows of those very representations. Contestations are
> centralized and turned into single data points that make it difficult for
> local citizens to have a significant voice in the co-construction of their
> own cities. [...]
>
> Linked data and the machine-readable Web have important implications for
> representation, voice, and ultimately power in cities, and we need to
> ensure that we aren't seduced into codifying, categorizing, and structuring
> in cases when ambiguity, not certainty, reigns.

Re: [Wikimedia-l] Quality issues

2015-12-01 Thread Andreas Kolbe

Article by Mark Graham in Slate, Nov. 30, 2015:

Why Does Google Say Jerusalem Is the Capital of Israel?
It has to do with the fact that the Web is now optimized for machines, not
people.

http://www.slate.com/articles/technology/future_tense/2015/11/why_does_google_say_jerusalem_is_the_capital_of_israel.html

Excerpt:

[...] because of the ease of separating content from containers, the
provenance of data is often obscured. Contexts are stripped away, and
sources vanish into Google’s black box. For instance, most of the
information in Google’s infoboxes on cities doesn’t tell us where the data
is sourced from.

Second, because of the stripping away of context, it can be challenging to
represent important nuance. In the case of Jerusalem, the issue is less
that particular viewpoints about the city’s status as a capital are true or
false, but rather that there can be multiple truths, all of which are hard
to fold into a single database entry. Finally, it’s difficult for users to
challenge or contest representations that they deem to be unfair. Wikidata
is, and Freebase used to be, built on user-generated content, but those
users tend to be a highly specialized group—it’s not easy for lay users to
participate in those platforms. And those platforms often aren’t the place
in which their data is ultimately displayed, making it hard for some users
to find them. Furthermore, because Google’s Knowledge Base is so opaque
about where it pulls its information from, it is often unclear if those
sites are even the origins of data in the first place.

Jerusalem is just one example among many in which knowledge bases are
increasingly distancing (and in some case cutting off) debate about
contested knowledges of places. [followed by more examples]

My point is not that any of these positions are right or wrong. It is
instead that the move to linked data and the semantic Web means that many
decisions about how places are represented are increasingly being made by
people and processes far from, and invisible to, people living under the
digital shadows of those very representations. Contestations are
centralized and turned into single data points that make it difficult for
local citizens to have a significant voice in the co-construction of their
own cities. [...]

Linked data and the machine-readable Web have important implications for
representation, voice, and ultimately power in cities, and we need to
ensure that we aren't seduced into codifying, categorizing, and structuring
in cases when ambiguity, not certainty, reigns.

Re: [Wikimedia-l] Quality issues

2015-12-01 Thread Andreas Kolbe

On Sun, Nov 29, 2015 at 2:55 PM, Gerard Meijssen 
wrote:

> So identify an issue and it can be dealt with.
>

The fact an issue *can* be dealt with does not mean that it *will* be dealt
with.

For example, in the post that opened this discussion a little over a week
ago, you said:

"At Wikidata we often find issues with data imported from a Wikipedia.
Lists have been produced with these issues on the Wikipedia involved and
arguably they do present issues with the quality of Wikipedia or Wikidata
for that matter. So far hardly anything resulted from such outreach."

These were your own words: "hardly anything resulted from such outreach."
Wikimedia is three years into this project. If people produce lists of
quality issues, that's great, but if nothing happens as a result, that's
not so great.

An example of this is available in this very thread. Three days ago I
mentioned the issues with the Grasulf II of Friuli entries on Reasonator
and Wikidata. I didn't expect that you or anyone else would fix them, and
they haven't been, at the time of writing.

You certainly could have fixed them -- you have made hundreds of edits on
Wikidata since replying to that post of mine -- but you haven't. Adding new
data is more satisfying than sourcing and improving an obscure entry. (If
you're wondering why I didn't fix the entry myself, see the section "And to
answer the obvious question …" in last month's Signpost op-ed.[1])

This problem is replicated across the Wikimedia universe. Wikimedia
projects are run by volunteers. They work on what interests them, or
whatever they have an investment in. Fixing old errors is not as appealing
as importing 2 million items of new data (including tens or hundreds of
thousands of erroneous ones), because fixing errors is slow work. It
retards the growth of your edit count! You spend one hour researching a
date, and all you get for that effort is one lousy edit in your
contributions history. There are plenty of tasks allowing you to rack up
500 edits in 5 minutes. People seem to prefer those.

That is why Wikipedia has the familiar backlogs in areas like copyright
infringement or AfC. Even warning templates indicating bias or other
problematic content often sit for years without being addressed.

There is a systemic mismatch between data creation and data curation. There
is a lot of energy for the former, and very little energy for the latter.
That is why initiatives like the one started by WMF board member James
Heilman and others, to have the English Wikipedia's medical articles
peer-reviewed, are so important. They are small steps in the right
direction.

> When we are afraid about a Seigenthaler type of event based on Wikidata,
> rest assured there is plenty wrong in either Wikipedia or Wikidata that
> makes it possible for it to happen. The most important thing is to deal
> with it responsibly. Just being afraid will not help us in any way. Yes we
> need quality and quantity. As long as we make a best effort to improve our
> data, we will do well.
>

That's "eventualism". "Quality is terrible, but eventually it will be
great, because ... we're all trying, and it's a wiki!" To me that sounds
more like religious faith or magical thinking than empirical science.

Things being on a wiki does not guarantee quality; far from it.[2][3][4][5]

> As to the Wikipedian in residence, that is his opinion. At the same time
> the article on ebola has been very important. It may not be science but it
> is certainly encyclopaedic. At the same time this Wikipedian in residence is
> involved, makes a positive contribution and while he may make mistakes he
> is part of the solution.
>
> I am happy that you propose that work is to be done. What have you done but
> more importantly what are you going to do? For me there is "Number of
> edits: 2,088,923"
>

I will do what I can to encourage Wikimedia Foundation board members and
management to review the situation, in consultation with outside academics
like those at the Oxford Internet Institute who are concerned about present
developments, and to consider whether more stringent sourcing policies are
required for Wikidata in order to assure the quality and traceability of
data in the Wikidata corpus.

The public is the most important stakeholder in this, and should be
informed and involved. If there are quality issues, the Wikimedia
Foundation should be completely transparent about them in its public
communications, neither minimising nor exaggerating the issues. Known
problems and potential issues should be publicised as widely as possible in
order to minimise the harm to society resulting from uncritical reuse of
faulty data.

I have started to reach out to scholars and journalists, inviting them to
review this thread as well as related materials, and form their own
conclusions. I may write an op-ed about it in the Signpost, because I
believe it's an

Re: [Wikimedia-l] Quality issues

2015-12-01 Thread Gerard Meijssen

Hoi,
 I do work on quality issues. I blog about them. I work towards
implementing solutions.  I have fixed quite a few errors in Wikidata
and I do not rack up as many edits as I could because of it.

In the meantime, with your "I do not want to be involved" attitude, you are
the proverbial sailor who stays on shore. It is your option to get your
hands dirty or not. However, a friend of mine mentioned this attitude and
compared it to the people who said that Wikipedia would never work. That is
fine, so I will just move on from many of your arguments.

I do not care about profit. I have over 2 million edits on Wikidata alone
and I have a few others on other projects as well. They may, as is implicit
in the license, make a profit. The point is that as more data is freed, it
will free more data. With more free data we can inform more people. We can
share more of the sum of all available knowledge.

I wonder, there are many ways in which quality can be improved and all you
do is refer to others. Why should I bother with your arguments when they
are not yours and when you do not show how to make a difference? My
arguments are plausible and I actively work towards getting them
implemented. I do not need to convince people to do my work. The only thing
I want to do is ask people for their support so that we get sooner to the
stage where we will share in the sum of all available knowledge, something
we do not really do at this stage.
Thanks,
  GerardM

On 1 December 2015 at 15:30, Andreas Kolbe  wrote:

> On Sun, Nov 29, 2015 at 2:55 PM, Gerard Meijssen <
> gerard.meijs...@gmail.com>
> wrote:
>
> > So identify an issue and it can be dealt with.
> >
>
>
> The fact an issue *can* be dealt with does not mean that it *will* be dealt
> with.
>
> For example, in the post that opened this discussion a little over a week
> ago, you said:
>
> "At Wikidata we often find issues with data imported from a Wikipedia.
> Lists have been produced with these issues on the Wikipedia involved and
> arguably they do present issues with the quality of Wikipedia or Wikidata
> for that matter. So far hardly anything resulted from such outreach."
>
> These were your own words: "hardly anything resulted from such outreach."
> Wikimedia is three years into this project. If people produce lists of
> quality issues, that's great, but if nothing happens as a result, that's
> not so great.
>
> An example of this is available in this very thread. Three days ago I
> mentioned the issues with the Grasulf II of Friuli entries on Reasonator
> and Wikidata. I didn't expect that you or anyone else would fix them, and
> they haven't been, at the time of writing.
>
> You certainly could have fixed them -- you have made hundreds of edits on
> Wikidata since replying to that post of mine -- but you haven't. Adding new
> data is more satisfying than sourcing and improving an obscure entry. (If
> you're wondering why I didn't fix the entry myself, see the section "And to
> answer the obvious question …" in last month's Signpost op-ed.[1])
>
> This problem is replicated across the Wikimedia universe. Wikimedia
> projects are run by volunteers. They work on what interests them, or
> whatever they have an investment in. Fixing old errors is not as appealing
> as importing 2 million items of new data (including tens or hundreds of
> thousands of erroneous ones), because fixing errors is slow work. It
> retards the growth of your edit count! You spend one hour researching a
> date, and all you get for that effort is one lousy edit in your
> contributions history. There are plenty of tasks allowing you to rack up
> 500 edits in 5 minutes. People seem to prefer those.
>
> That is why Wikipedia has the familiar backlogs in areas like copyright
> infringement or AfC. Even warning templates indicating bias or other
> problematic content often sit for years without being addressed.
>
> There is a systemic mismatch between data creation and data curation. There
> is a lot of energy for the former, and very little energy for the latter.
> That is why initiatives like the one started by WMF board member James
> Heilman and others, to have the English Wikipedia's medical articles
> peer-reviewed, are so important. They are small steps in the right
> direction.
>
>
>
> > When we are afraid about a Seigenthaler type of event based on Wikidata,
> > rest assured there is plenty wrong in either Wikipedia or Wikidata that
> > makes it possible for it to happen. The most important thing is to deal
> > with it responsibly. Just being afraid will not help us in any way. Yes we
> > need quality and quantity. As long as we make a best effort to improve our
> > data, we will do well.
> >
>
>
> That's "eventualism". "Quality is terrible, but eventually it will be
> great, because ... we're all trying, and it's a wiki!" To me that sounds
> more like religious faith or magical thinking than empirical science.
>
> Things being on a wiki does not guarantee

Re: [Wikimedia-l] Quality issues

2015-11-29 Thread Lilburne

Simply if I have a litre of sewage and add to it 100ml of pure water,
I still have sewage. Conversely if I have a litre of pure water and
pour 100ml of sewage into it then what do I have?

What if 2 out of 10 bank statements are erroneous, is that OK because
8 are accurate?

What if 2 out of every 10 gas stations delivered gasoline from the diesel
pump?

On 29/11/2015 10:38, Gerard Meijssen wrote:

Hoi,
More FUD. Poisonous how?
Thanks,
 GerardM

On 29 November 2015 at 11:33, Lilburne wrote:

On 29/11/2015 09:42, Gerard Meijssen wrote:

Hoi, Wikidata is a wiki and, you seem to always forget that.

The corruption of data .. how? Each statement is its own data item
how do you corrupt that? As I say so often, when you get a
collection that is 80% correct you have an error rate of 20%.

Surely this isn't some exam paper where you get an 80% passing mark.
What you have is a basket of eggs ... 20% of which are poisonous.

___
Wikimedia-l mailing list, guidelines at:
https://meta.wikimedia.org/wiki/Mailing_lists/Guidelines
Wikimedia-l@lists.wikimedia.org

Unsubscribe: https://lists.wikimedia.org/mailman/listinfo/wikimedia-l


Re: [Wikimedia-l] Quality issues

2015-11-29 Thread Andreas Kolbe

Gergo,

On Sun, Nov 29, 2015 at 12:36 AM, Gergo Tisza  wrote:

> By the same logic, to the extent Wikipedia takes its facts from non-free
> external source, its free license would be a copyright violation. Luckily
> for us, that's not how copyright works.

I'm aware that facts are not copyrightable. By the same logic, Wikidata
being offered under a CC BY-SA license, say, would not prevent anyone from
extracting facts -- knowledge -- from it, and it would enable Wikidata to
import a lot of data it presently cannot, because of licence
incompatibilities.

> Statements of facts can not be
> copyrighted; large-scale arrangements of facts (ie. a full database)
> probably can, but CC does not prevent others from using them without
> attribution, just distributing them (again, it's like the GPL/Affero
> difference);

Distribution is the issue here – large-scale distribution and viral
propagation of data with a well-documented potential for manipulation and
error, in a way that makes the provenance of these data a closed book to
the end user.

Do you accept that this is a potential problem, and if so, how would you
guard against it, if not through the licence?

> there are sui generis database rights in some countries but
> not in the USA where both Wikipedia and most proprietary
> reusers/competitors are located, so relying on neighbouring rights would
> not help there but cause legal uncertainty for reusers (e.g. OSM which has
> lots of legal trouble importing coordinates due to being EU-based).
>

It seems noteworthy that Freebase specifically said, with regard to loading
structured data, "If a data source is under CC-BY, you can load it into
Freebase as long as you provide attribution."[1]

Wikidata practice seems to have taken a different path regarding licence
compatibility, given its systematic imports from Wikipedia.

Interestingly enough, it's been pointed out to me that Denny said in
2012,[2]

---o0o---

Alexrk2, it is true that Wikidata under CC0 would not be allowed to import
content from a Share-Alike data source. Wikidata does not plan to extract
content out of Wikipedia at all. Wikidata will *provide* data that can be
reused in the Wikipedias. And a CC0 source can be used by a Share-Alike
project, be it either Wikipedia or OSM. But not the other way around. Do we
agree on this understanding? --Denny Vrandečić (WMDE) (talk) 12:39, 4 July
2012 (UTC)

---o0o---

The key sentence here is "Wikidata does not plan to extract content out of
Wikipedia at all."

That doesn't seem to be how things have turned out, because today we have
people on Wikidata raising alarms about mass imports from Wikipedia:[3]

---o0o---

Reliable Bot imports from wikipedias?

In a Wikipedia discussion I came by chance across a link to the following
discussion:

   - Wikidata:Project_chat/Archive/2015/10#STOP_with_bot_import

[...] To provide an outside perspective as Wikipedian (and a potential
use[r] of WD in the future). I wholeheartedly agree with Snipre, in fact
"bots [ar]e running wild" and the uncontrolled import of data/information
from Wikipedias is one of the main reasons for some Wikipedias developing
an increasingly hostile attitude towards WD and its usage in Wikipedias.
*If* WD is ever to function as a central data storage for various Wikimedia
projects and in particular Wikipedia as well (in analogy to Commons), *then*
quality has to take the driver's seat over quantity. A central storage
needs a much better data integrity than the projects using it, because one
mistake in its data will multiply throughout the projects relying on WD,
which may cause all sorts of problems. For crude comparison think of a
virus placed on a central server than on a single client. The consequences
are much more severe and nobody in their right mind would run the server
with even less protection/restrictions than the client.

Another thin[g] is, that if you envision users of other Wikimedia projects
such as Wikipedia or even 3rd party external projects to eventually help
with data maintenance when they start using WD, then you might find them
rather unwilling to do so, if not enough attention is paid to quality,
instead they probably just dump WD from their projects.

In general all the advantages of the central data storage depend on the
quality (reliability) of data. If that is not given to reasonable high
degree, there is no point to have central data storage at all. All the
great application become useless if they operate on false data. --Kmhkmh
(talk) 12:00, 19 November 2015 (UTC)

---o0o---

(I was unaware of that post by Kmhkmh when I started

Re: [Wikimedia-l] Quality issues

Hoi,
It would be a gross violation of trust to bring Wikidata under a different
license. When an external source is willing to share its data, it can do
so. With explicit agreement we can copy data in from them in this way. Even
when this is not possible for whatever reason, we can still contribute
because we can compare data and on the basis of differences in existing
data curate our data and enable them to share our findings.

I am amused by your fear of manipulation. Yes, data can be manipulated but
once we see it happen, we can take measures when it affects the data we
hold. Provenance of data is at this stage something we at Wikidata wish
for. Arguably it does not make sense to make it a priority for all of our
data because it would stifle Wikidata and it is utterly against the wiki
spirit.

The best way to guard against manipulation is to cooperate widely and take
any difference in data seriously. It is the differences we want to
understand: why they exist. Focussing on known issues
helps us identify systemic issues and when we do we can expose such
manipulation with proof. In this way we are using a SMART methodology. No, I
would never use the license as a weapon; it is how manipulation is
justified.

Importing data from Wikipedia is a sensible thing to do. Its data is
relatively well known for its quality. It has its issues but its basis is
NPOV. When people are alarmed about importing from Wikipedia, it tells us
more of what they think of the quality of Wikipedia than of the quality of
Wikidata. When people are alarmed because they cannot control it, ask
yourself what is their problem and how do their arguments enable the notion
of Wikidata as a wiki? When imported data is wrong, there are tools to
remove content quite delicately. So identify an issue and it can be dealt
with.

You argue that Wikidata cannot be used as a central storage. Fine, do
not use it. In the meantime, specific sets of data are of higher
quality than any Wikipedia. This is a proven fact. The question of whether
Wikidata is useful as a central data repository at this time can only be
answered with NO when it is about all of Wikidata. When it is about specific
subsets of data the answer is clearly yes. It is also obvious that as time
goes on more subsets of data will be of a higher quality than any Wikipedia
(when thinking in terms of sets of data - there will always be items where a
Wikipedia has an edge).

FYI I am in contact with a German university that is likely to use Wikidata
internally for its research data. It needs Reasonator type of functionality
to make it useful. It wants to share its data with Wikidata and wants two
way RSS feeds in order to include new information.

When we set up cooperation with statistical offices, we CAN attribute
easily by having bots import data on their behalf using THEIR user id and
adding sources to the new data. We can also provide data from their website
in applications. It is not the license that means anything; it is what we
agree to do. When we have sourced data in this way, you are silly to change
it. False attributions are not permitted under any license.

When we are afraid about a Seigenthaler type of event based on Wikidata,
rest assured there is plenty wrong in either Wikipedia or Wikidata that
makes it possible for it to happen. The most important thing is to deal
with it responsibly. Just being afraid will not help us in any way. Yes we
need quality and quantity. As long as we make a best effort to improve our
data, we will do well.

As to the Wikipedian in residence, that is his opinion. At the same time
the article on ebola has been very important. It may not be science but it
is certainly encyclopaedic. At the same time this Wikipedian in residence is
involved, makes a positive contribution and while he may make mistakes he
is part of the solution.

I am happy that you propose that work is to be done. What have you done but
more importantly what are you going to do? For me there is "Number of edits:
2,088,923" 
Thanks,
 GerardM

On 29 November 2015 at 15:10, Andreas Kolbe  wrote:

> Gergo,
>
>
> On Sun, Nov 29, 2015 at 12:36 AM, Gergo Tisza 
> wrote:
>
> > By the same logic, to the extent Wikipedia takes its facts from non-free
> > external source, its free license would be a copyright violation. Luckily
> > for us, that's not how copyright works.
>
>
>
> I'm aware that facts are not copyrightable. By the same logic, Wikidata
> being offered under a CC BY-SA license, say, would not prevent anyone from
> extracting facts -- knowledge -- from it, and it would enable Wikidata to
> import a lot of data it presently cannot, because of licence
> incompatibilities.
>
>
>
> > Statements of facts can not be
> > copyrighted; large-scale arrangements of facts (ie. a full database)
> > probably can, but CC does not prevent others from using

Re: [Wikimedia-l] Quality issues

Hoi,
Wikidata is a wiki and, you seem to always forget that.

The corruption of data .. how? Each statement is its own data item; how do
you corrupt that? As I say so often, when you get a collection that is 80%
correct you have an error rate of 20%. When you do not include that data
you have an error rate of 100%. When you have another source that is 90%
correct that has similar data and you have an overlap of 50%, you can be
smart and at the start or later compare the data and curate. When you only
import at the start what is the same, you probably get something like 84%
correct data imported. You can gamify the rest but however you slice it,
what you do not have and could have is 100% wrong.

Wikidata is NOT Wikipedia. It is much easier to curate data and
consequently your argument is FUD. The big thing we have not learned is
cooperation. We do not cooperate. We do not have, as standard, RSS feeds for
the changes to the items that belong to a specific source. We are happy to
get data but we do not reach out and give back. For me the fact that VIAF
uses Wikidata as a link is an opportunity to do better. Cooperation like
that with the German DNB is the kind of project that we should emulate.

When you talk about quality, you talk in an insular fashion. We have to do
it, our community. At Wikidata our community can include other
organisations with rich collections of data with high quality. We can
share, compare, curate. Even with our current low quality, we have subsets
of data that shine. Subsets of data that are of at least the same quality
as Wikipedia. However this quality is often marred by a lack of quantity,
quantity we can have when collaboration is what we do.

You are afraid of our reputation. Reputation has many aspects. Jane023
presented at the Dutch Wikimedia conference. She uses a tool that is easier
on her because no Wikipedians bother her because it is a Wikidata based
list. A similar list is now used for its quality on the Welsh Wikipedia.
The data is of such quality that Google actually uses it, as she reported.

When I see the religious application of Wikipedia sentiments, I find that
we do not even care for the life of one of our own. Bassel is executed or
likely to be executed soon and some think our neutrality is so important.
FOR WHAT? So that we may not even protect our own? Is it right to protest
against TTIP (and we should) and not protest for a Wikipedian that embodies
our values?

Wikipedia think is not applicable at this stage for Wikidata. Its quality
is arguably piss poor but better in places. Many items are corrupt because
they follow the structure of Wikipedia articles. A structure Wikipedians
insist on because they wrote that article and "Wikidata is only a service
project".

I do agree that we need more quality. My approach has set theory on its
side, it embodies the wiki approach and yours is one where Pallas Athena is
to rise from the brain of Zeus in full armour. You may have noticed that my
arguments are easy to follow and conform to something that is measurable.
Yours is private, there is no possibility to verify the accuracy of your
argument. I call bullshit on your argument, not because you do not make a
fine argument but because it is an argument that prevents us from improving
Wikidata.

My hope is that we can work constructively on our quality and have a
measurable effect.
Thanks,
  GerardM

On 29 November 2015 at 02:05, Gnangarra  wrote:

> >
> > While I happily agree that Sources are good, I will not ask people to
> > start adding Sources at this point of time; it will not improve quality
> > significantly. It makes more sense once we are at a stage where multiple
> > sources disagree on values for statements. Adding sources is significantly
> > more meaningful and useful once we start curating data.
>
>
> the problem will be that by the time Wikidata starts to curate data it
> will have corrupted that data with its own data, and secondly past
> experience with wikis is that fixing data after it's been entered is
> actually harder and more time consuming to do, along with the fact that the
> damage to reputation will have a lasting impact and fixing that consumes
> millions of dollars in donor money. As said earlier there are lessons in
> the development of Wikipedia that should be heeded in an attempt to avoid
> those same pitfalls
>
>
> On 29 November 2015 at 08:37, Gerard Meijssen 
> wrote:
>
> > Hoi,
> > It was from the Myanmar Wikipedia that a lot of data was imported to
> > Wikidata. Data that did not exist elsewhere. I do not care really what
> > "Freedom House" says. I do not know them, I do know that the data is
> > relevant and useful. It was even the subject of a blog post.
> >
> > You may ignore data that is not from a source that you like. This
> > indiscriminate POV is not a NPOV.
> >
> > As to Grasulf, you failed to get the point. It was NOT about the data
> > itself but about the presentation. I worked on this

Re: [Wikimedia-l] Quality issues

2015-11-29 Thread Lilburne


On 29/11/2015 09:42, Gerard Meijssen wrote:
> Hoi, Wikidata is a wiki and, you seem to always forget that.
>
> The corruption of data .. how? Each statement is its own data item
> how do you corrupt that? As I say so often, when you get a collection
> that is 80% correct you have an error rate of 20%.


Surely this isn't some exam paper where you get an 80% passing mark.
What you have is a basket of eggs ... 20% of which are poisonous.



Re: [Wikimedia-l] Quality issues

2015-11-29 Thread Jane Darnell

Gerard,
Thanks for highlighting my work! I already posted slides on Commons, but I
want to flesh them out with links to actual edits so people can better
understand some of these quality improvement workflows. The tools I use for
lists are written mostly by the Wikidata "god" Magnus Manske and the tools
I use on Commons are self-built kludges with the assistance of Commonist
Vera de Kok. Here is an example of a quality improvement I did this morning
for a file on Commons that was originally uploaded by an English Wikipedian
who uploaded it with the default uploader for use in an English Wikipedia
list. The improvements are coming from both the original edits of the
uploader on Wikipedia as well as the associated Wikidata list:
https://commons.wikimedia.org/w/index.php?title=File:Rembrandt_Man_with_a_Falcon_on_his_Wrist.jpg=prev=180547014

Jane

On Sun, Nov 29, 2015 at 10:42 AM, Gerard Meijssen  wrote:

> Hoi,
> Wikidata is a wiki and, you seem to always forget that.
>
> The corruption of data .. how? Each statement is its own data item; how do
> you corrupt that? As I say so often, when you get a collection that is 80%
> correct you have an error rate of 20%. When you do not include that data
> you have an error rate of 100%. When you have another source that is 90%
> correct that has similar data and you have an overlap of 50%, you can be
> smart and at the start or later compare the data and curate. When you only
> import at the start what is the same, you probably get something like 84%
> correct data imported. You can gamify the rest but however you slice it,
> what you do not have and could have is 100% wrong.
>
> Wikidata is NOT Wikipedia. It is much easier to curate data and
> consequently your argument is FUD. The big thing we have not learned is
> cooperation. We do not cooperate. We do not have, as standard, RSS feeds for
> the changes to the items that belong to a specific source. We are happy to
> get data but we do not reach out and give back. For me the fact that VIAF
> uses Wikidata as a link is an opportunity to do better. Cooperation like
> that with the German DNB is the kind of project that we should emulate.
>
> When you talk about quality, you talk in an insular fashion. We have to do
> it, our community. At Wikidata our community can include other
> organisations with rich collections of data with high quality. We can
> share, compare, curate. Even with our current low quality, we have subsets
> of data that shine. Subsets of data that are of at least the same quality
> as Wikipedia. However this quality is often marred by a lack of quantity,
> quantity we can have when collaboration is what we do.
>
> You are afraid of our reputation. Reputation has many aspects. Jane023
> presented at the Dutch Wikimedia conference. She uses a tool that is easier
> on her because no Wikipedians bother her because it is a Wikidata based
> list. A similar list is now used for its quality on the Welsh Wikipedia.
> The data is of such quality that Google actually uses it, as she reported.
>
> When I see the religious application of Wikipedia sentiments, I find that
> we do not even care for the life of one of our own. Bassel is executed or
> likely to be executed soon and some think our neutrality is so important.
> FOR WHAT? So that we may not even protect our own? Is it right to protest
> against TTIP (and we should) and not protest for a Wikipedian that embodies
> our values?
>
> Wikipedia think is not applicable at this stage for Wikidata. Its quality
> is arguably piss poor but better in places. Many items are corrupt because
> they follow the structure of Wikipedia articles. A structure Wikipedians
> insist on because they wrote that article and "Wikidata is only a service
> project".
>
> I do agree that we need more quality. My approach has set theory on its
> side, it embodies the wiki approach and yours is one where Pallas Athena is
> to rise from the brain of Zeus in full armour. You may have noticed that my
> arguments are easy to follow and conform to something that is measurable.
> Yours is private, there is no possibility to verify the accuracy of your
> argument. I call bullshit on your argument, not because you do not make a
> fine argument but because it is an argument that prevents us from improving
> Wikidata.
>
> My hope is that we can work constructively on our quality and have a
> measurable effect.
> Thanks,
>   GerardM
>
> On 29 November 2015 at 02:05, Gnangarra  wrote:
>
> > >
> > > While I happily agree that Sources are good, I will not ask people to
> > > start adding Sources at this point of time; it will not improve quality
> > > significantly. It makes more sense once we are at a stage where multiple
> > > sources disagree on values for statements. Adding sources is significantly
> > > more meaningful and useful once we start curating data.
> >
> >
> > the problem will be that by the time Wikidata starts to curate data it
> > will have

Re: [Wikimedia-l] Quality issues

Hoi,
If anything it proves that you did not understand. Happy that you
appreciate what you finally see.
Thanks,
 GerardM

On 29 November 2015 at 03:38, Andreas Kolbe  wrote:

> On Sun, Nov 29, 2015 at 12:37 AM, Gerard Meijssen <
> gerard.meijs...@gmail.com
> > wrote:
>
> > As to Grasulf, you failed to get the point. It was NOT about the data
> > itself but about the presentation.
> >
>
>
> QED. :)

Re: [Wikimedia-l] Quality issues

Hoi,
More FUD. Poisonous how?
Thanks,
 GerardM

On 29 November 2015 at 11:33, Lilburne  wrote:

> On 29/11/2015 09:42, Gerard Meijssen wrote:
>
>> Hoi, Wikidata is a wiki and, you seem to always forget that.
>>
>> The corruption of data .. how? Each statement is its own data item
>> how do you corrupt that? As I say so often, when you get a collection
>> that is 80% correct you have an error rate of 20%.
>
> Surely this isn't some exam paper where you get an 80% passing mark.
> What you have is a basket of eggs ... 20% of which are poisonous.

Re: [Wikimedia-l] Quality issues