Re: [Wikimedia-l] Quality issues

2015-11-28 Thread Andreas Kolbe
On Sat, Nov 28, 2015 at 1:17 AM, Gergo Tisza  wrote:

> Trying to make our content less free for fear that someone might misuse it
> is a shamefully wrong frame of mind for an organization that's supposed to
> be a leader of the open content movement, IMO.
>


Do you think there is something "shameful" about Wikipedia using the
Creative Commons Attribution-ShareAlike 3.0 Unported License?

And if that isn't shameful, why would it be shameful if Wikidata used the
same licence?

Attribution has a dual benefit:

1. It provides visibility for Wikimedia and the open content movement.
2. The public can see where the data comes from.

What is shameful about that?
___
Wikimedia-l mailing list, guidelines at: 
https://meta.wikimedia.org/wiki/Mailing_lists/Guidelines
Wikimedia-l@lists.wikimedia.org
Unsubscribe: https://lists.wikimedia.org/mailman/listinfo/wikimedia-l, 


Re: [Wikimedia-l] Quality issues

2015-11-28 Thread Andreas Kolbe
On Sat, Nov 28, 2015 at 10:13 AM, Gergő Tisza  wrote:

>
> ("Shameful" was an unnecessarily confrontational choice of word; I
> apologize.)
>


Thanks.



> There is also the practical matter of facts not being copyrightable in the
> US, and non-zero CC licenses not being particularly useful for databases
> (what you want is something like the GPL Affero for databases and CC does
> not have such a license).
>


That hasn't stopped DBpedia and other open-content databases (the
Paleobiology database for example[1]) from using CC licenses requiring
attribution.

DBpedia arguably had to, because its database is derived from Wikipedia,
which has an attribution required, share-alike license: "DBpedia is derived
from Wikipedia and is distributed under the same licensing terms as
Wikipedia itself."[2]

To the extent that Wikidata draws on Wikipedia, its CC0 license would
appear to be a gross violation of Wikipedia's share-alike license
requirement.

The generation of data always has a social context. Knowing where data come
from is a good thing.

[1] https://creativecommons.org/weblog/entry/41216
[2] http://wiki.dbpedia.org/terms-imprint


Re: [Wikimedia-l] Quality issues

2015-11-28 Thread Andreas Kolbe
Gerard,


On Fri, Nov 27, 2015, Gerard Meijssen  wrote:

> When you compare the quality of Wikipedias with what en.wp used to be you
> are comparing apples and oranges. The Myanmar Wikipedia is better informed
> on Myanmar than en.wp etc.
>


Is it? The entire Burmese Wikipedia contains a mere 31,646 content pages at
the time of writing, covering (or trying to cover) all countries of the
world, and all aspects of human knowledge.[1]

The English Wikipedia's WikiProject Myanmar, meanwhile, has 6,713 pages
within its purview.[2] I dare say that's more articles on Myanmar than the
Burmese Wikipedia contains. As an indication, the English Wikipedia's
article on Myanmar is more than twice as long as the one in the Burmese
Wikipedia.

Moreover, according to Freedom House[3], the internet in Myanmar is not
free:

"The government detained and charged internet users for online activities
[...] Government officials pressured social media users not to distribute
or share content that offends the military, or disturbs the functions of
government."



> When you qualify a Wikipedia as fascist, it does not follow that the data
> is suspect. Certainly when data in a source that you so easily dismiss is
> typically the same, there is not much meaning in what you say from a
> Wikidata point of view.
>


Data are always generated within a social context, and data generated by
political extremists or people living under oppressive regimes are suspect
whenever they have political implications. (Looking at the descriptions of
Burmese politics, my feeling is the Burmese Wikipedia is not under
significant government control, but largely written by ex-pats. However,
the situation is quite different in some other Wikipedias serving countries
labouring under similar regimes.)



> PS What does your librarian think when she knows



It was a he, but I'll leave him to join in himself if he chooses to.


> I happen to work on Dukes of Friuli. Compare the data from Wikidata and the
> information by Reasonator based on the same item for one of them.
>
> https://tools.wmflabs.org/reasonator/?=2471519
> https://www.wikidata.org/wiki/Q2471519
>


Let's look at this example. Reasonator says of Grasulf II of Friuli, "He
died in 653". There is no source. Wikidata says he died in 653, and the
indicated source is the Italian Wikipedia.

However, when you look at the (very brief) Italian Wikipedia article[4],
you will find that the year 653 is given with a question mark. The English
Wikipedia, in contrast, states, in its similarly brief article[5],

"Nothing more is known about Grasulf and the date of his death is
uncertain."

Do you now see the problem about nuance? Reasonator and Wikidata
confidently proclaim as uncontested fact something that in fact is rather
uncertain.

The sole source cited by both the English and the Italian Wikipedia is the
Historia Langobardorum, available in Wikisource.[6] My Latin is a bit
rusty, but while the Historia mentions that Ago succeeded Grasulf upon the
latter's death, it says nothing specific about when that was. The
Historia's time indications are in general very vague, usually limited to
the phrase "Circa haec tempora", meaning "about this time". So it is in
this case.

For reference, the Google Knowledge Graph states equally confidently that
Grasulf II of Friuli died in 651 AD. This may be based on the English
Wikipedia's unsourced claim (in the template at the bottom of the English
Wikipedia article) that his reign ended c. 651, or on some other source
like Freebase.

The other Wikipedias that have articles on Grasulf II provide the following
death dates:

Catalan: 651
Galician: 653
Lithuanian: 653
Polish: 651
Romanian: Unknown
Russian: 653
Ukrainian: 651

As for published sources, I can offer Ersch's Allgemeine Encyclopädie
(1849), which states on page 209 that Grasulf II died in 651.[7]

The extreme vagueness of the available dates is pointed out by Thomas
Hodgkin in Vol. 7 of "Italy and Her Invaders" (1895). Hodgkin puts the end
of Grasulf's reign at 645, "as a mere random guess", and adds that "De
Rubeis, following Sigonius", puts the accession of Ago in 661.[8]

There may well be better and more recent sources beyond my reach, but
having these published dates in Wikidata, with the source references, would
actually make some sense. Unsourced data, not so much.

Answers are comfortable, but they are not knowledge when they are
unverifiable and/or wrong.


[1] https://meta.wikimedia.org/wiki/List_of_Wikipedias#10_000.2B_articles
[2]
https://en.wikipedia.org/wiki/Wikipedia:WikiProject_Myanmar_(Burma)/Assessment

[3] https://freedomhouse.org/report/freedom-net/2015/myanmar
[4]
https://it.wikipedia.org/w/index.php?title=Grasulfo_II_del_Friuli=76641444
[5]
https://en.wikipedia.org/w/index.php?title=Grasulf_II_of_Friuli=633223880
[6] https://la.wikisource.org/wiki/Historia_Langobardorum/Liber_IV
[7]

Re: [Wikimedia-l] Quality issues

2015-11-28 Thread Ed Erhart
On the very specific point of knowledge and how it's not always possible to
boil it down to a single quantifiable value, I couldn't agree more. Thank
you, Andreas, for the detailed anecdote displaying that problem, and I'll
be happy to provide more if needed.

Does Wikidata have a way of marking data entries as estimates, or at least
dates as circa (not just unknown)?

--Ed

Re: [Wikimedia-l] Quality issues

2015-11-28 Thread Rob
That male librarian here.

I think we need to encourage people to add more and conflicting data
to Wikidata, and to cite their sources when they do so.  Currently
it's not particularly easy to cite your sources on Wikidata.  So the
end result is that it encourages people to view whatever single
uncited bit of data appears there as the one true fact.


Re: [Wikimedia-l] Quality issues

2015-11-28 Thread Pete Forsyth
On Sat, Nov 28, 2015 at 5:23 AM, Andreas Kolbe  wrote:
>
> To the extent that Wikidata draws on Wikipedia, its CC0 license would
> appear to be a gross violation of Wikipedia's share-alike license
> requirement.
>

It's essential to also consider whether the factual information derived
from Wikipedia (or any other copyrighted source) is subject to copyright.
For instance, a biography might contain facts like "born in year" and "born
in place" and "elected to XYZ position". I don't think facts like those are
copyrightable in any jurisdiction. Perhaps there are copyrightable elements
from Wikipedia that are brought into Wikidata, but I don't know offhand
what they might be.

> The generation of data always has a social context. Knowing where data come
> from is a good thing.


Knowing where data comes from is a good thing, yes; but "copyright holder"
and "intellectual source" are not identical concepts. If the purpose is to
preserve the integrity of a line of reasoning, copyright law is probably
not a very good tool for that purpose.

A related question was recently asked on the web site Quora; here's my
answer for why CC0 is generally preferable for data sets. (I may update it
with some of the points brought up here.)
https://www.quora.com/Should-open-data-be-publised-with-CC0-instead-of-CC-BY

-Pete
[[User:Peteforsyth]]


Re: [Wikimedia-l] Quality issues

2015-11-28 Thread Gnangarra
>
> While I happily agree that Sources are good, I will not ask people to start
> adding Sources at this point of time; it will not improve quality
> significantly. It makes more sense once we are at a stage where multiple
> sources disagree on values for statements. Adding sources is significantly
> more meaningful and useful once we start curating data.


The problems are, first, that by the time Wikidata starts to curate data it
will have corrupted that data with its own data; and secondly, past
experience with wikis is that fixing data after it has been entered is
actually harder and more time-consuming, along with the fact that the
damage to reputation will have a lasting impact, and fixing that consumes
millions of dollars in donor money. As said earlier, there are lessons in
the development of Wikipedia that should be heeded in an attempt to avoid
those same pitfalls.



Re: [Wikimedia-l] Quality issues

2015-11-28 Thread Andreas Kolbe
On Sun, Nov 29, 2015 at 12:37 AM, Gerard Meijssen  wrote:

> As to Grasulf, you failed to get the point. It was NOT about the data
> itself but about the presentation.
>


QED. :)


Re: [Wikimedia-l] Quality issues

2015-11-28 Thread Gerard Meijssen
Hoi,
It was from the Myanmar Wikipedia that a lot of data was imported to
Wikidata. Data that did not exist elsewhere. I do not really care what
"Freedom House" says. I do not know them; I do know that the data is
relevant and useful. It was even the subject of a blog post.

You may ignore data that is not from a source that you like. This
indiscriminate POV is not a NPOV.

As to Grasulf, you failed to get the point. It was NOT about the data
itself but about the presentation. I worked on this item because a
duplicate was created with even less data.

While I happily agree that Sources are good, I will not ask people to start
adding Sources at this point of time; it will not improve quality
significantly. It makes more sense once we are at a stage where multiple
sources disagree on values for statements. Adding sources is significantly
more meaningful and useful once we start curating data. Statistically, most
errors will be found where sources disagree.

When people add conflicting data, it is indeed really relevant to add
Sources. My practice for adding data is that I will only add data that
fulfils some minimal criteria. Typically I am not interested in adding data
that already exists. I will remove less precise for more precise data.

The biggest issue with data is that we do not have enough of it and the
second most relevant issue is that we need processes to compare sources
with Wikidata and have a workflow to curate differences.
Thanks,
  GerardM


Re: [Wikimedia-l] Quality issues

2015-11-28 Thread Gergo Tisza
On Sat, Nov 28, 2015 at 5:23 AM, Andreas Kolbe  wrote:

> To the extent that Wikidata draws on Wikipedia, its CC0 license would
> appear to be a gross violation of Wikipedia's share-alike license
> requirement.
>

By the same logic, to the extent Wikipedia takes its facts from non-free
external sources, its free license would be a copyright violation. Luckily
for us, that's not how copyright works. Statements of fact cannot be
copyrighted. Large-scale arrangements of facts (i.e. a full database)
probably can, but CC does not prevent others from using them without
attribution, just from distributing them (again, it's like the GPL/Affero
difference). There are sui generis database rights in some countries, but
not in the USA, where both Wikipedia and most proprietary
reusers/competitors are located, so relying on neighbouring rights would
not help there but would cause legal uncertainty for reusers (e.g. OSM,
which has lots of legal trouble importing coordinates due to being EU-based).

> The generation of data always has a social context. Knowing where data come
> from is a good thing.
>

You probably won't find any Wikipedian who disagrees; verifiability is one
of the foundations of the project. But something being good and using
restrictive licensing to force others to do it are very different things.


Re: [Wikimedia-l] Quality issues

2015-11-27 Thread Lila Tretikov
Hoi Gerard,

What I hear in the emails from Andreas and Liam is not so much the propagation
of errors (which I am sure happens in some % of cases), but the
fact that the original source is obscured and therefore it is hard to
identify and correct errors, biases, etc. If the source of an error is
obscured, that error is that much harder to find and to correct. In fact,
we see this even in Wikipedia articles today (wrong dates of birth sourced
from publications that don't do enough fact-checking is something I have come
across personally). Sourcing is a powerful and important principle on
Wikipedia, but with content re-use it gets lost. Public domain/CC0 in
combination with AI lends our content to slicing and dicing and re-arranging
by others, making it something entirely new, but also detached from our
process of validation and verification. I am curious to hear if people think
it is a problem. It definitely worries me.

We have been looking very closely at Wikidata and the possibilities it
offers. I am curious to understand more about your note on Resonator:

"As long as Wikidata does not
have the power of a Reasonator, the data is just that. It does not make
itself in information and consequently it is awful. When there is one thing
the Wikidata engineers do not do, it is considering the use of the data and
the workflows to improve the data and the quality."

Am I right in understanding you to say that until the data sees the light of
day it will not become of high quality?

Thanks,
Lila

On Fri, Nov 27, 2015 at 10:26 AM, Gerard Meijssen  wrote:

> Hoi,
> When a benefit is "Wikimedia specific" and thereby dismissed, you miss much
> of what is going on. Exactly because of this link most items are well
> defined as to what they are about. It is not perfect but it is good.
> Consequently Wikidata is able to link Wikipedia in any language to sources
> external to Wikipedia. This is a big improvement over linking external
> sources to a Wikipedia. The disambiguation of subjects is done at the
> Wikidata end.
>
> You make Wikidata to be a "default reference source". Given its current
> state, it is a bit much. Wikidata does not have the maturity to function as
> such. The best pointer to this fact is that 50% of all items has two or
> fewer statements.
>
> When you compare the quality of Wikipedias with what en.wp used to be you
> are comparing apples and oranges. The Myanmar Wikipedia is better informed
> on Myanmar than en.wp etc.
>
> When you qualify a Wikipedia as fascist, it does not follow that the data
> is suspect. Certainly when data in a source that you so easily dismiss is
> typically the same, there is not much meaning in what you say from a
> Wikidata point of view.
>
> I am thrilled that sources are so important to the Wikimedia movement and
> again, I am wondering what you hope to achieve by this pronouncement. Be
> realistic what is it that you want to achieve? Is quality important to you
> and, how do you define it and more importantly how do you want to achieve
> it. Have you seen the statistics on sources [1]? Then have a better look
> and you will find that real sources are mostly absent. Adding sources one
> statement at a time will not significantly improve quality because that is
> a numbers game and it is easier to achieve quality in a different way.
>
> When a librarian says that many sources copy each others data and that this
> is a problem, the bigger problem is missed. The bigger problem is not where
> they agree but where they disagree. Arguably they are the statements where
> quality is more likely an issue. Now ask your librarian what is likely to
> improve Wikidata more either find Sources for the statements that differ of
> find Sources where the statements agree. Wikidata is not authoritative but
> when our community starts researching such issues both Wikidata and other
> sources will improve rapidly their quality. This is not to say that in the
> end you want both Sources where sources agree and disagree.
>
> Then ask your librarian if there is a problem with missing data  We can
> import data from sources and consequently be more informative or we do not
> import more data and people have to magically combine information that
> exists in many sources to get a composite view. We could see Wikidata as a
> place where data is combined and compared with other sources, Do tell your
> librarian that the process mentioned above should be iterative and it will
> be easily understood that comparing with just one additional source will
> improve the focus on likely issues even more.
>
> PS What does your librarian think when she knows that the Dutch National
> Library is inclined to provide us with software so that books can be
> ordered at Dutch libraries from Wikidata data (and by inference from
> Wikipedias)?
>
> When some see Wikidata as a source of reference, they will increasingly be
> served a better product. At this moment it is not good at all.
>
> When 

Re: [Wikimedia-l] Quality issues

2015-11-27 Thread geni
On 27 November 2015 at 15:16, Andreas Kolbe  wrote:

>
>
> How does the presence of that information in Wikidata help if the Google
> user just gets the info in the Knowledge Graph without any indication that
> it comes from Wikidata? Because CC0 specifically waives the right to
> attribution that Wikipedia retains.[1][2] No re-user of Wikidata content is
> required to say where the data came from, and they typically don't.


The problem is that there aren't really any alternatives to CC0 that do any
better (since Wikidata isn't really copyrightable in conventional terms).
The Open Data Commons Open Database License would be closest, but it only
applies in the EU and leads to messy arguments over what counts as a
substantial part.


Re: [Wikimedia-l] Quality issues

2015-11-27 Thread Gerard Meijssen
Hoi,

Sources are important. When we do not have data at Wikidata and we add it
from anywhere, we have the basis to do some good. At this time we do not
really add source information. It is too cumbersome, and as long as the
"primary sources tool", an "official" tool, does not do it, why bother?

My point about sources is very much that when one source does not agree,
there is a likely quality issue. When five sources agree, there is nothing
that marks them as suspect and there is no reason why I would look for a
Source for that statement anytime soon. When you work on quality, you do
not care about sources that agree, you care about those that do not. When
multiple sources have copied each other's data, it means that they all provide
information. That is superior to waving hands, not being informative and not
including missing data because of the lack of a Source.

We have to ask ourselves what our aim is: to share in the sum of all
knowledge and occasionally be wrong, or to keep a lot of knowledge from the
people and pretend that it is all correct?
Thanks,
  GerardM

PS we are wiki based.



On 27 November 2015 at 20:14, Lila Tretikov  wrote:

> Hoi Gerard,
>
> What I hear in email from Andreas and Liam is not as much the propagation
> of the error (which I am sure happens with some % of the cases), but the
> fact that the original source is obscured and therefore it is hard to
> identify and correct errors, biases, etc. Because if the source of error is
> obscured, that error is that much harder to find and to correct. In fact,
> we see this even on Wikipedia articles today (wrong dates of births sourced
> from publications that don't do enough fact checking is something I came
> across personally). It is a powerful and important principle on Wikipedia,
> but with content re-use it gets lost. Public domain/CC0 in combination with
> AI lands our content for slicing and dicing and re-arranging by others,
> making it something entirely new, but also detached from our process of
> validation and verification. I am curious to hear if people think it is a
> problem. It definitely worries me.
>
> We have been looking very closely at Wikidata and the possibilities it
> offers. I am curious to understand more about your note on Resonator:
>
> "As long as Wikidata does not
> have the power of a Reasonator, the data is just that. It does not make
> itself in information and consequently it is awful. When there is one thing
> the Wikidata engineers do not do, it is considering the use of the data and
> the workflows to improve the data and the quality."
>
> Am I understanding you saying that until the data sees the light of day it
> will not become of high quality?
>
> Thanks,
> Lila
>
> On Fri, Nov 27, 2015 at 10:26 AM, Gerard Meijssen <
> gerard.meijs...@gmail.com
> > wrote:
>
> > Hoi,
> > When a benefit is "Wikimedia specific" and thereby dismissed, you miss
> much
> > of what is going on. Exactly because of this link most items are well
> > defined as to what they are about. It is not perfect but it is good.
> > Consequently Wikidata is able to link Wikipedia in any language to
> sources
> > external to Wikipedia. This is a big improvement over linking external
> > sources to a Wikipedia. The disambiguation of subjects is done at the
> > Wikidata end.
> >
> > You make Wikidata to be a "default reference source". Given its current
> > state, it is a bit much. Wikidata does not have the maturity to function
> as
> > such. The best pointer to this fact is that 50% of all items has two or
> > fewer statements.
> >
> > When you compare the quality of Wikipedias with what en.wp used to be you
> > are comparing apples and oranges. The Myanmar Wikipedia is better
> informed
> > on Myanmar than en.wp etc.
> >
> > When you qualify a Wikipedia as fascist, it does not follow that the data
> > is suspect. Certainly when data in a source that you so easily dismiss is
> > typically the same, there is not much meaning in what you say from a
> > Wikidata point of view.
> >
> > I am thrilled that sources are so important to the Wikimedia movement and
> > again, I am wondering what you hope to achieve by this pronouncement. Be
> > realistic what is it that you want to achieve? Is quality important to
> you
> > and, how do you define it and more importantly how do you want to achieve
> > it. Have you seen the statistics on sources [1]? Then have a better look
> > and you will find that real sources are mostly absent. Adding sources one
> > statement at a time will not significantly improve quality because that
> is
> > a numbers game and it is easier to achieve quality in a different way.
> >
> > When a librarian says that many sources copy each others data and that
> this
> > is a problem, the bigger problem is missed. The bigger problem is not
> where
> > they agree but where they disagree. Arguably they are the statements
> where
> > quality is more likely an issue. Now ask your librarian what is likely to
> > improve 

Re: [Wikimedia-l] Quality issues

2015-11-27 Thread Gerard Meijssen
Hoi,
When a benefit is "Wikimedia specific" and thereby dismissed, you miss much
of what is going on. Exactly because of this link most items are well
defined as to what they are about. It is not perfect but it is good.
Consequently Wikidata is able to link Wikipedia in any language to sources
external to Wikipedia. This is a big improvement over linking external
sources to a Wikipedia. The disambiguation of subjects is done at the
Wikidata end.

You make Wikidata to be a "default reference source". Given its current
state, it is a bit much. Wikidata does not have the maturity to function as
such. The best pointer to this fact is that 50% of all items have two or
fewer statements.

When you compare the quality of Wikipedias with what en.wp used to be you
are comparing apples and oranges. The Myanmar Wikipedia is better informed
on Myanmar than en.wp etc.

When you qualify a Wikipedia as fascist, it does not follow that the data
is suspect. Certainly when data in a source that you so easily dismiss is
typically the same, there is not much meaning in what you say from a
Wikidata point of view.

I am thrilled that sources are so important to the Wikimedia movement and
again, I am wondering what you hope to achieve by this pronouncement. Be
realistic: what is it that you want to achieve? Is quality important to you,
and how do you define it, and more importantly, how do you want to achieve
it? Have you seen the statistics on sources [1]? Then have a better look
and you will find that real sources are mostly absent. Adding sources one
statement at a time will not significantly improve quality because that is
a numbers game and it is easier to achieve quality in a different way.

When a librarian says that many sources copy each other's data and that this
is a problem, the bigger problem is missed. The bigger problem is not where
they agree but where they disagree. Arguably those are the statements where
quality is more likely an issue. Now ask your librarian what is likely to
improve Wikidata more: finding Sources for the statements that differ or
finding Sources where the statements agree. Wikidata is not authoritative, but
when our community starts researching such issues, both Wikidata and other
sources will rapidly improve in quality. This is not to say that in the
end you want both Sources where sources agree and disagree.

Then ask your librarian if there is a problem with missing data. We can
import data from sources and consequently be more informative, or we do not
import more data and people have to magically combine information that
exists in many sources to get a composite view. We could see Wikidata as a
place where data is combined and compared with other sources. Do tell your
librarian that the process mentioned above should be iterative and it will
be easily understood that comparing with just one additional source will
improve the focus on likely issues even more.
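
The comparison described above can be sketched roughly as follows. This is only an illustration, assuming each source can be reduced to a mapping from (item, property) to a value; the source names, item identifiers and values are made up for the example and are not real data.

```python
from collections import defaultdict


def find_disagreements(sources):
    """Return the (item, property) keys on which the sources disagree.

    `sources` maps a source name to a dict of {(item, property): value}.
    Keys that appear in only one source are not flagged: there is nothing
    to compare them with yet.
    """
    values_by_key = defaultdict(dict)
    for source_name, statements in sources.items():
        for key, value in statements.items():
            values_by_key[key][source_name] = value
    return {
        key: by_source
        for key, by_source in values_by_key.items()
        if len(set(by_source.values())) > 1
    }


# Hypothetical data: two sources that agree on one item and differ on another.
sources = {
    "wikidata": {
        ("item_A", "date of birth"): "1901-02-03",
        ("item_B", "date of birth"): "1850-01-01",
    },
    "library_catalogue": {
        ("item_A", "date of birth"): "1901-02-03",
        ("item_B", "date of birth"): "1851-01-01",
    },
}

disagreements = find_disagreements(sources)
for key, by_source in disagreements.items():
    print(key, by_source)
```

Adding one more source to the `sources` mapping and running the comparison again is the iterative step: statements that still agree stay out of the way, and attention goes to the ones that differ.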

PS What does your librarian think when she knows that the Dutch National
Library is inclined to provide us with software so that books can be
ordered at Dutch libraries from Wikidata data (and by inference from
Wikipedias)?

When some see Wikidata as a source of reference, they will increasingly be
served a better product. At this moment it is not good at all.

When German Wikimedians have concerns about quality, WONDERFUL, but what have
they done to improve things? Do they apply Wikipedia standards and how does
that help?

You wonder why have "bad" data in the first place... Our data IS bad and
there is not enough of it for it to be really useful. We can easily add
more data and have a more useful result. We can easily compare sources and
ask people to concentrate on differences. However, you cannot tell me to
add Sources to the data that I add. I will tell you to do it yourself. I am
happy to improve on quality but on my terms, not yours.

You mention the propagation of errors. How would that work? You indicate
that there are not enough people to fix all the issues. With bots like
Kian, we have probability in adding data. We have people add data where the
software is not certain. You doubt technology but you do not know where we
are, what is already done.

In short, my feeling is that you do not know what you are talking about.
There is real scholarship in the approach that I described. My take is in
applying set theory. Kian is AI. For all I care yours is FUD.

Your notion of accountability is one of a consumer, it is not the
accountability needed for a project that is immature and is not at all at a
stage where you should imply that it is good enough and that quality is
assured. There are domains in Wikidata that I will not touch because in my
opinion it is wrong in its principles. At the same time I know that it can
be fixed in time and leave it at that.

I disagree with Heather Ford and Mark Graham. As long as Wikidata does not
have the power of a Reasonator, the data is just that. It does not make
itself in information and 

Re: [Wikimedia-l] Quality issues

2015-11-27 Thread Gerard Meijssen
Hoi,

I happen to work on Dukes of Friuli. Compare the data from Wikidata and the
information by Reasonator based on the same item for one of them.

https://tools.wmflabs.org/reasonator/?=2471519
https://www.wikidata.org/wiki/Q2471519

Wikidata is not informative; you have to work hard to get the information
that Reasonator has already provided for over a year. All kinds of additional
services can easily be added, like the QR code and the family tree. The
Reasonator info can easily be seen in any language; just add the labels.
Thanks,
  GerardM

On 27 November 2015 at 20:14, Lila Tretikov  wrote:

> Hoi Gerard,
>
> What I hear in email from Andreas and Liam is not as much the propagation
> of the error (which I am sure happens with some % of the cases), but the
> fact that the original source is obscured and therefore it is hard to
> identify and correct errors, biases, etc. Because if the source of error is
> obscured, that error is that much harder to find and to correct. In fact,
> we see this even on Wikipedia articles today (wrong dates of births sourced
> from publications that don't do enough fact checking is something I came
> across personally). It is a powerful and important principle on Wikipedia,
> but with content re-use it gets lost. Public domain/CC0 in combination with
> AI lands our content for slicing and dicing and re-arranging by others,
> making it something entirely new, but also detached from our process of
> validation and verification. I am curious to hear if people think it is a
> problem. It definitely worries me.
>
> We have been looking very closely at Wikidata and the possibilities it
> offers. I am curious to understand more about your note on Resonator:
>
> "As long as Wikidata does not
> have the power of a Reasonator, the data is just that. It does not make
> itself in information and consequently it is awful. When there is one thing
> the Wikidata engineers do not do, it is considering the use of the data and
> the workflows to improve the data and the quality."
>
> Am I understanding you saying that until the data sees the light of day it
> will not become of high quality?
>
> Thanks,
> Lila
>
> On Fri, Nov 27, 2015 at 10:26 AM, Gerard Meijssen <
> gerard.meijs...@gmail.com
> > wrote:
>
> > Hoi,
> > When a benefit is "Wikimedia specific" and thereby dismissed, you miss
> much
> > of what is going on. Exactly because of this link most items are well
> > defined as to what they are about. It is not perfect but it is good.
> > Consequently Wikidata is able to link Wikipedia in any language to
> sources
> > external to Wikipedia. This is a big improvement over linking external
> > sources to a Wikipedia. The disambiguation of subjects is done at the
> > Wikidata end.
> >
> > You make Wikidata to be a "default reference source". Given its current
> > state, it is a bit much. Wikidata does not have the maturity to function
> as
> > such. The best pointer to this fact is that 50% of all items has two or
> > fewer statements.
> >
> > When you compare the quality of Wikipedias with what en.wp used to be you
> > are comparing apples and oranges. The Myanmar Wikipedia is better
> informed
> > on Myanmar than en.wp etc.
> >
> > When you qualify a Wikipedia as fascist, it does not follow that the data
> > is suspect. Certainly when data in a source that you so easily dismiss is
> > typically the same, there is not much meaning in what you say from a
> > Wikidata point of view.
> >
> > I am thrilled that sources are so important to the Wikimedia movement and
> > again, I am wondering what you hope to achieve by this pronouncement. Be
> > realistic what is it that you want to achieve? Is quality important to
> you
> > and, how do you define it and more importantly how do you want to achieve
> > it. Have you seen the statistics on sources [1]? Then have a better look
> > and you will find that real sources are mostly absent. Adding sources one
> > statement at a time will not significantly improve quality because that
> is
> > a numbers game and it is easier to achieve quality in a different way.
> >
> > When a librarian says that many sources copy each others data and that
> this
> > is a problem, the bigger problem is missed. The bigger problem is not
> where
> > they agree but where they disagree. Arguably they are the statements
> where
> > quality is more likely an issue. Now ask your librarian what is likely to
> > improve Wikidata more either find Sources for the statements that differ
> of
> > find Sources where the statements agree. Wikidata is not authoritative
> but
> > when our community starts researching such issues both Wikidata and other
> > sources will improve rapidly their quality. This is not to say that in
> the
> > end you want both Sources where sources agree and disagree.
> >
> > Then ask your librarian if there is a problem with missing data  We can
> > import data from sources and consequently be more informative or we do
> not
> > import 

Re: [Wikimedia-l] Quality issues

2015-11-27 Thread geni
On 27 November 2015 at 15:27, Andreas Kolbe  wrote:

> On Fri, Nov 27, 2015 at 1:47 PM, Gnangarra  wrote:
>
>
> Would it not make more sense to import (and verify!) the reliable source
> cited in the relevant Wikipedia version, along with the statement?
>
>
You hit issues with non-machine-readable data, paywalls and deadtree walls.
And even then it varies by field (for example in chemistry, if you can get
around those problems, you wouldn't bother with Wikidata and instead go
straight for the Beilstein clone).

-- 
geni


Re: [Wikimedia-l] Quality issues

2015-11-27 Thread Andreas Kolbe
Gerard,

On Tue, Nov 24, 2015 at 7:15 AM, Gerard Meijssen 
wrote:

> Hoi,
> To start off, results from the past are no indications of results in the
> future. It is the disclaimer insurance companies have to state in all their
> adverts in the Netherlands. When you continue and make it a "theological"
> issue, you lose me because I am not of this faith, far from it. Wikidata is
> its own project and it is utterly dissimilar from Wikipedia. To start off,
> Wikidata has been a certified success from the start. The improvement it
> brought by bringing all interwiki links together is enormous. That alone
> should be a pointer that Wikipedia think is not realistic.
>


These benefits are internal to Wikimedia and a completely separate issue
from third-party re-use of Wikidata content as a default reference source,
which is the issue of concern here.


> To continue, people have been importing data into Wikidata from the start.
> They are the statements you know and, it was possible to import them from
> Wikipedia because of these interwiki links. So when you call for sources,
> it is fairly safe to assume that those imports are supported by the quality
> of the statements of the Wikipedias



The quality of three-quarters of the 280+ Wikipedia language versions is
about at the level the English Wikipedia had reached in 2002.

Even some of the larger Wikipedias have significant problems. The Kazakh
Wikipedia for example is controlled by functionaries of an oppressive
regime[1], and the Croatian one is reportedly[2] controlled by fascists
rewriting history (unless things have improved markedly in the Croatian
Wikipedia since that report, which would be news to me). The Azerbaijani
Wikipedia seems to have problems as well.

The Wikimedia movement has always had an important principle: that all
content should be traceable to a "reliable source". Throughout the first
decade of this movement and beyond, Wikimedia content has never been
considered a reliable source. For example, you can't use a Wikipedia
article as a reference in another Wikipedia article.

Another important principle has been the disclaimer: pointing out to people
that the data is anonymously crowdsourced, and that there is no guarantee
of reliability or fitness for use.

Both of these principles are now being jettisoned.

Wikipedia content is considered a reliable source in Wikidata, and Wikidata
content is used as a reliable source by Google, where it appears without
any indication of its provenance. This is a reflection of the fact that
Wikidata, unlike Wikipedia, comes with a CC0 licence. That decision was, I
understand, made by Denny, who is both a Google employee and a WMF board
member.

The benefit to Google is very clear: this free, unattributed content adds
value to Google's search engine result pages, and improves Google's revenue
(currently running at about $10 million an hour, much of it from ads).

But what is the benefit to the end user? The end user gets information of
undisclosed provenance, which is presented to them as authoritative, even
though it may be compromised. In what sense is that an improvement for
society?

To me, the ongoing information revolution is like the 19th century
industrial revolution done over. It created whole new categories of abuse,
which it took a century to (partly) eliminate. But first, capitalists had a
field day, and the people who were screwed were the common folk. Could we
not try to learn from history?



> and if anything, that is also where
> they typically fail because many assumptions at Wikipedia are plain wrong
> at Wikidata. For instance a listed building is not the organisation the
> building is known for. At Wikidata they each need their own item and
> associated statements.
>
> Wikidata is already a success for other reasons. VIAF no longer links to
> Wikipedia but to Wikidata. The biggest benefit of this move is for people
> who are not interested in English.  Because of this change VIAF links
> through Wikidata to all Wikipedias not only en.wp. Consequently people may
> find through VIAF Wikipedia articles in their own language through their
> library systems.
>


At the recent Wikiconference USA, a Wikimedia veteran and professional
librarian expressed the view to me that

* circular referencing between VIAF and Wikidata will create a humongous
muddle that nobody will be able to sort out again afterwards, because –
unlike wiki mishaps in other topic areas – here it's the most authoritative
sources that are being corrupted by circular referencing;

* third parties are using Wikimedia content as a *reference standard* when
that was never the intention (see above).

I've seen German Wikimedians express concerns that quality assurance
standards have dropped alarmingly since the project began, with bot users
mass-importing unreliable data.



> So do not forget about Wikipedia and the lessons learned. These lessons are
> important to Wikipedia. However, they do not 

Re: [Wikimedia-l] Quality issues

2015-11-27 Thread Liam Wyatt
On 27 November 2015 at 12:08, Andreas Kolbe  wrote:

> The Wikimedia movement has always had an important principle: that all
> content should be traceable to a "reliable source". Throughout the first
> decade of this movement and beyond, Wikimedia content has never been
> considered a reliable source. For example, you can't use a Wikipedia
> article as a reference in another Wikipedia article.
>
> Another important principle has been the disclaimer: pointing out to people
> that the data is anonymously crowdsourced, and that there is no guarantee
> of reliability or fitness for use.
>
> Both of these principles are now being jettisoned.
>
> Wikipedia content is considered a reliable source in Wikidata...
>
 

I agree that "reliable source" referencing and "crowdsourced content" are
indeed principles of our movement. However, I disagree that Wikidata is
"jettisoning" them. In fact, quite the contrary!

The purpose of the statement "imported from --> English Wikipedia" in the
"reference" field of a Wikidata item's statement is PRECISELY to indicate
to the user that this information has not been INDEPENDENTLY verified to a
reliable source and that Wikipedia is NOT considered a reliable source.
Furthermore, it provides a PROVENANCE of that information to help stop
people from circular referencing. That is - clearly stating that the
specific fact in Wikidata has come from Wikipedia helps to avoid the
structured-data equivalent of "citogenesis": https://xkcd.com/978/ If/When
a person can provide a reliable reference for that same fact, they are
encouraged to add an actual reference. Note that the Wikidata statements used
for facts coming in from Wikipedia use the property "imported from". This
is deliberately different from the property "reference URL", which is what
you would use when adding an actual reference to a third-party reliable
online source.

Furthermore, the fact that many statements in Wikidata are not given a
reference (yet) is not necessarily a "problem". For example - this
https://www.wikidata.org/wiki/Q21481859 is a Wikidata item for a scientific
publication with 2891 co-authors!! This is an extreme example, but it
demonstrates my point... None of those 2891 statements has a specific
reference listed for it, because all of them are self-evidently referenced
to the scientific publication itself. The same is true of the other
properties applied to this item (volume, publication date, title, page
number...). All of these could be "referenced" to the very first property
in the Wikidata item - the DOI of the scientific article:
http://www.sciencedirect.com/science/article/pii/S0370269312008581 This
item is not "less reliable" because it doesn't have the same footnote
repeated almost three thousand times, but if you merely look at statistics
of "unreferenced wikidata statements" it would APPEAR that it is very
poorly cited.
So, I think we need a more nuanced view of what "proper referencing" means
in the context of Wikidata.
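
To make the distinction concrete, here is a minimal sketch of how one might count statements by the kind of reference they carry, rather than lumping everything into "referenced" or "unreferenced". It assumes the usual Wikidata entity JSON layout (claims, each with an optional list of references holding snaks keyed by property). The property IDs are P143 ("imported from Wikimedia project"), P854 ("reference URL") and P248 ("stated in"); treating exactly these two as "actual" references is an assumption made for the example, and the sample item is hand-written, not real Wikidata content.

```python
from collections import Counter

IMPORTED_FROM = "P143"  # "imported from Wikimedia project"
# Properties treated here as pointing at an external source.
# Choosing exactly this set is an assumption of the sketch.
EXTERNAL_REFERENCE_PROPS = {"P854", "P248"}  # "reference URL", "stated in"


def classify_statement(statement):
    """Return 'referenced', 'imported' or 'unreferenced' for one statement."""
    props = set()
    for reference in statement.get("references", []):
        props.update(reference.get("snaks", {}).keys())
    if props & EXTERNAL_REFERENCE_PROPS:
        return "referenced"
    if IMPORTED_FROM in props:
        return "imported"
    return "unreferenced"


def reference_summary(entity):
    """Count the statements of an entity by reference category."""
    counts = Counter()
    for statements in entity.get("claims", {}).values():
        for statement in statements:
            counts[classify_statement(statement)] += 1
    return counts


# Hypothetical item: one imported statement, one with a reference URL,
# and two author statements without any reference of their own.
sample_item = {
    "claims": {
        "P569": [{"references": [{"snaks": {"P143": [{}]}}]}],
        "P19": [{"references": [{"snaks": {"P854": [{}]}}]}],
        "P50": [{}, {}],
    }
}

summary = reference_summary(sample_item)
print(dict(summary))
```

A count like this still reports the two author statements as "unreferenced", which is exactly the limitation described above: the figure says nothing about statements that are self-evidently backed by the item's own identifier.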

-Liam

wittylama.com
Peace, love & metadata


Re: [Wikimedia-l] Quality issues

2015-11-27 Thread Gnangarra
Disclaimer first - I'm not exactly conversant in the intricacies of
Wikidata. If I were to take the information on the 14th Dalai Lama

https://en.wikipedia.org/wiki/14th_Dalai_Lama

it links to Wikidata at

https://www.wikidata.org/wiki/Q17293

the en article has 2 references that list his date of birth, and the Wikidata
item has two references for the same piece of information.
Wikidata sources:

   1. The first just says "imported from Russian language Wikipedia", which
   links to the Wikidata page on the Russian Wikipedia, not to the source url,
   nor does it link to a permanent url, so as a source it's meaningless. While
   this may just be the result of who did the data import, linking to the
   Russian language Wikipedia is kind of obscure for a source; I could
   understand a Tibetan, Mandarin, or Cantonese language source as they would
   be associated with the region.
   2. Integrated Authority File links to
   https://www.wikidata.org/wiki/Q36578 on Wikidata; it doesn't provide a url
   or any other information which enables someone to verify what is said.

Despite two references, the data itself appears to be immediately untraceable
to a reliable source.

The circular reference of Wikidata to a Wikipedia of any language is ok, but
the link should be traceable to a specific article version, which would then
make it possible to verify the data even if the current data on Wikipedia
is changed after it's imported; that in itself shouldn't be difficult to
engineer. If that was the case, then to me a Wikipedia reference for all
data is a reasonable minimum standard to start at. Finding a way to
replicate the same data 2891 times in Liam's scenario shouldn't be much of a
challenge if WP can replicate templates in 100,000 articles; as a standard
we have GLAM making donations of images in quantities of 10,000's. I think
someone has already solved this in a meaningful way.
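
As a rough illustration of the "specific article version" idea: MediaWiki can address a fixed revision of a page through the `oldid` parameter of `index.php`, so an import could record that link instead of just the name of the Wikipedia. The sketch below only builds such a URL; the function name, the article title and the revision number are placeholders made up for the example, and how an import tool would obtain the revision id is left open.

```python
from urllib.parse import urlencode


def revision_permalink(language, title, revision_id):
    """Build a link to one fixed revision of a Wikipedia article."""
    query = urlencode({"title": title, "oldid": revision_id})
    return f"https://{language}.wikipedia.org/w/index.php?{query}"


# Placeholder values, not a real revision of a real article.
url = revision_permalink("ru", "Example_article", 123456)
print(url)
```

A link of this form keeps pointing at the text as it stood at import time, even if the article is edited afterwards.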


On 27 November 2015 at 20:51, Liam Wyatt  wrote:

> On 27 November 2015 at 12:08, Andreas Kolbe  wrote:
>
> > The Wikimedia movement has always had an important principle: that all
> > content should be traceable to a "reliable source". Throughout the first
> > decade of this movement and beyond, Wikimedia content has never been
> > considered a reliable source. For example, you can't use a Wikipedia
> > article as a reference in another Wikipedia article.
> >
> > Another important principle has been the disclaimer: pointing out to
> people
> > that the data is anonymously crowdsourced, and that there is no guarantee
> > of reliability or fitness for use.
> >
> > Both of these principles are now being jettisoned.
> >
> > Wikipedia content is considered a reliable source in Wikidata...
> >
>  
>
> I agree that "reliable source" referencing and "crowdsourced content" are
> indeed principles of our movement. However, I disagree that Wikidata is
> "jettisoning" them. In fact, quite the contrary!
>
> The purpose of the statement "imported from --> English Wikipedia" in the
> "reference" field of a Wikidata item's statement is PRECISELY to indicate
> to the user that this information has not been INDEPENDENTLY verified to a
> reliable source and that Wikipedia is NOT considered a reliable source.
> Furthermore, it provides a PROVENANCE of that information to help stop
> people from circular referencing. That is - clearly stating that the
> specific fact in Wikidata has come from Wikipedia helps to avoid the
> structured-data equivalent of "citogenisis": https://xkcd.com/978/ If/When
> a person can provide a reliable reference for that same fact, they are
> encouraged to add an actual reference. Note, the wikidata statement used
> for facts coming in from Wikipedia use the property "imported from". This
> is deliberately different from the property "reference URL" which is what
> you would use when adding an actual reference to a third-party reliable
> online source.
>
> Furthermore, the fact that many statements in Wikidata are not given a
> reference (yet) is not necessarily a "problem". For example - this
> https://www.wikidata.org/wiki/Q21481859 is a Wikidata item for a
> scientific
> publication with 2891 co-authors!! This is an extreme example, but it
> demonstrates my point... None of those 2891 statements has a specific
> reference listed for it, because all of them are self-evidently referenced
> to the scientific publication itself. The same is true of the other
> properties applied to this item (volume, publication date, title, page
> number...). All of these could be "referenced" to the very first property
> in the Wikidata item - the DOI of the scientific article:
> http://www.sciencedirect.com/science/article/pii/S0370269312008581 This
> item is not "less reliable" because it doesn't have the same footnote
> repeated almost three thousand times, but if you merely look at statistics
> of "unreferenced wikidata statements" it would APPEAR that it is very
> poorly cited.
> So, I think we need a more nuanced view of what "proper referencing" 

Re: [Wikimedia-l] Quality issues

2015-11-27 Thread Jane Darnell
Yes I agree. I think most of the discussion here has to do with people
conflating the concept of text as in Wikipedia sentences and the concept of
data as in Wikidata statements. When a user adds an image from Commons on
Wikipedia, the source of the image is generally not added to Wikipedia, and
I have never heard anyone complain about that except for image donors who
wished that their images *were* attributed when used on Wikipedia. The same
is true when Wikipedians add Wikidata statements from an item on Wikipedia.
A date statement in Wikidata for a painting may be indirectly referenced in
the item in another statement (the collection statement, or a "described at
url" statement). This is also true of the way the date field in the Commons
artwork template is used.

It is just as undesirable to clutter Wikipedia with a reference for such a
date from Wikidata as it is to reference the source of the file image when
including images, and so there will generally not be a reference for the
pulled date in the Wikidata infobox, because the user can always look up
the item for more information. Most paintings included on Wikipedia, with
or without infoboxes, do not reference the date field specifically - either
to the Commons image or to the article. When they do, this is often in
cases where the date has been disputed. Our goal is not to reference
everything, but to reference the things that need referencing.

On Fri, Nov 27, 2015 at 1:51 PM, Liam Wyatt  wrote:

> On 27 November 2015 at 12:08, Andreas Kolbe  wrote:
>
> > The Wikimedia movement has always had an important principle: that all
> > content should be traceable to a "reliable source". Throughout the first
> > decade of this movement and beyond, Wikimedia content has never been
> > considered a reliable source. For example, you can't use a Wikipedia
> > article as a reference in another Wikipedia article.
> >
> > Another important principle has been the disclaimer: pointing out to
> people
> > that the data is anonymously crowdsourced, and that there is no guarantee
> > of reliability or fitness for use.
> >
> > Both of these principles are now being jettisoned.
> >
> > Wikipedia content is considered a reliable source in Wikidata...
> >
>  
>
> I agree that "reliable source" referencing and "crowdsourced content" are
> indeed principles of our movement. However, I disagree that Wikidata is
> "jettisoning" them. In fact, quite the contrary!
>
> The purpose of the statement "imported from --> English Wikipedia" in the
> "reference" field of a Wikidata item's statement is PRECISELY to indicate
> to the user that this information has not been INDEPENDENTLY verified to a
> reliable source and that Wikipedia is NOT considered a reliable source.
> Furthermore, it provides a PROVENANCE of that information to help stop
> people from circular referencing. That is - clearly stating that the
> specific fact in Wikidata has come from Wikipedia helps to avoid the
> structured-data equivalent of "citogenisis": https://xkcd.com/978/ If/When
> a person can provide a reliable reference for that same fact, they are
> encouraged to add an actual reference. Note, the wikidata statement used
> for facts coming in from Wikipedia use the property "imported from". This
> is deliberately different from the property "reference URL" which is what
> you would use when adding an actual reference to a third-party reliable
> online source.
>
> Furthermore, the fact that many statements in Wikidata are not given a
> reference (yet) is not necessarily a "problem". For example - this
> https://www.wikidata.org/wiki/Q21481859 is a Wikidata item for a
> scientific
> publication with 2891 co-authors!! This is an extreme example, but it
> demonstrates my point... None of those 2891 statements has a specific
> reference listed for it, because all of them are self-evidently referenced
> to the scientific publication itself. The same is true of the other
> properties applied to this item (volume, publication date, title, page
> number...). All of these could be "referenced" to the very first property
> in the Wikidata item - the DOI of the scientific article:
> http://www.sciencedirect.com/science/article/pii/S0370269312008581 This
> item is not "less reliable" because it doesn't have the same footnote
> repeated almost three thousand times, but if you merely look at statistics
> of "unreferenced wikidata statements" it would APPEAR that it is very
> poorly cited.
> So, I think we need a more nuanced view of what "proper referencing" means
> in the context of Wikidata.
>
> -Liam
>
> wittylama.com
> Peace, love & metadata

Re: [Wikimedia-l] Quality issues

2015-11-27 Thread Gergo Tisza
On Fri, Nov 27, 2015 at 11:14 AM, Lila Tretikov  wrote:

> What I hear in email from Andreas and Liam is not as much the propagation
> of the error (which I am sure happens with some % of the cases), but the
> fact that the original source is obscured and therefore it is hard to
> identify and correct errors, biases, etc. Because if the source of error is
> obscured, that error is that much harder to find and to correct. In fact,
> we see this even on Wikipedia articles today (wrong dates of births sourced
> from publications that don't do enough fact checking is something I came
> across personally). It is a powerful and important principle on Wikipedia,
> but with content re-use it gets lost. Public domain/CC0 in combination with
> AI lands our content for slicing and dicing and re-arranging by others,
> making it something entirely new, but also detached from our process of
> validation and verification. I am curious to hear if people think it is a
> problem. It definitely worries me.
>

This conversation seems to have morphed into trying to solve some problems
that we are speculating Google might have (no one here actually *knows* how
the Knowledge Graph works, of course; maybe it's sensitive to manipulation
of Wikidata claims, maybe not). That seems like an entirely fruitless line
of discourse to me; if the problem exists, it is Google's problem to solve
(since they are the ones in a position to tell if it's a real problem or
not; not to mention they have two or three orders of magnitude more resources
to throw at it than the Wikimedia movement does). Trying to make our content
less free for fear that someone might misuse it is a shamefully wrong frame
of mind for an organization that's supposed to be a leader of the open
content movement, IMO.


Re: [Wikimedia-l] Quality issues

2015-11-27 Thread Wil Sinclair
Gergo, do you mind if people continue discussing this? I'm finding it
very interesting and fruitful. I hadn't thought through these issues
before, and there are likely to be others on this list who haven't
either.

Best!
,Wil

On Fri, Nov 27, 2015 at 5:17 PM, Gergo Tisza  wrote:
> On Fri, Nov 27, 2015 at 11:14 AM, Lila Tretikov  wrote:
>
>> What I hear in email from Andreas and Liam is not as much the propagation
>> of the error (which I am sure happens with some % of the cases), but the
>> fact that the original source is obscured and therefore it is hard to
>> identify and correct errors, biases, etc. Because if the source of error is
>> obscured, that error is that much harder to find and to correct. In fact,
>> we see this even on Wikipedia articles today (wrong dates of births sourced
>> from publications that don't do enough fact checking is something I came
>> across personally). It is a powerful and important principle on Wikipedia,
>> but with content re-use it gets lost. Public domain/CC0 in combination with
>> AI lands our content for slicing and dicing and re-arranging by others,
>> making it something entirely new, but also detached from our process of
>> validation and verification. I am curious to hear if people think it is a
>> problem. It definitely worries me.
>>
>
> This conversation seems to have morphed into trying to solve some problems
> that we are speculating Google might have (no one here actually *knows* how
> the Knowledge Graph works, of course; maybe it's sensitive to manipulation
> of Wikidata claims, maybe not). That seems like an entirely fruitless line
> of discourse to me; if the problem exists, it is Google's problem to solve
> (since they are the ones in a position to tell if it's a real problem or
> not; not to mention they have two or three magnitudes more resources to
> throw at it than the Wikimedia movement would). Trying to make our content
> less free for fear that someone might misuse it is a shamefully wrong frame
> of mind for and organization that's supposed to be a leader of the open
> content movement, IMO.



Re: [Wikimedia-l] Quality issues

2015-11-27 Thread Gerard Meijssen
Hoi,
There is no problem considering these points. You go in a direction that
has little to do with what we are and where we stand. Wikidata is a wiki.
That implies that it does not have to be perfect. It implies that
approaches are taken that are arguably wacky, and we will see in time how
they pan out. For instance, "Frankfurt" "instance of" "big city": a big city
is a city that is over a certain size. The size is debatable, and
consequently it is really poor as a concept from a Wikidata point of view.
It can be inferred and is therefore even redundant. Does it matter? Not
really, because in time "we" will see the light.

Our data is incomplete. Arguably, importing data enables us to share more of
the sum of all knowledge with our users. A given percentage of all data is
incorrect. However, having no data is arguably 100% incorrect and 100% not
in line with our goal of serving the sum of all knowledge. Quality is
important, so processes and workflows are exceedingly important to have. We
lack in that department so far. But comparing external data sources like
VIAF or DNB in an iterative way is the obvious approach when you want to
identify those items and statements that are suspect. The data in Wikidata
makes it easy because we have spent considerable effort linking external
sources first to Wikipedia and now to Wikidata. It is easy to mark items
with issues using qualifiers on the external source ID and have a basis for
such workflows and quality markers.
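
As a rough sketch of such a comparison workflow, assuming both data sets
have already been loaded into plain dictionaries keyed by a shared item ID:
the names Mismatch and find_suspect_statements and all sample values are
hypothetical, and no real VIAF or DNB API is used here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mismatch:
    item_id: str
    prop: str
    wikidata_value: str
    external_value: str


def find_suspect_statements(wikidata: dict, external: dict) -> list:
    """Return statements whose value differs from the external source.

    Values present on only one side are skipped: a missing value
    is a gap, not a contradiction.
    """
    suspects = []
    for item_id, statements in wikidata.items():
        external_record = external.get(item_id, {})
        for prop, value in statements.items():
            other = external_record.get(prop)
            if other is not None and other != value:
                suspects.append(Mismatch(item_id, prop, value, other))
    return suspects


# Made-up sample records for illustration only.
wikidata_sample = {"item-1": {"date of birth": "1900-01-01", "place": "A"}}
external_sample = {"item-1": {"date of birth": "1901-01-01", "place": "A"}}
for mismatch in find_suspect_statements(wikidata_sample, external_sample):
    print(mismatch)
```

Each mismatch only marks a statement as worth a human look; it does not say
which side is wrong.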

When you make a point of external sources trusting Wikidata, these external
sources may be consumers or they can be partners. When they are partners,
we can provide RSS feeds informing them of issues that have been found, and
they can do their curation on their data. When they are consumers, we can
still provide such an RSS feed, but we do not know what they do with it; it
is their problem more than it is ours.

As I say so often, Wikidata is immature. It is silly to blindly trust
Wikidata. It is largely based on Wikipedia, which has constructs of its own
that we do not need or want in Wikidata. Big cities are one example. Because
of interwiki links, we have items that are a mix of all kinds, e.g. a listed
building and an organisation. This is conceptually wrong at Wikidata and
such items need to be split. This is where many Wikipedians become
uncomfortable, but hey, Wikidata does not tell them to rewrite their article.

So yes, you can continue with this point, but it has little impact on
Wikidata, and if you think it should, do consider what impact it has on
Wikidata as a wiki. It is NOT an academic resource or a reference source
per se. It is a wiki; it is allowed to be wrong, particularly when it has
proper workflows to improve quality.

If anything, THIS is where we can do with a lot more talk and preferably
action. This is where Wikidata is obviously lacking, and when we do have
proper workflows in place, we do NOT need the dump that is the "primary
sources", as this is the antithesis of a wiki and it prevents us from
sharing available knowledge.
Thanks,
  GerardM

On 28 November 2015 at 07:05, Wil Sinclair  wrote:

> Gergo, do you mind if people continue discussing this? I'm finding it
> very interesting and fruitful. I hadn't thought through these issues
> before, and there are likely to be others on this list who haven't
> either.
>
> Best!
> ,Wil
>
> On Fri, Nov 27, 2015 at 5:17 PM, Gergo Tisza  wrote:
> > On Fri, Nov 27, 2015 at 11:14 AM, Lila Tretikov 
> wrote:
> >
> >> What I hear in email from Andreas and Liam is not as much the
> propagation
> >> of the error (which I am sure happens with some % of the cases), but the
> >> fact that the original source is obscured and therefore it is hard to
> >> identify and correct errors, biases, etc. Because if the source of
> error is
> >> obscured, that error is that much harder to find and to correct. In
> fact,
> >> we see this even on Wikipedia articles today (wrong dates of births
> sourced
> >> from publications that don't do enough fact checking is something I came
> >> across personally). It is a powerful and important principle on
> Wikipedia,
> >> but with content re-use it gets lost. Public domain/CC0 in combination
> with
> >> AI lands our content for slicing and dicing and re-arranging by others,
> >> making it something entirely new, but also detached from our process of
> >> validation and verification. I am curious to hear if people think it is
> a
> >> problem. It definitely worries me.
> >>
> >
> > This conversation seems to have morphed into trying to solve some
> problems
> > that we are speculating Google might have (no one here actually *knows*
> how
> > the Knowledge Graph works, of course; maybe it's sensitive to
> manipulation
> > of Wikidata claims, maybe not). That seems like an entirely fruitless
> line
> > of discourse to me; if the problem exists, it is Google's problem to
> solve
> > (since they are the ones in a position to tell if it's a real problem or
> > not; not to 

Re: [Wikimedia-l] Quality issues

2015-11-27 Thread Andreas Kolbe
Liam,

I am interested in anything demonstrating that the things I am concerned
about are not a problem.

Further comments interspersed below.

On Fri, Nov 27, 2015 at 12:51 PM, Liam Wyatt  wrote:

> On 27 November 2015 at 12:08, Andreas Kolbe  wrote:
>
> > The Wikimedia movement has always had an important principle: that all
> > content should be traceable to a "reliable source". Throughout the first
> > decade of this movement and beyond, Wikimedia content has never been
> > considered a reliable source. For example, you can't use a Wikipedia
> > article as a reference in another Wikipedia article.
> >
> > Another important principle has been the disclaimer: pointing out to
> people
> > that the data is anonymously crowdsourced, and that there is no guarantee
> > of reliability or fitness for use.
> >
> > Both of these principles are now being jettisoned.
> >
> > Wikipedia content is considered a reliable source in Wikidata...
> >
>  
>
> I agree that "reliable source" referencing and "crowdsourced content" are
> indeed principles of our movement. However, I disagree that Wikidata is
> "jettisoning" them. In fact, quite the contrary!
>
> The purpose of the statement "imported from --> English Wikipedia" in the
> "reference" field of a Wikidata item's statement is PRECISELY to indicate
> to the user that this information has not been INDEPENDENTLY verified to a
> reliable source and that Wikipedia is NOT considered a reliable source.
> Furthermore, it provides a PROVENANCE of that information to help stop
> people from circular referencing. That is - clearly stating that the
> specific fact in Wikidata has come from Wikipedia helps to avoid the
> structured-data equivalent of "citogenisis": https://xkcd.com/978/ If/When
> a person can provide a reliable reference for that same fact, they are
> encouraged to add an actual reference. Note, the wikidata statement used
> for facts coming in from Wikipedia use the property "imported from". This
> is deliberately different from the property "reference URL" which is what
> you would use when adding an actual reference to a third-party reliable
> online source.
>


How does the presence of that information in Wikidata help if the Google
user just gets the info in the Knowledge Graph without any indication that
it comes from Wikidata? CC0 specifically waives the right to attribution
that Wikipedia retains.[1][2] No re-user of Wikidata content is required to
say where the data came from, and they typically don't.

So, absent this information, don't you think it likely that users will
simply propagate information they find in Google and on other reusers'
sites? Rather than preventing citogenesis, I think it's citogenesis on
steroids, given that Google has far more users than any Wikimedia project.

This CC0, no-attribution arrangement may financially benefit Google,
because they can dispense with a source link that might lead users away
from their own site and their ads, but how does it benefit the public, or
indeed benefit Wikimedia? Are we all just working to make Google richer, or
are we working for the public?

Moreover, according to data on Wikimedia Labs[3], about half of all
statements in Wikidata have *no reference whatsoever*. That's *in addition*
to the third that are only referenced to a Wikipedia.
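
To make those categories concrete, here is a rough sketch of how one item's
statements could be sorted into them, assuming the entity JSON layout that
Wikidata exports ("claims" holding statements, each with an optional
"references" list of "snaks"). The property IDs are P143 ("imported from"),
P248 ("stated in") and P854 ("reference URL"); the function names and the
sample entity are hypothetical, and this is not the tool behind the Labs
figures.

```python
from collections import Counter

IMPORTED_FROM = "P143"             # "imported from" a Wikimedia project
SOURCE_PROPS = {"P248", "P854"}    # "stated in", "reference URL"


def classify_statement(statement: dict) -> str:
    """Classify one statement by the kind of reference it carries."""
    references = statement.get("references", [])
    if not references:
        return "unreferenced"
    props = set()
    for reference in references:
        props.update(reference.get("snaks", {}))
    if props & SOURCE_PROPS:
        return "sourced"
    if IMPORTED_FROM in props:
        return "imported_only"
    return "other"


def reference_profile(entity: dict) -> Counter:
    """Count the statements of one entity per reference category."""
    counts = Counter()
    for statements in entity.get("claims", {}).values():
        for statement in statements:
            counts[classify_statement(statement)] += 1
    return counts


# Made-up entity for illustration only.
sample = {
    "claims": {
        "P569": [{"references": [{"snaks": {"P143": [{}]}}]}],
        "P31": [{}],
        "P50": [{"references": [{"snaks": {"P854": [{}]}}]}],
    }
}
print(reference_profile(sample))
```

A count like this says nothing about statements that are implicitly covered
by an identifier elsewhere on the item, which is the caveat Liam raises.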

Yet all of this material is meant to form an input to the Google Knowledge
Graph, following Google's abandonment of Freebase in favour of
Wikidata.[4][5]



> Furthermore, the fact that many statements in Wikidata are not given a
> reference (yet) is not necessarily a "problem". For example - this
> https://www.wikidata.org/wiki/Q21481859 is a Wikidata item for a
> scientific
> publication with 2891 co-authors!! This is an extreme example, but it
> demonstrates my point... None of those 2891 statements has a specific
> reference listed for it, because all of them are self-evidently referenced
> to the scientific publication itself. The same is true of the other
> properties applied to this item (volume, publication date, title, page
> number...). All of these could be "referenced" to the very first property
> in the Wikidata item - the DOI of the scientific article:
> http://www.sciencedirect.com/science/article/pii/S0370269312008581 This
> item is not "less reliable" because it doesn't have the same footnote
> repeated almost three thousand times, but if you merely look at statistics
> of "unreferenced wikidata statements" it would APPEAR that it is very
> poorly cited.
> So, I think we need a more nuanced view of what "proper referencing" means
> in the context of Wikidata.
>


I take your point, even though I am unsure what value this Wikidata listing
adds for the public, given that it merely reproduces details from the
publisher's page. Might we be reinventing the wheel? And if there is value
added for the public in some way that escapes me, surely it would not be
difficult to have the bot add the 

Re: [Wikimedia-l] Quality issues

2015-11-27 Thread Andreas Kolbe
On Fri, Nov 27, 2015 at 1:47 PM, Gnangarra  wrote:

> Disclaimer first - I'm not exactly conversant in the intricacies of
> WikiData, if I was to take the information on 14th Dalai Lama
>
> https://en.wikipedia.org/wiki/14th_Dalai_Lama
>
> it links to Wikidata at
>
> https://www.wikidata.org/wiki/Q17293
>
> the en article has 2 references that list his date of birth, the Wikidata
> item has two references for the same piece of information
> Wikidata source:
>
>    1. just says imported from Russian language Wikipedia, which links to
>    the Wikidata page on the Russian Wikipedia, not to the source URL, nor
>    does it link to a permanent URL, so as a source it's meaningless. While
>    it may just be the result of who did the data import, linking to Russian
>    language Wikipedia is kind of obscure for a source; I can understand a
>    Tibetan, Mandarin, or Cantonese language source, as they would be
>    associated with the region
>    2. Integrated Authority File links to
>    https://www.wikidata.org/wiki/Q36578 on Wikidata; it doesn't provide a
>    URL or any other information that enables someone to verify what is said
>
> Despite two references, the data itself appears to be immediately untraceable
> to a reliable source.
>
> The circular reference of Wikidata to a Wikipedia of any language is OK, but
> the link should be traceable to a specific article version, which would then
> make it possible to verify the data even if the current data on Wikipedia
> is changed after it's imported. That in itself shouldn't be difficult to
> engineer. If that were the case, then to me a Wikipedia reference for all
> data is a reasonable minimum standard to start at



Would it not make more sense to import (and verify!) the reliable source
cited in the relevant Wikipedia version, along with the statement?


Re: [Wikimedia-l] Quality issues

2015-11-25 Thread Gerard Meijssen
Hoi,
To belabour the point: we do make errors, and we will fail expectations.
What we need is not to complain that the world is not perfect; we need an
approach that will improve our data and is inclusive. We need to be more of
a wiki.
Thanks,
 GerardM

On 25 November 2015 at 04:57, Gnangarra  wrote:

> This isn't about the hows or whats of Google; it's about ensuring that
> what we do is trustworthy.
>
> On 25 November 2015 at 08:12, Andreas Kolbe  wrote:
>
> > On Tue, Nov 24, 2015 at 11:26 PM, Leila Zia  wrote:
> >
> > >
> > > It's worth mentioning:
> > >
> > > Dominant search engines do not rely on one source of information to
> > surface
> > > results, they get information from many sources, weigh the responses
> they
> > > get based on the trust on the sources and many other factors, and
> > aggregate
> > > to find the best answer to be shown to the user.
> > >
> >
> >
> > Have you never seen Google display gross Wikipedia vandalism?[1][2] Cases
> > like that make it very clear that the Wikimedia content in question
> entered
> > Google directly, without human oversight or cross-checking against other
> > sources. What you describe sounds good, but it didn't happen.
> >
> > If even transient vandalism passes through (the Finnish vandalism was
> > reportedly deleted in Wikipedia within minutes), then so can more subtle
> > and long-lived errors and falsehoods.
> >
> > Similarly, Bing Satori's timeline is simply made up of verbatim Wikipedia
> > sentences containing a numerical year.
> >
> > We know far too little about how search engines import Wikipedia and
> > Wikidata content, and what proportion of content is checked and how.
> >
> >
> >
> > > I just used "chicken pox" as a search query in Google, I see an
> > information
> > > box on the right-hand-side of the page about the disease, and when I
> > click
> > > on Sources I get this page
> > > <
> > >
> >
> https://support.google.com/websearch/answer/2364942?p=medical_conditions=1
> > > >
> > > ("See where we found the medical information") which shows all the
> > sources
> > > Google has used to retrieve information about chicken pox from, nothing
> > in
> > > that list starts with wiki. Of course, this is not the case for all
> > search
> > > queries, for some of them, Google still uses Wikipedia snippets.
> >
> >
> >
> > For medical queries, Google (rightly) prefers other sources, so those
> > queries are not presently affected.
> >
> > [1]
> >
> >
> https://www.seroundtable.com/google-world-series-cardinals-blunder-17587.html
> > [2]
> >
> >
> https://commons.wikimedia.org/wiki/File:Wikipedia_vandalism_in_Google_infobox_from_flagged_revisions_stabilized_article_25.2.2015.png
> >
>
>
>
> --
> GN.
> President Wikimedia Australia
> WMAU: http://www.wikimedia.org.au/wiki/User:Gnangarra
> Photo Gallery: http://gnangarra.redbubble.com
>


Re: [Wikimedia-l] Quality issues

2015-11-24 Thread Andreas Kolbe
On Tue, Nov 24, 2015 at 11:26 PM, Leila Zia  wrote:

>
> It's worth mentioning:
>
> Dominant search engines do not rely on one source of information to surface
> results, they get information from many sources, weigh the responses they
> get based on the trust on the sources and many other factors, and aggregate
> to find the best answer to be shown to the user.
>


Have you never seen Google display gross Wikipedia vandalism?[1][2] Cases
like that make it very clear that the Wikimedia content in question entered
Google directly, without human oversight or cross-checking against other
sources. What you describe sounds good, but it didn't happen.

If even transient vandalism passes through (the Finnish vandalism was
reportedly deleted in Wikipedia within minutes), then so can more subtle
and long-lived errors and falsehoods.

Similarly, Bing Satori's timeline is simply made up of verbatim Wikipedia
sentences containing a numerical year.

We know far too little about how search engines import Wikipedia and
Wikidata content, and what proportion of content is checked and how.



> I just used "chicken pox" as a search query in Google, I see an information
> box on the right-hand-side of the page about the disease, and when I click
> on Sources I get this page
> <
> https://support.google.com/websearch/answer/2364942?p=medical_conditions=1
> >
> ("See where we found the medical information") which shows all the sources
> Google has used to retrieve information about chicken pox from, nothing in
> that list starts with wiki. Of course, this is not the case for all search
> queries, for some of them, Google still uses Wikipedia snippets.



For medical queries, Google (rightly) prefers other sources, so those
queries are not presently affected.

[1]
https://www.seroundtable.com/google-world-series-cardinals-blunder-17587.html
[2]
https://commons.wikimedia.org/wiki/File:Wikipedia_vandalism_in_Google_infobox_from_flagged_revisions_stabilized_article_25.2.2015.png


Re: [Wikimedia-l] Quality issues

2015-11-24 Thread Leila Zia
On Mon, Nov 23, 2015 at 8:28 PM, Andreas Kolbe  wrote:

> On Mon, Nov 23, 2015 at 11:37 PM, Gnangarra  wrote:
>
> > 5.People need to able to trust all data in WikiData, otherwise they just
> > wont use it because as Wikidata expands the same PR firms, interest
> groups
> > which have seen so many of WP issues will gravitate to the easier to
> > manipulate WikiData
> >
>
>
> I think the potential problem here is far worse: people *will use* the
> data, because their lack of trustworthiness, as amply described in the
> Wikidata disclaimer[1], is no longer visible when they're displayed as
> "fact" by dominant search engines.
>

It's worth mentioning:

Dominant search engines do not rely on one source of information to surface
results, they get information from many sources, weigh the responses they
get based on the trust on the sources and many other factors, and aggregate
to find the best answer to be shown to the user.

I just used "chicken pox" as a search query in Google, I see an information
box on the right-hand-side of the page about the disease, and when I click
on Sources I get this page
<https://support.google.com/websearch/answer/2364942?p=medical_conditions=1>
("See where we found the medical information") which shows all the sources
Google has used to retrieve information about chicken pox from, nothing in
that list starts with wiki. Of course, this is not the case for all search
queries, for some of them, Google still uses Wikipedia snippets.

Leila


Re: [Wikimedia-l] Quality issues

2015-11-24 Thread Gnangarra
This isn't about the hows or whats of Google; it's about ensuring that what
we do is trustworthy.

On 25 November 2015 at 08:12, Andreas Kolbe  wrote:

> On Tue, Nov 24, 2015 at 11:26 PM, Leila Zia  wrote:
>
> >
> > It's worth mentioning:
> >
> > Dominant search engines do not rely on one source of information to
> surface
> > results, they get information from many sources, weigh the responses they
> > get based on the trust on the sources and many other factors, and
> aggregate
> > to find the best answer to be shown to the user.
> >
>
>
> Have you never seen Google display gross Wikipedia vandalism?[1][2] Cases
> like that make it very clear that the Wikimedia content in question entered
> Google directly, without human oversight or cross-checking against other
> sources. What you describe sounds good, but it didn't happen.
>
> If even transient vandalism passes through (the Finnish vandalism was
> reportedly deleted in Wikipedia within minutes), then so can more subtle
> and long-lived errors and falsehoods.
>
> Similarly, Bing Satori's timeline is simply made up of verbatim Wikipedia
> sentences containing a numerical year.
>
> We know far too little about how search engines import Wikipedia and
> Wikidata content, and what proportion of content is checked and how.
>
>
>
> > I just used "chicken pox" as a search query in Google, I see an
> information
> > box on the right-hand-side of the page about the disease, and when I
> click
> > on Sources I get this page
> > <
> >
> https://support.google.com/websearch/answer/2364942?p=medical_conditions=1
> > >
> > ("See where we found the medical information") which shows all the
> sources
> > Google has used to retrieve information about chicken pox from, nothing
> in
> > that list starts with wiki. Of course, this is not the case for all
> search
> > queries, for some of them, Google still uses Wikipedia snippets.
>
>
>
> For medical queries, Google (rightly) prefers other sources, so those
> queries are not presently affected.
>
> [1]
>
> https://www.seroundtable.com/google-world-series-cardinals-blunder-17587.html
> [2]
>
> https://commons.wikimedia.org/wiki/File:Wikipedia_vandalism_in_Google_infobox_from_flagged_revisions_stabilized_article_25.2.2015.png
>



-- 
GN.
President Wikimedia Australia
WMAU: http://www.wikimedia.org.au/wiki/User:Gnangarra
Photo Gallery: http://gnangarra.redbubble.com


Re: [Wikimedia-l] Quality issues

2015-11-23 Thread Leila Zia
Hi Andreas,

On Mon, Nov 23, 2015 at 1:15 PM, Andreas Kolbe  wrote:

> Moreover, I was somewhat surprised to learn the other day that, apparently,
> over 80 percent of Wikidata statements are either unreferenced or only
> referenced to a Wikipedia:
>
>
> https://de.wikipedia.org/w/index.php?title=Datei:Citing_as_a_public_service.pdf=17
>
> That seems like a recipe for disaster, given that Wikidata feeds the Google
> Knowledge Graph and Bing Satori to some extent.
>
> Thoughts?
>

Here are my thoughts:

1) No, it's not a recipe for disaster. :-) I expand below.

2) People sit at different parts of the spectrum when it comes to the
issues around Wikidata references. What almost all these people have in
common is that they know having references is a very valuable thing for
Wikidata (or any other knowledge base for that matter).

3) As a researcher, as long as the data is in Wikidata, with or without a
reference, I'm already some steps ahead. If there is no reference, I have a
starting point to look for a reference for that specific value, and in that
process, I may find conflicting data with new references. For a project in
a growing stage, these are opportunities, not blockers.

4) I hear a lot of sensitivity about referencing Wikidata claim values to
Wikipedia. I hear people's concerns (having loops in referencing mechanisms
is not good), but I do not consider the existence of Wikipedia references an
issue, and I certainly prefer a Wikipedia reference over no reference,
especially if the date on which the information was extracted is also
tracked somewhere in Wikidata. Telling the researcher that the data
has come from Wikipedia will give him/her a head start on where to
continue the search.

5) I see a need to give the users of open data a chance to use data with
more knowledge and control. For example, if you are an app developer, you
should be able to figure out relatively easily what data in Wikidata you
can fully trust, and what data you may want to skip using in your app. At
the moment, some part of the community considers a value with a non-
Wikipedia reference approved/monitored by a human as trustworthy (this is
no written rule, I'm summarizing my current understanding based on
discussions with some of the Wikidata community members, including myself
:-). But, among other things, the reference in Wikidata may not be a
trustworthy reference. We should surface how much trust one should have in
the values in Wikidata to the end-user.
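
To make point 5 a little more concrete, here is a rough sketch of how an app
developer might sort statements into trust tiers. It is only an illustration
under loud assumptions: the Statement class, the tier names, the
"wikipedia:" prefix for references to a Wikipedia, and the sample values are
all hypothetical and are not part of any Wikidata API.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class Tier(Enum):
    """Hypothetical trust tiers, ordered from least to most trusted."""
    UNREFERENCED = 0
    WIKIPEDIA_ONLY = 1
    EXTERNAL = 2


@dataclass
class Statement:
    """Simplified, made-up stand-in for a Wikidata statement."""
    item: str
    prop: str
    value: str
    # Assumption: a reference to a Wikipedia is written as "wikipedia:<lang>".
    references: List[str] = field(default_factory=list)


def tier_of(statement: Statement) -> Tier:
    """Classify a statement by the kind of references it carries."""
    if not statement.references:
        return Tier.UNREFERENCED
    if all(ref.startswith("wikipedia:") for ref in statement.references):
        return Tier.WIKIPEDIA_ONLY
    return Tier.EXTERNAL


def usable(statements: List[Statement],
           minimum: Tier = Tier.EXTERNAL) -> List[Statement]:
    """Keep only the statements at or above the requested tier."""
    return [s for s in statements if tier_of(s).value >= minimum.value]


if __name__ == "__main__":
    # Placeholder data, not real Wikidata content.
    sample = [
        Statement("Q_a", "P_x", "value 1"),
        Statement("Q_a", "P_y", "value 2", ["wikipedia:en"]),
        Statement("Q_a", "P_z", "value 3",
                  ["wikipedia:en", "https://example.org/source"]),
    ]
    for s in sample:
        print(s.prop, tier_of(s).name)
```

Note that a tier like this only says what kind of reference exists; as
mentioned above, the reference itself may still not be trustworthy.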

What is amazing is: There are many great things one can do based on the
data that is being gathered in Wikidata. We should all work together to
improve that data, but we should also acknowledge that our attention is
split across many projects (this is definitely the case for me), and as a
result, we will be seeing steady and smooth improvements in Wikidata, and
not sudden and very fast improvements. We need to stay curious, excited,
committed, and patient. :-)

Leila

Disclaimer: These are my personal views about references in Wikidata, and
not necessarily the views of my team or the Wikimedia Foundation. :-)


Re: [Wikimedia-l] Quality issues

2015-11-23 Thread Andreas Kolbe
On Mon, Nov 23, 2015 at 11:37 PM, Gnangarra  wrote:

> 5.People need to able to trust all data in WikiData, otherwise they just
> wont use it because as Wikidata expands the same PR firms, interest groups
> which have seen so many of WP issues will gravitate to the easier to
> manipulate WikiData
>


I think the potential problem here is far worse: people *will use* the
data, because their lack of trustworthiness, as amply described in the
Wikidata disclaimer[1], is no longer visible when they're displayed as
"fact" by dominant search engines.

Google is already committed to Wikidata. Wikidata is in part a Google
project. This means information placed in Wikidata may in time have the
potential to reach an audience of billions – a far greater audience than
Wikipedia has.

People already blindly copy falsehoods from Wikipedia today, because
important caveats (like checking the sourcing to assess the reliability of
a Wikipedia article) are widely ignored. As a result, circular references
and citogenesis have become a significant problem for Wikipedia.

People are far more likely still to copy blindly from Google. It's circular
referencing on steroids.

The way things are headed, manipulations in Wikidata that enter the Google
Knowledge Graph, Bing Satori, etc. could end up having far greater leverage
than any Wikipedia manipulation has ever had. In the worst-case scenario –
depending on how much search engines will come to rely on Wikidata – an
edit war won by anonymous players in an obscure corner of Wikidata might
literally redefine truth for the English-speaking internet.

Is this really a good thing? Are checks and balances in place to prevent
this from happening?



> Lets build something based on the lessons learnt on Wikipedia over the last
> 15 years rather than duplicate those missteps
>


That seems like good advice to me. The online world's information
infrastructure shouldn't be built on sand.


[1] https://www.wikidata.org/wiki/Wikidata:General_disclaimer – highlights:
"Wikidata cannot guarantee the validity of the information found here.
[...] No formal peer review[:] Wikidata does not have an executive editor
or editorial board that vets content before it is published. Our active
community of editors uses tools such as the Special:Recentchanges and
Special:Newpages feeds to monitor new and changing content. However,
Wikidata is not uniformly peer reviewed; while readers may correct errors
or engage in casual peer review, they have no legal duty to do so and thus
all information read here is without any implied warranty of fitness for
any purpose or use whatsoever. None of the contributors, sponsors,
administrators or anyone else connected with Wikidata in any way whatsoever
can be responsible for the appearance of any inaccurate or libelous
information or for your use of the information contained in or linked from
these web pages [...] neither is anyone at Wikidata responsible should
someone change, edit, modify or remove any information that you may post on
Wikidata or any of its associated projects."


Re: [Wikimedia-l] Quality issues

2015-11-23 Thread Gnangarra
Some responses to Leila's comments:

1. It's not a disaster, but it is a serious concern. We know from past
experience that it goes to the heart of the project's long-term
credibility. Countless hours and funds have gone into redressing Wikipedia's
reputation, and still, after 8 years of doing this, we get bagged; we are
still answering these questions. Why send Wikidata down that track when we
all understand the importance of referencing? Or, from a more theological
perspective: "if we can't learn from history, why do we spend so many
resources recording history?"

2. Referencing is a very valuable thing for all data; that should be a
starting point for the spectrum and Wikidata, rather than a goal or end
point. Wikipedias still have unreferenced material 15 years after they
started.

3. I'd disagree: if the data isn't referenced then it's of no value;
Wikipedias are a better place to look.

4. A Wikipedia reference isn't ideal, but it is better than nothing, provided
that the reference is to a permanent link rather than just an article; at
least then, if the information is changed, there is some ability to recover
the original source. In general a circular reference is a bad outcome.

5. People need to be able to trust all data in Wikidata, otherwise they just
won't use it, because as Wikidata expands, the same PR firms and interest
groups that have featured in so many of WP's issues will gravitate to the
easier to manipulate Wikidata.


Let's build something based on the lessons learnt on Wikipedia over the last
15 years rather than duplicate those missteps.



On 24 November 2015 at 06:18, Leila Zia  wrote:

> Hi Andreas,
>
> On Mon, Nov 23, 2015 at 1:15 PM, Andreas Kolbe  wrote:
>
> > Moreover, I was somewhat surprised to learn the other day that,
> apparently,
> > over 80 percent of Wikidata statements are either unreferenced or only
> > referenced to a Wikipedia:
> >
> >
> >
> https://de.wikipedia.org/w/index.php?title=Datei:Citing_as_a_public_service.pdf=17
> >
> > That seems like a recipe for disaster, given that Wikidata feeds the
> Google
> > Knowledge Graph and Bing Satori to some extent.
> >
> > Thoughts?
> >
>
> Here are my thoughts:
>
> 1) No, it's not a recipe for disaster. :-) I expand below.
>
> 2) People sit at the different parts of the spectrum when it comes to the
> issues around Wikidata references. What almost all these people have in
> common is that they know having references is a very valuable thing for
> Wikidata (or any other knowledge base for that matter).
>
> 3) As a researcher, as long as the data is in Wikidata, with or without a
> reference, I'm already some steps ahead. If there is no reference, I have a
> starting point to look for a reference for that specific value, and in that
> process, I may find conflicting data with new references. For a project in
> a growing stage, these are opportunities, not blockers.
>
> 4) I hear a lot of sensitivity about referencing Wikidata claim values to
> Wikipedia. I hear people's concerns (having loops in referencing mechanisms
> is not good) but I do not consider the existence of Wikipedia references an
> issue and I certainly prefer a Wikipedia reference over no reference,
> especially if the date the information was extracted at is also tracked
> somewhere in Wikidata. Giving information to the researcher that the data
> has come from Wikipedia will give him/her a head-start about where to
> continue the search.
>
> 5) I see a need to give the users of open data a chance to use data with
> more knowledge and control. For example, if you are an app developer, you
> should be able to figure out relatively easily what data in Wikidata you
> can fully trust, and what data you may want to skip using in your app. At
> the moment, some part of the community considers a value with a non-
> Wikipedia reference approved/monitored by a human as trustworthy (this is
> no written rule, I'm summarizing my current understanding based on
> discussions with some of the Wikidata community members, including myself
> :-). But, among other things, the reference in Wikidata may not be a
> trustworthy reference. We should surface how much trust one should have in
> the values in Wikidata to the end-user.
>
> What is amazing is: There are many great things one can do based on the
> data that is being gathered in Wikidata. We should all work together to
> improve that data, but we should also acknowledge that our attention is
> split across many projects (this is definitely the case for me), and as a
> result, we will be seeing steady and smooth improvements in Wikidata, and
> not sudden and very fast improvements. We need to stay curious, excited,
> committed, and patient. :-)
>
> Leila
>
> Disclaimer: These are my personal views about references in Wikidata, and
> not necessarily the views of my team or the Wikimedia Foundation. :-)

Re: [Wikimedia-l] Quality issues

2015-11-23 Thread Gerard Meijssen
Hoi,
To start off, results from the past are no indication of results in the
future. It is the disclaimer insurance companies have to state in all their
adverts in the Netherlands. When you continue and make it a "theological"
issue, you lose me because I am not of this faith, far from it. Wikidata is
its own project and it is utterly dissimilar from Wikipedia. To start with,
Wikidata has been a certified success from the start. The improvement it
brought by bringing all interwiki links together is enormous. That alone
should be a pointer that Wikipedia think is not realistic.

To continue, people have been importing data into Wikidata from the start.
They are the statements you know, and it was possible to import them from
Wikipedia because of these interwiki links. So when you call for sources,
it is fairly safe to assume that those imports are supported by the quality
of the statements of the Wikipedias. If anything, that is also where
they typically fail, because many assumptions at Wikipedia are plain wrong
at Wikidata. For instance, a listed building is not the organisation the
building is known for. At Wikidata they each need their own item and
associated statements.

Wikidata is already a success for other reasons. VIAF no longer links to
Wikipedia but to Wikidata. The biggest benefit of this move is for people
who are not interested in English. Because of this change VIAF links
through Wikidata to all Wikipedias, not only en.wp. Consequently people may
find Wikipedia articles in their own language through VIAF in their
library systems.

So do not forget about Wikipedia and the lessons learned. These lessons are
important to Wikipedia. However, they do not necessarily apply to Wikidata,
particularly when you approach Wikidata as an opportunity to do things in a
different way. Set theory, a branch of mathematics, is exactly what we
need. When we have data at Wikidata of a given quality, e.g. 90%, and we have
data at another source of a given quality, e.g. 90%, we can compare the two
and find the subset where the two sources do not match. When we curate the
differences, it is highly likely that we improve quality at Wikidata or at
the other source. With a proper workflow and an iterative approach to
multiple sources, we will spend time adding sources and improving quality.
This is more productive than religiously adding sources for every
statement. It also brings us better information in less time. I hope this
will help people understand that Wikidata is not Wikipedia, and that that is
a good thing.
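
The comparison described above could be sketched roughly like this. It is a
minimal illustration, not an existing tool: the function name
compare_sources, the idea of keying each value by an (item, property) pair,
and the sample data are all assumptions made for the example.

```python
def compare_sources(ours, theirs):
    """Split two key -> value mappings into agreeing and differing parts.

    Both arguments are plain dicts; in this sketch the keys are hypothetical
    (item, property) pairs and the values are the claimed values.
    """
    shared = ours.keys() & theirs.keys()            # set intersection
    agree = {k for k in shared if ours[k] == theirs[k]}
    differ = {k: (ours[k], theirs[k])               # the subset to curate
              for k in shared if ours[k] != theirs[k]}
    only_ours = ours.keys() - theirs.keys()         # set difference
    only_theirs = theirs.keys() - ours.keys()
    return agree, differ, only_ours, only_theirs


if __name__ == "__main__":
    # Placeholder data, not real Wikidata or external-source content.
    wikidata = {
        ("Q_a", "date of death"): "1759-05-07",
        ("Q_b", "date of death"): "1801",
        ("Q_c", "date of death"): "1900-01-01",
    }
    other_source = {
        ("Q_a", "date of death"): "1759-05-07",
        ("Q_b", "date of death"): "1802",
        ("Q_d", "date of death"): "1950-06-30",
    }
    agree, differ, only_ours, only_theirs = compare_sources(wikidata, other_source)
    for key, (left, right) in sorted(differ.items()):
        print("needs checking:", key, left, "vs", right)
```

Only the entries in the differing subset need human attention; repeating the
comparison against a further source narrows or widens that subset again.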
Thanks,
   GerardM

On 24 November 2015 at 00:37, Gnangarra  wrote:

> some resposnes to Leila comments
>
> 1. Its not a disaster but it is a serious concern, we know from past
> experiences that it goes to the heart of the projects long term
> credibility, Countless hours and funds have gone into redressing Wikipedias
> reputation and still after  8 years of doing this we get bagged, we are
> still answering these questions. why send Wikidata done that  track when we
> all understand the importance of referencing or in more theological
> perspective "if we cant learn from history, why do we spend so many
> resources recording history"
>
> 2. referencing is a very valuable thing for all data, that should be a
> starting point for the spectrum and Wikidata, rather than a goal or end
> point. Wikipeidas still have unreferenced material 15 years after it
> started
>
> 3. I'd disagree if the data isnt referenced then its of no value,
> Wikipedias are a better place to look
>
> 4.Wikipedia reference isnt ideal but it is better than nothing, providing
> that reference is to a permanent link rather than just a article at least
> then if the information is changed there is some ability to recover the
> original source.  In general a circular reference is a bad out come
>
> 5.People need to able to trust all data in WikiData, otherwise they just
> wont use it because as Wikidata expands the same PR firms, interest groups
> which have seen so many of WP issues will gravitate to the easier to
> manipulate WikiData
>
>
> Lets build something based on the lessons learnt on Wikipedia over the last
> 15 years rather than duplicate those missteps
>
>
>
> On 24 November 2015 at 06:18, Leila Zia  wrote:
>
> > Hi Andreas,
> >
> > On Mon, Nov 23, 2015 at 1:15 PM, Andreas Kolbe 
> wrote:
> >
> > > Moreover, I was somewhat surprised to learn the other day that,
> > apparently,
> > > over 80 percent of Wikidata statements are either unreferenced or only
> > > referenced to a Wikipedia:
> > >
> > >
> > >
> >
> https://de.wikipedia.org/w/index.php?title=Datei:Citing_as_a_public_service.pdf=17
> > >
> > > That seems like a recipe for disaster, given that Wikidata feeds the
> > Google
> > > Knowledge Graph and Bing Satori to some extent.
> > >
> > > Thoughts?
> > >
> >
> > Here are my thoughts:
> >
> > 1) No, it's not a recipe for disaster. :-) I expand below.
> >
> > 2) 

Re: [Wikimedia-l] Quality issues

2015-11-21 Thread Peter Southwood
The problem may simply be that the information is not coming to the attention 
of the people who care, as they don't know that it exists or where to find it. 
The normal place to put information relating to improvement of an article is on 
the article talk page, and that is where Wikipedians will expect to find it.
Cheers,
Peter

-Original Message-
From: Wikimedia-l [mailto:wikimedia-l-boun...@lists.wikimedia.org] On Behalf Of 
Gerard Meijssen
Sent: Saturday, 21 November 2015 9:57 AM
To: Wikimedia Mailing List
Subject: Re: [Wikimedia-l] Quality issues

Hoi,
That is indeed a problem. So far it has been lists, often well-formatted
lists, that do not have a workflow and are not updated regularly. I have
added these issues as a wishlist item to work on. [1]

You have to appreciate that when a list of problematic issues has over 100
items, it is no longer easy or obvious that you want to add to and follow
100 talk pages. This is one of the big differences between Wikipedia think
and Wikidata think. I care about a lot of data, data that is linked.
Analogous to the "Kevin Bacon steps of separation", I want all items easily
and obviously connected. That is another quality goal for Wikidata.

Given the state of Wikipedia, where most subjects have an article, easy and
obvious tasks like fact checking and adding sources are exactly what we are
looking for to maintain our community. Add relevance to the cocktail (we
know that these facts are likely to have issues), and you appreciate why
this may help us with our quality and with our community issues.
Thanks,
 GerardM


[1]
https://meta.wikimedia.org/wiki/2015_Community_Wishlist_Survey#Visibility_for_quality_issues

On 21 November 2015 at 07:11, Peter Southwood <peter.southw...@telkomsa.net>
wrote:

> How are you notifying the Wikipedias/Wikipedians? Do you leave a 
> message on the talk page of the relevant article?
> Cheers,
> Peter
>
> -Original Message-
> From: Wikimedia-l [mailto:wikimedia-l-boun...@lists.wikimedia.org] On 
> Behalf Of Gerard Meijssen
> Sent: Saturday, 21 November 2015 12:23 AM
> To: Wikimedia Mailing List
> Subject: Re: [Wikimedia-l] Quality issues
>
> Hoi,
> So far such lists have been produced for bigger Wikipedias but 
> essentially it is potentially an issue for any and all Wikis that have 
> data that may exist on Wikidata or linked through Wikidata on external 
> sources.
> Thanks,
>   GerardM
>
> On 20 November 2015 at 12:33, Peter Southwood < 
> peter.southw...@telkomsa.net>
> wrote:
>
> > Gerard,
> > Who were you expecting would respond from the Wikipedias?
> > Cheers,
> > Peter
> >
> > -Original Message-
> > From: Wikimedia-l [mailto:wikimedia-l-boun...@lists.wikimedia.org] 
> > On Behalf Of Gerard Meijssen
> > Sent: Friday, 20 November 2015 9:18 AM
> > To: Wikimedia Mailing List; Research into Wikimedia content and 
> > communities; WikiData-l
> > Subject: [Wikimedia-l] Quality issues
> >
> > Hoi,
> > At Wikidata we often find issues with data imported from a Wikipedia.
> > Lists have been produced with these issues on the Wikipedia involved 
> > and arguably they do present issues with the quality of Wikipedia or 
> > Wikidata for that matter. So far hardly anything resulted from such
> outreach.
> >
> > When Wikipedia is a black box, not communicating about with the 
> > outside world, at some stage the situation becomes toxic. At this 
> > moment there are already those at Wikidata that argue not to bother 
> > about Wikipedia quality because in their view, Wikipedians do not 
> > care
> about its own quality.
> >
> > Arguably known issues with quality are the easiest to solve.
> >
> > There are many ways to approach this subject. It is indeed a quality 
> > issue both for Wikidata and Wikipedia. It can be seen as a research 
> > issue; how to deal with quality and how do such mechanisms function 
> > if
> at all.
> >
> > I blogged about it..
> > Thanks,
> >  GerardM
> >
> >
> > http://ultimategerardm.blogspot.nl/2015/11/what-kind-of-box-is-wikipedia.html

Re: [Wikimedia-l] Quality issues

2015-11-21 Thread Gerard Meijssen
Hoi,
I respect the policy of Wikipedia. However, when multiple Wikipedias differ
and there is no sourcing, does this policy hold? When Wikidata has no
attributable sources but multiple statements, is it not conceivable that
things are easy and obvious, namely that they are wrong?

When you talk about the FA status of articles, you are considering
something totally alien to what is at stake. Typically we do not have
credible sources at Wikidata and typically there is an issue with the data.

When Wikidata is as mature as en.wp we will have on average 10 statements
for every item. Currently half of our items have at most two statements. We
do find issues in any source by comparing them. It does make sense to make
this effort. It is an obvious way of improving quality in all of our
projects and even beyond that.
Thanks,
 GerardM

On 21 November 2015 at 10:26, Gnangarra  wrote:

> >
> > Many data sources have data from the same origin. It does not follow that
> > without original sources they are all right. Quite the reverse. It does
> > however take humans to be bold, to determine where a booboo has been
> made.
> > Yes, we do decide what is right or wrong,
>
>
> ​No we dont decide what is right or wrong, en:wp has very specific core
> policies about this
>
>- ​Original research - we dont draw conclusions from available data
>- NPOV​ -
>*which means presenting information without editorial bias*,
>​ the moment we make that decision about whats right  we exceed the
>boundaries of our core pillars ​dont know, uncertain or conflicting
>information means exactly that we dont get to choose what we think is
> right
>
>
> ​The data article writers work with isnt black and white and its definitely
> not set in stone Wikipedia content is a constant evolving collation of
> knowledge, we should be careful when ever we put in place a process that
> makes information definitive because people become reluctant to add to that
> and they are even less likely to challenge something that has been cast in
> stone already regardless of the inaccuracy of that casting .  We see it
> within Wikipedia when articles are elevated to FA status with the number of
> editors who fiercely defend that current/correct version against any
> changes regardless of the merit in the information being added with
> comments like "discuss it on talk page first" "revert good faith edit"
>
>
> the more disjointed knowledge becomes the harder it is to keep it current,
> accurate the more isolated that knowledge. Then power over making changes
> takes precedence over productivity, accuracy and openness
>
> On 21 November 2015 at 16:12, Gerard Meijssen 
> wrote:
>
> > Hoi,
> > You conflate two issues. First when facts differ, it should be possible
> to
> > explain why they differ. Only when there is no explanation particularly
> > when there are no sources, there is an issue. In come real sources. When
> > someone died on 7-5-1759 and another source has a different date, it may
> be
> > the difference between a Julian and a Gregorian date. When a source makes
> > this plain, one fact has been proven to be incorrect. When the date was
> > 1759, it is obvious that the other date is more precise.. The point is
> very
> > much that Wikipedia values sources and so does Wikidata. USE THEM and
> find
> > that data sources may be wrong when they are. In this way we improve
> > quality.
> >
> > Many data sources have data from the same origin. It does not follow that
> > without original sources they are all right. Quite the reverse. It does
> > however take humans to be bold, to determine where a booboo has been
> made.
> > Yes, we do decide what is right or wrong, we do this when we research an
> > issue and that is exactly what this is about. It all starts with
> > determining a source.
> >
> > In the mean time, Wikidata is negligent in stating sources. The worst
> > example is in the "primary sources" tool. It is bad because it is brought
> > to us as the best work flow for adding uncertain data to Wikidata. So the
> > world is not perfect but hey it is a wiki :)
> > Thanks,
> >   GerardM
> >
> > On 21 November 2015 at 00:32, Gnangarra  wrote:
> >
> > > >
> > > > ...
> > > > *When 100% is compared with another source and 85% is the same,**you
> > only
> > > > have to check 15% and decide what is righ**t*
> > >
> > >
> > > ​this very statement highlights one issue that ​
> > >
> > > ​will always be a problem between Wikidata and Wikipedias. Wikipedia,
> at
> > > least in my 10 years of experience on en:wp is that when you have
> > multiple
> > > sources that differ you highlight the existence of those ​sources and
> the
> > > conflict of information  we dont decide what is right or wrong.
> > >
> > > On 21 November 2015 at 06:35, Gerard Meijssen <
> gerard.meijs...@gmail.com
> > >
> > > wrote:
> > >
> > > > Hoi,
> > > >  quality is different things  I do 

Re: [Wikimedia-l] Quality issues

2015-11-21 Thread Gnangarra
>
> Many data sources have data from the same origin. It does not follow that
> without original sources they are all right. Quite the reverse. It does
> however take humans to be bold, to determine where a booboo has been made.
> Yes, we do decide what is right or wrong,


No, we don't decide what is right or wrong; en:wp has very specific core
policies about this:

   - Original research - we don't draw conclusions from available data
   - NPOV - *which means presenting information without editorial bias*;
   the moment we make that decision about what's right, we exceed the
   boundaries of our core pillars. Don't know, uncertain, or conflicting
   information means exactly that; we don't get to choose what we think is
   right


The data article writers work with isn't black and white, and it's definitely
not set in stone. Wikipedia content is a constantly evolving collation of
knowledge. We should be careful whenever we put in place a process that
makes information definitive, because people become reluctant to add to that
and they are even less likely to challenge something that has been cast in
stone already, regardless of the inaccuracy of that casting. We see it
within Wikipedia when articles are elevated to FA status, with the number of
editors who fiercely defend that current/correct version against any
changes, regardless of the merit in the information being added, with
comments like "discuss it on talk page first" or "revert good faith edit".


The more disjointed knowledge becomes, the harder it is to keep it current
and accurate, and the more isolated that knowledge becomes. Then power over
making changes takes precedence over productivity, accuracy and openness.

On 21 November 2015 at 16:12, Gerard Meijssen 
wrote:

> Hoi,
> You conflate two issues. First when facts differ, it should be possible to
> explain why they differ. Only when there is no explanation particularly
> when there are no sources, there is an issue. In come real sources. When
> someone died on 7-5-1759 and another source has a different date, it may be
> the difference between a Julian and a Gregorian date. When a source makes
> this plain, one fact has been proven to be incorrect. When the date was
> 1759, it is obvious that the other date is more precise.. The point is very
> much that Wikipedia values sources and so does Wikidata. USE THEM and find
> that data sources may be wrong when they are. In this way we improve
> quality.
>
> Many data sources have data from the same origin. It does not follow that
> without original sources they are all right. Quite the reverse. It does
> however take humans to be bold, to determine where a booboo has been made.
> Yes, we do decide what is right or wrong, we do this when we research an
> issue and that is exactly what this is about. It all starts with
> determining a source.
>
> In the mean time, Wikidata is negligent in stating sources. The worst
> example is in the "primary sources" tool. It is bad because it is brought
> to us as the best work flow for adding uncertain data to Wikidata. So the
> world is not perfect but hey it is a wiki :)
> Thanks,
>   GerardM
>
> On 21 November 2015 at 00:32, Gnangarra  wrote:
>
> > >
> > > ...
> > > *When 100% is compared with another source and 85% is the same,**you
> only
> > > have to check 15% and decide what is righ**t*
> >
> >
> > ​this very statement highlights one issue that ​
> >
> > ​will always be a problem between Wikidata and Wikipedias. Wikipedia, at
> > least in my 10 years of experience on en:wp is that when you have
> multiple
> > sources that differ you highlight the existence of those ​sources and the
> > conflict of information  we dont decide what is right or wrong.
> >
> > On 21 November 2015 at 06:35, Gerard Meijssen  >
> > wrote:
> >
> > > Hoi,
> > >  quality is different things  I do care about quality but
> I
> > do
> > > not necessarily agree with you how to best achieve it. Arguably bots
> are
> > > better and getting data into Wikidata than people. This means that the
> > > error rate of bots is typically better than what people do. It is all
> in
> > > the percentages.
> > >
> > > I have always said that the best way to improve quality is by comparing
> > > sources. When Wikidata has no data, it is arguably better to import
> data
> > > from any source. When the quality is 90% correct, there is already 100%
> > > more data. When 100% is compared with another source and 85% is the
> same,
> > > you only have to check 15% and decide what is right. When you compare
> > with
> > > two distinct sources, the percentage that differs changes again.. :) In
> > > this way it makes sense to check errors
> > >
> > > It does not help when you state that either party has people that care
> or
> > > do not care about quality. By providing a high likelihood that
> something
> > is
> > > problematic, you will learn who actually makes a difference. It however
> > > 

Re: [Wikimedia-l] Quality issues

2015-11-21 Thread Fæ
On 20 November 2015 at 22:47, Milos Rancic  wrote:
> Offtopic: Gerard, during the last half an hour or so, I am just
> getting emails from you inside of this thread (including wiki-research
> list). I thought my phone has a bug. It's useful to write a larger
> email with addressing all the issues. Besides other things, with this
> frequency, you'll spend your monthly email quota for this list the day
> after tomorrow.

+1

I keep an open mind for supporting Wikidata in association with my
Commons uploads. This thread going over a series of old gripes against
other projects, with a lack of new proposals, has diminished my
interest. For me, this effectively burns out the word "Wikidata" for a
month.

Fae



Re: [Wikimedia-l] Quality issues

2015-11-21 Thread Gerard Meijssen
Hoi,
You conflate two issues. First, when facts differ, it should be possible to
explain why they differ. There is only an issue when there is no explanation,
particularly when there are no sources. This is where real sources come in.
When someone died on 7-5-1759 and another source has a different date, it may
be the difference between a Julian and a Gregorian date. When a source makes
this plain, one fact has been proven to be incorrect. When one source gives
only the year 1759, it is obvious that the other date is more precise. The
point is very much that Wikipedia values sources and so does Wikidata. USE
THEM and you will find where data sources are wrong. In this way we improve
quality.
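
The Julian/Gregorian case can be checked mechanically. What follows is only a
minimal sketch, not something taken from this thread: it assumes that 7-5-1759
means 7 May 1759 as recorded in the Julian calendar, and the function name
julian_to_gregorian is made up for illustration. It converts through the
Julian Day Number using the standard integer formula.

```python
from datetime import date

# Offset between a Julian Day Number and Python's proleptic Gregorian ordinal:
# date(2000, 1, 1) has ordinal 730120 and Julian Day Number 2451545.
JDN_TO_ORDINAL = 1721425


def julian_to_gregorian(year: int, month: int, day: int) -> date:
    """Convert a Julian-calendar date to the (proleptic) Gregorian calendar."""
    a = (14 - month) // 12
    y = year + 4800 - a
    m = month + 12 * a - 3
    # Julian Day Number of a Julian-calendar date (standard integer formula).
    jdn = day + (153 * m + 2) // 5 + 365 * y + y // 4 - 32083
    return date.fromordinal(jdn - JDN_TO_ORDINAL)


recorded = date(1759, 5, 7)  # assumed reading of "7-5-1759"
converted = julian_to_gregorian(1759, 5, 7)
print(converted.isoformat(), (converted - recorded).days)  # 1759-05-18 11
```

Two sources that differ by exactly 11 days for a date in the 1700s may
therefore be reporting the same day in different calendars.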

Many data sources have data from the same origin. It does not follow that
without original sources they are all right. Quite the reverse. It does
however take humans to be bold, to determine where a booboo has been made.
Yes, we do decide what is right or wrong, we do this when we research an
issue and that is exactly what this is about. It all starts with
determining a source.

In the meantime, Wikidata is negligent in stating sources. The worst
example is the "primary sources" tool. It is bad because it is presented
to us as the best workflow for adding uncertain data to Wikidata. So the
world is not perfect, but hey, it is a wiki :)
Thanks,
  GerardM

On 21 November 2015 at 00:32, Gnangarra  wrote:

> >
> > ...
> > When 100% is compared with another source and 85% is the same, you only
> > have to check 15% and decide what is right
>
>
> ​this very statement highlights one issue that ​
>
> ​will always be a problem between Wikidata and Wikipedias. Wikipedia, at
> least in my 10 years of experience on en:wp is that when you have multiple
> sources that differ you highlight the existence of those ​sources and the
> conflict of information  we dont decide what is right or wrong.
>
> On 21 November 2015 at 06:35, Gerard Meijssen 
> wrote:
>
> > Hoi,
> >  quality is different things  I do care about quality but I
> do
> > not necessarily agree with you how to best achieve it. Arguably bots are
> > better and getting data into Wikidata than people. This means that the
> > error rate of bots is typically better than what people do. It is all in
> > the percentages.
> >
> > I have always said that the best way to improve quality is by comparing
> > sources. When Wikidata has no data, it is arguably better to import data
> > from any source. When the quality is 90% correct, there is already 100%
> > more data. When 100% is compared with another source and 85% is the same,
> > you only have to check 15% and decide what is right. When you compare
> with
> > two distinct sources, the percentage that differs changes again.. :) In
> > this way it makes sense to check errors
> >
> > It does not help when you state that either party has people that care or
> > do not care about quality. By providing a high likelihood that something
> is
> > problematic, you will learn who actually makes a difference. It however
> > started with having data to compare in the first place
> > Thanks,
> >   GerardM
> >
> > On 20 November 2015 at 14:50, Petr Kadlec  wrote:
> >
> > > On Fri, Nov 20, 2015 at 8:18 AM, Gerard Meijssen <
> > > gerard.meijs...@gmail.com>
> > > wrote:
> > >
> > > > When Wikipedia is a black box, not communicating about with the
> outside
> > > > world, at some stage the situation becomes toxic. At this moment
> there
> > > are
> > > > already those at Wikidata that argue not to bother about Wikipedia
> > > quality
> > > > because in their view, Wikipedians do not care about its own quality.
> > > >
> > >
> > > Right. When some users blindly dump random data to Wikidata, not
> > > communicating about with the outside world, at some stage the situation
> > > becomes toxic. At this moment there are already those at Wikipedia that
> > > argue not to bother about Wikidata quality because in their view,
> > > Wikidatans do not care about its own quality.
> > >
> > > For instance, take a look at
> > > https://www.wikidata.org/wiki/User_talk:GerardM
> > > https://www.wikidata.org/wiki/User_talk:GerardM/Archive_1
> > >
> > > Erm
> > > -- [[cs:User:Mormegil | Petr Kadlec]]

Re: [Wikimedia-l] Quality issues

2015-11-21 Thread Jane Darnell
Sorry to read that Fae, but in your specific case I do think your time is
spent more productively on Commons, because the value of your contributions
there is huge. Having created Wikidata items for many of your Commons
uploads, I think it may be worthwhile at some point to try and get someone
to run a Fae-Wikidata-conversion bot to try and get as much data as
possible from your uploads moved over, but until then, please go ahead with
whatever it is you like to do best. In my last mail I was thinking about
Wikipedians, but of course the same is true for all of the sister projects.

On Sat, Nov 21, 2015 at 1:08 PM, Fæ  wrote:

> On 20 November 2015 at 22:47, Milos Rancic  wrote:
> > Offtopic: Gerard, during the last half an hour or so, I am just
> > getting emails from you inside of this thread (including wiki-research
> > list). I thought my phone has a bug. It's useful to write a larger
> > email with addressing all the issues. Besides other things, with this
> > frequency, you'll spend your monthly email quota for this list the day
> > after tomorrow.
>
> +1
>
> I keep an open mind for supporting Wikidata in association with my
> Commons uploads. This thread going over a series of old gripes against
> other projects, with a lack of new proposals, has diminished my
> interest. For me, this effectively burns out the word "Wikidata" for a
> month.
>
> Fae
>


Re: [Wikimedia-l] Quality issues

2015-11-21 Thread Gnangarra
I agree that getting information in is, in and of itself, a good starting
point, but ignoring the lessons learnt in other projects while doing so only
creates more work for those who follow. Having a less clear policy about
sources and allowing unsourced information is only going to put Wikidata
behind Wikipedia in quality. That is not going to endear Wikidata
information to Wikipedians, and in turn Wikipedians, as they gather data,
just aren't going to take that extra step to share it, no matter how easy the
step is to take.

On 21 November 2015 at 19:13, Gerard Meijssen 
wrote:

> Hoi,
> I respect the policy of Wikipedia. However, when multiple Wikipedias differ
> and when there is no sourcing does this policy hold? When Wikidata has no
> attributable sources but multiple statements is it not conceivable that
> things are easy and obvious.. that they are wrong?
>
> When you talk about the FA status of articles, you are considering
> something totally alien to what is at stake. Typically we do not have
> credible sources at Wikidata and typically there is an issue with the data.
>
> When Wikidata is as mature as en.wp we will have on average 10 statements
> for every item. Currently half of our items have at most two statements. We
> do find issues in any source by comparing them. It does make sense to make
> this effort. It is an obvious way of improving quality in all of our
> projects and even beyond that.
> Thanks,
>  GerardM
>
> On 21 November 2015 at 10:26, Gnangarra  wrote:
>
> > >
> > > Many data sources have data from the same origin. It does not follow
> that
> > > without original sources they are all right. Quite the reverse. It does
> > > however take humans to be bold, to determine where a booboo has been
> > made.
> > > Yes, we do decide what is right or wrong,
> >
> >
> > ​No we dont decide what is right or wrong, en:wp has very specific core
> > policies about this
> >
> >- ​Original research - we dont draw conclusions from available data
> >- NPOV​ -
> >*which means presenting information without editorial bias*,
> >​ the moment we make that decision about whats right  we exceed the
> >boundaries of our core pillars ​dont know, uncertain or
> conflicting
> >information means exactly that we dont get to choose what we think is
> > right
> >
> >
> > ​The data article writers work with isnt black and white and its
> definitely
> > not set in stone Wikipedia content is a constant evolving collation of
> > knowledge, we should be careful when ever we put in place a process that
> > makes information definitive because people become reluctant to add to
> that
> > and they are even less likely to challenge something that has been cast
> in
> > stone already regardless of the inaccuracy of that casting .  We see it
> > within Wikipedia when articles are elevated to FA status with the number
> of
> > editors who fiercely defend that current/correct version against any
> > changes regardless of the merit in the information being added with
> > comments like "discuss it on talk page first" "revert good faith edit"
> >
> >
> > the more disjointed knowledge becomes the harder it is to keep it
> current,
> > accurate the more isolated that knowledge. Then power over making changes
> > takes precedence over productivity, accuracy and openness
> >
> > On 21 November 2015 at 16:12, Gerard Meijssen  >
> > wrote:
> >
> > > Hoi,
> > > You conflate two issues. First when facts differ, it should be possible
> > to
> > > explain why they differ. Only when there is no explanation particularly
> > > when there are no sources, there is an issue. In come real sources.
> When
> > > someone died on 7-5-1759 and another source has a different date, it
> may
> > be
> > > the difference between a Julian and a Gregorian date. When a source
> makes
> > > this plain, one fact has been proven to be incorrect. When the date was
> > > 1759, it is obvious that the other date is more precise.. The point is
> > very
> > > much that Wikipedia values sources and so does Wikidata. USE THEM and
> > find
> > > that data sources may be wrong when they are. In this way we improve
> > > quality.
> > >
> > > Many data sources have data from the same origin. It does not follow
> that
> > > without original sources they are all right. Quite the reverse. It does
> > > however take humans to be bold, to determine where a booboo has been
> > made.
> > > Yes, we do decide what is right or wrong, we do this when we research
> an
> > > issue and that is exactly what this is about. It all starts with
> > > determining a source.
> > >
> > > In the mean time, Wikidata is negligent in stating sources. The worst
> > > example is in the "primary sources" tool. It is bad because it is
> brought
> > > to us as the best work flow for adding uncertain data to Wikidata. So
> the
> > > world is not perfect but hey it is a wiki :)
> > > Thanks,
> > >   GerardM

Re: [Wikimedia-l] Quality issues

2015-11-20 Thread Petr Kadlec
On Fri, Nov 20, 2015 at 8:18 AM, Gerard Meijssen 
wrote:

> When Wikipedia is a black box, not communicating about with the outside
> world, at some stage the situation becomes toxic. At this moment there are
> already those at Wikidata that argue not to bother about Wikipedia quality
> because in their view, Wikipedians do not care about its own quality.
>

Right. When some users blindly dump random data to Wikidata, not
communicating about with the outside world, at some stage the situation
becomes toxic. At this moment there are already those at Wikipedia that
argue not to bother about Wikidata quality because in their view,
Wikidatans do not care about its own quality.

For instance, take a look at
https://www.wikidata.org/wiki/User_talk:GerardM
https://www.wikidata.org/wiki/User_talk:GerardM/Archive_1

Erm
-- [[cs:User:Mormegil | Petr Kadlec]]


Re: [Wikimedia-l] Quality issues

2015-11-20 Thread Richard Symonds
Folks, regardless of which views we hold, we're all on the same side - can
we try and be a little less acerbic please - it is Friday after all!

Richard Symonds
Wikimedia UK
0207 065 0992

Wikimedia UK is a Company Limited by Guarantee registered in England and
Wales, Registered No. 6741827. Registered Charity No.1144513. Registered
Office 4th Floor, Development House, 56-64 Leonard Street, London EC2A 4LT.
United Kingdom. Wikimedia UK is the UK chapter of a global Wikimedia
movement. The Wikimedia projects are run by the Wikimedia Foundation (who
operate Wikipedia, amongst other projects).

*Wikimedia UK is an independent non-profit charity with no legal control
over Wikipedia nor responsibility for its contents.*

On 20 November 2015 at 13:50, Petr Kadlec  wrote:

> On Fri, Nov 20, 2015 at 8:18 AM, Gerard Meijssen <
> gerard.meijs...@gmail.com>
> wrote:
>
> > When Wikipedia is a black box, not communicating about with the outside
> > world, at some stage the situation becomes toxic. At this moment there
> are
> > already those at Wikidata that argue not to bother about Wikipedia
> quality
> > because in their view, Wikipedians do not care about its own quality.
> >
>
> Right. When some users blindly dump random data to Wikidata, not
> communicating about with the outside world, at some stage the situation
> becomes toxic. At this moment there are already those at Wikipedia that
> argue not to bother about Wikidata quality because in their view,
> Wikidatans do not care about its own quality.
>
> For instance, take a look at
> https://www.wikidata.org/wiki/User_talk:GerardM
> https://www.wikidata.org/wiki/User_talk:GerardM/Archive_1
>
> Erm
> -- [[cs:User:Mormegil | Petr Kadlec]]


Re: [Wikimedia-l] Quality issues

2015-11-20 Thread Peter Southwood
Gerard, 
Who were you expecting would respond from the Wikipedias?
Cheers,
Peter

-Original Message-
From: Wikimedia-l [mailto:wikimedia-l-boun...@lists.wikimedia.org] On Behalf Of 
Gerard Meijssen
Sent: Friday, 20 November 2015 9:18 AM
To: Wikimedia Mailing List; Research into Wikimedia content and communities; 
WikiData-l
Subject: [Wikimedia-l] Quality issues

Hoi,
At Wikidata we often find issues with data imported from a Wikipedia. Lists 
have been produced with these issues on the Wikipedia involved and arguably 
they do present issues with the quality of Wikipedia or Wikidata for that 
matter. So far hardly anything resulted from such outreach.

When Wikipedia is a black box, not communicating about with the outside world, 
at some stage the situation becomes toxic. At this moment there are already 
those at Wikidata that argue not to bother about Wikipedia quality because in 
their view, Wikipedians do not care about its own quality.

Arguably known issues with quality are the easiest to solve.

There are many ways to approach this subject. It is indeed a quality issue both 
for Wikidata and Wikipedia. It can be seen as a research issue; how to deal 
with quality and how do such mechanisms function if at all.

I blogged about it..
Thanks,
 GerardM

http://ultimategerardm.blogspot.nl/2015/11/what-kind-of-box-is-wikipedia.html


Re: [Wikimedia-l] Quality issues

2015-11-20 Thread Gerard Meijssen
Hoi,
Using quality images from Commons and establishing what is correct are
quite distinct matters. With Commons it is an aesthetic difference; with
these lists it is about the credibility of the data involved.
Thanks,
GerardM

On 20 November 2015 at 09:53, Jane Darnell  wrote:

> Gerard,
> I think this was always the case. Most Wikidatans are as at home on
> Wikipedia as they are on Commons. The issue you describe also happened to
> Commons - both communities feel the other is less focussed on quality. Many
> Commonists spend hours on high quality images and these are rarely picked
> up by Wikipedia unless a Commonist notices and does so in their own
> language. There is no requirement for Wikipedians to get to know any other
> project and this is normal wiki behavior. We don't want anyone to feel
> pressured to do anything they feel uncomfortable doing. It's already
> difficult to get Wikipedians to do small tasks like add catagories to their
> articles. The list of things necessary to create an acceptable article on
> Wikipedia just seems to get longer and longer, while the associated work
> for illustrations of that article or for data of that article is not even
> mentioned in current AfC policies on Wikipedia. I have thought about this,
> but I still think we need to break down the list of things necessary to
> make new short articles on Wikipedia, not extend the list. So in summary, I
> think that what you describe is normal predictable behavior for a
> "Wikipedia support" project such as Commons and Wikidata. This will change
> as more and more external users find out that Commons and Wikidata are
> valuable resources in and of themselves. This is already the case for many
> GLAMs which have found collaborations with Commons to be valuable
> experiences. I have high hopes this will become the case for Wikidata as
> well.
> Jane
>
> On Fri, Nov 20, 2015 at 8:18 AM, Gerard Meijssen <
> gerard.meijs...@gmail.com>
> wrote:
>
> > Hoi,
> > At Wikidata we often find issues with data imported from a Wikipedia.
> Lists
> > have been produced with these issues on the Wikipedia involved and
> arguably
> > they do present issues with the quality of Wikipedia or Wikidata for that
> > matter. So far hardly anything resulted from such outreach.
> >
> > When Wikipedia is a black box, not communicating about with the outside
> > world, at some stage the situation becomes toxic. At this moment there
> are
> > already those at Wikidata that argue not to bother about Wikipedia
> quality
> > because in their view, Wikipedians do not care about its own quality.
> >
> > Arguably known issues with quality are the easiest to solve.
> >
> > There are many ways to approach this subject. It is indeed a quality
> issue
> > both for Wikidata and Wikipedia. It can be seen as a research issue; how
> to
> > deal with quality and how do such mechanisms function if at all.
> >
> > I blogged about it..
> > Thanks,
> >  GerardM
> >
> >
> >
> http://ultimategerardm.blogspot.nl/2015/11/what-kind-of-box-is-wikipedia.html


Re: [Wikimedia-l] Quality issues

2015-11-20 Thread Jane Darnell
Gerard,
I think this was always the case. Most Wikidatans are as at home on
Wikipedia as they are on Commons. The issue you describe also happened to
Commons - both communities feel the other is less focussed on quality. Many
Commonists spend hours on high quality images and these are rarely picked
up by Wikipedia unless a Commonist notices and does so in their own
language. There is no requirement for Wikipedians to get to know any other
project and this is normal wiki behavior. We don't want anyone to feel
pressured to do anything they feel uncomfortable doing. It's already
difficult to get Wikipedians to do small tasks like add categories to their
articles. The list of things necessary to create an acceptable article on
Wikipedia just seems to get longer and longer, while the associated work
for illustrations of that article or for data of that article is not even
mentioned in current AfC policies on Wikipedia. I have thought about this,
but I still think we need to break down the list of things necessary to
make new short articles on Wikipedia, not extend the list. So in summary, I
think that what you describe is normal predictable behavior for a
"Wikipedia support" project such as Commons and Wikidata. This will change
as more and more external users find out that Commons and Wikidata are
valuable resources in and of themselves. This is already the case for many
GLAMs which have found collaborations with Commons to be valuable
experiences. I have high hopes this will become the case for Wikidata as
well.
Jane

On Fri, Nov 20, 2015 at 8:18 AM, Gerard Meijssen 
wrote:

> Hoi,
> At Wikidata we often find issues with data imported from a Wikipedia. Lists
> have been produced with these issues on the Wikipedia involved and arguably
> they do present issues with the quality of Wikipedia or Wikidata for that
> matter. So far hardly anything resulted from such outreach.
>
> When Wikipedia is a black box, not communicating about with the outside
> world, at some stage the situation becomes toxic. At this moment there are
> already those at Wikidata that argue not to bother about Wikipedia quality
> because in their view, Wikipedians do not care about its own quality.
>
> Arguably known issues with quality are the easiest to solve.
>
> There are many ways to approach this subject. It is indeed a quality issue
> both for Wikidata and Wikipedia. It can be seen as a research issue; how to
> deal with quality and how do such mechanisms function if at all.
>
> I blogged about it..
> Thanks,
>  GerardM
>
>
> http://ultimategerardm.blogspot.nl/2015/11/what-kind-of-box-is-wikipedia.html


Re: [Wikimedia-l] Quality issues

2015-11-20 Thread Gerard Meijssen
Hoi,
 quality is different things  I do care about quality but I do
not necessarily agree with you on how best to achieve it. Arguably bots are
better at getting data into Wikidata than people. This means that the
error rate of bots is typically lower than that of people. It is all in
the percentages.

I have always said that the best way to improve quality is by comparing
sources. When Wikidata has no data, it is arguably better to import data
from any source. Even when the import is only 90% correct, there is already
100% more data. When 100% is compared with another source and 85% is the same,
you only have to check 15% and decide what is right. When you compare with
two distinct sources, the percentage that differs changes again :) In
this way it makes sense to check errors.
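
The comparison described above can be sketched in a few lines. This is only
an illustration under stated assumptions: the function name compare_sources,
the item ids and the values are all hypothetical, and each source is modelled
as a plain mapping from item id to value. The sketch only reports which items
differ; deciding what is right still needs a person and a source.

```python
def compare_sources(ours: dict, theirs: dict):
    """Return (share of shared items that agree, sorted ids that differ)."""
    shared = ours.keys() & theirs.keys()
    differing = sorted(k for k in shared if ours[k] != theirs[k])
    if not shared:
        return 0.0, differing
    return (len(shared) - len(differing)) / len(shared), differing


# Hypothetical data: 20 items, 3 of which differ between the two sources.
ours = {f"Q{i}": 1700 + i for i in range(20)}
theirs = dict(ours, Q3=0, Q7=0, Q11=0)

agreement, to_check = compare_sources(ours, theirs)
print(f"{agreement:.0%} agree, check {len(to_check)} items: {to_check}")
```

With these made-up numbers, only the three differing items need a human
look instead of all twenty.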

It does not help when you state that either party has people that care or
do not care about quality. By providing a high likelihood that something is
problematic, you will learn who actually makes a difference. It all started,
however, with having data to compare in the first place.
Thanks,
  GerardM

On 20 November 2015 at 14:50, Petr Kadlec  wrote:

> On Fri, Nov 20, 2015 at 8:18 AM, Gerard Meijssen <
> gerard.meijs...@gmail.com>
> wrote:
>
> > When Wikipedia is a black box, not communicating about with the outside
> > world, at some stage the situation becomes toxic. At this moment there
> are
> > already those at Wikidata that argue not to bother about Wikipedia
> quality
> > because in their view, Wikipedians do not care about its own quality.
> >
>
> Right. When some users blindly dump random data to Wikidata, not
> communicating about with the outside world, at some stage the situation
> becomes toxic. At this moment there are already those at Wikipedia that
> argue not to bother about Wikidata quality because in their view,
> Wikidatans do not care about its own quality.
>
> For instance, take a look at
> https://www.wikidata.org/wiki/User_talk:GerardM
> https://www.wikidata.org/wiki/User_talk:GerardM/Archive_1
>
> Erm
> -- [[cs:User:Mormegil | Petr Kadlec]]


Re: [Wikimedia-l] Quality issues

2015-11-20 Thread Gerard Meijssen
Hoi,
So far such lists have been produced for the bigger Wikipedias, but it is
potentially an issue for any wiki whose data may also exist on Wikidata or
in external sources linked through Wikidata.
Thanks,
  GerardM

On 20 November 2015 at 12:33, Peter Southwood 
wrote:

> Gerard,
> Who were you expecting would respond from the Wikipedias?
> Cheers,
> Peter
>
> -Original Message-
> From: Wikimedia-l [mailto:wikimedia-l-boun...@lists.wikimedia.org] On
> Behalf Of Gerard Meijssen
> Sent: Friday, 20 November 2015 9:18 AM
> To: Wikimedia Mailing List; Research into Wikimedia content and
> communities; WikiData-l
> Subject: [Wikimedia-l] Quality issues
>
> Hoi,
> At Wikidata we often find issues with data imported from a Wikipedia.
> Lists have been produced with these issues on the Wikipedia involved and
> arguably they do present issues with the quality of Wikipedia or Wikidata
> for that matter. So far hardly anything resulted from such outreach.
>
> When Wikipedia is a black box, not communicating about with the outside
> world, at some stage the situation becomes toxic. At this moment there are
> already those at Wikidata that argue not to bother about Wikipedia quality
> because in their view, Wikipedians do not care about its own quality.
>
> Arguably known issues with quality are the easiest to solve.
>
> There are many ways to approach this subject. It is indeed a quality issue
> both for Wikidata and Wikipedia. It can be seen as a research issue; how to
> deal with quality and how do such mechanisms function if at all.
>
> I blogged about it..
> Thanks,
>  GerardM
>
>
> http://ultimategerardm.blogspot.nl/2015/11/what-kind-of-box-is-wikipedia.html


Re: [Wikimedia-l] Quality issues

2015-11-20 Thread Gnangarra
>
> ...
> When 100% is compared with another source and 85% is the same, you only
> have to check 15% and decide what is right


This very statement highlights one issue that will always be a problem
between Wikidata and the Wikipedias. On Wikipedia, at least in my 10 years
of experience on en:wp, when you have multiple sources that differ you
highlight the existence of those sources and the conflict of information;
we don't decide what is right or wrong.

On 21 November 2015 at 06:35, Gerard Meijssen 
wrote:

> Hoi,
>  quality is different things  I do care about quality but I do
> not necessarily agree with you on how best to achieve it. Arguably bots are
> better at getting data into Wikidata than people. This means that the
> error rate of bots is typically lower than that of people. It is all in
> the percentages.
>
> I have always said that the best way to improve quality is by comparing
> sources. When Wikidata has no data, it is arguably better to import data
> from any source. When the quality is 90% correct, there is already 100%
> more data. When 100% is compared with another source and 85% is the same,
> you only have to check 15% and decide what is right. When you compare with
> two distinct sources, the percentage that differs changes again.. :) In
> this way it makes sense to check errors
>
> It does not help when you state that either party has people that care or
> do not care about quality. By providing a high likelihood that something is
> problematic, you will learn who actually makes a difference. It however
> started with having data to compare in the first place
> Thanks,
>   GerardM
>
> On 20 November 2015 at 14:50, Petr Kadlec  wrote:
>
> > On Fri, Nov 20, 2015 at 8:18 AM, Gerard Meijssen <
> > gerard.meijs...@gmail.com>
> > wrote:
> >
> > > When Wikipedia is a black box, not communicating about with the outside
> > > world, at some stage the situation becomes toxic. At this moment there are
> > > already those at Wikidata that argue not to bother about Wikipedia quality
> > > because in their view, Wikipedians do not care about its own quality.
> > >
> >
> > Right. When some users blindly dump random data to Wikidata, not
> > communicating about with the outside world, at some stage the situation
> > becomes toxic. At this moment there are already those at Wikipedia that
> > argue not to bother about Wikidata quality because in their view,
> > Wikidatans do not care about its own quality.
> >
> > For instance, take a look at
> > https://www.wikidata.org/wiki/User_talk:GerardM
> > https://www.wikidata.org/wiki/User_talk:GerardM/Archive_1
> >
> > Erm
> > -- [[cs:User:Mormegil | Petr Kadlec]]



-- 
GN.
President Wikimedia Australia
WMAU: http://www.wikimedia.org.au/wiki/User:Gnangarra
Photo Gallery: http://gnangarra.redbubble.com


Re: [Wikimedia-l] Quality issues

2015-11-20 Thread Milos Rancic
Offtopic: Gerard, for the last half an hour or so I have been getting
nothing but emails from you in this thread (including on the
wiki-research list). I thought my phone had a bug. It would be more
useful to write one longer email addressing all the issues. Among other
things, at this frequency you'll use up your monthly email quota for
this list by the day after tomorrow.

On Fri, Nov 20, 2015 at 11:35 PM, Gerard Meijssen
 wrote:
> Hoi,
>  quality is different things  I do care about quality but I do
> not necessarily agree with you on how best to achieve it. Arguably bots are
> better at getting data into Wikidata than people. This means that the
> error rate of bots is typically lower than that of people. It is all in
> the percentages.
>
> I have always said that the best way to improve quality is by comparing
> sources. When Wikidata has no data, it is arguably better to import data
> from any source. When the quality is 90% correct, there is already 100%
> more data. When 100% is compared with another source and 85% is the same,
> you only have to check 15% and decide what is right. When you compare with
> two distinct sources, the percentage that differs changes again.. :) In
> this way it makes sense to check errors
>
> It does not help when you state that either party has people that care or
> do not care about quality. By providing a high likelihood that something is
> problematic, you will learn who actually makes a difference. It however
> started with having data to compare in the first place
> Thanks,
>   GerardM
>
> On 20 November 2015 at 14:50, Petr Kadlec  wrote:
>
>> On Fri, Nov 20, 2015 at 8:18 AM, Gerard Meijssen <
>> gerard.meijs...@gmail.com>
>> wrote:
>>
>> > When Wikipedia is a black box, not communicating about with the outside
>> > world, at some stage the situation becomes toxic. At this moment there are
>> > already those at Wikidata that argue not to bother about Wikipedia quality
>> > because in their view, Wikipedians do not care about its own quality.
>> >
>>
>> Right. When some users blindly dump random data to Wikidata, not
>> communicating about with the outside world, at some stage the situation
>> becomes toxic. At this moment there are already those at Wikipedia that
>> argue not to bother about Wikidata quality because in their view,
>> Wikidatans do not care about its own quality.
>>
>> For instance, take a look at
>> https://www.wikidata.org/wiki/User_talk:GerardM
>> https://www.wikidata.org/wiki/User_talk:GerardM/Archive_1
>>
>> Erm
>> -- [[cs:User:Mormegil | Petr Kadlec]]


Re: [Wikimedia-l] Quality issues

2015-11-20 Thread Peter Southwood
How are you notifying the Wikipedias/Wikipedians? Do you leave a message on the 
talk page of the relevant article?
Cheers,
Peter

-Original Message-
From: Wikimedia-l [mailto:wikimedia-l-boun...@lists.wikimedia.org] On Behalf Of 
Gerard Meijssen
Sent: Saturday, 21 November 2015 12:23 AM
To: Wikimedia Mailing List
Subject: Re: [Wikimedia-l] Quality issues

Hoi,
So far such lists have been produced for bigger Wikipedias but essentially it 
is potentially an issue for any and all Wikis that have data that may exist on 
Wikidata or linked through Wikidata on external sources.
Thanks,
  GerardM

On 20 November 2015 at 12:33, Peter Southwood <peter.southw...@telkomsa.net>
wrote:

> Gerard,
> Who were you expecting would respond from the Wikipedias?
> Cheers,
> Peter
>
> -Original Message-
> From: Wikimedia-l [mailto:wikimedia-l-boun...@lists.wikimedia.org] On 
> Behalf Of Gerard Meijssen
> Sent: Friday, 20 November 2015 9:18 AM
> To: Wikimedia Mailing List; Research into Wikimedia content and 
> communities; WikiData-l
> Subject: [Wikimedia-l] Quality issues
>
> Hoi,
> At Wikidata we often find issues with data imported from a Wikipedia.
> Lists have been produced with these issues on the Wikipedia involved 
> and arguably they do present issues with the quality of Wikipedia or 
> Wikidata for that matter. So far hardly anything resulted from such outreach.
>
> When Wikipedia is a black box, not communicating about with the 
> outside world, at some stage the situation becomes toxic. At this 
> moment there are already those at Wikidata that argue not to bother 
> about Wikipedia quality because in their view, Wikipedians do not care about 
> its own quality.
>
> Arguably known issues with quality are the easiest to solve.
>
> There are many ways to approach this subject. It is indeed a quality 
> issue both for Wikidata and Wikipedia. It can be seen as a research 
> issue; how to deal with quality and how do such mechanisms function if at all.
>
> I blogged about it..
> Thanks,
>  GerardM
>
>
> http://ultimategerardm.blogspot.nl/2015/11/what-kind-of-box-is-wikipedia.html

Re: [Wikimedia-l] Quality issues

2015-11-20 Thread Gerard Meijssen
Hoi,
That is indeed a problem. So far it has been lists, often well-formatted
lists, that have no workflow and are not updated regularly. I have added
these issues as a wishlist item to work on. [1]

You have to appreciate that when a list of problematic issues has over
100 items, it is no longer easy or obvious that you want to post on and
follow 100 talk pages. This is one of the big differences between
Wikipedia thinking and Wikidata thinking. I care about a lot of data, data
that is linked. Analogous to the "Kevin Bacon steps of separation", I want
all items easily and obviously connected. That is another quality goal
for Wikidata.

Given the state of Wikipedia, where most subjects have an article, easy
and obvious tasks like fact checking and adding sources are exactly what
we are looking for to maintain our community. Add relevance to the
cocktail (we know that these facts are likely to have issues) and you can
appreciate why this may help us with both our quality and our community
issues.
Thanks,
 GerardM


[1]
https://meta.wikimedia.org/wiki/2015_Community_Wishlist_Survey#Visibility_for_quality_issues

On 21 November 2015 at 07:11, Peter Southwood <peter.southw...@telkomsa.net>
wrote:

> How are you notifying the Wikipedias/Wikipedians? Do you leave a message
> on the talk page of the relevant article?
> Cheers,
> Peter
>
> -Original Message-
> From: Wikimedia-l [mailto:wikimedia-l-boun...@lists.wikimedia.org] On
> Behalf Of Gerard Meijssen
> Sent: Saturday, 21 November 2015 12:23 AM
> To: Wikimedia Mailing List
> Subject: Re: [Wikimedia-l] Quality issues
>
> Hoi,
> So far such lists have been produced for bigger Wikipedias but essentially
> it is potentially an issue for any and all Wikis that have data that may
> exist on Wikidata or linked through Wikidata on external sources.
> Thanks,
>   GerardM
>
> On 20 November 2015 at 12:33, Peter Southwood <
> peter.southw...@telkomsa.net>
> wrote:
>
> > Gerard,
> > Who were you expecting would respond from the Wikipedias?
> > Cheers,
> > Peter
> >
> > -Original Message-
> > From: Wikimedia-l [mailto:wikimedia-l-boun...@lists.wikimedia.org] On
> > Behalf Of Gerard Meijssen
> > Sent: Friday, 20 November 2015 9:18 AM
> > To: Wikimedia Mailing List; Research into Wikimedia content and
> > communities; WikiData-l
> > Subject: [Wikimedia-l] Quality issues
> >
> > Hoi,
> > At Wikidata we often find issues with data imported from a Wikipedia.
> > Lists have been produced with these issues on the Wikipedia involved
> > and arguably they do present issues with the quality of Wikipedia or
> > Wikidata for that matter. So far hardly anything resulted from such
> > outreach.
> >
> > When Wikipedia is a black box, not communicating about with the
> > outside world, at some stage the situation becomes toxic. At this
> > moment there are already those at Wikidata that argue not to bother
> > about Wikipedia quality because in their view, Wikipedians do not care
> > about its own quality.
> >
> > Arguably known issues with quality are the easiest to solve.
> >
> > There are many ways to approach this subject. It is indeed a quality
> > issue both for Wikidata and Wikipedia. It can be seen as a research
> > issue; how to deal with quality and how do such mechanisms function if
> > at all.
> >
> > I blogged about it..
> > Thanks,
> >  GerardM
> >
> >
> > http://ultimategerardm.blogspot.nl/2015/11/what-kind-of-box-is-wikipedia.html