> I really feel we are drowning in a glass of water.
> The issue of "data quality" or "reliability" that Andreas raises is well
> known:
> what I don't understand is whether the "scale" of it is much bigger on
> Wikidata than Wikipedia,
> and if this different scale makes it much more important. The scale of the
> issue is maybe something worth discussing, and not the issue itself? Is the
> fact that Wikidata is centralised different from statements on Wikipedia? I
> don't know, but to me this is a more neutral and interesting question.

Wikidata's (envisaged) centralised nature certainly makes a difference,
because the promise was that it would inform the Wikipedias.

Wikipedia started out with people just writing from their personal
knowledge. The early articles had no footnotes. Then after a while people
noticed problems like cranks filling pages with their abstruse theories
(hence the ban on original research), people adding material from their
blogs, etc. Over the course of a decade, Wikipedia developed the idea and
the culture that you have to cite a professionally published source for
everything you add to Wikipedia.

Wikidata is in its early stages. In a way it really is like Wikipedia in
2003. New content welcome! No references required!

But at the same time, Wikidata is supposed to inform the Wikipedias, as a
central data repository. This creates a mismatch between Wikidata's "early
days -- anything goes, let's just get content in, we'll sort it out later"
attitude and the relatively mature Wikipedias where editors insist on
sources for any new content added.

Being out of sync like this is a real problem if you want Wikipedias to actually
use Wikidata content. Wikipedians will not accept content generation models
that take Wikipedia back to its bad old days where you could write anything
you liked without a source to back it up.

Wikipedia is, of course, still a long way from citing such sources for
all its content. There are vast amounts of legacy material left over from
the early days. But in the pages that are being created now (like
developing news stories, an area where the quality of Wikipedia's coverage
is often praised), pages that see a lot of traffic, pages that are
controversial, etc., it is well established that you have to cite sources
for any new assertions.

Unsourced content is unceremoniously deleted.

If Wikipedia's reputation for reliability has improved since 2003, that
change in culture from the early days is the reason.

The Age, for example, published an article the other day that is probably one
of the most celebratory articles ever written about Wikipedia.[1] If you're
a Wikipedian, you'll probably enjoy reading it.

Among the aspects that the author, Elizabeth Farrelly, said she liked most
about Wikipedia was "its ruthless commitment to the printed, demonstrable
source." She ended the article as follows:


But most interesting to me is the ban on primary research. The demand that
every input be traced to a published and authoritative source doesn't make
it true, necessarily, but does enable genuine crowd-sourcing of
scholarship. This is a revelation, and a revolution.

So yes, Wikipedia is flawed. Above all, it needs more female input. But the
obvious response, for you-and-me users who encounter something stupid or
biased or just plain wrong, is to hop in there and fix it. I'll see you
there, yes? Oh, and honey? Cite away!


Abandoning the principles that have elicited such praise -- traceability to
published sources, verifiable citations -- is not something Wikipedians
will entertain. To them, it would be a step back. If Wikidata wants to be
an input to Wikipedia, it will have to bear that in mind.


> I often say that the Wikimedia world made quality a "Heisenbergian"
> feature: you always have to check if it's there.
> The point is: it's been always like this.
> We always had to check for quality, even when we used Britannica or
> authority controls or whatever "reliable" sources we wanted. Wikipedia, and
> now Wikidata, is made for everyone to contribute, it's open and honest in
> being open, vulnerable, prone to errors. But we are transparent, we say
> that in advance, we can claim any statement to the smallest detail. Of
> course it's difficult, but we can do it. Wikidata, as Lydia said, can
> actually have conflicting statements in every item: we "just" have to put
> them there, as we did to Wikipedia.
> If Google uses our data and they are wrong, that's bad for them. If they
> correct the errors and do not give us the corrections, that's bad for us
> and not ethical from them. The point is: there is no license (for what I
> know) that can force them to contribute to Wikidata. That is, IMHO, the
> problem with "over-the-top" actors: they can harness collective intelligence
> and "not give back." Even with CC-BY-SA, they could store (as they are
> probably already doing) all the data in their knowledge vault, which is
> secret as it is an incredible asset for them.
> I'd be happy to insert a new clause of "forced transparency" in CC-BY-SA or
> CC0, but it's not there.
> So, as we are working via GLAMs with Wikipedia for getting reliable
> sources and content, we are working with them also for good statements and
> data. Putting good data in Wikidata makes it better, and I don't understand
> what is the problem here (I understand, again, the issue of putting too
> much data and still having a small community).
> For example: if we are importing different reliable databases, and the
> institutions behind them find it useful and helpful to have an aggregator
> of identifiers and authority controls, what is the issue? There is value in
> aggregating data, because you can spot errors and inconsistencies. It's not
> easy, of course, to find a good workflow, but, again, that is *another*
> problem.
> So, in conclusion: I find many issues in Wikidata, but not on the
> mission/vision, just in the complexity of the project, the size of the
> dataset, the size of the community.
> Can we talk about those?
> Aubrey
