Given that Wikidata has identifiers for many external sources, the challenge
of reconciliation is often less of a challenge for crowds and less of a
challenge than it needs to be. A few examples: the OCLC maintains two
distinct identifiers, VIAF and ISNI. They are both actively maintained.
When we include VIAF numbers in Wikidata, there will be instances where the
identifiers become redirects. The same is true for ISNI. When we have the
latest VIAF numbers, the ISNI numbers are highly likely to be correct
(better than 95%, the minimum requirement for imports at ISNI).
When we share our identifiers regularly, we will learn about redirects and
gain the direct links. We shared our identifiers and VIAF identifiers with
the Open Library. They now include them, and in return we received a file
that helped us deduplicate our Open Library identifiers and replace the
redirects. What is infuriating is that there are Open Library identifiers
hidden in the Freebase data. They cannot be exported, so we cannot send them
to OL for processing and import them into Wikidata. We do a subpar job as a
Another project where we will gain information from multiple sources is
the Biodiversity Heritage Library. We may gain links through their
collaboration with the Internet Archive and the OCLC. This will reduce the
chances for the introduction of duplicates at our end because of shared
identifiers. It will also reduce the number of people we have to process
before they are included in Wikidata. It will allow OCLC, BHL and
IA to learn of identifiers as we have them, allowing for subsequent
improvement in quality in the future for all of us.
So in my opinion we should aggressively share identifiers, collaborate,
seek out the redirects and replace them, and become more and more a focal
point for links between resources.
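To make the "seek the redirects and replace them" step concrete, here is a minimal sketch in Python. It assumes we hold a mapping from our items to external identifiers and that the partner sends back a table of redirects (old identifier to current identifier). The function names, the data layout and the sample identifiers are all placeholders of mine, not the format of any file we actually received from the Open Library or anyone else.

```python
from collections import defaultdict


def resolve_redirect(identifier, redirects):
    """Follow a chain of redirects until an identifier has no further redirect.

    `redirects` maps an old identifier to its replacement (assumed layout).
    A `seen` set stops the walk if the table contains a cycle.
    """
    seen = set()
    while identifier in redirects and identifier not in seen:
        seen.add(identifier)
        identifier = redirects[identifier]
    return identifier


def replace_redirects(links, redirects):
    """Replace redirected identifiers and report likely duplicates.

    `links` maps one of our items to one external identifier.
    Returns the updated links, plus every external identifier that is
    now claimed by more than one of our items (candidates for a merge).
    """
    updated = {item: resolve_redirect(ext, redirects) for item, ext in links.items()}

    items_by_external = defaultdict(list)
    for item, ext in updated.items():
        items_by_external[ext].append(item)

    duplicates = {
        ext: sorted(items)
        for ext, items in items_by_external.items()
        if len(items) > 1
    }
    return updated, duplicates


# Placeholder data, not real identifiers.
links = {"item-1": "OL-new", "item-2": "OL-old", "item-3": "OL-other"}
redirects = {"OL-old": "OL-new"}

updated, duplicates = replace_redirects(links, redirects)
print(updated)
print(duplicates)
```

Here "item-2" ends up pointing at "OL-new" as well, so "item-1" and "item-2" are flagged together as a probable duplicate at our end. Whether they really are the same person is still something a human or a bot with more context has to decide.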
On 8 August 2017 at 11:13, Marco Fossati <foss...@spaziodati.eu> wrote:
> Hi Antonin,
> On 8/7/17 20:36, Antonin Delpeuch (lists) wrote:
>> Does anybody know an alternative to CrowdFlower that can be used for
>> free with volunteer workers?
> There you go: https://crowdcrafting.org/
> Hope this helps you keep up with your great work on openrefine.
> I believe entity reconciliation is one of the most challenging tasks that
> keep third-party data providers away from imports to Wikidata.
> Wikidata mailing list