Sure, working with linked data is great. But sometimes data is not linked
at all and has no identifiers...
That's where the work Antonin is doing with OpenRefine helps: it supports
reconciliation even when there are no identifiers other than a name. Many
datasets only have Strings as Things. In fact, I'd say it's quite useful not
only to *add more statements about the Things* we already have, but *also to
add more Things* in the world that have yet to be included in a database
like Wikidata, where no identifiers have been created for them yet.
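To make name-only reconciliation a bit more concrete, here is a minimal Python sketch of matching a bare name against candidate labels. It is not how OpenRefine or its reconciliation services actually score matches; it only normalizes names (case, accents, whitespace) and ranks candidates with the standard library's `difflib.SequenceMatcher`. The candidate dictionary, the item IDs and the 0.85 threshold are all hypothetical.

```python
import difflib
import unicodedata


def normalize(name):
    """Lowercase, strip accents and collapse whitespace, so that
    'Antonín  Dvořák' and 'antonin dvorak' become the same string."""
    decomposed = unicodedata.normalize("NFKD", name)
    without_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
    return " ".join(without_accents.lower().split())


def reconcile(name, candidates, threshold=0.85):
    """Return (item_id, score) pairs whose label is similar to `name`,
    best match first. `candidates` maps a hypothetical item ID to a label."""
    target = normalize(name)
    matches = []
    for item_id, label in candidates.items():
        score = difflib.SequenceMatcher(None, target, normalize(label)).ratio()
        if score >= threshold:
            matches.append((item_id, round(score, 3)))
    matches.sort(key=lambda match: match[1], reverse=True)
    return matches


# Hypothetical candidate labels keyed by made-up item IDs.
candidates = {
    "item-1": "Antonín Dvořák",
    "item-2": "Anton Bruckner",
    "item-3": "Antonín Dvořák (composer)",
}
print(reconcile("antonin dvorak", candidates))  # [('item-1', 1.0)]
```

Note that "Antonín Dvořák (composer)" falls below the threshold here because of its disambiguation suffix, which is one reason string-only matches still need a human (or a crowd) to confirm them.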
And I'm just as upset as you are about the goldmine of data still locked up
in Freebase. But don't worry, baby steps, and eventually that data will
make its way into Wikidata. Getting the Primary Sources tool up to par is
a big step towards that, but certainly not the end of the line.
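On the identifier side of this thread, the redirect handling Gerard describes in the quoted message below (replacing identifiers that have become redirects, then spotting items that turn out to share an identifier) can be sketched in a few lines of Python. This assumes a hypothetical redirect file already loaded into an old-to-new dictionary; the item names and Open Library-style IDs are made up, and this is not any particular Wikidata or Open Library tool.

```python
from collections import defaultdict


def resolve(identifier, redirects):
    """Follow old -> new redirects until reaching an identifier that is not
    itself redirected. Raises ValueError if the redirects form a cycle."""
    seen = {identifier}
    current = identifier
    while current in redirects:
        current = redirects[current]
        if current in seen:
            raise ValueError(f"redirect cycle involving {current!r}")
        seen.add(current)
    return current


def update_identifiers(items, redirects):
    """Resolve each item's stored identifier and report resolved identifiers
    now shared by more than one item (likely duplicates to merge)."""
    resolved = {item: resolve(old, redirects) for item, old in items.items()}
    by_target = defaultdict(list)
    for item, target in resolved.items():
        by_target[target].append(item)
    duplicates = {t: sorted(group) for t, group in by_target.items() if len(group) > 1}
    return resolved, duplicates


# Hypothetical data: two items point at records that were merged into OL9A.
items = {"item-a": "OL1A", "item-b": "OL2A", "item-c": "OL3A"}
redirects = {"OL1A": "OL9A", "OL2A": "OL9A"}
resolved, duplicates = update_identifiers(items, redirects)
print(resolved)    # {'item-a': 'OL9A', 'item-b': 'OL9A', 'item-c': 'OL3A'}
print(duplicates)  # {'OL9A': ['item-a', 'item-b']}
```

The cycle check makes bad redirect data fail loudly instead of looping forever.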
On Tue, Aug 8, 2017 at 6:57 AM Gerard Meijssen <gerard.meijs...@gmail.com> wrote:
> Given that Wikidata has identifiers for many external sources, reconciliation
> is often less of a challenge for crowds, and less of a challenge than it
> seems. A few examples: OCLC maintains two distinct identifiers, VIAF and
> ISNI, and both are actively maintained. When we include VIAF numbers in
> Wikidata, some of those identifiers will later become redirects. The same
> is true for ISNI. When we have the latest VIAF numbers, the ISNI numbers
> are highly likely to be correct (better than 95%, the minimum requirement
> for imports at ISNI).
> When we share our identifiers regularly, we learn about redirects and
> gain the direct links. We shared our identifiers and VIAF identifiers with
> the Open Library. They now include them, and in return we received a file
> that helped us deduplicate our Open Library identifiers and replace the
> redirects. What is infuriating is that there are Open Library identifiers
> hidden in the Freebase data. They cannot be exported, so we cannot send them
> to OL for processing and then import them into Wikidata. We do a subpar job as a
> Another project where we will gain information from multiple sources is
> the Biodiversity Heritage Library. We may gain links through their
> collaboration with the Internet Archive and OCLC. Because of the shared
> identifiers, this will reduce the chance of introducing duplicates at our
> end. It will also reduce the number of people we have to process before
> they are included in Wikidata. And it will allow OCLC, BHL and IA to learn
> of identifiers as soon as we have them, allowing for subsequent
> improvements in quality for all of us.
> So in my opinion we should aggressively share identifiers, collaborate,
> seek out redirects and replace them, and increasingly become a focal point
> for links between resources.
> On 8 August 2017 at 11:13, Marco Fossati <foss...@spaziodati.eu> wrote:
>> Hi Antonin,
>> On 8/7/17 20:36, Antonin Delpeuch (lists) wrote:
>>> Does anybody know an alternative to CrowdFlower that can be used for
>>> free with volunteer workers?
>> There you go: https://crowdcrafting.org/
>> Hope this helps you keep up with your great work on OpenRefine.
>> I believe entity reconciliation is one of the most challenging tasks
>> keeping third-party data providers from importing their data into Wikidata.
>> Wikidata mailing list