Dear all,

I would like to extract all links to "externalids" contained in a Wikidata
dump (the truthy N-Triples version, wikidata_latest-truthy.nt.bz2).

I thought I could trust the prefix
http://www.wikidata.org/prop/direct-normalized, so I used the following
ripgrep command:

rg --search-zip -F "http://www.wikidata.org/prop/direct-normalized" \
    wikidata_latest-truthy.nt.bz2 | pv > wikidata-extids.txt

But the result is a little under 29.5 million lines. PubMed and DOI
identifiers, which alone account for about 33 million statements, are not
included.

So I wonder whether I have correctly understood the meaning of
"http://www.wikidata.org/prop/direct-normalized". Can someone give me an
explanation or a pointer?
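
In case it helps whoever answers, here is a minimal Python sketch of a check
I could run on the dump: it tallies the triples by predicate namespace, which
should show under which prefix the PubMed and DOI statements actually appear.
The function names are my own, I have not run it on the full dump, and it
assumes one triple per line with whitespace-separated terms.

```python
import bz2
from collections import Counter


def predicate_namespace(line):
    """Return the namespace of the predicate IRI in one N-Triples line, or None."""
    # Subject and predicate contain no whitespace, so two splits are enough;
    # the object (which may contain spaces) stays in the third part.
    parts = line.split(None, 2)
    if len(parts) < 3:
        return None
    pred = parts[1]
    if not (pred.startswith("<") and pred.endswith(">")):
        return None
    iri = pred[1:-1]
    # Namespace = everything up to and including the last "/" or "#".
    cut = max(iri.rfind("/"), iri.rfind("#"))
    return iri[:cut + 1]


def count_namespaces(lines):
    """Count triples per predicate namespace over an iterable of lines."""
    counts = Counter()
    for line in lines:
        ns = predicate_namespace(line)
        if ns is not None:
            counts[ns] += 1
    return counts


def count_file(path):
    """Stream a .nt.bz2 file without decompressing it to disk first."""
    with bz2.open(path, "rt", encoding="utf-8") as handle:
        return count_namespaces(handle)
```

Calling count_file("wikidata_latest-truthy.nt.bz2") and printing the result
of .most_common() would list the namespaces by number of triples. It will be
much slower than ripgrep, so it is only meant as a sanity check.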

Cheers,

Ettore Rizza
_______________________________________________
Wikidata mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/wikidata
