Dear all,

I would like to extract all links to external identifiers ("externalids") contained in a Wikidata dump (the truthy N-Triples version, wikidata_latest-truthy.nt.bz2).
I assumed I could rely on the prefix http://www.wikidata.org/prop/direct-normalized, so I ran the following ripgrep command:

    rg --search-zip -F "http://www.wikidata.org/prop/direct-normalized" wikidata_latest-truthy.nt.bz2 | pv > wikidata-extids.txt

However, the result contains slightly fewer than 29.5 million lines, and PubMed and DOI identifiers, which alone account for about 33 million statements, are missing. So I wonder whether I have correctly understood the meaning of "http://www.wikidata.org/prop/direct-normalized". Could someone give me an explanation or a pointer?

Cheers,
Ettore Rizza
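P.S. To see which properties actually appear under a given predicate prefix, rather than only a total line count, something like the following Python sketch might help. It is an illustration only, not the command used above: the function names count_properties and count_in_dump are hypothetical, and it assumes each line of the dump is a single triple whose subject, predicate and object are separated by single spaces. Unlike rg -F, it matches the prefix only in the predicate position.

```python
import bz2
from collections import Counter

# Prefix from the question, with a trailing slash so that only
# predicates in that namespace match.
DIRECT_NORMALIZED = "http://www.wikidata.org/prop/direct-normalized/"


def count_properties(lines, prefix=DIRECT_NORMALIZED):
    """Tally triples per property whose predicate IRI starts with `prefix`.

    `lines` is any iterable of N-Triples lines (assumed: one triple per
    line, terms separated by single spaces).
    """
    counts = Counter()
    opening = "<" + prefix
    for line in lines:
        parts = line.split(" ", 2)  # subject, predicate, rest of the line
        if len(parts) < 3:
            continue  # skip blank or malformed lines
        predicate = parts[1]
        if predicate.startswith(opening) and predicate.endswith(">"):
            # Keep only the local name, e.g. "P214".
            counts[predicate[len(opening):-1]] += 1
    return counts


def count_in_dump(path, prefix=DIRECT_NORMALIZED):
    """Stream a bz2-compressed N-Triples dump and tally matching predicates."""
    with bz2.open(path, "rt", encoding="utf-8") as dump:
        return count_properties(dump, prefix)
```

Calling count_in_dump("wikidata_latest-truthy.nt.bz2") and then the same function with prefix "http://www.wikidata.org/prop/direct/" would give per-property counts for both namespaces, which makes it easy to check whether a given property is present in one and absent from the other.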
_______________________________________________
Wikidata mailing list
https://lists.wikimedia.org/mailman/listinfo/wikidata
