On Wed, Nov 25, 2020 at 4:41 PM Thad Guidry <[email protected]> wrote:
>
> Gerhard,
>
> I'm curious what you mean by "processing" and "comb through".
> Can you describe how your processing and what system or database the output 
> gets loaded into?

I'm doing embarrassingly little with the data yet, and there is no
regular database involved.  The processor mainly looks for properties
I defined beforehand and transcribes the relevant information into TSV
files; that's the "comb through" part of my mail.  The processor reads
each line of the decompressed dump stream, where each line represents
one Wikidata item, looks for those properties, and also writes each
item, individually compressed, into output files which are later
indexed so that I can access an item directly if I want to look at
one later.  That's about all.
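For illustration, the line-by-line scan could be sketched roughly like
this in Python (my processor may differ; the property IDs P227/GND,
P214/VIAF and P496/ORCID are the real Wikidata properties, but the
function and file names here are hypothetical):

```python
import json

# Properties to extract: P227 (GND), P214 (VIAF), P496 (ORCID)
# are the actual Wikidata property IDs for these identifiers.
WANTED = {"P227": "GND", "P214": "VIAF", "P496": "ORCID"}

def claim_values(item, prop):
    """Yield the string values of one property's claims, if any."""
    for claim in item.get("claims", {}).get(prop, []):
        value = claim.get("mainsnak", {}).get("datavalue", {}).get("value")
        if isinstance(value, str):
            yield value

def process(lines):
    """Scan dump lines (one JSON item per line) and yield TSV rows."""
    for line in lines:
        line = line.strip().rstrip(",")  # items are JSON array elements
        if line in ("[", "]", ""):       # skip the array brackets
            continue
        item = json.loads(line)
        for prop, name in WANTED.items():
            for value in claim_values(item, prop):
                yield f"{item['id']}\t{name}\t{value}"

# Usage sketch: feed it the decompressed dump stream, e.g.
#   for row in process(open("latest-all.json")): print(row)
```

(The per-item recompression and indexing step is omitted here.)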

> Perhaps you have your scripts publicly available on something like GitHub?

Yes, it is available on GitHub:
https://github.com/gonter/wikidata-dump-processor

> It would be nice to know a bit more on what you also are doing.  Thanks in 
> advance!

Mainly I'm looking for items with GND identifiers and related
identifiers such as VIAF, ORCID, etc.  However, this data is not
currently used anywhere; maybe I'll do that later.

regards, Gerhard Gonter

_______________________________________________
Wikidata mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/wikidata