Hi everyone,

Benno (in CC) has recently announced this tool:
https://tools.wmflabs.org/wdumps/

I haven't checked it out yet, but it sounds related to Aidan's inquiry.
Hope this helps.

Cheers,

Marco

On 12/18/19 8:01 AM, Edgard Marx wrote:
+1

On Tue, Dec 17, 2019, 19:14 Aidan Hogan <[email protected]> wrote:

    Hey all,

    As someone who likes to use Wikidata in their research, and likes to
    give students projects relating to Wikidata, I am finding it more and
    more difficult to work with (or recommend working with) recent versions
    of Wikidata due to the increasing dump sizes: even the truthy version
    now costs considerable time and machine resources to process and
    handle. In some cases we just grin and bear the costs, while in other
    cases we apply ad hoc sampling to be able to play around with the data
    and try things quickly.

    More generally, I think the growing data volumes might inadvertently
    scare people off taking the dumps and using them in their research.

    One idea we had recently to reduce the data size for a student project
    while keeping the most notable parts of Wikidata was to keep only the
    claims whose items are all linked to Wikipedia; in other words, if a
    statement involves a Q item (as "subject" or "object") that is not
    linked to Wikipedia, the statement is removed.
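A minimal Python sketch of the filter described above, assuming the dump is in N-Triples form and that Wikipedia sitelinks appear as `<article IRI> <http://schema.org/about> <item IRI>` triples, as in the Wikidata RDF dump format. That IRI layout is an assumption worth checking against the specific dump file; if the chosen dump omits sitelinks, the set of linked items would have to come from another source. The names `linked_items` and `concise` are hypothetical. The first pass collects every Q item that some Wikipedia article is about; the second pass drops any triple whose subject or object is a Q item outside that set.

```python
import re
from typing import Iterable, Iterator, Optional, Set, Tuple

# Assumed IRI conventions of the Wikidata RDF dumps (verify against your dump).
ENTITY_PREFIX = "http://www.wikidata.org/entity/Q"
SCHEMA_ABOUT = "http://schema.org/about"
WIKIPEDIA_ARTICLE = re.compile(r"^https://[a-z0-9-]+\.wikipedia\.org/wiki/")

# Loose N-Triples pattern: <subject> <predicate> object .
TRIPLE = re.compile(r"^<([^>]*)>\s+<([^>]*)>\s+(.+?)\s*\.\s*$")


def parse(line: str) -> Optional[Tuple[str, str, str]]:
    """Return (subject, predicate, object), or None if the line does not match.

    IRI objects lose their angle brackets; literals are returned as written.
    """
    m = TRIPLE.match(line)
    if not m:
        return None
    s, p, o = m.groups()
    if o.startswith("<") and o.endswith(">"):
        o = o[1:-1]
    return s, p, o


def linked_items(lines: Iterable[str]) -> Set[str]:
    """Pass 1: collect the Q items that some Wikipedia article is schema:about."""
    items: Set[str] = set()
    for line in lines:
        t = parse(line)
        if t and t[1] == SCHEMA_ABOUT and WIKIPEDIA_ARTICLE.match(t[0]):
            items.add(t[2])
    return items


def concise(lines: Iterable[str], keep: Set[str]) -> Iterator[str]:
    """Pass 2: drop every triple whose subject or object is an unlinked Q item.

    Lines the loose pattern does not recognise (comments, blank nodes) are dropped.
    """
    for line in lines:
        t = parse(line)
        if t is None:
            continue
        s, _, o = t
        if any(x.startswith(ENTITY_PREFIX) and x not in keep for x in (s, o)):
            continue
        yield line
```

For a real dump the input would be streamed twice, e.g. by passing two separate `gzip.open(path, "rt")` iterators, one to each function, so memory use is dominated by the set of linked item IRIs rather than by the dump itself.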

    I wonder whether Wikidata could provide such a dump for download (e.g.,
    in RDF), for people who prefer to work with a more concise sub-graph
    that still retains the most "notable" parts. While of course one could
    compute this from the full dump locally, making such a version
    available as a dump directly would save clients some resources,
    potentially encourage more research using/on Wikidata, and, being
    "rubber-stamped" by Wikidata, would also help to justify the use of
    such a dataset for research purposes.

    ... just an idea I thought I would float out there. Perhaps there is
    another (better) way to define a concise dump.

    Best,
    Aidan

    _______________________________________________
    Wikidata mailing list
    [email protected] <mailto:[email protected]>
    https://lists.wikimedia.org/mailman/listinfo/wikidata
