+1

On Tue, Dec 17, 2019, 19:14 Aidan Hogan <aid...@gmail.com> wrote:

> Hey all,
>
> As someone who likes to use Wikidata in their research, and likes to
> give students projects relating to Wikidata, I am finding it
> increasingly difficult to work with (or recommend) recent versions of
> Wikidata due to the growing dump sizes; even the truthy version now
> costs considerable time and machine resources to process and handle. In
> some cases we just grin and bear the costs, while in other cases we
> apply ad hoc sampling to be able to play around with the data and try
> things quickly.
>
> More generally, I think the growing data volumes might inadvertently
> scare people off taking the dumps and using them in their research.
>
> One idea we had recently to reduce the data size for a student project
> while keeping the most notable parts of Wikidata was to only keep claims
> that involve an item linked to Wikipedia; in other words, if the
> statement involves a Q item (in the "subject" or "object") not linked to
> Wikipedia, the statement is removed.
>
> I wonder whether it would be possible for Wikidata to provide such a
> dump for download (e.g., in RDF) for people who prefer to work with a
> more concise sub-graph that still maintains the most "notable" parts.
> While one could of course compute this locally from the full dump,
> making such a version available directly would save clients some
> resources and potentially encourage more research using/on Wikidata.
> Having such a version "rubber-stamped" by Wikidata would also help to
> justify the use of such a dataset for research purposes.
>
> ... just an idea I thought I would float out there. Perhaps there is
> another (better) way to define a concise dump.
>
> Best,
> Aidan
>
> _______________________________________________
> Wikidata mailing list
> Wikidata@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikidata
>
>
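
For concreteness, here is a minimal sketch of the filter described in the
quoted message: drop any statement whose subject or object is a Q item
without a Wikipedia sitelink. It is only an illustration under stated
assumptions, not an official Wikidata tool. It assumes the triples have
already been parsed into (subject, predicate, object) string tuples, and
that sitelinks appear as "<article URL> schema:about <item>" triples with
the article hosted under *.wikipedia.org. The function names (is_item,
linked_items, concise) are made up for this example.

```python
from urllib.parse import urlparse

ENTITY = "http://www.wikidata.org/entity/"
ABOUT = "http://schema.org/about"


def is_item(term):
    """True if the term is a Wikidata item IRI such as .../entity/Q42."""
    local = term[len(ENTITY):] if term.startswith(ENTITY) else ""
    return local.startswith("Q") and local[1:].isdigit()


def linked_items(triples):
    """Collect items that are the schema:about target of a Wikipedia page."""
    linked = set()
    for s, p, o in triples:
        host = urlparse(s).hostname or ""
        # Assumption: only *.wikipedia.org pages count as "linked to Wikipedia".
        if p == ABOUT and host.endswith(".wikipedia.org") and is_item(o):
            linked.add(o)
    return linked


def concise(triples):
    """Keep triples in which every Q item (subject or object) is linked."""
    triples = list(triples)          # two passes: collect links, then filter
    linked = linked_items(triples)

    def ok(term):
        # Literals, properties and non-item IRIs never disqualify a triple.
        return not is_item(term) or term in linked

    return [t for t in triples if ok(t[0]) and ok(t[2])]
```

On a real dump the same two passes would have to be streamed from disk
rather than held in a list, since the set of linked items is needed before
any statement can be filtered.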