+1

On Tue, Dec 17, 2019, 19:14 Aidan Hogan <aid...@gmail.com> wrote:
> Hey all,
>
> As someone who likes to use Wikidata in their research, and likes to
> give students projects relating to Wikidata, I am finding it more and
> more difficult to (recommend to) work with recent versions of Wikidata
> due to the increasing dump sizes, where even the truthy version now
> costs considerable time and machine resources to process and handle. In
> some cases we just grin and bear the costs, while in other cases we
> apply an ad hoc sampling to be able to play around with the data and try
> things quickly.
>
> More generally, I think the growing data volumes might inadvertently
> scare people off taking the dumps and using them in their research.
>
> One idea we had recently to reduce the data size for a student project
> while keeping the most notable parts of Wikidata was to only keep claims
> that involve an item linked to Wikipedia; in other words, if the
> statement involves a Q item (in the "subject" or "object") not linked to
> Wikipedia, the statement is removed.
>
> I wonder would it be possible for Wikidata to provide such a dump to
> download (e.g., in RDF) for people who prefer to work with a more
> concise sub-graph that still maintains the most "notable" parts? While
> of course one could compute this from the full-dump locally, making such
> a version available as a dump directly would save clients some
> resources, potentially encourage more research using/on Wikidata, and
> having such a version "rubber-stamped" by Wikidata would also help to
> justify the use of such a dataset for research purposes.
>
> ... just an idea I thought I would float out there. Perhaps there is
> another (better) way to define a concise dump.
>
> Best,
> Aidan
>
> _______________________________________________
> Wikidata mailing list
> Wikidata@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikidata
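
For anyone who wants to try the filtering idea quoted above on a local
copy, here is a minimal sketch in Python. It is only an illustration under
some assumptions, not an official or tested pipeline: it assumes the input
is N-Triples, that a Wikipedia link shows up as a triple of the form
<wikipedia-url> schema:about <item>, and that naive splitting on spaces is
good enough for the lines involved. The function names (split_triple,
linked_items, concise) and the item Q999999999 in the sample are made up
for the example.

```python
import re

# Assumption: items are identified by IRIs of this shape.
ENTITY = re.compile(r"^<http://www\.wikidata\.org/entity/(Q\d+)>$")
# Assumption: a sitelink is "<wikipedia page> schema:about <item>".
ABOUT = "<http://schema.org/about>"
WIKIPEDIA = re.compile(r"^<https://[a-z0-9\-]+\.wikipedia\.org/wiki/")


def split_triple(line):
    """Naively split one N-Triples line into (subject, predicate, object)."""
    s, p, rest = line.strip().split(" ", 2)
    o = rest.rstrip()
    if o.endswith("."):
        o = o[:-1].rstrip()  # drop the terminating " ."
    return s, p, o


def linked_items(lines):
    """First pass: collect the Q ids that have at least one Wikipedia sitelink."""
    linked = set()
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        s, p, o = split_triple(line)
        m = ENTITY.match(o)
        if p == ABOUT and m and WIKIPEDIA.match(s):
            linked.add(m.group(1))
    return linked


def concise(lines, linked):
    """Second pass: drop any triple whose subject or object is an unlinked Q item."""
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        s, _, o = split_triple(line)
        qids = [m.group(1) for m in (ENTITY.match(s), ENTITY.match(o)) if m]
        if all(q in linked for q in qids):
            yield line


# Tiny hand-written sample; Q999999999 is a hypothetical item with no sitelink.
sample = [
    '<https://en.wikipedia.org/wiki/Douglas_Adams> <http://schema.org/about> <http://www.wikidata.org/entity/Q42> .',
    '<https://en.wikipedia.org/wiki/Human> <http://schema.org/about> <http://www.wikidata.org/entity/Q5> .',
    '<http://www.wikidata.org/entity/Q42> <http://www.wikidata.org/prop/direct/P31> <http://www.wikidata.org/entity/Q5> .',
    '<http://www.wikidata.org/entity/Q42> <http://www.w3.org/2000/01/rdf-schema#label> "Douglas Adams"@en .',
    '<http://www.wikidata.org/entity/Q42> <http://www.wikidata.org/prop/direct/P19> <http://www.wikidata.org/entity/Q999999999> .',
]
linked = linked_items(sample)
kept = list(concise(sample, linked))
```

On a real dump the two passes would each stream over the file rather than
a list, so only the set of linked Q ids needs to fit in memory. Triples
that mention no Q item at all are kept here; whether that is desirable is
a design choice left open by the description above.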