Hello all!

On Tue, Dec 17, 2019 at 6:15 PM Aidan Hogan <[email protected]> wrote:
>
> Hey all,
>
> As someone who likes to use Wikidata in their research, and likes to
> give students projects relating to Wikidata, I am finding it more and
> more difficult to (recommend to) work with recent versions of Wikidata
> due to the increasing dump sizes, where even the truthy version now
> costs considerable time and machine resources to process and handle.

Maybe that is a software problem? What tools do you use to process the dump?

> More generally, I think the growing data volumes might inadvertently
> scare people off taking the dumps and using them in their research.
>
> One idea we had recently to reduce the data size for a student project
> while keeping the most notable parts of Wikidata was to only keep claims
> that involve an item linked to Wikipedia; in other words, if the
> statement involves a Q item (in the "subject" or "object") not linked to
> Wikipedia, the statement is removed.

A similar scheme would be to only keep concepts that are part of the
Wikipedia vital articles [0] and their neighbors (to be defined).

[0] https://en.wikipedia.org/wiki/Wikipedia:Vital_articles/Level/5
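One way "neighbors" could be defined is everything within a fixed number of statement links of the vital set. A hypothetical sketch (the `claims` edge list and `hops` parameter are assumptions, not an existing tool):

```python
def expand_seed(seed_qids, claims, hops=1):
    """Keep the seed QIDs plus every item within `hops` statement links.

    seed_qids: QIDs of the vital-article items.
    claims: iterable of (subject_qid, object_qid) pairs extracted from
    Wikidata statements, treated as undirected edges here.
    """
    adjacency = {}
    for s, o in claims:
        adjacency.setdefault(s, set()).add(o)
        adjacency.setdefault(o, set()).add(s)

    kept = set(seed_qids)
    frontier = set(seed_qids)
    for _ in range(hops):
        # Expand the frontier by one hop, skipping items already kept.
        nxt = set()
        for qid in frontier:
            nxt |= adjacency.get(qid, set())
        nxt -= kept
        kept |= nxt
        frontier = nxt
    return kept
```

So with hops=1 you keep the vital items and anything they directly link to; larger values grow the subgraph quickly.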

Regarding Wikipedia vital articles (I only know the English version),
the problem is that they are not available in a structured format.
A few months back I made a proposal to add that information to Wikidata,
but got no feedback. There is https://www.wikidata.org/wiki/Q43375360.
Not sure where to go from there.
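For the sitelink-based scheme you describe, a first pass could stream the JSON full dump (one entity per line) and keep entities that have at least one Wikipedia sitelink. A rough sketch; note the `endswith("wiki")` check is only a heuristic, since keys like "commonswiki" or "specieswiki" also match and would need excluding in a real run:

```python
import json

def has_wikipedia_sitelink(entity):
    # Wikidata entity JSON carries a "sitelinks" mapping whose keys name
    # the linked wiki, e.g. "enwiki", "frwiki". Rough heuristic: treat
    # any key ending in "wiki" as a Wikipedia edition.
    return any(key.endswith("wiki") for key in entity.get("sitelinks", {}))

def filter_dump(lines):
    # The full JSON dump is a big array with one entity per line
    # ("[" and "]" on their own lines, entities separated by trailing
    # commas), so it can be filtered without loading it all into memory.
    for line in lines:
        line = line.rstrip().rstrip(",")
        if line in ("[", "]", ""):
            continue
        entity = json.loads(line)
        if has_wikipedia_sitelink(entity):
            yield entity
```

Dropping statements whose *object* item also lacks a sitelink (the second half of the proposal) would then need a second pass over the kept entities' claims.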

> I wonder would it be possible for Wikidata to provide such a dump to
> download (e.g., in RDF) for people who prefer to work with a more
> concise sub-graph that still maintains the most "notable" parts?

The best thing would be to allow people to create their own lists of
vital Wikidata concepts, similar to the custom Wikipedia vital lists,
taking inspiration from the tool that was released recently.

> While
> of course one could compute this from the full-dump locally, making such
> a version available as a dump directly would save clients some
> resources, potentially encourage more research using/on Wikidata, and
> having such a version "rubber-stamped" by Wikidata would also help to
> justify the use of such a dataset for research purposes.

I agree.

> ... just an idea I thought I would float out there. Perhaps there is
> another (better) way to define a concise dump.
>
> Best,
> Aidan
>
> _______________________________________________
> Wikidata mailing list
> [email protected]
> https://lists.wikimedia.org/mailman/listinfo/wikidata



-- 
Amirouche ~ https://hyper.dev
