On Sat, 21 Dec 2019 at 17:25, Lydia Pintscher <[email protected]> wrote:
> On Thu, Dec 19, 2019 at 11:16 PM Aidan Hogan <[email protected]> wrote:
> > - @Lydia, good point! I was thinking that filtering by wikilinks will
> > just drop some more obscure nodes (like Q51366847 for example), but had
> > not considered that there are some more general "concepts" that do not
> > have a corresponding Wikipedia article. All the same, in a lot of the
> > research we use Wikidata for, we are not particularly interested in one
> > thing or another, but more interested in facilitating what other people
> > are interested in. Examples would be query performance, finding paths,
> > versioning, finding references, etc. But point taken! Maybe there is a
> > way to identify "general entities" that do not have wikilinks, but do
> > have a high degree or centrality, for example? Would a degree-based or
> > centrality-based filter be possible in something like WDumper (perhaps
> > it goes beyond the original purpose; certainly it does not seem trivial
> > in terms of resources used)? Would it be a good idea?
>
> I think it's definitely worth exploring but I fear it needs someone to
> actually sit down and collect the different dumps use-cases and talk
> to people to figure out which part of the data they need. Based on
> that we could identify common patterns.

Yeah, there are a bunch of quite varied motivations for subsets.

I have found the topic of Wikidata subsetting and data dumps coming up
again and again, most recently in a life science/bioinformatics setting,
which is how we ended up collecting raw materials in the doc already
shared here,
https://docs.google.com/document/d/1MmrpEQ9O7xA6frNk6gceu_IbQrUiEYGI9vcQjDvTL9c
but also in other domains. If people here care to drop use cases,
thoughts and notes (*however scrappy*) into that doc, I will make a pass
over it to try to pull together a more readable summary of the various
motivations for subsetting.

The work Adam wrote up at
https://addshore.com/2019/10/your-own-wikidata-query-service-with-no-limits-part-1/
is also very relevant...

> (I think this is something
> that needs to be done but unfortunately can't dedicate time to it in
> the foreseeable future. https://phabricator.wikimedia.org/T46581 is a
> good place for people who want to help think it through.

That is also a fine place to record things! I don’t mean to fork the
discussion. Maybe we could have a call for interested parties in the
new year?

Dan

>
> Cheers
> Lydia
>
> --
> Lydia Pintscher - http://about.me/lydia.pintscher
> Product Manager for Wikidata
>
> Wikimedia Deutschland e.V.
> Tempelhofer Ufer 23-24
> 10963 Berlin
> www.wikimedia.de
>
> Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e. V.
>
> Eingetragen im Vereinsregister des Amtsgerichts Berlin-Charlottenburg
> unter der Nummer 23855 Nz. Als gemeinnützig anerkannt durch das
> Finanzamt für Körperschaften I Berlin, Steuernummer 27/029/42207.
>
> _______________________________________________
> Wikidata mailing list
> [email protected]
> https://lists.wikimedia.org/mailman/listinfo/wikidata
>
