[+Ruben Verborgh] Hi Maxime,
I wonder if this is something Ruben's Linked Data Fragments (http://linkeddatafragments.org/) could solve in a fast enough manner?! I'll let Ruben chime in (if he wants).

Cheers,
Tom

On Thu, Apr 28, 2016 at 12:26 PM, Maxime Lathuilière <[email protected]> wrote:
> Hello!
>
> Context:
> For the needs of inventaire.io, I'm working on a type-filtered autocomplete,
> that is, a field with suggestions, but only suggestions matching a given
> claim: typically an "author" input where I would like to suggest only
> entities that match the claim P31:Q5 (instance of -> human).
>
> The dream would be to have a "filter" option in the wbsearchentities module,
> to be able to do things like
> https://www.wikidata.org/w/api.php?action=wbsearchentities&limit=10&format=json&search=victor&filter=P31:Q5
>
> As far as I know, this isn't possible yet. One could search without a filter,
> then fetch the related entities with their claims data, then filter on those
> claims, but this is rather slow for an autocomplete feature that needs to be
> snappy (a sketch of this search-then-filter fallback is appended after this
> message). So the alternative approach I have been working on is to get a
> subset of a Wikidata dump and put it in an ElasticSearch instance.
>
> Question:
> What is the best way to get all the entities matching a given claim?
> My answer so far has been to download a dump, then filter the entities by
> claim, but are there better/less resource-intensive ways?
> The only other alternative I see would be a SPARQL query without a LIMIT
> (which in the case of P31:Q5 probably returns millions of results) to get
> all the desired ids, then using wbgetentities to fetch the data 50 by 50 to
> work around the API limits, but those limits are there for a reason, right?
> (a sketch of that approach is appended as well.)
> To those who manage the servers that would be stressed by either approach:
> which one seems less painful to recommend? ^^
>
> Thanks in advance for any clue!
>
> New tools:
> - To make a filtered dump, I wrote a small command-line tool: wikidata-filter.
>   It can filter a dump, but also any set of Wikidata entities in a
>   newline-delimited JSON file; I hope it can be helpful to other people!
> - The whole search engine setup can be found here: wikidata-subset-search-engine
>
> Clues and comments welcome!
>
> Greetings,
>
> Maxime
>
> --
> Maxime Lathuilière
> maxlath.eu - twitter
> inventaire.io - roadmap - code - twitter - facebook
> wiki(pedia|data): Zorglub27
> for personal emails use [email protected] instead

--
Dr. Thomas Steiner, Employee (http://blog.tomayac.com, https://twitter.com/tomayac)
Google Germany GmbH, ABC-Str. 19, 20354 Hamburg, Germany
Managing Directors: Matthew Scott Sucherman, Paul Terence Manicle
Registration office and registration number: Hamburg, HRB 86891
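
[Editor's note] Below is a minimal sketch of the "search, then filter by claim" fallback Maxime describes, using only the documented wbsearchentities and wbgetentities modules of the Wikidata API. The Python/requests wrapper, the search_humans function name, and the hard-coded P31:Q5 check are illustrative assumptions, not part of any tool mentioned in the thread.

    # Sketch: label search followed by an on-the-fly claim filter.
    import requests

    API = "https://www.wikidata.org/w/api.php"

    def search_humans(term, limit=10):
        """Search entities matching `term`, keep only those with P31 -> Q5."""
        # Step 1: plain label search (wbsearchentities has no claim filter).
        search = requests.get(API, params={
            "action": "wbsearchentities",
            "search": term,
            "language": "en",
            "limit": limit,
            "format": "json",
        }).json()
        ids = [hit["id"] for hit in search.get("search", [])]
        if not ids:
            return []

        # Step 2: fetch the candidates' claims (well under the 50-ids cap here).
        entities = requests.get(API, params={
            "action": "wbgetentities",
            "ids": "|".join(ids),
            "props": "claims|labels",
            "languages": "en",
            "format": "json",
        }).json().get("entities", {})

        # Step 3: keep entities whose P31 statements include Q5 (human).
        def is_human(entity):
            for claim in entity.get("claims", {}).get("P31", []):
                value = claim["mainsnak"].get("datavalue", {}).get("value", {})
                if value.get("id") == "Q5":
                    return True
            return False

        return [qid for qid, entity in entities.items() if is_human(entity)]

    if __name__ == "__main__":
        print(search_humans("victor"))

As the original mail notes, this costs two round trips per keystroke, which is why it tends to be too slow for an autocomplete box and why a pre-filtered local index looks attractive.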

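[Editor's note] And here is a minimal sketch of the SPARQL + wbgetentities alternative raised in the question, assuming the public Wikidata Query Service endpoint (https://query.wikidata.org/sparql). A real P31:Q5 run would return millions of ids, so the LIMIT below exists only to keep the example harmless; the descriptive User-Agent string is an illustrative placeholder, and batching respects the documented 50-ids-per-request cap of wbgetentities.

    # Sketch: collect ids for a claim via SPARQL, then hydrate them in batches.
    import requests

    WDQS = "https://query.wikidata.org/sparql"
    API = "https://www.wikidata.org/w/api.php"
    HEADERS = {"User-Agent": "claim-subset-sketch/0.1 (example contact)"}  # placeholder

    def ids_with_claim(prop="P31", value="Q5", limit=200):
        """Return Q-ids of entities having the claim prop -> value."""
        query = f"SELECT ?item WHERE {{ ?item wdt:{prop} wd:{value} . }} LIMIT {limit}"
        response = requests.get(WDQS, params={"query": query, "format": "json"},
                                headers=HEADERS)
        bindings = response.json()["results"]["bindings"]
        # Result URIs look like http://www.wikidata.org/entity/Q42 -> keep the Q-id.
        return [b["item"]["value"].rsplit("/", 1)[-1] for b in bindings]

    def fetch_entities(ids):
        """Yield full entity documents, 50 ids per wbgetentities call."""
        for start in range(0, len(ids), 50):
            batch = ids[start:start + 50]
            data = requests.get(API, params={
                "action": "wbgetentities",
                "ids": "|".join(batch),
                "props": "labels|claims",
                "format": "json",
            }, headers=HEADERS).json()
            yield from data.get("entities", {}).values()

    if __name__ == "__main__":
        humans = ids_with_claim("P31", "Q5", limit=120)
        for entity in fetch_entities(humans):
            print(entity["id"], entity.get("labels", {}).get("en", {}).get("value"))

At dump scale this means tens of thousands of API calls, which is exactly the load concern the mail asks the server operators to weigh against simply downloading and filtering a dump.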