[+Ruben Verborgh]

Hi Maxime,

I wonder if this is something Ruben's Linked Data Fragments
(http://linkeddatafragments.org/) could solve in a fast enough
manner?! I'll let Ruben chime in (if he wants).

Cheers,
Tom

On Thu, Apr 28, 2016 at 12:26 PM, Maxime Lathuilière <[email protected]> wrote:
> Hello!
>
> Context:
> For the needs of inventaire.io, I'm working on a type-filtered autocomplete,
> that is, a field with suggestions, but only suggestions matching a given
> claim: typically, an "author" input where I would like to suggest only
> entities that match the claim P31:Q5 (instance of -> human).
>
> The dream would be to have a "filter" option in the wbsearchentities
> module, to be able to do things like
> https://www.wikidata.org/w/api.php?action=wbsearchentities&limit=10&format=json&search=victor&filter=P31:Q5
>
> As far as I know, this isn't possible yet. One could search without a
> filter, then fetch the matched entities with their claims data, then filter
> on those claims, but this is rather slow for an autocomplete feature that
> needs to be snappy. So the alternative approach I have been working on is
> to get a subset of a Wikidata dump and put it in an Elasticsearch instance.
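The search-then-filter workaround could be sketched roughly like this (the wbsearchentities and wbgetentities modules are the real API; helper names such as has_claim and filtered_search are my own, and this is an untested illustration, not the code used by inventaire.io):

```python
# Sketch of the slow client-side approach: search, then fetch claims,
# then keep only hits whose P31 claim points to Q5 (human).
import json
from urllib.request import urlopen
from urllib.parse import urlencode

API = "https://www.wikidata.org/w/api.php"

def api_get(params):
    """GET a Wikidata API module and decode the JSON response."""
    params = dict(params, format="json")
    with urlopen(API + "?" + urlencode(params)) as resp:
        return json.load(resp)

def has_claim(entity, prop, value):
    """True if one of the entity's claims for `prop` points to `value`."""
    for claim in entity.get("claims", {}).get(prop, []):
        snak = claim.get("mainsnak", {})
        if snak.get("snaktype") != "value":
            continue
        if snak["datavalue"]["value"].get("id") == value:
            return True
    return False

def filtered_search(term, prop="P31", value="Q5", limit=10):
    """Search entities by label, then filter them by claim client-side."""
    hits = api_get({"action": "wbsearchentities", "language": "en",
                    "search": term, "limit": limit})["search"]
    ids = "|".join(hit["id"] for hit in hits)
    entities = api_get({"action": "wbgetentities", "ids": ids,
                        "props": "claims"})["entities"]
    return [hit for hit in hits if has_claim(entities[hit["id"]], prop, value)]
```

The two round trips per keystroke are exactly why this feels too slow for an autocomplete.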
>
> Question:
> What is the best way to get all the entities matching a given claim?
> My answer so far has been to download a dump, then filter the entities by
> claim, but are there better/less resource-intensive ways?
> The only other alternative I see would be a SPARQL query without a LIMIT
> (which in the case of P31:Q5 is probably in the millions(?)) to get all
> the desired ids, then using wbgetentities to fetch the data 50 by 50 to
> work around the API limits, but those limits are there for a reason,
> right?
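That second route might look like the sketch below. The SPARQL query is the standard pattern for "all instances of human"; the batching helper is illustrative, assuming the usual 50-id cap per wbgetentities request:

```python
# Step 1: an unbounded query to get every ?item with P31 -> Q5.
SPARQL = "SELECT ?item WHERE { ?item wdt:P31 wd:Q5 }"

# Step 2: request entity data in batches of 50, the per-request cap
# of wbgetentities for normal users.
def chunks(ids, size=50):
    """Yield successive `size`-item batches of `ids`."""
    for i in range(0, len(ids), size):
        yield ids[i:i + size]

# Each batch would then be joined as "Q1|Q2|..." and passed to
# action=wbgetentities&ids=...
batches = list(chunks([f"Q{n}" for n in range(1, 121)]))
```

With millions of ids, that is tens of thousands of sequential API calls, which is the stress on the servers being asked about.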
> To those who manage the servers that would be stressed by one or the
> other approach: which seems the less painful to recommend? ^^
>
> Thanks in advance for any clue!
>
> New tools:
> - To make a filtered dump, I wrote a small command-line tool:
> wikidata-filter
>   It can filter a dump, but also any set of Wikidata entities in a
> newline-delimited JSON file; I hope it can be helpful to other people!
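The core idea of such a claim filter over a newline-delimited JSON dump can be re-implemented in a few lines (this is my own illustration of the technique, not the actual wikidata-filter code):

```python
import json

def filter_dump(lines, prop="P31", value="Q5"):
    """Yield entities from ndjson `lines` whose `prop` claim points to `value`."""
    for line in lines:
        line = line.strip().rstrip(",")  # full dump lines end with a comma
        if line in ("", "[", "]"):       # full dumps are wrapped in [ ... ]
            continue
        entity = json.loads(line)
        for claim in entity.get("claims", {}).get(prop, []):
            snak = claim.get("mainsnak", {})
            if (snak.get("snaktype") == "value"
                    and snak["datavalue"]["value"].get("id") == value):
                yield entity
                break
```

Streaming line by line keeps memory flat, which matters when the input is a multi-gigabyte dump.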
> - The whole search engine setup can be found here:
> wikidata-subset-search-engine
>
> Clues and comments welcome!
>
> Greetings,
>
> Maxime
>
> --
> Maxime Lathuilière
> maxlath.eu - twitter
> inventaire.io - roadmap - code - twitter - facebook
> wiki(pedia|data): Zorglub27
> for personal emails use [email protected] instead

-- 
Dr. Thomas Steiner, Employee (http://blog.tomayac.com,
https://twitter.com/tomayac)

Google Germany GmbH, ABC-Str. 19, 20354 Hamburg, Germany
Managing Directors: Matthew Scott Sucherman, Paul Terence Manicle
Registration office and registration number: Hamburg, HRB 86891

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.29 (GNU/Linux)

iFy0uwAntT0bE3xtRa5AfeCheCkthAtTh3reSabiGbl0ck0fjumBl3DCharaCTersAttH3b0ttom
hTtPs://xKcd.cOm/1181/
-----END PGP SIGNATURE-----

_______________________________________________
Wikidata mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/wikidata
