Very well put, Stas, thank you!

Am 13.12.2016 um 07:23 schrieb Stas Malyshev:
> Hi!
> 
>> If I wanted to make a page on the English Wikipedia using wikitext called
>> "List of United States presidents" that dynamically embeds information
>> from <https://www.wikidata.org/wiki/Q23> and
>> <https://www.wikidata.org/wiki/Q11806> and other similar items, is this
>> currently possible? I consider this to be arbitrary Wikidata querying, but
>> if that's not the correct term, please let me know what to call it.
> 
> So this is the kind of can of worms which, I guess, we will eventually
> have to open, but very carefully. So I want to state my _current_
> opinion on the matter - please note, it can change at any time due to
> changing circumstances, persuasion, experience, revelation, etc.
> 
> 1. Technically, anything that can access a web service and speak JSON
> can talk to a SPARQL server. So, *in theory*, making some way to do
> this would not be very hard. But - please keep reading.
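
Indeed - as a rough illustration only (the endpoint URL is the public Wikidata Query Service; P39/Q11696 are the usual IDs for "position held" / "President of the United States"; the response here is a canned example in the standard SPARQL 1.1 JSON results format, not a live fetch):

```python
import json
from urllib.parse import urlencode

# Public Wikidata Query Service endpoint (example).
ENDPOINT = "https://query.wikidata.org/sparql"

def build_request_url(query):
    """Build a GET URL for a SPARQL query, asking for JSON results."""
    return ENDPOINT + "?" + urlencode({"query": query, "format": "json"})

def extract_values(response_text, var):
    """Pull the bindings of one variable out of a SPARQL 1.1
    JSON results document."""
    data = json.loads(response_text)
    return [b[var]["value"] for b in data["results"]["bindings"]]

url = build_request_url(
    "SELECT ?item WHERE { ?item wdt:P39 wd:Q11696 } LIMIT 10")

# Canned response, showing the shape any JSON-speaking client gets back:
sample = """{
  "head": {"vars": ["item"]},
  "results": {"bindings": [
    {"item": {"type": "uri",
              "value": "http://www.wikidata.org/entity/Q23"}}
  ]}
}"""

print(extract_values(sample, "item"))
```

Any HTTP client that can issue the GET and parse that JSON is, in this sense, a SPARQL client.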
> 
> 2. I am very apprehensive about having a direct link between any wiki
> pages and the SPARQL server without heavy caching and rate limiting in
> between. We don't have a super-strong setup there, and I'm afraid such
> a link would just knock our setup over, especially if people start
> putting queries into frequently-used templates.
> 
> 3. We have a number of bot setups (Listeria etc.) which can auto-update
> lists from SPARQL periodically. This works reasonably well (except for
> the occasional timeout on tricky queries, etc.) and does not require
> requesting the info too frequently.
> 
> 4. If we want a more direct page-to-SPARQL-to-page interface, we need
> to think about storing/caching data - and not for 5 minutes, as it's
> cached now, but for a much longer time, probably in storage other than
> Varnish. Ideally, that storage would be more of a persistent store than
> a cache - i.e. it would always (or nearly always) be available, but
> periodically updated. Kind of like the bots mentioned above, but more
> generic. I don't have any more design for it beyond that, but I think
> that's the direction we should be looking into.
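
The "persistent store, periodically updated" idea can be sketched in a few lines - purely an illustration of the read path under that design, with made-up names and refresh policy, not a proposed implementation:

```python
import time

class RefreshingStore:
    """A store that, once primed, always answers: a stale value keeps
    being served until a refresh succeeds, instead of expiring the way
    a cache entry does."""

    def __init__(self, fetch, max_age_seconds):
        self._fetch = fetch          # e.g. a function running a SPARQL query
        self._max_age = max_age_seconds
        self._value = None
        self._stored_at = None       # None means "never primed"

    def get(self):
        now = time.time()
        if self._stored_at is None:
            # First access: prime the store with one fetch.
            self._value, self._stored_at = self._fetch(), now
        elif now - self._stored_at > self._max_age:
            # Stale: try to refresh, but keep serving the old value
            # if the backend (e.g. SPARQL) times out or fails.
            try:
                self._value, self._stored_at = self._fetch(), now
            except Exception:
                pass
        return self._value

store = RefreshingStore(lambda: ["Q23", "Q11806"], max_age_seconds=3600)
print(store.get())
```

The point of the design is the `except` branch: a SPARQL hiccup degrades freshness, not availability.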
> 
>> A more advanced form of this Wikidata querying would be dynamically
>> generating a list of presidents of the United States by finding every
>> Wikidata item where position held includes "President of the United
>> States". Is this currently possible on-wiki or via wikitext?
> 
> No, and there are tricky parts there. Consider
> https://www.wikidata.org/wiki/Q735712. Yes, Lex Luthor held the office
> of President of the USA - in a fictional universe, of course. But the
> naive query - every Wikidata item whose "position held" includes
> "President of the United States" - would return Lex Luthor as a
> president just as legitimate as Abraham Lincoln. In fact, judging by
> "position held" alone, there are 79 US presidents. So clearly, there
> need to be some limits, and those limits would have to be decided on a
> case-by-case basis.
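
In SPARQL terms, the two variants differ by a single triple. A sketch (P39 = "position held", Q11696 = "President of the United States", P31 = "instance of", Q5 = "human"; the extra triple relies on the usual Wikidata modelling, under which fictional characters are not instances of human):

```python
# Naive query: returns Lex Luthor alongside Abraham Lincoln.
naive = """
SELECT ?president WHERE {
  ?president wdt:P39 wd:Q11696 .
}
"""

# One possible limit: additionally require "instance of: human" (Q5),
# a statement the item for the fictional Lex Luthor should not carry.
limited = """
SELECT ?president WHERE {
  ?president wdt:P39 wd:Q11696 ;
             wdt:P31 wd:Q5 .
}
"""

print(limited)
```

Even then, which limits are appropriate remains the case-by-case judgement described above; no single filter settles it.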
> 
>> If either of these querying capabilities is possible, how do I do it?
>> I don't understand how to query Wikidata in a useful way and I find this
>> frustrating. Since 2012, we've been putting a lot of data into Wikidata,
>> but I want to programmatically extract some of this data and use it in my
>> Wikipedia editing. How do I do this?
> 
> Right now, the best way is to use one of the list-maintaining bots, I
> think. Unless you're talking about pulling a small set of values, in
> which case Lua/templates are probably the best avenue.
> 
>> If these querying capabilities are not currently possible, when might they
>> be? I understand that cache invalidation is difficult and that this will
>> need a sensible editing user interface, but I don't care about all of
>> that, I just want to be able to query data out of this large data store.
> 
> We're working on it (mostly thinking right now, but correct design is
> 80% of the work, so...). Visualizations already have query capabilities
> (mainly because they have a strong caching model embedded, and because
> there are not too many of them - you need to create them explicitly, so
> we can watch the load carefully). Other pages can gain them - probably
> via some kind of Lua functionality - as soon as we figure out the right
> way to do it, hopefully somewhere within the next year (no promises,
> but hopefully).
> 


-- 
Daniel Kinzler
Senior Software Developer

Wikimedia Deutschland
Gesellschaft zur Förderung Freien Wissens e.V.

_______________________________________________
Wikitech-l mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/wikitech-l