Re: [Wikitech-l] Arbitrary Wikidata querying

2016-12-16 Thread Magnus Manske
I strongly support "native" Wikipedia lists using Wikidata queries, and by
that I mean proper SPARQL, not Lua hacks.

Listeria is used "in production", e.g. on the Welsh Wikipedia (about 17,000 lists in
articles, see https://tools.wmflabs.org/listeria/botstatus.php), but it was
always intended as a proof of concept. It is also designed around WDQ and
only later retrofitted for SPARQL, which explains some of its peculiarities.

Listeria can handle ~23K lists per day, easily, without any caching. I
believe (naively, perhaps) an extension would be feasible that renders "row
templates" based on SPARQL queries. Neither Lua nor the current Wikidata
"fact transclusion" needs to be involved in this. In a first iteration, it
might not even have an automatic update mechanism:
* Render some  construct based on SPARQL
* Tag pages with such tags in the database, or even through categories
* Have an external service/bot purge these pages on a regular basis; that
would update the list without the need of editing the page
* These automated updates could be staged by the time the SPARQL query
took on the last update: <2 sec once/day, >10 sec once/week, etc.
* Have an "update now!" button (as I have on Listeria lists) that just
links to "action=purge", for the impatient (instant gratification)
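The staging rule above could be written as a small scheduling function for the purge bot. This is a minimal plain-Lua sketch, not Listeria's code. It assumes the bot stores, per page, the Unix time of the last update and how long the SPARQL query took then. The function names are hypothetical, and the 3-day middle tier for queries between 2 and 10 seconds is an assumption, since only the two ends are given.

```lua
-- Hypothetical sketch of the staged purge schedule: the slower the SPARQL
-- query was on its last run, the less often the page gets purged.
local DAY = 24 * 60 * 60  -- seconds

local function refreshInterval(lastQuerySeconds)
    if lastQuerySeconds < 2 then
        return DAY        -- fast query: once per day
    elseif lastQuerySeconds > 10 then
        return 7 * DAY    -- slow query: once per week
    end
    return 3 * DAY        -- assumed middle tier, not specified in the email
end

-- True if a page last updated at lastUpdate (Unix time), whose query then
-- took lastQuerySeconds, is due for an action=purge at time now.
local function isDue(now, lastUpdate, lastQuerySeconds)
    return now - lastUpdate >= refreshInterval(lastQuerySeconds)
end
```

A bot would run isDue over all tagged pages and purge the ones that return true; the "update now!" link simply bypasses the schedule.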

The Wikipedia setup wasn't always as heavily cached as it is today; it grew
with usage. I believe we could do this for Wikidata-based lists as well, as
the WMF would control the update cycles.

On Fri, Dec 16, 2016 at 7:30 AM Stas Malyshev 
wrote:

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] Arbitrary Wikidata querying

2016-12-15 Thread Stas Malyshev
Hi!

> Actually, specifically for a list of presidents you don't need a bot.

Yeah, you are right, I was thinking about going through the query route, but
if your list is contained in one property (like Q30/P6) then using Lua
is just fine. It's not always the case (e.g. "list of all movies Brad Pitt
played in"). But where it works it's definitely a good way to go.

> 3. It is limited to simple lists (you can't have a list of Republican
> presidents - because it requires additional filters and you don't want to
> create a new property for it)

Exactly. You probably could still do something in Lua, but that's
pushing it already.

> 4. Internationalization - What if yi Wikipedia wants to create a list of
> governors of some small country where there are no yi labels for the
> presidents? The list would be partially in yi, partially in en - is this
> desired behavior? Or they can show only presidents who have a label in yi -
> but this would give partial data - is this the desired behavior? [Probably
> the correct solution is to show the fallback labels in en, but add some
> tracking category for pages that require label translation, or
> [translate me] links.]

That sounds like a good idea :)
-- 
Stas Malyshev
smalys...@wikimedia.org

Re: [Wikitech-l] Arbitrary Wikidata querying

2016-12-15 Thread Eran Rosenthal
TL;DR: The ONLY practical solution today is to use Lua.
This sucks, but it works and scales well [in the WP sense] - hewiki uses it
heavily in infoboxes - to show lists of actors in movies, or musical band
members etc.

Long version:
Actually, specifically for a list of presidents you don't need a bot.
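Here is roughly how to do it in Lua (a fragment of a Scribunto module):
-- note: you can get the country from a property/the current entity to be generic
local countryEntity = mw.wikibase.getEntity('Q30')
-- note: you can get the property ID as a parameter; getBestStatements returns
-- only the best-ranked statements (the preferred ones, if any are marked)
local presidents = countryEntity:getBestStatements('P6')
local labels = {}
for _, statement in ipairs(presidents) do
    local datavalue = statement.mainsnak and statement.mainsnak.datavalue
    if datavalue and datavalue.type == 'wikibase-entityid' then
        -- parse it to the desired output; here: the item's label, or its ID
        local itemId = datavalue.value.id
        table.insert(labels, mw.wikibase.getLabel(itemId) or itemId)
    end
end
local output = table.concat(labels, ', ')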

(A real world usage example:
https://he.wikipedia.org/wiki/%D7%99%D7%97%D7%99%D7%93%D7%94:PropertyLink?uselang=en
in function: getProperty)

Why this is good:
1. It is the only practical way to query Wikidata from Wikipedia. [Bots
aren't practical: 1. They are less accessible to common users. 2. Some use
cases require running the query and updating the list only every 4/5 years,
when the list of governors changes]
2. It is generic enough to work in different countries and different lists
3. Users can easily use it with syntax such as
{{#invoke:LuaModule|listOf|Q30|P6}} or as templates, and are unaware of the
implementation

Why it sucks:
1. Because it is ugly Lua code
2. This just moves the problem to Wikidata [have to maintain Q30.P6 using
bots/humans instead of queries]
3. It is limited to simple lists (you can't have a list of Republican
presidents - because it requires additional filters and you don't want to
create a new property for it)
4. Internationalization - What if yi Wikipedia wants to create a list of
governors of some small country where there are no yi labels for the
presidents? The list would be partially in yi, partially in en - is this
desired behavior? Or they can show only presidents who have a label in yi -
but this would give partial data - is this the desired behavior? [Probably
the correct solution is to show the fallback labels in en, but add some
tracking category for pages that require label translation, or
[translate me] links.]
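The fallback idea in point 4 could be sketched roughly like this in plain Lua. The entity tables mimic the shape that mw.wikibase.getEntity returns on-wiki (labels[lang].value); the function names and the tracking category name are made up for illustration, and a real module would probably turn "[translate me]" into a link to the item on Wikidata.

```lua
-- Hypothetical tracking category name, for illustration only.
local TRACKING_CATEGORY = '[[Category:Lists with untranslated Wikidata labels]]'

-- Returns the first label found along the language chain and the language
-- it came from; falls back to the bare item ID (and nil) if none exists.
local function labelWithFallback(entity, chain)
    for _, lang in ipairs(chain) do
        local label = entity.labels and entity.labels[lang]
        if label then
            return label.value, lang
        end
    end
    return entity.id, nil
end

-- Renders a bulleted list; any entry not in the wiki's own language
-- (chain[1]) gets a marker, and the page gets the tracking category once.
local function renderList(entities, chain)
    local lines, needsTranslation = {}, false
    for _, entity in ipairs(entities) do
        local text, lang = labelWithFallback(entity, chain)
        if lang ~= chain[1] then
            needsTranslation = true
            text = text .. ' [translate me]'
        end
        table.insert(lines, '* ' .. text)
    end
    if needsTranslation then
        table.insert(lines, TRACKING_CATEGORY)
    end
    return table.concat(lines, '\n')
end
```

On yi Wikipedia the chain would be { 'yi', 'en' }, so the list is always complete and the untranslated entries can be found through the category.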

On Fri, Dec 16, 2016 at 7:35 AM, Stas Malyshev
wrote:

Re: [Wikitech-l] Arbitrary Wikidata querying

2016-12-15 Thread Stas Malyshev
Hi!

> Sure, but I'm not really worried about potential false positives. I'm
> worried that we're building a giant write-only data store.

Fortunately, we are not doing that.

>> Unless you're talking about pulling a small set of values, in which case
>> Lua/templates are probably the best venue.
> 
> I'm not sure what small means here. We have about 46 U.S. Presidents, is
> that small enough? Which Lua functions and templates could I use?

No, a list of presidents is not small enough. Lua right now can fetch
specific data from a specific item, which is OK if you know the item and
what you're getting (e.g. infoboxes, etc.) but not good for lists of
items, especially with complicated conditions. That use case currently
needs external tools - like bots.

> Wikidata began in October 2012. I thought it might take till 2014 or even
> 2015 to get querying capability into a usable state, but we're now looking

Please do not confuse your particular use case with querying not being
usable at all. It is definitely usable and being used by many people for
many things. Generating lists directly from a wiki template is not
supported yet, and we're working on it. I'm sorry that your use case is
not supported and you're feeling disappointed. But we do have query
capability and it can be used and is being used for many other things.

Of course, contributions - in any form, query development, code
development, design, frontend, backend, data contributions, etc. - are
always welcome.

> to even contribute to it when it feels like putting data into a giant
> system that you can't really get back out. I love Magnus and I have a ton

Again, this is not correct - you can read data back out, and there are
several ways you can use query functionality for it right now. The way
you want to do it is not supported - yet - but there are many other
ways, which we are constantly improving. But we can't do everything at
get there.
-- 
Stas Malyshev
smalys...@wikimedia.org

Re: [Wikitech-l] Arbitrary Wikidata querying

2016-12-15 Thread MZMcBride
Thank you for this e-mail. It was informative.

Stas Malyshev wrote:
>No, and there are tricky parts there. Consider
>https://www.wikidata.org/wiki/Q735712. Yes, Lex Luthor held the office
>of the President of the USA. In a fictional universe, of course. But the
>naive query - every Wikidata item where position held includes "President
>of the United States" - would return Lex Luthor as just as legitimate a
>president as Abraham Lincoln. In fact, there are 79 US presidents judging
>by "position held" alone. So clearly, there need to be some limits, and
>those limits would be on a case-by-case basis.

Sure, but I'm not really worried about potential false positives. I'm
worried that we're building a giant write-only data store.

>Right now the best way is to use one of the list-maintaining bots, I think.

This sucks. :-(

>Unless you're talking about pulling a small set of values, in which case
>Lua/templates are probably the best venue.

I'm not sure what small means here. We have about 46 U.S. Presidents, is
that small enough? Which Lua functions and templates could I use?

>We're working on it (mostly thinking right now, but correct design is
>80% of the work, so...). Visualizations already have query capabilities
>(mainly because they have a strong caching model embedded, and because
>there are not too many of them and you need to create them, so we can
>watch the load carefully). Other pages can gain them - probably via some
>kind of Lua functionality - as soon as we figure out the right way to do
>it, hopefully somewhere within the next year (no promise, but
>hopefully).

Wikidata began in October 2012. I thought it might take till 2014 or even
2015 to get querying capability into a usable state, but we're now looking
at potentially 2018? This really sucks. I think Wikidata may eventually
cause a seismic shift in wiki editing, but currently I don't see any reason
to even contribute to it when it feels like putting data into a giant
system that you can't really get back out. I love Magnus and I have a ton
of respect for him, but I don't want anything to do with anything called
Listeria. It continues to seem like querying is an afterthought for
Wikidata and this continues to boggle my mind.

MZMcBride

Re: [Wikitech-l] Arbitrary Wikidata querying

2016-12-14 Thread Daniel Kinzler
Very well put, Stas, thank you!

On 13.12.2016 at 07:23, Stas Malyshev wrote:

-- 
Daniel Kinzler
Senior Software Developer

Wikimedia Deutschland
Gesellschaft zur Förderung Freien Wissens e.V.

Re: [Wikitech-l] Arbitrary Wikidata querying

2016-12-12 Thread Stas Malyshev
Hi!

> If I wanted to make a page on the English Wikipedia using wikitext called
> "List of United States presidents" that dynamically embeds information
> from  and
>  and other similar items, is this
> currently possible? I consider this to be arbitrary Wikidata querying, but
> if that's not the correct term, please let me know what to call it.

So this is kind of a can of worms which I guess we eventually have to
open, but very carefully. So I want to state my _current_ opinion on the
matter - please note, it can change at any time due to changing
circumstances, persuasion, experience, revelation, etc.

1. Technically, anything that can access a web service and speak JSON
can talk to the SPARQL server. So, in theory, making some way to do this,
*in theory*, would not be very hard. But - please keep reading.

2. I am very apprehensive about having a direct link between wiki
pages and the SPARQL server without heavy caching and rate limiting in
between. We don't have a super-strong setup there, and I'm afraid making
such a link would just knock our setup over, especially if people start
putting queries into frequently-used templates.

3. We have a number of bot setups (Listeria etc.) which can auto-update
lists from SPARQL periodically. This works reasonably well (except for
occasional timeouts on tricky queries, etc.) and does not require
requesting the info too frequently.

4. If we want a more direct page-to-SPARQL-to-page interface, we need to
think about storing/caching data, and not for 5 minutes like it's cached
now but for a much longer time, probably in storage other than Varnish.
Ideally, that storage would be more of a persistent store than a cache -
i.e. it would always (or nearly always) be available but periodically
updated. Kind of like the bots mentioned above but more generic. I don't
have any more design for it beyond that, but I think that's the direction
we should be looking into.
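The "persistent store, periodically updated" idea in point 4 could look roughly like this. It is a minimal plain-Lua illustration under stated assumptions, not a design: page renders only ever read stored results and queue stale entries, and a separate periodic job (standing in for the bots mentioned above) does the actual querying, so a popular template cannot turn into a flood of SPARQL requests. Store, read, refresh, maxAge and runQuery are all hypothetical names.

```lua
-- Hypothetical sketch: query results live in a persistent store that page
-- renders only read from; stale or missing entries are queued, and a
-- periodic job refreshes them in one batch.
local Store = {}
Store.__index = Store

-- maxAge: seconds after which an entry counts as stale (an assumption;
-- the email only says "much longer" than the current 5 minutes).
function Store.new(maxAge)
    return setmetatable({ entries = {}, queue = {}, queued = {}, maxAge = maxAge }, Store)
end

-- Called while rendering a page. Never runs the query itself: returns the
-- stored rows (possibly stale), or nil if the query was never computed,
-- and queues the query once if it needs (re)computing.
function Store:read(query, now)
    local entry = self.entries[query]
    local stale = not entry or now - entry.updated >= self.maxAge
    if stale and not self.queued[query] then
        self.queued[query] = true
        table.insert(self.queue, query)
    end
    return entry and entry.rows
end

-- Called by the periodic job; runQuery is a placeholder for whatever sends
-- the SPARQL request and returns the rows.
function Store:refresh(now, runQuery)
    for _, query in ipairs(self.queue) do
        self.entries[query] = { rows = runQuery(query), updated = now }
        self.queued[query] = nil
    end
    self.queue = {}
end
```

The design choice is that availability wins over freshness: once an entry exists it is always served, and only the background job ever talks to the SPARQL server.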

> A more advanced form of this Wikidata querying would be dynamically
> generating a list of presidents of the United States by finding every
> Wikidata item where position held includes "President of the United
> States". Is this currently possible on-wiki or via wikitext?

No, and there are tricky parts there. Consider
https://www.wikidata.org/wiki/Q735712. Yes, Lex Luthor held the office
of the President of the USA. In a fictional universe, of course. But the
naive query - every Wikidata item where position held includes "President
of the United States" - would return Lex Luthor as just as legitimate a
president as Abraham Lincoln. In fact, there are 79 US presidents judging
by "position held" alone. So clearly, there need to be some limits, and
those limits would be on a case-by-case basis.
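As one example of such a case-by-case limit, the query can require that the item is a human. Below is a plain-Lua sketch, assuming the public endpoint https://query.wikidata.org/sparql with format=json, that builds the request URL for such a query and flattens the standard SPARQL 1.1 JSON results into rows, which also illustrates point 1. It leaves out the HTTP request itself: on-wiki Lua modules cannot make HTTP requests, so that part would live in a bot or external tool. The function names are hypothetical, and the human filter has not been checked against the live data; it only works if fictional characters like Lex Luthor are not also marked as humans.

```lua
local ENDPOINT = 'https://query.wikidata.org/sparql'

-- P39 = position held, Q11696 = President of the United States,
-- P31 = instance of, Q5 = human (the filter that should drop Lex Luthor).
local PRESIDENTS_QUERY = [[
SELECT ?president ?presidentLabel WHERE {
  ?president wdt:P39 wd:Q11696 ;
             wdt:P31 wd:Q5 .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}]]

-- Percent-encodes everything except unreserved URL characters.
local function urlEncode(s)
    return (s:gsub('[^%w%-_%.~]', function(c)
        return string.format('%%%02X', string.byte(c))
    end))
end

local function buildQueryUrl(query)
    return ENDPOINT .. '?format=json&query=' .. urlEncode(query)
end

-- Turns a decoded SPARQL 1.1 JSON result
-- ({ head = { vars = ... }, results = { bindings = ... } }) into rows
-- mapping each variable name to its plain string value.
local function toRows(result)
    local rows = {}
    for _, binding in ipairs(result.results.bindings) do
        local row = {}
        for var, cell in pairs(binding) do
            row[var] = cell.value
        end
        table.insert(rows, row)
    end
    return rows
end
```

Other lists would need different limits (e.g. a party filter for the Republican presidents example), which is why a generic on-wiki mechanism has to accept arbitrary queries rather than fixed filters.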

> If either of these querying capabilities are possible, how do I do them?
> I don't understand how to query Wikidata in a useful way and I find this
> frustrating. Since 2012, we've been putting a lot of data into Wikidata,
> but I want to programmatically extract some of this data and use it in my
> Wikipedia editing. How do I do this?

Right now the best way is to use one of the list-maintaining bots, I think.
Unless you're talking about pulling a small set of values, in which case
Lua/templates are probably the best venue.

> If these querying capabilities are not currently possible, when might they
> be? I understand that cache invalidation is difficult and that this will
> need a sensible editing user interface, but I don't care about all of
> that, I just want to be able to query data out of this large data store.

We're working on it (mostly thinking right now, but correct design is
80% of the work, so...). Visualizations already have query capabilities
(mainly because they have a strong caching model embedded, and because
there are not too many of them and you need to create them, so we can
watch the load carefully). Other pages can gain them - probably via some
kind of Lua functionality - as soon as we figure out the right way to do
it, hopefully somewhere within the next year (no promise, but
hopefully).

-- 
Stas Malyshev
smalys...@wikimedia.org

Re: [Wikitech-l] Arbitrary Wikidata querying

2016-12-11 Thread Eran Rosenthal
Currently it is only possible with Lua.
The documentation is in:
https://www.mediawiki.org/wiki/Extension:Wikibase_Client/Lua

It is quite ugly to write such a module (not cool SPARQL...), but it works,
and you can expose it through a nice interface to be used in wiki pages.

On Sun, Dec 11, 2016 at 6:17 AM, Gergo Tisza  wrote:

Re: [Wikitech-l] Arbitrary Wikidata querying

2016-12-10 Thread Gergo Tisza
On Sat, Dec 10, 2016 at 5:30 PM, MZMcBride  wrote:

> A more advanced form of this Wikidata querying would be dynamically
> generating a list of presidents of the United States by finding every
> Wikidata item where position held includes "President of the United
> States". Is this currently possible on-wiki or via wikitext?
>

Not directly, but there are bots which can emulate it, such as Listeria by
Magnus:
http://magnusmanske.de/wordpress/?p=301

Re: [Wikitech-l] Arbitrary Wikidata querying

2016-12-10 Thread Yuri Astrakhan
AFAIK, you can query data from Wikidata, but you cannot put it into a page
unless it's a graph. Graphs can do it -
https://www.mediawiki.org/wiki/Extension:Graph/Demo/Sparql

As of last Thursday, you can also create a table in the Commons Data namespace,
and make a simple Lua script on your favorite wiki to pull that data in and
render it. Since Wikidata is accessible from Lua, you could pull useful
information about each president. I am not sure about the efficiency
aspects here.

WeatherDemo -- pulls data from Commons Data:Weather/New_York_City.tab
and formats it using an enwiki module. I'm still working on some fun demos
for a big presentation.
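Pulling a Commons Data table into a page mostly comes down to turning its rows into wikitext. A minimal plain-Lua sketch, assuming the decoded page has the tabular-data shape (schema.fields with a name per column, data as an array of rows); on-wiki the table would come from the JsonConfig extension's Lua interface (mw.ext.data.get), which is not called here. toWikitable is a hypothetical name.

```lua
-- Renders a decoded tabular-data page as a simple wikitable.
local function toWikitable(tab)
    local out = { '{| class="wikitable"' }
    local headers = {}
    for _, field in ipairs(tab.schema.fields) do
        table.insert(headers, field.name)
    end
    table.insert(out, '! ' .. table.concat(headers, ' !! '))
    for _, row in ipairs(tab.data) do
        local cells = {}
        for i = 1, #tab.schema.fields do
            -- JSON null decodes to nil; render it as an empty cell
            cells[i] = tostring(row[i] == nil and '' or row[i])
        end
        table.insert(out, '|-')
        table.insert(out, '| ' .. table.concat(cells, ' || '))
    end
    table.insert(out, '|}')
    return table.concat(out, '\n')
end
```

Each cell could just as well be enriched with Wikidata lookups (e.g. a label per item ID), which is the combination described above; the efficiency of doing that for many rows is the open question.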


On Sat, Dec 10, 2016 at 8:30 PM MZMcBride  wrote:

[Wikitech-l] Arbitrary Wikidata querying

2016-12-10 Thread MZMcBride
Hi.

If I wanted to make a page on the English Wikipedia using wikitext called
"List of United States presidents" that dynamically embeds information
from  and
 and other similar items, is this
currently possible? I consider this to be arbitrary Wikidata querying, but
if that's not the correct term, please let me know what to call it.

A more advanced form of this Wikidata querying would be dynamically
generating a list of presidents of the United States by finding every
Wikidata item where position held includes "President of the United
States". Is this currently possible on-wiki or via wikitext?

If either of these querying capabilities are possible, how do I do them?
I don't understand how to query Wikidata in a useful way and I find this
frustrating. Since 2012, we've been putting a lot of data into Wikidata,
but I want to programmatically extract some of this data and use it in my
Wikipedia editing. How do I do this?

If these querying capabilities are not currently possible, when might they
be? I understand that cache invalidation is difficult and that this will
need a sensible editing user interface, but I don't care about all of
that, I just want to be able to query data out of this large data store.

MZMcBride