[Wikidata-bugs] [Maniphest] [Commented On] T179879: Provide a 5-minute timeout in WDQS for trusted users using OAuth

2018-06-27 Thread Smalyshev
Smalyshev added a comment.
@Ijon I appreciate the issue, but having an unauthenticated endpoint with an extended timeout kind of defeats the purpose of the timeout, and adding authentication is not trivial, so it has to wait until I get to it. Unless somebody wants to contribute a patch :)


[Wikidata-bugs] [Maniphest] [Commented On] T179879: Provide a 5-minute timeout in WDQS for trusted users using OAuth

2018-06-25 Thread Ijon
Ijon added a comment.
Thanks for the insight on the difference between those two versions. That is indeed usable for this particular count.

But what about the more general request? We do need some way to work around the timeout, sometimes, without abusing the fact that we personally know @Smalyshev.


[Wikidata-bugs] [Maniphest] [Commented On] T179879: Provide a 5-minute timeout in WDQS for trusted users using OAuth

2018-06-04 Thread Smalyshev
Smalyshev added a comment.
OK, so this query is fast:

SELECT (COUNT(*) AS ?count)
WHERE {
  ?article wdt:P31 wd:Q13442814 .
}

and this is slow:

SELECT (COUNT(?article) AS ?count)
WHERE {
  ?article wdt:P31 wd:Q13442814 .
}

I think for @Ijon's case the former works as well. Also, this one:

SELECT (count(*) as ?cnt) WHERE {
  SERVICE gas:service {
    gas:program gas:gasClass "com.bigdata.rdf.graph.analytics.BFS" .
    gas:program gas:in wd:Q13442814 . # one or more times, specifies the initial frontier.
    gas:program gas:out ?type . # exactly once - will be bound to the visited vertices.
    gas:program gas:linkType wdt:P279 .
    gas:program gas:traversalDirection "Reverse" .
  }
  ?article wdt:P31 ?type .
}

works for me, though it takes very close to a minute, so it may work or fail depending on circumstances. Also, this one seems to be about the same speed:

SELECT (COUNT(*) AS ?count)
WHERE {
  ?article wdt:P31/wdt:P279* wd:Q13442814 .
  hint:Prior hint:gearing "reverse" .
}


[Wikidata-bugs] [Maniphest] [Commented On] T179879: Provide a 5-minute timeout in WDQS for trusted users using OAuth

2018-06-04 Thread Smalyshev
Smalyshev added a comment.
Another example of a query timing out without lots of output is this simple count of scholarly articles (millions)

I think this one should be easy to do by first fetching all types with something like http://tinyurl.com/y9pnxg82 and then running counts for each of the types (sketched below). Though there might be intersections...
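
A minimal sketch of that per-type counting, assuming the wdt:P279 subclass tree under wd:Q13442814 is small enough to enumerate (items with several matching types would be counted once per type, which is the intersection problem mentioned above):

SELECT ?type (COUNT(?article) AS ?count)
WHERE {
  ?type wdt:P279* wd:Q13442814 .   # enumerate the subclass tree first
  ?article wdt:P31 ?type .         # then count direct instances per type
}
GROUP BY ?type
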
Though I see that the direct count is slow too. Not sure why - it should be much faster. I'll look into it.


[Wikidata-bugs] [Maniphest] [Commented On] T179879: Provide a 5-minute timeout in WDQS for trusted users using OAuth

2018-06-04 Thread Ijon
Ijon added a comment.
Yes. Another example of a query timing out without lots of output is this simple count of scholarly articles (millions):

SELECT (COUNT(?article) AS ?count)
WHERE {
  ?article wdt:P31/wdt:P279* wd:Q13442814
}

I think such counts are very reasonable queries to occasionally want to run, so yes, we do still need a solution -- with all appropriate measures (only trusted users, quotas, whatever) -- for this issue.


[Wikidata-bugs] [Maniphest] [Commented On] T179879: Provide a 5-minute timeout in WDQS for trusted users using OAuth

2018-03-12 Thread Lucas_Werkmeister_WMDE
Lucas_Werkmeister_WMDE added a comment.

In T179879#4041380, @MichaelSchoenitzer wrote:
Maybe you see any other tricks to improve this type of query?


Adding a query hint helps:

SELECT ?item ?sitelinks WHERE {
  ?item wdt:P31 wd:Q5;
        wikibase:sitelinks ?sitelinks.
  hint:Prior hint:rangeSafe true. # tell BlazeGraph that we’re not mixing ints/floats/whatever here
  FILTER(?sitelinks >= 40)
  MINUS { ?item ^schema:about/schema:isPartOf  }
}
ORDER BY DESC(?sitelinks)
LIMIT 10

However, you still need the FILTER, unfortunately.


[Wikidata-bugs] [Maniphest] [Commented On] T179879: Provide a 5-minute timeout in WDQS for trusted users using OAuth

2018-03-11 Thread Smalyshev
Smalyshev added a comment.
But what I was referring to is that while the above examples produce a huge amount of data in the output and can therefore never be significantly faster by definition, the example I gave does not produce big amounts of data as output.

Yes, as you correctly noted, what matters is not the output but the quantity of data the engine has to process. A large output usually means a large amount of data, but the converse is not true - even queries outputting one item or nothing at all can require large working sets.
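
For illustration, the scholarly-articles count discussed elsewhere in this thread behaves exactly like that - one output row, but the working set is every article plus the whole subclass tree:

SELECT (COUNT(?article) AS ?count)
WHERE {
  ?article wdt:P31/wdt:P279* wd:Q13442814 .
}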


[Wikidata-bugs] [Maniphest] [Commented On] T179879: Provide a 5-minute timeout in WDQS for trusted users using OAuth

2018-03-11 Thread MichaelSchoenitzer
MichaelSchoenitzer added a comment.
That's the reason why queries time out. There's no magic to it - queries that time out are those that require the engine to process a huge amount of data.

Well, that's obvious. But what I was referring to is that while the above examples produce a huge amount of data in the output and can therefore never be significantly faster by definition, the example I gave does not produce big amounts of data as output.
It is slow because the SPARQL query first generates a list of all humans – which is huge – then sorts the list and only then applies the limit. The filter, too, is only applied after generating the list – thus not improving the query time much. There is currently no way to avoid generating the huge interim list in the first place – but there could be!
So, contrary to the queries by Magnus and others, this type of query could be faster.
A (not very nice, but workable) way to do so would be to add binned or qualitative information to the RDF database containing an estimate of the number of sitelinks, so that one could add something like:

?item wikibase:popularity wikibase:verypopular.

to restrict the list to items with more than some constant number of sitelinks. Not very pretty, but something like that could be implemented to improve this very common and still very inefficient pattern described above.
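
A hypothetical sketch of how that would slot into the top-10 pattern (wikibase:popularity and wikibase:verypopular do not exist today; they stand in for the proposed binned sitelink estimate):

SELECT ?item ?sitelinks WHERE {
  ?item wdt:P31 wd:Q5;
        wikibase:popularity wikibase:verypopular;  # hypothetical binned estimate
        wikibase:sitelinks ?sitelinks .
} ORDER BY DESC(?sitelinks) LIMIT 10

The engine would then only have to sort the pre-filtered "very popular" items instead of all humans.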


[Wikidata-bugs] [Maniphest] [Commented On] T179879: Provide a 5-minute timeout in WDQS for trusted users using OAuth

2018-03-11 Thread Smalyshev
Smalyshev added a comment.
So far, all the examples of queries that time out are queries that produce a massive amount of data or are very complex

That's the reason why queries time out. There's no magic to it - queries that time out are those that require the engine to process a huge amount of data. That can be because the engine's interpretation of the query is wrong (e.g. optimizer bugs), because the query is written in a bad way, or because the query genuinely requires the engine to do a lot of work. In the first two cases, you can try to rewrite the query in a way that directs the engine to do less work to achieve the same result. In the latter case, there's nothing to be done - if you use a shared service to consume more resources, everybody else gets proportionally fewer resources. We may indeed need to look into how to serve resource-heavy requests too, but please realize that the apparent textual simplicity of a query does not mean it doesn't process millions of entities. That's pretty much always the reason a query is slow (bugs excluded, of course).


[Wikidata-bugs] [Maniphest] [Commented On] T179879: Provide a 5-minute timeout in WDQS for trusted users using OAuth

2018-03-11 Thread MichaelSchoenitzer
MichaelSchoenitzer added a comment.
So far, all the examples of queries that time out are queries that produce a massive amount of data or are very complex. But a simple pattern of a timing-out query that I see very often is the following:

SELECT ?item ?sitelinks WHERE {
  ?item wdt:P31 wd:Q…;
        wikibase:sitelinks ?sitelinks .
  MINUS { ?item ^schema:about/schema:isPartOf  }
} ORDER BY DESC(?sitelinks) LIMIT 10

It gets the items of one area with the most sitelinks, optionally with a specific sitelink missing. This is a widely used pattern, often applied to find out what the most "important" topics of an area are, to find white spots in Wikipedia's coverage, or to generate work lists. These queries time out as soon as the number of articles in the area is too big, which is quite often the case and becomes more common as Wikidata grows. It doesn't matter what the limit is.

The query can be tuned a bit if you can guess the number of sitelinks needed to get into the top 10. But that is hard, only a small improvement, and therefore often not sufficient. For example, this query runs sometimes, but most of the time it times out:

SELECT ?item ?sitelinks WHERE {
  ?item wdt:P31 wd:Q5;
        wikibase:sitelinks ?sitelinks .
  filter (?sitelinks > 40).
  MINUS { ?item ^schema:about/schema:isPartOf  }
} ORDER BY DESC(?sitelinks) LIMIT 10

Due to the growth of Wikidata, it will soon always time out. Maybe you see any other tricks to improve this type of query?


[Wikidata-bugs] [Maniphest] [Commented On] T179879: Provide a 5-minute timeout in WDQS for trusted users using OAuth

2018-02-27 Thread Smalyshev
Smalyshev added a comment.
Separate service - yes, though ideally I'd just use the same URL

If it is a separate service, it will be a separate URL. Having two distinct services under the same URL would not really work well with how our LVS is set up, unless we did some complex tricks with redirects, which I don't really want to get into.

LIMIT and OFFSET do not work for these when the full query doesn't, I tried

LIMIT worked for me. OFFSET, of course, wouldn't work better than the original once the numbers get bigger, since OFFSET essentially needs to run through the whole data set to get to the offset. If you need the whole dataset, LIMIT/OFFSET only makes the issue worse - quadratically so. However, if dealing with a subset of the data is enough for the use case, then LIMIT might help. It depends on the use case, of course (e.g. I am still not sure what the 2M results are being used for, specifically). But LIMIT/OFFSET is certainly not going to solve this case.
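
A sketch of why paging through the whole set is quadratic (the page size of 100000 is just illustrative; each later page re-walks everything before its offset, so fetching N rows in pages of size k scans roughly N²/2k rows in total):

SELECT ?item WHERE { ?item wdt:P31 wd:Q5 } LIMIT 100000 OFFSET 0
# page 2: the engine still walks the first 100000 solutions before returning anything
SELECT ?item WHERE { ?item wdt:P31 wd:Q5 } LIMIT 100000 OFFSET 100000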

For my own purposes, we could even have a Wikidata page with "named queries"

Making a proposal about something like this has long been on my TODO list - see also https://commons.wikimedia.org/wiki/User:TabulistBot - but I haven't got it into working condition due to lack of time. I could pick it up.

Transparent offlining - queries with a keyword get run with increased timeout (say, 10min), but only one of them at a time

I don't think this is feasible - there's no real way to ensure the "only one" part, since the servers are completely independent, and even "one per server" is not completely trivial, though it can probably be done with some query-parameter magic. More worrying is that this requires support for very long requests along the whole pipeline, from the frontend down to Blazegraph, and I'm not sure how well that would actually work - HTTP is not really meant for hour-long requests, and it would be a real shame to run one only to discover there's no way to deliver the data back because the connection died in the meantime.

Additionally, this doesn't really solve the issue, as one query, provided it's big enough, can take down the whole server (Java handles OOM really badly, which is mostly mitigated by the fact that most queries time out before they can clog enough memory, but if we remove the timeout the risk grows significantly). If we had this in some kind of gated setup, it'd be fine to take the risk, but exposing it to the Internet, where people are not exactly known for not trying to break things just for lulz, seems too big a risk to me.

So, in general, this still needs some thought... I'd like to hear more details of what exactly is done with those 2M results; maybe I'll have some other ideas.


[Wikidata-bugs] [Maniphest] [Commented On] T179879: Provide a 5-minute timeout in WDQS for trusted users using OAuth

2018-02-27 Thread doctaxon
doctaxon added a comment.

In T179879#4006226, @Lucas_Werkmeister_WMDE wrote:
Oh, and the outer query is trying to get all the labels of the place of birth, place of death, country of citizenship, etc., in all languages. That’s also a terrible idea.


Not in all languages, but in German or the first alternative language. I want to get at least one alternative.


[Wikidata-bugs] [Maniphest] [Commented On] T179879: Provide a 5-minute timeout in WDQS for trusted users using OAuth

2018-02-27 Thread doctaxon
doctaxon added a comment.
?item wdt:P27 wd:Q183 was my mistake; that selects the German women. I need the Swedish ones, sorry: ?item wdt:P27 wd:Q34


[Wikidata-bugs] [Maniphest] [Commented On] T179879: Provide a 5-minute timeout in WDQS for trusted users using OAuth

2018-02-27 Thread doctaxon
doctaxon added a comment.

In T179879#4006181, @Lucas_Werkmeister_WMDE wrote:
It only returns some two thousand results, so the surrounding query shouldn’t be a problem either…


No, the "minus"-variation gives a timout, too.TASK DETAILhttps://phabricator.wikimedia.org/T179879EMAIL PREFERENCEShttps://phabricator.wikimedia.org/settings/panel/emailpreferences/To: doctaxonCc: chasemp, Lydia_Pintscher, Magnus, MichaelSchoenitzer_WMDE, MisterSynergy, doctaxon, Jonas, Ash_Crow, Daniel_Mietchen, Lucas_Werkmeister_WMDE, Jane023, Base, Gehel, Smalyshev, Ijon, Aklapper, Lahi, Gq86, Darkminds3113, GoranSMilovanovic, QZanden, EBjune, merbst, LawExplorer, Avner, FloNight, Xmlizer, jkroll, Wikidata-bugs, Jdouglas, aude, Tobias1984, Manybubbles, Mbch331___


[Wikidata-bugs] [Maniphest] [Commented On] T179879: Provide a 5-minute timeout in WDQS for trusted users using OAuth

2018-02-27 Thread Lucas_Werkmeister_WMDE
Lucas_Werkmeister_WMDE added a comment.
@doctaxon use MINUS instead of FILTER NOT EXISTS, and stop killing the optimizer, and then %subquery runs in less than 5 seconds.

SELECT ?item ?sitelinks WHERE {
  ?item wdt:P27 wd:Q183;
        wdt:P21 wd:Q6581072;
        wdt:P31 wd:Q5;
        wikibase:sitelinks ?sitelinks.
  MINUS {
    ?article schema:about ?item;
             schema:isPartOf <https://de.wikipedia.org/> .
  }
}

It only returns some two thousand results, so the surrounding query shouldn’t be a problem either…


[Wikidata-bugs] [Maniphest] [Commented On] T179879: Provide a 5-minute timeout in WDQS for trusted users using OAuth

2018-02-27 Thread doctaxon
doctaxon added a comment.
@Jonas 
This is one such query that runs into a timeout. It should give all Swedish women that have no sitelink to dewiki, counting the sitelinks and listing some properties for each item; it is needed for a dewiki community project. I cannot limit it because of the needed double ORDER BY, so the query has to run with an unlimited result set. With a longer timeout, the query would complete successfully. It would be great if you could optimize it, if possible. Can you calculate how much time the query will take to complete?

select
   ?sitelinks
   ?itemLabel
   ?itemDescription
   (group_concat(distinct ?P106l; separator=', ') as ?P106s) 
   (sample(?P18) as ?P18s)
   (group_concat(distinct ?P569l; separator=', ') as ?P569s)
   (group_concat(distinct ?P569prec; separator=', ') as ?P569precs)
   (group_concat(distinct ?P569cal; separator=', ') as ?P569cals)
   (group_concat(distinct ?P19l; separator=', ') as ?P19s) 
   (group_concat(distinct ?P570l; separator=', ') as ?P570s)
   (group_concat(distinct ?P570prec; separator=', ') as ?P570precs)
   (group_concat(distinct ?P570cal; separator=', ') as ?P570cals)
   (group_concat(distinct ?P20l; separator=', ') as ?P20s) 
   (group_concat(distinct ?P27l; separator=', ') as ?P27s) 
   ?item
with {
   select ?item ?sitelinks
   where {
  hint:Query hint:optimizer "None".
  ?item wdt:P27 wd:Q183; wdt:P21 wd:Q6581072; wdt:P31 wd:Q5; wikibase:sitelinks ?sitelinks .
  filter not exists {[] schema:about ?item; schema:isPartOf <https://de.wikipedia.org/>}
   }
} as %subquery
where {
   hint:Query hint:optimizer "None".
   include %subquery.
   optional {?item wdt:P106 ?P106. optional {?P106 wdt:P2521 ?P106l. filter(lang(?P106l) = 'de')}}
   optional {?item wdt:P18 ?P18.}
   optional {?item wdt:P27/rdfs:label ?P27l.}
   optional {?item wdt:P569 ?P569.}
   optional {?item p:P569/psv:P569 [wikibase:timePrecision ?P569prec; wikibase:timeCalendarModel ?P569cal]}
   optional {?item wdt:P570 ?P570.}
   optional {?item p:P570/psv:P570 [wikibase:timePrecision ?P570prec; wikibase:timeCalendarModel ?P570cal]}
   optional {?item wdt:P19/rdfs:label ?P19l.}
   optional {?item wdt:P20/rdfs:label ?P20l.}
   bind(substr(str(?P569), 1, 10) AS ?P569l).
   bind(substr(str(?P570), 1, 10) AS ?P570l).
   bind(xsd:integer(substr(str(?item), 33)) as ?num).
   service wikibase:label {bd:serviceParam wikibase:language 'de,[AUTO_LANGUAGE]'.}
}
group by ?sitelinks ?itemLabel ?itemDescription ?item
order by desc(?sitelinks) asc(?num)


[Wikidata-bugs] [Maniphest] [Commented On] T179879: Provide a 5-minute timeout in WDQS for trusted users using OAuth

2018-02-27 Thread Magnus
Magnus added a comment.
Thanks @Smalyshev. Yes, these queries are outliers, but there is a good reason to run them (in this case, finding images for missing items), and through Wikidata's growth alone, the number of outliers (queries that time out) will increase, to the point where they are no longer outliers. Technical improvements (improved software, better hardware) can buffer some of this, but not all. As you show, some queries can be reformulated (though why one works and the other doesn't remains a mystery to most users), but again, not forever.

As for my preferences for implementation:


- OAuth-regulated access point - rather not. It complicates script-based queries. One can work around that, but...
- Separate service - yes, though ideally I'd just use the same URL and add a keyword to the SPARQL (we used to have /* SLOW_OK */ on the original toolserver SQL queries), which would then use a separate server in the background and be transparent for the end user
- Offlining - if I could submit my query, get a token back, and then poll the token once every 5 min or so to see if it's done, and get the result back then, that would be fine with me
- What I do with the 1.8M items, in this case, is run full-text searches on them on Commons and Flickr. I am perfectly happy to cache the result myself for days or weeks and work on that list, but I need to get the list in the first place. LIMIT and OFFSET do not work for these when the full query doesn't, I tried
- For my own purposes, we could even have a Wikidata page with "named queries" (heading with the name, paragraph with the query, or something) that gets run on a regular basis; offlining, with a twist. This could produce static JSON files (maybe compressed) somewhere, under the query name. Additionally, the query service could offer returning the data via a fake query ("# NAMED QUERY Bob" etc.), but that's optional, though it would have the benefit of using the same interface
- Transparent offlining - queries with a keyword get run with increased timeout (say, 10min), but only one of them at a time; the connection stays open, though, and eventually returns, even if it takes hours. Feels dirty, though :-(




[Wikidata-bugs] [Maniphest] [Commented On] T179879: Provide a 5-minute timeout in WDQS for trusted users using OAuth

2018-02-27 Thread Jonas
Jonas added a comment.
I can see the timeout brings a lot of frustration.
And I think we can agree that bumping the limit every few months cannot be the solution.

As a first step I would propose to create a process where one can submit a query that times out, so we can better understand the problem and the needs of the users.
This will also help us optimize the queries and the query optimizer, which will avoid a lot of timeouts.

After we have gained more insight, we can discuss which user needs should be satisfied by this service and which should/can be satisfied by other services.


[Wikidata-bugs] [Maniphest] [Commented On] T179879: Provide a 5-minute timeout in WDQS for trusted users using OAuth

2018-02-26 Thread doctaxon
doctaxon added a comment.
Maybe a 120s limit? I think we have to test it, but how?


[Wikidata-bugs] [Maniphest] [Commented On] T179879: Provide a 5-minute timeout in WDQS for trusted users using OAuth

2018-02-26 Thread doctaxon
doctaxon added a comment.
@Smalyshev @Magnus

Thank you, Magnus; this was my opinion too, but my English is not as good as yours, so ...

Let's discuss Magnus' suggestion; it sounds really good.


[Wikidata-bugs] [Maniphest] [Commented On] T179879: Provide a 5-minute timeout in WDQS for trusted users using OAuth

2018-02-26 Thread Magnus
Magnus added a comment.
Semantics and drama aside, I think the point is that some useful queries are already creaking, or failing entirely.

"Species without image" times out, and I'm not even requesting the taxon name (and we all know by now to stay away from the label SERVICE for long queries).

"Women without image" runs sometimes but fails on other occasions, so it will probably stop working soon, as it already does for men.

I would not call either of those "extreme outliers"; I use both in scripts (taxa limited to 700K, sadly), for very practical reasons. As far as I can tell, they do not fail because of complexity, but because of data size. 30 seconds is an arbitrary limit that was set a while ago. It keeps the servers running, but with data size growing, more and more queries will begin to fail. These failing queries will only make up a small percentage of queries run, but they are very useful. So we would like a mechanism to run those few useful, simple, un-optimizable queries. A separate server that runs fewer but longer queries would do that.


[Wikidata-bugs] [Maniphest] [Commented On] T179879: Provide a 5-minute timeout in WDQS for trusted users using OAuth

2018-02-26 Thread Smalyshev
Smalyshev added a comment.
but you didn't explain the reason why not.

Because it is a shared service, and running queries with no limit on execution time would deny other users the opportunity to use this service.

if we stand still, then SPARQL will be useless in near future

I do not think a service running 3-5M queries a day qualifies as "useless". Let's not exaggerate here - not being able to run an extreme outlier query does not make the service "useless". We can discuss solutions for extreme outliers, but we should understand what we're talking about here and approach it realistically.


[Wikidata-bugs] [Maniphest] [Commented On] T179879: Provide a 5-minute timeout in WDQS for trusted users using OAuth

2018-02-24 Thread bd808
bd808 added a comment.

In T179879#3998830, @doctaxon wrote:
@bd808  @chasemp  What is your meaning about this?


@Smalyshev and his team are in charge of the WDQS software and servers.


[Wikidata-bugs] [Maniphest] [Commented On] T179879: Provide a 5-minute timeout in WDQS for trusted users using OAuth

2018-02-23 Thread Magnus
Magnus added a comment.
How about a separate server just for "heavy" queries? For everyone? If that goes down occasionally, hey, so what?

To be extra fancy, maybe trigger it with a SPARQL comment, "# timeout:10" or something; every query with such a comment goes to the separate server?
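
A sketch of what that could look like from the client side (the # timeout:10 keyword is purely hypothetical - nothing parses it today; a front-end router would read the comment and forward the query to the heavy-query server):

# timeout:10
SELECT (COUNT(?article) AS ?count)
WHERE {
  ?article wdt:P31/wdt:P279* wd:Q13442814 .
}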


[Wikidata-bugs] [Maniphest] [Commented On] T179879: Provide a 5-minute timeout in WDQS for trusted users using OAuth

2018-02-23 Thread Smalyshev
Smalyshev added a comment.
Maybe we should get a SPARQL server that allows a longer timeout period.

The problem with that, along with the obvious resource constraints (hardware costs money; supporting service instances costs money and needs people), is that a longer timeout means we will have more heavy queries there. That means the server would be more loaded, which means the next heavy query would take even more time, as it would be competing for server resources with the heavy queries already running on the server. This has the potential to evolve into a complete gridlock.

There's of course a point up to which it's still OK, because the server has enough resources. But I don't think queries that return many millions of results should be our target, at least not for the generic endpoint, and maybe not at all. We may have a case for such queries, but I'd like to hear it and discuss it before we approach a solution.


[Wikidata-bugs] [Maniphest] [Commented On] T179879: Provide a 5-minute timeout in WDQS for trusted users using OAuth

2018-02-23 Thread Aklapper
Aklapper added a comment.
@doctaxon: I usually do not assign tasks to developers as I am not a manager of some unlimited pool of developers. :) Many tasks remain open until someone volunteers to investigate and provide a patch. The associated project tags provide information and links about individuals and teams interested in these areas.


[Wikidata-bugs] [Maniphest] [Commented On] T179879: Provide a 5-minute timeout in WDQS for trusted users using OAuth

2018-02-23 Thread doctaxon
doctaxon added a comment.
Okay, thank you for merging, but this task is very old, too.

@Aklapper Can you assign this task to someone at the WMF who runs the SPARQL servers? I guess the timeout duration can be increased server-side.

Thank you
Martin ...


[Wikidata-bugs] [Maniphest] [Commented On] T179879: Provide a 5-minute timeout in WDQS for trusted users using OAuth

2017-11-07 Thread Jane023
Jane023 added a comment.
I have come to understand (mostly through trial and error, and then by asking around) that to reduce the time of a query you need to start with the thing that has the least number of items in the group you want to query. As these "sub-groups" get bigger and bigger, I need to become more and more creative over time at splitting them up in order to run the same query on a yearly basis. It would be nice if these large "sub-groups" could somehow have their own reporting instances in query-land, e.g. all women items, all people-born-in-city items, all items-with-sitelinks-in-a-specific-project (language WP, language WS, Commons), etc.
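
A sketch of that ordering trick (the class and property choices are just illustrative; with the optimizer disabled, BlazeGraph executes the patterns in written order, so the most selective pattern - the smallest "sub-group" - should come first):

SELECT ?item WHERE {
  hint:Query hint:optimizer "None" .
  ?item wdt:P106 wd:Q1028181 .  # painters - the small group - first
  ?item wdt:P21 wd:Q6581072 .   # then narrow to women
  ?item wdt:P31 wd:Q5 .         # humans last, so the huge group never drives the join
}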


[Wikidata-bugs] [Maniphest] [Commented On] T179879: Provide a 5-minute timeout in WDQS for trusted users using OAuth

2017-11-06 Thread Base
Base added a comment.
While I support the request completely, the query linked can be optimised by using just one label language and dropping the slow label service; I was able to get results for a Wikipedia as big as the German one: http://tinyurl.com/ycbr7zq2