[Wikidata-bugs] [Maniphest] [Commented On] T126730: [RFC] Caching for results of wikidatasparql queries for Graphs

2016-02-15 Thread Yurik
Yurik added a comment.

Clarification: there is a graph design sandbox, which re-renders the graph on 
every change. In production, the data is pulled by the Graphoid service 
(rarely, as it is behind the Varnish cache) and by client browsers (when users 
click a graph to make it interactive).


TASK DETAIL
  https://phabricator.wikimedia.org/T126730

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: Yurik
Cc: GWicke, Bene, Ricordisamoa, daniel, Lydia_Pintscher, Smalyshev, Jonas, 
Christopher, Yurik, hoo, Aklapper, aude, debt, Gehel, Izno, Luke081515, jkroll, 
Wikidata-bugs, Jdouglas, Deskana, Manybubbles, Mbch331, Jay8g, Ltrlg



___
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs


[Wikidata-bugs] [Maniphest] [Commented On] T126730: [RFC] Caching for results of wikidatasparql queries for Graphs

2016-02-15 Thread Smalyshev
Smalyshev added a comment.

@GWicke the context is that Yuri is building an interface that allows Graphs to 
query the SPARQL endpoint. Since running the query each time a graph is 
displayed is too expensive, we want some intermediate caching store that would 
store the results, possibly for the time defined in the query.

As far as I can see, we do not need change propagation there. In fact, I don't 
think it's possible, since figuring out which change affects the result of a 
query is, in the general case, harder than running the query anew. We just need 
intermediary storage with expiration. So I wondered if RESTBase would be a good 
platform for it.
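
A minimal sketch of what "intermediary storage with expiration" could look 
like, assuming an in-memory store keyed by the query text. The class name, the 
`run_query` callable and the `ttl_seconds` argument are hypothetical and only 
illustrate the idea; this is not RESTBase's API.

```python
import time
from typing import Callable, Dict, Tuple


class ExpiringQueryCache:
    """Stores query results keyed by query text, each with its own expiry."""

    def __init__(self, run_query: Callable[[str], str],
                 clock: Callable[[], float] = time.monotonic) -> None:
        # run_query is a hypothetical callable that hits the SPARQL endpoint.
        self._run_query = run_query
        # The clock is injectable so expiry can be exercised without sleeping.
        self._clock = clock
        self._entries: Dict[str, Tuple[float, str]] = {}

    def get(self, query: str, ttl_seconds: float) -> str:
        """Return a stored result if it has not expired, else re-run the query."""
        now = self._clock()
        entry = self._entries.get(query)
        if entry is not None and entry[0] > now:
            return entry[1]  # still fresh: the endpoint is not contacted
        result = self._run_query(query)
        self._entries[query] = (now + ttl_seconds, result)
        return result
```

The caller passes the lifetime per request, which mirrors the idea of storing 
results "for the time defined in the query". No change propagation is 
involved: an entry simply ages out.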




[Wikidata-bugs] [Maniphest] [Commented On] T126730: [RFC] Caching for results of wikidatasparql queries for Graphs

2016-02-15 Thread GWicke
GWicke added a comment.

@smalyshev, I would need more context to usefully comment on this.

In particular, I wonder if there are a small number of queries that get a lot 
of hits, and if those queries can be cached for long enough to result in 
worthwhile hit rates.

When discussing a use case like graphs, there are also a lot more caching 
layers and change propagation systems to consider.




[Wikidata-bugs] [Maniphest] [Commented On] T126730: [RFC] Caching for results of wikidatasparql queries for Graphs

2016-02-15 Thread Christopher
Christopher added a comment.

@smalyshev I completely agree with the concept of an intermediate service 
between the nanosparqlserver and the client. I think that this service should 
"broker" requests (based on an options configuration object) and evaluate 
whether a query is re-executed against the BG db or the results are returned 
from the "cache", i.e. an "offline", "response only" db.

I have been looking at Huginn https://github.com/cantino/huginn recently. This 
is an application that delegates tasks to agents. This (or a similar app) may 
be suitable for MW extension usage just by using agents or webhooks instead of 
inline queries.




[Wikidata-bugs] [Maniphest] [Commented On] T126730: [RFC] Caching for results of wikidatasparql queries for Graphs

2016-02-13 Thread Lydia_Pintscher
Lydia_Pintscher added a comment.

Yes, as Stas says, we will need to make it possible (and obvious) for people to 
get up-to-date results for maintenance tasks, work lists for editathons and 
current events.




[Wikidata-bugs] [Maniphest] [Commented On] T126730: [RFC] Caching for results of wikidatasparql queries for Graphs

2016-02-13 Thread Yurik
Yurik added a comment.

@smalyshev, I think the question is what to make "default":

- should query results be cached by default, and not cached when a certain 
parameter is given, or the other way around?
- should that parameter be part of the query, or should it be a header?
- do we want to treat identical queries sent with and without "force" as being 
the same, so that if I force a query, it updates the cache for the non-forced 
ones? If so, will we need special varnish handling of this?
- should non-cached queries still be cached for a much shorter period, like 5 
seconds?




[Wikidata-bugs] [Maniphest] [Commented On] T126730: [RFC] Caching for results of wikidatasparql queries for Graphs

2016-02-13 Thread Smalyshev
Smalyshev added a comment.

My opinion:

> should query results be cached by default, and not cached when a certain 
> parameter is given, or the other way around?

Not cached by default. But that may depend on what "default" is :) And how it 
behaves in production - if we get too much load, I may change my opinion.

> should that parameter be part of the query, or should it be a header?

Can be both, but query option is a must since query options are supported by 
various caches much better than headers, and same with clients.

> do we want to treat identical queries sent with and without "force" as being 
> the same, so that if I force a query, it updates the cache for the non-forced 
> ones?

That would be nice, but depends on which cache we use and how it works. I 
strongly suspect the sets of cacheable and non-cacheable queries will be 
largely disjoint anyway, so in practice it may not matter that much. As I see 
it, there are two kinds of queries: those where you don't care about fresh 
to-the-second results (e.g. birthdays of US presidents probably didn't change 
since yesterday) and those where you do care. E.g. the list of entities with a 
broken "country" property may have changed since I last ran the bot that fixes 
it, and I want the actual data.

Also, doing one extra query run is not a big deal; doing thousands of query 
runs is what we worry about.

> should non-cached queries still be cached for a much shorter period, like 5 
> seconds?

Well, there should be a possibility to run a completely non-cached query. It is 
a must. However, whether it should be an option or a no-option situation may 
be open for discussion. Currently I am on a "no-option" position, but a lot 
depends on how exactly we will use it or for what.
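
One way to read the answers above, sketched under stated assumptions: queries 
are not cached by default, a caller may opt in to a cached result, and an 
uncached run still refreshes the shared entry so that opted-in callers benefit 
from it. The names `make_query_runner`, `run_query` and `use_cache` are 
hypothetical, and expiry is left out to keep the example short.

```python
from typing import Callable, Dict


def make_query_runner(run_query: Callable[[str], str]) -> Callable[..., str]:
    """Wrap a query function with an opt-in cache shared by all callers."""
    cache: Dict[str, str] = {}

    def fetch(query: str, use_cache: bool = False) -> str:
        # Only callers that explicitly opt in may be served a stored result.
        if use_cache and query in cache:
            return cache[query]
        # Everyone else always reaches the endpoint (hypothetical run_query).
        result = run_query(query)
        # An uncached run also refreshes the entry seen by opted-in callers.
        cache[query] = result
        return result

    return fetch
```

With this shape, a bot that needs current data always triggers a real query, 
while a graph that tolerates stale data sees whatever the most recent run 
produced.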




[Wikidata-bugs] [Maniphest] [Commented On] T126730: [RFC] Caching for results of wikidatasparql queries for Graphs

2016-02-12 Thread aude
aude added a comment.

we might want to track entity usage of whatever entities are used and that 
could be incorporated into cache invalidation (essentially it's arbitrary 
access)




[Wikidata-bugs] [Maniphest] [Commented On] T126730: [RFC] Caching for results of wikidatasparql queries for Graphs

2016-02-12 Thread hoo
hoo added a comment.

In https://phabricator.wikimedia.org/T126730#2022524, @Jonas wrote:

> I think it is quite obvious that we need at least some caching before the 
> Query entities are finally deployed, because Graph extension is not the only 
> possible source of DoS requests.


I didn't quite get what you're trying to say… Query entities would inherently 
give us a way to cache results and they could easily use their own (private) 
blazegraph instance which would protect (to a certain degree) them from public 
service outages (which is something you could of course do for all kinds of 
internal querying).
Protecting us from a DoS from Graph queries or query entities queries 
specifically shouldn't be too hard: We can control how often we allow users to 
purge the data and all query changes need an edit at some point (which is 
obviously visible).




[Wikidata-bugs] [Maniphest] [Commented On] T126730: [RFC] Caching for results of wikidatasparql queries for Graphs

2016-02-12 Thread Ricordisamoa
Ricordisamoa added a subscriber: Ricordisamoa.
Ricordisamoa added a comment.

If a good caching strategy can be made to work without Query entities, why not 
reuse it for Scribunto access? 




[Wikidata-bugs] [Maniphest] [Commented On] T126730: [RFC] Caching for results of wikidatasparql queries for Graphs

2016-02-12 Thread Jonas
Jonas added a comment.

I think it is quite obvious that we need at least some caching before the Query 
entities are finally deployed, because Graph extension is not the only possible 
source of DoS requests.

There are a lot of use cases where we are fine with cached (old) results. We 
could make a lot of people very happy with this.




[Wikidata-bugs] [Maniphest] [Commented On] T126730: [RFC] Caching for results of wikidatasparql queries for Graphs

2016-02-12 Thread hoo
hoo added a comment.

In https://phabricator.wikimedia.org/T126730#2022680, @Ricordisamoa wrote:

> If a good caching strategy can be made to work without Query entities, why 
> not reuse it for Scribunto access? 


Inline queries have a potentially high maintenance cost and don't have a 
history, therefore I would prefer not to do that without Query entities, even 
if we can address the performance issues.




[Wikidata-bugs] [Maniphest] [Commented On] T126730: [RFC] Caching for results of wikidatasparql queries for Graphs

2016-02-12 Thread Smalyshev
Smalyshev added a comment.

Hmm... I'm not sure I want to do this in the generic case. Not all queries 
should be cached - in fact, there are plenty of queries that change and should 
**not** be cached, and that's the whole point - such as data quality queries 
that may drive bots, etc. On the other hand, there are queries that **can** be 
cached. So I wonder what if we make some path parameter or header that would 
control which caching header nginx returns? So that the client could control 
whether they want a cached or uncached result (with the default being the 
current state of affairs).
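
The selection logic could look roughly like the sketch below. It is plain 
Python for illustration, not nginx configuration; the idea of passing the 
value as a string (for example from a hypothetical `maxage` parameter) and the 
one-day cap are assumptions, not anything decided in this task.

```python
from typing import Dict, Optional

# Hypothetical upper bound so clients cannot request arbitrarily long caching.
MAX_ALLOWED_SECONDS = 86400


def caching_headers(max_age_param: Optional[str]) -> Dict[str, str]:
    """Pick the Cache-Control header from a client-supplied max-age value."""
    if max_age_param is None:
        # Parameter absent: add nothing, i.e. keep the current behaviour.
        return {}
    try:
        max_age = int(max_age_param)
    except ValueError:
        # Unparseable value: treat it like an absent parameter.
        return {}
    if max_age <= 0:
        # The client explicitly asks for an uncached result.
        return {"Cache-Control": "no-store"}
    max_age = min(max_age, MAX_ALLOWED_SECONDS)
    return {"Cache-Control": "public, max-age=%d" % max_age}
```

A data quality bot would send `0` and always get a fresh result, while a graph 
could send `3600` and let the cache in front of the service absorb repeats.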

On a more general note, I think the better solution for this would be to have 
some kind of intermediate data store (either on wiki or maybe in RESTBase?) 
that would fetch query data and cache it for various lengths of time, and the 
graphs would use that store.

