BBlack added a comment.

POST isn't just theoretically imperfect. Proliferation of POST for what should be cacheable, readonly, idempotent queries is a serious long-term problem for us. Primarily, we can't cache those requests in Varnish the way we should be able to (Varnish absorbs over 90% of all public traffic before anything reaches the application layers). As mentioned before, it will also affect how multi-DC request routing works: the current design is built on the idea that HTTP methods are used correctly, so that we can sticky users to the primary writeable DC on POST for updates, and otherwise distribute their load fairly when they're only doing GETs and haven't done a POST in a while. Of course we already have cases like this, even in the MediaWiki API. But trying to stick to the standards here isn't just a matter of theory trumping user requests; it's a matter of saving ourselves future pain when we debug performance and availability issues across our infrastructure as a whole, of which this application service is just one of many.
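To illustrate the principle (a rough, hypothetical Python sketch, not our actual Varnish/VCL or routing configuration):

```
# Hypothetical sketch: how an edge/routing layer decides cacheability and
# DC affinity purely from the HTTP method. POST is assumed to write, so it
# is never cached and the user gets stickied to the primary DC; GETs can be
# cached at the edge and spread across DCs.

CACHEABLE_METHODS = {"GET", "HEAD"}

def route_request(method, user_recently_posted):
    cacheable = method in CACHEABLE_METHODS
    if method == "POST" or user_recently_posted:
        datacenter = "primary"   # must see its own writes immediately
    else:
        datacenter = "nearest"   # safe to distribute readonly load
    return cacheable, datacenter

# A readonly SPARQL query sent as POST gets the worst of both worlds:
# no edge caching, and the user stays pinned to the primary DC afterwards.
print(route_request("POST", False))  # (False, 'primary')
print(route_request("GET", False))   # (True, 'nearest')
```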
AFAICS there is no general solution to this problem; you could call it a deficiency in the HTTP protocol itself. On the one hand, we have strong reasons to prefer that methods are used correctly (GET for readonly, idempotent operations; POST for non-idempotent operations that modify server-side data). On the other hand, we face a serious restriction on the amount of data that can be passed in a GET query string, which can only be realistically worked around with long POST body data. There's no right answer, but I think the answers which make use of GET are the better ones here, because they preserve the meaning of the methods. Most of the possible GET workarounds boil down to:

1. Find a way to keep the query strings under 2K (which should be reasonable for a whole lot of data models!). For this you can encode queries better (e.g. use short key names instead of long textual ones), compress the query strings prior to encoding, etc. (see the compression sketch after this list). Often it's simply a matter of a bad data model, or a very inefficient encoding of that data model into the URI. For cases where the basic entropy of the possible query strings will always exceed 2K even after efficient encoding and compression (which is probably always going to be the case for an open-ended, generic query language like SPARQL?), you can look at the other two options:

2. Make it a two-phase operation: a POST of a large query to "save" the query itself for that user/session under some label/index, followed by one or more GETs which reference the saved query and actually execute it for results (see the two-phase sketch below). Persisting these queries (as opposed to just nailing a POST in front of every GET) would be ideal, as would letting users share/reuse common queries where that makes sense.

3. Use headers. Header limits are generally higher than URI limits, and the limit is usually tunable server-side, so you could for instance put the bulk of the data in a request header like `X-SPARQL: ....` (and still apply the techniques from (1) to keep its size in check); see the header sketch below.

The idea of compressing away comments and indents isn't "bad practice" - it would in fact be very good practice in this case. HTTP URIs are not text editors or source-code storage; at this layer they're essentially just the wire protocol for transmitting a compressible idea. If there's a problem with client tooling, we can submit patches to those client projects and/or nudge them in the direction of these arguments.
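For option 1, a rough Python sketch of the compress-and-encode idea; the endpoint and the `q=` parameter name are invented for illustration, not an existing API:

```
# Option 1 sketch: strip comments/indentation, deflate-compress and
# base64url-encode the SPARQL text so more query fits under a ~2K URI cap.
import base64
import re
import zlib
from urllib.parse import urlencode

query = """
# items with an English label (full-line comment; stripped below)
SELECT ?item ?label WHERE {
  ?item rdfs:label ?label .
  FILTER(LANG(?label) = "en")
} LIMIT 10
"""

def pack(sparql):
    no_comments = re.sub(r"^\s*#.*$", "", sparql, flags=re.M)  # drop comment lines
    minified = " ".join(no_comments.split())                   # collapse whitespace
    compressed = zlib.compress(minified.encode("utf-8"), 9)
    return base64.urlsafe_b64encode(compressed).decode("ascii")

packed = pack(query)
url = "https://query.example.org/sparql?" + urlencode({"q": packed})
print(len(query), "->", len(packed), "characters of query payload")
```

The same pack() step also helps with option 3, since header space is larger but still finite.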
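For option 2, a sketch of the two-phase flow against a hypothetical API (the /queries and /queries/<id>/results paths and the JSON "id" field are made up for illustration):

```
# Option 2 sketch: one POST persists the large query body, then short,
# idempotent GETs execute it; those GETs are cacheable and shareable.
import requests

BASE = "https://query.example.org"
sparql = "SELECT ?item WHERE { ?item wdt:P31 wd:Q5 } LIMIT 10"

# Phase 1: save the query (non-idempotent; routed to the primary DC, uncached)
resp = requests.post(BASE + "/queries", data=sparql,
                     headers={"Content-Type": "application/sparql-query"})
query_id = resp.json()["id"]

# Phase 2: execute by reference (idempotent GET with a tiny URI)
results = requests.get("{}/queries/{}/results".format(BASE, query_id),
                       params={"format": "json"})
print(results.json())
```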
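And for option 3, a sketch of carrying the query in a header; `X-SPARQL` and the endpoint are illustrative only, and server-side header size limits would still need to be tuned up accordingly:

```
# Option 3 sketch: keep the request a plain GET, but move the bulk of the
# data out of the URI and into a request header.
import requests

sparql = " ".join("""
SELECT ?item ?label WHERE {
  ?item rdfs:label ?label .
  FILTER(LANG(?label) = "en")
} LIMIT 10
""".split())  # minify as in option 1 to stay within header limits

resp = requests.get("https://query.example.org/sparql",
                    headers={"X-SPARQL": sparql})
print(resp.status_code)
```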