BBlack added a comment.

POST isn't just theoretically imperfect. Proliferation of POST for what should be cacheable, readonly, idempotent queries is a serious long-term problem for us. Primarily, we can't cache those requests in Varnish the way we should be able to (Varnish absorbs over 90% of all public traffic before anything reaches the application layers). As mentioned before, it will also affect how multi-DC request routing works: the current design is built on the idea that HTTP methods are used correctly, so that we can sticky users to the primary writeable DC on POST for updates, and otherwise distribute their load fairly when they're only doing GETs and haven't done a POST in a while. Of course we already have cases like this, even in the MediaWiki API. But trying to stick to the standards here isn't just a matter of theory trumping user requests; it's a matter of saving ourselves future pain when we debug performance and availability issues across our infrastructure as a whole, of which this application service is just one of many.
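To illustrate the principle (a rough, hypothetical Python sketch, not our actual Varnish/VCL or routing configuration):

```
# Hypothetical sketch: how an edge/routing layer decides cacheability and
# DC affinity purely from the HTTP method. POST is assumed to write, so it
# is never cached and the user gets stickied to the primary DC; GETs can be
# cached at the edge and spread across DCs.

CACHEABLE_METHODS = {"GET", "HEAD"}

def route_request(method, user_recently_posted):
    cacheable = method in CACHEABLE_METHODS
    if method == "POST" or user_recently_posted:
        datacenter = "primary"   # must see its own writes immediately
    else:
        datacenter = "nearest"   # safe to distribute readonly load
    return cacheable, datacenter

# A readonly SPARQL query sent as POST gets the worst of both worlds:
# no edge caching, and the user stays pinned to the primary DC afterwards.
print(route_request("POST", False))  # (False, 'primary')
print(route_request("GET", False))   # (True, 'nearest')
```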
AFAICS there is no general solution to this problem; you could call it a deficiency in the HTTP protocol itself. On the one hand, we have strong reasons to prefer that methods are used correctly (GET for readonly, idempotent operations; POST for non-idempotent operations that modify server-side data). On the other hand, we face a serious restriction on the amount of data that can be passed in a GET query string, which can only be realistically worked around with long POST body data. There's no right answer, but I think the answers which make use of GET are the better ones here, because they preserve the meaning of the methods. Most of the possible GET workarounds boil down to:

1. Find a way to keep the query strings under 2K (which should be reasonable for a whole lot of data models!). For this you can encode queries better (e.g. use short key names instead of long textual ones), compress the query strings prior to encoding, etc. (see the compression sketch after this list). Often it's simply a matter of a bad data model, or a very inefficient encoding of that data model into the URI. For cases where the basic entropy of the possible query strings will always exceed 2K even after efficient encoding and compression (which is probably always going to be the case for an open-ended, generic query language like SPARQL?), you can look at the other two options:

2. Make it a two-phase operation: a POST of a large query to "save" the query itself for that user/session under some label/index, followed by one or more GETs which reference the saved query and actually execute it for results (see the two-phase sketch below). Persisting these queries (as opposed to just nailing a POST in front of every GET) would be ideal, as would letting users share/reuse common queries where that makes sense.

3. Use headers. Header limits are generally higher than URI limits, and the limit is usually tunable server-side, so you could for instance put the bulk of the data in a request header like `X-SPARQL: ....` (and still apply the techniques from (1) to keep its size in check); see the header sketch below.

The idea of compressing away comments and indents isn't "bad practice" - it would in fact be very good practice in this case. HTTP URIs are not text editors or source-code storage; at this layer they're essentially just the wire protocol for transmitting a compressible idea. If there's a problem with client tooling, we can submit patches to those client projects and/or nudge them in the direction of these arguments.
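For option 1, a rough Python sketch of the compress-and-encode idea; the endpoint and the `q=` parameter name are invented for illustration, not an existing API:

```
# Option 1 sketch: strip comments/indentation, deflate-compress and
# base64url-encode the SPARQL text so more query fits under a ~2K URI cap.
import base64
import re
import zlib
from urllib.parse import urlencode

query = """
# items with an English label (full-line comment; stripped below)
SELECT ?item ?label WHERE {
  ?item rdfs:label ?label .
  FILTER(LANG(?label) = "en")
} LIMIT 10
"""

def pack(sparql):
    no_comments = re.sub(r"^\s*#.*$", "", sparql, flags=re.M)  # drop comment lines
    minified = " ".join(no_comments.split())                   # collapse whitespace
    compressed = zlib.compress(minified.encode("utf-8"), 9)
    return base64.urlsafe_b64encode(compressed).decode("ascii")

packed = pack(query)
url = "https://query.example.org/sparql?" + urlencode({"q": packed})
print(len(query), "->", len(packed), "characters of query payload")
```

The same pack() step also helps with option 3, since header space is larger but still finite.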
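For option 2, a sketch of the two-phase flow against a hypothetical API (the /queries and /queries/<id>/results paths and the JSON "id" field are made up for illustration):

```
# Option 2 sketch: one POST persists the large query body, then short,
# idempotent GETs execute it; those GETs are cacheable and shareable.
import requests

BASE = "https://query.example.org"
sparql = "SELECT ?item WHERE { ?item wdt:P31 wd:Q5 } LIMIT 10"

# Phase 1: save the query (non-idempotent; routed to the primary DC, uncached)
resp = requests.post(BASE + "/queries", data=sparql,
                     headers={"Content-Type": "application/sparql-query"})
query_id = resp.json()["id"]

# Phase 2: execute by reference (idempotent GET with a tiny URI)
results = requests.get("{}/queries/{}/results".format(BASE, query_id),
                       params={"format": "json"})
print(results.json())
```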
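And for option 3, a sketch of carrying the query in a header; `X-SPARQL` and the endpoint are illustrative only, and server-side header size limits would still need to be tuned up accordingly:

```
# Option 3 sketch: keep the request a plain GET, but move the bulk of the
# data out of the URI and into a request header.
import requests

sparql = " ".join("""
SELECT ?item ?label WHERE {
  ?item rdfs:label ?label .
  FILTER(LANG(?label) = "en")
} LIMIT 10
""".split())  # minify as in option 1 to stay within header limits

resp = requests.get("https://query.example.org/sparql",
                    headers={"X-SPARQL": sparql})
print(resp.status_code)
```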