bd808 added a comment.
In T185637#3924302, @Hjfocs wrote:
Example workflow:
- the client fires a POST to https://pst.wmflabs.org/pst/curate;
- the back-end service processes the client request;
- the service fires an internal HTTP request (i.e., via http://10.68.22.221:9999) to the storage engine, in order to update some data;
- the storage engine responds to the service;
- the service sends the final response to the client.
Looking at the domain proxy setup, this translates to:
- the client fires a POST to https://pst.wmflabs.org/pst/curate;
  - client connects to domain proxy nginx instance via public IP (208.80.155.156)
    - Request includes Host: pst.wmflabs.org
  - nginx on domain proxy server (at this moment, novaproxy-01.project-proxy.eqiad.wmflabs) reads the HTTP request (the Host header) to determine the backend
  - nginx on domain proxy server constructs a proxied HTTP POST request to http://10.68.22.221:9999
    - This IP and port are selected because "pst.wmflabs.org" has been registered via Horizon to use this backing server and port.
    - 10.68.22.221 is routed to the pst.wikidata-primary-sources-tool.eqiad.wmflabs VM
- the back-end service processes the client request;
  - Blazegraph is running on port 9999 on pst.wikidata-primary-sources-tool.eqiad.wmflabs
  - Blazegraph receives the POST
- the service fires an internal HTTP request (i.e., via http://10.68.22.221:9999) to the storage engine, in order to update some data;
  - My knowledge of the stack stops here, but it sounds like Blazegraph talks to itself via HTTP?
- the storage engine responds to the service;
  - Again, outside my knowledge of the deep stack activities
- the service sends the final response to the client.
  - Blazegraph on pst.wikidata-primary-sources-tool.eqiad.wmflabs responds to the request from nginx on novaproxy-01.project-proxy.eqiad.wmflabs
  - nginx on novaproxy-01.project-proxy.eqiad.wmflabs responds to the client with the HTTP payload it received
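The backend selection step in the list above can be sketched roughly as follows. This is only an illustration of Host-based routing, not the actual domain proxy code: the `PROXY_REGISTRY` dict and the `select_backend` function are hypothetical names standing in for whatever lookup the proxy performs against the mappings registered via Horizon.

```python
from typing import Dict, Optional, Tuple

# Hypothetical registry: public hostname -> (backend IP, port).
# In the real setup this mapping is registered via Horizon; the dict
# here is only an illustration using the values from this task.
PROXY_REGISTRY: Dict[str, Tuple[str, int]] = {
    "pst.wmflabs.org": ("10.68.22.221", 9999),
}


def select_backend(
    host_header: str,
    registry: Dict[str, Tuple[str, int]] = PROXY_REGISTRY,
) -> Optional[str]:
    """Return the backend base URL for a Host header, or None if unregistered."""
    # Hostnames are case-insensitive and the header may carry an explicit
    # port ("pst.wmflabs.org:443"). IPv6 literals are not handled here.
    hostname = host_header.strip().lower().split(":", 1)[0]
    backend = registry.get(hostname)
    if backend is None:
        return None
    ip, port = backend
    return f"http://{ip}:{port}"
```

The point of the sketch is that the proxy only gets involved for requests that arrive with a registered public hostname; a request sent straight to 10.68.22.221:9999 never passes through this lookup.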
The problem: I'm unpredictably getting org.apache.http.NoHttpResponseException: 10.68.22.221:9999 failed to respond. Step 4 fails.
This seems to be raised by HTTP connections that get closed between step 1 and 3.
Sending a Connection: Keep-Alive; header in step 3 doesn't fix the problem, while upstream + keepalive works.
HTTP Keep-Alive would be a client-side connection pooling optimization for Blazegraph talking to Blazegraph. No nginx proxy sits in that conversation unless Blazegraph is actually talking to itself via the proxied hostname (pst.wmflabs.org) rather than directly via the 10.68.22.221:9999 IP and port. Using the public hostname instead of the direct IP or internal hostname seems potentially problematic, if only because of the additional and unnecessary latency and routing complexity.
Is there a part of this stack that I am misunderstanding, or is your initial inbound request actually to some other end point?
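One quick way to check which path the internal request takes is to look at the host in the URL the service is configured with. A minimal sketch, assuming the only proxy entry points are the public hostname and public IP mentioned in this task; `PROXY_ENTRY_POINTS` and `traverses_domain_proxy` are hypothetical names, not part of any existing tool:

```python
from urllib.parse import urlsplit

# Assumption for illustration: the addresses that lead to the domain
# proxy are the public hostname and the public IP quoted in this task.
PROXY_ENTRY_POINTS = {"pst.wmflabs.org", "208.80.155.156"}


def traverses_domain_proxy(url: str, entry_points=PROXY_ENTRY_POINTS) -> bool:
    """Return True if the URL's host is one of the known proxy entry points."""
    host = (urlsplit(url).hostname or "").lower()
    return host in entry_points


# Candidate URLs the service might use for its internal request.
for candidate in (
    "https://pst.wmflabs.org/pst/curate",
    "http://10.68.22.221:9999",
    "http://pst.wikidata-primary-sources-tool.eqiad.wmflabs:9999",
):
    path = "via nginx proxy" if traverses_domain_proxy(candidate) else "direct"
    print(f"{candidate}: {path}")
```

If the configured URL falls in the "direct" group, nginx keepalive settings on the proxy should not affect the Blazegraph-to-Blazegraph request at all.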
