Adar Dembo has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/14573 )
Change subject: webserver: add support for Knox URL rewriting
......................................................................

webserver: add support for Knox URL rewriting

This patch modifies the web UI to support proxying by Apache Knox. When Kudu is deployed with Knox, the Kudu server is expected to be firewalled off and all web UI access is mediated by the Knox gateway.

Precisely how this works is best illustrated via example. Suppose we want to access a web UI at foo.bar.com:8051 via Knox running on localhost. Instead of accessing:

  http://foo.bar.com:8051/varz

We must access:

  https://localhost:8443/gateway/test/kuduui/varz?scheme=http&host=foo.bar.com&port=8051

Let's break this down:
- localhost:8443 is the location of the Knox gateway.
- The gateway/test/kuduui subpath is part of the Knox topology definition, which tells Knox that we're interested in accessing a Kudu web UI in the 'test' topology.
- Ultimately we're interested in the /varz page in the Kudu web UI.
- Because there's a web UI in each Kudu process, the query parameters tell Knox which Kudu server we're interested in, and how we want to access it.

When Knox receives this HTTP request, it rewrites it to use the simpler form and sends it to Kudu. That's only half of the work though; Knox must also rewrite the HTTP response because all of the links in the HTML were created by a web UI unaware of its firewalled state. By way of example, if we kept a partial URL like /varz intact, the client would try to access https://localhost:8443/varz when following the link. So Knox needs to rewrite /varz into the "long" form described above. URLs pointing to other Kudu servers (e.g. http://baz.bar.com:8051/) must also be rewritten because the client can't access those servers directly.

So how do we do all of this? The first part is a KUDUUI service definition in Knox[1]. The definition uses pattern matching to identify which web UI URLs need to be rewritten and how.
Unfortunately, the matching isn't robust enough to match "/.*" (including "/"). So we need to help it out. When we detect a request proxied by Knox, we prepend a special identifier to all non-external links in the response. The KUDUUI service definition searches for this identifier and rewrites all URLs that include it.[2] Almost everything in this patch either directly or indirectly facilitates that work.

Other interesting things going on:
- cpp-mustache was upgraded. Todd added some patches to recursively resolve a variable through all parent contexts. This is necessary if we're to find {{base_url}} at the top-level JSON context regardless of where it's used. The library now depends on a C++11-compliant compiler.
- If we're responding to Knox, we need to avoid URL-encoding any query parameter values, because for some reason Knox does this on its own when it rewrites HTTP responses.
- Standing up a Knox gateway is difficult, and given that Knox integration is quite ancillary to core Kudu, I didn't think that implementing a "MiniKnox" made sense. Instead, I wrote a new test that crawls all web UIs in a mini cluster and tests all links in all pages. I used the Gumbo HTML parser (added to thirdparty in previous patches) to simplify this work.

What doesn't work?
- The /config page references a font glyph that can't be proxied, because the link to the font is embedded in bootstrap.min.css and we don't rewrite links in CSS files. The effect is a small box instead of a lock icon, and an ugly error in the Knox logs.
- Similarly, the /metrics.html and /tracing.html pages (and JS) can't be proxied because they're not templates and can't easily be made into templates. Clients who wish to use them will need to set up an SSH tunnel in order to do so.

1. See https://issues.apache.org/jira/browse/KNOX-2072 for details.
2. Another approach is to add host/port info to all non-external links.
   That's what Impala did in IMPALA-8897, and it's nice in that it minimizes URL rewriting in HTTP responses. But it's also fraught in that Impala doesn't always know its own hostname.

Change-Id: Iee92cb094b81609356acf858b7c549b6c281a7e5
Reviewed-on: http://gerrit.cloudera.org:8080/14573
Tested-by: Kudu Jenkins
Reviewed-by: Adar Dembo <[email protected]>
---
M src/kudu/integration-tests/CMakeLists.txt
M src/kudu/integration-tests/registration-test.cc
A src/kudu/integration-tests/webserver-crawl-itest.cc
M src/kudu/master/master_path_handlers.cc
M src/kudu/server/webserver.cc
M src/kudu/server/webserver.h
M src/kudu/tserver/tserver_path_handlers.cc
M src/kudu/util/CMakeLists.txt
M src/kudu/util/thread.cc
A src/kudu/util/web_callback_registry.cc
M src/kudu/util/web_callback_registry.h
M thirdparty/build-definitions.sh
M thirdparty/vars.sh
M www/dashboards.mustache
M www/home.mustache
M www/log-anchors.mustache
M www/scans.mustache
M www/table.mustache
M www/tables.mustache
M www/tablet-rowsetlayout-svg.mustache
M www/tablet-servers.mustache
M www/tablet.mustache
M www/tablets.mustache
M www/threadz.mustache
24 files changed, 463 insertions(+), 72 deletions(-)

Approvals:
  Kudu Jenkins: Verified
  Adar Dembo: Looks good to me, approved

--
To view, visit http://gerrit.cloudera.org:8080/14573

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: Iee92cb094b81609356acf858b7c549b6c281a7e5
Gerrit-Change-Number: 14573
Gerrit-PatchSet: 9
Gerrit-Owner: Adar Dembo <[email protected]>
Gerrit-Reviewer: Adar Dembo <[email protected]>
Gerrit-Reviewer: Alexey Serbin <[email protected]>
Gerrit-Reviewer: Andrew Wong <[email protected]>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Thomas Tauber-Marshall <[email protected]>
Gerrit-Reviewer: Tidy Bot (241)
