Hello Alexey Serbin, Andrew Wong,
I'd like you to do a code review. Please visit
http://gerrit.cloudera.org:8080/14573
to review the following change.
Change subject: webserver: add support for Knox URL rewriting
......................................................................
webserver: add support for Knox URL rewriting
This patch modifies the web UI to support proxying by Apache Knox. When Kudu
is deployed with Knox, the Kudu server is expected to be firewalled off and
all web UI access is mediated by the Knox gateway. Precisely how this works
is best illustrated via example. Suppose we want to access a web UI at
foo.bar.com:8051 via Knox running on localhost. Instead of accessing:
http://foo.bar.com:8051/varz
We must access:
https://localhost:8443/gateway/test/kuduui/varz?scheme=http&host=foo.bar.com&port=8051
Let's break this down:
- localhost:8443 is the location of the Knox gateway.
- The gateway/test/kuduui subpath is part of the Knox topology definition,
which tells Knox that we're interested in accessing a Kudu web UI in the
'test' topology.
- Ultimately we're interested in the /varz page in the Kudu web UI.
- Because there's a web UI in each Kudu process, the query parameters tell
Knox which Kudu server we're interested in, and how we want to access it.
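Here is a minimal C++ sketch of the mapping in the example above. It assumes the page has no query string of its own. `MakeKnoxUrl` is a hypothetical helper and not part of the patch. The gateway address and topology subpath come from the example and will differ per deployment.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: builds the Knox-proxied URL for one page of one Kudu
// web UI. Assumes `page` has no query string of its own.
std::string MakeKnoxUrl(const std::string& gateway,        // e.g. "https://localhost:8443"
                        const std::string& topology_path,  // e.g. "/gateway/test/kuduui"
                        const std::string& scheme,         // how Knox should reach Kudu
                        const std::string& host,           // which Kudu server
                        int port,
                        const std::string& page) {         // e.g. "/varz"
  return gateway + topology_path + page +
         "?scheme=" + scheme +
         "&host=" + host +
         "&port=" + std::to_string(port);
}
```

If a page already carries query parameters, the Knox parameters would have to be appended with `&` instead of `?`. The sketch does not handle that case.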
When Knox receives this HTTP request, it rewrites it to use the simpler form
and sends it to Kudu. That's only half of the work though; Knox must also
rewrite the HTTP response because all of the links in the HTML were created
by a web UI unaware of its firewalled state. By way of example, if we
kept a partial URL like /varz intact, the client would try to access
https://localhost:8443/varz when following the link. So Knox needs to
rewrite /varz into the "long" form described above. URLs pointing to other Kudu
servers (e.g. http://baz.bar.com:8051/) must also be rewritten because the
client can't access those servers directly.
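Knox performs this rewriting itself, driven by its service definition. The C++ below only illustrates the intended transformation and is not Knox's code. It makes one hypothetical assumption: any absolute URL with an explicit numeric port points at another Kudu server. Every other link is left alone. `Origin`, `ToKnoxForm` and `RewriteLink` are made-up names.

```cpp
#include <cassert>
#include <string>

// Where a Kudu web UI lives, as seen from the Knox gateway.
struct Origin {
  std::string scheme;
  std::string host;
  int port;
};

// Builds the "long" form: knox_base + path + "?scheme=...&host=...&port=...".
std::string ToKnoxForm(const std::string& knox_base, const Origin& o,
                       const std::string& path) {
  return knox_base + path + "?scheme=" + o.scheme + "&host=" + o.host +
         "&port=" + std::to_string(o.port);
}

// Rewrites one link found in a page served by `page_origin`.
std::string RewriteLink(const std::string& knox_base, const Origin& page_origin,
                        const std::string& link) {
  // Root-relative link such as "/varz": it belongs to the server that served the page.
  if (!link.empty() && link[0] == '/' && (link.size() == 1 || link[1] != '/')) {
    return ToKnoxForm(knox_base, page_origin, link);
  }
  // Absolute link such as "http://baz.bar.com:8051/": split scheme, host, port, path.
  const std::string kSep = "://";
  const auto scheme_end = link.find(kSep);
  if (scheme_end == std::string::npos) return link;
  const auto host_start = scheme_end + kSep.size();
  const auto path_start = link.find('/', host_start);
  const std::string authority = link.substr(
      host_start, path_start == std::string::npos ? std::string::npos
                                                  : path_start - host_start);
  const auto colon = authority.rfind(':');
  if (colon == std::string::npos) return link;  // no explicit port: assume external
  const std::string port = authority.substr(colon + 1);
  if (port.empty() || port.size() > 5 ||
      port.find_first_not_of("0123456789") != std::string::npos) {
    return link;  // not a plausible numeric port (e.g. a bare IPv6 literal)
  }
  const Origin target{link.substr(0, scheme_end), authority.substr(0, colon),
                      std::stoi(port)};
  const std::string path =
      path_start == std::string::npos ? "/" : link.substr(path_start);
  return ToKnoxForm(knox_base, target, path);
}
```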
So how do we facilitate all of this? The first part is a KUDUUI service
definition in Knox. The definition uses pattern matching to identify which
web UI URLs need to be rewritten and how. Unfortunately, the matching isn't
robust enough to match every root-relative URL ("/.*", including "/" itself),
so we need to help it out[1].
When we detect a request proxied by Knox, we prepend a special identifier to
all non-external links in the response. The KUDUUI service definition
searches for this identifier and rewrites all URLs that include it.
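The description above does not spell out the identifier or how a Knox request is detected, so the sketch below invents both. The marker `/KNOX-BASE` and the check for an `X-Forwarded-Context` request header are assumptions. The sketch also assumes the marker reaches templates through the top-level `{{base_url}}` variable mentioned in the cpp-mustache note. Under that assumption, a template link like `{{base_url}}/tablets` carries the marker only when the request is proxied.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical marker prepended to non-external links.
const std::string kKnoxMarker = "/KNOX-BASE";

// Hypothetical detection rule: the request is proxied if an
// X-Forwarded-Context header is present. (Header names are case-insensitive
// in HTTP; this lookup is case-sensitive for brevity.)
bool IsProxiedByKnox(const std::map<std::string, std::string>& headers) {
  return headers.count("X-Forwarded-Context") > 0;
}

// Value for the top-level {{base_url}} template variable.
std::string BaseUrl(bool proxied) {
  return proxied ? kKnoxMarker : std::string();
}

// How a template link such as <a href="{{base_url}}/tablets"> would render.
std::string RenderLink(bool proxied, const std::string& path) {
  return BaseUrl(proxied) + path;
}
```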
Almost everything in this patch either directly or indirectly facilitates
that work. Other interesting things going on:
- cpp-mustache was upgraded. Todd added some patches to recursively resolve
a variable through all parent contexts. This is necessary if we're to find
{{base_url}} at the top-level JSON context regardless of where it's used.
The library now depends on a C++11-compliant compiler.
- If we're responding to Knox, we need to avoid URL-encoding any query
parameter values, because Knox (for reasons that aren't clear) encodes them
itself when it rewrites HTTP responses.
- Standing up a Knox gateway is difficult, and given that Knox integration
is quite ancillary to core Kudu, I didn't think that implementing a
"MiniKnox" made sense. Instead, I wrote a new test that crawls all web UIs
in a mini cluster and tests all links in all pages. I used the Gumbo HTML
parser (added to thirdparty in previous patches) to simplify this work.
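The following toy C++ model illustrates the cpp-mustache change. It resolves a variable recursively by searching a stack of contexts from the innermost section outward. This is not cpp-mustache's code or API. `Context` and `Resolve` are hypothetical names, and real contexts hold JSON values rather than strings.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Toy model of a mustache context stack: front() is the top-level JSON
// object, back() is the innermost section being rendered.
using Context = std::map<std::string, std::string>;

// Looks `name` up in the innermost context first, then in each parent, so a
// top-level variable such as {{base_url}} is found from any nesting depth.
// Returns nullptr if no context defines it.
const std::string* Resolve(const std::vector<Context>& stack,
                           const std::string& name) {
  for (auto it = stack.rbegin(); it != stack.rend(); ++it) {
    auto found = it->find(name);
    if (found != it->end()) return &found->second;
  }
  return nullptr;
}
```

Searching innermost-first preserves mustache's usual shadowing: a name defined in a nested section hides the same name in an outer one.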
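The query-parameter rule can be sketched as follows. `PercentEncode` escapes every byte outside the RFC 3986 unreserved set. `EncodeQueryValue` and its `proxied_by_knox` flag are hypothetical names. The sketch assumes the point of skipping the encoding is to keep values from being encoded twice, once by Kudu and once by Knox.

```cpp
#include <cassert>
#include <string>

// Percent-encodes every byte outside the RFC 3986 unreserved set
// (ALPHA / DIGIT / "-" / "." / "_" / "~").
std::string PercentEncode(const std::string& in) {
  static const char kHex[] = "0123456789ABCDEF";
  std::string out;
  for (unsigned char c : in) {
    const bool unreserved = (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') ||
                            (c >= '0' && c <= '9') || c == '-' || c == '.' ||
                            c == '_' || c == '~';
    if (unreserved) {
      out += static_cast<char>(c);
    } else {
      out += '%';
      out += kHex[c >> 4];
      out += kHex[c & 0x0F];
    }
  }
  return out;
}

// Hypothetical: leave values raw for Knox, which encodes them itself while
// rewriting the response; encode them for clients talking to Kudu directly.
std::string EncodeQueryValue(const std::string& value, bool proxied_by_knox) {
  return proxied_by_knox ? value : PercentEncode(value);
}
```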
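At its core, the crawl test is a breadth-first traversal that follows every internal link and reports targets that can't be fetched. To stay self-contained, the sketch below replaces HTTP with an in-memory map and replaces Gumbo with a regular expression. `Site`, `ExtractLinks` and `FindBrokenLinks` are hypothetical names, not the test's actual code.

```cpp
#include <cassert>
#include <map>
#include <queue>
#include <regex>
#include <set>
#include <string>
#include <vector>

// Stand-in for fetching pages over HTTP: maps a path to its HTML body.
using Site = std::map<std::string, std::string>;

// Extracts double-quoted href targets with a regex (a stand-in for Gumbo).
std::vector<std::string> ExtractLinks(const std::string& html) {
  static const std::regex kHref("href=\"([^\"]*)\"");
  std::vector<std::string> links;
  for (std::sregex_iterator it(html.begin(), html.end(), kHref), end;
       it != end; ++it) {
    links.push_back((*it)[1].str());
  }
  return links;
}

// Breadth-first crawl from `start` over root-relative links. Returns each
// missing target once; external links are skipped.
std::vector<std::string> FindBrokenLinks(const Site& site,
                                         const std::string& start) {
  std::vector<std::string> broken;
  std::set<std::string> visited{start};
  std::queue<std::string> pending;
  pending.push(start);
  while (!pending.empty()) {
    const std::string page = pending.front();
    pending.pop();
    auto it = site.find(page);
    if (it == site.end()) {
      broken.push_back(page);
      continue;
    }
    for (const std::string& link : ExtractLinks(it->second)) {
      if (link.empty() || link[0] != '/') continue;  // skip external links
      if (visited.insert(link).second) pending.push(link);
    }
  }
  return broken;
}
```

The regex works for this toy input, but it misses single-quoted and unquoted attributes. That is one reason a real HTML parser is the better choice for the actual test.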
What doesn't work?
- The /config page references a font glyph that can't be proxied, because
the link to the font is embedded in bootstrap.min.css and we don't rewrite
links in CSS files. The effect is a small box instead of a lock icon, and
an ugly error in the Knox logs.
- Similarly, the /tracing.html page (and its JavaScript) can't be proxied
because it's not a template and can't easily be made into one. Clients who
wish to use tracing will need to set up an SSH tunnel in order to do so.
1. Another approach is to add host/port info to all non-external links.
That's what Impala did in IMPALA-8897, and it's nice in that it minimizes
URL rewriting in HTTP responses. But it's also fraught in that Impala
doesn't always know its own hostname.
Change-Id: Iee92cb094b81609356acf858b7c549b6c281a7e5
---
M src/kudu/integration-tests/CMakeLists.txt
A src/kudu/integration-tests/webserver-crawl-itest.cc
M src/kudu/master/master_path_handlers.cc
M src/kudu/server/webserver.cc
M src/kudu/server/webserver.h
M src/kudu/util/CMakeLists.txt
M src/kudu/util/thread.cc
A src/kudu/util/web_callback_registry.cc
M src/kudu/util/web_callback_registry.h
M thirdparty/build-definitions.sh
M thirdparty/vars.sh
M www/dashboards.mustache
M www/home.mustache
M www/log-anchors.mustache
M www/scans.mustache
M www/table.mustache
M www/tables.mustache
M www/tablet-rowsetlayout-svg.mustache
M www/tablet-servers.mustache
M www/tablet.mustache
M www/tablets.mustache
M www/threadz.mustache
22 files changed, 436 insertions(+), 63 deletions(-)
git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/73/14573/1
--
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: Iee92cb094b81609356acf858b7c549b6c281a7e5
Gerrit-Change-Number: 14573
Gerrit-PatchSet: 1
Gerrit-Owner: Adar Dembo
Gerrit-Reviewer: Alexey Serbin
Gerrit-Reviewer: Andrew Wong