Hello Tidy Bot, Alexey Serbin, Kudu Jenkins, Andrew Wong,

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/14573

to look at the new patch set (#3).

Change subject: webserver: add support for Knox URL rewriting
......................................................................

webserver: add support for Knox URL rewriting

This patch modifies the web UI to support proxying by Apache Knox. When Kudu
is deployed with Knox, the Kudu server is expected to be firewalled off and
all web UI access is mediated by the Knox gateway. Precisely how this works
is best illustrated via example. Suppose we want to access a web UI at
foo.bar.com:8051 via Knox running on localhost. Instead of accessing:

  http://foo.bar.com:8051/varz

We must access:

  https://localhost:8443/gateway/test/kuduui/varz?scheme=http&host=foo.bar.com&port=8051

Let's break this down:
- localhost:8443 is the location of the Knox gateway.
- The gateway/test/kuduui subpath is part of the Knox topology definition,
  which tells Knox that we're interested in accessing a Kudu web UI in the
  'test' topology.
- Ultimately we're interested in the /varz page in the Kudu web UI.
- Because there's a web UI in each Kudu process, the query parameters tell
  Knox which Kudu server we're interested in, and how we want to access it.
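
To make the breakdown concrete, here is a rough Python sketch that maps the
gateway URL back to the direct URL it stands for. It is only a reading of the
example above, not Knox's actual rewrite logic; the function name
knox_to_direct and the hard-coded topology prefix are assumptions made for
illustration.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def knox_to_direct(knox_url, topology_prefix="/gateway/test/kuduui"):
    """Map a gateway URL to the direct Kudu web UI URL (illustrative only)."""
    parts = urlsplit(knox_url)
    if not parts.path.startswith(topology_prefix):
        raise ValueError("not a kuduui URL for this topology")

    query = parse_qs(parts.query)
    # scheme/host/port select the Kudu server; any other parameters are
    # assumed to belong to the page itself and are passed through.
    scheme = query.pop("scheme")[0]
    host = query.pop("host")[0]
    port = query.pop("port")[0]

    # Whatever follows the topology prefix is the page we ultimately want.
    path = parts.path[len(topology_prefix):] or "/"
    rest = urlencode(query, doseq=True)
    direct = f"{scheme}://{host}:{port}{path}"
    return direct + ("?" + rest if rest else "")


print(knox_to_direct(
    "https://localhost:8443/gateway/test/kuduui/varz"
    "?scheme=http&host=foo.bar.com&port=8051"))
```

With the URL from the example, this yields http://foo.bar.com:8051/varz.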

When Knox receives this HTTP request, it rewrites it to use the simpler form
and sends it to Kudu. That's only half of the work though; Knox must also
rewrite the HTTP response because all of the links in the HTML were created
by a web UI unaware of its firewalled state. By way of example, if we
kept a partial URL like /varz intact, the client would try to access
https://localhost:8443/varz when following the link. So Knox needs to
rewrite /varz into the "long" form described above. URLs pointing to other Kudu
servers (e.g. http://baz.bar.com:8051/) must also be rewritten because the
client can't access those servers directly.
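
The response-side rewrite can be sketched the same way. The Python below is
illustrative only: to_knox_url and the GATEWAY constant are hypothetical
names, and I'm assuming that links to other Kudu servers get the same "long"
form as partial URLs do.

```python
from urllib.parse import urlencode, urljoin, urlsplit

# Assumed gateway location and topology path, taken from the example URL.
GATEWAY = "https://localhost:8443/gateway/test/kuduui"


def to_knox_url(link, origin, gateway=GATEWAY):
    """Rewrite a link found in a page served by `origin` into the long form."""
    # Resolve partial URLs such as /varz against the server that produced
    # the page; absolute URLs (other Kudu servers) are left as they are.
    target = urlsplit(urljoin(origin, link))
    port = target.port or (443 if target.scheme == "https" else 80)
    params = urlencode(
        {"scheme": target.scheme, "host": target.hostname, "port": port})
    url = f"{gateway}{target.path or '/'}?{params}"
    # Keep any query string the original link carried.
    return url + ("&" + target.query if target.query else "")


origin = "http://foo.bar.com:8051/"
print(to_knox_url("/varz", origin))
print(to_knox_url("http://baz.bar.com:8051/", origin))
```

The first call reproduces the long URL shown earlier; the second points at
baz.bar.com through the same gateway.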

So how do we facilitate all of this? The first part is a KUDUUI service
definition in Knox. The definition uses pattern matching to identify which
web UI URLs need to be rewritten and how. Unfortunately, the matching isn't
robust enough to match "/.*" (including "/"). So we need to help it out[1].
When we detect a request proxied by Knox, we prepend a special identifier to
all non-external links in the response. The KUDUUI service definition
searches for this identifier and rewrites all URLs that include it.
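
A minimal Python sketch of the identifier scheme, under stated assumptions:
the MARKER value, the proxied_by_knox flag, and both function names are
hypothetical and stand in for whatever the patch and the KUDUUI service
definition actually use.

```python
import re

# Hypothetical identifier; the real value is whatever the patch chose.
MARKER = "/PROXY-MARKER"


def internal_link(path, proxied_by_knox):
    """Server side: mark non-external links only when Knox is proxying."""
    return (MARKER if proxied_by_knox else "") + path


def gateway_rewrite(html, gateway, scheme, host, port):
    """Gateway side: find marked links and expand them to the long form.

    Simplification: links that already carry a query string are not handled.
    """
    pattern = re.compile(re.escape(MARKER) + r'(/[^"\s?]*)')
    suffix = f"?scheme={scheme}&host={host}&port={port}"
    return pattern.sub(lambda m: gateway + m.group(1) + suffix, html)


page = f'<a href="{internal_link("/varz", True)}">varz</a>'
print(gateway_rewrite(
    page, "https://localhost:8443/gateway/test/kuduui",
    "http", "foo.bar.com", 8051))
```

The point of the marker is that the gateway only has to match one fixed
prefix rather than every possible path; external links never carry it and
so pass through untouched.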

Almost everything in this patch either directly or indirectly facilitates
that work. Other interesting things going on:
- cpp-mustache was upgraded. Todd added some patches to recursively resolve
  a variable through all parent contexts. This is necessary if we're to find
  {{base_url}} at the top-level JSON context regardless of where it's used.
  The library now depends on a C++11-compliant compiler.
- If we're responding to Knox, we need to avoid URL-encoding any query
  parameter values, because for some reason Knox does this on its own when
  it rewrites HTTP responses.
- Standing up a Knox gateway is difficult, and given that Knox integration
  is quite ancillary to core Kudu, I didn't think that implementing a
  "MiniKnox" made sense. Instead, I wrote a new test that crawls all web UIs
  in a mini cluster and tests all links in all pages. I used the Gumbo HTML
  parser (added to thirdparty in previous patches) to simplify this work.
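
For a feel of what the crawling test does, here is a small Python sketch of
the idea. It is not the test in this patch (which is C++ and uses Gumbo);
LinkCollector, crawl, the injected fetch callback, and the cluster_hosts
parameter are all hypothetical, and Python's stdlib html.parser stands in for
Gumbo.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit


class LinkCollector(HTMLParser):
    """Collects the href/src attribute values of every tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src") and value:
                self.links.append(value)


def crawl(start_url, fetch, cluster_hosts):
    """Visit every page reachable from start_url within cluster_hosts.

    fetch(url) must return a (status, body) pair; it is injected so that the
    sketch can run without a live cluster. Returns (visited, broken).
    """
    seen, queue, broken = set(), [start_url], []
    while queue:
        url = queue.pop()
        if url in seen:
            continue
        seen.add(url)
        status, body = fetch(url)
        if status != 200:
            broken.append(url)
            continue
        parser = LinkCollector()
        parser.feed(body)
        for link in parser.links:
            target = urljoin(url, link)
            # Only follow links that stay inside the cluster.
            if urlsplit(target).netloc in cluster_hosts:
                queue.append(target)
    return seen, broken
```

A test built on this would assert that the broken list is empty after
crawling from each server's root page.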

What doesn't work?
- The /config page references a font glyph that can't be proxied, because
  the link to the font is embedded in bootstrap.min.css and we don't rewrite
  links in CSS files. The effect is a small box instead of a lock icon, and
  an ugly error in the Knox logs.
- Similarly, the /metrics.html and /tracing.html pages (and JS) can't be
  proxied because they're not templates and can't easily be made into
  templates. Clients who wish to use them will need to set up an SSH tunnel
  in order to do so.

1. Another approach is to add host/port info to all non-external links.
   That's what Impala did in IMPALA-8897, and it's nice in that it minimizes
   URL rewriting in HTTP responses. But it's also fraught in that Impala
   doesn't always know its own hostname.

Change-Id: Iee92cb094b81609356acf858b7c549b6c281a7e5
---
M src/kudu/integration-tests/CMakeLists.txt
M src/kudu/integration-tests/registration-test.cc
A src/kudu/integration-tests/webserver-crawl-itest.cc
M src/kudu/master/master_path_handlers.cc
M src/kudu/server/webserver.cc
M src/kudu/server/webserver.h
M src/kudu/util/CMakeLists.txt
M src/kudu/util/thread.cc
A src/kudu/util/web_callback_registry.cc
M src/kudu/util/web_callback_registry.h
M thirdparty/build-definitions.sh
M thirdparty/vars.sh
M www/dashboards.mustache
M www/home.mustache
M www/log-anchors.mustache
M www/scans.mustache
M www/table.mustache
M www/tables.mustache
M www/tablet-rowsetlayout-svg.mustache
M www/tablet-servers.mustache
M www/tablet.mustache
M www/tablets.mustache
M www/threadz.mustache
23 files changed, 437 insertions(+), 64 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/73/14573/3
--
To view, visit http://gerrit.cloudera.org:8080/14573
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Iee92cb094b81609356acf858b7c549b6c281a7e5
Gerrit-Change-Number: 14573
Gerrit-PatchSet: 3
Gerrit-Owner: Adar Dembo <[email protected]>
Gerrit-Reviewer: Alexey Serbin <[email protected]>
Gerrit-Reviewer: Andrew Wong <[email protected]>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
