Todd Lipcon has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/13299 )

Change subject: HS2 + HTTP(S) + BASIC/LDAP based thrift server endpoint
......................................................................


Patch Set 2:

(17 comments)

http://gerrit.cloudera.org:8080/#/c/13299/2//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/13299/2//COMMIT_MSG@21
PS2, Line 21:  Before this patch is committed, the intention is
            : to submit the changes to those files that are shown in this 
review as
            : a patch on Impala's native-toolchain Thrift.
It seems to me like it would be easier to just copy-paste this, since the 
Thrift Protocol/Transport interfaces are typically pretty stable across Thrift 
versions. Or, contribute the changes upstream to Thrift itself. Carrying a 
largeish patch in the context of native-toolchain seems kinda like worst of 
both worlds to me.


http://gerrit.cloudera.org:8080/#/c/13299/2//COMMIT_MSG@28
PS2, Line 28: sends a huge payload that can potentially crash the server.
yea, assuming this is meant to be exposed to the internet, we might want to 
consider fuzz testing the pre-authenticated endpoint in an ASAN build.


http://gerrit.cloudera.org:8080/#/c/13299/2/be/src/rpc/authentication.cc
File be/src/rpc/authentication.cc:

http://gerrit.cloudera.org:8080/#/c/13299/2/be/src/rpc/authentication.cc@502
PS2, Line 502:     LOG(ERROR) << "Failed to determine buffer length to decode 
base64 auth string.";
for all of the log messages in this function, can we include the remote socket 
address?


http://gerrit.cloudera.org:8080/#/c/13299/2/be/src/rpc/authentication.cc@505
PS2, Line 505:   char out[out_max];
stack allocating here seems quite dangerous without constraining the length (eg 
to 1kb or something)


http://gerrit.cloudera.org:8080/#/c/13299/2/be/src/rpc/authentication.cc@507
PS2, Line 507:   if (!Base64Decode(base64, in_len, out_max, out, &out_len)) {
Looking at Base64Decode, it doesn't seem to null-terminate the output, but down 
a few lines you're using 'out' as if it's a null-terminated C string.

I'd feel safer about all this code if we used C++ strings. gutil has this 
function:

bool Base64Unescape(const char* src, int slen, string* dest);

which will take care of all the buffer sizing stuff for you


http://gerrit.cloudera.org:8080/#/c/13299/2/be/src/rpc/authentication.cc@521
PS2, Line 521:   size_t pass_len = out_len - user_len - 1;
again I'd feel safer about the above code if we used C++ strings, like:

pair<string,string> u_p = strings::Split(up_string, 
strings::delimiter::Limit(":", 1));

(I can't necessarily spot a bug in your C string manipulation, but I also know 
that the above won't have a bug)


http://gerrit.cloudera.org:8080/#/c/13299/2/be/src/rpc/authentication.cc@957
PS2, Line 957: SetConnectionUsername
this function is a bit weirdly named, since in the HTTP case, it isn't setting 
the connection username a tall, but rather setting up a hook that will later 
set a connection username. Maybe a moot point based on my comment elsewhere 
that this needs to be per-request state rather than per-connection state for 
HTTP


http://gerrit.cloudera.org:8080/#/c/13299/2/be/src/rpc/authentication.cc@977
PS2, Line 977:       const string& err = Substitute("Bad transport type: $0", 
underlying_transport_type);
can we just LOG(FATAL) here since it would be a coding bug?


http://gerrit.cloudera.org:8080/#/c/13299/2/be/src/rpc/authentication.cc@998
PS2, Line 998:       return Status::Expected(err);
same


http://gerrit.cloudera.org:8080/#/c/13299/2/be/src/service/impala-server.cc
File be/src/service/impala-server.cc:

http://gerrit.cloudera.org:8080/#/c/13299/2/be/src/service/impala-server.cc@2424
PS2, Line 2424:     if (hs2_http_port > 0) {
Maybe we should use '-1' to mean disable? port 0 usually means "use an 
ephemeral port'


http://gerrit.cloudera.org:8080/#/c/13299/2/be/src/transport/THttpServer.h
File be/src/transport/THttpServer.h:

http://gerrit.cloudera.org:8080/#/c/13299/2/be/src/transport/THttpServer.h@71
PS2, Line 71: std::vector<uint8_t>
that's a bit of an odd choice of type instead of std::string


http://gerrit.cloudera.org:8080/#/c/13299/2/be/src/transport/THttpServer.h@80
PS2, Line 80:   THttpServerTransportFactory(bool requireBasicAuth = false)
explicit


http://gerrit.cloudera.org:8080/#/c/13299/2/be/src/transport/THttpServer.cpp
File be/src/transport/THttpServer.cpp:

http://gerrit.cloudera.org:8080/#/c/13299/2/be/src/transport/THttpServer.cpp@68
PS2, Line 68: strncmp
probably needs to be made case-insensitive (odd that x-forwarded-for is not 
using THRIFT_strncasecmp).


http://gerrit.cloudera.org:8080/#/c/13299/2/be/src/transport/THttpServer.cpp@69
PS2, Line 69: 7
probably need to also check that sz >= 7, otherwise we might read past the end 
of value

(one of the advantages of pulling THttpServer into Impala code itself instead 
of trying to patch Thrift woudl be we could use nice convenience functions like 
HasPrefixString() or TryStripPrefix from gutil here, which is much less 
error-prone


http://gerrit.cloudera.org:8080/#/c/13299/2/be/src/transport/THttpServer.cpp@76
PS2, Line 76:       base64AuthString_.resize(strlen(base64));
            :       strcpy(reinterpret_cast<char*>(base64AuthString_.data()), 
base64);
is base64AuthString_ were a string you could just use a simple assignment here


http://gerrit.cloudera.org:8080/#/c/13299/2/be/src/transport/THttpServer.cpp@79
PS2, Line 79:       if (strcmp(base64, 
reinterpret_cast<char*>(base64AuthString_.data())) != 0) {
I think it's worth considering this a bit carefully. It seems that the design 
here is still assuming a single remote user per HTTP connection, and this code 
will reject the user with "unauthorized" if the credentials change. This may 
cause problems if Impala's put behind an HTTP load balancer, since it's quite 
common for the load balancer to hold persistent connections to the backend 
(Impala) and multiplex multiple end user requests across those connections.

It seems you're already resetting the 'authorized_' state after each set of 
headers, so we won't leak the authencation status across users, but rejecting 
here if the header changes seems like it will end up sending back a "401 
Unauthorized" to a user who actually has valid credentials if their request got 
multiplexed onto someone else's connection.

Of course, fixing this means we have to rethink the use of "connection context' 
for storing authentication information and instead use something closer to a 
'request context'. That might require a bit more surgery but I think we may 
have to go that way.

An alternative which might be quicker to implement "for now" would be to turn 
off HTTP keepalive support by sending a 'Connection: close' with each request. 
That will hurt performance a bit since every request will require a new TCP and 
TLS handshake, restarting window scaling, etc, but if we need something 
implemented super quick might be an option worth trying.


http://gerrit.cloudera.org:8080/#/c/13299/2/be/src/transport/THttpServer.cpp@193
PS2, Line 193:     << "WWW-Authenticate: Basic" << CRLF;
in theory shouldn't there be two CRLF after the last header?

also, RFC7235 says that you need to specify a realm= parameter, eg:
      WWW-Authenticate: Basic realm="WallyWorld"



--
To view, visit http://gerrit.cloudera.org:8080/13299
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic5569ac62ef3af2868b5d0581f5029dac736b2ff
Gerrit-Change-Number: 13299
Gerrit-PatchSet: 2
Gerrit-Owner: Thomas Marshall <[email protected]>
Gerrit-Reviewer: Bharath Vissapragada <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Michael Ho <[email protected]>
Gerrit-Reviewer: Mike Yoder <[email protected]>
Gerrit-Reviewer: Sudhanshu Arora <[email protected]>
Gerrit-Reviewer: Todd Lipcon <[email protected]>
Gerrit-Comment-Date: Fri, 17 May 2019 04:32:44 +0000
Gerrit-HasComments: Yes

Reply via email to