Todd Lipcon has posted comments on this change. ( http://gerrit.cloudera.org:8080/13299 )
Change subject: HS2 + HTTP(S) + BASIC/LDAP based thrift server endpoint ...................................................................... Patch Set 2: (17 comments) http://gerrit.cloudera.org:8080/#/c/13299/2//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/13299/2//COMMIT_MSG@21 PS2, Line 21: Before this patch is committed, the intention is : to submit the changes to those files that are shown in this review as : a patch on Impala's native-toolchain Thrift. It seems to me like it would be easier to just copy-paste this, since the Thrift Protocol/Transport interfaces are typically pretty stable across Thrift versions. Or, contribute the changes upstream to Thrift itself. Carrying a largeish patch in the context of native-toolchain seems kinda like worst of both worlds to me. http://gerrit.cloudera.org:8080/#/c/13299/2//COMMIT_MSG@28 PS2, Line 28: sends a huge payload that can potentially crash the server. yea, assuming this is meant to be exposed to the internet, we might want to consider fuzz testing the pre-authenticated endpoint in an ASAN build. http://gerrit.cloudera.org:8080/#/c/13299/2/be/src/rpc/authentication.cc File be/src/rpc/authentication.cc: http://gerrit.cloudera.org:8080/#/c/13299/2/be/src/rpc/authentication.cc@502 PS2, Line 502: LOG(ERROR) << "Failed to determine buffer length to decode base64 auth string."; for all of the log messages in this function, can we include the remote socket address? http://gerrit.cloudera.org:8080/#/c/13299/2/be/src/rpc/authentication.cc@505 PS2, Line 505: char out[out_max]; stack allocating here seems quite dangerous without constraining the length (eg to 1kb or something) http://gerrit.cloudera.org:8080/#/c/13299/2/be/src/rpc/authentication.cc@507 PS2, Line 507: if (!Base64Decode(base64, in_len, out_max, out, &out_len)) { Looking at Base64Decode, it doesn't seem to null-terminate the output, but down a few lines you're using 'out' as if it's a null-terminated C string. I'd feel safer about all this code if we used C++ strings. gutil has this function: bool Base64Unescape(const char* src, int slen, string* dest); which will take care of all the buffer sizing stuff for you http://gerrit.cloudera.org:8080/#/c/13299/2/be/src/rpc/authentication.cc@521 PS2, Line 521: size_t pass_len = out_len - user_len - 1; again I'd feel safer about the above code if we used C++ strings, like: pair<string,string> u_p = strings::Split(up_string, strings::delimiter::Limit(":", 1)); (I can't necessarily spot a bug in your C string manipulation, but I also know that the above won't have a bug) http://gerrit.cloudera.org:8080/#/c/13299/2/be/src/rpc/authentication.cc@957 PS2, Line 957: SetConnectionUsername this function is a bit weirdly named, since in the HTTP case, it isn't setting the connection username a tall, but rather setting up a hook that will later set a connection username. Maybe a moot point based on my comment elsewhere that this needs to be per-request state rather than per-connection state for HTTP http://gerrit.cloudera.org:8080/#/c/13299/2/be/src/rpc/authentication.cc@977 PS2, Line 977: const string& err = Substitute("Bad transport type: $0", underlying_transport_type); can we just LOG(FATAL) here since it would be a coding bug? http://gerrit.cloudera.org:8080/#/c/13299/2/be/src/rpc/authentication.cc@998 PS2, Line 998: return Status::Expected(err); same http://gerrit.cloudera.org:8080/#/c/13299/2/be/src/service/impala-server.cc File be/src/service/impala-server.cc: http://gerrit.cloudera.org:8080/#/c/13299/2/be/src/service/impala-server.cc@2424 PS2, Line 2424: if (hs2_http_port > 0) { Maybe we should use '-1' to mean disable? port 0 usually means "use an ephemeral port' http://gerrit.cloudera.org:8080/#/c/13299/2/be/src/transport/THttpServer.h File be/src/transport/THttpServer.h: http://gerrit.cloudera.org:8080/#/c/13299/2/be/src/transport/THttpServer.h@71 PS2, Line 71: std::vector<uint8_t> that's a bit of an odd choice of type instead of std::string http://gerrit.cloudera.org:8080/#/c/13299/2/be/src/transport/THttpServer.h@80 PS2, Line 80: THttpServerTransportFactory(bool requireBasicAuth = false) explicit http://gerrit.cloudera.org:8080/#/c/13299/2/be/src/transport/THttpServer.cpp File be/src/transport/THttpServer.cpp: http://gerrit.cloudera.org:8080/#/c/13299/2/be/src/transport/THttpServer.cpp@68 PS2, Line 68: strncmp probably needs to be made case-insensitive (odd that x-forwarded-for is not using THRIFT_strncasecmp). http://gerrit.cloudera.org:8080/#/c/13299/2/be/src/transport/THttpServer.cpp@69 PS2, Line 69: 7 probably need to also check that sz >= 7, otherwise we might read past the end of value (one of the advantages of pulling THttpServer into Impala code itself instead of trying to patch Thrift woudl be we could use nice convenience functions like HasPrefixString() or TryStripPrefix from gutil here, which is much less error-prone http://gerrit.cloudera.org:8080/#/c/13299/2/be/src/transport/THttpServer.cpp@76 PS2, Line 76: base64AuthString_.resize(strlen(base64)); : strcpy(reinterpret_cast<char*>(base64AuthString_.data()), base64); is base64AuthString_ were a string you could just use a simple assignment here http://gerrit.cloudera.org:8080/#/c/13299/2/be/src/transport/THttpServer.cpp@79 PS2, Line 79: if (strcmp(base64, reinterpret_cast<char*>(base64AuthString_.data())) != 0) { I think it's worth considering this a bit carefully. It seems that the design here is still assuming a single remote user per HTTP connection, and this code will reject the user with "unauthorized" if the credentials change. This may cause problems if Impala's put behind an HTTP load balancer, since it's quite common for the load balancer to hold persistent connections to the backend (Impala) and multiplex multiple end user requests across those connections. It seems you're already resetting the 'authorized_' state after each set of headers, so we won't leak the authencation status across users, but rejecting here if the header changes seems like it will end up sending back a "401 Unauthorized" to a user who actually has valid credentials if their request got multiplexed onto someone else's connection. Of course, fixing this means we have to rethink the use of "connection context' for storing authentication information and instead use something closer to a 'request context'. That might require a bit more surgery but I think we may have to go that way. An alternative which might be quicker to implement "for now" would be to turn off HTTP keepalive support by sending a 'Connection: close' with each request. That will hurt performance a bit since every request will require a new TCP and TLS handshake, restarting window scaling, etc, but if we need something implemented super quick might be an option worth trying. http://gerrit.cloudera.org:8080/#/c/13299/2/be/src/transport/THttpServer.cpp@193 PS2, Line 193: << "WWW-Authenticate: Basic" << CRLF; in theory shouldn't there be two CRLF after the last header? also, RFC7235 says that you need to specify a realm= parameter, eg: WWW-Authenticate: Basic realm="WallyWorld" -- To view, visit http://gerrit.cloudera.org:8080/13299 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic5569ac62ef3af2868b5d0581f5029dac736b2ff Gerrit-Change-Number: 13299 Gerrit-PatchSet: 2 Gerrit-Owner: Thomas Marshall <[email protected]> Gerrit-Reviewer: Bharath Vissapragada <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Reviewer: Michael Ho <[email protected]> Gerrit-Reviewer: Mike Yoder <[email protected]> Gerrit-Reviewer: Sudhanshu Arora <[email protected]> Gerrit-Reviewer: Todd Lipcon <[email protected]> Gerrit-Comment-Date: Fri, 17 May 2019 04:32:44 +0000 Gerrit-HasComments: Yes
