Mihaly Szjatinya has posted comments on this change. ( http://gerrit.cloudera.org:8080/23237 )
Change subject: IMPALA-14285: Add SAML2 authentication support for Coordinator Web UI ...................................................................... Patch Set 15: (23 comments) http://gerrit.cloudera.org:8080/#/c/23237/13//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/23237/13//COMMIT_MSG@31 PS13, Line 31: > Please add details on how this change was tested. Ack http://gerrit.cloudera.org:8080/#/c/23237/13/be/src/rpc/authentication-util.cc File be/src/rpc/authentication-util.cc: http://gerrit.cloudera.org:8080/#/c/23237/13/be/src/rpc/authentication-util.cc@263 PS13, Line 263: vector<string> split = Split(url, delimiter::Limit("/", 3)); > Please add ctests for this function. Added. http://gerrit.cloudera.org:8080/#/c/23237/13/be/src/rpc/authentication-util.cc@277 PS13, Line 277: bool ParseParams(std::map<string, string*>& params_to_check, const string& original, > Please add ctests for this function. Added. http://gerrit.cloudera.org:8080/#/c/23237/13/be/src/rpc/authentication-util.cc@280 PS13, Line 280: vector<string> key_value = Split(pair, delimiter::Limit("=", 1)); > What does it mean if key_value.size() is not 2? Is that an error situation It means the parameter doesn't have the standard key=value format. The code silently skips these parameters. P.S. The function was only moved by me from authentication.cc to share it with webserver.cc http://gerrit.cloudera.org:8080/#/c/23237/13/be/src/rpc/authentication-util.cc@326 PS13, Line 326: Thanks Csaba. I can only add that without load balancer multiple coordinators works fine for both http-hs2 and the webserver. These two statements seems to contradict each other: > If there is a failover, then it is to client's task to retry the SAML2 auth > workflow > The tests in test_saml2_sso.py should verify a new SAML assertion is > requested if the user presents an invalid one. http://gerrit.cloudera.org:8080/#/c/23237/13/be/src/util/backend-gflag-util.cc File be/src/util/backend-gflag-util.cc: http://gerrit.cloudera.org:8080/#/c/23237/13/be/src/util/backend-gflag-util.cc@174 PS13, Line 174: for client conn > This help text would be more descriptive if it said something like "for cli Ack http://gerrit.cloudera.org:8080/#/c/23237/13/be/src/util/backend-gflag-util.cc@180 PS13, Line 180: debug web > "WebServer" is a somewhat generic term. Would it make more sense to say "d ack, maybe 'debug web UI' http://gerrit.cloudera.org:8080/#/c/23237/13/be/src/util/backend-gflag-util.cc@190 PS13, Line 190: > hs2-http connections ack http://gerrit.cloudera.org:8080/#/c/23237/13/be/src/util/backend-gflag-util.cc@194 PS13, Line 194: > Same question as line 180. ack http://gerrit.cloudera.org:8080/#/c/23237/13/be/src/util/webserver.h File be/src/util/webserver.h: http://gerrit.cloudera.org:8080/#/c/23237/13/be/src/util/webserver.h@202 PS13, Line 202: : // Returns a '401 - Unauthorized' / 'AuthenticationRequired' to the client. : void returnUnauthorized( : struct sq_connection* connection); : : // Helper method to handle SAML authentication failure with metrics and return code : sq_callback_result_t returnSamlAuthFailure(struct sq_connection* connection); : : // Returns a response based on a TWrappedHttpResponse that can come from Frontend code. : void returnWrappedResponse( : struct sq_connection* connection, const impala::TWrappedHttpResponse& response); : > Please add comments to all of these functions explaining what they do. Ack http://gerrit.cloudera.org:8080/#/c/23237/13/be/src/util/webserver.h@329 PS13, Line 329: a before s > Nit: returning Ack http://gerrit.cloudera.org:8080/#/c/23237/13/be/src/util/webserver.cc File be/src/util/webserver.cc: http://gerrit.cloudera.org:8080/#/c/23237/13/be/src/util/webserver.cc@373 PS13, Line 373: || use_saml_ @Csaba, thanks for details! > We should still test one or two of the more likely combinations (such as > SAML+LDAP or SAML+Keberos). As this would also involve the HS2, maybe we can have a different scope for this? http://gerrit.cloudera.org:8080/#/c/23237/13/be/src/util/webserver.cc@851 PS13, Line 851: bool use_saml = use_saml_ && !sq_get_header(connection, "X-Impala-EETest"); > This line needs to check FLAGS_saml2_ee_test_mode, otherwise SAML auth can Actually the comment above is wrong here. I confused 'FLAGS_saml2_ee_test_mode' Impala flag with 'X-Impala-EETest' http header. These are unrelated. We're checking the header here, which is sent from the bootstrap scripts based on a mandatory SAML flag. Fixed the comment. http://gerrit.cloudera.org:8080/#/c/23237/13/fe/src/main/java/org/apache/impala/authentication/saml/ImpalaSamlClientBase.java File fe/src/main/java/org/apache/impala/authentication/saml/ImpalaSamlClientBase.java: http://gerrit.cloudera.org:8080/#/c/23237/13/fe/src/main/java/org/apache/impala/authentication/saml/ImpalaSamlClientBase.java@87 PS13, Line 87: // 1. Validate that InResponseTo matches a request ID we actually sent > Nit: Place all the abstract methods together in the code. This method is not even needed. http://gerrit.cloudera.org:8080/#/c/23237/13/fe/src/main/java/org/apache/impala/authentication/saml/ImpalaSamlClientBase.java@93 PS13, Line 93: > Nit: add the "final" keyword to this method so subclasses cannot override i Ack http://gerrit.cloudera.org:8080/#/c/23237/13/fe/src/main/java/org/apache/impala/authentication/saml/ImpalaSamlClientBase.java@108 PS13, Line 108: WithLocationAction locationAction = (WithLocationAction) redirect.get(); > Nit: change to "protected" visibility and add "final" keyword Ack http://gerrit.cloudera.org:8080/#/c/23237/13/fe/src/main/java/org/apache/impala/authentication/saml/ImpalaSamlClientBase.java@137 PS13, Line 137: return nameId; > Nit: add the "final" keyword to this method so subclasses cannot override i Ack http://gerrit.cloudera.org:8080/#/c/23237/13/fe/src/main/java/org/apache/impala/authentication/saml/ImpalaSamlClientHS2.java File fe/src/main/java/org/apache/impala/authentication/saml/ImpalaSamlClientHS2.java: http://gerrit.cloudera.org:8080/#/c/23237/13/fe/src/main/java/org/apache/impala/authentication/saml/ImpalaSamlClientHS2.java@46 PS13, Line 46: ImpalaSamlClientHS2.class.getSimpleName(), : HiveSamlRelayStateStoreHS2.get(), > Name and StateGenerator are set by both subclasses of ImpalaSamlClientBase. Good point, fixed http://gerrit.cloudera.org:8080/#/c/23237/13/fe/src/main/java/org/apache/impala/authentication/saml/ImpalaSamlClientHS2.java@53 PS13, Line 53: throws InternalException { > If this value will never change, then pass it to the ImpalaSamlClientBase c Fixed http://gerrit.cloudera.org:8080/#/c/23237/13/fe/src/main/java/org/apache/impala/service/JniFrontend.java File fe/src/main/java/org/apache/impala/service/JniFrontend.java: http://gerrit.cloudera.org:8080/#/c/23237/13/fe/src/main/java/org/apache/impala/service/JniFrontend.java@825 PS13, Line 825: Preconditions.checkNotNull(client, "SAML client not configured. " > Nit: client will be Null if SAML idp metadata was not configured. Pass a Ack http://gerrit.cloudera.org:8080/#/c/23237/13/fe/src/main/java/org/apache/impala/service/JniFrontend.java@844 PS13, Line 844: > Nit: client will be Null if SAML idp metadata was not configured. Pass a Ack http://gerrit.cloudera.org:8080/#/c/23237/13/tests/custom_cluster/test_saml2_sso.py File tests/custom_cluster/test_saml2_sso.py: http://gerrit.cloudera.org:8080/#/c/23237/13/tests/custom_cluster/test_saml2_sso.py@145 PS13, Line 145: assertation_id=str(uuid.uuid4()), > Please add additional test cases using a SAML response: Done. The first three are easy. For replay attacks two security mechanisms are involved: 1. MessageReplaySecurityHandler (already in place) - automatically detects the same {msg_id} on credentialsExtractor.extract(). No changes needed for this one. 2. ImpalaSAMLMessageStore (added by me) - detects the same {request_id} as 'InResponseTo'. Once in place, also checked by credentialsExtractor.extract(). It uses the same type of cache as HiveSamlRelayStateStoreBase. NOTE: While Webserver returns 401 upon any of the above attacks, the HS2 for some reason returns 200 and a html form, only without token. I've crafted the tests to work with this, but perhaps we could just return the same 401 here: https://gerrit.cloudera.org/#/c/23237/13/fe/src/main/java/org/apache/impala/authentication/saml/ImpalaSamlClientHS2.java@108 (This code is refactored but the logic is preserved) NOTE2: I just realised that for the Webserver we're not invalidating the RelayState cache in the end of the sequence. For HS2 it is done in the end of Step 3/3. This makes the replay attack possible as Step 1 -> Step 2 -> Step 2 again. But for Webserver if we invalidate at Step 2/2, this will not even allow us to reach credentialsExtractor.extract(), thus making replay attacks impossible. In that case we can remove the replay attack tests for Webserver. What do you think? cc @Chaba http://gerrit.cloudera.org:8080/#/c/23237/13/tests/custom_cluster/test_saml2_sso.py@324 PS13, Line 324: self._test_common_protocol_checks() > Need assertions for the "impala.webserver.total-saml-auth-success" and "imp Good point. -- To view, visit http://gerrit.cloudera.org:8080/23237 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I12540300529f9c240abf7196141ecb0ae6e37995 Gerrit-Change-Number: 23237 Gerrit-PatchSet: 15 Gerrit-Owner: Mihaly Szjatinya <[email protected]> Gerrit-Reviewer: Abhishek Rawat <[email protected]> Gerrit-Reviewer: Csaba Ringhofer <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Reviewer: Jason Fehr <[email protected]> Gerrit-Reviewer: Mihaly Szjatinya <[email protected]> Gerrit-Reviewer: Nandor Kollar <[email protected]> Gerrit-Reviewer: Riza Suminto <[email protected]> Gerrit-Comment-Date: Wed, 04 Feb 2026 17:26:19 +0000 Gerrit-HasComments: Yes
