Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/16659 )
Change subject: KUDU-3210 Add lock before verifying signature ...................................................................... Patch Set 8: (7 comments) http://gerrit.cloudera.org:8080/#/c/16659/6//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/16659/6//COMMIT_MSG@21 PS6, Line 21: OpenSSL locking callbacks are : properly registered too > CRYPTO_lock() is just calling the callback function we registered: Ah, I see. This now makes sense to me. Thank you for the clarification -- I haven't looked into the openssl code before w.r.t. CRYPTO_lock() and CRYPTO_set_locking_callback(). http://gerrit.cloudera.org:8080/#/c/16659/8//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/16659/8//COMMIT_MSG@9 PS8, Line 9: OpenSSL FIPS Object : Module, or at least in SafeLogic CryptoComply for Servers To be more specific and for posterity, could you add details on the version of the OpenSSL library and the version of the SafeLogic module as well? Thanks! http://gerrit.cloudera.org:8080/#/c/16659/8//COMMIT_MSG@36 PS8, Line 36: commits commit http://gerrit.cloudera.org:8080/#/c/16659/8/src/kudu/security/crypto.cc File src/kudu/security/crypto.cc: http://gerrit.cloudera.org:8080/#/c/16659/8/src/kudu/security/crypto.cc@145 PS8, Line 145: #if OPENSSL_VERSION_NUMBER < 0x10100000L : std::unique_lock<std::mutex> l(mutex); : #endif Is it possible to make this and other global-style locking introduced in this changelist to be FIPS-specific only? If I'm not mistaken, we haven't seen the race manifesting itself in other but FIPS-enabled OpenSSL environments yet, and it's been few years of using this code. OpenSSL of version 1.0.2 is used in RHEL/CentOS 7 distribution, and I guess that might be a majority of what Kudu clusters are run now in the fields. Also, RHEL/CentOS 6 might be affected as well. Since I don't know much about the performance impact of these changes (BTW, did you try to run any performance tests in this regard), I would be more comfortable if we added these extra synchronization sections only only for FIPS-enabled mode (and maybe, even for particular FIPS module). For example, there might be a class that is a wrapper around std::unique_lock<std::mutex> l(mutex) in case of FIPS-enabled environment, and an empty wrapper class otherwise. Since the FIPS_mode() setting can be read and then preserved in a libsecurity-wide static variable, it should be cheap to check for the value of that variable later on. http://gerrit.cloudera.org:8080/#/c/16659/3/src/kudu/security/tls_context.cc File src/kudu/security/tls_context.cc: http://gerrit.cloudera.org:8080/#/c/16659/3/src/kudu/security/tls_context.cc@470 PS3, Line 470: #if OPENSSL_VERSION_NUMBER < 0x10100000L : unique_lock<std::mutex> lock_global(mutex); : #endif : unique_lock<RWMutex> lock(lock_); > First I had #else here too, but TSAN didn't like that: http://jenkins.kudu. Ah, I see -- there are other methods in this class that work with members protected by lock_. http://gerrit.cloudera.org:8080/#/c/16659/3/src/kudu/security/tls_handshake.cc File src/kudu/security/tls_handshake.cc: http://gerrit.cloudera.org:8080/#/c/16659/3/src/kudu/security/tls_handshake.cc@101 PS3, Line 101: DCHECK(n == recv.size() || (n == -1 && recv.empty())); : DCHECK_EQ(BIO_ctrl_pending(rbio), recv.size()); : > The tls_handshake-test.cc doesn't even use sockets, it seems it's doing the Indeed, you are right. I'm taking this back. After looking at https://github.com/apache/kudu/blob/ea1695885067dc5d39ad1f794a91a9d9e0540b1a/src/kudu/security/tls_context.cc#L547-L549 it became clear to me that TlsHandshake is working with memory-based BIO only and isn't supposed to do otherwise as for the manual page and the code in SSL_do_handshake(): "The behaviour of SSL_do_handshake() depends on the underlying BIO". I have also had some fun time looking into ssl/s3_srvr.c and ssl/s3_clnt.c. http://gerrit.cloudera.org:8080/#/c/16659/3/src/kudu/security/tls_handshake.cc@104 PS3, Line 104: f OPENSSL_VERSION_NUMBER < 0x10100000L BTW, is it possible to limit the scope of this global lock to cover only the race-affected section? I guess that's SSL_do_handshake() only, right? I'd be surprised that SSL_get_error() and BIO-based IO below are affected by races as well, but since I don't have enough information on how the race in this piece of the code manifests itself I can only assume that it somewhere within SSL_do_handshake(). -- To view, visit http://gerrit.cloudera.org:8080/16659 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ifafc7dcf91db910123276b657515e410bb7f6fcd Gerrit-Change-Number: 16659 Gerrit-PatchSet: 8 Gerrit-Owner: Attila Bukor <[email protected]> Gerrit-Reviewer: Alexey Serbin <[email protected]> Gerrit-Reviewer: Attila Bukor <[email protected]> Gerrit-Reviewer: Grant Henke <[email protected]> Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Tidy Bot (241) Gerrit-Reviewer: Wenzhe Zhou <[email protected]> Gerrit-Comment-Date: Thu, 29 Oct 2020 22:05:29 +0000 Gerrit-HasComments: Yes
