On Tue, 6 Sep 2022 21:02:50 GMT, Valerie Peng <valer...@openjdk.org> wrote:

>> There is a race condition in JDK's SessionManager, which can lead to random 
>> exceptions.
>> 
>> **Exception:**
>> 
>> javax.net.ssl.SSLException: Internal error: close session with active objects
>>      at java.base/sun.security.ssl.Alert.createSSLException(Alert.java:133)
>>      at 
>> java.base/sun.security.ssl.TransportContext.fatal(TransportContext.java:371)
>>      at 
>> java.base/sun.security.ssl.TransportContext.fatal(TransportContext.java:314)
>>      at 
>> java.base/sun.security.ssl.TransportContext.fatal(TransportContext.java:309)
>>      at 
>> java.base/sun.security.ssl.SSLSocketImpl.handleException(SSLSocketImpl.java:1707)
>>      at 
>> java.base/sun.security.ssl.SSLSocketImpl$AppInputStream.read(SSLSocketImpl.java:1080)
>>      at 
>> java.base/sun.security.ssl.SSLSocketImpl$AppInputStream.read(SSLSocketImpl.java:971)
>>      at SSLSocketServer.serverLoop(SSLSocketServer.java:133)
>>      at SSLSocketServer$1.run(SSLSocketServer.java:75)
>>      at java.base/java.lang.Thread.run(Thread.java:833)
>> Caused by: java.security.ProviderException: Internal error: close session 
>> with active objects
>>      at 
>> jdk.crypto.cryptoki/sun.security.pkcs11.Session.close(Session.java:127)
>>      at 
>> jdk.crypto.cryptoki/sun.security.pkcs11.Session.close(Session.java:114)
>>      at 
>> jdk.crypto.cryptoki/sun.security.pkcs11.SessionManager.closeSession(SessionManager.java:237)
>>      at 
>> jdk.crypto.cryptoki/sun.security.pkcs11.SessionManager$Pool.release(SessionManager.java:270)
>>      at 
>> jdk.crypto.cryptoki/sun.security.pkcs11.SessionManager.demoteObjSession(SessionManager.java:210)
>>      at 
>> jdk.crypto.cryptoki/sun.security.pkcs11.Session.removeObject(Session.java:101)
>>      at 
>> jdk.crypto.cryptoki/sun.security.pkcs11.SessionKeyRef.updateNativeKey(P11Key.java:1396)
>>      at 
>> jdk.crypto.cryptoki/sun.security.pkcs11.SessionKeyRef.removeNativeKey(P11Key.java:1377)
>>      at 
>> jdk.crypto.cryptoki/sun.security.pkcs11.NativeKeyHolder.releaseKeyID(P11Key.java:1329)
>>      at 
>> jdk.crypto.cryptoki/sun.security.pkcs11.P11Key.releaseKeyID(P11Key.java:156)
>>      at 
>> jdk.crypto.cryptoki/sun.security.pkcs11.P11AEADCipher.reset(P11AEADCipher.java:529)
>>      at 
>> jdk.crypto.cryptoki/sun.security.pkcs11.P11AEADCipher.ensureInitialized(P11AEADCipher.java:436)
>>      at 
>> jdk.crypto.cryptoki/sun.security.pkcs11.P11AEADCipher.implDoFinal(P11AEADCipher.java:732)
>>      at 
>> jdk.crypto.cryptoki/sun.security.pkcs11.P11AEADCipher.engineDoFinal(P11AEADCipher.java:624)
>>      at java.base/javax.crypto.Cipher.doFinal(Cipher.java:2500)
>>      at 
>> java.base/sun.security.ssl.SSLCipher$T12GcmReadCipherGenerator$GcmReadCipher.decrypt(SSLCipher.java:1659)
>>      at 
>> java.base/sun.security.ssl.SSLSocketInputRecord.decodeInputRecord(SSLSocketInputRecord.java:260)
>>      at 
>> java.base/sun.security.ssl.SSLSocketInputRecord.decode(SSLSocketInputRecord.java:181)
>>      at java.base/sun.security.ssl.SSLTransport.decode(SSLTransport.java:111)
>>      at 
>> java.base/sun.security.ssl.SSLSocketImpl.decode(SSLSocketImpl.java:1508)
>>      at 
>> java.base/sun.security.ssl.SSLSocketImpl.readApplicationRecord(SSLSocketImpl.java:1479)
>>      at 
>> java.base/sun.security.ssl.SSLSocketImpl$AppInputStream.read(SSLSocketImpl.java:1064)
>>      ... 4 more
>> 
>> 
>> **Reproducibility:**
>> I started getting this exception quite reliably on JDK17 on my machine with 
>> one particular test setup using ssl-tests testsuite. Unfortunately setup 
>> itself needed some RH specific patches and also ability to reproduce depends 
>> on other factors such number keys in keystore, machine where testing was 
>> performed... I tried a bit to create some reproducer, but I couldn't find a 
>> way to easily reproduce this issue :(
>> 
>> **Problem:**
>> SunPKCS11 provider does session pooling. This is done in SessionManager [1] 
>> (one per SunPKCS11 provider). Released sessions are kept by SessionManager 
>> for a while, for reuse (in limited number). This however is a bit 
>> complicated as some sessions can own objects (e.g. keys). So there are 
>> actually 2 pools. One for sessions with objects ("objSessions") and one for 
>> sessions without objects ("opSessions"). This is because sessions without 
>> objects, which are not being used, can be safely closed (SessionManager only 
>> keeps around limited amount of these), while sessions with objects cannot be 
>> safely closed (until all objects are removed from them). Session manager has 
>> methods for getting Session for given purpose (object creation or just doing 
>> other operations), prioritizing appropriate pool. Each session has counter 
>> (called "createdObjects") to track how many objects it owns. When session is 
>> being returned to pool this counter is checked and session is placed to 
>> appropriate pool. Also wh
 en counter for some Session in "objSessions" pool reaches zero it is moved 
("demoted") to "opSessions" pool.
>> 
>> And here comes complicated part. As far as I understand it, 
>> Session.addObject() [2] (which increases "createdObjects" counter) is always 
>> being called by thread "holding" session which owns the created object. 
>> (That is: thread gets a session, uses it to create an object and calls 
>> Session.addObject() on that session to increase the counter, before 
>> returning the session to pool. See e.g.: [3]) However this is not true for 
>> Session.removeObject() [4]. (That is: thread gets session, which is not 
>> necessary the same one owning object being removed, performs object removal, 
>> but then calls Session.removeObject() on session which owned that object. 
>> See e.g.: [5]) That is Session.removeObject() can be called on Session which 
>> is in "objSessions" pool or which is being used be other thread. (object 
>> removal can happen as result of releasing key, either explicitly or as 
>> result of GC etc..). 
>> 
>> And finally, there is a problem in code handling object removal from a 
>> session. Session.removeObject() [4] first checks if "createdObjects" counter 
>> reached zero. If so, it calls SessionManager.demoteObjSession(this) [6], 
>> which attempts to remove Session from objSessions pool, if session is 
>> successfully removed from there, meaning no other thread "holds" this 
>> session, session is put to opSessions pool, if not (meaning other thread 
>> "holds" it), method just returns, since that other thread puts this session 
>> to appropriate pool, when it is done with it by calling 
>> SessionManager.releaseSession(session).
>> 
>> There is race condition here. Consider following scenario:
>> 
>> // Thread T1 runs:
>> Session.removeObject() // [4]
>> createdObjects.decrementAndGet() // returns zero
>> 
>> // Thread T2 steps in (operating on the same session instance):
>> Session.addObject() // increases "createdObjects" counter [2]
>> SessionManager.releaseSession(session) // releases session to objSessions 
>> pool
>> 
>> // Thread T1 continues:
>> SessionManager.demoteObjSession(this) // [6]
>> objSessions.remove(session) // returns true
>> opSessions.release(session)  // puts session (with objects!) to opSessions 
>> pool
>> // if opSessions is already full, close of session with objects is attempted 
>> throwing Exception..
>> 
>> 
>> **Fix:**
>> SessionManager.demoteObjSession [6] method was changed, so that check for 
>> objects is done once again if session was successfully removed from 
>> "objSessions" pool (now that it is out of pool and other threads should not 
>> be adding objects to it). Based on this check session is either released to 
>> "opSessions" pool or returned to "objSessions" pool. This can be achieved by 
>> calling releaseSession(session) instead of opSessions.release(session).
>> 
>> **Testing:**
>> jdk_security tests passed for me locally with this change.
>> I have also tested this change on top of custom JDK17 build which allows 
>> scenario, where I can reproduce this issue. Problem got fixed.
>> 
>> [1] 
>> https://github.com/openjdk/jdk/blob/9444a081cc9873caa7b5c6a78df0d1aecda6e4f1/src/jdk.crypto.cryptoki/share/classes/sun/security/pkcs11/SessionManager.java
>> [2] 
>> https://github.com/openjdk/jdk/blob/9444a081cc9873caa7b5c6a78df0d1aecda6e4f1/src/jdk.crypto.cryptoki/share/classes/sun/security/pkcs11/Session.java#L93
>> [3] 
>> https://github.com/openjdk/jdk/blob/9444a081cc9873caa7b5c6a78df0d1aecda6e4f1/src/jdk.crypto.cryptoki/share/classes/sun/security/pkcs11/P11Key.java#L1339
>> [4] 
>> https://github.com/openjdk/jdk/blob/9444a081cc9873caa7b5c6a78df0d1aecda6e4f1/src/jdk.crypto.cryptoki/share/classes/sun/security/pkcs11/Session.java#L98
>> [5] 
>> https://github.com/openjdk/jdk/blob/9444a081cc9873caa7b5c6a78df0d1aecda6e4f1/src/jdk.crypto.cryptoki/share/classes/sun/security/pkcs11/P11Key.java#L1360
>> [6] 
>> https://github.com/openjdk/jdk/blob/9444a081cc9873caa7b5c6a78df0d1aecda6e4f1/src/jdk.crypto.cryptoki/share/classes/sun/security/pkcs11/SessionManager.java#L195
>
> src/jdk.crypto.cryptoki/share/classes/sun/security/pkcs11/SessionManager.java 
> line 210:
> 
>> 208:             return;
>> 209:         }
>> 210:         releaseSession(session);
> 
> With the described race condition, have you tried fixing it by adding a 
> if-condition check before doing line 204-210, i.e. if (!session.hasObjects()) 
> { .... }?

I am afraid putting check before line 204 would not solve the issue (just 
lowered it's likelihood). Problem is, that operation consisting of check for 
objects on a session and then removing it from objSessions pool is not atomic. 
Session still could be obtained from objSessions pool by other thread after 
session.hasObjects() was called, object added to it and released back to 
objSessions pool before objSessions.remove(session) is called. I think this 
check for objects can only be trusted after session was successfully removed 
from objSessions (that is, session was in objSessions pool (no tread "holds" 
it) and was removed).

Actually whole call of demoteObjSession method is already behind one check for 
zero objects (but that check cannot be trusted), and needs to be redone after 
objSessions.remove(session),  because of problem described higher . See:
https://github.com/openjdk/jdk/blob/9444a081cc9873caa7b5c6a78df0d1aecda6e4f1/src/jdk.crypto.cryptoki/share/classes/sun/security/pkcs11/Session.java#L100

-------------

PR: https://git.openjdk.org/jdk/pull/10125

Reply via email to