[ https://issues.apache.org/jira/browse/NIFI-7527?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Peter Turcsanyi resolved NIFI-7527. ----------------------------------- Fix Version/s: 1.12.0 Resolution: Fixed > AbstractKuduProcessor deadlocks after TGT refresh > -------------------------------------------------- > > Key: NIFI-7527 > URL: https://issues.apache.org/jira/browse/NIFI-7527 > Project: Apache NiFi > Issue Type: Bug > Reporter: Tamas Palfy > Assignee: Tamas Palfy > Priority: Major > Fix For: 1.12.0 > > Time Spent: 0.5h > Remaining Estimate: 0h > > The fix for https://issues.apache.org/jira/browse/NIFI-7453 (PutKudu kerberos > issue after TGT expires) introduced a new bug: after TGT refresh the > processor ends up in a deadlock. > The reason is that the onTrigger initiates a read lock: > {code:java} > @Override > public void onTrigger(final ProcessContext context, final ProcessSession > session) throws ProcessException { > kuduClientReadLock.lock(); > try { > onTrigger(context, session, kuduClientR); > } finally { > kuduClientReadLock.unlock(); > } > } > {code} > and while the read lock is in effect, later (in the same stack) - if TGT > refresh occurs - a write lock is attempted: > {code:java} > ... > public synchronized boolean checkTGTAndRelogin() throws > LoginException { > boolean didRelogin = super.checkTGTAndRelogin(); > if (didRelogin) { > createKuduClient(context); > } > return didRelogin; > } > ... > protected void createKuduClient(ProcessContext context) { > kuduClientWriteLock.lock(); > try { > if (this.kuduClientR.get() != null) { > try { > this.kuduClientR.get().close(); > } catch (KuduException e) { > getLogger().error("Couldn't close Kudu client."); > } > } > if (kerberosUser != null) { > final KerberosAction<KuduClient> kerberosAction = new > KerberosAction<>(kerberosUser, () -> buildClient(context), getLogger()); > this.kuduClientR.set(kerberosAction.execute()); > } else { > this.kuduClientR.set(buildClient(context)); > } > } finally { > kuduClientWriteLock.unlock(); > } > } > {code} > This attempt at the write lock will get stuck, waiting for the previous read > lock to get released. > (Other threads may have acquired the same read lock but they can release it > eventually - unless they too try to acquire the write lock themselves.) > For the fix it seemed to be best to re-evalute the locking logic. > Previously basically the whole onTrigger logic was encapsulated in a read > lock, including the checking - and recreating as needed - the Kudu client > (as explained before). > It's best to just keep the actual privileged action in the read lock so the > the refreshing of the TGT and re-creation of the Kudu client can safely be > done in a write lock before that. -- This message was sent by Atlassian Jira (v8.3.4#803005)