Tim Armstrong has uploaded a new change for review.

  http://gerrit.cloudera.org:8080/6224
Change subject: IMPALA-4946: fix hang in BufferPool
......................................................................

IMPALA-4946: fix hang in BufferPool

Once the write is removed from the "in flight" list, both the Client and
Page may be destroyed by a different thread. The fix is to signal the
condition variables inside the critical section that removes the write
from the in-flight list, so that they are never touched after the Client
or Page may have been destroyed.

Also fix a potential pitfall in DiskIoMgr::CancelContext(), which can be
called asynchronously with other methods: concurrent calls to it could
result in a hang inside DiskIoMgr::CancelContext(). I do not believe any
Impala code calls it concurrently from multiple threads, so the bug was
only latent.

Testing:
I was able to reproduce the hang reliably by inserting a 1s sleep before
the NotifyAll() calls. After the fix, the hang did not reproduce with
sleeps either inside or outside the critical section.

I could not come up with a unit test that had a higher reproduction rate
than the current tests, because the window for the race is very small. I
considered adding a debug stress option to insert these delays, but with
all the code moved into the critical section it would not be useful.

Change-Id: I13fc95b5a664544dee789c4107fccf81d2077347
---
M be/src/runtime/bufferpool/buffer-pool.cc
M be/src/runtime/disk-io-mgr-internal.h
2 files changed, 5 insertions(+), 3 deletions(-)

  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/24/6224/1
--
To view, visit http://gerrit.cloudera.org:8080/6224
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: I13fc95b5a664544dee789c4107fccf81d2077347
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong <[email protected]>
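
For readers unfamiliar with the race described in the commit message, the pattern can be sketched roughly as below. This is a minimal illustration, not the actual Impala code: the Client struct, its fields, and the functions WriteComplete(), WaitForWritesAndDestroy() and RunIterations() are all hypothetical, and a simple counter stands in for the "in flight" list. The sketch also assumes the common pthread-style guarantee that a mutex may be destroyed once another thread has unlocked it.

```cpp
#include <cassert>
#include <condition_variable>
#include <memory>
#include <mutex>
#include <thread>

// Hypothetical stand-in for a buffer pool client; not Impala's actual class.
struct Client {
  std::mutex lock;
  std::condition_variable write_complete_cv;
  int in_flight_writes = 0;  // Stands in for the "in flight" list.
};

// Runs on the I/O thread when a write finishes.
void WriteComplete(Client* client) {
  std::lock_guard<std::mutex> l(client->lock);
  --client->in_flight_writes;
  // Signal while still holding the lock. The waiter cannot observe
  // in_flight_writes == 0 until the lock is released, so 'client' is
  // guaranteed to be alive for the notify_all() call.
  client->write_complete_cv.notify_all();
  // Nothing may touch 'client' once the lock is released at scope exit.
}

// Waits until no writes are in flight, then destroys the client.
void WaitForWritesAndDestroy(std::unique_ptr<Client> client) {
  {
    std::unique_lock<std::mutex> l(client->lock);
    client->write_complete_cv.wait(
        l, [&] { return client->in_flight_writes == 0; });
  }
  client.reset();  // Safe: the writer signalled before releasing the lock.
}

// Repeats the write/wait/destroy cycle 'n' times and returns the number of
// cycles that completed.
int RunIterations(int n) {
  int completed = 0;
  for (int i = 0; i < n; ++i) {
    std::unique_ptr<Client> client(new Client);
    client->in_flight_writes = 1;
    Client* raw = client.get();
    std::thread io_thread(WriteComplete, raw);
    WaitForWritesAndDestroy(std::move(client));
    io_thread.join();
    ++completed;
  }
  return completed;
}
```

If notify_all() were instead called after the lock had been released, the waiting thread could see the counter reach zero and destroy the Client before the I/O thread signalled, leaving the I/O thread operating on a destroyed condition variable. Keeping the signal inside the critical section closes that window.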
