[Impala-ASF-CR] IMPALA-3936: BufferedBlockMgr fixes for Pin() while write in flight.
Internal Jenkins has posted comments on this change. Change subject: IMPALA-3936: BufferedBlockMgr fixes for Pin() while write in flight. .. Patch Set 4: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/3832 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I4be4fad8e6f2303db19ea1e2bd0f13523781ae8e Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Internal Jenkins Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-3936: BufferedBlockMgr fixes for Pin() while write in flight.
Internal Jenkins has submitted this change and it was merged. Change subject: IMPALA-3936: BufferedBlockMgr fixes for Pin() while write in flight. .. IMPALA-3936: BufferedBlockMgr fixes for Pin() while write in flight. Fix multiple bugs that could occur when a block was unpinned then pinned again while the write was in flight. There were two problems: 1. A block's buffer could be transferred while a write is in flight, leaving the block in an invalid state. The fix is to wait for the in-flight write to complete before transferring the buffer. 2. On certain code paths in WriteComplete(), condition variables weren't signalled, leading to threads waiting for write completion not being woken up. The fix is to clarify when condition variables will be signalled and ensure that the appropriate condition variables are always signalled when the write completes. Testing: Added a targeted unit test that exercises these code paths using a debug option that controls timing of writes. Reran the stress test configuration that reproducibly triggered the bug: TPC-H query 18 on a release build with a single impalad. It succeeded. Change-Id: I4be4fad8e6f2303db19ea1e2bd0f13523781ae8e Reviewed-on: http://gerrit.cloudera.org:8080/3832 Reviewed-by: Tim Armstrong Tested-by: Internal Jenkins --- M be/src/runtime/buffered-block-mgr-test.cc M be/src/runtime/buffered-block-mgr.cc M be/src/runtime/buffered-block-mgr.h 3 files changed, 121 insertions(+), 27 deletions(-) Approvals: Internal Jenkins: Verified Tim Armstrong: Looks good to me, approved -- To view, visit http://gerrit.cloudera.org:8080/3832 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: merged Gerrit-Change-Id: I4be4fad8e6f2303db19ea1e2bd0f13523781ae8e Gerrit-PatchSet: 5 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Internal Jenkins Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-3936: BufferedBlockMgr fixes for Pin() while write in flight.
Tim Armstrong has posted comments on this change. Change subject: IMPALA-3936: BufferedBlockMgr fixes for Pin() while write in flight. .. Patch Set 2: (1 comment) http://gerrit.cloudera.org:8080/#/c/3832/2//COMMIT_MSG Commit Message: Line 28: It succeeded. > I'm fine with the unit test coverage, but just thinking that if this worklo Agreed, I think it's timing-dependent though so I couldn't see a way to make it reproducible - I think the main thing would be to run the stress test on a variety of data sizes and hardware configs. I plan to port many of the BufferedBlockMgr unit tests to the new buffer pool so we should have coverage of this interesting sequence of operations. -- To view, visit http://gerrit.cloudera.org:8080/3832 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I4be4fad8e6f2303db19ea1e2bd0f13523781ae8e Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-3936: BufferedBlockMgr fixes for Pin() while write in flight.
Tim Armstrong has posted comments on this change. Change subject: IMPALA-3936: BufferedBlockMgr fixes for Pin() while write in flight. .. Patch Set 4: Code-Review+2 rebase, carry +2 -- To view, visit http://gerrit.cloudera.org:8080/3832 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I4be4fad8e6f2303db19ea1e2bd0f13523781ae8e Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-3936: BufferedBlockMgr fixes for Pin() while write in flight.
Dan Hecht has posted comments on this change. Change subject: IMPALA-3936: BufferedBlockMgr fixes for Pin() while write in flight. .. Patch Set 3: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/3832 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I4be4fad8e6f2303db19ea1e2bd0f13523781ae8e Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-3936: BufferedBlockMgr fixes for Pin() while write in flight.
Dan Hecht has posted comments on this change. Change subject: IMPALA-3936: BufferedBlockMgr fixes for Pin() while write in flight. .. Patch Set 2: (1 comment) http://gerrit.cloudera.org:8080/#/c/3832/2//COMMIT_MSG Commit Message: Line 28: It succeeded. > Hard to say - I only saw it in a specific configuration I was running local I'm fine with the unit test coverage, but just thinking that if this workload provides an interesting stress on spilling, then it'd be nice to carry forward to the buffer pool. But I suppose we don't have any reason to believe it would stress the new code in the same way. -- To view, visit http://gerrit.cloudera.org:8080/3832 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I4be4fad8e6f2303db19ea1e2bd0f13523781ae8e Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-3936: BufferedBlockMgr fixes for Pin() while write in flight.
Tim Armstrong has posted comments on this change. Change subject: IMPALA-3936: BufferedBlockMgr fixes for Pin() while write in flight. .. Patch Set 2: (8 comments) http://gerrit.cloudera.org:8080/#/c/3832/2//COMMIT_MSG Commit Message: Line 14: the in-flight write to complete before transferring the buffer. > TransferBlock() will either unpin or delete. In the delete case, WriteComp The invariant is !in_write_ || buffer_desc_ != NULL The problem is in the delete case if src->in_write_ is true, then it will grab the buffer_desc_ from src, leaving it in the invalid state where in_write_ is true and buffer_desc_ == NULL. Your comment makes me think that there may be a similar problem in the unpin case that is also solved by this fix. I don't think we ever unpin a block twice so we wouldn't have hit it though. Line 17: woken up. The fix is to clarify when condition variables will be > was this manifesting in a hang? Yep the unit test hung in many cases and crashed in others. I didn't see that in Impala. PS2, Line 19: right > write Done Line 20: notify_all() in case there are multiple threads waiting. > is using notify_all() necessary, or just done for consistency? It isn't really necessary in some cases now that I've reasoned through it. I reverted those changes and updated the header comments to better explain the invariants. Line 28: It succeeded. > is this a configuration that will get coverage at some point during release Hard to say - I only saw it in a specific configuration I was running locally on a specific scale factor with a release build and the PHJ refactor patch. We haven't seen it in other stress tests. Maybe with the PHJ refactor patch we'll see it on other stress tests. I'm pretty happy with the coverage that the unit test provides. http://gerrit.cloudera.org:8080/#/c/3832/2/be/src/runtime/buffered-block-mgr.cc File be/src/runtime/buffered-block-mgr.cc: Line 929: block->write_complete_cv_.notify_all(); > seems okay to use notify_all(), but in practice why would there be multiple I reverted this change and updated the header comment because, as you pointed out, it isn't possible. Line 976: buffer_available_cv_.notify_all(); I'm more convinced now that this change was unnecessary. I was thinking that it would be a problem if a waiter woke up then decided not to grab the buffer, but that shouldn't be possible. I reverted this change and updated the header comment. http://gerrit.cloudera.org:8080/#/c/3832/2/be/src/runtime/buffered-block-mgr.h File be/src/runtime/buffered-block-mgr.h: Line 449: /// needs to not have been taken when this function is called. > looks like this comment needs updating if we'll always wait for in-flight w Done -- To view, visit http://gerrit.cloudera.org:8080/3832 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I4be4fad8e6f2303db19ea1e2bd0f13523781ae8e Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Tim Armstrong Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-3936: BufferedBlockMgr fixes for Pin() while write in flight.
Tim Armstrong has uploaded a new patch set (#3). Change subject: IMPALA-3936: BufferedBlockMgr fixes for Pin() while write in flight. .. IMPALA-3936: BufferedBlockMgr fixes for Pin() while write in flight. Fix multiple bugs that could occur when a block was unpinned then pinned again while the write was in flight. There were two problems: 1. A block's buffer could be transferred while a write is in flight, leaving the block in an invalid state. The fix is to wait for the in-flight write to complete before transferring the buffer. 2. On certain code paths in WriteComplete(), condition variables weren't signalled, leading to threads waiting for write completion not being woken up. The fix is to clarify when condition variables will be signalled and ensure that the appropriate condition variables are always signalled when the write completes. Testing: Added a targeted unit test that exercises these code paths using a debug option that controls timing of writes. Reran the stress test configuration that reproducibly triggered the bug: TPC-H query 18 on a release build with a single impalad. It succeeded. Change-Id: I4be4fad8e6f2303db19ea1e2bd0f13523781ae8e --- M be/src/runtime/buffered-block-mgr-test.cc M be/src/runtime/buffered-block-mgr.cc M be/src/runtime/buffered-block-mgr.h 3 files changed, 121 insertions(+), 27 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/32/3832/3 -- To view, visit http://gerrit.cloudera.org:8080/3832 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I4be4fad8e6f2303db19ea1e2bd0f13523781ae8e Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Matthew Jacobs
[Impala-ASF-CR] IMPALA-3936: BufferedBlockMgr fixes for Pin() while write in flight.
Dan Hecht has posted comments on this change. Change subject: IMPALA-3936: BufferedBlockMgr fixes for Pin() while write in flight. .. Patch Set 2: (7 comments) http://gerrit.cloudera.org:8080/#/c/3832/2//COMMIT_MSG Commit Message: Line 14: the in-flight write to complete before transferring the buffer. TransferBlock() will either unpin or delete. In the delete case, WriteComplete() should finish the delete, so we should be okay, right? In the unpin case, it looks like we will hit the DCHECK in WriteUnpinnedBlock() that !block->in_write_. Is that what's happening? Or is there some other corruption? Line 17: woken up. The fix is to clarify when condition variables will be was this manifesting in a hang? PS2, Line 19: right write Line 20: notify_all() in case there are multiple threads waiting. is using notify_all() necessary, or just done for consistency? Line 28: It succeeded. is this a configuration that will get coverage at some point during releases (i.e. do we have good regression coverage)? http://gerrit.cloudera.org:8080/#/c/3832/2/be/src/runtime/buffered-block-mgr.cc File be/src/runtime/buffered-block-mgr.cc: Line 929: block->write_complete_cv_.notify_all(); seems okay to use notify_all(), but in practice why would there be multiple threads waiting on this cv? It's only used by TransferBlock() and two threads shouldn't be trying to do that, right? http://gerrit.cloudera.org:8080/#/c/3832/2/be/src/runtime/buffered-block-mgr.h File be/src/runtime/buffered-block-mgr.h: Line 449: /// needs to not have been taken when this function is called. looks like this comment needs updating if we'll always wait for in-flight writes. -- To view, visit http://gerrit.cloudera.org:8080/3832 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I4be4fad8e6f2303db19ea1e2bd0f13523781ae8e Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Dan Hecht Gerrit-Reviewer: Matthew Jacobs Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-3936: BufferedBlockMgr fixes for Pin() while write in flight.
Tim Armstrong has uploaded a new patch set (#2). Change subject: IMPALA-3936: BufferedBlockMgr fixes for Pin() while write in flight. .. IMPALA-3936: BufferedBlockMgr fixes for Pin() while write in flight. Fix multiple bugs that could occur when a block was unpinned then pinned again while the write was in flight. There were two problems: 1. A block's buffer could be transferred while a write is in flight, leaving the block in an invalid state. The fix is to wait for the in-flight write to complete before transferring the buffer. 2. On certain code paths in WriteComplete(), condition variables weren't signalled, leading to threads waiting for write completion not being woken up. The fix is to clarify when condition variables will be signalled and ensure that the appropriate condition variables are always signalled when the right completes. Also consistently use notify_all() in case there are multiple threads waiting. Testing: Added a targeted unit test that exercises these code paths using a debug option that controls timing of writes. Reran the stress test configuration that reproducibly triggered the bug: TPC-H query 18 on a release build with a single impalad. It succeeded. Change-Id: I4be4fad8e6f2303db19ea1e2bd0f13523781ae8e --- M be/src/runtime/buffered-block-mgr-test.cc M be/src/runtime/buffered-block-mgr.cc M be/src/runtime/buffered-block-mgr.h 3 files changed, 114 insertions(+), 25 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/32/3832/2 -- To view, visit http://gerrit.cloudera.org:8080/3832 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I4be4fad8e6f2303db19ea1e2bd0f13523781ae8e Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Tim Armstrong
