Hello Tidy Bot, Zoltan Chovan, Alexey Serbin, Ashwani Raina, Kudu Jenkins,
Anonymous Coward (763),
I'd like you to reexamine a change. Please visit
http://gerrit.cloudera.org:8080/21127
to look at the new patch set (#8).
Change subject: [ARM] Concurrent binary tree memory barriers fixed.
......................................................................
[ARM] Concurrent binary tree memory barriers fixed.
TestCBTree.TestRacyConcurrentInsert sometimes fails on ARM.
The concurrent inserts produce a tree with some incorrectly ordered
nodes. Apart
from incorrect order, there are no other errors in the tree. All
inserted
elements are present, but CBTree::GetCopy cannot find some due to the
incorrect
ordering.
This is not a unit test error, but a real error in the CBTree
implementation.
Modifying the test to only do the inserts first, and only start checking
when
the tree is finalized does not prevent the bug. So calling only the
tree->Insert
function from multiple threads is sufficient to reproduce the issue
(only on
ARM).
Root cause:
Some memory order restrictions are not strict enough. It does not cause
a
problem on x86 (There were no real changes for 11 years), but it causes
problems on ARM. x86 guarantees a globally consistent order for stores
(TSO).
ARM, in contrast, allows stores to different memory locations to be
observed
differently from the program order.
More info:
https://www.sciencedirect.com/science/article/pii/S1383762124000390
Solution:
The following 7 barriers need to be more strict:
1. When we set the Splitting/Inserting flag on a node, then it is not
enough
to flush all previous changes. It is also very important not to
reorder any
write before it. So instead of Release_Store (which is practically
equivalent
to std::memory_order_release), we need a 2 way barrier.
2. When we search in
SeekToLeaf/GetCopy/ContainsKey/TraverseToLeaf/SeekNextLeaf,
and we check if the version is unchanged, then a load_acquire
(equivalent to
std::memory_order_acquire) is not sufficient. We have to also make
sure no
read is reordered in the other direction.
Putting the appropriate std::atomic_thread_fence(...) calls to this 7
places
would resolve the issue. However, replacing the current function from
atomicops-internals-arm64.h to the C++ standard functions will make the
code
more consistent.
Reason for changing to std::atomics:
atomicops-internals-arm64.h was picked from chromium source and the
missing
functions was reimplemented. The header is gone since, and even chromium
uses a
solution based on std::atomic (see base/atomicops_internals_portable.h
in
chromium source). I see no reason to update the header from chromium and
implement the missing functions, just to have one more abstraction
layer, that
is basically just "function aliases" at this point.
Reason for __builtin_assume_aligned and reinterpret_cast:
Although the nodes are PACKED, We guarantee that they will be aligned to
8
bytes. They can have unaligned values put between them in the same
arena, so
packing does save memory. If you put std::atomic into a struct, g++
refuses to
pack it, so we need a placeholder.
Reason for renaming IsLocked to IsLockedUnsafe:
In the original version, it was an unsafe memory access. Currently, it
is only
in places where we hold the lock (which means we have atomic access to
the
version before and after in the same thread). Also, it is only used in
DCHECK()
macros. However, I found it disturbing to have an unsafe function
similarly
named as all the other proper atomic functions around it.
I also fixed TestRacyConcurrentInsert, so it won't freez after failing.
Change-Id: Ie7c02dc3b9444599ef680fa9b29ae7a4d39dd382
---
M src/kudu/tablet/cbtree-test.cc
M src/kudu/tablet/concurrent_btree.h
2 files changed, 100 insertions(+), 74 deletions(-)
git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/27/21127/8
--
To view, visit http://gerrit.cloudera.org:8080/21127
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ie7c02dc3b9444599ef680fa9b29ae7a4d39dd382
Gerrit-Change-Number: 21127
Gerrit-PatchSet: 8
Gerrit-Owner: Zoltan Martonka <[email protected]>
Gerrit-Reviewer: Alexey Serbin <[email protected]>
Gerrit-Reviewer: Anonymous Coward (763)
Gerrit-Reviewer: Ashwani Raina <[email protected]>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Zoltan Chovan <[email protected]>
Gerrit-Reviewer: Zoltan Martonka <[email protected]>