Hello Kudu Jenkins,
I'd like you to reexamine a change. Please visit
http://gerrit.cloudera.org:8080/21127
to look at the new patch set (#2).
Change subject: WIP [ARM] Concurrent binary tree memory fences fixed.
......................................................................
WIP [ARM] Concurrent binary tree memory fences fixed.
TestCBTree.TestRacyConcurrentInsert sometime fails on ARM.
The concurrent inserts produce a tree with some misordered nodes. Apart from
misordering, there are no other errors in the tree. All inserted elements are
present, all edges are valid.
This is not an unit test error, but a real error in the CBTree implementation.
Modifying the test to only do the inserts first, and only start checking when
the tree is finalized does not prevent the bug. So calling only the tree->Insert
function from multiple threads is sufficient to reproduce the issue.
WIP cause: This change fixes the problem, and can be the final solution.
However I would also consider changing it to the std::atomic library (Also using
the correct memory orders like now, not the default memory_order_seq_cst
everyhere).
Either way, we need to agree on the proposed changes of the the memory barriers
then I can also make a version with using std::atomic.
Root cause:
Some memory order restrictions are not strigt enough. It does not cause a
problem on x86 (There were no real changes for 11 years), but it causes a
problem on ARM.
The following 6. barriest needs to be more strict:
1. When we set the Splittingi/Inserting flag on a node, then it is not enough to
flush all or changes. It is also very important not to reorder any write
before it. So instead of Release_Store we need a 2 way fence.
2. When we search in SeekToLeaf/GetCopy/ContainsKey/TraverseToLeaf, and we
check if the version is unchanged, then a load_acquire is not sufficient. We
have to also make sure no read is reordered in the other direction either.
Currently adding only the 2 fences under (1.) is sufficient to stabilize the
test. However the fences under (2.) are also wrong in theory.
Change-Id: Ie7c02dc3b9444599ef680fa9b29ae7a4d39dd382
---
M src/kudu/tablet/concurrent_btree.h
1 file changed, 9 insertions(+), 0 deletions(-)
git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/27/21127/2
--
To view, visit http://gerrit.cloudera.org:8080/21127
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ie7c02dc3b9444599ef680fa9b29ae7a4d39dd382
Gerrit-Change-Number: 21127
Gerrit-PatchSet: 2
Gerrit-Owner: Zoltan Martonka <[email protected]>
Gerrit-Reviewer: Kudu Jenkins (120)