Hello Kudu Jenkins, Anonymous Coward (763),

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/21127

to look at the new patch set (#3).

Change subject: WIP [ARM] Concurrent binary tree memory fences fixed.
......................................................................

WIP [ARM] Concurrent binary tree memory fences fixed.

TestCBTree.TestRacyConcurrentInsert sometime fails on ARM.
The concurrent inserts produce a tree with some incorrectly ordered nodes. Apart
from incorrect order, there are no other errors in the tree. All inserted
elements are present and all edges are valid.

This is not an unit test error, but a real error in the CBTree implementation.
Modifying the test to only do the inserts first, and only start checking when
the tree is finalized does not prevent the bug. So calling only the tree->Insert
function from multiple threads is sufficient to reproduce the issue.

WIP cause: This change fixes the problem, and can be the final solution.
However I would also consider changing it to the std::atomic library (Also using
the correct memory orders like now, not the default memory_order_seq_cst
everywhere).
Either way, we need to agree on the proposed changes of the the memory barriers
then I can also make a version with using std::atomic.

Root cause:

Some memory order restrictions are not strict enough. It does not cause a
problem on x86 (There were no real changes for 11 years), but it causes a
problem on ARM.

The following 6. barrier needs to be more strict:

1. When we set the Splitting / Inserting flag on a node, then it is not enough
  to flush all or changes. It is also very important not to reorder any write
  before it. So instead of Release_Store we need a 2 way fence.

2. When we search in SeekToLeaf/GetCopy/ContainsKey/TraverseToLeaf, and we
  check if the version is unchanged, then a load_acquire is not sufficient. We
  have to also make sure no read is reordered in the other direction either.

Currently adding only the 2 fences under (1.) is sufficient to stabilize the
test. However the fences under (2.) are also wrong in theory.

Change-Id: Ie7c02dc3b9444599ef680fa9b29ae7a4d39dd382
---
M src/kudu/tablet/concurrent_btree.h
1 file changed, 9 insertions(+), 0 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/27/21127/3
--
To view, visit http://gerrit.cloudera.org:8080/21127
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ie7c02dc3b9444599ef680fa9b29ae7a4d39dd382
Gerrit-Change-Number: 21127
Gerrit-PatchSet: 3
Gerrit-Owner: Zoltan Martonka <[email protected]>
Gerrit-Reviewer: Anonymous Coward (763)
Gerrit-Reviewer: Kudu Jenkins (120)

Reply via email to