On 02/05/2014 10:38, Michael Dürig wrote:
> Stay with the linked list / skip list approach but defer inserting new
> keys to a background operation. Instead new keys are put into a
> "pending-non-conflicting-id" container until picked up by a background
> thread and merged into the linked list. Conflicts are simply handled
> by a first committer wins strategy in the same way we handle the
> entire index update currently.
>
> When such an index is used, the "pending-non-conflicting-id" container
> would have to be considered and its content would have to be merged
> into the linked list on the fly. This should be rather easy by first
> sorting these extra keys and then on the fly merging the linked list
> and the sorted extra keys.
Interesting idea. We could have the index with a structure like
:index : {
_pending : {
keyA : {...},
keyB : {...},
...
keyN : {...}
},
:start : {...},
...
keyN : {...}
}
where under `_pending` it's kept, as you said, a list of keys+documents
but unlinked. This is maintained with a simple NodeBuilder.child(). The
Async agent will be the only one taking care of doing
NodeBuilder.remove() once merged.
Will it be an unmanageable conflict if we have the agent on ClusterA
removing the key "foobar" and in the same cycle ClusterB add a new
document under "foobar"?
Still not sure how I could manage deletions and edits (documents that
changed key) in this scenario.
> I think such an approach is traded off very well: the index is always
> consistent, however depending on the number of keys in the
> "pending-non-conflicting-id" container performance might degrade a bit
> until the background process has caught up again. Additionally this
> approach is only slightly more complicated than what we already have.
Well we could limit the sorting on the _pending area only to a
configurable limit to avoid OOM. Let's say 1000 keys. It could be a
decent compromise.
In the meanwhile I was investigating the B-Tree (BT). It would work on
paper of course and it has the same O() of the SkipList (SL). As
advantage it could serve ascending and descending order off the same
index. On the other hand it sounds as it will be less efficient in
serving range queries. In the case of the SL all we have to do is
seeking the correct start and then just "browse through", while in case
of the BT to navigate it we'll have to continuously going up and down
the tree, in-memory sorting of the current bucket and seeking where to
go. Plus in the case of `lastModified` dates where we always have to add
a greater element, it will require the update of all the buckets from
top to leaf.
What I'm not sure around the BT is the ability to manage conflict. Let's
say we have
A Z
| \
.. MQZ
/ | \
OQ XYZ
if the same instant ClusterA deletes O and Q the results will be
A Z
| \
.. MZ
/ \
XYZ
and ClusterB adds P with the results will be
A Z
| \
.. MQZ
/ | \
OPQ XYZ
how do we expect mongo to manage this situation during the cluster sync?
Each letter is a node and the line points to subnodes.
Any thoughts on this?
D.