Re: ApacheDS with Mavibot anytime soon?

Ashma Shrestha Fri, 25 Mar 2016 09:21:49 -0700

Carlos,

Thanks for the help.


@Emmanuel - I know you have a busy schedule however, is there a timeline to
when we can expect a fix? Currently we are working a system which might
make this scenario of concurrent update and search occur more.

Ashma

On Thu, Mar 24, 2016 at 6:02 PM, Accorsi, Carlo <carlo.acco...@siemens.com>
wrote:

> HI Ashma,
> We have all our directory content dumped out as an LDIF on a periodic
> schedule. If the system crashes, we delete the partition folders on the
> file system, restart the server and  re-inject the entries from the ldif.
>
> If your directory is small < 1000 entries, you can just get them perhaps
> all in one search result.
> Otherwise use a paged result to loop through each entry and call:
>
> String ldif = LdifUtils.convertToLdif(entry)
>
> and write / append the string value to a file. This is your backup ldif
> file.
>
> Hope this helps..
>
>
>
>
> -----Original Message-----
> From: Ashma Shrestha [mailto:ashma.shres...@gmail.com]
> Sent: Thursday, March 24, 2016 4:38 PM
> To: users@directory.apache.org
> Subject: Re: ApacheDS with Mavibot anytime soon?
>
> Hi Carlos,
>
> You have mentioned "we get the 'ERR_554 double get for block' and then we
> need to restore. We have an automated way of doing this now so it's not too
> disruptive when it happens." . I am running into the same ERR_554 issue,
> can you let me know how you fixed it?
>
> Thanks,
>
> Ashma
>
> On Mon, Sep 28, 2015 at 9:15 AM, <carlo.acco...@ibs-ag.com> wrote:
>
> > Emmanuel! Wow!! Thank you for that most detailed and comprehensive
> > write up of the concurrency problem you are working to solve with
> > Mavibot.  The design sounds world class.
> > I now fully appreciate the complexity of this transaction issue and
> > the ASCII art helped make it clear. We realize you're all doing your
> > best to devise a robust solution for this and so we'll sit tight for now.
> > I appreciate the time you took to explain this to us. Have fun at the
> > conference.
> > Best, Carlo Accorsi
> >
> >
> >
> >
> > -----Original Message-----
> > From: Emmanuel Lécharny [mailto:elecha...@gmail.com]
> > Sent: Sunday, September 27, 2015 1:23 PM
> > To: users@directory.apache.org
> > Subject: Re: ApacheDS with Mavibot anytime soon?
> >
> > Le 23/09/15 21:41, carlo.acco...@ibs-ag.com a écrit :
> > > Hi,
> >
> > hi,
> >
> > sorry for the delayed response ! I saw your mail 3 days ago, but I
> > forgot to answer back then, being caught in a storm of delaying events...
> > >
> > > We have one customer that wants to get replication implemented.
> > > We've
> > set it up in house and it seems to work well.
> > > The trouble is there are 10's of thousands of users and all sorts of
> > concurrent activity. They're on M16 and a few times per month, we get
> > the
> > 'ERR_554 double get for block' and then we need to restore.
> > > We have an automated way of doing this now so it's not too
> > > disruptive
> > when it happens.
> > Hopefully. Still I understand this is not convenient.
> > >
> > > My question is, would adding replication just add more concurrent
> > activity to the servers and lead more frequently to the ERR_554
> situation?
> > For the consumer, it will be just as if the updates were coming from a
> > standard user. So, no, I don't think it's a real problem.
> > >
> > > Also he's asked to upgrade to M20 but my hunch is that for this
> > > issue it
> > won't matter much because we're still on JDBM?
> >
> > True. It's just that some bugs have been fixed in M20.
> > >
> > > Finally, we're eager to test out a release with the Mavibot backed.
> > > I'm sure it's possible to build ourselves but we're trying to keep
> > > the
> > moving parts to a minimum and use only ApacheDS releases.
> > I understand.
> >
> > Mavibot work has been delayed, and we are working on it as much as we
> > can, but it's not enough. Kiran and I have day jobs atm that forbid us
> > to dedicate as much time as would be necessary to complete the last
> > part that is mandatory to get this problem fixed : the transaction
> > system. We restarted to 'think' about the best implementation 2 weeks
> > ago (on your free time) and we expect to be able to spend more time on
> > it next week, as we will both be at Budapest, during the Apache
> Conference.
> >
> > Transaction  support (the way we need it, ie cross-btree) is not that
> > simple to impelment. We have a pretty clear idea on how it should
> > work, but that mpacts the current design, as we need to woirk in
> > memory, and avoid to copy pages that have already been modifed in the
> > transaction boundaries. We also have to deal with the cleanup of old
> > versions, which also means we need to implement the version manager.
> >
> > To give you a insight on what I'm coming for, here is a draft of my
> > thoughts about this transaction manager (note that this is my vision,
> > that I haven't shared yet, because I don't think it's covering all the
> > moving parts, but still, I think it won't be that different to what we
> > will end with ) :
> >
> > Transaction support
> > -------------------
> >
> > Mavibot must support transaction across many B-tres (ie, whatever the
> > number of b-trees we are updating, we should wait before committing
> > the changes up to the point the transaction is closed). That means we
> > may have many updates pending in memory. In any case, if a rollback is
> > done, the modified b-trees will remain intact, in the same state they
> > were before the transaction started.
> >
> > To achive this, we need to work in a working memory. As we may have to
> > fetch pages that are to be updated, we will face three cases :
> > - the page is not in memory :
> > we have to fetch them from disk, update them and store it in the
> > working memory we can read it back if needed
> > - the page is in the cache :
> > we have to copy it to the working memory
> > - the page is in the working memory : we simply update it, leaving it
> > in the working memory.
> >
> > So if we have to modify the same page many times, it will be updated
> > in the working memory, we won't need to copy it.
> >
> > That actually change the current Mavibot implementation, as it's
> > copying pages no matter what. Here, if the page is coming from the
> > working memory, we just update it.
> >
> >
> > Layout
> > ------
> >
> >  Disk            Cache           Working
> >   ___                            Memory
> >  /   \          .------.
> > +     +        /      /|         .------.
> > |\___/|       .------. |       .------. |
> > +     +       |      | |       |      | |
> > |\___/| ----> |      | | ----> |      | |
> > +     +       |      |/        |      |/
> > |\___/|       .------.         .------.
> > +     +
> >  \___/
> >
> >  When the transaction is rollbacked, we simply delete the content of
> > the working memory.
> >  When the transaction is committed, we write all the pages in the
> > working memory on disk, and into the cache, we update the header, and
> > we purge the working memory.
> >
> >  If a crash occurs during a transaction, then either the transaction
> > is rollbacked (if the process is still running) - which is just about
> > purging the working memory -, or there will be nothing to do if we
> > have to restart the application, as the modified pages were in memory
> > nad have been removed while the process crashed.
> >
> > OTOH, when we read, we must be sure that the B-trees are present, and
> > that we always use the B-tree revisions that are correlated. This is
> > critical when using LDAP, where many B-trees are updated during a sinlge
> write.
> >
> > The way to do that is to use a revision, and always use the B-trees
> > which revision is just below or equal to this revision. Let's see with
> > an example
> >
> >  +-----+---+---+---+---+---+---+---+...
> >  |\ Rev|   |   |   |   |   |   |   |
> >  | \   |   |   |   |   |   |   |   |
> >  |  \  | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8
> >  |B- \ |   |   |   |   |   |   |   |
> >  |tree\|   |   |   |   |   |   | L | N
> >  +-----+---+---+---+---+---+---+---+...
> >  | aaa | X |   |   | X | X | X |   |
> >  |     | 1 | 1 | 1 | 4 | 5 | 6 | 6 |
> >  +-----+---+---+---+---+---+---+---+...
> >  | bbb | X | X | X |   | X | X | X |
> >  |     | 1 | 2 | 3 | 3 | 5 | 6 | 7 |
> >  +-----+---+---+---+---+---+---+---+...
> >  | ccc | X |   | X | X | X |   | X |
> >  |     | 1 | 1 | 3 | 4 | 5 | 5 | 7 |
> >  +-----+---+---+---+---+---+---+---+...
> >  | ddd | X | X |   |   | X |   |   |
> >  |     | 1 | 2 | 2 | 2 | 5 | 5 | 5 |
> >  +-----+---+---+---+---+---+---+---+...
> >  | eee | X |   | X | X |   | X |   |
> >  |     | 1 | 1 | 3 | 4 | 4 | 6 | 6 |
> >  +-----+---+---+---+---+---+---+---+...
> >
> > L : Latest revision
> > N : New revision
> >
> > In this table, we have had 7 updates, and 7 revsions. As all the
> > B-trees haven't been updated, we may have some 'holes' in the revision
> > associated to a B-tree. Typically, here, the 'aaa' B-tree will have 4
> > revisions (1, 4,
> > 5 and 6) created, because this B-tree has been updated
> > 4 times.
> >
> > Doing a read on revision 7 will use the following B-trees : aaa[6],
> > bbb[7], ccc[7], ddd[5], eee[6].
> >
> >
> > All in all, a read is associated to a global transaction revision
> > which is used as a marker. Every B-tree update is done using the new
> revision.
> >
> > Managing on-going reads
> > -----------------------
> >
> > We maintain all the B-tree revisions associated with a read, until the
> > read is completed, we need to get rid of the old revision when we are
> done.
> > This is critical to keep the database size to a minimum. In order to
> > do that, we must know if an older revision is still in use. We then
> > need to keep a common data structure that is shared across threads,
> > which will hold all the used revisions, ordered. With the previous
> > example, assuming revisions 2, 4 and 6 are in use, we may use a linked
> list for that purpose :
> >
> >    +---+   +---+   +---+
> > o--| 2 |<--| 4 |<--| 6 |
> >    +--2+   +--1+   +--3+
> >
> > When the read using rev 4 is done, we remove it from this list :
> >           +-------------+
> >    +---+  |    +---+    | +---+
> > o--| 2 |<-+  X-| 4 |    +-| 6 |
> >    +--2+       +--0+      +--3+
> >
> > The two remaining revisions are still in use, and one of them is older.
> > We can clean the B-tree pages associated with revision 4 if there is
> > already a new revision for this B-tree, and for the pages that have
> > been copied.
> >
> > When the read using rev 2 is done, we can clean all the B-tree pages
> > that are associated with every revision below 6, as soon as there is a
> > B-tree revision above the oldest one.
> >
> > An other important point : each revision might be used more than once.
> > Releasing a read will decrement the count, and when it goes down to 0,
> > we can safely remove the element from the list. We need a thread-safe
> > data structure for that.
> >
> > Managing the list
> > -----------------
> >
> > let's go a bit deeper in the management of this list. It has
> > interesting characteristics :
> >
> > - elements are added by a unique thread : the writter
> > - elements are not removed unless the count becomes 0
> > - the count is equal to the maximum number of thread that can use the
> > element at some point
> > - it's not because the counter is N that N thread *are* using this
> > element
> > : we may eventually have no thread using it ! (but they *may* come
> > back
> > later...)
> > - each element must be be protected agains concurrent update (which is
> > a different requirement than protecting the full list)
> > - any thread requiring an element in the list will always use the head
> > of the least (which is the most recent)
> > - we don't do any list traversal ! Once a thread uses an element, it
> > remembers about its position
> > - no user Thread is allowed to remove an element from this list. The
> > only thread that can do that is the Cleaner Thread
> >
> >
> > The last characteristic is interesting : that means we can safely
> > depends on one single thread to do the cleaning, so there is no
> > concurrent access on teh list while deleting elements.
> >
> > The deletion can be done in two ways :
> > - from the tail of the list, if the element's counter is equal to zero
> > (and we can iterate)
> > - from the inner of the list, when the element's counter to be removed
> > is zero
> >
> > The first way is simpler, but it holds potentially a lot of element if
> > the oldest one is there for a long time The second way implies we can
> > relink elements in the list, ie this is a double-linked list
> >
> > Let's assume we go for the first way. In any case, the list will
> > *always* have at least one element, even if its counter is zero :
> >
> >    head
> >    +---+
> >    | N | revision
> >    +---+
> >    | 0 | users counter (AtomicInteger)
> >    +---+
> >    |   |--o
> >    +---+
> > o--|   |
> >    +---+
> >
> > A thread wanting to use revision N will simply increment the counter,
> > which is an AtomicInteger, guaranteing that we can't have a wrong
> > update of this value. The revision will never change.
> > At this point, using the head is quite straight-forward.
> >
> > * Adding an element
> >
> > So the writter creates a new revision. This will create a new element :
> >
> >     new      head
> >    +---+     +---+
> >    |N+1|     | N |<<--- T1, T2, T5
> >    +---+     +---+
> >    | 0 |     | 3 |
> >    +---+     +---+
> >    |   |--o  |   |--o
> >    +---+     +---+
> > o--|   |  o--|   |
> >    +---+     +---+
> >
> > The Head should always be pointing to one single cell, and any thread
> > should be able to get it. This is easy to do if we use an
> AtomicReference :
> >
> >     new      [head]
> >    +---+     +---+
> >    |N+1|     | N |<<--- T1, T2, T5
> >    +---+     +---+
> >    | 0 |     | 3 |
> >    +---+     +---+
> >    |   |--o  |   |--o
> >    +---+     +---+
> > o--|   |  o--|   |
> >    +---+     +---+
> >
> > Swapping the head is done after we have updated the links, which is
> > done by the writter thread :
> >
> > Step 1 :
> >
> > Attach the head to the new element
> >
> >     new      [head]
> >    +---+     +---+
> >    |N+1|     | N |<<--- T1, T2, T5
> >    +---+     +---+
> >    | 0 |     | 3 |
> >    +---+     +---+
> >    |   |--o  |   |--o
> >    +---+     +---+
> > o--|   |<----|   |
> >    +---+     +---+
> >
> > Step 2 :
> >
> > Attach the new element to the head (Here, at the same time, a new
> > thread needs the latest revison (still N))
> >
> >
> >     new      [head]
> >    +---+     +---+
> >    |N+1|     | N |<<--- T1, T2, T5, T7
> >    +---+     +---+
> >    | 0 |     | 4 |
> >    +---+     +---+
> >    |   |---->|   |--o
> >    +---+     +---+
> > o--|   |<----|   |
> >    +---+     +---+
> >
> > Step 3 :
> >
> > Swap the head
> >
> >    [head]
> >    +---+     +---+
> >    |N+1|     | N |<<--- T1, T2, T5, T7
> >    +---+     +---+
> >    | 0 |     | 4 |
> >    +---+     +---+
> >    |   |---->|   |--o
> >    +---+     +---+
> > o--|   |<----|   |
> >    +---+     +---+
> >
> > Step 4 :
> >
> > A new thread need the latest revision
> >
> >
> >            +--- T8
> >    [head]  |
> >    +---+   | +---+
> >    |N+1|<<-+ | N |<<--- T1, T2, T5, T7
> >    +---+     +---+
> >    | 1 |     | 4 |
> >    +---+     +---+
> >    |   |---->|   |--o
> >    +---+     +---+
> > o--|   |<----|   |
> >    +---+     +---+
> >
> > At this point, no other thread will *ever* be attached to revision N
> >
> > This can go on an on, with new revisions attached to the list by the
> > writter thread. Now, let's consider two different use case :
> > 1) when the oldest revision has a counter going down to zero
> > 2) when a revision which is not the head nor the oldest that is going
> > to zero
> >
> > First case, a first approach :
> >
> > This is the simplest. We have to remove the element from the lists,
> > and check if its parent's counter is also zero and is not the head. If
> > so, we iterate until we meet an element which counter is not zero and
> > is not the head.
> >
> > In any case, we simply have to change its parent's next pointer, and
> > to remove the element from the list :
> >
> >            +--- T8   +-- T1, T2, T5, T7
> >    [head]  |         |
> >    +---+   | +---+   | +---+     +---+
> >    |N+1|<<-+ | N |<<-+ |N-1|     |N-2|x---- T9 This thread will release
> > the element, setting teh counter to 0
> >    +---+     +---+     +---+     +---+
> >    | 1 |     | 4 |     | 0 |     | 0 |  Was 1
> >    +---+     +---+     +---+     +---+
> >    |   |---->|   |---->|   |---->|   |--o
> >    +---+     +---+     +---+     +---+
> > o--|   |<----|   |<----|   |<----|   |
> >    +---+     +---+     +---+     +---+
> >
> > The thread that set the counter to 0
> >
> > Step 1 :
> >
> > Detach the last element parent's next pointer
> >
> >            +--- T8   +-- T1, T2, T5, T7
> >    [head]  |         |
> >    +---+   | +---+   | +---+     +---+
> >    |N+1|<<-+ | N |<<-+ |N-1|     |N-2|x---- T9
> >    +---+     +---+     +---+     +---+
> >    | 1 |     | 4 |     | 0 |     | 0 |
> >    +---+     +---+     +---+     +---+
> >    |   |---->|   |---->|   |--o  |   |--o
> >    +---+     +---+     +---+     +---+
> > o--|   |<----|   |<----|   |<----|   |
> >    +---+     +---+     +---+     +---+
> >
> > Step 2 :
> > Iterate if the parent's counter is 0
> >
> >            +--- T8   +-- T1, T2, T5, T7
> >    [head]  |         |
> >    +---+   | +---+   | +---+     +---+
> >    |N+1|<<-+ | N |<<-+ |N-1|     |N-2|x---- T9
> >    +---+     +---+     +---+     +---+
> >    | 1 |     | 4 |     | 0 |     | 0 |
> >    +---+     +---+     +---+     +---+
> >    |   |---->|   |--o  |   |--o  |   |--o
> >    +---+     +---+     +---+     +---+
> > o--|   |<----|   |<----|   |<----|   |
> >    +---+     +---+     +---+     +---+
> >
> > Step 3 :
> > Detach the last element which counter is 0 prev pointer
> >
> >            +--- T8   +-- T1, T2, T5, T7
> >    [head]  |         |
> >    +---+   | +---+   |     +---+     +---+
> >    |N+1|<<-+ | N |<<-+     |N-1|     |N-2|x---- T9
> >    +---+     +---+         +---+     +---+
> >    | 1 |     | 4 |         | 0 |     | 0 |
> >    +---+     +---+         +---+     +---+
> >    |   |---->|   |--o      |   |--o  |   |--o
> >    +---+     +---+         +---+     +---+
> > o--|   |<----|   |      o--|   |<----|   |
> >    +---+     +---+         +---+     +---+
> >
> > Step 4 :
> >
> >  We can now dispose the unlinked elements.
> >
> > What if a thread is freeing an element while we are already free
> > another one ? This can be exposed using the previous sample :
> >
> >            +--- T8   +-- T1, T2, T5, T7
> >    [head]  |         |
> >    +---+   | +---+   | +---+     +---+
> >    |N+1|<<-+ | N |<<-+ |N-1|     |N-2|x---- T9 This thread will release
> > the element, setting teh counter to 0
> >    +---+     +---+     +---+     +---+
> >    | 1 |     | 4 |     | 1 |     | 0 |  Was 1
> >    +---+     +---+     +---+     +---+
> >    |   |---->|   |---->|   |---->|   |--o
> >    +---+     +---+     +---+     +---+
> > o--|   |<----|   |<----|   |<----|   |
> >    +---+     +---+     +---+     +---+
> >
> > Step 0+
> >
> > A thread (10) release the N-1 revison. We have two possibilities here :
> >
> > 1) The N-2 parent's next is not yet detached
> >
> >
> >            +--- T8   +-- T1, T2, T5, T7
> >    [head]  |         |
> >    +---+   | +---+   | +---+     +---+
> >    |N+1|<<-+ | N |<<-+ |N-1|     |N-2|x---- T9 This thread will release
> > the element, setting teh counter to 0
> >    +---+     +---+     +---+     +---+
> >    | 1 |     | 4 |     | 0 |     | 0 |  Was 1
> >    +---+     +---+     +---+     +---+
> >    |   |---->|   |---->|   |---->|   |--o
> >    +---+     +---+     +---+     +---+
> > o--|   |<----|   |<----|   |<----|   |
> >    +---+     +---+     +---+     +---+
> >
> > We just continue walking the list up to the first element with a non
> > zero counter
> >
> > 2) The N-2 paren'ts next is already detached
> >
> >
> >
> >            +--- T8   +-- T1, T2, T5, T7
> >    [head]  |         |         +-- T10
> >    +---+   | +---+   | +---+   | +---+
> >    |N+1|<<-+ | N |<<-+ |N-1|x--+ |N-2|x---- T9 This thread will
> > release the element, setting teh counter to 0
> >    +---+     +---+     +---+     +---+
> >    | 1 |     | 4 |     | 0 |     | 0 |  Was 1
> >    +---+     +---+     +---+     +---+
> >    |   |---->|   |---->|   |--o  |   |--o
> >    +---+     +---+     +---+     +---+
> > o--|   |<----|   |<----|   |<----|   |
> >    +---+     +---+     +---+     +---+
> >
> > In this case, we will have 2 threads dealing with two elements to be
> > cleaned up. This is not a problem, as T9 will now detach the N-2
> > element from teh N-1 one and T9 will detach the N-1 element from the
> > list. All is safe.
> >
> > About the removed elements
> > --------------------------
> >
> > Once a thread has detached some elements which counters are equal to
> > 0, what should we do with them ? As we will update the disk with the
> > removed pages, we have to send them to teh Writter thread. This can be
> > done by pushing teminto a deque (and more specifically a
> > ConcurrentLinkedDeque)
> >
> > Another option is that the writer thread handle the cleaning of this
> > queue when it is weaken up or periodically. In this case, we don't
> > need to push the element sinto a deque, and the reader threads will
> > just set the counter to 0.
> >
> > Second case
> > -----------
> >
> > What if we accept the removal of elements from the middle of the list ?
> > This is way more complex, as we have to be sure we don't break the chain.
> > One way to get rid of this constraint is to enforce the writer thread
> > to do the job, by periodically checing the whole list (which shuld not
> > be too costly). In this case, the reader thread never update the
> > element's parent's next pointer, they just decrement teh counter.
> > When this counter becomes 0, then we can remove the element.
> >
> > Solution
> > ========
> >
> > We will separate the threads into 2 categories :
> > - readers (we may have many)
> > - writer (we have only one)
> >
> > The readers always peek the latest revision (which is the head of the
> > list). It then increment the counter. When teh thread does not need
> > the revision anymore, the counter is decremented. This is the only
> > action the reader thread will ever do on the elements and on the list,
> > but it will always keep a reference to the element into the session.
> >
> > The writer thread has a lot more to do :
> > - it is responsible for adding element at the top of the list (see
> upper).
> > Once the head is swapped, every new reader will get the added element
> > as a reference.
> > - it also has to check in the list for the elements which counter is
> > 0, if teh list is longer than 1 element. This check can be done after
> > each update, or periodically (every N seconds if the number of
> > elements is > 1, or after M additions) if tht help saving updates on
> disk.
> >
> > Referencing pages
> > -----------------
> >
> > All the B-tree revisions are kept into the BoB B-tree, so it's easy to
> > retrieve the rootPage for a given revision.
> >
> > As all the pages can't be hold in memory, we sometime have to reach a
> > page from disk or from the cache. This is done following this simple
> algorithm :
> >
> >   for a given page with offset O
> >     if the cache contains an instance for O
> >       return the page
> >     else
> >       fetch the page with offset O from disk
> >       store it in the cache
> >
> > In our case, we just add a layer, which will check in the working
> > memory for the instance of a page, given its offset.
> >
> >
> >
>
>
> --
> Ashma Shrestha
> PHP Developer
> (770) 310-9456
> ashma.shres...@gmail.com
>



-- 
Ashma Shrestha
PHP Developer
(770) 310-9456
ashma.shres...@gmail.com

Re: ApacheDS with Mavibot anytime soon?

Reply via email to