Kresten Krab Thorup wrote:
Howard,
I took a look at MDB, and it does look quite promising!
Perhaps you can enlighten us a bit on these aspects of MDB:
- How do you back up an MDB instance? Can you do that while it is actively
updating, or do you need to stop it? I'm asking because the operational ease
of the log-structured stores is one of the features that many Riak'ers like
quite a lot.
Since this was originally developed for OpenLDAP slapd, we currently use
slapcat to back up an MDB instance. That won't be useful in general, though, so
we will be providing an mdb_dump/mdb_load utility shortly. There is no need to
stop the database while performing a backup, but...
MDB uses MVCC, so once a read transaction starts, it is guaranteed a
self-consistent view of the database until the transaction ends.
Keeping a read txn open for a very long time will prevent the dirty page
reclaimer from re-using the pages that are referenced by that read txn. As
such, any ongoing write activity will be forced to use new pages. The DB size
can grow very rapidly in these situations, until the read txn ends.
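To make the growth effect above concrete, here is a toy Python model. It is only a sketch under stated assumptions, not MDB's actual code: the class `ToyMVCCStore`, its single "live page" per version, and the reuse rule (a page freed by write txn N can be reused once no reader holds a snapshot older than N) are all simplifications invented for illustration.

```python
class ToyMVCCStore:
    """Toy copy-on-write store (hypothetical, NOT the MDB implementation).

    Each write txn puts the new version on a fresh or reclaimed page and
    frees the page that held the previous version.
    """

    def __init__(self):
        self.txn_id = 0        # id of the last committed write txn
        self.total_pages = 0   # pages ever allocated (stand-in for file size)
        self.freed = []        # (freeing_txn_id, page_no) waiting for reuse
        self.readers = {}      # reader handle -> snapshot txn id
        self.live_page = None  # page holding the current version
        self._next_handle = 0

    def begin_read(self):
        handle = self._next_handle
        self._next_handle += 1
        self.readers[handle] = self.txn_id  # pin the current snapshot
        return handle

    def end_read(self, handle):
        del self.readers[handle]

    def _alloc_page(self):
        # Oldest snapshot any reader still holds (or the latest commit).
        oldest = min(self.readers.values(), default=self.txn_id)
        for i, (freed_by, page) in enumerate(self.freed):
            # A page freed by txn N is invisible to snapshots >= N.
            if freed_by <= oldest:
                del self.freed[i]
                return page
        # Nothing reclaimable: grow the file by one page.
        self.total_pages += 1
        return self.total_pages - 1

    def write(self):
        new_id = self.txn_id + 1
        page = self._alloc_page()
        if self.live_page is not None:
            self.freed.append((new_id, self.live_page))
        self.live_page = page
        self.txn_id = new_id


def demo():
    store = ToyMVCCStore()
    for _ in range(100):
        store.write()
    before = store.total_pages      # no readers: pages are recycled

    reader = store.begin_read()     # long-lived read txn
    for _ in range(50):
        store.write()
    during = store.total_pages      # writers are forced onto new pages

    store.end_read(reader)
    for _ in range(50):
        store.write()
    after = store.total_pages       # growth stops, but the size stays
    return before, during, after
```

In this toy model `demo()` returns `(2, 51, 51)`: with no reader the store cycles between two pages, while the open read txn forces almost every write onto a new page, and ending the read txn stops the growth without shrinking what was already allocated.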
- What is the typical response-time distribution for inserts? I've tried to
work with BDB some time back, and one of the issues with that is that every
once in a while it slows down quite significantly as B-tree rebalancing makes
some requests unreasonably slow.
MDB is also a B+tree; at a high level it will have similar characteristics to
BDB. It's always possible for an insert that results in a page split to have a
cascade effect, causing page splits all the way back up to the root of the
tree. But in general MDB is still more efficient than BDB so the actual
variance will be much smaller.
Also note that if you're bulk loading records in sorted order with the
MDB_APPEND option, the load basically degenerates into sequential write
operations: when one page fills, instead of splitting it in half as usual, we
just allocate a new sibling page and continue filling it. Page "splits" of
this sort can still ripple upward, but they're so cheap as to be unmeasurable.
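The difference between the two page-fill strategies can be sketched with a toy leaf-level model in Python. This is an illustration only, not MDB code: `load_sorted`, the list-of-lists "leaves", and the tiny page capacity are assumptions made up for the example, and the upper levels of the tree are ignored.

```python
def load_sorted(n_keys, capacity, append_mode):
    """Insert keys 0..n_keys-1 in sorted order into toy leaf pages.

    Returns (leaves, keys_moved), where keys_moved counts entries copied
    to a sibling page during splits.
    """
    leaves = [[]]
    keys_moved = 0
    for key in range(n_keys):
        last = leaves[-1]  # sorted input always lands in the rightmost leaf
        if len(last) == capacity:
            if append_mode:
                # Append-style: leave the full page alone, start a new sibling.
                last = []
                leaves.append(last)
            else:
                # Ordinary split: move the upper half to a new sibling.
                half = capacity // 2
                sibling = last[half:]
                del last[half:]
                keys_moved += len(sibling)
                leaves.append(sibling)
                last = sibling
        last.append(key)
    return leaves, keys_moved


split_leaves, split_moved = load_sorted(1000, 10, append_mode=False)
append_leaves, append_moved = load_sorted(1000, 10, append_mode=True)
print(len(split_leaves), split_moved)    # ordinary half-splits
print(len(append_leaves), append_moved)  # append-style fill
```

With 1000 keys and 10 entries per page, the half-split load ends up with 199 mostly half-full leaves and copies 990 entries along the way, while the append-style load produces 100 full leaves and copies nothing.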
- Does an MDB instance exploit multiple cores? If so, what is the structure
of this usage? In Riak, we have the benefit that a typical Riak node runs
multiple independent databases (one for each data partition/vnode), and so at
least that can provide some concurrency to better leverage I/O and CPU
concurrency.
Within an MDB environment, multiple readers can run concurrently with a single
writer. Readers are never blocked by anything. (Readers don't block writers,
but as mentioned above, can prevent old pages from being reused.)
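The single-writer, lock-free-reader arrangement can be illustrated with a small Python sketch. It is a toy under stated assumptions, not how MDB is implemented: `SnapshotMap` is a made-up class, it copies the whole dict on each write (a real B+tree would copy only the touched pages), and it relies on CPython making the final reference assignment atomic.

```python
import threading


class SnapshotMap:
    """Toy single-writer / many-reader map (hypothetical, not MDB)."""

    def __init__(self):
        self._root = {}                      # committed version, never mutated
        self._write_lock = threading.Lock()  # serializes writers only

    def snapshot(self):
        # Readers take no lock: they just grab the current root reference.
        return self._root

    def update(self, changes):
        with self._write_lock:               # one writer at a time
            new_root = dict(self._root)      # copy-on-write
            new_root.update(changes)
            self._root = new_root            # publish the new version


def run_demo(n_writes=1000, n_readers=4):
    db = SnapshotMap()
    db.update({"x": 0, "y": 0})
    torn = []                                # snapshots where x != y
    done = threading.Event()

    def reader():
        while not done.is_set():
            snap = db.snapshot()
            if snap["x"] != snap["y"]:
                torn.append(snap)

    threads = [threading.Thread(target=reader) for _ in range(n_readers)]
    for t in threads:
        t.start()
    for i in range(1, n_writes + 1):
        db.update({"x": i, "y": i})          # x and y always change together
    done.set()
    for t in threads:
        t.join()
    return db.snapshot()["x"], len(torn)
```

Because each snapshot is an immutable version, readers never observe a half-applied update: `run_demo()` returns `(1000, 0)`, and a snapshot taken before an update keeps showing the old values for as long as it is held.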
Kresten
On Sep 29, 2012, at 9:50 PM, Kresten Krab Thorup wrote:
BETS may be a good template for MDB NIFs, since the MDB API looks like BDB's.
It doesn't implement the Riak backend API, only a subset of the ets API.
https://github.com/krestenkrab/bets
Kresten
Trifork
On 28/09/2012, at 13.50, "Howard Chu" wrote:
yaoxinming wrote:
MDB looks very good. You can use a NIF; maybe take a look at the eleveldb
project, which is a good example.
Thanks for the tip, I'll look into eleveldb.
2012/9/27 Howard Chu
Hi, I'm interested in developing a backend using OpenLDAP's memory-mapped
database library (MDB). Unfortunately Erlang is not a language I've used
before. Any tips on where to get started?
You can find information on MDB here: http://highlandsun.com/hyc/mdb/
Its read performance is several times faster than anything else out there,
and scales perfectly linearly since readers require no locks. Its write
performance is also quite fast...
-- Howard Chu
CTO, Symas Corp. http://www.symas.com
Director, Highland Sun http://highlandsun.com/hyc/
Chief Architect, OpenLDAP http://www.openldap.org/project/
_______________________________________________
riak-users mailing list
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com