Hi, Xinxin. I will try to answer briefly, without going into details:
- So that readers are never blocked by a writer, LMDB provides a
  snapshot of the data, indexes and directory for each completed
  transaction.
- Most db-pages (those not changed by a particular transaction) are
  "shared" between such snapshots. But any change to the data itself,
  and its reflection in the btree-indexes (including the particular
  table, free-db, main-db and so forth), requires new pages to be used
  and written to disk.
- In a large db a small "one-byte" change may make a lot of db-pages
  "dirty" (usually 4K each). For example, one add/del/mod operation in
  an LDAP-db a few GB in size requires about 50-100 page-level IOPS.

Leonid.

P.S.
For high-load use cases I made a few changes in our fork of
OpenLDAP/LMDB. One of these features we called "LIFO reclaiming". It
gives us a 10-50 times performance boost, especially by exploiting the
write-back cache of the storage subsystem. We now use it in our
production (telco) environment. But currently it is not safe for all
cases, see https://github.com/ReOpen/ReOpenLDAP/issues/2 and
https://github.com/ReOpen/ReOpenLDAP/issues/1.

2015-05-04 5:31 GMT+03:00 Shu, Xinxin <[email protected]>:

> Hi list,
>
> Recently I run micro tests on LMDB on DC3700 (200GB), I use bench code
> https://github.com/hyc/leveldb/tree/benches , I tested fillrandsync mode
> and collected iostat data, found that write amplification is large
> For fillrandsync case:
>
> IOPS : 1020 ops/sec
>
> Iostat data shows that w/s on that SSD is 8093, and avgqu-sz is ~ 1, await
> time is about 0.16 ms, so the write amplification is ~8, which is large to
> me, can someone help explain why write amplification is so large? thanks
>
>
> Cheers,
> xinxin
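
For reference, the arithmetic behind the figures in this thread can be sketched as below. This is only a rough model, not LMDB code: it treats every iostat write request as one page write, and it assumes that a copy-on-write commit rewrites one page per level of each modified btree plus one meta page. The function names and the tree depths (4, 2, 1) are hypothetical, chosen purely for illustration.

```python
def write_amplification(device_writes_per_sec, app_ops_per_sec):
    """Device-level writes issued per application-level operation."""
    return device_writes_per_sec / app_ops_per_sec


def cow_pages_per_commit(tree_depths, meta_pages=1):
    """Pages rewritten by one commit in a simplified copy-on-write model.

    Assumption: each modified btree rewrites one page per level along
    the root-to-leaf path, and the commit also writes `meta_pages`.
    """
    return sum(tree_depths) + meta_pages


# Figures quoted in the thread: 8093 w/s on the SSD at 1020 ops/sec.
wa = write_amplification(8093, 1020)
print(round(wa, 2))  # 7.93

# Hypothetical depths for the table, free-db and main-db btrees.
pages = cow_pages_per_commit([4, 2, 1])
print(pages)  # 8
```

Under these assumptions a single small update already touches several pages, which is in line with the ~8 writes per operation reported above; deeper trees or page splits would push the number higher.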
