Hi Leonid,

Thanks for your reply. I observed another scenario: I also tested "overwrite"
mode. I slightly modified the source code to change the default behavior (set
dbflags_ = SYNC, so data is flushed to disk once a transaction is committed) and
also collected iostat data. The overwrite rate is ~521 ops/sec, but iostat shows
that w/s is ~4666, so the write amplification is ~9. To my understanding,
overwriting an existing value does not restructure the B-tree, so why is the
write amplification so large? Could you help explain? Thanks.
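
For reference, the ratios quoted in this thread can be reproduced with a small calculation. This is only a sketch: the function name is made up for illustration, and it assumes "write amplification" here means iostat w/s divided by benchmark ops/sec (w/s counts write requests, which need not equal the number of 4K pages).

```python
def write_amplification(device_writes_per_sec, app_ops_per_sec):
    """Ratio of device write requests (iostat w/s) to application ops/sec."""
    if app_ops_per_sec <= 0:
        raise ValueError("app_ops_per_sec must be positive")
    return device_writes_per_sec / app_ops_per_sec


# Figures reported in this thread.
overwrite_wa = write_amplification(4666, 521)      # overwrite, SYNC
fillrandsync_wa = write_amplification(8093, 1020)  # fillrandsync

print(f"overwrite:    {overwrite_wa:.2f}")
print(f"fillrandsync: {fillrandsync_wa:.2f}")
```

Both cases come out close to the rounded values of ~9 and ~8 mentioned in the messages.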

Cheers,
xinxin

-----Original Message-----
From: Леонид Юрьев [mailto:[email protected]] 
Sent: Monday, May 04, 2015 6:59 PM
To: Shu, Xinxin
Cc: [email protected]
Subject: Re: large write amplification

Hi, Xinxin.

I will try to answer briefly, without going into detail:

- So that readers are never blocked by a writer, LMDB provides a snapshot of the
data, indexes and directory for each completed transaction.

- Most db-pages (those not changed by a particular transaction) are "shared"
between such snapshots. But any change to the data itself, and its reflection
in the btree-indexes (including the particular table, free-db, main-db and so
forth), requires new pages to be used and written to disk.

- In a large db, a small "one-byte" change may make a lot of db-pages "dirty"
(usually 4K each). For example, one add/del/mod operation in an LDAP db a few
GB in size requires about 50-100 page-level IOPS.
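
The points above can be sketched as a toy model. This is not LMDB code: the function names, the fanout of 100, the entry count, and the one-page costs for the free-db and meta updates are all assumptions chosen for illustration. The idea is only that with copy-on-write, changing one leaf also means writing a new copy of every page on the path up to the root.

```python
def btree_depth(num_entries, fanout):
    """Levels in a B-tree whose pages each hold up to `fanout` entries."""
    depth = 1
    capacity = fanout
    while capacity < num_entries:
        capacity *= fanout
        depth += 1
    return depth


def pages_written_per_commit(num_entries, fanout, freedb_pages=1, meta_pages=1):
    """Rough count of pages written by a one-key, copy-on-write commit."""
    # The modified leaf and each of its ancestors get a fresh copy.
    data_path = btree_depth(num_entries, fanout)
    # Assumed extra cost: free-db bookkeeping plus the meta page update.
    return data_path + freedb_pages + meta_pages


# Hypothetical database: 10 million keys, about 100 entries per page.
estimate = pages_written_per_commit(num_entries=10_000_000, fanout=100)
print(estimate)
```

Under these assumptions even a pure overwrite of an existing value costs several page writes per commit, because the path to the root is rewritten whether or not the tree is restructured. An operation that also updates several index tables would multiply that count.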

Leonid.

P.S.
For high-load use cases I made a few changes in our fork of OpenLDAP/LMDB.
One of these features we call "LIFO reclaiming".
It gives us a 10-50x performance boost, especially by exploiting the
write-back cache of the storage subsystem.
We now use it in our production (telco) environment.
But currently it is not safe in all cases; see
https://github.com/ReOpen/ReOpenLDAP/issues/2 and 
https://github.com/ReOpen/ReOpenLDAP/issues/1.

2015-05-04 5:31 GMT+03:00 Shu, Xinxin <[email protected]>:
> Hi list,
>
> Recently I ran micro-tests of LMDB on a DC3700 (200GB), using the bench code at
> https://github.com/hyc/leveldb/tree/benches . I tested fillrandsync mode,
> collected iostat data, and found that write amplification is large. For the
> fillrandsync case:
>
> IOPS: 1020 ops/sec
>
> iostat data shows that w/s on that SSD is 8093, avgqu-sz is ~1, and
> await time is about 0.16 ms, so the write amplification is ~8, which
> is large to me. Can someone help explain why write amplification is so
> large? Thanks.
>
>
> Cheers,
> xinxin
>
>
