Matthew Moskewicz wrote:

warnings: new to list, first post, lmdb noob.

i'm a caffe user:
https://github.com/BVLC/caffe

in one use case, caffe sequentially streams through >100GB lmdbs at a
rate of ~30MB/s in blocks of about 40MB. however, if multiple caffe
processes are reading the same lmdb (opened with MDB_RDONLY), read
performance becomes limiting (i.e. the processes become IO bound), even
though the disk has sufficient read bandwidth (say ~180MB/s). some of
the relevant caffe lmdb code is here:

https://github.com/BVLC/caffe/blob/master/src/caffe/util/db.cpp

however, if i *both*
1) run  blockdev --setra 65536 --setfra 65536 /dev/sdwhatever
2) modify lmdb to call posix_madvise(env->me_map, env->me_mapsize,
POSIX_MADV_SEQUENTIAL);

then i can get >1 reader to run without being IO limited.
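To show what tweak (2) amounts to outside of LMDB, here is a minimal sketch on a plain file. It maps the file read-only, applies POSIX_MADV_SEQUENTIAL to the whole mapping (the same call the patch makes on env->me_map and env->me_mapsize), then walks the mapping front to back in fixed-size blocks. The helper name seq_scan_mapped and its checksum parameter are hypothetical, and this is not LMDB's code. For scale, blockdev --setra/--setfra count 512-byte sectors, so the 65536 in tweak (1) corresponds to a 32 MiB readahead window.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: map `path` read-only, hint sequential access for the
 * whole mapping, then touch every byte front to back in `block`-sized chunks.
 * Returns the number of bytes scanned (and a byte sum via `checksum`),
 * or -1 on error. */
long long seq_scan_mapped(const char *path, size_t block, uint64_t *checksum)
{
    if (block == 0)
        return -1;

    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    struct stat st;
    if (fstat(fd, &st) != 0 || st.st_size == 0) {
        close(fd);
        return -1;
    }
    size_t len = (size_t)st.st_size;

    const unsigned char *map = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, 0);
    close(fd); /* the mapping remains valid after the descriptor is closed */
    if (map == MAP_FAILED)
        return -1;

    /* Advisory only: the kernel may read ahead more aggressively and reclaim
     * pages behind the reader sooner. A failure here is not fatal. */
    (void)posix_madvise((void *)map, len, POSIX_MADV_SEQUENTIAL);

    uint64_t sum = 0;
    size_t off = 0;
    while (off < len) {
        size_t n = (len - off < block) ? len - off : block;
        for (size_t i = 0; i < n; i++) /* fault in each page of the block */
            sum += map[off + i];
        off += n;
    }

    munmap((void *)map, len);
    if (checksum)
        *checksum = sum;
    return (long long)off;
}
```

Because the hint is advisory, the return value of posix_madvise is deliberately ignored; the scan is correct with or without it, only the readahead behavior changes.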

This is quite timing-dependent - if you start your multiple readers at exactly 
the same time and they run at exactly the same speed, then they will all be 
using the same cached pages and all of the readers can run at the full 
bandwidth of the disk. If they're staggered or not running in lockstep, then 
you'll only get partial performance.

for (2), see https://github.com/moskewcz/scratch/tree/lmdb_seq_read_opt

similarly, using a sequential read microbenchmark designed to model the
caffe reads from here:
https://github.com/moskewcz/boda/blob/master/src/lmdbif.cc

if i run one reader, i get 180MB/s bandwidth.
with two readers, but neither (1) nor (2) above, each gets ~30MB/s
bandwidth.
with (1) and (2) enabled, and two readers, each gets ~90MB/s bandwidth.
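In aggregate terms, the numbers above say that two untuned readers together pull 2 x 30 = 60 MB/s, a third of the 180 MB/s a single reader achieves, while with (1) and (2) they pull 2 x 90 = 180 MB/s, i.e. the full single-reader rate split evenly. The two helpers below (hypothetical names) simply make that arithmetic explicit.

```c
/* Total bandwidth of `readers` processes that each achieve `per_reader_mb`
 * MB/s. */
unsigned aggregate_mb(unsigned readers, unsigned per_reader_mb)
{
    return readers * per_reader_mb;
}

/* Aggregate bandwidth as a whole-number percentage (rounded down) of the
 * single-reader baseline `single_mb`; returns 0 for a zero baseline. */
unsigned percent_of_baseline(unsigned aggregate, unsigned single_mb)
{
    return single_mb ? aggregate * 100u / single_mb : 0u;
}
```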

The other point to note is that sequential reads in LMDB won't remain truly 
sequential (as seen by the storage device) after a few rounds of 
inserts/deletes/updates. Once any seeks or random I/O creep in, your madvise 
will be useless.
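A simplified way to picture this: as pages are rewritten and freed pages get reused, a key-order scan may visit on-disk pages in an order like 1, 2, 7, 3, 4 rather than 1, 2, 3, 4, 5. The sketch below is a hypothetical helper, not part of LMDB, and assumes you already have the visited page numbers; it counts the steps that do not go to the immediately following page. Each such step is a point where the sequential-access hint no longer matches what the disk actually sees.

```c
#include <stddef.h>
#include <stdint.h>

/* Count transitions in a page-visit order that are not to the next
 * physically adjacent page (each is a potential seek). */
size_t count_discontinuities(const uint64_t *pgnos, size_t n)
{
    size_t breaks = 0;
    for (size_t i = 1; i < n; i++)
        if (pgnos[i] != pgnos[i - 1] + 1)
            breaks++;
    return breaks;
}
```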

any advice?

mwm

PS: backstory (skippable):
caffe originally used LevelDB to get better read performance for
sequentially loading sets of ~1M 227x227x3 raw images (~200GB data).
typically processing time is ~2 hours for this data set size, yielding a
read BW need of 30MB/s or so. it's not really clear if/why LevelDB was
used aside from the fact that the caffe author was a google intern at
the time he wrote it, but anecdotally i think the claim is that reading
the raw .jpgs had perf. issues, although it's unclear exactly what or
why. i guess it was the usual story about not getting sequential reads
without using LevelDB. they switched to lmdb a while back.
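The ~30MB/s figure follows from streaming ~200GB in ~2 hours: 200,000 MB / 7,200 s is about 27.8 MB/s. The hypothetical helper below rounds that requirement up to a whole MB/s, assuming decimal units (1 GB = 1000 MB).

```c
/* Minimum sustained read rate, in whole MB/s rounded up, needed to stream
 * `total_mb` megabytes within `seconds` seconds; returns 0 if seconds is 0. */
unsigned long required_mb_per_s(unsigned long total_mb, unsigned long seconds)
{
    return seconds ? (total_mb + seconds - 1) / seconds : 0;
}
```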

--
  -- Howard Chu
  CTO, Symas Corp.           http://www.symas.com
  Director, Highland Sun     http://highlandsun.com/hyc/
  Chief Architect, OpenLDAP  http://www.openldap.org/project/
