Jérôme Carretero wrote:
> Hi,
>
> I wanted to share some thoughts and findings and perhaps collect some
> comments about a small test I did: I used LMDB vs. SQLite3 to perform
> indexing of a file tree.
>
> In this scenario, we're capturing file paths, along with some metadata
> (size, mtime, and hashes; let's use only sha1 for this example),
> and we want to be able to efficiently walk the tree, as well as resolve
> a file's info by id or by path.
>
> I'm showing the SQL schema first, as it helps formalize the "NoSQL"
> case:
>
>     CREATE TABLE IF NOT EXISTS files (
>         id INTEGER PRIMARY KEY AUTOINCREMENT,
>         sha1 BLOB,
>         size INTEGER,
>         mtime INTEGER
>     );
>
>     CREATE TABLE IF NOT EXISTS pathsegments (
>         id INTEGER PRIMARY KEY AUTOINCREMENT,
>         parent_id INTEGER,
>         file_id INTEGER,
>         segment_name TEXT,
>         FOREIGN KEY(parent_id) REFERENCES pathsegments(id),
>         FOREIGN KEY(file_id) REFERENCES files(id)
>     );
>
>     CREATE INDEX IF NOT EXISTS idx_sha1 ON files (sha1);
>
>     CREATE INDEX IF NOT EXISTS idx_fileid2segment ON pathsegments (file_id);
>
>     CREATE INDEX IF NOT EXISTS idx_parent_id2segment ON pathsegments (parent_id);
>
> Capturing the full path of each entry as a string was not something I
> wanted, and it was also wasteful in the few datasets I tested it on.
> So we're using path segments, and a full path can be reconstructed: in
> the walking situation, from root to leaves, or, if we have a file ID
> and want its path, from the leaves to the root, using the
> fileid2segment index and the parent information.
> And if files get moved around, we can alter the database without
> rewriting many paths.
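To make the leaf-to-root reconstruction concrete, here is a minimal sketch of that schema in Python's sqlite3, with a hypothetical path_of() helper (the helper name and the sample data are mine, not from the post) that looks up a file's segment via idx_fileid2segment and then follows parent_id links back to the root:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE files (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    sha1 BLOB, size INTEGER, mtime INTEGER
);
CREATE TABLE pathsegments (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    parent_id INTEGER, file_id INTEGER, segment_name TEXT,
    FOREIGN KEY(parent_id) REFERENCES pathsegments(id),
    FOREIGN KEY(file_id) REFERENCES files(id)
);
CREATE INDEX idx_fileid2segment ON pathsegments (file_id);
CREATE INDEX idx_parent_id2segment ON pathsegments (parent_id);
""")

# Sample tree: a single file /mods/demo.mod
con.execute("INSERT INTO files (sha1, size, mtime) VALUES (x'00', 1234, 0)")
con.execute("INSERT INTO pathsegments VALUES (1, NULL, NULL, 'mods')")
con.execute("INSERT INTO pathsegments VALUES (2, 1, 1, 'demo.mod')")

def path_of(con, file_id):
    """Rebuild a path leaf-to-root: find the file's segment, then
    walk parent_id links until we hit the root (parent_id IS NULL)."""
    row = con.execute(
        "SELECT parent_id, segment_name FROM pathsegments WHERE file_id = ?",
        (file_id,)).fetchone()
    parts = []
    while row is not None:
        parent, name = row
        parts.append(name)
        row = None if parent is None else con.execute(
            "SELECT parent_id, segment_name FROM pathsegments WHERE id = ?",
            (parent,)).fetchone()
    return "/" + "/".join(reversed(parts))

print(path_of(con, 1))  # -> /mods/demo.mod
```

Renaming a directory under this layout touches one pathsegments row, regardless of how many files live below it, which is the "without rewriting many paths" property the post describes.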
Yes, that's the correct approach. This is how back-hdb, back-mdb, and
back-ndb are structured as well.

> With lmdb, as I was conscious that each database would "intrinsically"
> build an index, I packed file_id->{mtime,size,hash} as a fid2attrs
> database and segment_id->{parent_id,name} as sid2dirname, and we still
> have a segment_id->kids DUPSORT database, sid2kids, for doing
> asciibetical readdir like the SQL idx_parent_id2segment.
>
> So we have our 2 tables with 2+3 indexes in SQL, and 4 DBs in lmdb
> (fid2attrs, sid2kids, sid2dirname, hash2fid).

In OpenLDAP, fileID and segmentID are the same thing, and sid2kids /
sid2dirname are a single table. You can read about how that's done in
the back-hdb paper:
https://openldap.org/conf/odd-sfo-2003/proceedings.html

> I implemented this (in Python) with sqlite and lmdb (using COBS for
> serialization, for simplicity and because it has the nice property of
> keeping serialized unsigned varints with preserved ordering),
> and went on to build DBs from a dataset of 474782 small files and 45014
> directories (which I've coincidentally uploaded at
> https://archive.org/download/ftp.modland.com_pub_modules).
>
> Well, I was surprised that the SQLite3 file weighed 67,149,824 bytes,
> while the lmdb one was larger at 82,718,720 bytes.

That shouldn't be surprising, since SQLite3 is an update-in-place
design and LMDB is copy-on-write.

> In particular, looking at database statistics, it seems that the hash
> to id index/DB had an unexpected (for me) overhead of ~112% relative
> to payload size with lmdb (and it was using less payload), with a size
> of 24932352 bytes for lmdb's DB vs. 15298560 for SQLite's index.

Any normal LMDB record has 8 bytes of overhead per record: a 4-byte
value length, a 2-byte key length, and 2 bytes of flags. Most SQLite3
records have very little per-record overhead because they're
fixed-width columns.
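That 8-byte node header accounts for a large share of the hash2fid overhead on its own; a back-of-the-envelope check (using the entry count and the LMDB statistics quoted later in the thread) looks like this:

```python
# LMDB per-record overhead: 4-byte value length + 2-byte key length
# + 2 bytes of flags = 8 bytes per node.
NODE_HEADER = 4 + 2 + 2

# Figures from the hash2fid stats quoted below in this thread:
entries = 474782          # sha1 -> file id records
payload = 11756721        # keys + values
pages = 24932352          # total pages size

headers = entries * NODE_HEADER
print(headers)                           # -> 3798256 bytes of node headers
print(round(headers / payload, 3))       # -> 0.323 of payload, before page slack

overhead = pages - payload
print(overhead)                          # -> 13175631, matching the stats
print(round(overhead / entries, 3))      # -> 27.751 bytes/entry, 8 of them header
```

So roughly a third of the payload size is consumed by node headers alone; the rest of the measured 27.75 bytes/entry is page-level structure and unused space.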
Since your fid2attrs and hash2fid records are fixed size, you can use
MDB_DUPFIXED to eliminate all of the per-record overhead of those
tables. (Don't use varints for these; that would prevent using
DUPFIXED and would completely defeat the point.)

> SQLite statistics:
>
>     Percentage of total database...................... 22.8%
>     Number of entries................................. 474782
>     Bytes of storage consumed......................... 15298560
>     Bytes of payload.................................. 12311437   80.5%
>     Bytes of metadata................................. 1469162     9.6%
>     B-tree depth...................................... 3
>     Average payload per entry......................... 25.93
>     Average unused bytes per entry.................... 3.20
>     Average metadata per entry........................ 3.09
>     Average fanout.................................... 109.00
>     Non-sequential pages.............................. 2986       80.0%
>     Maximum payload per entry......................... 26
>     Entries that use overflow......................... 0           0.0%
>     Index pages used.................................. 34
>     Primary pages used................................ 3701
>     Overflow pages used............................... 0
>     Total pages used.................................. 3735
>     Unused bytes on index pages....................... 16998      12.2%
>     Unused bytes on primary pages..................... 1500963     9.9%
>     Unused bytes on overflow pages.................... 0
>     Unused bytes on all pages......................... 1517961     9.9%

Looks like SQLite3 uses a much higher page fill factor than textbook
B+trees. Normally, in the random-insert case, pages are split when
they're full, leaving two pages at about 50% utilization each. So on
average a B+tree's pages would be 75% full, with only 25% unused
bytes. This is what LMDB does for random inserts.
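One way to make records DUPFIXED-eligible is to pack them at a fixed width with big-endian integers; a small sketch (the 8-byte field width is my assumption, not from the post):

```python
import struct

# For MDB_DUPFIXED, every data item stored under a key must be the
# same size. Packing the file id as a fixed 8-byte big-endian integer
# (instead of a varint) keeps items fixed-width AND keeps byte-wise
# comparison consistent with numeric order.
def pack_fid(file_id: int) -> bytes:
    return struct.pack(">Q", file_id)

assert len(pack_fid(1)) == len(pack_fid(10**9)) == 8  # fixed width
assert pack_fid(2) < pack_fid(300)                    # order preserved
```

This trades a few bytes of value size for the elimination of the 8-byte per-record node header in those tables, which is a net win at these record sizes.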
> LMDB: depth=3 branch_pages=66 leaf_pages=6021 overflow_pages=0
>     Keys size:              9495640
>     Values size:            2261081
>     Payload size:          11756721
>     Pages size:            24932352
>     Average key size:        20.000 (20-20)
>     Average value size:       4.762 (1-5)
>     Average entry size:      24.762
>     Entries number:          474782
>     Overhead:              13175631
>     Overhead/payload ratio:   1.121
>     Overhead/entry avg:      27.751
>
> I checked the overhead for in-order insertions and it's still around
> 50% for these key/value sizes, much more than the ~21% SQLite shows
> for random insertions.
>
> But even if we were to remove this index, SQLite's file would still
> be tighter...
>
> That's it!
>
> Best regards,

-- 
  -- Howard Chu
  CTO, Symas Corp.           http://www.symas.com
  Director, Highland Sun     http://highlandsun.com/hyc/
  Chief Architect, OpenLDAP  http://www.openldap.org/project/