--- crowd-funded eco-conscious hardware: https://www.crowdsupply.com/eoma68
On Mon, Dec 19, 2016 at 12:31 PM, Nicolas Sebrecht <[email protected]> wrote: > > Hi, > > I did a quick review. > > This feature would obviously require documentation in offlineimap.conf > and this must be marked EXPERIMENTAL. ohhh yeah. definitely. wrote it up since posting > On Mon, Dec 19, 2016 at 09:29:29AM +0000, lkcl wrote: > >> diff --git a/offlineimap/folder/LocalStatusLMDB.py >> b/offlineimap/folder/LocalStatusLMDB.py >> index e69de29..3108595 100644 >> --- a/offlineimap/folder/LocalStatusLMDB.py >> +++ b/offlineimap/folder/LocalStatusLMDB.py >> @@ -0,0 +1,299 @@ >> +# Local status cache virtual folder: LMDB backend >> +# Copyright (C) 2009-2016 Stewart Smith and contributors. >> +# Copyright (C) 2016 Luke Kenneth Casson Leighton <[email protected]> > > We decided to use "and contributors" to avoid adding new copyright > lines. i quite like explicit copyright notices. plus, it's hell on earth for distros to work out who actually contributed *unless* they're explicitly listed... and if they're not listed, the package cannot be added until the package maintainer has gone through every single damn commit working out exactly who the contributors are to every single file... by hand. if you're familiar with debian's copyright file, it lists the contributors using regexps (i.e. you can't simply claim "all contributors contributed all files" because that would be misleading and quite possibly a criminal offense to state that someone else owns copyright on code that they don't actually own). i had to maintain a package, once, that literally had tens of thousands of files in it. even considering reviewing them to create that initial debian/copyright file was out of the question, so i wrote a program to work out the discrepancies (it was an O(N^3) algorithm but it did the job). without that it would have taken literally weeks to review all the files... and it relied *critically* on the copyright holders being explicity listed. bottom line is, there's really *really* good reasons not to just say "and contributors". > If this new file is yours, I'd suggest: > > # Copyright (C) 2016 Luke Kenneth Casson Leighton and contributors. it's based on the sqlite file. > to keep things simple. ... and make life for debian, ubuntu, fedora and all other distros that have strict policies for explicitly and meticulously maintaining an accurate and up-to-date list of all copyright holders for all files... >> +import os >> +import lmdb > > > Python 2.7.10 (default, Nov 9 2016, 23:16:09) > [GCC 4.9.3] on linux2 > Type "help", "copyright", "credits" or "license" for more information. >>>> import lmdb > Traceback (most recent call last): > File "<stdin>", line 1, in <module> > ImportError: No module named lmdb >>>> > > We need to have this dependency made optional. import sqlite should probably also likewise. > Notice this is likely > easier to achieve while importing/using LocalStatusLMDB in > repository/LocalStatus.py. moved it into the __init__, where once you have an env there's no further need for the lmdb module anyway > """LocalStatus backend implemented with an LMDB database.""" > > (style) yep got that one (and a few others) >> + if not os.path.exists(dirname): >> + os.makedirs(dirname) >> + if not os.path.isdir(dirname): >> + raise UserWarning("LMDB database path '%s' is not a directory."% >> + dirname) > > Race condition. Not signicant, though. the same race condition is present in the sqlite code >> + with self._env.begin() as txn: > > I wonder it's missing of locks. This class will be *instanciated* and > used more than once in different threads. ok lmdb is a multi-reader with a single global mutex on writing. it's extremely clever. so all those "with self._env.begin()" blocks are transactions (that don't block readers) - the only one(s) that will block (across all threads) is the "with self._env.begin(write=True)" ones. soOoo... the only bit that has me concerned is the get/check on the db_version.... i know what to do there.... use a self._env.begin(write=True) as txn then pass that txn through to __create_db and __upgrade_db. > In order to avoid repeating the same code/patterns in the backends, I > guess the best would be to add a new layer (class) to serialize the I/O > on top of them. Not sure about that, though. would mean close synchronisation / analysis of the backends... not sure if it's worth it... and some may have finer-grained opportunities for locking (and different rules) than others. sqlite for example does locking on writes *and* reads... whereas lmdb does locking *only* on writes, using copy-on-write and only needs a global mutex for the very top block of the b+tree... it's amazingly elegant. >> + 'flags': set(), >> + 'labels': set(), > > Trailing spaces for above two lines. :%s/ $//g :) >> + 'flags': set(flags or () ), > ^ > >> + 'labels': set(labels or () ) > ^ damn gmail variable-width fonts, can't tell what you're pointing at, i assume the space in betweeen () and ) which i put there deliberately so as to emphasise the empty tuple... yeh ok it goes :) amazingly this code actually works, i found a couple of errors, also in the current version i removed the dict and replaced it with a 4-tuple, no point storing the keys of the dict repeated all the time, they just take up space, esp. when there's versioning. the one thing i'm really not happy about is the *massive* in-memory cacheing. i have 200,000 messages, it's a 9GB gmail folder, and as a libre project maintainer i need to be keeping an eye on messages several times an hour. 100% CPU usage for up to 90 seconds, to take an in-memory copy of the database, isn't okay. however.... it looks like in-memory cacheing is an integral part of the design. i thought about creating a derivative class of UserDict (one that understands and mirrors the status_db) and then returning that from cachemessagelist, the general idea being that the replacement items, __get__ etc. functions actually *directly* reference the lmdb database on-demand instead of keeping the data in-memory. lmdb is a particularly good candidate for this because it does *not* take copies of keys or values: because it's a mmap'd shmem file (with COW enabled) it passes the *real* memory address around. kinda cool. ok off to taiwan in the morning, will take this up again in a day or so. l. _______________________________________________ OfflineIMAP-project mailing list: [email protected] http://lists.alioth.debian.org/cgi-bin/mailman/listinfo/offlineimap-project OfflineIMAP homepages: - https://github.com/OfflineIMAP - http://offlineimap.org
