Леонид Юрьев wrote:
> 2014-10-03 3:13 GMT+04:00 Howard Chu <[email protected]>:
> 2014-10-04 0:04 GMT+04:00 Леонид Юрьев <[email protected]>:
>> 2014-10-03 3:13 GMT+04:00 Howard Chu <[email protected]>:
>>>> commit 841059330fd44769e93eb4b937c3ce42654fad6f
>>>> Author: Leo Yuriev <[email protected]>
>>>> Date: 2014-09-20 07:16:15 +0400
>>>>
>>>> BUGFIX - lmdb: lock meta-pages in writemap-mode to avoid unexpected
>>>> write,
>>>> before the data pages would be synchronized.
>>>>
>>>> Without locking the meta-pages may be writen by OS before other
>>>> data,
>>>> in this case database would be inconsistent.
>>>
>>>
>>> Seems unnecessary. Won't happen by default; could happen with MDB_NOSYNC but
>>> that risk is already documented.
>>
>> We are using the combination:
>> envflags writemap nosync lifo
>> checkpoint 0 1
>>
>> If the checkpoint is set in seconds, it gives us the assurance
>> consistent state database on disk.
>> However, without this patch meta-pages can be written by the kernel
>> before the data.
>>
>> In fact, for a full guarantee in case of death slapd process,
>> meta-page should be written explicitly.

No, the DB can never go inconsistent due to a process crashing - the pages in
OS cache are always correct. It can only go inconsistent if the OS crashes and
a proper sync has not occurred.

>> But it requires a lot of changes and I do not do that.

>>>> commit 8ddd63161aeb2689822d1a8d27385d62e4e341ae
>>>> Author: Leo Yuriev <[email protected]>
>>>> Date: 2014-09-19 22:47:19 +0400
>>>>
>>>> BUGFIX - lmdb: properly sync meta-pages in mdb_sync_env().
>>>>
>>>> Meta-pages may be updated during data-syncing in mdb_sync_env(),
>>>> in this case database would be inconsistent.
>>>>
>>>> Check-and-retry if lead txn-id changed during flushing data in
>>>> mdb_sync_env().

Fundamentally, you are trying to make an inherently unsafe configuration
"safer", but it's impossible. Assume you have mlock'd the meta pages into
memory, so the OS never flushes them itself any more, and you're running with
NOSYNC.
That means, within 3 transactions, the data pages on disk will be out of sync
with the meta pages on disk. If the OS crashes at that point, the entire DB
will be lost.

The only way to make this mode of operation somewhat safe is to defer
reclaiming pages for even longer. E.g., instead of halting at
current_txnid - 3, halt at current_txnid - 22, in which case the data pointed
to by the on-disk meta pages cannot get obsolete until 20 transactions have
occurred.

Note that in combination with your LIFO patch, it's pretty much guaranteed
that the on-disk meta pages will be useless after only 2 un-sync'd
transactions.

>>> Probably could simplify this, just obtain the write mutex unconditionally,
>>> then there's no need to loop or retry. But also, this depends on MDB_NOLOCK
>>> - if that's set, then do no locking at all.
>>
>> I did so for reasons of performance and less a lock retention time.
>>
>> Retries will be if there an intensive flow of changes.
>> In this case it will be a lot of updated pages, the record which will
>> take some time.
>>
>> However, in subsequent iterations (if a transactions had committed
>> while there was a record),
>> the modified pages will be much fewer, and the sync will be quick.
>>
>> Thus (and it was seen in tests) even when a substantial amount of the
>> transactions,
>> usually only two iterations of the cycle,
>> without locking and flow of changes are not suspended.

-- 
  -- Howard Chu
  CTO, Symas Corp.           http://www.symas.com
  Director, Highland Sun     http://highlandsun.com/hyc/
  Chief Architect, OpenLDAP  http://www.openldap.org/project/
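
For illustration only, the reclaim-window arithmetic in the message above
(halt at current_txnid - 22, tolerate 20 transactions) can be sketched in C.
Both function names are hypothetical and are not LMDB functions. The
"lag minus 2" rule and the inclusive boundary are assumptions read off the
22 -> 20 figure, not taken from the LMDB source.

```c
#include <assert.h>

/* Hypothetical helper, NOT part of LMDB.
 * reclaim_lag: how far behind current_txnid page reclaiming halts
 * (3 and 22 in the examples in the message).
 * Assumption: the number of un-synced commits that can be tolerated is
 * reclaim_lag - 2, matching "current_txnid - 22 ... 20 transactions". */
static long unsynced_txns_tolerated(long reclaim_lag)
{
    long window = reclaim_lag - 2;
    return window > 0 ? window : 0;
}

/* Hypothetical helper, NOT part of LMDB.
 * Returns 1 if, under the model above, the meta page last synced at
 * synced_txnid still points at data pages that have not been reclaimed,
 * else 0. Treating the boundary as inclusive is an assumption. */
static int snapshot_still_valid(long current_txnid, long synced_txnid,
                                long reclaim_lag)
{
    long unsynced = current_txnid - synced_txnid;
    return unsynced <= unsynced_txns_tolerated(reclaim_lag);
}
```

Under this model, a lag of 3 leaves room for a single un-synced transaction,
while a lag of 22 leaves room for 20, which is the trade-off described above:
more safety margin in exchange for holding on to freed pages longer.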
