[WIP PATCH 2/4] lib: Add per-message last modification tracking
From: Austin Clements

This adds a new document value that stores the revision of the last
modification to message metadata, where the revision number increases
monotonically with each database commit.

An alternative would be to store the wall-clock time of the last
modification of each message. In principle this is simpler and has
the advantage that any process can determine the current timestamp
without support from libnotmuch. However, even assuming a computer's
clock never goes backward and ignoring clock skew in networked
environments, this has a fatal flaw. Xapian uses (optimistic)
snapshot isolation, which means reads can be concurrent with writes.
Given this, consider the following time line with a write and two
read transactions:

    write  |-X-A--|
    read 1   |---B---|
    read 2             |---|

The write transaction modifies message X and records the wall-clock
time of the modification at A. The writer hangs around for a while
and later commits its change. Read 1 is concurrent with the write,
so it doesn't see the change to X. It does some query and records
the wall-clock time of its results at B. Transaction read 2 later
starts after the write commits and queries for changes since
wall-clock time B (say the reads are performing an incremental
backup). Even though read 1 could not see the change to X, read 2 is
told (correctly) that X has not changed since B, the time of the last
read. In fact, X changed before wall-clock time A, but the change
was not visible until *after* wall-clock time B, so read 2 misses the
change to X.

This is tricky to solve in full-blown snapshot isolation, but because
Xapian serializes writes, we can use a simple, monotonically
increasing database revision number. Furthermore, maintaining this
revision number requires no more IO than a wall-clock time solution
because Xapian already maintains statistics on the upper (and lower)
bound of each value stream.
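The time line above can be replayed as a small model. This is only a
sketch under stated assumptions: message_t, changed_since, and the
concrete times and revision numbers are hypothetical and are not part
of libnotmuch or of this patch. It shows the one comparison an
incremental reader makes ("is the stamp newer than my mark?") under
both schemes.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a message's last-modification stamp.
 * The stamp is either a wall-clock time or a revision number. */
typedef struct {
    unsigned long last_mod;
} message_t;

/* The test an incremental reader applies: was the message stamped
 * after the mark recorded by the previous read? */
static bool
changed_since (const message_t *msg, unsigned long mark)
{
    return msg->last_mod > mark;
}

/* Wall-clock scheme. All times are made-up values. */
bool
wall_clock_detects_change (void)
{
    const unsigned long time_a = 10;      /* writer stamps X */
    const unsigned long time_b = 20;      /* read 1 records its mark */
    const unsigned long commit_time = 30; /* writer finally commits */
    message_t x = { time_a };

    /* The ordering that makes the race possible: A < B < commit. */
    assert (time_a < time_b && time_b < commit_time);

    /* Read 2 runs after the commit and asks for changes since B. */
    return changed_since (&x, time_b);
}

/* Revision scheme. The starting revision is a made-up value. */
bool
revision_detects_change (void)
{
    const unsigned long committed = 5;  /* visible to read 1's snapshot */
    const unsigned long mark = committed; /* read 1 records its mark */
    message_t x = { committed + 1 };    /* writer stamps X, then commits */

    /* Read 2 runs after the commit and asks for changes since mark. */
    return changed_since (&x, mark);
}
```

In this model wall_clock_detects_change returns false (X is missed,
because 10 is not greater than 20) and revision_detects_change returns
true (6 is greater than 5). The difference is that a reader can only
ever record a revision that its own snapshot has already seen
committed.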
---
 lib/database-private.h | 15 ++-
 lib/database.cc        | 49 +++--
 lib/message.cc         | 22 ++
 lib/notmuch-private.h  | 10 +-
 4 files changed, 92 insertions(+), 4 deletions(-)

diff --git a/lib/database-private.h b/lib/database-private.h
index 15e03cc..465065d 100644
--- a/lib/database-private.h
+++ b/lib/database-private.h
@@ -92,6 +92,12 @@ enum _notmuch_features {
      *
      * Introduced: version 3. */
     NOTMUCH_FEATURE_GHOSTS = 1 << 4,
+
+    /* If set, messages store the revision number of the last
+     * modification in NOTMUCH_VALUE_LAST_MOD.
+     *
+     * Introduced: version 3. */
+    NOTMUCH_FEATURE_LAST_MOD = 1 << 5,
 };
 
 /* In C++, a named enum is its own type, so define bitwise operators
@@ -137,6 +143,8 @@ struct _notmuch_database {
 
     notmuch_database_mode_t mode;
     int atomic_nesting;
+    /* TRUE if changes have been made in this atomic section */
+    notmuch_bool_t atomic_dirty;
     Xapian::Database *xapian_db;
 
     /* Bit mask of features used by this database. This is a
@@ -145,6 +153,10 @@ struct _notmuch_database {
 
     unsigned int last_doc_id;
     uint64_t last_thread_id;
+    /* Highest committed revision number. Modifications are recorded
+     * under a higher revision number, which can be generated with
+     * notmuch_database_new_revision. */
+    unsigned long revision;
 
     Xapian::QueryParser *query_parser;
     Xapian::TermGenerator *term_gen;
@@ -166,7 +178,8 @@ struct _notmuch_database {
  * databases will have it). */
 #define NOTMUCH_FEATURES_CURRENT \
     (NOTMUCH_FEATURE_FILE_TERMS | NOTMUCH_FEATURE_DIRECTORY_DOCS | \
-     NOTMUCH_FEATURE_BOOL_FOLDER | NOTMUCH_FEATURE_GHOSTS)
+     NOTMUCH_FEATURE_BOOL_FOLDER | NOTMUCH_FEATURE_GHOSTS | \
+     NOTMUCH_FEATURE_LAST_MOD)
 
 /* Return the list of terms from the given iterator matching a prefix.
  * The prefix will be stripped from the strings in the returned list.
diff --git a/lib/database.cc b/lib/database.cc
index 6e51a72..45d32ab 100644
--- a/lib/database.cc
+++ b/lib/database.cc
@@ -101,6 +101,9 @@ typedef struct {
  *
  *    SUBJECT:    The value of the "Subject" header
  *
+ *    LAST_MOD:   The revision number as of the last tag or
+ *                filename change.
+ *
  * In addition, terms from the content of the message are added with
  * "from", "to", "attachment", and "subject" prefixes for use by the
  * user in searching. Similarly, terms from the path of the mail
@@ -304,6 +307,8 @@ static const struct {
       "exact folder:/path: search", "rw" },
     { NOTMUCH_FEATURE_GHOSTS,
       "mail documents for missing messages", "w"},
+    { NOTMUCH_FEATURE_LAST_MOD,
+      "modification tracking", "w"},
 };
 
 const char *
@@ -678,6 +683,23 @@ _notmuch_database_ensure_writable (notmuch_database_t *notmuch)
     return NOTMUCH_STATUS_SUCCESS;
 }
 
+/* Allocate a revision number for the next change. */
+unsigned long
+_notmuch_database_new_revision