[rfc patch v3 6/6] lib: index message files with duplicate message-ids
The corresponding xapian document just gets more terms added to it,
but this doesn't seem to break anything.
---
 lib/database.cc            |  3 +++
 test/T670-duplicate-mid.sh | 22 +++---
 2 files changed, 22 insertions(+), 3 deletions(-)

diff --git a/lib/database.cc b/lib/database.cc
index 5bc131a3..3b9f7828 100644
--- a/lib/database.cc
+++ b/lib/database.cc
@@ -2582,6 +2582,9 @@ notmuch_database_add_message (notmuch_database_t *notmuch,
             if (ret)
                 goto DONE;
         } else {
+            ret = _notmuch_message_index_file (message, message_file);
+            if (ret)
+                goto DONE;
             ret = NOTMUCH_STATUS_DUPLICATE_MESSAGE_ID;
         }

diff --git a/test/T670-duplicate-mid.sh b/test/T670-duplicate-mid.sh
index 88bd12cb..2c77e11e 100755
--- a/test/T670-duplicate-mid.sh
+++ b/test/T670-duplicate-mid.sh
@@ -2,11 +2,10 @@
 test_description="duplicate message ids"
 . ./test-lib.sh || exit 1

-add_message '[id]="id:duplicate"' '[subject]="message 1"'
-add_message '[id]="id:duplicate"' '[subject]="message 2"'
+add_message '[id]="duplicate"' '[subject]="message 1"'
+add_message '[id]="duplicate"' '[subject]="message 2"'

 test_begin_subtest 'Search for second subject'
-test_subtest_known_broken
 catOUTPUT
 test_expect_equal_file EXPECTED OUTPUT

+add_message '[id]="duplicate"' '[body]="sekrit"'
+test_begin_subtest 'search for body in duplicate file'
+cat OUTPUT
+test_expect_equal_file EXPECTED OUTPUT
+
+test_begin_subtest 'reindex removes terms from duplicate file'
+rm $MAIL_DIR/msg-003
+notmuch reindex id:duplicate
+cp /dev/null EXPECTED
+notmuch search --output=files "sekrit" | notmuch_dir_sanitize > OUTPUT
+test_expect_equal_file EXPECTED OUTPUT
+
 test_done
--
2.11.0

___
notmuch mailing list
notmuch@notmuchmail.org
https://notmuchmail.org/mailman/listinfo/notmuch
[rfc patch v3 2/6] lib: add _notmuch_message_remove_indexed_terms
Testing will be provided via use in notmuch_message_reindex
---
 lib/message.cc        | 44
 lib/notmuch-private.h |  2 ++
 lib/notmuch.h         |  4
 3 files changed, 50 insertions(+)

diff --git a/lib/message.cc b/lib/message.cc
index f8215a49..a7bd38ac 100644
--- a/lib/message.cc
+++ b/lib/message.cc
@@ -599,6 +599,50 @@ _notmuch_message_remove_terms (notmuch_message_t *message, const char *prefix)
     }
 }

+
+/* Remove all terms generated by indexing, i.e. not tags or
+ * properties, along with any automatic tags*/
+notmuch_private_status_t
+_notmuch_message_remove_indexed_terms (notmuch_message_t *message)
+{
+    Xapian::TermIterator i;
+
+    const std::string tag_prefix = _find_prefix ("tag");
+    const std::string property_prefix = _find_prefix ("property");
+
+    for (i = message->doc.termlist_begin ();
+         i != message->doc.termlist_end (); i++) {
+
+        const std::string term = *i;
+
+        if (term.compare (0, property_prefix.size (), property_prefix) == 0)
+            continue;
+
+        if (term.compare (0, tag_prefix.size (), tag_prefix) == 0 &&
+            term.compare (1, strlen("encrypted"), "encrypted") != 0 &&
+            term.compare (1, strlen("signed"), "signed") != 0 &&
+            term.compare (1, strlen("attachment"), "attachment") != 0)
+            continue;
+
+        try {
+            message->doc.remove_term ((*i));
+            message->modified = TRUE;
+        } catch (const Xapian::InvalidArgumentError) {
+            /* Ignore failure to remove non-existent term. */
+        } catch (const Xapian::Error &error) {
+            notmuch_database_t *notmuch = message->notmuch;
+
+            if (!notmuch->exception_reported) {
+                _notmuch_database_log(_notmuch_message_database (message), "A Xapian exception occurred creating message: %s\n",
+                                      error.get_msg().c_str());
+                notmuch->exception_reported = TRUE;
+            }
+            return NOTMUCH_PRIVATE_STATUS_XAPIAN_EXCEPTION;
+        }
+    }
+    return NOTMUCH_PRIVATE_STATUS_SUCCESS;
+}
+
 /* Return true if p points at "new" or "cur". */
 static bool is_maildir (const char *p)
 {
diff --git a/lib/notmuch-private.h b/lib/notmuch-private.h
index 8587e86c..1198d932 100644
--- a/lib/notmuch-private.h
+++ b/lib/notmuch-private.h
@@ -509,6 +509,8 @@ _notmuch_message_add_reply (notmuch_message_t *message,
 notmuch_database_t *
 _notmuch_message_database (notmuch_message_t *message);

+void
+_notmuch_message_remove_unprefixed_terms (notmuch_message_t *message);
 /* sha1.c */

 char *
diff --git a/lib/notmuch.h b/lib/notmuch.h
index fc00f96d..33e9fd24 100644
--- a/lib/notmuch.h
+++ b/lib/notmuch.h
@@ -1685,6 +1685,10 @@ notmuch_message_thaw (notmuch_message_t *message);
 void
 notmuch_message_destroy (notmuch_message_t *message);

+/* for testing */
+
+void
+notmuch_test_clear_terms(notmuch_message_t *message);
 /**
  * @name Message Properties
  *
--
2.11.0
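
The filtering rule in the patch above can be sketched without Xapian as a pure predicate over term strings. This is only an illustration, not the library's code: the prefix values "K" and "XPROPERTY" are assumptions (the patch looks them up at run time with _find_prefix), and term_is_reindexed and starts_with are hypothetical helper names.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Assumed prefix values, for illustration only; the patch obtains them
 * from _find_prefix ("tag") and _find_prefix ("property"). */
static const char *const TAG_PREFIX = "K";
static const char *const PROPERTY_PREFIX = "XPROPERTY";

/* Tags the indexer adds on its own, so reindexing has to regenerate them. */
static const char *const AUTOMATIC_TAGS[] = { "encrypted", "signed", "attachment" };

static bool
starts_with (const char *s, const char *prefix)
{
    return strncmp (s, prefix, strlen (prefix)) == 0;
}

/* Hypothetical helper: true if the term should be removed before the
 * message is indexed again, false if it must be preserved. */
bool
term_is_reindexed (const char *term)
{
    size_t i;

    assert (term != NULL);

    if (starts_with (term, PROPERTY_PREFIX))
        return false;   /* properties are kept */

    if (starts_with (term, TAG_PREFIX)) {
        const char *tag = term + strlen (TAG_PREFIX);

        /* Like the patch, this is a prefix match on the tag name, so a
         * tag that merely begins with "signed" is treated as automatic. */
        for (i = 0; i < sizeof (AUTOMATIC_TAGS) / sizeof (AUTOMATIC_TAGS[0]); i++) {
            if (starts_with (tag, AUTOMATIC_TAGS[i]))
                return true;
        }
        return false;   /* user-applied tags are kept */
    }

    return true;        /* everything else was produced by indexing */
}
```

Under these assumptions a term such as "Kinbox" is kept, while "Kencrypted" and any body or header term are dropped and regenerated by the next indexing pass.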
[rfc patch v3 4/6] add "notmuch reindex" subcommand
From: Daniel Kahn Gillmor

This new subcommand takes a set of search terms, and re-indexes the
list of matching messages.
---
 Makefile.local                    |   1 +
 doc/conf.py                       |   4 ++
 doc/index.rst                     |   1 +
 doc/man1/notmuch-reindex.rst      |  29 +
 doc/man1/notmuch.rst              |   4 +-
 doc/man7/notmuch-search-terms.rst |   7 +-
 notmuch-client.h                  |   3 +
 notmuch-reindex.c                 | 131 ++
 notmuch.c                         |   2 +
 performance-test/M04-reindex.sh   |  11
 performance-test/T03-reindex.sh   |  13
 test/T700-reindex.sh              |  21 ++
 12 files changed, 223 insertions(+), 4 deletions(-)
 create mode 100644 doc/man1/notmuch-reindex.rst
 create mode 100644 notmuch-reindex.c
 create mode 100755 performance-test/M04-reindex.sh
 create mode 100755 performance-test/T03-reindex.sh
 create mode 100755 test/T700-reindex.sh

diff --git a/Makefile.local b/Makefile.local
index 03eafaaa..c6e272bc 100644
--- a/Makefile.local
+++ b/Makefile.local
@@ -222,6 +222,7 @@ notmuch_client_srcs = \
     notmuch-dump.c \
     notmuch-insert.c \
     notmuch-new.c \
+    notmuch-reindex.c \
     notmuch-reply.c \
     notmuch-restore.c \
     notmuch-search.c \
diff --git a/doc/conf.py b/doc/conf.py
index a3d82696..aa864b3c 100644
--- a/doc/conf.py
+++ b/doc/conf.py
@@ -95,6 +95,10 @@ man_pages = [
      u'incorporate new mail into the notmuch database',
      [notmuch_authors], 1),

+    ('man1/notmuch-reindex', 'notmuch-reindex',
+     u're-index matching messages',
+     [notmuch_authors], 1),
+
     ('man1/notmuch-reply', 'notmuch-reply',
      u'constructs a reply template for a set of messages',
      [notmuch_authors], 1),
diff --git a/doc/index.rst b/doc/index.rst
index 344606d9..aa6c9f40 100644
--- a/doc/index.rst
+++ b/doc/index.rst
@@ -18,6 +18,7 @@ Contents:
    man5/notmuch-hooks
    man1/notmuch-insert
    man1/notmuch-new
+   man1/notmuch-reindex
    man1/notmuch-reply
    man1/notmuch-restore
    man1/notmuch-search
diff --git a/doc/man1/notmuch-reindex.rst b/doc/man1/notmuch-reindex.rst
new file mode 100644
index ..6c786b85
--- /dev/null
+++ b/doc/man1/notmuch-reindex.rst
@@ -0,0 +1,29 @@
+===
+notmuch-reindex
+===
+
+SYNOPSIS
+
+
+**notmuch** **reindex** [*option* ...] <*search-term*> ...
+
+DESCRIPTION
+===
+
+Re-index all messages matching the search terms.
+
+See **notmuch-search-terms(7)** for details of the supported syntax for
+<*search-term*\ >.
+
+The **reindex** command searches for all messages matching the
+supplied search terms, and re-creates the full-text index on these
+messages using the supplied options.
+
+SEE ALSO
+
+
+**notmuch(1)**, **notmuch-config(1)**, **notmuch-count(1)**,
+**notmuch-dump(1)**, **notmuch-hooks(5)**, **notmuch-insert(1)**,
+**notmuch-new(1)**,
+**notmuch-reply(1)**, **notmuch-restore(1)**, **notmuch-search(1)**,
+**notmuch-search-terms(7)**, **notmuch-show(1)**, **notmuch-tag(1)**
diff --git a/doc/man1/notmuch.rst b/doc/man1/notmuch.rst
index fbd7f381..b2a8376e 100644
--- a/doc/man1/notmuch.rst
+++ b/doc/man1/notmuch.rst
@@ -149,8 +149,8 @@ SEE ALSO

 **notmuch-address(1)**, **notmuch-compact(1)**, **notmuch-config(1)**,
 **notmuch-count(1)**, **notmuch-dump(1)**, **notmuch-hooks(5)**,
-**notmuch-insert(1)**, **notmuch-new(1)**, **notmuch-reply(1)**,
-**notmuch-restore(1)**, **notmuch-search(1)**,
+**notmuch-insert(1)**, **notmuch-new(1)**, **notmuch-reindex(1)**,
+**notmuch-reply(1)**, **notmuch-restore(1)**, **notmuch-search(1)**,
 **notmuch-search-terms(7)**, **notmuch-show(1)**, **notmuch-tag(1)**

 The notmuch website: **https://notmuchmail.org**
diff --git a/doc/man7/notmuch-search-terms.rst b/doc/man7/notmuch-search-terms.rst
index 47cab48d..dd76972e 100644
--- a/doc/man7/notmuch-search-terms.rst
+++ b/doc/man7/notmuch-search-terms.rst
@@ -9,6 +9,8 @@ SYNOPSIS

 **notmuch** **dump** [--format=(batch-tag|sup)] [--] [--output=<*file*>] [--] [<*search-term*> ...]

+**notmuch** **reindex** [option ...] <*search-term*> ...
+
 **notmuch** **search** [option ...] <*search-term*> ...

 **notmuch** **show** [option ...] <*search-term*> ...
@@ -421,5 +423,6 @@ SEE ALSO

 **notmuch(1)**, **notmuch-config(1)**, **notmuch-count(1)**,
 **notmuch-dump(1)**, **notmuch-hooks(5)**, **notmuch-insert(1)**,
-**notmuch-new(1)**, **notmuch-reply(1)**, **notmuch-restore(1)**,
-**notmuch-search(1)**, **notmuch-show(1)**, **notmuch-tag(1)**
+**notmuch-new(1)**, **notmuch-reindex(1)**, **notmuch-reply(1)**,
+**notmuch-restore(1)**, **notmuch-search(1)**, **notmuch-show(1)**,
+**notmuch-tag(1)**
diff --git a/notmuch-client.h b/notmuch-client.h
index a6f70eae..ab7138c6 100644
--- a/notmuch-client.h
+++ b/notmuch-client.h
@@ -196,6 +196,9 @@ int notmuch_insert_command
[rfc patch v3 1/6] lib: add definitions for notmuch_param_t
This is not an opaque struct because we envision using static
initialization much like the command-line-options.h structures.
---
 lib/notmuch.h | 17 +
 1 file changed, 17 insertions(+)

diff --git a/lib/notmuch.h b/lib/notmuch.h
index d374dc96..fc00f96d 100644
--- a/lib/notmuch.h
+++ b/lib/notmuch.h
@@ -219,6 +219,23 @@ typedef struct _notmuch_filenames notmuch_filenames_t;
 typedef struct _notmuch_config_list notmuch_config_list_t;
 #endif /* __DOXYGEN__ */

+enum notmuch_param_type {
+    NOTMUCH_PARAM_END = 0,
+    NOTMUCH_PARAM_BOOLEAN,
+    NOTMUCH_PARAM_INT,
+    NOTMUCH_PARAM_STRING
+};
+
+typedef struct notmuch_param_desc {
+    enum notmuch_param_type param_type;
+    int key;
+    union {
+        notmuch_bool_t bool_val;
+        int int_val;
+        const char *string_val;
+    };
+} notmuch_param_t;
+
 /**
  * Create a new, empty notmuch database located at 'path'.
  *
--
2.11.0
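
As a rough sketch of the static initialization the commit message has in mind, the following declares a parameter table terminated by NOTMUCH_PARAM_END and scans it by key. The type definitions are copied from the patch so the fragment stands alone; the EXAMPLE_KEY_* values, example_opts and example_find_param are hypothetical, since the series does not define any keys or lookup helper. It assumes a C11 compiler for the anonymous union.

```c
#include <assert.h>
#include <stddef.h>

typedef int notmuch_bool_t;

enum notmuch_param_type {
    NOTMUCH_PARAM_END = 0,
    NOTMUCH_PARAM_BOOLEAN,
    NOTMUCH_PARAM_INT,
    NOTMUCH_PARAM_STRING
};

typedef struct notmuch_param_desc {
    enum notmuch_param_type param_type;
    int key;
    union {
        notmuch_bool_t bool_val;
        int int_val;
        const char *string_val;
    };
} notmuch_param_t;

/* Hypothetical keys, invented for this example only. */
enum { EXAMPLE_KEY_DECRYPT = 1, EXAMPLE_KEY_MAX_DEPTH, EXAMPLE_KEY_LANG };

/* Because the struct is not opaque, a caller can build the table at
 * compile time; the END entry marks where the table stops. */
static const notmuch_param_t example_opts[] = {
    { .param_type = NOTMUCH_PARAM_BOOLEAN, .key = EXAMPLE_KEY_DECRYPT, .bool_val = 1 },
    { .param_type = NOTMUCH_PARAM_INT, .key = EXAMPLE_KEY_MAX_DEPTH, .int_val = 8 },
    { .param_type = NOTMUCH_PARAM_STRING, .key = EXAMPLE_KEY_LANG, .string_val = "en" },
    { .param_type = NOTMUCH_PARAM_END }
};

/* Hypothetical lookup: return the entry with the given key, or NULL
 * if the table ends before one is found. */
const notmuch_param_t *
example_find_param (const notmuch_param_t *opts, int key)
{
    assert (opts != NULL);

    for (; opts->param_type != NOTMUCH_PARAM_END; opts++) {
        if (opts->key == key)
            return opts;
    }
    return NULL;
}
```

The terminator entry plays the same role as the end marker in an option table, so the array can be passed around as a bare pointer without a separate length.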
[rfc patch v3 3/6] added notmuch_message_reindex
From: Daniel Kahn Gillmor

This new function asks the database to reindex a given message. The
parameter `indexopts` is currently ignored, but is intended to provide
an extensible API to support e.g. changing the encryption or filtering
status (e.g. whether and how certain non-plaintext parts are indexed).
---
 lib/message.cc | 46 +-
 lib/notmuch.h  | 14 ++
 2 files changed, 59 insertions(+), 1 deletion(-)

diff --git a/lib/message.cc b/lib/message.cc
index a7bd38ac..193eedb2 100644
--- a/lib/message.cc
+++ b/lib/message.cc
@@ -579,7 +579,9 @@ void
 _notmuch_message_remove_terms (notmuch_message_t *message, const char *prefix)
 {
     Xapian::TermIterator i;
-    size_t prefix_len = strlen (prefix);
+    size_t prefix_len = 0;
+
+    prefix_len = strlen (prefix);

     while (1) {
         i = message->doc.termlist_begin ();
@@ -1916,3 +1918,45 @@ _notmuch_message_frozen (notmuch_message_t *message)
 {
     return message->frozen;
 }
+
+notmuch_status_t
+notmuch_message_reindex (notmuch_message_t *message,
+                         notmuch_param_t unused (*indexopts))
+{
+    notmuch_database_t *notmuch = NULL;
+    notmuch_status_t ret = NOTMUCH_STATUS_SUCCESS, status;
+    notmuch_private_status_t private_status;
+    notmuch_filenames_t *orig_filenames = NULL;
+    const char *filename = NULL;
+
+    if (message == NULL)
+        return NOTMUCH_STATUS_NULL_POINTER;
+
+    notmuch = _notmuch_message_database (message);
+
+    orig_filenames = notmuch_message_get_filenames (message);
+
+    private_status = _notmuch_message_remove_indexed_terms (message);
+    if (private_status)
+        return COERCE_STATUS(private_status, "error removing terms");
+
+    /* re-add the filenames with the associated indexopts */
+    for (; notmuch_filenames_valid (orig_filenames);
+         notmuch_filenames_move_to_next (orig_filenames)) {
+        filename = notmuch_filenames_get (orig_filenames);
+
+        status = notmuch_database_add_message(notmuch,
+                                              filename,
+                                              );
+        if (status != NOTMUCH_STATUS_SUCCESS &&
+            status != NOTMUCH_STATUS_DUPLICATE_MESSAGE_ID) {
+            /* if we failed to add this filename, go ahead and try the
+             * next one as though it were first, but report the
+             * error... */
+            ret = status;
+        }
+    }
+
+    /* XXX TODO destroy orig_filenames? */
+    return ret;
+}
diff --git a/lib/notmuch.h b/lib/notmuch.h
index 33e9fd24..11818018 100644
--- a/lib/notmuch.h
+++ b/lib/notmuch.h
@@ -1389,6 +1389,20 @@ notmuch_filenames_t *
 notmuch_message_get_filenames (notmuch_message_t *message);

 /**
+ * Re-index the e-mail corresponding to 'message' using the supplied index options
+ *
+ * Returns the status of the re-index operation. (see the return
+ * codes documented in notmuch_database_add_message)
+ *
+ * After reindexing, the user should discard the message object passed
+ * in here by calling notmuch_message_destroy, since it refers to the
+ * original message, not to the reindexed message.
+ */
+notmuch_status_t
+notmuch_message_reindex (notmuch_message_t *message,
+                         notmuch_param_t *indexopts);
+
+/**
  * Message flags.
  */
 typedef enum _notmuch_message_flag {
--
2.11.0
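
The error handling in the re-add loop above (treat a duplicate message-id as expected, remember any other failure, keep going) can be isolated in a small stand-alone sketch. None of the names below exist in notmuch: the example_status_t codes stand in for notmuch_status_t, and example_add is a toy callback that replaces notmuch_database_add_message so the fragment runs without a database.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Stand-in status codes; the patch uses notmuch_status_t values. */
typedef enum {
    EXAMPLE_STATUS_SUCCESS = 0,
    EXAMPLE_STATUS_DUPLICATE_MESSAGE_ID,
    EXAMPLE_STATUS_FILE_ERROR
} example_status_t;

/* Hypothetical callback type: index one file and report what happened. */
typedef example_status_t (*example_add_fn) (const char *filename);

/* Counts how many files the toy callback has been asked to index. */
static int example_calls;

/* Toy callback: "missing.eml" fails, every other file is reported as a
 * duplicate, which is what re-adding the files of one message yields. */
static example_status_t
example_add (const char *filename)
{
    example_calls++;
    if (strcmp (filename, "missing.eml") == 0)
        return EXAMPLE_STATUS_FILE_ERROR;
    return EXAMPLE_STATUS_DUPLICATE_MESSAGE_ID;
}

/* Re-add every file; a failure does not stop the loop, but the most
 * recent real failure becomes the return value. */
example_status_t
example_readd_all (const char *const *filenames, size_t count, example_add_fn add)
{
    example_status_t ret = EXAMPLE_STATUS_SUCCESS;
    size_t i;

    assert (add != NULL);

    for (i = 0; i < count; i++) {
        example_status_t status = add (filenames[i]);

        if (status != EXAMPLE_STATUS_SUCCESS &&
            status != EXAMPLE_STATUS_DUPLICATE_MESSAGE_ID)
            ret = status;
    }
    return ret;
}
```

One consequence of this shape is that only the last failure is reported; a caller that needs to know which file failed would have to collect statuses per filename instead.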
[rfc patch v3 5/6] test: add known broken test for duplicate message id
There are many other problems that could be tested, but this one we
have some hope of fixing because it doesn't require UI changes, just
indexing changes.
---
 test/T670-duplicate-mid.sh | 17 +
 1 file changed, 17 insertions(+)
 create mode 100755 test/T670-duplicate-mid.sh

diff --git a/test/T670-duplicate-mid.sh b/test/T670-duplicate-mid.sh
new file mode 100755
index ..88bd12cb
--- /dev/null
+++ b/test/T670-duplicate-mid.sh
@@ -0,0 +1,17 @@
+#!/usr/bin/env bash
+test_description="duplicate message ids"
+. ./test-lib.sh || exit 1
+
+add_message '[id]="id:duplicate"' '[subject]="message 1"'
+add_message '[id]="id:duplicate"' '[subject]="message 2"'
+
+test_begin_subtest 'Search for second subject'
+test_subtest_known_broken
+catOUTPUT
+test_expect_equal_file EXPECTED OUTPUT
+
+test_done
--
2.11.0
third round of indexing all files
Adapting the approach in [1] to delete only the terms we are going to
re-add via indexing seems noticeably faster (on the order of 30-50%),
and the code is quite a bit simpler. This obsoletes the previous
series at [2]. It still has all of the UI issues mentioned there, and
the question of the index options design probably needs more thought.

This is new in this round:

  [rfc patch v3 2/6] lib: add _notmuch_message_remove_indexed_terms

This has been pretty drastically rewritten compared to daniel's
version [3]:

  [rfc patch v3 3/6] added notmuch_message_reindex

This is the same, except I added simple performance tests:

  [rfc patch v3 4/6] add "notmuch reindex" subcommand

[1]: id:1471178598-9639-1-git-send-email-da...@tethera.net
[2]: id:20170402131646.29884-1-da...@tethera.net
[3]: id:20170402131646.29884-3-da...@tethera.net
Re: [gmailieer] fast fetch and two-way tag synchronization between notmuch and GMail
Rafael Avila de Espindola writes on April 3, 2017 16:35:
> After a few issues with the initial sync this is working perfectly.
> Thanks a lot, it is a big improvement over mbsync.

Great, thanks for the patches! This made the initial sync much more robust.

I have a non-user-api-key version working [0], but I need to figure
out if it is safe first [1].

- Gaute

[0] https://github.com/gauteh/gmailieer/pull/9
[1] http://stackoverflow.com/questions/43173367/is-it-safe-to-distribute-client-id-and-client-secret-for-google-api-for-an-insta
Re: [gmailieer] fast fetch and two-way tag synchronization between notmuch and GMail
After a few issues with the initial sync this is working perfectly.
Thanks a lot, it is a big improvement over mbsync.

Cheers,
Rafael

Gaute Hope writes:

> Hi,
>
> 'gmailieer' (or 'gmi') is a small program that can pull email and labels
> (and changes to labels) from your GMail account and store them locally in a
> maildir with the labels synchronized with a notmuch database. The
> changes to tags in the notmuch database may be pushed back remotely to
> your GMail account.
>
> The initial fetch of all emails takes some time, but synchronizing
> labels and tags, and checking for new messages, is usually done in 1-2
> seconds.
>
> It requires the most recent notmuch, the python googleapi bindings and
> tqdm.
>
> Disclaimer:
>
> This is still experimental, but it does not have access to delete
> e-mail on your account - only fetch and change labels, so damage
> should be limited.
>
> Instructions and source code can be found here:
>
> https://github.com/gauteh/gmailieer
>
> Regards, Gaute