Re: [PATCH] WIP: remove all non-prefixed-terms (and stemmed versions)
David Bremner writes:

> The testing here is not really suitable for production, since we export
> a function just for testing. It would be possible to modify the test
> framework to test functions in notmuch-private.h, but this was the quick
> and dirty solution.

On looking at the problem a second time I think this should really
drop all of the non-(tag|property) terms, so including some prefixed
terms as well. I think that would be doable; it's probably worth
having some performance benchmark before introducing the extra
complication versus dkg's approach.

d
___
notmuch mailing list
notmuch@notmuchmail.org
https://notmuchmail.org/mailman/listinfo/notmuch
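To make the "drop all of the non-(tag|property) terms" idea concrete, here is a minimal sketch of the filtering step on a plain array of term strings. It is not the patch under discussion and does not touch Xapian. The prefix values "K" (tags) and "XPROPERTY" (properties) are assumptions about notmuch's term prefixes and should be checked against lib/database.cc; the helper names term_is_kept and drop_other_terms are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Assumed term prefixes for tags and properties; verify against
 * lib/database.cc before relying on these values. */
static const char *const kept_prefixes[] = { "K", "XPROPERTY" };

/* Return true if the term starts with one of the prefixes we keep. */
static bool
term_is_kept (const char *term)
{
    size_t n = sizeof (kept_prefixes) / sizeof (kept_prefixes[0]);

    for (size_t i = 0; i < n; i++) {
        if (strncmp (term, kept_prefixes[i], strlen (kept_prefixes[i])) == 0)
            return true;
    }
    return false;
}

/* Compact `terms` in place so that only tag and property terms
 * remain, preserving their order. Returns the new count. */
static size_t
drop_other_terms (const char **terms, size_t count)
{
    size_t kept = 0;

    for (size_t i = 0; i < count; i++) {
        if (term_is_kept (terms[i]))
            terms[kept++] = terms[i];
    }
    return kept;
}
```

A real implementation would walk the document's term list and remove terms from the Xapian document instead of compacting an array, which is where the performance question raised above comes in.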
[rfc patch v2 3/5] add "notmuch reindex" subcommand
From: Daniel Kahn Gillmor

This new subcommand takes a set of search terms, and re-indexes the
list of matching messages.
---
 Makefile.local                    |   1 +
 doc/conf.py                       |   4 ++
 doc/index.rst                     |   1 +
 doc/man1/notmuch-reindex.rst      |  29 +
 doc/man1/notmuch.rst              |   4 +-
 doc/man7/notmuch-search-terms.rst |   7 +-
 notmuch-client.h                  |   3 +
 notmuch-reindex.c                 | 132 ++
 notmuch.c                         |   2 +
 test/T700-reindex.sh              |  21 ++
 10 files changed, 200 insertions(+), 4 deletions(-)
 create mode 100644 doc/man1/notmuch-reindex.rst
 create mode 100644 notmuch-reindex.c
 create mode 100755 test/T700-reindex.sh

diff --git a/Makefile.local b/Makefile.local
index 03eafaaa..c6e272bc 100644
--- a/Makefile.local
+++ b/Makefile.local
@@ -222,6 +222,7 @@ notmuch_client_srcs = \
     notmuch-dump.c \
     notmuch-insert.c \
     notmuch-new.c \
+    notmuch-reindex.c \
     notmuch-reply.c \
     notmuch-restore.c \
     notmuch-search.c \
diff --git a/doc/conf.py b/doc/conf.py
index a3d82696..aa864b3c 100644
--- a/doc/conf.py
+++ b/doc/conf.py
@@ -95,6 +95,10 @@ man_pages = [
  u'incorporate new mail into the notmuch database',
  [notmuch_authors], 1),
 
+('man1/notmuch-reindex', 'notmuch-reindex',
+ u're-index matching messages',
+ [notmuch_authors], 1),
+
 ('man1/notmuch-reply', 'notmuch-reply',
  u'constructs a reply template for a set of messages',
  [notmuch_authors], 1),
diff --git a/doc/index.rst b/doc/index.rst
index 344606d9..aa6c9f40 100644
--- a/doc/index.rst
+++ b/doc/index.rst
@@ -18,6 +18,7 @@ Contents:
    man5/notmuch-hooks
    man1/notmuch-insert
    man1/notmuch-new
+   man1/notmuch-reindex
    man1/notmuch-reply
    man1/notmuch-restore
    man1/notmuch-search
diff --git a/doc/man1/notmuch-reindex.rst b/doc/man1/notmuch-reindex.rst
new file mode 100644
index ..6c786b85
--- /dev/null
+++ b/doc/man1/notmuch-reindex.rst
@@ -0,0 +1,29 @@
+===
+notmuch-reindex
+===
+
+SYNOPSIS
+
+
+**notmuch** **reindex** [*option* ...] <*search-term*> ...
+
+DESCRIPTION
+===
+
+Re-index all messages matching the search terms.
+
+See **notmuch-search-terms(7)** for details of the supported syntax for
+<*search-term*\ >.
+
+The **reindex** command searches for all messages matching the
+supplied search terms, and re-creates the full-text index on these
+messages using the supplied options.
+
+SEE ALSO
+
+
+**notmuch(1)**, **notmuch-config(1)**, **notmuch-count(1)**,
+**notmuch-dump(1)**, **notmuch-hooks(5)**, **notmuch-insert(1)**,
+**notmuch-new(1)**,
+**notmuch-reply(1)**, **notmuch-restore(1)**, **notmuch-search(1)**,
+**notmuch-search-terms(7)**, **notmuch-show(1)**, **notmuch-tag(1)**
diff --git a/doc/man1/notmuch.rst b/doc/man1/notmuch.rst
index fbd7f381..b2a8376e 100644
--- a/doc/man1/notmuch.rst
+++ b/doc/man1/notmuch.rst
@@ -149,8 +149,8 @@ SEE ALSO
 **notmuch-address(1)**, **notmuch-compact(1)**, **notmuch-config(1)**,
 **notmuch-count(1)**, **notmuch-dump(1)**, **notmuch-hooks(5)**,
-**notmuch-insert(1)**, **notmuch-new(1)**, **notmuch-reply(1)**,
-**notmuch-restore(1)**, **notmuch-search(1)**,
+**notmuch-insert(1)**, **notmuch-new(1)**, **notmuch-reindex(1)**,
+**notmuch-reply(1)**, **notmuch-restore(1)**, **notmuch-search(1)**,
 **notmuch-search-terms(7)**, **notmuch-show(1)**, **notmuch-tag(1)**
 
 The notmuch website: **https://notmuchmail.org**
diff --git a/doc/man7/notmuch-search-terms.rst b/doc/man7/notmuch-search-terms.rst
index 47cab48d..dd76972e 100644
--- a/doc/man7/notmuch-search-terms.rst
+++ b/doc/man7/notmuch-search-terms.rst
@@ -9,6 +9,8 @@ SYNOPSIS
 
 **notmuch** **dump** [--format=(batch-tag|sup)] [--] [--output=<*file*>] [--] [<*search-term*> ...]
 
+**notmuch** **reindex** [option ...] <*search-term*> ...
+
 **notmuch** **search** [option ...] <*search-term*> ...
 
 **notmuch** **show** [option ...] <*search-term*> ...
@@ -421,5 +423,6 @@ SEE ALSO
 **notmuch(1)**, **notmuch-config(1)**, **notmuch-count(1)**,
 **notmuch-dump(1)**, **notmuch-hooks(5)**, **notmuch-insert(1)**,
-**notmuch-new(1)**, **notmuch-reply(1)**, **notmuch-restore(1)**,
-**notmuch-search(1)**, **notmuch-show(1)**, **notmuch-tag(1)**
+**notmuch-new(1)**, **notmuch-reindex(1)**, **notmuch-reply(1)**,
+**notmuch-restore(1)**, **notmuch-search(1)**, **notmuch-show(1)**,
+**notmuch-tag(1)**
diff --git a/notmuch-client.h b/notmuch-client.h
index a6f70eae..ab7138c6 100644
--- a/notmuch-client.h
+++ b/notmuch-client.h
@@ -196,6 +196,9 @@ int
 notmuch_insert_command (notmuch_config_t *config, int argc, char *argv[]);
 
 int
+notmuch_reindex_command (notmuch_config_t *config, int argc, char *argv[]);
+
+int
 notmuch_reply_command (notmuch_config_t *config, int
[rfc patch v2 2/5] added notmuch_message_reindex
From: Daniel Kahn Gillmor

This new function asks the database to reindex a given message. The
parameter `indexopts` is currently ignored, but is intended to
provide an extensible API to support e.g. changing the encryption or
filtering status (e.g. whether and how certain non-plaintext parts
are indexed).

Since we have no way of distinguishing terms added (without prefix)
from the headers and terms added from the body, we just save the tags
and properties, remove the message from the database entirely, and
add it back into the database in full, re-adding tags and properties
as needed.
---
 lib/message.cc | 102 -
 lib/notmuch.h  |  14
 2 files changed, 115 insertions(+), 1 deletion(-)

diff --git a/lib/message.cc b/lib/message.cc
index f8215a49..d68e4c66 100644
--- a/lib/message.cc
+++ b/lib/message.cc
@@ -579,7 +579,9 @@ void
 _notmuch_message_remove_terms (notmuch_message_t *message, const char *prefix)
 {
     Xapian::TermIterator i;
-    size_t prefix_len = strlen (prefix);
+    size_t prefix_len = 0;
+
+    prefix_len = strlen (prefix);
 
     while (1) {
         i = message->doc.termlist_begin ();
@@ -1872,3 +1874,101 @@ _notmuch_message_frozen (notmuch_message_t *message)
 {
     return message->frozen;
 }
+
+notmuch_status_t
+notmuch_message_reindex (notmuch_message_t *message,
+                         notmuch_param_t unused (*indexopts))
+{
+    notmuch_database_t *notmuch = NULL;
+    notmuch_status_t ret = NOTMUCH_STATUS_SUCCESS, status;
+    notmuch_tags_t *tags = NULL;
+    notmuch_message_properties_t *properties = NULL;
+    notmuch_filenames_t *filenames, *orig_filenames = NULL;
+    const char *filename = NULL, *tag = NULL, *propkey = NULL;
+    notmuch_message_t *newmsg = NULL;
+    notmuch_bool_t readded = FALSE, skip;
+    const char *autotags[] = {
+        "attachment",
+        "encrypted",
+        "signed" };
+
+    if (message == NULL)
+        return NOTMUCH_STATUS_NULL_POINTER;
+
+    notmuch = _notmuch_message_database (message);
+
+    /* cache tags, properties, and filenames */
+    tags = notmuch_message_get_tags (message);
+    properties = notmuch_message_get_properties (message, "", FALSE);
+    filenames = notmuch_message_get_filenames (message);
+    orig_filenames = notmuch_message_get_filenames (message);
+
+    /* walk through filenames, removing them until the message is gone */
+    for ( ; notmuch_filenames_valid (filenames);
+          notmuch_filenames_move_to_next (filenames)) {
+        filename = notmuch_filenames_get (filenames);
+
+        ret = notmuch_database_remove_message (notmuch, filename);
+        if (ret != NOTMUCH_STATUS_SUCCESS &&
+            ret != NOTMUCH_STATUS_DUPLICATE_MESSAGE_ID)
+            return ret;
+    }
+    if (ret != NOTMUCH_STATUS_SUCCESS)
+        return ret;
+
+    /* re-add the filenames with the associated indexopts */
+    for (; notmuch_filenames_valid (orig_filenames);
+         notmuch_filenames_move_to_next (orig_filenames)) {
+        filename = notmuch_filenames_get (orig_filenames);
+
+        status = notmuch_database_add_message(notmuch,
+                                              filename,
+                                              readded ? NULL : );
+        if (status == NOTMUCH_STATUS_SUCCESS ||
+            status == NOTMUCH_STATUS_DUPLICATE_MESSAGE_ID) {
+            if (!readded) {
+                /* re-add tags */
+                for (; notmuch_tags_valid (tags);
+                     notmuch_tags_move_to_next (tags)) {
+                    tag = notmuch_tags_get (tags);
+                    skip = FALSE;
+
+                    for (size_t i = 0; i < ARRAY_SIZE (autotags); i++)
+                        if (strcmp (tag, autotags[i]) == 0)
+                            skip = TRUE;
+
+                    if (!skip) {
+                        status = notmuch_message_add_tag (newmsg, tag);
+                        if (status != NOTMUCH_STATUS_SUCCESS)
+                            ret = status;
+                    }
+                }
+                /* re-add properties */
+                for (; notmuch_message_properties_valid (properties);
+                     notmuch_message_properties_move_to_next (properties)) {
+                    propkey = notmuch_message_properties_key (properties);
+                    skip = FALSE;
+
+                    if (!skip) {
+                        status = notmuch_message_add_property (newmsg, propkey,
+                                                               notmuch_message_properties_value (properties));
+                        if (status != NOTMUCH_STATUS_SUCCESS)
+                            ret = status;
+                    }
+                }
+                readded = TRUE;
+            }
+        } else {
+            /* if we failed to add this filename, go ahead and try the
+             * next one as though it were first, but report the
+             * error...
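The tag-restoring step in the patch above skips the tags that indexing is expected to regenerate by itself. As a minimal standalone sketch of that filter, assuming only what the patch shows (the three autotags and the strcmp comparison), the logic can be isolated like this. The helper names is_autotag and tags_to_restore are hypothetical and do not exist in libnotmuch.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define ARRAY_SIZE(arr) (sizeof (arr) / sizeof ((arr)[0]))

/* Tags the patch treats as automatically generated during indexing
 * (list copied from the patch). */
static const char *const autotags[] = { "attachment", "encrypted", "signed" };

/* Return true if `tag` is one of the automatically generated tags. */
static bool
is_autotag (const char *tag)
{
    for (size_t i = 0; i < ARRAY_SIZE (autotags); i++) {
        if (strcmp (tag, autotags[i]) == 0)
            return true;
    }
    return false;
}

/* Copy into `out` the saved tags that should be re-added after the
 * message has been removed and added back. `out` must have room for
 * `count` entries. Returns the number of tags copied. */
static size_t
tags_to_restore (const char *const *saved, size_t count, const char **out)
{
    size_t n = 0;

    for (size_t i = 0; i < count; i++) {
        if (! is_autotag (saved[i]))
            out[n++] = saved[i];
    }
    return n;
}
```

Skipping these tags means a message that no longer has, say, an attachment after reindexing does not keep a stale "attachment" tag. The cost is that a user who applied one of these tags by hand would lose it.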
[rfc patch v2 5/5] lib: index message files with duplicate message-ids
The corresponding xapian document just gets more terms added to it,
but this doesn't seem to break anything.
---
 lib/database.cc            |  3 +++
 test/T670-duplicate-mid.sh | 22 +++---
 2 files changed, 22 insertions(+), 3 deletions(-)

diff --git a/lib/database.cc b/lib/database.cc
index 5bc131a3..3b9f7828 100644
--- a/lib/database.cc
+++ b/lib/database.cc
@@ -2582,6 +2582,9 @@ notmuch_database_add_message (notmuch_database_t *notmuch,
             if (ret)
                 goto DONE;
         } else {
+            ret = _notmuch_message_index_file (message, message_file);
+            if (ret)
+                goto DONE;
             ret = NOTMUCH_STATUS_DUPLICATE_MESSAGE_ID;
         }
 
diff --git a/test/T670-duplicate-mid.sh b/test/T670-duplicate-mid.sh
index 88bd12cb..2c77e11e 100755
--- a/test/T670-duplicate-mid.sh
+++ b/test/T670-duplicate-mid.sh
@@ -2,11 +2,10 @@ test_description="duplicate message ids"
 . ./test-lib.sh || exit 1
 
-add_message '[id]="id:duplicate"' '[subject]="message 1"'
-add_message '[id]="id:duplicate"' '[subject]="message 2"'
+add_message '[id]="duplicate"' '[subject]="message 1"'
+add_message '[id]="duplicate"' '[subject]="message 2"'
 
 test_begin_subtest 'Search for second subject'
-test_subtest_known_broken
 catOUTPUT
 test_expect_equal_file EXPECTED OUTPUT
 
+add_message '[id]="duplicate"' '[body]="sekrit"'
+test_begin_subtest 'search for body in duplicate file'
+cat OUTPUT
+test_expect_equal_file EXPECTED OUTPUT
+
+test_begin_subtest 'reindex removes terms from duplicate file'
+rm $MAIL_DIR/msg-003
+notmuch reindex id:duplicate
+cp /dev/null EXPECTED
+notmuch search --output=files "sekrit" | notmuch_dir_sanitize > OUTPUT
+test_expect_equal_file EXPECTED OUTPUT
+
 test_done
-- 
2.11.0
[rfc patch v2 1/5] lib: add definitions for notmuch_param_t
This is not an opaque struct because we envision using static
initialization much like the command-line-options.h structures.
---
 lib/notmuch.h | 17 +
 1 file changed, 17 insertions(+)

diff --git a/lib/notmuch.h b/lib/notmuch.h
index d374dc96..fc00f96d 100644
--- a/lib/notmuch.h
+++ b/lib/notmuch.h
@@ -219,6 +219,23 @@ typedef struct _notmuch_filenames notmuch_filenames_t;
 typedef struct _notmuch_config_list notmuch_config_list_t;
 #endif /* __DOXYGEN__ */
 
+enum notmuch_param_type {
+    NOTMUCH_PARAM_END = 0,
+    NOTMUCH_PARAM_BOOLEAN,
+    NOTMUCH_PARAM_INT,
+    NOTMUCH_PARAM_STRING
+};
+
+typedef struct notmuch_param_desc {
+    enum notmuch_param_type param_type;
+    int key;
+    union {
+        notmuch_bool_t bool_val;
+        int int_val;
+        const char *string_val;
+    };
+} notmuch_param_t;
+
 /**
  * Create a new, empty notmuch database located at 'path'.
  *
-- 
2.11.0
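As a rough illustration of the static initialization the commit message has in mind, the sketch below repeats the proposed definitions so that it compiles on its own, then builds a NOTMUCH_PARAM_END-terminated table and looks a key up in it. The EXAMPLE_KEY_* values, the example_indexopts table and the find_param helper are all hypothetical; the patch defines no keys and no lookup function. The notmuch_bool_t typedef is a stand-in for the one in notmuch.h. Anonymous unions with designated initializers need C11 (or the GNU extension).

```c
#include <stddef.h>

typedef int notmuch_bool_t; /* stand-in for the typedef in notmuch.h */

enum notmuch_param_type {
    NOTMUCH_PARAM_END = 0,
    NOTMUCH_PARAM_BOOLEAN,
    NOTMUCH_PARAM_INT,
    NOTMUCH_PARAM_STRING
};

typedef struct notmuch_param_desc {
    enum notmuch_param_type param_type;
    int key;
    union {
        notmuch_bool_t bool_val;
        int int_val;
        const char *string_val;
    };
} notmuch_param_t;

/* Hypothetical keys, for illustration only. */
enum {
    EXAMPLE_KEY_INDEX_ATTACHMENTS = 1,
    EXAMPLE_KEY_MAX_DEPTH,
    EXAMPLE_KEY_FILTER
};

/* A statically initialized option list, terminated by NOTMUCH_PARAM_END. */
static const notmuch_param_t example_indexopts[] = {
    { .param_type = NOTMUCH_PARAM_BOOLEAN,
      .key = EXAMPLE_KEY_INDEX_ATTACHMENTS, .bool_val = 1 },
    { .param_type = NOTMUCH_PARAM_INT,
      .key = EXAMPLE_KEY_MAX_DEPTH, .int_val = 8 },
    { .param_type = NOTMUCH_PARAM_STRING,
      .key = EXAMPLE_KEY_FILTER, .string_val = "text/html" },
    { .param_type = NOTMUCH_PARAM_END }
};

/* Return the first entry with the given key, or NULL if the list is
 * NULL or the key is absent. */
static const notmuch_param_t *
find_param (const notmuch_param_t *params, int key)
{
    if (params == NULL)
        return NULL;

    for (; params->param_type != NOTMUCH_PARAM_END; params++) {
        if (params->key == key)
            return params;
    }
    return NULL;
}
```

A caller would pass such a table where notmuch_message_reindex takes its indexopts argument. Because the terminator has param_type 0, an all-zero entry also ends the list.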
[rfc patch v2 4/5] test: add known broken test for duplicate message id
There are many other problems that could be tested, but this one we
have some hope of fixing because it doesn't require UI changes, just
indexing changes.
---
 test/T670-duplicate-mid.sh | 17 +
 1 file changed, 17 insertions(+)
 create mode 100755 test/T670-duplicate-mid.sh

diff --git a/test/T670-duplicate-mid.sh b/test/T670-duplicate-mid.sh
new file mode 100755
index ..88bd12cb
--- /dev/null
+++ b/test/T670-duplicate-mid.sh
@@ -0,0 +1,17 @@
+#!/usr/bin/env bash
+test_description="duplicate message ids"
+. ./test-lib.sh || exit 1
+
+add_message '[id]="id:duplicate"' '[subject]="message 1"'
+add_message '[id]="id:duplicate"' '[subject]="message 2"'
+
+test_begin_subtest 'Search for second subject'
+test_subtest_known_broken
+catOUTPUT
+test_expect_equal_file EXPECTED OUTPUT
+
+test_done
-- 
2.11.0
second round of indexing all files
This adds in a "notmuch reindex" command so that deleting the terms
from deleted files can be accomplished. There are still several UI
issues to deal with (e.g. we return an arbitrary file, not necessarily
the one matched).

The reindex command is a simplified version of the one that dkg
originally wrote for his series on indexing encrypted messages. I've
ripped out all the encryption-related stuff here. I've also postulated
(but not yet written) a more generic way of handling index options,
roughly modeled on our command-line-options code. I hope that this
will allow fewer functions, and a more static API at the library
level; at this point it's just a sketch of an idea.