Re: [PATCH] WIP: remove all non-prefixed-terms (and stemmed versions)

2017-04-02 Thread David Bremner
David Bremner  writes:

> The testing here is not really suitable for production, since we export
> a function just for testing.  It would be possible to modify the test
> framework to test functions in notmuch-private.h, but this was the quick
> and dirty solution.

On looking at the problem a second time I think this should really drop
all of the non-(tag|property) terms, so including some prefixed terms as
well. I think that would be doable; it's probably worth having some
performance benchmark before introducing the extra complication versus
dkg's approach.
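
To make the idea concrete, here is a minimal sketch of the filtering step on plain strings. It does not touch Xapian. The prefixes "K" (tags) and "XPROPERTY" (properties) are assumptions for illustration and should be checked against lib/database.cc. The helper names term_is_kept and drop_other_terms are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Assumed term prefixes: "K" for tags, "XPROPERTY" for properties.
 * These are assumptions for this sketch, not taken from the patch. */
static const char *const kept_prefixes[] = { "K", "XPROPERTY" };

/* Return true if `term` starts with one of the prefixes we keep. */
bool
term_is_kept (const char *term)
{
    assert (term != NULL);
    for (size_t i = 0; i < sizeof (kept_prefixes) / sizeof (kept_prefixes[0]); i++) {
        if (strncmp (term, kept_prefixes[i], strlen (kept_prefixes[i])) == 0)
            return true;
    }
    return false;
}

/* Compact `terms` in place so that only tag and property terms remain.
 * Returns the number of terms kept; their relative order is preserved. */
size_t
drop_other_terms (const char **terms, size_t count)
{
    size_t kept = 0;

    for (size_t i = 0; i < count; i++) {
        if (term_is_kept (terms[i]))
            terms[kept++] = terms[i];
    }
    return kept;
}
```

Given a term list containing "Kinbox", "hello", "Zhello" and "XPROPERTYfoo=bar", this would keep only the first and last entries. Whether a single pass like this is cheaper than removing and re-adding the message is exactly what the benchmark would need to show.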

d
___
notmuch mailing list
notmuch@notmuchmail.org
https://notmuchmail.org/mailman/listinfo/notmuch


[rfc patch v2 3/5] add "notmuch reindex" subcommand

2017-04-02 Thread David Bremner
From: Daniel Kahn Gillmor 

This new subcommand takes a set of search terms, and re-indexes the
list of matching messages.
---
 Makefile.local                    |   1 +
 doc/conf.py                       |   4 ++
 doc/index.rst                     |   1 +
 doc/man1/notmuch-reindex.rst      |  29 +
 doc/man1/notmuch.rst              |   4 +-
 doc/man7/notmuch-search-terms.rst |   7 +-
 notmuch-client.h                  |   3 +
 notmuch-reindex.c                 | 132 ++
 notmuch.c                         |   2 +
 test/T700-reindex.sh              |  21 ++
 10 files changed, 200 insertions(+), 4 deletions(-)
 create mode 100644 doc/man1/notmuch-reindex.rst
 create mode 100644 notmuch-reindex.c
 create mode 100755 test/T700-reindex.sh

diff --git a/Makefile.local b/Makefile.local
index 03eafaaa..c6e272bc 100644
--- a/Makefile.local
+++ b/Makefile.local
@@ -222,6 +222,7 @@ notmuch_client_srcs =   \
notmuch-dump.c  \
notmuch-insert.c\
notmuch-new.c   \
+   notmuch-reindex.c   \
notmuch-reply.c \
notmuch-restore.c   \
notmuch-search.c\
diff --git a/doc/conf.py b/doc/conf.py
index a3d82696..aa864b3c 100644
--- a/doc/conf.py
+++ b/doc/conf.py
@@ -95,6 +95,10 @@ man_pages = [
  u'incorporate new mail into the notmuch database',
  [notmuch_authors], 1),
 
+('man1/notmuch-reindex', 'notmuch-reindex',
+ u're-index matching messages',
+ [notmuch_authors], 1),
+
 ('man1/notmuch-reply', 'notmuch-reply',
  u'constructs a reply template for a set of messages',
  [notmuch_authors], 1),
diff --git a/doc/index.rst b/doc/index.rst
index 344606d9..aa6c9f40 100644
--- a/doc/index.rst
+++ b/doc/index.rst
@@ -18,6 +18,7 @@ Contents:
man5/notmuch-hooks
man1/notmuch-insert
man1/notmuch-new
+   man1/notmuch-reindex
man1/notmuch-reply
man1/notmuch-restore
man1/notmuch-search
diff --git a/doc/man1/notmuch-reindex.rst b/doc/man1/notmuch-reindex.rst
new file mode 100644
index ..6c786b85
--- /dev/null
+++ b/doc/man1/notmuch-reindex.rst
@@ -0,0 +1,29 @@
+===============
+notmuch-reindex
+===============
+
+SYNOPSIS
+========
+
+**notmuch** **reindex** [*option* ...] <*search-term*> ...
+
+DESCRIPTION
+===========
+
+Re-index all messages matching the search terms.
+
+See **notmuch-search-terms(7)** for details of the supported syntax for
+<*search-term*\ >.
+
+The **reindex** command searches for all messages matching the
+supplied search terms, and re-creates the full-text index on these
+messages using the supplied options.
+
+SEE ALSO
+========
+
+**notmuch(1)**, **notmuch-config(1)**, **notmuch-count(1)**,
+**notmuch-dump(1)**, **notmuch-hooks(5)**, **notmuch-insert(1)**,
+**notmuch-new(1)**,
+**notmuch-reply(1)**, **notmuch-restore(1)**, **notmuch-search(1)**,
+**notmuch-search-terms(7)**, **notmuch-show(1)**, **notmuch-tag(1)**
diff --git a/doc/man1/notmuch.rst b/doc/man1/notmuch.rst
index fbd7f381..b2a8376e 100644
--- a/doc/man1/notmuch.rst
+++ b/doc/man1/notmuch.rst
@@ -149,8 +149,8 @@ SEE ALSO
 
 **notmuch-address(1)**, **notmuch-compact(1)**, **notmuch-config(1)**,
 **notmuch-count(1)**, **notmuch-dump(1)**, **notmuch-hooks(5)**,
-**notmuch-insert(1)**, **notmuch-new(1)**, **notmuch-reply(1)**,
-**notmuch-restore(1)**, **notmuch-search(1)**,
+**notmuch-insert(1)**, **notmuch-new(1)**, **notmuch-reindex(1)**,
+**notmuch-reply(1)**, **notmuch-restore(1)**, **notmuch-search(1)**,
 **notmuch-search-terms(7)**, **notmuch-show(1)**, **notmuch-tag(1)**
 
 The notmuch website: **https://notmuchmail.org**
diff --git a/doc/man7/notmuch-search-terms.rst b/doc/man7/notmuch-search-terms.rst
index 47cab48d..dd76972e 100644
--- a/doc/man7/notmuch-search-terms.rst
+++ b/doc/man7/notmuch-search-terms.rst
@@ -9,6 +9,8 @@ SYNOPSIS
 
 **notmuch** **dump** [--format=(batch-tag|sup)] [--] [--output=<*file*>] [--] [<*search-term*> ...]
 
+**notmuch** **reindex** [option ...] <*search-term*> ...
+
 **notmuch** **search** [option ...] <*search-term*> ...
 
 **notmuch** **show** [option ...] <*search-term*> ...
@@ -421,5 +423,6 @@ SEE ALSO
 
 **notmuch(1)**, **notmuch-config(1)**, **notmuch-count(1)**,
 **notmuch-dump(1)**, **notmuch-hooks(5)**, **notmuch-insert(1)**,
-**notmuch-new(1)**, **notmuch-reply(1)**, **notmuch-restore(1)**,
-**notmuch-search(1)**, **notmuch-show(1)**, **notmuch-tag(1)**
+**notmuch-new(1)**, **notmuch-reindex(1)**, **notmuch-reply(1)**,
+**notmuch-restore(1)**, **notmuch-search(1)**, **notmuch-show(1)**,
+**notmuch-tag(1)**
diff --git a/notmuch-client.h b/notmuch-client.h
index a6f70eae..ab7138c6 100644
--- a/notmuch-client.h
+++ b/notmuch-client.h
@@ -196,6 +196,9 @@ int
 notmuch_insert_command (notmuch_config_t *config, int argc, char *argv[]);
 
 int
+notmuch_reindex_command (notmuch_config_t *config, int argc, char *argv[]);
+
+int
 notmuch_reply_command (notmuch_config_t *config, int 

[rfc patch v2 2/5] added notmuch_message_reindex

2017-04-02 Thread David Bremner
From: Daniel Kahn Gillmor 

This new function asks the database to reindex a given message.
The parameter `indexopts` is currently ignored, but is intended to
provide an extensible API to support e.g. changing the encryption or
filtering status (e.g. whether and how certain non-plaintext parts are
indexed).

Since we have no way of distinguishing terms added (without prefix)
from the headers and terms added from the body, we just save the tags
and properties, remove the message from the database entirely, and add
it back into the database in full, re-adding tags and properties as
needed.
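
As a rough illustration of the "save, remove, re-add" strategy, the sketch below shows only the tag-filtering step on plain strings. The autotags list is the one from the patch. The helper names is_autotag and tags_to_restore are hypothetical, and the reason for skipping these tags (the indexer presumably re-derives them from the message content) is an inference, not something the patch states.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Tags listed as "autotags" in the patch; assumed here to be the ones
 * the indexer sets on its own while indexing a message. */
static const char *const autotags[] = { "attachment", "encrypted", "signed" };

/* Return true if `tag` is one of the automatically generated tags. */
bool
is_autotag (const char *tag)
{
    assert (tag != NULL);
    for (size_t i = 0; i < sizeof (autotags) / sizeof (autotags[0]); i++) {
        if (strcmp (tag, autotags[i]) == 0)
            return true;
    }
    return false;
}

/* Copy the saved tags that should be re-applied after the message has
 * been removed and re-added.  `restored` must have room for `count`
 * entries.  Returns the number of tags copied. */
size_t
tags_to_restore (const char *const *saved, size_t count, const char **restored)
{
    size_t n = 0;

    for (size_t i = 0; i < count; i++) {
        if (! is_autotag (saved[i]))
            restored[n++] = saved[i];
    }
    return n;
}
```

For saved tags "inbox", "attachment", "unread", "signed", only "inbox" and "unread" would be re-applied by hand; the other two are left to the indexer.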
---
 lib/message.cc | 102 -
 lib/notmuch.h  |  14 
 2 files changed, 115 insertions(+), 1 deletion(-)

diff --git a/lib/message.cc b/lib/message.cc
index f8215a49..d68e4c66 100644
--- a/lib/message.cc
+++ b/lib/message.cc
@@ -579,7 +579,9 @@ void
 _notmuch_message_remove_terms (notmuch_message_t *message, const char *prefix)
 {
 Xapian::TermIterator i;
-size_t prefix_len = strlen (prefix);
+size_t prefix_len = 0;
+
+prefix_len = strlen (prefix);
 
 while (1) {
i = message->doc.termlist_begin ();
@@ -1872,3 +1874,101 @@ _notmuch_message_frozen (notmuch_message_t *message)
 {
 return message->frozen;
 }
+
+notmuch_status_t
+notmuch_message_reindex (notmuch_message_t *message,
+notmuch_param_t unused (*indexopts))
+{
+notmuch_database_t *notmuch = NULL;
+notmuch_status_t ret = NOTMUCH_STATUS_SUCCESS, status;
+notmuch_tags_t *tags = NULL;
+notmuch_message_properties_t *properties = NULL;
+notmuch_filenames_t *filenames, *orig_filenames = NULL;
+const char *filename = NULL, *tag = NULL, *propkey = NULL;
+notmuch_message_t *newmsg = NULL;
+notmuch_bool_t readded = FALSE, skip;
+const char *autotags[] = {
+   "attachment",
+   "encrypted",
+   "signed" };
+
+if (message == NULL)
+   return NOTMUCH_STATUS_NULL_POINTER;
+
+notmuch = _notmuch_message_database (message);
+
+/* cache tags, properties, and filenames */
+tags = notmuch_message_get_tags (message);
+properties = notmuch_message_get_properties (message, "", FALSE);
+filenames = notmuch_message_get_filenames (message);
+orig_filenames = notmuch_message_get_filenames (message);
+
+/* walk through filenames, removing them until the message is gone */
+for ( ; notmuch_filenames_valid (filenames);
+ notmuch_filenames_move_to_next (filenames)) {
+   filename = notmuch_filenames_get (filenames);
+
+   ret = notmuch_database_remove_message (notmuch, filename);
+   if (ret != NOTMUCH_STATUS_SUCCESS &&
+   ret != NOTMUCH_STATUS_DUPLICATE_MESSAGE_ID)
+   return ret;
+}
+if (ret != NOTMUCH_STATUS_SUCCESS)
+   return ret;
+
+/* re-add the filenames with the associated indexopts */
+for (; notmuch_filenames_valid (orig_filenames);
+notmuch_filenames_move_to_next (orig_filenames)) {
+   filename = notmuch_filenames_get (orig_filenames);
+
+   status = notmuch_database_add_message(notmuch,
+ filename,
+                                                 readded ? NULL : &newmsg);
+   if (status == NOTMUCH_STATUS_SUCCESS ||
+   status == NOTMUCH_STATUS_DUPLICATE_MESSAGE_ID) {
+   if (!readded) {
+   /* re-add tags */
+   for (; notmuch_tags_valid (tags);
+notmuch_tags_move_to_next (tags)) {
+   tag = notmuch_tags_get (tags);
+   skip = FALSE;
+
+   for (size_t i = 0; i < ARRAY_SIZE (autotags); i++)
+   if (strcmp (tag, autotags[i]) == 0)
+   skip = TRUE;
+
+   if (!skip) {
+   status = notmuch_message_add_tag (newmsg, tag);
+   if (status != NOTMUCH_STATUS_SUCCESS)
+   ret = status;
+   }
+   }
+   /* re-add properties */
+   for (; notmuch_message_properties_valid (properties);
+notmuch_message_properties_move_to_next (properties)) {
+   propkey = notmuch_message_properties_key (properties);
+   skip = FALSE;
+
+   if (!skip) {
+   status = notmuch_message_add_property (newmsg, propkey,
+                                                 notmuch_message_properties_value (properties));
+   if (status != NOTMUCH_STATUS_SUCCESS)
+   ret = status;
+   }
+   }
+   readded = TRUE;
+   }
+   } else {
+   /* if we failed to add this filename, go ahead and try the
+* next one as though it were first, but report the
+* error... 

[rfc patch v2 5/5] lib: index message files with duplicate message-ids

2017-04-02 Thread David Bremner
The corresponding xapian document just gets more terms added to it,
but this doesn't seem to break anything.
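
The effect can be pictured with a toy model. This is only a sketch: mock_doc_t is a hypothetical stand-in for a Xapian document, reduced to a small set of strings, and none of the function names below exist in notmuch or Xapian.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAX_TERMS 32

/* Hypothetical stand-in for a Xapian document: a small set of terms. */
typedef struct {
    const char *terms[MAX_TERMS];
    size_t count;
} mock_doc_t;

/* Return true if the document already contains `term`. */
bool
doc_has_term (const mock_doc_t *doc, const char *term)
{
    assert (doc != NULL && term != NULL);
    for (size_t i = 0; i < doc->count; i++) {
        if (strcmp (doc->terms[i], term) == 0)
            return true;
    }
    return false;
}

/* Index one file into the document: every term the file contributes is
 * added unless it is already present.  Terms beyond MAX_TERMS are
 * silently dropped in this toy model. */
void
doc_index_file (mock_doc_t *doc, const char *const *file_terms, size_t n)
{
    assert (doc != NULL);
    for (size_t i = 0; i < n; i++) {
        if (! doc_has_term (doc, file_terms[i]) && doc->count < MAX_TERMS)
            doc->terms[doc->count++] = file_terms[i];
    }
}

/* Forget all terms, as the first half of a reindex would. */
void
doc_clear (mock_doc_t *doc)
{
    assert (doc != NULL);
    doc->count = 0;
}
```

Indexing three files with the same message-id into one mock_doc_t yields the union of their terms. Clearing the document and indexing only the two surviving files drops the terms contributed solely by the deleted one, which is the behaviour the 'reindex removes terms from duplicate file' subtest checks for.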
---
 lib/database.cc|  3 +++
 test/T670-duplicate-mid.sh | 22 +++---
 2 files changed, 22 insertions(+), 3 deletions(-)

diff --git a/lib/database.cc b/lib/database.cc
index 5bc131a3..3b9f7828 100644
--- a/lib/database.cc
+++ b/lib/database.cc
@@ -2582,6 +2582,9 @@ notmuch_database_add_message (notmuch_database_t *notmuch,
if (ret)
goto DONE;
} else {
+   ret = _notmuch_message_index_file (message, message_file);
+   if (ret)
+   goto DONE;
ret = NOTMUCH_STATUS_DUPLICATE_MESSAGE_ID;
}
 
diff --git a/test/T670-duplicate-mid.sh b/test/T670-duplicate-mid.sh
index 88bd12cb..2c77e11e 100755
--- a/test/T670-duplicate-mid.sh
+++ b/test/T670-duplicate-mid.sh
@@ -2,11 +2,10 @@
 test_description="duplicate message ids"
 . ./test-lib.sh || exit 1
 
-add_message '[id]="id:duplicate"' '[subject]="message 1"'
-add_message '[id]="id:duplicate"' '[subject]="message 2"'
+add_message '[id]="duplicate"' '[subject]="message 1"'
+add_message '[id]="duplicate"' '[subject]="message 2"'
 
 test_begin_subtest 'Search for second subject'
-test_subtest_known_broken
 cat  
OUTPUT
 test_expect_equal_file EXPECTED OUTPUT
 
+add_message '[id]="duplicate"' '[body]="sekrit"'
+test_begin_subtest 'search for body in duplicate file'
+cat  OUTPUT
+test_expect_equal_file EXPECTED OUTPUT
+
+test_begin_subtest 'reindex removes terms from duplicate file'
+rm $MAIL_DIR/msg-003
+notmuch reindex id:duplicate
+cp /dev/null EXPECTED
+notmuch search --output=files "sekrit" | notmuch_dir_sanitize > OUTPUT
+test_expect_equal_file EXPECTED OUTPUT
+
 test_done
-- 
2.11.0



[rfc patch v2 1/5] lib: add definitions for notmuch_param_t

2017-04-02 Thread David Bremner
This is not an opaque struct because we envision using static
initialization much like the command-line-options.h structures.
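
A minimal sketch of what such static initialization might look like follows. The enum and struct are copied from the patch below. The typedef of notmuch_bool_t as int is assumed, the EXAMPLE_KEY_* keys and the find_param helper are hypothetical, and initializing the anonymous union by member name relies on C11.

```c
#include <assert.h>
#include <stddef.h>

typedef int notmuch_bool_t;     /* assumed, to keep the sketch standalone */

enum notmuch_param_type {
    NOTMUCH_PARAM_END = 0,
    NOTMUCH_PARAM_BOOLEAN,
    NOTMUCH_PARAM_INT,
    NOTMUCH_PARAM_STRING
};

typedef struct notmuch_param_desc {
    enum notmuch_param_type param_type;
    int key;
    union {
        notmuch_bool_t bool_val;
        int int_val;
        const char *string_val;
    };
} notmuch_param_t;

/* Hypothetical keys; the patch does not define any. */
enum { EXAMPLE_KEY_VERBOSE = 1, EXAMPLE_KEY_LIMIT, EXAMPLE_KEY_LABEL };

/* A statically initialized option list, terminated by NOTMUCH_PARAM_END. */
const notmuch_param_t example_opts[] = {
    { .param_type = NOTMUCH_PARAM_BOOLEAN, .key = EXAMPLE_KEY_VERBOSE, .bool_val = 1 },
    { .param_type = NOTMUCH_PARAM_INT, .key = EXAMPLE_KEY_LIMIT, .int_val = 10 },
    { .param_type = NOTMUCH_PARAM_STRING, .key = EXAMPLE_KEY_LABEL, .string_val = "demo" },
    { .param_type = NOTMUCH_PARAM_END }
};

/* Walk the list up to the END sentinel and return the entry for `key`,
 * or NULL if the key is not present. */
const notmuch_param_t *
find_param (const notmuch_param_t *opts, int key)
{
    assert (opts != NULL);
    for (; opts->param_type != NOTMUCH_PARAM_END; opts++) {
        if (opts->key == key)
            return opts;
    }
    return NULL;
}
```

A caller could then pass example_opts wherever an indexopts argument is expected, and the library side would look up the keys it understands with something like find_param. An opaque struct would rule out this kind of initializer.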
---
 lib/notmuch.h | 17 +
 1 file changed, 17 insertions(+)

diff --git a/lib/notmuch.h b/lib/notmuch.h
index d374dc96..fc00f96d 100644
--- a/lib/notmuch.h
+++ b/lib/notmuch.h
@@ -219,6 +219,23 @@ typedef struct _notmuch_filenames notmuch_filenames_t;
 typedef struct _notmuch_config_list notmuch_config_list_t;
 #endif /* __DOXYGEN__ */
 
+enum notmuch_param_type {
+NOTMUCH_PARAM_END = 0,
+NOTMUCH_PARAM_BOOLEAN,
+NOTMUCH_PARAM_INT,
+NOTMUCH_PARAM_STRING
+};
+
+typedef struct notmuch_param_desc {
+enum notmuch_param_type param_type;
+int key;
+union {
+   notmuch_bool_t bool_val;
+   int int_val;
+   const char *string_val;
+};
+} notmuch_param_t;
+
 /**
  * Create a new, empty notmuch database located at 'path'.
  *
-- 
2.11.0



[rfc patch v2 4/5] test: add known broken test for duplicate message id

2017-04-02 Thread David Bremner
There are many other problems that could be tested, but this one we
have some hope of fixing because it doesn't require UI changes, just
indexing changes.
---
 test/T670-duplicate-mid.sh | 17 +
 1 file changed, 17 insertions(+)
 create mode 100755 test/T670-duplicate-mid.sh

diff --git a/test/T670-duplicate-mid.sh b/test/T670-duplicate-mid.sh
new file mode 100755
index ..88bd12cb
--- /dev/null
+++ b/test/T670-duplicate-mid.sh
@@ -0,0 +1,17 @@
+#!/usr/bin/env bash
+test_description="duplicate message ids"
+. ./test-lib.sh || exit 1
+
+add_message '[id]="id:duplicate"' '[subject]="message 1"'
+add_message '[id]="id:duplicate"' '[subject]="message 2"'
+
+test_begin_subtest 'Search for second subject'
+test_subtest_known_broken
+cat  
OUTPUT
+test_expect_equal_file EXPECTED OUTPUT
+
+test_done
-- 
2.11.0



second round of indexing all files

2017-04-02 Thread David Bremner
This adds in a "notmuch reindex" command so that deleting the terms
from deleted files can be accomplished.  There are still several UI
issues to deal with (i.e. we return an arbitrary file, not necessarily
the one matched).

The reindex command is a simplified version of the one that dkg
originally wrote for his series on indexing encrypted messages. I've
ripped out all the encryption-related stuff here.

I've also postulated (but not yet written) a more generic way of
handling index options, roughly modeled on our command-line-options
code. I hope that this will allow fewer functions, and a more static
API at the library level; at this point it's just a sketch of an idea.

