[rfc patch v3 6/6] lib: index message files with duplicate message-ids

2017-04-03 David Bremner
The corresponding Xapian document just gets more terms added to it,
but this doesn't seem to break anything.
---
 lib/database.cc|  3 +++
 test/T670-duplicate-mid.sh | 22 +++---
 2 files changed, 22 insertions(+), 3 deletions(-)

diff --git a/lib/database.cc b/lib/database.cc
index 5bc131a3..3b9f7828 100644
--- a/lib/database.cc
+++ b/lib/database.cc
@@ -2582,6 +2582,9 @@ notmuch_database_add_message (notmuch_database_t *notmuch,
         if (ret)
             goto DONE;
     } else {
+        ret = _notmuch_message_index_file (message, message_file);
+        if (ret)
+            goto DONE;
         ret = NOTMUCH_STATUS_DUPLICATE_MESSAGE_ID;
     }
 
diff --git a/test/T670-duplicate-mid.sh b/test/T670-duplicate-mid.sh
index 88bd12cb..2c77e11e 100755
--- a/test/T670-duplicate-mid.sh
+++ b/test/T670-duplicate-mid.sh
@@ -2,11 +2,10 @@
 test_description="duplicate message ids"
 . ./test-lib.sh || exit 1
 
-add_message '[id]="id:duplicate"' '[subject]="message 1"'
-add_message '[id]="id:duplicate"' '[subject]="message 2"'
+add_message '[id]="duplicate"' '[subject]="message 1"'
+add_message '[id]="duplicate"' '[subject]="message 2"'
 
 test_begin_subtest 'Search for second subject'
-test_subtest_known_broken
 cat  
OUTPUT
 test_expect_equal_file EXPECTED OUTPUT
 
+add_message '[id]="duplicate"' '[body]="sekrit"'
+test_begin_subtest 'search for body in duplicate file'
+cat  OUTPUT
+test_expect_equal_file EXPECTED OUTPUT
+
+test_begin_subtest 'reindex removes terms from duplicate file'
+rm $MAIL_DIR/msg-003
+notmuch reindex id:duplicate
+cp /dev/null EXPECTED
+notmuch search --output=files "sekrit" | notmuch_dir_sanitize > OUTPUT
+test_expect_equal_file EXPECTED OUTPUT
+
 test_done
-- 
2.11.0
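A toy model may help make the commit message's claim concrete. The sketch below is plain C, not notmuch or Xapian code, and every name in it (`toy_doc_t`, `toy_add_file`, `toy_reindex`, ...) is hypothetical. It assumes a message document is just a set of terms, that indexing a second file with the same message-id only adds terms, and that a reindex clears the indexed terms and re-indexes whichever files still exist. Under those assumptions it reproduces what the new T670 subtests check: terms from a duplicate file become searchable, and they disappear once that file is removed and the message is reindexed.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Toy model only (hypothetical names): a "document" is a set of term
 * strings, and a "file" is a NULL-terminated list of the terms that
 * indexing it would produce. */
#define TOY_MAX_TERMS 64
#define TOY_MAX_FILES 8

typedef struct {
    const char *terms[TOY_MAX_TERMS];
    size_t n_terms;
    const char *const *files[TOY_MAX_FILES];
    bool file_present[TOY_MAX_FILES];   /* false once deleted on disk */
    size_t n_files;
} toy_doc_t;

bool
toy_has_term (const toy_doc_t *doc, const char *term)
{
    for (size_t i = 0; i < doc->n_terms; i++)
        if (strcmp (doc->terms[i], term) == 0)
            return true;
    return false;
}

/* Indexing only ever adds terms; existing terms are left alone. */
static void
toy_index_file (toy_doc_t *doc, const char *const *file_terms)
{
    for (; *file_terms != NULL; file_terms++)
        if (! toy_has_term (doc, *file_terms) && doc->n_terms < TOY_MAX_TERMS)
            doc->terms[doc->n_terms++] = *file_terms;
}

/* A file with an already-known message-id is recorded and indexed
 * into the same document, so its terms are merged in. */
void
toy_add_file (toy_doc_t *doc, const char *const *file_terms)
{
    if (doc->n_files == TOY_MAX_FILES)
        return;
    doc->files[doc->n_files] = file_terms;
    doc->file_present[doc->n_files] = true;
    doc->n_files++;
    toy_index_file (doc, file_terms);
}

/* Deleting a file on disk does not touch the index by itself. */
void
toy_remove_file (toy_doc_t *doc, size_t idx)
{
    if (idx < doc->n_files)
        doc->file_present[idx] = false;
}

/* Reindex: drop every indexed term, then re-index surviving files. */
void
toy_reindex (toy_doc_t *doc)
{
    doc->n_terms = 0;
    for (size_t i = 0; i < doc->n_files; i++)
        if (doc->file_present[i])
            toy_index_file (doc, doc->files[i]);
}
```

Unlike the real reindex, this toy has no tags or properties, so "remove the indexed terms" is simply emptying the set.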

___
notmuch mailing list
notmuch@notmuchmail.org
https://notmuchmail.org/mailman/listinfo/notmuch


[rfc patch v3 2/6] lib: add _notmuch_message_remove_indexed_terms

2017-04-03 David Bremner
Testing will be provided via use in notmuch_message_reindex
---
 lib/message.cc| 44 
 lib/notmuch-private.h |  2 ++
 lib/notmuch.h |  4 
 3 files changed, 50 insertions(+)

diff --git a/lib/message.cc b/lib/message.cc
index f8215a49..a7bd38ac 100644
--- a/lib/message.cc
+++ b/lib/message.cc
@@ -599,6 +599,50 @@ _notmuch_message_remove_terms (notmuch_message_t *message, const char *prefix)
     }
 }
 
+
+/* Remove all terms generated by indexing, i.e. not tags or
+ * properties, along with any automatic tags. */
+notmuch_private_status_t
+_notmuch_message_remove_indexed_terms (notmuch_message_t *message)
+{
+    Xapian::TermIterator i;
+
+    const std::string tag_prefix = _find_prefix ("tag");
+    const std::string property_prefix = _find_prefix ("property");
+
+    for (i = message->doc.termlist_begin ();
+         i != message->doc.termlist_end (); i++) {
+
+        const std::string term = *i;
+
+        if (term.compare (0, property_prefix.size (), property_prefix) == 0)
+            continue;
+
+        if (term.compare (0, tag_prefix.size (), tag_prefix) == 0 &&
+            term.compare (1, strlen ("encrypted"), "encrypted") != 0 &&
+            term.compare (1, strlen ("signed"), "signed") != 0 &&
+            term.compare (1, strlen ("attachment"), "attachment") != 0)
+            continue;
+
+        try {
+            message->doc.remove_term ((*i));
+            message->modified = TRUE;
+        } catch (const Xapian::InvalidArgumentError &) {
+            /* Ignore failure to remove non-existent term. */
+        } catch (const Xapian::Error &error) {
+            notmuch_database_t *notmuch = message->notmuch;
+
+            if (!notmuch->exception_reported) {
+                _notmuch_database_log (_notmuch_message_database (message), "A Xapian exception occurred creating message: %s\n",
+                                       error.get_msg ().c_str ());
+                notmuch->exception_reported = TRUE;
+            }
+            return NOTMUCH_PRIVATE_STATUS_XAPIAN_EXCEPTION;
+        }
+    }
+    return NOTMUCH_PRIVATE_STATUS_SUCCESS;
+}
+
 /* Return true if p points at "new" or "cur". */
 static bool is_maildir (const char *p)
 {
diff --git a/lib/notmuch-private.h b/lib/notmuch-private.h
index 8587e86c..1198d932 100644
--- a/lib/notmuch-private.h
+++ b/lib/notmuch-private.h
@@ -509,6 +509,8 @@ _notmuch_message_add_reply (notmuch_message_t *message,
 notmuch_database_t *
 _notmuch_message_database (notmuch_message_t *message);
 
+notmuch_private_status_t
+_notmuch_message_remove_indexed_terms (notmuch_message_t *message);
 /* sha1.c */
 
 char *
diff --git a/lib/notmuch.h b/lib/notmuch.h
index fc00f96d..33e9fd24 100644
--- a/lib/notmuch.h
+++ b/lib/notmuch.h
@@ -1685,6 +1685,10 @@ notmuch_message_thaw (notmuch_message_t *message);
 void
 notmuch_message_destroy (notmuch_message_t *message);
 
+/* for testing */
+
+void
+notmuch_test_clear_terms(notmuch_message_t *message);
 /**
  * @name Message Properties
  *
-- 
2.11.0
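The keep/remove rule in `_notmuch_message_remove_indexed_terms` can be isolated as a small predicate. This is a sketch in C rather than the patch's C++; the function names are hypothetical, and the prefixes are passed in because in notmuch they come from `_find_prefix ()` (the values "K" and "XPROPERTY" used in the test are assumptions). One deliberate difference: the sketch matches automatic tag names exactly, whereas the patch compares only the first `strlen ("...")` characters after a one-character tag prefix, so a user tag such as `attachments` would be removed by the patch but kept here.

```c
#include <stdbool.h>
#include <string.h>

static bool
has_prefix (const char *s, const char *prefix)
{
    return strncmp (s, prefix, strlen (prefix)) == 0;
}

/* Tags that indexing sets automatically, so they must be recomputed. */
static bool
is_automatic_tag (const char *tag)
{
    return strcmp (tag, "encrypted") == 0 ||
           strcmp (tag, "signed") == 0 ||
           strcmp (tag, "attachment") == 0;
}

/* Return true if TERM should be removed before re-indexing.
 * tag_prefix / property_prefix: assumed to be the values notmuch's
 * _find_prefix ("tag") and _find_prefix ("property") return. */
bool
term_is_removed_before_reindex (const char *term,
                                const char *tag_prefix,
                                const char *property_prefix)
{
    if (has_prefix (term, property_prefix))
        return false;           /* properties are kept */
    if (has_prefix (term, tag_prefix))
        return is_automatic_tag (term + strlen (tag_prefix));
    return true;                /* everything else came from indexing */
}
```

Checking properties before tags mirrors the order in the patch; with the assumed prefixes the two cannot overlap anyway.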



[rfc patch v3 4/6] add "notmuch reindex" subcommand

2017-04-03 David Bremner
From: Daniel Kahn Gillmor 

This new subcommand takes a set of search terms, and re-indexes the
list of matching messages.
---
 Makefile.local|   1 +
 doc/conf.py   |   4 ++
 doc/index.rst |   1 +
 doc/man1/notmuch-reindex.rst  |  29 +
 doc/man1/notmuch.rst  |   4 +-
 doc/man7/notmuch-search-terms.rst |   7 +-
 notmuch-client.h  |   3 +
 notmuch-reindex.c | 131 ++
 notmuch.c |   2 +
 performance-test/M04-reindex.sh   |  11 
 performance-test/T03-reindex.sh   |  13 
 test/T700-reindex.sh  |  21 ++
 12 files changed, 223 insertions(+), 4 deletions(-)
 create mode 100644 doc/man1/notmuch-reindex.rst
 create mode 100644 notmuch-reindex.c
 create mode 100755 performance-test/M04-reindex.sh
 create mode 100755 performance-test/T03-reindex.sh
 create mode 100755 test/T700-reindex.sh

diff --git a/Makefile.local b/Makefile.local
index 03eafaaa..c6e272bc 100644
--- a/Makefile.local
+++ b/Makefile.local
@@ -222,6 +222,7 @@ notmuch_client_srcs =   \
notmuch-dump.c  \
notmuch-insert.c\
notmuch-new.c   \
+   notmuch-reindex.c   \
notmuch-reply.c \
notmuch-restore.c   \
notmuch-search.c\
diff --git a/doc/conf.py b/doc/conf.py
index a3d82696..aa864b3c 100644
--- a/doc/conf.py
+++ b/doc/conf.py
@@ -95,6 +95,10 @@ man_pages = [
  u'incorporate new mail into the notmuch database',
  [notmuch_authors], 1),
 
+('man1/notmuch-reindex', 'notmuch-reindex',
+ u're-index matching messages',
+ [notmuch_authors], 1),
+
 ('man1/notmuch-reply', 'notmuch-reply',
  u'constructs a reply template for a set of messages',
  [notmuch_authors], 1),
diff --git a/doc/index.rst b/doc/index.rst
index 344606d9..aa6c9f40 100644
--- a/doc/index.rst
+++ b/doc/index.rst
@@ -18,6 +18,7 @@ Contents:
    man5/notmuch-hooks
    man1/notmuch-insert
    man1/notmuch-new
+   man1/notmuch-reindex
    man1/notmuch-reply
    man1/notmuch-restore
    man1/notmuch-search
diff --git a/doc/man1/notmuch-reindex.rst b/doc/man1/notmuch-reindex.rst
new file mode 100644
index ..6c786b85
--- /dev/null
+++ b/doc/man1/notmuch-reindex.rst
@@ -0,0 +1,29 @@
+===============
+notmuch-reindex
+===============
+
+SYNOPSIS
+========
+
+**notmuch** **reindex** [*option* ...] <*search-term*> ...
+
+DESCRIPTION
+===========
+
+Re-index all messages matching the search terms.
+
+See **notmuch-search-terms(7)** for details of the supported syntax for
+<*search-term*\ >.
+
+The **reindex** command searches for all messages matching the
+supplied search terms, and re-creates the full-text index on these
+messages using the supplied options.
+
+SEE ALSO
+========
+
+**notmuch(1)**, **notmuch-config(1)**, **notmuch-count(1)**,
+**notmuch-dump(1)**, **notmuch-hooks(5)**, **notmuch-insert(1)**,
+**notmuch-new(1)**,
+**notmuch-reply(1)**, **notmuch-restore(1)**, **notmuch-search(1)**,
+**notmuch-search-terms(7)**, **notmuch-show(1)**, **notmuch-tag(1)**
diff --git a/doc/man1/notmuch.rst b/doc/man1/notmuch.rst
index fbd7f381..b2a8376e 100644
--- a/doc/man1/notmuch.rst
+++ b/doc/man1/notmuch.rst
@@ -149,8 +149,8 @@ SEE ALSO
 
 **notmuch-address(1)**, **notmuch-compact(1)**, **notmuch-config(1)**,
 **notmuch-count(1)**, **notmuch-dump(1)**, **notmuch-hooks(5)**,
-**notmuch-insert(1)**, **notmuch-new(1)**, **notmuch-reply(1)**,
-**notmuch-restore(1)**, **notmuch-search(1)**,
+**notmuch-insert(1)**, **notmuch-new(1)**, **notmuch-reindex(1)**,
+**notmuch-reply(1)**, **notmuch-restore(1)**, **notmuch-search(1)**,
 **notmuch-search-terms(7)**, **notmuch-show(1)**, **notmuch-tag(1)**
 
 The notmuch website: **https://notmuchmail.org**
diff --git a/doc/man7/notmuch-search-terms.rst b/doc/man7/notmuch-search-terms.rst
index 47cab48d..dd76972e 100644
--- a/doc/man7/notmuch-search-terms.rst
+++ b/doc/man7/notmuch-search-terms.rst
@@ -9,6 +9,8 @@ SYNOPSIS
 
 **notmuch** **dump** [--format=(batch-tag|sup)] [--] [--output=<*file*>] [--] [<*search-term*> ...]
 
+**notmuch** **reindex** [option ...] <*search-term*> ...
+
 **notmuch** **search** [option ...] <*search-term*> ...
 
 **notmuch** **show** [option ...] <*search-term*> ...
@@ -421,5 +423,6 @@ SEE ALSO
 
 **notmuch(1)**, **notmuch-config(1)**, **notmuch-count(1)**,
 **notmuch-dump(1)**, **notmuch-hooks(5)**, **notmuch-insert(1)**,
-**notmuch-new(1)**, **notmuch-reply(1)**, **notmuch-restore(1)**,
-**notmuch-search(1)**, **notmuch-show(1)**, **notmuch-tag(1)**
+**notmuch-new(1)**, **notmuch-reindex(1)**, **notmuch-reply(1)**,
+**notmuch-restore(1)**, **notmuch-search(1)**, **notmuch-show(1)**,
+**notmuch-tag(1)**
diff --git a/notmuch-client.h b/notmuch-client.h
index a6f70eae..ab7138c6 100644
--- a/notmuch-client.h
+++ b/notmuch-client.h
@@ -196,6 +196,9 @@ int
 notmuch_insert_command 

[rfc patch v3 1/6] lib: add definitions for notmuch_param_t

2017-04-03 David Bremner
This is not an opaque struct because we envision using static
initialization much like the command-line-options.h structures.
---
 lib/notmuch.h | 17 +
 1 file changed, 17 insertions(+)

diff --git a/lib/notmuch.h b/lib/notmuch.h
index d374dc96..fc00f96d 100644
--- a/lib/notmuch.h
+++ b/lib/notmuch.h
@@ -219,6 +219,23 @@ typedef struct _notmuch_filenames notmuch_filenames_t;
 typedef struct _notmuch_config_list notmuch_config_list_t;
 #endif /* __DOXYGEN__ */
 
+enum notmuch_param_type {
+    NOTMUCH_PARAM_END = 0,
+    NOTMUCH_PARAM_BOOLEAN,
+    NOTMUCH_PARAM_INT,
+    NOTMUCH_PARAM_STRING
+};
+
+typedef struct notmuch_param_desc {
+    enum notmuch_param_type param_type;
+    int key;
+    union {
+        notmuch_bool_t bool_val;
+        int int_val;
+        const char *string_val;
+    };
+} notmuch_param_t;
+
 /**
  * Create a new, empty notmuch database located at 'path'.
  *
-- 
2.11.0
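Since `notmuch_param_t` is deliberately not opaque, callers could build option lists with static initializers. The sketch below copies the type definitions from the patch so it compiles on its own (it needs C11 for the anonymous union), assumes `notmuch_bool_t` is an `int`, and uses hypothetical `EXAMPLE_KEY_*` values because the patch defines no keys yet. Terminating the array with a `NOTMUCH_PARAM_END` entry, and the lookup helper, are likewise assumptions about how such a list might be consumed.

```c
#include <stddef.h>

/* Copied from the patch; notmuch_bool_t assumed to be an int here. */
typedef int notmuch_bool_t;

enum notmuch_param_type {
    NOTMUCH_PARAM_END = 0,
    NOTMUCH_PARAM_BOOLEAN,
    NOTMUCH_PARAM_INT,
    NOTMUCH_PARAM_STRING
};

typedef struct notmuch_param_desc {
    enum notmuch_param_type param_type;
    int key;
    union {
        notmuch_bool_t bool_val;
        int int_val;
        const char *string_val;
    };
} notmuch_param_t;

/* Hypothetical keys: nothing like these is defined by the patch. */
enum { EXAMPLE_KEY_DECRYPT = 1, EXAMPLE_KEY_LEVEL, EXAMPLE_KEY_NAME };

/* A statically initialized list, terminated by NOTMUCH_PARAM_END. */
static const notmuch_param_t example_indexopts[] = {
    { .param_type = NOTMUCH_PARAM_BOOLEAN, .key = EXAMPLE_KEY_DECRYPT, .bool_val = 1 },
    { .param_type = NOTMUCH_PARAM_INT, .key = EXAMPLE_KEY_LEVEL, .int_val = 3 },
    { .param_type = NOTMUCH_PARAM_STRING, .key = EXAMPLE_KEY_NAME, .string_val = "example" },
    { .param_type = NOTMUCH_PARAM_END }
};

/* Hypothetical helper: return the entry for KEY, or NULL if absent. */
const notmuch_param_t *
example_find_param (const notmuch_param_t *params, int key)
{
    for (; params != NULL && params->param_type != NOTMUCH_PARAM_END; params++)
        if (params->key == key)
            return params;
    return NULL;
}
```

Designated initializers keep each entry readable and let the union member that is actually set be named explicitly, which is the kind of static table the commit message compares to command-line-options.h.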



[rfc patch v3 3/6] added notmuch_message_reindex

2017-04-03 David Bremner
From: Daniel Kahn Gillmor 

This new function asks the database to reindex a given message.
The parameter `indexopts` is currently ignored, but is intended to
provide an extensible API to support e.g. changing the encryption or
filtering status (e.g. whether and how certain non-plaintext parts are
indexed).
---
 lib/message.cc | 46 +-
 lib/notmuch.h  | 14 ++
 2 files changed, 59 insertions(+), 1 deletion(-)

diff --git a/lib/message.cc b/lib/message.cc
index a7bd38ac..193eedb2 100644
--- a/lib/message.cc
+++ b/lib/message.cc
@@ -579,7 +579,9 @@ void
 _notmuch_message_remove_terms (notmuch_message_t *message, const char *prefix)
 {
     Xapian::TermIterator i;
-    size_t prefix_len = strlen (prefix);
+    size_t prefix_len = 0;
+
+    prefix_len = strlen (prefix);
 
     while (1) {
         i = message->doc.termlist_begin ();
@@ -1916,3 +1918,45 @@ _notmuch_message_frozen (notmuch_message_t *message)
 {
 return message->frozen;
 }
+
+notmuch_status_t
+notmuch_message_reindex (notmuch_message_t *message,
+                         notmuch_param_t unused (*indexopts))
+{
+    notmuch_database_t *notmuch = NULL;
+    notmuch_status_t ret = NOTMUCH_STATUS_SUCCESS, status;
+    notmuch_private_status_t private_status;
+    notmuch_filenames_t *orig_filenames = NULL;
+    const char *filename = NULL;
+
+    if (message == NULL)
+        return NOTMUCH_STATUS_NULL_POINTER;
+
+    notmuch = _notmuch_message_database (message);
+
+    orig_filenames = notmuch_message_get_filenames (message);
+
+    private_status = _notmuch_message_remove_indexed_terms (message);
+    if (private_status)
+        return COERCE_STATUS (private_status, "error removing terms");
+
+    /* re-add the filenames with the associated indexopts */
+    for (; notmuch_filenames_valid (orig_filenames);
+         notmuch_filenames_move_to_next (orig_filenames)) {
+        filename = notmuch_filenames_get (orig_filenames);
+
+        status = notmuch_database_add_message (notmuch,
+                                               filename,
+                                               );
+        if (status != NOTMUCH_STATUS_SUCCESS &&
+            status != NOTMUCH_STATUS_DUPLICATE_MESSAGE_ID) {
+            /* if we failed to add this filename, go ahead and try the
+             * next one as though it were first, but report the
+             * error... */
+            ret = status;
+        }
+    }
+
+    /* XXX TODO destroy orig_filenames? */
+    return ret;
+}
diff --git a/lib/notmuch.h b/lib/notmuch.h
index 33e9fd24..11818018 100644
--- a/lib/notmuch.h
+++ b/lib/notmuch.h
@@ -1389,6 +1389,20 @@ notmuch_filenames_t *
 notmuch_message_get_filenames (notmuch_message_t *message);
 
 /**
+ * Re-index the e-mail corresponding to 'message' using the supplied
+ * index options.
+ *
+ * Returns the status of the re-index operation (see the return
+ * codes documented for notmuch_database_add_message).
+ *
+ * After reindexing, the user should discard the message object passed
+ * in here by calling notmuch_message_destroy, since it refers to the
+ * original message, not to the reindexed message.
+ */
+notmuch_status_t
+notmuch_message_reindex (notmuch_message_t *message,
+                         notmuch_param_t *indexopts);
+
+/**
  * Message flags.
  */
 typedef enum _notmuch_message_flag {
-- 
2.11.0
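The filename loop in `notmuch_message_reindex` follows a specific error policy: every filename is tried, `NOTMUCH_STATUS_DUPLICATE_MESSAGE_ID` is expected rather than treated as a failure (each file after the first maps to the same message-id), and the last real error is reported. The sketch below isolates that policy using hypothetical stand-in status codes and a callback in place of `notmuch_database_add_message ()`; it is not the notmuch API, and the `ex_fake_add` adder exists only for illustration.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for the notmuch status codes the loop uses. */
typedef enum {
    EX_STATUS_SUCCESS = 0,
    EX_STATUS_FILE_ERROR,
    EX_STATUS_DUPLICATE_MESSAGE_ID
} ex_status_t;

/* Stand-in for notmuch_database_add_message (). */
typedef ex_status_t (*ex_add_fn) (const char *filename, void *ctx);

/* Try every filename; duplicates are fine, and the last real
 * failure (if any) is what gets returned. */
ex_status_t
ex_readd_filenames (const char *const *filenames, size_t n,
                    ex_add_fn add, void *ctx)
{
    ex_status_t ret = EX_STATUS_SUCCESS;

    for (size_t i = 0; i < n; i++) {
        ex_status_t status = add (filenames[i], ctx);
        if (status != EX_STATUS_SUCCESS &&
            status != EX_STATUS_DUPLICATE_MESSAGE_ID)
            ret = status;       /* keep going, but remember the failure */
    }
    return ret;
}

/* Example adder: names starting with "gone" fail as if deleted; the
 * first successful file creates the message, later ones are duplicates. */
static ex_status_t
ex_fake_add (const char *filename, void *ctx)
{
    int *n_added = ctx;

    if (strncmp (filename, "gone", 4) == 0)
        return EX_STATUS_FILE_ERROR;
    return (*n_added)++ == 0 ? EX_STATUS_SUCCESS
                             : EX_STATUS_DUPLICATE_MESSAGE_ID;
}
```

Continuing past a failed file means one missing file on disk does not leave the message stripped of the terms its other files would contribute.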



[rfc patch v3 5/6] test: add known broken test for duplicate message id

2017-04-03 David Bremner
There are many other problems that could be tested, but this one we
have some hope of fixing because it doesn't require UI changes, just
indexing changes.
---
 test/T670-duplicate-mid.sh | 17 +
 1 file changed, 17 insertions(+)
 create mode 100755 test/T670-duplicate-mid.sh

diff --git a/test/T670-duplicate-mid.sh b/test/T670-duplicate-mid.sh
new file mode 100755
index ..88bd12cb
--- /dev/null
+++ b/test/T670-duplicate-mid.sh
@@ -0,0 +1,17 @@
+#!/usr/bin/env bash
+test_description="duplicate message ids"
+. ./test-lib.sh || exit 1
+
+add_message '[id]="id:duplicate"' '[subject]="message 1"'
+add_message '[id]="id:duplicate"' '[subject]="message 2"'
+
+test_begin_subtest 'Search for second subject'
+test_subtest_known_broken
+cat  
OUTPUT
+test_expect_equal_file EXPECTED OUTPUT
+
+test_done
-- 
2.11.0



third round of indexing all files

2017-04-03 David Bremner
Adapting the approach in [1] to delete only the terms we are going to
re-add via indexing seems noticeably faster (on the order of 30-50%),
and the code is quite a bit simpler.

This obsoletes the previous series at [2]. It still has all of the UI
issues mentioned there, and the design of the index options probably
needs more thought.

This is new in this round

 [rfc patch v3 2/6] lib: add _notmuch_message_remove_indexed_terms

This has been rewritten fairly drastically compared to Daniel's version [3].

 [rfc patch v3 3/6] added notmuch_message_reindex

This is the same, except that I added simple performance tests.

 [rfc patch v3 4/6] add "notmuch reindex" subcommand


[1]: id:1471178598-9639-1-git-send-email-da...@tethera.net
[2]: id:20170402131646.29884-1-da...@tethera.net
[3]: id:20170402131646.29884-3-da...@tethera.net


Re: [gmailieer] fast fetch and two-way tag synchronization between notmuch and GMail

2017-04-03 Gaute Hope

Rafael Avila de Espindola writes on April 3, 2017 16:35:

> After a few issues with the initial sync this is working perfectly.
>
> Thanks a lot, it is a big improvement over mbsync.

Great, thanks for the patches! They made the initial sync much more robust.

I have a version that does not need a user API key working [0], but I
need to figure out whether it is safe first [1].

- Gaute

[0] https://github.com/gauteh/gmailieer/pull/9
[1] http://stackoverflow.com/questions/43173367/is-it-safe-to-distribute-client-id-and-client-secret-for-google-api-for-an-insta




Re: [gmailieer] fast fetch and two-way tag synchronization between notmuch and GMail

2017-04-03 Rafael Avila de Espindola
After a few issues with the initial sync this is working perfectly.

Thanks a lot, it is a big improvement over mbsync.

Cheers,
Rafael

Gaute Hope  writes:

> Hi,
>
>   'gmailieer' (or 'gmi') is a small program that can pull email and labels
> (and changes to labels) from your GMail account and store them locally in a
> maildir with the labels synchronized with a notmuch database. The
> changes to tags in the notmuch database may be pushed back remotely to
> your GMail account.
>
> The initial fetch of all emails takes some time, but synchronizing
> labels and tags, and checking for new messages, is usually done in 1-2
> seconds.
>
> It requires the most recent notmuch, the python googleapi bindings and
> tqdm.
>
> Disclaimer:
>
>   This is still experimental, but it does not have access to delete
>   e-mail on your account - only fetch and change labels, so damage
>   should be limited.
>
> Instructions and source code can be found here:
>
>   https://github.com/gauteh/gmailieer
>
> Regards, Gaute