Search mail from people with different addresses and incorrectly configured clients

2016-04-08 Thread Sebastian Fischmeister
Hi,

Sometimes, people's mail clients are configured incorrectly and do not
show the sender name. Often the same person uses different email
addresses. Sometimes they use different names for their alternate email
addresses.

When I search for an email originated from a specific person, I want the
search to cover all messages that match the different criteria mentioned
above.

Has anyone found a good way to use tags to aggregate emails from the
same person? Will it have some disadvantages? It sounds straightforward,
but the management of this would be quite an effort.

The alternative is to create long search strings that encompass all
email addresses of the person. This has less management effort, but I
would need to somewhere store all email addresses associated with a person.

  Sebastian


___
notmuch mailing list
notmuch@notmuchmail.org
https://notmuchmail.org/mailman/listinfo/notmuch


[PATCH v4 6/7] On deletion, replace with ghost when other active messages in thread

2016-04-08 Thread Daniel Kahn Gillmor
There is no need to add a ghost message upon deletion if there are no
other active messages in the thread.

Also, if the message being deleted was a ghost already, we can just go
ahead and delete it.
---
 lib/message.cc | 58 ++
 1 file changed, 42 insertions(+), 16 deletions(-)

diff --git a/lib/message.cc b/lib/message.cc
index 2399ab3..1b423b0 100644
--- a/lib/message.cc
+++ b/lib/message.cc
@@ -1044,11 +1044,14 @@ _notmuch_message_delete (notmuch_message_t *message)
 {
 notmuch_status_t status;
 Xapian::WritableDatabase *db;
-const char *mid, *tid;
+const char *mid, *tid, *query_string;
 notmuch_message_t *ghost;
 notmuch_private_status_t private_status;
 notmuch_database_t *notmuch;
-   
+notmuch_query_t *query;
+unsigned int count = 0;
+notmuch_bool_t is_ghost;
+
 mid = notmuch_message_get_message_id (message);
 tid = notmuch_message_get_thread_id (message);
 notmuch = message->notmuch;
@@ -1059,22 +1062,45 @@ _notmuch_message_delete (notmuch_message_t *message)
 
 db = static_cast  (notmuch->xapian_db);
 db->delete_document (message->doc_id);
-   
-/* and reintroduce a ghost in its place */
-ghost = _notmuch_message_create_for_message_id (notmuch, mid, 
_status);
-if (private_status == NOTMUCH_PRIVATE_STATUS_NO_DOCUMENT_FOUND) {
-   private_status = _notmuch_message_initialize_ghost (ghost, tid);
-   if (! private_status)
-   _notmuch_message_sync (ghost);
-} else if (private_status == NOTMUCH_PRIVATE_STATUS_SUCCESS) {
-   /* this is deeply weird, and we should not have gotten into
-  this state.  is there a better error message to return
-  here? */
-   return NOTMUCH_STATUS_DUPLICATE_MESSAGE_ID;
+
+/* if this was a ghost to begin with, we are done */
+private_status = _notmuch_message_has_term (message, "type", "ghost", 
_ghost);
+if (private_status)
+   return COERCE_STATUS (private_status,
+ "Error trying to determine whether message was a 
ghost");
+if (is_ghost)
+   return NOTMUCH_STATUS_SUCCESS;
+
+query_string = talloc_asprintf (message, "thread:%s", tid);
+query = notmuch_query_create (notmuch, query_string);
+if (query == NULL)
+   return NOTMUCH_STATUS_OUT_OF_MEMORY;
+status = notmuch_query_count_messages_st (query, );
+if (status) {
+   notmuch_query_destroy (query);
+   return status;
 }
 
-notmuch_message_destroy (ghost);
-return COERCE_STATUS (private_status, "Error converting to ghost message");
+if (count > 0) {
+   /* reintroduce a ghost in its place because there are still
+* other active messages in this thread: */
+   ghost = _notmuch_message_create_for_message_id (notmuch, mid, 
_status);
+   if (private_status == NOTMUCH_PRIVATE_STATUS_NO_DOCUMENT_FOUND) {
+   private_status = _notmuch_message_initialize_ghost (ghost, tid);
+   if (! private_status)
+   _notmuch_message_sync (ghost);
+   } else if (private_status == NOTMUCH_PRIVATE_STATUS_SUCCESS) {
+   /* this is deeply weird, and we should not have gotten
+  into this state.  is there a better error message to
+  return here? */
+   status = NOTMUCH_STATUS_DUPLICATE_MESSAGE_ID;
+   }
+
+   notmuch_message_destroy (ghost);
+   status = COERCE_STATUS (private_status, "Error converting to ghost 
message");
+}
+notmuch_query_destroy (query);
+return status;
 }
 
 /* Transform a blank message into a ghost message.  The caller must
-- 
2.8.0.rc3

___
notmuch mailing list
notmuch@notmuchmail.org
https://notmuchmail.org/mailman/listinfo/notmuch


[PATCH v4 7/7] complete ghost-on-removal-when-shared-thread-exists

2016-04-08 Thread Daniel Kahn Gillmor
To fully complete the ghost-on-removal-when-shared-thread-exists
proposal, we need to clear all ghost messages when the last active
message is removed from a thread.
---
 lib/message.cc   | 20 
 test/T590-thread-breakage.sh |  6 +-
 2 files changed, 21 insertions(+), 5 deletions(-)

diff --git a/lib/message.cc b/lib/message.cc
index 1b423b0..39dbe53 100644
--- a/lib/message.cc
+++ b/lib/message.cc
@@ -1098,6 +1098,26 @@ _notmuch_message_delete (notmuch_message_t *message)
 
notmuch_message_destroy (ghost);
status = COERCE_STATUS (private_status, "Error converting to ghost 
message");
+} else {
+   /* the thread is empty; drop all ghost messages from it */
+   notmuch_messages_t *messages;
+   status = _notmuch_query_search_documents (query,
+ "ghost",
+ );
+   if (status == NOTMUCH_STATUS_SUCCESS) {
+   notmuch_status_t last_error = NOTMUCH_STATUS_SUCCESS;
+   while (notmuch_messages_valid (messages)) {
+   message = notmuch_messages_get (messages);
+   status = _notmuch_message_delete (message);
+   if (status) /* we'll report the last failure we see;
+* if there is more than one failure, we
+* forget about previous ones */
+   last_error = status;
+   notmuch_message_destroy (message);
+   notmuch_messages_move_to_next (messages);
+   }
+   status = last_error;
+   }
 }
 notmuch_query_destroy (query);
 return status;
diff --git a/test/T590-thread-breakage.sh b/test/T590-thread-breakage.sh
index 81f27db..45446b9 100755
--- a/test/T590-thread-breakage.sh
+++ b/test/T590-thread-breakage.sh
@@ -121,10 +121,6 @@ notmuch new >/dev/null
 test_thread_count 0 'All messages gone: no threads'
 test_content_count apple 0
 test_content_count banana 0
-test_begin_subtest 'No ghosts should remain after full thread deletion'
-# this is known to fail; we are leaking ghost messages deliberately
-test_subtest_known_broken
-ghosts=$(../ghost-report ${MAIL_DIR}/.notmuch/xapian)
-test_expect_equal "$ghosts" "0"
+test_ghost_count 0 'No ghosts should remain after full thread deletion'
 
 test_done
-- 
2.8.0.rc3

___
notmuch mailing list
notmuch@notmuchmail.org
https://notmuchmail.org/mailman/listinfo/notmuch


[PATCH v4 1/7] test: add test-binary to print the number of ghost messages

2016-04-08 Thread Daniel Kahn Gillmor
From: David Bremner 

This one-liner seems preferable to the complications of depending on
delve, getting the binary name right and parsing the output.
---
 test/Makefile.local  |  4 
 test/ghost-report.cc | 12 
 2 files changed, 16 insertions(+)
 create mode 100644 test/ghost-report.cc

diff --git a/test/Makefile.local b/test/Makefile.local
index 30d420e..022f2cf 100644
--- a/test/Makefile.local
+++ b/test/Makefile.local
@@ -38,6 +38,9 @@ $(dir)/parse-time: $(dir)/parse-time.o 
parse-time-string/parse-time-string.o
 $(dir)/make-db-version: $(dir)/make-db-version.o
$(call quiet,CXX) $^ -o $@ $(LDFLAGS) $(XAPIAN_LDFLAGS)
 
+$(dir)/ghost-report: $(dir)/ghost-report.o
+   $(call quiet,CXX) $^ -o $@ $(LDFLAGS) $(XAPIAN_LDFLAGS)
+
 .PHONY: test check
 
 test_main_srcs=$(dir)/arg-test.c \
@@ -47,6 +50,7 @@ test_main_srcs=$(dir)/arg-test.c \
  $(dir)/smtp-dummy.c \
  $(dir)/symbol-test.cc \
  $(dir)/make-db-version.cc \
+ $(dir)/ghost-report.cc
 
 test_srcs=$(test_main_srcs) $(dir)/database-test.c
 
diff --git a/test/ghost-report.cc b/test/ghost-report.cc
new file mode 100644
index 000..1739be4
--- /dev/null
+++ b/test/ghost-report.cc
@@ -0,0 +1,12 @@
+#include 
+#include 
+
+int main(int argc, char **argv) {
+
+if (argc < 2) {
+   std::cerr << "usage: ghost-report xapian-dir" << std::endl;
+}
+
+Xapian::Database db(argv[1]);
+std::cout << db.get_termfreq("Tghost") << std::endl;
+}
-- 
2.8.0.rc3

___
notmuch mailing list
notmuch@notmuchmail.org
https://notmuchmail.org/mailman/listinfo/notmuch


[PATCH v4 3/7] fix thread breakage via ghost-on-removal

2016-04-08 Thread Daniel Kahn Gillmor
implement ghost-on-removal, the solution to T590-thread-breakage.sh
that just adds a ghost message after removing each message.

It leaks information about whether we've ever seen a given message id,
but it's a fairly simple implementation.

Note that _resolve_message_id_to_thread_id already introduces new
message_ids to the database, so i think just searching for a given
message ID may introduce the same metadata leakage.
---
 lib/message.cc   | 30 +++---
 test/T590-thread-breakage.sh | 25 -
 2 files changed, 39 insertions(+), 16 deletions(-)

diff --git a/lib/message.cc b/lib/message.cc
index 8d72ea2..435b78a 100644
--- a/lib/message.cc
+++ b/lib/message.cc
@@ -1037,20 +1037,44 @@ _notmuch_message_sync (notmuch_message_t *message)
 message->modified = FALSE;
 }
 
-/* Delete a message document from the database. */
+/* Delete a message document from the database, leaving a ghost
+ * message in its place */
 notmuch_status_t
 _notmuch_message_delete (notmuch_message_t *message)
 {
 notmuch_status_t status;
 Xapian::WritableDatabase *db;
+const char *mid, *tid;
+notmuch_message_t *ghost;
+notmuch_private_status_t private_status;
+notmuch_database_t *notmuch;
+   
+mid = notmuch_message_get_message_id (message);
+tid = notmuch_message_get_thread_id (message);
+notmuch = message->notmuch;
 
 status = _notmuch_database_ensure_writable (message->notmuch);
 if (status)
return status;
 
-db = static_cast  
(message->notmuch->xapian_db);
+db = static_cast  (notmuch->xapian_db);
 db->delete_document (message->doc_id);
-return NOTMUCH_STATUS_SUCCESS;
+   
+/* and reintroduce a ghost in its place */
+ghost = _notmuch_message_create_for_message_id (notmuch, mid, 
_status);
+if (private_status == NOTMUCH_PRIVATE_STATUS_NO_DOCUMENT_FOUND) {
+   private_status = _notmuch_message_initialize_ghost (ghost, tid);
+   if (! private_status)
+   _notmuch_message_sync (ghost);
+} else if (private_status == NOTMUCH_PRIVATE_STATUS_SUCCESS) {
+   /* this is deeply weird, and we should not have gotten into
+  this state.  is there a better error message to return
+  here? */
+   return NOTMUCH_STATUS_DUPLICATE_MESSAGE_ID;
+}
+
+notmuch_message_destroy (ghost);
+return COERCE_STATUS (private_status, "Error converting to ghost message");
 }
 
 /* Transform a blank message into a ghost message.  The caller must
diff --git a/test/T590-thread-breakage.sh b/test/T590-thread-breakage.sh
index 2b933f6..81f27db 100755
--- a/test/T590-thread-breakage.sh
+++ b/test/T590-thread-breakage.sh
@@ -96,20 +96,11 @@ notmuch new >/dev/null
 test_thread_count 1 'First message removed: still only one thread'
 test_content_count apple 0
 test_content_count banana 1
-test_begin_subtest 'should be one ghost after first message removed'
-test_subtest_known_broken
-ghosts=$(../ghost-report ${MAIL_DIR}/.notmuch/xapian)
-test_expect_equal "$ghosts" "1"
+test_ghost_count 1 'should be one ghost after first message removed'
 
 message_a
 notmuch new >/dev/null
-# this is known to fail (it shows 2 threads) because no "ghost
-# message" was created for message A when it was removed from the
-# index, despite message B still pointing to it.
-test_begin_subtest 'First message reappears: should return to the same thread'
-test_subtest_known_broken
-count=$(notmuch count --output=threads)
-test_expect_equal "$count" "1"
+test_thread_count 1 'First message reappears: should return to the same thread'
 test_content_count apple 1
 test_content_count banana 1
 test_ghost_count 0
@@ -119,13 +110,21 @@ notmuch new >/dev/null
 test_thread_count 1 'Removing second message: still only one thread'
 test_content_count apple 1
 test_content_count banana 0
-test_ghost_count 0 'No ghosts should remain after deletion of second message'
+test_begin_subtest 'No ghosts should remain after deletion of second message'
+# this is known to fail; we are leaking ghost messages deliberately
+test_subtest_known_broken
+ghosts=$(../ghost-report ${MAIL_DIR}/.notmuch/xapian)
+test_expect_equal "$ghosts" "0"
 
 rm -f ${MAIL_DIR}/cur/a
 notmuch new >/dev/null
 test_thread_count 0 'All messages gone: no threads'
 test_content_count apple 0
 test_content_count banana 0
-test_ghost_count 0
+test_begin_subtest 'No ghosts should remain after full thread deletion'
+# this is known to fail; we are leaking ghost messages deliberately
+test_subtest_known_broken
+ghosts=$(../ghost-report ${MAIL_DIR}/.notmuch/xapian)
+test_expect_equal "$ghosts" "0"
 
 test_done
-- 
2.8.0.rc3

___
notmuch mailing list
notmuch@notmuchmail.org
https://notmuchmail.org/mailman/listinfo/notmuch


[PATCH v4 5/7] Introduce _notmuch_message_has_term()

2016-04-08 Thread Daniel Kahn Gillmor
It can be useful to easily tell if a given message has a given term
associated with it.
---
 lib/message.cc| 35 +++
 lib/notmuch-private.h |  6 ++
 2 files changed, 41 insertions(+)

diff --git a/lib/message.cc b/lib/message.cc
index 435b78a..2399ab3 100644
--- a/lib/message.cc
+++ b/lib/message.cc
@@ -1217,6 +1217,41 @@ _notmuch_message_remove_term (notmuch_message_t *message,
 return NOTMUCH_PRIVATE_STATUS_SUCCESS;
 }
 
+notmuch_private_status_t
+_notmuch_message_has_term (notmuch_message_t *message,
+  const char *prefix_name,
+  const char *value,
+  notmuch_bool_t *result)
+{
+char *term;
+notmuch_bool_t out = FALSE;
+notmuch_private_status_t status = NOTMUCH_PRIVATE_STATUS_SUCCESS;
+
+if (value == NULL)
+   return NOTMUCH_PRIVATE_STATUS_NULL_POINTER;
+
+term = talloc_asprintf (message, "%s%s",
+   _find_prefix (prefix_name), value);
+
+if (strlen (term) > NOTMUCH_TERM_MAX)
+   return NOTMUCH_PRIVATE_STATUS_TERM_TOO_LONG;
+
+try {
+   /* Look for the exact term */
+   Xapian::TermIterator i = message->doc.termlist_begin ();
+   i.skip_to (term);
+   if (i != message->doc.termlist_end () &&
+   !strcmp ((*i).c_str (), term))
+   out = TRUE;
+} catch (Xapian::Error ) {
+   status = NOTMUCH_PRIVATE_STATUS_XAPIAN_EXCEPTION;
+}
+talloc_free (term);
+
+*result = out;
+return status;
+}
+
 notmuch_status_t
 notmuch_message_add_tag (notmuch_message_t *message, const char *tag)
 {
diff --git a/lib/notmuch-private.h b/lib/notmuch-private.h
index d95bf31..9280797 100644
--- a/lib/notmuch-private.h
+++ b/lib/notmuch-private.h
@@ -280,6 +280,12 @@ _notmuch_message_remove_term (notmuch_message_t *message,
  const char *value);
 
 notmuch_private_status_t
+_notmuch_message_has_term (notmuch_message_t *message,
+  const char *prefix_name,
+  const char *value,
+  notmuch_bool_t *result);
+
+notmuch_private_status_t
 _notmuch_message_gen_terms (notmuch_message_t *message,
const char *prefix_name,
const char *text);
-- 
2.8.0.rc3

___
notmuch mailing list
notmuch@notmuchmail.org
https://notmuchmail.org/mailman/listinfo/notmuch


[PATCH v4 4/7] Add internal functions to search for alternate doc types

2016-04-08 Thread Daniel Kahn Gillmor
Publicly we are only exposing the non-ghost documents (of "type"
"mail").  But internally we might want to inspect the ghost messages
as well.

This changeset adds two new private interfaces to queries to recover
information about alternate document types.
---
 lib/notmuch-private.h | 11 +++
 lib/query.cc  | 18 --
 2 files changed, 27 insertions(+), 2 deletions(-)

diff --git a/lib/notmuch-private.h b/lib/notmuch-private.h
index 5dd4770..d95bf31 100644
--- a/lib/notmuch-private.h
+++ b/lib/notmuch-private.h
@@ -477,6 +477,17 @@ void
 _notmuch_doc_id_set_remove (notmuch_doc_id_set_t *doc_ids,
 unsigned int doc_id);
 
+/* querying xapian documents by type (e.g. "mail" or "ghost"): */
+notmuch_status_t
+_notmuch_query_search_documents (notmuch_query_t *query,
+const char *type,
+notmuch_messages_t **out);
+
+notmuch_status_t
+_notmuch_query_count_documents (notmuch_query_t *query,
+   const char *type,
+   unsigned *count_out);
+
 /* message.cc */
 
 void
diff --git a/lib/query.cc b/lib/query.cc
index e627bfc..77a7926 100644
--- a/lib/query.cc
+++ b/lib/query.cc
@@ -187,6 +187,14 @@ notmuch_status_t
 notmuch_query_search_messages_st (notmuch_query_t *query,
  notmuch_messages_t **out)
 {
+return _notmuch_query_search_documents (query, "mail", out);
+}
+
+notmuch_status_t
+_notmuch_query_search_documents (notmuch_query_t *query,
+const char *type,
+notmuch_messages_t **out)
+{
 notmuch_database_t *notmuch = query->notmuch;
 const char *query_string = query->query_string;
 notmuch_mset_messages_t *messages;
@@ -208,7 +216,7 @@ notmuch_query_search_messages_st (notmuch_query_t *query,
Xapian::Enquire enquire (*notmuch->xapian_db);
Xapian::Query mail_query (talloc_asprintf (query, "%s%s",
   _find_prefix ("type"),
-  "mail"));
+  type));
Xapian::Query string_query, final_query, exclude_query;
Xapian::MSet mset;
Xapian::MSetIterator iterator;
@@ -554,6 +562,12 @@ notmuch_query_count_messages (notmuch_query_t *query)
 notmuch_status_t
 notmuch_query_count_messages_st (notmuch_query_t *query, unsigned *count_out)
 {
+return _notmuch_query_count_documents (query, "mail", count_out);
+}
+
+notmuch_status_t
+_notmuch_query_count_documents (notmuch_query_t *query, const char *type, 
unsigned *count_out)
+{
 notmuch_database_t *notmuch = query->notmuch;
 const char *query_string = query->query_string;
 Xapian::doccount count = 0;
@@ -562,7 +576,7 @@ notmuch_query_count_messages_st (notmuch_query_t *query, 
unsigned *count_out)
Xapian::Enquire enquire (*notmuch->xapian_db);
Xapian::Query mail_query (talloc_asprintf (query, "%s%s",
   _find_prefix ("type"),
-  "mail"));
+  type));
Xapian::Query string_query, final_query, exclude_query;
Xapian::MSet mset;
unsigned int flags = (Xapian::QueryParser::FLAG_BOOLEAN |
-- 
2.8.0.rc3

___
notmuch mailing list
notmuch@notmuchmail.org
https://notmuchmail.org/mailman/listinfo/notmuch


[PATCH v4 2/7] test thread breakage when messages are removed and re-added

2016-04-08 Thread Daniel Kahn Gillmor
This test (T590-thread-breakage.sh) has known-broken subtests.

If you have a two-message thread where message "B" is in-reply-to "A",
notmuch rightly sees this as a single thread.

But if you:

 * remove "A" from the message store
 * run "notmuch new"
 * add "A" back into the message store
 * re-run "notmuch new"

Then notmuch sees the messages as distinct threads.

This happens because if you insert "B" initially (before anything is
known about "A"), then a "ghost message" gets added to the database in
reference to "A" that is in the same thread, which "A" takes over when
it appears.

But if "A" is subsequently removed, no ghost message is retained, so
when "A" appears, it is treated as a new thread.

I see a few options to fix this:

ghost-on-removal


We could unilaterally add a ghost upon message removal.  This has a
few disadvantages: the message index would leak information about what
messages the user has ever been exposed to, and we also create a
perpetually-growing dataset -- the ghosts can never be removed.

ghost-on-removal-when-shared-thread-exists
--

We could add a ghost upon message removal iff there are other
non-ghost messages with the same thread ID.

We'd also need to remove all ghost messages that share a thread when
the last non-ghost message in that thread is removed.

This still has a bit of information leakage, though: the message index
would reveal that i've seen a newer message in a thread, even if i had
deleted it from my message store

track-dependencies
--

rather than a simple "ghost-message" we could store all the (A,B)
message-reference pairs internally, showing which messages A reference
which other messages B.

Then removal of message X would require deleting all message-reference
pairs (X,B), and only deleting a ghost message if no (A,X) reference
pair exists.

This requires modifying the database by adding a new and fairly weird
table that would need to be indexed by both columns.  I don't know
whether xapian has nice ways to do that.

scan-dependencies
-

Without modifying the database, we could do something less efficient.

Upon removal of message X, we could scan the headers of all non-ghost
messages that share a thread with X.  If any of those messages refers
to X, we would add a ghost message.  If none of them do, then we would
just drop X entirely from the table.

-

One risk of attempted fixes to this problem is that we could fail to
remove the search term indexes entirely.  This test contains
additional subtests to guard against that.

This test also ensures that the right number of ghost messages exist
in each situation; this will help us ensure we don't accumulate ghosts
indefinitely or leak too much information about what messages we've
seen or not seen, while still making it easy to reassemble threads
when messages come in out-of-order.
---
 test/T590-thread-breakage.sh | 131 +++
 1 file changed, 131 insertions(+)
 create mode 100755 test/T590-thread-breakage.sh

diff --git a/test/T590-thread-breakage.sh b/test/T590-thread-breakage.sh
new file mode 100755
index 000..2b933f6
--- /dev/null
+++ b/test/T590-thread-breakage.sh
@@ -0,0 +1,131 @@
+#!/usr/bin/env bash
+#
+# Copyright (c) 2016 Daniel Kahn Gillmor
+#
+
+test_description='thread breakage during reindexing
+
+notmuch uses ghost documents to track messages we have seen references
+to but have never seen.  Regardless of the order of delivery, message
+deletion, and reindexing, the list of ghost messages for a given
+stored corpus should not vary, so that threads can be reassmebled
+cleanly.
+
+In practice, we accept a small amount of variation (and therefore
+traffic pattern metadata leakage to be stored in the index) for the
+sake of efficiency.
+
+This test also embeds some subtests to ensure that indexing actually
+works properly and attempted fixes to threading issues do not break
+the expected contents of the index.'
+
+. ./test-lib.sh || exit 1
+
+message_a() {
+mkdir -p ${MAIL_DIR}/cur
+cat > ${MAIL_DIR}/cur/a <
+From: Alice 
+To: Bob 
+Date: Thu, 31 Mar 2016 20:10:00 -0400
+
+This is the first message in the thread.
+Apple
+EOF
+}
+
+message_b() {
+mkdir -p ${MAIL_DIR}/cur
+cat > ${MAIL_DIR}/cur/b <
+In-Reply-To: 
+References: 
+From: Bob 
+To: Alice 
+Date: Thu, 31 Mar 2016 20:15:00 -0400
+
+This is the second message in the thread.
+Banana
+EOF
+}
+
+
+test_content_count() {
+test_begin_subtest "${3:-looking for $2 instance of '$1'}"
+count=$(notmuch count --output=threads "$1")
+test_expect_equal "$count" "$2"
+}
+
+test_thread_count() {
+test_begin_subtest "${2:-Expecting $1 thread(s)}"
+count=$(notmuch count --output=threads)
+test_expect_equal "$count" "$1"
+}
+
+test_ghost_count() {
+

Re: [PATCH v2 2/7] verify during thread-breakage that messages are removed as well

2016-04-08 Thread Daniel Kahn Gillmor
On Tue 2016-04-05 22:20:45 -0300, David Bremner wrote:
> Daniel Kahn Gillmor  writes:
>
>>  
>> +test_subject_count() {
>> +notmuch new >/dev/null
>> +test_begin_subtest "${3:-looking for $2 instance of '$1'}"
>> +count=$(notmuch count --output=threads "$1")
>> +test_expect_equal "$count" "$2"
>> +}
>
> It's confusing that this doesn't have anything to do with the subject:
> prefix or the corresponding header.

Sigh.  sorry, i just realized i missed this absolutely correct comment
when preparing the v3 series.  v4 is on its way, and hopefully it
addresses all the excellent feedback i got here.

apologies for the noise.

   --dkg
___
notmuch mailing list
notmuch@notmuchmail.org
https://notmuchmail.org/mailman/listinfo/notmuch


[PATCH 3/4] configure: add test for default xapian backend

2016-04-08 Thread David Bremner
This is mainly for the test suite.  We already expect the tests to be
run in the same environment as configure was run, at least to get the
name of the python interpreter. So we are not really imposing a new
restriction.
---
 configure| 26 +-
 test/test-lib.sh | 11 +++
 2 files changed, 36 insertions(+), 1 deletion(-)

diff --git a/configure b/configure
index eb6dbac..4fc31cc 100755
--- a/configure
+++ b/configure
@@ -371,7 +371,25 @@ if [ ${have_xapian} = "1" ]; then
 esac
 fi
 
-
+default_xapian_backend=""
+if [ ${have_xapian} = "1" ]; then
+printf "Testing default Xapian backend... "
+cat >_default_backend.cc <
+int main(int argc, char** argv) {
+   Xapian::WritableDatabase db("test.db",Xapian::DB_CREATE_OR_OPEN);
+}
+EOF
+${CXX} ${CXXLAGS} ${xapian_cxxflags} _default_backend.cc -o 
_default_backend ${xapian_ldflags}
+./_default_backend
+if [ -f test.db/iamglass ]; then
+   default_xapian_backend=glass
+else
+   default_xapian_backend=chert
+fi
+printf "${default_xapian_backend}\n";
+rm -rf test.db _default_backend _default_backend.cc
+fi
 # we need to have a version >= 2.6.5 to avoid a crypto bug. We need
 # 2.6.7 for permissive "From " header handling.
 GMIME_MINVER=2.6.7
@@ -1001,6 +1019,9 @@ LINKER_RESOLVES_LIBRARY_DEPENDENCIES = 
${linker_resolves_library_dependencies}
 XAPIAN_CXXFLAGS = ${xapian_cxxflags}
 XAPIAN_LDFLAGS = ${xapian_ldflags}
 
+# Which backend will Xapian use by default?
+DEFAULT_XAPIAN_BACKEND = ${default_xapian_backend}
+
 # Flags needed to compile and link against GMime
 GMIME_CFLAGS = ${gmime_cflags}
 GMIME_LDFLAGS = ${gmime_ldflags}
@@ -1077,6 +1098,9 @@ cat > sh.config /dev/null && test_set_prereq SYMLINKS
 rm -f y
 
+# convert variable from configure to more convenient form
+case "$NOTMUCH_DEFAULT_XAPIAN_BACKEND" in
+glass)
+   db_ending=glass
+;;
+chert)
+   db_ending=DB
+;;
+*)
+   error "Unknown Xapian backend $NOTMUCH_DEFAULT_XAPIAN_BACKEND"
+esac
 # declare prerequisites for external binaries used in tests
 test_declare_external_prereq dtach
 test_declare_external_prereq emacs
-- 
2.8.0.rc3

___
notmuch mailing list
notmuch@notmuchmail.org
https://notmuchmail.org/mailman/listinfo/notmuch


[PATCH 4/4] test: cope with glass backend file naming variations

2016-04-08 Thread David Bremner
In several places in the test suite we intentionally corrupt the Xapian
database in order to test error handling. This corruption is specific to
the on-disk organization of the database, and that changed with the
glass backend. We use the previously computed default backend to make
the tests adapt to changing names.
---
 test/T050-new.sh   | 2 +-
 test/T060-count.sh | 4 ++--
 test/T150-tagging.sh   | 4 ++--
 test/T360-symbol-hiding.sh | 4 ++--
 test/T560-lib-error.sh | 4 ++--
 5 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/test/T050-new.sh b/test/T050-new.sh
index 93a6fa9..df9e89a 100755
--- a/test/T050-new.sh
+++ b/test/T050-new.sh
@@ -284,7 +284,7 @@ notmuch config set new.tags $OLDCONFIG
 
 
 test_begin_subtest "Xapian exception: read only files"
-chmod u-w  ${MAIL_DIR}/.notmuch/xapian/*.DB
+chmod u-w  ${MAIL_DIR}/.notmuch/xapian/*.${db_ending}
 output=$(NOTMUCH_NEW --debug 2>&1 | sed 's/: .*$//' )
 chmod u+w  ${MAIL_DIR}/.notmuch/xapian/*.DB
 test_expect_equal "$output" "A Xapian exception occurred opening database"
diff --git a/test/T060-count.sh b/test/T060-count.sh
index 3fec94e..0ac8314 100755
--- a/test/T060-count.sh
+++ b/test/T060-count.sh
@@ -95,7 +95,7 @@ test_expect_equal_file EXPECTED OUTPUT
 
 backup_database
 test_begin_subtest "error message for database open"
-dd if=/dev/zero of="${MAIL_DIR}/.notmuch/xapian/postlist.DB" count=3
+dd if=/dev/zero of="${MAIL_DIR}/.notmuch/xapian/postlist.${db_ending}" count=3
 notmuch count '*' 2>OUTPUT 1>/dev/null
 output=$(sed 's/^\(A Xapian exception [^:]*\):.*$/\1/' OUTPUT)
 test_expect_equal "${output}" "A Xapian exception occurred opening database"
@@ -105,7 +105,7 @@ cat < count-files.gdb
 set breakpoint pending on
 break count_files
 commands
-shell cp /dev/null ${MAIL_DIR}/.notmuch/xapian/postlist.DB
+shell cp /dev/null ${MAIL_DIR}/.notmuch/xapian/postlist.${db_ending}
 continue
 end
 run
diff --git a/test/T150-tagging.sh b/test/T150-tagging.sh
index 8adcabc..6fd6a18 100755
--- a/test/T150-tagging.sh
+++ b/test/T150-tagging.sh
@@ -287,9 +287,9 @@ test_expect_code 1 "Empty tag names" 'notmuch tag + One'
 test_expect_code 1 "Tag name beginning with -" 'notmuch tag +- One'
 
 test_begin_subtest "Xapian exception: read only files"
-chmod u-w  ${MAIL_DIR}/.notmuch/xapian/*.DB
+chmod u-w  ${MAIL_DIR}/.notmuch/xapian/*.${db_ending}
 output=$(notmuch tag +something '*' 2>&1 | sed 's/: .*$//' )
-chmod u+w  ${MAIL_DIR}/.notmuch/xapian/*.DB
+chmod u+w  ${MAIL_DIR}/.notmuch/xapian/{*.DB,*.glass}
 test_expect_equal "$output" "A Xapian exception occurred opening database"
 
 test_done
diff --git a/test/T360-symbol-hiding.sh b/test/T360-symbol-hiding.sh
index 89e7f16..3f18ec1 100755
--- a/test/T360-symbol-hiding.sh
+++ b/test/T360-symbol-hiding.sh
@@ -15,11 +15,11 @@ test_begin_subtest 'running test' run_test
 mkdir -p ${PWD}/fakedb/.notmuch
 ( 
LD_LIBRARY_PATH="$TEST_DIRECTORY/../lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}" \
 $TEST_DIRECTORY/symbol-test ${PWD}/fakedb ${PWD}/nonexistent \
-2>&1 | notmuch_dir_sanitize | sed "s,\`,\',g") > OUTPUT
+2>&1 | notmuch_dir_sanitize | sed -e "s,\`,\',g" -e 
"s,${NOTMUCH_DEFAULT_XAPIAN_BACKEND},backend,g") > OUTPUT
 
 cat < EXPECTED
 A Xapian exception occurred opening database: Couldn't stat 
'CWD/fakedb/.notmuch/xapian'
-caught No chert database found at path 'CWD/nonexistent'
+caught No backend database found at path 'CWD/nonexistent'
 EOF
 test_expect_equal_file EXPECTED OUTPUT
 
diff --git a/test/T560-lib-error.sh b/test/T560-lib-error.sh
index 49d3674..087c6bd 100755
--- a/test/T560-lib-error.sh
+++ b/test/T560-lib-error.sh
@@ -189,7 +189,7 @@ Path already exists: MAIL_DIR
 EOF
 test_expect_equal_file EXPECTED OUTPUT
 
-cat <<'EOF' > c_head
+cat < c_head
 #include 
 #include 
 #include 
@@ -210,7 +210,7 @@ int main (int argc, char** argv)
  fprintf (stderr, "error opening database: %d %s\n", stat, msg ? msg : "");
  exit (1);
}
-   path = talloc_asprintf (db, "%s/.notmuch/xapian/postlist.DB", argv[1]);
+   path = talloc_asprintf (db, "%s/.notmuch/xapian/postlist.${db_ending}", 
argv[1]);
fd = open(path,O_WRONLY|O_TRUNC);
if (fd < 0) {
fprintf (stderr, "error opening %s\n", argv[1]);
-- 
2.8.0.rc3

___
notmuch mailing list
notmuch@notmuchmail.org
https://notmuchmail.org/mailman/listinfo/notmuch


xapian 1.3.5 compatibility, v3

2016-04-08 Thread David Bremner
Tomi convinced me (with his review of my previous attempt) to go the
route of testing the default xapian backend once in configure, and
using that variable elsewhere.

This obsoletes the series(s)
 id:1459855082-5715-2-git-send-email-da...@tethera.net
 id:1459938431-28670-2-git-send-email-da...@tethera.net
___
notmuch mailing list
notmuch@notmuchmail.org
https://notmuchmail.org/mailman/listinfo/notmuch


[PATCH v3 6/7] On deletion, replace with ghost when other active messages in thread

2016-04-08 Thread Daniel Kahn Gillmor
There is no need to add a ghost message upon deletion if there are no
other active messages in the thread.

Also, if the message being deleted was a ghost already, we can just go
ahead and delete it.
---
 lib/message.cc | 58 ++
 1 file changed, 42 insertions(+), 16 deletions(-)

diff --git a/lib/message.cc b/lib/message.cc
index 2399ab3..1b423b0 100644
--- a/lib/message.cc
+++ b/lib/message.cc
@@ -1044,11 +1044,14 @@ _notmuch_message_delete (notmuch_message_t *message)
 {
 notmuch_status_t status;
 Xapian::WritableDatabase *db;
-const char *mid, *tid;
+const char *mid, *tid, *query_string;
 notmuch_message_t *ghost;
 notmuch_private_status_t private_status;
 notmuch_database_t *notmuch;
-   
+notmuch_query_t *query;
+unsigned int count = 0;
+notmuch_bool_t is_ghost;
+
 mid = notmuch_message_get_message_id (message);
 tid = notmuch_message_get_thread_id (message);
 notmuch = message->notmuch;
@@ -1059,22 +1062,45 @@ _notmuch_message_delete (notmuch_message_t *message)
 
 db = static_cast  (notmuch->xapian_db);
 db->delete_document (message->doc_id);
-   
-/* and reintroduce a ghost in its place */
-ghost = _notmuch_message_create_for_message_id (notmuch, mid, 
_status);
-if (private_status == NOTMUCH_PRIVATE_STATUS_NO_DOCUMENT_FOUND) {
-   private_status = _notmuch_message_initialize_ghost (ghost, tid);
-   if (! private_status)
-   _notmuch_message_sync (ghost);
-} else if (private_status == NOTMUCH_PRIVATE_STATUS_SUCCESS) {
-   /* this is deeply weird, and we should not have gotten into
-  this state.  is there a better error message to return
-  here? */
-   return NOTMUCH_STATUS_DUPLICATE_MESSAGE_ID;
+
+/* if this was a ghost to begin with, we are done */
+private_status = _notmuch_message_has_term (message, "type", "ghost", 
_ghost);
+if (private_status)
+   return COERCE_STATUS (private_status,
+ "Error trying to determine whether message was a 
ghost");
+if (is_ghost)
+   return NOTMUCH_STATUS_SUCCESS;
+
+query_string = talloc_asprintf (message, "thread:%s", tid);
+query = notmuch_query_create (notmuch, query_string);
+if (query == NULL)
+   return NOTMUCH_STATUS_OUT_OF_MEMORY;
+status = notmuch_query_count_messages_st (query, );
+if (status) {
+   notmuch_query_destroy (query);
+   return status;
 }
 
-notmuch_message_destroy (ghost);
-return COERCE_STATUS (private_status, "Error converting to ghost message");
+if (count > 0) {
+   /* reintroduce a ghost in its place because there are still
+* other active messages in this thread: */
+   ghost = _notmuch_message_create_for_message_id (notmuch, mid, 
_status);
+   if (private_status == NOTMUCH_PRIVATE_STATUS_NO_DOCUMENT_FOUND) {
+   private_status = _notmuch_message_initialize_ghost (ghost, tid);
+   if (! private_status)
+   _notmuch_message_sync (ghost);
+   } else if (private_status == NOTMUCH_PRIVATE_STATUS_SUCCESS) {
+   /* this is deeply weird, and we should not have gotten
+  into this state.  is there a better error message to
+  return here? */
+   status = NOTMUCH_STATUS_DUPLICATE_MESSAGE_ID;
+   }
+
+   notmuch_message_destroy (ghost);
+   status = COERCE_STATUS (private_status, "Error converting to ghost 
message");
+}
+notmuch_query_destroy (query);
+return status;
 }
 
 /* Transform a blank message into a ghost message.  The caller must
-- 
2.8.0.rc3

___
notmuch mailing list
notmuch@notmuchmail.org
https://notmuchmail.org/mailman/listinfo/notmuch


[PATCH v3 1/7] test: add test-binary to print the number of ghost messages

2016-04-08 Thread Daniel Kahn Gillmor
From: David Bremner 

This one-liner seems preferable to the complications of depending on
delve, getting the binary name right and parsing the output.
---
 test/Makefile.local  |  4 
 test/ghost-report.cc | 12 
 2 files changed, 16 insertions(+)
 create mode 100644 test/ghost-report.cc

diff --git a/test/Makefile.local b/test/Makefile.local
index 30d420e..022f2cf 100644
--- a/test/Makefile.local
+++ b/test/Makefile.local
@@ -38,6 +38,9 @@ $(dir)/parse-time: $(dir)/parse-time.o 
parse-time-string/parse-time-string.o
 $(dir)/make-db-version: $(dir)/make-db-version.o
$(call quiet,CXX) $^ -o $@ $(LDFLAGS) $(XAPIAN_LDFLAGS)
 
+$(dir)/ghost-report: $(dir)/ghost-report.o
+   $(call quiet,CXX) $^ -o $@ $(LDFLAGS) $(XAPIAN_LDFLAGS)
+
 .PHONY: test check
 
 test_main_srcs=$(dir)/arg-test.c \
@@ -47,6 +50,7 @@ test_main_srcs=$(dir)/arg-test.c \
  $(dir)/smtp-dummy.c \
  $(dir)/symbol-test.cc \
  $(dir)/make-db-version.cc \
+ $(dir)/ghost-report.cc
 
 test_srcs=$(test_main_srcs) $(dir)/database-test.c
 
diff --git a/test/ghost-report.cc b/test/ghost-report.cc
new file mode 100644
index 000..1739be4
--- /dev/null
+++ b/test/ghost-report.cc
@@ -0,0 +1,12 @@
+#include 
+#include 
+
+int main(int argc, char **argv) {
+
+if (argc < 2) {
+   std::cerr << "usage: ghost-report xapian-dir" << std::endl;
+}
+
+Xapian::Database db(argv[1]);
+std::cout << db.get_termfreq("Tghost") << std::endl;
+}
-- 
2.8.0.rc3

___
notmuch mailing list
notmuch@notmuchmail.org
https://notmuchmail.org/mailman/listinfo/notmuch


[PATCH v3 5/7] Introduce _notmuch_message_has_term()

2016-04-08 Thread Daniel Kahn Gillmor
It can be useful to easily tell if a given message has a given term
associated with it.
---
 lib/message.cc| 35 +++
 lib/notmuch-private.h |  6 ++
 2 files changed, 41 insertions(+)

diff --git a/lib/message.cc b/lib/message.cc
index 435b78a..2399ab3 100644
--- a/lib/message.cc
+++ b/lib/message.cc
@@ -1217,6 +1217,41 @@ _notmuch_message_remove_term (notmuch_message_t *message,
 return NOTMUCH_PRIVATE_STATUS_SUCCESS;
 }
 
+notmuch_private_status_t
+_notmuch_message_has_term (notmuch_message_t *message,
+  const char *prefix_name,
+  const char *value,
+  notmuch_bool_t *result)
+{
+char *term;
+notmuch_bool_t out = FALSE;
+notmuch_private_status_t status = NOTMUCH_PRIVATE_STATUS_SUCCESS;
+
+if (value == NULL)
+   return NOTMUCH_PRIVATE_STATUS_NULL_POINTER;
+
+term = talloc_asprintf (message, "%s%s",
+   _find_prefix (prefix_name), value);
+
+if (strlen (term) > NOTMUCH_TERM_MAX)
+   return NOTMUCH_PRIVATE_STATUS_TERM_TOO_LONG;
+
+try {
+   /* Look for the exact term */
+   Xapian::TermIterator i = message->doc.termlist_begin ();
+   i.skip_to (term);
+   if (i != message->doc.termlist_end () &&
+   !strcmp ((*i).c_str (), term))
+   out = TRUE;
+} catch (Xapian::Error ) {
+   status = NOTMUCH_PRIVATE_STATUS_XAPIAN_EXCEPTION;
+}
+talloc_free (term);
+
+*result = out;
+return status;
+}
+
 notmuch_status_t
 notmuch_message_add_tag (notmuch_message_t *message, const char *tag)
 {
diff --git a/lib/notmuch-private.h b/lib/notmuch-private.h
index d95bf31..9280797 100644
--- a/lib/notmuch-private.h
+++ b/lib/notmuch-private.h
@@ -280,6 +280,12 @@ _notmuch_message_remove_term (notmuch_message_t *message,
  const char *value);
 
 notmuch_private_status_t
+_notmuch_message_has_term (notmuch_message_t *message,
+  const char *prefix_name,
+  const char *value,
+  notmuch_bool_t *result);
+
+notmuch_private_status_t
 _notmuch_message_gen_terms (notmuch_message_t *message,
const char *prefix_name,
const char *text);
-- 
2.8.0.rc3

___
notmuch mailing list
notmuch@notmuchmail.org
https://notmuchmail.org/mailman/listinfo/notmuch


[PATCH v3 2/7] test thread breakage when messages are removed and re-added

2016-04-08 Thread Daniel Kahn Gillmor
This test (T590-thread-breakage.sh) has known-broken subtests.

If you have a two-message thread where message "B" is in-reply-to "A",
notmuch rightly sees this as a single thread.

But if you:

 * remove "A" from the message store
 * run "notmuch new"
 * add "A" back into the message store
 * re-run "notmuch new"

Then notmuch sees the messages as distinct threads.

This happens because if you insert "B" initially (before anything is
known about "A"), then a "ghost message" gets added to the database in
reference to "A" that is in the same thread, which "A" takes over when
it appears.

But if "A" is subsequently removed, no ghost message is retained, so
when "A" appears, it is treated as a new thread.

I see a few options to fix this:

ghost-on-removal


We could unilaterally add a ghost upon message removal.  This has a
few disadvantages: the message index would leak information about what
messages the user has ever been exposed to, and we also create a
perpetually-growing dataset -- the ghosts can never be removed.

ghost-on-removal-when-shared-thread-exists
--

We could add a ghost upon message removal iff there are other
non-ghost messages with the same thread ID.

We'd also need to remove all ghost messages that share a thread when
the last non-ghost message in that thread is removed.

This still has a bit of information leakage, though: the message index
would reveal that i've seen a newer message in a thread, even if i had
deleted it from my message store

track-dependencies
--

rather than a simple "ghost-message" we could store all the (A,B)
message-reference pairs internally, showing which messages A reference
which other messages B.

Then removal of message X would require deleting all message-reference
pairs (X,B), and only deleting a ghost message if no (A,X) reference
pair exists.

This requires modifying the database by adding a new and fairly weird
table that would need to be indexed by both columns.  I don't know
whether xapian has nice ways to do that.

scan-dependencies
-

Without modifying the database, we could do something less efficient.

Upon removal of message X, we could scan the headers of all non-ghost
messages that share a thread with X.  If any of those messages refers
to X, we would add a ghost message.  If none of them do, then we would
just drop X entirely from the table.

-

One risk of attempted fixes to this problem is that we could fail to
remove the search term indexes entirely.  This test contains
additional subtests to guard against that.

This test also ensures that the right number of ghost messages exist
in each situation; this will help us ensure we don't accumulate ghosts
indefinitely or leak too much information about what messages we've
seen or not seen, while still making it easy to reassemble threads
when messages come in out-of-order.
---
 test/T590-thread-breakage.sh | 131 +++
 1 file changed, 131 insertions(+)
 create mode 100755 test/T590-thread-breakage.sh

diff --git a/test/T590-thread-breakage.sh b/test/T590-thread-breakage.sh
new file mode 100755
index 000..24ffb42
--- /dev/null
+++ b/test/T590-thread-breakage.sh
@@ -0,0 +1,131 @@
+#!/usr/bin/env bash
+#
+# Copyright (c) 2016 Daniel Kahn Gillmor
+#
+
+test_description='thread breakage during reindexing
+
+notmuch uses ghost documents to track messages we have seen references
+to but have never seen.  Regardless of the order of delivery, message
+deletion, and reindexing, the list of ghost messages for a given
+stored corpus should not vary, so that threads can be reassmebled
+cleanly.
+
+In practice, we accept a small amount of variation (and therefore
+traffic pattern metadata leakage to be stored in the index) for the
+sake of efficiency.
+
+This test also embeds some subtests to ensure that indexing actually
+works properly and attempted fixes to threading issues do not break
+the expected contents of the index.'
+
+. ./test-lib.sh || exit 1
+
+message_a() {
+mkdir -p ${MAIL_DIR}/cur
+cat > ${MAIL_DIR}/cur/a <
+From: Alice 
+To: Bob 
+Date: Thu, 31 Mar 2016 20:10:00 -0400
+
+This is the first message in the thread.
+Apple
+EOF
+}
+
+message_b() {
+mkdir -p ${MAIL_DIR}/cur
+cat > ${MAIL_DIR}/cur/b <
+In-Reply-To: 
+References: 
+From: Bob 
+To: Alice 
+Date: Thu, 31 Mar 2016 20:15:00 -0400
+
+This is the second message in the thread.
+Banana
+EOF
+}
+
+
+test_subject_count() {
+test_begin_subtest "${3:-looking for $2 instance of '$1'}"
+count=$(notmuch count --output=threads "$1")
+test_expect_equal "$count" "$2"
+}
+
+test_thread_count() {
+test_begin_subtest "${2:-Expecting $1 thread(s)}"
+count=$(notmuch count --output=threads)
+test_expect_equal "$count" "$1"
+}
+
+test_ghost_count() {
+