Re: [PATCH] WIP: add all subjects to value.

2018-05-06 Thread David Bremner
Daniel Kahn Gillmor  writes:

>
> i think either we care about careful ordering (which strikes me as
> delicate and potentially difficult to handle, esp. when you get into the
> headers changing depending on whether you index the cleartext or not),
> or we should avoid injecting duplicates.
>

I'm not opposed to keeping the set of unique headers (although there is
a bit of a tarpit in deciding when two Subjects are the "same"). The
question I'm not sure of a good answer to is how we handle display. In
particular this patch doesn't make any effort to keep the subject
consistent with the Date and From headers. From the point of view of
displaying things, it makes sense to keep "parallel lists" in the value
slots by doing the same trick as with Subject. On the gripping hand,
sensible date search really requires one value per document.  In
principle we could make more extensive use of the document "data
area". This can store an arbitrary string, and is the recommended place
to store information needed to display the document.

___
notmuch mailing list
notmuch@notmuchmail.org
https://notmuchmail.org/mailman/listinfo/notmuch


Re: [PATCH] WIP: add all subjects to value.

2018-05-04 Thread Daniel Kahn Gillmor
I like the ideas behind this patch.  it's labeled WIP, presumably
because it doesn't handle ordering the subject lines, right?

further comments below inline:

On Tue 2017-12-19 09:15:40 -0400, David Bremner wrote:
> +/* Remove all values from a document; currently these are
> +   all regenerated during indexing */
> +
> +notmuch_private_status_t
> +_notmuch_message_remove_values (notmuch_message_t *message)
> +{
> +try {
> + message->doc.clear_values ();
> + message->modified = TRUE;
> +} catch (const Xapian::Error ) {
> + notmuch_database_t *notmuch = message->notmuch;
> +
> + if (!notmuch->exception_reported) {
> + _notmuch_database_log(_notmuch_message_database (message), "A 
> Xapian exception occurred creating message: %s\n",
> +   error.get_msg().c_str());

is this the right exception message?  seems like it should talk about
removing values rather than "creating message"

> @@ -1114,7 +1144,13 @@ _notmuch_message_set_header_values (notmuch_message_t 
> *message,
>  message->doc.add_value (NOTMUCH_VALUE_TIMESTAMP,
>   Xapian::sortable_serialise (time_value));
>  message->doc.add_value (NOTMUCH_VALUE_FROM, from);
> -message->doc.add_value (NOTMUCH_VALUE_SUBJECT, subject);
> +
> +old_subject = message->doc.get_value (NOTMUCH_VALUE_SUBJECT);
> +if (old_subject.empty())
> + message->doc.add_value (NOTMUCH_VALUE_SUBJECT, subject);
> +else
> + message->doc.add_value (NOTMUCH_VALUE_SUBJECT, old_subject + "\n" + 
> subject);

here we're appending the subject to the tail -- so the first injected
subject line stays at the top.

it looks like it will re-add the same subject line multiple times,
even if already present.  so if i get 3 copies of a message with subject
"foo" then the value slot will be "foo\nfoo\nfoo".  is that desirable?

i think either we care about careful ordering (which strikes me as
delicate and potentially difficult to handle, esp. when you get into the
headers changing depending on whether you index the cleartext or not),
or we should avoid injecting duplicates.

wdyt?

--dkg


signature.asc
Description: PGP signature
___
notmuch mailing list
notmuch@notmuchmail.org
https://notmuchmail.org/mailman/listinfo/notmuch


[PATCH] WIP: add all subjects to value.

2017-12-19 Thread David Bremner
---
 lib/add-message.cc |  3 +--
 lib/message.cc | 52 --
 test/T670-duplicate-mid.sh |  1 -
 3 files changed, 47 insertions(+), 9 deletions(-)

diff --git a/lib/add-message.cc b/lib/add-message.cc
index f5fac8be..095a1f37 100644
--- a/lib/add-message.cc
+++ b/lib/add-message.cc
@@ -538,8 +538,7 @@ notmuch_database_index_file (notmuch_database_t *notmuch,
if (ret)
goto DONE;
 
-   if (is_new || is_ghost)
-   _notmuch_message_set_header_values (message, date, from, subject);
+   _notmuch_message_set_header_values (message, date, from, subject);
 
if (!indexopts) {
def_indexopts = notmuch_database_get_default_indexopts (notmuch);
diff --git a/lib/message.cc b/lib/message.cc
index d5db89b6..c624e145 100644
--- a/lib/message.cc
+++ b/lib/message.cc
@@ -26,6 +26,8 @@
 
 #include 
 
+#include 
+
 struct _notmuch_message {
 notmuch_database_t *notmuch;
 Xapian::docid doc_id;
@@ -514,8 +516,14 @@ notmuch_message_get_header (notmuch_message_t *message, 
const char *header)
 
 if (slot != Xapian::BAD_VALUENO) {
try {
-   std::string value = message->doc.get_value (slot);
-
+   std::string raw = message->doc.get_value (slot);
+   std::string value;
+   if (slot == NOTMUCH_VALUE_SUBJECT) {
+   std::istringstream f(raw);
+   std::getline(f, value);
+   } else {
+   value = raw;
+   }
/* If we have NOTMUCH_FEATURE_FROM_SUBJECT_ID_VALUES, then
 * empty values indicate empty headers.  If we don't, then
 * it could just mean we didn't record the header. */
@@ -655,6 +663,27 @@ _notmuch_message_remove_indexed_terms (notmuch_message_t 
*message)
 }
 return NOTMUCH_PRIVATE_STATUS_SUCCESS;
 }
+/* Remove all values from a document; currently these are
+   all regenerated during indexing */
+
+notmuch_private_status_t
+_notmuch_message_remove_values (notmuch_message_t *message)
+{
+try {
+   message->doc.clear_values ();
+   message->modified = TRUE;
+} catch (const Xapian::Error ) {
+   notmuch_database_t *notmuch = message->notmuch;
+
+   if (!notmuch->exception_reported) {
+   _notmuch_database_log(_notmuch_message_database (message), "A 
Xapian exception occurred creating message: %s\n",
+ error.get_msg().c_str());
+   notmuch->exception_reported = TRUE;
+   }
+   return NOTMUCH_PRIVATE_STATUS_XAPIAN_EXCEPTION;
+}
+return NOTMUCH_PRIVATE_STATUS_SUCCESS;
+}
 
 /* Return true if p points at "new" or "cur". */
 static bool is_maildir (const char *p)
@@ -1097,6 +1126,7 @@ _notmuch_message_set_header_values (notmuch_message_t 
*message,
const char *subject)
 {
 time_t time_value;
+std::string old_subject;
 
 /* GMime really doesn't want to see a NULL date, so protect its
  * sensibilities. */
@@ -1114,7 +1144,13 @@ _notmuch_message_set_header_values (notmuch_message_t 
*message,
 message->doc.add_value (NOTMUCH_VALUE_TIMESTAMP,
Xapian::sortable_serialise (time_value));
 message->doc.add_value (NOTMUCH_VALUE_FROM, from);
-message->doc.add_value (NOTMUCH_VALUE_SUBJECT, subject);
+
+old_subject = message->doc.get_value (NOTMUCH_VALUE_SUBJECT);
+if (old_subject.empty())
+   message->doc.add_value (NOTMUCH_VALUE_SUBJECT, subject);
+else
+   message->doc.add_value (NOTMUCH_VALUE_SUBJECT, old_subject + "\n" + 
subject);
+
 message->modified = true;
 }
 
@@ -1999,6 +2035,12 @@ notmuch_message_reindex (notmuch_message_t *message,
goto DONE;
 }
 
+private_status = _notmuch_message_remove_values (message);
+if (private_status) {
+   ret = COERCE_STATUS(private_status, "error values");
+   goto DONE;
+}
+
 ret = notmuch_message_remove_all_properties_with_prefix (message, 
"index.");
 if (ret)
goto DONE; /* XXX TODO: distinguish from other error returns above? */
@@ -2043,9 +2085,7 @@ notmuch_message_reindex (notmuch_message_t *message,
thread_id = orig_thread_id;
 
_notmuch_message_add_term (message, "thread", thread_id);
-   /* Take header values only from first filename */
-   if (found == 0)
-   _notmuch_message_set_header_values (message, date, from, subject);
+   _notmuch_message_set_header_values (message, date, from, subject);
 
ret = _notmuch_message_index_file (message, indexopts, message_file);
 
diff --git a/test/T670-duplicate-mid.sh b/test/T670-duplicate-mid.sh
index bf8cc3a8..cfc5dafb 100755
--- a/test/T670-duplicate-mid.sh
+++ b/test/T670-duplicate-mid.sh
@@ -48,7 +48,6 @@ notmuch search --output=files subject:'"message 2"' | 
notmuch_dir_sanitize > OUT
 test_expect_equal_file EXPECTED OUTPUT
 
 test_begin_subtest 'Regexp search for second subject'