Re: [PATCH 2/2] emacs: add function to resend message to new recipients

2015-08-30 Thread David Bremner
Tomi Ollila  writes:

>
>   emacs -q -L $PWD/emacs -l emacs/notmuch.el -f notmuch --eval '(progn (setq 
> notmuch-address-command "nottoomuch-addresses.sh") 
> (notmuch-address-message-insinuate))'
>
Ah, I missed notmuch-address-message-insinuate; it does work if I run
that. I wonder if this can be simplified at all now that
notmuch-message-mode exists. In general I'm leery of running code when
we load notmuch*.el, as this has a history of causing problems for
people using notmuch e.g. as a search tool for gnus.

The completion interface takes a little getting used to, but it's
definitely usable. And of course much better than what we have now ;).

One thing I noticed which I _think_ is a bug, is that after calling
notmuch-show-resend-message, notmuch-view-raw-message doesn't work
anymore, complaining about a read-only buffer.

d
___
notmuch mailing list
notmuch@notmuchmail.org
http://notmuchmail.org/mailman/listinfo/notmuch


[PATCH 1/1] Store and search for canonical Unicode text [WIP]

2015-08-30 Thread Rob Browning
WARNING: this version is very preliminary, and might eat your data.

Unicode has multiple sequences representing what should normally be
considered the same text.  For example here's a combining Á and a
noncombining Á.

Depending on the way you view this, you may or may not see a
difference, but the former is the canonical form, and is represented
by two Unicode code points: a capital A (U+0041) followed by a
"combining acute accent" (U+0301); the latter is the single code
point (U+00C1), which is probably what most people would type.

Before this change, notmuch would index two strings that differ only
with respect to canonicalization, like tóken and tóken, as separate
terms, even though they may be visually indistinguishable, and do (for
most purposes) represent the same text.  After indexing, searching for
one would not find the other, and which one you present to notmuch
when you search depends on your tools.  See test/T570-normalization.sh
for a working example.

Since we're talking about differing representations that one wouldn't
normally want to distinguish, this patch unifies the various
representations by converting all incoming text to its canonical form
before indexing, and canonicalizing all query strings.

Up to now, notmuch has let Xapian handle converting the incoming bytes
to UTF-8.  Xapian treats any byte sequence as UTF-8, and interprets
any invalid UTF-8 bytes as Latin-1.  This patch maintains the existing
behavior (excepting the new canonicalization) by using Xapian's
Utf8Iterator to handle the initial Unicode character parsing.

Note that the parsing approach in this patch is not particularly
efficient, both because it traverses the incoming bytes three times:

   - once to determine how long the input is (currently the iterator
 can't directly handle null terminated char*'s),

   - once to determine how long the final UTF-8 allocation needs to
 be,

   - and once for the conversion.

And because when the input is already UTF-8, it just blindly converts
from UTF-8 to Unicode code points, and then back to UTF-8 (after
canonicalization), during each pass.  There are certainly
opportunities to optimize, though it may be worth discussing the
detection of data encodings more broadly first.

FIXME: document current encoding behavior clearly in
new/insert/search-terms.

FIXME: what about existing indexed text?
---

 Posted for preliminary discussion, and as a milestone (it appears to
 mostly work now).  Though I doubt I'm handling things correctly
 everywhere notmuch-wise, wrt talloc, etc.

 lib/Makefile.local |  1 +
 lib/database.cc| 17 --
 lib/message.cc | 51 +++-
 lib/notmuch.h  |  3 ++
 lib/query.cc   |  6 ++--
 lib/text-util.cc   | 82 ++
 test/Makefile.local| 10 --
 test/T150-tagging.sh   | 54 +++---
 test/T240-dump-restore.sh  |  4 +--
 test/T480-hex-escaping.sh  |  4 +--
 test/T570-normalization.sh | 28 
 test/corpus/cur/52:2,  |  6 ++--
 test/to-utf8.c | 44 +
 13 files changed, 267 insertions(+), 43 deletions(-)
 create mode 100644 lib/text-util.cc
 create mode 100755 test/T570-normalization.sh
 create mode 100644 test/to-utf8.c

diff --git a/lib/Makefile.local b/lib/Makefile.local
index 3a07090..41fd1e1 100644
--- a/lib/Makefile.local
+++ b/lib/Makefile.local
@@ -48,6 +48,7 @@ libnotmuch_cxx_srcs = \
$(dir)/index.cc \
$(dir)/message.cc   \
$(dir)/query.cc \
+   $(dir)/text-util.cc \
$(dir)/thread.cc

 libnotmuch_modules := $(libnotmuch_c_srcs:.c=.o) $(libnotmuch_cxx_srcs:.cc=.o)
diff --git a/lib/database.cc b/lib/database.cc
index 6a15174..7a01f95 100644
--- a/lib/database.cc
+++ b/lib/database.cc
@@ -436,6 +436,7 @@ find_document_for_doc_id (notmuch_database_t *notmuch, 
unsigned doc_id)
 char *
 _notmuch_message_id_compressed (void *ctx, const char *message_id)
 {
+// Assumes message_id is normalized utf-8.
 char *sha1, *compressed;

 sha1 = _notmuch_sha1_of_string (message_id);
@@ -457,12 +458,20 @@ notmuch_database_find_message (notmuch_database_t 
*notmuch,
 if (message_ret == NULL)
return NOTMUCH_STATUS_NULL_POINTER;

-if (strlen (message_id) > NOTMUCH_MESSAGE_ID_MAX)
-   message_id = _notmuch_message_id_compressed (notmuch, message_id);
+const char *u8_id = notmuch_bytes_to_utf8 (notmuch, message_id, -1);
+
+// Is strlen still appropriate?
+if (strlen (u8_id) > NOTMUCH_MESSAGE_ID_MAX)
+{
+   message_id = _notmuch_message_id_compressed (notmuch, u8_id);
+   talloc_free ((char *) u8_id);
+} else
+   message_id = u8_id;

 try {
status = _notmuch_database_find_unique_doc_id (notmuch, "id",
   message_id, &doc_id);
+   talloc_free ((char *

Re: [PATCH 1/2] emacs: add defsubst notmuch-address--message-insinuated

2015-08-30 Thread Tomi Ollila
On Sun, Aug 30 2015, David Bremner  wrote:

> Tomi Ollila  writes:
>
>> +(defsubst notmuch-address--message-insinuated ()
>> +  (memq notmuch-address-message-alist-member message-completion-alist))
>> +
>
> Is there some advantage to defsubst other than (maybe?) performance?  It
> just seems like one more construct for people to get up to speed with
> the codebase.

No advantage. The reason for defsubst may originally be that I did it for
least execution chance -- and iirc I was reading some (gnus?) code that
had quite a few defsubsts which infected my coding. Now that I checked
there is no (other) defsubsts code in notmuch-emacs... so that can be
just be changed to `defun' to be consistent in that sense.

>
> d

Tomi
___
notmuch mailing list
notmuch@notmuchmail.org
http://notmuchmail.org/mailman/listinfo/notmuch


Re: [PATCH 2/2] emacs: add function to resend message to new recipients

2015-08-30 Thread Tomi Ollila
On Sun, Aug 30 2015, David Bremner  wrote:

> Tomi Ollila  writes:
>
>> The new function notmuch-show-message-resend re-sends
>> message to new recipients using #'message-resend.
>>
>> Recipients are read from minibuffer as a comma-separated
>> string (with some keyboard support including tab completion).
>>
>
> I couldn't get the tab completion to work, at least if I go
>
>   M-x notmuch-show-resend-message
>
> nor when evaluating (notmuch-address-from-minibuffer "foo:")
>
> Do I need to bind a key to test this?

Nope. both of the above should work -- and worked for me just now
(this is how I tested just now:

  emacs -q -L $PWD/emacs -l emacs/notmuch.el -f notmuch --eval '(progn (setq 
notmuch-address-command "nottoomuch-addresses.sh") 
(notmuch-address-message-insinuate))'

)

>> I remember that Emacs VM might have had 'b' bound to this functionality
>> but I cannot be sure. A few weeks ago I looked gnus, rmail & mh-e to
>> figure out whether 'b' would have been bound to similar functionality
>> there but I cannot find it...
>
> mutt uses 'b'. AFAICT, gnus uses some sequence ending in b to resend
> bounced messages (i.e. from mailer-daemon).
>
> quoting the manual:
>
> S D b
>
> If you have sent a mail, but the mail was bounced back to you for
> some reason (wrong address, transient failure), you can use this
> command to resend that bounced mail
> (gnus-summary-resend-bounced-mail). [...]
> 
> S D r
>
> Not to be confused with the previous command,
> gnus-summary-resend-message will prompt you for an address to send
> the current message off to, and then send it to that place. [...]
>
> Be that as it may, we already use 'r' for reply, and I'm not sure we
> want to go (more) in the way of multi-letter sequences.

I agree that we don't want to to multi-letter sequences (early). I
personally would be fine w/ 'b'...

Tomi
___
notmuch mailing list
notmuch@notmuchmail.org
http://notmuchmail.org/mailman/listinfo/notmuch


Re: [PATCH 2/2] emacs: add function to resend message to new recipients

2015-08-30 Thread David Bremner
Tomi Ollila  writes:

> The new function notmuch-show-message-resend re-sends
> message to new recipients using #'message-resend.
>
> Recipients are read from minibuffer as a comma-separated
> string (with some keyboard support including tab completion).
>

I couldn't get the tab completion to work, at least if I go

  M-x notmuch-show-resend-message

nor when evaluating (notmuch-address-from-minibuffer "foo:")

Do I need to bind a key to test this?

> I remember that Emacs VM might have had 'b' bound to this functionality
> but I cannot be sure. A few weeks ago I looked gnus, rmail & mh-e to
> figure out whether 'b' would have been bound to similar functionality
> there but I cannot find it...

mutt uses 'b'. AFAICT, gnus uses some sequence ending in b to resend
bounced messages (i.e. from mailer-daemon).

quoting the manual:

S D b

If you have sent a mail, but the mail was bounced back to you for
some reason (wrong address, transient failure), you can use this
command to resend that bounced mail
(gnus-summary-resend-bounced-mail). [...]

S D r

Not to be confused with the previous command,
gnus-summary-resend-message will prompt you for an address to send
the current message off to, and then send it to that place. [...]

Be that as it may, we already use 'r' for reply, and I'm not sure we
want to go (more) in the way of multi-letter sequences.

___
notmuch mailing list
notmuch@notmuchmail.org
http://notmuchmail.org/mailman/listinfo/notmuch


Re: [PATCH 1/2] emacs: add defsubst notmuch-address--message-insinuated

2015-08-30 Thread David Bremner
Tomi Ollila  writes:

> +(defsubst notmuch-address--message-insinuated ()
> +  (memq notmuch-address-message-alist-member message-completion-alist))
> +

Is there some advantage to defsubst other than (maybe?) performance?  It
just seems like one more construct for people to get up to speed with
the codebase.

d
___
notmuch mailing list
notmuch@notmuchmail.org
http://notmuchmail.org/mailman/listinfo/notmuch


Re: [RFC PATCH 5/5] cli: add support for deduplicating based on case insensitive address

2015-08-30 Thread David Bremner
Jani Nikula  writes:

> Consider all variants of an email address as one, and print the most
> common variant.

Other than the quibbles already mentioned, the series looks ok to
me. For production it should have one or two tests I guess. Oh, and man
page updates. But you knew that I guess.

I can live with the current argument syntax, but since a certain a mount
of bikeshedding is expected, here is an alternative suggestion

--deduplication={none|mailbox|address}
___
notmuch mailing list
notmuch@notmuchmail.org
http://notmuchmail.org/mailman/listinfo/notmuch


Re: [RFC PATCH 4/5] cli: change the data structure for notmuch address deduplication

2015-08-30 Thread David Bremner
Jani Nikula  writes:

>> I found this use of mailbox as a temporary variable confusing; despite
>> the obvious return I thought it might have something to do with the
>> g_list_append below.  Maybe just make a block scope temporary variable?
>
> This is how the function would turn out with that. Better, I guess? I
> also tried to think of ways to combine the two g_list_append paths here,
> but in the end doing it like this has most clarity I think.
>

Your (new) version is fine for me. As it happens I was thinking of a
smaller tweak:


l = g_list_find_custom (list, mailbox, mailbox_compare);
if (l) {
mailbox_t *found;

talloc_free (mailbox);

found = l->data;
found->count++;

return TRUE;
}

___
notmuch mailing list
notmuch@notmuchmail.org
http://notmuchmail.org/mailman/listinfo/notmuch


Re: [RFC PATCH 4/5] cli: change the data structure for notmuch address deduplication

2015-08-30 Thread Jani Nikula
On Sun, 30 Aug 2015, David Bremner  wrote:
> I guess this doesn't make the error handling worse; both old and new
> code silently ignore OOM if I understand correctly.

Oh, and current git will not silently ignore OOM. It will segfault... ;)

BR,
Jani.
___
notmuch mailing list
notmuch@notmuchmail.org
http://notmuchmail.org/mailman/listinfo/notmuch


Re: [RFC PATCH 4/5] cli: change the data structure for notmuch address deduplication

2015-08-30 Thread Jani Nikula
On Sun, 30 Aug 2015, David Bremner  wrote:
> Jani Nikula  writes:
>
>>  
>> +static int
>> +strcase_equal (const void *a, const void *b)
>> +{
>> +return strcasecmp (a, b) == 0;
>> +}
>> +
>> +static unsigned int
>> +strcase_hash (const void *ptr)
>> +{
>> +const char *s = ptr;
>> +
>> +/* This is the djb2 hash. */
>> +unsigned int hash = 5381;
>> +while (s && *s) {
>> +hash = ((hash << 5) + hash) + tolower (*s);
>> +s++;
>> +}
>> +
>> +return hash;
>> +}
>> +
>
> as discussed, these functions probably need to be factored out into
> libutil.

Ack.

>
>> +l = g_list_find_custom (list, mailbox, mailbox_compare);
>> +if (l) {
>> +talloc_free (mailbox);
>> +mailbox = l->data;
>> +mailbox->count++;
>> +return TRUE;
>> +}
>
> I found this use of mailbox as a temporary variable confusing; despite
> the obvious return I thought it might have something to do with the
> g_list_append below.  Maybe just make a block scope temporary variable?

This is how the function would turn out with that. Better, I guess? I
also tried to think of ways to combine the two g_list_append paths here,
but in the end doing it like this has most clarity I think.

static notmuch_bool_t
is_duplicate (const search_context_t *ctx, const char *name, const char *addr)
{
char *key;
GList *list, *l;
mailbox_t *mailbox;

list = g_hash_table_lookup (ctx->addresses, addr);
if (list) {
mailbox_t find = {
.name = name,
.addr = addr,
};

l = g_list_find_custom (list, &find, mailbox_compare);
if (l) {
mailbox = l->data;
mailbox->count++;
return TRUE;
}

mailbox = new_mailbox (ctx->format, name, addr);
if (! mailbox)
return FALSE;

g_list_append (list, mailbox);
return FALSE;
}

key = talloc_strdup (ctx->format, addr);
if (! key)
return FALSE;

mailbox = new_mailbox (ctx->format, name, addr);
if (! mailbox)
return FALSE;

list = g_list_append (NULL, mailbox);
if (! list)
return FALSE;

g_hash_table_insert (ctx->addresses, key, list);

return FALSE;
}

>> +
>> +g_list_append (list, mailbox);
>> +return FALSE;
>>  }
>>  
>> -mailbox = new_mailbox (ctx->format, name, addr);
>> -if (! mailbox)
>> +key = talloc_strdup (ctx->format, addr);
>> +if (! key)
>>  return FALSE;
>
> I guess this doesn't make the error handling worse; both old and new
> code silently ignore OOM if I understand correctly. Do you happen to
> understand the original choice of using ctx->format rather than that
> ctx->notmuch for a talloc parent? it doesn't seem to get deallocated any
> earlier.

I don't know or understand that part of the history. It doesn't really
matter though because the deallocation is explicitly done on all keys
and values via g_hash_table_unref.

BR,
Jani.
___
notmuch mailing list
notmuch@notmuchmail.org
http://notmuchmail.org/mailman/listinfo/notmuch


Re: [RFC PATCH 3/5] cli: add support for not deduplicating notmuch address results

2015-08-30 Thread Jani Nikula
On Sun, 30 Aug 2015, David Bremner  wrote:
> Jani Nikula  writes:
>
>
>> +{ NOTMUCH_OPT_KEYWORD, &ctx->dupe, "deduplicate", 'x',
>
> probably you want 'D' or 'd' here. Not that it makes a practical
> difference at this point.
>
>> +  (notmuch_keyword_t []){ { "yes", -1 },
>
> I'm not very enthusiastic about reusing ctx->dupe for this.
> In particular the use of -1 for yes is off-putting
> It seems better to allocate another word of memory and use an
> enum, as elsewhere.

Agreed on both.

BR,
Jani.

___
notmuch mailing list
notmuch@notmuchmail.org
http://notmuchmail.org/mailman/listinfo/notmuch