Re: [PATCH 2/2] emacs: add function to resend message to new recipients
Tomi Ollila writes: > > emacs -q -L $PWD/emacs -l emacs/notmuch.el -f notmuch --eval '(progn (setq > notmuch-address-command "nottoomuch-addresses.sh") > (notmuch-address-message-insinuate))' > Ah, I missed notmuch-address-message-insinuate; it does work if I run that. I wonder if this can be simplified at all now that notmuch-message-mode exists. In general I'm leery of running code when we load notmuch*.el, as this has a history of causing problems for people using notmuch e.g. as a search tool for gnus. The completion interface takes a little getting used to, but it's definitely usable. And of course much better than what we have now ;). One thing I noticed which I _think_ is a bug, is that after calling notmuch-show-resend-message, notmuch-view-raw-message doesn't work anymore, complaining about a read-only buffer. d ___ notmuch mailing list notmuch@notmuchmail.org http://notmuchmail.org/mailman/listinfo/notmuch
[PATCH 1/1] Store and search for canonical Unicode text [WIP]
WARNING: this version is very preliminary, and might eat your data. Unicode has multiple sequences representing what should normally be considered the same text. For example here's a combining AÌ and a noncombining Ã. Depending on the way you view this, you may or may not see a difference, but the former is the canonical form, and is represented by two Unicode code points: a capital A (U+0041) followed by a "combining acute accent" (U+0301); the latter is the single code point (U+00C1), which is probably what most people would type. Before this change, notmuch would index two strings that differ only with respect to canonicalization, like toÌken and tóken, as separate terms, even though they may be visually indistinguishable, and do (for most purposes) represent the same text. After indexing, searching for one would not find the other, and which one you present to notmuch when you search depends on your tools. See test/T570-normalization.sh for a working example. Since we're talking about differing representations that one wouldn't normally want to distinguish, this patch unifies the various representations by converting all incoming text to its canonical form before indexing, and canonicalizing all query strings. Up to now, notmuch has let Xapian handle converting the incoming bytes to UTF-8. Xapian treats any byte sequence as UTF-8, and interprets any invalid UTF-8 bytes as Latin-1. This patch maintains the existing behavior (excepting the new canonicalization) by using Xapian's Utf8Iterator to handle the initial Unicode character parsing. Note that the parsing approach in this patch is not particularly efficient, both because it traverses the incoming bytes three times: - once to determine how long the input is (currently the iterator can't directly handle null terminated char*'s), - once to determine how long the final UTF-8 allocation needs to be, - and once for the conversion. And because when the input is already UTF-8, it just blindly converts from UTF-8 to Unicode code points, and then back to UTF-8 (after canonicalization), during each pass. There are certainly opportunities to optimize, though it may be worth discussing the detection of data encodings more broadly first. FIXME: document current encoding behavior clearly in new/insert/search-terms. FIXME: what about existing indexed text? --- Posted for preliminary discussion, and as a milestone (it appears to mostly work now). Though I doubt I'm handling things correctly everywhere notmuch-wise, wrt talloc, etc. lib/Makefile.local | 1 + lib/database.cc| 17 -- lib/message.cc | 51 +++- lib/notmuch.h | 3 ++ lib/query.cc | 6 ++-- lib/text-util.cc | 82 ++ test/Makefile.local| 10 -- test/T150-tagging.sh | 54 +++--- test/T240-dump-restore.sh | 4 +-- test/T480-hex-escaping.sh | 4 +-- test/T570-normalization.sh | 28 test/corpus/cur/52:2, | 6 ++-- test/to-utf8.c | 44 + 13 files changed, 267 insertions(+), 43 deletions(-) create mode 100644 lib/text-util.cc create mode 100755 test/T570-normalization.sh create mode 100644 test/to-utf8.c diff --git a/lib/Makefile.local b/lib/Makefile.local index 3a07090..41fd1e1 100644 --- a/lib/Makefile.local +++ b/lib/Makefile.local @@ -48,6 +48,7 @@ libnotmuch_cxx_srcs = \ $(dir)/index.cc \ $(dir)/message.cc \ $(dir)/query.cc \ + $(dir)/text-util.cc \ $(dir)/thread.cc libnotmuch_modules := $(libnotmuch_c_srcs:.c=.o) $(libnotmuch_cxx_srcs:.cc=.o) diff --git a/lib/database.cc b/lib/database.cc index 6a15174..7a01f95 100644 --- a/lib/database.cc +++ b/lib/database.cc @@ -436,6 +436,7 @@ find_document_for_doc_id (notmuch_database_t *notmuch, unsigned doc_id) char * _notmuch_message_id_compressed (void *ctx, const char *message_id) { +// Assumes message_id is normalized utf-8. char *sha1, *compressed; sha1 = _notmuch_sha1_of_string (message_id); @@ -457,12 +458,20 @@ notmuch_database_find_message (notmuch_database_t *notmuch, if (message_ret == NULL) return NOTMUCH_STATUS_NULL_POINTER; -if (strlen (message_id) > NOTMUCH_MESSAGE_ID_MAX) - message_id = _notmuch_message_id_compressed (notmuch, message_id); +const char *u8_id = notmuch_bytes_to_utf8 (notmuch, message_id, -1); + +// Is strlen still appropriate? +if (strlen (u8_id) > NOTMUCH_MESSAGE_ID_MAX) +{ + message_id = _notmuch_message_id_compressed (notmuch, u8_id); + talloc_free ((char *) u8_id); +} else + message_id = u8_id; try { status = _notmuch_database_find_unique_doc_id (notmuch, "id", message_id, &doc_id); + talloc_free ((char *
Re: [PATCH 1/2] emacs: add defsubst notmuch-address--message-insinuated
On Sun, Aug 30 2015, David Bremner wrote: > Tomi Ollila writes: > >> +(defsubst notmuch-address--message-insinuated () >> + (memq notmuch-address-message-alist-member message-completion-alist)) >> + > > Is there some advantage to defsubst other than (maybe?) performance? It > just seems like one more construct for people to get up to speed with > the codebase. No advantage. The reason for defsubst may originally be that I did it for least execution chance -- and iirc I was reading some (gnus?) code that had quite a few defsubsts which infected my coding. Now that I checked there is no (other) defsubsts code in notmuch-emacs... so that can be just be changed to `defun' to be consistent in that sense. > > d Tomi ___ notmuch mailing list notmuch@notmuchmail.org http://notmuchmail.org/mailman/listinfo/notmuch
Re: [PATCH 2/2] emacs: add function to resend message to new recipients
On Sun, Aug 30 2015, David Bremner wrote: > Tomi Ollila writes: > >> The new function notmuch-show-message-resend re-sends >> message to new recipients using #'message-resend. >> >> Recipients are read from minibuffer as a comma-separated >> string (with some keyboard support including tab completion). >> > > I couldn't get the tab completion to work, at least if I go > > M-x notmuch-show-resend-message > > nor when evaluating (notmuch-address-from-minibuffer "foo:") > > Do I need to bind a key to test this? Nope. both of the above should work -- and worked for me just now (this is how I tested just now: emacs -q -L $PWD/emacs -l emacs/notmuch.el -f notmuch --eval '(progn (setq notmuch-address-command "nottoomuch-addresses.sh") (notmuch-address-message-insinuate))' ) >> I remember that Emacs VM might have had 'b' bound to this functionality >> but I cannot be sure. A few weeks ago I looked gnus, rmail & mh-e to >> figure out whether 'b' would have been bound to similar functionality >> there but I cannot find it... > > mutt uses 'b'. AFAICT, gnus uses some sequence ending in b to resend > bounced messages (i.e. from mailer-daemon). > > quoting the manual: > > S D b > > If you have sent a mail, but the mail was bounced back to you for > some reason (wrong address, transient failure), you can use this > command to resend that bounced mail > (gnus-summary-resend-bounced-mail). [...] > > S D r > > Not to be confused with the previous command, > gnus-summary-resend-message will prompt you for an address to send > the current message off to, and then send it to that place. [...] > > Be that as it may, we already use 'r' for reply, and I'm not sure we > want to go (more) in the way of multi-letter sequences. I agree that we don't want to to multi-letter sequences (early). I personally would be fine w/ 'b'... Tomi ___ notmuch mailing list notmuch@notmuchmail.org http://notmuchmail.org/mailman/listinfo/notmuch
Re: [PATCH 2/2] emacs: add function to resend message to new recipients
Tomi Ollila writes: > The new function notmuch-show-message-resend re-sends > message to new recipients using #'message-resend. > > Recipients are read from minibuffer as a comma-separated > string (with some keyboard support including tab completion). > I couldn't get the tab completion to work, at least if I go M-x notmuch-show-resend-message nor when evaluating (notmuch-address-from-minibuffer "foo:") Do I need to bind a key to test this? > I remember that Emacs VM might have had 'b' bound to this functionality > but I cannot be sure. A few weeks ago I looked gnus, rmail & mh-e to > figure out whether 'b' would have been bound to similar functionality > there but I cannot find it... mutt uses 'b'. AFAICT, gnus uses some sequence ending in b to resend bounced messages (i.e. from mailer-daemon). quoting the manual: S D b If you have sent a mail, but the mail was bounced back to you for some reason (wrong address, transient failure), you can use this command to resend that bounced mail (gnus-summary-resend-bounced-mail). [...] S D r Not to be confused with the previous command, gnus-summary-resend-message will prompt you for an address to send the current message off to, and then send it to that place. [...] Be that as it may, we already use 'r' for reply, and I'm not sure we want to go (more) in the way of multi-letter sequences. ___ notmuch mailing list notmuch@notmuchmail.org http://notmuchmail.org/mailman/listinfo/notmuch
Re: [PATCH 1/2] emacs: add defsubst notmuch-address--message-insinuated
Tomi Ollila writes: > +(defsubst notmuch-address--message-insinuated () > + (memq notmuch-address-message-alist-member message-completion-alist)) > + Is there some advantage to defsubst other than (maybe?) performance? It just seems like one more construct for people to get up to speed with the codebase. d ___ notmuch mailing list notmuch@notmuchmail.org http://notmuchmail.org/mailman/listinfo/notmuch
Re: [RFC PATCH 5/5] cli: add support for deduplicating based on case insensitive address
Jani Nikula writes: > Consider all variants of an email address as one, and print the most > common variant. Other than the quibbles already mentioned, the series looks ok to me. For production it should have one or two tests I guess. Oh, and man page updates. But you knew that I guess. I can live with the current argument syntax, but since a certain a mount of bikeshedding is expected, here is an alternative suggestion --deduplication={none|mailbox|address} ___ notmuch mailing list notmuch@notmuchmail.org http://notmuchmail.org/mailman/listinfo/notmuch
Re: [RFC PATCH 4/5] cli: change the data structure for notmuch address deduplication
Jani Nikula writes: >> I found this use of mailbox as a temporary variable confusing; despite >> the obvious return I thought it might have something to do with the >> g_list_append below. Maybe just make a block scope temporary variable? > > This is how the function would turn out with that. Better, I guess? I > also tried to think of ways to combine the two g_list_append paths here, > but in the end doing it like this has most clarity I think. > Your (new) version is fine for me. As it happens I was thinking of a smaller tweak: l = g_list_find_custom (list, mailbox, mailbox_compare); if (l) { mailbox_t *found; talloc_free (mailbox); found = l->data; found->count++; return TRUE; } ___ notmuch mailing list notmuch@notmuchmail.org http://notmuchmail.org/mailman/listinfo/notmuch
Re: [RFC PATCH 4/5] cli: change the data structure for notmuch address deduplication
On Sun, 30 Aug 2015, David Bremner wrote: > I guess this doesn't make the error handling worse; both old and new > code silently ignore OOM if I understand correctly. Oh, and current git will not silently ignore OOM. It will segfault... ;) BR, Jani. ___ notmuch mailing list notmuch@notmuchmail.org http://notmuchmail.org/mailman/listinfo/notmuch
Re: [RFC PATCH 4/5] cli: change the data structure for notmuch address deduplication
On Sun, 30 Aug 2015, David Bremner wrote: > Jani Nikula writes: > >> >> +static int >> +strcase_equal (const void *a, const void *b) >> +{ >> +return strcasecmp (a, b) == 0; >> +} >> + >> +static unsigned int >> +strcase_hash (const void *ptr) >> +{ >> +const char *s = ptr; >> + >> +/* This is the djb2 hash. */ >> +unsigned int hash = 5381; >> +while (s && *s) { >> +hash = ((hash << 5) + hash) + tolower (*s); >> +s++; >> +} >> + >> +return hash; >> +} >> + > > as discussed, these functions probably need to be factored out into > libutil. Ack. > >> +l = g_list_find_custom (list, mailbox, mailbox_compare); >> +if (l) { >> +talloc_free (mailbox); >> +mailbox = l->data; >> +mailbox->count++; >> +return TRUE; >> +} > > I found this use of mailbox as a temporary variable confusing; despite > the obvious return I thought it might have something to do with the > g_list_append below. Maybe just make a block scope temporary variable? This is how the function would turn out with that. Better, I guess? I also tried to think of ways to combine the two g_list_append paths here, but in the end doing it like this has most clarity I think. static notmuch_bool_t is_duplicate (const search_context_t *ctx, const char *name, const char *addr) { char *key; GList *list, *l; mailbox_t *mailbox; list = g_hash_table_lookup (ctx->addresses, addr); if (list) { mailbox_t find = { .name = name, .addr = addr, }; l = g_list_find_custom (list, &find, mailbox_compare); if (l) { mailbox = l->data; mailbox->count++; return TRUE; } mailbox = new_mailbox (ctx->format, name, addr); if (! mailbox) return FALSE; g_list_append (list, mailbox); return FALSE; } key = talloc_strdup (ctx->format, addr); if (! key) return FALSE; mailbox = new_mailbox (ctx->format, name, addr); if (! mailbox) return FALSE; list = g_list_append (NULL, mailbox); if (! list) return FALSE; g_hash_table_insert (ctx->addresses, key, list); return FALSE; } >> + >> +g_list_append (list, mailbox); >> +return FALSE; >> } >> >> -mailbox = new_mailbox (ctx->format, name, addr); >> -if (! mailbox) >> +key = talloc_strdup (ctx->format, addr); >> +if (! key) >> return FALSE; > > I guess this doesn't make the error handling worse; both old and new > code silently ignore OOM if I understand correctly. Do you happen to > understand the original choice of using ctx->format rather than that > ctx->notmuch for a talloc parent? it doesn't seem to get deallocated any > earlier. I don't know or understand that part of the history. It doesn't really matter though because the deallocation is explicitly done on all keys and values via g_hash_table_unref. BR, Jani. ___ notmuch mailing list notmuch@notmuchmail.org http://notmuchmail.org/mailman/listinfo/notmuch
Re: [RFC PATCH 3/5] cli: add support for not deduplicating notmuch address results
On Sun, 30 Aug 2015, David Bremner wrote: > Jani Nikula writes: > > >> +{ NOTMUCH_OPT_KEYWORD, &ctx->dupe, "deduplicate", 'x', > > probably you want 'D' or 'd' here. Not that it makes a practical > difference at this point. > >> + (notmuch_keyword_t []){ { "yes", -1 }, > > I'm not very enthusiastic about reusing ctx->dupe for this. > In particular the use of -1 for yes is off-putting > It seems better to allocate another word of memory and use an > enum, as elsewhere. Agreed on both. BR, Jani. ___ notmuch mailing list notmuch@notmuchmail.org http://notmuchmail.org/mailman/listinfo/notmuch