Re: [PATCH 1/1] Store and search for canonical Unicode text [WIP]
David Bremner writes:

> One way to break this up into more bite sized pieces would be to first
> create one or more tests that fail with current notmuch, and mark those
> as broken.

Right - for the moment I just wanted to post what I had for
consideration. I didn't want to spend too much more time on the
approach if it was uninteresting/inappropriate.

One simple place to start might be the included T570-normalization.sh.
Though perhaps that should be "canonicalization"?

> Can you explain why notmuch is the right place to do this, and not
> Xapian? I know we talked back and forth about this, but I never really
> got a solid sense of what the conclusion was. Is it just dependencies?

I have no strong opinion there, but to do the work in Xapian will
require a new release at a minimum, and likely new dependencies.

And generally speaking, I suppose I have a suspicion that application
needs with respect to encoding "detection", tokenization, stemming,
stop words, synonyms, phrase detection, etc. may be domain specific and
complex enough that Xapian won't want to try to accommodate the broad
array of possibilities, at least not in its core library.

Though it might try to handle some or all of that by providing suitable
customizability (presumably via callbacks or subclassing or...). And
since I'm new to Xapian, I'm not completely sure what's already
available.

> It seems plausible to specify UTF-8 input for the library, but what
> about the CLI? It seems like the canonicalization operation increases
> the chance of mangling user input in non-UTF-8 locales.

Yes, the key question: what does notmuch intend? i.e. given a sequence
of bytes, how will notmuch interpret them? I think we should decide
that, and document it clearly somewhere.

The commit message describes my understanding of how things currently
work, and if/when I get time, I'd like to propose some related
documentation updates (perhaps to notmuch-search-terms or
notmuch-insert/new?).

Oh, and if I do understand things correctly, notmuch may already stand
a chance of mangling any input that happens to form a valid UTF-8 byte
sequence but isn't actually encoded as UTF-8 (excepting encodings that
are a strict subset of UTF-8, like ASCII).

For example (if I did this right), [0xd1 0xa1] is valid UTF-8,
producing Cyrillic omega "ѡ", and also valid Latin-1, producing "Ñ¡".

> I suppose some upgrade code to canonicalize all the terms? That sounds
> pretty slow.

Perhaps, or I suppose you could just document that older indexed data
might not be canonicalized, and that you should reindex if that matters
to you. Although I suppose anyone with affected characters might well
want to reindex if the canonical form isn't the one people normally
receive (which seemed possible).

Hmm, another question -- for terms, does notmuch store ordinal
positions, Unicode character offsets, input byte offsets, or...?
Canonicalization will of course change the latter.

I imagine it might be possible to traverse the index terms and just
detect and merge those affected, but no idea if that would be
reasonable.

> I really didn't look at the code very closely, but there were a
> surprising number of calls to talloc_free. But those kind of details can
> wait.

Right, I wasn't sure what the policies were, so in most cases, I just
tried to release the data when it was no longer needed.

Thanks
-- 
Rob Browning
rlb @defaultvalue.org and @debian.org
GPG as of 2011-07-10 E6A9 DA3C C9FD 1FF8 C676 D2C4 C0F0 39E9 ED1B 597A
GPG as of 2002-11-03 14DD 432F AE39 534D B592 F9A0 25C8 D377 8C7E 73A4
___
notmuch mailing list
notmuch@notmuchmail.org
http://notmuchmail.org/mailman/listinfo/notmuch
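
The [0xd1 0xa1] ambiguity mentioned in the message above can be checked with a few lines of Python. This is only a sketch: the function name decode_with_latin1_fallback and the error-handler name are made up for illustration, and the fallback merely models the behaviour as it is described in this thread (valid UTF-8 is taken as UTF-8, anything else is read as Latin-1). It is not Xapian's Utf8Iterator.

```python
import codecs

# The two bytes from the example in the message.
raw = bytes([0xD1, 0xA1])

as_utf8 = raw.decode("utf-8")      # one code point: U+0461
as_latin1 = raw.decode("latin-1")  # two code points: U+00D1, U+00A1


def _latin1_fallback(err):
    """Error handler: reinterpret the offending bytes as Latin-1."""
    if not isinstance(err, UnicodeDecodeError):
        raise err
    bad = err.object[err.start:err.end]
    return bad.decode("latin-1"), err.end


# Hypothetical handler name, registered only for this sketch.
codecs.register_error("latin1_fallback", _latin1_fallback)


def decode_with_latin1_fallback(data: bytes) -> str:
    """Decode as UTF-8, reading any invalid bytes as Latin-1."""
    return data.decode("utf-8", errors="latin1_fallback")


if __name__ == "__main__":
    print(repr(as_utf8), repr(as_latin1))
    # Latin-1 input that is not valid UTF-8 survives the fallback ...
    print(repr(decode_with_latin1_fallback(b"caf\xe9")))
    # ... but Latin-1 input that happens to be valid UTF-8 does not.
    print(repr(decode_with_latin1_fallback(raw)))
```

The last call is the case the message worries about: nothing in the bytes themselves says which reading was intended, so the UTF-8 one wins silently.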
Re: [PATCH 1/1] Store and search for canonical Unicode text [WIP]
Rob Browning writes:

>
> Before this change, notmuch would index two strings that differ only
> with respect to canonicalization, like tóken and tóken, as separate
> terms, even though they may be visually indistinguishable, and do (for
> most purposes) represent the same text. After indexing, searching for
> one would not find the other, and which one you present to notmuch
> when you search depends on your tools. See test/T570-normalization.sh
> for a working example.

One way to break this up into more bite sized pieces would be to first
create one or more tests that fail with current notmuch, and mark those
as broken.

> Up to now, notmuch has let Xapian handle converting the incoming bytes
> to UTF-8. Xapian treats any byte sequence as UTF-8, and interprets
> any invalid UTF-8 bytes as Latin-1. This patch maintains the existing
> behavior (excepting the new canonicalization) by using Xapian's
> Utf8Iterator to handle the initial Unicode character parsing.

Can you explain why notmuch is the right place to do this, and not
Xapian? I know we talked back and forth about this, but I never really
got a solid sense of what the conclusion was. Is it just dependencies?

> And because when the input is already UTF-8, it just blindly converts
> from UTF-8 to Unicode code points, and then back to UTF-8 (after
> canonicalization), during each pass. There are certainly
> opportunities to optimize, though it may be worth discussing the
> detection of data encodings more broadly first.

It seems plausible to specify UTF-8 input for the library, but what
about the CLI? It seems like the canonicalization operation increases
the chance of mangling user input in non-UTF-8 locales.

> FIXME: what about existing indexed text?

I suppose some upgrade code to canonicalize all the terms? That sounds
pretty slow.

> ---
>
> Posted for preliminary discussion, and as a milestone (it appears to
> mostly work now). Though I doubt I'm handling things correctly
> everywhere notmuch-wise, wrt talloc, etc.

I really didn't look at the code very closely, but there were a
surprising number of calls to talloc_free. But those kinds of details
can wait.
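
For readers who want to see the tóken problem from the quoted commit message in isolation, here is a small Python sketch using the standard unicodedata module. The helper name canonical and the choice of NFC are assumptions made for illustration; this thread does not say which normalization form the patch applies.

```python
import unicodedata

# Two spellings that render the same but are different code point sequences.
composed = "t\u00f3ken"     # "ó" as the single code point U+00F3
decomposed = "to\u0301ken"  # "o" followed by U+0301 COMBINING ACUTE ACCENT


def canonical(text: str, form: str = "NFC") -> str:
    """Return text in one canonical form (NFC is an assumption here)."""
    return unicodedata.normalize(form, text)


if __name__ == "__main__":
    # Compared as raw code points the two strings differ, so an index
    # keyed on them would hold two separate terms.
    print(composed == decomposed)
    # After canonicalization both map to the same term.
    print(canonical(composed) == canonical(decomposed))
```

Applying the same function to indexed text and to query text is what would let a search for one spelling find the other.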
Re: muchsync files renames
Amadeusz Żołnowski writes:

> When I added the 'unread' tag, the file was still in new/. Only after
> I removed 'unread' afterwards was the file moved to cur/.

The unread tag corresponds to the *absence* of the ,S flag, so if you
don't add unread at notmuch new, tagging it unread later is effectively
a no-op from the point of view of maildir-flag synching.

I guess the part that is optional is moving from new/foo to cur/foo:2, .
I believe we used to be more aggressive about doing this, but mutt
users complained.

> So it seems you're right, but take a look at the following excerpt
> from T340-maildir-sync.sh:
> [...]
> What is different about the test case and my case is that my mail file
> doesn't have the ":2," suffix. Adding the suffix to the file name makes
> it work as expected by the test case. I see I would have to convert my
> mail file names, but I think this inconsistency in notmuch also
> deserves some attention.

Have a look at

  http://cr.yp.to/proto/maildir.html
  http://www.qmail.org/man/man5/maildir.html

I don't think messages in new are supposed to have : in their names. So
this test is dealing with a corner case of some out-of-spec MUA writing
:info onto the filename.

So I don't think adding a suffix is the right thing to do here. It also
seems like leaving a message in new/ when tagging it as unread is a
reasonable option.

The gory details (per David's earlier request) are in
_new_maildir_filename in lib/message.cc.

d
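
The relation between the unread tag and the ,S flag described above can be sketched in Python as follows. This is an illustration of the maildir naming convention only (an optional ":2," info part followed by flag letters), not a transcription of _new_maildir_filename; the names maildir_flags and is_unread are hypothetical.

```python
from typing import Optional, Set


def maildir_flags(filename: str) -> Optional[Set[str]]:
    """Return the flag letters after ':2,' or None if there is no info part."""
    _base, sep, info = filename.rpartition(":2,")
    if not sep:
        # No ':2,' at all, as expected for files sitting in new/.
        return None
    return set(info)


def is_unread(filename: str) -> bool:
    """A message counts as unread when the S (seen) flag is absent."""
    flags = maildir_flags(filename)
    return flags is None or "S" not in flags


if __name__ == "__main__":
    for name in (
        "new/1441022521.M1P2.host,S=53857",  # size field, not a flag
        "cur/message-with-info-to-be-moved-to-cur:2,",
        "cur/foo:2,RS",
    ):
        print(name, maildir_flags(name), is_unread(name))
```

Under this reading, a file in new/ without an info part already counts as unread, which is why adding the unread tag gives the flag synchronization nothing to change, while removing it requires writing an S flag and therefore a name under cur/.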
Re: muchsync files renames
David Bremner writes:

> If I understand the code correctly, this movement will only happen
> when one of the maildir-flag-equivalent tags is changed. I haven't dug
> back through the archives, but I think mutt uses presence in new/ as
> some kind of extra unseen state, so people requested not to move files
> until needed.

When I added the 'unread' tag, the file was still in new/. Only after
I removed 'unread' afterwards was the file moved to cur/.

So it seems you're right, but take a look at the following excerpt from
T340-maildir-sync.sh:

    test_begin_subtest "Message in new with maildir info is moved to cur on any tag change"
    add_message [filename]='message-with-info-to-be-moved-to-cur:2,' [dir]=new
    notmuch tag +anytag id:$gen_msg_id
    output=$(cd "$MAIL_DIR"; ls */message-with-info-to-be-moved-to-cur*)
    test_expect_equal "$output" "cur/message-with-info-to-be-moved-to-cur:2,"

What is different about the test case and my case is that my mail file
doesn't have the ":2," suffix. Adding the suffix to the file name makes
it work as expected by the test case. I see I would have to convert my
mail file names, but I think this inconsistency in notmuch also
deserves some attention.

-- 
Amadeusz Żołnowski
Re: synchronize_flags leaving files in new (was muchsync files renames)
David Mazieres expires 2015-11-30 PST writes:

>> $ notmuch tag +hey thread:000108bf
>> $ notmuch search thread:000108bf
>> thread:000108bf Yest. 11:58 [1/1] Somebody; Subject (hey reklama
>> unread)
>> $ notmuch search --output=files thread:000108bf
>> /home/aidecoe/Mail/aidecoe/2015/new/1441022521.M714465P23412VFE04I00141A38_0.freja,S=53857
>
> First, just to be absolutely sure, does the file exist?

100% sure.

> Second, I wonder if the ",S=53857" suffix (size) is throwing things
> off. Perhaps libnotmuch only expects suffixes when they are of the
> form ",S=53857:2,." Since muchsync does not add a size field
> (in either the new or cur subdirectory), it could be leading to the
> different behavior. To test this, can you rename the file in the new
> directory without the ",S=..." and see if the behavior changes?

,S=... doesn't have any impact. I have tested it.

-- 
Amadeusz Żołnowski
[PATCH 2/4] emacs: add function notmuch-address--message-insinuated
This function is currently used in notmuch-address-message-insinuate
(to not enable address completion if it is already enabled).
In near future this will be called in other functions to know whether
address completion can be used there, too.
---

Since id:1440619626-18768-1-git-send-email-tomi.oll...@iki.fi
- changed defsubst to defun

 emacs/notmuch-address.el | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/emacs/notmuch-address.el b/emacs/notmuch-address.el
index fde3c1b2b861..8982a415ce11 100644
--- a/emacs/notmuch-address.el
+++ b/emacs/notmuch-address.el
@@ -54,8 +54,11 @@ (defvar notmuch-address-message-alist-member

 (defvar notmuch-address-history nil)

+(defun notmuch-address--message-insinuated ()
+  (memq notmuch-address-message-alist-member message-completion-alist))
+
 (defun notmuch-address-message-insinuate ()
-  (unless (memq notmuch-address-message-alist-member message-completion-alist)
+  (unless (notmuch-address--message-insinuated)
     (setq message-completion-alist
           (push notmuch-address-message-alist-member message-completion-alist))))

-- 
2.0.0
[PATCH 3/4] emacs: add function to resend message to new recipients
The new function notmuch-show-resend-message re-sends a message to new
recipients using #'message-resend.

Recipients are read from the minibuffer as a comma-separated string
(with some keyboard support including tab completion).

Final confirmation is asked before sending.
---

Since id:1440619626-18768-2-git-send-email-tomi.oll...@iki.fi
- changed (bury-buffer) to (notmuch-bury-or-kill-this-buffer)
- it is hard to get the buffer kept around, but it is possible

 emacs/notmuch-address.el | 19 +++++++++++++++++++
 emacs/notmuch-show.el    |  8 ++++++++
 2 files changed, 27 insertions(+)

diff --git a/emacs/notmuch-address.el b/emacs/notmuch-address.el
index 8982a415ce11..83788efd3c1b 100644
--- a/emacs/notmuch-address.el
+++ b/emacs/notmuch-address.el
@@ -119,4 +119,23 @@ (defun notmuch-address-locate-command (command)

 ;;

+(defun notmuch-address-from-minibuffer (prompt)
+  (if (not (notmuch-address--message-insinuated))
+      (read-string prompt)
+    (let ((rmap (copy-keymap minibuffer-local-map))
+          (omap minibuffer-local-map))
+      ;; Configure TAB to start completion when executing read-string.
+      ;; "Original" minibuffer keymap is restored just before calling
+      ;; notmuch-address-expand-name as it may also use minibuffer-local-map
+      ;; (completing-read probably does not but if something else is used there).
+      (define-key rmap "\C-i" (lambda () ;; TAB
+                                (interactive)
+                                (let ((enable-recursive-minibuffers t)
+                                      (minibuffer-local-map omap))
+                                  (notmuch-address-expand-name))))
+      (let ((minibuffer-local-map rmap))
+        (read-string prompt)))))
+
+;;
+
 (provide 'notmuch-address)
diff --git a/emacs/notmuch-show.el b/emacs/notmuch-show.el
index 0565ab0725b2..046cb0e41f0b 100644
--- a/emacs/notmuch-show.el
+++ b/emacs/notmuch-show.el
@@ -1806,6 +1806,14 @@ (defun notmuch-show-forward-message (&optional prompt-for-sender)
   (with-current-notmuch-show-message
    (notmuch-mua-new-forward-message prompt-for-sender)))

+(defun notmuch-show-resend-message (addresses)
+  "Resend the current message."
+  (interactive (list (notmuch-address-from-minibuffer "Resend to: ")))
+  (when (yes-or-no-p (concat "Confirm resend to " addresses " "))
+    (notmuch-show-view-raw-message)
+    (message-resend addresses)
+    (notmuch-bury-or-kill-this-buffer)))
+
 (defun notmuch-show-next-message (&optional pop-at-end)
   "Show the next message.

-- 
2.0.0
[PATCH 4/4] emacs: bind notmuch-show-resend-message to 'b' in notmuch-show mode
This binding is similar to mutt's, which is

  bind {mode} b "bounce-message"  # remail a message to another user

where {mode} is 'index', 'pager' or 'attach'.
---
 emacs/notmuch-show.el | 1 +
 1 file changed, 1 insertion(+)

diff --git a/emacs/notmuch-show.el b/emacs/notmuch-show.el
index 046cb0e41f0b..e7e381eecc42 100644
--- a/emacs/notmuch-show.el
+++ b/emacs/notmuch-show.el
@@ -1373,6 +1373,7 @@ (defvar notmuch-show-mode-map
     (define-key map (kbd "") 'notmuch-show-previous-button)
     (define-key map (kbd "TAB") 'notmuch-show-next-button)
     (define-key map "f" 'notmuch-show-forward-message)
+    (define-key map "b" 'notmuch-show-resend-message)
     (define-key map "l" 'notmuch-show-filter-thread)
     (define-key map "r" 'notmuch-show-reply-sender)
     (define-key map "R" 'notmuch-show-reply)
-- 
2.0.0
[PATCH 1/4] emacs: notmuch-show-view-raw-message clears buffer, makes it read-only
notmuch-show-view-raw-message() re-uses a buffer created with the same
name (same Message-Id:) but it did not erase it before filling. If this
ever happened, there was duplicated (potentially overlapping) content
in the buffer. Now this is fixed.

Apparently since emacs 24.5 the (view-buffer) makes the buffer
read-only; so this problem would not have happened there, just that
notmuch-show-view-raw-message() failed. This is fixed by setting
inhibit-read-only t before erasing and filling the buffer.

The emacs 24.5 feature of having the raw message buffer read-only is
also now explicitly set on the buffer so the same experience is
available with emacs < 24.5.
---
 emacs/notmuch-show.el | 9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/emacs/notmuch-show.el b/emacs/notmuch-show.el
index 848ec2c870c4..0565ab0725b2 100644
--- a/emacs/notmuch-show.el
+++ b/emacs/notmuch-show.el
@@ -1886,12 +1886,15 @@ (defun notmuch-show-view-raw-message ()
   "View the original source of the current message."
   (interactive)
   (let* ((id (notmuch-show-get-message-id))
-         (buf (get-buffer-create (concat "*notmuch-raw-" id "*"))))
-    (let ((coding-system-for-read 'no-conversion))
-      (call-process notmuch-command nil buf nil "show" "--format=raw" id))
+         (buf (get-buffer-create (concat "*notmuch-raw-" id "*")))
+         (inhibit-read-only t))
     (switch-to-buffer buf)
+    (erase-buffer)
+    (let ((coding-system-for-read 'no-conversion))
+      (call-process notmuch-command nil t nil "show" "--format=raw" id))
     (goto-char (point-min))
     (set-buffer-modified-p nil)
+    (setq buffer-read-only t)
     (view-buffer buf 'kill-buffer-if-not-modified)))

 (put 'notmuch-show-pipe-message 'notmuch-doc
-- 
2.0.0