[PATCH 0/7] doc: Python 3 compat, rst2man.py support, etc.

2014-04-21 Thread David Bremner
"W. Trevor King"  writes:

> I just bumped into this today while testing v2 of my
> content-description series:
>

pushed patches 1-4

d


[PATCH 1/1] emacs: hello: bugfix for saved searches defcustom

2014-04-21 Thread David Bremner
Mark Walters  writes:

> The recent changes for saved searches introduced a bug when notmuch
> was loaded after the saved search was defined. This was caused by a
> utility function not being defined when the defcustom was loaded.

pushed.

d


[PATCH v2 0/4] doc: notmuch-show improvements

2014-04-21 Thread David Bremner
Austin Clements  writes:

> This is v2 of id:1397834332-25175-1-git-send-email-amdragon at mit.edu.
> It expands the explanation of "non-MIME" message parts and moves it to
> the --part documentation.

pushed.

d


[PATCH v2] test: Test thread linking in all possible delivery orders

2014-04-21 Thread Austin Clements
These tests deliver all possible (single-root) four-message threads in
all possible orders and check that notmuch successfully links them
into threads.

There are two variants of the test: one delivers messages that
reference only their immediate parent and the other delivers messages
that reference all of their parents.  The latter test is currently
known-broken.

This is introduced as a new test (rather than just adding it to
T050-new) because it's much easier for this to start with an empty
database.
---

This version hopefully addresses David's comments in
id:87y4zhfmrn.fsf at maritornes.cs.unb.ca and adds a second test that
demonstrates the bug Mark in figured out in
id:8738h7kv2q.fsf at qmul.ac.uk.

test/T051-new-linking.sh | 91 
 1 file changed, 91 insertions(+)
 create mode 100755 test/T051-new-linking.sh

diff --git a/test/T051-new-linking.sh b/test/T051-new-linking.sh
new file mode 100755
index 000..9ccbc52
--- /dev/null
+++ b/test/T051-new-linking.sh
@@ -0,0 +1,91 @@
+#!/usr/bin/env bash
+test_description='"notmuch new" thread linking'
+
+. ./test-lib.sh
+
+# Generate all possible single-root four message thread structures.
+# Each line in THREADS is a thread structure, where the n'th field is
+# the parent of message n.  We'll use this for multiple tests below.
+THREADS=$(python -c '
+def mkTrees(free, tree={}):
+if free == set():
+print(" ".join(map(str, [msg[1] for msg in sorted(tree.items())])))
+return
+# Attach each free message to each message in the tree (if there is
+# no tree, make the free message the root), backtracking after each
+for msg in sorted(free):
+parents = sorted(tree.keys()) if tree else ["none"]
+for parent in parents:
+ntree = tree.copy()
+ntree[msg] = parent
+mkTrees(free - set([msg]), ntree)
+mkTrees(set(range(4)))')
+nthreads=$(wc -l <<< "$THREADS")
+
+test_begin_subtest "All four-message threads get linked in all delivery orders 
(one parent)"
+# In the first variant, this delivers messages that reference only
+# their immediate parent.  Hence, we should only expect threads to be
+# fully joined at the end.
+for ((n = 0; n < 4; n++)); do
+# Deliver the n'th message of every thread
+thread=0
+while read -a parents; do
+parent=${parents[$n]}
+generate_message \
+[id]=m$n at t$thread [in-reply-to]="\" \
+[subject]=p$thread [from]=m$n
+thread=$((thread + 1))
+done <<< "$THREADS"
+notmuch new > /dev/null
+done
+output=$(notmuch search --sort=newest-first '*' | notmuch_search_sanitize)
+expected=$(for ((i = 0; i < $nthreads; i++)); do
+echo "thread:XXX   2001-01-05 [4/4] m3, m2, m1, m0; p$i (inbox unread)"
+done)
+test_expect_equal "$output" "$expected"
+
+test_begin_subtest "The same (full parent linkage)"
+test_subtest_known_broken
+# Here we do the same thing as the previous test, but each message
+# references all of its parents.  Since every message references the
+# root of the thread, each thread should always be fully joined.  This
+# is currently broken because of the bug detailed in
+# id:8738h7kv2q.fsf at qmul.ac.uk.
+rm ${MAIL_DIR}/*
+notmuch new
+output=""
+expected=""
+for ((n = 0; n < 4; n++)); do
+# Deliver the n'th message of every thread
+thread=0
+while read -a parents; do
+references=""
+parent=${parents[$n]}
+while [[ $parent != none ]]; do
+references=" $references"
+parent=${parents[$parent]}
+done
+
+generate_message \
+[id]=m$n at t$thread [references]="'$references'" \
+[subject]=p$thread [from]=m$n
+thread=$((thread + 1))
+done <<< "$THREADS"
+notmuch new > /dev/null
+
+output="$output
+$(notmuch search --sort=newest-first '*' | notmuch_search_sanitize)"
+
+# Construct expected output
+template="thread:XXX   2001-01-05 [$((n+1))/$((n+1))]"
+for ((m = n; m > 0; m--)); do
+template="$template m$m,"
+done
+expected="$expected
+$(for ((i = 0; i < $nthreads; i++)); do
+echo "$template m0; p$i (inbox unread)"
+done)"
+done
+test_expect_equal "$output" "$expected"
+
+test_done
-- 
1.9.1



[PATCH 11/11] emacs: Support cid: references with shr renderer

2014-04-21 Thread Austin Clements
shr has really nice support for inline image rendering, but previously
we only had the hooks for w3m cid: references.
---
 emacs/notmuch-show.el | 41 +
 1 file changed, 33 insertions(+), 8 deletions(-)

diff --git a/emacs/notmuch-show.el b/emacs/notmuch-show.el
index f758091..a489279 100644
--- a/emacs/notmuch-show.el
+++ b/emacs/notmuch-show.el
@@ -745,14 +745,39 @@ (defun 
notmuch-show-get-mime-type-of-application/octet-stream (part)
  nil

 (defun notmuch-show-insert-part-text/html (msg part content-type nth depth 
button)
-  ;; text/html handler to work around bugs in renderers and our
-  ;; invisibile parts code. In particular w3m sets up a keymap which
-  ;; "leaks" outside the invisible region and causes strange effects
-  ;; in notmuch. We set mm-inline-text-html-with-w3m-keymap to nil to
-  ;; tell w3m not to set a keymap (so the normal notmuch-show-mode-map
-  ;; remains).
-  (let ((mm-inline-text-html-with-w3m-keymap nil))
-(notmuch-show-insert-part-*/* msg part content-type nth depth button)))
+  (if (eq mm-text-html-renderer 'shr)
+  ;; It's easier to drive shr ourselves than to work around the
+  ;; goofy things `mm-shr' does (like irreversibly taking over
+  ;; content ID handling).
+  (notmuch-show--insert-part-text/html-shr msg part)
+;; Otherwise, let message-mode do the heavy lifting
+;;
+;; w3m sets up a keymap which "leaks" outside the invisible region
+;; and causes strange effects in notmuch. We set
+;; mm-inline-text-html-with-w3m-keymap to nil to tell w3m not to
+;; set a keymap (so the normal notmuch-show-mode-map remains).
+(let ((mm-inline-text-html-with-w3m-keymap nil))
+  (notmuch-show-insert-part-*/* msg part content-type nth depth button
+
+(defun notmuch-show--insert-part-text/html-shr (msg part)
+  ;; Make sure shr is loaded before we start let-binding its globals
+  (require 'shr)
+  (let ((dom (let (process-crypto notmuch-show-process-crypto)
+  (with-temp-buffer
+(insert (notmuch-get-bodypart-text msg part process-crypto))
+(libxml-parse-html-region (point-min) (point-max)
+   (shr-content-function
+(lambda (url)
+  ;; shr strips the "cid:" part of URL, but doesn't
+  ;; URL-decode it (see RFC 2392).
+  (let ((cid (url-unhex-string url)))
+(first (notmuch-show--get-cid-content cid)
+   ;; Block all external images to prevent privacy leaks and
+   ;; potential attacks.  FIXME: If we block an image, offer a
+   ;; button to load external images.
+   (shr-blocked-images "."))
+(shr-insert-document dom)
+t))

 (defun notmuch-show-insert-part-*/* (msg part content-type nth depth button)
   ;; This handler _must_ succeed - it is the handler of last resort.
-- 
1.9.1



[PATCH 10/11] emacs: Rewrite content ID handling

2014-04-21 Thread Austin Clements
Besides generally cleaning up the code and separating the general
content ID handling from the w3m-specific code, this fixes several
problems.

Foremost is that, previously, the code roughly assumed that referenced
parts would be in the same multipart/related as the reference.
According to RFC 2392, nothing could be further from the truth:
content IDs are supposed to be globally unique and globally
addressable.  This is nonsense, but this patch at least fixes things
so content IDs can be anywhere in the same message.

As a side-effect of the above, this handles multipart/alternate
content-IDs more in line with RFC 2046 section 5.1.2 (not that I've
ever seen this in the wild).  This also properly URL-decodes cid:
URLs, as per RFC 2392 (the previous code did not), and applies crypto
settings from the show buffer (the previous code used the global
crypto settings).
---
 emacs/notmuch-show.el | 120 +++---
 1 file changed, 74 insertions(+), 46 deletions(-)

diff --git a/emacs/notmuch-show.el b/emacs/notmuch-show.el
index 9411c9a..f758091 100644
--- a/emacs/notmuch-show.el
+++ b/emacs/notmuch-show.el
@@ -503,6 +503,73 @@ (defun notmuch-show-toggle-part-invisibility ( 
button)
  (overlay-put overlay 'invisible (not show))
  t)

+;; Part content ID handling
+
+(defvar notmuch-show--cids nil
+  "Alist from raw content ID to (MSG PART).")
+(make-variable-buffer-local 'notmuch-show--cids)
+
+(defun notmuch-show--register-cids (msg part)
+  "Register content-IDs in PART and all of PART's sub-parts."
+  (let ((content-id (plist-get part :content-id)))
+(when content-id
+  ;; Note that content-IDs are globally unique, except when they
+  ;; aren't: RFC 2046 section 5.1.4 permits children of a
+  ;; multipart/alternative to have the same content-ID, in which
+  ;; case the MUA is supposed to pick the best one it can render.
+  ;; We simply add the content-ID to the beginning of our alist;
+  ;; so if this happens, we'll take the last (and "best")
+  ;; alternative (even if we can't render it).
+  (push (list content-id msg part) notmuch-show--cids)))
+  ;; Recurse on sub-parts
+  (let ((ctype (notmuch-split-content-type
+   (downcase (plist-get part :content-type)
+(cond ((equal (first ctype) "multipart")
+  (mapc (apply-partially #'notmuch-show--register-cids msg)
+(plist-get part :content)))
+ ((equal ctype '("message" "rfc822"))
+  (notmuch-show--register-cids
+   msg
+   (first (plist-get (first (plist-get part :content)) :body)))
+
+(defun notmuch-show--get-cid-content (cid)
+  "Return a list (CID-content content-type) or nil.
+
+This will only find parts from messages that have been inserted
+into the current buffer.  CID must be a raw content ID, without
+enclosing angle brackets, a cid: prefix, or URL encoding.  This
+will return nil if the CID is unknown or cannot be retrieved."
+  (let ((descriptor (cdr (assoc cid notmuch-show--cids
+(when descriptor
+  (let* ((msg (first descriptor))
+(part (second descriptor))
+;; Request caching for this content, as some messages
+;; reference the same cid: part many times (hundreds!).
+(content (notmuch-get-bodypart-binary
+  msg part notmuch-show-process-crypto 'cache))
+(content-type (plist-get part :content-type)))
+   (list content content-type)
+
+(defun notmuch-show-setup-w3m ()
+  "Instruct w3m how to retrieve content from a \"related\" part of a message."
+  (interactive)
+  (if (boundp 'w3m-cid-retrieve-function-alist)
+(unless (assq 'notmuch-show-mode w3m-cid-retrieve-function-alist)
+  (push (cons 'notmuch-show-mode #'notmuch-show--cid-w3m-retrieve)
+   w3m-cid-retrieve-function-alist)))
+  (setq mm-inline-text-html-with-images t))
+
+(defvar w3m-current-buffer) ;; From `w3m.el'.
+(defun notmuch-show--cid-w3m-retrieve (url  args)
+  ;; url includes the cid: prefix and is URL encoded (see RFC 2392).
+  (let* ((cid (url-unhex-string (substring url 4)))
+(content-and-type
+ (with-current-buffer w3m-current-buffer
+   (notmuch-show--get-cid-content cid
+(when content-and-type
+  (insert (first content-and-type))
+  (second content-and-type
+
 ;; MIME part renderers

 (defun notmuch-show-multipart/*-to-list (part)
@@ -527,56 +594,11 @@ (defun notmuch-show-insert-part-multipart/alternative 
(msg part content-type nth
   (indent-rigidly start (point) 1)))
   t)

-(defun notmuch-show-setup-w3m ()
-  "Instruct w3m how to retrieve content from a \"related\" part of a message."
-  (interactive)
-  (if (boundp 'w3m-cid-retrieve-function-alist)
-(unless (assq 'notmuch-show-mode w3m-cid-retrieve-function-alist)
-  (push (cons 'notmuch-show-mode 'notmuch-show-w3m-cid-retrieve)
-   w3m-cid-retrieve-function-alist)))
-  (setq 

[PATCH 09/11] emacs: Use generalized content caching in w3m CID code

2014-04-21 Thread Austin Clements
Previously this did its own caching, but this is now supported by more
generally by `notmuch-get-bodypart-binary'.
---
 emacs/notmuch-show.el | 23 ---
 1 file changed, 8 insertions(+), 15 deletions(-)

diff --git a/emacs/notmuch-show.el b/emacs/notmuch-show.el
index 537c558..9411c9a 100644
--- a/emacs/notmuch-show.el
+++ b/emacs/notmuch-show.el
@@ -540,15 +540,14 @@ (defvar w3m-current-buffer) ;; From `w3m.el'.
 (defvar notmuch-show-w3m-cid-store nil)
 (make-variable-buffer-local 'notmuch-show-w3m-cid-store)

-(defun notmuch-show-w3m-cid-store-internal (content-id msg part content)
-  (push (list content-id msg part content)
-   notmuch-show-w3m-cid-store))
+(defun notmuch-show-w3m-cid-store-internal (content-id msg part)
+  (push (list content-id msg part) notmuch-show-w3m-cid-store))

 (defun notmuch-show-w3m-cid-store (msg part)
   (let ((content-id (plist-get part :content-id)))
 (when content-id
   (notmuch-show-w3m-cid-store-internal (concat "cid:" content-id)
-  msg part nil
+  msg part

 (defun notmuch-show-w3m-cid-retrieve (url  args)
   (let ((matching-part (with-current-buffer w3m-current-buffer
@@ -556,18 +555,12 @@ (defun notmuch-show-w3m-cid-retrieve (url  args)
 (if matching-part
(let* ((msg (nth 1 matching-part))
   (part (nth 2 matching-part))
-  (content (nth 3 matching-part))
   (content-type (plist-get part :content-type)))
- ;; If we don't already have the content, get it and cache
- ;; it, as some messages reference the same cid: part many
- ;; times (hundreds!), which results in many calls to
- ;; `notmuch part'.
- (unless content
-   (setq content (notmuch-get-bodypart-binary
-  msg part notmuch-show-process-crypto))
-   (with-current-buffer w3m-current-buffer
- (notmuch-show-w3m-cid-store-internal url msg part content)))
- (insert content)
+ ;; Request content caching, as some messages reference the
+ ;; same cid: part many times (hundreds!), which results in
+ ;; many calls to `notmuch show'.
+ (insert (notmuch-get-bodypart-binary
+  msg part notmuch-show-process-crypto 'cache))
  content-type)
   nil)))

-- 
1.9.1



[PATCH 08/11] emacs: Support caching in notmuch-get-bodypart-{binary, text}

2014-04-21 Thread Austin Clements
(The actual code change here is small, but requires re-indenting
existing code.)
---
 emacs/notmuch-lib.el | 52 ++--
 1 file changed, 30 insertions(+), 22 deletions(-)

diff --git a/emacs/notmuch-lib.el b/emacs/notmuch-lib.el
index fc67b14..fee8512 100644
--- a/emacs/notmuch-lib.el
+++ b/emacs/notmuch-lib.el
@@ -503,33 +503,39 @@ (defun notmuch-parts-filter-by-type (parts type)
(lambda (part) (notmuch-match-content-type (plist-get part :content-type) 
type))
parts))

-(defun notmuch-get-bodypart-binary (msg part process-crypto)
+(defun notmuch-get-bodypart-binary (msg part process-crypto  cache)
   "Return the unprocessed content of PART in MSG as a unibyte string.

 This returns the \"raw\" content of the given part after content
 transfer decoding, but with no further processing (see the
 discussion of --format=raw in man notmuch-show).  In particular,
 this does no charset conversion."
-  (let ((args `("show" "--format=raw"
-   ,(format "--part=%d" (plist-get part :id))
-   ,@(when process-crypto '("--decrypt"))
-   ,(notmuch-id-to-query (plist-get msg :id)
-(with-temp-buffer
-  ;; Emacs internally uses a UTF-8-like multibyte string
-  ;; representation by default (regardless of the coding system,
-  ;; which only affects how it goes from outside data to this
-  ;; internal representation).  This *almost* never matters.
-  ;; Annoyingly, it does matter if we use this data in an image
-  ;; descriptor, since Emacs will use its internal data buffer
-  ;; directly and this multibyte representation corrupts binary
-  ;; image formats.  Since the caller is asking for binary data, a
-  ;; unibyte string is a more appropriate representation anyway.
-  (set-buffer-multibyte nil)
-  (let ((coding-system-for-read 'no-conversion))
-   (apply #'call-process notmuch-command nil '(t nil) nil args)
-   (buffer-string)
-
-(defun notmuch-get-bodypart-text (msg part process-crypto)
+  (let ((data (plist-get part :binary-content)))
+(when (not data)
+  (let ((args `("show" "--format=raw"
+   ,(format "--part=%d" (plist-get part :id))
+   ,@(when process-crypto '("--decrypt"))
+   ,(notmuch-id-to-query (plist-get msg :id)
+   (with-temp-buffer
+ ;; Emacs internally uses a UTF-8-like multibyte string
+ ;; representation by default (regardless of the coding
+ ;; system, which only affects how it goes from outside data
+ ;; to this internal representation).  This *almost* never
+ ;; matters.  Annoyingly, it does matter if we use this data
+ ;; in an image descriptor, since Emacs will use its internal
+ ;; data buffer directly and this multibyte representation
+ ;; corrupts binary image formats.  Since the caller is
+ ;; asking for binary data, a unibyte string is a more
+ ;; appropriate representation anyway.
+ (set-buffer-multibyte nil)
+ (let ((coding-system-for-read 'no-conversion))
+   (apply #'call-process notmuch-command nil '(t nil) nil args)
+   (setq data (buffer-string)
+  (when cache
+   (plist-put part :binary-content data)))
+data))
+
+(defun notmuch-get-bodypart-text (msg part process-crypto  cache)
   "Return the text content of PART in MSG.

 This returns the content of the given part as a multibyte Lisp
@@ -546,7 +552,9 @@ (defun notmuch-get-bodypart-text (msg part process-crypto)
 (npart (apply #'notmuch-call-notmuch-sexp args)))
(setq content (plist-get npart :content))
(when (not content)
- (error "Internal error: No :content from %S" args
+ (error "Internal error: No :content from %S" args)))
+  (when cache
+   (plist-put part :content content)))
 content))

 ;; Workaround: The call to `mm-display-part' below triggers a bug in
-- 
1.9.1



[PATCH 07/11] emacs: Return unibyte strings for binary part data

2014-04-21 Thread Austin Clements
Unibyte strings are meant for representing binary data.  In practice,
using unibyte versus multibyte strings affects *almost* nothing.  It
does happen to matter if we use the binary data in an image descriptor
(which is, helpfully, not documented anywhere and getting it wrong
results in opaque errors like "Not a PNG image: ").
---
 emacs/notmuch-lib.el | 12 +++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/emacs/notmuch-lib.el b/emacs/notmuch-lib.el
index ece29c3..fc67b14 100644
--- a/emacs/notmuch-lib.el
+++ b/emacs/notmuch-lib.el
@@ -504,7 +504,7 @@ (defun notmuch-parts-filter-by-type (parts type)
parts))

 (defun notmuch-get-bodypart-binary (msg part process-crypto)
-  "Return the unprocessed content of PART in MSG.
+  "Return the unprocessed content of PART in MSG as a unibyte string.

 This returns the \"raw\" content of the given part after content
 transfer decoding, but with no further processing (see the
@@ -515,6 +515,16 @@ (defun notmuch-get-bodypart-binary (msg part 
process-crypto)
,@(when process-crypto '("--decrypt"))
,(notmuch-id-to-query (plist-get msg :id)
 (with-temp-buffer
+  ;; Emacs internally uses a UTF-8-like multibyte string
+  ;; representation by default (regardless of the coding system,
+  ;; which only affects how it goes from outside data to this
+  ;; internal representation).  This *almost* never matters.
+  ;; Annoyingly, it does matter if we use this data in an image
+  ;; descriptor, since Emacs will use its internal data buffer
+  ;; directly and this multibyte representation corrupts binary
+  ;; image formats.  Since the caller is asking for binary data, a
+  ;; unibyte string is a more appropriate representation anyway.
+  (set-buffer-multibyte nil)
   (let ((coding-system-for-read 'no-conversion))
(apply #'call-process notmuch-command nil '(t nil) nil args)
(buffer-string)
-- 
1.9.1



[PATCH 06/11] emacs: Remove broken `notmuch-get-bodypart-content' API

2014-04-21 Thread Austin Clements
`notmuch-get-bodypart-content' could do two very different things,
depending on conditions: for text/* parts other than text/html, it
would return the part content as a multibyte Lisp string *after*
charset conversion, while for other parts (including text/html), it
would return binary part content without charset conversion.

This commit completes the split of `notmuch-get-bodypart-content' into
two different and explicit APIs: `notmuch-get-bodypart-binary' and
`notmuch-get-bodypart-text'.  It updates all callers to use one or the
other depending on what's appropriate.
---
 emacs/notmuch-lib.el  | 37 -
 emacs/notmuch-show.el |  8 
 2 files changed, 32 insertions(+), 13 deletions(-)

diff --git a/emacs/notmuch-lib.el b/emacs/notmuch-lib.el
index 4711098..ece29c3 100644
--- a/emacs/notmuch-lib.el
+++ b/emacs/notmuch-lib.el
@@ -519,9 +519,25 @@ (defun notmuch-get-bodypart-binary (msg part 
process-crypto)
(apply #'call-process notmuch-command nil '(t nil) nil args)
(buffer-string)

-(defun notmuch-get-bodypart-content (msg part process-crypto)
-  (or (plist-get part :content)
-  (notmuch-get-bodypart-binary msg part process-crypto)))
+(defun notmuch-get-bodypart-text (msg part process-crypto)
+  "Return the text content of PART in MSG.
+
+This returns the content of the given part as a multibyte Lisp
+string after performing content transfer decoding and any
+necessary charset decoding.  It is an error to use this for
+non-text/* parts."
+  (let ((content (plist-get part :content)))
+(when (not content)
+  ;; Use show --format=sexp to fetch decoded content
+  (let* ((args `("show" "--format=sexp" "--include-html"
+,(format "--part=%s" (plist-get part :id))
+,@(when process-crypto '("--decrypt"))
+,(notmuch-id-to-query (plist-get msg :id
+(npart (apply #'notmuch-call-notmuch-sexp args)))
+   (setq content (plist-get npart :content))
+   (when (not content)
+ (error "Internal error: No :content from %S" args
+content))

 ;; Workaround: The call to `mm-display-part' below triggers a bug in
 ;; Emacs 24 if it attempts to use the shr renderer to display an HTML
@@ -542,18 +558,21 @@ (defun notmuch-mm-display-part-inline (msg part 
content-type process-crypto)
 current buffer, if possible."
   (let ((display-buffer (current-buffer)))
 (with-temp-buffer
-  ;; In case there is :content, the content string is already converted
-  ;; into emacs internal format. `gnus-decoded' is a fake charset,
-  ;; which means no further decoding (to be done by mm- functions).
-  (let* ((charset (if (plist-member part :content)
- 'gnus-decoded
+  ;; In case we already have :content, use it and tell mm-* that
+  ;; it's already been charset-decoded by using the fake
+  ;; `gnus-decoded' charset.  Otherwise, we'll fetch the binary
+  ;; part content and let mm-* decode it.
+  (let* ((have-content (plist-member part :content))
+(charset (if have-content 'gnus-decoded
(plist-get part :content-charset)))
 (handle (mm-make-handle (current-buffer) `(,content-type (charset 
. ,charset)
;; If the user wants the part inlined, insert the content and
;; test whether we are able to inline it (which includes both
;; capability and suitability tests).
(when (mm-inlined-p handle)
- (insert (notmuch-get-bodypart-content msg part process-crypto))
+ (if have-content
+ (insert (notmuch-get-bodypart-text msg part process-crypto))
+   (insert (notmuch-get-bodypart-binary msg part process-crypto)))
  (when (mm-inlinable-p handle)
(set-buffer display-buffer)
(mm-display-part handle)
diff --git a/emacs/notmuch-show.el b/emacs/notmuch-show.el
index e9867fd..537c558 100644
--- a/emacs/notmuch-show.el
+++ b/emacs/notmuch-show.el
@@ -680,7 +680,7 @@ (defun notmuch-show-insert-part-text/plain (msg part 
content-type nth depth butt
   (let ((start (if button
   (button-start button)
 (point
-(insert (notmuch-get-bodypart-content msg part 
notmuch-show-process-crypto))
+(insert (notmuch-get-bodypart-text msg part notmuch-show-process-crypto))
 (save-excursion
   (save-restriction
(narrow-to-region start (point-max))
@@ -689,9 +689,9 @@ (defun notmuch-show-insert-part-text/plain (msg part 
content-type nth depth butt

 (defun notmuch-show-insert-part-text/calendar (msg part content-type nth depth 
button)
   (insert (with-temp-buffer
-   (insert (notmuch-get-bodypart-content msg part 
notmuch-show-process-crypto))
-   ;; notmuch-get-bodypart-content provides "raw", non-converted
-   ;; data. Replace CRLF with LF before icalendar can use it.
+   (insert (notmuch-get-bodypart-text msg 

[PATCH 05/11] emacs: Create an API for fetching parts as undecoded binary

2014-04-21 Thread Austin Clements
The new function, `notmuch-get-bodypart-binary', replaces
`notmuch-get-bodypart-internal'.  Whereas the old function was really
meant for internal use in `notmuch-get-bodypart-content', it was used
in a few other places.  Since the difference between
`notmuch-get-bodypart-content' and `notmuch-get-bodypart-internal' was
unclear, these other uses were always confusing and potentially
inconsistent.  The new call clearly requests the part as undecoded
binary.

This is step 1 of 2 in separating `notmuch-get-bodypart-content' into
two APIs for retrieving either undecoded binary or decoded text.
---
 emacs/notmuch-lib.el  | 28 ++--
 emacs/notmuch-show.el | 22 +-
 2 files changed, 23 insertions(+), 27 deletions(-)

diff --git a/emacs/notmuch-lib.el b/emacs/notmuch-lib.el
index 35b9065..4711098 100644
--- a/emacs/notmuch-lib.el
+++ b/emacs/notmuch-lib.el
@@ -503,25 +503,25 @@ (defun notmuch-parts-filter-by-type (parts type)
(lambda (part) (notmuch-match-content-type (plist-get part :content-type) 
type))
parts))

-;; Helper for parts which are generally not included in the default
-;; SEXP output.
-(defun notmuch-get-bodypart-internal (query part-number process-crypto)
-  (let ((args '("show" "--format=raw"))
-   (part-arg (format "--part=%s" part-number)))
-(setq args (append args (list part-arg)))
-(if process-crypto
-   (setq args (append args '("--decrypt"
-(setq args (append args (list query)))
+(defun notmuch-get-bodypart-binary (msg part process-crypto)
+  "Return the unprocessed content of PART in MSG.
+
+This returns the \"raw\" content of the given part after content
+transfer decoding, but with no further processing (see the
+discussion of --format=raw in man notmuch-show).  In particular,
+this does no charset conversion."
+  (let ((args `("show" "--format=raw"
+   ,(format "--part=%d" (plist-get part :id))
+   ,@(when process-crypto '("--decrypt"))
+   ,(notmuch-id-to-query (plist-get msg :id)
 (with-temp-buffer
   (let ((coding-system-for-read 'no-conversion))
-   (progn
- (apply 'call-process (append (list notmuch-command nil (list t nil) 
nil) args))
- (buffer-string))
+   (apply #'call-process notmuch-command nil '(t nil) nil args)
+   (buffer-string)

 (defun notmuch-get-bodypart-content (msg part process-crypto)
   (or (plist-get part :content)
-  (notmuch-get-bodypart-internal (notmuch-id-to-query (plist-get msg :id))
-(plist-get part :id) process-crypto)))
+  (notmuch-get-bodypart-binary msg part process-crypto)))

 ;; Workaround: The call to `mm-display-part' below triggers a bug in
 ;; Emacs 24 if it attempts to use the shr renderer to display an HTML
diff --git a/emacs/notmuch-show.el b/emacs/notmuch-show.el
index 455cfee..e9867fd 100644
--- a/emacs/notmuch-show.el
+++ b/emacs/notmuch-show.el
@@ -557,16 +557,14 @@ (defun notmuch-show-w3m-cid-retrieve (url  args)
(let* ((msg (nth 1 matching-part))
   (part (nth 2 matching-part))
   (content (nth 3 matching-part))
-  (message-id (plist-get msg :id))
-  (part-number (plist-get part :id))
   (content-type (plist-get part :content-type)))
  ;; If we don't already have the content, get it and cache
  ;; it, as some messages reference the same cid: part many
  ;; times (hundreds!), which results in many calls to
  ;; `notmuch part'.
  (unless content
-   (setq content (notmuch-get-bodypart-internal (notmuch-id-to-query 
message-id)
- part-number 
notmuch-show-process-crypto))
+   (setq content (notmuch-get-bodypart-binary
+  msg part notmuch-show-process-crypto))
(with-current-buffer w3m-current-buffer
  (notmuch-show-w3m-cid-store-internal url msg part content)))
  (insert content)
@@ -2067,15 +2065,14 @@ (defun notmuch-show-stash-mlarchive-link-and-go 
( mla)

 ;; Interactive part functions and their helpers

-(defun notmuch-show-generate-part-buffer (message-id nth)
+(defun notmuch-show-generate-part-buffer (msg part)
   "Return a temporary buffer containing the specified part's content."
   (let ((buf (generate-new-buffer " *notmuch-part*"))
(process-crypto notmuch-show-process-crypto))
 (with-current-buffer buf
-  (setq notmuch-show-process-crypto process-crypto)
-  ;; Always acquires the part via `notmuch part', even if it is
-  ;; available in the SEXP output.
-  (insert (notmuch-get-bodypart-internal message-id nth 
notmuch-show-process-crypto)))
+  ;; This is always used in the content of mm handles, which
+  ;; expect undecoded, binary part content.
+  (insert (notmuch-get-bodypart-binary msg part process-crypto)))
 buf))

 (defun 

[PATCH 04/11] emacs: Track full message and part descriptor in w3m CID store

2014-04-21 Thread Austin Clements
This will simplify later changes.
---
 emacs/notmuch-show.el | 33 ++---
 1 file changed, 10 insertions(+), 23 deletions(-)

diff --git a/emacs/notmuch-show.el b/emacs/notmuch-show.el
index 2b225df..455cfee 100644
--- a/emacs/notmuch-show.el
+++ b/emacs/notmuch-show.el
@@ -540,35 +540,26 @@ (defvar w3m-current-buffer) ;; From `w3m.el'.
 (defvar notmuch-show-w3m-cid-store nil)
 (make-variable-buffer-local 'notmuch-show-w3m-cid-store)

-(defun notmuch-show-w3m-cid-store-internal (content-id
-   message-id
-   part-number
-   content-type
-   content)
-  (push (list content-id
- message-id
- part-number
- content-type
- content)
+(defun notmuch-show-w3m-cid-store-internal (content-id msg part content)
+  (push (list content-id msg part content)
notmuch-show-w3m-cid-store))

 (defun notmuch-show-w3m-cid-store (msg part)
   (let ((content-id (plist-get part :content-id)))
 (when content-id
   (notmuch-show-w3m-cid-store-internal (concat "cid:" content-id)
-  (plist-get msg :id)
-  (plist-get part :id)
-  (plist-get part :content-type)
-  nil
+  msg part nil

 (defun notmuch-show-w3m-cid-retrieve (url  args)
   (let ((matching-part (with-current-buffer w3m-current-buffer
 (assoc url notmuch-show-w3m-cid-store
 (if matching-part
-   (let ((message-id (nth 1 matching-part))
- (part-number (nth 2 matching-part))
- (content-type (nth 3 matching-part))
- (content (nth 4 matching-part)))
+   (let* ((msg (nth 1 matching-part))
+  (part (nth 2 matching-part))
+  (content (nth 3 matching-part))
+  (message-id (plist-get msg :id))
+  (part-number (plist-get part :id))
+  (content-type (plist-get part :content-type)))
  ;; If we don't already have the content, get it and cache
  ;; it, as some messages reference the same cid: part many
  ;; times (hundreds!), which results in many calls to
@@ -577,11 +568,7 @@ (defun notmuch-show-w3m-cid-retrieve (url  args)
(setq content (notmuch-get-bodypart-internal (notmuch-id-to-query 
message-id)
  part-number 
notmuch-show-process-crypto))
(with-current-buffer w3m-current-buffer
- (notmuch-show-w3m-cid-store-internal url
-  message-id
-  part-number
-  content-type
-  content)))
+ (notmuch-show-w3m-cid-store-internal url msg part content)))
  (insert content)
  content-type)
   nil)))
-- 
1.9.1



[PATCH 03/11] emacs: Fix coding system in `notmuch-show-view-raw-message'

2014-04-21 Thread Austin Clements
This fixes the known-broken test of viewing 8bit messages added by the
previous commit.
---
 emacs/notmuch-show.el   | 5 +++--
 test/T455-emacs-charsets.sh | 1 -
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/emacs/notmuch-show.el b/emacs/notmuch-show.el
index 949ac09..2b225df 100644
--- a/emacs/notmuch-show.el
+++ b/emacs/notmuch-show.el
@@ -1748,11 +1748,12 @@ (defun notmuch-show-previous-open-message ()
   (notmuch-show-message-adjust))

 (defun notmuch-show-view-raw-message ()
-  "View the file holding the current message."
+  "View the original source of the current message."
   (interactive)
   (let* ((id (notmuch-show-get-message-id))
 (buf (get-buffer-create (concat "*notmuch-raw-" id "*"
-(call-process notmuch-command nil buf nil "show" "--format=raw" id)
+(let ((coding-system-for-read 'no-conversion))
+  (call-process notmuch-command nil buf nil "show" "--format=raw" id))
 (switch-to-buffer buf)
 (goto-char (point-min))
 (set-buffer-modified-p nil)
diff --git a/test/T455-emacs-charsets.sh b/test/T455-emacs-charsets.sh
index a42a1d2..3078f9c 100755
--- a/test/T455-emacs-charsets.sh
+++ b/test/T455-emacs-charsets.sh
@@ -128,7 +128,6 @@ EOF
 test_expect_equal_file OUTPUT EXPECTED

 test_begin_subtest "8bit text message are not decoded when viewing"
-test_subtest_known_broken
 test_emacs '(notmuch-show "id:test-plain-8bit at example.com")
(notmuch-show-view-raw-message)
(test-visible-output "OUTPUT.raw")'
-- 
1.9.1



[PATCH 02/11] test: New tests for Emacs charset handling

2014-04-21 Thread Austin Clements
The test of viewing 8bit messages is known-broken.  The rest pass, but
for very fragile reasons.  The next several commits will fix the
known-broken test and make our charset handling robust.
---
 test/T455-emacs-charsets.sh | 141 
 test/test-lib.el|   4 +-
 2 files changed, 144 insertions(+), 1 deletion(-)
 create mode 100755 test/T455-emacs-charsets.sh

diff --git a/test/T455-emacs-charsets.sh b/test/T455-emacs-charsets.sh
new file mode 100755
index 000..a42a1d2
--- /dev/null
+++ b/test/T455-emacs-charsets.sh
@@ -0,0 +1,141 @@
+#!/usr/bin/env bash
+
+test_description="emacs notmuch-show charset handling"
+. ./test-lib.sh
+
+
+UTF8_YEN=$'\xef\xbf\xa5'
+BIG5_YEN=$'\xa2\x44'
+
+# Add four messages with unusual encoding requirements:
+#
+# 1) text/plain in quoted-printable big5
+generate_message \
+[id]=test-plain at example.com \
+'[content-type]="text/plain; charset=big5"' \
+'[content-transfer-encoding]=quoted-printable' \
+'[body]="Yen: =A2=44"'
+
+# 2) text/plain in 8bit big5
+generate_message \
+[id]=test-plain-8bit at example.com \
+'[content-type]="text/plain; charset=big5"' \
+'[content-transfer-encoding]=8bit' \
+'[body]="Yen: '$BIG5_YEN'"'
+
+# 3) text/html in quoted-printable big5
+generate_message \
+[id]=test-html at example.com \
+'[content-type]="text/html; charset=big5"' \
+'[content-transfer-encoding]=quoted-printable' \
+'[body]="Yen: =A2=44"'
+
+# 4) application/octet-stream in quoted-printable of big5 text
+generate_message \
+[id]=test-binary at example.com \
+'[content-type]="application/octet-stream"' \
+'[content-transfer-encoding]=quoted-printable' \
+'[body]="Yen: =A2=44"'
+
+notmuch new > /dev/null
+
+# Test rendering
+
+test_begin_subtest "Text parts are decoded when rendering"
+test_emacs '(notmuch-show "id:test-plain at example.com")
+   (test-visible-output "OUTPUT.raw")'
+awk 'show {print} /^$/ {show=1}' < OUTPUT.raw > OUTPUT
+cat  OUTPUT
+cat  OUTPUT
+cat  OUTPUT
+cat  OUTPUT
+cat 

[PATCH 01/11] emacs: Remove redundant NTH argument from `notmuch-get-bodypart-content'.

2014-04-21 Thread Austin Clements
This can be derived from the PART argument (which is arguably
canonical), so there's no sense in giving the caller an extra foot
gun.
---
 emacs/notmuch-lib.el  | 9 +
 emacs/notmuch-show.el | 6 +++---
 2 files changed, 8 insertions(+), 7 deletions(-)

diff --git a/emacs/notmuch-lib.el b/emacs/notmuch-lib.el
index 2941da3..35b9065 100644
--- a/emacs/notmuch-lib.el
+++ b/emacs/notmuch-lib.el
@@ -518,9 +518,10 @@ (defun notmuch-get-bodypart-internal (query part-number 
process-crypto)
  (apply 'call-process (append (list notmuch-command nil (list t nil) 
nil) args))
  (buffer-string))

-(defun notmuch-get-bodypart-content (msg part nth process-crypto)
+(defun notmuch-get-bodypart-content (msg part process-crypto)
   (or (plist-get part :content)
-  (notmuch-get-bodypart-internal (notmuch-id-to-query (plist-get msg :id)) 
nth process-crypto)))
+  (notmuch-get-bodypart-internal (notmuch-id-to-query (plist-get msg :id))
+(plist-get part :id) process-crypto)))

 ;; Workaround: The call to `mm-display-part' below triggers a bug in
 ;; Emacs 24 if it attempts to use the shr renderer to display an HTML
@@ -536,7 +537,7 @@ (defun notmuch-get-bodypart-content (msg part nth 
process-crypto)
   (ad-disable-advice 'mm-shr 'before 'load-gnus-arts)
   (ad-activate 'mm-shr)))

-(defun notmuch-mm-display-part-inline (msg part nth content-type 
process-crypto)
+(defun notmuch-mm-display-part-inline (msg part content-type process-crypto)
   "Use the mm-decode/mm-view functions to display a part in the
 current buffer, if possible."
   (let ((display-buffer (current-buffer)))
@@ -552,7 +553,7 @@ (defun notmuch-mm-display-part-inline (msg part nth 
content-type process-crypto)
;; test whether we are able to inline it (which includes both
;; capability and suitability tests).
(when (mm-inlined-p handle)
- (insert (notmuch-get-bodypart-content msg part nth process-crypto))
+ (insert (notmuch-get-bodypart-content msg part process-crypto))
  (when (mm-inlinable-p handle)
(set-buffer display-buffer)
(mm-display-part handle)
diff --git a/emacs/notmuch-show.el b/emacs/notmuch-show.el
index df10d4b..949ac09 100644
--- a/emacs/notmuch-show.el
+++ b/emacs/notmuch-show.el
@@ -695,7 +695,7 @@ (defun notmuch-show-insert-part-text/plain (msg part 
content-type nth depth butt
   (let ((start (if button
   (button-start button)
 (point
-(insert (notmuch-get-bodypart-content msg part nth 
notmuch-show-process-crypto))
+(insert (notmuch-get-bodypart-content msg part 
notmuch-show-process-crypto))
 (save-excursion
   (save-restriction
(narrow-to-region start (point-max))
@@ -704,7 +704,7 @@ (defun notmuch-show-insert-part-text/plain (msg part 
content-type nth depth butt

 (defun notmuch-show-insert-part-text/calendar (msg part content-type nth depth 
button)
   (insert (with-temp-buffer
-   (insert (notmuch-get-bodypart-content msg part nth 
notmuch-show-process-crypto))
+   (insert (notmuch-get-bodypart-content msg part 
notmuch-show-process-crypto))
;; notmuch-get-bodypart-content provides "raw", non-converted
;; data. Replace CRLF with LF before icalendar can use it.
(goto-char (point-min))
@@ -756,7 +756,7 @@ (defun notmuch-show-insert-part-text/html (msg part 
content-type nth depth butto

 (defun notmuch-show-insert-part-*/* (msg part content-type nth depth button)
   ;; This handler _must_ succeed - it is the handler of last resort.
-  (notmuch-mm-display-part-inline msg part nth content-type 
notmuch-show-process-crypto)
+  (notmuch-mm-display-part-inline msg part content-type 
notmuch-show-process-crypto)
   t)

 ;; Functions for determining how to handle MIME parts.
-- 
1.9.1



[PATCH 00/11] Improve charset and cid: handling

2014-04-21 Thread Austin Clements
I set out to quickly add support for cid: links in the shr renderer
and wound up making our charset handling more robust and rewriting our
content-ID handling.  The test introduced in patch 2 passes in all but
one really obscure case, but only because of many unwritten and
potentially fragile assumptions that Emacs and the CLI make about each
other.

The first three patches could reasonably go in to 0.18.  The rest of
this series is certainly post-0.18, but I didn't want to lose track of
it.

This series comes in three stages.  Each depends on the earlier ones,
but each prefix makes sense on its own and could be pushed without the
later stages.

Patch 1 is a simple clean up patch.

Patches 2 through 7 robust-ify our charset handling in Emacs, mostly
by splitting the broken `notmuch-get-bodypart-content' API into
`notmuch-get-bodypart-binary' and `notmuch-get-bodypart-text' so a
caller can explicitly convey their requirements.

The remaining patches improve our content-ID handling and add support
for cid: links for shr.



[RFC PATCH] Re: excessive thread fusing

2014-04-21 Thread Austin Clements
Quoth Mark Walters on Apr 21 at  8:20 am:
> 
> >> I haven't tracked through all the logic of the existing algorithm for
> >> this case. But I don't like hearing that notmuch constructs different
> >> threads for the same messages presented in different orders. This sounds
> >> like a bug separate from what we've discussed above. 
> 
> I think I have now found this bug and it is separate from the malformed
> In-Reply-To problems.
> 
> The problem is that when we merge threads we update all the thread-ids
> of documents in the loser thread. But we don't (if I understand the code
> correctly) update dangling "metadata" references to threads which don't
> (yet) have any documents.

This exactly the problem I wrote
id:1395608456-9673-1-git-send-email-amdragon at mit.edu to test, but I
had convinced myself everything was okay because we link a message to
both its parents and all of its children.  But that's only true if you
eventually receive the linking message (which in the test I made, you
do).  In this case, you never receive the linking message, so even
though notmuch has enough information to bring the two threads
together, it doesn't.

Maybe I should create a second variant of that test where all of the
messages reference their entire heritage (rather than just their
immediate parent) and test that they're *always* in one thread
regardless of receipt order (rather than only checking once they've
all been received)?  I think that would weed out this case.

> To make this explicit consider the 2 messages 17,18 in the set. 
> 
> Message 17 has id <87wrkidfrh.fsf at pinto.chemeng.ucl.ac.uk> and has no
> references/in-reply-to so has no parents.
> 
> Message 18 has a reference to <87wrkidfrh.fsf at pinto.chemeng.ucl.ac.uk>
> and an in-reply-to to  and
> <87wrkidfrh.fsf at pinto.chemeng.ucl.ac.uk>
> 
> If you add 17 first then it gets thread-id 1 and then when you add 18 message 
> 18 gets
> thread-id 2 as does the dangling link to the "unseen" message
> e.fraga at ucl.ac.uk, and then message 17 is moved to thread-id 2.
> 
> However, if you add 18 first then it gets thread-id 1 and the dangling
> link gets id 1. When 17 is added it gets thread-id 2, message 18 gets
> thread-id updated to 2 *but* the dangling link to e.fraga at ucl.ac.uk does
> not get updated so stays thread-id 1.
> 
> In particular when 52 comes along with its link to e.fraga at ucl.ac.uk
> then:
> 
> In the first case it gets put in thread-id 3 and the other two
> messages get moved into thread 3.
> 
> In the second case, message 52 gets put in thread 3 and thread 1
> (the dangling link) gets moved into thread 3 leaving thread 2
> (containing messages 17 and 18) untouched.

So, there's an obvious, messy way to fix this: update the metadata
references when we do thread renumbering.  This is messy because that
data *isn't indexed*.  The only way to find the records we need to
update is to scan them all.  This isn't completely terrible because
it's a sequential scan and we could cache it in memory, but it
certainly isn't going to help notmuch new's performance.  (My database
has 6,749 of these, which takes ~1 second to scan on a cold cache,
though that's with an SSD [1]).


But let me propose an idea I've been kicking around for a while: ghost
message documents.  Rather than using user metadata for tracking these
missing messages, use regular documents with the exact same terms we
use now for message IDs and thread IDs, but with a Tghost term instead
of a Tmail term to distinguish their type.  This solves the problem
using infrastructure we already have in place, simplifies the message
linking code, and may even make it faster.  It's a schema update, but
a simple and fast one.  I think the hardest part is that things like
notmuch_database_find_message would need to distinguish ghosts and
regular messages (which may require pre-fetching the Tghost or Tmail
posting list to do efficiently).

This also sets us up to do some cool things in the future, though
they're more invasive.  If we have message-like documents for these
ghosts, we can store other message-like metadata as well.  If we store
tags on them, then we can keep tags around for deleted messages and
*reapply them* if the message comes back.  This would finally fix the
races we have now where, if a message is renamed or moved during a
notmuch new, we may think it's deleted only to reindex it with default
tags on the next run.  We could also pre-tag messages that haven't
been indexed yet, say from procmail or when sending a message.  This
could simplify or even obviate notmuch insert.  If we add message
ctimes as proposed by Dave Mazi?res, this would give us a place to
store and query ctimes of deleted messages (otherwise it's unclear how
you find out about deletions without a full database scan).  In
effect, the database becomes truly monotonic.

[1] Curious?
yes n | xapian-inspect postlist.DB | \
awk '!/Key/ {next} /Key: 

[RFC PATCH] Re: excessive thread fusing

2014-04-21 Thread Mark Walters

>> I haven't tracked through all the logic of the existing algorithm for
>> this case. But I don't like hearing that notmuch constructs different
>> threads for the same messages presented in different orders. This sounds
>> like a bug separate from what we've discussed above. 

I think I have now found this bug and it is separate from the malformed
In-Reply-To problems.

The problem is that when we merge threads we update all the thread-ids
of documents in the loser thread. But we don't (if I understand the code
correctly) update dangling "metadata" references to threads which don't
(yet) have any documents.

To make this explicit consider the 2 messages 17,18 in the set. 

Message 17 has id <87wrkidfrh.fsf at pinto.chemeng.ucl.ac.uk> and has no
references/in-reply-to so has no parents.

Message 18 has a reference to <87wrkidfrh.fsf at pinto.chemeng.ucl.ac.uk>
and an in-reply-to to  and
<87wrkidfrh.fsf at pinto.chemeng.ucl.ac.uk>

If you add 17 first then it gets thread-id 1 and then when you add 18 message 
18 gets
thread-id 2 as does the dangling link to the "unseen" message
e.fraga at ucl.ac.uk, and then message 17 is moved to thread-id 2.

However, if you add 18 first then it gets thread-id 1 and the dangling
link gets id 1. When 17 is added it gets thread-id 2, message 18 gets
thread-id updated to 2 *but* the dangling link to e.fraga at ucl.ac.uk does
not get updated so stays thread-id 1.

In particular when 52 comes along with its link to e.fraga at ucl.ac.uk
then:

In the first case it gets put in thread-id 3 and the other two
messages get moved into thread 3.

In the second case, message 52 gets put in thread 3 and thread 1
(the dangling link) gets moved into thread 3 leaving thread 2
(containing messages 17 and 18) untouched.

Best wishes

Mark







[PATCH 0/7] doc: Python 3 compat, rst2man.py support, etc.

2014-04-21 Thread David Bremner
Tomi Ollila  writes:

> In this series IMO the patches 1-4:
>
> id:8d518408f2da8bc96ae3123f05791142da26b9bc.1396718720.git.wking at tremily.us
> id:543aee63407956e60f85dc11a2d25855e98c10c3.1396718720.git.wking at tremily.us
> id:5e4509ab08699afe2681110fb35075e1d0bbdc7e.1396718720.git.wking at tremily.us
> id:c5ec510ac25c867ad600c475a0070a003440a4b8.1396718720.git.wking at tremily.us
>
> could go in as those are. 5:
>
> id:adce76bb9a0ca728d856da4ecaf6b282e22e7440.1396718720.git.wking at tremily.us
>
> if, for consistency reason (we don't use absolute paths with other commands
> either), rst2man/rst2man.py is used as is (and commit message adjusted
> accordingly).

I've queued 1-4 for merging. Any patches that might break the build
(e.g. 5 and 6 in this series) have to go in pretty quick if they are to
be in 0.18; patch 7 we can sort out during the freeze.

I'm not sure I completely understand the state of the discussion around
patch 5. Personally I don't like either undefined or empty RST2MAN as a
boolean a priori. I'd rather keep HAVE_RST2MAN for consistency.

d


[PATCH] doc: make notmuch-new summary line more generic

2014-04-21 Thread David Bremner
David Bremner  writes:

> Since 'notmuch new' now takes multiple options, it's confusing to show
> only one of them in the summary.

pushed,

d


[PATCH v2] NEWS for displaying tag changes

2014-04-21 Thread David Bremner
Mark Walters  writes:

> ---
> Fixed the markdown and the bracket error.
pushed.

d


Re: [RFC PATCH] Re: excessive thread fusing

2014-04-21 Thread Mark Walters

 I haven't tracked through all the logic of the existing algorithm for
 this case. But I don't like hearing that notmuch constructs different
 threads for the same messages presented in different orders. This sounds
 like a bug separate from what we've discussed above. 

I think I have now found this bug and it is separate from the malformed
In-Reply-To problems.

The problem is that when we merge threads we update all the thread-ids
of documents in the loser thread. But we don't (if I understand the code
correctly) update dangling metadata references to threads which don't
(yet) have any documents.

To make this explicit consider the 2 messages 17,18 in the set. 

Message 17 has id 87wrkidfrh@pinto.chemeng.ucl.ac.uk and has no
references/in-reply-to so has no parents.

Message 18 has a reference to 87wrkidfrh@pinto.chemeng.ucl.ac.uk
and an in-reply-to to e.fr...@ucl.ac.uk and
87wrkidfrh@pinto.chemeng.ucl.ac.uk

If you add 17 first then it gets thread-id 1 and then when you add 18 message 
18 gets
thread-id 2 as does the dangling link to the unseen message
e.fr...@ucl.ac.uk, and then message 17 is moved to thread-id 2.

However, if you add 18 first then it gets thread-id 1 and the dangling
link gets id 1. When 17 is added it gets thread-id 2, message 18 gets
thread-id updated to 2 *but* the dangling link to e.fr...@ucl.ac.uk does
not get updated so stays thread-id 1.

In particular when 52 comes along with its link to e.fr...@ucl.ac.uk
then:

In the first case it gets put in thread-id 3 and the other two
messages get moved into thread 3.

In the second case, message 52 gets put in thread 3 and thread 1
(the dangling link) gets moved into thread 3 leaving thread 2
(containing messages 17 and 18) untouched.

Best wishes

Mark





___
notmuch mailing list
notmuch@notmuchmail.org
http://notmuchmail.org/mailman/listinfo/notmuch


Re: [PATCH v2 0/4] doc: notmuch-show improvements

2014-04-21 Thread David Bremner
Austin Clements amdra...@mit.edu writes:

 This is v2 of id:1397834332-25175-1-git-send-email-amdra...@mit.edu.
 It expands the explanation of non-MIME message parts and moves it to
 the --part documentation.

pushed.

d
___
notmuch mailing list
notmuch@notmuchmail.org
http://notmuchmail.org/mailman/listinfo/notmuch


Re: [PATCH 1/1] emacs: hello: bugfix for saved searches defcustom

2014-04-21 Thread David Bremner
Mark Walters markwalters1...@gmail.com writes:

 The recent changes for saved searches introduced a bug when notmuch
 was loaded after the saved search was defined. This was caused by a
 utility function not being defined when the defcustom was loaded.

pushed.

d
___
notmuch mailing list
notmuch@notmuchmail.org
http://notmuchmail.org/mailman/listinfo/notmuch


Re: [PATCH 0/7] doc: Python 3 compat, rst2man.py support, etc.

2014-04-21 Thread David Bremner
W. Trevor King wk...@tremily.us writes:

 I just bumped into this today while testing v2 of my
 content-description series:


pushed patches 1-4

d
___
notmuch mailing list
notmuch@notmuchmail.org
http://notmuchmail.org/mailman/listinfo/notmuch


Re: [RFC PATCH] Re: excessive thread fusing

2014-04-21 Thread Austin Clements
Quoth Mark Walters on Apr 21 at  8:20 am:
 
  I haven't tracked through all the logic of the existing algorithm for
  this case. But I don't like hearing that notmuch constructs different
  threads for the same messages presented in different orders. This sounds
  like a bug separate from what we've discussed above. 
 
 I think I have now found this bug and it is separate from the malformed
 In-Reply-To problems.
 
 The problem is that when we merge threads we update all the thread-ids
 of documents in the loser thread. But we don't (if I understand the code
 correctly) update dangling metadata references to threads which don't
 (yet) have any documents.

This exactly the problem I wrote
id:1395608456-9673-1-git-send-email-amdra...@mit.edu to test, but I
had convinced myself everything was okay because we link a message to
both its parents and all of its children.  But that's only true if you
eventually receive the linking message (which in the test I made, you
do).  In this case, you never receive the linking message, so even
though notmuch has enough information to bring the two threads
together, it doesn't.

Maybe I should create a second variant of that test where all of the
messages reference their entire heritage (rather than just their
immediate parent) and test that they're *always* in one thread
regardless of receipt order (rather than only checking once they've
all been received)?  I think that would weed out this case.

 To make this explicit consider the 2 messages 17,18 in the set. 
 
 Message 17 has id 87wrkidfrh@pinto.chemeng.ucl.ac.uk and has no
 references/in-reply-to so has no parents.
 
 Message 18 has a reference to 87wrkidfrh@pinto.chemeng.ucl.ac.uk
 and an in-reply-to to e.fr...@ucl.ac.uk and
 87wrkidfrh@pinto.chemeng.ucl.ac.uk
 
 If you add 17 first then it gets thread-id 1 and then when you add 18 message 
 18 gets
 thread-id 2 as does the dangling link to the unseen message
 e.fr...@ucl.ac.uk, and then message 17 is moved to thread-id 2.
 
 However, if you add 18 first then it gets thread-id 1 and the dangling
 link gets id 1. When 17 is added it gets thread-id 2, message 18 gets
 thread-id updated to 2 *but* the dangling link to e.fr...@ucl.ac.uk does
 not get updated so stays thread-id 1.
 
 In particular when 52 comes along with its link to e.fr...@ucl.ac.uk
 then:
 
 In the first case it gets put in thread-id 3 and the other two
 messages get moved into thread 3.
 
 In the second case, message 52 gets put in thread 3 and thread 1
 (the dangling link) gets moved into thread 3 leaving thread 2
 (containing messages 17 and 18) untouched.

So, there's an obvious, messy way to fix this: update the metadata
references when we do thread renumbering.  This is messy because that
data *isn't indexed*.  The only way to find the records we need to
update is to scan them all.  This isn't completely terrible because
it's a sequential scan and we could cache it in memory, but it
certainly isn't going to help notmuch new's performance.  (My database
has 6,749 of these, which takes ~1 second to scan on a cold cache,
though that's with an SSD [1]).


But let me propose an idea I've been kicking around for a while: ghost
message documents.  Rather than using user metadata for tracking these
missing messages, use regular documents with the exact same terms we
use now for message IDs and thread IDs, but with a Tghost term instead
of a Tmail term to distinguish their type.  This solves the problem
using infrastructure we already have in place, simplifies the message
linking code, and may even make it faster.  It's a schema update, but
a simple and fast one.  I think the hardest part is that things like
notmuch_database_find_message would need to distinguish ghosts and
regular messages (which may require pre-fetching the Tghost or Tmail
posting list to do efficiently).

This also sets us up to do some cool things in the future, though
they're more invasive.  If we have message-like documents for these
ghosts, we can store other message-like metadata as well.  If we store
tags on them, then we can keep tags around for deleted messages and
*reapply them* if the message comes back.  This would finally fix the
races we have now where, if a message is renamed or moved during a
notmuch new, we may think it's deleted only to reindex it with default
tags on the next run.  We could also pre-tag messages that haven't
been indexed yet, say from procmail or when sending a message.  This
could simplify or even obviate notmuch insert.  If we add message
ctimes as proposed by Dave Mazières, this would give us a place to
store and query ctimes of deleted messages (otherwise it's unclear how
you find out about deletions without a full database scan).  In
effect, the database becomes truly monotonic.

[1] Curious?
yes n | xapian-inspect postlist.DB | \
awk '!/Key/ {next} /Key: \\x00\\xc0thread_id_/ {N++} /Key: \\x00\\xd0/ 
{exit} END {print N}'

Re: [RFC PATCH] Re: excessive thread fusing

2014-04-21 Thread Carl Worth
Austin Clements amdra...@mit.edu writes:
 But let me propose an idea I've been kicking around for a while: ghost
 message documents.  Rather than using user metadata for tracking these
 missing messages, use regular documents with the exact same terms we
 use now for message IDs and thread IDs, but with a Tghost term instead
 of a Tmail term to distinguish their type.

For what it's worth. I like this proposed fix very much. It's obviously
what I should have done in the first place.

 In effect, the database becomes truly monotonic.

Yes. All of these benefits are very compelling.

 [1] Curious?
 yes n | xapian-inspect postlist.DB | \
 awk '!/Key/ {next} /Key: \\x00\\xc0thread_id_/ {N++} /Key: \\x00\\xd0/ 
 {exit} END {print N}'

16725 for me.

-Carl



pgpuS8ZLLxizB.pgp
Description: PGP signature
___
notmuch mailing list
notmuch@notmuchmail.org
http://notmuchmail.org/mailman/listinfo/notmuch


[PATCH 03/11] emacs: Fix coding system in `notmuch-show-view-raw-message'

2014-04-21 Thread Austin Clements
This fixes the known-broken test of viewing 8bit messages added by the
previous commit.
---
 emacs/notmuch-show.el   | 5 +++--
 test/T455-emacs-charsets.sh | 1 -
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/emacs/notmuch-show.el b/emacs/notmuch-show.el
index 949ac09..2b225df 100644
--- a/emacs/notmuch-show.el
+++ b/emacs/notmuch-show.el
@@ -1748,11 +1748,12 @@ (defun notmuch-show-previous-open-message ()
   (notmuch-show-message-adjust))
 
 (defun notmuch-show-view-raw-message ()
-  View the file holding the current message.
+  View the original source of the current message.
   (interactive)
   (let* ((id (notmuch-show-get-message-id))
 (buf (get-buffer-create (concat *notmuch-raw- id *
-(call-process notmuch-command nil buf nil show --format=raw id)
+(let ((coding-system-for-read 'no-conversion))
+  (call-process notmuch-command nil buf nil show --format=raw id))
 (switch-to-buffer buf)
 (goto-char (point-min))
 (set-buffer-modified-p nil)
diff --git a/test/T455-emacs-charsets.sh b/test/T455-emacs-charsets.sh
index a42a1d2..3078f9c 100755
--- a/test/T455-emacs-charsets.sh
+++ b/test/T455-emacs-charsets.sh
@@ -128,7 +128,6 @@ EOF
 test_expect_equal_file OUTPUT EXPECTED
 
 test_begin_subtest 8bit text message are not decoded when viewing
-test_subtest_known_broken
 test_emacs '(notmuch-show id:test-plain-8...@example.com)
(notmuch-show-view-raw-message)
(test-visible-output OUTPUT.raw)'
-- 
1.9.1

___
notmuch mailing list
notmuch@notmuchmail.org
http://notmuchmail.org/mailman/listinfo/notmuch


[PATCH 01/11] emacs: Remove redundant NTH argument from `notmuch-get-bodypart-content'.

2014-04-21 Thread Austin Clements
This can be derived from the PART argument (which is arguably
canonical), so there's no sense in giving the caller an extra foot
gun.
---
 emacs/notmuch-lib.el  | 9 +
 emacs/notmuch-show.el | 6 +++---
 2 files changed, 8 insertions(+), 7 deletions(-)

diff --git a/emacs/notmuch-lib.el b/emacs/notmuch-lib.el
index 2941da3..35b9065 100644
--- a/emacs/notmuch-lib.el
+++ b/emacs/notmuch-lib.el
@@ -518,9 +518,10 @@ (defun notmuch-get-bodypart-internal (query part-number 
process-crypto)
  (apply 'call-process (append (list notmuch-command nil (list t nil) 
nil) args))
  (buffer-string))
 
-(defun notmuch-get-bodypart-content (msg part nth process-crypto)
+(defun notmuch-get-bodypart-content (msg part process-crypto)
   (or (plist-get part :content)
-  (notmuch-get-bodypart-internal (notmuch-id-to-query (plist-get msg :id)) 
nth process-crypto)))
+  (notmuch-get-bodypart-internal (notmuch-id-to-query (plist-get msg :id))
+(plist-get part :id) process-crypto)))
 
 ;; Workaround: The call to `mm-display-part' below triggers a bug in
 ;; Emacs 24 if it attempts to use the shr renderer to display an HTML
@@ -536,7 +537,7 @@ (defun notmuch-get-bodypart-content (msg part nth 
process-crypto)
   (ad-disable-advice 'mm-shr 'before 'load-gnus-arts)
   (ad-activate 'mm-shr)))
 
-(defun notmuch-mm-display-part-inline (msg part nth content-type 
process-crypto)
+(defun notmuch-mm-display-part-inline (msg part content-type process-crypto)
   Use the mm-decode/mm-view functions to display a part in the
 current buffer, if possible.
   (let ((display-buffer (current-buffer)))
@@ -552,7 +553,7 @@ (defun notmuch-mm-display-part-inline (msg part nth 
content-type process-crypto)
;; test whether we are able to inline it (which includes both
;; capability and suitability tests).
(when (mm-inlined-p handle)
- (insert (notmuch-get-bodypart-content msg part nth process-crypto))
+ (insert (notmuch-get-bodypart-content msg part process-crypto))
  (when (mm-inlinable-p handle)
(set-buffer display-buffer)
(mm-display-part handle)
diff --git a/emacs/notmuch-show.el b/emacs/notmuch-show.el
index df10d4b..949ac09 100644
--- a/emacs/notmuch-show.el
+++ b/emacs/notmuch-show.el
@@ -695,7 +695,7 @@ (defun notmuch-show-insert-part-text/plain (msg part 
content-type nth depth butt
   (let ((start (if button
   (button-start button)
 (point
-(insert (notmuch-get-bodypart-content msg part nth 
notmuch-show-process-crypto))
+(insert (notmuch-get-bodypart-content msg part 
notmuch-show-process-crypto))
 (save-excursion
   (save-restriction
(narrow-to-region start (point-max))
@@ -704,7 +704,7 @@ (defun notmuch-show-insert-part-text/plain (msg part 
content-type nth depth butt
 
 (defun notmuch-show-insert-part-text/calendar (msg part content-type nth depth 
button)
   (insert (with-temp-buffer
-   (insert (notmuch-get-bodypart-content msg part nth 
notmuch-show-process-crypto))
+   (insert (notmuch-get-bodypart-content msg part 
notmuch-show-process-crypto))
;; notmuch-get-bodypart-content provides raw, non-converted
;; data. Replace CRLF with LF before icalendar can use it.
(goto-char (point-min))
@@ -756,7 +756,7 @@ (defun notmuch-show-insert-part-text/html (msg part 
content-type nth depth butto
 
 (defun notmuch-show-insert-part-*/* (msg part content-type nth depth button)
   ;; This handler _must_ succeed - it is the handler of last resort.
-  (notmuch-mm-display-part-inline msg part nth content-type 
notmuch-show-process-crypto)
+  (notmuch-mm-display-part-inline msg part content-type 
notmuch-show-process-crypto)
   t)
 
 ;; Functions for determining how to handle MIME parts.
-- 
1.9.1

___
notmuch mailing list
notmuch@notmuchmail.org
http://notmuchmail.org/mailman/listinfo/notmuch


[PATCH 00/11] Improve charset and cid: handling

2014-04-21 Thread Austin Clements
I set out to quickly add support for cid: links in the shr renderer
and wound up making our charset handling more robust and rewriting our
content-ID handling.  The test introduced in patch 2 passes in all but
one really obscure case, but only because of many unwritten and
potentially fragile assumptions that Emacs and the CLI make about each
other.

The first three patches could reasonably go in to 0.18.  The rest of
this series is certainly post-0.18, but I didn't want to lose track of
it.

This series comes in three stages.  Each depends on the earlier ones,
but each prefix makes sense on its own and could be pushed without the
later stages.

Patch 1 is a simple clean up patch.

Patches 2 through 7 robust-ify our charset handling in Emacs, mostly
by splitting the broken `notmuch-get-bodypart-content' API into
`notmuch-get-bodypart-binary' and `notmuch-get-bodypart-text' so a
caller can explicitly convey their requirements.

The remaining patches improve our content-ID handling and add support
for cid: links for shr.

___
notmuch mailing list
notmuch@notmuchmail.org
http://notmuchmail.org/mailman/listinfo/notmuch


[PATCH 09/11] emacs: Use generalized content caching in w3m CID code

2014-04-21 Thread Austin Clements
Previously this did its own caching, but this is now supported by more
generally by `notmuch-get-bodypart-binary'.
---
 emacs/notmuch-show.el | 23 ---
 1 file changed, 8 insertions(+), 15 deletions(-)

diff --git a/emacs/notmuch-show.el b/emacs/notmuch-show.el
index 537c558..9411c9a 100644
--- a/emacs/notmuch-show.el
+++ b/emacs/notmuch-show.el
@@ -540,15 +540,14 @@ (defvar w3m-current-buffer) ;; From `w3m.el'.
 (defvar notmuch-show-w3m-cid-store nil)
 (make-variable-buffer-local 'notmuch-show-w3m-cid-store)
 
-(defun notmuch-show-w3m-cid-store-internal (content-id msg part content)
-  (push (list content-id msg part content)
-   notmuch-show-w3m-cid-store))
+(defun notmuch-show-w3m-cid-store-internal (content-id msg part)
+  (push (list content-id msg part) notmuch-show-w3m-cid-store))
 
 (defun notmuch-show-w3m-cid-store (msg part)
   (let ((content-id (plist-get part :content-id)))
 (when content-id
   (notmuch-show-w3m-cid-store-internal (concat cid: content-id)
-  msg part nil
+  msg part
 
 (defun notmuch-show-w3m-cid-retrieve (url rest args)
   (let ((matching-part (with-current-buffer w3m-current-buffer
@@ -556,18 +555,12 @@ (defun notmuch-show-w3m-cid-retrieve (url rest args)
 (if matching-part
(let* ((msg (nth 1 matching-part))
   (part (nth 2 matching-part))
-  (content (nth 3 matching-part))
   (content-type (plist-get part :content-type)))
- ;; If we don't already have the content, get it and cache
- ;; it, as some messages reference the same cid: part many
- ;; times (hundreds!), which results in many calls to
- ;; `notmuch part'.
- (unless content
-   (setq content (notmuch-get-bodypart-binary
-  msg part notmuch-show-process-crypto))
-   (with-current-buffer w3m-current-buffer
- (notmuch-show-w3m-cid-store-internal url msg part content)))
- (insert content)
+ ;; Request content caching, as some messages reference the
+ ;; same cid: part many times (hundreds!), which results in
+ ;; many calls to `notmuch show'.
+ (insert (notmuch-get-bodypart-binary
+  msg part notmuch-show-process-crypto 'cache))
  content-type)
   nil)))
 
-- 
1.9.1

___
notmuch mailing list
notmuch@notmuchmail.org
http://notmuchmail.org/mailman/listinfo/notmuch


[PATCH 08/11] emacs: Support caching in notmuch-get-bodypart-{binary, text}

2014-04-21 Thread Austin Clements
(The actual code change here is small, but requires re-indenting
existing code.)
---
 emacs/notmuch-lib.el | 52 ++--
 1 file changed, 30 insertions(+), 22 deletions(-)

diff --git a/emacs/notmuch-lib.el b/emacs/notmuch-lib.el
index fc67b14..fee8512 100644
--- a/emacs/notmuch-lib.el
+++ b/emacs/notmuch-lib.el
@@ -503,33 +503,39 @@ (defun notmuch-parts-filter-by-type (parts type)
(lambda (part) (notmuch-match-content-type (plist-get part :content-type) 
type))
parts))
 
-(defun notmuch-get-bodypart-binary (msg part process-crypto)
+(defun notmuch-get-bodypart-binary (msg part process-crypto optional cache)
   Return the unprocessed content of PART in MSG as a unibyte string.
 
 This returns the \raw\ content of the given part after content
 transfer decoding, but with no further processing (see the
 discussion of --format=raw in man notmuch-show).  In particular,
 this does no charset conversion.
-  (let ((args `(show --format=raw
-   ,(format --part=%d (plist-get part :id))
-   ,@(when process-crypto '(--decrypt))
-   ,(notmuch-id-to-query (plist-get msg :id)
-(with-temp-buffer
-  ;; Emacs internally uses a UTF-8-like multibyte string
-  ;; representation by default (regardless of the coding system,
-  ;; which only affects how it goes from outside data to this
-  ;; internal representation).  This *almost* never matters.
-  ;; Annoyingly, it does matter if we use this data in an image
-  ;; descriptor, since Emacs will use its internal data buffer
-  ;; directly and this multibyte representation corrupts binary
-  ;; image formats.  Since the caller is asking for binary data, a
-  ;; unibyte string is a more appropriate representation anyway.
-  (set-buffer-multibyte nil)
-  (let ((coding-system-for-read 'no-conversion))
-   (apply #'call-process notmuch-command nil '(t nil) nil args)
-   (buffer-string)
-
-(defun notmuch-get-bodypart-text (msg part process-crypto)
+  (let ((data (plist-get part :binary-content)))
+(when (not data)
+  (let ((args `(show --format=raw
+   ,(format --part=%d (plist-get part :id))
+   ,@(when process-crypto '(--decrypt))
+   ,(notmuch-id-to-query (plist-get msg :id)
+   (with-temp-buffer
+ ;; Emacs internally uses a UTF-8-like multibyte string
+ ;; representation by default (regardless of the coding
+ ;; system, which only affects how it goes from outside data
+ ;; to this internal representation).  This *almost* never
+ ;; matters.  Annoyingly, it does matter if we use this data
+ ;; in an image descriptor, since Emacs will use its internal
+ ;; data buffer directly and this multibyte representation
+ ;; corrupts binary image formats.  Since the caller is
+ ;; asking for binary data, a unibyte string is a more
+ ;; appropriate representation anyway.
+ (set-buffer-multibyte nil)
+ (let ((coding-system-for-read 'no-conversion))
+   (apply #'call-process notmuch-command nil '(t nil) nil args)
+   (setq data (buffer-string)
+  (when cache
+   (plist-put part :binary-content data)))
+data))
+
+(defun notmuch-get-bodypart-text (msg part process-crypto optional cache)
   Return the text content of PART in MSG.
 
 This returns the content of the given part as a multibyte Lisp
@@ -546,7 +552,9 @@ (defun notmuch-get-bodypart-text (msg part process-crypto)
 (npart (apply #'notmuch-call-notmuch-sexp args)))
(setq content (plist-get npart :content))
(when (not content)
- (error Internal error: No :content from %S args
+ (error Internal error: No :content from %S args)))
+  (when cache
+   (plist-put part :content content)))
 content))
 
 ;; Workaround: The call to `mm-display-part' below triggers a bug in
-- 
1.9.1

___
notmuch mailing list
notmuch@notmuchmail.org
http://notmuchmail.org/mailman/listinfo/notmuch


[PATCH 06/11] emacs: Remove broken `notmuch-get-bodypart-content' API

2014-04-21 Thread Austin Clements
`notmuch-get-bodypart-content' could do two very different things,
depending on conditions: for text/* parts other than text/html, it
would return the part content as a multibyte Lisp string *after*
charset conversion, while for other parts (including text/html), it
would return binary part content without charset conversion.

This commit completes the split of `notmuch-get-bodypart-content' into
two different and explicit APIs: `notmuch-get-bodypart-binary' and
`notmuch-get-bodypart-text'.  It updates all callers to use one or the
other depending on what's appropriate.
---
 emacs/notmuch-lib.el  | 37 -
 emacs/notmuch-show.el |  8 
 2 files changed, 32 insertions(+), 13 deletions(-)

diff --git a/emacs/notmuch-lib.el b/emacs/notmuch-lib.el
index 4711098..ece29c3 100644
--- a/emacs/notmuch-lib.el
+++ b/emacs/notmuch-lib.el
@@ -519,9 +519,25 @@ (defun notmuch-get-bodypart-binary (msg part 
process-crypto)
(apply #'call-process notmuch-command nil '(t nil) nil args)
(buffer-string)
 
-(defun notmuch-get-bodypart-content (msg part process-crypto)
-  (or (plist-get part :content)
-  (notmuch-get-bodypart-binary msg part process-crypto)))
+(defun notmuch-get-bodypart-text (msg part process-crypto)
+  Return the text content of PART in MSG.
+
+This returns the content of the given part as a multibyte Lisp
+string after performing content transfer decoding and any
+necessary charset decoding.  It is an error to use this for
+non-text/* parts.
+  (let ((content (plist-get part :content)))
+(when (not content)
+  ;; Use show --format=sexp to fetch decoded content
+  (let* ((args `(show --format=sexp --include-html
+,(format --part=%s (plist-get part :id))
+,@(when process-crypto '(--decrypt))
+,(notmuch-id-to-query (plist-get msg :id
+(npart (apply #'notmuch-call-notmuch-sexp args)))
+   (setq content (plist-get npart :content))
+   (when (not content)
+ (error Internal error: No :content from %S args
+content))
 
 ;; Workaround: The call to `mm-display-part' below triggers a bug in
 ;; Emacs 24 if it attempts to use the shr renderer to display an HTML
@@ -542,18 +558,21 @@ (defun notmuch-mm-display-part-inline (msg part 
content-type process-crypto)
 current buffer, if possible.
   (let ((display-buffer (current-buffer)))
 (with-temp-buffer
-  ;; In case there is :content, the content string is already converted
-  ;; into emacs internal format. `gnus-decoded' is a fake charset,
-  ;; which means no further decoding (to be done by mm- functions).
-  (let* ((charset (if (plist-member part :content)
- 'gnus-decoded
+  ;; In case we already have :content, use it and tell mm-* that
+  ;; it's already been charset-decoded by using the fake
+  ;; `gnus-decoded' charset.  Otherwise, we'll fetch the binary
+  ;; part content and let mm-* decode it.
+  (let* ((have-content (plist-member part :content))
+(charset (if have-content 'gnus-decoded
(plist-get part :content-charset)))
 (handle (mm-make-handle (current-buffer) `(,content-type (charset 
. ,charset)
;; If the user wants the part inlined, insert the content and
;; test whether we are able to inline it (which includes both
;; capability and suitability tests).
(when (mm-inlined-p handle)
- (insert (notmuch-get-bodypart-content msg part process-crypto))
+ (if have-content
+ (insert (notmuch-get-bodypart-text msg part process-crypto))
+   (insert (notmuch-get-bodypart-binary msg part process-crypto)))
  (when (mm-inlinable-p handle)
(set-buffer display-buffer)
(mm-display-part handle)
diff --git a/emacs/notmuch-show.el b/emacs/notmuch-show.el
index e9867fd..537c558 100644
--- a/emacs/notmuch-show.el
+++ b/emacs/notmuch-show.el
@@ -680,7 +680,7 @@ (defun notmuch-show-insert-part-text/plain (msg part 
content-type nth depth butt
   (let ((start (if button
   (button-start button)
 (point
-(insert (notmuch-get-bodypart-content msg part 
notmuch-show-process-crypto))
+(insert (notmuch-get-bodypart-text msg part notmuch-show-process-crypto))
 (save-excursion
   (save-restriction
(narrow-to-region start (point-max))
@@ -689,9 +689,9 @@ (defun notmuch-show-insert-part-text/plain (msg part 
content-type nth depth butt
 
 (defun notmuch-show-insert-part-text/calendar (msg part content-type nth depth 
button)
   (insert (with-temp-buffer
-   (insert (notmuch-get-bodypart-content msg part 
notmuch-show-process-crypto))
-   ;; notmuch-get-bodypart-content provides raw, non-converted
-   ;; data. Replace CRLF with LF before icalendar can use it.
+   (insert (notmuch-get-bodypart-text msg part 

[PATCH 11/11] emacs: Support cid: references with shr renderer

2014-04-21 Thread Austin Clements
shr has really nice support for inline image rendering, but previously
we only had the hooks for w3m cid: references.
---
 emacs/notmuch-show.el | 41 +
 1 file changed, 33 insertions(+), 8 deletions(-)

diff --git a/emacs/notmuch-show.el b/emacs/notmuch-show.el
index f758091..a489279 100644
--- a/emacs/notmuch-show.el
+++ b/emacs/notmuch-show.el
@@ -745,14 +745,39 @@ (defun 
notmuch-show-get-mime-type-of-application/octet-stream (part)
  nil
 
 (defun notmuch-show-insert-part-text/html (msg part content-type nth depth 
button)
-  ;; text/html handler to work around bugs in renderers and our
-  ;; invisibile parts code. In particular w3m sets up a keymap which
-  ;; leaks outside the invisible region and causes strange effects
-  ;; in notmuch. We set mm-inline-text-html-with-w3m-keymap to nil to
-  ;; tell w3m not to set a keymap (so the normal notmuch-show-mode-map
-  ;; remains).
-  (let ((mm-inline-text-html-with-w3m-keymap nil))
-(notmuch-show-insert-part-*/* msg part content-type nth depth button)))
+  (if (eq mm-text-html-renderer 'shr)
+  ;; It's easier to drive shr ourselves than to work around the
+  ;; goofy things `mm-shr' does (like irreversibly taking over
+  ;; content ID handling).
+  (notmuch-show--insert-part-text/html-shr msg part)
+;; Otherwise, let message-mode do the heavy lifting
+;;
+;; w3m sets up a keymap which leaks outside the invisible region
+;; and causes strange effects in notmuch. We set
+;; mm-inline-text-html-with-w3m-keymap to nil to tell w3m not to
+;; set a keymap (so the normal notmuch-show-mode-map remains).
+(let ((mm-inline-text-html-with-w3m-keymap nil))
+  (notmuch-show-insert-part-*/* msg part content-type nth depth button
+
+(defun notmuch-show--insert-part-text/html-shr (msg part)
+  ;; Make sure shr is loaded before we start let-binding its globals
+  (require 'shr)
+  (let ((dom (let (process-crypto notmuch-show-process-crypto)
+  (with-temp-buffer
+(insert (notmuch-get-bodypart-text msg part process-crypto))
+(libxml-parse-html-region (point-min) (point-max)
+   (shr-content-function
+(lambda (url)
+  ;; shr strips the cid: part of URL, but doesn't
+  ;; URL-decode it (see RFC 2392).
+  (let ((cid (url-unhex-string url)))
+(first (notmuch-show--get-cid-content cid)
+   ;; Block all external images to prevent privacy leaks and
+   ;; potential attacks.  FIXME: If we block an image, offer a
+   ;; button to load external images.
+   (shr-blocked-images .))
+(shr-insert-document dom)
+t))
 
 (defun notmuch-show-insert-part-*/* (msg part content-type nth depth button)
   ;; This handler _must_ succeed - it is the handler of last resort.
-- 
1.9.1

___
notmuch mailing list
notmuch@notmuchmail.org
http://notmuchmail.org/mailman/listinfo/notmuch


[PATCH 04/11] emacs: Track full message and part descriptor in w3m CID store

2014-04-21 Thread Austin Clements
This will simplify later changes.
---
 emacs/notmuch-show.el | 33 ++---
 1 file changed, 10 insertions(+), 23 deletions(-)

diff --git a/emacs/notmuch-show.el b/emacs/notmuch-show.el
index 2b225df..455cfee 100644
--- a/emacs/notmuch-show.el
+++ b/emacs/notmuch-show.el
@@ -540,35 +540,26 @@ (defvar w3m-current-buffer) ;; From `w3m.el'.
 (defvar notmuch-show-w3m-cid-store nil)
 (make-variable-buffer-local 'notmuch-show-w3m-cid-store)
 
-(defun notmuch-show-w3m-cid-store-internal (content-id
-   message-id
-   part-number
-   content-type
-   content)
-  (push (list content-id
- message-id
- part-number
- content-type
- content)
+(defun notmuch-show-w3m-cid-store-internal (content-id msg part content)
+  (push (list content-id msg part content)
notmuch-show-w3m-cid-store))
 
 (defun notmuch-show-w3m-cid-store (msg part)
   (let ((content-id (plist-get part :content-id)))
 (when content-id
   (notmuch-show-w3m-cid-store-internal (concat cid: content-id)
-  (plist-get msg :id)
-  (plist-get part :id)
-  (plist-get part :content-type)
-  nil
+  msg part nil
 
 (defun notmuch-show-w3m-cid-retrieve (url rest args)
   (let ((matching-part (with-current-buffer w3m-current-buffer
 (assoc url notmuch-show-w3m-cid-store
 (if matching-part
-   (let ((message-id (nth 1 matching-part))
- (part-number (nth 2 matching-part))
- (content-type (nth 3 matching-part))
- (content (nth 4 matching-part)))
+   (let* ((msg (nth 1 matching-part))
+  (part (nth 2 matching-part))
+  (content (nth 3 matching-part))
+  (message-id (plist-get msg :id))
+  (part-number (plist-get part :id))
+  (content-type (plist-get part :content-type)))
  ;; If we don't already have the content, get it and cache
  ;; it, as some messages reference the same cid: part many
  ;; times (hundreds!), which results in many calls to
@@ -577,11 +568,7 @@ (defun notmuch-show-w3m-cid-retrieve (url rest args)
(setq content (notmuch-get-bodypart-internal (notmuch-id-to-query 
message-id)
  part-number 
notmuch-show-process-crypto))
(with-current-buffer w3m-current-buffer
- (notmuch-show-w3m-cid-store-internal url
-  message-id
-  part-number
-  content-type
-  content)))
+ (notmuch-show-w3m-cid-store-internal url msg part content)))
  (insert content)
  content-type)
   nil)))
-- 
1.9.1

___
notmuch mailing list
notmuch@notmuchmail.org
http://notmuchmail.org/mailman/listinfo/notmuch


[PATCH 07/11] emacs: Return unibyte strings for binary part data

2014-04-21 Thread Austin Clements
Unibyte strings are meant for representing binary data.  In practice,
using unibyte versus multibyte strings affects *almost* nothing.  It
does happen to matter if we use the binary data in an image descriptor
(which is, helpfully, not documented anywhere and getting it wrong
results in opaque errors like Not a PNG image: giant binary spew
that is, in fact, a PNG image).
---
 emacs/notmuch-lib.el | 12 +++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/emacs/notmuch-lib.el b/emacs/notmuch-lib.el
index ece29c3..fc67b14 100644
--- a/emacs/notmuch-lib.el
+++ b/emacs/notmuch-lib.el
@@ -504,7 +504,7 @@ (defun notmuch-parts-filter-by-type (parts type)
parts))
 
 (defun notmuch-get-bodypart-binary (msg part process-crypto)
-  Return the unprocessed content of PART in MSG.
+  Return the unprocessed content of PART in MSG as a unibyte string.
 
 This returns the \raw\ content of the given part after content
 transfer decoding, but with no further processing (see the
@@ -515,6 +515,16 @@ (defun notmuch-get-bodypart-binary (msg part 
process-crypto)
,@(when process-crypto '(--decrypt))
,(notmuch-id-to-query (plist-get msg :id)
 (with-temp-buffer
+  ;; Emacs internally uses a UTF-8-like multibyte string
+  ;; representation by default (regardless of the coding system,
+  ;; which only affects how it goes from outside data to this
+  ;; internal representation).  This *almost* never matters.
+  ;; Annoyingly, it does matter if we use this data in an image
+  ;; descriptor, since Emacs will use its internal data buffer
+  ;; directly and this multibyte representation corrupts binary
+  ;; image formats.  Since the caller is asking for binary data, a
+  ;; unibyte string is a more appropriate representation anyway.
+  (set-buffer-multibyte nil)
   (let ((coding-system-for-read 'no-conversion))
(apply #'call-process notmuch-command nil '(t nil) nil args)
(buffer-string)
-- 
1.9.1

___
notmuch mailing list
notmuch@notmuchmail.org
http://notmuchmail.org/mailman/listinfo/notmuch


[PATCH 05/11] emacs: Create an API for fetching parts as undecoded binary

2014-04-21 Thread Austin Clements
The new function, `notmuch-get-bodypart-binary', replaces
`notmuch-get-bodypart-internal'.  Whereas the old function was really
meant for internal use in `notmuch-get-bodypart-content', it was used
in a few other places.  Since the difference between
`notmuch-get-bodypart-content' and `notmuch-get-bodypart-internal' was
unclear, these other uses were always confusing and potentially
inconsistent.  The new call clearly requests the part as undecoded
binary.

This is step 1 of 2 in separating `notmuch-get-bodypart-content' into
two APIs for retrieving either undecoded binary or decoded text.
---
 emacs/notmuch-lib.el  | 28 ++--
 emacs/notmuch-show.el | 22 +-
 2 files changed, 23 insertions(+), 27 deletions(-)

diff --git a/emacs/notmuch-lib.el b/emacs/notmuch-lib.el
index 35b9065..4711098 100644
--- a/emacs/notmuch-lib.el
+++ b/emacs/notmuch-lib.el
@@ -503,25 +503,25 @@ (defun notmuch-parts-filter-by-type (parts type)
(lambda (part) (notmuch-match-content-type (plist-get part :content-type) 
type))
parts))
 
-;; Helper for parts which are generally not included in the default
-;; SEXP output.
-(defun notmuch-get-bodypart-internal (query part-number process-crypto)
-  (let ((args '(show --format=raw))
-   (part-arg (format --part=%s part-number)))
-(setq args (append args (list part-arg)))
-(if process-crypto
-   (setq args (append args '(--decrypt
-(setq args (append args (list query)))
+(defun notmuch-get-bodypart-binary (msg part process-crypto)
+  Return the unprocessed content of PART in MSG.
+
+This returns the \raw\ content of the given part after content
+transfer decoding, but with no further processing (see the
+discussion of --format=raw in man notmuch-show).  In particular,
+this does no charset conversion.
+  (let ((args `(show --format=raw
+   ,(format --part=%d (plist-get part :id))
+   ,@(when process-crypto '(--decrypt))
+   ,(notmuch-id-to-query (plist-get msg :id)
 (with-temp-buffer
   (let ((coding-system-for-read 'no-conversion))
-   (progn
- (apply 'call-process (append (list notmuch-command nil (list t nil) 
nil) args))
- (buffer-string))
+   (apply #'call-process notmuch-command nil '(t nil) nil args)
+   (buffer-string)
 
 (defun notmuch-get-bodypart-content (msg part process-crypto)
   (or (plist-get part :content)
-  (notmuch-get-bodypart-internal (notmuch-id-to-query (plist-get msg :id))
-(plist-get part :id) process-crypto)))
+  (notmuch-get-bodypart-binary msg part process-crypto)))
 
 ;; Workaround: The call to `mm-display-part' below triggers a bug in
 ;; Emacs 24 if it attempts to use the shr renderer to display an HTML
diff --git a/emacs/notmuch-show.el b/emacs/notmuch-show.el
index 455cfee..e9867fd 100644
--- a/emacs/notmuch-show.el
+++ b/emacs/notmuch-show.el
@@ -557,16 +557,14 @@ (defun notmuch-show-w3m-cid-retrieve (url rest args)
(let* ((msg (nth 1 matching-part))
   (part (nth 2 matching-part))
   (content (nth 3 matching-part))
-  (message-id (plist-get msg :id))
-  (part-number (plist-get part :id))
   (content-type (plist-get part :content-type)))
  ;; If we don't already have the content, get it and cache
  ;; it, as some messages reference the same cid: part many
  ;; times (hundreds!), which results in many calls to
  ;; `notmuch part'.
  (unless content
-   (setq content (notmuch-get-bodypart-internal (notmuch-id-to-query 
message-id)
- part-number 
notmuch-show-process-crypto))
+   (setq content (notmuch-get-bodypart-binary
+  msg part notmuch-show-process-crypto))
(with-current-buffer w3m-current-buffer
  (notmuch-show-w3m-cid-store-internal url msg part content)))
  (insert content)
@@ -2067,15 +2065,14 @@ (defun notmuch-show-stash-mlarchive-link-and-go 
(optional mla)
 
 ;; Interactive part functions and their helpers
 
-(defun notmuch-show-generate-part-buffer (message-id nth)
+(defun notmuch-show-generate-part-buffer (msg part)
   Return a temporary buffer containing the specified part's content.
   (let ((buf (generate-new-buffer  *notmuch-part*))
(process-crypto notmuch-show-process-crypto))
 (with-current-buffer buf
-  (setq notmuch-show-process-crypto process-crypto)
-  ;; Always acquires the part via `notmuch part', even if it is
-  ;; available in the SEXP output.
-  (insert (notmuch-get-bodypart-internal message-id nth 
notmuch-show-process-crypto)))
+  ;; This is always used in the content of mm handles, which
+  ;; expect undecoded, binary part content.
+  (insert (notmuch-get-bodypart-binary msg part process-crypto)))
 buf))
 
 (defun notmuch-show-current-part-handle 

[PATCH v2] test: Test thread linking in all possible delivery orders

2014-04-21 Thread Austin Clements
These tests deliver all possible (single-root) four-message threads in
all possible orders and check that notmuch successfully links them
into threads.

There are two variants of the test: one delivers messages that
reference only their immediate parent and the other delivers messages
that reference all of their parents.  The latter test is currently
known-broken.

This is introduced as a new test (rather than just adding it to
T050-new) because it's much easier for this to start with an empty
database.
---

This version hopefully addresses David's comments in
id:87y4zhfmrn@maritornes.cs.unb.ca and adds a second test that
demonstrates the bug Mark in figured out in
id:8738h7kv2q@qmul.ac.uk.

test/T051-new-linking.sh | 91 
 1 file changed, 91 insertions(+)
 create mode 100755 test/T051-new-linking.sh

diff --git a/test/T051-new-linking.sh b/test/T051-new-linking.sh
new file mode 100755
index 000..9ccbc52
--- /dev/null
+++ b/test/T051-new-linking.sh
@@ -0,0 +1,91 @@
+#!/usr/bin/env bash
+test_description='notmuch new thread linking'
+
+. ./test-lib.sh
+
+# Generate all possible single-root four message thread structures.
+# Each line in THREADS is a thread structure, where the n'th field is
+# the parent of message n.  We'll use this for multiple tests below.
+THREADS=$(python -c '
+def mkTrees(free, tree={}):
+if free == set():
+print( .join(map(str, [msg[1] for msg in sorted(tree.items())])))
+return
+# Attach each free message to each message in the tree (if there is
+# no tree, make the free message the root), backtracking after each
+for msg in sorted(free):
+parents = sorted(tree.keys()) if tree else [none]
+for parent in parents:
+ntree = tree.copy()
+ntree[msg] = parent
+mkTrees(free - set([msg]), ntree)
+mkTrees(set(range(4)))')
+nthreads=$(wc -l  $THREADS)
+
+test_begin_subtest All four-message threads get linked in all delivery orders 
(one parent)
+# In the first variant, this delivers messages that reference only
+# their immediate parent.  Hence, we should only expect threads to be
+# fully joined at the end.
+for ((n = 0; n  4; n++)); do
+# Deliver the n'th message of every thread
+thread=0
+while read -a parents; do
+parent=${parents[$n]}
+generate_message \
+[id]=m$n@t$thread [in-reply-to]=\m$parent@t$thread\ \
+[subject]=p$thread [from]=m$n
+thread=$((thread + 1))
+done  $THREADS
+notmuch new  /dev/null
+done
+output=$(notmuch search --sort=newest-first '*' | notmuch_search_sanitize)
+expected=$(for ((i = 0; i  $nthreads; i++)); do
+echo thread:XXX   2001-01-05 [4/4] m3, m2, m1, m0; p$i (inbox unread)
+done)
+test_expect_equal $output $expected
+
+test_begin_subtest The same (full parent linkage)
+test_subtest_known_broken
+# Here we do the same thing as the previous test, but each message
+# references all of its parents.  Since every message references the
+# root of the thread, each thread should always be fully joined.  This
+# is currently broken because of the bug detailed in
+# id:8738h7kv2q@qmul.ac.uk.
+rm ${MAIL_DIR}/*
+notmuch new
+output=
+expected=
+for ((n = 0; n  4; n++)); do
+# Deliver the n'th message of every thread
+thread=0
+while read -a parents; do
+references=
+parent=${parents[$n]}
+while [[ $parent != none ]]; do
+references=m$parent@t$thread $references
+parent=${parents[$parent]}
+done
+
+generate_message \
+[id]=m$n@t$thread [references]='$references' \
+[subject]=p$thread [from]=m$n
+thread=$((thread + 1))
+done  $THREADS
+notmuch new  /dev/null
+
+output=$output
+$(notmuch search --sort=newest-first '*' | notmuch_search_sanitize)
+
+# Construct expected output
+template=thread:XXX   2001-01-05 [$((n+1))/$((n+1))]
+for ((m = n; m  0; m--)); do
+template=$template m$m,
+done
+expected=$expected
+$(for ((i = 0; i  $nthreads; i++)); do
+echo $template m0; p$i (inbox unread)
+done)
+done
+test_expect_equal $output $expected
+
+test_done
-- 
1.9.1

___
notmuch mailing list
notmuch@notmuchmail.org
http://notmuchmail.org/mailman/listinfo/notmuch


Re: [PATCH 00/11] Improve charset and cid: handling

2014-04-21 Thread Tomi Ollila
On Mon, Apr 21 2014, Austin Clements amdra...@mit.edu wrote:

 I set out to quickly add support for cid: links in the shr renderer
 and wound up making our charset handling more robust and rewriting our
 content-ID handling.  The test introduced in patch 2 passes in all but
 one really obscure case, but only because of many unwritten and
 potentially fragile assumptions that Emacs and the CLI make about each
 other.

 The first three patches could reasonably go in to 0.18.  The rest of
 this series is certainly post-0.18, but I didn't want to lose track of
 it.

Patches 1-3 looks good and tests pass. I've also (suffessfully) used
this 'no-conversion (although I don't remember why ;/ -- this conversion
stuff is not something simple to comprehend...)

So, +1 for getting 1-3 to 0.18.

Tomi



 This series comes in three stages.  Each depends on the earlier ones,
 but each prefix makes sense on its own and could be pushed without the
 later stages.

 Patch 1 is a simple clean up patch.

 Patches 2 through 7 robust-ify our charset handling in Emacs, mostly
 by splitting the broken `notmuch-get-bodypart-content' API into
 `notmuch-get-bodypart-binary' and `notmuch-get-bodypart-text' so a
 caller can explicitly convey their requirements.

 The remaining patches improve our content-ID handling and add support
 for cid: links for shr.

 ___
 notmuch mailing list
 notmuch@notmuchmail.org
 http://notmuchmail.org/mailman/listinfo/notmuch
___
notmuch mailing list
notmuch@notmuchmail.org
http://notmuchmail.org/mailman/listinfo/notmuch