[PATCH 06/11] emacs: Remove broken `notmuch-get-bodypart-content' API
On Fri, 11 Jul 2014, David Bremner wrote: > Austin Clements writes: > >> +This returns the content of the given part as a multibyte Lisp > > What does "multibyte" mean here? utf8? current encoding? Elisp has two kinds of stings: "unibyte strings" and "multibyte strings". https://www.gnu.org/software/emacs/manual/html_node/elisp/Non_002dASCII-in-Strings.html You can think of unibyte strings as binary data; they're just vectors of bytes without any particular encoding semantics (though when you use a unibyte string you can endow it with encoding). Multibyte strings, however, are text; they're vectors of Unicode code points. >> +string after performing content transfer decoding and any >> +necessary charset decoding. It is an error to use this for >> +non-text/* parts." >> + (let ((content (plist-get part :content))) >> +(when (not content) >> + ;; Use show --format=sexp to fetch decoded content >> + (let* ((args `("show" "--format=sexp" "--include-html" >> + ,(format "--part=%s" (plist-get part :id)) >> + ,@(when process-crypto '("--decrypt")) >> + ,(notmuch-id-to-query (plist-get msg :id >> + (npart (apply #'notmuch-call-notmuch-sexp args))) >> +(setq content (plist-get npart :content)) >> +(when (not content) >> + (error "Internal error: No :content from %S" args >> +content)) > > I'm a bit curious at the lack of setting "coding-system-for-read" here. > Are we assuming the user has their environment set up correctly? Not so > much a criticism as being nervous about everything coding-system > related. That is interesting. coding-system-for-read should really go in notmuch-call-notmuch-sexp, but I worry that, while *almost* all strings the CLI outputs are UTF-8, not quite all of them are. For example, we output filenames exactly at the OS reports the bytes to us (which is necessary, in a sense, because POSIX enforces no particular encoding on file names, but still really unfortunate). We could set coding-system-for-read, but a full solution needs more cooperation from the CLI. Possibly the right answer, at least for the sexp format, is to do our own UTF-8 to "\u" escapes for strings that are known to be UTF-8 and leave the raw bytes for the few that aren't. Then we would set the coding-system-for-read to 'no-conversion and I think everything would Just Work. That doesn't help for JSON, which is supposed to be all UTF-8 all the time. I can think of solutions there, but they're all ugly and involve things like encoding filenames as base64 when they aren't valid UTF-8. So... I don't think I'm going to do anything about this at this moment. > I didn't see anything else to object to in this patch or the previous > one.
Re: [PATCH 06/11] emacs: Remove broken `notmuch-get-bodypart-content' API
On Fri, 11 Jul 2014, David Bremner da...@tethera.net wrote: Austin Clements amdra...@mit.edu writes: +This returns the content of the given part as a multibyte Lisp What does multibyte mean here? utf8? current encoding? Elisp has two kinds of stings: unibyte strings and multibyte strings. https://www.gnu.org/software/emacs/manual/html_node/elisp/Non_002dASCII-in-Strings.html You can think of unibyte strings as binary data; they're just vectors of bytes without any particular encoding semantics (though when you use a unibyte string you can endow it with encoding). Multibyte strings, however, are text; they're vectors of Unicode code points. +string after performing content transfer decoding and any +necessary charset decoding. It is an error to use this for +non-text/* parts. + (let ((content (plist-get part :content))) +(when (not content) + ;; Use show --format=sexp to fetch decoded content + (let* ((args `(show --format=sexp --include-html + ,(format --part=%s (plist-get part :id)) + ,@(when process-crypto '(--decrypt)) + ,(notmuch-id-to-query (plist-get msg :id + (npart (apply #'notmuch-call-notmuch-sexp args))) +(setq content (plist-get npart :content)) +(when (not content) + (error Internal error: No :content from %S args +content)) I'm a bit curious at the lack of setting coding-system-for-read here. Are we assuming the user has their environment set up correctly? Not so much a criticism as being nervous about everything coding-system related. That is interesting. coding-system-for-read should really go in notmuch-call-notmuch-sexp, but I worry that, while *almost* all strings the CLI outputs are UTF-8, not quite all of them are. For example, we output filenames exactly at the OS reports the bytes to us (which is necessary, in a sense, because POSIX enforces no particular encoding on file names, but still really unfortunate). We could set coding-system-for-read, but a full solution needs more cooperation from the CLI. Possibly the right answer, at least for the sexp format, is to do our own UTF-8 to \u escapes for strings that are known to be UTF-8 and leave the raw bytes for the few that aren't. Then we would set the coding-system-for-read to 'no-conversion and I think everything would Just Work. That doesn't help for JSON, which is supposed to be all UTF-8 all the time. I can think of solutions there, but they're all ugly and involve things like encoding filenames as base64 when they aren't valid UTF-8. So... I don't think I'm going to do anything about this at this moment. I didn't see anything else to object to in this patch or the previous one. ___ notmuch mailing list notmuch@notmuchmail.org http://notmuchmail.org/mailman/listinfo/notmuch
[PATCH 06/11] emacs: Remove broken `notmuch-get-bodypart-content' API
Austin Clements writes: > +This returns the content of the given part as a multibyte Lisp What does "multibyte" mean here? utf8? current encoding? > +string after performing content transfer decoding and any > +necessary charset decoding. It is an error to use this for > +non-text/* parts." > + (let ((content (plist-get part :content))) > +(when (not content) > + ;; Use show --format=sexp to fetch decoded content > + (let* ((args `("show" "--format=sexp" "--include-html" > + ,(format "--part=%s" (plist-get part :id)) > + ,@(when process-crypto '("--decrypt")) > + ,(notmuch-id-to-query (plist-get msg :id > + (npart (apply #'notmuch-call-notmuch-sexp args))) > + (setq content (plist-get npart :content)) > + (when (not content) > + (error "Internal error: No :content from %S" args > +content)) I'm a bit curious at the lack of setting "coding-system-for-read" here. Are we assuming the user has their environment set up correctly? Not so much a criticism as being nervous about everything coding-system related. I didn't see anything else to object to in this patch or the previous one.
Re: [PATCH 06/11] emacs: Remove broken `notmuch-get-bodypart-content' API
Austin Clements amdra...@mit.edu writes: +This returns the content of the given part as a multibyte Lisp What does multibyte mean here? utf8? current encoding? +string after performing content transfer decoding and any +necessary charset decoding. It is an error to use this for +non-text/* parts. + (let ((content (plist-get part :content))) +(when (not content) + ;; Use show --format=sexp to fetch decoded content + (let* ((args `(show --format=sexp --include-html + ,(format --part=%s (plist-get part :id)) + ,@(when process-crypto '(--decrypt)) + ,(notmuch-id-to-query (plist-get msg :id + (npart (apply #'notmuch-call-notmuch-sexp args))) + (setq content (plist-get npart :content)) + (when (not content) + (error Internal error: No :content from %S args +content)) I'm a bit curious at the lack of setting coding-system-for-read here. Are we assuming the user has their environment set up correctly? Not so much a criticism as being nervous about everything coding-system related. I didn't see anything else to object to in this patch or the previous one. ___ notmuch mailing list notmuch@notmuchmail.org http://notmuchmail.org/mailman/listinfo/notmuch
[PATCH 06/11] emacs: Remove broken `notmuch-get-bodypart-content' API
`notmuch-get-bodypart-content' could do two very different things, depending on conditions: for text/* parts other than text/html, it would return the part content as a multibyte Lisp string *after* charset conversion, while for other parts (including text/html), it would return binary part content without charset conversion. This commit completes the split of `notmuch-get-bodypart-content' into two different and explicit APIs: `notmuch-get-bodypart-binary' and `notmuch-get-bodypart-text'. It updates all callers to use one or the other depending on what's appropriate. --- emacs/notmuch-lib.el | 37 - emacs/notmuch-show.el | 8 2 files changed, 32 insertions(+), 13 deletions(-) diff --git a/emacs/notmuch-lib.el b/emacs/notmuch-lib.el index 4711098..ece29c3 100644 --- a/emacs/notmuch-lib.el +++ b/emacs/notmuch-lib.el @@ -519,9 +519,25 @@ (defun notmuch-get-bodypart-binary (msg part process-crypto) (apply #'call-process notmuch-command nil '(t nil) nil args) (buffer-string) -(defun notmuch-get-bodypart-content (msg part process-crypto) - (or (plist-get part :content) - (notmuch-get-bodypart-binary msg part process-crypto))) +(defun notmuch-get-bodypart-text (msg part process-crypto) + "Return the text content of PART in MSG. + +This returns the content of the given part as a multibyte Lisp +string after performing content transfer decoding and any +necessary charset decoding. It is an error to use this for +non-text/* parts." + (let ((content (plist-get part :content))) +(when (not content) + ;; Use show --format=sexp to fetch decoded content + (let* ((args `("show" "--format=sexp" "--include-html" +,(format "--part=%s" (plist-get part :id)) +,@(when process-crypto '("--decrypt")) +,(notmuch-id-to-query (plist-get msg :id +(npart (apply #'notmuch-call-notmuch-sexp args))) + (setq content (plist-get npart :content)) + (when (not content) + (error "Internal error: No :content from %S" args +content)) ;; Workaround: The call to `mm-display-part' below triggers a bug in ;; Emacs 24 if it attempts to use the shr renderer to display an HTML @@ -542,18 +558,21 @@ (defun notmuch-mm-display-part-inline (msg part content-type process-crypto) current buffer, if possible." (let ((display-buffer (current-buffer))) (with-temp-buffer - ;; In case there is :content, the content string is already converted - ;; into emacs internal format. `gnus-decoded' is a fake charset, - ;; which means no further decoding (to be done by mm- functions). - (let* ((charset (if (plist-member part :content) - 'gnus-decoded + ;; In case we already have :content, use it and tell mm-* that + ;; it's already been charset-decoded by using the fake + ;; `gnus-decoded' charset. Otherwise, we'll fetch the binary + ;; part content and let mm-* decode it. + (let* ((have-content (plist-member part :content)) +(charset (if have-content 'gnus-decoded (plist-get part :content-charset))) (handle (mm-make-handle (current-buffer) `(,content-type (charset . ,charset) ;; If the user wants the part inlined, insert the content and ;; test whether we are able to inline it (which includes both ;; capability and suitability tests). (when (mm-inlined-p handle) - (insert (notmuch-get-bodypart-content msg part process-crypto)) + (if have-content + (insert (notmuch-get-bodypart-text msg part process-crypto)) + (insert (notmuch-get-bodypart-binary msg part process-crypto))) (when (mm-inlinable-p handle) (set-buffer display-buffer) (mm-display-part handle) diff --git a/emacs/notmuch-show.el b/emacs/notmuch-show.el index e9867fd..537c558 100644 --- a/emacs/notmuch-show.el +++ b/emacs/notmuch-show.el @@ -680,7 +680,7 @@ (defun notmuch-show-insert-part-text/plain (msg part content-type nth depth butt (let ((start (if button (button-start button) (point -(insert (notmuch-get-bodypart-content msg part notmuch-show-process-crypto)) +(insert (notmuch-get-bodypart-text msg part notmuch-show-process-crypto)) (save-excursion (save-restriction (narrow-to-region start (point-max)) @@ -689,9 +689,9 @@ (defun notmuch-show-insert-part-text/plain (msg part content-type nth depth butt (defun notmuch-show-insert-part-text/calendar (msg part content-type nth depth button) (insert (with-temp-buffer - (insert (notmuch-get-bodypart-content msg part notmuch-show-process-crypto)) - ;; notmuch-get-bodypart-content provides "raw", non-converted - ;; data. Replace CRLF with LF before icalendar can use it. + (insert (notmuch-get-bodypart-text msg
[PATCH 06/11] emacs: Remove broken `notmuch-get-bodypart-content' API
`notmuch-get-bodypart-content' could do two very different things, depending on conditions: for text/* parts other than text/html, it would return the part content as a multibyte Lisp string *after* charset conversion, while for other parts (including text/html), it would return binary part content without charset conversion. This commit completes the split of `notmuch-get-bodypart-content' into two different and explicit APIs: `notmuch-get-bodypart-binary' and `notmuch-get-bodypart-text'. It updates all callers to use one or the other depending on what's appropriate. --- emacs/notmuch-lib.el | 37 - emacs/notmuch-show.el | 8 2 files changed, 32 insertions(+), 13 deletions(-) diff --git a/emacs/notmuch-lib.el b/emacs/notmuch-lib.el index 4711098..ece29c3 100644 --- a/emacs/notmuch-lib.el +++ b/emacs/notmuch-lib.el @@ -519,9 +519,25 @@ (defun notmuch-get-bodypart-binary (msg part process-crypto) (apply #'call-process notmuch-command nil '(t nil) nil args) (buffer-string) -(defun notmuch-get-bodypart-content (msg part process-crypto) - (or (plist-get part :content) - (notmuch-get-bodypart-binary msg part process-crypto))) +(defun notmuch-get-bodypart-text (msg part process-crypto) + Return the text content of PART in MSG. + +This returns the content of the given part as a multibyte Lisp +string after performing content transfer decoding and any +necessary charset decoding. It is an error to use this for +non-text/* parts. + (let ((content (plist-get part :content))) +(when (not content) + ;; Use show --format=sexp to fetch decoded content + (let* ((args `(show --format=sexp --include-html +,(format --part=%s (plist-get part :id)) +,@(when process-crypto '(--decrypt)) +,(notmuch-id-to-query (plist-get msg :id +(npart (apply #'notmuch-call-notmuch-sexp args))) + (setq content (plist-get npart :content)) + (when (not content) + (error Internal error: No :content from %S args +content)) ;; Workaround: The call to `mm-display-part' below triggers a bug in ;; Emacs 24 if it attempts to use the shr renderer to display an HTML @@ -542,18 +558,21 @@ (defun notmuch-mm-display-part-inline (msg part content-type process-crypto) current buffer, if possible. (let ((display-buffer (current-buffer))) (with-temp-buffer - ;; In case there is :content, the content string is already converted - ;; into emacs internal format. `gnus-decoded' is a fake charset, - ;; which means no further decoding (to be done by mm- functions). - (let* ((charset (if (plist-member part :content) - 'gnus-decoded + ;; In case we already have :content, use it and tell mm-* that + ;; it's already been charset-decoded by using the fake + ;; `gnus-decoded' charset. Otherwise, we'll fetch the binary + ;; part content and let mm-* decode it. + (let* ((have-content (plist-member part :content)) +(charset (if have-content 'gnus-decoded (plist-get part :content-charset))) (handle (mm-make-handle (current-buffer) `(,content-type (charset . ,charset) ;; If the user wants the part inlined, insert the content and ;; test whether we are able to inline it (which includes both ;; capability and suitability tests). (when (mm-inlined-p handle) - (insert (notmuch-get-bodypart-content msg part process-crypto)) + (if have-content + (insert (notmuch-get-bodypart-text msg part process-crypto)) + (insert (notmuch-get-bodypart-binary msg part process-crypto))) (when (mm-inlinable-p handle) (set-buffer display-buffer) (mm-display-part handle) diff --git a/emacs/notmuch-show.el b/emacs/notmuch-show.el index e9867fd..537c558 100644 --- a/emacs/notmuch-show.el +++ b/emacs/notmuch-show.el @@ -680,7 +680,7 @@ (defun notmuch-show-insert-part-text/plain (msg part content-type nth depth butt (let ((start (if button (button-start button) (point -(insert (notmuch-get-bodypart-content msg part notmuch-show-process-crypto)) +(insert (notmuch-get-bodypart-text msg part notmuch-show-process-crypto)) (save-excursion (save-restriction (narrow-to-region start (point-max)) @@ -689,9 +689,9 @@ (defun notmuch-show-insert-part-text/plain (msg part content-type nth depth butt (defun notmuch-show-insert-part-text/calendar (msg part content-type nth depth button) (insert (with-temp-buffer - (insert (notmuch-get-bodypart-content msg part notmuch-show-process-crypto)) - ;; notmuch-get-bodypart-content provides raw, non-converted - ;; data. Replace CRLF with LF before icalendar can use it. + (insert (notmuch-get-bodypart-text msg part