[PATCH 06/11] emacs: Remove broken `notmuch-get-bodypart-content' API

2015-01-24 Thread Austin Clements
On Fri, 11 Jul 2014, David Bremner  wrote:
> Austin Clements  writes:
>
>> +This returns the content of the given part as a multibyte Lisp
>
> What does "multibyte" mean here? utf8? current encoding?

Elisp has two kinds of stings: "unibyte strings" and "multibyte
strings".

  
https://www.gnu.org/software/emacs/manual/html_node/elisp/Non_002dASCII-in-Strings.html

You can think of unibyte strings as binary data; they're just vectors of
bytes without any particular encoding semantics (though when you use a
unibyte string you can endow it with encoding).  Multibyte strings,
however, are text; they're vectors of Unicode code points.

>> +string after performing content transfer decoding and any
>> +necessary charset decoding.  It is an error to use this for
>> +non-text/* parts."
>> +  (let ((content (plist-get part :content)))
>> +(when (not content)
>> +  ;; Use show --format=sexp to fetch decoded content
>> +  (let* ((args `("show" "--format=sexp" "--include-html"
>> + ,(format "--part=%s" (plist-get part :id))
>> + ,@(when process-crypto '("--decrypt"))
>> + ,(notmuch-id-to-query (plist-get msg :id
>> + (npart (apply #'notmuch-call-notmuch-sexp args)))
>> +(setq content (plist-get npart :content))
>> +(when (not content)
>> +  (error "Internal error: No :content from %S" args
>> +content))
>
> I'm a bit curious at the lack of setting "coding-system-for-read" here.
> Are we assuming the user has their environment set up correctly? Not so
> much a criticism as being nervous about everything coding-system
> related.

That is interesting.  coding-system-for-read should really go in
notmuch-call-notmuch-sexp, but I worry that, while *almost* all strings
the CLI outputs are UTF-8, not quite all of them are.  For example, we
output filenames exactly at the OS reports the bytes to us (which is
necessary, in a sense, because POSIX enforces no particular encoding on
file names, but still really unfortunate).

We could set coding-system-for-read, but a full solution needs more
cooperation from the CLI.  Possibly the right answer, at least for the
sexp format, is to do our own UTF-8 to "\u" escapes for strings that
are known to be UTF-8 and leave the raw bytes for the few that aren't.
Then we would set the coding-system-for-read to 'no-conversion and I
think everything would Just Work.

That doesn't help for JSON, which is supposed to be all UTF-8 all the
time.  I can think of solutions there, but they're all ugly and involve
things like encoding filenames as base64 when they aren't valid UTF-8.

So...  I don't think I'm going to do anything about this at this moment.

> I didn't see anything else to object to in this patch or the previous
> one.


Re: [PATCH 06/11] emacs: Remove broken `notmuch-get-bodypart-content' API

2015-01-24 Thread Austin Clements
On Fri, 11 Jul 2014, David Bremner da...@tethera.net wrote:
 Austin Clements amdra...@mit.edu writes:

 +This returns the content of the given part as a multibyte Lisp

 What does multibyte mean here? utf8? current encoding?

Elisp has two kinds of stings: unibyte strings and multibyte
strings.

  
https://www.gnu.org/software/emacs/manual/html_node/elisp/Non_002dASCII-in-Strings.html

You can think of unibyte strings as binary data; they're just vectors of
bytes without any particular encoding semantics (though when you use a
unibyte string you can endow it with encoding).  Multibyte strings,
however, are text; they're vectors of Unicode code points.

 +string after performing content transfer decoding and any
 +necessary charset decoding.  It is an error to use this for
 +non-text/* parts.
 +  (let ((content (plist-get part :content)))
 +(when (not content)
 +  ;; Use show --format=sexp to fetch decoded content
 +  (let* ((args `(show --format=sexp --include-html
 + ,(format --part=%s (plist-get part :id))
 + ,@(when process-crypto '(--decrypt))
 + ,(notmuch-id-to-query (plist-get msg :id
 + (npart (apply #'notmuch-call-notmuch-sexp args)))
 +(setq content (plist-get npart :content))
 +(when (not content)
 +  (error Internal error: No :content from %S args
 +content))

 I'm a bit curious at the lack of setting coding-system-for-read here.
 Are we assuming the user has their environment set up correctly? Not so
 much a criticism as being nervous about everything coding-system
 related.

That is interesting.  coding-system-for-read should really go in
notmuch-call-notmuch-sexp, but I worry that, while *almost* all strings
the CLI outputs are UTF-8, not quite all of them are.  For example, we
output filenames exactly at the OS reports the bytes to us (which is
necessary, in a sense, because POSIX enforces no particular encoding on
file names, but still really unfortunate).

We could set coding-system-for-read, but a full solution needs more
cooperation from the CLI.  Possibly the right answer, at least for the
sexp format, is to do our own UTF-8 to \u escapes for strings that
are known to be UTF-8 and leave the raw bytes for the few that aren't.
Then we would set the coding-system-for-read to 'no-conversion and I
think everything would Just Work.

That doesn't help for JSON, which is supposed to be all UTF-8 all the
time.  I can think of solutions there, but they're all ugly and involve
things like encoding filenames as base64 when they aren't valid UTF-8.

So...  I don't think I'm going to do anything about this at this moment.

 I didn't see anything else to object to in this patch or the previous
 one.
___
notmuch mailing list
notmuch@notmuchmail.org
http://notmuchmail.org/mailman/listinfo/notmuch


[PATCH 06/11] emacs: Remove broken `notmuch-get-bodypart-content' API

2014-07-11 Thread David Bremner
Austin Clements  writes:

> +This returns the content of the given part as a multibyte Lisp

What does "multibyte" mean here? utf8? current encoding?

> +string after performing content transfer decoding and any
> +necessary charset decoding.  It is an error to use this for
> +non-text/* parts."
> +  (let ((content (plist-get part :content)))
> +(when (not content)
> +  ;; Use show --format=sexp to fetch decoded content
> +  (let* ((args `("show" "--format=sexp" "--include-html"
> +  ,(format "--part=%s" (plist-get part :id))
> +  ,@(when process-crypto '("--decrypt"))
> +  ,(notmuch-id-to-query (plist-get msg :id
> +  (npart (apply #'notmuch-call-notmuch-sexp args)))
> + (setq content (plist-get npart :content))
> + (when (not content)
> +   (error "Internal error: No :content from %S" args
> +content))

I'm a bit curious at the lack of setting "coding-system-for-read" here.
Are we assuming the user has their environment set up correctly? Not so
much a criticism as being nervous about everything coding-system
related.

I didn't see anything else to object to in this patch or the previous
one.


Re: [PATCH 06/11] emacs: Remove broken `notmuch-get-bodypart-content' API

2014-07-11 Thread David Bremner
Austin Clements amdra...@mit.edu writes:

 +This returns the content of the given part as a multibyte Lisp

What does multibyte mean here? utf8? current encoding?

 +string after performing content transfer decoding and any
 +necessary charset decoding.  It is an error to use this for
 +non-text/* parts.
 +  (let ((content (plist-get part :content)))
 +(when (not content)
 +  ;; Use show --format=sexp to fetch decoded content
 +  (let* ((args `(show --format=sexp --include-html
 +  ,(format --part=%s (plist-get part :id))
 +  ,@(when process-crypto '(--decrypt))
 +  ,(notmuch-id-to-query (plist-get msg :id
 +  (npart (apply #'notmuch-call-notmuch-sexp args)))
 + (setq content (plist-get npart :content))
 + (when (not content)
 +   (error Internal error: No :content from %S args
 +content))

I'm a bit curious at the lack of setting coding-system-for-read here.
Are we assuming the user has their environment set up correctly? Not so
much a criticism as being nervous about everything coding-system
related.

I didn't see anything else to object to in this patch or the previous
one.
___
notmuch mailing list
notmuch@notmuchmail.org
http://notmuchmail.org/mailman/listinfo/notmuch


[PATCH 06/11] emacs: Remove broken `notmuch-get-bodypart-content' API

2014-04-21 Thread Austin Clements
`notmuch-get-bodypart-content' could do two very different things,
depending on conditions: for text/* parts other than text/html, it
would return the part content as a multibyte Lisp string *after*
charset conversion, while for other parts (including text/html), it
would return binary part content without charset conversion.

This commit completes the split of `notmuch-get-bodypart-content' into
two different and explicit APIs: `notmuch-get-bodypart-binary' and
`notmuch-get-bodypart-text'.  It updates all callers to use one or the
other depending on what's appropriate.
---
 emacs/notmuch-lib.el  | 37 -
 emacs/notmuch-show.el |  8 
 2 files changed, 32 insertions(+), 13 deletions(-)

diff --git a/emacs/notmuch-lib.el b/emacs/notmuch-lib.el
index 4711098..ece29c3 100644
--- a/emacs/notmuch-lib.el
+++ b/emacs/notmuch-lib.el
@@ -519,9 +519,25 @@ (defun notmuch-get-bodypart-binary (msg part 
process-crypto)
(apply #'call-process notmuch-command nil '(t nil) nil args)
(buffer-string)

-(defun notmuch-get-bodypart-content (msg part process-crypto)
-  (or (plist-get part :content)
-  (notmuch-get-bodypart-binary msg part process-crypto)))
+(defun notmuch-get-bodypart-text (msg part process-crypto)
+  "Return the text content of PART in MSG.
+
+This returns the content of the given part as a multibyte Lisp
+string after performing content transfer decoding and any
+necessary charset decoding.  It is an error to use this for
+non-text/* parts."
+  (let ((content (plist-get part :content)))
+(when (not content)
+  ;; Use show --format=sexp to fetch decoded content
+  (let* ((args `("show" "--format=sexp" "--include-html"
+,(format "--part=%s" (plist-get part :id))
+,@(when process-crypto '("--decrypt"))
+,(notmuch-id-to-query (plist-get msg :id
+(npart (apply #'notmuch-call-notmuch-sexp args)))
+   (setq content (plist-get npart :content))
+   (when (not content)
+ (error "Internal error: No :content from %S" args
+content))

 ;; Workaround: The call to `mm-display-part' below triggers a bug in
 ;; Emacs 24 if it attempts to use the shr renderer to display an HTML
@@ -542,18 +558,21 @@ (defun notmuch-mm-display-part-inline (msg part 
content-type process-crypto)
 current buffer, if possible."
   (let ((display-buffer (current-buffer)))
 (with-temp-buffer
-  ;; In case there is :content, the content string is already converted
-  ;; into emacs internal format. `gnus-decoded' is a fake charset,
-  ;; which means no further decoding (to be done by mm- functions).
-  (let* ((charset (if (plist-member part :content)
- 'gnus-decoded
+  ;; In case we already have :content, use it and tell mm-* that
+  ;; it's already been charset-decoded by using the fake
+  ;; `gnus-decoded' charset.  Otherwise, we'll fetch the binary
+  ;; part content and let mm-* decode it.
+  (let* ((have-content (plist-member part :content))
+(charset (if have-content 'gnus-decoded
(plist-get part :content-charset)))
 (handle (mm-make-handle (current-buffer) `(,content-type (charset 
. ,charset)
;; If the user wants the part inlined, insert the content and
;; test whether we are able to inline it (which includes both
;; capability and suitability tests).
(when (mm-inlined-p handle)
- (insert (notmuch-get-bodypart-content msg part process-crypto))
+ (if have-content
+ (insert (notmuch-get-bodypart-text msg part process-crypto))
+   (insert (notmuch-get-bodypart-binary msg part process-crypto)))
  (when (mm-inlinable-p handle)
(set-buffer display-buffer)
(mm-display-part handle)
diff --git a/emacs/notmuch-show.el b/emacs/notmuch-show.el
index e9867fd..537c558 100644
--- a/emacs/notmuch-show.el
+++ b/emacs/notmuch-show.el
@@ -680,7 +680,7 @@ (defun notmuch-show-insert-part-text/plain (msg part 
content-type nth depth butt
   (let ((start (if button
   (button-start button)
 (point
-(insert (notmuch-get-bodypart-content msg part 
notmuch-show-process-crypto))
+(insert (notmuch-get-bodypart-text msg part notmuch-show-process-crypto))
 (save-excursion
   (save-restriction
(narrow-to-region start (point-max))
@@ -689,9 +689,9 @@ (defun notmuch-show-insert-part-text/plain (msg part 
content-type nth depth butt

 (defun notmuch-show-insert-part-text/calendar (msg part content-type nth depth 
button)
   (insert (with-temp-buffer
-   (insert (notmuch-get-bodypart-content msg part 
notmuch-show-process-crypto))
-   ;; notmuch-get-bodypart-content provides "raw", non-converted
-   ;; data. Replace CRLF with LF before icalendar can use it.
+   (insert (notmuch-get-bodypart-text msg 

[PATCH 06/11] emacs: Remove broken `notmuch-get-bodypart-content' API

2014-04-21 Thread Austin Clements
`notmuch-get-bodypart-content' could do two very different things,
depending on conditions: for text/* parts other than text/html, it
would return the part content as a multibyte Lisp string *after*
charset conversion, while for other parts (including text/html), it
would return binary part content without charset conversion.

This commit completes the split of `notmuch-get-bodypart-content' into
two different and explicit APIs: `notmuch-get-bodypart-binary' and
`notmuch-get-bodypart-text'.  It updates all callers to use one or the
other depending on what's appropriate.
---
 emacs/notmuch-lib.el  | 37 -
 emacs/notmuch-show.el |  8 
 2 files changed, 32 insertions(+), 13 deletions(-)

diff --git a/emacs/notmuch-lib.el b/emacs/notmuch-lib.el
index 4711098..ece29c3 100644
--- a/emacs/notmuch-lib.el
+++ b/emacs/notmuch-lib.el
@@ -519,9 +519,25 @@ (defun notmuch-get-bodypart-binary (msg part 
process-crypto)
(apply #'call-process notmuch-command nil '(t nil) nil args)
(buffer-string)
 
-(defun notmuch-get-bodypart-content (msg part process-crypto)
-  (or (plist-get part :content)
-  (notmuch-get-bodypart-binary msg part process-crypto)))
+(defun notmuch-get-bodypart-text (msg part process-crypto)
+  Return the text content of PART in MSG.
+
+This returns the content of the given part as a multibyte Lisp
+string after performing content transfer decoding and any
+necessary charset decoding.  It is an error to use this for
+non-text/* parts.
+  (let ((content (plist-get part :content)))
+(when (not content)
+  ;; Use show --format=sexp to fetch decoded content
+  (let* ((args `(show --format=sexp --include-html
+,(format --part=%s (plist-get part :id))
+,@(when process-crypto '(--decrypt))
+,(notmuch-id-to-query (plist-get msg :id
+(npart (apply #'notmuch-call-notmuch-sexp args)))
+   (setq content (plist-get npart :content))
+   (when (not content)
+ (error Internal error: No :content from %S args
+content))
 
 ;; Workaround: The call to `mm-display-part' below triggers a bug in
 ;; Emacs 24 if it attempts to use the shr renderer to display an HTML
@@ -542,18 +558,21 @@ (defun notmuch-mm-display-part-inline (msg part 
content-type process-crypto)
 current buffer, if possible.
   (let ((display-buffer (current-buffer)))
 (with-temp-buffer
-  ;; In case there is :content, the content string is already converted
-  ;; into emacs internal format. `gnus-decoded' is a fake charset,
-  ;; which means no further decoding (to be done by mm- functions).
-  (let* ((charset (if (plist-member part :content)
- 'gnus-decoded
+  ;; In case we already have :content, use it and tell mm-* that
+  ;; it's already been charset-decoded by using the fake
+  ;; `gnus-decoded' charset.  Otherwise, we'll fetch the binary
+  ;; part content and let mm-* decode it.
+  (let* ((have-content (plist-member part :content))
+(charset (if have-content 'gnus-decoded
(plist-get part :content-charset)))
 (handle (mm-make-handle (current-buffer) `(,content-type (charset 
. ,charset)
;; If the user wants the part inlined, insert the content and
;; test whether we are able to inline it (which includes both
;; capability and suitability tests).
(when (mm-inlined-p handle)
- (insert (notmuch-get-bodypart-content msg part process-crypto))
+ (if have-content
+ (insert (notmuch-get-bodypart-text msg part process-crypto))
+   (insert (notmuch-get-bodypart-binary msg part process-crypto)))
  (when (mm-inlinable-p handle)
(set-buffer display-buffer)
(mm-display-part handle)
diff --git a/emacs/notmuch-show.el b/emacs/notmuch-show.el
index e9867fd..537c558 100644
--- a/emacs/notmuch-show.el
+++ b/emacs/notmuch-show.el
@@ -680,7 +680,7 @@ (defun notmuch-show-insert-part-text/plain (msg part 
content-type nth depth butt
   (let ((start (if button
   (button-start button)
 (point
-(insert (notmuch-get-bodypart-content msg part 
notmuch-show-process-crypto))
+(insert (notmuch-get-bodypart-text msg part notmuch-show-process-crypto))
 (save-excursion
   (save-restriction
(narrow-to-region start (point-max))
@@ -689,9 +689,9 @@ (defun notmuch-show-insert-part-text/plain (msg part 
content-type nth depth butt
 
 (defun notmuch-show-insert-part-text/calendar (msg part content-type nth depth 
button)
   (insert (with-temp-buffer
-   (insert (notmuch-get-bodypart-content msg part 
notmuch-show-process-crypto))
-   ;; notmuch-get-bodypart-content provides raw, non-converted
-   ;; data. Replace CRLF with LF before icalendar can use it.
+   (insert (notmuch-get-bodypart-text msg part