[PATCH 02/11] test: New tests for Emacs charset handling
On Thu, Apr 24 2014, Austin Clements wrote: > Quoth Mark Walters on Apr 24 at 3:38 pm: >> >> On Mon, 21 Apr 2014, Austin Clements wrote: >> > The test of viewing 8bit messages is known-broken. The rest pass, but >> > for very fragile reasons. The next several commits will fix the >> > known-broken test and make our charset handling robust. >> >> Hi >> >> On one of my systems one of these (non-broken) tests fails. I am not >> sure whether I messed up my emacs/environment when doing stuff remotely >> recently so it could just be my system >> >> >> > --- >> > test/T455-emacs-charsets.sh | 141 >> > >> > test/test-lib.el| 4 +- >> > 2 files changed, 144 insertions(+), 1 deletion(-) >> > create mode 100755 test/T455-emacs-charsets.sh >> > >> > diff --git a/test/T455-emacs-charsets.sh b/test/T455-emacs-charsets.sh >> > new file mode 100755 >> > index 000..a42a1d2 >> > --- /dev/null >> > +++ b/test/T455-emacs-charsets.sh >> > @@ -0,0 +1,141 @@ >> > +#!/usr/bin/env bash >> > + >> > +test_description="emacs notmuch-show charset handling" >> > +. ./test-lib.sh >> > + >> > + >> > +UTF8_YEN=$'\xef\xbf\xa5' >> > +BIG5_YEN=$'\xa2\x44' >> > + >> > +# Add four messages with unusual encoding requirements: >> > +# >> > +# 1) text/plain in quoted-printable big5 >> > +generate_message \ >> > +[id]=test-plain at example.com \ >> > +'[content-type]="text/plain; charset=big5"' \ >> > +'[content-transfer-encoding]=quoted-printable' \ >> > +'[body]="Yen: =A2=44"' >> > + >> > +# 2) text/plain in 8bit big5 >> > +generate_message \ >> > +[id]=test-plain-8bit at example.com \ >> > +'[content-type]="text/plain; charset=big5"' \ >> > +'[content-transfer-encoding]=8bit' \ >> > +'[body]="Yen: '$BIG5_YEN'"' >> > + >> > +# 3) text/html in quoted-printable big5 >> > +generate_message \ >> > +[id]=test-html at example.com \ >> > +'[content-type]="text/html; charset=big5"' \ >> > +'[content-transfer-encoding]=quoted-printable' \ >> > +'[body]="Yen: =A2=44"' >> > + >> > +# 4) application/octet-stream in quoted-printable of big5 text >> > +generate_message \ >> > +[id]=test-binary at example.com \ >> > +'[content-type]="application/octet-stream"' \ >> > +'[content-transfer-encoding]=quoted-printable' \ >> > +'[body]="Yen: =A2=44"' >> > + >> > +notmuch new > /dev/null >> > + >> > +# Test rendering >> > + >> > +test_begin_subtest "Text parts are decoded when rendering" >> > +test_emacs '(notmuch-show "id:test-plain at example.com") >> > + (test-visible-output "OUTPUT.raw")' >> > +awk 'show {print} /^$/ {show=1}' < OUTPUT.raw > OUTPUT >> > +cat> > +Yen: $UTF8_YEN >> > +EOF >> > +test_expect_equal_file OUTPUT EXPECTED >> > + >> > +test_begin_subtest "8bit text parts are decoded when rendering" >> > +test_emacs '(notmuch-show "id:test-plain-8bit at example.com") >> > + (test-visible-output "OUTPUT.raw")' >> > +awk 'show {print} /^$/ {show=1}' < OUTPUT.raw > OUTPUT >> > +cat > > +Yen: $UTF8_YEN >> > +EOF >> > +test_expect_equal_file OUTPUT EXPECTED >> > + >> > +test_begin_subtest "HTML parts are decoded when rendering" >> > +test_emacs '(notmuch-show "id:test-html at example.com") >> > + (test-visible-output "OUTPUT.raw")' >> > +awk 'show {print} /^$/ {show=1}' < OUTPUT.raw > OUTPUT >> > +cat > > +[ text/html ] >> > +Yen: $UTF8_YEN >> > +EOF >> > +test_expect_equal_file OUTPUT EXPECTED >> >> It's this test: I get an extra newline after the UFT8_YEN. >> >> This is with emacs 23.4.1 on a Debian wheezy(ish) system. >> >> But as said it could be my fault: I think some odd things happened when >> I tried to install emacs 24 and 23 simultaneously to help testing >> things. > > I can reproduce this. Tests that involve HTML rendering are always > finicky like this since there are so many different HTML renderers. > Maybe I could normalize the spacing? sed '/^$/d;s/ */ /g' or so? IIRC something was done there to unify renderers... id:1336316175-17852-1-git-send-email-awg+notmuch at xvx.ca ... that (or it's alternative(*) ;) could be made global Tomi (*) id:m2fwbc4irf.fsf at guru.guru-group.fi > >> Best wishes >> >> Mark >> >> >> > + >> > +# Test saving >> > + >> > +test_begin_subtest "Text parts are not decoded when saving" >> > +rm -f part >> > +test_emacs '(notmuch-show "id:test-plain at example.com") >> > + (search-forward "Yen") >> > + (let ((standard-input "\"part\"")) >> > + (notmuch-show-save-part))' >> > +cat > > +Yen: $BIG5_YEN >> > +EOF >> > +test_expect_equal_file part EXPECTED >> > + >> > +test_begin_subtest "8bit text parts are not decoded when saving" >> > +rm -f part >> > +test_emacs '(notmuch-show "id:test-plain-8bit at example.com") >> > + (search-forward "Yen") >> > + (let ((standard-input "\"part\"")) >> > + (notmuch-show-save-part))' >> > +cat > > +Yen: $BIG5_YEN >> > +EOF >> >
[PATCH 08/11] emacs: Support caching in notmuch-get-bodypart-{binary, text}
On Thu, 24 Apr 2014, Austin Clements wrote: > Quoth Mark Walters on Apr 24 at 11:46 am: >> >> On Mon, 21 Apr 2014, Austin Clements wrote: >> > (The actual code change here is small, but requires re-indenting >> > existing code.) >> > --- >> > emacs/notmuch-lib.el | 52 >> > ++-- >> > 1 file changed, 30 insertions(+), 22 deletions(-) >> > >> > diff --git a/emacs/notmuch-lib.el b/emacs/notmuch-lib.el >> > index fc67b14..fee8512 100644 >> > --- a/emacs/notmuch-lib.el >> > +++ b/emacs/notmuch-lib.el >> > @@ -503,33 +503,39 @@ (defun notmuch-parts-filter-by-type (parts type) >> > (lambda (part) (notmuch-match-content-type (plist-get part >> > :content-type) type)) >> > parts)) >> > >> > -(defun notmuch-get-bodypart-binary (msg part process-crypto) >> > +(defun notmuch-get-bodypart-binary (msg part process-crypto >> > cache) >> >"Return the unprocessed content of PART in MSG as a unibyte string. >> > >> > This returns the \"raw\" content of the given part after content >> > transfer decoding, but with no further processing (see the >> > discussion of --format=raw in man notmuch-show). In particular, >> > this does no charset conversion." >> > - (let ((args `("show" "--format=raw" >> > - ,(format "--part=%d" (plist-get part :id)) >> > - ,@(when process-crypto '("--decrypt")) >> > - ,(notmuch-id-to-query (plist-get msg :id) >> > -(with-temp-buffer >> > - ;; Emacs internally uses a UTF-8-like multibyte string >> > - ;; representation by default (regardless of the coding system, >> > - ;; which only affects how it goes from outside data to this >> > - ;; internal representation). This *almost* never matters. >> > - ;; Annoyingly, it does matter if we use this data in an image >> > - ;; descriptor, since Emacs will use its internal data buffer >> > - ;; directly and this multibyte representation corrupts binary >> > - ;; image formats. Since the caller is asking for binary data, a >> > - ;; unibyte string is a more appropriate representation anyway. >> > - (set-buffer-multibyte nil) >> > - (let ((coding-system-for-read 'no-conversion)) >> > - (apply #'call-process notmuch-command nil '(t nil) nil args) >> > - (buffer-string) >> > - >> > -(defun notmuch-get-bodypart-text (msg part process-crypto) >> > + (let ((data (plist-get part :binary-content))) >> > +(when (not data) >> > + (let ((args `("show" "--format=raw" >> > + ,(format "--part=%d" (plist-get part :id)) >> > + ,@(when process-crypto '("--decrypt")) >> > + ,(notmuch-id-to-query (plist-get msg :id) >> > + (with-temp-buffer >> > +;; Emacs internally uses a UTF-8-like multibyte string >> > +;; representation by default (regardless of the coding >> > +;; system, which only affects how it goes from outside data >> > +;; to this internal representation). This *almost* never >> > +;; matters. Annoyingly, it does matter if we use this data >> > +;; in an image descriptor, since Emacs will use its internal >> > +;; data buffer directly and this multibyte representation >> > +;; corrupts binary image formats. Since the caller is >> > +;; asking for binary data, a unibyte string is a more >> > +;; appropriate representation anyway. >> > +(set-buffer-multibyte nil) >> > +(let ((coding-system-for-read 'no-conversion)) >> > + (apply #'call-process notmuch-command nil '(t nil) nil args) >> > + (setq data (buffer-string) >> > + (when cache >> > + (plist-put part :binary-content data))) >> > +data)) >> >> I am a little puzzled by this but that could be lack of familiarity with >> elisp. As far as I can see plist-put will sometimes modify the original >> plist and sometimes return a new plist. If the latter happens then I >> think it works out as if we hadn't cached anything as the part passed to >> the function is unmodified. That might not matter in this case (though I >> find the lack of determinism disturbing). >> >> Or is something else going on? > > No, your familiarity with Elisp serves you well. I'm completely > cheating here. According to the specification of plist-put, it's > allowed to return a new list but in reality this only happens when the > original plist is nil. We lean on this already all over the > notmuch-emacs code, but maybe that doesn't excuse me adding one more > cheat. > > I could add a comment here explaining what's going on, I could > manually do the list insertion in a way that's guaranteed to mutate it > in place, or I could add a nil :binary-content property when parts are > created (since plist-put is guaranteed to mutate existing keys in > place). I think a comment is fine. (Incidentally what is the best way of telling if emacs has changed an object or returned a new one for other commands? Something like (setq oldobject object) (setq
[PATCH 02/11] test: New tests for Emacs charset handling
On Thu, 24 Apr 2014, Austin Clements wrote: > Quoth Mark Walters on Apr 24 at 3:38 pm: >> >> On Mon, 21 Apr 2014, Austin Clements wrote: >> > The test of viewing 8bit messages is known-broken. The rest pass, but >> > for very fragile reasons. The next several commits will fix the >> > known-broken test and make our charset handling robust. >> >> Hi >> >> On one of my systems one of these (non-broken) tests fails. I am not >> sure whether I messed up my emacs/environment when doing stuff remotely >> recently so it could just be my system >> >> >> > --- >> > test/T455-emacs-charsets.sh | 141 >> > >> > test/test-lib.el| 4 +- >> > 2 files changed, 144 insertions(+), 1 deletion(-) >> > create mode 100755 test/T455-emacs-charsets.sh >> > >> > diff --git a/test/T455-emacs-charsets.sh b/test/T455-emacs-charsets.sh >> > new file mode 100755 >> > index 000..a42a1d2 >> > --- /dev/null >> > +++ b/test/T455-emacs-charsets.sh >> > @@ -0,0 +1,141 @@ >> > +#!/usr/bin/env bash >> > + >> > +test_description="emacs notmuch-show charset handling" >> > +. ./test-lib.sh >> > + >> > + >> > +UTF8_YEN=$'\xef\xbf\xa5' >> > +BIG5_YEN=$'\xa2\x44' >> > + >> > +# Add four messages with unusual encoding requirements: >> > +# >> > +# 1) text/plain in quoted-printable big5 >> > +generate_message \ >> > +[id]=test-plain at example.com \ >> > +'[content-type]="text/plain; charset=big5"' \ >> > +'[content-transfer-encoding]=quoted-printable' \ >> > +'[body]="Yen: =A2=44"' >> > + >> > +# 2) text/plain in 8bit big5 >> > +generate_message \ >> > +[id]=test-plain-8bit at example.com \ >> > +'[content-type]="text/plain; charset=big5"' \ >> > +'[content-transfer-encoding]=8bit' \ >> > +'[body]="Yen: '$BIG5_YEN'"' >> > + >> > +# 3) text/html in quoted-printable big5 >> > +generate_message \ >> > +[id]=test-html at example.com \ >> > +'[content-type]="text/html; charset=big5"' \ >> > +'[content-transfer-encoding]=quoted-printable' \ >> > +'[body]="Yen: =A2=44"' >> > + >> > +# 4) application/octet-stream in quoted-printable of big5 text >> > +generate_message \ >> > +[id]=test-binary at example.com \ >> > +'[content-type]="application/octet-stream"' \ >> > +'[content-transfer-encoding]=quoted-printable' \ >> > +'[body]="Yen: =A2=44"' >> > + >> > +notmuch new > /dev/null >> > + >> > +# Test rendering >> > + >> > +test_begin_subtest "Text parts are decoded when rendering" >> > +test_emacs '(notmuch-show "id:test-plain at example.com") >> > + (test-visible-output "OUTPUT.raw")' >> > +awk 'show {print} /^$/ {show=1}' < OUTPUT.raw > OUTPUT >> > +cat> > +Yen: $UTF8_YEN >> > +EOF >> > +test_expect_equal_file OUTPUT EXPECTED >> > + >> > +test_begin_subtest "8bit text parts are decoded when rendering" >> > +test_emacs '(notmuch-show "id:test-plain-8bit at example.com") >> > + (test-visible-output "OUTPUT.raw")' >> > +awk 'show {print} /^$/ {show=1}' < OUTPUT.raw > OUTPUT >> > +cat > > +Yen: $UTF8_YEN >> > +EOF >> > +test_expect_equal_file OUTPUT EXPECTED >> > + >> > +test_begin_subtest "HTML parts are decoded when rendering" >> > +test_emacs '(notmuch-show "id:test-html at example.com") >> > + (test-visible-output "OUTPUT.raw")' >> > +awk 'show {print} /^$/ {show=1}' < OUTPUT.raw > OUTPUT >> > +cat > > +[ text/html ] >> > +Yen: $UTF8_YEN >> > +EOF >> > +test_expect_equal_file OUTPUT EXPECTED >> >> It's this test: I get an extra newline after the UFT8_YEN. >> >> This is with emacs 23.4.1 on a Debian wheezy(ish) system. >> >> But as said it could be my fault: I think some odd things happened when >> I tried to install emacs 24 and 23 simultaneously to help testing >> things. > > I can reproduce this. Tests that involve HTML rendering are always > finicky like this since there are so many different HTML renderers. > Maybe I could normalize the spacing? sed '/^$/d;s/ */ /g' or so? That sounds plausible. What output would you get if the test genuinely fails? I guess the key thing is to try and make sure the test doesn't pass in that case. Best wishes Mark
Re: [PATCH 02/11] test: New tests for Emacs charset handling
On Thu, 24 Apr 2014, Austin Clements amdra...@mit.edu wrote: Quoth Mark Walters on Apr 24 at 3:38 pm: On Mon, 21 Apr 2014, Austin Clements amdra...@mit.edu wrote: The test of viewing 8bit messages is known-broken. The rest pass, but for very fragile reasons. The next several commits will fix the known-broken test and make our charset handling robust. Hi On one of my systems one of these (non-broken) tests fails. I am not sure whether I messed up my emacs/environment when doing stuff remotely recently so it could just be my system --- test/T455-emacs-charsets.sh | 141 test/test-lib.el| 4 +- 2 files changed, 144 insertions(+), 1 deletion(-) create mode 100755 test/T455-emacs-charsets.sh diff --git a/test/T455-emacs-charsets.sh b/test/T455-emacs-charsets.sh new file mode 100755 index 000..a42a1d2 --- /dev/null +++ b/test/T455-emacs-charsets.sh @@ -0,0 +1,141 @@ +#!/usr/bin/env bash + +test_description=emacs notmuch-show charset handling +. ./test-lib.sh + + +UTF8_YEN=$'\xef\xbf\xa5' +BIG5_YEN=$'\xa2\x44' + +# Add four messages with unusual encoding requirements: +# +# 1) text/plain in quoted-printable big5 +generate_message \ +[id]=test-pl...@example.com \ +'[content-type]=text/plain; charset=big5' \ +'[content-transfer-encoding]=quoted-printable' \ +'[body]=Yen: =A2=44' + +# 2) text/plain in 8bit big5 +generate_message \ +[id]=test-plain-8...@example.com \ +'[content-type]=text/plain; charset=big5' \ +'[content-transfer-encoding]=8bit' \ +'[body]=Yen: '$BIG5_YEN'' + +# 3) text/html in quoted-printable big5 +generate_message \ +[id]=test-h...@example.com \ +'[content-type]=text/html; charset=big5' \ +'[content-transfer-encoding]=quoted-printable' \ +'[body]=htmlbodyYen: =A2=44/body/html' + +# 4) application/octet-stream in quoted-printable of big5 text +generate_message \ +[id]=test-bin...@example.com \ +'[content-type]=application/octet-stream' \ +'[content-transfer-encoding]=quoted-printable' \ +'[body]=Yen: =A2=44' + +notmuch new /dev/null + +# Test rendering + +test_begin_subtest Text parts are decoded when rendering +test_emacs '(notmuch-show id:test-pl...@example.com) + (test-visible-output OUTPUT.raw)' +awk 'show {print} /^$/ {show=1}' OUTPUT.raw OUTPUT +cat EOF EXPECTED +Yen: $UTF8_YEN +EOF +test_expect_equal_file OUTPUT EXPECTED + +test_begin_subtest 8bit text parts are decoded when rendering +test_emacs '(notmuch-show id:test-plain-8...@example.com) + (test-visible-output OUTPUT.raw)' +awk 'show {print} /^$/ {show=1}' OUTPUT.raw OUTPUT +cat EOF EXPECTED +Yen: $UTF8_YEN +EOF +test_expect_equal_file OUTPUT EXPECTED + +test_begin_subtest HTML parts are decoded when rendering +test_emacs '(notmuch-show id:test-h...@example.com) + (test-visible-output OUTPUT.raw)' +awk 'show {print} /^$/ {show=1}' OUTPUT.raw OUTPUT +cat EOF EXPECTED +[ text/html ] +Yen: $UTF8_YEN +EOF +test_expect_equal_file OUTPUT EXPECTED It's this test: I get an extra newline after the UFT8_YEN. This is with emacs 23.4.1 on a Debian wheezy(ish) system. But as said it could be my fault: I think some odd things happened when I tried to install emacs 24 and 23 simultaneously to help testing things. I can reproduce this. Tests that involve HTML rendering are always finicky like this since there are so many different HTML renderers. Maybe I could normalize the spacing? sed '/^$/d;s/ */ /g' or so? That sounds plausible. What output would you get if the test genuinely fails? I guess the key thing is to try and make sure the test doesn't pass in that case. Best wishes Mark ___ notmuch mailing list notmuch@notmuchmail.org http://notmuchmail.org/mailman/listinfo/notmuch
Re: [PATCH 08/11] emacs: Support caching in notmuch-get-bodypart-{binary, text}
On Thu, 24 Apr 2014, Austin Clements amdra...@mit.edu wrote: Quoth Mark Walters on Apr 24 at 11:46 am: On Mon, 21 Apr 2014, Austin Clements amdra...@mit.edu wrote: (The actual code change here is small, but requires re-indenting existing code.) --- emacs/notmuch-lib.el | 52 ++-- 1 file changed, 30 insertions(+), 22 deletions(-) diff --git a/emacs/notmuch-lib.el b/emacs/notmuch-lib.el index fc67b14..fee8512 100644 --- a/emacs/notmuch-lib.el +++ b/emacs/notmuch-lib.el @@ -503,33 +503,39 @@ (defun notmuch-parts-filter-by-type (parts type) (lambda (part) (notmuch-match-content-type (plist-get part :content-type) type)) parts)) -(defun notmuch-get-bodypart-binary (msg part process-crypto) +(defun notmuch-get-bodypart-binary (msg part process-crypto optional cache) Return the unprocessed content of PART in MSG as a unibyte string. This returns the \raw\ content of the given part after content transfer decoding, but with no further processing (see the discussion of --format=raw in man notmuch-show). In particular, this does no charset conversion. - (let ((args `(show --format=raw - ,(format --part=%d (plist-get part :id)) - ,@(when process-crypto '(--decrypt)) - ,(notmuch-id-to-query (plist-get msg :id) -(with-temp-buffer - ;; Emacs internally uses a UTF-8-like multibyte string - ;; representation by default (regardless of the coding system, - ;; which only affects how it goes from outside data to this - ;; internal representation). This *almost* never matters. - ;; Annoyingly, it does matter if we use this data in an image - ;; descriptor, since Emacs will use its internal data buffer - ;; directly and this multibyte representation corrupts binary - ;; image formats. Since the caller is asking for binary data, a - ;; unibyte string is a more appropriate representation anyway. - (set-buffer-multibyte nil) - (let ((coding-system-for-read 'no-conversion)) - (apply #'call-process notmuch-command nil '(t nil) nil args) - (buffer-string) - -(defun notmuch-get-bodypart-text (msg part process-crypto) + (let ((data (plist-get part :binary-content))) +(when (not data) + (let ((args `(show --format=raw + ,(format --part=%d (plist-get part :id)) + ,@(when process-crypto '(--decrypt)) + ,(notmuch-id-to-query (plist-get msg :id) + (with-temp-buffer +;; Emacs internally uses a UTF-8-like multibyte string +;; representation by default (regardless of the coding +;; system, which only affects how it goes from outside data +;; to this internal representation). This *almost* never +;; matters. Annoyingly, it does matter if we use this data +;; in an image descriptor, since Emacs will use its internal +;; data buffer directly and this multibyte representation +;; corrupts binary image formats. Since the caller is +;; asking for binary data, a unibyte string is a more +;; appropriate representation anyway. +(set-buffer-multibyte nil) +(let ((coding-system-for-read 'no-conversion)) + (apply #'call-process notmuch-command nil '(t nil) nil args) + (setq data (buffer-string) + (when cache + (plist-put part :binary-content data))) +data)) I am a little puzzled by this but that could be lack of familiarity with elisp. As far as I can see plist-put will sometimes modify the original plist and sometimes return a new plist. If the latter happens then I think it works out as if we hadn't cached anything as the part passed to the function is unmodified. That might not matter in this case (though I find the lack of determinism disturbing). Or is something else going on? No, your familiarity with Elisp serves you well. I'm completely cheating here. According to the specification of plist-put, it's allowed to return a new list but in reality this only happens when the original plist is nil. We lean on this already all over the notmuch-emacs code, but maybe that doesn't excuse me adding one more cheat. I could add a comment here explaining what's going on, I could manually do the list insertion in a way that's guaranteed to mutate it in place, or I could add a nil :binary-content property when parts are created (since plist-put is guaranteed to mutate existing keys in place). I think a comment is fine. (Incidentally what is the best way of telling if emacs has changed an object or returned a new one for other commands? Something like (setq oldobject object) (setq object (operation-on object)) (if (eq object oldobject) ... )) Also, I think the function should have a comment about the lifetime of the caching. I think in some cases the addition of :binary-content could occur on load