Re: [PATCH v1 0/3] Improve the acquisition of text parts.

2016-03-27 Thread David Bremner
David Edmondson  writes:

> Improve the acquisition of text parts.
>
> This affects the new "reply" behaviour and the rendering of
> application/octet-stream parts that are treated as text.
>

pushed.

d
___
notmuch mailing list
notmuch@notmuchmail.org
https://notmuchmail.org/mailman/listinfo/notmuch


Re: [PATCH v1 0/3] Improve the acquisition of text parts.

2016-03-26 Thread Mark Walters

Hi

Sorry, this email ended up rather long.

Summary: I have run a test (see below) on the whole lkml part of the
performance corpus, and all the changes look as expected, so this series
looks good to me.

First, note how we do bodypart insertion: for a part of MIME type
text/plain we try the text/plain handler, then a text/* handler, and
finally a */* handler, stopping at the first one that succeeds. Before
this series, when a part is application/octet-stream but is detected as
text/plain, the text/plain handler fails with a "bodypart insertion
error" because notmuch-get-bodypart-text cannot get the text (the part
is not officially text). We therefore fall back on the */* handler,
which inserts the part.

With this series, notmuch-get-bodypart-text succeeds, so the text/plain
handler succeeds and we stop there.

Thus, in most cases the only change is that we no longer get a
"bodypart insertion error"; the text itself looks the same. In a couple
of cases the text/plain handler wraps lines or replaces ^M with Unix
newlines, whereas the */* handler does not. This is an improvement.
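
To make the fallback order concrete, here is a minimal shell model of
it. It is purely illustrative, not notmuch's code: `insert_part`,
`old_handler` and `new_handler` are hypothetical stand-ins, and each
"handler" only returns success or failure. The old behaviour is modelled
as only the */* handler succeeding, the new one as the text/plain
handler succeeding straight away.

```shell
# Hypothetical model of the handler fallback; not notmuch code.

# Try the handlers for MIME type $1 from most to least specific,
# stopping at the first one for which "$2 <type>" succeeds.
insert_part() {
  mime=$1
  try=$2
  # Quoted so that the */* pattern is never glob-expanded.
  for t in "$mime" "${mime%%/*}/*" '*/*'; do
    if "$try" "$t"; then
      echo "inserted by $t handler"
      return 0
    fi
    echo "$t handler failed" >&2
  done
  return 1
}

# Before the series: the text of an application/octet-stream part
# detected as text/plain could not be fetched, so only */* succeeded.
old_handler() { [ "$1" = '*/*' ]; }

# With the series: the text can be fetched, so text/plain succeeds.
new_handler() { true; }

insert_part text/plain old_handler 2>/dev/null
insert_part text/plain new_handler
```

Run as-is, the first call prints `inserted by */* handler` and the
second `inserted by text/plain handler`.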

There is one more "difference", but I think it is actually unrelated to
this series. Sometimes, when the part is application/tar or
application/zip, I get "Bodypart insert error: Symbol's function
definition is void: gnus-recursive-directory-files". If I load gnus,
this goes away. In my first batch of tests this only occurred with this
series applied, but I have since reproduced it on mainline. I think
something else I did when setting up the test on mainline caused gnus
to be loaded, but I have not worked out what is going on there.

Finally, the test was as follows. I downloaded the performance corpus,
configured a separate notmuch config file to use
performance-test/corpus/mail/lkml as the mail store, opened
notmuch-emacs, went to the inbox (which contained all the messages) and
ran the following Lisp function:


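(defun my-save-all-show ()
  "Save the rendered thread of every search result to OUTPUT-<thread-id>."
  (interactive)
  (goto-char (point-min))
  (let ((count 0))
    (while (notmuch-search-find-thread-id)
      (let ((thread-id (notmuch-search-find-thread-id)))
        (setq count (1+ count))
        (message "Thread %s: %s" count thread-id)
        ;; Open the thread in a notmuch-show buffer.
        (notmuch-show thread-id)
        ;; Write the buffer contents out without any coding-system
        ;; conversion, so the two runs can be compared byte for byte.
        (let ((text (buffer-string))
              (coding-system-for-write 'no-conversion))
          (with-temp-file (concat "OUTPUT-" thread-id) (insert text)))
        ;; Kill the show buffer before moving on.
        (kill-buffer))
      (notmuch-search-next-thread))))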
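
The separate notmuch config file mentioned above could be set up along
these lines. This is a sketch under assumptions: the file name
`notmuch-lkml-config` is made up, and it relies on notmuch reading its
configuration from the file named by the `NOTMUCH_CONFIG` environment
variable, with the mail store given by `path` in the `[database]` group.

```shell
# Write a minimal, separate notmuch config whose mail store is the lkml
# part of the performance corpus (the file name is hypothetical).
corpus="$PWD/performance-test/corpus/mail/lkml"
cfg="$PWD/notmuch-lkml-config"

cat > "$cfg" <<EOF
[database]
path=$corpus
EOF

# Then index and browse with that config, for example:
#   NOTMUCH_CONFIG="$cfg" notmuch new
#   NOTMUCH_CONFIG="$cfg" emacs    # then M-x notmuch
```

Using a separate config file leaves the normal notmuch database and
settings untouched while testing.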

I moved the OUTPUT files elsewhere, repeated the run with this series
applied, and then ran diff on the two sets of output. Out of the 16000
threads / 10 messages, this gave 7 threads with a change (each a single
message), which I looked at individually as described above.
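
The before/after comparison can be sketched as follows, assuming the
OUTPUT-* files from the two runs were moved into two directories (the
names `before` and `after` are hypothetical). `changed_threads` is a
hypothetical helper that prints the thread ID of each file whose
contents differ between the runs; `diff -r before after` reports the
same files together with the actual differences.

```shell
# Print the thread ID of every OUTPUT-<thread-id> file in directory $1
# whose contents differ from (or are missing in) directory $2.
changed_threads() {
  before=$1
  after=$2
  for f in "$before"/OUTPUT-*; do
    [ -e "$f" ] || continue          # the glob matched no files
    name=${f##*/}
    if ! cmp -s "$f" "$after/$name"; then
      echo "${name#OUTPUT-}"
    fi
  done
}

# For example: changed_threads before after | wc -l
```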

Best wishes

Mark








Re: [PATCH v1 0/3] Improve the acquisition of text parts.

2016-03-14 Thread David Bremner
David Edmondson  writes:

> On Sun, Mar 13 2016, Mark Walters wrote:
>> However, it would be sensible to get testing in a greater variety of
>> charsets/encodings
>
> Agreed. Does anyone have suggestions on how we might achieve this? A
> corpus of mail that we could use?

Maybe the notmuch performance corpus, particularly the lkml sample.

grep -R charset= performance-test/corpus/mail/lkml |
    sed -e 's/^.*charset=//' -e 's/;.*//' -e 's/"//g' |
    tr '[A-Z]' '[a-z]' | sort -u

gives:

euc-kr
gb2312
iso-2022-jp
iso-2022-jp-2
iso-8859-1
iso-8859-14
iso 8859-15
iso-8859-15
iso-8859-1
iso-8859-2
iso-8859-6
iso-8859-7
iso-8859-9
koi8-r
koi8-u
ks_c_5601-1987
shift_jis
unknown
unknown-8bit
us-ascii
utf8
utf-8
windows-1250
windows-1251
windows-1252
windows-1255
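
A variant of the pipeline above that also counts how often each charset
occurs might help decide which ones to test first. This is a sketch,
not something from the thread: `charset_counts` is a hypothetical
helper, and it additionally strips trailing whitespace (including the
carriage return of CRLF lines), so values that differ only in trailing
blanks are counted together.

```shell
# Count how often each charset= value occurs in the files under
# directory $1, most common first.
charset_counts() {
  grep -R 'charset=' "$1" |
    sed -e 's/^.*charset=//' -e 's/;.*//' -e 's/"//g' \
        -e 's/[[:space:]]*$//' |
    tr '[:upper:]' '[:lower:]' |
    sort | uniq -c | sort -rn
}

# For example: charset_counts performance-test/corpus/mail/lkml
```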


To unpack the corpus:

cd performance-test
make download-corpus
./T00-new.sh --large

You can probably interrupt the test once notmuch new starts running.



Re: [PATCH v1 0/3] Improve the acquisition of text parts.

2016-03-14 Thread David Edmondson
On Sun, Mar 13 2016, Mark Walters wrote:
> However, it would be sensible to get testing in a greater variety of
> charsets/encodings

Agreed. Does anyone have suggestions on how we might achieve this? A
corpus of mail that we could use?


Re: [PATCH v1 0/3] Improve the acquisition of text parts.

2016-03-13 Thread Mark Walters

This looks good to me, +1. However, it would be sensible to get testing
in a greater variety of charsets/encodings.

Best wishes

Mark




[PATCH v1 0/3] Improve the acquisition of text parts.

2016-03-08 Thread David Edmondson

Improve the acquisition of text parts.

This affects the new "reply" behaviour and the rendering of
application/octet-stream parts that are treated as text.


David Edmondson (3):
  emacs: `notmuch-show-insert-part-multipart/encrypted' should not
      assume the presence of a button.
  emacs: Neaten `notmuch-show-insert-bodypart-internal'.
  emacs: Improve the acquisition of text parts.

 emacs/notmuch-lib.el  | 73 ++-
 emacs/notmuch-show.el | 28 +---
 2 files changed, 43 insertions(+), 58 deletions(-)

-- 
2.1.4
