header continuation issue in notmuch frontend/alot/pythons email module

2013-06-24 Thread Thomas Schwinge
Hi!

On Mon, 24 Jun 2013 10:57:10 +0200, Justus Winter <4winter at 
informatik.uni-hamburg.de> wrote:
> Quoting Austin Clements (2013-06-23 18:59:39)
> > Quoth Justus Winter on Jun 23 at  3:11 pm:
> > > I recently had a problem replying to a mail written by Thomas Schwinge
> > > using an oldish notmuch. Not sure if it has been fixed in more recent

"Oldish", yeah, yeah, I know...  (Mumbles something about long TODO list.)

> > > versions, but I think notmuch could improve upon its header
> > > generation (see below). Problematic part of the mail:
> > > 
> > > ~~~ snip ~~~
> > > [...]
> > > To: someone at example.org, "line
> > >  break" , someoneelse at example.org
> > > User-Agent: Notmuch/0.9-101-g81dad07 (http://notmuchmail.org) 
> > > Emacs/23.4.1 (i486-pc-linux-gnu)
> > > [...]
> > > ~~~ snap ~~~

> > Do you happen to know how the strangely folded "to" header was
> > produced for this message?

I just entered/copied all the addresses into one long To: line, and then
let message-mode do its thing.

> No, but Thomas might. Thomas, the problematic message is
> id:877ghpqckb.fsf at kepler.schwinge.homeip.net

Here is the header from the message as I sent it:

To: Samuel Thibault , Justus Winter
 <4winter at informatik.uni-hamburg.de>, fotis.koutoulakis at gmail.com, Ian
 Lance Taylor , toscano.pino at tiscali.it, Luis Machado
 , =?utf-8?B?6ZmG5bKz?=
 

And this is what I received from the bug-hurd mailing list:

To: Samuel Thibault , Justus Winter
<4winter at informatik.uni-hamburg.de>, , "Ian
Lance Taylor" , , Luis 
Machado
,
=?utf-8?B?6ZmG5bKz?= 

So the "corruption" (if it is declared as one; I don't have time right
now to follow your RFC interpretation) must have happened after sending
it off -- perhaps my company's Microsoft Exchange server (as Justus
received a direct copy from that one), or even msmtp used as the local
MTA.


Grüße,
 Thomas



header continuation issue in notmuch frontend/alot/pythons email module

2013-06-24 Thread Justus Winter
Quoting Austin Clements (2013-06-23 18:59:39)
> Quoth Justus Winter on Jun 23 at  3:11 pm:
> > Hi,
> > 
> > I recently had a problem replying to a mail written by Thomas Schwinge
> > using an oldish notmuch. Not sure if it has been fixed in more recent
> > versions, but I think notmuch could improve upon its header
> > generation (see below). Problematic part of the mail:
> > 
> > ~~~ snip ~~~
> > [...]
> > To: someone at example.org, "line
> >  break" , someoneelse at example.org
> > User-Agent: Notmuch/0.9-101-g81dad07 (http://notmuchmail.org) Emacs/23.4.1 
> > (i486-pc-linux-gnu)
> > [...]
> > ~~~ snap ~~~
> > 
> > http://tools.ietf.org/html/rfc2822#section-2.2.3 says:
> > 
> >    Note: Though structured field bodies are defined in such a way that
> >    folding can take place between many of the lexical tokens (and even
> >    within some of the lexical tokens), folding SHOULD be limited to
> >    placing the CRLF at higher-level syntactic breaks.  For instance, if
> >    a field body is defined as comma-separated values, it is recommended
> >    that folding occur after the comma separating the structured items in
> >    preference to other places where the field could be folded, even if
> >    it is allowed elsewhere.
> > 
> > So notmuch "rfc-SHOULD" place the newlines after the comma.
> > 
> > The rfc goes on:
> > 
> >    The process of moving from this folded multiple-line representation
> >    of a header field to its single line representation is called
> >    "unfolding". Unfolding is accomplished by simply removing any CRLF
> >    that is immediately followed by WSP.  Each header field should be
> >    treated in its unfolded form for further syntactic and semantic
> >    evaluation.
> > 
> > My interpretation is that unfolding simply removes any linebreaks
> > first, so the value does not contain any newlines. But Python's email
> > module treats the quoted and unquoted parts of the value differently:
> > 
> > ~~~ snip ~~~
> > from __future__ import print_function
> > import email
> > from email.utils import getaddresses
> > 
> > m = email.message_from_string('''To: "line
> >  break" <linebreak@example.org>, line
> >  break <linebreak@example.org>''')
> > print("m['To'] = ", m['To'])
> > print("getaddresses(m.get_all('To')) = ", getaddresses(m.get_all('To')))
> > ~~~ snap ~~~
> > 
> > % python3 test.py
> > m['To'] =  "line
> >  break" <linebreak@example.org>, line
> >  break <linebreak@example.org>
> > getaddresses(m.get_all('To')) =  [('line\n break', 'linebreak@example.org'), ('line break', 'linebreak@example.org')]
> > 
> > I believe that is what's preventing me from replying to the message
> > using alot without sanitizing the To header first. Not really sure who
> > is wrong or right here... any thoughts?
> 
> There are at least two bugs here.  Regardless of what we RFC-should
> do, that folding *is* permitted by RFC2822, since quoted
> strings can contain folding whitespace:
> 
>   http://tools.ietf.org/html/rfc2822#section-3.2.5
> 
> For completeness, the full derivation for this "To" header is:
> 
> to              =   "To:" address-list CRLF
> address-list    =   (address *("," address)) / obs-addr-list
> address         =   mailbox / group
> mailbox         =   name-addr / addr-spec
> name-addr       =   [display-name] angle-addr
> display-name    =   phrase
> phrase          =   1*word / obs-phrase
> word            =   atom / quoted-string
> quoted-string   =   [CFWS]
>                     DQUOTE *([FWS] qcontent) [FWS] DQUOTE
>                     [CFWS]
> 
> Do you happen to know how the strangely folded "to" header was
> produced for this message?

No, but Thomas might. Thomas, the problematic message is
id:877ghpqckb.fsf at kepler.schwinge.homeip.net

>  In notmuch-emacs, a user can put whatever
> they want in a message-mode buffer's headers and mm will dutifully
> pass it on to their MTA.  We could validate it, but that's a slippery
> slope and I would hope that the MTA itself is validating it (and
> probably more thoroughly than we could).
> 
> That said, the first bug here is in Python.  As I mentioned above,
> foldable whitespace is allowed in quoted strings.  In fact, though the
> standard is rather long-winded about whitespace, if you dig into the
> grammar, you'll find that *all whitespace can be folded* (except in
> the obsolete grammar, which allowed whitespace between the header name
> and the colon, which obviously can't be folded).  I'm not sure what
> Python is doing, but I bet it's going to a lot of effort to
> mis-implement something very simple.

Yes, I'm glad you came to the same conclusion.

> There also appears to be a bug in the notmuch CLI's reply command
> where it omits addresses that were folded in the original message.  I
> don't know if alot uses the CLI's reply command, so this may or may
> not be related to your specific issue.  I haven't dug into this yet,
> other than to confirm that it's the CLI's fault and not
> notmuch-emacs's.

No, alot does not use notmuch's reply command.

Thanks,
Justus

header continuation issue in notmuch frontend/alot/pythons email module

2013-06-23 Thread Justus Winter
Hi,

I recently had a problem replying to a mail written by Thomas Schwinge
using an oldish notmuch. Not sure if it has been fixed in more recent
versions, but I think notmuch could improve upon its header
generation (see below). Problematic part of the mail:

~~~ snip ~~~
[...]
To: someone at example.org, "line
 break" , someoneelse at example.org
User-Agent: Notmuch/0.9-101-g81dad07 (http://notmuchmail.org) Emacs/23.4.1 
(i486-pc-linux-gnu)
[...]
~~~ snap ~~~

http://tools.ietf.org/html/rfc2822#section-2.2.3 says:

   Note: Though structured field bodies are defined in such a way that
   folding can take place between many of the lexical tokens (and even
   within some of the lexical tokens), folding SHOULD be limited to
   placing the CRLF at higher-level syntactic breaks.  For instance, if
   a field body is defined as comma-separated values, it is recommended
   that folding occur after the comma separating the structured items in
   preference to other places where the field could be folded, even if
   it is allowed elsewhere.

So notmuch "rfc-SHOULD" place the newlines after the comma.
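
To illustrate that recommendation, here is a minimal sketch of a folder
that only breaks an address list after the separating commas. The
function name fold_address_header and the default width of 78 columns
are assumptions made for this example; this is not how notmuch or
message-mode actually generate headers.

```python
def fold_address_header(name, addresses, width=78):
    """Fold an address header, breaking only after the separating commas.

    Hypothetical helper for illustration. Each continuation line starts
    with a single space, so unfolding restores "Name: a, b, c".
    """
    lines = []
    current = name + ":"
    for i, addr in enumerate(addresses):
        last = i == len(addresses) - 1
        piece = " " + addr + ("" if last else ",")
        # Fold before this address if it would overflow the line, unless
        # the line so far only holds the field name.
        if len(current) + len(piece) > width and current != name + ":":
            lines.append(current)
            current = piece  # the leading space is the continuation WSP
        else:
            current += piece
    lines.append(current)
    return "\r\n".join(lines)


addresses = [
    "someone@example.org",
    '"line break" <linebreak@example.org>',
    "someoneelse@example.org",
]
print(fold_address_header("To", addresses, width=40))
```

An address that is longer than the width on its own still ends up on
one overlong line; the sketch never folds inside an address.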

The rfc goes on:

   The process of moving from this folded multiple-line representation
   of a header field to its single line representation is called
   "unfolding". Unfolding is accomplished by simply removing any CRLF
   that is immediately followed by WSP.  Each header field should be
   treated in its unfolded form for further syntactic and semantic
   evaluation.
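
The unfolding rule quoted above fits in a few lines of Python. This is
only a sketch: the helper name unfold is made up for the example, and
it also accepts a bare LF because Python usually hands over header text
with normalised line endings.

```python
import re


def unfold(raw_value):
    """Remove every CRLF (or bare LF) that is immediately followed by WSP.

    The whitespace itself is kept, as RFC 2822 section 2.2.3 describes.
    """
    return re.sub(r"\r?\n(?=[ \t])", "", raw_value)


folded = 'To: someone@example.org, "line\r\n break" <linebreak@example.org>'
print(unfold(folded))
```

Note that the rule does not care whether the line break sits inside a
quoted string or not.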

My interpretation is that unfolding simply removes any linebreaks
first, so the value does not contain any newlines. But Python's email
module treats the quoted and unquoted parts of the value differently:

~~~ snip ~~~
from __future__ import print_function
import email
from email.utils import getaddresses

m = email.message_from_string('''To: "line
 break" <linebreak@example.org>, line
 break <linebreak@example.org>''')
print("m['To'] = ", m['To'])
print("getaddresses(m.get_all('To')) = ", getaddresses(m.get_all('To')))
~~~ snap ~~~

% python3 test.py
m['To'] =  "line
 break" <linebreak@example.org>, line
 break <linebreak@example.org>
getaddresses(m.get_all('To')) =  [('line\n break', 'linebreak@example.org'), ('line break', 'linebreak@example.org')]

I believe that is what's preventing me from replying to the message
using alot without sanitizing the To header first. Not really sure who
is wrong or right here... any thoughts?
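
For reference, the kind of sanitizing I mean could look like the sketch
below: unfold the raw header values first and only then hand them to
getaddresses. The helper name unfolded_addresses is an assumption for
this example and not something alot provides.

```python
import email
import re
from email.utils import getaddresses


def unfolded_addresses(message, header="To"):
    """Unfold the raw header values, then parse them into (name, address) pairs."""
    values = message.get_all(header, [])
    # Drop each line break that is followed by whitespace (RFC 2822 unfolding).
    unfolded = [re.sub(r"\r?\n(?=[ \t])", "", value) for value in values]
    return getaddresses(unfolded)


m = email.message_from_string('''To: "line
 break" <linebreak@example.org>, line
 break <linebreak@example.org>''')
print(unfolded_addresses(m))
```

With this, both display names come out as 'line break', with no
embedded newline in either.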

Justus


header continuation issue in notmuch frontend/alot/pythons email module

2013-06-23 Thread Austin Clements
On Sun, 23 Jun 2013, Austin Clements wrote:
> There also appears to be a bug in the notmuch CLI's reply command
> where it omits addresses that were folded in the original message.  I
> don't know if alot uses the CLI's reply command, so this may or may
> not be related to your specific issue.  I haven't dug into this yet,
> other than to confirm that it's the CLI's fault and not
> notmuch-emacs's.

I take back what I said about there being a bug in the reply command.
It was a problem with my test case.


header continuation issue in notmuch frontend/alot/pythons email module

2013-06-23 Thread Austin Clements
Quoth Justus Winter on Jun 23 at  3:11 pm:
> Hi,
> 
> I recently had a problem replying to a mail written by Thomas Schwinge
> using an oldish notmuch. Not sure if it has been fixed in more recent
> versions, but I think notmuch could improve upon its header
> generation (see below). Problematic part of the mail:
> 
> ~~~ snip ~~~
> [...]
> To: someone at example.org, "line
>  break" , someoneelse at example.org
> User-Agent: Notmuch/0.9-101-g81dad07 (http://notmuchmail.org) Emacs/23.4.1 
> (i486-pc-linux-gnu)
> [...]
> ~~~ snap ~~~
> 
> http://tools.ietf.org/html/rfc2822#section-2.2.3 says:
> 
>    Note: Though structured field bodies are defined in such a way that
>    folding can take place between many of the lexical tokens (and even
>    within some of the lexical tokens), folding SHOULD be limited to
>    placing the CRLF at higher-level syntactic breaks.  For instance, if
>    a field body is defined as comma-separated values, it is recommended
>    that folding occur after the comma separating the structured items in
>    preference to other places where the field could be folded, even if
>    it is allowed elsewhere.
> 
> So notmuch "rfc-SHOULD" place the newlines after the comma.
> 
> The rfc goes on:
> 
>    The process of moving from this folded multiple-line representation
>    of a header field to its single line representation is called
>    "unfolding". Unfolding is accomplished by simply removing any CRLF
>    that is immediately followed by WSP.  Each header field should be
>    treated in its unfolded form for further syntactic and semantic
>    evaluation.
> 
> My interpretation is that unfolding simply removes any linebreaks
> first, so the value does not contain any newlines. But Python's email
> module treats the quoted and unquoted parts of the value differently:
> 
> ~~~ snip ~~~
> from __future__ import print_function
> import email
> from email.utils import getaddresses
> 
> m = email.message_from_string('''To: "line
>  break" <linebreak@example.org>, line
>  break <linebreak@example.org>''')
> print("m['To'] = ", m['To'])
> print("getaddresses(m.get_all('To')) = ", getaddresses(m.get_all('To')))
> ~~~ snap ~~~
> 
> % python3 test.py
> m['To'] =  "line
>  break" <linebreak@example.org>, line
>  break <linebreak@example.org>
> getaddresses(m.get_all('To')) =  [('line\n break', 'linebreak@example.org'), ('line break', 'linebreak@example.org')]
> 
> I believe that is what's preventing me from replying to the message
> using alot without sanitizing the To header first. Not really sure who
> is wrong or right here... any thoughts?

There are at least two bugs here.  Regardless of what we RFC-should
do, that folding *is* permitted by RFC2822, since quoted
strings can contain folding whitespace:

  http://tools.ietf.org/html/rfc2822#section-3.2.5

For completeness, the full derivation for this "To" header is:

to              =   "To:" address-list CRLF
address-list    =   (address *("," address)) / obs-addr-list
address         =   mailbox / group
mailbox         =   name-addr / addr-spec
name-addr       =   [display-name] angle-addr
display-name    =   phrase
phrase          =   1*word / obs-phrase
word            =   atom / quoted-string
quoted-string   =   [CFWS]
                    DQUOTE *([FWS] qcontent) [FWS] DQUOTE
                    [CFWS]

Do you happen to know how the strangely folded "to" header was
produced for this message?  In notmuch-emacs, a user can put whatever
they want in a message-mode buffer's headers and mm will dutifully
pass it on to their MTA.  We could validate it, but that's a slippery
slope and I would hope that the MTA itself is validating it (and
probably more thoroughly than we could).

That said, the first bug here is in Python.  As I mentioned above,
foldable whitespace is allowed in quoted strings.  In fact, though the
standard is rather long-winded about whitespace, if you dig into the
grammar, you'll find that *all whitespace can be folded* (except in
the obsolete grammar, which allowed whitespace between the header name
and the colon, which obviously can't be folded).  I'm not sure what
Python is doing, but I bet it's going to a lot of effort to
mis-implement something very simple.
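
As a point of comparison, and not as a diagnosis of the legacy parser,
here is a hedged sketch that parses the same header twice: once with
the default (compat32) behaviour and once with email.policy.default
from newer Python 3 releases. The variable names are made up for the
example, and I have not checked how far back the policy-based API is
usable.

```python
import email
from email import policy

raw = (
    'To: "line\n break" <linebreak@example.org>, line\n'
    " break <linebreak@example.org>\n\n"
)

# Default parsing keeps the folded value exactly as it appeared on the wire.
legacy = email.message_from_string(raw)
# The policy-based API strips the line breaks before parsing the addresses.
modern = email.message_from_string(raw, policy=policy.default)

print(repr(legacy["To"]))
print(repr(str(modern["To"])))
for address in modern["To"].addresses:
    print(repr(address.display_name), address.addr_spec)
```

With the newer policy the header value no longer contains a newline, so
the quoted and unquoted forms are handled the same way.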

There also appears to be a bug in the notmuch CLI's reply command
where it omits addresses that were folded in the original message.  I
don't know if alot uses the CLI's reply command, so this may or may
not be related to your specific issue.  I haven't dug into this yet,
other than to confirm that it's the CLI's fault and not
notmuch-emacs's.

> Justus