[issue35222] email.utils.formataddr is not exactly the reverse of email.utils.parseaddr

2018-11-13 Thread R. David Murray


R. David Murray  added the comment:

Because the RFCs are defined only for ASCII.  Non-ASCII in RFC 2822 addresses 
is an RFC violation.  In Python 2, non-ASCII would usually round-trip through 
these functions, but again that was an accident.

If you'd like to propose a doc clarification that would be fine, but the 
clarification would be that behavior on strings containing non-ascii is 
undefined.

Note that these functions are considered soft-deprecated...they are in modules 
that are in the "Legacy API" section of the email docs.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



2018-11-12 Thread skreft


skreft  added the comment:

@r.david.murray where do you see that those functions are only defined for 
ascii? There's nothing in the online docs stating that and furthermore 
`formataddr` has supported non-ascii names since version 3.3. RFC 2822 is 
however mentioned in the docstrings.

The fact that `formataddr` is not really the inverse warrants at least a note 
or clarification in the docs.
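
As a minimal sketch of the asymmetry described here: formataddr accepts a non-ASCII display name but rejects a non-ASCII address. The name and the example.com addresses below are placeholders for illustration, not values from the report, and the behavior assumes Python 3.3 or later.

```python
from email.utils import formataddr

# Placeholder values for illustration only.
name = "Ñandú"
address = "bird@example.com"

# A non-ASCII display name is accepted: formataddr encodes it as an
# RFC 2047 encoded-word using the charset argument (utf-8 by default).
formatted = formataddr((name, address))
print(formatted)

# A non-ASCII address is rejected with a UnicodeError subclass.
try:
    formataddr((name, "ñandú@example.com"))
    address_rejected = False
except UnicodeError:
    address_rejected = True
print(address_rejected)
```

The name is encoded while the address is left untouched, which is why only the address part triggers the error.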




2018-11-12 Thread R. David Murray

R. David Murray  added the comment:

Thanks for the report, but parseaddr and formataddr are defined *only* for 
ASCII.  In the port to python3, parseaddr sort-of-maybe-sometimes does the 
naively expected thing with non-ascii, but that's just an accident.  We could 
have added a check for non-ascii to parseaddr during the python3 port, but we 
didn't think of it, and it is too late now since adding it would break 
otherwise working code even though that code is technically broken.

So, for the defined API of parseaddr/formataddr, there is no bug here.
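
Since the library itself cannot add the check, calling code could add one. The following is only a sketch of such a guard; parse_ascii_addr is a hypothetical helper name, not part of the email package, and the example.com address is a placeholder.

```python
from email.utils import parseaddr

def parse_ascii_addr(raw):
    """Hypothetical wrapper: refuse input outside parseaddr's defined domain."""
    try:
        # Same style of check that formataddr applies to the address.
        raw.encode("ascii")
    except UnicodeEncodeError:
        raise ValueError("parseaddr is only defined for ASCII input") from None
    return parseaddr(raw)

print(parse_ascii_addr("Jane Doe <jane@example.com>"))
```

With this guard, non-ASCII input fails at parse time instead of later in formataddr.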

As for handling non-ascii in email per your link:

>>> from email.message import EmailMessage
>>> from email.policy import default
>>> m = EmailMessage(policy=default.clone(utf8=True))
>>> m['From'] = 'skreft+ñandú@sudoai.com'
>>> bytes(m)
b'From: skreft+\xc3\xb1and\xc3\x...@sudoai.com\n\n'

(NB: in testing the above I discovered there is actually a recent bug in the 
serialization when utf8 is *False*: it does RFC2047 encoding of the username, 
which it should not do...instead it should raise an error.  Feel free to open a 
bug report for that...)
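
The interactive session above can also be written as a small script. This is a sketch only; serialize_from is a hypothetical helper name introduced for illustration, and it assumes a Python version where the utf8 policy setting is available.

```python
from email.message import EmailMessage
from email.policy import default

def serialize_from(addr):
    """Hypothetical helper: build a message with only a From header and serialize it."""
    # utf8=True lets headers be written as raw UTF-8 instead of
    # being restricted to ASCII.
    msg = EmailMessage(policy=default.clone(utf8=True))
    msg["From"] = addr
    return bytes(msg)

raw = serialize_from("skreft+ñandú@sudoai.com")
print(raw)
```

The serialized header carries the local part as UTF-8 bytes, which is what an SMTPUTF8-capable transport would expect.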

--
resolution:  -> not a bug
stage:  -> resolved
status: open -> closed
type:  -> behavior




2018-11-12 Thread Rémi Lapeyre

Rémi Lapeyre  added the comment:

This is indeed an issue with formataddr: it expects the input to be 
ASCII-encoded, as RFC 2822 requires.

Email is much more complicated, though, and has been internationalized; a 
summary of this work is available at 
https://en.wikipedia.org/wiki/Email_address#Internationalization.

I think the check in formataddr is no longer desirable and should be removed.

I'm not sure whether the resulting value should be encoded using email.header 
or not.

--
nosy: +remi.lapeyre




2018-11-12 Thread skreft

New submission from skreft :

The docs 
(https://docs.python.org/3/library/email.utils.html#email.utils.formataddr) say 
that formataddr is the inverse of parseaddr; however, non-ASCII email addresses 
are treated differently by the two functions.

parseaddr will return non-ASCII addresses, whereas formataddr will raise a 
UnicodeError.

Below is an example:

In [1]: import email.utils as u

In [2]: u.parseaddr('skreft+ñandú@sudoai.com')
Out[2]: ('', 'skreft+ñandú@sudoai.com')

In [3]: u.formataddr(u.parseaddr('skreft+ñandú@sudoai.com'))
---------------------------------------------------------------------------
UnicodeEncodeError                        Traceback (most recent call last)
 in ()
----> 1 u.formataddr(u.parseaddr('skreft+ñandú@sudoai.com'))

/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/email/utils.py in formataddr(pair, charset)
     89     name, address = pair
     90     # The address MUST (per RFC) be ascii, so raise a UnicodeError if it isn't.
---> 91     address.encode('ascii')
     92     if name:
     93         try:

UnicodeEncodeError: 'ascii' codec can't encode character '\xf1' in position 7: ordinal not in range(128)
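
The report can be reproduced outside IPython with a short script. This is a sketch under the assumption that parseaddr and formataddr behave as shown in the traceback above; roundtrip is a hypothetical helper name, and the example.com address is a placeholder.

```python
from email.utils import formataddr, parseaddr

def roundtrip(raw):
    """Hypothetical helper: parse then re-format, or None if formatting fails."""
    pair = parseaddr(raw)
    try:
        return formataddr(pair)
    except UnicodeError:
        # formataddr refuses addresses that are not pure ASCII.
        return None

print(roundtrip("Jane Doe <jane@example.com>"))
print(roundtrip("skreft+ñandú@sudoai.com"))
```

For ASCII input the two functions invert each other; for the non-ASCII address the second step fails.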

--
components: email
messages: 329765
nosy: barry, r.david.murray, skreft
priority: normal
severity: normal
status: open
title: email.utils.formataddr is not exactly the reverse of email.utils.parseaddr
versions: Python 3.6
