[issue44560] Unrecognized charset "eucgb2312_cn" in email header for many MUA

2021-07-06 Thread R. David Murray

R. David Murray added the comment:

I can't tell for sure from a quick glance at the code whether this behavior 
is intentional (though, like you, I wouldn't think it would be).

That's part of the legacy API at this point.  The new API will just use utf8:

from email.message import EmailMessage

m = EmailMessage()
m['Subject'] = '中文'

print(bytes(m))

results in

b'Subject: =?utf-8?b?5Lit5paH?=\n\n'
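
As a quick check of the above, here is a minimal sketch that round-trips the 
header through the new API.  Parsing with email.policy.default is an 
assumption added here, not something stated above, and the variable names 
are only illustrative.

```python
from email import message_from_bytes, policy
from email.message import EmailMessage

m = EmailMessage()
m['Subject'] = '中文'

# Serialize; the non-ASCII subject becomes an RFC 2047 encoded word.
raw = bytes(m)

# Parse it back with the modern policy, which decodes encoded words
# when the header value is accessed.
parsed = message_from_bytes(raw, policy=policy.default)
subject = str(parsed['Subject'])
print(subject)
```

If the round trip works as expected, the parsed subject is the original 
text again and the charset label on the wire is one any MUA should know.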

The fix, assuming it is correct, would be to add the line:

'eucgb2312_cn': 'gb2312',

to the CODEC_MAP in email/charset.py, and then specify the internal codec name 
in your Charset call.  I'm not sure that's right, though...once upon a time I 
think I understood the logic behind the charset module, but I no longer 
remember the details.
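
For context on the charset versus codec distinction, here is a small sketch 
that only inspects the documented attributes of a Charset instance.  It does 
not apply or verify the fix suggested above, and the exact codec strings may 
differ between Python versions, so no output is shown.

```python
import codecs
from email.charset import Charset

cs = Charset("gb2312")

# The charset names are the MIME-level labels ...
print("input_charset: ", cs.input_charset)
print("output_charset:", cs.output_charset)

# ... while the codec names are what Python uses internally to convert text.
print("input_codec:   ", cs.input_codec)
print("output_codec:  ", cs.output_codec)

# Whatever string the codec attribute holds, it should resolve to a codec
# in Python's registry; .name is the registry's canonical name for it.
canonical = codecs.lookup(cs.output_codec).name
print("canonical codec:", canonical)
```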

I'd recommend just using the new API and not the legacy API.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue44560] Unrecognized charset "eucgb2312_cn" in email header for many MUA

2021-07-06 Thread Mark Sapiro


Change by Mark Sapiro:


--
versions: +Python 3.7, Python 3.8, Python 3.9




[issue44560] Unrecognized charset "eucgb2312_cn" in email header for many MUA

2021-07-04 Thread Dong-hee Na


Change by Dong-hee Na:


--
nosy: +corona10




[issue44560] Unrecognized charset "eucgb2312_cn" in email header for many MUA

2021-07-04 Thread TommyLike Hu

New submission from TommyLike Hu:

The email module is used to decode and encode email messages. If the header 
content is gb2312-encoded, for example "中文", then by design we end up with an 
RFC 2047 encoded header like this:
```
=?eucgb2312_cn?b?1tDOxA==?=
```
The test script is:
```
from email import header, charset

# Encode the text as GB2312 bytes and tag it with the matching Charset.
h = header.make_header([("中文".encode("gb2312"),
                         charset.Charset("gb2312"))])
print(h.encode())
```

My question is: why don't we use "gb2312" as the charset in the RFC 2047 
encoded string, given that "eucgb2312_cn" is a name that only Python knows 
about?
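
To illustrate the point, here is a small sketch using only the standard 
codecs and email.header modules. In Python's codec registry "eucgb2312_cn" 
is an alias for the "gb2312" codec, so Python itself decodes either label, 
while other MUAs have no reason to know the alias. The variable names are 
only illustrative.

```python
import codecs
from email.header import decode_header

# Both names resolve to the same codec in Python's registry.
same_codec = (codecs.lookup("eucgb2312_cn").name
              == codecs.lookup("gb2312").name)
print(same_codec)

# Python decodes either label; the payload bytes are identical.
decoded = []
for word in ("=?eucgb2312_cn?b?1tDOxA==?=", "=?gb2312?b?1tDOxA==?="):
    payload, label = decode_header(word)[0]
    decoded.append((label, payload.decode(label)))
print(decoded)
```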

Thanks

--
components: email
messages: 396939
nosy: barry, r.david.murray, tommylikehu
priority: normal
severity: normal
status: open
title: Unrecognized charset "eucgb2312_cn" in email header for many MUA
type: behavior
versions: Python 3.6
