[issue37943] mimetypes.guess_extension() doesn’t get JPG right
Jens Troeger added the comment:

@fbidu, oh, I missed that, thank you! Shall I close the issue again, or what’s the common procedure in this case?

--
Python tracker <https://bugs.python.org/issue37943>
Python-bugs-list mailing list
Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue37943] mimetypes.guess_extension() doesn’t get JPG right
Jens Troeger added the comment:

This is still not working; I tried it on Python 3.8.5 and Python 3.7.8:

    >>> import mimetypes
    >>> mimetypes.guess_extension('image/jpg')
    >>> mimetypes.guess_extension('image/jpeg')
    '.jpg'

Both should return the same value. I expected the MIME type 'image/jpg' to return the extension '.jpg' because that MIME type is used a lot.

--
resolution: out of date -> remind
status: closed -> open

Python tracker <https://bugs.python.org/issue37943>
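One way to get '.jpg' for the non-standard 'image/jpg' type without touching the module-level defaults is to register it on a private mimetypes.MimeTypes instance as a non-strict type. This is a sketch under stated assumptions: the name `mime_db` is hypothetical, and 'image/jpg' is registered with strict=False on purpose because it is not a registered MIME type. Whether the built-in non-strict table already contains this mapping may vary between Python versions; registering it explicitly keeps the result predictable.

```python
import mimetypes

# Hypothetical private database, so the module-level mimetypes functions
# (and every other caller in the process) stay untouched.
mime_db = mimetypes.MimeTypes()

# Register the non-standard alias as a *non-strict* type. Strict lookups
# (the default) still map '.jpg' to 'image/jpeg', because guess_type()
# consults the strict table first.
mime_db.add_type("image/jpg", ".jpg", strict=False)

print(mime_db.guess_extension("image/jpg", strict=False))  # .jpg
print(mime_db.guess_type("photo.jpg"))                     # ('image/jpeg', None)
```

Note that the alias only resolves when the caller opts into non-strict lookups; mime_db.guess_extension('image/jpg') with the default strict=True still does not know about it.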
[issue37943] mimetypes.guess_extension() doesn’t get JPG right
Jens Troeger added the comment:

Oops, forgot…

    >>> mimetypes.guess_extension("image/jpeg")  # Expected ".jpg" or ".jpeg" as per referenced MDN.

I personally would go with ".jpg" because that's the more common file name extension.

--
Python tracker <https://bugs.python.org/issue37943>
[issue37943] mimetypes.guess_extension() doesn’t get JPG right
New submission from Jens Troeger:

I think this one’s quite easy to reproduce:

    Python 3.7.4 (default, Jul 11 2019, 01:08:00)
    [Clang 10.0.1 (clang-1001.0.46.4)] on darwin
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import mimetypes
    >>> mimetypes.guess_extension("image/jpg")  # Expected ".jpg"
    >>> mimetypes.guess_extension("image/jpeg")  # Expected ".jpg"
    '.jpe'

According to MDN (https://developer.mozilla.org/en-US/docs/Web/HTTP/Basics_of_HTTP/MIME_types/Complete_list_of_MIME_types) only "image/jpeg" is a valid MIME type; however, I’ve seen quite a bit of "image/jpg" out in the wild, and I think that ought to be accounted for too.

Before I look into submitting a PR, I wanted to confirm that this is an issue that ought to be fixed. I think it is.

--
components: Library (Lib)
messages: 350408
nosy: _savage
priority: normal
severity: normal
status: open
title: mimetypes.guess_extension() doesn’t get JPG right
type: behavior
versions: Python 3.7

Python tracker <https://bugs.python.org/issue37943>
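Until the standard library behaves as expected, a caller can normalize the MIME type and pick the preferred extension itself. A minimal sketch, assuming you control the call site: `_MIME_ALIASES`, `_PREFERRED_EXT` and `guess_extension_normalized` are hypothetical names, the alias table encodes the "image/jpg seen in the wild" observation, and the '.jpg' preference follows the suggestion in this thread. Only mimetypes.guess_all_extensions() comes from the standard library.

```python
import mimetypes

# Hypothetical alias table: non-standard MIME types seen "in the wild",
# mapped to their registered equivalents.
_MIME_ALIASES = {"image/jpg": "image/jpeg"}

# Hypothetical preference table: the extension to return when a type maps
# to several (mimetypes itself returns whichever extension it lists first).
_PREFERRED_EXT = {"image/jpeg": ".jpg"}


def guess_extension_normalized(mime_type, strict=True):
    """Like mimetypes.guess_extension(), but alias- and preference-aware."""
    mime_type = mime_type.strip().lower()
    mime_type = _MIME_ALIASES.get(mime_type, mime_type)
    candidates = mimetypes.guess_all_extensions(mime_type, strict=strict)
    preferred = _PREFERRED_EXT.get(mime_type)
    if preferred in candidates:
        return preferred
    # Fall back to the library's own first choice (or None if unknown).
    return candidates[0] if candidates else None


for t in ("image/jpg", "image/jpeg", "IMAGE/JPEG"):
    print(t, "->", guess_extension_normalized(t))  # each prints '.jpg'
```

The preference is only applied when the library actually lists the preferred extension for that type, so an unknown or unrelated type falls through to the normal behavior.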
[issue34424] Unicode names break email header
Jens Troeger added the comment:

Cheryl, if you can find somebody to approve and merge this fix, that would be greatly appreciated! If there is anything I can do, please let me know.

--
Python tracker <https://bugs.python.org/issue34424>
[issue30717] Add unicode grapheme cluster break algorithm
Change by Jens Troeger:

--
nosy: +_savage

Python tracker <https://bugs.python.org/issue30717>
[issue34424] Unicode names break email header
Jens Troeger added the comment:

Can somebody please review and merge https://github.com/python/cpython/pull/8803? I am still waiting for this fix to become mainstream.

--
Python tracker <https://bugs.python.org/issue34424>
[issue28879] smtplib send_message should add Date header if it is missing, per RFC5322
Jens Troeger added the comment:

Any updates on this? It looks like the proposed change has not been merged yet. I’m having problems with Google rejecting emails:

    (555, b'5.5.2 Syntax error, goodbye. r10-v6sm7321838qtj.41 - gsmtp', '…')

and using the IETF’s message linter (https://tools.ietf.org/tools/msglint/) I get the following, among a few others:

    ERROR: missing mandatory header 'date' lines 1-7
    ERROR: missing mandatory header 'return-path' lines 1-7

--
nosy: +_savage

Python tracker <https://bugs.python.org/issue28879>
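Until send_message() adds it itself, a caller can fill in a missing Date header before handing the message to smtplib, e.g. smtp.send_message(ensure_date_header(msg)). A minimal sketch, assuming an email.message.EmailMessage; `ensure_date_header` is a hypothetical helper. It only covers Date: the 'return-path' error in the linter report concerns a header that, per RFC 5321, the final delivering SMTP server adds, so the sketch leaves it alone.

```python
from email.message import EmailMessage
from email.utils import formatdate


def ensure_date_header(msg):
    """Add an RFC 5322 Date header if the message lacks one (sketch)."""
    if msg["Date"] is None:
        # formatdate() renders the current time in RFC 5322 date format.
        msg["Date"] = formatdate(localtime=True)
    return msg


msg = EmailMessage()
msg["Subject"] = "Hello"
msg.set_content("Some content.")
ensure_date_header(msg)
ensure_date_header(msg)  # idempotent: no second Date header is added
print(msg["Date"])
```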
[issue34424] Unicode names break email header
Jens Troeger added the comment:

Pull request: https://github.com/python/cpython/pull/8803/

--
Python tracker <https://bugs.python.org/issue34424>
[issue24218] Also support SMTPUTF8 in smtplib's send_message method.
Jens Troeger added the comment:

New issue: https://bugs.python.org/issue34424

--
Python tracker <https://bugs.python.org/issue24218>
[issue34424] Unicode names break email header
New submission from Jens Troeger:

See also this comment and the ensuing conversation: https://bugs.python.org/issue24218?#msg322761

Consider an email message with the following:

    message = EmailMessage()
    message["From"] = Address(addr_spec="b...@foo.com", display_name="Jens Troeger")
    message["To"] = Address(addr_spec="f...@bar.com", display_name="Martín Córdoba")

It’s important here that the email addresses themselves are `ascii` encodable, but the names are not. Flattening the object (https://github.com/python/cpython/blob/master/Lib/smtplib.py#L964) incorrectly inserts multiple carriage returns, thus breaking the email header and mangling the entire email:

    flatmsg: b'From: Jens Troeger \r\nTo: Fernando =?utf-8?q?Mart=C3=ADn_C=C3=B3rdoba?= \r\r\r\r\r\nSubject:\r\n Confirmation: …\r\n…'

After an initial investigation into the BytesGenerator (used to flatten an EmailMessage object), here is what’s happening. Flattening the body and attachments of the EmailMessage object works, and eventually _write_headers() is called to flatten the headers, which happens entry by entry (https://github.com/python/cpython/blob/master/Lib/email/generator.py#L417-L418). Flattening a header entry is a recursive process over the parse tree of the entry: it builds the flattened and encoded final string by descending into the parse tree, encoding the individual “parts” (tokens of the header entry) and concatenating them.

For a header entry like "Martín Córdoba ", the parse tree eventually yields the correct flattened string

    '=?utf-8?q?Mart=C3=ADn_C=C3=B3rdoba?= \r\n'

at the bottom of the recursion for this “Mailbox” part. The recursive call stack is then:

    _refold_parse_tree                _header_value_parser.py:2687
    fold [Mailbox]                    _header_value_parser.py:144
    _refold_parse_tree                _header_value_parser.py:2630
    fold [Address]                    _header_value_parser.py:144
    _refold_parse_tree                _header_value_parser.py:2630
    fold [AddressList]                _header_value_parser.py:144
    _refold_parse_tree                _header_value_parser.py:2630
    fold [Header]                     _header_value_parser.py:144
    fold [_UniqueAddressHeader]       headerregistry.py:258
    _fold [EmailPolicy]               policy.py:205
    fold_binary [EmailPolicy]         policy.py:199
    _write_headers [BytesGenerator]   generator.py:418
    _write [BytesGenerator]           generator.py:195

The problem now arises from the interplay of

    # https://github.com/python/cpython/blob/master/Lib/email/_header_value_parser.py#L2629
    encoded_part = part.fold(policy=policy)[:-1] # strip nl

which strips the '\n' from the returned string, and

    # https://github.com/python/cpython/blob/master/Lib/email/_header_value_parser.py#L2686
    return policy.linesep.join(lines) + policy.linesep

which appends the policy’s line separator, linesep="\r\n", to the end of the flattened string while unrolling the recursion.

I am not sure about a proper fix here, but considering that policy.linesep can be a string of any length (in this case len("\r\n") == 2), a fixed truncation of one character, [:-1], seems wrong. Instead, using

    encoded_part = part.fold(policy=policy)[:-len(policy.linesep)] # strip nl

seems to work for entries with and without Unicode characters in their display names.

--
components: email
messages: 323686
nosy: _savage, barry, r.david.murray
priority: normal
severity: normal
status: open
title: Unicode names break email header
type: behavior
versions: Python 3.7

Python tracker <https://bugs.python.org/issue34424>
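The accumulation of stray carriage returns can be reproduced with a toy model of the recursion described above. This is a simplification, not the real _refold_parse_tree(): `fold_nested`, `strip_one_char` and `strip_linesep` are hypothetical names, and each nesting level simply takes its child's folded output, strips the trailing newline with the given strategy, and re-appends linesep, mirroring the two quoted lines.

```python
def strip_one_char(folded, linesep):
    """Original strategy: part.fold(policy=policy)[:-1]."""
    return folded[:-1]


def strip_linesep(folded, linesep):
    """Proposed strategy: part.fold(policy=policy)[:-len(policy.linesep)]."""
    return folded[:-len(linesep)]


def fold_nested(text, depth, linesep, strip):
    """Toy model: each level re-adds linesep after stripping its child's."""
    if depth == 0:
        # Bottom of the recursion: the encoded part plus one line separator.
        return text + linesep
    child = fold_nested(text, depth - 1, linesep, strip)
    return strip(child, linesep) + linesep


word = "=?utf-8?q?Mart=C3=ADn_C=C3=B3rdoba?="
for linesep in ("\n", "\r\n"):
    for strip in (strip_one_char, strip_linesep):
        print(repr(linesep), strip.__name__, repr(fold_nested(word, 3, linesep, strip)))
```

In this model both strategies agree when linesep is "\n", which is consistent with msg.as_string() looking fine, while with "\r\n" the one-character strip leaves one extra '\r' per nesting level (three levels above the base give '\r\r\r\r\n').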
[issue24218] Also support SMTPUTF8 in smtplib's send_message method.
Jens Troeger added the comment:

Thanks, David! Should I submit a PR on GitHub (which is R/O), or where should I submit it?

--
Python tracker <https://bugs.python.org/issue24218>
[issue24218] Also support SMTPUTF8 in smtplib's send_message method.
Jens Troeger added the comment:

@David, any thoughts on this?

--
Python tracker <https://bugs.python.org/issue24218>
[issue24218] Also support SMTPUTF8 in smtplib's send_message method.
Jens Troeger added the comment:

David, I tried to find the open issue you mentioned for the '\r\r…\n' problem, but I could not find it. However, from an initial investigation into the BytesGenerator, here is what’s happening.

Flattening the body and attachments of the EmailMessage object works, and eventually _write_headers() is called to flatten the headers, which happens entry by entry (https://github.com/python/cpython/blob/master/Lib/email/generator.py#L417-L418). Flattening a header entry is a recursive process over the parse tree of the entry: it builds the flattened and encoded final string by descending into the parse tree, encoding the individual “parts” (tokens of the header entry) and concatenating them.

For a header entry like "Martín Córdoba ", the parse tree eventually yields the correct flattened string

    '=?utf-8?q?Mart=C3=ADn_C=C3=B3rdoba?= \r\n'

at the bottom of the recursion for this “Mailbox” part. The recursive call stack is then:

    _refold_parse_tree                _header_value_parser.py:2687
    fold [Mailbox]                    _header_value_parser.py:144
    _refold_parse_tree                _header_value_parser.py:2630
    fold [Address]                    _header_value_parser.py:144
    _refold_parse_tree                _header_value_parser.py:2630
    fold [AddressList]                _header_value_parser.py:144
    _refold_parse_tree                _header_value_parser.py:2630
    fold [Header]                     _header_value_parser.py:144
    fold [_UniqueAddressHeader]       headerregistry.py:258
    _fold [EmailPolicy]               policy.py:205
    fold_binary [EmailPolicy]         policy.py:199
    _write_headers [BytesGenerator]   generator.py:418
    _write [BytesGenerator]           generator.py:195

The problem now arises from the interplay of

    # https://github.com/python/cpython/blob/master/Lib/email/_header_value_parser.py#L2629
    encoded_part = part.fold(policy=policy)[:-1] # strip nl

which strips the '\n' from the returned string, and

    # https://github.com/python/cpython/blob/master/Lib/email/_header_value_parser.py#L2686
    return policy.linesep.join(lines) + policy.linesep

which appends the policy’s line separator, linesep="\r\n", to the end of the flattened string while unrolling the recursion.

I am not sure about a proper fix here, but considering that policy.linesep can be a string of any length (in this case len("\r\n") == 2), a fixed truncation of one character, [:-1], seems wrong. Instead, using

    encoded_part = part.fold(policy=policy)[:-len(policy.linesep)] # strip nl

seems to work for entries with and without Unicode characters in their display names.

David, please advise on how to proceed from here.

--
Python tracker <https://bugs.python.org/issue24218>
[issue24218] Also support SMTPUTF8 in smtplib's send_message method.
Jens Troeger added the comment:

So that’s interesting. I thought that setting `international = True` (see https://github.com/python/cpython/blob/master/Lib/smtplib.py#L947) would be a neat workaround, but it turned out to be the opposite. When delivering those emails to Gmail I started seeing

    Failed to send email: (555, b'5.5.2 Syntax error, goodbye. s53-v6sm1864855qts.5 - gsmtp', 'f...@bar.com')

and it turns out (according to the IETF message linter, https://tools.ietf.org/tools/msglint/) that:

    ---
    UNKNOWN: unknown header 'User-Agent' at line 4
    ERROR: missing mandatory header 'date' lines 1-7
    ERROR: missing mandatory header 'return-path' lines 1-7
    OK: found part text/plain line 9
    WARNING: line 13 too long (109 chars); text/plain shouldn't need folding (RFC 2046-4.1.1)
    WARNING: line 15 too long (124 chars); text/plain shouldn't need folding (RFC 2046-4.1.1)
    WARNING: Character set mislabelled as 'utf-8' when 'us-ascii' suffices, body part ending line 22
    ---

It seems that the “Date” and “Return-Path” header entries are now missing when the email is generated. I reverted the initial change.

Any updates on the multiple CR problem when flattening?

--
Python tracker <https://bugs.python.org/issue24218>
[issue24218] Also support SMTPUTF8 in smtplib's send_message method.
Jens Troeger added the comment:

> Well, posting on a closed issue is generally not the best way :)

Fair enough ;)

> The multiple carriage returns is a bug, and there is an open issue for it,
> though I'm not finding it at the moment.

Oh good, yes, that should be fixed! My current workaround is setting `international = True` _always_, which prevents the multiple CRs. Not pretty, but it works…

--
Python tracker <https://bugs.python.org/issue24218>
[issue33398] From, To, Cc lines break when calling send_message()
Jens Troeger added the comment:

See also this issue comment: https://bugs.python.org/issue24218#msg322761

--
Python tracker <https://bugs.python.org/issue33398>
[issue24218] Also support SMTPUTF8 in smtplib's send_message method.
Jens Troeger added the comment:

(continuing the previous message msg322761)

…unless the addresses should be checked separately from the display names, in which case the BytesGenerator’s flatten() function should be fixed. Since I haven’t read the RFC, please let me know how to continue from here.

--
Python tracker <https://bugs.python.org/issue24218>
[issue24218] Also support SMTPUTF8 in smtplib's send_message method.
Jens Troeger added the comment:

I was about to open an issue when I found this one. Consider an email message with the following:

    message = EmailMessage()
    message["From"] = Address(addr_spec="b...@foo.com", display_name="Jens Troeger")
    message["To"] = Address(addr_spec="f...@bar.com", display_name="Martín Córdoba")

It’s important here that the email addresses themselves are `ascii` encodable, but the names are not. With that combination, send_message() falsely assumes plain-ASCII addresses (see https://github.com/python/cpython/blob/master/Lib/smtplib.py#L949, where it checks only the email addresses, not the display names!) and therefore the `international` flag stays False. As a result, flattening the email object (https://github.com/python/cpython/blob/master/Lib/smtplib.py#L964) incorrectly inserts multiple carriage returns, thus breaking the email header and mangling the entire email:

    flatmsg: b'From: Jens Troeger \r\nTo: Fernando =?utf-8?q?Mart=C3=ADn_C=C3=B3rdoba?= \r\r\r\r\r\nSubject:\r\n Confirmation: …\r\n…'

I think a proper fix would be in line 949, where both the email addresses and the display names should be checked for their encoding. Should the comment for that function also be adjusted to mention display names?

Note also that the attached patch does not test the above scenario and should probably be extended as well.

--
nosy: +_savage

Python tracker <https://bugs.python.org/issue24218>
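The distinction drawn in this message (ASCII addr_spec, non-ASCII display name) can be made explicit with the headerregistry objects. This is a sketch, not smtplib's actual check: `non_ascii_parts` is a hypothetical helper, the example addresses are placeholders, and str.isascii() requires Python 3.7 or later.

```python
from email.headerregistry import Address
from email.message import EmailMessage


def non_ascii_parts(msg, fields=("From", "To", "Cc")):
    """Return (any non-ASCII addr_spec, any non-ASCII display name)."""
    non_ascii_addr, non_ascii_name = False, False
    for field in fields:
        header = msg[field]
        if header is None:
            continue
        # With the default EmailMessage policy, address headers expose
        # parsed Address objects through the .addresses attribute.
        for address in header.addresses:
            non_ascii_addr |= not address.addr_spec.isascii()
            non_ascii_name |= not address.display_name.isascii()
    return non_ascii_addr, non_ascii_name


message = EmailMessage()
message["From"] = Address(addr_spec="sender@example.com", display_name="Jens Troeger")
message["To"] = Address(addr_spec="recipient@example.com", display_name="Martín Córdoba")
print(non_ascii_parts(message))  # (False, True)
```

The (False, True) result is exactly the combination described above: a check that looks only at addr_spec sees nothing to flag, even though the display name needs non-ASCII handling.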
[issue33398] From, To, Cc lines break when calling send_message()
New submission from Jens Troeger:

It looks like non-ASCII characters in an Address()’s display_name parameter cause their lines in the header to get mangled when the message is sent. For example, here is a case to reproduce:

    >>> msg = EmailMessage()
    >>> msg["To"] = Address(display_name="Jens Tröger", addr_spec="jens.troe...@gmail.com")
    >>> msg["From"] = Address(display_name="Jens Troeger", addr_spec="jens.troe...@gmail.com")
    >>> msg.set_content("Some content.")
    >>> msg.as_string()
    'To: Jens =?utf-8?q?Tr=C3=B6ger?= \nContent-Type: text/plain; charset="utf-8"\nContent-Transfer-Encoding: 7bit\nMIME-Version: 1.0\nFrom: Jens Troeger \n\nSome content.\n'

Sending this email creates the following SMTP debug output:

    >>> smtpsrv = smtplib.SMTP("smtp.gmail.com:587")
    >>> …
    >>> smtpsrv.send_message(msg)
    send: 'mail FROM: size=220\r\n'
    reply: b'250 2.1.0 OK z23sm16924622pfe.110 - gsmtp\r\n'
    reply: retcode (250); Msg: b'2.1.0 OK z23sm16924622pfe.110 - gsmtp'
    send: 'rcpt TO:\r\n'
    reply: b'250 2.1.5 OK z23sm16924622pfe.110 - gsmtp\r\n'
    reply: retcode (250); Msg: b'2.1.5 OK z23sm16924622pfe.110 - gsmtp'
    send: 'data\r\n'
    reply: b'354 Go ahead z23sm16924622pfe.110 - gsmtp\r\n'
    reply: retcode (354); Msg: b'Go ahead z23sm16924622pfe.110 - gsmtp'
    data: (354, b'Go ahead z23sm16924622pfe.110 - gsmtp')
    send: b'To: Jens =?utf-8?q?Tr=C3=B6ger?= \r\r\r\r\r\nContent-Type: text/plain; charset="utf-8"\r\nContent-Transfer-Encoding: 7bit\r\nMIME-Version: 1.0\r\nFrom: Jens Troeger \r\n\r\nSome content.\r\n.\r\n'
    reply: b'250 2.0.0 OK 1525174591 z23sm16924622pfe.110 - gsmtp\r\n'
    reply: retcode (250); Msg: b'2.0.0 OK 1525174591 z23sm16924622pfe.110 - gsmtp'
    data: (250, b'2.0.0 OK 1525174591 z23sm16924622pfe.110 - gsmtp')
    {}

Notice the string of "\r\r\…" for the "To" field, which consequently breaks off the remainder of the email’s header into a premature body:

    […]
    Message-ID: <5ae8513e.17b9620a.eebf7.d...@mx.google.com>
    Date: Tue, 01 May 2018 04:36:30 -0700 (PDT)
    From: jens.troe...@gmail.com
    To: Jens Tröger
    Content-Type: text/plain; charset="utf-8"
    Content-Transfer-Encoding: 7bit
    MIME-Version: 1.0
    From: Jens Troeger
    Some content.

Also notice the two From fields. The first one, I suspect, is supplied from the SMTP server’s login, the second one from the EmailMessage. Without a From in the EmailMessage, I get the following error:

    >>> smtpsrv.send_message(msg)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/…/lib/python3.6/smtplib.py", line 936, in send_message
        from_addr = email.utils.getaddresses([from_addr])[0][1]
      File "/…/lib/python3.6/email/utils.py", line 112, in getaddresses
        all = COMMASPACE.join(fieldvalues)
    TypeError: sequence item 0: expected str instance, NoneType found

Similar breakage of the header into a premature body can be achieved with the Cc header field.

--
components: email
messages: 315994
nosy: _savage, barry, r.david.murray
priority: normal
severity: normal
status: open
title: From, To, Cc lines break when calling send_message()
type: behavior
versions: Python 3.6

Python tracker <https://bugs.python.org/issue33398>
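The header can be inspected without an SMTP server by serializing the message roughly the way send_message() does: a BytesGenerator flattened with linesep='\r\n', switching to a utf8=True policy on the `international` path. This is a sketch based on the smtplib code linked in this thread; `flatten_like_smtplib` is a hypothetical name, the addresses are placeholders, and it skips other steps of send_message() such as dropping Bcc headers.

```python
import io
from email.generator import BytesGenerator
from email.headerregistry import Address
from email.message import EmailMessage


def flatten_like_smtplib(msg, international=False):
    """Serialize msg roughly as smtplib.SMTP.send_message() does (sketch)."""
    with io.BytesIO() as buf:
        if international:
            # The SMTPUTF8 path uses a utf8=True policy: no encoded words.
            gen = BytesGenerator(buf, policy=msg.policy.clone(utf8=True))
        else:
            # No explicit policy: the generator falls back to msg.policy.
            gen = BytesGenerator(buf)
        gen.flatten(msg, linesep="\r\n")
        return buf.getvalue()


msg = EmailMessage()
msg["To"] = Address(display_name="Jens Tröger", addr_spec="jens@example.com")
msg["From"] = Address(display_name="Jens Troeger", addr_spec="jens@example.com")
msg.set_content("Some content.")

data = flatten_like_smtplib(msg)
# An affected interpreter would show a run of b"\r" after the To header.
print(b"\r\r" in data)
print(data.decode("ascii"))
```

Because the utf8=True policy writes the display name as raw UTF-8 instead of an encoded word, this would also be consistent with the observation in this thread that forcing `international = True` avoided the repeated CRs.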
[issue24931] _asdict breaks when inheriting from a namedtuple
Jens Troeger added the comment:

With my update from Python 3.4.3 to Python 3.4.4 (default, Dec 25 2015, 06:14:41) I started experiencing crashes of my applications, and I suspect this change is the culprit. I have a class that inherits from namedtuple, and code calls vars() (i.e. retrieves __dict__) to iterate over an instance's attributes, much like Raymond points out in http://bugs.python.org/msg249100.

For example, with 3.4.3:

    >>> from collections import namedtuple
    >>> Point = namedtuple('Point', ['x', 'y'])
    >>> p = Point(1,2)
    >>> p
    Point(x=1, y=2)
    >>> p.__dict__
    OrderedDict([('x', 1), ('y', 2)])
    >>> vars(p)
    OrderedDict([('x', 1), ('y', 2)])

After the most recent update, this breaks with 3.4.4:

    >>> from collections import namedtuple
    >>> Point = namedtuple('Point', ['x', 'y'])
    >>> p = Point(1,2)
    >>> p
    Point(x=1, y=2)
    >>> p.__dict__
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    AttributeError: 'Point' object has no attribute '__dict__'
    >>> vars(p)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    TypeError: vars() argument must have __dict__ attribute

I am not sure about the fix on my side. Should I use _asdict() instead of vars()? I would argue that vars() should remain functional across this change. Calling _asdict() seems messy to me, but it works:

    >>> p._asdict()
    OrderedDict([('x', 1), ('y', 2)])

Why not keep the __dict__ property intact?

    @property
    def __dict__(self):
        return self._asdict()

Thanks!

--
nosy: +_savage

Python tracker <http://bugs.python.org/issue24931>
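On the caller's side, one option is a small helper that prefers the documented _asdict() method and falls back to vars() for ordinary objects. A sketch under that assumption; `as_mapping` is a hypothetical name, and the result is copied into a plain dict because _asdict() returned an OrderedDict in older versions.

```python
from collections import namedtuple

Point = namedtuple("Point", ["x", "y"])


def as_mapping(obj):
    """Return an attribute mapping for namedtuples and regular objects alike."""
    if hasattr(obj, "_asdict"):
        # A plain namedtuple class sets __slots__ = (), so its instances
        # have no __dict__; _asdict() is the documented way to get fields.
        return dict(obj._asdict())
    return dict(vars(obj))


class SubPoint(Point):
    """A subclass without __slots__, like the one described in this message."""


p = Point(1, 2)
print(as_mapping(p))               # {'x': 1, 'y': 2}
print(as_mapping(SubPoint(1, 2)))  # {'x': 1, 'y': 2}
print(vars(SubPoint(1, 2)))        # {}

try:
    vars(p)
except TypeError as exc:
    print("vars() failed:", exc)
```

The subclass case is worth noting: a namedtuple subclass that omits `__slots__ = ()` gets an empty per-instance __dict__, so vars() silently returns {} there instead of raising, while _asdict() still reports the fields.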
[issue22561] PyUnicode_InternInPlace crashes
Jens Troeger added the comment:

Thanks, Victor. I had the suspicion that UNO might set things up somewhat incorrectly and consequently cause this problem. To answer your questions:

- Debug symbols: agreed. I haven't built a vanilla Python with symbols yet; I'm using the MacPorts default Python 3.3.
- Yes, this is Python on Mac using MacPorts.
- Yes, UNO is compatible with Python 3.3. When you install LibreOffice (on Mac), it ships with a Python 3.3 interpreter. Interestingly, using THAT interpreter, the crash does not happen when I import the module, but it does happen (on occasion) upon exiting. However, I do not know what options Python is compiled with for LibreOffice.

Shall I bounce this issue back to the LibreOffice folks and see if I can find whoever owns that piece of code? (If anybody does...)

--
Python tracker <http://bugs.python.org/issue22561>
[issue22561] PyUnicode_InternInPlace crashes
New submission from Jens Troeger:

This might be an issue with Python, or an issue with Libre/OpenOffice not setting up the UNO environment correctly. The crash happens during "import uno" under Python 3.3, in the PyUnicode_InternInPlace function. I've done some digging and posted more information about this crash in this forum: http://en.libreofficeforum.org/node/9195

--
components: Unicode
messages: 228635
nosy: _savage, ezio.melotti, haypo
priority: normal
severity: normal
status: open
title: PyUnicode_InternInPlace crashes
type: crash
versions: Python 3.3

Python tracker <http://bugs.python.org/issue22561>