Karsten Hilbert <karsten.hilb...@gmx.net> wrote:
> > Terry Reedy <tjre...@udel.edu> wrote:
> > > On 8/26/2020 11:10 AM, Chris Green wrote:
> > >
> > > > I have a simple[ish] local mbox mail delivery module as follows:-
> > > ...
> > > > It has run faultlessly for many years under Python 2. I've now
> > > > changed the calling program to Python 3 and while it handles most
> > > > E-Mail OK I have just got the following error:-
> > > >
> > > >     Traceback (most recent call last):
> > > >       File "/home/chris/.mutt/bin/filter.py", line 102, in <module>
> > > >         mailLib.deliverMboxMsg(dest, msg, log)
> > > ...
> > > >       File "/usr/lib/python3.8/email/generator.py", line 406, in write
> > > >         self._fp.write(s.encode('ascii', 'surrogateescape'))
> > > >     UnicodeEncodeError: 'ascii' codec can't encode character '\ufeff' in
> > > position 4: ordinal not in range(128)
> > >
> > > '\ufeff' is the Unicode byte-order mark. It should not be present in an
> > > ascii-only 3.x string and would not normally be present in general
> > > unicode except in messages like this that talk about it. Read about it,
> > > for instance, at
> > > https://en.wikipedia.org/wiki/Byte_order_mark
> > >
> > > I would catch the error and print part or all of string s to see what is
> > > going on with this particular message. Does it have other non-ascii
> > > chars?
> > >
> > I can provoke the error simply by sending myself an E-Mail with
> > accented characters in it. I'm pretty sure my Linux system is set up
> > correctly for UTF8 characters, I certainly seem to be able to send and
> > receive these to others and I even get to see messages in other
> > scripts such as arabic, chinese, etc.
> >
> > The code above works perfectly in Python 2 delivering messages with
> > accented (and other extended) characters with no problems at all.
> > Sending myself E-Mails with accented characters works OK with the code
> > running under Python 2.
> >
> > While an E-Mail body possibly *shouldn't* have non-ASCII characters in
> > it one must be able to handle them without errors. In fact haven't
> > the RFCs changed such that the message body should be 8-bit clean?
> > Anyway I think the Python 3 mail handling libraries need to be able to
> > pass extended characters through without errors.
>
> Well, '\ufeff' is not a *character* at all in much of any
> sense of that word in unicode.
>
> It's a marker. Whatever puts it into the stream is wrong. I guess the
> best one can (and should) do is to catch the exception and dump
> the offending stream somewhere binary-capable and pass on a notice. What
> you are receiving there very much isn't a (well-formed) e-mail message.
>
> I would then attempt to backwards-crawl the delivery chain to
> find out where it came from.
>
The error seems to occur with any non-7-bit-ASCII, e.g. my accented
characters gave:-

      File "/usr/lib/python3.8/email/generator.py", line 406, in write
        self._fp.write(s.encode('ascii', 'surrogateescape'))
    UnicodeEncodeError: 'ascii' codec can't encode character '\u2019' in position 34: ordinal not in range(128)

It just happened that the first example was an escape.

-- 
Chris Green
-- 
https://mail.python.org/mailman/listinfo/python-list
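
For reference, a minimal sketch of one way this exact traceback can arise. It assumes the message was parsed from already-decoded text rather than from raw bytes; the thread does not show how filter.py reads the message, so that is only a guess. The sample message, the flatten() helper and the variable names are made up for illustration. The 'surrogateescape' handler only maps the lone surrogates produced when undecodable bytes were decoded with it, so a real character such as '\u2019' still fails to encode as ASCII.

```python
import email
from email.generator import BytesGenerator
from io import BytesIO

# Hypothetical sample message: the body holds the UTF-8 bytes for U+2019
# and no MIME charset is declared (similar to an "8-bit" plain mail).
RAW = (
    b"From: a@example.com\n"
    b"Subject: test\n"
    b"\n"
    b"It\xe2\x80\x99s here\n"
)


def flatten(msg):
    """Serialise a message to bytes with BytesGenerator (illustrative helper)."""
    buf = BytesIO()
    BytesGenerator(buf).flatten(msg)
    return buf.getvalue()


# Parsed from bytes: the non-ASCII bytes are kept as surrogate escapes,
# so BytesGenerator can write them back out unchanged.
from_bytes = email.message_from_bytes(RAW)
out = flatten(from_bytes)

# Parsed from decoded text: the body now contains a real U+2019, which
# s.encode('ascii', 'surrogateescape') cannot represent.
from_text = email.message_from_string(RAW.decode("utf-8"))
try:
    flatten(from_text)
    text_error = None
except UnicodeEncodeError as exc:
    text_error = exc

print(b"\xe2\x80\x99" in out)
print(type(text_error).__name__)
```

If that assumption holds, reading the incoming message in binary mode (for example email.message_from_binary_file on sys.stdin.buffer) would be one thing to try before changing anything else.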