Aw: Re: Another 2 to 3 mail encoding problem

2020-08-27 Thread Karsten Hilbert
> Because of this, the Python 3 str type is not suitable to store an email
> message, since it insists on the string being Unicode encoded,

I would greatly appreciate being enlightened as to what
a "string being Unicode encoded" is intended to mean?

Thanks,
Karsten
-- 
https://mail.python.org/mailman/listinfo/python-list


Aw: Re: Another 2 to 3 mail encoding problem

2020-08-27 Thread Karsten Hilbert
> Terry Reedy wrote:
> > On 8/26/2020 11:10 AM, Chris Green wrote:
> >
> > > I have a simple[ish] local mbox mail delivery module as follows:-
> > ...
> > > It has run faultlessly for many years under Python 2.  I've now
> > > changed the calling program to Python 3 and while it handles most
> > > E-Mail OK I have just got the following error:-
> > >
> > >  Traceback (most recent call last):
> > >    File "/home/chris/.mutt/bin/filter.py", line 102, in <module>
> > >      mailLib.deliverMboxMsg(dest, msg, log)
> > ...
> > >    File "/usr/lib/python3.8/email/generator.py", line 406, in write
> > >      self._fp.write(s.encode('ascii', 'surrogateescape'))
> > >  UnicodeEncodeError: 'ascii' codec can't encode character '\ufeff' in position 4: ordinal not in range(128)
> >
> > '\ufeff' is the Unicode byte-order mark.  It should not be present in an
> > ascii-only 3.x string and would not normally be present in general
> > unicode except in messages like this that talk about it.  Read about it,
> > for instance, at
> > https://en.wikipedia.org/wiki/Byte_order_mark
> >
> > I would catch the error and print part or all of string s to see what is
> > going on with this particular message.  Does it have other non-ascii chars?
> >
> I can provoke the error simply by sending myself an E-Mail with
> accented characters in it.  I'm pretty sure my Linux system is set up
> correctly for UTF-8 characters: I certainly seem to be able to send and
> receive these to and from others, and I even get to see messages in
> other scripts such as Arabic, Chinese, etc.
>
> The code above works perfectly in Python 2 delivering messages with
> accented (and other extended) characters with no problems at all.
> Sending myself E-Mails with accented characters works OK with the code
> running under Python 2.
>
> While an E-Mail body possibly *shouldn't* have non-ASCII characters in
> it one must be able to handle them without errors.  In fact haven't
> the RFCs changed such that the message body should be 8-bit clean?
> Anyway I think the Python 3 mail handling libraries need to be able to
> pass extended characters through without errors.

Well, '\ufeff' is hardly a *character* in any ordinary sense
of that word in Unicode.
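
To illustrate, here is a minimal sketch of how U+FEFF can end up in a
str and why the encode call from the traceback then fails. The sample
bytes are made up for the demonstration, not taken from the actual
message, and the variable names are only illustrative.

```python
import codecs

# Hypothetical input: a UTF-8 BOM followed by plain ASCII text.
raw = codecs.BOM_UTF8 + b"Subject: test\n"

# Decoding with plain 'utf-8' keeps the BOM as U+FEFF in the str ...
text = raw.decode("utf-8")
has_bom = text.startswith("\ufeff")

# ... whereas 'utf-8-sig' drops a leading BOM while decoding.
clean = raw.decode("utf-8-sig")

# The same encode step shown in the traceback rejects U+FEFF:
# 'surrogateescape' only turns lone surrogates (U+DC80..U+DCFF) back
# into bytes, it does not rescue other non-ASCII code points.
try:
    text.encode("ascii", "surrogateescape")
    error = None
except UnicodeEncodeError as exc:
    error = exc
```

Note that 'utf-8-sig' only strips a BOM at the very start of the data,
so it would not help with one reported at position 4.
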

It's a marker. Whatever puts it into the stream is wrong. I guess the
best one can (and should) do is to catch the exception and dump
the offending stream somewhere binary-capable and pass on a notice. What
you are receiving there very much isn't a (well-formed) e-mail message.
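
Catching and dumping could look roughly like the sketch below. It is
not the code from filter.py or mailLib: the function name
encode_or_quarantine and the quarantine directory are hypothetical, and
it assumes the message is already at hand as a str.

```python
import os
import sys
import tempfile


def encode_or_quarantine(text, quarantine_dir):
    """Return ASCII bytes for text, or dump it and return None.

    Hypothetical helper; quarantine_dir must already exist.
    """
    try:
        # Same encode step that fails in the traceback.
        return text.encode("ascii", "surrogateescape")
    except UnicodeEncodeError as exc:
        # Dump the offending data through a binary file so nothing is
        # lost; surrogateescape restores any escaped raw bytes.
        fd, path = tempfile.mkstemp(suffix=".eml", dir=quarantine_dir)
        with os.fdopen(fd, "wb") as fh:
            fh.write(text.encode("utf-8", "surrogateescape"))
        # Pass on a notice instead of crashing the delivery.
        print(f"message quarantined in {path}: {exc}", file=sys.stderr)
        return None
```

The caller would deliver the returned bytes as usual and skip delivery
when it gets None back.
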

I would then try to trace back through the delivery chain to
find out where it came from.

Or so is my current understanding.

Karsten