On 2023-05-08 23:02:18 +0200, jak wrote:
> Peter J. Holzer ha scritto:
> > On 2023-05-06 16:27:04 +0200, jak wrote:
> > > Chris Green ha scritto:
> > > > Chris Green <[email protected]> wrote:
> > > > > A bit more information, msg.get("subject", "unknown") does return a
> > > > > string, as follows:-
> > > > >
> > > > > Subject:
> > > > > =?utf-8?Q?aka_Marne_=C3=A0_la_Sa=C3=B4ne_(Waterways_Continental_Europe)?=
> > [...]
> > > > ... and of course I now see the issue! The Subject: with utf-8
> > > > characters in it gets spaces changed to underscores. So searching for
> > > > '(Waterways Continental Europe)' fails.
> > > >
> > > > I'll either need to test for both versions of the string or I'll need
> > > > to change underscores to spaces in the Subject: returned by msg.get().
[...]
> > >
> > > subj = email.header.decode_header(raw_subj)[0]
> > >
> > > subj[0].decode(subj[1])
[...]
> > email.header.decode_header returns a *list* of chunks and you have to
> > process and concatenate all of them.
> >
> > Here is a snippet from a mail to html converter I wrote a few years ago:
> >
> > def decode_rfc2047(s):
> >     if s is None:
> >         return None
> >     r = ""
> >     for chunk in email.header.decode_header(s):
[...]
> >         r += chunk[0].decode(chunk[1])
[...]
> >     return r
[...]
> >
> > I do have to say that Python is extraordinarily clumsy in this regard.
>
> Thanks for the reply. In fact, I gave that answer because I did
> not understand what the OP wanted to achieve. In addition, the
> OP opened a second thread on the similar topic in which I gave a
> more correct answer (subject: "What do these '=?utf-8?' sequences
> mean in python?", date: "Sat, 6 May 2023 14:50:40 UTC").
Right. I saw that after writing my reply. I should have read all
messages, not just that thread, before replying.
> the OP, I discovered that the MAME is not the only format used
> to compose the subject.
Not sure what "MAME" is. If it's a typo for MIME, then the base64
variant of RFC 2047 is just as much a part of it as the quoted-printable
variant.
> This made me think that a library could not delegate to the programmer
> the burden of managing all these exceptions,
email.header.decode_header handles both variants, but it produces bytes
sequences which still have to be decoded to get a Python string.
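A minimal sketch of that extra decoding step, for illustration only. The
helper name decode_chunks and the shortened sample subjects are made up for
this example; the base64 subject is built with the base64 module rather than
typed by hand. Falling back to ASCII for chunks without a charset is an
assumption, not something the library prescribes.

```python
import base64
from email.header import decode_header

TEXT = "Saône (Waterways)"

# The same text as an RFC 2047 encoded word, once per variant.
q_subject = "=?utf-8?Q?Sa=C3=B4ne_(Waterways)?="
b_subject = "=?utf-8?B?" + base64.b64encode(TEXT.encode("utf-8")).decode("ascii") + "?="


def decode_chunks(raw):
    """Decode every chunk returned by decode_header and join the results."""
    parts = []
    for data, charset in decode_header(raw):
        if isinstance(data, bytes):
            # Assumption: treat chunks without a charset as ASCII.
            parts.append(data.decode(charset or "ascii", errors="replace"))
        else:
            # A header without any encoded words comes back as str.
            parts.append(data)
    return "".join(parts)


for raw in (q_subject, b_subject):
    print(decode_chunks(raw))
```

Both variants go through the same loop; the caller never has to look at
which encoding the sender chose.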
> then I have further investigated to discover that the library also
> provides the conversion function beyond that of coding and this makes
> our labors vain:
>
> ----------
> from email.header import decode_header, make_header
>
> subject = make_header(decode_header(raw_subject))
> ----------
Yup. I somehow missed that. That's a lot more convenient than calling
decode in a loop (or generator expression). Depending on what you want
to do with the subject you may have to wrap that in a call to str(), but
it's still a one-liner.
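As a sketch of that one-liner applied to the subject from the start of the
thread (the function name decoded_subject is only for illustration):

```python
from email.header import decode_header, make_header

raw_subject = (
    "=?utf-8?Q?aka_Marne_=C3=A0_la_Sa=C3=B4ne_"
    "(Waterways_Continental_Europe)?="
)


def decoded_subject(raw):
    """Return the header as a plain str; make_header alone yields a Header object."""
    return str(make_header(decode_header(raw)))


subject = decoded_subject(raw_subject)
# The underscores are decoded back to spaces, so the original search works.
print("(Waterways Continental Europe)" in subject)
```

Without the str() call, the "in" test would fail with a TypeError, since a
Header object does not support substring searches.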
hp
--
_ | Peter J. Holzer | Story must make more sense than reality.
|_|_) | |
| | | [email protected] | -- Charles Stross, "Creative writing
__/ | http://www.hjp.at/ | challenge!"
