On 2023-05-06 16:27:04 +0200, jak wrote:
> Chris Green wrote:
> > Chris Green <c...@isbd.net> wrote:
> > > A bit more information, msg.get("subject", "unknown") does return a
> > > string, as follows:-
> > >
> > > Subject:
> > > =?utf-8?Q?aka_Marne_=C3=A0_la_Sa=C3=B4ne_(Waterways_Continental_Europe)?=
[...]
> > ... and of course I now see the issue!  The Subject: with utf-8
> > characters in it gets spaces changed to underscores.  So searching for
> > '(Waterways Continental Europe)' fails.
> >
> > I'll either need to test for both versions of the string or I'll need
> > to change underscores to spaces in the Subject: returned by msg.get().

You need to decode the Subject properly. Unfortunately the Python email
module doesn't do that for you automatically. But it does provide the
necessary tools. Don't roll your own unless you've read and understood
the relevant RFCs.

> This is probably what you need:
>
> import email.header
>
> raw_subj =
> '=?utf-8?Q?aka_Marne_=C3=A0_la_Sa=C3=B4ne_(Waterways_Continental_Europe)?='
>
> subj = email.header.decode_header(raw_subj)[0]
>
> subj[0].decode(subj[1])
>
> 'aka Marne à la Saône (Waterways Continental Europe)'

You are on the right track, but that works only because the example
consists of a single encoded word. This is not always the case (and
indeed not what the RFC recommends). email.header.decode_header returns
a *list* of chunks and you have to process and concatenate all of them.

Here is a snippet from a mail to html converter I wrote a few years ago:

    import email.header

    def decode_rfc2047(s):
        if s is None:
            return None
        r = ""
        # Each chunk is a (data, charset) pair; charset may be None.
        for chunk in email.header.decode_header(s):
            if chunk[1]:
                try:
                    r += chunk[0].decode(chunk[1])
                except LookupError:
                    # Unknown charset name: fall back to windows-1252.
                    r += chunk[0].decode("windows-1252")
                except UnicodeDecodeError:
                    # Bytes don't match the declared charset.
                    r += chunk[0].decode("windows-1252")
            elif type(chunk[0]) == bytes:
                r += chunk[0].decode('us-ascii')
            else:
                r += chunk[0]
        return r

(This is maybe a bit more forgiving than the OP needs, but I had to deal
with malformed mails.)

I do have to say that Python is extraordinarily clumsy in this regard.

        hp

-- 
   _  | Peter J. Holzer    | Story must make more sense than reality.
|_|_) |                    |
| |   | h...@hjp.at        |    -- Charles Stross, "Creative writing
__/   | http://www.hjp.at/ |       challenge!"
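
To illustrate the point about multiple chunks, here is a minimal sketch.
The header value is made up for this example (an encoded word followed
by plain ASCII text), and email.header.make_header is used as one
standard-library way to join the chunks. It is stricter than the
forgiving snippet above and may raise on a bad or unknown charset.

```python
import email.header

# Hypothetical header value: one encoded word followed by plain ASCII text.
raw_subj = ("=?utf-8?Q?Marne_=C3=A0_la_Sa=C3=B4ne?="
            " (Waterways Continental Europe)")

# decode_header returns a list of (data, charset) pairs, one per chunk.
chunks = email.header.decode_header(raw_subj)

# Decoding only the first chunk silently drops the unencoded tail.
first_only = chunks[0][0].decode(chunks[0][1])

# make_header reassembles every chunk; str() gives the decoded text.
subject = str(email.header.make_header(chunks))

print(len(chunks))
print(first_only)
print(subject)
```

With this input the search for '(Waterways Continental Europe)' only
succeeds on the fully joined string, not on the first chunk alone.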
-- https://mail.python.org/mailman/listinfo/python-list