On Tue, 03 Jun 2008 15:38:09 -0300, Daniel Mahoney <[EMAIL PROTECTED]>
wrote:
I'm working on an app that's processing Usenet messages. I'm making a
connection to my NNTP feed and grabbing the headers for the groups I'm
interested in, saving the info to disk, and doing some post-processing.
I'm finding a few bizarre characters and I'm not sure how to handle them
pythonically.
One of the lines I'm finding this problem with contains:
137050 Cleo and I have an anouncement! "Mlle.
=?iso-8859-1?Q?Ana=EFs?="
<[EMAIL PROTECTED]> Sun, 21 Nov 2004 16:21:50 -0500
<[EMAIL PROTECTED]> 4478 69 Xref:
sn-us rec.pets.cats.community:137050
The interesting part is the string that reads
"=?iso-8859-1?Q?Ana=EFs?=".
An HTML rendering of what this string should look like would be "Anaïs".
What I'm doing now is a brute-force substitution from the version in the
file to the HTML version. That's ugly. What's a better way to translate
that string? Or is my problem that I'm grabbing the headers from the NNTP
server incorrectly?
No, it's not you; those headers are encoded following RFC 2047
<http://www.faqs.org/ftp/rfc/rfc2047.txt>
Python already has support for that format: use the email.header module,
see <http://docs.python.org/lib/module-email.header.html>
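A minimal sketch of how that could look, assuming Python 3 (where
decode_header returns bytes for encoded words). The helper name
decode_rfc2047 and the fallback charset are illustrative choices of mine,
not part of the library:

```python
from email.header import decode_header


def decode_rfc2047(value, default_charset="ascii"):
    """Decode RFC 2047 encoded words in a header value to a str.

    decode_rfc2047 is a hypothetical helper name; only decode_header
    comes from the standard library.
    """
    parts = []
    # decode_header returns a list of (chunk, charset) pairs.
    for chunk, charset in decode_header(value):
        if isinstance(chunk, bytes):
            # Encoded words come back as bytes plus their declared charset;
            # fall back to default_charset when no charset is given.
            parts.append(chunk.decode(charset or default_charset,
                                      errors="replace"))
        else:
            # Headers without any encoded word come back as plain str.
            parts.append(chunk)
    return "".join(parts)


if __name__ == "__main__":
    print(decode_rfc2047("=?iso-8859-1?Q?Ana=EFs?="))
```

This decodes the encoded word on its own; how an encoded word embedded in
a quoted display name is treated may vary between Python versions, so it
is worth checking against your real header lines.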
--
Gabriel Genellina