[Nmh-workers] Handling non-ASCII

2012-04-16 Thread Ken Hornstein
Nmh got a bug report recently complaining that a message containing a binary subject was displayed on a user's terminal and messed it up. The user was kind enough to include the original message; the issue was that the message was spam and in UTF-8, but wasn't using RFC-2047 encoding. Actually
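
To make the failure mode concrete, here is a minimal sketch of how a header value might be classified before display. It is not nmh's code: `classify_header`, `has_raw_8bit`, and the sample subjects are hypothetical, and the regular expression is only a loose approximation of the RFC 2047 encoded-word syntax (`=?charset?encoding?encoded-text?=`).

```python
import re

# Loose approximation of an RFC 2047 encoded-word:
#   =?charset?encoding?encoded-text?=   (encoding is B or Q)
ENCODED_WORD = re.compile(rb"=\?[^?\s]+\?[bBqQ]\?[^?\s]*\?=")


def has_raw_8bit(header_value: bytes) -> bool:
    """Return True if any byte falls outside 7-bit ASCII."""
    return any(b > 0x7F for b in header_value)


def classify_header(header_value: bytes) -> str:
    """Hypothetical helper: sort a header value into one of three buckets."""
    if has_raw_8bit(header_value):
        # The case from the bug report: 8-bit bytes with no RFC 2047 wrapper.
        return "raw-8bit"
    if ENCODED_WORD.search(header_value):
        return "rfc2047"
    return "ascii"


if __name__ == "__main__":
    print(classify_header("Caf\u00e9".encode("utf-8")))   # raw UTF-8 bytes
    print(classify_header(b"=?UTF-8?B?Q2Fmw6k=?="))       # same text, RFC 2047
    print(classify_header(b"Hello"))                      # plain ASCII
```

Only the "raw-8bit" bucket corresponds to the reported problem; what to do with such a header (escape, replace, or refuse) is the open question in the thread.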

Re: [Nmh-workers] Handling non-ASCII

2012-04-16 Thread Ralph Corderoy
Hi Ken,
what should we do if the text is outside of us-ascii?
Copy `cat -A'?
    seq 0 255 | sed 's/$/P/' | dc | cat -A
Or some other similar escaping; \x1b. It does mean one wouldn't be able to discern a subject with cat -A-looking output from binary.
Cheers, Ralph.
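
The escaping Ralph describes can be sketched as follows. This is an illustration of the `cat -v` caret and `M-` notation, not nmh code; `cat_v_escape` is a hypothetical name. Unlike `cat -A`, which keeps newlines and marks line ends with `$`, this sketch escapes every control byte, including newline (`^J`) and tab (`^I`), on the assumption that a subject is shown on a single line.

```python
def cat_v_escape(data: bytes) -> str:
    """Escape bytes in the style of `cat -v` (caret and M- notation)."""
    out = []
    for b in data:
        if b >= 0x80:
            # High-bit bytes get an "M-" prefix, then the low 7 bits are shown.
            out.append("M-")
            b -= 0x80
        if b < 0x20:
            out.append("^" + chr(b + 0x40))   # e.g. ESC (0x1b) becomes ^[
        elif b == 0x7F:
            out.append("^?")                  # DEL
        else:
            out.append(chr(b))                # printable ASCII passes through
    return "".join(out)


if __name__ == "__main__":
    print(cat_v_escape(b"\x1b[31mred"))               # terminal escape made visible
    print(cat_v_escape("\u00a35".encode("utf-8")))    # raw UTF-8 pound sign
    print(cat_v_escape(b"M-BM-#5"))                   # literal ASCII look-alike
```

The last two calls produce the same string, which is the drawback Ralph mentions: a subject that literally contains escape-looking text cannot be told apart from escaped binary.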

Re: [Nmh-workers] Handling non-ASCII

2012-04-16 Thread Lyndon Nerenberg
On 2012-04-16, at 9:03 AM, Ralph Corderoy wrote:
Copy `cat -A'?
    seq 0 255 | sed 's/$/P/' | dc | cat -A
Or some other similar escaping; \x1b. It does mean one wouldn't be able to discern a subject with cat -A-looking output from binary.
What about non-utf8 multi-byte encodings? Can

Re: [Nmh-workers] Handling non-ASCII

2012-04-16 Thread Lyndon Nerenberg
On 2012-04-16, at 10:22 AM, Ken Hornstein wrote: Hm. I see your point, but I'm sort of torn here. Do we care about obscuring the surrounding text? I mean, the original bug report came as the result of a spam message; I don't think this is a problem in the normal case, is it? To me ? more

Re: [Nmh-workers] Handling non-ASCII

2012-04-16 Thread Earl Hood
On Mon, Apr 16, 2012 at 12:27 PM, Lyndon Nerenberg wrote:
   Updated Estimate: .3,141.00
   Updated Estimate: ?3,141.00
Presumably there is a loud warning being printed to say ***danger corrupt message - unprintable characters replaced with '.' Such warnings can be easily overlooked

Re: [Nmh-workers] Handling non-ASCII

2012-04-16 Thread Ken Hornstein
If the only source of these malformed messages is spam then we should just refuse to display them. I recently ran into this when working on replyfilter (the core issue was that par mangles UTF-8 and replyfilter would end up with invalid UTF-8 sequences). Refusing to display the message is rather
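
As a rough illustration of what "invalid UTF-8 sequences" means in practice, the sketch below checks a byte string with Python's strict decoder and shows one lenient alternative to refusing display. It does not reflect how replyfilter or nmh handle the problem; `is_valid_utf8` and `display_safe` are hypothetical names, and the truncated sample assumes a multi-byte character that was split by a tool treating the text as single bytes.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes cleanly as UTF-8 (strict mode)."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def display_safe(data: bytes) -> str:
    """Lenient alternative: substitute U+FFFD for undecodable bytes."""
    return data.decode("utf-8", errors="replace")


if __name__ == "__main__":
    whole = "caf\u00e9".encode("utf-8")   # b"caf\xc3\xa9"
    split = whole[:-1]                    # last byte lost: b"caf\xc3"
    print(is_valid_utf8(whole))
    print(is_valid_utf8(split))
    print(display_safe(split))
```

Refusing to display corresponds to acting on the `False` result; the replacement route keeps the readable part of the text at the cost of hiding exactly which bytes were bad.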