On Thu, 1 Sep 2016 09:42:59 +0200 Albert Shih <albert.s...@obspm.fr> wrote:
> > First, https://tools.ietf.org/html/rfc2047#page-5
> > unencoded white space characters (such as SPACE and HTAB) are
> > FORBIDDEN within an 'encoded-word'
> >
> > As such, "=?utf-8?q? #NUMBER=5D?=" is not a valid encoded-word.
>
> Well I think that's my bad, I change a little the subject to fit my first
> email about the tag. The real subject is
>
> =?utf-8?q?Re=3A?==?utf-8?q?_=5BInfo?= Obspm =?utf-8?q?#31684=5D?= Bonjour
> =?utf-8?q?=C3=A0?= vous
OK, that's a little different. Rather better. It still violates:

> However, an 'encoded-word' that appears in a header field defined as
> '*text' MUST be separated from any adjacent 'encoded-word' or 'text'
> by 'linear-white-space'.

But:

> I'm a not very good with perl, but when I try using ruby to decode this
> line
>
> irb(main):008:0>
> Mail::Encodings.unquote_and_convert_to('=?utf-8?q?Re=3A?==?utf-8?q?_=5BInfo?=
> Obspm =?utf-8?q?#31684=5D?= Bonjour =?utf-8?q?=C3=A0?= vous','utf-8')
> => "Re: [Info Obspm #31684] Bonjour à vous"
>
> the result seem correct.

For decoders that are lenient to encoded-words that aren't
space-separated, that's correct. The difference between this and what
you had previously is the non-encoded word between the two
encoded-words, which makes the space significant.

And indeed, this does point to an RT bug. Namely, for historical and
bad reasons, RT doesn't use the standard MIME-words decoding library,
which would produce:

> perl -MEncode -lE 'print Encode::encode("utf8",
> Encode::decode("MIME-header",
> "=?utf-8?q?Re=3A?==?utf-8?q?_=5BInfo?= Obspm =?utf-8?q?#31684=5D?= Bonjour
> =?utf-8?q?=C3=A0?= vous"))'
>
> Re: [Info Obspm #31684] Bonjour à vous

Instead, it rolls its own, and gets it wrong:

> perl -Ilib -MRT=-init -le 'print RT::I18N::DecodeMIMEWordsToUTF8(
> "=?utf-8?q?Re=3A?==?utf-8?q?_=5BInfo?= Obspm =?utf-8?q?#31684=5D?= Bonjour
> =?utf-8?q?=C3=A0?= vous","Subject")'
>
> Re: [Info Obspm#31684] Bonjourà vous

Specifically, it removes spaces before the second and later
encoded-words, due to
https://github.com/bestpractical/rt/blob/stable/lib/RT/I18N.pm#L445

This looks to be a bug. I've pushed 4.2/encoded-word-spaces to address
it; if you'd like to test the fix locally, you can apply
https://github.com/bestpractical/rt/commit/bdd6bd96 .

Thanks for the more complete bug report.
 - Alex

---------
RT 4.4 and RTIR training sessions, and a new workshop day!
https://bestpractical.com/training
* Boston - October 24-26
* Los Angeles - Q1 2017
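For anyone who wants to see the whitespace rule in isolation, here is a
minimal Ruby sketch of a lenient decoder. It is not RT's code, nor what
Encode or the mail gem do internally; the names ENCODED_WORD,
decode_encoded_word and decode_mime_words are made up for illustration,
and it assumes well-formed B/Q payloads in a charset Ruby knows. The
only point is that whitespace is dropped when it sits between two
encoded-words (RFC 2047 section 6.2) and kept when plain text is on
either side.

```ruby
# Hypothetical sketch, not RT's implementation.
# One RFC 2047 encoded-word: =?charset?encoding?payload?=
ENCODED_WORD = /=\?([^?\s]+)\?([bBqQ])\?([^?\s]*)\?=/

# Decode a single encoded-word payload to a UTF-8 string.
def decode_encoded_word(charset, encoding, payload)
  bytes =
    if encoding.downcase == "b"
      payload.unpack1("m")                 # base64
    else
      payload.tr("_", " ").unpack1("M")    # Q: "_" is a space, "=XX" is a byte
    end
  bytes.force_encoding(charset).encode("UTF-8")
end

def decode_mime_words(header)
  # Drop whitespace only when an encoded-word is on BOTH sides of it.
  # Whitespace next to ordinary text is significant and is left alone.
  joined = header.gsub(/(#{ENCODED_WORD})\s+(?=#{ENCODED_WORD})/) { $1 }
  joined.gsub(ENCODED_WORD) { decode_encoded_word($1, $2, $3) }
end

subject = '=?utf-8?q?Re=3A?==?utf-8?q?_=5BInfo?= Obspm ' \
          '=?utf-8?q?#31684=5D?= Bonjour =?utf-8?q?=C3=A0?= vous'
puts decode_mime_words(subject)
# prints: Re: [Info Obspm #31684] Bonjour à vous

# Two adjacent encoded-words: the separating space is not part of the text.
puts decode_mime_words('=?utf-8?q?Bonjour?= =?utf-8?q?=C3=A0?= vous')
# prints: Bonjourà vous
```

The second call shows the only case where a space should disappear;
stripping the space in front of every later encoded-word regardless of
what precedes it gives the "Obspm#31684" result shown above.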