https://bugzilla.wikimedia.org/show_bug.cgi?id=19001
--- Comment #4 from Philippe Verdy <verd...@wanadoo.fr> 2010-10-10 12:21:33 UTC ---

Note that non-ASCII bytes in a subject line that are not re-encoded with a disambiguating transfer syntax such as Quoted-Printable or Base64 should today be assumed to be UTF-8 by default. But many legacy email user agents do not make this assumption and simply assume their own local system encoding. The result is mojibake: Cyrillic or Chinese text gets displayed as if it were Windows-1252 or ISO-8859-1, or the reverse. With old email agents the result is clearly unpredictable.

Unfortunately, those same old email agents (including the webmail of various ISPs) frequently do not decode Quoted-Printable and Base64 correctly either! In all cases you get unpredictable mojibake with old user agents. It's time for you to upgrade yours (or to change your webmail provider).

I really think that all modern email agents should use UTF-8 as the default encoding of MIME headers (including subject lines) for all incoming mail, should allow the user to force another encoding (because guessing the encoding from a short subject line does not work nearly as well as it does for email bodies and web pages), and should also support the explicit Quoted-Printable and Base64 markup.

And in your case, where you receive many emails in Russian with Cyrillic (not Latin) letters making up most characters of the subject line, Quoted-Printable is a bad choice: MediaWiki should probably use Base64 instead (which will be shorter), even if this still appears as mojibake for you. MediaWiki could test the string to see which of Base64 or Quoted-Printable is shorter, and should avoid emitting multiple Quoted-Printable sections in the same subject line (when it contains spaces or other ASCII characters between Cyrillic words).
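To illustrate the "test which one is shorter" idea, here is a minimal Python sketch. It is not MediaWiki code (MediaWiki is written in PHP), and the function names `q_encode` and `shortest_encoded_word` are made up for this example. It assumes the whole subject goes into a single encoded-word of the form `=?charset?Q?...?=` or `=?charset?B?...?=`, which also sidesteps the problem of multiple Quoted-Printable sections, and it ignores the 75-character limit per encoded-word, so a real implementation would still have to split long subjects.

```python
import base64

# Bytes that may appear literally in a "Q"-encoded word; this is a
# deliberately conservative set (letters, digits and a few symbols).
SAFE = set(b"abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789!*+-/")


def q_encode(data: bytes) -> str:
    """Header flavour of Quoted-Printable: space becomes '_', unsafe bytes '=XX'."""
    out = []
    for byte in data:
        if byte == 0x20:
            out.append("_")
        elif byte in SAFE:
            out.append(chr(byte))
        else:
            out.append("=%02X" % byte)
    return "".join(out)


def shortest_encoded_word(text: str, charset: str = "utf-8") -> str:
    """Return text as one encoded-word, using whichever encoding is shorter."""
    if text.isascii():
        return text  # nothing to encode
    raw = text.encode(charset)
    q_payload = q_encode(raw)
    b_payload = base64.b64encode(raw).decode("ascii")
    # Prefer Q on a tie, since it stays partly readable in raw form.
    if len(q_payload) <= len(b_payload):
        return "=?%s?Q?%s?=" % (charset, q_payload)
    return "=?%s?B?%s?=" % (charset, b_payload)


if __name__ == "__main__":
    print(shortest_encoded_word("Café menu"))    # mostly ASCII: Q is shorter
    print(shortest_encoded_word("Привет мир"))   # mostly Cyrillic: B is shorter
```

For a mostly Cyrillic subject in UTF-8, every letter costs six characters in Quoted-Printable but only about 2.7 in Base64, which is why Base64 wins there.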
Google Mail uses another strategy when sending emails: not only does it try both transfer syntaxes, it also checks which characters are used so that it can pick a common ISO-8859 or CJK encoding, and then re-encodes the text with one of the two transfer syntaxes (if there are non-ASCII characters). Google Mail uses various tricks to detect the target ISP in order to select an encoding that its webmail will support and display properly, and it monitors the emails received from people in your contact book, so that your replies to them use the same encoding they used when writing to you. Unfortunately, this technique cannot be used by MediaWiki, which has no database recording what the various ISPs in the world support, and no access to your web contact list.

All that MediaWiki COULD do is add a preference to your user account that specifies an encoding YOUR email agent can read. That encoding would be used by default whenever the subject line contains only characters from your preferred charset. Otherwise it would still fall back to UTF-8 (using Base64 or Quoted-Printable, also according to your preferences), or to a transliteration into your preferred charset if that is possible without excessive loss.
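The per-user preference idea could look roughly like the following Python sketch. It is only an illustration under stated assumptions: `pick_subject_charset` and `encode_subject` are hypothetical names, no such preference exists in MediaWiki as far as this comment says, and the transliteration fallback is left out. The one real library call is `email.header.Header` from the Python standard library, which produces the encoded subject.

```python
from email.header import Header, decode_header


def pick_subject_charset(subject: str, preferred: str, fallback: str = "utf-8") -> str:
    """Use the user's preferred charset only if it can represent every character."""
    try:
        subject.encode(preferred)
    except (UnicodeEncodeError, LookupError):
        # Some character does not fit, or the charset name is unknown.
        return fallback
    return preferred


def encode_subject(subject: str, preferred: str) -> str:
    """Encode a subject line with the preferred charset, or UTF-8 as fallback."""
    charset = pick_subject_charset(subject, preferred)
    return Header(subject, charset).encode()


if __name__ == "__main__":
    # "koi8-r" stands in for whatever the user selected in their preferences.
    for subject in ("Привет", "Привет 中文"):
        encoded = encode_subject(subject, "koi8-r")
        parts = decode_header(encoded)
        print(encoded, "->", [charset for _, charset in parts])
```

A subject that mixes Cyrillic and Chinese cannot be represented in KOI8-R, so the sketch falls back to UTF-8 for it, as described above.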