Re: Review of changes to Web compat-sensitive prefs in localizations
On Wednesday, February 27, 2013 12:28:43 PM UTC, Axel Hecht wrote:
> That's rather orthogonal to what you're currently trying to do, but it's also indicating to me that we should remove all of those settings from intl.properties, and just leave accept-lang, and deduce the rest.

So how about having the parser just accept a locale value and implement the locale-to-fallback-encoding map itself? Given the numerous problems discovered[1], the fact that the locale defaults are actually part of the HTML Standard, and the fact that exposing the setting as an option encourages people to tweak it, I think that would be a better way forward. I wonder if there are similar settings that are, in a sense, too technical to leave up to localization teams.

[1] Recent issues discovered by hsivonen:
* https://bugzilla.mozilla.org/show_bug.cgi?id=910163
* https://bugzilla.mozilla.org/show_bug.cgi?id=910165
* https://bugzilla.mozilla.org/show_bug.cgi?id=910169 (bogus value, even)
___
dev-platform mailing list
dev-platform@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-platform
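A minimal sketch of the suggestion above, in which the parser takes a locale and owns the locale-to-fallback-encoding map instead of reading a localizable pref. The `fallback_for_locale` name, the table entries and the windows-1252 default are illustrative assumptions, not the HTML Standard's actual table or Gecko code.

```python
# Illustrative locale -> fallback encoding map; NOT the HTML Standard's table.
LOCALE_FALLBACKS = {
    "zh-TW": "big5",          # Traditional Chinese UI, as discussed in the thread
    "zh-CN": "gbk",
    "ru": "windows-1251",
}
DEFAULT_FALLBACK = "windows-1252"  # assumed default for locales not listed


def fallback_for_locale(locale: str) -> str:
    """Return a fallback encoding for a locale tag such as "zh-TW" or "ru-RU".

    The full tag is tried first, then the bare language subtag,
    then the default.
    """
    if locale in LOCALE_FALLBACKS:
        return LOCALE_FALLBACKS[locale]
    language = locale.split("-", 1)[0]
    return LOCALE_FALLBACKS.get(language, DEFAULT_FALLBACK)
```

Keeping the map in code rather than in intl.properties means a localizer cannot accidentally ship a bogus label; changing an entry becomes a normal code review.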
Re: Review of changes to Web compat-sensitive prefs in localizations
On Wed, Aug 28, 2013 at 3:33 PM, Henri Sivonen hsivo...@hsivonen.fi wrote:
> If I were starting such a research project, I'd start by testing hypotheses about TLD correlation with legacy encodings. The first thing I'd like to test would be whether it would be an improvement to make builds that have Traditional Chinese as the UI language use gbk (as opposed to big5) as the fallback encoding when browsing content loaded from a .cn domain.

To elaborate, we could first have a lookup table from country TLDs to legacy encodings, and only as a second step use the lookup from the UI localization to legacy encodings for TLDs that don't have a strong country affiliation. So, for example, we'd map .cn to gbk, .tw to big5, .ru to windows-1251, and .de, .fr, .se, .nl, .fi, etc. to windows-1252, but for .com, .org and such we'd base the guess on the UI locale as today, just with a less brittle way of managing the mapping.

But anyway, that would be improving the guessing instead of just fixing how the current guessing mechanism is managed. I don't want better to be a blocker for good here.

--
Henri Sivonen
hsivo...@hsivonen.fi
http://hsivonen.iki.fi/
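The two-step lookup described above (country TLD first, UI locale only for TLDs without a strong country affiliation) could look roughly like this. `guess_fallback` and both tables are hypothetical; the TLD entries are the examples from the message, while the UI-locale entries and the windows-1252 default are assumptions.

```python
from urllib.parse import urlsplit

# Country TLDs -> legacy encodings, using the examples from the message.
TLD_FALLBACKS = {
    "cn": "gbk",
    "tw": "big5",
    "ru": "windows-1251",
    "de": "windows-1252",
    "fr": "windows-1252",
    "se": "windows-1252",
    "nl": "windows-1252",
    "fi": "windows-1252",
}

# Hypothetical UI-locale table, consulted only for .com, .org and the like.
UI_LOCALE_FALLBACKS = {"zh-TW": "big5", "zh-CN": "gbk", "ru": "windows-1251"}


def guess_fallback(url: str, ui_locale: str) -> str:
    """Guess a fallback encoding for unlabeled content loaded from url."""
    host = urlsplit(url).hostname or ""  # hostname is already lowercased
    tld = host.rstrip(".").rsplit(".", 1)[-1]
    # Step 1: a country TLD with a known legacy encoding wins.
    if tld in TLD_FALLBACKS:
        return TLD_FALLBACKS[tld]
    # Step 2: otherwise base the guess on the UI locale, as today.
    return UI_LOCALE_FALLBACKS.get(ui_locale, "windows-1252")
```

Under this sketch, a Traditional Chinese build would use gbk for a .cn site but keep big5 for a .com site, which is exactly the hypothesis the message proposes to test.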
Re: Review of changes to Web compat-sensitive prefs in localizations
On Wed, Aug 28, 2013 at 3:46 PM, Henri Sivonen hsivo...@hsivonen.fi wrote:
> On Wed, Aug 28, 2013 at 3:33 PM, Henri Sivonen hsivo...@hsivonen.fi wrote:
>> If I were starting such a research project, I'd start by testing hypotheses about TLD correlation with legacy encodings. The first thing I'd like to test would be whether it would be an improvement to make builds that have Traditional Chinese as the UI language use gbk (as opposed to big5) as the fallback encoding when browsing content loaded from a .cn domain.
>
> To elaborate, we could first have a lookup table from country TLDs to legacy encodings, and only as a second step use the lookup from the UI localization to legacy encodings for TLDs that don't have a strong country affiliation. So, for example, we'd map .cn to gbk, .tw to big5, .ru to windows-1251, and .de, .fr, .se, .nl, .fi, etc. to windows-1252, but for .com, .org and such we'd base the guess on the UI locale as today, just with a less brittle way of managing the mapping.

Filed as: https://bugzilla.mozilla.org/show_bug.cgi?id=910211

--
Henri Sivonen
hsivo...@hsivonen.fi
http://hsivonen.iki.fi/
Re: Review of changes to Web compat-sensitive prefs in localizations
On Fri, Feb 22, 2013 at 8:03 PM, Axel Hecht l...@mozilla.com wrote:
> On 22.02.13 18:41, Henri Sivonen wrote:
>> On Feb 22, 2013 5:30 PM, Axel Hecht l...@mozilla.com wrote:
>>> There's just no other way than post-mortem work. That's one of the reasons why we're not taking arbitrary changesets to ship to any audience beyond aurora and nightly: for beta and release, we've got to have technical checks in place.
>>
>> Where should I file bugs to add checks to this set of checks?
>
> Not sure which checks you're talking about, so I can't really tell what you want.

I meant checks like flagging attempts to go to beta with either of the following:
* The detector pref being non-blank, except for a specific whitelist of particular values for the ru, uk, ja, ja-JP-Mac and zh-TW locales.
* The fallback charset set to UTF-8 in any locale that doesn't already have it set to UTF-8.

--
Henri Sivonen
hsivo...@iki.fi
http://hsivonen.iki.fi/
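A rough sketch of the two proposed sign-off checks, assuming each locale's intl.properties values are already available as dictionaries for the old (shipping) and new revisions. `check_locale` is hypothetical, and the allowed detector values are passed in by the caller because the message does not list them.

```python
from typing import Dict, List, Set

# Locales that may ship a non-blank detector pref, per the proposal above.
DETECTOR_LOCALES = {"ru", "uk", "ja", "ja-JP-Mac", "zh-TW"}


def check_locale(
    locale: str,
    new_props: Dict[str, str],
    old_props: Dict[str, str],
    allowed_detectors: Dict[str, Set[str]],
) -> List[str]:
    """Return problems that should block sign-off of this locale for beta."""
    problems: List[str] = []

    # Check 1: a non-blank detector pref is only OK for whitelisted values.
    detector = new_props.get("intl.charset.detector", "").strip()
    if detector:
        allowed = allowed_detectors.get(locale, set()) if locale in DETECTOR_LOCALES else set()
        if detector not in allowed:
            problems.append(f"{locale}: unexpected intl.charset.detector value {detector!r}")

    # Check 2: flag a switch of the fallback charset to UTF-8.
    new_default = new_props.get("intl.charset.default", "").strip().upper()
    old_default = old_props.get("intl.charset.default", "").strip().upper()
    if new_default == "UTF-8" and old_default != "UTF-8":
        problems.append(f"{locale}: intl.charset.default changed to UTF-8")

    return problems
```

For example, `check_locale("fr", {"intl.charset.default": "UTF-8"}, {"intl.charset.default": "windows-1252"}, {})` reports the switch to UTF-8, while a locale that already shipped UTF-8 passes.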
Re: Review of changes to Web compat-sensitive prefs in localizations
On 27.02.13 09:30, Henri Sivonen wrote:
> On Fri, Feb 22, 2013 at 8:03 PM, Axel Hecht l...@mozilla.com wrote:
>> On 22.02.13 18:41, Henri Sivonen wrote:
>>> On Feb 22, 2013 5:30 PM, Axel Hecht l...@mozilla.com wrote:
>>>> There's just no other way than post-mortem work. That's one of the reasons why we're not taking arbitrary changesets to ship to any audience beyond aurora and nightly: for beta and release, we've got to have technical checks in place.
>>>
>>> Where should I file bugs to add checks to this set of checks?
>>
>> Not sure which checks you're talking about, so I can't really tell what you want.
>
> I meant checks like flagging attempts to go to beta with either of the following:
> * The detector pref being non-blank, except for a specific whitelist of particular values for the ru, uk, ja, ja-JP-Mac and zh-TW locales.
> * The fallback charset set to UTF-8 in any locale that doesn't already have it set to UTF-8.

I'm doing a source-based review, which at least catches regressions to those settings.

And I think we're doing charset detector settings wrong. Let me check that I've got right what we're doing:
- most content should be labeled with a charset
- if not, let's see if we can guess the encoding
-- if we assume the language of the content, we can guess better
-- many languages really only have one option
-- ru, uk, ja, zh-TW do have options, so we use a charset detector

Now, I don't think it's right to use the UI language to guess the content language. We have a list of user-preferred languages (with good defaults based on the UI language). We should go through that list and pick charsets to try for unlabeled content from there. That's rather orthogonal to what you're currently trying to do, but it's also indicating to me that we should remove all of those settings from intl.properties, and just leave accept-lang, and deduce the rest.

You also mentioned in the bug that you didn't get the OK to use telemetry to gather further data. I think if we just collect data about the charset optimization and how well it's doing, we should be OK. I.e., at the point where the locale doesn't matter and we only record cp-1252 etc., the entropy goes up a good deal, in particular for small locales. I'd argue that this might even make sense as part of the health report.

Axel
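Axel's alternative, deriving candidate charsets from the user's preferred-language list rather than from the UI language, might be sketched like this. The language-to-charset table and `candidate_charsets` are illustrative assumptions; the input is assumed to be an Accept-Language-style comma-separated list such as `ru-RU, ru, en-US, en`.

```python
from typing import Dict, List

# Illustrative language -> candidate legacy encodings (not an official table).
# Languages with more than one entry are the ones that "do have options".
LANGUAGE_CHARSETS: Dict[str, List[str]] = {
    "ru": ["windows-1251", "koi8-r"],
    "uk": ["windows-1251", "koi8-u"],
    "de": ["windows-1252"],
    "fr": ["windows-1252"],
    "en": ["windows-1252"],
}


def candidate_charsets(accept_languages: str) -> List[str]:
    """Collect charsets to try for unlabeled content, in user-preference order."""
    candidates: List[str] = []
    for tag in accept_languages.split(","):
        tag = tag.split(";", 1)[0].strip()  # drop any ";q=" weight
        if not tag:
            continue
        language = tag.split("-", 1)[0].lower()
        for charset in LANGUAGE_CHARSETS.get(language, []):
            if charset not in candidates:  # keep first occurrence only
                candidates.append(charset)
    return candidates
```

With `ru-RU, ru, en-US, en` the function returns `["windows-1251", "koi8-r", "windows-1252"]`: duplicates are dropped while the user's preference order is kept.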
Re: Review of changes to Web compat-sensitive prefs in localizations
On 22.02.13 15:37, Henri Sivonen wrote:
> I've been finding and, to a lesser extent, reporting and writing patches for bugs where a localization sets the fallback encoding to a value that doesn't suit the purpose of the fallback. In some cases, there is such bogosity in the intl.properties file (e.g. translation of the word "windows" as part of a charset label) that I suspect that changes to intl.properties have been landing without review.
>
> I propose we adopt a rule that says that localizations need review from the HTML parser module owner (i.e. me) to change the values of preferences that modify the behavior of the HTML parser. (In practice, this means the localizable properties intl.charset.default and intl.charset.detector.)
>
> Opinions?

I don't think that .platform is the right group to discuss l10n policies, tbh.

Anyway, I don't think that it requires your review. For one, these rules just don't work in practice. We're facing the very same problem with search engines. There's just no other way than post-mortem work. That's one of the reasons why we're not taking arbitrary changesets to ship to any audience beyond aurora and nightly: for beta and release, we've got to have technical checks in place. I usually catch regressions to intl.properties when reviewing requests for updates to those changesets.

That said, I don't know what intl.charset.detector should be set to, aside from nothing. Looking at your patch, its comment doesn't make that any clearer either; I'll follow up there.

Axel
Re: Review of changes to Web compat-sensitive prefs in localizations
On Feb 22, 2013 5:30 PM, Axel Hecht l...@mozilla.com wrote:
> There's just no other way than post-mortem work. That's one of the reasons why we're not taking arbitrary changesets to ship to any audience beyond aurora and nightly: for beta and release, we've got to have technical checks in place.

Where should I file bugs to add checks to this set of checks?
Re: Review of changes to Web compat-sensitive prefs in localizations
On Friday 2013-02-22 16:37 +0200, Henri Sivonen wrote:
> I've been finding and, to a lesser extent, reporting and writing patches for bugs where a localization sets the fallback encoding to a value that doesn't suit the purpose of the fallback. In some cases, there is such bogosity in the intl.properties file (e.g. translation of the word "windows" as part of a charset label) that I suspect that changes to intl.properties have been landing without review.

It might not be a bad idea to have a better explanation in http://mxr.mozilla.org/mozilla-central/source/toolkit/locales/en-US/chrome/global/intl.properties of why one would want to change intl.charset.default and intl.charset.detector, explaining clearly that they should only be set to interesting values to deal with a substantial body of legacy content that requires those values, and then saying what they should be in the absence of such legacy content (the latter should clearly be empty; I'm not sure whether the former should be UTF-8 or ISO-8859-1, but we should have a consistent policy). That said, I don't actually know whether the tools localizers use to do localization lead them to read the text.

The reality is that I suspect it may be important for you to do occasional audits of these values; it could also be valuable to have a tool that exposes all of them in a single place (perhaps even a place with history, like an automatically-generated wiki page).

-David

--
𝄞 L. David Baron http://dbaron.org/ 𝄂
𝄢 Mozilla http://www.mozilla.org/ 𝄂
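The single-place audit tool suggested above could start as small as this: parse each locale's intl.properties and tabulate the two watched prefs side by side. The parser is deliberately simplified (no continuation lines, `:` separators or escapes), and `audit_table` and the input layout (locale code mapped to file text) are assumptions.

```python
from typing import Dict

WATCHED_KEYS = ("intl.charset.default", "intl.charset.detector")


def parse_properties(text: str) -> Dict[str, str]:
    """Very simplified .properties parser: 'key = value' lines, '#'/'!' comments.

    Continuation lines, ':' separators and escape sequences are not handled.
    """
    props: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", "!")):
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def audit_table(files: Dict[str, str]) -> str:
    """Render a tab-separated table: one row per locale, sorted by locale code."""
    rows = ["locale\t" + "\t".join(WATCHED_KEYS)]
    for locale in sorted(files):
        props = parse_properties(files[locale])
        rows.append(locale + "\t" + "\t".join(props.get(k, "") for k in WATCHED_KEYS))
    return "\n".join(rows)
```

Regenerating this table on every push and publishing it somewhere with history would give the kind of overview the message describes.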
Re: Review of changes to Web compat-sensitive prefs in localizations
On 22.02.13 20:02, L. David Baron wrote:
> On Friday 2013-02-22 16:37 +0200, Henri Sivonen wrote:
>> I've been finding and, to a lesser extent, reporting and writing patches for bugs where a localization sets the fallback encoding to a value that doesn't suit the purpose of the fallback. In some cases, there is such bogosity in the intl.properties file (e.g. translation of the word "windows" as part of a charset label) that I suspect that changes to intl.properties have been landing without review.
>
> It might not be a bad idea to have a better explanation in http://mxr.mozilla.org/mozilla-central/source/toolkit/locales/en-US/chrome/global/intl.properties of why one would want to change intl.charset.default and intl.charset.detector, explaining clearly that they should only be set to interesting values to deal with a substantial body of legacy content that requires those values, and then saying what they should be in the absence of such legacy content (the latter should clearly be empty; I'm not sure whether the former should be UTF-8 or ISO-8859-1, but we should have a consistent policy). That said, I don't actually know whether the tools localizers use to do localization lead them to read the text.
>
> The reality is that I suspect it may be important for you to do occasional audits of these values; it could also be valuable to have a tool that exposes all of them in a single place (perhaps even a place with history, like an automatically-generated wiki page).
>
> -David

Henri filed https://bugzilla.mozilla.org/show_bug.cgi?id=844042 before posting here (or at least around the same time).

Axel