Re: Dropping remains of support for non-UTF-8 file paths on Gtk platforms (was: Re: Please do not use GetNativePath and GetNativeTarget in XP code and Windows-specific code)
On Tue, Dec 5, 2017 at 11:57 PM, Mike Hommey wrote:
> Wouldn't it make sense, then, to actively fail to even start Firefox in
> such cases, instead of pretending it kind of works at all, if we can't
> even save history or bookmarks properly?

Good point. Filed https://bugzilla.mozilla.org/show_bug.cgi?id=1423855 .

Thanks.

--
Henri Sivonen
hsivo...@hsivonen.fi
https://hsivonen.fi/
___
dev-platform mailing list
dev-platform@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-platform
Re: Dropping remains of support for non-UTF-8 file paths on Gtk platforms (was: Re: Please do not use GetNativePath and GetNativeTarget in XP code and Windows-specific code)
On Tue, Dec 05, 2017 at 05:16:52PM +0200, Henri Sivonen wrote:
> On Tue, Dec 5, 2017 at 4:37 PM, ISHIKAWA,chiaki wrote:
> > There are other non-ASCII character issues such as
> > https://bugzilla.mozilla.org/show_bug.cgi?id=1258613
>
> Very weird bug! (Summary for others: decomposed voiced sound mark is
> rendered on the wrong base character.)
>
> > But the bug I mention occurs because some characters are encoded DIFFERENTLY
> > under iOS and the rest of the world when UTF-8 is used.
>
> HFS+ decomposed Unicode leakage to other systems causes pain, but the
> topic of this thread isn't affected by Unicode normalization.
>
> > By mentioning the bug, I just wanted to point out that there *ARE* obnoxious
> > bugs regarding non-ASCII character handling in mozilla software.
> > But majority of the Japanese users probably failed to file the non-ASCII
> > character bugs, and just think, "oh, another instance of Japanese characters
> > not passed correctly between mozilla applications and the external programs,
> > etc."
>
> Possibly, but unfiled bugs don't get fixed and, as seen with non-ASCII
> path handling with non-UTF-8 Linux locales, even filed bugs don't get
> fixed. As the experiments documented in my previous email to this
> thread indicate, non-UTF-8 paths already cause such breakage that at
> this point it no longer makes sense to even pretend to support them.

Wouldn't it make sense, then, to actively fail to even start Firefox in
such cases, instead of pretending it kind of works at all, if we can't
even save history or bookmarks properly?

Mike
Re: Dropping remains of support for non-UTF-8 file paths on Gtk platforms (was: Re: Please do not use GetNativePath and GetNativeTarget in XP code and Windows-specific code)
On 2017/12/06 1:04, Jonathan Kew wrote:
> On 05/12/2017 15:16, Henri Sivonen wrote:
> > On Tue, Dec 5, 2017 at 4:37 PM, ISHIKAWA,chiaki wrote:
> > > There are other non-ASCII character issues such as
> > > https://bugzilla.mozilla.org/show_bug.cgi?id=1258613
> >
> > Very weird bug! (Summary for others: decomposed voiced sound mark is
> > rendered on the wrong base character.)
>
> Not all that weird, really; it's almost certainly due to using a font
> that doesn't support the combining mark. Commented in the bug.
>
> JK

Thank you for the comment in the bug.

But I wonder whether there is a clearly discernible property or
attribute that distinguishes a font that supports combining a mark from
one that doesn't. Basically, the issue appears under Linux (Debian
GNU/Linux), which I use daily. Without knowing which font is causing
the issue (by not supporting the combining mark), I can't fix it.

Since "normalization" of the string into canonical form under Linux
seems to solve the problem, I am inclined to have the OS or an
OS-supplied library do that, but I am not entirely sure where the
rendering happens.

Come to think of it, I am not sure whether the iOS mail client handles
the filename of an attachment that is sent from Windows or from Linux.
The party with whom I exchanged the problematic e-mails mentioned that
there are e-mails with attachments which cannot be saved under the
original name, and a machine-generated filename seemed to be used
instead.

Oh well, I will investigate this a bit during the holiday break.

TIA
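For readers following the "decomposed voiced sound mark" part of this exchange, here is a minimal Python sketch of the difference between the precomposed and decomposed forms of a kana, and of what normalization into canonical composed form does. The sample character and the helper name code_points are assumptions chosen for illustration; the sketch does not reproduce the rendering bug itself.

```python
import unicodedata

# Sample character chosen for illustration: HIRAGANA LETTER GA (U+304C).
precomposed = "\u304c"

# NFD splits it into KA (U+304B) followed by the
# COMBINING KATAKANA-HIRAGANA VOICED SOUND MARK (U+3099).
decomposed = unicodedata.normalize("NFD", precomposed)

# NFC recombines the pair into the single precomposed code point.
recomposed = unicodedata.normalize("NFC", decomposed)


def code_points(text):
    """Return the code points of text as U+XXXX strings (illustrative helper)."""
    return [f"U+{ord(ch):04X}" for ch in text]


print(code_points(precomposed))
print(code_points(decomposed))
# The two forms are canonically equivalent but differ at the byte level.
print(precomposed.encode("utf-8") == decomposed.encode("utf-8"))
```

A font only has to map U+304C to draw the precomposed form, whereas the decomposed form needs the renderer and font to position U+3099 on the preceding base character, which fits the explanation given in the quoted reply.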
Re: Dropping remains of support for non-UTF-8 file paths on Gtk platforms (was: Re: Please do not use GetNativePath and GetNativeTarget in XP code and Windows-specific code)
On 05/12/2017 15:16, Henri Sivonen wrote:
> On Tue, Dec 5, 2017 at 4:37 PM, ISHIKAWA,chiaki wrote:
> > There are other non-ASCII character issues such as
> > https://bugzilla.mozilla.org/show_bug.cgi?id=1258613
>
> Very weird bug! (Summary for others: decomposed voiced sound mark is
> rendered on the wrong base character.)

Not all that weird, really; it's almost certainly due to using a font
that doesn't support the combining mark. Commented in the bug.

JK
Re: Dropping remains of support for non-UTF-8 file paths on Gtk platforms (was: Re: Please do not use GetNativePath and GetNativeTarget in XP code and Windows-specific code)
On Tue, Dec 5, 2017 at 4:37 PM, ISHIKAWA,chiaki wrote:
> There are other non-ASCII character issues such as
> https://bugzilla.mozilla.org/show_bug.cgi?id=1258613

Very weird bug! (Summary for others: decomposed voiced sound mark is
rendered on the wrong base character.)

> But the bug I mention occurs because some characters are encoded DIFFERENTLY
> under iOS and the rest of the world when UTF-8 is used.

HFS+ decomposed Unicode leakage to other systems causes pain, but the
topic of this thread isn't affected by Unicode normalization.

> By mentioning the bug, I just wanted to point out that there *ARE* obnoxious
> bugs regarding non-ASCII character handling in mozilla software.
> But majority of the Japanese users probably failed to file the non-ASCII
> character bugs, and just think, "oh, another instance of Japanese characters
> not passed correctly between mozilla applications and the external programs,
> etc."

Possibly, but unfiled bugs don't get fixed and, as seen with non-ASCII
path handling with non-UTF-8 Linux locales, even filed bugs don't get
fixed. As the experiments documented in my previous email to this
thread indicate, non-UTF-8 paths already cause such breakage that at
this point it no longer makes sense to even pretend to support them.

> I have noted that, in the last three months or so, some Japanese strings
> copied from Mozilla TB or mozilla FF are not parsed correctly when
> they are inserted into other programs.

That seems worrying, but without knowing which OS and which apps, I
can't really comment further.

--
Henri Sivonen
hsivo...@hsivonen.fi
https://hsivonen.fi/
Re: Dropping remains of support for non-UTF-8 file paths on Gtk platforms (was: Re: Please do not use GetNativePath and GetNativeTarget in XP code and Windows-specific code)
On 2017/12/05 22:24, Henri Sivonen wrote:
> On Mon, Dec 4, 2017 at 2:04 PM, Masatoshi Kimura wrote:
> > * If by any chance a profile path contains non-ASCII characters on
> > non-UTF-8 UNIX systems, Firefox 57.0.1 must have broken the profile just
> > like 57.0 broke it on Windows. But we didn't hear any such complaints.
>
> Are you referring to
> https://hg.mozilla.org/mozilla-central/rev/345fe119b8cf using
> GetPath() on all platforms and not just Windows?
>
> Experimenting in an Ubuntu VM, Firefox 57.0.1 indeed fails to save
> prefs and history (but saves the HTTP disk cache and various other
> things) if the profile path has an illegal byte in it. Additionally,
> on Debian-based systems generally, adduser only allows usernames (and,
> thereby in the common case where the home directory matches the user
> name, home directories) that conform to the POSIX portable username
> rules (subset of ASCII). useradd appears to have no such safeguards.

There are other non-ASCII character issues such as
https://bugzilla.mozilla.org/show_bug.cgi?id=1258613

But the bug I mention occurs because some characters are encoded
DIFFERENTLY under iOS and the rest of the world when UTF-8 is used.
So I think the bug will be there whether or not this non-UTF-8 path is
removed.

By mentioning the bug, I just wanted to point out that there *ARE*
obnoxious bugs regarding non-ASCII character handling in mozilla
software. But the majority of Japanese users probably failed to file
the non-ASCII character bugs, and just think, "oh, another instance of
Japanese characters not passed correctly between mozilla applications
and the external programs, etc.": this type of Japanese character
mangling has been so common before that it is simply ignored as one of
those bugs, OR when the problem happens it is so difficult to figure
out WHERE the buck stops (i.e., which program outputs incorrect
character strings, which program parses its input incorrectly, OR
BOTH.)

And if the producer and the consumer seem to disagree on what type of
encoding is used, it is usually not quite clear, even to ordinary
programming type people, WHAT the correct way is on a given platform
and who is to blame, and thus many simply fail to analyze the issue
thoroughly and give up half way.

Oh well.

I have noted that, in the last three months or so, some Japanese
strings copied from Mozilla TB or mozilla FF are not parsed correctly
when they are inserted into other programs. This did not happen before.
But for exactly the reason I noted above, I am not sure which program
is to blame, and have not bothered to pester mozilla bugzilla with
possibly false-positive bug reports. If the problem persists in the
next few months, I may file a bug.

TIA for people's attention.
Re: Dropping remains of support for non-UTF-8 file paths on Gtk platforms (was: Re: Please do not use GetNativePath and GetNativeTarget in XP code and Windows-specific code)
On Tue, Dec 5, 2017 at 3:24 PM, Henri Sivonen wrote:
> On Mon, Dec 4, 2017 at 2:04 PM, Masatoshi Kimura wrote:
>> * If by any chance a profile path contains non-ASCII characters on
>> non-UTF-8 UNIX systems, Firefox 57.0.1 must have broken the profile just
>> like 57.0 broke it on Windows. But we didn't hear any such complaints.
>
> Are you referring to
> https://hg.mozilla.org/mozilla-central/rev/345fe119b8cf using
> GetPath() on all platforms and not just Windows?
>
> Experimenting in an Ubuntu VM, Firefox 57.0.1 indeed fails to save
> prefs and history (but saves the HTTP disk cache and various other
> things) if the profile path has an illegal byte in it. Additionally,
> on Debian-based systems generally, adduser only allows usernames (and,
> thereby in the common case where the home directory matches the user
> name, home directories) that conform to the POSIX portable username
> rules (subset of ASCII). useradd appears to have no such safeguards.

If the glibc ja_JP.eucjp locale has been generated prior to login,
LC_ALL is set to ja_JP.eucjp and the profile path contains a byte pair
that's valid EUC-JP but invalid UTF-8, Firefox 57.0.1 saves prefs but
doesn't save history or bookmarks and is unable to complete File: Save
Page As... (fails silently before asking where to save even if the
EUC-JP bytes are just in the profile path and not in the home directory
path). Additionally, downloads fail if the download target path has
non-ASCII EUC-JP bytes in it.

So far, I've found Solaris documentation that rules out EUC-JP user
names:
https://docs.oracle.com/cd/E23824_01/html/821-1474/attributes-5.html#scrolltoc
. In addition to Debian adduser enforcing POSIX username portability, I
found an anecdote suggesting that some management tool of RHEL from
years ago did, too. I haven't found conclusive documentation explaining
that there can't be EUC-JP usernames and matching home directories out
there on Linux or BSD systems.

However, per the above observation about failure to save history, save
bookmarks, invoke Save As... or download anything if the path has
EUC-JP bytes, it seems safe to conclude that if an EUC-JP home
directory name exists somewhere out there, Firefox is already very
broken on such a system.

--
Henri Sivonen
hsivo...@hsivonen.fi
https://hsivonen.fi/
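To make "a byte pair that's valid EUC-JP but invalid UTF-8" concrete, here is a small Python sketch. The directory name used (the kanji for "Japan") and the helper decodes_as are assumptions for illustration, not the actual path used in the experiment described above.

```python
def decodes_as(raw: bytes, encoding: str) -> bool:
    """Return True if raw is a valid byte sequence in the given encoding."""
    try:
        raw.decode(encoding)
        return True
    except UnicodeDecodeError:
        return False


# Hypothetical directory name, encoded the way an EUC-JP locale would store it.
name = "日本".encode("euc_jp")

print(name)                        # the raw bytes as they would appear on disk
print(decodes_as(name, "euc_jp"))  # valid in the legacy encoding
print(decodes_as(name, "utf-8"))   # rejected by a strict UTF-8 decoder
```

Code that assumes every path is UTF-8 cannot represent such a name as text without loss, which is consistent with the failures to save history and bookmarks reported in the message.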
Re: Dropping remains of support for non-UTF-8 file paths on Gtk platforms (was: Re: Please do not use GetNativePath and GetNativeTarget in XP code and Windows-specific code)
On Mon, Dec 4, 2017 at 2:04 PM, Masatoshi Kimura wrote:
> * If by any chance a profile path contains non-ASCII characters on
> non-UTF-8 UNIX systems, Firefox 57.0.1 must have broken the profile just
> like 57.0 broke it on Windows. But we didn't hear any such complaints.

Are you referring to
https://hg.mozilla.org/mozilla-central/rev/345fe119b8cf using
GetPath() on all platforms and not just Windows?

Experimenting in an Ubuntu VM, Firefox 57.0.1 indeed fails to save
prefs and history (but saves the HTTP disk cache and various other
things) if the profile path has an illegal byte in it. Additionally,
on Debian-based systems generally, adduser only allows usernames (and,
thereby in the common case where the home directory matches the user
name, home directories) that conform to the POSIX portable username
rules (subset of ASCII). useradd appears to have no such safeguards.

--
Henri Sivonen
hsivo...@hsivonen.fi
https://hsivonen.fi/
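As a rough illustration of the "POSIX portable username rules" mentioned above, the following Python sketch checks a name against the POSIX portable filename character set, with the additional recommendation that the name not start with a hyphen. The function name is hypothetical, and this is not adduser's actual check, which is configurable and may be stricter.

```python
import re

# POSIX portable filename character set: A-Z, a-z, 0-9, period,
# underscore and hyphen-minus. A portable user name should also not
# start with a hyphen-minus.
_PORTABLE_NAME = re.compile(r"[A-Za-z0-9._][A-Za-z0-9._-]*")


def is_posix_portable_username(name: str) -> bool:
    """Return True if name uses only the portable character set (sketch)."""
    return _PORTABLE_NAME.fullmatch(name) is not None


for candidate in ["hsivonen", "user-1", "-leading-hyphen", "日本"]:
    print(candidate, is_posix_portable_username(candidate))
```

Under this rule any name containing non-ASCII characters is rejected, so a home directory derived from such a user name stays within ASCII regardless of the locale's encoding.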
Re: Dropping remains of support for non-UTF-8 file paths on Gtk platforms (was: Re: Please do not use GetNativePath and GetNativeTarget in XP code and Windows-specific code)
On 2017/12/04 20:19, Henri Sivonen wrote:
> I suggest that instead of delaying with a round of telemetry, we make
> all non-Windows platforms in nsNativeCharsetUtils.cpp use what's
> currently the OSX/Android code path.

+1

Some other data points:
* If by any chance a profile path contains non-ASCII characters on
  non-UTF-8 UNIX systems, Firefox 57.0.1 must have broken the profile
  just like 57.0 broke it on Windows. But we didn't hear any such
  complaints.
* Our GMP service assumes that the native encoding is always UTF-8
  except on Windows. Some media playback must have been broken on UNIX
  systems unless the locale is UTF-8.

I agree that telemetry is a waste of time in this case.
Dropping remains of support for non-UTF-8 file paths on Gtk platforms (was: Re: Please do not use GetNativePath and GetNativeTarget in XP code and Windows-specific code)
On Fri, Dec 1, 2017 at 3:15 AM, Makoto Kato wrote:
> I think that we don't have any data when user doesn't use non-UTF-8
> (and C) locale such as ja_JP.eucJP. We should get data via telemetry.

What should the telemetry measure? (Measuring whether we compute paths
to be UTF-8 in the code that still supports non-UTF-8 configurations
would probably be the wrong thing to measure, because the "C" locale
doesn't compute to UTF-8 and no one has cared enough to fix that.)

What kind of telemetry data would we need to see in order to proceed
with https://bugzilla.mozilla.org/show_bug.cgi?id=960957 (removing the
remains of support for non-UTF-8 file paths)? And if we didn't proceed
with that course of action, what would the alternative course of
action be?

The current state doesn't really support non-UTF-8 file paths.
https://bugzilla.mozilla.org/show_bug.cgi?id=1342659 has been open for
9 months, and the user noticed the problem only when upgrading from
Firefox 17 to 50, so the bug had been there for more than 9 months
before the complaint.
https://bugzilla.mozilla.org/show_bug.cgi?id=848268 has been open for
5 years.

It looks like no one cares enough about non-UTF-8 configurations to
make Gecko do what arguably would be the right way to support non-UTF-8
file paths: using the glib file path conversion functions, which don't
do non-UTF-8 things unless the G_BROKEN_FILENAMES environment variable
has been set. The name of the environment variable is very telling.

Considering that we mainly get Gtk telemetry from Ubuntu, which has had
UTF-8 paths since its introduction, and that our support for non-UTF-8
file paths has been broken for years without much complaint (AFAIK 3
users reporting non-UTF-8: two EUC-JP and one ISO-8859-something in the
past 5 years), can we really expect telemetry to tell us anything
useful?

Non-UTF-8 paths are an even more deeply legacy configuration than
non-PulseAudio audio, and telemetry told us it was OK to go
PulseAudio-only.

Four years ago, smontagu said that in the last 5 years (i.e. since
nine years ago now), the code he had contributed assumed UTF-8 on *nix:
https://www.mail-archive.com/dev-platform@lists.mozilla.org/msg06083.html

I suggest that instead of delaying with a round of telemetry, we make
all non-Windows platforms in nsNativeCharsetUtils.cpp use what's
currently the OSX/Android code path.

--
Henri Sivonen
hsivo...@hsivonen.fi
https://hsivonen.fi/
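The glib behaviour described in this message (UTF-8 file names unless G_BROKEN_FILENAMES is set) can be modelled very roughly in Python as below. This is a simplified sketch, not glib's implementation: the function names are hypothetical, the environment and locale charset are passed in explicitly so the example is self-contained, and the real library has further details that this omits.

```python
def filename_charset(environ, locale_charset):
    """Pick the charset for on-disk file names (simplified model).

    Only the rule stated in the message is modelled: UTF-8 by default,
    the locale's charset when G_BROKEN_FILENAMES is set.
    """
    if "G_BROKEN_FILENAMES" in environ:
        return locale_charset
    return "UTF-8"


def decode_filename(raw, environ, locale_charset):
    """Decode raw file name bytes using the charset chosen above."""
    return raw.decode(filename_charset(environ, locale_charset))


# Hypothetical file name stored by an EUC-JP locale.
raw_name = "日本".encode("euc_jp")

print(filename_charset({}, "EUC-JP"))
print(filename_charset({"G_BROKEN_FILENAMES": "1"}, "EUC-JP"))
print(decode_filename(raw_name, {"G_BROKEN_FILENAMES": "1"}, "EUC-JP"))
```

Treating every non-Windows platform as UTF-8, as the message proposes, corresponds to always taking the default branch in this model.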