Aristotle A. wrote:
> The standard Shift_JIS encoding cannot be mapped to Unicode
> without ambiguities.

This is starting to get way off topic... If anyone is bothered by this discussion, just say so and I will continue off-list.

> Worse, there is a whole range of variants
> of this charset that all go by the name of Shift_JIS, yet map
> codepoints that are unused in the official Shift_JIS standard
> to different characters. If you have something labelled
> Shift_JIS, you cannot safely convert it to Unicode without
> risking data loss.

Yes... Shift_JIS is a mess as long as people use proprietary extensions that Unicode doesn't recognize. But that is no more Unicode's problem than it is a problem of Shift_JIS itself.

The industry standard JIS X 0213 aims to fix this problem and make Shift_JIS converge on something sane. Unicode has provided round-trip conversion for JIS X 0213 since version 3.0 or so... Yes, I just confirmed. Here [1][2] are the mapping tables between JIS X 0213:2004 and Unicode. I also found [3], which discusses many of these problems with Japanese encodings and confirms that Unicode 3.1 provides lossless conversion for JIS X 0213.

[1] http://x0213.org/codetable/sjis-0213-2004-std.txt
[2] http://x0213.org/codetable/jisx0213-2004-std.txt
[3] http://www.jbrowse.com/text/unij.html

The problem is that many people still use JIS X 0208 with proprietary extensions... It is more or less the same problem as people still using ISO-8859 when we already have UTF-8...

> The problem is that on Unix, at least, the filesystem actually
> gives you nothing but octet sequences. The only invalid filenames
> are ones containing slashes or nulls. Everything else is fair
> game. Filenames can be any random garbage whatsoever.

Yes... This is something that Unix could improve a lot.

> Take a look at the contortions that the GNOME people had to go
> through for the file selector dialog and similar things where
> gtk+ and friends touch upon the filesystem.

Yes, I know the KDE/Qt side of this opera. KDE went rather overboard on this point...
the code that handles URL escaping and local filename encoding is the same. As a consequence, KDE allows you to put "/" in filenames: it transparently converts it to "%2f" and back for you in the filesystem.

> Then from a SCM design point of view the question turns into
> this: do we want the repository to be unable to store some
> actual files that some users may conceivably have good reason to
> have? (See Shift_JIS mess above.)

I think that this is somewhat expected, specifically for Shift_JIS, due to the problems above. In fact, the user is already forced to stick to some subset of Shift_JIS, because you need screen fonts that display the characters. Using characters from proprietary extensions will always leave you open to ambiguities: you change the font and your filenames change. You must either restrict yourself in what characters you use, or upgrade to JIS X 0213 and enjoy full compatibility with Unicode.

> The problem space is unfortunately much bigger and messier than
> what you have portrayed in your mail. IMO the fact that git punts
> is regrettable, but also pretty much inevitable, so it is not
> something that I hold against git. There is simply no truly sane
> way of untangling this incredibly ugly yarnball.

Well, Windows does it (NTFS stores filenames as UTF-16, and Windows of course supports Shift_JIS), Mac OS X does it, Subversion does it, Bazaar does it... It is possible and feasible to support the Shift_JIS encoding: you may not be able to store all possible filenames, only those that are standard and convertible to Unicode, but it is possible nevertheless.

-- 
Juliano F. Ravasi ·· http://juliano.info/
5105 46CC B2B7 F0CD 5F47 E740 72CA 54F4 DF37 9E96

"A candle loses nothing by lighting another candle." -- Erin Majors

* NOTE: Don't try to reach me through this address, use "contact@" instead.

_______________________________________________
vcs-home mailing list
[email protected]
http://lists.madduck.net/listinfo/vcs-home
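
P.S. A small illustration of two points from the thread above, as a rough sketch rather than anything an SCM actually does. It assumes Python 3 and its bundled "shift_jis" and "cp932" codecs; the byte string RAW_NAME and both helper names are made up for this example. The first helper decodes the same two bytes with two codecs that are both commonly labelled Shift_JIS, and the second carries arbitrary filename octets through a text string and back using the "surrogateescape" error handler.

```python
# Hypothetical filename bytes: 0x81 0x60 is the "wave dash" position in
# Shift_JIS, one of the spots where the variants are known to disagree.
RAW_NAME = b"\x81\x60"


def decode_variants(raw):
    """Decode the same bytes with two codecs that both get called Shift_JIS."""
    return {codec: raw.decode(codec) for codec in ("shift_jis", "cp932")}


def roundtrip_as_utf8(raw):
    """Carry arbitrary filename octets through str and back without loss.

    Bytes that are not valid UTF-8 become lone surrogates on decode and are
    turned back into the original bytes on encode.
    """
    text = raw.decode("utf-8", errors="surrogateescape")
    return text.encode("utf-8", errors="surrogateescape")


if __name__ == "__main__":
    # Show which Unicode code point each variant picks for the same bytes.
    for codec, text in decode_variants(RAW_NAME).items():
        print(codec, [f"U+{ord(ch):04X}" for ch in text])

    # A strict UTF-8 decode rejects these bytes outright...
    try:
        RAW_NAME.decode("utf-8")
    except UnicodeDecodeError as exc:
        print("strict decode failed:", exc.reason)

    # ...while the surrogateescape round trip preserves them.
    print("round trip intact:", roundtrip_as_utf8(RAW_NAME) == RAW_NAME)
```

The round trip only preserves the octets; it does not tell you which characters the user meant, which is exactly the ambiguity discussed above.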
