On Tue, Aug 26, 2008 at 11:02:40PM -0300, Juliano F. Ravasi wrote:
> martin f krafft wrote:
> > Git supports unicode filenames and binary files as large as you want
> > them.
>
> 5. Git doesn't actually support Unicode filenames (neither does
> Mercurial). Both just store whatever the file name is in the filesystem
> directly into the repository, as just an array of bytes. You won't
> notice this unless you create files with names containing characters
> beyond the ASCII set, and use different encodings in different
> computers. This also causes problems when cloning your repositories to
> systems that are Unicode-aware (like Windows[1] and MacOS X[2]).
>
> [1] http://code.google.com/p/msysgit/issues/detail?id=80
> [2] http://kerneltrap.org/mailarchive/git/2008/1/16/573827
>
> I have files with names in Portuguese (which contains acutes, graves,
> circumflexes, umlauts, tildes and cedillas), and also a few ones in
> Japanese. Although UTF-8 should now be the default for Linux, many
> people (well, I say for Portuguese speakers at least) configure their
> systems to ISO-8859-1, because that was what was used everywhere until
> some time ago, and that is what Portuguese-language Windows
> installations store in FAT32 filesystems (very common in pen-drives).
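
To make the "array of bytes" point concrete, here is a minimal Python sketch. It only illustrates how one name becomes different byte sequences under UTF-8 and ISO-8859-1; it is not how Git itself is implemented, and the file name is a made-up example.

```python
# Hypothetical file name "ação.txt", written with escapes so this
# script does not depend on the encoding of its own source file.
name = "a\u00e7\u00e3o.txt"

# The same name as it would be stored on a UTF-8 system and on an
# ISO-8859-1 system.
utf8_bytes = name.encode("utf-8")
latin1_bytes = name.encode("iso-8859-1")

print(utf8_bytes)    # b'a\xc3\xa7\xc3\xa3o.txt'
print(latin1_bytes)  # b'a\xe7\xe3o.txt'

# A tool that stores names as raw bytes sees two unrelated names.
same_bytes = utf8_bytes == latin1_bytes

# A UTF-8 system cannot decode the ISO-8859-1 bytes at all.
try:
    latin1_bytes.decode("utf-8")
    decodable = True
except UnicodeDecodeError:
    decodable = False

# An ISO-8859-1 system decodes the UTF-8 bytes without error,
# but shows the wrong characters.
mojibake = utf8_bytes.decode("iso-8859-1")

print(same_bytes, decodable, mojibake)
```

The stored bytes are never altered in either case; only their interpretation depends on the locale of the machine reading them.
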

So if your system is wrongly configured to use a non-UTF-8 encoding, the
filenames break. (Or, rather, they break until you set a UTF-8 locale;
any errors aren't permanent.)

There is a good reason for this, and it's explained in the git-log
manual page: it's not necessarily possible to convert something to UTF-8
(or any other Unicode encoding) and convert it back without introducing
errors, especially with some less-commonly-used character sets. Treating
everything as a sequence of bytes is far safer (not to mention faster)
than converting everything every time it's committed or checked out.

> Syncing repositories using Git or Mercurial between systems using
> different encodings is a nightmare. Git doesn't even respect LANG and
> LC_CTYPE, and expect everything (including commit messages) to be in
> UTF-8 no matter what the user have set his system to. Mercurial is a
> little better, since it encodes and decodes commit messages properly,
> failing only to filenames. Subversion and Bazaar do the right thing.

Not true. Though I don't know about LANG and LC_CTYPE support, it's
certainly not true that Git expects UTF-8 no matter what; you can
override the i18n.commitencoding and i18n.logoutputencoding settings as
necessary.

-- 
Benjamin M. A'Lee || mail: [EMAIL PROTECTED]
web: http://subvert.org.uk/~bma/ || gpg: 0xBB6D2FA0
_______________________________________________ vcs-home mailing list [email protected] http://lists.madduck.net/listinfo/vcs-home
