On Tue, Aug 26, 2008 at 11:02:40PM -0300, Juliano F. Ravasi wrote:
> martin f krafft wrote:
> > Git supports unicode filenames and binary files as large as you want
> > them.
> 5. Git doesn't actually support Unicode filenames (neither does
> Mercurial). Both just store whatever the file name is in the filesystem
> directly into the repository, as just an array of bytes. You won't
> notice this unless you create files with names containing characters
> beyond the ASCII set, and use different encodings in different
> computers. This also causes problems when cloning your repositories to
> systems that are Unicode-aware (like Windows[1] and MacOS X[2]).
> [1] http://code.google.com/p/msysgit/issues/detail?id=80
> [2] http://kerneltrap.org/mailarchive/git/2008/1/16/573827
> I have files with names in Portuguese (which contains acutes, graves,
> circumflexes, umlauts, tildes and cedillas), and also a few ones in
> Japanese. Although UTF-8 should now be the default for Linux, many
> people (well, I say for Portuguese speakers at least) configure their
> systems to ISO-8859-1, because that was what was used everywhere until
> some time ago, and that is what Portuguese-language Windows
> installations store in FAT32 filesystems (very common in pen-drives).

So if your system is wrongly configured to use a non-UTF-8 encoding, the
filenames break. (Or, rather, they break until you set a UTF-8 locale;
any errors aren't permanent.)
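
To illustrate why the breakage is only cosmetic, here is a rough Python
sketch. The filename is made up, and it assumes the name was committed
from a UTF-8 system; the point is only that the stored bytes never
change, whatever the viewing locale makes of them.

```python
# Hypothetical filename, committed from a system using UTF-8.
name = "ação.txt"
stored = name.encode("utf-8")          # the bytes the repository keeps

# A system configured for ISO-8859-1 interprets the same bytes differently,
# so the name looks garbled on screen.
seen_in_latin1 = stored.decode("iso-8859-1")

# Nothing was rewritten: switching back to a UTF-8 locale shows the
# original name again.
seen_in_utf8 = stored.decode("utf-8")

print(repr(seen_in_latin1))
print(repr(seen_in_utf8))
```

The garbled form only exists in the display layer; the repository still
holds the same byte sequence.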

There is a good reason for this, and it's explained in the git-log
manual page: it's not necessarily possible to convert something to UTF-8
(or any other Unicode encoding) and convert it back without introducing
errors, especially with some less-commonly-used character sets. Treating
everything as a sequence of bytes is far safer (not to mention faster)
than converting everything every time it's committed or checked out.
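
As a sketch of one way a conversion step can lose information (this is
not what Git does; normalise_to_utf8 is a hypothetical helper that
wrongly assumes its input is already UTF-8), compare converting a name
on the way in with simply keeping the bytes:

```python
def normalise_to_utf8(raw: bytes) -> bytes:
    """Hypothetical 'convert on commit' step that assumes UTF-8 input.

    Bytes that are not valid UTF-8 are replaced with U+FFFD.
    """
    return raw.decode("utf-8", errors="replace").encode("utf-8")


# A made-up filename as an ISO-8859-1 system would write it to disk.
latin1_name = "ação.txt".encode("iso-8859-1")

# Converting loses the accented characters for good ...
converted = normalise_to_utf8(latin1_name)
back = converted.decode("utf-8").encode("iso-8859-1", errors="replace")

# ... whereas storing the raw bytes round-trips trivially.
kept_as_bytes = bytes(latin1_name)

print(back == latin1_name)           # conversion round trip
print(kept_as_bytes == latin1_name)  # byte-for-byte storage
```

Once the replacement characters are in, no later locale change can
bring the original name back, which is the kind of permanent error the
bytes-only approach avoids.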

> Syncing repositories using Git or Mercurial between systems using
> different encodings is a nightmare. Git doesn't even respect LANG and
> LC_CTYPE, and expect everything (including commit messages) to be in
> UTF-8 no matter what the user have set his system to. Mercurial is a
> little better, since it encodes and decodes commit messages properly,
> failing only to filenames. Subversion and Bazaar do the right thing.

Not quite. I don't know about LANG and LC_CTYPE support, but Git
certainly doesn't expect UTF-8 no matter what; you can
override the i18n.commitencoding and i18n.logoutputencoding settings as

Benjamin M. A'Lee || mail: [EMAIL PROTECTED]
web: http://subvert.org.uk/~bma/ || gpg: 0xBB6D2FA0

vcs-home mailing list