Hi,

As you might know, since version 2.6 GLib uses UTF-8 as the file name encoding for (hopefully) all of its API on Windows. It provides so-called gstdio wrappers in <glib/gstdio.h> for the standard POSIX and C functions that take pathnames as arguments, such as g_open().
On Unix, these wrappers are simply #defines for the actual C or POSIX function. On Windows, they convert from UTF-8 to wide characters and call the C library's wide character function, for instance _wopen() in g_open(). (Let's ignore Win9x for now.)

There were two reasons for this change:

1) Windows file names *are* in Unicode in the file system, so it is most correct to handle them as Unicode and not shoehorn them into a restricted codepage representation. This makes it possible, for instance, to support file names with Cyrillic letters on a Western European Windows box. I think it is also relatively common in CJK locales to use characters not in the corresponding double-byte codepage.

2) In the double-byte code pages the trailing byte can be '\\', which otherwise is a directory separator. This means that all code that scans pathnames byte by byte looking for backslashes (either stepping through a string manually, or using strchr() or strrchr()) is broken by design, and would need to be rewritten heavily with ugly ifdefs to use multi-byte string functions on Win32. There are a lot of such places. UTF-8 doesn't have any such issue.

Now, upper level GNOME libraries that use GLib can mostly be converted trivially to use the gstdio wrappers. (I use "GNOME" in a loose sense here. Of course a GNOME desktop as such doesn't and won't exist on Windows, but many of the GNOME libraries are being ported to Windows so that it will be possible to build GNOME applications on Windows.)

The problem is libraries that don't use GLib but are widely used by GNOME libraries, for instance libxml2. As the GNOME libs get "UTF-8 aware", i.e. are converted to use the gstdio wrappers, what should be done with pathnames passed to libxml2? If I convert them to the system codepage, XML files whose pathnames aren't representable in the system codepage won't work. That is not good, as the intention otherwise is to make everything work just fine with any non-ASCII file name.
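As an aside on reason 2 above, here is a minimal sketch of how a byte-wise backslash scan goes wrong. It assumes code page 932 (Shift-JIS), where the katakana letter SO is encoded as the bytes 0x83 0x5C; the function naive_basename() and the sample path are made up for illustration and are not taken from any real library.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns whatever follows the last 0x5C byte,
 * the way a lot of existing path-scanning code does. If there is no
 * backslash, the whole string is returned. */
const char *naive_basename(const char *path)
{
    assert(path != NULL);
    const char *sep = strrchr(path, '\\');
    return sep ? sep + 1 : path;
}

/* C:\data\<SO>.xml in CP932: SO is the two bytes 0x83 0x5C,
 * so the trailing byte looks exactly like a backslash. */
const char path_cp932[] = "C:\\data\\" "\x83\x5C" ".xml";

/* The same name in UTF-8: SO (U+30BD) is 0xE3 0x82 0xBD.
 * No byte of a multi-byte UTF-8 sequence is below 0x80. */
const char path_utf8[] = "C:\\data\\" "\xE3\x82\xBD" ".xml";
```

With these definitions, naive_basename(path_cp932) returns ".xml", because the scan stops on the trailing byte of the character, while naive_basename(path_utf8) returns the full file name including the three bytes of the character.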
I found one earlier message to this list about this issue: http://mail.gnome.org/archives/xml/2001-October/msg00072.html

There the suggested solution was to override libxml2's default I/O interface. Presumably this would be done by calling xmlRegisterInputCallbacks() with an open callback that calls the gstdio wrappers but is otherwise more or less a copy of the default xmlFileOpen(). Is this still the recommended approach?

--tml
