> On Mon, 2010-11-01 at 10:36 +0100, Joerg Schilling wrote:
> > "Garrett D'Amore" <garr...@damore.org> wrote:
> >
> > > On Mon, 2010-10-25 at 00:43 -0700, shilpa wrote:
> > > > How would glob feature work, in case multibyte filenames are
> > > > allowed? Because a multibyte character is a combination of more
> > > > than one character, which includes glob characters like "{",
> > > > "["....
> > >
> > > I'm not sure how globbing would work, frankly. However, I believe UTF8
> > > and other common multibyte character schemes always have bytes with the
> > > high-order bit set, so that there is never a multibyte character that
> > > has component bytes that collide with ASCII. So this problem should be
> > > a non-issue.
> >
> > The assumption that multi-byte characters use octets with the high order
> > bit set is only correct for so called stateless locales.
> >
> > Locales that use shift codes behave different.
>
> Actually, its a safe assumption for UTF-8, which is the main concern I
> think.
>
> The bigger question here is not locales, but character encoding schemes,
> I think. Specifically we're talking about filenames, which do not
> inherently carry a locale with them, but might be encoded in one of a
> small number of locales... for UTF-8 I believe the code is fine.
>
> Also, if the code is using libc's glob interfaces, its fine too, because
> libc's glob code is sensitive to the locale and correctly handles
> stateful encodings.
>
> - Garrett
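To make the high-order-bit point in the quote concrete, here's a rough Python sketch. The function name and the sample characters are my own, purely for illustration. It encodes each character on its own, so it only makes sense for stateless encodings. I picked Shift_JIS simply as a convenient non-UTF-8 multibyte encoding whose trailing bytes can fall in the ASCII range; it is not one of the stateful shift-code encodings Joerg mentions.

```python
# Sketch: which ASCII glob metacharacters show up as raw bytes inside
# the multibyte characters of a string, for a given encoding?
# (Hypothetical helper, not taken from any ftp or libc source.)

GLOB_BYTES = set(b"*?[]{}\\")  # byte values of common glob metacharacters


def glob_byte_collisions(text, encoding):
    """Return the glob metacharacters that occur as bytes inside the
    multibyte characters of `text` when encoded with `encoding`."""
    hits = set()
    for ch in text:
        encoded = ch.encode(encoding)
        if len(encoded) > 1:  # single-byte characters are not of interest
            hits.update(chr(b) for b in encoded if b in GLOB_BYTES)
    return hits


# KATAKANA LETTER ZE and KATAKANA LETTER SO
sample = "\u30bc\u30bd"

print(sorted(glob_byte_collisions(sample, "utf-8")))
print(sorted(glob_byte_collisions(sample, "shift_jis")))
```

With UTF-8 the result should be empty, since every byte of a multibyte sequence has the high bit set. With Shift_JIS the two sample characters carry a trailing byte equal to "[" and "\" respectively, which is the kind of collision a byte-oriented glob would trip over.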
Between any two systems where the filesystems in question are content to store arbitrary bytes in the name (other than / and '\0', of course), and where the commands including the filenames are passed 8-bit clean, I'd expect the name to be preserved, assuming that a UTF-8 encoding is used to read the filename on both ends.

Globbing depends on the capabilities of the system that's doing the expansion (whichever is sending, I imagine, i.e. remote for mget, local for put).

Ideally all should convert to UTF-8 to send the filenames, and if needed from UTF-8 to store them. But that doesn't happen, AFAIK.

On most Unix systems, if your filenames are in UTF-8, it should just work. But some filesystems on those OSes may not support UTF-8. FAT filesystems probably won't; NTFS (if supported) is AFAIK in UTF-16; not sure what the limits are on hsfs without Rock Ridge extensions, etc.

Given the present sorry state of i18n support in almost _all_ ftp clients and servers, it's pretty good that it sort of works between Unix systems when the files on both ends will be going to/from filesystems that can handle UTF-8 names.

As I mentioned elsewhere in this or a related thread, there is allegedly at least one open-source ftp server that purports to support modern protocol extensions for doing this correctly:

http://www.pro-bono-publico.de/projects/ftpd.html

I still haven't found a command-line ftp client that's serious about i18n.

At least that's how I'd understand it all...
--
This message posted from opensolaris.org
_______________________________________________
opensolaris-code mailing list
opensolaris-code@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/opensolaris-code