Lars Kristan scripsit:

> I need to store UNIX filenames in a UTF-16 database residing on Windows. If
> I use ANSI->Unicode, there is no problem. However, if I have a filesystem
> with filenames mainly in UTF-8? Nobody can guarantee that all of them will
> be in UTF-8. Some may still be in ANSI (well ISO). Actually, at some point
> in time, there will be UNIX servers with 50% of filenames in UTF-8 and 50%
> in ANSI (or something else for that matter).
>
> Hence my example of "ls > ls.out". My requirement is that there can be no
> data loss.

Frankly, your problem is insoluble, because you have set up
self-contradictory requirements. Suppose you are dealing with a filesystem
where some names are to be interpreted as Latin-1 and others as Latin-2.
The kernel will give you absolutely no help about which charset to use for
which names, nor are there any Unix utilities which would be able to cope.
Filesystems simply aren't meant to manage multiple charsets in names.
Suppose some names were ASCII and some EBCDIC: what would you be able to do
then? (EBCDIC file names couldn't include 2F, but since that is U+0007 =
BELL, it isn't much of a problem.)

The only way to ensure "no data loss" is to store file names as
uninterpreted byte sequences, and forget about characters altogether. Which
is what the kernel actually does: only 00 and 2F mean anything to it.

--
John Cowan  http://www.ccil.org/~cowan  [EMAIL PROTECTED]
To say that Bilbo's breath was taken away is no description at all. There
are no words left to express his staggerment, since Men changed the language
that they learned of elves in the days when all the world was wonderful.
        --_The Hobbit_
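
As a rough illustration of the "uninterpreted byte sequences" approach, here is a minimal Python sketch. The helper names (name_to_key, key_to_name, is_valid_name) are hypothetical, and hex is only one possible byte-safe representation; the sketch assumes the database can hold an ASCII string per file name.

```python
def name_to_key(raw: bytes) -> str:
    """Turn a raw file name into an ASCII hex string (hypothetical helper).

    No charset is assumed, so Latin-1, UTF-8, or anything else survives.
    """
    return raw.hex()


def key_to_name(key: str) -> bytes:
    """Recover the exact original bytes from the stored hex string."""
    return bytes.fromhex(key)


def is_valid_name(raw: bytes) -> bool:
    """Check the only two bytes the kernel cares about: 00 and 2F ('/')."""
    return len(raw) > 0 and b"\x00" not in raw and b"/" not in raw


# The same visible name in two encodings stays two distinct byte strings.
samples = [b"caf\xe9", b"caf\xc3\xa9", b"plain.txt"]
for raw in samples:
    key = name_to_key(raw)
    assert key_to_name(key) == raw  # lossless round trip
    print(key, is_valid_name(raw))
```

In Python, calling os.listdir with a bytes argument (for example os.listdir(b".")) returns the names as bytes, so they can be passed to name_to_key without ever being decoded. The cost is the one described above: the stored keys say nothing about which characters were intended.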

