> My two cents as a user: Why is this still a problem? I am using Linux &
> Windows and all my filenames are fine with some accented & special characters
> which are not present in English alphabet. Where's the real issue with UTF8,
> why do we need to convert it to anything else? Isn't UTF8 the same for all OS
> and filesystems and databases etc?

I can chime in here! There are three main differences between the operating systems:

1. Linux does not encode filenames. Any byte sequence is allowed in a filename except 0x00 and the slash (/). This implies that you can create filenames with backspaces, bells or other crazy stuff that's not valid in most encodings.

2. Windows internally uses a type of UTF-16 (not exactly, I forgot the precise name). This does indeed support most characters and I'm not aware of any direct issues with it. However! If you run ownCloud on a Windows machine, you cannot make use of this. On an English Windows server all the PHP filesystem APIs talk CP1252 (which is kind of a superset of latin1). This means that if ownCloud on Windows is the server, you cannot store most characters.

3. OS X uses UTF-8, BUT! It normalizes to Unicode normalization form D (kind of, mostly; not exactly the standard normalization form). In a nutshell this means that a character like ü (u-umlaut) is stored as 2 Unicode codepoints (the u and the combining ¨ separately). Windows is more likely to store it as a single codepoint. Because Windows doesn't normalize, two files with different (but very similar) names on Windows will be normalized to a single filename on HFS+ filesystems.

Lastly, the normalization form OS X uses actually behaved buggily on Windows when I checked it (granted, this was Windows XP).

If you want the details, I wrote a blog post about this a few years ago:
http://www.rooftopsolutions.nl/blog/filesystem-encoding-and-php

Evert
_______________________________________________
Owncloud mailing list
https://mail.kde.org/mailman/listinfo/owncloud
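
To make points 2 and 3 above concrete, here is a minimal Python sketch (not ownCloud or PHP code). It uses the standard NFC and NFD forms from Python's `unicodedata` module, which only approximates what HFS+ does, since the message notes that OS X does not follow the standard form exactly. The helper names `same_filename` and `fits_cp1252` are hypothetical and exist only for this illustration.

```python
import unicodedata

# "ü" as one precomposed code point (what Windows tools typically produce).
composed = "\u00fc"
# "ü" as "u" followed by COMBINING DIAERESIS (roughly what HFS+ stores).
decomposed = "u\u0308"


def same_filename(a: str, b: str) -> bool:
    """Hypothetical helper: compare two names after normalizing both to NFC."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


def fits_cp1252(name: str) -> bool:
    """Hypothetical helper: can this name be represented in CP1252 at all?"""
    try:
        name.encode("cp1252")
        return True
    except UnicodeEncodeError:
        return False


# The two spellings look identical but differ as strings and as UTF-8 bytes.
print(composed == decomposed)                                 # False
print(composed.encode("utf-8"), decomposed.encode("utf-8"))   # b'\xc3\xbc' b'u\xcc\x88'

# Decomposing the single code point yields the two-code-point spelling.
print(unicodedata.normalize("NFD", composed) == decomposed)   # True

# Normalizing both sides before comparing treats them as the same name.
print(same_filename(composed, decomposed))                    # True

# CP1252 covers the precomposed ü, but not characters outside Western European scripts.
print(fits_cp1252(composed), fits_cp1252("日本語"))            # True False
```

A byte-for-byte comparison treats the two spellings as different files, which is how two distinct names on one system can collapse into a single name on a filesystem that normalizes.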
