On Thu, Aug 2, 2012 at 6:52 PM, Evert Pot <[email protected]> wrote:
> > My two cents as a user: Why is this still a problem? I am using Linux &
> > Windows and all my filenames are fine with some accented & special
> > characters which are not present in English alphabet. Where's the real
> > issue with UTF8, why do we need to convert it to anything else? Isn't UTF8
> > the same for all OS and filesystems and databases etc?
>
> I can chime in here!
>
> Just the 3 main differences between the operating systems:
>
> 1. Linux does not encode filenames. Any byte sequence is allowed for
> filenames except 0x00 and the slash (/). This implies that you can create
> filenames with backspaces, bells or other crazy stuff that's not valid in
> most encodings.

OK, then we can just limit the characters allowed in filenames, and the
client or web interface can give an error if a user tries to upload
something with a strange filename.

> 2. Windows internally uses a type of UTF-16. (not exactly, forgot the
> precise name). This does indeed support most characters and I'm not aware
> of any direct issues with this.
> However! If you run ownCloud on a Windows machine, you cannot make use of
> this. On an English Windows server all the PHP filesystem APIs talk CP1252
> (which is kind of a superset of Latin-1). This means that if ownCloud on
> Windows is the server, you cannot store most characters.

This issue with PHP on Windows is not nice, and I don't have much to add
since ownCloud depends heavily on PHP. Maybe we can consider dropping
server support for Windows (or using some API other than the PHP one).

> 3. OS/X uses UTF-8, BUT! They normalize to Unicode normalization form D.
> (kind of, mostly.. not exactly the standard normalization form). In a
> nutshell this means that a character like ü (u-umlaut) is stored as 2
> Unicode codepoints (the ¨ and the u separately). Windows is more likely to
> combine them into a single codepoint.

Since OS/X is generally not used as a server, can't the client or web
interface handle this when it is detected?
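
To make the normalization point concrete, here is a minimal Python sketch of what "handling it" could look like: fold every name to one normalization form (NFC here) before comparing or storing it. This is only an illustration under my own assumptions, not ownCloud code; the helper name `to_nfc` is hypothetical, and the real server would need the PHP equivalent.

```python
import unicodedata


def to_nfc(filename: str) -> str:
    """Return the filename in Unicode normalization form C (composed).

    Hypothetical helper for illustration only.
    """
    return unicodedata.normalize("NFC", filename)


# "ü" in decomposed form: the letter "u" followed by a combining
# diaeresis (U+0308), roughly what OS/X would hand us.
decomposed = "u\u0308ber.txt"

# "ü" as a single precomposed code point (U+00FC).
composed = "\u00fcber.txt"

# The raw strings differ even though they look identical on screen...
print(decomposed == composed)

# ...but they compare equal once both are folded to NFC.
print(to_nfc(decomposed) == to_nfc(composed))
```

Whether NFC is the right target form is itself an assumption; the point is only that both sides have to agree on one form.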

> Because Windows doesn't normalize, it means that two files with different
> (but very similar) names will be normalized to a single filename on HTFS+
> filesystems. Lastly.. the normalization form OS/X uses actually behaved
> buggily on Windows when I checked it (granted, this was Windows XP).

I've never faced such an issue on Windows. (I assume you mean NTFS.) Maybe
this is too remote a possibility to consider?

> If you want the details, I wrote a blog post about this a few years ago:
> http://www.rooftopsolutions.nl/blog/filesystem-encoding-and-php

Thanks, I'll definitely read it. :)

--
Emre
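
P.S. On point 1, the kind of check I had in mind could be sketched roughly like this in Python. The policy itself (valid UTF-8, no slash, no control characters) and the function name `is_acceptable_filename` are my own assumptions for illustration, not anything ownCloud implements.

```python
import unicodedata


def is_acceptable_filename(raw: bytes) -> bool:
    """Hypothetical upload check: accept only non-empty, valid UTF-8
    names without a slash or control characters."""
    try:
        name = raw.decode("utf-8")
    except UnicodeDecodeError:
        # Linux allows arbitrary bytes, so the name may not be UTF-8 at all.
        return False
    if not name or "/" in name:
        return False
    # Unicode category "Cc" covers control characters such as NUL,
    # bell (0x07) and backspace (0x08).
    return not any(unicodedata.category(ch) == "Cc" for ch in name)
```

The client or web interface would call something like this before an upload and show an error when it returns False.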

_______________________________________________
Owncloud mailing list
[email protected]
https://mail.kde.org/mailman/listinfo/owncloud
