Hi,
this week, we made some promising advances on the syncing client to reduce the
number of problems leading to incomplete syncs and conflicts. However, one of
these fixes requires some input: it concerns Unicode normalization, which
different operating systems handle differently.
What is Unicode normalization?
In Unicode, some special characters can be stored in two ways: decomposed and
composed. Converting them to one or the other is called "Unicode
normalization". There are four normalization forms: NFC (Normalization Form C,
i.e. composed) and NFD (Normalization Form D, i.e. decomposed), plus one
compatibility variant of each (NFKC and NFKD; read
http://en.wikipedia.org/wiki/Unicode_equivalence if you are interested in the
details).
Example: In NFC, the 'é' in "Amélie" is stored as the single character 'é'
(U+00E9). In NFD, it is stored as two characters, 'e' + ' ◌́' (U+0065 followed
by U+0301, where the latter means "accent on top of the previous character").
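To make the example concrete, here is a minimal sketch in Python using the
standard unicodedata module. Python is only used for illustration here; it is
not what the client or the server is written in.

```python
import unicodedata

# Written with an escape so the starting form is unambiguous (this is NFC).
name = "Am\u00e9lie"

nfc = unicodedata.normalize("NFC", name)
nfd = unicodedata.normalize("NFD", name)

# Same text for a human reader, different code point sequences.
nfc_codepoints = [f"U+{ord(c):04X}" for c in nfc]
nfd_codepoints = [f"U+{ord(c):04X}" for c in nfd]

print(len(nfc), nfc_codepoints)
print(len(nfd), nfd_codepoints)

# A plain string comparison treats the two forms as different names.
print(nfc == nfd)

# Normalizing both sides to the same form makes them comparable again.
print(unicodedata.normalize("NFC", nfd) == nfc)
```

The two strings render identically, but the NFD one is one code point longer,
which is exactly why a byte-wise or code-point-wise comparison of file names
fails.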
Mac OS, by default, stores all its file names as NFD, whereas Linux and Windows
keep names as entered, which in practice means NFC. The W3C also mandates that
special characters in URLs be in NFC prior to percent-encoding them (check the
IRI RFC for details).
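A rough sketch of "NFC first, then percent-encode", again in Python for
illustration only. The helper name to_url_path and the /docs/ path are made up
for this example.

```python
import unicodedata
from urllib.parse import quote


def to_url_path(path: str) -> str:
    """Normalize to NFC, then percent-encode the UTF-8 bytes of the path."""
    nfc = unicodedata.normalize("NFC", path)
    # Keep "/" unescaped so the path structure survives.
    return quote(nfc, safe="/")


# The same file name as a Mac would typically hand it over (NFD) ...
nfd_path = "/docs/Ame\u0301lie.txt"
# ... and as Linux or Windows would typically hand it over (NFC).
nfc_path = "/docs/Am\u00e9lie.txt"

# Without normalization the two produce different URLs.
raw_nfd_url = quote(nfd_path, safe="/")
raw_nfc_url = quote(nfc_path, safe="/")

# With normalization both map to the same URL.
url_from_nfd = to_url_path(nfd_path)
url_from_nfc = to_url_path(nfc_path)

print(raw_nfd_url)
print(raw_nfc_url)
print(url_from_nfd == url_from_nfc)
```

Note that quote() alone does not normalize anything; it only encodes whatever
code points it is given.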
What is Unicode normalization not?
- URL percent encoding
- Character encoding (UTF-7, UTF-8, UTF-16, UTF-32)
Why is that a problem?
- Files that should be the same are not. Create the same file with an 'é' on
Linux (or Windows) and on Mac, then upload both to the server: you will see two
files with seemingly identical names on the server (and on the clients after
sync). And in fact, they are both there, and both are valid. Certainly
unexpected.
- Bizarre problems when syncing directories with umlauts to a Mac (could also
be shadowing another bug, we are investigating this atm)
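To illustrate the first problem, here is a small sketch that finds names in a
directory listing which only differ by normalization form. The function name
and the sample listing are invented for the example.

```python
import unicodedata
from collections import defaultdict


def find_normalization_collisions(names):
    """Group names by their NFC form and return groups with several variants."""
    groups = defaultdict(list)
    for name in names:
        groups[unicodedata.normalize("NFC", name)].append(name)
    return {
        key: variants
        for key, variants in groups.items()
        if len(set(variants)) > 1
    }


# Hypothetical server listing: the first two entries look identical on screen.
listing = ["Am\u00e9lie.txt", "Ame\u0301lie.txt", "notes.txt"]
collisions = find_normalization_collisions(listing)

for nfc_name, variants in collisions.items():
    print(nfc_name, "has", len(variants), "variants")
```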
So now I have a fix for the ownCloud Client that normalizes all file names
towards the URL "interface" (which mandates NFC) when sending any request to the
server. Other WebDAV clients for Mac seem to do the same. Still, the server
needs to do the same on its side: normalize whatever hits it from the client
side into what the server OS needs (usually NFC, unless it's a Mac server) and
vice versa (NFC towards the client). Ideally this should still go into 4.5.2.
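The two directions could look roughly like this. This is only a sketch of the
idea in Python, not the actual server code (which is PHP); to_storage, to_wire
and the storage_form parameter are hypothetical names, and the sketch assumes
the server knows which form its own OS prefers.

```python
import unicodedata

# Form used on the wire, as mandated for URLs.
WIRE_FORM = "NFC"


def to_storage(name: str, storage_form: str = "NFC") -> str:
    """Convert a name received from a client into the server's native form."""
    return unicodedata.normalize(storage_form, name)


def to_wire(name: str) -> str:
    """Convert a name read from the server's file system into the wire form."""
    return unicodedata.normalize(WIRE_FORM, name)


incoming = "Am\u00e9lie.txt"  # what a patched client would send (NFC)

# On a typical Linux server nothing changes.
stored_on_linux = to_storage(incoming, "NFC")

# On a Mac server the name is decomposed for storage ...
stored_on_mac = to_storage(incoming, "NFD")

# ... and composed again before it is sent back to any client.
sent_back = to_wire(stored_on_mac)

print(stored_on_linux == incoming)
print(stored_on_mac == incoming)
print(sent_back == incoming)
```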
PHP has Normalizer::normalize
(http://php.net/manual/en/normalizer.normalize.php), suggested in
https://github.com/owncloud/mirall/issues/45, which requires the intl extension
(a new dependency, although fairly standard). I have not yet figured out if PHP
iconv (already a hard dependency) is capable of doing normalization, and could
use some help there.
We also need to make sure to release a patched 4.5.2 along with a patched 1.1.2
in this scenario to make sure there are no issues with existing client
installations. Also, the server might want to try and look for an
NFD-normalized (or NFC-normalized on a Mac server) version of the file if it
does not exist in its native form, and rename it in that case.
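The lookup-and-rename fallback could be sketched like this (Python for
illustration, assuming a file system that does not normalize names by itself,
e.g. a typical Linux one). resolve_native is a made-up name, and the demo at
the bottom only runs against a temporary directory.

```python
import os
import tempfile
import unicodedata


def resolve_native(directory, name, native_form="NFC"):
    """Return the path of `name` in native form, renaming a variant if needed.

    Returns None when neither the native name nor a variant of it exists.
    """
    native = unicodedata.normalize(native_form, name)
    native_path = os.path.join(directory, native)
    if os.path.lexists(native_path):
        return native_path
    # Fall back to any entry that only differs by normalization form.
    for entry in os.listdir(directory):
        if entry != native and unicodedata.normalize(native_form, entry) == native:
            os.rename(os.path.join(directory, entry), native_path)
            return native_path
    return None


with tempfile.TemporaryDirectory() as tmp:
    # Simulate a file uploaded by an unpatched Mac client (NFD name).
    open(os.path.join(tmp, "Ame\u0301lie.txt"), "w").close()

    resolved = resolve_native(tmp, "Am\u00e9lie.txt")
    entries_after = os.listdir(tmp)
    missing = resolve_native(tmp, "nothing.txt")

print(resolved is not None, entries_after, missing)
```

This deliberately does nothing clever when the native name already exists, so
it does not answer what to do when both variants are present.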
Also, what do we do if both versions exist on the server (should be a rare case
though)?
Cheers,
Daniel
--
www.owncloud.com - Your Data, Your Cloud, Your Way!
ownCloud GmbH, GF: Markus Rex, Holger Dyroff
Schloßäckerstrasse 26a, 90443 Nürnberg, HRB 28050 (AG Nürnberg)