Author: reschke
Date: Thu Feb  1 13:16:36 2018
New Revision: 1822875

URL: http://svn.apache.org/viewvc?rev=1822875&view=rev
Log:
OAK-5506: add a note about how we currently treat 'broken' Java strings

Modified:
    jackrabbit/oak/trunk/oak-doc/src/site/markdown/constraints.md

Modified: jackrabbit/oak/trunk/oak-doc/src/site/markdown/constraints.md
URL: http://svn.apache.org/viewvc/jackrabbit/oak/trunk/oak-doc/src/site/markdown/constraints.md?rev=1822875&r1=1822874&r2=1822875&view=diff
==============================================================================
--- jackrabbit/oak/trunk/oak-doc/src/site/markdown/constraints.md (original)
+++ jackrabbit/oak/trunk/oak-doc/src/site/markdown/constraints.md Thu Feb  1 13:16:36 2018
@@ -41,3 +41,18 @@ Finally, the chosen persistence implemen
 
 - in the "Document NodeStore", the UTF-8 representation of local names can not exceed ~150 bytes.
 
+## Invalid Java Strings
+
+Due to the way Java represents characters in strings (as 16-bit UTF-16 code units), not every String is a valid sequence of
+Unicode code points. This is because *two* `char` values are needed to represent Unicode
+"supplementary characters" (code points above U+FFFF). If these "surrogate" characters do not appear as a well-formed
+pair (a high surrogate immediately followed by a low surrogate), the Java string cannot be serialized to a sequence of Unicode characters, nor to
+a byte sequence (using UTF-8 character encoding).
+
+The system behaviour for these strings is currently undefined. This means that they
+might be rejected, they might be accepted but lose information when
+stored, or they might be stored and retrieved faithfully.
+
+See [OAK-5506](https://issues.apache.org/jira/browse/OAK-5506) for further information.
+
+

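To illustrate the pairing rule described in the note, here is a minimal Java sketch that scans a string for unpaired surrogates. `SurrogateCheck` and `isWellFormed` are hypothetical names chosen for this example; they are not part of Oak or the JDK. The sketch treats a string as well-formed only if every high surrogate is immediately followed by a low surrogate and no low surrogate appears on its own.

```java
/**
 * Hypothetical helper (not an Oak or JDK API): checks whether a Java string
 * is well-formed UTF-16, i.e. every surrogate char is part of a proper pair.
 */
public class SurrogateCheck {

    static boolean isWellFormed(String s) {
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (Character.isHighSurrogate(c)) {
                // A high surrogate must be immediately followed by a low surrogate.
                if (i + 1 < s.length() && Character.isLowSurrogate(s.charAt(i + 1))) {
                    i++; // skip the low half of the valid pair
                } else {
                    return false;
                }
            } else if (Character.isLowSurrogate(c)) {
                // A low surrogate without a preceding high surrogate is unpaired.
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        String ok = "A\uD83D\uDE00B";     // 'A', U+1F600 as a surrogate pair, 'B'
        String loneHigh = "A\uD83DB";     // high surrogate not followed by a low one
        String loneLow = "\uDE00";        // low surrogate with no preceding high one
        String reversed = "\uDE00\uD83D"; // both halves present, but in the wrong order
        System.out.println(isWellFormed(ok));       // true
        System.out.println(isWellFormed(loneHigh)); // false
        System.out.println(isWellFormed(loneLow));  // false
        System.out.println(isWellFormed(reversed)); // false
    }
}
```

An explicit scan is used here so that the pairing rule is visible in the code; a check like this could be used to spot such strings before they are handed to the repository.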

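The note mentions that a broken string might be accepted but lose information when stored. The following sketch shows one general way that can happen in Java, for illustration only and not as a description of what Oak does internally: `String.getBytes(Charset)` replaces malformed input, such as an unpaired surrogate, with the charset's default replacement bytes (`?` for UTF-8), so a UTF-8 round trip does not return the original string. `Utf8RoundTrip` and `roundTrip` are hypothetical names for this example.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical demo class (not an Oak API): UTF-8 round trip of a Java string. */
public class Utf8RoundTrip {

    // Encodes to UTF-8 and decodes again. String.getBytes(Charset) silently
    // replaces malformed input (e.g. an unpaired surrogate) with '?' for UTF-8.
    static String roundTrip(String s) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String paired = "A\uD83D\uDE00B"; // well-formed surrogate pair
        String unpaired = "A\uD83DB";     // lone high surrogate
        System.out.println(roundTrip(paired).equals(paired));     // true
        System.out.println(roundTrip(unpaired).equals(unpaired)); // false
        System.out.println(roundTrip(unpaired));                  // A?B
    }
}
```

Code that needs to reject such strings instead of silently altering them could use a `CharsetEncoder` configured with `CodingErrorAction.REPORT`, which throws on malformed input rather than substituting.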