[ 
https://issues.apache.org/jira/browse/OAK-4857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15609437#comment-15609437
 ] 

Alexander Klimetschek commented on OAK-4857:
--------------------------------------------

To make some progress on this front, could we at least *document* the current 
state of allowed characters in Oak?

It would also help to provide an escaping/unescaping utility class, with methods for 
both individual names and paths. This could be shared with Jackrabbit 2, 
since the current behavior in Oak is the same as in the current Jackrabbit 2 releases. 
I am not sure whether jackrabbit-api, jackrabbit-jcr-commons or some other (Oak-specific) 
place would be the right location.

Here is what I know so far:
* a node name is illegal if the entire name is empty, {{.}} or {{..}}
* no length limit (?)
* otherwise a name can contain all Unicode chars except:
** JCR illegal chars {{/ : \[ ] | *}}
** chars for which [Character.isWhitespace()|https://docs.oracle.com/javase/7/docs/api/java/lang/Character.html#isWhitespace(char)] returns true, except for the regular space {{U+0020}}, which is allowed but not as the first or last char
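
As a starting point for the documentation, the rules listed above could be written down as a small check. This is only a sketch of my current understanding, not Oak's actual validation code; the class name {{NodeNameCheck}} and the method {{isValidName}} are made up for illustration.

```java
public class NodeNameCheck {

    // JCR illegal characters from the list above: / : [ ] | *
    private static final String JCR_ILLEGAL = "/:[]|*";

    /**
     * Hypothetical check mirroring the rules listed in this comment.
     * It is an assumption-based sketch, not the check Oak performs.
     */
    static boolean isValidName(String name) {
        // The entire name must not be empty, "." or ".."
        if (name.isEmpty() || name.equals(".") || name.equals("..")) {
            return false;
        }
        // Walk the name by code point so supplementary chars are handled
        for (int i = 0; i < name.length(); ) {
            int cp = name.codePointAt(i);
            int next = i + Character.charCount(cp);
            if (JCR_ILLEGAL.indexOf(cp) >= 0) {
                return false;
            }
            if (cp == ' ') {
                // Regular space is allowed, but not as first or last char
                if (i == 0 || next == name.length()) {
                    return false;
                }
            } else if (Character.isWhitespace(cp)) {
                // Any other Character.isWhitespace() char is rejected
                return false;
            }
            i = next;
        }
        return true;
    }

    public static void main(String[] args) {
        String[] samples = { "foo bar", " foo", "foo\u3000bar", "a/b", ".." };
        for (String s : samples) {
            System.out.println("\"" + s + "\" valid: " + isValidName(s));
        }
    }
}
```

Under these assumed rules, a name containing the ideographic space is rejected anywhere in the name, while a regular space is rejected only in the first or last position.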

Now someone needs to list the actual individual chars behind the 
Character.isWhitespace() :)
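
One way to get that list is to let the JVM enumerate it. The sketch below walks all Unicode code points and prints those for which {{Character.isWhitespace(int)}} returns true. The class name {{WhitespaceCodePoints}} is made up for illustration, and the exact output may differ between Java versions because each one ships a different Unicode database. According to the Javadoc, the result should be the space, line and paragraph separators minus the non-breaking spaces ({{U+00A0}}, {{U+2007}}, {{U+202F}}), plus the control chars {{U+0009}} to {{U+000D}} and {{U+001C}} to {{U+001F}}.

```java
import java.util.ArrayList;
import java.util.List;

public class WhitespaceCodePoints {

    /** Collects every code point that Character.isWhitespace(int) accepts. */
    static List<Integer> whitespaceCodePoints() {
        List<Integer> result = new ArrayList<>();
        for (int cp = Character.MIN_CODE_POINT; cp <= Character.MAX_CODE_POINT; cp++) {
            if (Character.isWhitespace(cp)) {
                result.add(cp);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Print each code point in U+XXXX form with its Unicode name
        for (int cp : whitespaceCodePoints()) {
            System.out.printf("U+%04X %s%n", cp, Character.getName(cp));
        }
    }
}
```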

> Support space chars common in CJK inside node names
> ---------------------------------------------------
>
>                 Key: OAK-4857
>                 URL: https://issues.apache.org/jira/browse/OAK-4857
>             Project: Jackrabbit Oak
>          Issue Type: Improvement
>          Components: core
>    Affects Versions: 1.4.7, 1.5.10
>            Reporter: Alexander Klimetschek
>         Attachments: OAK-4857-tests.patch
>
>
> Oak (like Jackrabbit) does not allow spaces commonly used in CJK text, such as 
> {{U+3000}} (ideographic space) or {{U+00A0}} (no-break space), _inside_ a node 
> name, while allowing some of them (the non-breaking spaces) at the _beginning 
> or end_.
> They should be supported for better globalization readiness. Filesystems 
> allow them, which makes common filesystem-to-JCR mappings unnecessarily hard. 
> Escaping would be an option for applications, but there is currently no 
> utility method for it 
> ([Text.escapeIllegalJcrChars|https://jackrabbit.apache.org/api/2.8/org/apache/jackrabbit/util/Text.html#escapeIllegalJcrChars(java.lang.String)] 
> will not escape these spaces), nor is it documented how applications should 
> do so.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
