[
https://issues.apache.org/jira/browse/OAK-4857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15530687#comment-15530687
]
Alexander Klimetschek commented on OAK-4857:
--------------------------------------------
In general, allowing more characters in a newer Oak version would not break
backwards compatibility.
Removing support for special spaces at the start or end of a name (to be in
line with the regular space), however, would be a backwards compatibility issue.
Apart from consistency with Jackrabbit, I haven't found a reason why Jackrabbit
& Oak disallow any of these space and whitespace characters, i.e. why they go
beyond what the JCR spec says (looking at OAK-3412, OAK-1891, OAK-1174,
JCR-3582). I could imagine this history:
* newlines and tabs were seen as problematic in UIs, e.g. a repository viewer,
so they should be prevented
* {{Character.isWhitespace()}} was used for getting rid of newlines
* and it wasn't noticed that this actually covers more characters than
necessary for the original issue
* regular space was handled as an exception, without being aware of other
spaces in Unicode
* not sure about the leading & trailing spaces though
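To make the second and third points concrete, here is a minimal sketch of how the JDK classifies a few of the characters in question. It only exercises {{Character.isWhitespace()}} and {{Character.isSpaceChar()}} from the standard library; the class name {{WhitespaceCheck}} and the helper {{describe}} are made up for illustration, and this is not a claim about which check Oak or Jackrabbit actually performs.

```java
public class WhitespaceCheck {

    // Hypothetical helper: reports how the JDK classifies a single character.
    static String describe(char ch) {
        return String.format("U+%04X isWhitespace=%b isSpaceChar=%b",
                (int) ch, Character.isWhitespace(ch), Character.isSpaceChar(ch));
    }

    public static void main(String[] args) {
        // newline, tab, regular space, no-break space, ideographic space
        char[] samples = {'\n', '\t', ' ', '\u00A0', '\u3000'};
        for (char ch : samples) {
            System.out.println(describe(ch));
        }
    }
}
```

Note that {{isWhitespace()}} matches the ideographic space but, per its Javadoc, excludes the no-break spaces, while {{isSpaceChar()}} matches both and excludes newline and tab. So a check meant only for newlines and tabs would indeed catch the ideographic space as a side effect, but it would not by itself explain the handling of the no-break space.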
There is also a small security aspect, i.e. different space characters that
look the same when rendered could potentially be exploited for phishing attacks
or the like. Not sure if this is important here... since basically all other
Unicode characters are supported, there are likely lots of these already, and
such a protection is likely something for an application layer, not the generic
repository infrastructure.
> Support space chars common in CJK inside node names
> ---------------------------------------------------
>
> Key: OAK-4857
> URL: https://issues.apache.org/jira/browse/OAK-4857
> Project: Jackrabbit Oak
> Issue Type: Improvement
> Components: core
> Affects Versions: 1.4.7, 1.5.10
> Reporter: Alexander Klimetschek
> Attachments: OAK-4857-tests.patch
>
>
> Oak (like Jackrabbit) does not allow spaces commonly used in CJK like
> {{u3000}} (ideographic space) or {{u00A0}} (no-break space) _inside_ a node
> name, while allowing them at the _beginning or end_.
> They should be supported for better globalization readiness, and filesystems
> allow them, making common filesystem to JCR mappings unnecessarily hard.
> Escaping would be an option for applications, but there is currently no
> utility method for it
> ([Text.escapeIllegalJcrChars|https://jackrabbit.apache.org/api/2.8/org/apache/jackrabbit/util/Text.html#escapeIllegalJcrChars(java.lang.String)]
> will not escape these spaces), nor is it documented for applications how to
> do so.
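As a rough illustration of the application-side escaping mentioned in the description, the sketch below percent-encodes every space or whitespace character other than the regular space, using the character's UTF-8 bytes. The class {{SpaceEscaper}} and the method {{escapeNonAsciiSpaces}} are hypothetical and not part of Jackrabbit or Oak; the encoding scheme is an assumption, and whether the result round-trips through Jackrabbit's {{Text.unescape}} would need to be verified.

```java
import java.nio.charset.StandardCharsets;

public class SpaceEscaper {

    // Hypothetical helper, not a Jackrabbit/Oak API: percent-encodes all
    // space and whitespace characters except the regular space (U+0020).
    // '%' itself is encoded too, so the escaping stays reversible.
    static String escapeNonAsciiSpaces(String name) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < name.length(); i++) {
            char ch = name.charAt(i);
            boolean specialSpace = ch != ' '
                    && (Character.isSpaceChar(ch) || Character.isWhitespace(ch));
            if (specialSpace || ch == '%') {
                // Encode the character as its UTF-8 bytes, one %XX per byte.
                byte[] bytes = String.valueOf(ch).getBytes(StandardCharsets.UTF_8);
                for (byte b : bytes) {
                    sb.append('%').append(String.format("%02X", b & 0xFF));
                }
            } else {
                sb.append(ch);
            }
        }
        return sb.toString();
    }
}
```

An application mapping filesystem names to JCR would apply such a helper before creating the node and reverse it when reading the name back; names containing only regular spaces pass through unchanged.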