[ https://issues.apache.org/jira/browse/OAK-3395?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Chetan Mehrotra updated OAK-3395:
---------------------------------
    Attachment: OAK-3395-2.patch

Thanks Thomas for the extensive review! Attached is the [updated patch|^OAK-3395-2.patch].

bq. by the way, even now an exception is thrown if the last character is a backslash

Previously this case was handled because {{unescapingRequired}} checked that any '\' found was not the last character of the string, so the out-of-bounds access in unescape could not happen. However, your suggestion to simplify {{unescapingRequired}} looks better, so I have now added a check there.

bq. There should be unit tests for the special cases as well (code coverage should be 100%).

Done now. In addition, I now use {{RandomStringUtils}} from Commons Lang (a test-scoped dependency), because the logic you suggested might not generate valid Unicode chars. The Commons Lang utility ensures that valid Unicode chars are generated.

Further, regarding my concern:

bq. I am not very sure if escaping as implemented would work fine for unicode chars (involving surrogate pair i.e. those not in BMP)

I checked the docs [1], which say the following:
{quote}
...if an application scans a char sequence for HTML tags, checking each char individually, it knows that these tags only use characters from the Basic Latin block. If the text being scanned contains supplementary characters, then these characters cannot be confused with the tag characters, because UTF-16 represents supplementary characters using code units whose values are not used for BMP characters.
{quote}

This confirms that the char-by-char processing approach works: the chars being searched for are all from the ASCII set, and in a surrogate pair (two chars) neither char can have an ASCII value. Both fall in the surrogate range U+D800 to U+DFFF, which is not used for any BMP character.
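For illustration, here is a minimal Java sketch of the kind of escaping discussed above. It assumes a simple backslash scheme ('\' becomes two backslashes, line feed becomes "\n", carriage return becomes "\r") applied to each id before it is written to the line-delimited sort file and reversed when the id is read back. The class and method names ({{LineBreakEscaper}}, {{escape}}, {{unescape}}) are hypothetical and are not the code in OAK-3395-2.patch. The sketch shows the two review points: a trailing backslash is rejected explicitly instead of causing an out-of-bounds read, and the char-by-char scan leaves surrogate pairs untouched.

```java
/**
 * Hypothetical helper (not Oak's actual API): escapes line breaks so that
 * each id occupies exactly one line in a line-delimited sort file.
 * Assumed scheme: backslash -> two backslashes, LF -> backslash + 'n',
 * CR -> backslash + 'r'.
 */
public final class LineBreakEscaper {

    private LineBreakEscaper() {}

    // Fast path: only escape when the string contains a char that needs it.
    static boolean escapingRequired(String s) {
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '\\' || c == '\n' || c == '\r') {
                return true;
            }
        }
        return false;
    }

    public static String escape(String s) {
        if (!escapingRequired(s)) {
            return s;
        }
        StringBuilder sb = new StringBuilder(s.length() + 8);
        // Char-by-char scan is safe for surrogate pairs: surrogate code units
        // (U+D800..U+DFFF) never equal a backslash, LF or CR.
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                default: sb.append(c);
            }
        }
        return sb.toString();
    }

    public static String unescape(String s) {
        if (s.indexOf('\\') < 0) {
            return s; // nothing to unescape
        }
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c != '\\') {
                sb.append(c);
                continue;
            }
            // A trailing backslash has no escape code after it: reject it
            // explicitly instead of reading past the end of the string.
            if (i == s.length() - 1) {
                throw new IllegalArgumentException("Trailing backslash in: " + s);
            }
            char next = s.charAt(++i);
            switch (next) {
                case '\\': sb.append('\\'); break;
                case 'n': sb.append('\n'); break;
                case 'r': sb.append('\r'); break;
                default:
                    throw new IllegalArgumentException("Invalid escape \\" + next + " in: " + s);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Id containing a line feed, a backslash and a surrogate pair (emoji).
        String id = "2:/foo/bar\nbaz\\qux\uD83D\uDE00";
        String escaped = escape(id);
        System.out.println(escaped.indexOf('\n') < 0);    // true: one line per id
        System.out.println(unescape(escaped).equals(id)); // true: lossless round trip
        try {
            unescape("/a\\");
        } catch (IllegalArgumentException e) {
            System.out.println("rejected trailing backslash");
        }
    }
}
```

Running {{main}} prints true, true and "rejected trailing backslash". Escaping the backslash itself is what keeps this scheme lossless: an id that already contains a backslash followed by 'n' round-trips unchanged instead of being turned into a line feed.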
[1] http://www.oracle.com/us/technologies/java/supplementary-142654.html

> RevisionGC fails for JCR paths having line feed characters
> ----------------------------------------------------------
>
>                 Key: OAK-3395
>                 URL: https://issues.apache.org/jira/browse/OAK-3395
>             Project: Jackrabbit Oak
>          Issue Type: Bug
>          Components: mongomk, rdbmk
>            Reporter: Chetan Mehrotra
>            Assignee: Chetan Mehrotra
>            Priority: Minor
>             Fix For: 1.3.7, 1.2.6, 1.0.21
>
>         Attachments: OAK-3395-1.patch, OAK-3395-2.patch
>
>
> RevisionGC fails with an error while processing any id (derived from a JCR path)
> that contains a line feed or carriage return char.
> This happens because it relies on Oak Commons StringSort and ExternalSort,
> which work with line-delimited strings, so an id containing a line break
> breaks the sorting logic. The reported error looks like this:
> {noformat}
> java.lang.AssertionError: Invalid id /1442211320
>     at org.apache.jackrabbit.oak.plugins.document.util.Utils.getDepthFromId(Utils.java:337)
>     at org.apache.jackrabbit.oak.plugins.document.NodeDocumentIdComparator.compare(NodeDocumentIdComparator.java:38)
>     at org.apache.jackrabbit.oak.plugins.document.NodeDocumentIdComparator.compare(NodeDocumentIdComparator.java:30)
>     at java.util.TimSort.countRunAndMakeAscending(TimSort.java:324)
>     at java.util.TimSort.sort(TimSort.java:203)
>     at java.util.TimSort.sort(TimSort.java:173)
>     at java.util.Arrays.sort(Arrays.java:659)
>     at java.util.Collections.sort(Collections.java:217)
>     at org.apache.jackrabbit.oak.commons.sort.ExternalSort.sortAndSave(ExternalSort.java:279)
>     at org.apache.jackrabbit.oak.commons.sort.ExternalSort.sortInBatch(ExternalSort.java:218)
>     at org.apache.jackrabbit.oak.commons.sort.ExternalSort.sortInBatch(ExternalSort.java:257)
>     at org.apache.jackrabbit.oak.commons.sort.StringSort$PersistentState.sort(StringSort.java:191)
>     at org.apache.jackrabbit.oak.commons.sort.StringSort.sort(StringSort.java:88)
>     at org.apache.jackrabbit.oak.plugins.document.VersionGarbageCollector$DeletedDocsGC.ensureSorted(VersionGarbageCollector.java:383)
>     at org.apache.jackrabbit.oak.plugins.document.VersionGarbageCollector$DeletedDocsGC.getDocIdsToDelete(VersionGarbageCollector.java:274)
>     at org.apache.jackrabbit.oak.plugins.document.VersionGarbageCollector$DeletedDocsGC.removeDeletedDocuments(VersionGarbageCollector.java:296)
>     at org.apache.jackrabbit.oak.plugins.document.VersionGarbageCollector$DeletedDocsGC.removeDocuments(VersionGarbageCollector.java:241)
>     at org.apache.jackrabbit.oak.plugins.document.VersionGarbageCollector.collectDeletedDocuments(VersionGarbageCollector.java:154)
>     at org.apache.jackrabbit.oak.plugins.document.VersionGarbageCollector.gc(VersionGarbageCollector.java:105)
>     at org.apache.jackrabbit.oak.plugins.document.VersionGCDeletionTest.gcWithPathsHavingNewLine(VersionGCDeletionTest.java:203)
> {noformat}

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)