[
https://issues.apache.org/jira/browse/OAK-3395?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Chetan Mehrotra updated OAK-3395:
---------------------------------
Attachment: OAK-3395-2.patch
Thanks Thomas for the extensive review!
Attached is the [updated patch|^OAK-3395-2.patch].
bq. by the way, even now an exception is thrown if the last character is a
backslash
In the previous patch this case was handled: {{unescapingRequired}} checked
that any '\' found is at most the second-to-last character in the string, so
the out-of-bounds access in unescape could not happen. However, your suggestion
to simplify {{unescapingRequired}} looks better, so I have now added a check there.
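For readers without the patch at hand, here is a minimal sketch of the idea. It is not the code from OAK-3395-2.patch. The class name {{LineBreakEscaper}}, the choice of escape sequences, and the decision to keep a trailing backslash literally are all assumptions made for illustration. Only the name {{unescapingRequired}} comes from the discussion above.

```java
// Hypothetical sketch, not the actual patch: escapes line breaks so that an id
// always fits on a single line of a line-delimited sort file.
final class LineBreakEscaper {
    private LineBreakEscaper() {
    }

    /** Replaces backslash, line feed and carriage return with two-char escapes. */
    public static String escape(String s) {
        StringBuilder sb = new StringBuilder(s.length() + 8);
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                default: sb.append(c);
            }
        }
        return sb.toString();
    }

    /** Simplified check: any backslash at all means unescape has work to do. */
    public static boolean unescapingRequired(String s) {
        return s.indexOf('\\') >= 0;
    }

    /** Reverses escape(); a trailing lone backslash is kept as-is (assumption). */
    public static String unescape(String s) {
        if (!unescapingRequired(s)) {
            return s;
        }
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c != '\\') {
                sb.append(c);
                continue;
            }
            if (i == s.length() - 1) {
                // Bounds check: no character follows, so do not read past the end.
                sb.append(c);
                break;
            }
            char next = s.charAt(++i);
            switch (next) {
                case 'n': sb.append('\n'); break;
                case 'r': sb.append('\r'); break;
                case '\\': sb.append('\\'); break;
                default: sb.append('\\').append(next); // unknown escape, keep both chars
            }
        }
        return sb.toString();
    }
}
```

With the bounds check inside unescape, {{unescapingRequired}} can stay a plain {{indexOf}} test and a string ending in a backslash no longer causes an out-of-bounds read.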
bq. There should be unit tests for the special cases as well (code coverage
should be 100%).
Done now
In addition, I now use {{RandomStringUtils}} from Commons Lang (test-scoped
dependency), as the logic you suggested might not generate valid Unicode
characters. The Commons Lang utility ensures that proper Unicode characters are
generated.
Further, regarding my earlier concern:
bq. I am not very sure if escaping as implemented would work fine for unicode
chars (involving surrogate pair i.e. those not in BMP)
I checked the docs [1], which state the following:
{quote}
...if an application scans a char sequence for HTML tags, checking each char
individually, it knows that these tags only use characters from the Basic Latin
block. If the text being scanned contains supplementary characters, then these
characters cannot be confused with the tag characters, because UTF-16
represents supplementary characters using code units whose values are not used
for BMP characters.
{quote}
So this confirms that the char-by-char processing approach works fine: the
chars being searched for are from the ASCII set, and in a surrogate pair
(which uses 2 chars) neither char can have a value from the ASCII range (or,
more broadly, any value used for a BMP character).
[1] http://www.oracle.com/us/technologies/java/supplementary-142654.html
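The surrogate-pair argument above can be checked with a small sketch using only {{java.lang.Character}}. The class and method names here are hypothetical and not part of the patch. The sketch encodes every supplementary code point to UTF-16 and verifies that both code units fall in the surrogate range, so a char-by-char search for '\n', '\r' or '\' can never match inside a pair.

```java
// Hypothetical sketch: verifies that supplementary characters never produce
// UTF-16 code units that collide with ASCII (or any other BMP character).
final class SurrogateScan {
    private SurrogateScan() {
    }

    /** Returns true if some supplementary code point yields a non-surrogate code unit. */
    public static boolean anyUnitOutsideSurrogateRange() {
        for (int cp = Character.MIN_SUPPLEMENTARY_CODE_POINT; cp <= Character.MAX_CODE_POINT; cp++) {
            for (char unit : Character.toChars(cp)) {
                if (!Character.isSurrogate(unit)) {
                    return true;
                }
            }
        }
        return false;
    }

    /** Counts line feeds by inspecting one UTF-16 code unit at a time. */
    public static int countLineFeeds(String s) {
        int count = 0;
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) == '\n') {
                count++;
            }
        }
        return count;
    }
}
```

A string such as {{"a" + new String(Character.toChars(0x1F600)) + "\nb"}} contains a supplementary character followed by one line feed, and the char-by-char count sees exactly that one line feed.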
> RevisionGC fails for JCR paths having line feed characters
> ----------------------------------------------------------
>
> Key: OAK-3395
> URL: https://issues.apache.org/jira/browse/OAK-3395
> Project: Jackrabbit Oak
> Issue Type: Bug
> Components: mongomk, rdbmk
> Reporter: Chetan Mehrotra
> Assignee: Chetan Mehrotra
> Priority: Minor
> Fix For: 1.3.7, 1.2.6, 1.0.21
>
> Attachments: OAK-3395-1.patch, OAK-3395-2.patch
>
>
> RevisionGC fails with an error while processing any id (derived from a JCR path)
> containing a line feed or carriage return character.
> This happens because it relies on Oak Commons StringSort and ExternalSort,
> which work with line-delimited strings, so an id containing a line break
> breaks the sorting logic. The error reported looks like this:
> {noformat}
> java.lang.AssertionError: Invalid id /1442211320
> at org.apache.jackrabbit.oak.plugins.document.util.Utils.getDepthFromId(Utils.java:337)
> at org.apache.jackrabbit.oak.plugins.document.NodeDocumentIdComparator.compare(NodeDocumentIdComparator.java:38)
> at org.apache.jackrabbit.oak.plugins.document.NodeDocumentIdComparator.compare(NodeDocumentIdComparator.java:30)
> at java.util.TimSort.countRunAndMakeAscending(TimSort.java:324)
> at java.util.TimSort.sort(TimSort.java:203)
> at java.util.TimSort.sort(TimSort.java:173)
> at java.util.Arrays.sort(Arrays.java:659)
> at java.util.Collections.sort(Collections.java:217)
> at org.apache.jackrabbit.oak.commons.sort.ExternalSort.sortAndSave(ExternalSort.java:279)
> at org.apache.jackrabbit.oak.commons.sort.ExternalSort.sortInBatch(ExternalSort.java:218)
> at org.apache.jackrabbit.oak.commons.sort.ExternalSort.sortInBatch(ExternalSort.java:257)
> at org.apache.jackrabbit.oak.commons.sort.StringSort$PersistentState.sort(StringSort.java:191)
> at org.apache.jackrabbit.oak.commons.sort.StringSort.sort(StringSort.java:88)
> at org.apache.jackrabbit.oak.plugins.document.VersionGarbageCollector$DeletedDocsGC.ensureSorted(VersionGarbageCollector.java:383)
> at org.apache.jackrabbit.oak.plugins.document.VersionGarbageCollector$DeletedDocsGC.getDocIdsToDelete(VersionGarbageCollector.java:274)
> at org.apache.jackrabbit.oak.plugins.document.VersionGarbageCollector$DeletedDocsGC.removeDeletedDocuments(VersionGarbageCollector.java:296)
> at org.apache.jackrabbit.oak.plugins.document.VersionGarbageCollector$DeletedDocsGC.removeDocuments(VersionGarbageCollector.java:241)
> at org.apache.jackrabbit.oak.plugins.document.VersionGarbageCollector.collectDeletedDocuments(VersionGarbageCollector.java:154)
> at org.apache.jackrabbit.oak.plugins.document.VersionGarbageCollector.gc(VersionGarbageCollector.java:105)
> at org.apache.jackrabbit.oak.plugins.document.VersionGCDeletionTest.gcWithPathsHavingNewLine(VersionGCDeletionTest.java:203)
> {noformat}