[ 
https://issues.apache.org/jira/browse/OAK-3395?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chetan Mehrotra updated OAK-3395:
---------------------------------
    Attachment: OAK-3395-2.patch

Thanks Thomas for the extensive review!

Attached is [updated patch|^OAK-3395-2.patch]

bq. by the way, even now an exception is thrown if the last character is a 
backslash

In the previous patch this case was handled, because {{unescapingRequired}} 
checked that any '\' found was no later than the second-to-last character in 
the string, so the out-of-bounds access in {{unescape}} could not happen. 
However, your suggestion to simplify {{unescapingRequired}} looks better, so I 
have now added a check there.
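
A minimal sketch of the idea, not the code from the attached patch: the class name, the exact escape sequences, and the decision to keep a trailing backslash as-is are all assumptions made for illustration. It shows a simplified {{unescapingRequired}} together with a bounds check in the unescape loop, so that a string ending in '\' no longer triggers an out-of-bounds access.

```java
/** Hypothetical sketch; names and escape scheme are assumptions, not Oak's actual code. */
final class LineBreakEscapeSketch {

    /** Escapes '\\', '\n' and '\r' so the result never contains a raw line break. */
    static String escape(String s) {
        StringBuilder sb = new StringBuilder(s.length() + 8);
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                default: sb.append(c);
            }
        }
        return sb.toString();
    }

    /** Simplified check: any backslash means the string may need unescaping. */
    static boolean unescapingRequired(String s) {
        return s.indexOf('\\') >= 0;
    }

    /** Reverses escape(); tolerates a trailing backslash instead of reading past the end. */
    static String unescape(String s) {
        if (!unescapingRequired(s)) {
            return s;
        }
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c != '\\') {
                sb.append(c);
                continue;
            }
            if (i == s.length() - 1) {
                // Assumption: a trailing backslash is kept literally.
                sb.append(c);
                break;
            }
            char next = s.charAt(++i);
            switch (next) {
                case 'n': sb.append('\n'); break;
                case 'r': sb.append('\r'); break;
                case '\\': sb.append('\\'); break;
                default: sb.append('\\').append(next); // unknown escape, keep verbatim
            }
        }
        return sb.toString();
    }
}
```

With this sketch, {{unescape(escape(id))}} returns the original id for any input, and {{escape(id)}} never contains a raw line feed or carriage return, which is the property a line-delimited sort needs.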

bq. There should be unit tests for the special cases as well (code coverage 
should be 100%).

Done now

In addition, I have now used {{RandomStringUtils}} from Commons Lang 
(test-scoped dependency), as the logic you suggested might not generate valid 
Unicode chars. The Commons Lang utility ensures that proper Unicode chars are 
generated.
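
To illustrate what "valid Unicode chars" means here, the following stdlib-only sketch generates random well-formed UTF-16 strings. It is a stand-in for illustration, not the {{RandomStringUtils}} call used in the patch; the class and method names are hypothetical. Picking random {{char}} values directly can produce unpaired surrogates, whereas picking random code points and skipping the surrogate range cannot.

```java
import java.util.Random;

/** Hypothetical sketch; a stdlib stand-in, not the Commons Lang utility used in the patch. */
final class RandomUnicodeSketch {

    /** Builds a string of the given number of code points with no unpaired surrogates. */
    static String randomWellFormedString(Random rnd, int codePoints) {
        StringBuilder sb = new StringBuilder(codePoints);
        int added = 0;
        while (added < codePoints) {
            int cp = rnd.nextInt(Character.MAX_CODE_POINT + 1);
            // Surrogate code points are not valid on their own, so skip them.
            if (cp >= Character.MIN_SURROGATE && cp <= Character.MAX_SURROGATE) {
                continue;
            }
            // appendCodePoint emits a proper surrogate pair for supplementary characters.
            sb.appendCodePoint(cp);
            added++;
        }
        return sb.toString();
    }

    /** True if the string contains no unpaired surrogate code units. */
    static boolean isWellFormed(String s) {
        // codePoints() reports an unpaired surrogate as a code point in the surrogate range.
        return s.codePoints().noneMatch(
                cp -> cp >= Character.MIN_SURROGATE && cp <= Character.MAX_SURROGATE);
    }
}
```

Strings produced this way can be fed through an escape/unescape round trip in a test, with a fixed seed so that failures are reproducible.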

Further, regarding my earlier concern:

bq. I am not very sure if escaping as implemented would work fine for unicode 
chars (involving surrogate pairs, i.e. those not in the BMP)

I checked the docs [1], which state the following:

{quote}
...if an application scans a char sequence for HTML tags, checking each char 
individually, it knows that these tags only use characters from the Basic Latin 
block. If the text being scanned contains supplementary characters, then these 
characters cannot be confused with the tag characters, because UTF-16 
represents supplementary characters using code units whose values are not used 
for BMP characters. 
{quote}

So this confirms that the char-by-char processing approach works fine: the 
chars being searched for are from the ASCII set, and in a surrogate pair (which 
uses 2 chars) neither char can have a value from the ASCII range (or, more 
broadly, a value used by any BMP character).
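
The claim can also be checked mechanically. The sketch below is an illustration only, with hypothetical names, and assumes the searched characters are '\', line feed and carriage return. It walks every supplementary code point and verifies that none of its UTF-16 code units equals one of those characters.

```java
/** Hypothetical sketch; the set of searched characters is an assumption. */
final class SurrogateScanSketch {

    /** True if any UTF-16 code unit of the code point equals a character the escaper looks for. */
    static boolean collidesWithEscapedChars(int codePoint) {
        for (char unit : Character.toChars(codePoint)) {
            if (unit == '\\' || unit == '\n' || unit == '\r') {
                return true;
            }
        }
        return false;
    }

    /** Checks every supplementary code point (U+10000 to U+10FFFF) for a collision. */
    static boolean anySupplementaryCollision() {
        for (int cp = Character.MIN_SUPPLEMENTARY_CODE_POINT; cp <= Character.MAX_CODE_POINT; cp++) {
            if (collidesWithEscapedChars(cp)) {
                return true;
            }
        }
        return false;
    }
}
```

{{anySupplementaryCollision()}} returns false, because both code units of a surrogate pair lie in the range 0xD800 to 0xDFFF, far above the ASCII range.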

[1] http://www.oracle.com/us/technologies/java/supplementary-142654.html

> RevisionGC fails for JCR paths having line feed characters
> ----------------------------------------------------------
>
>                 Key: OAK-3395
>                 URL: https://issues.apache.org/jira/browse/OAK-3395
>             Project: Jackrabbit Oak
>          Issue Type: Bug
>          Components: mongomk, rdbmk
>            Reporter: Chetan Mehrotra
>            Assignee: Chetan Mehrotra
>            Priority: Minor
>             Fix For: 1.3.7, 1.2.6, 1.0.21
>
>         Attachments: OAK-3395-1.patch, OAK-3395-2.patch
>
>
> RevisionGC fails with an error while processing any id (derived from a JCR 
> path) that contains a line feed or carriage return character.
> This happens because it relies on Oak Commons StringSort and ExternalSort, 
> which work with line-delimited strings, so an id containing a line break 
> breaks the sorting logic. The reported error looks like:
> {noformat}
> java.lang.AssertionError: Invalid id /1442211320
>       at org.apache.jackrabbit.oak.plugins.document.util.Utils.getDepthFromId(Utils.java:337)
>       at org.apache.jackrabbit.oak.plugins.document.NodeDocumentIdComparator.compare(NodeDocumentIdComparator.java:38)
>       at org.apache.jackrabbit.oak.plugins.document.NodeDocumentIdComparator.compare(NodeDocumentIdComparator.java:30)
>       at java.util.TimSort.countRunAndMakeAscending(TimSort.java:324)
>       at java.util.TimSort.sort(TimSort.java:203)
>       at java.util.TimSort.sort(TimSort.java:173)
>       at java.util.Arrays.sort(Arrays.java:659)
>       at java.util.Collections.sort(Collections.java:217)
>       at org.apache.jackrabbit.oak.commons.sort.ExternalSort.sortAndSave(ExternalSort.java:279)
>       at org.apache.jackrabbit.oak.commons.sort.ExternalSort.sortInBatch(ExternalSort.java:218)
>       at org.apache.jackrabbit.oak.commons.sort.ExternalSort.sortInBatch(ExternalSort.java:257)
>       at org.apache.jackrabbit.oak.commons.sort.StringSort$PersistentState.sort(StringSort.java:191)
>       at org.apache.jackrabbit.oak.commons.sort.StringSort.sort(StringSort.java:88)
>       at org.apache.jackrabbit.oak.plugins.document.VersionGarbageCollector$DeletedDocsGC.ensureSorted(VersionGarbageCollector.java:383)
>       at org.apache.jackrabbit.oak.plugins.document.VersionGarbageCollector$DeletedDocsGC.getDocIdsToDelete(VersionGarbageCollector.java:274)
>       at org.apache.jackrabbit.oak.plugins.document.VersionGarbageCollector$DeletedDocsGC.removeDeletedDocuments(VersionGarbageCollector.java:296)
>       at org.apache.jackrabbit.oak.plugins.document.VersionGarbageCollector$DeletedDocsGC.removeDocuments(VersionGarbageCollector.java:241)
>       at org.apache.jackrabbit.oak.plugins.document.VersionGarbageCollector.collectDeletedDocuments(VersionGarbageCollector.java:154)
>       at org.apache.jackrabbit.oak.plugins.document.VersionGarbageCollector.gc(VersionGarbageCollector.java:105)
>       at org.apache.jackrabbit.oak.plugins.document.VersionGCDeletionTest.gcWithPathsHavingNewLine(VersionGCDeletionTest.java:203)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
