[ https://issues.apache.org/jira/browse/OAK-7279?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Michael Dürig updated OAK-7279:
-------------------------------
    Labels: tech-debt  (was: )

> segment-tar update from java 7 to java 8 may break persisted names using invalid characters
> -------------------------------------------------------------------------------------------
>
>                 Key: OAK-7279
>                 URL: https://issues.apache.org/jira/browse/OAK-7279
>             Project: Jackrabbit Oak
>          Issue Type: Bug
>          Components: segment-tar
>            Reporter: Julian Reschke
>            Priority: Minor
>              Labels: tech-debt
>
> segment-tar relies on {{String.getBytes()}} when persisting strings such as
> item names.
> The problem is that this behavior changed in Java 8 for invalid strings
> (here: null characters and unpaired surrogates).
> In Java 7, these would roundtrip, as Java was using the so-called "modified
> UTF-8" encoding (see
> https://docs.oracle.com/javase/6/docs/api/java/io/DataInput.html#modified-utf-8).
> This produces byte sequences that are *not* valid UTF-8.
> Java 7 will read them back, but Java 8 will map the non-conforming byte
> sequences to the Unicode replacement character. Note that, in particular,
> multiple child entries might end up with identical names as a consequence.
> I'm not sure how severe this is, or whether something needs to be done
> about it. AFAIC, this is another good reason to reject invalid strings
> as early as possible in the stack.
> cc [~mduerig]
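
To make the name-collision concern above concrete, here is a minimal Java sketch. It is not Oak code: the class `NameRoundTrip` and its methods are hypothetical names made up for illustration. It assumes a Java 8 or later runtime, and it assumes the persisted bytes for an unpaired surrogate are the three-byte form `ED A0 xx` that "modified UTF-8" would produce. `isValidName` shows one possible early check for the two kinds of invalid input the report mentions.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of Oak. Illustrates the hazard described in the report.
final class NameRoundTrip {

    /** Returns true if the name has no NUL character and no unpaired surrogate. */
    static boolean isValidName(String name) {
        for (int i = 0; i < name.length(); i++) {
            char c = name.charAt(i);
            if (c == 0) {
                return false; // NUL character
            }
            if (Character.isHighSurrogate(c)) {
                if (i + 1 < name.length() && Character.isLowSurrogate(name.charAt(i + 1))) {
                    i++; // well-formed pair: skip its low half
                } else {
                    return false; // high surrogate without a following low surrogate
                }
            } else if (Character.isLowSurrogate(c)) {
                return false; // low surrogate without a preceding high surrogate
            }
        }
        return true;
    }

    /** Decodes bytes as UTF-8; malformed sequences become U+FFFD. */
    static String decode(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Assumed persisted forms of "a" + U+D800 and "a" + U+D801 (surrogates as 3-byte sequences).
        byte[] first = {'a', (byte) 0xED, (byte) 0xA0, (byte) 0x80};
        byte[] second = {'a', (byte) 0xED, (byte) 0xA0, (byte) 0x81};

        // On Java 8+, both decode to "a" followed by replacement characters,
        // so two names that were distinct when written compare equal when read.
        System.out.println("names collide after decoding: " + decode(first).equals(decode(second)));

        // An early check rejects such names before they are ever persisted.
        System.out.println("a + U+D800 accepted: " + isValidName("a\uD800"));
        System.out.println("plain name accepted: " + isValidName("item"));
    }
}
```

Rejecting names up front, as in `isValidName`, avoids relying on how a particular Java version encodes or decodes malformed input. Whether such a check belongs in segment-tar or higher in the stack is left open here, as in the report.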