[ https://issues.apache.org/jira/browse/OAK-3683?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15344093#comment-15344093 ]

Chetan Mehrotra commented on OAK-3683:
--------------------------------------

Some suggestions for supporting this with Mongo. Note that OAK-4471 is somewhat 
related to the discussion below.

*A - Save as Binary*

Instead of persisting string values as strings, we can persist them in serialized 
form, so that all property values would be stored as byte arrays. 

# Pros
## We need not determine all problematic sequences; instead we rely on the fact 
that Java String -> UTF-8 byte array -> Java String is always safe
# Cons
## Makes the persisted document obscure, i.e. by looking at a Mongo document it 
would not be possible to decipher the value. This can be partly addressed by 
providing some tooling, but it would still pose a problem when analyzing data 
dumps with standard tools like grep without any post-processing
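
A minimal sketch of option A in Java, assuming each string value is persisted as its UTF-8 bytes. {{Utf8ValueCodec}}, {{toBytes}} and {{fromBytes}} are hypothetical names, not Oak API. One caveat the sketch makes explicit: in the JDK, {{String#getBytes(UTF_8)}} silently replaces an unpaired surrogate with '?', so the sketch uses a strict {{CharsetEncoder}} that rejects such input instead of storing a lossy value.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for option A: property values persisted as UTF-8 bytes.
final class Utf8ValueCodec {

    // Strict encode: REPORT makes a lone surrogate raise an error instead of
    // being replaced with '?' (which is what String#getBytes(UTF_8) does).
    static byte[] toBytes(String value) {
        try {
            ByteBuffer buf = StandardCharsets.UTF_8.newEncoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .encode(CharBuffer.wrap(value));
            byte[] out = new byte[buf.remaining()];
            buf.get(out);
            return out;
        } catch (CharacterCodingException e) {
            throw new IllegalArgumentException("not well-formed UTF-16: " + e, e);
        }
    }

    // Strict decode of the stored bytes back to a Java String.
    static String fromBytes(byte[] bytes) {
        try {
            return StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes))
                    .toString();
        } catch (CharacterCodingException e) {
            throw new IllegalArgumentException("not valid UTF-8: " + e, e);
        }
    }

    public static void main(String[] args) {
        String value = "h\u00e9llo \uD83D\uDE00"; // BMP and supplementary characters
        System.out.println(fromBytes(toBytes(value)).equals(value)); // true
        try {
            toBytes("broken\uDFFF"); // lone low surrogate
        } catch (IllegalArgumentException e) {
            System.out.println("rejected lone surrogate");
        }
    }
}
```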

*B - Save problematic values as Base64-encoded byte arrays*

Here we can modify the {{JsonSerializer}} to check whether the value contains such 
a Unicode sequence. If it does, then instead of saving the value as a normal string 
we first obtain a byte array with UTF-8 encoding and then save that in Base64-encoded 
format with some type hint (i.e. indicating that it is an encoded byte array). 

Then, in {{DocumentPropertyState}}, if the value is found to be Base64 encoded, we 
deserialize it accordingly.
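
A minimal sketch of the write and read paths for option B. Assumptions: the type hint is a hypothetical {{b64:}} prefix, "problematic" is taken to mean an unpaired surrogate, and none of the names are existing {{JsonSerializer}} or {{DocumentPropertyState}} API. One deliberate deviation from the text: the bytes are the raw UTF-16 code units rather than UTF-8, because the JDK's UTF-8 encoder would replace the lone surrogate with '?' before it ever reached Base64.

```java
import java.nio.ByteBuffer;
import java.util.Base64;

// Hypothetical sketch of option B; not Oak's actual serializer code.
final class Base64Fallback {

    // Hypothetical type hint marking a Base64-encoded value.
    static final String B64_PREFIX = "b64:";

    // True if the string contains a surrogate that is not part of a valid pair.
    static boolean hasUnpairedSurrogate(String s) {
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (Character.isHighSurrogate(c)) {
                if (i + 1 < s.length() && Character.isLowSurrogate(s.charAt(i + 1))) {
                    i++; // valid pair: skip the low half
                } else {
                    return true;
                }
            } else if (Character.isLowSurrogate(c)) {
                return true; // low half without a preceding high half
            }
        }
        return false;
    }

    // Write path (JsonSerializer side): normal strings pass through unchanged.
    static String encode(String value) {
        // Values that already start with the hint are encoded too, so that
        // decode() can never misread a normal string.
        if (!hasUnpairedSurrogate(value) && !value.startsWith(B64_PREFIX)) {
            return value;
        }
        // Two bytes per UTF-16 code unit (big-endian): lossless for any String.
        ByteBuffer buf = ByteBuffer.allocate(value.length() * 2);
        buf.asCharBuffer().put(value);
        return B64_PREFIX + Base64.getEncoder().encodeToString(buf.array());
    }

    // Read path (DocumentPropertyState side).
    static String decode(String stored) {
        if (!stored.startsWith(B64_PREFIX)) {
            return stored;
        }
        byte[] bytes = Base64.getDecoder().decode(stored.substring(B64_PREFIX.length()));
        return ByteBuffer.wrap(bytes).asCharBuffer().toString();
    }

    public static void main(String[] args) {
        String broken = "broken\uDFFF";
        String stored = encode(broken);
        System.out.println(stored);                        // starts with the b64: hint
        System.out.println(decode(stored).equals(broken)); // true
        System.out.println(encode("plain"));               // plain
    }
}
```

Encoding values that merely happen to start with the hint keeps the read side a simple prefix check, at the cost of occasionally Base64-encoding a perfectly normal string.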

# Pros 
## Does not make documents obscure
# Cons
## Serialization logic would need to check each value in advance before 
serializing it. We would need to measure how much performance impact that has. 
Note that there is no additional cost when reading normal strings. If the write 
cost becomes a concern, we can address it by adding a new variant of 
JsopBuilder#encode which also reports whether the encoded string is "safe". If it 
is safe, the encoded value is passed directly to JsopBuilder; otherwise we fall 
back to the Base64-encoded approach
## We would need to know all sequences which can cause such problems and ensure 
all cases are covered
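
The "safe" variant of JsopBuilder#encode suggested in the cons could look roughly like this: a single pass that both escapes the string and records whether it saw an unpaired surrogate, so the common case needs no separate scan. This is a hypothetical stand-alone method, not JsopBuilder's real API; its escaping covers only quotes, backslashes and control characters, and "unsafe" is assumed to mean an unpaired surrogate until the full list of problematic sequences is known.

```java
// Hypothetical single-pass encoder; not the real JsopBuilder API.
final class CheckedJsonEncoder {

    // Encoded JSON string literal plus whether it is "safe" to store as-is.
    static final class Result {
        final String json;
        final boolean safe;

        Result(String json, boolean safe) {
            this.json = json;
            this.safe = safe;
        }
    }

    static Result encodeChecked(String s) {
        StringBuilder sb = new StringBuilder(s.length() + 2).append('"');
        boolean safe = true;
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '"' || c == '\\') {
                sb.append('\\').append(c);
            } else if (c < 0x20) {
                // Control characters become a backslash-u escape with four hex digits.
                sb.append(String.format("\\u%04x", (int) c));
            } else {
                if (Character.isHighSurrogate(c)) {
                    // A high half must be followed by a low half.
                    if (i + 1 >= s.length() || !Character.isLowSurrogate(s.charAt(i + 1))) {
                        safe = false;
                    }
                } else if (Character.isLowSurrogate(c)) {
                    // A low half must be preceded by a high half.
                    if (i == 0 || !Character.isHighSurrogate(s.charAt(i - 1))) {
                        safe = false;
                    }
                }
                sb.append(c);
            }
        }
        return new Result(sb.append('"').toString(), safe);
    }

    public static void main(String[] args) {
        for (String value : new String[] {"say \"hi\"", "broken\uDFFF"}) {
            Result r = encodeChecked(value);
            // Safe: hand r.json to the builder; otherwise use the Base64 fallback.
            System.out.println(r.safe ? r.json : "fallback to Base64");
        }
    }
}
```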



> BasicDocumentStore.testInterestingStrings failure on MongoDB after OAK-3651 
> with Java 8
> ---------------------------------------------------------------------------------------
>
>                 Key: OAK-3683
>                 URL: https://issues.apache.org/jira/browse/OAK-3683
>             Project: Jackrabbit Oak
>          Issue Type: Bug
>          Components: mongomk
>    Affects Versions: 1.4
>         Environment: MongoDB 2.6.9, MongoDB 3.0.2
> Java 8
>            Reporter: Robert Munteanu
>
> On Java 8 only the following test fails:
> {noformat}Failed tests:   testInterestingStrings[MongoFixture: 
> MongoDB](org.apache.jackrabbit.oak.plugins.document.BasicDocumentStoreTest): 
> failure to round-trip brokensurrogate through MongoDB expected:<[?]> but 
> was:<[�]>{noformat}
> According to git bisect the commit which started showing this error was 
> [r1715092|http://svn.apache.org/viewvc?view=revision&revision=r1715092]: 
> OAK-3651 - Remove HierrachialCacheInvalidator
> The command I used to run the tests was {{mvn -am -pl oak-core clean package 
> -Dtest=BasicDocumentStoreTest -DfailIfNoTests=false}}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
