[
https://issues.apache.org/jira/browse/OAK-2920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15168925#comment-15168925
]
Ian Boston commented on OAK-2920:
---------------------------------
If the DB config is broken, content that relies on UTF8 in the path will fail
to import, because the IDs will be rejected as duplicates. For instance, any
application that stores i18n content in the repository and needs to work with
a language whose characters take more than one byte in UTF8 (e.g. German) will
fail. ID duplicates are easy to detect. Data corruption within JCR properties
is much harder to detect, because a user working with Oak through a web UI
could suspect any link between the browser and the DB as the source of the
UTF8 corruption.
Taking MySQL as an example: without utf8, characters in common use in EU
countries can't be stored as JCR properties
(http://www.periodni.com/unicode_utf-8_encoding.html). Without utf8mb4,
supplementary UTF8 characters can't be stored as JCR properties
(http://www.i18nguy.com/unicode/supplementary-test.html).
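
To make the utf8 vs utf8mb4 distinction concrete, here is a small sketch that
reports how many bytes a character needs in UTF8 and whether it would fit a
charset limited to 3 bytes per character, which is what MySQL's legacy utf8
is. The class and method names are made up for illustration and are not part
of Oak.

```java
import java.nio.charset.StandardCharsets;

class Utf8Width {
    // Number of bytes the first code point of s occupies when encoded as UTF-8.
    static int utf8Bytes(String s) {
        int cp = s.codePointAt(0);
        return new String(Character.toChars(cp)).getBytes(StandardCharsets.UTF_8).length;
    }

    // True if every code point is in the Basic Multilingual Plane, i.e. needs
    // at most 3 bytes in UTF-8 and so fits a 3-byte-per-character charset.
    static boolean fitsThreeByteCharset(String s) {
        return s.codePoints().allMatch(cp -> cp <= 0xFFFF);
    }

    public static void main(String[] args) {
        // 'a', u-umlaut (U+00FC), Euro sign (U+20AC), and U+1F600 (supplementary)
        String[] samples = { "a", "\u00FC", "\u20AC", "\uD83D\uDE00" };
        for (String s : samples) {
            System.out.println("U+" + Integer.toHexString(s.codePointAt(0)).toUpperCase()
                    + ": " + utf8Bytes(s) + " byte(s), fits 3-byte charset: "
                    + fitsThreeByteCharset(s));
        }
    }
}
```

The German and Euro samples need 2 and 3 bytes, so they only require a working
utf8 setup, while the last sample needs 4 bytes and therefore utf8mb4.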
For those reasons, any database or JDBC connection that is misconfigured is
likely to cause considerable problems in production and probably won't work
with most modern applications that have been internationalised or need to
mention the Euro (€).
One approach to detect this is to write a row containing supplementary UTF8
characters to the nodes table, commit it, read the same row back, and verify
that the data survived the round trip. Finally, delete the row. The row ID
should be something that Oak would never use itself and that has a low
probability of colliding with other Oak instances in the same cluster, for
example a millisecond timestamp plus a suffix (e.g. 21313412313:utf8test). If
there is a concern about tables other than the nodes table, those can be
tested as well.
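
A rough sketch of that round trip follows. To keep it runnable without a
database, the nodes table is hidden behind a hypothetical RowStore interface
(not an Oak API), with two in-memory stand-ins: one that preserves data and
one that imitates a 3-byte charset by replacing supplementary characters with
'?'. A real check would implement RowStore with JDBC against the actual table,
and a real misconfigured database might reject the write instead of silently
altering it.

```java
import java.util.HashMap;
import java.util.Map;

class Utf8RoundTripCheck {
    /** Hypothetical minimal view of the nodes table; an assumption of this sketch. */
    interface RowStore {
        void write(String id, String data);
        String read(String id);
        void delete(String id);
    }

    // u-umlaut, Euro sign, then U+1F600 and U+20000 (both supplementary).
    static final String PROBE = "\u00FC \u20AC \uD83D\uDE00 \uD840\uDC00";

    /** Writes a probe row, reads it back, compares, and always deletes the row. */
    static boolean survivesRoundTrip(RowStore store) {
        // Millisecond timestamp plus suffix, as suggested for the row ID.
        String id = System.currentTimeMillis() + ":utf8test";
        store.write(id, PROBE);
        try {
            return PROBE.equals(store.read(id));
        } finally {
            store.delete(id);
        }
    }

    /** In-memory stand-in for a correctly configured database. */
    static class GoodStore implements RowStore {
        final Map<String, String> rows = new HashMap<>();
        public void write(String id, String data) { rows.put(id, data); }
        public String read(String id) { return rows.get(id); }
        public void delete(String id) { rows.remove(id); }
    }

    /** Stand-in for a 3-byte charset: supplementary characters become '?'. */
    static class LossyStore extends GoodStore {
        @Override
        public void write(String id, String data) {
            StringBuilder sb = new StringBuilder();
            data.codePoints().forEach(cp -> sb.appendCodePoint(cp > 0xFFFF ? '?' : cp));
            super.write(id, sb.toString());
        }
    }
}
```

With these stand-ins, survivesRoundTrip(new GoodStore()) returns true and
survivesRoundTrip(new LossyStore()) returns false, and in both cases the probe
row is gone afterwards.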
A switch should be provided to allow those who have managed to run Oak in
production with a misconfigured database to at least keep running in production
while they correct the issue. For MySQL this might be as simple as correcting
the JDBC URL to include utf8mb4 encoding.
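
One possible shape for such a switch, as a sketch only: startup fails when the
encoding check did not pass, unless an override is set, in which case a
warning is returned for logging. The system property name
oak.rdb.allowBrokenEncoding and the class are invented here for illustration;
nothing like them exists in Oak as far as this comment is concerned.

```java
class EncodingCheckPolicy {
    // Hypothetical property name, chosen for this sketch only.
    static final String OVERRIDE_PROPERTY = "oak.rdb.allowBrokenEncoding";

    /**
     * Returns null when the check passed, a warning message when a failed
     * check is tolerated, and throws when a failed check is not tolerated.
     */
    static String enforce(boolean checkPassed, boolean overrideEnabled) {
        if (checkPassed) {
            return null;
        }
        String msg = "database does not round-trip supplementary UTF8 characters";
        if (!overrideEnabled) {
            throw new IllegalStateException(
                    msg + "; set -D" + OVERRIDE_PROPERTY + "=true to start anyway");
        }
        return "WARNING: " + msg + " (startup allowed by " + OVERRIDE_PROPERTY + ")";
    }

    /** Same as above, reading the override from the JVM system property. */
    static String enforce(boolean checkPassed) {
        return enforce(checkPassed, Boolean.getBoolean(OVERRIDE_PROPERTY));
    }
}
```

Defaulting to failure keeps new deployments safe, while existing deployments
can opt out explicitly until the database or JDBC URL is fixed.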
> RDBDocumentStore: fail init when database config seems to be inadequate
> -----------------------------------------------------------------------
>
> Key: OAK-2920
> URL: https://issues.apache.org/jira/browse/OAK-2920
> Project: Jackrabbit Oak
> Issue Type: Sub-task
> Components: rdbmk
> Reporter: Julian Reschke
> Priority: Minor
> Labels: resilience
>
> It has been suggested that the implementation should fail to start (rather
> than warn) when it detects a DB configuration that is likely to cause
> problems (such as wrt character encoding or collation sequences)
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)