[ https://issues.apache.org/jira/browse/OAK-4810?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15492777#comment-15492777 ]
Chetan Mehrotra commented on OAK-4810:
--------------------------------------
bq. I think default for writing (if not configured explicitly) could still be
SHA-1.
The change can be made at any time and should not affect other parts much, so
the default value can simply be switched to SHA-256.
Once a binary has been added with any digest method, we do not need the method
details when reading, as reads are based purely on the id. Still, it would be
good to encode the algorithm in the id that is passed back to the NodeStore.
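
To make the idea of encoding the algorithm in the id concrete, here is a
minimal sketch. The class name PrefixedBlobId, the method newId, and the
"<algorithm>:<hex digest>" id layout are assumptions for illustration only;
they are not the format Oak uses or has decided on.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Locale;

// Hypothetical helper, not part of Oak: builds an id that carries its digest algorithm.
final class PrefixedBlobId {

    /** Returns an id of the assumed form "<lower-cased algorithm>:<hex digest>". */
    static String newId(String algorithm, byte[] content) {
        MessageDigest md;
        try {
            md = MessageDigest.getInstance(algorithm);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalArgumentException("Unsupported digest: " + algorithm, e);
        }
        byte[] digest = md.digest(content);
        StringBuilder sb = new StringBuilder(algorithm.length() + 1 + digest.length * 2);
        sb.append(algorithm.toLowerCase(Locale.ROOT)).append(':');
        for (byte b : digest) {
            // Emit two lower-case hex characters per byte, high nibble first.
            sb.append(Character.forDigit((b >> 4) & 0xF, 16));
            sb.append(Character.forDigit(b & 0xF, 16));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] content = "hello".getBytes(StandardCharsets.UTF_8);
        System.out.println(newId("SHA-1", content));
        System.out.println(newId("SHA-256", content));
    }
}
```

With such a layout the write-side default can change from SHA-1 to SHA-256
without touching readers, since each id states which digest produced it.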
> FileDataStore: support SHA-2
> ----------------------------
>
> Key: OAK-4810
> URL: https://issues.apache.org/jira/browse/OAK-4810
> Project: Jackrabbit Oak
> Issue Type: New Feature
> Components: blob
> Reporter: Thomas Mueller
>
> The FileDataStore currently uses SHA-1, but that algorithm is deprecated. We
> should support other algorithms as well (mainly SHA-256).
> Migration should be painless (no long downtime). I think default for writing
> (if not configured explicitly) could still be SHA-1. But when reading,
> SHA-256 should also be supported (depending on the identifier). That way, the
> new Oak version for all repositories (in a cluster + shared datastore) can be
> installed "slowly".
> After all repositories are running with the new Oak version, the
> configuration for SHA-256 can be enabled. That way, SHA-256 is used for new
> binaries. Both SHA-1 and SHA-256 are supported for reading.
> One potential downside is that deduplication would suffer a bit: if a new
> Blob with the same content is added again, the digest-based match would fail.
> That can be mitigated by computing both types of digest if the need arises.
> The downsides of doing so are some additional file operations and CPU, and a
> slower migration to SHA-256.
> Some other open questions:
> * While we are at it, it might make sense to additionally support SHA-3 and
> other algorithms (make it configurable). But the length of the identifier
> alone might then not be enough information to know which algorithm is used,
> so maybe add a prefix.
> * The number of subdirectory levels: should we keep it as is, or should we
> reduce it (for example, by one level)?
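
The read-side question raised in the issue (which algorithm does an identifier
belong to?) can be sketched as follows. The identifier length is enough while
only SHA-1 (40 hex characters) and SHA-256 (64 hex characters) exist, but
SHA3-256 also yields 64 hex characters, which is why a prefix is suggested.
The class BlobIdAlgorithm, the method algorithmOf, and the "<algorithm>:"
prefix syntax are hypothetical and only illustrate the idea.

```java
import java.util.Locale;

// Hypothetical helper, not part of Oak: infers the digest algorithm from an identifier.
final class BlobIdAlgorithm {

    /**
     * Returns the digest algorithm name for an identifier.
     * An assumed "<algorithm>:" prefix wins; otherwise the hex length decides.
     */
    static String algorithmOf(String id) {
        int sep = id.indexOf(':');
        if (sep > 0) {
            // Prefixed id, e.g. "sha3-256:<hex>" maps to "SHA3-256".
            return id.substring(0, sep).toUpperCase(Locale.ROOT);
        }
        switch (id.length()) {
            case 40:
                return "SHA-1";   // 160-bit digest, 40 hex characters
            case 64:
                return "SHA-256"; // 256-bit digest; ambiguous once SHA3-256 is allowed
            default:
                throw new IllegalArgumentException("Cannot infer algorithm from id: " + id);
        }
    }

    public static void main(String[] args) {
        System.out.println(algorithmOf("da39a3ee5e6b4b0d3255bfef95601890afd80709"));
        System.out.println(algorithmOf("sha3-256:0123"));
    }
}
```

Keeping the length-based fallback means existing unprefixed SHA-1 identifiers
remain readable, while any newly introduced algorithm would need the prefix.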