[ 
https://issues.apache.org/jira/browse/OAK-2882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14549790#comment-14549790
 ] 

Chetan Mehrotra commented on OAK-2882:
--------------------------------------

Thanks [~jsedding] for the feedback. Having the file generated lazily makes sense 
and would simplify use of this feature for cases like S3DataStore.

Attached is an [updated patch|^OAK-2882-v2.patch] with the following changes:
* Changed the DataStore name to 
{{org.apache.jackrabbit.oak.upgrade.blob.LengthCachingDataStore}}
* Config params
** {{mappingFilePath}} - (optional) Defaults to 
{{$\{rep.home\}/datastore-list.txt}}. If the file does not exist, it would 
be created upon close with all the mapping data recorded so far. The file would 
be reused on subsequent starts and updated again if new entries are found.
** {{delegateClass}} - (required) Fully qualified class name of the delegate 
DataStore, e.g. {{org.apache.jackrabbit.core.data.FileDataStore}}
** {{delegateConfigFilePath}} - (optional) If the delegate DataStore 
requires configuration, it can be provided as a properties file, and the path 
to that file can be specified in this param.
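
To illustrate the behaviour described above (answer length lookups from the mapping file, fall back to the delegate for unknown ids, write the file on close), here is a minimal sketch. It is not the code from the patch: the class name LengthCacheSketch, the "length|blobId" line format and the use of a plain function in place of the delegate DataStore are all assumptions made for illustration.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.ToLongFunction;

/** Hypothetical sketch of a length cache backed by a mapping file. */
final class LengthCacheSketch implements AutoCloseable {
    private final Path mappingFile;
    // Stands in for the delegate DataStore: maps a blob id to its length.
    private final ToLongFunction<String> delegate;
    private final Map<String, Long> lengths = new TreeMap<>();
    private boolean dirty;

    LengthCacheSketch(Path mappingFile, ToLongFunction<String> delegate) throws IOException {
        this.mappingFile = mappingFile;
        this.delegate = delegate;
        // Reuse an existing mapping file; a missing file just means an empty cache.
        if (Files.exists(mappingFile)) {
            for (String line : Files.readAllLines(mappingFile, StandardCharsets.UTF_8)) {
                int sep = line.indexOf('|'); // assumed format: length|blobId
                if (sep > 0) {
                    lengths.put(line.substring(sep + 1),
                            Long.parseLong(line.substring(0, sep)));
                }
            }
        }
    }

    /** Returns the cached length, asking the delegate only for unknown ids. */
    long getLength(String blobId) {
        Long cached = lengths.get(blobId);
        if (cached != null) {
            return cached;
        }
        long length = delegate.applyAsLong(blobId);
        lengths.put(blobId, length);
        dirty = true;
        return length;
    }

    /** Writes the mapping file if it is missing or new entries were recorded. */
    @Override
    public void close() throws IOException {
        if (!dirty && Files.exists(mappingFile)) {
            return;
        }
        List<String> lines = new ArrayList<>();
        for (Map.Entry<String, Long> e : lengths.entrySet()) {
            lines.add(e.getValue() + "|" + e.getKey());
        }
        Files.write(mappingFile, lines, StandardCharsets.UTF_8);
    }
}
```

In this sketch a second run against the same file never touches the delegate for ids that were seen before, which is the property that makes repeated migrations cheap when the real DataStore is slow or unavailable.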

> Support migration without access to DataStore
> ---------------------------------------------
>
>                 Key: OAK-2882
>                 URL: https://issues.apache.org/jira/browse/OAK-2882
>             Project: Jackrabbit Oak
>          Issue Type: New Feature
>          Components: upgrade
>            Reporter: Chetan Mehrotra
>            Assignee: Chetan Mehrotra
>              Labels: docs-impacting
>             Fix For: 1.3.0, 1.0.15
>
>         Attachments: OAK-2882-v2.patch, OAK-2882.patch, 
> build_datastore_list.sh
>
>
> Migration currently requires access to the DataStore, as it is configured as 
> part of repository.xml. However, in a complete migration the actual binary 
> content in the DataStore is not accessed, and the migration logic only makes use of
> * DataIdentifier = id of the files
> * Length = as it gets encoded as part of the blobId (OAK-1667)
> It would be faster and beneficial to allow migration without actual access to 
> the DataStore. This would have two benefits
> # Allows one to test out migration on a local setup by just copying the TarPM 
> files. For example, one only needs to archive the following files to get going 
> with repository startup, provided direct access to the DataStore can be avoided
> {noformat}
> >crx-quickstart# tar -zcvf repo-2.tar.gz repository 
> >--exclude=repository/repository/datastore 
> >--exclude=repository/repository/index 
> >--exclude=repository/workspaces/crx.default/index 
> >--exclude=repository/tarJournal
> {noformat}
> # Provides faster (repeatable) migration, as access to the DataStore, which 
> might be slow in cases like S3, can be avoided, provided we solve how to 
> obtain the length
> *Proposal*
> Have a DataStore implementation which can be provided with a mapping file 
> having entries for blobId and length. This file would be used to answer queries 
> regarding the length and existence of a blob, and would thus avoid actual 
> access to the DataStore.
> Going further, this DataStore can be configured with a delegate which can be 
> used as a fallback in case the required details are not present in the 
> precomputed data set (maybe due to a change in content after that data was 
> computed)



