[https://issues.apache.org/jira/browse/OAK-11991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18035037#comment-18035037]
Ieran Draghiciu commented on OAK-11991:
---------------------------------------
With [PR|https://github.com/apache/jackrabbit-oak/pull/2604] I introduced
copying blobs in batches (1000 blobs per batch):
- split the blobs into batches of 1000 blobs each; the remaining blobs are
handled in a final, smaller batch
- start copying each batch
- start checking the copy status. The advantage is that by the time we check
the first blob, it should already have been copied, so we do not have to wait
for each blob to be copied.
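A minimal Java sketch of the batching scheme described in the list above, under stated assumptions. {{CopyStatus}}, {{BlobStore}}, {{FakeBlobStore}} and {{BatchedBlobCopy}} are hypothetical names written for this note, not the Oak or Azure SDK API used in the PR. The fake store completes each copy after a fixed delay, standing in for an asynchronous server-side copy (the {{PUT}} answered with {{202}} and the later {{HEAD}} probe in the log). It also assumes the status check happens per batch, which is one possible reading of the steps above.
{code:java}
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

/** Hypothetical copy states, loosely modelled on Azure's copy status values. */
enum CopyStatus { PENDING, SUCCESS, FAILED }

/** Hypothetical minimal blob-store abstraction; NOT an Oak or Azure SDK interface. */
interface BlobStore {
    void startCopy(String source, String target);   // returns immediately, like a 202 Accepted
    CopyStatus copyStatus(String target);            // cheap status probe, like a HEAD request
}

/** Hypothetical in-memory fake: every copy finishes asynchronously after a fixed delay. */
class FakeBlobStore implements BlobStore {
    private final Map<String, CopyStatus> status = new ConcurrentHashMap<>();
    private final ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor();
    private final long delayMillis;

    FakeBlobStore(long delayMillis) { this.delayMillis = delayMillis; }

    public void startCopy(String source, String target) {
        status.put(target, CopyStatus.PENDING);
        timer.schedule(() -> { status.put(target, CopyStatus.SUCCESS); }, delayMillis, TimeUnit.MILLISECONDS);
    }

    public CopyStatus copyStatus(String target) {
        return status.getOrDefault(target, CopyStatus.FAILED);
    }

    void shutdown() { timer.shutdownNow(); }
}

/** Hypothetical helper illustrating "start a whole batch, then check its status". */
class BatchedBlobCopy {
    /** Splits items into consecutive batches; the last batch holds the remainder. */
    static <T> List<List<T>> partition(List<T> items, int batchSize) {
        List<List<T>> batches = new ArrayList<>();
        for (int i = 0; i < items.size(); i += batchSize) {
            batches.add(items.subList(i, Math.min(i + batchSize, items.size())));
        }
        return batches;
    }

    /** Returns the number of blobs copied; throws if any copy fails. */
    static int copyInBatches(BlobStore store, List<String> blobs, String targetDir, int batchSize) {
        int copied = 0;
        for (List<String> batch : partition(blobs, batchSize)) {
            // Phase 1: issue every copy request in the batch without waiting.
            for (String blob : batch) {
                store.startCopy(blob, targetDir + blob);
            }
            // Phase 2: check each copy; the earliest ones have usually finished already.
            for (String blob : batch) {
                String target = targetDir + blob;
                CopyStatus s;
                while ((s = store.copyStatus(target)) == CopyStatus.PENDING) {
                    try {
                        Thread.sleep(1); // brief back-off before probing again
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                        throw new IllegalStateException("Interrupted while waiting for " + target, e);
                    }
                }
                if (s != CopyStatus.SUCCESS) {
                    throw new IllegalStateException("Copy failed for " + target);
                }
                copied++;
            }
        }
        return copied;
    }
}
{code}
Issuing the whole batch before polling lets the server-side copies overlap, so most status checks return at once instead of the caller blocking once per blob.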
I tested this on an environment with 3717 blobs, and the copy completed in ~1 min 2 s.
{code:java}
(recovery process starts)
03.11.2025 10:13:58.251 *WARN* [segmentstore-init-6]
org.apache.jackrabbit.oak.segment.file.tar.TarReader Could not find a valid tar
index in [data00010a.tar], recovering...
03.11.2025 10:13:58.252 *INFO* [segmentstore-init-6]
org.apache.jackrabbit.oak.segment.file.tar.TarReader Recovering segments from
tar file data00010a.tar
(lists all blobs)
...
03.11.2025 10:13:58.822 *INFO* [reactor-http-nio-2]
org.apache.jackrabbit.oak.segment.azure.AzureHttpRequestLoggingPolicy HTTP
Request: GET
https://sa01394020shared0925dfef.blob.core.windows.net/aem-sgmt-fbe8b722af60268e1d58106abf6f4a4522c5d382-000004/aem%2Fdata00010a.tar%2F0000.78fabc53-2ced-4f0a-a7b1-d86b36bd9aee
200 9ms
....
03.11.2025 10:14:20.179 *INFO* [reactor-http-nio-3]
org.apache.jackrabbit.oak.segment.azure.AzureHttpRequestLoggingPolicy HTTP
Request: GET
https://sa01394020shared0925dfef.blob.core.windows.net/aem-sgmt-fbe8b722af60268e1d58106abf6f4a4522c5d382-000004/aem%2Fdata00010a.tar%2F0e84.ef629d76-d7c2-4d6a-a1ee-0f483c7c5256
200 4ms
03.11.2025 10:14:20.181 *INFO* [segmentstore-init-6]
org.apache.jackrabbit.oak.segment.azure.AzureArchiveManager Recovering segment
data00010a.tar/0000.78fabc53-2ced-4f0a-a7b1-d86b36bd9aee
...
(recover blobs)
03.11.2025 10:14:20.263 *INFO* [segmentstore-init-6]
org.apache.jackrabbit.oak.segment.azure.AzureArchiveManager Recovering segment
data00010a.tar/0e84.ef629d76-d7c2-4d6a-a1ee-0f483c7c5256
...
(copy blobs to bak)
03.11.2025 10:14:23.804 *INFO* [segmentstore-init-6]
org.apache.jackrabbit.oak.segment.azure.AzureArchiveManager Start copy 3717
blobs to aem/data00010a.tar.29.bak/
03.11.2025 10:14:23.829 *INFO* [reactor-http-nio-1]
org.apache.jackrabbit.oak.segment.azure.AzureHttpRequestLoggingPolicy HTTP
Request: PUT
https://sa01394020shared0925dfef.blob.core.windows.net/aem-sgmt-fbe8b722af60268e1d58106abf6f4a4522c5d382-000004/aem%2Fdata00010a.tar.29.bak%2F0000.78fabc53-2ced-4f0a-a7b1-d86b36bd9aee
202 11ms
....
03.11.2025 10:15:25.500 *INFO* [reactor-http-nio-3]
org.apache.jackrabbit.oak.segment.azure.AzureHttpRequestLoggingPolicy HTTP
Request: HEAD
https://sa01394020shared0925dfef.blob.core.windows.net/aem-sgmt-fbe8b722af60268e1d58106abf6f4a4522c5d382-000004/aem%2Fdata00010a.tar.29.bak%2F0e84.ef629d76-d7c2-4d6a-a1ee-0f483c7c5256
200 3ms
...
(~1 min for 3717 blobs)
{code}
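As a quick cross-check of the numbers above, this small Java sketch parses the first and last timestamps of the copy phase from the log, counts the batches needed for 3717 blobs at 1000 per batch, and decodes the last segment name {{0e84}}. The {{CopyTiming}} class is hypothetical, written only for this note, and the segment-count line assumes segment names are consecutive hex indices starting at {{0000}}.
{code:java}
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

/** Hypothetical helper for checking the log numbers; not part of Oak. */
class CopyTiming {
    // Timestamp layout of the log lines, e.g. "03.11.2025 10:14:23.804".
    static final DateTimeFormatter LOG_TS = DateTimeFormatter.ofPattern("dd.MM.yyyy HH:mm:ss.SSS");

    static Duration between(String start, String end) {
        return Duration.between(LocalDateTime.parse(start, LOG_TS), LocalDateTime.parse(end, LOG_TS));
    }

    /** Number of batches, with the remainder in a final smaller batch (ceiling division). */
    static int batchCount(int blobs, int batchSize) {
        return (blobs + batchSize - 1) / batchSize;
    }

    public static void main(String[] args) {
        // "Start copy 3717 blobs" -> last HEAD status check on the .bak copy
        Duration copy = between("03.11.2025 10:14:23.804", "03.11.2025 10:15:25.500");
        System.out.println(copy.toMillis() + " ms");             // 61696 ms
        System.out.println(batchCount(3717, 1000) + " batches");  // 4 batches
        // Assumes segment names are consecutive hex indices from 0000 (hypothetical reading).
        System.out.println(Integer.parseInt("0e84", 16) + 1 + " segments"); // 3717 segments
    }
}
{code}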
> Optimize the oak-segment recovery process
> -----------------------------------------
>
> Key: OAK-11991
> URL: https://issues.apache.org/jira/browse/OAK-11991
> Project: Jackrabbit Oak
> Issue Type: Task
> Components: segment-azure, segment-tar
> Reporter: Ieran Draghiciu
> Priority: Major
>
> Tar archives with many segment files (more than 10,000) take too long to
> recover. Investigate and implement a solution to optimize this process.