[
https://issues.apache.org/jira/browse/OAK-11991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18035037#comment-18035037
]
Ieran Draghiciu edited comment on OAK-11991 at 11/3/25 12:40 PM:
-----------------------------------------------------------------
With [PR|https://github.com/apache/jackrabbit-oak/pull/2604] I introduced
copying blobs in batches (1000 blobs/batch):
- split the blobs into batches of 1000 each; any remaining blobs are handled
in a final, smaller batch
- start the copy for each batch
- only then check the copy status. The advantage is that by the time we check
the first blob, its copy has usually already completed, so we rarely have to
wait for a blob to finish copying.
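The batching step above can be sketched as follows. This is a minimal, illustrative partition helper, not the actual PR code; the class and method names are made up for the example:

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch of splitting a blob list into batches of 1000,
// with the remaining blobs going into a final, smaller batch.
public class BatchCopySketch {

    static final int BATCH_SIZE = 1000;

    static List<List<String>> partition(List<String> blobs) {
        List<List<String>> batches = new ArrayList<>();
        for (int i = 0; i < blobs.size(); i += BATCH_SIZE) {
            // Copy the sub-range so each batch is independent of the source list.
            batches.add(new ArrayList<>(
                    blobs.subList(i, Math.min(i + BATCH_SIZE, blobs.size()))));
        }
        return batches;
    }

    public static void main(String[] args) {
        List<String> blobs = new ArrayList<>();
        for (int i = 0; i < 3717; i++) {
            blobs.add(String.format("%04x.segment", i));
        }
        List<List<String>> batches = partition(blobs);
        System.out.println(batches.size());          // prints 4
        System.out.println(batches.get(3).size());   // prints 717 (the last, smaller batch)
    }
}
```

For the 3717-blob test case below, this yields three full batches of 1000 plus a final batch of 717.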
The batch approach is also faster because of how Azure is optimized:
- Azure Storage is heavily optimized for concurrent PUTs (write-heavy
patterns) through its internal server-side copy pipeline.
- The service can queue and parallelize these internal copy operations
efficiently.
- When PUTs are interleaved with HEADs (reads), however, Azure's load
balancers and caching layers cannot optimize as effectively, because each
HEAD forces a read-after-write consistency check.
Azure is more efficient when doing "pure" PUT operations (writes) or pure
HEAD operations (reads) in bulk, rather than interleaving them.
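The "pure writes, then pure reads" pattern can be sketched like this. The BlobApi interface is a stand-in for the Azure SDK calls (starting a server-side copy, then checking its status); the names are illustrative, not the actual PR implementation:

```java
import java.util.List;

// Sketch of the two-phase copy: fire all copy requests first (PUTs),
// then poll all statuses (HEADs), instead of interleaving the two.
public class TwoPhaseCopy {

    // Hypothetical stand-in for the Azure blob client.
    interface BlobApi {
        void startCopy(String blob);    // issue the async server-side copy (PUT)
        boolean copyDone(String blob);  // check the copy status (HEAD)
    }

    static void copyBatch(List<String> batch, BlobApi api)
            throws InterruptedException {
        // Phase 1: fire all copy requests in bulk; the service queues and
        // parallelizes the server-side copies internally.
        for (String blob : batch) {
            api.startCopy(blob);
        }
        // Phase 2: only now poll for completion. By the time the first blob
        // is checked, its copy has usually already finished, so the loop
        // rarely has to wait.
        for (String blob : batch) {
            while (!api.copyDone(blob)) {
                Thread.sleep(50);
            }
        }
    }
}
```

The point of the split is that no status check is issued until every copy in the batch has been started, so the write path stays "pure" and the polling cost is hidden behind copies that are already in flight.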
I tested on an environment with 3717 blobs, and the copy completed in ~1min 2sec.
{code:java}
(recovery process starts)
03.11.2025 10:13:58.251 *WARN* [segmentstore-init-6]
org.apache.jackrabbit.oak.segment.file.tar.TarReader Could not find a valid tar
index in [data00010a.tar], recovering...
03.11.2025 10:13:58.252 *INFO* [segmentstore-init-6]
org.apache.jackrabbit.oak.segment.file.tar.TarReader Recovering segments from
tar file data00010a.tar
(lists all blobs)
...
03.11.2025 10:13:58.822 *INFO* [reactor-http-nio-2]
org.apache.jackrabbit.oak.segment.azure.AzureHttpRequestLoggingPolicy HTTP
Request: GET
https://sa01394020shared0925dfef.blob.core.windows.net/aem-sgmt-fbe8b722af60268e1d58106abf6f4a4522c5d382-000004/aem%2Fdata00010a.tar%2F0000.78fabc53-2ced-4f0a-a7b1-d86b36bd9aee
200 9ms
....
03.11.2025 10:14:20.179 *INFO* [reactor-http-nio-3]
org.apache.jackrabbit.oak.segment.azure.AzureHttpRequestLoggingPolicy HTTP
Request: GET
https://sa01394020shared0925dfef.blob.core.windows.net/aem-sgmt-fbe8b722af60268e1d58106abf6f4a4522c5d382-000004/aem%2Fdata00010a.tar%2F0e84.ef629d76-d7c2-4d6a-a1ee-0f483c7c5256
200 4ms
03.11.2025 10:14:20.181 *INFO* [segmentstore-init-6]
org.apache.jackrabbit.oak.segment.azure.AzureArchiveManager Recovering segment
data00010a.tar/0000.78fabc53-2ced-4f0a-a7b1-d86b36bd9aee
...
(recover blobs)
03.11.2025 10:14:20.263 *INFO* [segmentstore-init-6]
org.apache.jackrabbit.oak.segment.azure.AzureArchiveManager Recovering segment
data00010a.tar/0e84.ef629d76-d7c2-4d6a-a1ee-0f483c7c5256
...
(copy blobs to bak)
03.11.2025 10:14:23.804 *INFO* [segmentstore-init-6]
org.apache.jackrabbit.oak.segment.azure.AzureArchiveManager Start copy 3717
blobs to aem/data00010a.tar.29.bak/
03.11.2025 10:14:23.829 *INFO* [reactor-http-nio-1]
org.apache.jackrabbit.oak.segment.azure.AzureHttpRequestLoggingPolicy HTTP
Request: PUT
https://sa01394020shared0925dfef.blob.core.windows.net/aem-sgmt-fbe8b722af60268e1d58106abf6f4a4522c5d382-000004/aem%2Fdata00010a.tar.29.bak%2F0000.78fabc53-2ced-4f0a-a7b1-d86b36bd9aee
202 11ms
....
03.11.2025 10:15:25.500 *INFO* [reactor-http-nio-3]
org.apache.jackrabbit.oak.segment.azure.AzureHttpRequestLoggingPolicy HTTP
Request: HEAD
https://sa01394020shared0925dfef.blob.core.windows.net/aem-sgmt-fbe8b722af60268e1d58106abf6f4a4522c5d382-000004/aem%2Fdata00010a.tar.29.bak%2F0e84.ef629d76-d7c2-4d6a-a1ee-0f483c7c5256
200 3ms
...
1 min for 3717 blobs
{code}
> Optimize the oak-segment recovery process
> -----------------------------------------
>
> Key: OAK-11991
> URL: https://issues.apache.org/jira/browse/OAK-11991
> Project: Jackrabbit Oak
> Issue Type: Task
> Components: segment-azure, segment-tar
> Reporter: Ieran Draghiciu
> Priority: Major
>
> Tar archives with many segment files (more than 10,000) take too long to
> recover. Investigate and implement a solution to optimize this process.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)