[
https://issues.apache.org/jira/browse/OAK-12070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18053932#comment-18053932
]
Julian Sedding edited comment on OAK-12070 at 1/23/26 1:59 PM:
---------------------------------------------------------------
After the changes from my PR, memory usage per segment is reduced to
* one RemoteSegmentArchiveEntry, consuming 32 bytes each
* one UUID, consuming 32 bytes each
I.e. 64 bytes per segment, or a reduction by ~58%.
{noformat}
 num     #instances         #bytes  class name (module)
-------------------------------------------------------
   1:       3360428      924666248  [B ([email protected])
   2:       5853351      187307232  java.util.UUID ([email protected])
   3:       5796941      185502112  org.apache.jackrabbit.oak.segment.remote.RemoteSegmentArchiveEntry
   4:         69866       98120280  [Ljava.lang.Object; ([email protected])
   5:         68400       82323424  Ljdk.internal.vm.FillerArray; ([email protected])
   6:       2299470       55187280  java.lang.String ([email protected])
   7:         50895       39727792  [J ([email protected])
   8:        827032       39697536  org.apache.jackrabbit.oak.cache.CacheLIRS$Entry
   9:        825904       33036160  org.apache.jackrabbit.oak.segment.ReaderCache$CacheKey
  10:            81       23069968  [Lorg.apache.jackrabbit.oak.cache.CacheLIRS$Entry;
...
 442:           875          28000  java.util.ImmutableCollections$MapN ([email protected])
...
 524:           485          11640  org.apache.jackrabbit.oak.segment.file.tar.GCGeneration
...
{noformat}
Granted, there are a few other "new" objects in memory, but their contribution
is not directly linked to the number of segments. E.g. {{UUID}} and
{{RemoteSegmentArchiveEntry}} instances are now stored in a
{{java.util.ImmutableCollections$MapN}}, which holds its data in an
{{Object[]}}. This replaces the {{java.util.LinkedHashMap$Entry}} instances.
The above stats indicate an overhead for {{Object[]}} per segment of ~17 bytes,
assuming all such arrays are used for this purpose.
The size reduction of {{RemoteSegmentArchiveEntry}} (from 48 to 32 bytes) is due
to two changes. Firstly, the {{msb}} and {{lsb}} fields are folded into a single
{{UUID}} reference, which points to the same {{UUID}} instance that is used as
the key in the mapping. Secondly, all {{GCGeneration}} related fields (2x int,
1x boolean) are folded into a reference to a {{GCGeneration}} instance. Assuming
that there are far fewer distinct {{GCGeneration}} permutations than segments,
{{GCGeneration}} instances are pooled (and thus deduplicated in-memory).
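To illustrate the two changes, here is a minimal sketch of the idea. It is not the code from the PR: {{Generation}}, {{GenerationPool}} and {{Entry}} are hypothetical stand-ins, and the real {{RemoteSegmentArchiveEntry}} carries more fields than shown here. The sketch only shows an entry that shares the map key's {{UUID}} instance and takes its generation from a canonicalizing pool.

```java
import java.util.Map;
import java.util.Objects;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

public class PooledEntrySketch {

    /** Hypothetical stand-in for GCGeneration: two ints and one boolean. */
    static final class Generation {
        final int generation;
        final int fullGeneration;
        final boolean compacted;

        Generation(int generation, int fullGeneration, boolean compacted) {
            this.generation = generation;
            this.fullGeneration = fullGeneration;
            this.compacted = compacted;
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof Generation)) return false;
            Generation other = (Generation) o;
            return generation == other.generation
                    && fullGeneration == other.fullGeneration
                    && compacted == other.compacted;
        }

        @Override
        public int hashCode() {
            return Objects.hash(generation, fullGeneration, compacted);
        }
    }

    /** Canonicalizing pool: equal generations resolve to one shared instance. */
    static final class GenerationPool {
        private final Map<Generation, Generation> pool = new ConcurrentHashMap<>();

        Generation intern(int generation, int fullGeneration, boolean compacted) {
            Generation candidate = new Generation(generation, fullGeneration, compacted);
            // putIfAbsent returns the previously pooled instance, or null if
            // the candidate is the first one with these values.
            Generation existing = pool.putIfAbsent(candidate, candidate);
            return existing != null ? existing : candidate;
        }

        int size() {
            return pool.size();
        }
    }

    /** Hypothetical slim entry: two references replace msb/lsb and the generation fields. */
    static final class Entry {
        final UUID id;               // same instance as the key in the mapping
        final Generation generation; // pooled, shared across many entries

        Entry(UUID id, Generation generation) {
            this.id = id;
            this.generation = generation;
        }
    }

    public static void main(String[] args) {
        GenerationPool pool = new GenerationPool();
        UUID key = UUID.randomUUID();
        Entry a = new Entry(key, pool.intern(3, 1, true));
        Entry b = new Entry(UUID.randomUUID(), pool.intern(3, 1, true));
        System.out.println("shared generation instance: " + (a.generation == b.generation));
        System.out.println("entry reuses key instance: " + (a.id == key));
        System.out.println("pooled generations: " + pool.size());
    }
}
```

The pool only pays off under the assumption stated above, i.e. that the number of distinct generations stays small compared to the number of segments.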
Factoring all of these objects in, the memory per segment is around 81 bytes
((187307232+185502112+98120280+28000+11640)/5796941). Compared to the original
152 bytes per segment, that is still a reduction of at least 46%.
In total terms, heap usage for this particular segmentstore drops from 850MB to
449MB.
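For anyone who wants to re-check the arithmetic, here is a small sketch that recomputes the figures from the histogram values above. The class and method names are made up for illustration; the 152 bytes baseline comes from the issue description (2x32 + 48 + 40), and MB is taken as 1024*1024 bytes.

```java
public class SegmentMemoryEstimate {

    // Byte totals copied from the class histogram above.
    static final long UUID_BYTES = 187_307_232L;
    static final long ENTRY_BYTES = 185_502_112L;
    static final long OBJECT_ARRAY_BYTES = 98_120_280L;
    static final long MAP_N_BYTES = 28_000L;
    static final long GC_GENERATION_BYTES = 11_640L;

    // Number of RemoteSegmentArchiveEntry instances, i.e. segments.
    static final long SEGMENTS = 5_796_941L;

    // Per-segment footprint before the change, from the issue description.
    static final long BYTES_PER_SEGMENT_BEFORE = 2 * 32 + 48 + 40;

    static long totalBytesAfter() {
        return UUID_BYTES + ENTRY_BYTES + OBJECT_ARRAY_BYTES
                + MAP_N_BYTES + GC_GENERATION_BYTES;
    }

    static double bytesPerSegmentAfter() {
        return (double) totalBytesAfter() / SEGMENTS;
    }

    // Object[] overhead per segment, assuming all arrays back the MapN instances.
    static double objectArrayBytesPerSegment() {
        return (double) OBJECT_ARRAY_BYTES / SEGMENTS;
    }

    static double reductionPercent() {
        return 100.0 * (1.0 - bytesPerSegmentAfter() / BYTES_PER_SEGMENT_BEFORE);
    }

    public static void main(String[] args) {
        System.out.printf("total after:          %d bytes (%.0f MB)%n",
                totalBytesAfter(), totalBytesAfter() / (1024.0 * 1024.0));
        System.out.printf("per segment after:    %.1f bytes%n", bytesPerSegmentAfter());
        System.out.printf("Object[] per segment: %.1f bytes%n", objectArrayBytesPerSegment());
        System.out.printf("reduction:            %.1f%%%n", reductionPercent());
    }
}
```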
> Reduce memory consumption of azure segment stores
> -------------------------------------------------
>
> Key: OAK-12070
> URL: https://issues.apache.org/jira/browse/OAK-12070
> Project: Jackrabbit Oak
> Issue Type: Improvement
> Components: segment-azure
> Reporter: Julian Sedding
> Assignee: Julian Sedding
> Priority: Major
>
> An Azure segmentstore consumes quite a lot of memory. Here is an extreme
> example of a store containing ~5.8 million entries.
> Per segment, there are
> * two UUID instances, consuming 32 bytes each
> * one RemoteSegmentArchiveEntry, consuming 48 bytes each
> * one java.util.LinkedHashMap$Entry, consuming 40 bytes each
> I.e. each segment occupies 152 bytes in-memory, leaving some potential for
> optimization.
> {noformat}
>  num     #instances         #bytes  class name (module)
> -------------------------------------------------------
>    1:       2212920      729326320  [B ([email protected])
>    2:      11911003      381152096  java.util.UUID ([email protected])
>    3:       5793352      278080896  org.apache.jackrabbit.oak.segment.remote.RemoteSegmentArchiveEntry
>    4:        105964      247119568  Ljdk.internal.vm.FillerArray; ([email protected])
>    5:       5812476      232499040  java.util.LinkedHashMap$Entry ([email protected])
> {noformat}