[ 
https://issues.apache.org/jira/browse/OAK-12070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18053932#comment-18053932
 ] 

Julian Sedding edited comment on OAK-12070 at 1/23/26 1:59 PM:
---------------------------------------------------------------

After the changes from my PR, memory usage per segment is reduced to
 * one RemoteSegmentArchiveEntry, consuming 32 bytes each
 * one UUID, consuming 32 bytes each

I.e. 64 bytes per segment, or a reduction by ~58%.
{noformat}
 num     #instances         #bytes  class name (module)
-------------------------------------------------------
   1:       3360428      924666248  [B ([email protected])
   2:       5853351      187307232  java.util.UUID ([email protected])
   3:       5796941      185502112  org.apache.jackrabbit.oak.segment.remote.RemoteSegmentArchiveEntry
   4:         69866       98120280  [Ljava.lang.Object; ([email protected])
   5:         68400       82323424  Ljdk.internal.vm.FillerArray; ([email protected])
   6:       2299470       55187280  java.lang.String ([email protected])
   7:         50895       39727792  [J ([email protected])
   8:        827032       39697536  org.apache.jackrabbit.oak.cache.CacheLIRS$Entry
   9:        825904       33036160  org.apache.jackrabbit.oak.segment.ReaderCache$CacheKey
  10:            81       23069968  [Lorg.apache.jackrabbit.oak.cache.CacheLIRS$Entry;
...
 442:           875          28000  java.util.ImmutableCollections$MapN ([email protected])
...
 524:           485          11640  org.apache.jackrabbit.oak.segment.file.tar.GCGeneration
...
{noformat}
 

Granted, there are a few other "new" objects in memory, but their contribution 
is not directly linked to the number of segments. E.g. {{UUID}} and 
{{RemoteSegmentArchiveEntry}} instances are now stored in a 
{{java.util.ImmutableCollections$MapN}}, which holds its data in an 
{{Object[]}}. This replaces the {{java.util.LinkedHashMap$Entry}} instances. 
The above stats indicate an overhead for {{Object[]}} per segment of ~17 bytes, 
assuming all such arrays are used for this purpose.
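
To illustrate the container change described above, here is a minimal sketch, not the code from the PR. It assumes the immutable map is obtained via {{Map.copyOf}} (one standard way to get a {{java.util.ImmutableCollections$MapN}} for maps with more than one entry); the class name {{ImmutableIndexSketch}} and the helper {{freeze}} are hypothetical. The second helper merely redoes the ~17 bytes estimate from the histogram figures.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.UUID;

final class ImmutableIndexSketch {

    /**
     * Copies a mutable index into an immutable map. Keys and values are
     * shared with the source map, not cloned; only the container changes.
     * Assumption: the PR may build its map differently.
     */
    static <V> Map<UUID, V> freeze(Map<UUID, V> mutable) {
        return Map.copyOf(mutable);
    }

    /**
     * Object[] bytes per segment, taken from the histogram above.
     * Upper bound: assumes every Object[] on the heap belongs to the index.
     */
    static double objectArrayBytesPerSegment() {
        return 98_120_280.0 / 5_796_941.0;
    }

    public static void main(String[] args) {
        Map<UUID, String> mutable = new LinkedHashMap<>();
        mutable.put(UUID.randomUUID(), "segment-a");
        mutable.put(UUID.randomUUID(), "segment-b");

        Map<UUID, String> frozen = freeze(mutable);
        System.out.println(frozen.size()); // prints 2
        System.out.println(Math.round(objectArrayBytesPerSegment())); // prints 17
    }
}
```

One trade-off to keep in mind: unlike {{LinkedHashMap}}, the map returned by {{Map.copyOf}} has an unspecified iteration order, so this approach only fits if nothing relies on insertion order.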

The reduction in size of {{RemoteSegmentArchiveEntry}} is due to two changes. 
Firstly, the {{msb}} and {{lsb}} fields are folded into a {{UUID}} reference, 
which references the same {{UUID}} instance used as the key in the mapping. 
Secondly, all {{GCGeneration}} related fields (2x int, 1x boolean) are folded 
into a reference to a {{GCGeneration}} instance. On the assumption that 
there are far fewer distinct {{GCGeneration}} permutations around than segments, 
{{GCGeneration}} instances are pooled (and thus deduplicated in-memory).
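
The two changes can be sketched roughly as follows. This is an illustration under stated assumptions, not the actual Oak classes: {{PoolingSketch}}, {{Generation}}, {{GenerationPool}} and {{SlimEntry}} are hypothetical names, the field names are guesses, and the {{length}} field is only there to give the entry some payload.

```java
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

final class PoolingSketch {

    /** Hypothetical stand-in for GCGeneration: immutable, value-based equality. */
    record Generation(int generation, int fullGeneration, boolean compacted) {}

    /** Hypothetical pool: hands out one shared instance per distinct generation. */
    static final class GenerationPool {
        private final Map<Generation, Generation> pool = new ConcurrentHashMap<>();

        Generation intern(int generation, int fullGeneration, boolean compacted) {
            Generation candidate = new Generation(generation, fullGeneration, compacted);
            Generation existing = pool.putIfAbsent(candidate, candidate);
            return existing != null ? existing : candidate;
        }

        int size() {
            return pool.size();
        }
    }

    /** Hypothetical slimmed-down entry: two references replace five inlined fields. */
    static final class SlimEntry {
        final UUID id;               // the same instance that serves as the map key
        final Generation generation; // pooled, shared across many entries
        final int length;            // illustrative payload field (assumption)

        SlimEntry(UUID id, Generation generation, int length) {
            this.id = id;
            this.generation = generation;
            this.length = length;
        }
    }

    public static void main(String[] args) {
        GenerationPool pool = new GenerationPool();
        SlimEntry first = new SlimEntry(UUID.randomUUID(), pool.intern(3, 1, true), 4096);
        SlimEntry second = new SlimEntry(UUID.randomUUID(), pool.intern(3, 1, true), 8192);

        System.out.println(first.generation == second.generation); // prints true
        System.out.println(pool.size()); // prints 1
    }
}
```

Pooling only pays off if the pooled type is immutable and the number of distinct values stays small relative to the number of entries; the histogram above (485 {{GCGeneration}} instances for ~5.8 million entries) suggests that holds for this store.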

Factoring all of these objects in, the memory per segment is around 81 bytes 
((187307232+185502112+98120280+28000+11640)/5796941). Still a reduction by at 
least 46%.

Or in total terms, a reduction of heap space from 850MB to 449MB for this 
particular segmentstore.
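
For readers who want to retrace the arithmetic, the following sketch recomputes the figures from the two class histograms. The class and method names are hypothetical; the byte counts are copied from the histograms, and the 152 bytes per segment baseline is taken from the issue description. The totals match the quoted 850MB and 449MB when bytes are divided by 1024 * 1024, i.e. the numbers appear to be mebibytes.

```java
final class SegmentMemoryEstimate {

    // RemoteSegmentArchiveEntry instances after the change, used as segment count
    static final long SEGMENTS_AFTER = 5_796_941L;

    // Per-segment footprint before the change, as stated in the issue description
    static final double BYTES_PER_SEGMENT_BEFORE = 152.0;

    /** UUID + RemoteSegmentArchiveEntry + Object[] + MapN + GCGeneration (after). */
    static long bytesAfter() {
        return 187_307_232L + 185_502_112L + 98_120_280L + 28_000L + 11_640L;
    }

    /** UUID + RemoteSegmentArchiveEntry + LinkedHashMap$Entry (before). */
    static long bytesBefore() {
        return 381_152_096L + 278_080_896L + 232_499_040L;
    }

    static double bytesPerSegmentAfter() {
        return (double) bytesAfter() / SEGMENTS_AFTER;
    }

    /** Relative per-segment reduction, as a fraction between 0 and 1. */
    static double reductionPerSegment() {
        return 1.0 - bytesPerSegmentAfter() / BYTES_PER_SEGMENT_BEFORE;
    }

    static double toMiB(long bytes) {
        return bytes / (1024.0 * 1024.0);
    }

    public static void main(String[] args) {
        System.out.println("bytes per segment (after): " + Math.round(bytesPerSegmentAfter()));
        System.out.println("reduction in percent:      " + (int) (reductionPerSegment() * 100));
        System.out.println("heap before (MiB):         " + Math.round(toMiB(bytesBefore())));
        System.out.println("heap after (MiB):          " + Math.round(toMiB(bytesAfter())));
    }
}
```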







> Reduce memory consumption of azure segment stores
> -------------------------------------------------
>
>                 Key: OAK-12070
>                 URL: https://issues.apache.org/jira/browse/OAK-12070
>             Project: Jackrabbit Oak
>          Issue Type: Improvement
>          Components: segment-azure
>            Reporter: Julian Sedding
>            Assignee: Julian Sedding
>            Priority: Major
>
> An Azure segmentstore consumes quite a lot of memory. Here is an extreme 
> example of a store containing ~5.8 million entries.
> Per segment, there are
>  * two UUID instances, consuming 32 bytes each
>  * one RemoteSegmentArchiveEntry, consuming 48 bytes each
>  * one java.util.LinkedHashMap$Entry, consuming 40 bytes each
> I.e. each segment occupies 152 bytes in-memory, leaving some potential for 
> optimization.
> {noformat}
> num     #instances         #bytes  class name (module)
> -------------------------------------------------------
>    1:       2212920      729326320  [B ([email protected])
>    2:      11911003      381152096  java.util.UUID ([email protected])
>    3:       5793352      278080896  org.apache.jackrabbit.oak.segment.remote.RemoteSegmentArchiveEntry
>    4:        105964      247119568  Ljdk.internal.vm.FillerArray; ([email protected])
>    5:       5812476      232499040  java.util.LinkedHashMap$Entry ([email protected])
> {noformat}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
