[ 
https://issues.apache.org/jira/browse/OAK-7083?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16375128#comment-16375128
 ] 

Matt Ryan commented on OAK-7083:
--------------------------------

Let me try to describe in more detail how GC works with the 
{{CompositeDataStore}} without the change to {{MarkSweepGarbageCollector}}.

Let's start with the Primary repo.  It has three nodes in the repo, A1-A3. It 
is connected to data store DS_P, which has blobs A1_B-A3_B and a repository 
file in it.
{noformat}
P
  - A1
  - A2
  - A3

DS_P
  - repository-P
  - A1_B
  - A2_B
  - A3_B
{noformat}
At this point we create the Secondary repo by cloning the Primary. It has a 
copy of the Primary node store and refers directly to data store DS_P, which 
now has another repository file in it. Secondary also has another data store, 
DS_S, which is empty other than the repository file for the Secondary repo.
 Note that the {{CompositeDataStore}} writes its metadata files to EVERY 
delegate data store, which is why the Secondary's repository file appears in 
both DS_P and DS_S.
{noformat}
P                                        S
  - A1                                     - A1
  - A2                                     - A2
  - A3                                     - A3

DS_P                                     DS_S
  - repository-P                           - repository-S
  - repository-S
  - A1_B
  - A2_B
  - A3_B
{noformat}
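The metadata fan-out described above can be sketched with a toy model. This is Python rather than Oak's Java, and the names {{ToyDataStore}}, {{ToyCompositeDataStore}} and {{add_metadata_record}} are hypothetical stand-ins for illustration only, not Oak's actual API.

```python
class ToyDataStore:
    """Hypothetical stand-in for a single delegate data store."""

    def __init__(self, name):
        self.name = name
        self.metadata = {}  # metadata record name -> contents
        self.blobs = set()  # blob ids held by this store


class ToyCompositeDataStore:
    """Hypothetical model: every metadata write goes to every delegate."""

    def __init__(self, delegates):
        self.delegates = list(delegates)

    def add_metadata_record(self, name, contents=""):
        for delegate in self.delegates:
            delegate.metadata[name] = contents


# Primary writes directly to DS_P (no composite involved).
ds_p = ToyDataStore("DS_P")
ds_p.metadata["repository-P"] = ""
ds_p.blobs |= {"A1_B", "A2_B", "A3_B"}

# Secondary registers itself through the composite, so its repository
# record lands in both DS_P and DS_S.
ds_s = ToyDataStore("DS_S")
composite = ToyCompositeDataStore([ds_p, ds_s])
composite.add_metadata_record("repository-S")

print(sorted(ds_p.metadata), sorted(ds_s.metadata))
```

In this model DS_P ends up holding both repository records while DS_S holds only the Secondary's, which matches the diagram above.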
Suppose on Primary nodes A2 and A3 are deleted, and on Secondary nodes A1 and 
A3 are deleted. No GC has been run yet. So now we have this:
{noformat}
P                                       S
  - A1                                    - (A1-deleted)
  - (A2-deleted)                          - A2
  - (A3-deleted)                          - (A3-deleted)

DS_P                                    DS_S
  - repository-P                          - repository-S
  - repository-S
  - A1_B
  - A2_B
  - A3_B
{noformat}
Now on Primary nodes B4 and B5 are created and on Secondary nodes C6 and C7 are 
created.
{noformat}
P                                       S
  - A1                                    - (A1-deleted)
  - (A2-deleted)                          - A2
  - (A3-deleted)                          - (A3-deleted)
  - B4                                    - C6
  - B5                                    - C7

DS_P                                    DS_S
  - repository-P                          - repository-S
  - repository-S                          - C6_B
  - A1_B                                  - C7_B
  - A2_B
  - A3_B
  - B4_B
  - B5_B
{noformat}
Now node B5 is deleted from Primary and node C7 is deleted from Secondary. 
Still no GC has been run yet.
{noformat}
P                                       S
  - A1                                    - (A1-deleted)
  - (A2-deleted)                          - A2
  - (A3-deleted)                          - (A3-deleted)
  - B4                                    - C6
  - (B5-deleted)                          - (C7-deleted)

DS_P                                    DS_S
  - repository-P                          - repository-S
  - repository-S                          - C6_B
  - A1_B                                  - C7_B
  - A2_B
  - A3_B
  - B4_B
  - B5_B
{noformat}
Let's suppose GC runs and Primary does a mark first. When it is done it will 
create a references file with all of the references it knows about that should 
be kept.
{noformat}
P                                       S
  - A1                                    - (A1-deleted)
  - (A2-deleted)                          - A2
  - (A3-deleted)                          - (A3-deleted)
  - B4                                    - C6
  - (B5-deleted)                          - (C7-deleted)

DS_P                                    DS_S
  - repository-P                          - repository-S
  - repository-S                          - C6_B
  - references-P (A1, B4)                 - C7_B
  - A1_B
  - A2_B
  - A3_B
  - B4_B
  - B5_B
{noformat}
Note at this point Primary cannot complete a sweep because Secondary has not 
performed a mark phase yet. There is no references file in DS_P yet from 
Secondary.
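
As a rough illustration of the mark step just described, the following Python sketch collects the blobs still referenced by live nodes. The {{mark}} function and the node table layout are assumptions made for this example, not the way {{MarkSweepGarbageCollector}} is actually implemented.

```python
def mark(repo_id, nodes):
    """Return (references record name, sorted blob ids of live nodes).

    `nodes` maps node name -> (blob id, deleted flag); this layout is a
    hypothetical simplification of a node store.
    """
    live = sorted(blob for blob, deleted in nodes.values() if not deleted)
    return f"references-{repo_id}", live


# Primary's node store at the time of the mark phase.
primary_nodes = {
    "A1": ("A1_B", False),
    "A2": ("A2_B", True),
    "A3": ("A3_B", True),
    "B4": ("B4_B", False),
    "B5": ("B5_B", True),
}

name, refs = mark("P", primary_nodes)
print(name, refs)
```

Only the blobs for A1 and B4 are recorded, which corresponds to references-P (A1, B4) in the diagram.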

Later Secondary performs the mark phase and it creates a references file with 
all of the references it knows about that should be kept. Notice the references 
file gets created in both DS_S and DS_P, since {{CompositeDataStore}} performs 
metadata writes to all delegate data stores.
{noformat}
P                                       S
  - A1                                    - (A1-deleted)
  - (A2-deleted)                          - A2
  - (A3-deleted)                          - (A3-deleted)
  - B4                                    - C6
  - (B5-deleted)                          - (C7-deleted)

DS_P                                    DS_S
  - repository-P                          - repository-S
  - repository-S                          - references-S (A2, C6)
  - references-P (A1, B4)                 - C6_B
  - references-S (A2, C6)                 - C7_B
  - A1_B
  - A2_B
  - A3_B
  - B4_B
  - B5_B
{noformat}
At some later time suppose the Primary attempts to run the sweep phase. Now it 
can do this because there is a references file for every repository file in 
DS_P.
 Primary will take all the blob references in all the references files, 
creating set (A1, A2, B4, C6). Then it will combine those with the blob 
references it sees locally, which is set (A1, B4), leaving (A1, A2, B4, C6) in 
its references file.
{noformat}
P                                       S
  - A1                                    - (A1-deleted)
  - (A2-deleted)                          - A2
  - (A3-deleted)                          - (A3-deleted)
  - B4                                    - C6
  - (B5-deleted)                          - (C7-deleted)

DS_P                                    DS_S
  - repository-P                          - repository-S
  - repository-S                          - references-S (A2, C6)
  - references-P (A1, A2, B4, C6)         - C6_B
  - references-S (A2, C6)                 - C7_B
  - A1_B
  - A2_B
  - A3_B
  - B4_B
  - B5_B
{noformat}
These references will then be matched against the Primary's available 
references, which is set (A1, A2, A3, B4, B5) (from DS_P). The items in the set 
of available references that are not in the Primary's references file will be 
those identified as GC candidates, which is set (A3, B5). So Primary will then 
only delete blobs A3_B and B5_B.
{noformat}
P                                       S
  - A1                                    - (A1-deleted)
  - (A2-deleted)                          - A2
  - (A3-deleted)                          - (A3-deleted)
  - B4                                    - C6
  - (B5-deleted)                          - (C7-deleted)

DS_P                                    DS_S
  - repository-P                          - repository-S
  - repository-S                          - references-S (A2, C6)
  - references-P (A1, A2, B4, C6)         - C6_B
  - references-S (A2, C6)                 - C7_B
  - A1_B
  - A2_B
  - (A3_B-deleted)
  - B4_B
  - (B5_B-deleted)
{noformat}
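The candidate computation above boils down to a set union followed by a set difference. A minimal Python sketch, using the same short labels as the diagrams ({{gc_candidates}} is a hypothetical helper name, not part of Oak):

```python
def gc_candidates(references_files, available_blobs):
    """Blobs that are available in the store but referenced by no repository."""
    referenced = set().union(*references_files.values())
    return set(available_blobs) - referenced


# Contents of the references files visible in DS_P.
references_files = {
    "references-P": {"A1", "B4"},
    "references-S": {"A2", "C6"},
}
# Blobs the Primary can see in DS_P.
available_in_ds_p = {"A1", "A2", "A3", "B4", "B5"}

candidates = gc_candidates(references_files, available_in_ds_p)
print(sorted(candidates))
```

C6 appears in the merged references but not among the blobs available in DS_P, so it has no effect on the Primary's candidate set.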
At this point, if using the current garbage collector, Primary will then delete 
all of the references files in DS_P.
{noformat}
P                                       S
  - A1                                    - (A1-deleted)
  - (A2-deleted)                          - A2
  - (A3-deleted)                          - (A3-deleted)
  - B4                                    - C6
  - (B5-deleted)                          - (C7-deleted)

DS_P                                    DS_S
  - repository-P                          - repository-S
  - repository-S                          - references-S (A2, C6)
  - A1_B                                  - C6_B
  - A2_B                                  - C7_B
  - (A3_B-deleted)
  - B4_B
  - (B5_B-deleted)
{noformat}
Now if Secondary attempts to sweep it will fail. The reason it fails is this: 
From the point of view of the {{CompositeDataStore}}, there is a mismatch 
between the number of repository files and the number of references files. So 
the sweep will be canceled before any further action is taken.
 The {{getMetadataFiles()}} method of {{CompositeDataStore}} reads files from 
all the delegates and returns unique entries only. Even so, it sees two 
repository files - repository-P and repository-S - but only one references 
file, references-S. So no GC is performed and blob C7_B remains uncollected.
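
The failed precondition can be illustrated with a small Python sketch. Comparing the two counts is a simplification of the check as described in this comment, and both helper names are hypothetical:

```python
def unique_metadata_names(delegates):
    """Merge metadata record names from all delegates, dropping duplicates."""
    names = set()
    for records in delegates:
        names.update(records)
    return names


def sweep_allowed(names):
    """Simplified check: one references file is needed per repository file."""
    repos = {n for n in names if n.startswith("repository-")}
    refs = {n for n in names if n.startswith("references-")}
    return len(repos) == len(refs)


# State after Primary has swept and removed the references files from DS_P.
ds_p = ["repository-P", "repository-S"]
ds_s = ["repository-S", "references-S"]
after_cleanup = unique_metadata_names([ds_p, ds_s])

# State before that cleanup, when DS_P still held both references files.
ds_p_before = ["repository-P", "repository-S", "references-P", "references-S"]
before_cleanup = unique_metadata_names([ds_p_before, ds_s])

print(sweep_allowed(before_cleanup), sweep_allowed(after_cleanup))
```

After the cleanup the merged view contains two repository files and a single references file, so the Secondary's sweep is refused.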

It is important to note here that it does not matter which repository sweeps 
first. If Secondary had swept first instead of Primary, the same problem would 
remain - 2 repository files in DS_P but no references files - and in that case 
blob B5_B remains uncollected.

With the change I made to the {{MarkSweepGarbageCollector}}, after Primary 
finishes the sweep it does not remove the references files, but instead first 
adds a sweepComplete marker. References files can only be collected after a 
sweep is performed by every repository. So here's how things look after Primary 
finishes sweeping:
{noformat}
P                                       S
  - A1                                    - (A1-deleted)
  - (A2-deleted)                          - A2
  - (A3-deleted)                          - (A3-deleted)
  - B4                                    - C6
  - (B5-deleted)                          - (C7-deleted)

DS_P                                    DS_S
  - repository-P                          - repository-S
  - repository-S                          - references-S (A2, C6)
  - references-P (A1, A2, B4, C6)         - C6_B
  - references-S (A2, C6)                 - C7_B
  - sweepComplete-P
  - A1_B
  - A2_B
  - (A3_B-deleted)
  - B4_B
  - (B5_B-deleted)
{noformat}
Now when Secondary attempts to sweep, it will see two repository files - 
repository-P and repository-S - and two references files - references-P and 
references-S. So it can proceed with the sweep phase.
 Secondary combines the references into the set (A1, A2, B4, C6). These are 
then compared against the available references it counted from DS_S and DS_P, 
which is set (A1, A2, B4, C6, C7), resulting in GC candidates set (C7). Thus it 
deletes C7_B.
{noformat}
P                                       S
  - A1                                    - (A1-deleted)
  - (A2-deleted)                          - A2
  - (A3-deleted)                          - (A3-deleted)
  - B4                                    - C6
  - (B5-deleted)                          - (C7-deleted)

DS_P                                    DS_S
  - repository-P                          - repository-S
  - repository-S                          - references-S (A1, A2, B4, C6)
  - references-P (A1, A2, B4, C6)         - C6_B
  - references-S (A1, A2, B4, C6)         - (C7_B-deleted)
  - sweepComplete-P
  - A1_B
  - A2_B
  - (A3_B-deleted)
  - B4_B
  - (B5_B-deleted)
{noformat}
Secondary then adds a sweepComplete marker.
{noformat}
P                                       S
  - A1                                    - (A1-deleted)
  - (A2-deleted)                          - A2
  - (A3-deleted)                          - (A3-deleted)
  - B4                                    - C6
  - (B5-deleted)                          - (C7-deleted)

DS_P                                    DS_S
  - repository-P                          - repository-S
  - repository-S                          - references-S (A1, A2, B4, C6)
  - references-P (A1, A2, B4, C6)         - C6_B
  - references-S (A1, A2, B4, C6)         - (C7_B-deleted)
  - sweepComplete-P
  - sweepComplete-S
  - A1_B
  - A2_B
  - (A3_B-deleted)
  - B4_B
  - (B5_B-deleted)
{noformat}
Then, seeing a sweepComplete marker for every repository, Secondary knows that 
all the attached repositories have completed the sweep and so it can clean up 
all the references files (and the sweepComplete markers also).
{noformat}
P                                       S
  - A1                                    - (A1-deleted)
  - (A2-deleted)                          - A2
  - (A3-deleted)                          - (A3-deleted)
  - B4                                    - C6
  - (B5-deleted)                          - (C7-deleted)

DS_P                                    DS_S
  - repository-P                          - repository-S
  - repository-S                          - C6_B
  - A1_B                                  - (C7_B-deleted)
  - A2_B
  - (A3_B-deleted)
  - B4_B
  - (B5_B-deleted)
{noformat}
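The marker-based cleanup rule can be sketched as follows. This Python model tracks only the metadata in DS_P as a single set and ignores the fan-out to other delegates; {{finish_sweep}} is a hypothetical name and the code is not taken from the actual patch.

```python
def finish_sweep(metadata, repo_id):
    """Record that repo_id has swept; clean up only once every repository has.

    `metadata` is a set of record names and is modified in place.
    Returns True if the references files and markers were removed.
    """
    metadata.add(f"sweepComplete-{repo_id}")
    repos = {n.split("-", 1)[1] for n in metadata if n.startswith("repository-")}
    swept = {n.split("-", 1)[1] for n in metadata if n.startswith("sweepComplete-")}
    if repos <= swept:
        stale = {n for n in metadata
                 if n.startswith(("references-", "sweepComplete-"))}
        metadata.difference_update(stale)
        return True
    return False


ds_p_metadata = {"repository-P", "repository-S", "references-P", "references-S"}

cleaned_after_p = finish_sweep(ds_p_metadata, "P")  # Secondary has not swept yet
state_after_p = set(ds_p_metadata)
cleaned_after_s = finish_sweep(ds_p_metadata, "S")  # now every repository has swept

print(cleaned_after_p, cleaned_after_s, sorted(ds_p_metadata))
```

The references files survive the Primary's sweep and are removed only after the Secondary has swept too, so neither repository is blocked by the other's cleanup.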
This leads to a consistent result. A1_B is not deleted, despite A1 being 
deleted from Secondary, because it is still referenced by Primary. A2_B is not 
deleted, despite A2 being deleted from Primary, because it is still referenced 
by Secondary. A3_B is deleted since A3 was deleted by both. B5_B is deleted 
since B5 was deleted from Primary and Secondary didn't know about B5. C7_B is 
deleted since C7 was deleted from Secondary and Primary didn't know about C7.

The DSGC test in 
[CompositeDataStoreRORWIT.java|https://github.com/mattvryan/jackrabbit-oak/blob/0164c26db470ffdc9cf80c704f9cba20f4f181e0/oak-blob-composite/src/test/java/org/apache/jackrabbit/oak/blob/composite/CompositeDataStoreRORWIT.java]
 goes into more detail. Please take a look at that test if you have not 
already.  I believe it covers all the cases and explains everything in the 
comments.

> CompositeDataStore - ReadOnly/ReadWrite Delegate Support
> --------------------------------------------------------
>
>                 Key: OAK-7083
>                 URL: https://issues.apache.org/jira/browse/OAK-7083
>             Project: Jackrabbit Oak
>          Issue Type: New Feature
>          Components: blob, blob-cloud, blob-cloud-azure, blob-plugins
>            Reporter: Matt Ryan
>            Assignee: Matt Ryan
>            Priority: Major
>
> Support a specific composite data store use case, which is the following:
> * One instance uses no composite data store, but instead is using a single 
> standard Oak data store (e.g. FileDataStore)
> * Another instance is created by snapshotting the first instance node store, 
> and then uses a composite data store to refer to the first instance's data 
> store read-only, and refers to a second data store as a writable data store
> One way this can be used is in creating a test or staging instance from a 
> production instance.  At creation, the test instance will look like 
> production, but any changes made to the test instance do not affect 
> production.  The test instance can be quickly created from production by 
> cloning only the node store, and not requiring a copy of all the data in the 
> data store.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
