[ https://issues.apache.org/jira/browse/OAK-7083?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16373250#comment-16373250 ]

Matt Ryan edited comment on OAK-7083 at 2/22/18 9:22 PM:
---------------------------------------------------------

(From [~amjain] via oak-dev)
{quote} bq. The solution for {{SharedDataStore}} currently is to require all 
repositories to run the Mark phase and then run the Sweep phase on one of them.
Yes. Sorry, I didn’t mention that. I was trying to be brief and ended up being 
unclear. In the situation I described above, it is definitely running the Mark 
phase first and then the Sweep phase. The problem is still as I described - no 
matter which one runs Sweep first, it cannot delete all the binaries that may 
have been deleted on both systems.
{quote}
The problem exists because that's how the systems are set up. For this particular 
problem, the Secondary has no reason to even account for the Primary's datastore, 
as it should not and cannot delete anything in there.
{quote} bq. Besides, there's a problem of the Sweep phase on the Primary 
encountering blobs it does not know about (from the Secondary) and which it 
cannot delete, creating an unpleasant experience. As I understand it, the Primary 
could be a production system, and having this sort of error crop up would be 
problematic.
If they are regarded as errors, yes. Currently this logs a WARN-level message 
(not an ERROR), which suggests that sometimes not all the binaries targeted for 
deletion will actually be deleted.
 So this might be an issue of setting clear expectations. But I do see the 
point.
{quote}
Yes, these are logged as WARN because they are not fatal, but empirically they are 
problematic and get questioned by customers. Apart from that, there is also a 
performance impact, since each binary is attempted for deletion, which incurs a 
penalty.
{quote} bq. Encode the blob ids on the Secondary with the {{DataStore}} 
location/type, with which we can distinguish the blob ids belonging to the 
respective {{DataStore}}s.
That’s a solution that only works in this very specific use case of 
{{CompositeDataStore}}. In the future, if we ever wanted to support 
different scenarios, we would have to reconsider how it encodes blobs for 
each delegate. Would that mean that data written to a data store by the 
{{CompositeDataStore}} could not be read by another {{CompositeDataStore}} 
referencing the same delegate?
{quote}
But encoding of the blob ids is needed anyway, irrespective of GC, no? 
Otherwise, how does the {{CompositeDataStore}} redirect CRUD calls to the 
respective {{DataStore}}s? And I did not understand how encoding the blob id with 
information about the {{DataStore}} precludes it from being read; it has to have 
the same semantics for the same delegate. It does, however, preclude moving blobs 
from one subspace to another, but I don't think that's the use case anyway.
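For illustration, a minimal sketch (hypothetical class and method names, not the actual Oak {{CompositeDataStore}} or {{DataStore}} API) of how an id-prefix encoding could let a composite route CRUD calls to the owning delegate:
{code:java}
// Hypothetical sketch only; the real CompositeDataStore and DataStore APIs differ.
import java.util.Map;

interface DelegateStore {
    byte[] read(String rawBlobId);
    boolean delete(String rawBlobId);
}

class PrefixRoutingComposite {
    private final Map<String, DelegateStore> delegatesByPrefix; // e.g. "primary" -> read-only store

    PrefixRoutingComposite(Map<String, DelegateStore> delegatesByPrefix) {
        this.delegatesByPrefix = delegatesByPrefix;
    }

    // Encode the owning delegate into the id: "<prefix>:<rawBlobId>".
    static String encode(String prefix, String rawBlobId) {
        return prefix + ":" + rawBlobId;
    }

    private DelegateStore owner(String encodedId) {
        String prefix = encodedId.substring(0, encodedId.indexOf(':'));
        DelegateStore d = delegatesByPrefix.get(prefix);
        if (d == null) {
            throw new IllegalArgumentException("Unknown delegate prefix: " + prefix);
        }
        return d;
    }

    byte[] read(String encodedId) {
        return owner(encodedId).read(encodedId.substring(encodedId.indexOf(':') + 1));
    }

    boolean delete(String encodedId) {
        // A read-only delegate implementation would simply return false here instead of
        // attempting the delete and logging a WARN.
        return owner(encodedId).delete(encodedId.substring(encodedId.indexOf(':') + 1));
    }
}
{code}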
{quote} bq. Secondary's Mark phase only redirects the Primary-owned blob ids to 
the references file in the Primary's {{DataStore}} (the Primary's DataStore 
operating as Shared).
The {{DataStore}} has no knowledge of the garbage collection stages. So IIUC 
this would require creating a new garbage collector that is aware of composite 
data stores and has the ability to interact with the {{CompositeDataStore}} in 
a tightly coupled fashion. Either that or we would have to enhance the data 
store API (for example, add a new interface or extend an interface so it can be 
precisely controlled by the garbage collector). Or both.
{quote}
{{DataStore}} does not know when GC is taking place, but it does have 
helper methods which are used by GC. Yes, I would think that the methods 
currently existing for the purpose of GC need to be enhanced, and the Composite 
would have some intelligence about how it executes some of them (e.g. delete and 
the metadata methods), using the information it has about the delegates.
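As a rough illustration of that kind of delegation (hypothetical interfaces only, not the existing GC helper methods on the Oak {{DataStore}}):
{code:java}
// Hypothetical sketch; the real GC helpers live on the Oak DataStore/SharedDataStore APIs.
import java.util.List;
import java.util.Map;

interface GcAwareDelegate {
    void deleteBlob(String rawBlobId);                // used during the Sweep phase
    void writeMarkedReferences(List<String> blobIds); // used during the Mark phase (metadata)
}

class CompositeGcHelpers {
    private final Map<String, GcAwareDelegate> delegatesByPrefix;

    CompositeGcHelpers(Map<String, GcAwareDelegate> delegatesByPrefix) {
        this.delegatesByPrefix = delegatesByPrefix;
    }

    // Sweep helper: only the owning delegate is asked to delete, so the Primary's store
    // is never asked about ids it has never seen (no spurious WARNs, no wasted attempts).
    void deleteBlob(String encodedId) {
        int sep = encodedId.indexOf(':');
        GcAwareDelegate owner = delegatesByPrefix.get(encodedId.substring(0, sep));
        if (owner != null) {
            owner.deleteBlob(encodedId.substring(sep + 1));
        }
    }

    // Mark helper: references are partitioned per delegate before being written out.
    void writeMarkedReferences(Map<String, List<String>> referencesByPrefix) {
        referencesByPrefix.forEach((prefix, ids) -> {
            GcAwareDelegate d = delegatesByPrefix.get(prefix);
            if (d != null) {
                d.writeMarkedReferences(ids);
            }
        });
    }
}
{code}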
{quote} bq. Secondary executes GC for its {{DataStore}} independently and does 
not worry about the Shared blob ids (already taken care of above).
Same issue - GC happens outside of the control of the {{DataStore}}.

It’s a good idea, Amit - something I struggled with for quite a while. I considered 
the same approach as well. But it tightly binds garbage collection to the data 
store, whereas currently they are very loosely bound. GC leverages the 
{{DataStore}} APIs to do GC tasks (like reading and writing metadata files), but 
the {{DataStore}} doesn’t have any knowledge that GC is even happening.

So I don’t see how the {{CompositeDataStore}} could control execution of GC 
only on the independent data store.
{quote}
It does not control execution of the GC, but it does control the GC helper 
methods and can use the information already available to it about the delegates. 
Also, we could simply have GC instances bound to each delegate {{DataStore}}. 
This would also be similar to the case where we use the {{CompositeDataStore}} 
to internally create a separate Lucene DataStore.
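A minimal sketch of the "one GC instance per delegate" idea (hypothetical types; the real Oak {{MarkSweepGarbageCollector}} is constructed and driven differently):
{code:java}
// Hypothetical sketch only: run a separate collector for each writable delegate and
// simply skip read-only delegates (e.g. the Primary's store as seen from the Secondary).
import java.util.List;

interface DelegateHandle {
    boolean isReadOnly();
}

interface BlobGarbageCollector {
    void collectGarbage(boolean markOnly) throws Exception;
}

interface GcFactory {
    BlobGarbageCollector createFor(DelegateHandle delegate);
}

class PerDelegateGcRunner {
    private final GcFactory factory;

    PerDelegateGcRunner(GcFactory factory) {
        this.factory = factory;
    }

    void run(List<DelegateHandle> delegates, boolean markOnly) throws Exception {
        for (DelegateHandle d : delegates) {
            if (d.isReadOnly()) {
                continue; // never sweep a delegate this repository cannot delete from
            }
            factory.createFor(d).collectGarbage(markOnly);
        }
    }
}
{code}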
{quote}Furthermore, future uses of {{CompositeDataStore}} might not be so 
clear-cut. A {{CompositeDataStore}} might have 5 delegates, some of which are 
shared, some are not, some are read-only, some are not. How would it know which 
ones to GC independently and which ones to GC as shared?
{quote}
The Shared setup is currently more or less automatic on startup, based on a unique 
clusterId/repositoryId. For Shared, once it sees different repository ids 
registered, it only proceeds with the Mark phase once references from all 
registered repositories are available.
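The gating check amounts to roughly the following (a sketch with plain sets; in Oak the repository ids and marked references are kept as metadata records in the shared {{DataStore}}):
{code:java}
// Sketch only: the collection proceeds only once a references record exists for
// every registered repository id.
import java.util.HashSet;
import java.util.Set;

final class SharedGcGate {
    private SharedGcGate() {}

    static boolean allReferencesAvailable(Set<String> registeredRepositoryIds,
                                          Set<String> repositoryIdsWithReferences) {
        Set<String> missing = new HashSet<>(registeredRepositoryIds);
        missing.removeAll(repositoryIdsWithReferences);
        return missing.isEmpty();
    }
}
{code}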
{quote}I think it is better to leave the GC logic where it is and let the 
{{DataStore}} (and the {{CompositeDataStore}}) remain unaware of GC logic, if 
possible.
 I’m confident the solution I proposed works correctly, in testing. I 
understand there are undesirable consequences. I also get the point you made, a 
very good one, which is that this is unlikely to work well in the real world 
due to how production systems function.
{quote}
It might be OK for this particular problem, but it does not work with the 
existing {{DataStore}} setups, as I outlined previously.
{quote}What else could we do to address this?
{quote}
I think we need a better solution for this, but to proceed for now we could do 
either of the following (one reading of the second option is sketched below):
 * Add this sweep state change only in the case of {{CompositeDataStore}}, or
 * For each repository:
 ** Execute Mark on all repositories, then Sweep
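As a sketch of one reading of the second option (hypothetical orchestration interface, not an existing Oak API):
{code:java}
// Sketch only: for each repository, run Mark on all repositories and then Sweep on that one.
import java.util.List;

interface RepositoryGc {
    void mark() throws Exception;   // publish this repository's blob references
    void sweep() throws Exception;  // delete blobs unreferenced by any repository
}

class PerRepositoryMarkSweep {
    void run(List<RepositoryGc> repositories) throws Exception {
        for (RepositoryGc sweeper : repositories) {
            for (RepositoryGc repo : repositories) {
                repo.mark();        // every repository contributes references first
            }
            sweeper.sweep();        // then this repository sweeps its own store
        }
    }
}
{code}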



> CompositeDataStore - ReadOnly/ReadWrite Delegate Support
> --------------------------------------------------------
>
>                 Key: OAK-7083
>                 URL: https://issues.apache.org/jira/browse/OAK-7083
>             Project: Jackrabbit Oak
>          Issue Type: New Feature
>          Components: blob, blob-cloud, blob-cloud-azure, blob-plugins
>            Reporter: Matt Ryan
>            Assignee: Matt Ryan
>            Priority: Major
>
> Support a specific composite data store use case, which is the following:
> * One instance uses no composite data store, but instead is using a single 
> standard Oak data store (e.g. FileDataStore)
> * Another instance is created by snapshotting the first instance node store, 
> and then uses a composite data store to refer to the first instance's data 
> store read-only, and refers to a second data store as a writable data store
> One way this can be used is in creating a test or staging instance from a 
> production instance.  At creation, the test instance will look like 
> production, but any changes made to the test instance do not affect 
> production.  The test instance can be quickly created from production by 
> cloning only the node store, and not requiring a copy of all the data in the 
> data store.


