Re: [PROPOSAL] make Cluster Management Service CRUD operations thread safe

2020-05-28 Thread Jinmei Liao
Donal had a good comment on the wiki page, and I replied. Maybe that would help 
answer your question of ID:

https://cwiki.apache.org/confluence/display/GEODE/Make+Cluster+Management+Service(CMS)+Thread+Safe

From: Anilkumar Gingade 
Sent: Thursday, May 28, 2020 12:02 PM
To: dev@geode.apache.org 
Subject: Re: [PROPOSAL] make Cluster Management Service CRUD operations thread 
safe

Yes, the DLock machinery handles (has option) dlock grantor departure...

As I understand, right now we have dlock at config persistence layer, but this 
does not guarantee preserving the order in which the config changes are 
applied. E.g.: A create region command followed by destroy could be persisted 
in reverse order.

And the proposal here is to move the lock from persist layer to higher level to 
preserve the ordering. In this case the cost of taking Dlock remains same, 
except the lock window is increased to much higher.
Question is; Is the window so high that it impacts the user experience; 
considering config changes are not often or less concurrent (multiple clients 
changing/updating the config).
By having one single locking scheme, the logic remains simple, stable and 
manageable.

>> " Another way is to use a dlock per ID to only synchronize CRUD operation on 
>> the same ID element"
Is this possible for all config commands...What is "ID" refers to, is it region 
name (region level ops); if so then, does this need parsing the command request 
or is it already available? E.g. create index

>> not sure what's the cost of creating a dlock
The cost depends on who is the dlock grantor? If create request is on the 
grantor itself, its cheaper...If it’s a peer node, than the cost is sending the 
lock request message to the grantor. The cost is with sending message to the 
Grantor, in most cases. Which is not bad, considering the configuration does 
not change frequently.

-Anil.

On 5/28/20, 11:08 AM, "Jinmei Liao"  wrote:

Simultaneous updates to configurations are already protected by a different 
dlock, so I assume they can be made safely.

Typically a CMS operation involves two parts:

1) updates to the servers to "realize" the configuration
2) updates to the configurations to "persist" it.

The purpose of the CMS level dlock is to make these two parts atomic, if a 
create/delete operations of the same element happened not in an atomic fashion, 
we would end up with inconsistent state between what's persisted and what's 
realized.

I believe the dlock can be configured to expire after a period of time.

From: Anthony Baker 
Sent: Thursday, May 28, 2020 10:40 AM
    To: dev@geode.apache.org 
Subject: Re: [PROPOSAL] make Cluster Management Service CRUD operations 
thread safe

I think the first question to answer is:  can simultaneous updates to 
configuration be made safely?  Or what is the critical section of code that 
needs to be protected?

Another thing to consider with dlocks is what happens in the failure case 
when the lock is not properly released.  Does it continue to block all 
management /configuration operations?  Does it expire after a period of time?

Anthony


> On May 28, 2020, at 10:17 AM, Jinmei Liao  wrote:
>
> The proposal is proposing using ONE single dlock to synchronize all CMS 
CRUD operations, that means, in a given time, only one CRUD operation in CMS is 
allowed in the entire cluster, this seems a bit too harsh.
>
> Another way is to use a dlock per ID to only synchronize CRUD operation 
on the same ID element. That way, we allow more concurrency. I am just not sure 
what's the cost of creating a dlock. Is the the cost of creating a dlock per ID 
warrants the performance gain?
>
> Comment/Suggestions?
>
> Jinmei
> 
> From: Jinmei Liao 
> Sent: Tuesday, May 26, 2020 1:02 PM
> To: dev@geode.apache.org 
> Subject: [PROPOSAL] make Cluster Management Service CRUD operations 
thread safe
>
> Hi, Geode Community,
>
> Currently, the CMS CRUD operations are not thread safe, if one call tries 
to create a region, and another call tries to delete the same region, if timing 
is off, we could end up with inconsistent state (what's in cluster config and 
what's actually on the server). So we should make these operations thread safe. 
Here is the proposal to try to achieve it:
>
> 
https://nam04.safelinks.protection.outlook.com/?url=https%3A%2F%2Fcwiki.apache.org%2Fconfluence%2Fdisplay%2FGEODE%2FMake%2BCluster%2BManagement%2BService%2528CMS%2529%2BThread%2BSafedata=02%7C01%7Cjiliao%40vmware.com%7C32464d1d0bb74930fe7508d8033a15bc%7Cb39138ca3cee4b4aa4d6cd83d9dd62f0%7C0%7C0%7C637262895297900183sdata=dBuD9e4FaqAe74Xf0fn%2FdJBMlMbOAXWcx0wi0tCFNfY%3Dreserved=0
>
> Comments/suggestions welcome.
>
> Jinmei




Re: [PROPOSAL] make Cluster Management Service CRUD operations thread safe

2020-05-28 Thread Anilkumar Gingade
Yes, the DLock machinery handles (has option) dlock grantor departure...

As I understand, right now we have dlock at config persistence layer, but this 
does not guarantee preserving the order in which the config changes are 
applied. E.g.: A create region command followed by destroy could be persisted 
in reverse order.

And the proposal here is to move the lock from persist layer to higher level to 
preserve the ordering. In this case the cost of taking Dlock remains same, 
except the lock window is increased to much higher.
Question is; Is the window so high that it impacts the user experience; 
considering config changes are not often or less concurrent (multiple clients 
changing/updating the config). 
By having one single locking scheme, the logic remains simple, stable and 
manageable.
 
>> " Another way is to use a dlock per ID to only synchronize CRUD operation on 
>> the same ID element"
Is this possible for all config commands...What is "ID" refers to, is it region 
name (region level ops); if so then, does this need parsing the command request 
or is it already available? E.g. create index 
 
>> not sure what's the cost of creating a dlock
The cost depends on who is the dlock grantor? If create request is on the 
grantor itself, its cheaper...If it’s a peer node, than the cost is sending the 
lock request message to the grantor. The cost is with sending message to the 
Grantor, in most cases. Which is not bad, considering the configuration does 
not change frequently. 

-Anil.

On 5/28/20, 11:08 AM, "Jinmei Liao"  wrote:

Simultaneous updates to configurations are already protected by a different 
dlock, so I assume they can be made safely.

Typically a CMS operation involves two parts:

1) updates to the servers to "realize" the configuration
2) updates to the configurations to "persist" it.

The purpose of the CMS level dlock is to make these two parts atomic, if a 
create/delete operations of the same element happened not in an atomic fashion, 
we would end up with inconsistent state between what's persisted and what's 
realized.

I believe the dlock can be configured to expire after a period of time.

From: Anthony Baker 
Sent: Thursday, May 28, 2020 10:40 AM
    To: dev@geode.apache.org 
Subject: Re: [PROPOSAL] make Cluster Management Service CRUD operations 
thread safe

I think the first question to answer is:  can simultaneous updates to 
configuration be made safely?  Or what is the critical section of code that 
needs to be protected?

Another thing to consider with dlocks is what happens in the failure case 
when the lock is not properly released.  Does it continue to block all 
management /configuration operations?  Does it expire after a period of time?

Anthony


> On May 28, 2020, at 10:17 AM, Jinmei Liao  wrote:
>
> The proposal is proposing using ONE single dlock to synchronize all CMS 
CRUD operations, that means, in a given time, only one CRUD operation in CMS is 
allowed in the entire cluster, this seems a bit too harsh.
>
> Another way is to use a dlock per ID to only synchronize CRUD operation 
on the same ID element. That way, we allow more concurrency. I am just not sure 
what's the cost of creating a dlock. Is the the cost of creating a dlock per ID 
warrants the performance gain?
>
> Comment/Suggestions?
>
> Jinmei
> 
> From: Jinmei Liao 
> Sent: Tuesday, May 26, 2020 1:02 PM
> To: dev@geode.apache.org 
> Subject: [PROPOSAL] make Cluster Management Service CRUD operations 
thread safe
>
> Hi, Geode Community,
>
> Currently, the CMS CRUD operations are not thread safe, if one call tries 
to create a region, and another call tries to delete the same region, if timing 
is off, we could end up with inconsistent state (what's in cluster config and 
what's actually on the server). So we should make these operations thread safe. 
Here is the proposal to try to achieve it:
>
> 
https://nam04.safelinks.protection.outlook.com/?url=https%3A%2F%2Fcwiki.apache.org%2Fconfluence%2Fdisplay%2FGEODE%2FMake%2BCluster%2BManagement%2BService%2528CMS%2529%2BThread%2BSafedata=02%7C01%7Cagingade%40vmware.com%7Cebd6b928654b4946771e08d803322884%7Cb39138ca3cee4b4aa4d6cd83d9dd62f0%7C0%7C0%7C637262861250474025sdata=24psRmhotSCG%2BzjULBohdcyyJ3y%2FsWKii99ipdPFGRE%3Dreserved=0
>
> Comments/suggestions welcome.
>
> Jinmei




Re: [PROPOSAL] make Cluster Management Service CRUD operations thread safe

2020-05-28 Thread Jinmei Liao
Simultaneous updates to configurations are already protected by a different 
dlock, so I assume they can be made safely.

Typically a CMS operation involves two parts:

1) updates to the servers to "realize" the configuration
2) updates to the configurations to "persist" it.

The purpose of the CMS level dlock is to make these two parts atomic, if a 
create/delete operations of the same element happened not in an atomic fashion, 
we would end up with inconsistent state between what's persisted and what's 
realized.

I believe the dlock can be configured to expire after a period of time.

From: Anthony Baker 
Sent: Thursday, May 28, 2020 10:40 AM
To: dev@geode.apache.org 
Subject: Re: [PROPOSAL] make Cluster Management Service CRUD operations thread 
safe

I think the first question to answer is:  can simultaneous updates to 
configuration be made safely?  Or what is the critical section of code that 
needs to be protected?

Another thing to consider with dlocks is what happens in the failure case when 
the lock is not properly released.  Does it continue to block all management 
/configuration operations?  Does it expire after a period of time?

Anthony


> On May 28, 2020, at 10:17 AM, Jinmei Liao  wrote:
>
> The proposal is proposing using ONE single dlock to synchronize all CMS CRUD 
> operations, that means, in a given time, only one CRUD operation in CMS is 
> allowed in the entire cluster, this seems a bit too harsh.
>
> Another way is to use a dlock per ID to only synchronize CRUD operation on 
> the same ID element. That way, we allow more concurrency. I am just not sure 
> what's the cost of creating a dlock. Is the the cost of creating a dlock per 
> ID warrants the performance gain?
>
> Comment/Suggestions?
>
> Jinmei
> 
> From: Jinmei Liao 
> Sent: Tuesday, May 26, 2020 1:02 PM
> To: dev@geode.apache.org 
> Subject: [PROPOSAL] make Cluster Management Service CRUD operations thread 
> safe
>
> Hi, Geode Community,
>
> Currently, the CMS CRUD operations are not thread safe, if one call tries to 
> create a region, and another call tries to delete the same region, if timing 
> is off, we could end up with inconsistent state (what's in cluster config and 
> what's actually on the server). So we should make these operations thread 
> safe. Here is the proposal to try to achieve it:
>
> https://nam04.safelinks.protection.outlook.com/?url=https%3A%2F%2Fcwiki.apache.org%2Fconfluence%2Fdisplay%2FGEODE%2FMake%2BCluster%2BManagement%2BService%2528CMS%2529%2BThread%2BSafedata=02%7C01%7Cjiliao%40vmware.com%7C63f2edc4f0dd43257ae608d8032e4c41%7Cb39138ca3cee4b4aa4d6cd83d9dd62f0%7C0%7C0%7C637262844668586785sdata=5fj2V6IWdKjkIpEq5pUL86IkAvPKS5QYSHPuy3qlFoE%3Dreserved=0
>
> Comments/suggestions welcome.
>
> Jinmei



Re: [PROPOSAL] make Cluster Management Service CRUD operations thread safe

2020-05-28 Thread Anthony Baker
I think the first question to answer is:  can simultaneous updates to 
configuration be made safely?  Or what is the critical section of code that 
needs to be protected?

Another thing to consider with dlocks is what happens in the failure case when 
the lock is not properly released.  Does it continue to block all management 
/configuration operations?  Does it expire after a period of time?

Anthony


> On May 28, 2020, at 10:17 AM, Jinmei Liao  wrote:
> 
> The proposal is proposing using ONE single dlock to synchronize all CMS CRUD 
> operations, that means, in a given time, only one CRUD operation in CMS is 
> allowed in the entire cluster, this seems a bit too harsh.
> 
> Another way is to use a dlock per ID to only synchronize CRUD operation on 
> the same ID element. That way, we allow more concurrency. I am just not sure 
> what's the cost of creating a dlock. Is the the cost of creating a dlock per 
> ID warrants the performance gain?
> 
> Comment/Suggestions?
> 
> Jinmei
> 
> From: Jinmei Liao 
> Sent: Tuesday, May 26, 2020 1:02 PM
> To: dev@geode.apache.org 
> Subject: [PROPOSAL] make Cluster Management Service CRUD operations thread 
> safe
> 
> Hi, Geode Community,
> 
> Currently, the CMS CRUD operations are not thread safe, if one call tries to 
> create a region, and another call tries to delete the same region, if timing 
> is off, we could end up with inconsistent state (what's in cluster config and 
> what's actually on the server). So we should make these operations thread 
> safe. Here is the proposal to try to achieve it:
> 
> https://nam04.safelinks.protection.outlook.com/?url=https%3A%2F%2Fcwiki.apache.org%2Fconfluence%2Fdisplay%2FGEODE%2FMake%2BCluster%2BManagement%2BService%2528CMS%2529%2BThread%2BSafedata=02%7C01%7Cbakera%40vmware.com%7C08663df2f1c846dddaaf08d8032d9d8f%7Cb39138ca3cee4b4aa4d6cd83d9dd62f0%7C0%7C0%7C637262841746056995sdata=eayoO7sOyyFgSsOP%2FXiN2U1L5YTbbukmBVkVmXtR3jI%3Dreserved=0
> 
> Comments/suggestions welcome.
> 
> Jinmei



Re: [PROPOSAL] make Cluster Management Service CRUD operations thread safe

2020-05-28 Thread Jinmei Liao
The proposal is proposing using ONE single dlock to synchronize all CMS CRUD 
operations, that means, in a given time, only one CRUD operation in CMS is 
allowed in the entire cluster, this seems a bit too harsh.

Another way is to use a dlock per ID to only synchronize CRUD operation on the 
same ID element. That way, we allow more concurrency. I am just not sure what's 
the cost of creating a dlock. Is the the cost of creating a dlock per ID 
warrants the performance gain?

Comment/Suggestions?

Jinmei

From: Jinmei Liao 
Sent: Tuesday, May 26, 2020 1:02 PM
To: dev@geode.apache.org 
Subject: [PROPOSAL] make Cluster Management Service CRUD operations thread safe

Hi, Geode Community,

Currently, the CMS CRUD operations are not thread safe, if one call tries to 
create a region, and another call tries to delete the same region, if timing is 
off, we could end up with inconsistent state (what's in cluster config and 
what's actually on the server). So we should make these operations thread safe. 
Here is the proposal to try to achieve it:

https://nam04.safelinks.protection.outlook.com/?url=https%3A%2F%2Fcwiki.apache.org%2Fconfluence%2Fdisplay%2FGEODE%2FMake%2BCluster%2BManagement%2BService%2528CMS%2529%2BThread%2BSafedata=02%7C01%7Cjiliao%40vmware.com%7C06dd93ddb8f947b017cd08d80325d7cb%7Cb39138ca3cee4b4aa4d6cd83d9dd62f0%7C0%7C0%7C637262808357520050sdata=l0fbTN%2Fb1QK%2F8fnNHYbFQ6CtywAgKZpTE0Jv33ab%2BUk%3Dreserved=0

Comments/suggestions welcome.

Jinmei