[jira] [Commented] (HBASE-12439) Procedure V2

2018-03-06 Thread Chance Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-12439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16388965#comment-16388965
 ] 

Chance Li commented on HBASE-12439:
---

[~stack]

thanks, sir.

{quote}less throughput, but way less CPU used (better 95th percentiles, etc), 
and handlers free for other tasks{quote}

Yes sir, this is our purpose.

{quote}Would it make sense to put this limit behind a configuration 
gate?{quote}
I think we need it. It is mainly there to let us 'remove' the limit, even 
though we can't totally remove it because of the added counter on the path.

{quote}Also of note is how your compares show the asyncwal being slightly 
better than old wal.{quote}
Let me check again. Note that this async_wal is one of the Durability types; 
it is not the WAL itself.

[~carp84] ok, I will work on it.

> Procedure V2
> 
>
> Key: HBASE-12439
> URL: https://issues.apache.org/jira/browse/HBASE-12439
> Project: HBase
>  Issue Type: New Feature
>  Components: master
>Affects Versions: 2.0.0
>Reporter: Matteo Bertozzi
>Priority: Major
>  Labels: reliability
> Attachments: ProcedureV2b.pdf, Procedurev2Notification-Bus.pdf, 
> Procedurev2Notification-BusRoadmap.pdf
>
>
> Procedure v2 (aka Notification Bus) aims to provide a unified way to build:
> * multi-steps procedure with a rollback/rollforward ability in case of 
> failure (e.g. create/delete table)
> ** HBASE-12070
> * notifications across multiple machines (e.g. ACLs/Labels/Quotas cache 
> updates)
> ** Make sure that every machine has the grant/revoke/label
> ** Enforce "space limit" quota across the namespace
> ** HBASE-10295 eliminate permanent replication zk node
> * procedures across multiple machines (e.g. Snapshots)
> * coordinated long-running procedures (e.g. compactions, splits, ...)
> * Synchronous calls, with the ability to see the state/result in case of 
> failure.
> ** HBASE-11608 sync split
> still work in progress/initial prototype: https://reviews.apache.org/r/27703/



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-12439) Procedure V2

2018-03-02 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-12439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16383338#comment-16383338
 ] 

stack commented on HBASE-12439:
---

Link to doc on Pv2 for Devs

> Procedure V2
> 
>
> Key: HBASE-12439
> URL: https://issues.apache.org/jira/browse/HBASE-12439
> Project: HBase
>  Issue Type: New Feature
>  Components: master
>Affects Versions: 2.0.0
>Reporter: Matteo Bertozzi
>Priority: Major
>  Labels: reliability
> Attachments: ProcedureV2b.pdf, Procedurev2Notification-Bus.pdf, 
> Procedurev2Notification-BusRoadmap.pdf
>
>


[jira] [Commented] (HBASE-12439) Procedure V2

2017-08-15 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-12439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16128121#comment-16128121
 ] 

stack commented on HBASE-12439:
---

Moving out of 2.0.0. The notification bus subtask will not be done for the 2.0 
release.

> Procedure V2
> 
>
> Key: HBASE-12439
> URL: https://issues.apache.org/jira/browse/HBASE-12439
> Project: HBase
>  Issue Type: New Feature
>  Components: master
>Affects Versions: 2.0.0
>Reporter: Matteo Bertozzi
>Assignee: Matteo Bertozzi
>  Labels: reliability
> Attachments: ProcedureV2b.pdf, Procedurev2Notification-Bus.pdf, 
> Procedurev2Notification-BusRoadmap.pdf
>
>


[jira] [Commented] (HBASE-12439) Procedure V2

2016-12-29 Thread Enis Soztutar (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-12439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15786374#comment-15786374
 ] 

Enis Soztutar commented on HBASE-12439:
---

Thanks, that would be great. 

> Procedure V2
> 
>
> Key: HBASE-12439
> URL: https://issues.apache.org/jira/browse/HBASE-12439
> Project: HBase
>  Issue Type: New Feature
>  Components: master
>Affects Versions: 2.0.0
>Reporter: Matteo Bertozzi
>Assignee: Matteo Bertozzi
>  Labels: reliability
> Fix For: 2.0.0
>
> Attachments: Procedurev2Notification-Bus.pdf, 
> Procedurev2Notification-BusRoadmap.pdf
>
>


[jira] [Commented] (HBASE-12439) Procedure V2

2016-12-29 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-12439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15786372#comment-15786372
 ] 

stack commented on HBASE-12439:
---

[~enis] There is not much. I'm putting together what I can find and will post 
it over in HBASE-14350. There is more to do. I was going to have a chat with 
[~syuanjiang]. Probably a good thing to have a sync on. Will put out a call.

> Procedure V2
> 
>
> Key: HBASE-12439
> URL: https://issues.apache.org/jira/browse/HBASE-12439
> Project: HBase
>  Issue Type: New Feature
>  Components: master
>Affects Versions: 2.0.0
>Reporter: Matteo Bertozzi
>Assignee: Matteo Bertozzi
>  Labels: reliability
> Fix For: 2.0.0
>
> Attachments: Procedurev2Notification-Bus.pdf, 
> Procedurev2Notification-BusRoadmap.pdf
>
>


[jira] [Commented] (HBASE-12439) Procedure V2

2016-12-29 Thread Enis Soztutar (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-12439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15786351#comment-15786351
 ] 

Enis Soztutar commented on HBASE-12439:
---

Thanks Stack for putting these docs out. Do we have a lower-level write-up on 
the assignment manager design? The high-level doc is there, but it does not 
talk about anything specific to locking, ordering, how servers are managed, 
etc. It has been some time since I looked at the recent state (of the AM), but 
do the new patches cover the whole design, or is there more to do? 

> Procedure V2
> 
>
> Key: HBASE-12439
> URL: https://issues.apache.org/jira/browse/HBASE-12439
> Project: HBase
>  Issue Type: New Feature
>  Components: master
>Affects Versions: 2.0.0
>Reporter: Matteo Bertozzi
>Assignee: Matteo Bertozzi
>  Labels: reliability
> Fix For: 2.0.0
>
> Attachments: Procedurev2Notification-Bus.pdf, 
> Procedurev2Notification-BusRoadmap.pdf
>
>


[jira] [Commented] (HBASE-12439) Procedure V2

2016-12-29 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-12439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15786047#comment-15786047
 ] 

stack commented on HBASE-12439:
---

I removed the ProcedureV2.pdf doc because it is subsumed by the existing, 
attached ProcedureV2Notification-Bus.pdf doc.

> Procedure V2
> 
>
> Key: HBASE-12439
> URL: https://issues.apache.org/jira/browse/HBASE-12439
> Project: HBase
>  Issue Type: New Feature
>  Components: master
>Affects Versions: 2.0.0
>Reporter: Matteo Bertozzi
>Assignee: Matteo Bertozzi
>  Labels: reliability
> Fix For: 2.0.0
>
> Attachments: Procedurev2Notification-Bus.pdf, 
> Procedurev2Notification-BusRoadmap.pdf
>
>


[jira] [Commented] (HBASE-12439) Procedure V2

2015-09-01 Thread Stephen Yuan Jiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-12439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14725701#comment-14725701
 ] 

Stephen Yuan Jiang commented on HBASE-12439:


Updated the roadmap based on the discussion between [~mbertozzi] and 
[~syuanjiang] and feedback from the August dev meetup. Basically, the focus 
for the upcoming 2.0 release is to have the new assignment manager use Proc-V2.

> Procedure V2
> 
>
> Key: HBASE-12439
> URL: https://issues.apache.org/jira/browse/HBASE-12439
> Project: HBase
>  Issue Type: New Feature
>  Components: master
>Affects Versions: 2.0.0
>Reporter: Matteo Bertozzi
>Assignee: Matteo Bertozzi
>  Labels: reliability
> Attachments: Procedure v2 roadmap.pdf, ProcedureV2.pdf, 
> Procedurev2Notification-Bus.pdf
>
>


[jira] [Commented] (HBASE-12439) Procedure V2

2015-06-29 Thread Stephen Yuan Jiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-12439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14607551#comment-14607551
 ] 

Stephen Yuan Jiang commented on HBASE-12439:


We completed Phase 1 of PV2 in the HBase 1.1 release and made some 
enhancements in the HBase 1.2 release. We are going to continue with Phase 2 
of the PV2 work in HBase 1.3. Attached is the document with the list of items 
in the pipeline.

> Procedure V2
> 
>
> Key: HBASE-12439
> URL: https://issues.apache.org/jira/browse/HBASE-12439
> Project: HBase
>  Issue Type: New Feature
>  Components: master
>Affects Versions: 2.0.0
>Reporter: Matteo Bertozzi
>Assignee: Matteo Bertozzi
>  Labels: reliability
> Attachments: ProcedureV2.pdf, Procedurev2Notification-Bus.pdf
>
>


[jira] [Commented] (HBASE-12439) Procedure V2

2015-02-03 Thread Matteo Bertozzi (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-12439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14303025#comment-14303025
 ] 

Matteo Bertozzi commented on HBASE-12439:
-

{quote}What is the diagram that talks about "branch coordinators"? Does not 
seem mentioned in the text.{quote}
Yeah, in the text there is a passing line but nothing more: "Assuming no other 
coordinator between the Master and the Region Server, the operation is sent 
down to the executor (Region Server) and the Master will be responsible to 
retry/resend the operation".
I was thinking of a multi-master case where each master (the branch 
coordinator in the pic) is responsible for a set of RSs and the root master 
(the root coordinator in the pic) does the coordination between the masters. 
But that's not important; it is just an implementation detail of how the 
"first level" of the procedure is implemented.

> Procedure V2
> 
>
> Key: HBASE-12439
> URL: https://issues.apache.org/jira/browse/HBASE-12439
> Project: HBase
>  Issue Type: New Feature
>  Components: master
>Affects Versions: 2.0.0
>Reporter: Matteo Bertozzi
>Assignee: Matteo Bertozzi
>Priority: Minor
> Attachments: ProcedureV2.pdf, Procedurev2Notification-Bus.pdf
>
>


[jira] [Commented] (HBASE-12439) Procedure V2

2015-02-03 Thread Matteo Bertozzi (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-12439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14303017#comment-14303017
 ] 

Matteo Bertozzi commented on HBASE-12439:
-

{quote}FATE calls the above "adempotent", since the step can be partially 
done or failed. So the step should work over the result of a partial execution 
from a previous attempt. For example, a step for creating a dir for the table 
in hdfs should not fail if the directory is already there.{quote}
Here the logic is the same: once you execute a step, if there is a 
non-retryable failure in the code, a rollback step will be called.
The logic to revert a partial step is the responsibility of the 
execute()/rollback() implementation, not of the framework. The framework only 
knows whether a step is supposed to be executed or rolled back; it has no 
knowledge of what you are doing.
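
A minimal sketch of the split described above, written in Python for brevity and not taken from the patch. The hypothetical `run_procedure` driver only decides whether a step is executed or rolled back; each step supplies its own execute/rollback logic. `RetryableError`, `max_retries`, and the `(name, execute, rollback)` tuple layout are assumptions made for illustration.

```python
from typing import Callable, List, Tuple


class RetryableError(Exception):
    """Hypothetical marker for failures that should be retried, not rolled back."""


# (name, execute, rollback): the driver never looks inside the callables.
Step = Tuple[str, Callable[[], None], Callable[[], None]]


def _rollback(started: List[Step]) -> None:
    # Undo in reverse order; each rollback must cope with a partially applied step.
    for _, _, rollback in reversed(started):
        rollback()


def run_procedure(steps: List[Step], max_retries: int = 3) -> bool:
    """Run steps in order; on a non-retryable failure, roll back what was started."""
    started: List[Step] = []
    for step in steps:
        _, execute, _ = step
        started.append(step)  # the failing step may itself be partially applied
        for attempt in range(max_retries + 1):
            try:
                execute()
                break
            except RetryableError:
                if attempt == max_retries:
                    _rollback(started)
                    return False
            except Exception:
                _rollback(started)
                return False
    return True
```

Note that the failing step's own rollback is called too, which is why reverting a partial step has to be handled by the step implementation rather than by the driver.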

{quote}I think we should address fencing as a first level goal, and mention it 
in the state store implementation. If we make it explicit in the store, 
alternative implementations, if any, have to take that into account.{quote}
Agreed, but I'm not at that point yet. I'm still making sure that 
execution/rollback works as expected.

{quote}This is easy to work around. We can have two state store 
implementations. One is a smaller-scale zk-based one, for doing bootstrap. The 
other is for usual operations. However, I think we still do not need a table 
yet, but a state store can be implemented as a region opened in master. This 
way, we do not have to re-implement yet another wal, and custom in-memory data 
structures. Let me experiment with this approach on top of this patch.{quote}
The reason I chose the wal was to support assignment: all the logged events 
would probably trigger too many flushes and compactions, and we don't really 
need this data to be compacted. But maybe simply tuning the region to avoid 
compaction and relying on TTL would be just fine and avoid the problem. I 
didn't look into it too much; if you have time to experiment with it, feel 
free to post a patch or just suggestions on how to change it.

> Procedure V2
> 
>
> Key: HBASE-12439
> URL: https://issues.apache.org/jira/browse/HBASE-12439
> Project: HBase
>  Issue Type: New Feature
>  Components: master
>Affects Versions: 2.0.0
>Reporter: Matteo Bertozzi
>Assignee: Matteo Bertozzi
>Priority: Minor
> Attachments: ProcedureV2.pdf, Procedurev2Notification-Bus.pdf
>
>


[jira] [Commented] (HBASE-12439) Procedure V2

2015-02-02 Thread Enis Soztutar (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-12439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14302746#comment-14302746
 ] 

Enis Soztutar commented on HBASE-12439:
---

Thanks Matteo. This is good. Similar to what has been discussed in other jiras, 
but with some implementation this time. 
bq. Suggest add JIRA number to doc. Suggest a sentence on how PV2 is NOT FATE. 
Add the word 'idempotent' in around this text "...in a way that each step must 
be able to be executed multiple times (generating the same result) a..." 
(although if a rollback, I suppose it not idempotent?)
The way I see it, FATE execution is a stack, whereas here it is a DAG. FATE 
calls the above "adempotent", since the step can be partially done or failed. 
So the step should work over the result of a partial execution from a previous 
attempt. For example, a step for creating a dir for the table in hdfs should 
not fail if the directory is already there. 
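
As a small illustration of the directory example, here is a sketch in Python that uses the local filesystem as a stand-in for hdfs. The function name `create_table_dir` and the path layout are hypothetical; the point is only that re-running the step after a partial execution succeeds instead of failing.

```python
import os
import tempfile


def create_table_dir(root: str, table: str) -> str:
    """Create the table directory; succeed if a previous attempt already made it."""
    path = os.path.join(root, table)
    # exist_ok=True makes the step safe to re-run over a partial execution.
    os.makedirs(path, exist_ok=True)
    return path


# Running the step twice leaves the same result and raises no error.
with tempfile.TemporaryDirectory() as root:
    create_table_dir(root, "t1")
    create_table_dir(root, "t1")
```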

What is the diagram that talks about "branch coordinators"? Does not seem 
mentioned in the text. 

I think we should address fencing as a first level goal, and mention it in the 
state store implementation. If we make it explicit in the store, alternative 
implementations, if any, have to take that into account. Fencing is really 
important because the current master lacks it, and that is a potential cause 
of wreaking havoc on the cluster. Proper fencing can only be achieved through 
the store, and only if the active master does a state store operation for 
every action. For example, the master can do a "register master" procedure as 
a way to commit its state and prevent the previous master from doing any more 
operations. I could not see a use of fencing through the wal (or recover 
lease, etc.) in the patch. 
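
One way to picture store-level fencing is an epoch that the store hands out on "register master" and checks on every write. This is a hedged sketch in Python, not what the patch does; `StateStore`, `register_master`, `append`, and `FencedError` are hypothetical names, and a real store would have to make the epoch durable.

```python
from typing import List


class FencedError(Exception):
    """Raised when a master with a stale epoch tries to write."""


class StateStore:
    def __init__(self) -> None:
        self._epoch = 0
        self._log: List[str] = []

    def register_master(self) -> int:
        # Registering bumps the epoch, which implicitly fences the previous master.
        self._epoch += 1
        return self._epoch

    def append(self, epoch: int, entry: str) -> None:
        # Every action goes through the store, so every action is epoch-checked.
        if epoch != self._epoch:
            raise FencedError(f"epoch {epoch} is stale, current is {self._epoch}")
        self._log.append(entry)

    def entries(self) -> List[str]:
        return list(self._log)
```

Because the check happens inside the store, an alternative store implementation cannot skip fencing without changing the contract of `append`.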

bq. The main problem of using a table is that you end up with the chicken egg 
problem.
This is easy to work around. We can have two state store implementations. One 
is a smaller-scale zk-based one, for doing bootstrap. The other is for usual 
operations. However, I think we still do not need a table yet, but a state 
store can be implemented as a region opened in master. This way, we do not have 
to re-implement yet another wal, and custom in-memory data structures. Let me 
experiment with this approach on top of this patch. 

I'll take a closer look at the patch as well. 



 





> Procedure V2
> 
>
> Key: HBASE-12439
> URL: https://issues.apache.org/jira/browse/HBASE-12439
> Project: HBase
>  Issue Type: New Feature
>  Components: master
>Affects Versions: 2.0.0
>Reporter: Matteo Bertozzi
>Assignee: Matteo Bertozzi
>Priority: Minor
> Attachments: ProcedureV2.pdf, Procedurev2Notification-Bus.pdf
>
>


[jira] [Commented] (HBASE-12439) Procedure V2

2015-02-02 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-12439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14301960#comment-14301960
 ] 

stack commented on HBASE-12439:
---

bq. No, Procedure is the internal name for "I'm doing something".

It's confusing to swap in Operation for Procedure when you've done all this 
work talking up Procedures and how Procedures are made of zero or more 
Sub-Procedures.

OK on the rest, [~mbertozzi]. Looks great.



> Procedure V2
> 
>
> Key: HBASE-12439
> URL: https://issues.apache.org/jira/browse/HBASE-12439
> Project: HBase
>  Issue Type: New Feature
>  Components: master
>Affects Versions: 2.0.0
>Reporter: Matteo Bertozzi
>Assignee: Matteo Bertozzi
>Priority: Minor
> Attachments: ProcedureV2.pdf, Procedurev2Notification-Bus.pdf
>
>


[jira] [Commented] (HBASE-12439) Procedure V2

2015-02-02 Thread Matteo Bertozzi (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-12439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14301894#comment-14301894
 ] 

Matteo Bertozzi commented on HBASE-12439:
-

{quote}The API is Admin.doOperation(). Should it be Admin.doProcedure? (In 
doc., you start talking about 'operations' when you were talking about 
'procedures' up to this).{quote}
No, Procedure is the internal name for "I'm doing something".
Example: "delete table" maps to a procedure, but "snapshot" may map to 
multiple procedures (depending on how you view the execution, it may be the 
master part + the RS subprocs, or it may be just the "snapshot namespace" with 
the table snapshots as subprocedures).
Another example is create table, if you see the "create table operation" as 
"create procedure" + "assignment procedure". More generally, I use "operation" 
because there may be more work than just calling the procedure.

{quote}"If the master does not receive a response within a timeout, or the 
region was reassigned, it will resend the execution request.", master will just 
retry forever?{quote}
It depends. If the procedure is "assignment", yes. If the procedure is 
"snapshot", it will time out after N sec.

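The two retry policies can be sketched with one hypothetical helper, written in Python. To keep the example deterministic, a bound on the number of attempts stands in for the N-second timeout; `send_until_acked` and `max_attempts` are illustrative names, not part of the design doc.

```python
from typing import Callable, Optional


def send_until_acked(send: Callable[[], bool],
                     max_attempts: Optional[int] = None) -> bool:
    """Resend a request until it is acknowledged.

    max_attempts=None models an "assignment"-style procedure that retries
    forever; a finite value models a "snapshot"-style procedure that gives up.
    """
    attempt = 0
    while max_attempts is None or attempt < max_attempts:
        attempt += 1
        if send():  # True means the executor responded
            return True
    return False
```

With `max_attempts=None` the call only returns once the request is acknowledged, so it should be used only where giving up is not an option.
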
{quote}For the TwoPhaseProcedure, would be good to draw out the steps as you 
have done for the OnePhaseProcedure procedure. Would help me figure if I get 
how this 'staging' stuff works.{quote}
sure, still in progress

{quote}What is this? "(The sync-client implementation can be done for the 2.0 
branch, but we can’t backport that to keep the compatibility. New client 
methods can be added using the procedure)"{quote}
Our Admin is not really sync. For example, create table and similar calls 
depend on the order of the operations on the master side, so if you change the 
code of the handler but not the client, the client becomes async, since the 
operation on the server side may not be completed.
With the procedure you are waiting on the proc to be completed, so you can 
change the server side as much as you want and the client doesn't care. You 
also get master failover for free. E.g. in the middle of Create Table the 
master goes down while the client is spinning on isDone(createProcId); the 
backup master completes the create table and the client receives isDone() = 
true.
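
The failover scenario can be sketched as a client-side polling loop. This is an illustrative Python sketch, not the HBase client API: `Master`, `is_done`, `get_master`, and `wait_for_procedure` are hypothetical names, and a shared dict stands in for the persisted procedure state that the backup master takes over.

```python
import time
from typing import Callable, Dict


class Master:
    """Stand-in for whichever master is currently active."""

    def __init__(self, store: Dict[int, bool]) -> None:
        self.store = store  # shared, persisted procedure state

    def is_done(self, proc_id: int) -> bool:
        return self.store.get(proc_id, False)


def wait_for_procedure(get_master: Callable[[], Master], proc_id: int,
                       poll_interval: float = 0.0, max_polls: int = 100) -> bool:
    """Spin on is_done(proc_id), tolerating a master going away mid-procedure."""
    for _ in range(max_polls):
        try:
            if get_master().is_done(proc_id):
                return True
        except ConnectionError:
            pass  # master is down; the next poll looks up the new active master
        time.sleep(poll_interval)
    return False
```

The client only depends on the procedure id and the done flag, so the server-side steps can change without the client noticing.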

> Procedure V2
> 
>
> Key: HBASE-12439
> URL: https://issues.apache.org/jira/browse/HBASE-12439
> Project: HBase
>  Issue Type: New Feature
>  Components: master
>Affects Versions: 2.0.0
>Reporter: Matteo Bertozzi
>Assignee: Matteo Bertozzi
>Priority: Minor
> Attachments: ProcedureV2.pdf, Procedurev2Notification-Bus.pdf
>
>


[jira] [Commented] (HBASE-12439) Procedure V2

2015-02-02 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-12439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14301738#comment-14301738
 ] 

stack commented on HBASE-12439:
---

Suggest adding the JIRA number to the doc. Suggest a sentence on how PV2 is 
NOT FATE. Add the word 'idempotent' in around this text "...in a way that each 
step must be able to be executed multiple times (generating the same result) 
a..." (although if a rollback, I suppose it is not idempotent?). The API is 
Admin.doOperation(). Should it be Admin.doProcedure? (In the doc, you start 
talking about 'operations' when you were talking about 'procedures' up to 
this.) This is good: "The Region Server will not persist any state, retry or 
re-execute any previously pending operation on restart, everything is 
coordinated by the master". 

On this, "If the master does not receive a response within a timeout, or the 
region was reassigned, it will resend the execution request.", master will just 
retry forever?

For the TwoPhaseProcedure, would be good to draw out the steps as you have done 
for the OnePhaseProcedure procedure. Would help me figure if I get how this 
'staging' stuff works.

What is this? "(The sync-client implementation...")

Doc is excellent.




> Procedure V2
> 
>
> Key: HBASE-12439
> URL: https://issues.apache.org/jira/browse/HBASE-12439
> Project: HBase
>  Issue Type: New Feature
>  Components: master
>Affects Versions: 2.0.0
>Reporter: Matteo Bertozzi
>Assignee: Matteo Bertozzi
>Priority: Minor
> Attachments: ProcedureV2.pdf, Procedurev2Notification-Bus.pdf
>
>
> Procedure v2 (aka Notification Bus) aims to provide a unified way to build:
> * multi-steps procedure with a rollback/rollforward ability in case of 
> failure (e.g. create/delete table)
> ** HBASE-12070
> * notifications across multiple machines (e.g. ACLs/Labels/Quotas cache 
> updates)
> ** Make sure that every machine has the grant/revoke/label
> ** Enforce "space limit" quota across the namespace
> ** HBASE-10295 eliminate permanent replication zk node
> * procedures across multiple machines (e.g. Snapshots)
> * coordinated long-running procedures (e.g. compactions, splits, ...)
> * Synchronous calls, with the ability to see the state/result in case of 
> failure.
> ** HBASE-11608 sync split
> still work in progress/initial prototype: https://reviews.apache.org/r/27703/





[jira] [Commented] (HBASE-12439) Procedure V2

2014-11-11 Thread Matteo Bertozzi (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-12439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14207164#comment-14207164
 ] 

Matteo Bertozzi commented on HBASE-12439:
-

{quote}This will be a huge feature for MTTR and online reliability – why the 
Minor label?{quote}
because it will probably be pushed off to a 1.x or 2.x release, since there are 
a bunch of core changes (e.g. handlers, maybe assignment, ...)

{quote}client submits a "procedure" that it's interested in observing through 
its execution progress{quote}
correct, e.g. when you call create table you get a procedure id that you can 
wait on, or use to check the progress state if the procedure exposes any.
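
A minimal sketch of that client-side pattern, assuming a hypothetical handle 
class (ProcedureHandleSketch, submit, isDone and waitFor are illustrative names, 
not the HBase client API): the caller gets back only an id and later polls or 
blocks on it.

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Supplier;

// Hypothetical sketch; none of these names come from the HBase code base.
final class ProcedureHandleSketch {
    private final AtomicLong nextId = new AtomicLong(1);
    private final Map<Long, CompletableFuture<String>> procedures = new ConcurrentHashMap<>();

    // Starts the work asynchronously and hands back only a procedure id.
    long submit(Supplier<String> work) {
        long procId = nextId.getAndIncrement();
        procedures.put(procId, CompletableFuture.supplyAsync(work));
        return procId;
    }

    // Non-blocking check of whether the procedure has finished.
    boolean isDone(long procId) {
        return procedures.get(procId).isDone();
    }

    // Blocks until the procedure finishes and returns its result.
    String waitFor(long procId) {
        return procedures.get(procId).join();
    }
}
```

Here the "create table" work would be whatever is passed to submit(); a real 
implementation would presumably also expose intermediate progress, which this 
sketch leaves out.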

{quote}a procedure is defined as a DAG of sub-procedures that are required to 
complete procedure execution{quote}
correct

{quote}multiple sub-procedures can be executed in parallel{quote}
correct, the example I have in mind here is Snapshot or EnableTable, where the 
"Enable Procedure" spawns the sub-procedures for assigning each region and they 
can be executed in parallel.
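
As a rough illustration of that fan-out, assuming a thread pool stands in for 
the procedure executor (EnableTableSketch, assignRegion and the region names 
are hypothetical, not HBase code): the parent submits one child per region and 
completes once all children have finished.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch; none of these names come from the HBase code base.
final class EnableTableSketch {
    // Stand-in for an "assign region" sub-procedure; it only returns a status string.
    static String assignRegion(String region) {
        return region + ":OPEN";
    }

    // The parent spawns one sub-procedure per region and waits for all of them.
    static List<String> enableTable(List<String> regions) {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            List<Future<String>> children = new ArrayList<>();
            for (final String region : regions) {
                Callable<String> subProcedure = () -> assignRegion(region);
                children.add(pool.submit(subProcedure));
            }
            List<String> results = new ArrayList<>();
            for (Future<String> child : children) {
                results.add(child.get()); // children run in parallel; collect in submission order
            }
            return results;
        } catch (InterruptedException | ExecutionException e) {
            throw new RuntimeException(e);
        } finally {
            pool.shutdown();
        }
    }
}
```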

{quote}a sub-procedure can define an action that must be taken on multiple 
hosts{quote}
not sure I understand this one. a sub-procedure is an operation. it can be a 
simple "write to meta" or it can be "send a snapshot request to the RS". If you 
are thinking about stuff like Snapshots or ACL cache updates, you basically have 
two components: a Coordinator on the Master side and an Executor on the RS. So 
the Procedure on the master looks like "send the operation to the RSs, wait for 
the ACKs, do the finalization".

{quote}DAG execution progress is tracked through a storage system{quote}
correct, every time a procedure is executed we write the state out, so we can 
resume from that point. If we are in the middle of a create table and the master 
goes down, the backup master can start from stepN of the create table that was 
in progress.
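
A minimal sketch of that resume behaviour, assuming a plain map stands in for 
the durable store and that progress is recorded as "index of the next step" 
(ResumeSketch, the step names and the maxSteps crash simulation are all 
hypothetical):

```java
import java.util.List;
import java.util.Map;

// Hypothetical sketch; none of these names come from the HBase code base.
final class ResumeSketch {
    // Runs steps starting from the last persisted position. maxSteps simulates a
    // master that goes down after executing that many steps in this call.
    static void run(Map<Long, Integer> store, long procId, String[] steps,
                    int maxSteps, List<String> log) {
        int next = store.getOrDefault(procId, 0); // resume point; 0 for a new procedure
        for (int ran = 0; next < steps.length && ran < maxSteps; ran++) {
            log.add(steps[next]);    // "execute" the step
            next++;
            store.put(procId, next); // persist progress before moving on
        }
    }
}
```

Calling run() once with maxSteps = 2 and then again on the same store plays the 
role of the backup master picking up at stepN. Note that a crash between 
executing a step and persisting it would re-run that step, so in this sketch 
steps need to be safe to execute more than once.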

{quote}procedure execution can be halted and reverted at any time{quote}
yes, you send an abort() to the procedure and, if it was started, it is rolled 
back

{quote}completed DAG sub-procedures must be able to roll-back in the event of 
procedure revert{quote}
yes, each step should implement a rollback(), which is called when one of the 
steps fails or the user aborts
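
One way to picture this, as a sketch only: each step carries an execute and a 
rollback, and on failure the completed steps are undone. Undoing in reverse 
order, and not rolling back the step that failed, are assumptions of this 
example; RollbackSketch and its Step interface are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; none of these names come from the HBase code base.
final class RollbackSketch {
    interface Step {
        void execute(List<String> log);
        void rollback(List<String> log);
    }

    // Builds a step that logs what it does, or throws from execute() when fail is true.
    static Step step(final String name, final boolean fail) {
        return new Step() {
            @Override public void execute(List<String> log) {
                if (fail) throw new IllegalStateException(name + " failed");
                log.add("exec " + name);
            }
            @Override public void rollback(List<String> log) {
                log.add("undo " + name);
            }
        };
    }

    // Returns true if every step completed. On a failure, rolls back the steps
    // that had already completed, newest first (an assumption of this sketch).
    static boolean run(List<Step> steps, List<String> log) {
        List<Step> done = new ArrayList<>();
        for (Step s : steps) {
            try {
                s.execute(log);
                done.add(s);
            } catch (RuntimeException e) {
                for (int i = done.size() - 1; i >= 0; i--) {
                    done.get(i).rollback(log);
                }
                return false;
            }
        }
        return true;
    }
}
```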

{quote}procedure execution is tied to transitions through a persisted state 
machine{quote}
correct. 

{quote}all procedures have the same set of states through which they can 
transition{quote}
not sure what you mean, but the framework has its own fixed set of states, 
"runnable/waiting/rollingback/failed/completed", and the user code that 
implements the procedure doesn't care about this stuff. It just cares about 
"step1 -> step2 -> step3"
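
To make the split concrete, here is a small sketch in which the user code is 
just a list of plain steps and only the driver touches the framework states. 
The state names follow the list above; the driver, the transition order it 
records, and the name FrameworkStateSketch are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; only the state names are taken from the comment above.
final class FrameworkStateSketch {
    enum State { RUNNABLE, WAITING, ROLLINGBACK, FAILED, COMPLETED }

    // Drives the user steps and returns the framework states visited.
    // WAITING is listed for completeness; this simple driver never enters it.
    static List<State> drive(List<Runnable> userSteps) {
        List<State> visited = new ArrayList<>();
        visited.add(State.RUNNABLE);
        try {
            for (Runnable step : userSteps) {
                step.run(); // user code only knows "step1 -> step2 -> step3"
            }
            visited.add(State.COMPLETED);
        } catch (RuntimeException e) {
            visited.add(State.ROLLINGBACK);
            visited.add(State.FAILED);
        }
        return visited;
    }
}
```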

{quote}Why implement a separate store? Can we not use a system table for the 
procedure state store?{quote}
The store is just an interface (insert/remove); you can use whatever you want. 
The main problem with using a table is that you end up with a chicken-and-egg 
problem: how can I create a table if I need a table to write the procedure 
state?
I could also say that if you use just a WAL, you can drop the WAL once there 
are no procedures running, so you avoid the compaction overhead and so on, 
but that is just an optimization. The main problem is the startup loop.
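
A sketch of what such a store could look like, assuming an in-memory list 
stands in for the WAL (WalStoreSketch, ProcedureStore, InMemoryWalStore and the 
String-encoded state are hypothetical, not the interface from the patch). It 
shows the insert/remove shape and the "drop the log once nothing is running" 
optimization.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch; none of these names come from the HBase code base.
final class WalStoreSketch {
    interface ProcedureStore {
        void insert(long procId, String state);
        void remove(long procId);
    }

    // Append-only log kept in memory; a real store would write to durable storage.
    static final class InMemoryWalStore implements ProcedureStore {
        private final List<String> log = new ArrayList<>();
        private final Set<Long> active = new HashSet<>();

        @Override public void insert(long procId, String state) {
            log.add("INSERT " + procId + " " + state);
            active.add(procId);
        }

        @Override public void remove(long procId) {
            log.add("REMOVE " + procId);
            active.remove(procId);
            if (active.isEmpty()) {
                log.clear(); // no procedure running: the whole log can be dropped
            }
        }

        int logSize() { return log.size(); }
        int activeCount() { return active.size(); }
    }
}
```

Because the log is simply discarded when it is no longer needed, nothing here 
requires a table to exist first, which is the startup loop the table-backed 
option runs into.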



[jira] [Commented] (HBASE-12439) Procedure V2

2014-11-11 Thread Nick Dimiduk (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-12439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14207125#comment-14207125
 ] 

Nick Dimiduk commented on HBASE-12439:
--

This will be a huge feature for MTTR and online reliability -- why the Minor 
label?

I'm not clear on some of the abstractions. Please comment as to whether the 
below observations are true or false.
 - client submits a "procedure" that it's interested in observing through its 
execution progress
 - a procedure is defined as a DAG of sub-procedures that are required to 
complete procedure execution
 - multiple sub-procedures can be executed in parallel
 - a sub-procedure can define an action that must be taken on multiple hosts
 - DAG execution progress is tracked through a storage system
 - procedure execution can be halted and reverted at any time
 - completed DAG sub-procedures must be able to roll-back in the event of 
procedure revert
 - procedure execution is tied to transitions through a persisted state machine
 - all procedures have the same set of states through which they can transition

Why implement a separate store? Can we not use a system table for the procedure 
state store?



[jira] [Commented] (HBASE-12439) Procedure V2

2014-11-07 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-12439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14202666#comment-14202666
 ] 

stack commented on HBASE-12439:
---

Doc is great. When you have a chance, a few examples would help.
