[jira] [Commented] (HBASE-20828) Finish-up AMv2 Design/List of Tenets/Specification of operation

2018-10-23 Thread stack (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16661524#comment-16661524
 ] 

stack commented on HBASE-20828:
---

I need to write up what is in here. The subtasks have changed AMv2 for the 
better. Changes like HBASE-21278, where we no longer try to roll back 
successful procedures but instead have the parent schedule new, compensatory 
Procedures, need evangelizing. Ditto the background task that tries to limit 
our backlog of master proc WALs. TODO.

> Finish-up AMv2 Design/List of Tenets/Specification of operation
> ---
>
> Key: HBASE-20828
> URL: https://issues.apache.org/jira/browse/HBASE-20828
> Project: HBase
> Issue Type: Umbrella
> Components: amv2
> Reporter: stack
> Priority: Major
>
> AMv2 is missing a specification. There are still too many grey areas. Also 
> missing is a concise listing of the tenets of AMv2 operation. Here are some 
> examples:
>  * HBASE-19529 "Handle null states in AM": Asks how we should treat a null 
> state in hbase:meta. What does it 'mean'? We seem to treat it differently 
> depending on context. Needs clarification. [~Apache9] recently asked 
> something similar about the meaning of OFFLINE.
>  * Logging needs to have a particular form to help trace Procedure progress; 
> needs a write-up.
> Let's fill in items to address in this umbrella issue. We can address them in 
> subissues and produce a specification doc too. We have the below, but these 
> are mostly (incomplete) descriptions for devs of pv2 and amv2; the 
> specification is missing:
> http://hbase.apache.org/book.html#pv2
> http://hbase.apache.org/book.html#amv2
> (Other areas include addressing what is up w/ rollback -- when, how much, and 
> when it is not appropriate -- as well as recommendations on Procedure 
> coarseness and locking -- is it ok to lock the table in an alter table 
> procedure for the life of the procedure? -- and so on).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-20828) Finish-up AMv2 Design/List of Tenets/Specification of operation

2018-09-12 Thread stack (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16612522#comment-16612522
 ] 

stack commented on HBASE-20828:
---

How to address the issue where a STUCK procedure holds up the freeing of master 
proc WALs? If a Master crashes, replaying all edits, though most may belong to 
finished procedures, can keep the Master occupied for a good amount of time 
reconstructing in-memory AMv2 state. Chatting w/ Duo: could we do a Procedure 
Store that is region-based? What would it take? What would the model look 
like? It would at a minimum retire one of the three methods of writing WALs to 
HDFS that we currently have (MasterProcWAL has its own way of doing appends). 
TODO.



[jira] [Commented] (HBASE-20828) Finish-up AMv2 Design/List of Tenets/Specification of operation

2018-08-23 Thread stack (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16590242#comment-16590242
 ] 

stack commented on HBASE-20828:
---

Another idea that has been coming up in discussion (suggested by [~Apache9]) is 
that a Procedure should be able to say 'holdLock' for only a subset of its 
lifetime. The illustration is ModifyTable, which should be able to release the 
global lock once the reassigns start. TODO.



[jira] [Commented] (HBASE-20828) Finish-up AMv2 Design/List of Tenets/Specification of operation

2018-07-26 Thread stack (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16559208#comment-16559208
 ] 

stack commented on HBASE-20828:
---

Explain that there are "two dimensions" of exclusivity when we talk about 
Procedure execution ([~Apache9]'s words). There is exclusion around the entity 
that the Procedure is working on -- the Region, etc. -- and for this the 
framework has a locking facility (hasLock, etc.). But there is currently 
nothing in the framework to prevent multi-threaded execution of a Procedure's 
steps, which is possible if one worker thread's execution runs into a 
"suspend". See HBASE-20939 for discussion. The case is rare. Ideally the 
framework would ensure single-threaded execution, but for now, until we get 
more experience, see the 'trick' in HBASE-20939.



[jira] [Commented] (HBASE-20828) Finish-up AMv2 Design/List of Tenets/Specification of operation

2018-07-21 Thread stack (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16551734#comment-16551734
 ] 

stack commented on HBASE-20828:
---

More:

Currently the Master does not run the cluster shutdown -- i.e. the close of 
regions -- but rather the RSs notice the cluster shutdown and do their own 
closing. The Master gets notice of each CLOSE but, since it did not initiate 
the CLOSE, it complains that it knows nothing about the Procedure and then 
just ignores it. This is confusing to an operator.

And this stuff from HBASE-20846 needs adding...

1. Make the hasLock method final, and add a locked field in Procedure to record 
whether we have the lock. We set it to true in doAcquireLock and to false in 
doReleaseLock. The sub procedures do not need to manage it any more.

2. Also add a locked field to the proto message. When storing, the field is set 
according to the return value of hasLock. When loading, there is a new field in 
Procedure called lockedWhenLoading, which we set to true if the locked field in 
the proto message is true.

3. We cannot set the locked field directly to true by calling doAcquireLock 
because, during initialization, most procedures need to wait until the master 
is initialized. So the solution is a new method in Procedure called 
waitInitialized; the code that waits for master initialization moves from 
acquireLock into this method. We also add a restoreLock method to Procedure: if 
lockedWhenLoading is true, it calls acquireLock to get the lock but does not 
set locked to true. Later, when we call doAcquireLock and pass the 
waitInitialized check, we test lockedWhenLoading; if it is true, we just set 
the locked field to true and return, without actually calling acquireLock, 
since we have already called it once.
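
The three points above can be sketched roughly as follows. This is a minimal 
illustration, not the actual HBase Procedure class: SketchProcedure and 
CountingProcedure are hypothetical names, the real methods take an environment 
argument and return richer types, and two details are assumptions of this 
sketch: waitInitialized returning true to mean "must still wait", and 
lockedWhenLoading being cleared once it has been consumed.

```java
// Hypothetical, simplified stand-in for Procedure; NOT the real HBase class.
abstract class SketchProcedure {
  private boolean locked;            // true once doAcquireLock has succeeded
  private boolean lockedWhenLoading; // copied from the persisted 'locked' flag

  // Hooks implemented by concrete procedures (names follow the comment above).
  // Assumption: waitInitialized returns true while the master is not ready.
  protected abstract boolean waitInitialized();
  protected abstract boolean acquireLock();
  protected abstract void releaseLock();

  // Point 1: final, backed by the 'locked' field.
  public final boolean hasLock() {
    return locked;
  }

  // Point 2: called when the procedure is loaded from the store.
  final void setLockedWhenLoading(boolean persistedLocked) {
    this.lockedWhenLoading = persistedLocked;
  }

  // Point 3: called before workers start. Takes the underlying lock but
  // deliberately leaves 'locked' false.
  final void restoreLock() {
    if (lockedWhenLoading) {
      acquireLock();
    }
  }

  final boolean doAcquireLock() {
    if (waitInitialized()) {
      return false; // master not initialized yet; try again later
    }
    if (lockedWhenLoading) {
      // The lock was already taken in restoreLock; do not take it twice.
      lockedWhenLoading = false; // assumption: the flag is one-shot
      locked = true;
      return true;
    }
    if (acquireLock()) {
      locked = true;
      return true;
    }
    return false;
  }

  final void doReleaseLock() {
    locked = false;
    releaseLock();
  }
}

// Hypothetical concrete procedure that counts calls to acquireLock.
class CountingProcedure extends SketchProcedure {
  int acquireCalls = 0;
  boolean masterInitialized = false;

  @Override protected boolean waitInitialized() { return !masterInitialized; }
  @Override protected boolean acquireLock() { acquireCalls++; return true; }
  @Override protected void releaseLock() { }
}
```

With a CountingProcedure loaded as locked, restoreLock calls acquireLock once, 
and the later doAcquireLock only flips the locked field, so acquireLock is not 
called a second time.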





[jira] [Commented] (HBASE-20828) Finish-up AMv2 Design/List of Tenets/Specification of operation

2018-07-18 Thread stack (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16547915#comment-16547915
 ] 

stack commented on HBASE-20828:
---

Another idea (from [~Apache9] down in HBASE-20878): "Maybe we could introduce a 
state called ABNORMALLY_CLOSED, which indicates that the region will be 
processed by SCP."

Also, explain lasthost in hbase:meta. See the comment in HBASE-20878: "The 
lastHost should not be used for critical condition... (See HBASE-20792)". Or, 
explain all fields in hbase:meta and how they change -- after consideration 
(e.g. in the case of info:sn/lasthost, the prescription is not to rely on it).





[jira] [Commented] (HBASE-20828) Finish-up AMv2 Design/List of Tenets/Specification of operation

2018-07-05 Thread stack (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16533750#comment-16533750
 ] 

stack commented on HBASE-20828:
---

It would have to be more than a boolean so it can say whether the lock is 
exclusive or not, and then there are the locks that are kept for the life of 
the procedure vs. just for each procedure step. The way the framework handles 
nonces automatically is kinda nice. Would be cool if we could do the same for 
locks. See ProcedureExecutor#loadProcedures, where it reads the nonce from each 
Procedure... The framework does the serialization of nonces for the Procedure 
in the background.



[jira] [Commented] (HBASE-20828) Finish-up AMv2 Design/List of Tenets/Specification of operation

2018-07-05 Thread Duo Zhang (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16533637#comment-16533637
 ] 

Duo Zhang commented on HBASE-20828:
---

[~stack] I do not think it will change the master startup too much; we just 
restore the locks before starting the workers. As for serializing the locks, I 
think storing the action is enough, i.e., just add a boolean field to the 
procedure proto message to indicate whether the procedure has the lock. If it 
is true after loading, we call acquireLock on the procedure? Oh, there may be a 
problem: in the normal execution of a procedure we always call acquireLock, 
which would now be redundant... Needs some hack...



[jira] [Commented] (HBASE-20828) Finish-up AMv2 Design/List of Tenets/Specification of operation

2018-07-05 Thread stack (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16533621#comment-16533621
 ] 

stack commented on HBASE-20828:
---

bq.  ...we do not restore the locks when loading procedures. 
bq. Seems like we need to serialize the locks the procedure has acquired. After 
master restarts, we can execute exactly those procedures which are running 
before master restart

Good one [~allan163]. Dropping locks across a Master failure is a bad bug. We 
need a basic fix for branch-2.0. Could we do as we do for nonces, serializing 
them and restoring them post-crash? Putting up locks on WAL replay will change 
the character of Master startup but should make it more robust.





[jira] [Commented] (HBASE-20828) Finish-up AMv2 Design/List of Tenets/Specification of operation

2018-07-05 Thread Allan Yang (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16533310#comment-16533310
 ] 

Allan Yang commented on HBASE-20828:
---

Seems like we need to serialize the locks the procedure has acquired. After the 
master restarts, we can then execute exactly those procedures which were 
running before the restart.



[jira] [Commented] (HBASE-20828) Finish-up AMv2 Design/List of Tenets/Specification of operation

2018-07-05 Thread Duo Zhang (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16533292#comment-16533292
 ] 

Duo Zhang commented on HBASE-20828:
---

Oh, it seems the above way of restoring locks is still not enough. Think of a 
procedure whose holdLock returns true but which has no sub procedures, and 
which is in the middle of execution when the master restarts. We also need to 
restore its lock, otherwise there may be a problem if other procedures can 
acquire the lock before it...



[jira] [Commented] (HBASE-20828) Finish-up AMv2 Design/List of Tenets/Specification of operation

2018-07-04 Thread Duo Zhang (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16533239#comment-16533239
 ] 

Duo Zhang commented on HBASE-20828:
---

[~allan163] has found a problem when restarting the master: we do not restore 
the locks when loading procedures. We then found that the assumption in 
MasterProcedureScheduler.waitRegions is not correct, as the parent procedure of 
a RegionTransitionProcedure may not hold the table lock (think of SCP).

So I think there are two problems here which need to be fixed.

The first is that we need to restore the locks when loading procedures. A first 
thought is that, after loading all the procedures and the procedure execution 
stacks, we scan all the procedures which have sub procedures; then, for every 
stack, we start from the root procedure and test its holdLock method. If it 
returns true, we call its acquireLock method to get the lock. Not sure if there 
are still corner cases. [~allan163] PTAL.
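
A rough sketch of that first thought, under stated assumptions: LoadedProc and 
LockRestorer are hypothetical names, holdLock is modelled as a plain boolean, 
and acquireLock just records that it was called. It is one reading of the 
description (walk each stack root-first, considering only roots that have sub 
procedures), not the eventual patch.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a procedure loaded from the procedure store.
class LoadedProc {
  final long procId;
  final boolean holdLock; // what the procedure's holdLock method would return
  final List<LoadedProc> subProcs = new ArrayList<>();
  boolean lockRestored = false;

  LoadedProc(long procId, boolean holdLock) {
    this.procId = procId;
    this.holdLock = holdLock;
  }

  // Stand-in for calling acquireLock on the real procedure.
  void acquireLock() {
    lockRestored = true;
  }
}

class LockRestorer {
  // Scan only stacks whose root has sub procedures, as in the description.
  // Returns the number of locks restored.
  static int restoreLocks(List<LoadedProc> roots) {
    int restored = 0;
    for (LoadedProc root : roots) {
      if (!root.subProcs.isEmpty()) {
        restored += restoreFrom(root);
      }
    }
    return restored;
  }

  // Root-first walk: restore this procedure's lock, then visit its children.
  private static int restoreFrom(LoadedProc proc) {
    int restored = 0;
    if (proc.holdLock) {
      proc.acquireLock();
      restored++;
    }
    for (LoadedProc child : proc.subProcs) {
      restored += restoreFrom(child);
    }
    return restored;
  }
}
```

Because the scan skips procedures without sub procedures, a lone procedure 
whose holdLock is true is never visited and its lock is not restored.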

As for the waitRegions method, I think we should apply the patch in 
HBASE-20846, i.e., always try to acquire the shared lock. But the 
implementation of the procedure lock needs a bit of modification. If the parent 
procedure already holds the exclusive lock, then instead of returning false to 
make the procedure wait, we should return true to let the procedure go on. 
Locks which are already held by parent procedures should also be considered as 
held by their sub procedures. This is OK because we can be sure that the parent 
procedure will not release the lock before the sub procedures finish: it can 
only be executed again after all of its sub procedures have finished.
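
The proposed lock rule can be illustrated with a small sketch. SketchLock is a 
hypothetical class, not the lock used by MasterProcedureScheduler, and for 
brevity it checks only the direct parent rather than every ancestor.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch of the proposed rule; NOT the real HBase lock classes.
class SketchLock {
  private Long exclusiveOwner = null; // procId holding the exclusive lock
  private final Set<Long> sharedHolders = new HashSet<>();

  boolean tryExclusive(long procId) {
    if (exclusiveOwner != null || !sharedHolders.isEmpty()) {
      return false;
    }
    exclusiveOwner = procId;
    return true;
  }

  // parentId is null for a root procedure.
  boolean tryShared(long procId, Long parentId) {
    if (exclusiveOwner == null) {
      sharedHolders.add(procId);
      return true;
    }
    // The exclusive lock is taken. Let the caller go on only when the holder
    // is its parent: the parent's lock counts as held by the sub procedure.
    return parentId != null && exclusiveOwner.equals(parentId);
  }
}
```

In this sketch a sub procedure of the exclusive holder proceeds without being 
added to the shared holders, while an unrelated procedure still has to wait.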



[jira] [Commented] (HBASE-20828) Finish-up AMv2 Design/List of Tenets/Specification of operation

2018-07-01 Thread Duo Zhang (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16529295#comment-16529295
 ] 

Duo Zhang commented on HBASE-20828:
---

Talked with [~stack] offline about the high level design of AMv2. This is what 
we think would be useful to make the implementation cleaner:

1. A region will be created in CLOSED state.
2. For a region in a state other than OPEN, there will be a RIT associated with 
it, and it will eventually be transitioned to OPEN state (unless we are 
disabling the table).
3. Regions of a disabled table are in CLOSED state.
4. The typical transition path: OPEN -> CLOSING -> CLOSED -> OPENING -> OPEN.
5. If a server crashes, all regions on it will be transitioned to CLOSED 
directly, without going through CLOSING state.
6. A RIT procedure will never fail unless the target server crashes, i.e., 
only SCP can break the execution of a RIT procedure.

#6 is the most difficult part here. What I have in mind is that we get rid of 
AssignProcedure, UnassignProcedure and MoveRegionProcedure, and introduce only 
a single RegionTransition procedure. It can cover the whole transition life 
cycle of a region, and can also start/stop at a particular state, i.e., for 
creating a table or for SCP we could start from CLOSED state directly, and if 
we are disabling a table we could let it stop at CLOSED state.

The advantage here is that we can make sure there is only one RIT procedure for 
a region. In the old days, MRP was not a RIT so we could miss it... Also, the 
logic for breaking the execution of the RIT in SCP becomes easier: if there is 
already a RIT for the region, just tell it what to do; if not, schedule one. We 
no longer need to fear that, if there is an MRP, after we fail the 
UnassignProcedure it will soon schedule an AssignProcedure and cause something 
to go wrong...
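
To make the start/stop idea concrete, here is a minimal sketch of the 
transition path as a state machine. SketchRegionState and 
SketchRegionTransition are hypothetical names for illustration only; the sketch 
models just the four states of the typical path and the crash shortcut from 
tenet 5, with no persistence, locking or RPC.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical enum covering only the states on the typical path.
enum SketchRegionState {
  OPEN, CLOSING, CLOSED, OPENING;

  // Next state on the path OPEN -> CLOSING -> CLOSED -> OPENING -> OPEN.
  SketchRegionState next() {
    switch (this) {
      case OPEN:
        return CLOSING;
      case CLOSING:
        return CLOSED;
      case CLOSED:
        return OPENING;
      default:
        return OPEN; // OPENING
    }
  }
}

class SketchRegionTransition {
  // States visited by one transition procedure that starts at 'start' and
  // stops the first time it reaches 'stop'. At least one step is always
  // taken, so OPEN -> OPEN models a full move.
  static List<SketchRegionState> path(SketchRegionState start,
                                      SketchRegionState stop) {
    List<SketchRegionState> visited = new ArrayList<>();
    SketchRegionState current = start;
    visited.add(current);
    do {
      current = current.next();
      visited.add(current);
    } while (current != stop);
    return visited;
  }

  // Tenet 5: on server crash the region goes straight to CLOSED.
  static SketchRegionState onServerCrash(SketchRegionState current) {
    return SketchRegionState.CLOSED;
  }
}
```

Under these assumptions, creating a table is path(CLOSED, OPEN), disabling a 
table is path(OPEN, CLOSED), and a move is path(OPEN, OPEN), all handled by the 
same procedure.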

