[jira] [Commented] (HBASE-5487) Generic framework for Master-coordinated tasks

2015-09-01 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14725774#comment-14725774
 ] 

Sergey Shelukhin commented on HBASE-5487:
-

Long live AssignmentManager! :)

> Generic framework for Master-coordinated tasks
> --
>
> Key: HBASE-5487
> URL: https://issues.apache.org/jira/browse/HBASE-5487
> Project: HBase
>  Issue Type: New Feature
>  Components: master, regionserver, Zookeeper
>Affects Versions: 0.94.0
>Reporter: Mubarak Seyed
>Priority: Critical
> Attachments: Entity management in Master - part 1.pdf, Entity 
> management in Master - part 1.pdf, Is the FATE of Assignment Manager 
> FATE.pdf, Region management in Master.pdf, Region management in Master5.docx, 
> hbckMasterV2-long.pdf, hbckMasterV2b-long.pdf
>
>
> Need a framework to execute master-coordinated tasks in a fault-tolerant 
> manner. 
> Master-coordinated tasks such as online schema change and delete-range 
> (deleting region(s) based on start/end key) can make use of this framework.
> The advantages of the framework are:
> 1. Eliminate repeated code in Master, ZooKeeper tracker and Region-server for 
> master-coordinated tasks
> 2. Ability to abstract the common functions across Master -> ZK and RS -> ZK
> 3. Easy to plugin new master-coordinated tasks without adding code to core 
> components



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-5487) Generic framework for Master-coordinated tasks

2013-12-17 Thread Jonathan Hsieh (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13850594#comment-13850594
 ] 

Jonathan Hsieh commented on HBASE-5487:
---

Having snapshots succeed while splits, merges and alters are in progress can be 
handled with the open synchronization point.  Having snapshots succeed through 
failovers would require some major revamping.  We can file that issue -- roughly, 
it would mean coordinating based on region name instead of region server name 
(non-trivial work).

> Generic framework for Master-coordinated tasks
> --
>
> Key: HBASE-5487
> URL: https://issues.apache.org/jira/browse/HBASE-5487
> Project: HBase
>  Issue Type: New Feature
>  Components: master, regionserver, Zookeeper
>Affects Versions: 0.94.0
>Reporter: Mubarak Seyed
>Assignee: Sergey Shelukhin
>Priority: Critical
> Attachments: Entity management in Master - part 1.pdf, Entity 
> management in Master - part 1.pdf, Is the FATE of Assignment Manager 
> FATE.pdf, Region management in Master.pdf, Region management in Master5.docx, 
> hbckMasterV2-long.pdf, hbckMasterV2b-long.pdf
>
>
> Need a framework to execute master-coordinated tasks in a fault-tolerant 
> manner. 
> Master-coordinated tasks such as online schema change and delete-range 
> (deleting region(s) based on start/end key) can make use of this framework.
> The advantages of the framework are:
> 1. Eliminate repeated code in Master, ZooKeeper tracker and Region-server for 
> master-coordinated tasks
> 2. Ability to abstract the common functions across Master -> ZK and RS -> ZK
> 3. Easy to plugin new master-coordinated tasks without adding code to core 
> components





[jira] [Commented] (HBASE-5487) Generic framework for Master-coordinated tasks

2013-12-16 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13849396#comment-13849396
 ] 

Sergey Shelukhin commented on HBASE-5487:
-

That's an interesting one. Given that snapshots by default have no guarantees 
wrt consistent writes between regions (or do they?), it seems like a snapshot 
should get the latest schema in case of a concurrent alter. Is there any 
consideration (other than the arguably implementation-level issue of not 
recovering from close-open) that would prevent that? For consistent snapshots, 
presumably the schema can be snapshotted first; I am assuming they don't stop 
the world and just take seqId/mvcc/ts or something, so the newer values with 
the new schema will just not exist.



[jira] [Commented] (HBASE-5487) Generic framework for Master-coordinated tasks

2013-12-16 Thread Jonathan Hsieh (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13849403#comment-13849403
 ] 

Jonathan Hsieh commented on HBASE-5487:
---

The problem isn't that you would get snapshots with inconsistent schemas if the 
two operations were issued concurrently.  It is that open is async and outside 
the table write lock, which means the snapshot would fail because the region 
may not have been open. 

This is a particular case where we would want the open routines to act 
synchronously with table alters and split daughter region opens (both open 
before the table lock is released and the snapshot can happen).





[jira] [Commented] (HBASE-5487) Generic framework for Master-coordinated tasks

2013-12-16 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13849452#comment-13849452
 ] 

Sergey Shelukhin commented on HBASE-5487:
-

IMHO this, in case of opens, promotes not being fault tolerant. In large 
clusters you cannot get around servers failing and regions closing and 
reopening. Snapshot should just be able to ride over that. Splits are more 
interesting.
Esp. if snapshots are used more (MR over snapshots), it may be nonviable to 
prevent splits and other operations for the duration of every snapshot, alter, 
...



[jira] [Commented] (HBASE-5487) Generic framework for Master-coordinated tasks

2013-12-16 Thread Andrew Purtell (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13849682#comment-13849682
 ] 

Andrew Purtell commented on HBASE-5487:
---

MR over snapshots is already a terrible idea from a security perspective. 



[jira] [Commented] (HBASE-5487) Generic framework for Master-coordinated tasks

2013-12-16 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13849732#comment-13849732
 ] 

Sergey Shelukhin commented on HBASE-5487:
-

Yet, it's a very good idea from a perf perspective, and logical given that many 
large MR jobs don't need realtime data. Snapshots can still be secured, and 
table-level granularity is sufficient for most cases, I'd suspect.
Regardless, it was just an example here.
MR over snapshots can be discussed in HBASE-8369 :)



[jira] [Commented] (HBASE-5487) Generic framework for Master-coordinated tasks

2013-12-16 Thread Andrew Purtell (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13849738#comment-13849738
 ] 

Andrew Purtell commented on HBASE-5487:
---

bq. Snapshots can still be secured

This is debatable, and that is my point for bringing it up here. All of the 
enterprise customers I interact with universally want more than table-level 
granularity, which is why we spent so much time on cell granularity features 
recently - all of which are totally defeated by MR over snapshots. 

Bringing up MR over snapshots as technical justification for other arguments 
needs the qualification that MR over snapshots itself may have limited 
applicability.



[jira] [Commented] (HBASE-5487) Generic framework for Master-coordinated tasks

2013-12-16 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13849798#comment-13849798
 ] 

Sergey Shelukhin commented on HBASE-5487:
-

There are other justifications... my point is that having a lock over 
distributed, lengthy operations on tables, esp. with region-level component 
blocking table-level ops also, is the king of all epic locks, and can cause 
lots of problems, esp. in large clusters. Snapshot is just one example.



[jira] [Commented] (HBASE-5487) Generic framework for Master-coordinated tasks

2013-12-12 Thread Jonathan Hsieh (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13846861#comment-13846861
 ] 

Jonathan Hsieh commented on HBASE-5487:
---

Matteo and Aleks bring up an interesting case that any new master design should 
handle.  HBASE-10136



[jira] [Commented] (HBASE-5487) Generic framework for Master-coordinated tasks

2013-12-06 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13841811#comment-13841811
 ] 

Sergey Shelukhin commented on HBASE-5487:
-

Ah well, I never got to part 2. Did you guys make progress on this? I may have 
time to resurrect this again soon.



[jira] [Commented] (HBASE-5487) Generic framework for Master-coordinated tasks

2013-10-19 Thread Jonathan Hsieh (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13799905#comment-13799905
 ] 

Jonathan Hsieh commented on HBASE-5487:
---

[~saint@gmail.com]  acked. 

Let's post design docs here, and move discussions comparing them to the mailing 
list.  

[~sershe] Let's name threads prefixed with [hbase-5487] in the subject, and maybe 
rename subject lines if we get into a more focused discussion that warrants its 
own thread (if one part gets long), and in general reply inline.  (I found this 
interesting: http://en.wikipedia.org/wiki/Posting_style).  

I'll start by copying and pasting unresolved parts of the response-reply above 
to the dev mailing list.




[jira] [Commented] (HBASE-5487) Generic framework for Master-coordinated tasks

2013-10-18 Thread Devaraj Das (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13799465#comment-13799465
 ] 

Devaraj Das commented on HBASE-5487:


Quick comments:
1. Master knows of all external updates to the system store - Are there such 
updates happening without the master's knowledge?
2. I presume once the client is told an operation is accepted, it would be 
saved/queued somewhere so that even if a different node picks up the master's 
duties, it can execute the operation. Related to that, the master should 
be able to get back with the correct return code for the operation even in the 
case of fail-overs. Also, the master could have triggered some operations, 
like shutdown handling, that should be completed.
3. I think we should support asynchronous operations (submit an operation and 
check periodically or something). There is no guarantee when a certain 
operation will complete, especially when the operation requires co-ordination 
with other nodes and/or the node is falling behind in executing operations. We 
shouldn't force the model to be synchronous (we do not want to hold up precious 
node resources, which we would in synchronous mode).
4. Maybe we should explicitly state, as one of the requirements, the handling of 
cases where the master sends a region operation to a regionserver and the 
regionserver doesn't get back within some timeout. Fencing the regionserver, 
etc., are the possible actions when this happens.
5. Should we fail-fast on the client side in case of conflicts? For example, if 
a client issued drop table and this operation is in progress, and another client 
comes in and says create table with the same name. We should allow clients to 
read the store without going through the master.
6. Wondering whether we need to differentiate priorities/ordering/etc. for 
operations like move region initiated by the master/balancer versus initiated 
by the user. Who wins, etc. These operations are advanced and won't be 
commonplace, but worth calling out?
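
To make comments 2 and 3 concrete, here is a minimal Python sketch of a 
submit-then-poll model where an operation is persisted before it is 
acknowledged. Everything in it is an assumption for illustration: 
OperationQueue, OpState and the dict standing in for durable storage are 
hypothetical names, not HBase APIs.

```python
import itertools
from enum import Enum


class OpState(Enum):
    QUEUED = "queued"
    DONE = "done"


class OperationQueue:
    """Hypothetical master-side queue; a dict stands in for durable storage."""

    def __init__(self, store=None):
        # Passing in the store of a previous instance models a master failover.
        self.store = store if store is not None else {}
        self._ids = itertools.count(len(self.store) + 1)

    def submit(self, kind, target):
        """Persist the operation first, then hand the caller an id to poll."""
        op_id = next(self._ids)
        self.store[op_id] = {"kind": kind, "target": target,
                             "state": OpState.QUEUED}
        return op_id

    def run_next(self):
        """Execute the oldest queued operation; return its id, or None."""
        for op_id, op in self.store.items():
            if op["state"] is OpState.QUEUED:
                op["state"] = OpState.DONE  # the real work would happen here
                return op_id
        return None

    def status(self, op_id):
        """Non-blocking status check used by the polling client."""
        return self.store[op_id]["state"]
```

A client would call submit() once and then poll status() with the returned 
id; a new master built over the same store can still finish and report on 
operations accepted by the old one.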



[jira] [Commented] (HBASE-5487) Generic framework for Master-coordinated tasks

2013-10-18 Thread Jonathan Hsieh (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13799576#comment-13799576
 ] 

Jonathan Hsieh commented on HBASE-5487:
---

Yesterday, I shared with Sergey and some of the folks interested in this a draft 
of the design I've been working on (I'll call it the hbck-master) and a list of 
questions related to Sergey's design.  Since Sergey's has master5 in the 
name of the doc, I'll refer to it as master5.  He's answered some questions in 
email, but we should do technical discussions out here.  We'll be working 
together to hash out holes in each other's designs and potentially merge 
the designs.  



I have a lot of questions.  I'll hit the big questions first. Also, would it be 
possible to put a version of this up as a gdoc so we can point out nits and 
places that need minor clarification?  (I have a marked-up physical copy 
of the doc; a gdoc would make it easier to provide feedback.)

Main Concerns:

What is a failure and how do you react to failures? I think the master5 design 
needs to spend more effort on considering failure and recovery cases. I claim 
there are 4 types of responses from a networked IO operation: the two we 
normally deal with, ack successful (ack) and ack failed (nack), plus unknown 
due to a timeout where the operation succeeded (timeout success) and unknown 
due to a timeout where it failed (timeout failed). We have historically missed 
the last two timeout cases or assumed that a timeout means failure (nack). It 
seems that master5 makes the same assumptions. 
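
The four response types can be written down as a small Python sketch. The 
names (Outcome, caller_view, next_step) and the follow-up actions are 
illustrative assumptions, not taken from either design doc; the point is 
only that both timeout cases look the same to the caller and must not be 
folded into nack.

```python
from enum import Enum


class Outcome(Enum):
    ACK = "ack"                        # callee confirmed success
    NACK = "nack"                      # callee confirmed failure
    TIMEOUT_SUCCESS = "timeout-ok"     # no reply, but the operation took effect
    TIMEOUT_FAILED = "timeout-failed"  # no reply, and nothing took effect


def caller_view(outcome):
    """What the caller can actually observe for each real outcome."""
    if outcome is Outcome.ACK:
        return "succeeded"
    if outcome is Outcome.NACK:
        return "failed"
    # Both timeout cases are indistinguishable from the caller's side.
    return "unknown"


def next_step(view):
    """Pick a follow-up; 'unknown' is handled separately from 'failed'."""
    return {
        "succeeded": "proceed",
        "failed": "retry-or-abort",
        "unknown": "verify-or-fence",
    }[view]
```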

I'm very concerned about what we need to do to invalidate cached RS 
information at clients in the case of a hang, since stale information would 
violate the isolation guarantees that we claim to provide.  I really want an 
in-depth slice of failure-handling case analysis, including the master with 
cached RS assignments, for move and for something more complicated such as 
split or alter.

I really want more invariants specified for the FSM states, e.g. if a region is 
in state X, does it have a row in meta? Does it have data on the FS? Is it open 
on another region server? Is it open on only one region server? I think the 8 
pages of tables at the back of the master5 doc can be made more concise and 
precise, which will help us attempt to prove correctness.  
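
As a sketch of what per-state invariants could look like when written down 
explicitly, here is a small Python table plus a checker. The state names and 
the values in the table are made up for illustration and are not taken from 
the master5 or hbck-master docs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Invariant:
    row_in_meta: bool      # must the region have a row in meta?
    data_on_fs: bool       # must the region have data on the FS?
    max_open_servers: int  # upper bound on servers hosting the region


# Hypothetical states and values, for illustration only.
INVARIANTS = {
    "OFFLINE": Invariant(True, True, 0),
    "OPENING": Invariant(True, True, 1),
    "OPENED": Invariant(True, True, 1),
    "CLOSED": Invariant(True, True, 0),
}


def violations(state, row_in_meta, data_on_fs, open_servers):
    """Return the invariants of `state` that the observed facts break."""
    inv = INVARIANTS[state]
    problems = []
    if row_in_meta != inv.row_in_meta:
        problems.append("meta row")
    if data_on_fs != inv.data_on_fs:
        problems.append("fs data")
    if len(open_servers) > inv.max_open_servers:
        problems.append("open on too many servers")
    return problems
```

A table in this shape answers each of the questions above with one lookup 
per state, which is the kind of precision a correctness argument needs.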

Clarification questions:

1)  State update coordination.  What is a state update from the outside?  Do 
RS's initiate splitting on their own?  Maybe a picture would help so we can 
figure out if it is similar to or different from hbck-master's?

2) Single point of truth.  What is this truth? The user-specified 
actions?  What the RS's are reporting?  The last state we were confirmed to be 
at? hbck-master tries to define what single point of truth means by defining 
intended, current, and actual state data with durability properties on each 
kind. What do clients look at, and who modifies what? 

3) Table record: if a region is out of date, it should be closed and reopened. 
It is not clear in master5 how regionservers find out that they are out of 
date. Moreover, how do clients talking to those RS's with stale versions know 
they are going to the correct RS, especially in the face of RS failures due to 
timeout?

4) Region record: transition states.  Shouldn't they be defined as part of the 
region record? (This is really similar to hbck-master's current state and 
intended state.)

5) Note on user operations: the forgetting thing is scary to me -- in your move 
split example, what happens if an RS reads state that is forgotten?  

6) Table state machine. How do we guarantee clients are not writing against 
out-of-date region versions? (In hang situations, regions could be open in 
multiple places -- the hung RS and the new RS the region was assigned to and 
successfully opened on.)  

7) Region state machine.  Earlier drafts had splitting and merge cases.  Are 
they elided in master5, or are they not present any more? How would this get 
extended to handle Jeffrey's distributed log replay/fast write recovery feature?  

8) Logical interactions:  it sounds like master5 allows concurrent operations on 
specific regions and on a specific table (e.g. it will allow moves and splits 
and merges on the same region).  hbck-master (though not fully documented) only 
allows certain region transitions when the table is enabled or if the table is 
disabled.  Are we sure we don't get into race conditions?  What happens if 
disable gets issued -- is it possible for someone to reopen the region and for 
old clients to continue writing to it even though it is closed?

nit. 9) "in cursive" means "in italics". :) 

10) The table operations section has tables which I believe are the actions 
between FSM states in the table or region FSMs.  Is this correct?  Can the 
edges be labeled to describe which steps these transitions correspond to?

Short doc:
nit: Design Constraints, code should: Have AM logic isolated from the 
persistent storage of state.  

[jira] [Commented] (HBASE-5487) Generic framework for Master-coordinated tasks

2013-10-18 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13799619#comment-13799619
 ] 

Sergey Shelukhin commented on HBASE-5487:
-

Answers lifted from email also (some fixes + one answer was modified due to 
clarification here :)).

bq.  What is a failure and how do you react to failures? I think the master5 
design needs to spend more effort on considering failure and recovery cases. I 
claim there are 4 types of responses from a networked IO operation: the two 
we normally deal with, ack successful and ack failed (nack), plus unknown due 
to a timeout that succeeded (timeout success) and unknown due to a timeout that 
failed (timeout failed). We have historically missed the last two cases and 
they aren't considered in the master5 design. 

There are a few considerations. Let me examine if there are other cases than 
these.
I am assuming the collocated table, which should reduce such cases for state 
(probably, if collocated table cannot be written reliably, master must 
stop-the-world and fail over).
When RS contacts master to do state update, it errs on the side of caution - no 
state update, no open region (or split).
Thus, except for the case of multiple masters running, we can always assume RS 
didn't online the region if we don't know about it.
Then, for messages to RS, see Note on messages; they are idempotent so they 
can always be resent.
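
A minimal Python sketch of why idempotent messages make resending safe. The 
RegionServer class and send_with_retry helper are hypothetical stand-ins, and 
a lost reply is simulated by a counter rather than by a real network timeout.

```python
class RegionServer:
    """Toy stand-in for a regionserver; names are illustrative only."""

    def __init__(self):
        self.open_regions = set()
        self.handled = 0

    def open_region(self, region):
        # Idempotent: handling the same request twice leaves the same state.
        self.handled += 1
        self.open_regions.add(region)
        return "opened"


def send_with_retry(rs, region, drop_first_n_replies=0, max_attempts=5):
    """Resend until a reply arrives; dropped replies model timeouts."""
    for attempt in range(1, max_attempts + 1):
        reply = rs.open_region(region)  # the request itself always arrives
        if attempt > drop_first_n_replies:
            return reply, attempt
    return None, max_attempts
```

Even when the first replies are lost and the request is handled several 
times, the region ends up open exactly once, so the sender never has to 
guess whether a timed-out request took effect before resending it.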

bq.  1) State update coordination.  What is a state update from the outside?  
Do RS's initiate splitting on their own?  Maybe a picture would help so we can 
figure out if it is similar to or different from hbck-master's?

Yes, these are RS messages. They are mentioned in some operation descriptions 
in part 2 - opening-opened, closing-closed; splitting, etc.

bq.  2) Single point of truth.  hbck-master tries to define what single point 
of truth means by defining intended, current, and actual state data with 
durability properties on each kind. What do clients look at, and who modifies 
what? 

Sorry, don't understand the question. I mean single source of truth mainly 
about what is going on with the region; it is described in design 
considerations.
I like the idea of intended state, however without more detailed reading I am 
not sure how it works for multiple ops e.g. master recovering the region while 
the user intends to split it, so the split should be executed after it's opened.

bq.  3) Table record: if a region is out of date, it should be closed and 
reopened. It is not clear in master5 how regionservers find out that they are 
out of date. Moreover, how do clients talking to those RS's with stale versions 
know they are going to the correct RS especially in the face of RS failures due 
to timeout?

On alter (and startup if failed), master tries to reopen all regions that are 
out of date.
Regions that are not opened will either pick up the new version when they are 
opened, or (e.g. if they are now Opening with the old version) master discovers 
they are out of date when they are transitioned to Opened by RS, and reopens 
them again.
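
A minimal sketch of that check, assuming each region record carries the schema version it was opened with. `Region`, `regions_to_reopen` and the state names used here are hypothetical illustrations, not the design doc's actual structures.

```python
from dataclasses import dataclass


@dataclass
class Region:
    name: str
    state: str           # e.g. "Opening", "Opened", "Closed"
    schema_version: int  # version the region was (or is being) opened with


def regions_to_reopen(regions, table_version):
    """Return names of Opened regions whose schema version is stale.

    Regions that are not open yet are skipped: they either pick up the new
    version when opened, or are caught by a later call once they reach Opened.
    """
    return [r.name for r in regions
            if r.state == "Opened" and r.schema_version < table_version]
```

Running the same check again when a region transitions from Opening to Opened covers the case where it was opened with the old version.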

As for any case of alter on enabled table, there are no guarantees for clients.
To provide these w/o disable/enable (or logical equivalent of coordinating all 
close-s and open-s), one would need some form of version-time-travel, or 
waiting for versions, or both.

bq.  4) region record: transition states.  This is really similar to 
hbck-master's current state and intended state.  Shouldn't it be defined as 
part of the region record?

I mention somewhere that could be done. One thing is that if several paths are 
possible between states, it's useful to know which is taken.
But do note that I store user intent separately from what is currently going 
on, so they are not exactly similar as far as I see.

bq.  5) Note on user operations: the forgetting thing is scary to me -- in your 
move split example, what happens if an RS reads state that is forgotten?  

I think my description of this might be too vague. State is not forgotten; 
previous intent is forgotten. I.e. if user does several operations in order 
that conflict (e.g. split and then merge), the first one will be canceled 
(safely :)).
Also, RS does not read state as a guideline to what needs to be done.
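
A minimal sketch of "previous intent is forgotten, state is not". The `RegionRecord` class is hypothetical, and for simplicity it treats every new request as conflicting with the pending one, which is an assumption rather than something the design states.

```python
class RegionRecord:
    """Keeps the actual state and the pending user intent separately."""

    def __init__(self, state):
        self.state = state   # what is currently going on; never forgotten
        self.intent = None   # latest user request not yet carried out

    def request(self, new_intent):
        """Record a user operation; return the intent it cancelled, if any."""
        cancelled = self.intent
        self.intent = new_intent
        return cancelled
```

For example, requesting a split and then a merge leaves the state untouched and returns `"split"` as the cancelled intent.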

bq.  6) table state machine. how do we guarantee clients are writing from the 
correct version in the face of failures?

The intent is to fence the WAL for region server, the way we do now. One could 
also use another mechanism.
Perhaps I could specify it more clearly; I think the problem of making sure RS 
is dead is nearly orthogonal.
In my model, due to how opening region is committed to opened, we can only be 
unsure when the region is in Opened state (or similar states such as Splitting 
which are not present in my current version, but will be added).
In that case, in absence of normal transition, we cannot do literally anything 
with the region unless 

[jira] [Commented] (HBASE-5487) Generic framework for Master-coordinated tasks

2013-10-18 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13799634#comment-13799634
 ] 

stack commented on HBASE-5487:
--

Suggest moving this out to the dev mailing list as per the bible quoted below.  
Start a thread there?


From Producing Open Source Software:

Make sure the bug tracker doesn't turn into a discussion forum.
Although it is important to maintain a human presence in the bug
tracker, it is not fundamentally suited to real-time discussion. Think
of it rather as an archiver, a way to organize facts and references
to other discussions, primarily those that take place on mailing lists.

There are two reasons to make this distinction. First, the bug
tracker is more cumbersome to use than the mailing lists (or than
real-time chat forums, for that matter). This is not because bug
trackers have bad user interface design, it's just that their interfaces
were designed for capturing and presenting discrete states, not
free-flowing discussions. Second, not everyone who should be
involved in discussing a given issue is necessarily watching the bug
tracker. Part of good issue management...is to make sure each issue
is brought to the right peoples' attention, rather than requiring every
developer to monitor all issues. In the section called “No
Conversations in the Bug Tracker” in
Chapter 6, Communications, we'll look at ways to make sure people
don't accidentally siphon discussions out of appropriate forums
and into the bug tracker.

Pg. 50 of http://producingoss.com/en/producingoss.pdf


 Generic framework for Master-coordinated tasks
 --

 Key: HBASE-5487
 URL: https://issues.apache.org/jira/browse/HBASE-5487
 Project: HBase
  Issue Type: New Feature
  Components: master, regionserver, Zookeeper
Affects Versions: 0.94.0
Reporter: Mubarak Seyed
Assignee: Sergey Shelukhin
Priority: Critical
 Attachments: Entity management in Master - part 1.pdf, 
 hbckMasterV2-long.pdf, Region management in Master5.docx, Region management 
 in Master.pdf


 Need a framework to execute master-coordinated tasks in a fault-tolerant 
 manner. 
 Master-coordinated tasks such as online-scheme change and delete-range 
 (deleting region(s) based on start/end key) can make use of this framework.
 The advantages of framework are
 1. Eliminate repeated code in Master, ZooKeeper tracker and Region-server for 
 master-coordinated tasks
 2. Ability to abstract the common functions across Master - ZK and RS - ZK
 3. Easy to plugin new master-coordinated tasks without adding code to core 
 components



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (HBASE-5487) Generic framework for Master-coordinated tasks

2013-10-18 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13799660#comment-13799660
 ] 

Sergey Shelukhin commented on HBASE-5487:
-

We need some convention for inline responses in the mailing list (or tell me if 
there's one) :)


[jira] [Commented] (HBASE-5487) Generic framework for Master-coordinated tasks

2013-10-17 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13798467#comment-13798467
 ] 

Sergey Shelukhin commented on HBASE-5487:
-

The doc hasn't been out for long; just clarifying - anyone interested in 
providing feedback for part 1? 
It'd be really nice to start working out implementation details in part 2 with 
some confidence, and/or writing code. Should I assume lack of interest or 
silent agreement to rewrite according to part 1?


[jira] [Commented] (HBASE-5487) Generic framework for Master-coordinated tasks

2013-10-17 Thread Jonathan Hsieh (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13798612#comment-13798612
 ] 

Jonathan Hsieh commented on HBASE-5487:
---

I'm doing a pass, will provide feedback tomorrow.


[jira] [Commented] (HBASE-5487) Generic framework for Master-coordinated tasks

2013-10-16 Thread Nicolas Liochon (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13796530#comment-13796530
 ] 

Nicolas Liochon commented on HBASE-5487:


+1 for Enis' requirements list :-).
I tend to think that AM and meta should be collocated.


[jira] [Commented] (HBASE-5487) Generic framework for Master-coordinated tasks

2013-10-16 Thread Nick Dimiduk (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13796855#comment-13796855
 ] 

Nick Dimiduk commented on HBASE-5487:
-

I'm also a fan of Enis's list, particularly that AM should be understandable by 
simple human beings like myself.

The observation I'll add here is that AM and meta don't necessarily need to be 
collocated. What is necessary is that AM maintain a strongly consistent view of 
the world, at least from what I understand about the current design. That 
requirement can be relaxed iff there's an explicitly distributed state 
management system. Such a system is probably composed out of idempotent 
operations over CRDTs.
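
As an illustration of the general idea only (nothing in this thread proposes a concrete CRDT), here is a minimal state-based sketch: a map from region to a `(version, server)` pair where the larger pair wins on merge. The function name `merge` and the data layout are assumptions.

```python
def merge(a, b):
    """Merge two replicas of a region -> (version, server) map.

    For each region, the entry with the larger (version, server) tuple wins,
    so the merge is commutative, associative, and idempotent.
    """
    merged = dict(a)
    for region, entry in b.items():
        if region not in merged or entry > merged[region]:
            merged[region] = entry
    return merged
```

Because merging the same update twice changes nothing, replicas can exchange state in any order and any number of times and still converge.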

I also question the wisdom of moving away from ZK for management of active 
cluster state, primarily because in our current architecture, that component is 
completely out of band of data operations. Meaning, the activities which put 
stress on the configuration consensus bits are different from the operations 
that put stress on a data provider. (Yes, data activity results in region 
relocation, but that's a maintenance task, not direct involvement.) Moving to 
dependency on collocation unnecessarily conflates those two aspects of the 
system.

If the issues with Zookeeper originate from implementation details, why not fix 
implementation rather than look to a new architecture? For instance, the CoreOS 
folk have a little something called [etcd|https://github.com/coreos/etcd#etcd]. 
Raft specifically may not provide the correct kind of available consensus we 
need; the idea is to examine both the baby and the bathwater.


[jira] [Commented] (HBASE-5487) Generic framework for Master-coordinated tasks

2013-10-16 Thread Nicolas Liochon (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13796905#comment-13796905
 ] 

Nicolas Liochon commented on HBASE-5487:


bq. AM and meta don't necessarily need to be collocated
If they are separated, you double the failure probability, as you need both AM 
and .META. to work. Moreover, speaking to .meta. becomes a distributed problem, 
while it's less the case when they are collocated (only less because of HDFS).
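
A small worked check of the "double the failure probability" claim, assuming the two components fail independently (an assumption; correlated failures would change the numbers). The function name is hypothetical.

```python
def unavailable_probability(p_am, p_meta):
    """Probability that at least one of two independent components is down."""
    return 1 - (1 - p_am) * (1 - p_meta)
```

With both probabilities equal to a small p, the result is 2p - p^2, which is roughly 2p.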

bq. moving away from ZK for management 
I believe we will need it to determine who is the AM lead. I don't really know 
about storing in zookeeper vs. meta. As Jimmy said, however, using zookeeper to 
do rpc calls seems wrong.

I guess this can be decided later. For the requirements, I don't have anything 
to add to Enis' list.


[jira] [Commented] (HBASE-5487) Generic framework for Master-coordinated tasks

2013-10-16 Thread ramkrishna.s.vasudevan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13796982#comment-13796982
 ] 

ramkrishna.s.vasudevan commented on HBASE-5487:
---

Started going through this document.  From my experience with AM, the number 
of states we have and the dependency on ZK callbacks definitely make things a 
bit difficult to understand and track, and the state of truth is spread out.
In the doc, for the create table scenario, there are cases where a create 
table failure on master abort will result in a table with fewer regions than 
actually specified by the client in the split.
The master failover part is another critical area: how we collect the alive 
and dead RS lists and the list of regions that were partially in 
opening/closing and splitting. It is in this failure condition that we end up 
in a lot of hidden areas.  
Will read the document and share ideas, if any.  


[jira] [Commented] (HBASE-5487) Generic framework for Master-coordinated tasks

2013-10-16 Thread ramkrishna.s.vasudevan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13796985#comment-13796985
 ] 

ramkrishna.s.vasudevan commented on HBASE-5487:
---

HBASE-5583 is one such JIRA that handles create table failure cases.  


[jira] [Commented] (HBASE-5487) Generic framework for Master-coordinated tasks

2013-10-16 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13797043#comment-13797043
 ] 

Sergey Shelukhin commented on HBASE-5487:
-

I don't think it can happen on create. Until all regions are moved to Closed 
state after being created (atomically via multi-row tx), table won't leave 
Creating state. If there's failover all regions are erased and created from 
scratch. Create table is rare enough for that to work.

[~enis] Wrt req list, mostly agree, however:
bq. Bulk region operations
Can you please elaborate? Is it the same as modifying several regions' state 
under multi-row lock?

bq. Region operations should be isolated from [snip] table operations 
(disabling / disabled table, schema changes, etc) and cluster shutdown. AM 
[snip] should NEVER know about table state (disable/disabling). 
Strongly disagree with this. If we are doing a bunch of balancing and the user 
disables a table at the same time, we have to handle it.
If user tries to force-assign regions of a table that is halfway thru create, 
we have to handle this. 
For alter, we need to reopen regions, which will have to work w/splits and 
merges (it's covered in my doc).
For what purpose do you want to isolate them?
AM should not know about details e.g. schema logic, but it should know about 
logistics.

bq. No master abort when a region’s state cannot be determined. This results in 
support cases where master cannot start, and without master things become even 
worse. We should “quarantine” the regions if needed absolutely.
That is dangerous. IIRC in my spec I only put master abort if somebody changes 
table state under master; but in general, if a region is in an unknown state 
it's better to make the admin act than to just silently make part of the data 
disappear - that can lead to wrong results.
Perhaps the table needs to be quarantined then.


[jira] [Commented] (HBASE-5487) Generic framework for Master-coordinated tasks

2013-10-16 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13797368#comment-13797368
 ] 

Sergey Shelukhin commented on HBASE-5487:
-

One more update from discussion here:
- we currently have many operations that cannot be monitored other than by side 
effects (create table), or at all. We need a good way for the user to wait for 
operations. Given that we send request to master, and many operations can 
recover from master failure, we cannot use simple async API with request and 
async response (at least not on the lowest level - client library can hide 
master failover and provide that API). The lowest-level master API should 
involve some sort of persistent operation cookie, so that you could still wait 
for operation after failover.
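
A minimal sketch of the persistent operation cookie idea. Everything here is hypothetical: `OperationLog` stands in for whatever durable storage the master would use, and `Master`, `submit`, `complete` and `poll` are illustrative names, not an existing HBase API.

```python
import uuid


class OperationLog:
    """Stand-in for durable storage that survives a master failover."""

    def __init__(self):
        self.operations = {}  # cookie -> {"description": ..., "status": ...}


class Master:
    def __init__(self, log):
        # The log is shared and persistent; the Master object itself is not.
        self.log = log

    def submit(self, description):
        """Persist the operation and hand back a cookie to wait on."""
        cookie = str(uuid.uuid4())
        self.log.operations[cookie] = {"description": description,
                                       "status": "RUNNING"}
        return cookie

    def complete(self, cookie):
        """Mark a previously submitted operation as finished."""
        self.log.operations[cookie]["status"] = "DONE"

    def poll(self, cookie):
        """Any master, including one started after failover, can answer."""
        op = self.log.operations.get(cookie)
        return op["status"] if op else "UNKNOWN"
```

A client library could wrap `poll` in a loop that reconnects to the new master on failover, which gives the simple async API on top of the cookie-based one.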


[jira] [Commented] (HBASE-5487) Generic framework for Master-coordinated tasks

2013-10-16 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13797380#comment-13797380
 ] 

stack commented on HBASE-5487:
--

Are we conflating functionality here (going by last comment above by Sergey)?  
There is AM and then there is another facility that uses AM to run sequences of 
steps to achieve an end (e.g. enable table)?   Or is the notion that a revamped 
AM would do all?  The long-running (enable a table w/ 1M regions) and 
short-term (assign region)?  If it is to do both, I suggest we call the new 
facility GOD.


[jira] [Commented] (HBASE-5487) Generic framework for Master-coordinated tasks

2013-10-16 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13797395#comment-13797395
 ] 

Sergey Shelukhin commented on HBASE-5487:
-

You'd need some way to connect these ends with what AM is doing, so AM will 
have to support operations attached to its actions even if there's separate 
operation management. 
Moreover, you'd find out that these are not steps, they are state goals.
For example, if you are disabling table, you want to close regions. So in case 
of separate operation manager, you might create tasks to close all regions. But 
what if some server fails? Now some of your regions are already closed. 
Separate operations to close region might fail now, but the goal is achieved. 
If I start disabling table and then kill all RS-es, the table is now disabled 
:) But all operations would fail.
State goals fit much more naturally in AM than steps. I want to avoid steps 
as much as possible.
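
A minimal sketch of the difference, using the disable-table example above. All names are hypothetical, and it simplifies by treating regions on a dead RS as Closed, which ignores recovery work a real master would still have to do.

```python
def disable_goal_reached(region_states):
    """Goal of disabling a table: every region is closed, however it got there."""
    return all(state == "Closed" for state in region_states.values())


def close_region_step(region, hosting, live_servers, region_states):
    """One step: ask the hosting RS to close the region. Fails if the RS is dead."""
    if hosting[region] not in live_servers:
        return False
    region_states[region] = "Closed"
    return True


def kill_server(server, hosting, live_servers, region_states):
    """A dead RS no longer serves its regions, so they count as closed here."""
    live_servers.discard(server)
    for region, host in hosting.items():
        if host == server:
            region_states[region] = "Closed"
```

If both servers are killed before the close steps run, every `close_region_step` call returns False, yet `disable_goal_reached` is True: the steps failed but the goal is achieved.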

Stateful (as in, having separate step) multi-step operations are also hard to 
coordinate. In the above example, during recovery, you don't want to reopen 
region if the table is disabling, but you don't know until it's actually 
disabled if the table disable is an external operation. 



[jira] [Commented] (HBASE-5487) Generic framework for Master-coordinated tasks

2013-10-16 Thread Enis Soztutar (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13797409#comment-13797409
 ] 

Enis Soztutar commented on HBASE-5487:
--

I think as a mental exercise to validate the new design, we should think about 
the cases for the following issues opened recently so that we can ensure that 
these classes of problems are eliminated: 

- HBASE-9724 Failed region split is not handled correctly by AM
- HBASE-9721 meta assignment did not timeout
- HBASE-9696 Master recovery ignores online merge znode
- HBASE-9777 Two consecutive RS crashes could lead to their SSH stepping on 
each other's toes and cause master abort
- HBASE-9773 Master aborted when hbck asked the master to assign a region that 
was already online
- HBASE-9525 Move region right after a region split is dangerous
- HBASE-9514 Prevent region from assigning before log splitting is done
- HBASE-9480 Regions are unexpectedly made offline in certain failure conditions
- HBASE-9387 Region could get lost during assignment

bq. Can you please elaborate? Is it the same as modifying several regions' 
state under multi-row lock?
The bulk operations requirement is there so that we do multiple operations in 
parallel, sending openRegions rpcs for multiple regions at the same time, and 
not doing one-by-one assignment. That is all. 

bq. That is dangerous. IIRC in my spec I only put master abort if somebody 
changes table state under master; but in general, if region is in unknown state 
it's better to make admin act, than to just silently disappear part of data - 
that can lead to wrong results.
Quarantining the table or region is fine, but master should not be down because 
of this (for example, a region can fail to open, and you would want to track 
how many times the region failed to open so that you can decide at some point 
that the region should be put in a quarantined state (or failed open state)). I 
think there was some issue with the region bouncing from server to server 
indefinitely. 
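
A minimal sketch of tracking failed opens and quarantining a region instead of aborting the master. `OpenFailureTracker` and the `max_failures` threshold are hypothetical; no specific limit is given in this thread.

```python
from collections import defaultdict


class OpenFailureTracker:
    def __init__(self, max_failures=3):
        self.max_failures = max_failures  # hypothetical threshold
        self.failures = defaultdict(int)  # region -> failed open count
        self.quarantined = set()

    def record_failed_open(self, region):
        """Count a failed open; return True if the region is now quarantined."""
        self.failures[region] += 1
        if self.failures[region] >= self.max_failures:
            self.quarantined.add(region)
        return region in self.quarantined

    def should_retry(self, region):
        """Quarantined regions are left for the admin, not reassigned again."""
        return region not in self.quarantined
```

The assignment loop would consult `should_retry` before moving the region to another server, which stops the indefinite bouncing while the master keeps running.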

For table operations intermixing with region operations, I'll have to read your 
updated doc. 


 Generic framework for Master-coordinated tasks
 --

 Key: HBASE-5487
 URL: https://issues.apache.org/jira/browse/HBASE-5487
 Project: HBase
  Issue Type: New Feature
  Components: master, regionserver, Zookeeper
Affects Versions: 0.94.0
Reporter: Mubarak Seyed
Assignee: Sergey Shelukhin
Priority: Critical
 Attachments: Region management in Master5.docx, Region management in 
 Master.pdf


 Need a framework to execute master-coordinated tasks in a fault-tolerant 
 manner. 
 Master-coordinated tasks such as online-scheme change and delete-range 
 (deleting region(s) based on start/end key) can make use of this framework.
 The advantages of framework are
 1. Eliminate repeated code in Master, ZooKeeper tracker and Region-server for 
 master-coordinated tasks
 2. Ability to abstract the common functions across Master -> ZK and RS -> ZK
 3. Easy to plugin new master-coordinated tasks without adding code to core 
 components



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (HBASE-5487) Generic framework for Master-coordinated tasks

2013-10-16 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13797472#comment-13797472
 ] 

Sergey Shelukhin commented on HBASE-5487:
-

Ok, I split the doc in half. That way it will be easier to read and manage.
Part 1 is ready (as a current version), and describes high level design, 
operation semantics and interaction (I think the latter might be interesting 
for [~jmhsieh]).
It also tries to capture the requirement lists above and high-level 
implementation (whatever is agreed upon to some degree).
Please tell me if something is missing or wrong.

For part 2, I will keep attaching updates. It covers the design of operations: 
state machines, exact steps, how the client tracks them, how recovery works, 
etc. It will follow part 1.



[jira] [Commented] (HBASE-5487) Generic framework for Master-coordinated tasks

2013-10-16 Thread Eric Newton (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13797507#comment-13797507
 ] 

Eric Newton commented on HBASE-5487:


I'm sorry for asking such a basic question... could someone please comment: 
what does AM stand for?

I did a quick search through the ticket and the attachments and it didn't pop 
out at me.



[jira] [Commented] (HBASE-5487) Generic framework for Master-coordinated tasks

2013-10-16 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13797511#comment-13797511
 ] 

Sergey Shelukhin commented on HBASE-5487:
-

AssignmentManager, a class in HBase master. Often but not always, when talking 
about it people also imply a bunch of auxiliary classes around it, like 
ServerShutdownHandler, RegionClosed/OpenedHandler, ZKTable, etc., which together 
implement region assignment in HBase.



[jira] [Commented] (HBASE-5487) Generic framework for Master-coordinated tasks

2013-10-16 Thread Feng Honghua (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13797538#comment-13797538
 ] 

Feng Honghua commented on HBASE-5487:
-

bq. I also question the wisdom of moving away from ZK for management of active 
cluster state...If the issues with Zookeeper originate from implementation 
details, why not fix implementation rather than look to a new architecture?
Using a system table rather than ZK to store state info gives better (cluster 
restart) performance for big clusters with, say, 250K regions. Certainly, if we 
change the way we use ZK (let the master be the single point that reads/writes 
ZK, without using ZK's watch/notify mechanism), there is no correctness/logic 
difference between using a system table and using ZK.



[jira] [Commented] (HBASE-5487) Generic framework for Master-coordinated tasks

2013-10-15 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13795680#comment-13795680
 ] 

Sergey Shelukhin commented on HBASE-5487:
-

[~jmhsieh] I am writing out very detailed operation and failover descriptions 
right now :)



[jira] [Commented] (HBASE-5487) Generic framework for Master-coordinated tasks

2013-10-15 Thread Jonathan Hsieh (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13795737#comment-13795737
 ] 

Jonathan Hsieh commented on HBASE-5487:
---

[~sershe] Looking forward to it!  



[jira] [Commented] (HBASE-5487) Generic framework for Master-coordinated tasks

2013-10-15 Thread Enis Soztutar (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13796362#comment-13796362
 ] 

Enis Soztutar commented on HBASE-5487:
--

I also started a document some time ago, but never got to finish it to the 
level of details I would like. However, I think we can agree on the design 
goals section which I augmented from the discussion so far:

- Robust implementation
- Comprehensive test coverage by mocking server and region assignment states 
(unit testable without MiniCluster and CM stuff)
- Bulk region operations
- Region operations should be isolated from server operations (AM vs SSH, log 
splitting), and table operations (disabling / disabled table, schema changes, 
etc) and cluster shutdown. AM and SSH should NEVER know about table state 
(disable/disabling). Server liveness checks can only be done as an optimization 
(servers can fail after the check is done)
- There should be one source of truth
- Should be compatible with master failover, and concurrent region 
operations (split, RS failover, balancer, etc)
- AM should guarantee that a region can be hosted by a single region server at 
any given time
- AM should be understandable by simple human beings like myself
- Actions for AM should be logged (possibly separately). We would like to be 
able to construct the history for the regions from logs or some persisted 
state. 
- Assignment should be performant and parallelizable. We should target handling 
millions of regions and thousands of servers. A single region assignment should 
complete in under 1 sec. (1 PB of data with 1 GB regions = 1M regions)
- No master abort when a region’s state cannot be determined. An abort results 
in support cases where master cannot start, and without master things become 
even worse. We should “quarantine” the regions if absolutely needed.
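
To make the performance target in the list above concrete, here is a back-of-the-envelope calculation in Python. It assumes decimal units (1 PB = 10^15 bytes, 1 GB = 10^9 bytes), exactly 1 second per assignment, 1000 servers as the low end of "thousands", and perfect parallelism across servers. These are illustrative assumptions, not measurements.

```python
PB = 10**15  # decimal petabyte (assumption)
GB = 10**9   # decimal gigabyte (assumption)

regions = PB // GB                     # number of 1 GB regions in 1 PB of data
serial_seconds = regions * 1.0         # one assignment per second, one at a time
serial_days = serial_seconds / 86400   # seconds in a day

servers = 1000                         # illustrative low end of "thousands of servers"
parallel_minutes = serial_seconds / servers / 60

print(regions, round(serial_days, 1), round(parallel_minutes, 1))
```

Under these assumptions, a fully serial assignment of 1M regions would take more than 11 days, while spreading the work evenly over 1000 servers brings it to roughly a quarter of an hour. That gap is why the bulk and parallel requirements matter.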




[jira] [Commented] (HBASE-5487) Generic framework for Master-coordinated tasks

2013-10-15 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13796384#comment-13796384
 ] 

stack commented on HBASE-5487:
--

@enis The list makes for a pretty good set of requirements.  We used to talk 
100k regions but folks are long past that now, so we are behind the curve 
(Flurry are 250k IIRC), and we may want to tend away from a few large regions 
and more toward many small regions if we can get AM to perform (advantages: 
smaller compression runs, easier to free up WALs, etc)



[jira] [Commented] (HBASE-5487) Generic framework for Master-coordinated tasks

2013-10-14 Thread Jimmy Xiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13794232#comment-13794232
 ] 

Jimmy Xiang commented on HBASE-5487:


Good. I think we are on the same page. 

bq.  just using it as a reliable storage.
We probably won't use ZK as a pure storage. Meta table + cache is a good 
alternative.



[jira] [Commented] (HBASE-5487) Generic framework for Master-coordinated tasks

2013-10-14 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13794304#comment-13794304
 ] 

Sergey Shelukhin commented on HBASE-5487:
-

[~jxiang] by janitor I don't mean the timeout monitor, but something that picks 
up timeouts of non-master ops like open.
It's a rare case and probably never happens in integration tests, but there can 
be a case where an RS is taking too long to open a region.



[jira] [Commented] (HBASE-5487) Generic framework for Master-coordinated tasks

2013-10-14 Thread Jimmy Xiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13794318#comment-13794318
 ] 

Jimmy Xiang commented on HBASE-5487:


I see.



[jira] [Commented] (HBASE-5487) Generic framework for Master-coordinated tasks

2013-10-14 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13794792#comment-13794792
 ] 

Sergey Shelukhin commented on HBASE-5487:
-

Ok, it's harder than I thought, I don't think I will be done today... but I 
think I have a clear picture now that covers the above feedback, so I am trying 
to cover all the failover scenarios and state conflicts.



[jira] [Commented] (HBASE-5487) Generic framework for Master-coordinated tasks

2013-10-14 Thread Eric Newton (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13794862#comment-13794862
 ] 

Eric Newton commented on HBASE-5487:


Accumulo does manage tablet (region) assignment tracking through the metadata 
table, and further, uses a distributed state machine to scale up a little 
beyond a single master node. I have been meaning to write it up, but I've not 
had a chance.

I've not kept up with every HBase improvement, so I don't know if it is 
pertinent... the accumulo metadata table is typically spread out over 50 - 100% 
of the available tablet (region) servers.

Still, the metadata table, and especially the root table(t), is subject to 
hot-spotting on large map/reduce jobs where hundreds (or thousands) of clients 
are learning tablet locations at the same time.  Block caching is important, 
but at some point massive numbers of simultaneous RPC requests to a single node 
cause delays, or even timeouts and failures.

But using accumulo to store accumulo state has scaled well.

Accumulo has 2 frameworks for master tasks:

* master general state processing: a table should be online, assignments are 
recorded and servers repeatedly informed
* FATE processing, where multi-stage operations are saved, executed and 
progress is re-recorded

The first is general maintenance: keeping the system running.  Tablets are 
assigned, unassigned and, in general, balanced.

The second allows for temporal deviance: tablets are taken offline for a merge, 
for example.  The step-by-step allocation of resources and state are walked, 
each step recording progress in zookeeper.
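
The step-by-step pattern described above can be sketched roughly as follows, in Python. This is not Accumulo's FATE API: `StepStore`, `run_operation` and the `crash_before` parameter are hypothetical names, and a plain dict stands in for zookeeper.

```python
class StepStore:
    """In-memory stand-in for a durable store such as ZooKeeper."""

    def __init__(self):
        self.next_step = {}

    def load(self, op_id):
        # Index of the next step to run; 0 for an operation never seen before.
        return self.next_step.get(op_id, 0)

    def save(self, op_id, step_index):
        self.next_step[op_id] = step_index


def run_operation(op_id, steps, store, crash_before=None):
    """Run steps in order, persisting the index of the next step after each one.

    Returns True when all steps are done, False if a simulated crash stopped it.
    """
    start = store.load(op_id)
    for index in range(start, len(steps)):
        if crash_before == index:
            return False  # simulated master failure
        steps[index]()
        store.save(op_id, index + 1)
    return True


log = []
merge_steps = [
    lambda: log.append("take tablets offline"),
    lambda: log.append("merge files"),
    lambda: log.append("bring merged tablet online"),
]

store = StepStore()
first = run_operation("merge-1", merge_steps, store, crash_before=2)
second = run_operation("merge-1", merge_steps, store)
```

The second call resumes at the step after the last recorded one rather than starting over. In a real system a crash can also land between running a step and recording it, so each step would need to be safe to repeat.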





[jira] [Commented] (HBASE-5487) Generic framework for Master-coordinated tasks

2013-10-14 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13794892#comment-13794892
 ] 

stack commented on HBASE-5487:
--

Thanks for the helpful input [~ecn]



[jira] [Commented] (HBASE-5487) Generic framework for Master-coordinated tasks

2013-10-14 Thread Jonathan Hsieh (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13794915#comment-13794915
 ] 

Jonathan Hsieh commented on HBASE-5487:
---

FYI, I've been looking at our support cases, and have been thinking and writing 
up a clean slate design for a master redesign with the problems we've faced in 
the field in mind. I focus a bit more on invariants necessary in the different 
states, state transitions with master interactions, extensibility of the model, 
and on the recovery strategy.  It basically takes a pessimistic view of the 
world and if I had to summarize its spirit, I'd call it the 
hbck-all-the-time.master.

It is currently durable storage agnostic but requires atomic CAS operations 
(single row or single znode should be sufficient). When I re-read this thread 
it could use either of the implementation details described here (zk vs meta, 
etc).  It sounds like being based in hbase is preferred so a little more 
thought is going in that direction.   I'm currently working on examples of how 
to extend it for new features (like fast write recovery aka 
distributed log replay) and proving to myself that it would be immune from 
problems we've encountered before like double assignments, conflicting 
concurrent operations  (especially during recovery), and regions stuck in 
transitions in the face of failures, hangs or juliet pauses.
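
As an illustration of why a single-row (or single-znode) CAS is enough to rule out double assignment, here is a small Python sketch. `RegionStateStore` and `try_assign` are hypothetical names, the lock only stands in for the atomicity the real store would provide, and the "OPENING" state name is a simplification.

```python
import threading


class RegionStateStore:
    """In-memory stand-in for a store with single-row compare-and-set."""

    def __init__(self):
        self._rows = {}
        self._lock = threading.Lock()

    def get(self, region):
        return self._rows.get(region)

    def compare_and_set(self, region, expected, new):
        """Write `new` only if the current value equals `expected`."""
        with self._lock:
            if self._rows.get(region) != expected:
                return False
            self._rows[region] = new
            return True


def try_assign(store, region, server):
    """Claim an unassigned region for `server`; fail if someone got there first."""
    return store.compare_and_set(region, None, ("OPENING", server))


store = RegionStateStore()
won_by_rs1 = try_assign(store, "region-a", "rs1")
won_by_rs2 = try_assign(store, "region-a", "rs2")
```

Only the first claim succeeds; the second caller sees that the row no longer matches what it expected and has to re-read the state before deciding what to do.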

I read Sergey's doc after my first cut and while there are some similarities it 
deviates in other places.  (I definitely want more on the error recovery and 
error prevention mechanics). My hope is to share it sometime later this week so 
that folks can read, discuss and compare the different designs presented at the 
upcoming dev meeting.  Before any jiras are filed for implementation, also 
consider things like upgrades, compatibility and performance.

I'm also hoping I'll have time to take a look at the accumulo master's design 
as well for the discussion.  



[jira] [Commented] (HBASE-5487) Generic framework for Master-coordinated tasks

2013-10-13 Thread Feng Honghua (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13793881#comment-13793881
 ] 

Feng Honghua commented on HBASE-5487:
-

[~jxiang]
bq. to the uncertainty due to ZK, I don't think it is because the way how we 
use it. It is more because ZK doesn't support continuous events. You have to 
set the watch again after each event callback. The problem is that after an 
event is triggered, when we try to get the data, the data could be changed 
again so an event is missed that will cause state jump.
Agree. 'One-time watch' and 'asynchronous event notification' are the root 
cause of the current AM problem (I mentioned it in a comment above, you can 
find it :-) ). And when I said 'because the way we use it', I meant that we use 
ZK's watch/event mechanism: process A (RS) updates ZK, and process B (master) 
gets notified of the update via a watch event. If we use ZK just as reliable 
storage, the same way we would use the meta table, it makes no difference 
whether we use the meta table or ZK (except for the performance difference).
In the scheme that uses the meta table, we adopt another communication pattern 
for tasks (assign/split/merge): the master requests an RS to do something (and 
the master stores the task progress/state in the meta table), the RS reports 
its progress to the master periodically, and the master updates the task 
progress both in memory and in the meta table... Under this scheme we can use 
ZK in place of the meta table and still avoid the earlier 
missed-state-transition problem, since we don't use ZK's watch/event mechanism 
and only use it as reliable storage. Right?
Just to clarify: I think we share the same understanding of this problem, you 
can check my comments above :-)
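
The missed-event problem with one-time watches can be reproduced with a toy model, sketched here in Python. The `Node` class is a deliberately simplified stand-in and not the real ZooKeeper client API; the state names are only illustrative.

```python
class Node:
    """Toy model of a znode with one-time watches (not the real ZooKeeper API)."""

    def __init__(self, data):
        self.data = data
        self._watchers = []
        self.pending_events = []  # notifications queued for async delivery

    def get(self, watcher=None):
        if watcher is not None:
            self._watchers.append(watcher)
        return self.data

    def set(self, data):
        self.data = data
        # Each registered watch fires at most once, then is dropped.
        fired, self._watchers = self._watchers, []
        self.pending_events.extend(fired)

    def deliver_events(self):
        events, self.pending_events = self.pending_events, []
        for callback in events:
            callback()


node = Node("OFFLINE")
observed = []


def on_change():
    # Re-read and re-arm the watch; only the current value is visible.
    observed.append(node.get(watcher=on_change))


observed.append(node.get(watcher=on_change))
node.set("OPENING")   # fires the watch; the notification is queued
node.set("OPENED")    # no watch is registered yet, so no second notification
node.deliver_events()
```

The observer ends up with OFFLINE followed by OPENED and never sees OPENING, which is the state jump discussed in this thread.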



[jira] [Commented] (HBASE-5487) Generic framework for Master-coordinated tasks

2013-10-12 Thread Jimmy Xiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13793436#comment-13793436
 ] 

Jimmy Xiang commented on HBASE-5487:


[~fenghh], as to the uncertainty due to ZK, I don't think it is because of the 
way we use it.  It is more because ZK doesn't support continuous events.  You 
have to set the watch again after each event callback.  The problem is that 
after an event is triggered, when we try to get the data, the data could have 
changed again, so an event is missed, which causes a state jump.

Currently, we do have a region state machine.  However, the machine is not 
strict due to the ZK thing.  We could jump over some state, which means the 
state machine can't be strictly enforced.  If we go without ZK, we can have a 
strict state machine to follow. That will make things much more predictable.
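
A rough Python sketch of what "strict" could mean here: every transition is checked against a table of allowed moves, and anything else is rejected. The state names and the `ALLOWED` table are simplified assumptions for illustration, not the actual HBase region states.

```python
# Hypothetical, simplified transition table (not the real HBase region states).
ALLOWED = {
    "OFFLINE": {"OPENING"},
    "OPENING": {"OPENED", "FAILED_OPEN"},
    "OPENED": {"CLOSING"},
    "CLOSING": {"OFFLINE"},
    "FAILED_OPEN": {"OFFLINE"},
}


class IllegalTransition(Exception):
    pass


class RegionStateMachine:
    def __init__(self, state="OFFLINE"):
        self.state = state

    def transition(self, new_state):
        # Refuse any move that is not listed for the current state.
        if new_state not in ALLOWED[self.state]:
            raise IllegalTransition(f"{self.state} -> {new_state}")
        self.state = new_state


region = RegionStateMachine()
region.transition("OPENING")
region.transition("OPENED")

skipped = RegionStateMachine()
try:
    skipped.transition("OPENED")  # jumps over OPENING
    rejected = False
except IllegalTransition:
    rejected = True
```

With a check like this, a jump over a state shows up as an explicit error instead of being silently accepted.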

[~sershe], to the janitor, I think we don't need it.  Currently, we have a 
timeout monitor.  But it is disabled and will be removed soon I think.  Without 
the monitor, ITBLL with CM runs very well. With 0.96 tip, I tried to run ITBLL 
with CM with aggressive region moving, and it is perfectly fine. If a RS is 
gone, SSH should handle it properly and assign regions.  If there is a janitor, 
it will compete with SSH in this case, which probably does more harm than good.

To make some RS serve the role of master, besides having meta on it, we can 
have some (not all, of course, to make [~jesse_yates] happy :) ) system tables 
on it too. This way, we can support leveled region assignment, i.e. we can 
open some regions before the rest if these regions can be assigned to the 
master RS, or we can open them on this master RS at first, then move them away 
later after the system is fully started. This applies to some special regions 
only, for sure.

Now, we bundle two important modules (master + meta) in one RS. It is critical 
to make sure it has a light load and does not die too often (even better, does 
not die at all). So I think we should move other regions out of the RS once 
it's promoted to be the master one.

I think we should allow only a list of RS with good hardware to be master, if 
not all RS nodes have decent/same hardware.




[jira] [Commented] (HBASE-5487) Generic framework for Master-coordinated tasks

2013-10-11 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13793086#comment-13793086
 ] 

Sergey Shelukhin commented on HBASE-5487:
-

I think it's the approach discussed above.

I will update the doc on Monday; I think I'm sold on the collocated system 
table. 
Initially we can just run an RS that runs the master library and only hosts 
hardcoded system regions as master.
Then probably any RS (with caveats) can host the master regions and act as 
master, so recovery can become much easier.



[jira] [Commented] (HBASE-5487) Generic framework for Master-coordinated tasks

2013-10-11 Thread Aaron T. Myers (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13793208#comment-13793208
 ] 

Aaron T. Myers commented on HBASE-5487:
---

bq. ZK is used for the selection of the primary nn (via the failure 
controllers) but I believe the journal nodes (that do the durable consensus 
logging) does not use ZK at all. Todd Lipcon or Aaron T. Myers can confirm.

I can confirm this. The QJM in the NN uses its own (heavily ZK-inspired) 
consensus protocol, but does not rely on ZK itself. The only thing HDFS 
currently uses ZK for is for the leader election of the active NN, as Jon says 
here.



[jira] [Commented] (HBASE-5487) Generic framework for Master-coordinated tasks

2013-10-10 Thread Feng Honghua (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13791516#comment-13791516
 ] 

Feng Honghua commented on HBASE-5487:
-

bq.Master is the Actor. Having it go across a network to get/set the 'state' in 
a service that is non-transactional wasn't our smartest move.
Regionservers currently report state via ZK. Master reads it from ZK. Would be 
better if RS just reported directly to master.
[~stack] Yes, this is exactly what I proposed in HBASE-9726 :-)

bq.I am wondering whether it makes sense to update the meta table from the 
various regionservers on the region state changes or go via the master.. But 
maybe the master doesn't need to be a bottleneck if possible. A regionserver 
could first update the meta table, and then just notify the master that a 
certain transition was done; the master could initiate the next transition
[~devaraj] It would be better to let the master update the meta table rather 
than let various regionservers do it. The master, being the single actor and 
truth-maintainer, can avoid many tricky bugs/problems. And for frequent state 
changes to the meta table, the regionserver serving the (state) meta table 
would become the bottleneck sooner than the master that issues the update 
requests, so it doesn't matter whether the update requests come from the 
master or from various regionservers.

bq.I prefer not to use ZK since it's kind of the root cause of uncertainty: has 
the master/region server got/processed the event? has the znode hijacked since 
master/region server changes its mind?
We should store the state in meta table which is cached in the memory.
Whether to use coprocessor it is not a big concern to me. If we don't use 
coprocessor, I prefer to use the master as the proxy to do all meta table 
updates. Otherwise, we need to listen to something for updates.
[~jxiang] Agree. IMO ZK alone is not the root cause of uncertainty; the current 
usage pattern of ZK is. The pattern in which the regionserver updates state in 
ZK and the master listens to ZK and updates state in its local memory 
accordingly exhibits too many tricky scenarios/bugs, because the ZK watch is 
one-time (which can result in missed state transitions) and the 
notification/processing is asynchronous (which can lead to delayed/out-of-date 
state in master memory). And by replacing ZK with the meta table, we also need 
to discard this 'RS updates - master listens' pattern, since the meta table 
inherently lacks a listen-notify mechanism :-).

bq.I think ZK got a bad reputation not on its own merit, but on how we use it.
I can see that problems exist but IMHO advantages outweigh the disadvantages 
compared to system table.
Co-located system table, I am not so sure, but so far there's no even 
high-level design for this (for example - do all splits have to go thru 
master/system table now? how does it recover? etc.).
Perhaps we should abstract an async persistence mechanism sufficiently and then 
decide. Whether it would be ZK+notifications, or system table, or memory + wal, 
or colocated system table, or what.
The problem is that the usage inside master of that interface would depend on 
perf characteristics.
Anyway, we can work out the state transitions/concurrency/recovery without 
tying 100% to particular store.
[~sershe] Agree that ZK got a bad reputation not on its own merit, but on how 
we use it, especially if you mean that the master currently relies on ZK 
watch/notification to maintain/update its in-memory region state. IMO this is 
almost the biggest root cause of the problems in the current assignment design. 
If we just used ZK the same way as the meta table, for storing states, it 
would make no big difference whether the states are stored in ZK or in the 
meta table, right (except that the meta table can give much better performance 
for restart of a big cluster with a large number of regions)? But using ZK's 
update/listen pattern does make the difference.

bq.btw, any input on actor model? 
Things queue up operations/notifications (ops) for master; AM runs on timer 
or when queue is non-empty, having as inputs, cluster state (incl. ongoing 
internal actions it ordered before e.g. OPENING state for a region) plus new 
ops from queue, on a single thread; generates new actions (not physically doing 
anything e,g, talking to RS); the ops state and cluster state is persisted; 
then actions are executed on different threads (e.g. messages sent to RS-es, 
etc.), and AM runs again, or sleeps for some time if ops queue is empty.
That is a different model, not sure if it scales for large clusters.
[~sershe] Do operations/notifications mean that the RS reports action progress 
to the master? The master is the single point that updates the state truth (in 
the meta table), and the RS doesn't know where the states are stored and 
doesn't access them directly, right? I think a communication/storage diagram 
would help a lot for a clear overall understanding here :-)


[jira] [Commented] (HBASE-5487) Generic framework for Master-coordinated tasks

2013-10-10 Thread Feng Honghua (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13791520#comment-13791520
 ] 

Feng Honghua commented on HBASE-5487:
-

Since HBASE-9726 was closed as a duplicate of this one, I am copying the 
proposal from HBASE-9726 here for discussion/reference:

The current assignment process (and the split process) relies on ZK for 
communication between master and regionserver. This pattern has two drawbacks: 
  1. For a cluster with a large number of regions (say, 10K-100K regions), ZK 
becomes the bottleneck for cluster restart, because the assignment/split 
status/progress is stored in ZK and ZK has limited write throughput. 
  2. Since ZK's watch is one-time and event notification/processing is 
asynchronous, there is no guarantee that the master (the watcher) is notified 
of the up-to-date status/progress in time. The master therefore relies on 
idempotence for its correctness, which makes the logic/code very hard to 
understand/maintain. 

A new assignment design proposal is as below: 
  1. Assignment/split status/progress is stored in a system table (say 
'assignTable'), like the meta table, rather than in ZK, to improve write 
throughput and hence the performance of restart for a cluster with a large 
number of regions. 
  2. The communication pattern for assignment/split changes as follows: the 
master talks directly with the regionserver (the master issues an assign 
request to the regionserver, and the regionserver reports assign progress back 
to the master) and records the status/progress of each assignment/split in the 
'assignTable'. In case of master failure, the new active master reads the 
'assignTable' to rebuild its knowledge of the ongoing assignment/split tasks 
and continues from there. (The regionserver doesn't write to the 
'assignTable'.) 
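
The proposal above can be sketched roughly as follows, assuming an in-memory 
dictionary in place of the 'assignTable' and plain method calls in place of 
RPCs. AssignTable, ToyMaster, and the state names are hypothetical and only 
illustrate the flow; they are not HBase classes.

```python
class AssignTable:
    """In-memory stand-in for the proposed 'assignTable' system table."""

    def __init__(self):
        self.rows = {}

    def put(self, region, server, state):
        self.rows[region] = {"server": server, "state": state}

    def scan(self):
        return dict(self.rows)


class ToyMaster:
    def __init__(self, table):
        self.table = table
        # On startup (including failover), rebuild in-flight work from the table.
        self.in_flight = {
            region: row["server"]
            for region, row in table.scan().items()
            if row["state"] == "OPENING"
        }

    def assign(self, region, server):
        # Record the intent first; a real master would then send the open
        # request to `server`.
        self.table.put(region, server, "OPENING")
        self.in_flight[region] = server

    def report_opened(self, region):
        # Called when the regionserver reports progress. Only the master
        # writes to the table.
        server = self.in_flight.pop(region)
        self.table.put(region, server, "OPEN")
```

A second ToyMaster built on the same table picks up exactly the assignments 
that the first one had started but not finished.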



[jira] [Commented] (HBASE-5487) Generic framework for Master-coordinated tasks

2013-10-09 Thread Devaraj Das (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13790109#comment-13790109
 ] 

Devaraj Das commented on HBASE-5487:


bq. we still need a reliable store (ZK, system table, or master WAL). It seems 
ZK is the most scalable and best suited for the task

[~sershe], not ZK, IMHO. Let's use one of our internal storages rather than an 
external system for storing the region state. I am all for removing ZK 
altogether from HBase. One less distributed system to worry about. One less 
component to manage. We already have heartbeats from RSs to master, and region 
open/close RPCs from master to the RSs. I think we have enough communication 
already in place between the master and RSs to deal with region states. We 
also have chores in the master that try to take some actions based on 
assignment timeouts... 

Would this model work (conceptually)? It's late night here; please pardon me 
if there are glaring issues :-) Please bear with me :-)

All region state manipulation operations are initiated by the master and they 
act upon the meta region. We have extra columns to store the state of the 
region etc. in the meta table. The initial rows are created by the master and 
the state of the regions is UNASSIGNED. This is not new - we already do this 
but IIRC we don't store the state of the region. Some state transitions happen 
through method executions, and some of those method executions are RPCs from 
the master to some regionserver. I think that the states would be more 
granular here (to prevent potential replay/repetition of large operations). I 
am wondering whether it makes sense to update the meta table from the various 
regionservers on region state changes or to go via the master. But maybe the 
master doesn't need to be a bottleneck if possible. A regionserver could first 
update the meta table, and then just notify the master that a certain 
transition was done; the master could initiate the next transition 
([~eclark]'s comment about coprocessors can probably be made to apply in this 
context). The operation is considered successful only when the state change is 
recorded in meta.

Also, there is a chore (probably an enhanced catalog-janitor) in the master 
that periodically goes over the meta table and restarts (along with some 
diagnostics, probing the regionservers in question, etc.) failed/stuck state 
transitions. This chore runs once as soon as the master is started and the 
meta region is assigned, to take care of transitions that were started in the 
previous life of the master and which are now waiting for some action from the 
master. For example, if the state was OPENING for a certain region and the 
master crashed, the master would send an openRegion RPC to the region assignee 
upon restart. The region assignee would have been recorded as a column in the 
row for the region by the previous master.
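
A minimal sketch of such a chore, assuming the meta rows are a dictionary and 
that each row carries the assignee plus a start timestamp. The started_at 
field, the timeout parameter, and the send_open_region callback are 
assumptions for the example; the comment above does not specify them.

```python
def find_stuck_transitions(meta_rows, now, timeout):
    """Return (region, assignee) pairs whose OPENING state is older than `timeout`."""
    stuck = []
    for region, row in sorted(meta_rows.items()):
        if row["state"] == "OPENING" and now - row["started_at"] >= timeout:
            stuck.append((region, row["assignee"]))
    return stuck


def run_chore(meta_rows, now, timeout, send_open_region):
    """Re-issue the open request for each stuck transition and restart its clock."""
    retried = find_stuck_transitions(meta_rows, now, timeout)
    for region, assignee in retried:
        # Stand-in for the openRegion RPC to the recorded assignee.
        send_open_region(assignee, region)
        meta_rows[region]["started_at"] = now
    return retried
```

Running the chore once with timeout=0 right after master startup would pick up 
every transition left over from the previous master; later periodic runs would 
use a real timeout.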

I think we should also save the operations that were initiated by the client 
on the master (either in a WAL or in some system table) so that the master 
doesn't lose track of them and can execute them in the face of crashes and 
restarts. For example, the user may have sent a 'split region' operation just 
before the master crashed.



[jira] [Commented] (HBASE-5487) Generic framework for Master-coordinated tasks

2013-10-09 Thread Nicolas Liochon (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13790151#comment-13790151
 ] 

Nicolas Liochon commented on HBASE-5487:


bq. zk vs. non zk.
ZK is used in HDFS HA, no? So we have it in our architecture anyway. Using it 
for permanent data is another discussion (stuff like ZOOKEEPER-1147 makes it 
interesting).
I would personally prefer to remove the master rather than adding functions to 
it. Saying that there are some specific threads in the region servers holding 
.meta. is acceptable imho. 



[jira] [Commented] (HBASE-5487) Generic framework for Master-coordinated tasks

2013-10-09 Thread Jonathan Hsieh (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13790172#comment-13790172
 ] 

Jonathan Hsieh commented on HBASE-5487:
---

ZK is used for the selection of the primary nn (via the failure controllers) 
but I believe the journal nodes (that do the durable consensus logging) do 
not use ZK at all. [~tlipcon] or [~atm] can confirm.




[jira] [Commented] (HBASE-5487) Generic framework for Master-coordinated tasks

2013-10-09 Thread Devaraj Das (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13790433#comment-13790433
 ] 

Devaraj Das commented on HBASE-5487:


Removing the separate master daemon is fine by me, Nicolas. However, we still 
need someone to do various operations (servicing user requests and other 
janitorial tasks). Long back we were discussing that a random region server 
(elected via zk) could perform the master role.



[jira] [Commented] (HBASE-5487) Generic framework for Master-coordinated tasks

2013-10-09 Thread Jimmy Xiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13790442#comment-13790442
 ] 

Jimmy Xiang commented on HBASE-5487:


I prefer not to use ZK since it's kind of the root cause of uncertainty: has 
the master/region server got/processed the event? Has the znode been hijacked 
since the master/region server changed its mind?

We should store the state in the meta table, which is cached in memory. 

Whether to use a coprocessor is not a big concern to me.  If we don't use a 
coprocessor, I prefer to use the master as the proxy to do all meta table 
updates. Otherwise, we need to listen to something for updates.

We should not have another janitor/chore. If an action has failed, it must be 
because of something unrecoverable by itself, not because of a bug in our 
code.  It should stay failed until the issue is resolved.

We need to have something like FATE in accumulo to queue/retry actions taking 
several steps like split/merge/move.
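
To illustrate the kind of queue/retry mechanism meant here, the sketch below 
runs a multi-step operation, records how many steps have completed, and 
resumes from that point on the next attempt. It is only loosely inspired by 
the idea of FATE and does not reproduce Accumulo's actual API; StepLog and 
run_operation are hypothetical names, and the in-memory log stands in for a 
durable store.

```python
class StepLog:
    """Stand-in for a durable record of how far each operation has progressed."""

    def __init__(self):
        self.done = {}

    def completed(self, op_id):
        return self.done.get(op_id, 0)

    def mark(self, op_id, count):
        self.done[op_id] = count


def run_operation(op_id, steps, log, max_attempts=3):
    """Run the remaining steps of `op_id` in order; return True when all are done."""
    for index in range(log.completed(op_id), len(steps)):
        for attempt in range(max_attempts):
            try:
                steps[index]()
                break
            except RuntimeError:
                if attempt == max_attempts - 1:
                    return False  # give up for now; progress so far is kept
        # Persist progress after each step so a restart does not repeat it.
        log.mark(op_id, index + 1)
    return True
```

Because progress is recorded per step, calling run_operation again after a 
failure (or after a master restart, with a durable log) continues with the 
first unfinished step rather than starting the split/merge/move over.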

It is a nice-to-have to keep a history of region state transition.




[jira] [Commented] (HBASE-5487) Generic framework for Master-coordinated tasks

2013-10-09 Thread Nicolas Liochon (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13790451#comment-13790451
 ] 

Nicolas Liochon commented on HBASE-5487:


bq. However, we still need someone to do various operations (servicing user 
requests and other janitorial tasks). 
Yeah, I agree. Balance is a good example. Let's say that I'm more comfortable 
w/ something that lowers the role of the master than the opposite.

bq. I prefer not to use ZK 
When you say this, Jimmy, do you mean no ZK in HBase at all, or No ZK for 
permanent data, or No ZK at all for assignment? 

bq. We should store the state in meta table which is cached in the memory. 
I'm fine with that (if we can make it work :-) )



[jira] [Commented] (HBASE-5487) Generic framework for Master-coordinated tasks

2013-10-09 Thread Jimmy Xiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13790460#comment-13790460
 ] 

Jimmy Xiang commented on HBASE-5487:


Nicolas, I mean no ZK for assignment.



[jira] [Commented] (HBASE-5487) Generic framework for Master-coordinated tasks

2013-10-09 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13791007#comment-13791007
 ] 

Sergey Shelukhin commented on HBASE-5487:
-

Big response to not-responded-to recent comments.
Let me update the doc, EOW-ish probably depending on the number of bugs 
surfacing ;)

[~stack]
Let's keep discussion and doc here and branch tasks out for rewrites.
bq. + The problem section is too short (state kept in multiple places and all 
have to agree...); need more full list so can be sure proposal addresses them 
all
What level of detail do you have in mind? It's not a bug fix, so I cannot 
really say merge races with snapshot, or something like that; that could also 
be arguably resolved by another 100k patch to existing AM :)
bq. + How is the proposal different from what we currently have? I see us tying 
regionstate to table state. That is new. But the rest, where we have a record 
and it is atomically changed looks like our RegionState in Master memory? There 
is an increasing 'version' which should help ensure a 'direction' for change 
which should help.
See the design principles (and below discussion :)). We are trying to avoid 
multiple flavors of split-brain state.
bq. Its fine having a source of truth but ain't the hard part bring the system 
along? (meta edits, clients, etc.).
Yes :)
bq. Experience has zk as messy to reason with. It is also an indirection having 
RS and M go to zk to do 'state'.
I think ZK got a bad reputation not on its own merit, but on how we use it.
I can see that problems exist but IMHO advantages outweigh the disadvantages 
compared to system table.
Co-located system table, I am not so sure, but so far there's not even a 
high-level design for this (for example - do all splits have to go thru 
master/system table now? how does it recover? etc.).
Perhaps we should abstract an async persistence mechanism sufficiently and then 
decide. Whether it would be ZK+notifications, or system table, or memory + wal, 
or colocated system table, or what.
The problem is that the usage inside master of that interface would depend on 
perf characteristics.
Anyway, we can work out the state transitions/concurrency/recovery without 
tying 100% to particular store.
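
One way to picture such an abstraction is a small store interface with a 
versioned compare-and-set, which also gives the increasing 'version' mentioned 
in the quoted comment. This is a sketch under assumptions: RegionStateStore 
and InMemoryStore are hypothetical names, the calls are synchronous for 
brevity, and a ZK, system table, or WAL backend would implement the same two 
methods.

```python
from abc import ABC, abstractmethod


class RegionStateStore(ABC):
    """Hypothetical persistence interface, independent of the backing store."""

    @abstractmethod
    def read(self, region):
        """Return (version, state), or (0, None) if the region is unknown."""

    @abstractmethod
    def compare_and_set(self, region, expected_version, state):
        """Write `state` only if the stored version matches; return True on success."""


class InMemoryStore(RegionStateStore):
    def __init__(self):
        self._rows = {}

    def read(self, region):
        return self._rows.get(region, (0, None))

    def compare_and_set(self, region, expected_version, state):
        version, _ = self.read(region)
        if version != expected_version:
            return False  # another writer moved the region forward first
        self._rows[region] = (version + 1, state)
        return True
```

The state-transition and recovery logic can then be written and tested against 
InMemoryStore, leaving the choice of real backend for later.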

bq. + Agree that master should become a lib that any regionserver can run.
That sounds possible.

[~nkeywal]
bq. At least, we should make this really testable, without needing to set up a 
zk, a set of rs and so on.
+1, see my comment above. 
bq. I really really really think that we need to put performance as a 
requirement for any implementation. For example, something like: on a cluster 
with 5 racks of 20 regionservers each, with 200 regions per RS, the assignment 
will be completed in 1s if we lose one rack. I saw a reference to async ZK in 
the doc, it's great, because the performance is 10 times better.
We can measure and improve, but I am not really sure about what exact numbers 
will be, at this stage (we don't even know what storage is).
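
For scale, the example requirement quoted above works out as follows. The 
helper name is made up for illustration, and it assumes the lost regions are 
spread evenly over the surviving servers.

```python
def rack_loss_load(racks, servers_per_rack, regions_per_server):
    """Return (regions to reassign, extra regions per surviving server) after losing one rack."""
    lost_regions = servers_per_rack * regions_per_server
    surviving_servers = (racks - 1) * servers_per_rack
    return lost_regions, lost_regions / surviving_servers


lost, extra = rack_loss_load(racks=5, servers_per_rack=20, regions_per_server=200)
print(lost, extra)  # 4000 50.0
```

So the quoted 1s target would mean roughly 4000 region assignments per second, 
with about 50 extra regions landing on each of the 80 surviving regionservers.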


[~devaraj]
bq. A regionserver could first update the meta table, and then just notify the 
master that a certain transition was done; the master could initiate the next 
transition (Elliott Clark comment about coprocessor can probably be made to 
apply in this context). Only when a state change is recorded in meta, the 
operation is considered successful.
Split, for example, requires several changes to meta. Will master be able to 
see them together from the hook? If master is collocated in the same RS with 
meta, it should be small overhead to have master RPC.

bq. Also, there is a chore (probably enhance catalog-janitor) in the master 
that periodically goes over the meta table and restarts (along with some 
diagnostics; probing regionservers in question etc.) failed/stuck state 
transitions. 
+1 on that. Transition states can indicate the start ts, and master will know 
when they started.

bq. I think we should also save the operations that were initiated by the 
client on the master (either in WAL or in some system table) so that the 
master doesn't lose track of those and can execute them in the face of crashes 
and restarts. For example, if the user had sent a 'split region' operation and 
the master crashed
Yeah, disable table or move region are a good example. Probably we'd need 
ZK/system table/WAL for ongoing logical operations.

[~jxiang]
bq. We should not have another janitor/chore. If an action is failed, it must 
be because of something unrecoverable by itself, not because of a bug in our 
code. It should stay failed until the issue is resolved.
I think the failures meant are things like an RS that went away, or is slow or 
buggy, so OPENING got stuck - someone needs to pick it up on timeout.

bq. We need to have something like FATE in accumulo to queue/retry actions 
taking several steps like split/merge/move.
We basically need something that allows atomic state 

[jira] [Commented] (HBASE-5487) Generic framework for Master-coordinated tasks

2013-10-09 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13791012#comment-13791012
 ] 

Sergey Shelukhin commented on HBASE-5487:
-

btw, any input on actor model? 
Things queue up operations/notifications (ops) for master. AM runs on a timer 
or when the queue is non-empty, on a single thread, taking as inputs the cluster 
state (incl. ongoing internal actions it ordered before, e.g. OPENING state for 
a region) plus new ops from the queue. It generates new actions (without 
physically doing anything, e.g. talking to RS); the ops state and cluster state 
are persisted; then the actions are executed on different threads (e.g. messages 
sent to RS-es, etc.), and AM runs again, or sleeps for some time if the ops 
queue is empty.

That is a different model, not sure if it scales for large clusters.
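
To make the loop described above concrete, here is a rough sketch under simplifying assumptions: ops are plain tuples, the cluster state is a dict of region to state, and `persist` and `execute` are caller-supplied callbacks. All names are hypothetical, and the sketch persists only the cluster state, not the ops state.

```python
from collections import deque


def plan(cluster_state, ops):
    """Pure planning step: decide actions and the next state, with no I/O."""
    new_state = dict(cluster_state)
    actions = []
    for op, region, server in ops:
        if op == "assign" and new_state.get(region) in (None, "OFFLINE"):
            new_state[region] = "OPENING"
            actions.append(("open", region, server))
        elif op == "opened" and new_state.get(region) == "OPENING":
            new_state[region] = "OPEN"
    return new_state, actions


def run_once(cluster_state, queue, persist, execute):
    """One AM iteration: drain the queue, plan, persist, then hand off actions."""
    ops = []
    while queue:
        ops.append(queue.popleft())
    new_state, actions = plan(cluster_state, ops)
    persist(new_state)   # state is durable before any action is sent
    for action in actions:
        execute(action)  # a real system would dispatch these to other threads
    return new_state


queue = deque([("assign", "r1", "rs1")])
persisted, sent = [], []
state = run_once({}, queue, persisted.append, sent.append)

queue.append(("opened", "r1", "rs1"))  # the RS reports back through the queue
state = run_once(state, queue, persisted.append, sent.append)
```

Because `plan` touches nothing outside its arguments, it can be unit-tested without a ZK or a set of regionservers, which is one of the attractions of this model.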

 Generic framework for Master-coordinated tasks
 --

 Key: HBASE-5487
 URL: https://issues.apache.org/jira/browse/HBASE-5487
 Project: HBase
  Issue Type: New Feature
  Components: master, regionserver, Zookeeper
Affects Versions: 0.94.0
Reporter: Mubarak Seyed
Priority: Critical
 Attachments: Region management in Master.pdf


 Need a framework to execute master-coordinated tasks in a fault-tolerant 
 manner. 
 Master-coordinated tasks such as online-scheme change and delete-range 
 (deleting region(s) based on start/end key) can make use of this framework.
 The advantages of framework are
 1. Eliminate repeated code in Master, ZooKeeper tracker and Region-server for 
 master-coordinated tasks
 2. Ability to abstract the common functions across Master -> ZK and RS -> ZK
 3. Easy to plugin new master-coordinated tasks without adding code to core 
 components



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (HBASE-5487) Generic framework for Master-coordinated tasks

2013-10-08 Thread Elliott Clark (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13789470#comment-13789470
 ] 

Elliott Clark commented on HBASE-5487:
--

bq.we still need a reliable store
HBase is a reliable store.  We should be using it as such for current state.

If we co-locate the master process with meta, then the master noticing state 
changes is as simple as loading a co-processor that hooks mutations.  It also 
means that when master wants to look up current state there's no rpc overhead.  
Simply target the hregion. This allows us to reduce the number of copies of 
state.  No longer will we need a local hash map + what's in zk, + what's in 
meta.  

I think Jimmy's correct: we should use zk for ephemeral state only.  Everything 
else should be in our systems.  
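
As an illustration of the "hook mutations" idea, here is a toy sketch in which the state store and the observer live in the same process, so a state change is seen as a plain function call. The class and method names are made up for the example and do not correspond to the HBase coprocessor API.

```python
class MetaRegion:
    """Toy in-process 'meta' store that notifies hooks on every mutation."""

    def __init__(self):
        self._rows = {}
        self._hooks = []

    def add_hook(self, hook):
        """Register a callable taking (region, old_state, new_state)."""
        self._hooks.append(hook)

    def put(self, region, state):
        old = self._rows.get(region)
        self._rows[region] = state
        for hook in self._hooks:
            hook(region, old, state)

    def get(self, region):
        # Reading current state is a local lookup, with no RPC and no cache.
        return self._rows.get(region)


meta = MetaRegion()
seen = []
meta.add_hook(lambda region, old, new: seen.append((region, old, new)))
meta.put("r1", "OPENING")
meta.put("r1", "OPEN")
```

The point of the sketch is only that there is a single copy of the state: the observer never keeps its own map, it asks the store.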



[jira] [Commented] (HBASE-5487) Generic framework for Master-coordinated tasks

2013-10-08 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13789493#comment-13789493
 ] 

Sergey Shelukhin commented on HBASE-5487:
-

Please let's not use coprocessors for mainline functionality... also, if we 
store state in system table that is hosted by master, then we don't need ZK at 
all, we should get rid of it.
The only disadvantages of using ZK that I see are the absence of a 
getKeyBefore/After API (easy to fix by having an ephemeral META table for 
clients to query), and having an extra moving part. If we don't get rid of ZK we 
don't alleviate the latter, so I think we should either use it for everything or 
not at all... I would prefer to use it for everything.
As far as I see, ZK is more reliable than HBase RS or master, has built-in 
replication with faster recovery, is probably more scalable than reading from 
a single RS, and has a better model for atomic state changes. It probably has 
better tolerance for stuff like network partitioning too. We could do master WAL and 
all that stuff but I don't see a compelling reason to do this when we have a 
bunch of Apache code that is already written to solve all of these problems. 
What is the reason to not use ZK? What is the advantage of system table, or 
disadvantage of ZK?



[jira] [Commented] (HBASE-5487) Generic framework for Master-coordinated tasks

2013-10-08 Thread Elliott Clark (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13789576#comment-13789576
 ] 

Elliott Clark commented on HBASE-5487:
--

bq.Please let's not use coprocessors for mainline functionality
We already do.  I don't see anything wrong with making HBase more modular.  If 
there are pain points with using co-processors that cause you to say no, then 
we should fix those.  Not just ignore them.


bq.also, if we store state in system table that is hosted by master, then we 
don't need ZK at all, we should get rid of it.
We don't have ephemeral node capability at all.  And we need it for the 
bootstrap problem.  It allows clients to point at a relatively small number of 
nodes to discover the whole cluster.

bq.As far as I see, ZK is more reliable than HBase RS or master
Our master is only complex because of our use of zk to hold and mutate state.

bq.has built-in replication with faster recovery
With the meta/system wal I think we can be within an order of magnitude.



[jira] [Commented] (HBASE-5487) Generic framework for Master-coordinated tasks

2013-10-08 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13789598#comment-13789598
 ] 

Sergey Shelukhin commented on HBASE-5487:
-

Wrt coprocs - that is bad imho, that is not the kind of modular that we want. 
Core parts of the system should depend on well-defined interfaces, not 
generic extension points. Imho, the litmus test for coproc, as a plugin 
interface, is - can you run HBase without it? If yes, then it's ok to be a 
coproc (e.g. accesscontrol). Otherwise we should have proper interfaces that 
have some meaning to the caller.

bq. Our master is only complex because of our use of zk to hold and mutate 
state.
That is not due to ZK as such; that is due to the multi-state-machine 
reconciliation model and the truth in multiple places that it requires.
A system table can have the exact same problem of state in the table + state in 
memory. The question is how you split and manage state between them; the storage 
substrate doesn't matter as much. If truth was in ZK and nowhere else, that 
wouldn't be a problem, same way as with a system table.
Also, by reliable I meant that ZK is multiple nodes with built-in master 
recovery by design, whereas with master you need at least HA, and still it's 
probably worse than ZK in case of failure.
There are also other things that I mentioned.

bq. With the meta/system wal I think we can be within an order of magnitude.
So, why would we write a bunch of new code to get within an order of 
magnitude? I don't see an advantage, or ZK disadvantage that you mention 
compared to multiple advantages of ZK.
Esp. if we cannot totally get rid of it, so we'll have an extra service 
regardless.




[jira] [Commented] (HBASE-5487) Generic framework for Master-coordinated tasks

2013-10-08 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13789612#comment-13789612
 ] 

Sergey Shelukhin commented on HBASE-5487:
-

Btw I agree that the main point is to get rid of the complexity you mention 
(and in the doc I only mention storage mechanism in ZK in one paragraph in the 
end), so the storage mechanism choice is almost orthogonal.
But as far as it is concerned, using ZK seems an obvious choice to me ATM. 
I may not know something about ZK (or system tables?), but so far the pattern 
is that meta recovery is a big deal even without bugs, and with ZK we barely 
ever have any problems. 



[jira] [Commented] (HBASE-5487) Generic framework for Master-coordinated tasks

2013-10-08 Thread Elliott Clark (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13789642#comment-13789642
 ] 

Elliott Clark commented on HBASE-5487:
--

bq.Wrt coprocs - that is bad imho, that is not the kind of modular that we want.
I'm not tied to it being a co-proc.  But it does illustrate the idea that it 
can be done by watching mutations as they come into the normal hregion call 
stack.

bq.That is not due to ZK as such, that is due to multi-state-machine 
reconciliation model and truth in multiple places that it requires.
In part it's due to getting zk messages out of order, and getting them delayed. 
Those pains are due in no small part to zk's client being single threaded.

bq.System table can have exact same problem of state in the table + state in 
memory, question is how you split and manage state between them, storage 
substrate doesn't matter as much.
But you only have the one state if you have master inside of the region server 
hosting meta.  There's no need to have a map of assignment if meta is actually 
just a function call away.  Also, the same is not true at all if you want to 
put state into zk.  Then you need a local cache if you want to make this 
performant at all (That's how we got to the current state).  Putting state into 
zk necessitates a split brain problem.  There's what the master sees and what 
the outside world sees.

bq.So, why would we write a bunch of new code to get within an order of 
magnitude?
That code is already there, and in use.  We fail over meta right now in 240ms.  
I was commenting on what you were saying that zk fails over faster. And that's 
true but for meta we've narrowed that gap significantly. So I don't think that 
ZK has that much of an advantage.

bq. I don't see an advantage, or ZK disadvantage that you mention compared to 
multiple advantages of ZK
We've tried putting state into zk.  That failed.  I really don't want to put a 
whole bunch of new code into hbase that does almost exactly the same thing as 
we currently have.  It's going to fail.

bq.so the storage mechanism choice is almost orthogonal.
For me it's not just about the storage.  It's that co-locating storage with 
the master means that these split brain problems are much rarer.




[jira] [Commented] (HBASE-5487) Generic framework for Master-coordinated tasks

2013-10-08 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13789874#comment-13789874
 ] 

Sergey Shelukhin commented on HBASE-5487:
-

bq. We've tried putting state into zk.  That failed.  I really don't want to 
put a whole bunch of new code into hbase that does almost exactly the same 
thing as we currently have.  It's going to fail.
As I said I don't think that is true. Our problem is not state being in ZK; it 
is that the state is in multiple places in ZK itself for different parts of the 
same region's state, plus some state in master to reconcile these, plus some 
state that is not in ZK but only in master, plus also meta.
I.e. not the split between ZK and master but split logical state within both 
and between them.
bq.  Then you need a local cache if you want to make this performant at all 
(That's how we got to the current state).
Our current state is not a local cache, it's a bunch of actual state...

I am not yet sure how bad the ZK-master split brain problem will be if ZK has 
the entire truth, let me think about it.
When you say no split-brain inside master, do you mean master will host the 
meta and do all reads and writes to meta with no local intermediate state in 
memory?




[jira] [Commented] (HBASE-5487) Generic framework for Master-coordinated tasks

2013-10-08 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13789997#comment-13789997
 ] 

stack commented on HBASE-5487:
--

bq.  Our problem is not state being in ZK; it is that the state is in multiple 
places in ZK itself for different parts of the same region's state, plus some 
state in master to reconcile these, plus some state that is not in ZK but only 
in master, plus also meta.

Master is the Actor.  Having it go across a network to get/set the 'state' in a 
service that is non-transactional wasn't our smartest move.

Regionservers currently report state via ZK.  Master reads it from ZK.  It would 
be better if RS just reported directly to Master.




[jira] [Commented] (HBASE-5487) Generic framework for Master-coordinated tasks

2013-10-07 Thread Jimmy Xiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13788303#comment-13788303
 ] 

Jimmy Xiang commented on HBASE-5487:


The doc is very good. I like the Problem and Design Principles sections. The 
region state machine can be enhanced.  We should have a single source of truth 
which is scalable and performs well. I think it could be the master (in 
memory).  All actions (including split/merge) should be started and managed by 
the master. 



[jira] [Commented] (HBASE-5487) Generic framework for Master-coordinated tasks

2013-10-07 Thread Nicolas Liochon (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13788317#comment-13788317
 ] 

Nicolas Liochon commented on HBASE-5487:


3 comments:
- I wonder if we should use something that would allow us to test all the 
possible states. At least, we should make this really testable, without needing 
to set up a zk, a set of rs and so on.
- We should question the master based architecture. How does it work for the 
MapR implementation for example? Why is the assignment manager not in the 
region server holding meta? This would save one distributed state for example.
- I really really really ( :-) ) think that we need to put performance as a 
requirement for any implementation. For example, something like: on a cluster 
with 5 racks of 20 regionservers each, with 200 regions per RS, the assignment 
will be completed in 1s if we lose one rack. I saw a reference to async ZK in 
the doc, it's great, because the performance is 10 times better.
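
To put numbers on the example requirement above, the arithmetic can be worked out as follows. The even spread of regions over the surviving regionservers is an assumption made for the illustration; the cluster sizes and the 1s target come from the example itself.

```python
racks = 5
servers_per_rack = 20
regions_per_server = 200

lost_servers = servers_per_rack                          # one rack goes away
regions_to_reassign = lost_servers * regions_per_server  # 4000 regions
surviving_servers = (racks - 1) * servers_per_rack       # 80 regionservers

# Assumption: the lost regions are spread evenly over the survivors.
extra_regions_per_survivor = regions_to_reassign / surviving_servers  # 50.0

target_seconds = 1
required_rate = regions_to_reassign / target_seconds  # assignments per second
```

So the stated target amounts to roughly 4000 region assignments per second, with each surviving regionserver picking up about 50 extra regions.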

Thanks for writing the doc Sergey.





[jira] [Commented] (HBASE-5487) Generic framework for Master-coordinated tasks

2013-10-07 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13788324#comment-13788324
 ] 

stack commented on HBASE-5487:
--

+ Agree that master should become a lib that any regionserver can run.
+ Agree testable but would like to point out that it is possible to put up our 
current AM in a standalone mode -- it just takes mockery (smile)
+ Agree on perf.  Helps MTTR.




[jira] [Commented] (HBASE-5487) Generic framework for Master-coordinated tasks

2013-10-07 Thread Jimmy Xiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13788331#comment-13788331
 ] 

Jimmy Xiang commented on HBASE-5487:


Agree that master should hold the meta region.  It should hold other system 
table regions as well.




[jira] [Commented] (HBASE-5487) Generic framework for Master-coordinated tasks

2013-10-07 Thread Jesse Yates (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13788334#comment-13788334
 ] 

Jesse Yates commented on HBASE-5487:


-1 that it should hold other system tables. We know META isn't going to span 
more than a region, but it would be completely reasonable for other system 
tables to be larger (i.e. statistics). Maybe worth considering a single-region 
flag for certain tables to identify that they can never be split and can 
support single region transactions.



[jira] [Commented] (HBASE-5487) Generic framework for Master-coordinated tasks

2013-10-07 Thread Jimmy Xiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13788342#comment-13788342
 ] 

Jimmy Xiang commented on HBASE-5487:


That's right.  Big system tables which are not required for the system to 
start/run can be assigned somewhere else. The master doesn't have to 
hold all system table regions.



[jira] [Commented] (HBASE-5487) Generic framework for Master-coordinated tasks

2013-10-07 Thread Devaraj Das (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13788415#comment-13788415
 ] 

Devaraj Das commented on HBASE-5487:


bq. We should have a single source of truth which is scalable and performs 
well. I think it could be the master (in memory). All actions (including 
split/merge) should be started and managed by the master.

I agree with this. We have done this in other components - HDFS, MapReduce, 
Yarn, etc. Depending on ZK for state management brings complexity. I 
think we should use ZK only for ephemeral stuff and not for storing state 
there. In that regard, using ZK for discovering lost RSs is fine, but not for 
storing the region states. I like the idea of the Master WAL to take care of 
master crashes.
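
A toy sketch of the Master WAL idea, assuming a local append-only file of JSON records: every transition is made durable before the in-memory state is updated, and a restarted master rebuilds its state by replaying the file. The class name and record format are hypothetical, and the sketch ignores details such as torn writes and log truncation.

```python
import json
import os
import tempfile


class MasterStateLog:
    """Toy write-ahead log: append a transition durably, then apply it in memory."""

    def __init__(self, path):
        self.path = path
        self.regions = {}
        self._replay()

    def _replay(self):
        """Rebuild in-memory region states from the log, if one exists."""
        if not os.path.exists(self.path):
            return
        with open(self.path) as f:
            for line in f:
                rec = json.loads(line)
                self.regions[rec["region"]] = rec["state"]

    def transition(self, region, state):
        # Log first, so a crash after this point can be recovered by replay.
        with open(self.path, "a") as f:
            f.write(json.dumps({"region": region, "state": state}) + "\n")
            f.flush()
            os.fsync(f.fileno())
        self.regions[region] = state


log_path = os.path.join(tempfile.mkdtemp(), "master.wal")
master = MasterStateLog(log_path)
master.transition("r1", "OPENING")
master.transition("r1", "OPEN")
master.transition("r2", "OPENING")

recovered = MasterStateLog(log_path)  # simulates a master restart
```

After the simulated restart, `recovered.regions` matches what the first instance held in memory, including the still-pending OPENING for r2 that a new master would need to pick up.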



[jira] [Commented] (HBASE-5487) Generic framework for Master-coordinated tasks

2013-10-07 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13788518#comment-13788518
 ] 

Sergey Shelukhin commented on HBASE-5487:
-

We still need a reliable store (ZK, system table, or master WAL). It seems ZK 
is the most scalable and best suited for the task. In a perfect world we would 
have a ZK library that we could host and have a quorum of masters running 
Paxos/ZK. But we don't have that...



[jira] [Commented] (HBASE-5487) Generic framework for Master-coordinated tasks

2013-10-05 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13787446#comment-13787446
 ] 

stack commented on HBASE-5487:
--

bq. We have talked a little bit here, and agreed on 2 key points.

What about Enis's 'requirements'.  We agree on his list?  That makes 3 key 
points.  I think add a fourth point where we rehearse what is wrong w/ the 
current system (Jon's suggestion) as it will help ensure we don't repeat the 
mistakes of the past.

Make a subtask 'New Assignment Manager'?  Or make this a subtask of a new issue 
called 'New Assignment Manager'.  An issue named so will be easier to find than 
this one.  Also, others are interested in this effort (@honghua and 
[~xieliang007]) and it'll catch their attention.

I think it is too big a change to be done for the pending 0.98.  Let's not rush it. 
 It could even land post hbase 1.0 if 0.98 is to become 1.0.

On the design doc.,

+ Doc. needs author and date.  I would expect a section situating the document 
-- context -- that at least referred to the current 'design' -- see 
https://issues.apache.org/jira/browse/HBASE-2485 (it has a 'state' machine that 
looks like this one).
+ The problem section is too short (state kept in multiple places and all have 
to agree...); need a fuller list so we can be sure the proposal addresses them all.
+ How is the proposal different from what we currently have?  I see us tying 
regionstate to table state.  That is new.  But the rest, where we have a record 
and it is atomically changed, looks like our RegionState in Master memory?  
There is an increasing 'version', which should help ensure a 'direction' for 
change.
+ A single atomically mutable record of the regionstate is well and good but 
how then to get the cluster to align w/ what this record says?  For example, 
the table record says it is disabled.  It has 10k regions.  How do we get the 
regions to agree w/ the table record which says it is disabled?  We can send 
the closes but how are we sure the close happened on all 10k regions?
+ I don't get this bit: "This record is the only source of truth about the 
region, and is never removed while the region is relevant. This simplifies 
current situation where ZK state, master state and META state can all conflict 
in various special ways..."  It's fine having a source of truth but ain't the 
hard part bringing the system along?  (meta edits, clients, etc.).
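
To make the "record that is atomically changed, with an increasing version" idea concrete, here is a minimal sketch. It is not taken from the design doc: RegionStore, RegionRecord and compare_and_set are hypothetical names, and a plain dict stands in for whatever durable store (ZK, system table, master WAL) would actually hold the record.

```python
from dataclasses import dataclass


class StaleVersionError(Exception):
    """Raised when an update is based on an outdated version of the record."""


@dataclass(frozen=True)
class RegionRecord:
    region: str
    state: str    # e.g. "OPEN", "CLOSING", "CLOSED" (illustrative states)
    version: int  # strictly increasing on every accepted change


class RegionStore:
    """Hypothetical in-memory stand-in for the single source of truth."""

    def __init__(self):
        self._records = {}

    def create(self, region, state):
        record = RegionRecord(region, state, version=0)
        self._records[region] = record
        return record

    def get(self, region):
        return self._records[region]

    def compare_and_set(self, region, expected_version, new_state):
        """Apply the change only if the caller saw the latest version."""
        current = self._records[region]
        if current.version != expected_version:
            raise StaleVersionError(
                f"{region}: expected v{expected_version}, found v{current.version}"
            )
        updated = RegionRecord(region, new_state, current.version + 1)
        self._records[region] = updated
        return updated
```

A writer that read version 0 and lost a race gets StaleVersionError instead of silently overwriting a newer state, which is the 'direction' the version number provides. The sketch says nothing about the harder question raised above, namely how the cluster is brought in line with the record.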

Experience has zk as messy to reason with.  It is also an indirection having RS 
and M go to zk to do 'state'.

Thanks for writing this up, Sergey.



[jira] [Commented] (HBASE-5487) Generic framework for Master-coordinated tasks

2013-10-05 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13787493#comment-13787493
 ] 

stack commented on HBASE-5487:
--

[~jxiang] What you think of [~sershe] doc?



[jira] [Commented] (HBASE-5487) Generic framework for Master-coordinated tasks

2013-10-04 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13786870#comment-13786870
 ] 

Sergey Shelukhin commented on HBASE-5487:
-

I would like to resurrect this issue (and rename it). It seems that every fix 
to AssignmentManager introduces 1-3 more maps or lists around it, makes it even 
harder to comprehend and may add more bugs.

We have talked a little bit here, and agreed on 2 key points.
1) Region state should be managed in one permanent place with one state 
machine; no separate and/or transient state machines, no operation-based state 
machines.
2) New assignment manager should be easily testable by simulating sequences in 
events.
I think my doc above is still a reasonably good approximation, but of course we 
might need to discuss and flesh out the details.
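
As a rough illustration of the two points above (one state machine, tested by replaying event sequences), consider the sketch below. The state and event names are made up for the example and are not the ones from the doc.

```python
# Hypothetical transition table: (current state, event) -> next state.
TRANSITIONS = {
    ("CLOSED", "assign"): "OPENING",
    ("OPENING", "opened"): "OPEN",
    ("OPENING", "open_failed"): "CLOSED",
    ("OPEN", "unassign"): "CLOSING",
    ("CLOSING", "closed"): "CLOSED",
}


def apply_event(state, event):
    """Return the next state, or fail loudly on an illegal transition."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"illegal event {event!r} in state {state!r}") from None


def replay(initial_state, events):
    """Simulate a sequence of events against the single state machine."""
    state = initial_state
    for event in events:
        state = apply_event(state, event)
    return state
```

Because every transition lives in one table, a test is just a list of events and an expected final state, for example replay("CLOSED", ["assign", "opened"]) ending in "OPEN"; no cluster is needed to exercise a collision such as an unassign arriving while the region is still OPENING.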

Ahem. I said "new assignment manager"; that is because I would like to rename 
this jira to "rewrite assignment manager".
Wdyt? [~enis] [~jxiang] [~saint@gmail.com] [~ndimiduk] [~jmhsieh]




[jira] [Commented] (HBASE-5487) Generic framework for Master-coordinated tasks

2013-10-04 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13786872#comment-13786872
 ] 

Sergey Shelukhin commented on HBASE-5487:
-

*of events



[jira] [Commented] (HBASE-5487) Generic framework for Master-coordinated tasks

2013-07-11 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13706491#comment-13706491
 ] 

stack commented on HBASE-5487:
--

Was just looking at a test failure issue.   A move operation was failing 
because the region we were asking it to move had not fully opened yet so move 
just failed:

2013-07-11 09:35:00,489 DEBUG [RpcServer.handler=4,port=55346] 
master.AssignmentManager(2277): Attempting to unassign 
ephemeral,,1373535299969.ab63d8b7c5339b4a61fdd70e8cb8993a. but it is already in 
transition (OPEN, force=false)

Client needs a means of asking what happened to its move.

Client should also be able to say "I want the move to succeed"; in the above 
case then, the move op should be retried.

Need a queue of outstanding ops.  Need to be able to query it for "where is my 
op?" (like FedEx "where is my package?")

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-5487) Generic framework for Master-coordinated tasks

2013-07-11 Thread Enis Soztutar (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13706520#comment-13706520
 ] 

Enis Soztutar commented on HBASE-5487:
--

The requirements are pretty much clear. We need a persistent set of ops running 
in the cluster. Every op is defined as a stack of steps which are undo/redoable 
and idempotent. The client submits an op, and gets a request_id, with which it 
can query the state of the operation later. The ops and steps are handled by 
state machines + stacked execution. The state is persisted via zk or a WAL log 
for master or an HBase table (which is just a higher level log). 
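
A minimal sketch of these requirements, under loud assumptions: Step, OpRunner and the state names are hypothetical, and an in-memory dict stands in for the persisted state (zk, master WAL, or an HBase table).

```python
import itertools


class Step:
    """One unit of work; do() and undo() must both be idempotent."""

    def __init__(self, name, do, undo):
        self.name = name
        self.do = do
        self.undo = undo


class OpRunner:
    """Hypothetical runner; self._ops stands in for persisted op state."""

    def __init__(self):
        self._ids = itertools.count(1)
        self._ops = {}

    def submit(self, steps):
        """Register an op and hand the client a request_id to query later."""
        request_id = next(self._ids)
        self._ops[request_id] = {"steps": list(steps), "done": [], "state": "PENDING"}
        return request_id

    def run(self, request_id):
        op = self._ops[request_id]
        op["state"] = "RUNNING"
        # Resume after the steps already completed (redo is safe: idempotent).
        for step in op["steps"][len(op["done"]):]:
            try:
                step.do()
            except Exception:
                # Unwind the stack of completed steps in reverse order.
                while op["done"]:
                    op["done"].pop().undo()
                op["state"] = "ROLLED_BACK"
                return op["state"]
            op["done"].append(step)
        op["state"] = "DONE"
        return op["state"]

    def status(self, request_id):
        return self._ops[request_id]["state"]
```

The client only ever holds the request_id; whether an op finished, is still running, or was rolled back is answered by status(), so a failure is never silent.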



[jira] [Commented] (HBASE-5487) Generic framework for Master-coordinated tasks

2013-07-11 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13706528#comment-13706528
 ] 

stack commented on HBASE-5487:
--

bq. The requirements are pretty much clear...

Smile. Yeah.  I wonder what is 'next'?



[jira] [Commented] (HBASE-5487) Generic framework for Master-coordinated tasks

2013-07-11 Thread Enis Soztutar (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13706532#comment-13706532
 ] 

Enis Soztutar commented on HBASE-5487:
--

Once dust is settled for 0.96, I think this is a good candidate for 0.98 :) 



[jira] [Commented] (HBASE-5487) Generic framework for Master-coordinated tasks

2013-07-11 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13706542#comment-13706542
 ] 

Sergey Shelukhin commented on HBASE-5487:
-

Do these requirements cover the test failure just mentioned... or any other 
that come from collision of multiple ops on the same region?



[jira] [Commented] (HBASE-5487) Generic framework for Master-coordinated tasks

2013-07-11 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13706616#comment-13706616
 ] 

stack commented on HBASE-5487:
--

[~enis] Agree
[~sershe] No.  I think the move 'failing' is not too bad; it is something we 
can work on perhaps having two types of move... a move recommendation and 
then a required move.  The offense in my scenario above is that the move 
failed silently.



[jira] [Commented] (HBASE-5487) Generic framework for Master-coordinated tasks

2013-03-29 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13617600#comment-13617600
 ] 

Sergey Shelukhin commented on HBASE-5487:
-

bq. Any major-overhaul solution should make sure that these operations, when 
issued concurrently, interact according to a sane set of semantics in the face 
of failures.
This is another (although not orthogonal) question.
I am looking for a sane way to define and enforce arbitrary semantics first. 
Then sane semantics can be enforced on top of that :)
For example, the actor-ish model below would make it easy to write simple 
code; persistent state would make sure there's definite state at any time, and 
all crucial transitions are atomic, so semantics would be easy to enforce as 
long as the code can handle a failed transition/recovery. Locks also make this 
simple, although locks have other problems imho.
Although we can go both ways: if we define sane semantics, it would be easy to 
see how convenient they are to implement in a particular model.

bq. So I buy open/close as a region operation. split/merge are multi region 
operations – is there enough state to recover from a failure?
There should be. Can you elaborate?

bq. So alter table is a region operation? Why isn't it in the state machine?
Alter table is currently the operation that involves region operation, namely 
open/close. Open-close are in the state machine :) As for tables, I am not sure 
state machine is the best model for table state, there isn't that much going on 
with the table that is properly an exclusive state.

bq. Implementing region locks is too far – I'm asking for some back of the 
napkin discussion.
If a server holds a lock for a region for time Tlock during each day, and the 
number of regions is N, the probability of some region lock (or table read-only 
lock) being held at any given time is (1-(1-(Tlock/Tday))^N), if I am writing 
this correctly. For 5 seconds of locking per day per region, for 10000 regions 
(not unreasonable for a large table/cluster) we will be holding some lock about 
44% of the time for region operations.
Calculating the probability of any lock being in recovery (server went down 
with a lock less than recovery time ago) can also be done, but numbers for some 
parameters (how often do servers go down?) will be very speculative...
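
The lock-held estimate above can be checked with a few lines of arithmetic. This is only a sketch of the formula as written, and it assumes what the formula assumes: each region's lock is held independently for Tlock/Tday of the time. The function name is made up for the example.

```python
def p_any_lock_held(lock_seconds_per_day, num_regions, day_seconds=86400):
    """Probability that at least one region lock is held at a random instant.

    Implements 1 - (1 - Tlock/Tday)^N, treating regions as independent.
    """
    p_one = lock_seconds_per_day / day_seconds
    return 1 - (1 - p_one) ** num_regions
```

Plugging in 5 seconds of locking per day and N = 10000 regions gives roughly 0.44, which matches the 44% figure; at 1000 regions the same formula gives under 6%, so the estimate is driven almost entirely by the region count.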

bq. I think we need some measurements how much throughput we can get in ZK or 
with a ZK-lock implementation and compare this with # rs of watchers * # of 
regions * number of ops...
Will there be many watchers/ops? You only watch and do ops when you acquire the 
lock, so unless region operations are very frequent... 

bq. The current regions-in-transition (RIT) code basically assumes that an 
absent znode is either closed or opened. RIT znodes are present when the region 
is in the inbetween states (opening, closing,
I don't think "either closed or opened" is good enough :) Also, RITs don't 
cover all scenarios and things like table ops don't use them at all.

bq. I know I've suggested something like this before. Currently the RS 
initiates a split, and does the region open/meta changes. If there are errors, 
at some point the master side detects a timeout. An alternative would have 
splits initiated on the RS but have the master do some kind of atomic 
changes to meta and region state for the 3 involved regions (parent, daughter a 
and daughter b).
Yeah, although in other models (locks, persistent state) that is not required. 
Also, if meta is a cache for clients and not the source of truth, meta changes 
can still be on the server; I assume by meta you mean global state, wherever 
that is?

bq. We need to be careful about ZK – since it is a network connection also, 
exceptions could be failures or timeouts (which succeed but wasn't able to ack). 
If we can describe the properties (durable vs erasable) and assumptions (if the 
wipeable ZK is source of truth, how do we make sure the version state is 
recoverable without time travel?)
The former applies to any distributed state; as for the latter - I was thinking 
of ZK+WAL if we intend to keep ZK wipeable.



[jira] [Commented] (HBASE-5487) Generic framework for Master-coordinated tasks

2013-03-28 Thread Jonathan Hsieh (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13616129#comment-13616129
 ] 

Jonathan Hsieh commented on HBASE-5487:
---

To do a major overhaul, we need something stronger than "the code is hard to 
read".  I agree that it is hard to follow (see: 
http://people.apache.org/~jmhsieh/hbase/120905-hbase-assignment.pdf) but it 
seems to be basically working, which is a pretty strong argument.  Let's compare 
and point out what is wrong/broken in the current implementation and how the 
new design won't have those problems.  

The spreadsheet link is my first step to enumerating semantics and distilling 
the set of possible problems and things that are being guarded from races.  Any 
major-overhaul solution should make sure that these operations, when issued 
concurrently, interact according to a sane set of semantics in the face of 
failures.

bq. Only for the current document version... tables could be added

So I buy open/close as a region operation.  split/merge are multi region 
operations -- is there enough state to recover from a failure?

So alter table is a region operation? Why isn't it in the state machine? 

bq. Hmm... that would require implementing region locks, and having a very 
large cluster. I am talking more about unacceptable blocking of user 
operations, and management of expiring locks in presence of real-life failures.

Implementing region locks is too far -- I'm asking for some back of the napkin 
discussion.  I think we need some measurements of how much throughput we can get 
in ZK or with a ZK-lock implementation and compare this with # rs of watchers * 
# of regions * number of ops..

The current regions-in-transition (RIT) code basically assumes that an absent 
znode is either closed or opened.  RIT znodes are present when the region is in 
the inbetween states (opening, closing, 

bq. You mean like WAL for operations?

Yeah, we could call it an intent log.  It would have info so that a promoted 
backup master can look in one place and complete an operation started by the 
downed original master.
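
A sketch of what such an intent log might look like, assuming an append-only record of BEGIN/END entries. IntentLog, recover and the entry fields are hypothetical, and a Python list stands in for durable storage.

```python
import json


class IntentLog:
    """Hypothetical append-only log; a list stands in for durable storage."""

    def __init__(self):
        self._entries = []

    def append(self, kind, op_id, **details):
        # Serialize each entry, as a real log would before writing it out.
        self._entries.append(json.dumps({"kind": kind, "op_id": op_id, **details}))

    def unfinished(self):
        """Return BEGIN entries that have no matching END, in log order."""
        started = {}
        for raw in self._entries:
            entry = json.loads(raw)
            if entry["kind"] == "BEGIN":
                started[entry["op_id"]] = entry
            elif entry["kind"] == "END":
                started.pop(entry["op_id"], None)
        return list(started.values())


def recover(log, handlers):
    """What a promoted backup master could do: finish every started op.

    handlers maps an op name to a callable that completes it; each handler
    must be idempotent, since the old master may have done part of the work.
    """
    completed = []
    for entry in log.unfinished():
        handlers[entry["op"]](entry)
        log.append("END", entry["op_id"])
        completed.append(entry["op_id"])
    return completed
```

The new master looks in one place: anything with a BEGIN and no END is work it owns now. The requirement that handlers be idempotent is what makes it safe to re-run an op whose END record was simply never written.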

bq. ... Also usually that would mean RSes won't be able to initiate operations 
(like split) - they will have to go thru master (which I would argue is ok).

I know I've suggested something like this before.  Currently the RS initiates a 
split, and does the region open/meta changes.  If there are errors, at some 
point the master side detects a timeout.  An alternative would have splits 
initiated on the RS but have the master do some kind of atomic changes to 
meta and region state for the 3 involved regions (parent, daughter a and 
daughter b).  

bq. Depends on where we store it, but yeah these have to be transactional. Last 
section (very short ) suggests using ZK, which already supports that.

We need to be careful about ZK -- since it is a network connection also, 
exceptions could be failures or timeouts (which succeed but wasn't able to ack). 
 If we can describe the properties (durable vs erasable) and assumptions (if 
the wipeable ZK is source of truth, how do we make sure the version state is 
recoverable without time travel?)




[jira] [Commented] (HBASE-5487) Generic framework for Master-coordinated tasks

2013-03-28 Thread Jonathan Hsieh (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13616130#comment-13616130
 ] 

Jonathan Hsieh commented on HBASE-5487:
---

I'll deal with the spreadsheet-related comments by putting it somewhere that 
comments can be easily dropped into.  




[jira] [Commented] (HBASE-5487) Generic framework for Master-coordinated tasks

2013-03-28 Thread Enis Soztutar (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13616885#comment-13616885
 ] 

Enis Soztutar commented on HBASE-5487:
--

bq. Yeah, we could call it an intent log. It would have info so that a 
promoted backup master can look in one place and complete an operation started 
by the downed original master.
This was what I proposed in an earlier comment:
https://issues.apache.org/jira/browse/HBASE-5487?focusedCommentId=13551519&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13551519
What we need is a transactional, authoritative, fault tolerant, and durable 
source for ground truth about the cluster state and execution state. Whether 
we can do it using ZK or a master WAL, or a system table (using an implicit 
WAL), we will have to figure out. 




[jira] [Commented] (HBASE-5487) Generic framework for Master-coordinated tasks

2013-03-28 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13616971#comment-13616971
 ] 

Sergey Shelukhin commented on HBASE-5487:
-

[~jmhsieh] Will reply in detail later/tomorrow; one clarification: the 
justification was not just that the code is hard to understand, but that the 
code is hard to reason about AND we want to expand it to support a bunch of 
features. I think the whole idea of a general framework, whatever it is, is to 
have some unified model of things and a way of doing these manipulations and 
expanding their set that is not patchwork.

[~enis] I think finding one is the least of our problems :) How to use it to 
store/manage state is the question.



[jira] [Commented] (HBASE-5487) Generic framework for Master-coordinated tasks

2013-03-27 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13615901#comment-13615901
 ] 

Sergey Shelukhin commented on HBASE-5487:
-

bq. Is the assignment manager the only master coordinated task in scope?
Only for the current document version... tables could be added.
bq. Instead of asserting it is not clear if table (+region) locks scale, let's 
find out.
Hmm... that would require implementing region locks, and having a very large 
cluster. I am talking more about unacceptable blocking of user operations, and 
management of expiring locks in the presence of real-life failures.
bq. Master operations and processes can clash and we should understand where we 
need concurrency control. (I'm working on a table – here's a draft distilled 
version [1], there exists an overly detailed version that I'll share once I get 
it fixed)
Comments below.
bq. Should there be a notion of queuing operations? (locking, or an actual 
queue) Should these operations be generically logged so they can complete if a 
master goes down in the middle? (ex: master goes down during a move operation 
after the close but before the open on the new rs).
You mean like WAL for operations?

bq. The design principles is actually more of a proposed design.
Yeah, sorry, wanted to split it into two sections but never did. Will rename.

bq. how do we deal with operations where we need locks on multiple regions 
because we are reading or modifying multiple regions – e.g. splits, merges, 
snapshots? Matteo Bertozzi had suggested in another jira making a meta row 
per table, or maybe part of the solution is using the multi-row single meta 
region transaction.
Depends on where we store it, but yeah these have to be transactional. Last 
section (very short :)) suggests using ZK, which already supports that.
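
As an illustration of what "transactional" means for a split, which touches the parent and both daughters, here is a small Python sketch of an all-or-nothing update guarded by version checks. RegionStore, multi_update and VersionConflict are hypothetical names, the store is in memory and single-threaded, and it is only meant to mirror the check-then-apply semantics that ZooKeeper's multi operation offers, not its API.

```python
class VersionConflict(Exception):
    """Raised when a record changed since the caller last read it."""


class RegionStore:
    """Hypothetical in-memory store of versioned region records."""

    def __init__(self):
        self._data = {}  # region name -> (version, state)

    def read(self, region):
        # Unknown regions are reported as version 0 with no state.
        return self._data.get(region, (0, None))

    def multi_update(self, updates):
        """Apply all updates or none.

        `updates` maps region name -> (expected_version, new_state).
        """
        # Phase 1: verify every expected version before touching anything.
        for region, (expected, _) in updates.items():
            current, _ = self.read(region)
            if current != expected:
                raise VersionConflict(region)
        # Phase 2: all checks passed, so apply every write.
        for region, (expected, state) in updates.items():
            self._data[region] = (expected + 1, state)
```

A split would then be a single call that marks the parent as split and creates both daughters; if any one record was modified concurrently, none of the three changes is applied.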


bq. What are alternatives? why this approach vs others?
I can expand the doc... the implicitly mentioned existing alternatives are 
locks, which I would argue scale less well and are harder to manage; or the 
transaction approach that is currently used (although not unified), for example 
via transient transaction nodes.

Actually, one alternative approach I saw used for such things is to simplify 
concurrency of operations etc. with an actor-like model, where the master has 
the logical cluster state and the previously saved target state, and 
periodically (often) takes an epic lock, looks at them quickly, and based on 
what it is doing, outputs a new target cluster state and a list of physical 
things to do. Then it releases the epic lock, the new target state is saved, 
and the operations are performed.
That way all state-management code becomes simple, because it runs in one place 
with no concurrency, and recovery just has to compare the real cluster state 
with the destination state. 
But this will require thinking about this differently. 
Also, usually that would mean RSes won't be able to initiate operations (like 
split) - they will have to go thru master (which I would argue is ok).
Also it's not clear whether this will become too much of a bottleneck.
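
A rough Python sketch of that actor-like loop, under stated assumptions: Master, request_move, tick and plan are hypothetical names, cluster state is reduced to a region -> server map, and saving the target state is only simulated by an assignment. The point is that the planning step is a plain function with no concurrency, and recovery is the same comparison of actual state against target state.

```python
import threading


def plan(actual, target):
    """Compare actual and target assignments and list the physical steps.

    Both arguments map region name -> server name (None means unassigned).
    """
    steps = []
    for region in sorted(set(actual) | set(target)):
        have, want = actual.get(region), target.get(region)
        if have == want:
            continue
        if have is not None:
            steps.append(("close", region, have))
        if want is not None:
            steps.append(("open", region, want))
    return steps


class Master:
    def __init__(self, actual):
        self._lock = threading.Lock()  # the single "epic lock"
        self.actual = dict(actual)     # logical cluster state
        self.target = dict(actual)     # previously saved target state
        self.requests = []             # pending (region, server) move requests

    def request_move(self, region, server):
        with self._lock:
            self.requests.append((region, server))

    def tick(self):
        """One pass: take the lock, compute new target and steps, release."""
        with self._lock:
            new_target = dict(self.target)
            for region, server in self.requests:
                new_target[region] = server
            self.requests.clear()
            steps = plan(self.actual, new_target)
        # Assumption: a real master would persist the target before acting.
        self.target = new_target
        return steps
```

After a failover, a new master would only need to call plan with the observed cluster state and the saved target to learn what is left to do.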


bq. Where do you think the new information will be, META table?
It seems to me that ZK would be better (see last section), but META is also an 
option.

From the spreadsheet:
bq. Enabling and disabling table operations should be blocked when any of 
these simple region operations are in progress
Not clear why (logically).
bq. move
Move is close and open, doesn't require consistency, right?
bq. Regionserver Processes ... However, the individual operations must 
maintain the table integrity property.
Not clear what this means for snapshots.
  



[jira] [Commented] (HBASE-5487) Generic framework for Master-coordinated tasks

2013-03-26 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13614410#comment-13614410
 ] 

Sergey Shelukhin commented on HBASE-5487:
-

Any opinion? Thanks. I think the same model should be used for all tables too, 
but it's less necessary at this time...



[jira] [Commented] (HBASE-5487) Generic framework for Master-coordinated tasks

2013-03-26 Thread Jonathan Hsieh (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13614484#comment-13614484
 ] 

Jonathan Hsieh commented on HBASE-5487:
---

Thanks for writing this up.  I read the first two sections and haven't spent 
time reading the details of the others yet (I already have a bunch of 
questions).

So the only problem is that the assignment manager and SSH are difficult to 
reason about?  Is the assignment manager the only master-coordinated task in 
scope? 

I think there are more problems than that that we should enumerate:
* Instead of asserting it is not clear if table (+region) locks scale, let's 
find out.  
* Master operations and processes can clash and we should understand where we 
need concurrency control.  (I'm working on a table -- here's a draft distilled 
version [1], there exists an overly detailed version that I'll share once I get 
it fixed)
* Should there be a notion of queuing operations?  (locking, or an actual 
queue) Should these operations be generically logged so they can complete if a 
master goes down in the middle? (ex: master goes down during a move operation 
after the close but before the open on the new rs).

The design principles is actually more of a proposed design.  

Design principles::region record
* how do we deal with operations where we need locks on multiple regions 
because we are reading or modifying multiple regions -- e.g. splits, merges, 
snapshots?  [~mbertozzi] had suggested in another jira making a meta row 
per table, or maybe part of the solution is using the multi-row single meta 
region transaction.

What are alternatives?  why this approach vs others? 

[1]  
https://docs.google.com/spreadsheet/ccc?key=0AiVCAt6zRttFdDRVaG56RnpQZlNNZUNsRVJsSXM3YlE&usp=sharing



[jira] [Commented] (HBASE-5487) Generic framework for Master-coordinated tasks

2013-03-26 Thread Jimmy Xiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13614612#comment-13614612
 ] 

Jimmy Xiang commented on HBASE-5487:


Where do you think the new information will be, META table?



[jira] [Commented] (HBASE-5487) Generic framework for Master-coordinated tasks

2013-03-19 Thread Jonathan Hsieh (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13606744#comment-13606744
 ] 

Jonathan Hsieh commented on HBASE-5487:
---

+1 Please do a write up -- the standard stuff -- what problems exist (op1 and 
op2 clashes, etc), the proposed fix, and how it would fix it.



[jira] [Commented] (HBASE-5487) Generic framework for Master-coordinated tasks

2013-03-13 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13601950#comment-13601950
 ] 

Sergey Shelukhin commented on HBASE-5487:
-

Just checking; is this issue still relevant? I am particularly interested in 
an all-encompassing solution for region management that would allow you to read 
AM and not go insane (I was just reading/debugging 0.94 AM for some time, 
that's why I remembered). I have a sketch of an idea, should I write it up?



[jira] [Commented] (HBASE-5487) Generic framework for Master-coordinated tasks

2013-03-13 Thread Enis Soztutar (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13601991#comment-13601991
 ] 

Enis Soztutar commented on HBASE-5487:
--

Yes, please write it up, although AM-related things were not initially the 
main focus. 



[jira] [Commented] (HBASE-5487) Generic framework for Master-coordinated tasks

2013-01-21 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13559338#comment-13559338
 ] 

Sergey Shelukhin commented on HBASE-5487:
-

I have some different priorities currently, but if I have time I will try to do 
a short write-up on a ZK + backup table based approach. Maybe towards the eow...



[jira] [Commented] (HBASE-5487) Generic framework for Master-coordinated tasks

2013-01-17 Thread Jonathan Hsieh (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13556116#comment-13556116
 ] 

Jonathan Hsieh commented on HBASE-5487:
---

bq. My main point is that currently two external region states in ZK and a 
table, plus two complex internal states in server and master, are a root of 
some non-trivial part of all evil. Especially the nature of ZK state, that 
comes and goes. Imho we should remove one of them from being actively managed. 
bq. ZK has notifications and seems better suited for locking/atomic updates; 
w.r.t. availability it has no disadvantage since everything (e.g. locating the 
root) fails without ZK anyway, even if we do remove state machines from there.
bq. System tables are more native to HBase and have built-in WAL, plus have 
advantages for recovery.

ZK notifications are useful when there are two-way communications -- log 
splitting, region server initiated splits.  I do agree that opens and closes 
seem more complicated than necessary. 

bq. Maybe instead of WAL we can use ZK as universal source of region state (w/o 
assorted transient nodes e.g. one node per region that is always there, or 
maybe two if we want to use lock with lease to unassign) and mirror it to 
system table that is only used for recovery like you describe, or when ZK state 
disappears? 
bq. Otherwise I think we should just use system table as universal source of 
region state and get rid of ZK region state.
bq. With one source of truth master and server logic can probably be dumber.

Simpler, not dumber. :) 

The 0.20/0.89 version of the master actually had most things in meta -- and 
there are definitely some trade-offs with that approach.  In the transition to 
the 0.90 style master we traded some pain points for new ones.  If we change 
this again we need to make sure we keep those previous ones in mind to not 
duplicate the worst of them again. 

Is there any chance we could get a high-level design deck/doc that illustrates 
these processes as they are currently and what they look like after we move to 
this proposed FATE-like mechanism? Also, what operations would eventually get 
ported to this mechanism?  I think discussion and an example at the design/rpc 
comms level would help a whole lot by grounding this conversation in reality 
and not requiring diving into the code.  Once we basically agree on design, 
code reviews would be easier because they'd be focused on the implementation 
matching the design.



 Generic framework for Master-coordinated tasks
 --

 Key: HBASE-5487
 URL: https://issues.apache.org/jira/browse/HBASE-5487
 Project: HBase
  Issue Type: New Feature
  Components: master, regionserver, Zookeeper
Affects Versions: 0.94.0
Reporter: Mubarak Seyed
Assignee: Nick Dimiduk



[jira] [Commented] (HBASE-5487) Generic framework for Master-coordinated tasks

2013-01-17 Thread Nick Dimiduk (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13556706#comment-13556706
 ] 

Nick Dimiduk commented on HBASE-5487:
-

I've pushed my FATE WIP to 
[github|https://github.com/ndimiduk/hbase/tree/5487-protobuf-repos] so you guys 
can see what a FATE repo might look like for us. I appear to have introduced a 
bug while porting over the logic, but the general idea is there.


