[jira] [Commented] (HBASE-5487) Generic framework for Master-coordinated tasks
[ https://issues.apache.org/jira/browse/HBASE-5487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14725774#comment-14725774 ]

Sergey Shelukhin commented on HBASE-5487:
-----------------------------------------

Long live AssignmentManager! :)

> Generic framework for Master-coordinated tasks
> ----------------------------------------------
>
>                 Key: HBASE-5487
>                 URL: https://issues.apache.org/jira/browse/HBASE-5487
>             Project: HBase
>          Issue Type: New Feature
>          Components: master, regionserver, Zookeeper
>    Affects Versions: 0.94.0
>            Reporter: Mubarak Seyed
>            Priority: Critical
>         Attachments: Entity management in Master - part 1.pdf, Entity management in Master - part 1.pdf, Is the FATE of Assignment Manager FATE.pdf, Region management in Master.pdf, Region management in Master5.docx, hbckMasterV2-long.pdf, hbckMasterV2b-long.pdf
>
>
> Need a framework to execute master-coordinated tasks in a fault-tolerant manner.
> Master-coordinated tasks such as online-scheme change and delete-range (deleting region(s) based on start/end key) can make use of this framework.
> The advantages of framework are
> 1. Eliminate repeated code in Master, ZooKeeper tracker and Region-server for master-coordinated tasks
> 2. Ability to abstract the common functions across Master -> ZK and RS -> ZK
> 3. Easy to plugin new master-coordinated tasks without adding code to core components

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-5487) Generic framework for Master-coordinated tasks
[ https://issues.apache.org/jira/browse/HBASE-5487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13850594#comment-13850594 ]

Jonathan Hsieh commented on HBASE-5487:
---------------------------------------

Having snapshots succeed during splits, merges and alters can be handled with the open synchronization point.

Having snapshots succeed through failovers would require some major revamping. We can file that issue -- roughly, it would mean coordinating based on region name instead of region server name (non-trivial work).
[jira] [Commented] (HBASE-5487) Generic framework for Master-coordinated tasks
[ https://issues.apache.org/jira/browse/HBASE-5487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13849396#comment-13849396 ]

Sergey Shelukhin commented on HBASE-5487:
-----------------------------------------

That's an interesting one. Given that snapshots by default have no guarantees wrt consistent writes between regions (or do they?), it seems like a snapshot should get the latest schema in case of a concurrent alter. Is there any consideration (other than the arguably implementation-level issue of not recovering from close-open) that would prevent that?

For consistent snapshots, presumably the schema can be snapshotted first. I am assuming they don't stop the world and just take seqId/mvcc/ts or something, so the newer values with the new schema will just not exist.
[jira] [Commented] (HBASE-5487) Generic framework for Master-coordinated tasks
[ https://issues.apache.org/jira/browse/HBASE-5487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13849403#comment-13849403 ]

Jonathan Hsieh commented on HBASE-5487:
---------------------------------------

The problem isn't that you would get snapshots with inconsistent schemas if the two operations were issued concurrently. It is that open is async and happens outside the table write lock, which means the snapshot would fail because the region may not have been open.

This is a particular case where we would want the open routines to act synchronously with table alters and split daughter region opens (both opens complete before the table lock is released and the snapshot can happen).
[jira] [Commented] (HBASE-5487) Generic framework for Master-coordinated tasks
[ https://issues.apache.org/jira/browse/HBASE-5487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13849452#comment-13849452 ]

Sergey Shelukhin commented on HBASE-5487:
-----------------------------------------

IMHO this, in the case of opens, promotes not being fault tolerant. In large clusters you cannot get around servers failing and regions closing and reopening. Snapshots should just be able to ride over that.

Splits are more interesting. Esp. if snapshots are used more (MR over snapshots), it may be nonviable to prevent splits and other operations for the duration of every snapshot, alter, ...
[jira] [Commented] (HBASE-5487) Generic framework for Master-coordinated tasks
[ https://issues.apache.org/jira/browse/HBASE-5487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13849682#comment-13849682 ]

Andrew Purtell commented on HBASE-5487:
---------------------------------------

MR over snapshots is already a terrible idea from a security perspective.
[jira] [Commented] (HBASE-5487) Generic framework for Master-coordinated tasks
[ https://issues.apache.org/jira/browse/HBASE-5487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13849732#comment-13849732 ]

Sergey Shelukhin commented on HBASE-5487:
-----------------------------------------

Yet, it's a very good idea from a perf perspective, and logical given that many large MR jobs don't need realtime data. Snapshots can still be secured, and table-level granularity is sufficient for most cases, I'd suspect.

Regardless, it was just an example here. MR over snapshots can be discussed in HBASE-8369 :)
[jira] [Commented] (HBASE-5487) Generic framework for Master-coordinated tasks
[ https://issues.apache.org/jira/browse/HBASE-5487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13849738#comment-13849738 ]

Andrew Purtell commented on HBASE-5487:
---------------------------------------

bq. Snapshots can still be secured

This is debatable, and that is my point for bringing it up here. All of the enterprise customers I interact with universally want more than table-level granularity, which is why we spent so much time on cell granularity features recently - all of which are totally defeated by MR over snapshots.

Bringing up MR over snapshots as technical justification for other arguments needs the qualification that MR over snapshots itself may have limited applicability.
[jira] [Commented] (HBASE-5487) Generic framework for Master-coordinated tasks
[ https://issues.apache.org/jira/browse/HBASE-5487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13849798#comment-13849798 ]

Sergey Shelukhin commented on HBASE-5487:
-----------------------------------------

There are other justifications... my point is that having a lock over distributed, lengthy operations on tables, esp. with region-level component blocking table-level ops also, is the king of all epic locks, and can cause lots of problems, esp. in large clusters. Snapshot is just one example.
[jira] [Commented] (HBASE-5487) Generic framework for Master-coordinated tasks
[ https://issues.apache.org/jira/browse/HBASE-5487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13846861#comment-13846861 ]

Jonathan Hsieh commented on HBASE-5487:
---------------------------------------

Matteo and Aleks bring up an interesting case that any new master design should handle. HBASE-10136
[jira] [Commented] (HBASE-5487) Generic framework for Master-coordinated tasks
[ https://issues.apache.org/jira/browse/HBASE-5487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13841811#comment-13841811 ]

Sergey Shelukhin commented on HBASE-5487:
-----------------------------------------

Ah well, I never got to part 2. Did you guys make progress on this? I may have time to resurrect this again soon.
[jira] [Commented] (HBASE-5487) Generic framework for Master-coordinated tasks
[ https://issues.apache.org/jira/browse/HBASE-5487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13799905#comment-13799905 ]

Jonathan Hsieh commented on HBASE-5487:
---------------------------------------

[~saint@gmail.com] acked. Let's post design docs here, and move discussions comparing them to the mailing list.

[~sershe] Let's name threads prefixed with [hbase-5487] in the subject, and maybe rename subject lines if we get into a more focused discussion that warrants its own thread (if one part gets long), and in general reply inline. (I found this interesting: http://en.wikipedia.org/wiki/Posting_style.) I'll start by copying and pasting unresolved parts of the response-reply above to the dev mailing list.
[jira] [Commented] (HBASE-5487) Generic framework for Master-coordinated tasks
[ https://issues.apache.org/jira/browse/HBASE-5487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13799465#comment-13799465 ]

Devaraj Das commented on HBASE-5487:
------------------------------------

Quick comments:

1. Master knows of all external updates to the system store - are there such updates happening without the master's knowledge?
2. I presume that once the client is told an operation is accepted, it would be saved/queued somewhere, so that even if a different node picks up the master's duties, it can execute the operation. Related to that, the master should be able to get back with the correct return code for the operation even in the case of fail-overs. Also, the master could have triggered some operations, like shutdown handling, that should be completed.
3. I think we should support asynchronous operations (submit an operation and check periodically or something). There is no guarantee when a certain operation will complete, especially when the operation requires co-ordination with other nodes and/or the node is falling behind in executing operations. We shouldn't force the model to be synchronous (we do not want to hold up precious node resources, which we will in synchronous mode).
4. Maybe we should explicitly state, as one of the requirements, handling of cases where the master sends a region operation to a regionserver and the regionserver doesn't get back within some timeout. Fencing the regionserver etc. are the possible actions when this happens.
5. Should we fail-fast on the client side in case of conflicts? For example, a client issued drop table and this operation is in progress; another client comes in and says create table with the same name. We should allow clients to read the store without going through the master.
6. Wondering whether we need to differentiate priorities/ordering/etc. for operations like move region initiated by the master/balancer versus initiated by the user. Who wins, etc. These operations are advanced and won't be commonplace, but worth calling out?
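Points 2, 3 and 5 above can be sketched together. The following is only a minimal illustration under stated assumptions, not HBase code: `OperationStore`, `Master`, `ConflictError` and the operation names are hypothetical, and the in-memory dict merely stands in for whatever durable store a real design would use. It shows an accepted operation being queued so that any master can pick it up, the client polling for status instead of blocking, and a conflicting request being rejected at submit time.

```python
import itertools
from enum import Enum


class OpState(Enum):
    QUEUED = "queued"
    DONE = "done"


class ConflictError(Exception):
    """Raised at submit time when the table already has a pending operation."""


class OperationStore:
    """Hypothetical stand-in for a durable store shared by all masters."""

    def __init__(self):
        self._ids = itertools.count(1)
        self.ops = {}  # op_id -> [table, kind, OpState]

    def submit(self, table, kind):
        # Fail fast: reject if another operation on this table is still queued.
        for other_table, other_kind, state in self.ops.values():
            if other_table == table and state is OpState.QUEUED:
                raise ConflictError(f"{other_kind} on {table} is still in progress")
        op_id = next(self._ids)
        self.ops[op_id] = [table, kind, OpState.QUEUED]
        return op_id  # the client keeps this id and polls status() later

    def status(self, op_id):
        # Clients read the store directly, without going through the master.
        return self.ops[op_id][2]


class Master:
    """Any master instance can drain the queue, including one that took over."""

    def __init__(self, store):
        self.store = store

    def run_pending(self):
        for record in self.store.ops.values():
            if record[2] is OpState.QUEUED:
                # The real work (drop table, move region, ...) would happen here.
                record[2] = OpState.DONE


store = OperationStore()
drop_id = store.submit("t1", "drop_table")
try:
    store.submit("t1", "create_table")
    rejected = False
except ConflictError:
    rejected = True

# A different Master object plays the role of the node that takes over.
Master(store).run_pending()
```

Because the accepted operation lives in the store rather than in the memory of one master process, the return code survives a fail-over in this sketch; how a real store would make that durable is outside what the comment specifies.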
[jira] [Commented] (HBASE-5487) Generic framework for Master-coordinated tasks
[ https://issues.apache.org/jira/browse/HBASE-5487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13799576#comment-13799576 ]

Jonathan Hsieh commented on HBASE-5487:
---------------------------------------

Yesterday, I shared with Sergey and some of the folks interested in this a draft of the design I've been working on (I'll call it the hbck-master) and a list of questions related to Sergey's design. Since Sergey's has got master5 in the name of the doc, I'll refer to it as master5. He's answered some questions in email, but we should do technical discussions out here. We'll be working together to hash out holes in each other's designs and potentially merge designs.

I have a lot of questions. I'll hit the big questions first. Also, would it be possible to put a version of this up as a gdoc so we can point out nits and places that need minor clarification? (I have a marked-up physical copy of the doc; it would be easier to provide feedback.)

Main Concerns:

What is a failure and how do you react to failures? I think the master5 design needs to spend more effort considering failure and recovery cases. I claim there are 4 types of responses from a networked IO operation: the two states we normally deal with, ack successful and ack failed (nack), plus unknown due to a timeout where the operation succeeded (timeout success) and unknown due to a timeout where the operation failed (timeout failed). We have historically missed the last two timeout cases, or assumed that timeout means failure (nack). It seems that master5 makes the same assumptions.

I'm very concerned about what we need to do to invalidate cached RS information at clients in the case of a hang, since stale information will violate the isolation guarantees that we claim to provide.

I really want an in-depth slice of failure-handling case analysis, including the master with cached RS assignments, for move and for something more complicated such as split or alter.

I really want more invariants specified for the FSM states. E.g. if a region is in state X, does it have a row in meta? Does it have data on the FS? Is it open on another region server? Is it open on only one region server?

I think the 8 pages of tables at the back of the master5 doc can be made more concise and precise, which will help us attempt to prove correctness.

Clarification questions:

1) State update coordination. What are state updates from the outside? Do RS's initiate splitting on their own? Maybe a picture would help so we can figure out if it is similar to or different from hbck-master's?

2) Single point of truth. What is this truth? What the user-specified actions are? What the RS's are reporting? The last state we were confirmed to be at? hbck-master tries to define what single point of truth means by defining intended, current, and actual state data, with durability properties on each kind. What do clients look at? Who modifies what?

3) Table record: if a region is out of date, it should be closed and reopened. It is not clear in master5 how regionservers find out that they are out of date. Moreover, how do clients talking to those RS's with stale versions know they are going to the correct RS, especially in the face of RS failures due to timeout?

4) Region record: transition states. Shouldn't these be defined as part of the region record? (This is really similar to hbck-master's current state and intended state.)

5) Note on user operations: the forgetting thing is scary to me -- in your move split example, what happens if an RS reads state that is forgotten?

6) Table state machine. How do we guarantee clients are not writing against out-of-date region versions? (In hang situations, regions could be open in multiple places -- the hung RS and the new RS the region was assigned to and successfully opened on.)

7) Region state machine. Earlier drafts had splitting and merge cases. Are they elided in master5, or are they not present any more? How would this get extended to handle Jeffrey's distributed log replay/fast write recovery feature?

8) Logical interactions: sounds like master5 allows concurrent operations on specific regions and on a specific table (e.g. it will allow moves and splits and merges on the same region). hbck-master (though not fully documented) only allows certain region transitions when the table is enabled or if the table is disabled. Are we sure we don't get into race conditions? What happens if disable gets issued -- is it possible for someone to reopen the region and for old clients to continue writing to it even though it is closed?

9) (nit) "in cursive" means in italics. :)

10) The table operations section has tables which I believe are the actions between FSM states in the table or region FSMs. Is this correct? Can the edges be labeled to describe which steps these transitions correspond to?

Short doc:

nit: Design Constraints, code should: Have AM logic isolated from the persistent storage of state.
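The four-outcome claim in the main concern above can be made concrete with a small sketch. This is an illustration under stated assumptions, not code from either design doc: `RpcOutcome`, `RegionKnowledge`, `master_view_after_open` and `safe_to_reassign` are hypothetical names. The point it shows is that the caller cannot distinguish the two timeout cases, so both must map to "unknown" rather than to "failed".

```python
from enum import Enum


class RpcOutcome(Enum):
    """What actually happened to a request sent over the network."""
    ACK = "ack successful"
    NACK = "ack failed"
    TIMEOUT_APPLIED = "timed out, but the remote side did apply it"
    TIMEOUT_NOT_APPLIED = "timed out, and the remote side never applied it"


class RegionKnowledge(Enum):
    """What the master is entitled to believe after an open request."""
    OPEN = "open"
    NOT_OPEN = "not open"
    UNKNOWN = "unknown"


def master_view_after_open(outcome):
    """Map the true outcome to what the caller can actually observe."""
    if outcome is RpcOutcome.ACK:
        return RegionKnowledge.OPEN
    if outcome is RpcOutcome.NACK:
        return RegionKnowledge.NOT_OPEN
    # Both timeout cases look identical from the caller's side.
    return RegionKnowledge.UNKNOWN


def safe_to_reassign(knowledge):
    """Only a definite 'not open' lets the master assign the region elsewhere.

    Treating UNKNOWN as NOT_OPEN is the historical shortcut the comment warns
    about: it can leave a region open on two servers at once.
    """
    return knowledge is RegionKnowledge.NOT_OPEN


views = {outcome: master_view_after_open(outcome) for outcome in RpcOutcome}
```

In this sketch an `UNKNOWN` result forces some extra step (for example fencing the server, as raised elsewhere in the thread) before the region may be reassigned; which step is appropriate is a design decision the comment leaves open.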
[jira] [Commented] (HBASE-5487) Generic framework for Master-coordinated tasks
[ https://issues.apache.org/jira/browse/HBASE-5487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13799619#comment-13799619 ] Sergey Shelukhin commented on HBASE-5487:

Answers lifted from email as well (with some fixes; one answer was modified after a clarification here :)).

bq. What is a failure and how do you react to failures? I think the master5 design needs to spend more effort on considering failure and recovery cases. I claim there are 4 types of responses from a networked IO operation: the two states we normally deal with, ack successful and ack failed (nack), plus unknown due to timeout that succeeded (timeout success) and unknown due to timeout that failed (timeout failed). We have historically missed the last two cases and they aren't considered in the master5 design.

There are a few considerations. Let me examine whether there are other cases than these. I am assuming the collocated table, which should reduce such cases for state (probably, if the collocated table cannot be written reliably, master must stop-the-world and fail over). When an RS contacts master to do a state update, it errs on the side of caution: no state update, no open region (or split). Thus, except for the case of multiple masters running, we can always assume the RS didn't online the region if we don't know about it. Then, for messages to RS, see "Note on messages"; they are idempotent, so they can always be resent.

bq. 1) State update coordination. What are state updates from the outside? Do RS's initiate splitting on their own? Maybe a picture would help so we can figure out if it is similar or different from hbck-master's?

Yes, these are RS messages. They are mentioned in some operation descriptions in part 2: opening-opened, closing-closed, splitting, etc.

bq. 2) Single point of truth. hbck-master tries to define what single point of truth means by defining intended, current, and actual state data with durability properties on each kind. What do clients look at, and who modifies what?

Sorry, I don't understand the question. By single source of truth I mainly mean what is going on with the region; it is described in design considerations. I like the idea of intended state; however, without more detailed reading I am not sure how it works for multiple ops, e.g. master recovering the region while the user intends to split it, so the split should be executed after it's opened.

bq. 3) Table record: if regions is out of date, it should be closed and reopened. It is not clear in master5 how regionservers find out that they are out of date. Moreover, how do clients talking to those RS's with stale versions know they are going to the correct RS especially in the face of RS failures due to timeout?

On alter (and on startup, if the alter failed), master tries to reopen all regions that are out of date. Regions that are not open will either pick up the new version when they are opened, or (e.g. if they are now Opening with the old version) master discovers they are out of date when they are transitioned to Opened by the RS, and reopens them again. As for any case of alter on an enabled table, there are no guarantees for clients. To provide these w/o disable/enable (or the logical equivalent of coordinating all closes and opens), one would need some form of version time travel, or waiting for versions, or both.

bq. 4) region record: transition states. This is really similar to hbck-master's current state and intended state. Shouldn't it be defined as part of the region record?

I mention somewhere that this could be done. One thing is that if several paths are possible between states, it's useful to know which one is taken. But do note that I store user intent separately from what is currently going on, so they are not exactly similar as far as I can see.

bq. 5) Note on user operations: the forgetting thing is scary to me -- in your move split example, what happens if an RS reads state that is forgotten?

I think my description of this might be too vague. State is not forgotten; previous intent is forgotten. I.e. if the user does several conflicting operations in order (e.g. split and then merge), the first one will be canceled (safely :)). Also, the RS does not read state as a guideline to what needs to be done.

bq. 6) table state machine. how do we guarantee clients are writing from the correct version in failures?

The intent is to fence the WAL for the region server, the way we do now. One could also use another mechanism. Perhaps I could specify it more clearly; I think the problem of making sure the RS is dead is nearly orthogonal. In my model, due to how an opening region is committed to Opened, we can only be unsure when the region is in Opened state (or similar states such as Splitting, which are not present in my current version but will be added). In that case, in absence of normal transition, we cannot do literally anything with the region unless
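The four response types and the "idempotent, so they can always be resent" rule from the comment above can be sketched roughly as follows. This is only an illustration under stated assumptions: `RpcOutcome`, `IdempotentSender`, and `sendUntilKnown` are hypothetical names, not HBase classes, and the retry policy (a fixed attempt count) is made up for the example.

```java
import java.util.function.Supplier;

// Hypothetical names throughout; none of these types exist in HBase.
enum RpcOutcome { ACK_SUCCESS, ACK_FAILED, TIMEOUT_APPLIED, TIMEOUT_NOT_APPLIED }

final class IdempotentSender {
    /** What the sender can actually observe: both timeout cases look the same. */
    enum Observed { ACK, NACK, UNKNOWN }

    static Observed observe(RpcOutcome outcome) {
        switch (outcome) {
            case ACK_SUCCESS: return Observed.ACK;
            case ACK_FAILED:  return Observed.NACK;
            default:          return Observed.UNKNOWN; // either timeout case
        }
    }

    /**
     * Resends a message until a definite answer arrives or attempts run out.
     * Returns true on ACK, false on NACK or if every attempt timed out.
     * Resending after UNKNOWN is only safe because the message is assumed idempotent.
     */
    static boolean sendUntilKnown(Supplier<RpcOutcome> send, int maxAttempts) {
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            Observed seen = observe(send.get());
            if (seen == Observed.ACK) return true;
            if (seen == Observed.NACK) return false;
        }
        return false;
    }
}
```

The point of the sketch is that the sender never has to distinguish "timeout success" from "timeout failed": with idempotent messages both collapse into "resend".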
[jira] [Commented] (HBASE-5487) Generic framework for Master-coordinated tasks
[ https://issues.apache.org/jira/browse/HBASE-5487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13799634#comment-13799634 ] stack commented on HBASE-5487:

Suggest moving this out to the dev mailing list as per the bible quoted below. Start a thread there?

From Producing Open Source Software:

Make sure the bug tracker doesn't turn into a discussion forum. Although it is important to maintain a human presence in the bug tracker, it is not fundamentally suited to real-time discussion. Think of it rather as an archiver, a way to organize facts and references to other discussions, primarily those that take place on mailing lists. There are two reasons to make this distinction. First, the bug tracker is more cumbersome to use than the mailing lists (or than real-time chat forums, for that matter). This is not because bug trackers have bad user interface design, it's just that their interfaces were designed for capturing and presenting discrete states, not free-flowing discussions. Second, not everyone who should be involved in discussing a given issue is necessarily watching the bug tracker. Part of good issue management...is to make sure each issue is brought to the right peoples' attention, rather than requiring every developer to monitor all issues. In the section called “No Conversations in the Bug Tracker” in Chapter 6, Communications, we'll look at ways to make sure people don't accidentally siphon discussions out of appropriate forums and into the bug tracker.

Pg. 50 of http://producingoss.com/en/producingoss.pdf
[jira] [Commented] (HBASE-5487) Generic framework for Master-coordinated tasks
[ https://issues.apache.org/jira/browse/HBASE-5487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13799660#comment-13799660 ] Sergey Shelukhin commented on HBASE-5487: We need some convention for inline responses on the mailing list (or tell me if there's one) :)
[jira] [Commented] (HBASE-5487) Generic framework for Master-coordinated tasks
[ https://issues.apache.org/jira/browse/HBASE-5487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13798467#comment-13798467 ] Sergey Shelukhin commented on HBASE-5487: The doc hasn't been out for long; just clarifying: is anyone interested in providing feedback for part 1? It'd be really nice to start working out implementation details in part 2 with some confidence, and/or writing code. Should I assume lack of interest, or silent agreement to rewrite according to part 1?
[jira] [Commented] (HBASE-5487) Generic framework for Master-coordinated tasks
[ https://issues.apache.org/jira/browse/HBASE-5487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13798612#comment-13798612 ] Jonathan Hsieh commented on HBASE-5487: I'm doing a pass, will provide feedback tomorrow.
[jira] [Commented] (HBASE-5487) Generic framework for Master-coordinated tasks
[ https://issues.apache.org/jira/browse/HBASE-5487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13796530#comment-13796530 ] Nicolas Liochon commented on HBASE-5487: +1 for Enis' requirements list :-). I tend to think that AM and meta should be collocated.
[jira] [Commented] (HBASE-5487) Generic framework for Master-coordinated tasks
[ https://issues.apache.org/jira/browse/HBASE-5487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13796855#comment-13796855 ] Nick Dimiduk commented on HBASE-5487:

I'm also a fan of Enis's list, particularly that AM should be understandable by simple human beings like myself.

The observation I'll add here is that AM and meta don't necessarily need to be collocated. What is necessary is that AM maintain a strongly consistent view of the world, at least from what I understand about the current design. That requirement can be relaxed iff there's an explicitly distributed state management system. Such a system is probably composed out of idempotent operations over CRDTs.

I also question the wisdom of moving away from ZK for management of active cluster state, primarily because in our current architecture that component is completely out of band of data operations. Meaning, the activities which put stress on the configuration consensus bits are different from the operations that put stress on a data provider. (Yes, data activity results in region relocation, but that's a maintenance task, not direct involvement.) Moving to a dependency on collocation unnecessarily conflates those two aspects of the system.

If the issues with ZooKeeper originate from implementation details, why not fix the implementation rather than look to a new architecture? For instance, the CoreOS folk have a little something called [etcd|https://github.com/coreos/etcd#etcd]. Raft specifically may not provide the correct kind of available consensus we need; the idea is to examine both the baby and the bathwater.
[jira] [Commented] (HBASE-5487) Generic framework for Master-coordinated tasks
[ https://issues.apache.org/jira/browse/HBASE-5487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13796905#comment-13796905 ] Nicolas Liochon commented on HBASE-5487:

bq. AM and meta don't necessarily need to be collocated

If they are separated, you double the failure probability, as you need both AM and .META. to work. Moreover, speaking to .META. becomes a distributed problem, while it's less the case when they are collocated (only less, because of HDFS).

bq. moving away from ZK for management

I believe we will need it to determine who is the AM lead. I don't really know about storing in ZooKeeper vs. meta. As Jimmy said, using ZooKeeper to do rpc calls seems wrong, however. I guess this can be decided later.

For the requirements, I don't have anything to add to Enis' list.
[jira] [Commented] (HBASE-5487) Generic framework for Master-coordinated tasks
[ https://issues.apache.org/jira/browse/HBASE-5487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13796982#comment-13796982 ] ramkrishna.s.vasudevan commented on HBASE-5487: Started going through this document. From my experience with AM, the number of states we have and the dependency on ZK callbacks definitely make things a bit difficult to understand and track, and the state of truth is spread across several places. In the doc, for the create table scenario, there are cases where a create table failure on master abort will result in a table that has fewer regions than the client actually specified in the split. The master failover part is another critical area: how we collect the alive and dead RS lists and the list of regions that were partially in opening, closing, or splitting. It is in this failure condition that we end up in a lot of hidden areas. Will read the document and share ideas, if any.
[jira] [Commented] (HBASE-5487) Generic framework for Master-coordinated tasks
[ https://issues.apache.org/jira/browse/HBASE-5487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13796985#comment-13796985 ] ramkrishna.s.vasudevan commented on HBASE-5487: HBASE-5583 is one such JIRA that handles create table failure cases.
[jira] [Commented] (HBASE-5487) Generic framework for Master-coordinated tasks
[ https://issues.apache.org/jira/browse/HBASE-5487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13797043#comment-13797043 ] Sergey Shelukhin commented on HBASE-5487:

I don't think it can happen on create. Until all regions are moved to Closed state after being created (atomically, via multi-row tx), the table won't leave Creating state. If there's a failover, all regions are erased and created from scratch. Create table is rare enough for that to work.

[~enis] Wrt the req list, I mostly agree, however:

bq. Bulk region operations

Can you please elaborate? Is it the same as modifying several regions' state under a multi-row lock?

bq. Region operations should be isolated from [snip] table operations (disabling / disabled table, schema changes, etc) and cluster shutdown. AM [snip] should NEVER know about table state (disable/disabling).

Strongly disagree with this. If we are doing a bunch of balancing and the user disables a table at the same time, we have to handle it. If the user tries to force-assign regions of a table that is halfway through create, we have to handle this. For alter, we need to reopen regions, which will have to work w/splits and merges (it's covered in my doc). For what purpose do you want to isolate them? AM should not know about details, e.g. schema logic, but it should know about logistics.

bq. No master abort when a region’s state cannot be determined. This results in support cases where master cannot start, and without master things become even worse. We should “quarantine” the regions if needed absolutely.

That is dangerous. IIRC in my spec I only put master abort if somebody changes table state under master; but in general, if a region is in an unknown state it's better to make the admin act than to silently make part of the data disappear, which can lead to wrong results. Perhaps the table needs to be quarantined then.
[jira] [Commented] (HBASE-5487) Generic framework for Master-coordinated tasks
[ https://issues.apache.org/jira/browse/HBASE-5487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13797368#comment-13797368 ] Sergey Shelukhin commented on HBASE-5487: One more update from the discussion here: we currently have many operations that can be monitored only by their side effects (create table), or not at all. We need a good way for the user to wait for operations. Given that we send requests to master, and many operations can recover from master failure, we cannot use a simple async API with a request and an async response (at least not on the lowest level; a client library can hide master failover and provide that API). The lowest-level master API should involve some sort of persistent operation cookie, so that you can still wait for the operation after failover.
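A minimal sketch of the persistent operation cookie idea from the comment above, assuming the cookie and its status live in some durable store that outlives any single master instance. `OperationStore` and `MasterSketch` are hypothetical names, and the in-memory map merely stands in for durable storage (such as a system table); nothing here reflects an actual HBase API.

```java
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for durable storage that survives master failover.
final class OperationStore {
    enum Status { RUNNING, DONE, FAILED }

    private final Map<String, Status> rows = new ConcurrentHashMap<>();

    void put(String cookie, Status status) { rows.put(cookie, status); }

    /** Returns null if the cookie is unknown. */
    Status get(String cookie) { return rows.get(cookie); }
}

// Hypothetical master: holds no operation state of its own, only the store.
final class MasterSketch {
    private final OperationStore store;

    MasterSketch(OperationStore store) { this.store = store; }

    /** Persists the operation before acknowledging it and hands back the cookie. */
    String submit() {
        String cookie = UUID.randomUUID().toString();
        store.put(cookie, OperationStore.Status.RUNNING);
        return cookie;
    }

    void complete(String cookie) { store.put(cookie, OperationStore.Status.DONE); }

    /** Any master instance can answer this, including one started after failover. */
    OperationStore.Status status(String cookie) { return store.get(cookie); }
}
```

In this sketch a client that obtained a cookie from one `MasterSketch` can keep polling `status(cookie)` against a replacement instance built over the same store, which is the property a plain request/async-response API would lose on failover.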
[jira] [Commented] (HBASE-5487) Generic framework for Master-coordinated tasks
[ https://issues.apache.org/jira/browse/HBASE-5487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13797380#comment-13797380 ] stack commented on HBASE-5487: Are we conflating functionality here (going by the last comment above by Sergey)? There is AM, and then there is another facility that uses AM to run sequences of steps to achieve an end (e.g. enable table)? Or is the notion that a revamped AM would do all? The long-running (enable a table w/ 1M regions) and the short-term (assign region)? If it is to do both, I suggest we call the new facility GOD.
[jira] [Commented] (HBASE-5487) Generic framework for Master-coordinated tasks
[ https://issues.apache.org/jira/browse/HBASE-5487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13797395#comment-13797395 ] Sergey Shelukhin commented on HBASE-5487: You'd need some way to connect these ends with what AM is doing, so AM will have to support operations attached to its actions even if there's separate operation management. Moreover, you'd find out that these are not steps, they are state goals. For example, if you are disabling a table, you want to close regions. So in the case of a separate operation manager, you might create tasks to close all regions. But what if some server fails? Now some of your regions are already closed. Separate operations to close a region might fail now, but the goal is achieved. If I start disabling a table and then kill all RS-es, the table is now disabled :) But all operations would fail. State goals fit much more naturally in AM than steps. I want to avoid steps as much as possible. Stateful (as in, having a separate step) multi-step operations are also hard to coordinate. In the above example, during recovery, you don't want to reopen the region if the table is disabling, but you don't know that until it's actually disabled if the table disable is an external operation.
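The "state goals, not steps" argument above can be illustrated with a small sketch. It assumes a much simplified region model with three states; `DisableGoal`, `RegionState`, and both methods are hypothetical names chosen for the example and are not taken from the design doc or from HBase.

```java
import java.util.Map;

// Hypothetical, simplified model: the goal of "disable table" is a state, not a task list.
final class DisableGoal {
    enum RegionState { OPENED, CLOSING, CLOSED }

    /** The goal is met once every region is closed, however each one got there. */
    static boolean isAchieved(Map<String, RegionState> regions) {
        return regions.values().stream().allMatch(s -> s == RegionState.CLOSED);
    }

    /**
     * A dead server closes its regions as a side effect. A per-region "close" step
     * sent to that server would fail, yet the goal moves forward.
     */
    static void onServerDead(Map<String, RegionState> regions,
                             Map<String, String> regionToServer,
                             String deadServer) {
        for (Map.Entry<String, String> e : regionToServer.entrySet()) {
            if (e.getValue().equals(deadServer)) {
                regions.put(e.getKey(), RegionState.CLOSED);
            }
        }
    }
}
```

With this shape, killing every region server while a table is disabling leaves `isAchieved` true even though no explicit close operation succeeded, which matches the example in the comment.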
[jira] [Commented] (HBASE-5487) Generic framework for Master-coordinated tasks
[ https://issues.apache.org/jira/browse/HBASE-5487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13797409#comment-13797409 ] Enis Soztutar commented on HBASE-5487:

I think as a mental exercise to validate the new design, we should think about the cases for the following recently opened issues, so that we can ensure that these classes of problems are eliminated:
- HBASE-9724 Failed region split is not handled correctly by AM
- HBASE-9721 meta assignment did not timeout
- HBASE-9696 Master recovery ignores online merge znode
- HBASE-9777 Two consecutive RS crashes could lead to their SSH stepping on each other's toes and cause master abort
- HBASE-9773 Master aborted when hbck asked the master to assign a region that was already online
- HBASE-9525 Move region right after a region split is dangerous
- HBASE-9514 Prevent region from assigning before log splitting is done
- HBASE-9480 Regions are unexpectedly made offline in certain failure conditions
- HBASE-9387 Region could get lost during assignment

bq. Can you please elaborate? Is it the same as modifying several regions' state under multi-row lock?

The bulk operations requirement is there so that we do multiple operations in parallel, sending openRegions rpcs for multiple regions at the same time instead of doing one-by-one assignment. That is all.

bq. That is dangerous. IIRC in my spec I only put master abort if somebody changes table state under master; but in general, if region is in unknown state it's better to make admin act, than to just silently disappear part of data - that can lead to wrong results.

Quarantining the table or region is fine, but master should not be down because of this (for example, a region can fail to open, and you would want to track how many times the region failed to open so that you can decide at some point that the region should be put in a quarantined state (or failed open state)). I think there was some issue with a region bouncing from server to server indefinitely.

For table operations intermixing with region operations, I'll have to read your updated doc.
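The "track how many times the region failed to open" suggestion above could look roughly like the following. This is a sketch under assumptions: `OpenFailureTracker` and its threshold are hypothetical, and what "quarantined" means operationally (who clears it, whether the whole table is affected) is left open, as it is in the thread.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper: counts failed opens per region so the master can stop
// bouncing a region between servers instead of aborting.
final class OpenFailureTracker {
    private final int maxAttempts;
    private final Map<String, Integer> failures = new HashMap<>();

    OpenFailureTracker(int maxAttempts) { this.maxAttempts = maxAttempts; }

    /** Records one failed open; returns true once the region should be quarantined. */
    boolean recordFailedOpen(String region) {
        int count = failures.merge(region, 1, Integer::sum);
        return count >= maxAttempts;
    }

    /** A successful open resets the count for that region. */
    void recordOpened(String region) { failures.remove(region); }
}
```

The design choice shown is only that the decision is local to one region: hitting the threshold parks that region for an admin to look at, while the master keeps serving everything else.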
[jira] [Commented] (HBASE-5487) Generic framework for Master-coordinated tasks
[ https://issues.apache.org/jira/browse/HBASE-5487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13797472#comment-13797472 ] Sergey Shelukhin commented on HBASE-5487: Ok, I split the doc in half. That way it will be easier to read and manage. Part 1 is ready (as a current version), and describes high-level design, operation semantics and interaction (I think the latter might be interesting for [~jmhsieh]). It also tries to capture the requirement lists above and the high-level implementation (whatever is agreed upon to some degree). Please tell me if something is missing or wrong. For part 2, I will keep attaching updates. It covers the design of operations: state machines, exact steps, how the client tracks them, how recovery works, etc. It will follow part 1.
[jira] [Commented] (HBASE-5487) Generic framework for Master-coordinated tasks
[ https://issues.apache.org/jira/browse/HBASE-5487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13797507#comment-13797507 ] Eric Newton commented on HBASE-5487: I'm sorry for asking such a basic question... could someone please comment: what does AM stand for? I did a quick search through the ticket and the attachments and it didn't pop out at me.
[jira] [Commented] (HBASE-5487) Generic framework for Master-coordinated tasks
[ https://issues.apache.org/jira/browse/HBASE-5487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13797511#comment-13797511 ] Sergey Shelukhin commented on HBASE-5487: - AssignmentManager, a class in HBase master. Often but not always, when talking about it people also imply a bunch of auxiliary classes around it, like ServerShutdownHandler, RegionClosed/OpenedHandler, ZKTable, etc., which together implement region assignment in HBase.
[jira] [Commented] (HBASE-5487) Generic framework for Master-coordinated tasks
[ https://issues.apache.org/jira/browse/HBASE-5487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13797538#comment-13797538 ] Feng Honghua commented on HBASE-5487: - bq. I also question the wisdom of moving away from ZK for management of active cluster state...If the issues with Zookeeper originate from implementation details, why not fix implementation rather than look to a new architecture? Using a system table rather than ZK to store state info gives better (cluster restart) performance for a big cluster with, say, 250K regions. Certainly, if we change the way we use ZK (let master be the single point that reads/writes ZK, without using ZK's watch/notify mechanism), there is no correctness/logic difference between using a system table and using ZK.
[jira] [Commented] (HBASE-5487) Generic framework for Master-coordinated tasks
[ https://issues.apache.org/jira/browse/HBASE-5487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13795680#comment-13795680 ] Sergey Shelukhin commented on HBASE-5487: - [~jmhsieh] I am writing out very detailed operation and failover descriptions right now :)
[jira] [Commented] (HBASE-5487) Generic framework for Master-coordinated tasks
[ https://issues.apache.org/jira/browse/HBASE-5487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13795737#comment-13795737 ] Jonathan Hsieh commented on HBASE-5487: --- [~sershe] Looking forward to it!
[jira] [Commented] (HBASE-5487) Generic framework for Master-coordinated tasks
[ https://issues.apache.org/jira/browse/HBASE-5487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13796362#comment-13796362 ] Enis Soztutar commented on HBASE-5487: -- I also started a document some time ago, but never got to finish it to the level of detail I would like. However, I think we can agree on the design goals section, which I augmented from the discussion so far:
- Robust implementation
- Comprehensive test coverage by mocking server and region assignment states (unit testable without MiniCluster and CM stuff)
- Bulk region operations
- Region operations should be isolated from server operations (AM vs SSH, log splitting), table operations (disabling / disabled table, schema changes, etc.) and cluster shutdown. AM and SSH should NEVER know about table state (disable/disabling). Server liveness checks can only be done as an optimization (servers can fail after the check is done)
- There should be one source of truth
- Should be compatible with master failover and concurrent region operations (split, RS failover, balancer, etc.)
- AM should guarantee that a region can be hosted by a single region server at any given time
- AM should be understandable by simple human beings like myself
- Actions for AM should be logged (possibly separately). We would like to be able to construct the history for the regions from logs or some persisted state.
- Assignment should be performant and parallelizable. We should target handling millions of regions and thousands of servers. A single region assignment should complete under 1 sec. (1PB data with 1 GB regions = 1M regions)
- No master abort when a region’s state cannot be determined. This results in support cases where master cannot start, and without master things become even worse. We should “quarantine” the regions if absolutely needed.
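The scale target in the design goals above (1PB of data in 1 GB regions, each assignment taking up to 1 sec) can be worked through as a small sketch. It assumes decimal units (1 PB = 10^15 bytes, 1 GB = 10^9 bytes) and treats 1 second as the worst-case cost of every assignment; the parallelism of 1000 is a hypothetical figure chosen for illustration, not a number from the thread.

```python
PB = 10**15  # assumption: decimal petabyte
GB = 10**9   # assumption: decimal gigabyte


def region_count(total_bytes, region_bytes):
    """Number of regions needed to hold total_bytes at a fixed region size."""
    return total_bytes // region_bytes


def full_assignment_seconds(regions, per_region_seconds, parallelism):
    """Wall-clock time to assign every region, `parallelism` at a time."""
    batches = -(-regions // parallelism)  # ceiling division
    return batches * per_region_seconds


regions = region_count(PB, GB)
# One assignment at a time: the whole cluster takes regions * 1 s.
serial_seconds = full_assignment_seconds(regions, 1.0, 1)
# Hypothetical: 1000 assignments in flight at once.
parallel_seconds = full_assignment_seconds(regions, 1.0, 1000)

print(regions, serial_seconds / 86400, parallel_seconds / 60)
```

Under these assumptions a strictly serial assignment of a million regions would take more than eleven days, which is why the goal asks for assignment to be parallelizable rather than only fast per region.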
[jira] [Commented] (HBASE-5487) Generic framework for Master-coordinated tasks
[ https://issues.apache.org/jira/browse/HBASE-5487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13796384#comment-13796384 ] stack commented on HBASE-5487: -- @enis List makes for pretty good set of requirements. We used to talk 100k regions but folks are long past that now so we are behind the curve (Flurry are 250k IIRC) and we may want to tend away from a few large regions and more toward many small regions if we can get AM to perform (advantages: smaller compression runs, easier to free up WALs, etc)
[jira] [Commented] (HBASE-5487) Generic framework for Master-coordinated tasks
[ https://issues.apache.org/jira/browse/HBASE-5487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13794232#comment-13794232 ] Jimmy Xiang commented on HBASE-5487: Good. I think we are on the same page. bq. just using it as a reliable storage. We probably won't use ZK as a pure storage. Meta table + cache is a good alternative.
[jira] [Commented] (HBASE-5487) Generic framework for Master-coordinated tasks
[ https://issues.apache.org/jira/browse/HBASE-5487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13794304#comment-13794304 ] Sergey Shelukhin commented on HBASE-5487: - [~jxiang] by janitor, I mean not timeout monitor, but something picking up timeouts of non-master ops like open. It's a rare case and probably never happens in int tests, but there can be a case where RS is taking too long to open.
[jira] [Commented] (HBASE-5487) Generic framework for Master-coordinated tasks
[ https://issues.apache.org/jira/browse/HBASE-5487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13794318#comment-13794318 ] Jimmy Xiang commented on HBASE-5487: I see.
[jira] [Commented] (HBASE-5487) Generic framework for Master-coordinated tasks
[ https://issues.apache.org/jira/browse/HBASE-5487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13794792#comment-13794792 ] Sergey Shelukhin commented on HBASE-5487: - Ok, it's harder than I thought, I don't think I will be done today... but I think I have a clear picture now that covers the above feedback, so I am trying to cover all the failover scenarios and state conflicts.
[jira] [Commented] (HBASE-5487) Generic framework for Master-coordinated tasks
[ https://issues.apache.org/jira/browse/HBASE-5487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13794862#comment-13794862 ] Eric Newton commented on HBASE-5487: Accumulo does manage tablet (region) assignment tracking through the metadata table, and further, uses a distributed state machine to scale up a little beyond a single master node. I have been meaning to write it up, but I've not had a chance. I've not kept up with every HBase improvement, so I don't know if it is pertinent... the accumulo metadata table is typically spread out over 50 - 100% of the available tablet (region) servers. Still, the metadata table, and especially the root table(t), is subject to hot-spotting on large map/reduce jobs where hundreds (or thousands) of clients are learning tablet locations at the same time. Block caching is important, but at some point massive numbers of simultaneous RPC requests to a single node cause delays, or even timeouts and failures. But using accumulo to store accumulo state has scaled well.
Accumulo has 2 frameworks for master tasks:
* master general state processing: a table should be online, assignments are recorded and servers repeatedly informed
* FATE processing, where multi-stage operations are saved, executed and progress is re-recorded
The first is general maintenance: keeping the system running. Tablets are assigned, unassigned and in-general balanced. The second allows for temporal deviance: tablets are taken offline for a merge, for example. The step-by-step allocation of resources and state are walked, each step recording progress in zookeeper.
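The FATE-style processing described above, where each step of a multi-stage operation records its progress so the operation can be resumed, could look roughly like the following sketch. This is not Accumulo's actual FATE code: `InMemoryStore` is a hypothetical stand-in for ZooKeeper, and `run_operation`, the operation id and the step names are all invented for illustration. The sketch also assumes every step is idempotent, because a crash can happen after a step runs but before its progress is recorded.

```python
class Crash(Exception):
    """Simulates the master dying in the middle of an operation."""


class InMemoryStore:
    """Hypothetical stand-in for a durable store such as ZooKeeper."""

    def __init__(self):
        self._data = {}

    def get(self, key, default=None):
        return self._data.get(key, default)

    def put(self, key, value):
        self._data[key] = value


def run_operation(store, op_id, steps):
    """Run steps in order, persisting the index of the next step to run."""
    next_step = store.get(op_id, 0)
    for index in range(next_step, len(steps)):
        steps[index]()               # must be idempotent
        store.put(op_id, index + 1)  # record progress after each step


log = []


def make_step(name, fail_once=False):
    state = {"failed": False}

    def step():
        if fail_once and not state["failed"]:
            state["failed"] = True
            raise Crash(name)
        log.append(name)

    return step


steps = [
    make_step("take_offline"),
    make_step("merge", fail_once=True),  # first attempt "crashes"
    make_step("bring_online"),
]

store = InMemoryStore()
try:
    run_operation(store, "merge-1", steps)
except Crash:
    pass  # a new master would take over here

# Recovery: the same call resumes from the recorded step, not from scratch.
run_operation(store, "merge-1", steps)
print(log)
```

The point of the design is that recovery needs no special code path: whoever picks the operation up simply reads the recorded position and continues.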
[jira] [Commented] (HBASE-5487) Generic framework for Master-coordinated tasks
[ https://issues.apache.org/jira/browse/HBASE-5487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13794892#comment-13794892 ] stack commented on HBASE-5487: -- Thanks for the helpful input [~ecn]
[jira] [Commented] (HBASE-5487) Generic framework for Master-coordinated tasks
[ https://issues.apache.org/jira/browse/HBASE-5487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13794915#comment-13794915 ] Jonathan Hsieh commented on HBASE-5487: --- FYI, I've been looking at our support cases, and have been thinking and writing up a clean slate design for a master redesign with the problems we've faced in the field in mind. I focus a bit more on invariants necessary in the different states, state transitions with master interactions, extensibility of the model, and on the recovery strategy. It basically takes a pessimistic view of the world and if I had to summarize its spirit, I'd call it the hbck-all-the-time master. It is currently durable storage agnostic but requires atomic CAS operations (single row or single znode should be sufficient). When I re-read this thread, it could use either of the implementation details described here (zk vs meta, etc). It sounds like being based in hbase is preferred so a little more thought is going in that direction. I'm currently working on examples of how to extend it for new features (like fast write recovery aka distributed log replay) and proving to myself that it would be immune from problems we've encountered before like double assignments, conflicting concurrent operations (especially during recovery), and regions stuck in transition in the face of failures, hangs or juliet pauses. I read Sergey's doc after my first cut and while there are some similarities it deviates in other places. (I definitely want more on the error recovery and error prevention mechanics). My hope is to share it sometime later this week so that folks can read, discuss and compare the different designs presented at the upcoming dev meeting. Before any jiras are filed for implementation, we should also consider things like upgrades, compatibility and performance. I'm also hoping I'll have time to take a look at the accumulo master's design as well for the discussion.
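To illustrate why atomic CAS on a single row or znode is enough to rule out double assignment, here is a minimal sketch. It is not the design from the document mentioned above: `CasStore` is a hypothetical in-memory store with a per-row version, and the region and server names are made up. Two actors read the same version of a region's row and both try to assign it; only the first write wins.

```python
import threading


class CasStore:
    """Hypothetical store offering compare-and-set on a single row."""

    def __init__(self):
        self._rows = {}
        self._lock = threading.Lock()

    def read(self, key):
        """Return (version, value); a missing row has version 0."""
        with self._lock:
            return self._rows.get(key, (0, None))

    def compare_and_set(self, key, expected_version, value):
        """Write value only if the row is still at expected_version."""
        with self._lock:
            version, _ = self._rows.get(key, (0, None))
            if version != expected_version:
                return False
            self._rows[key] = (version + 1, value)
            return True


store = CasStore()

# Both actors (say, an old and a new master) read the same starting version.
version, _ = store.read("region-a")

first = store.compare_and_set("region-a", version, "OPENING on rs1")
second = store.compare_and_set("region-a", version, "OPENING on rs2")

print(first, second, store.read("region-a"))
```

The losing actor learns from the failed CAS that its view was stale and has to re-read the row before deciding anything else, which is the pessimistic behavior the comment describes.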
[jira] [Commented] (HBASE-5487) Generic framework for Master-coordinated tasks
[ https://issues.apache.org/jira/browse/HBASE-5487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13793881#comment-13793881 ] Feng Honghua commented on HBASE-5487: - [~jxiang] bq. to the uncertainty due to ZK, I don't think it is because of the way we use it. It is more because ZK doesn't support continuous events. You have to set the watch again after each event callback. The problem is that after an event is triggered, when we try to get the data, the data could be changed again so an event is missed that will cause state jump. Agree. 'one-time watch' and 'asynchronous event notification' are the root cause of the current AM problem (I mentioned this in an above comment, you can find it :-) ). And when I said 'because the way we use it', I meant we use ZK's watch/event mechanism: process A (RS) updates ZK, and process B (master) gets notified of the update via a watch event. If we use ZK just as reliable storage, the same way we would use the meta table, it makes no difference whether we use the meta table or ZK (except for the performance difference). In the scheme using the meta table, we adopt another communication pattern for tasks (assign/split/merge): master requests RS to do something (and master stores the task progress/state to the meta table), RS reports its progress to master periodically, master changes the task progress in both memory and the meta table... Under this scheme we can use ZK to replace the meta table, and avoid the previous missed-state-transition problem as well, since we don't use ZK's watch/event mechanism and just use it as reliable storage. Right?
Just to clarify, I think we share the same understanding of this problem; you can check my above comments :-)
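The missed-transition problem with one-time watches and asynchronous notification can be reproduced in a few lines. The sketch below is a simulation, not the ZooKeeper client API: `OneShotWatchNode` is a hypothetical class that mimics a znode whose watch fires once and must be re-registered on the next read, and the state names are simplified. The `pending` list stands in for the delay between an event being raised and the master getting around to handling it.

```python
class OneShotWatchNode:
    """Hypothetical znode: a watch fires once and is then cleared."""

    def __init__(self, value):
        self.value = value
        self._watcher = None

    def get(self, watcher=None):
        """Read the current value, optionally (re)registering a watch."""
        if watcher is not None:
            self._watcher = watcher
        return self.value

    def set(self, value):
        self.value = value
        watcher, self._watcher = self._watcher, None  # one-time trigger
        if watcher is not None:
            watcher()  # the event says "changed", it carries no data


pending = []   # events delivered but not yet processed by the master
observed = []  # states the master actually saw


def on_change():
    pending.append("changed")


node = OneShotWatchNode("OFFLINE")
observed.append(node.get(watcher=on_change))

# The region server moves through two states before the master reacts.
node.set("OPENING")  # fires the watch, which is now gone
node.set("OPENED")   # no watch registered, so no event at all

# The master finally handles the event: it re-reads and re-registers.
while pending:
    pending.pop(0)
    observed.append(node.get(watcher=on_change))

print(observed)
```

The master never sees OPENING, which is the "state jump" discussed above. In the master-driven pattern the comment proposes, the region server would report each step to the master directly, so no transition depends on a watch being re-armed in time.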
[jira] [Commented] (HBASE-5487) Generic framework for Master-coordinated tasks
[ https://issues.apache.org/jira/browse/HBASE-5487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13793436#comment-13793436 ] Jimmy Xiang commented on HBASE-5487: [~fenghh], to the uncertainty due to ZK, I don't think it is because of the way we use it. It is more because ZK doesn't support continuous events. You have to set the watch again after each event callback. The problem is that after an event is triggered, when we try to get the data, the data could be changed again so an event is missed that will cause state jump. Currently, we do have a region state machine. However, the machine is not strict due to the ZK thing. We could jump over some states, so the state transitions can't be strictly enforced. If we go without ZK, we can have a strict state machine to follow. That will make things much more predictable. [~sershe], as to the janitor, I think we don't need it. Currently, we have a timeout monitor. But it is disabled and will be removed soon, I think. Without the monitor, ITBLL with CM runs very well. With 0.96 tip, I tried to run ITBLL with CM with aggressive region moving, and it is perfectly fine. If a RS is gone, SSH should handle it properly and assign regions. If there is a janitor, it will compete with SSH in this case, which probably does more harm than good. To make some RS serve the role of master, besides having meta on it, we can have some (not all, of course, to make [~jesse_yates] happy :) ) system tables on it too. This way, we can support leveled region assignment, i.e. we can open some regions before the rest, if these regions can be assigned to the master RS, or we can open them on this master RS at first, then move them away later after the system is fully started. This applies to some special regions only, for sure. Now, we bundle two important modules (master + meta) in one RS. It is critical to make sure it has a light load and does not die too often (even better, not die at all).
So I think we should move other regions out of the RS once it's promoted to be the master one. I think we should allow only a list of RSs with good hardware to be master, if not all RS nodes have decent/same hardware.
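A strict region state machine of the kind argued for above can be sketched as a table of allowed transitions that rejects anything else. The state names and the transition table here are a simplified, hypothetical subset chosen for illustration; they are not the actual HBase region states.

```python
# Hypothetical, simplified transition table (not the real HBase states).
ALLOWED = {
    "OFFLINE": {"OPENING"},
    "OPENING": {"OPEN", "OFFLINE"},  # open succeeded, or open failed
    "OPEN": {"CLOSING"},
    "CLOSING": {"OFFLINE"},
}


class IllegalTransition(Exception):
    pass


class RegionStateMachine:
    def __init__(self):
        self.state = "OFFLINE"

    def transition(self, new_state):
        """Move to new_state only if the table allows it from here."""
        if new_state not in ALLOWED[self.state]:
            raise IllegalTransition(f"{self.state} -> {new_state}")
        self.state = new_state


# Normal path: every intermediate state is visited.
region = RegionStateMachine()
region.transition("OPENING")
region.transition("OPEN")

# A state jump (skipping OPENING) is refused instead of silently accepted.
other = RegionStateMachine()
try:
    other.transition("OPEN")
    jump_rejected = False
except IllegalTransition:
    jump_rejected = True

print(region.state, other.state, jump_rejected)
```

Enforcement like this only works if the component applying the transitions sees every one of them, which is why the comment ties a strict machine to moving away from watch-based notification.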
[jira] [Commented] (HBASE-5487) Generic framework for Master-coordinated tasks
[ https://issues.apache.org/jira/browse/HBASE-5487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13793086#comment-13793086 ] Sergey Shelukhin commented on HBASE-5487: - I think it's the approach discussed above. I will update the doc on monday, I think I'm sold on collocated system table. Initially we can just run an RS that runs master library and only hosts hardcoded system regions as master. Then probably any RS (with caveats) can host the master regions and act as master, so recovery can become much easier.
[jira] [Commented] (HBASE-5487) Generic framework for Master-coordinated tasks
[ https://issues.apache.org/jira/browse/HBASE-5487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13793208#comment-13793208 ] Aaron T. Myers commented on HBASE-5487: --- bq. ZK is used for the selection of the primary nn (via the failure controllers) but I believe the journal nodes (that do the durable consensus logging) does not use ZK at all. Todd Lipcon or Aaron T. Myers can confirm. I can confirm this. The QJM in the NN uses its own (heavily ZK-inspired) consensus protocol, but does not rely on ZK itself. The only thing HDFS currently uses ZK for is for the leader election of the active NN, as Jon says here.
[jira] [Commented] (HBASE-5487) Generic framework for Master-coordinated tasks
[ https://issues.apache.org/jira/browse/HBASE-5487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13791516#comment-13791516 ] Feng Honghua commented on HBASE-5487: - bq.Master is the Actor. Having it go across a network to get/set the 'state' in a service that is non-transactional wasn't our smartest move. Regionservers currently report state via ZK. Master reads it from ZK. Would be better if RS just reported directly to RS. [~stack] Yes, this is exactly what I proposed in HBASE-9726 :-) bq.I am wondering whether it makes sense to update the meta table from the various regionservers on the region state changes or go via the master.. But maybe the master doesn't need to be a bottleneck if possible. A regionserver could first update the meta table, and then just notify the master that a certain transition was done; the master could initiate the next transition [~devaraj] It would be better to let master update the meta table rather than let various regionservers do it. Master being the single actor and truth-maintainer can avoid many tricky bugs/problems. And for frequent state changes to the meta table, the regionserver serving the (state) meta table would become the bottleneck sooner than the master which issues the update requests, so it doesn't matter whether the update requests come from the master or from various regionservers. bq.I prefer not to use ZK since it's kind of the root cause of uncertainty: has the master/region server got/processed the event? has the znode hijacked since master/region server changes its mind? We should store the state in meta table which is cached in the memory. Whether to use coprocessor it is not a big concern to me. If we don't use coprocessor, I prefer to use the master as the proxy to do all meta table updates. Otherwise, we need to listen to something for updates. [~jxiang] Agree.
IMO ZK alone is not the root cause of uncertainty; the current usage pattern of ZK is the root cause. The pattern in which regionserver updates state in ZK, and master listens to ZK and updates states in its local memory accordingly, exhibits too many tricky scenarios/bugs because the ZK watch is one-time (which can result in missed state transitions) and the notification/processing is asynchronous (which can lead to delayed/out-of-date state in master memory). And by replacing ZK with the meta table, we also need to discard this 'RS updates - master listens' pattern, since the meta table inherently lacks a listen-notify mechanism :-). bq.I think ZK got a bad reputation not on its own merit, but on how we use it. I can see that problems exist but IMHO advantages outweigh the disadvantages compared to system table. Co-located system table, I am not so sure, but so far there's no even high-level design for this (for example - do all splits have to go thru master/system table now? how does it recover? etc.). Perhaps we should abstract an async persistence mechanism sufficiently and then decide. Whether it would be ZK+notifications, or system table, or memory + wal, or colocated system table, or what. The problem is that the usage inside master of that interface would depend on perf characteristics. Anyway, we can work out the state transitions/concurrency/recovery without tying 100% to particular store. [~sershe] Agree that ZK got a bad reputation not on its own merit, but on how we use it, especially if you mean that currently master relies on ZK watch/notification to maintain/update master's in-memory region state. IMO this is almost the biggest root cause of the problems in the current assignment design. If we just use ZK the same way as using the meta table for storing states, it makes no big difference whether we store the states in ZK or the meta table, right (except that using the meta table can give much better performance for restart of a big cluster with a large number of regions)?
But using ZK's update/listen pattern does make the difference. bq.btw, any input on actor model? Things queue up operations/notifications (ops) for master; AM runs on timer or when queue is non-empty, having as inputs, cluster state (incl. ongoing internal actions it ordered before e.g. OPENING state for a region) plus new ops from queue, on a single thread; generates new actions (not physically doing anything e,g, talking to RS); the ops state and cluster state is persisted; then actions are executed on different threads (e.g. messages sent to RS-es, etc.), and AM runs again, or sleeps for some time if ops queue is empty. That is a different model, not sure if it scales for large clusters. [~sershe] operations/notifications means RS responses action progress to master? Master is the single point to update the state truth(to meta table) and RS doesn't know where the states are stored and doesn't access them directly, right? I think a communication/storage diagram can help a lot for an overall clear understanding here:-) Generic framework for
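The missed-transition problem described above can be illustrated with a toy simulation. This is only a sketch: `ToyNode`, `MasterWatcher` and `OneTimeWatchDemo` are hypothetical names, and `ToyNode` imitates one-shot watches with asynchronous delivery rather than using the real ZooKeeper client API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Toy stand-in for a znode with one-shot watches; this is NOT the ZooKeeper API. */
final class ToyNode {
    private String data;
    private Consumer<ToyNode> watcher;                 // at most one armed watch
    private final List<Runnable> pending = new ArrayList<>();

    ToyNode(String initial) { this.data = initial; }

    /** Reads the current value and arms a watch that will fire at most once. */
    String getData(Consumer<ToyNode> watch) {
        this.watcher = watch;
        return data;
    }

    /** Writes a value; an armed watch is queued for later delivery and then disarmed. */
    void setData(String value) {
        this.data = value;
        Consumer<ToyNode> w = this.watcher;
        this.watcher = null;
        if (w != null) {
            pending.add(() -> w.accept(this));
        }
    }

    /** Delivers queued notifications, modelling asynchronous delivery. */
    void deliverPending() {
        List<Runnable> batch = new ArrayList<>(pending);
        pending.clear();
        for (Runnable r : batch) {
            r.run();
        }
    }
}

/** The "master": on each notification it re-reads the node and re-arms the watch. */
final class MasterWatcher implements Consumer<ToyNode> {
    final List<String> seen = new ArrayList<>();

    @Override
    public void accept(ToyNode node) {
        seen.add(node.getData(this));
    }
}

final class OneTimeWatchDemo {
    /** The "regionserver" writes two states before the master's notification is delivered. */
    static List<String> observedStates() {
        ToyNode node = new ToyNode("OFFLINE");
        MasterWatcher master = new MasterWatcher();
        master.accept(node);        // initial read, watch armed
        node.setData("OPENING");    // watch fires (queued) and is disarmed
        node.setData("OPENED");     // no watch armed, so no second notification
        node.deliverPending();      // master re-reads and finds the latest value only
        return master.seen;
    }

    public static void main(String[] args) {
        System.out.println(observedStates());
    }
}
```

In this run the master never observes the intermediate OPENING state, which is the kind of gap that forces the listener side to be written idempotently.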
[jira] [Commented] (HBASE-5487) Generic framework for Master-coordinated tasks
[ https://issues.apache.org/jira/browse/HBASE-5487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13791520#comment-13791520 ] Feng Honghua commented on HBASE-5487:

Since HBASE-9726 was closed as a duplicate of this issue, I am copying the HBASE-9726 proposal here for discussion/reference:

The current assignment process (and the split process) relies on ZK for communication between the master and the regionservers. This pattern has two drawbacks:

1. For a cluster with a large number of regions (say, 10K-100K regions), ZK becomes the bottleneck for cluster restart, because the assignment/split status/progress is stored in ZK and ZK has limited write throughput.
2. Since a ZK watch is one-time and event notification/processing is asynchronous, there is no guarantee that the master (the watcher) is notified of the up-to-date status/progress in time. The master therefore relies on idempotence for its correctness, which makes the logic/code very hard to understand and maintain.

The new assignment design proposal is as follows:

1. Assignment/split status/progress is stored in a system table (say 'assignTable'), like the meta table, rather than in ZK. This improves write throughput and hence the performance of restart for clusters with a large number of regions.
2. The communication pattern for assignment/split changes: the master talks directly to the regionserver (the master issues an assign request to the regionserver, and the regionserver reports the assign progress back to the master) and records the status/progress of each assignment/split in the 'assignTable'. If the master fails, the new active master reads the 'assignTable' to rebuild its knowledge of the ongoing assignment/split tasks and continues from there. (The regionserver doesn't write to the 'assignTable'.)
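A minimal sketch of the failover path in the proposal above, assuming the 'assignTable' can be modelled as a simple key/value map. `AssignTable`, `ToyMaster` and `AssignTableDemo` are hypothetical names for illustration; a real implementation would sit on an HBase system table and real RPCs.

```java
import java.util.Map;
import java.util.TreeMap;

/** In-memory stand-in for the proposed 'assignTable' (assumption: region name -> state). */
final class AssignTable {
    private final Map<String, String> stateByRegion = new TreeMap<>();

    void put(String region, String state) { stateByRegion.put(region, state); }

    Map<String, String> scan() { return new TreeMap<>(stateByRegion); }
}

/** Only the master writes the table; regionservers report to the master directly. */
final class ToyMaster {
    private final AssignTable table;
    private final Map<String, String> inMemory = new TreeMap<>();

    /** A newly active master rebuilds its view of ongoing work by reading the table. */
    ToyMaster(AssignTable table) {
        this.table = table;
        inMemory.putAll(table.scan());
    }

    /** Called when a regionserver reports assignment progress to the master. */
    void onRegionServerReport(String region, String state) {
        table.put(region, state);      // persist first, so a successor can recover it
        inMemory.put(region, state);   // then update the local view
    }

    String stateOf(String region) { return inMemory.get(region); }
}

final class AssignTableDemo {
    /** First master records progress and "crashes"; the second recovers from the table. */
    static String stateAfterFailover() {
        AssignTable table = new AssignTable();
        ToyMaster first = new ToyMaster(table);
        first.onRegionServerReport("region-a", "OPENING");
        ToyMaster second = new ToyMaster(table);
        return second.stateOf("region-a");
    }

    public static void main(String[] args) {
        System.out.println(stateAfterFailover());
    }
}
```

Writing to the table before updating memory is a design choice of this sketch: it keeps the persisted copy at least as new as anything the master has acted on.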
[jira] [Commented] (HBASE-5487) Generic framework for Master-coordinated tasks
[ https://issues.apache.org/jira/browse/HBASE-5487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13790109#comment-13790109 ] Devaraj Das commented on HBASE-5487:

bq. we still need a reliable store (ZK, system table, or master WAL). It seems ZK is the most scalable and best suited for the task

[~sershe], not ZK, IMHO. Let's use one of our internal storages rather than an external system for storing the region state. I am all for removing ZK altogether from HBase. One less distributed system to worry about. One less component to manage.

We already have heartbeats from RSs to the master, and region open/close RPCs from the master to the RSs. I think we have enough communication already in place between the master and the RSs to deal with region states. We also have chores in the master that try to take some actions based on assignment timeouts...

Would this model work (conceptually)? It's late night here; please pardon me if there are glaring issues :-) Please bear with me :-)

All region state manipulation operations are initiated by the master and they act upon the meta region. We have extra columns to store the state of the region etc. in the meta table. The initial rows are created by the master and the state of the regions is UNASSIGNED. This is not new - we already do this, but IIRC we don't store the state of the region.

Some state transitions happen through method executions, and some of those method executions are RPCs from the master to some regionserver. I think that the states would be more granular here (to prevent potential replay/repetitions of large operations).

I am wondering whether it makes sense to update the meta table from the various regionservers on the region state changes or go via the master.. But maybe the master doesn't need to be a bottleneck if possible. A regionserver could first update the meta table, and then just notify the master that a certain transition was done; the master could initiate the next transition ([~eclark] comment about coprocessor can probably be made to apply in this context). Only when a state change is recorded in meta is the operation considered successful.

Also, there is a chore (probably enhance catalog-janitor) in the master that periodically goes over the meta table and restarts (along with some diagnostics; probing regionservers in question etc.) failed/stuck state transitions. This chore runs once as soon as the master is started and the meta region is assigned, to take care of transitions that were started in the previous life of the master and which are now waiting for some action from the master. For example, if the state was OPENING for a certain region and the master crashed, the master would send an openRegion RPC to the region assignee upon restart. The region assignee would have been recorded as a column in the row for the region by the previous master.

I think we should also save the operations that were initiated by the client on the master (either in WAL or in some system table) so that the master doesn't lose track of those and can execute them in the face of crashes/restarts, for example if the user had sent a 'split region' operation and the master crashed.
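The chore described above could look roughly like the following sketch. The row layout (`RegionRow`), the "stuck" criterion (time spent in OPENING beyond a timeout) and all names are assumptions made for illustration; the comment itself does not specify how a stuck transition is detected.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical shape of the extra per-region columns in the meta table. */
final class RegionRow {
    final String state;          // e.g. "OPENING", "OPEN"
    final String assignee;       // regionserver recorded by the previous master
    final long startedAtMillis;  // assumed column: when the transition began

    RegionRow(String state, String assignee, long startedAtMillis) {
        this.state = state;
        this.assignee = assignee;
        this.startedAtMillis = startedAtMillis;
    }
}

final class StuckTransitionChore {
    /** Returns one openRegion retry per region that has sat in OPENING longer than the timeout. */
    static List<String> retriesFor(Map<String, RegionRow> meta, long nowMillis, long timeoutMillis) {
        List<String> retries = new ArrayList<>();
        for (Map.Entry<String, RegionRow> e : meta.entrySet()) {
            RegionRow row = e.getValue();
            boolean stuck = row.state.equals("OPENING")
                    && nowMillis - row.startedAtMillis > timeoutMillis;
            if (stuck) {
                retries.add("openRegion(" + e.getKey() + ") -> " + row.assignee);
            }
        }
        return retries;
    }

    /** Small fixture: one stuck region, one open region, one recently started transition. */
    static Map<String, RegionRow> sampleMeta() {
        Map<String, RegionRow> meta = new LinkedHashMap<>();
        meta.put("r1", new RegionRow("OPENING", "rs-1", 0L));
        meta.put("r2", new RegionRow("OPEN", "rs-2", 0L));
        meta.put("r3", new RegionRow("OPENING", "rs-3", 9_000L));
        return meta;
    }

    public static void main(String[] args) {
        System.out.println(retriesFor(sampleMeta(), 10_000L, 5_000L));
    }
}
```

The same scan can serve both uses mentioned in the comment: run once at master startup to pick up transitions left over from the previous master, and then periodically.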
[jira] [Commented] (HBASE-5487) Generic framework for Master-coordinated tasks
[ https://issues.apache.org/jira/browse/HBASE-5487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13790151#comment-13790151 ] Nicolas Liochon commented on HBASE-5487:

bq. zk vs. non zk.

ZK is used in HDFS HA, no? So anyway we have it in our architecture. Then using it for permanent data is another discussion (stuff like ZOOKEEPER-1147 makes it interesting).

I would personally prefer to remove the master rather than adding functions to it. Saying that there are some specific threads in the region servers holding .meta. is acceptable imho.
[jira] [Commented] (HBASE-5487) Generic framework for Master-coordinated tasks
[ https://issues.apache.org/jira/browse/HBASE-5487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13790172#comment-13790172 ] Jonathan Hsieh commented on HBASE-5487:

ZK is used for the selection of the primary nn (via the failure controllers) but I believe the journal nodes (that do the durable consensus logging) do not use ZK at all. [~tlipcon] or [~atm] can confirm.
[jira] [Commented] (HBASE-5487) Generic framework for Master-coordinated tasks
[ https://issues.apache.org/jira/browse/HBASE-5487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13790433#comment-13790433 ] Devaraj Das commented on HBASE-5487:

Removing the separate master daemon is fine by me, Nicolas. However, we still need someone to do various operations (servicing user requests and other janitorial tasks). Long back we were discussing that a random region server (elected via zk) could perform the master role.
[jira] [Commented] (HBASE-5487) Generic framework for Master-coordinated tasks
[ https://issues.apache.org/jira/browse/HBASE-5487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13790442#comment-13790442 ] Jimmy Xiang commented on HBASE-5487:

I prefer not to use ZK since it's kind of the root cause of uncertainty: has the master/region server got/processed the event? Has the znode been hijacked since the master/region server changed its mind?

We should store the state in the meta table, which is cached in memory. Whether to use a coprocessor is not a big concern to me. If we don't use a coprocessor, I prefer to use the master as the proxy for all meta table updates. Otherwise, we need to listen to something for updates.

We should not have another janitor/chore. If an action has failed, it must be because of something it cannot recover from by itself, not because of a bug in our code. It should stay failed until the issue is resolved.

We need to have something like FATE in accumulo to queue/retry actions taking several steps like split/merge/move.

It is a nice-to-have to keep a history of region state transitions.
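As a rough illustration of queueing and retrying a multi-step action such as split/merge/move, the sketch below tracks how far an operation has progressed so that a retry resumes at the failed step. It is not Accumulo's FATE API; `MultiStepOp` is a hypothetical class, and the step index is kept in memory here, whereas a fault-tolerant version would have to persist it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BooleanSupplier;

/** A multi-step operation whose progress survives a failed attempt (in memory only here). */
final class MultiStepOp {
    private final List<BooleanSupplier> steps;
    private int nextStep = 0;   // assumption: a real system would store this durably

    MultiStepOp(List<BooleanSupplier> steps) {
        this.steps = new ArrayList<>(steps);
    }

    /**
     * Runs the remaining steps in order. Stops at the first step that reports
     * failure, leaving nextStep pointing at it so that a later call retries it.
     * Returns true once every step has completed.
     */
    boolean runUntilBlocked() {
        while (nextStep < steps.size()) {
            if (!steps.get(nextStep).getAsBoolean()) {
                return false;
            }
            nextStep++;
        }
        return true;
    }

    int nextStep() { return nextStep; }
}
```

Completed steps are never re-executed by a retry in this sketch, which only helps if each step is itself atomic; a step that can fail halfway would still need to be idempotent.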
[jira] [Commented] (HBASE-5487) Generic framework for Master-coordinated tasks
[ https://issues.apache.org/jira/browse/HBASE-5487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13790451#comment-13790451 ] Nicolas Liochon commented on HBASE-5487:

bq. However, we still need someone to do various operations (servicing user requests and other janitorial tasks).

Yeah, I agree. Balance is a good example. Let's say that I'm more comfortable w/ something that lowers the role of the master than the opposite.

bq. I prefer not to use ZK

When you say this, Jimmy, do you mean no ZK in HBase at all, or no ZK for permanent data, or no ZK at all for assignment?

bq. We should store the state in meta table which is cached in the memory.

I'm fine with that (if we can make it work :-) )
[jira] [Commented] (HBASE-5487) Generic framework for Master-coordinated tasks
[ https://issues.apache.org/jira/browse/HBASE-5487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13790460#comment-13790460 ] Jimmy Xiang commented on HBASE-5487:

Nicolas, I mean no ZK for assignment.
[jira] [Commented] (HBASE-5487) Generic framework for Master-coordinated tasks
[ https://issues.apache.org/jira/browse/HBASE-5487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13791007#comment-13791007 ] Sergey Shelukhin commented on HBASE-5487:

Big response to not-responded-to recent comments. Let me update the doc, EOW-ish probably depending on the number of bugs surfacing ;)

[~stack] Let's keep discussion and doc here and branch tasks out for rewrites.

bq. + The problem section is too short (state kept in multiple places and all have to agree...); need more full list so can be sure proposal addresses them all

What level of detail do you have in mind? It's not a bug fix, so I cannot really say merge races with snapshot, or something like that; that could also be arguably resolved by another 100k patch to existing AM :)

bq. + How is the proposal different from what we currently have? I see us tying regionstate to table state. That is new. But the rest, where we have a record and it is atomically changed looks like our RegionState in Master memory? There is an increasing 'version' which should help ensure a 'direction' for change which should help.

See the design principles (and below discussion :)). We are trying to avoid multiple flavors of split-brain state.

bq. Its fine having a source of truth but ain't the hard part bring the system along? (meta edits, clients, etc.).

Yes :)

bq. Experience has zk as messy to reason with. It is also an indirection having RS and M go to zk to do 'state'.

I think ZK got a bad reputation not on its own merit, but on how we use it. I can see that problems exist but IMHO the advantages outweigh the disadvantages compared to a system table. Co-located system table, I am not so sure, but so far there's not even a high-level design for this (for example - do all splits have to go thru master/system table now? how does it recover? etc.). Perhaps we should abstract an async persistence mechanism sufficiently and then decide. Whether it would be ZK+notifications, or system table, or memory + wal, or colocated system table, or what. The problem is that the usage inside master of that interface would depend on perf characteristics. Anyway, we can work out the state transitions/concurrency/recovery without tying 100% to a particular store.

bq. + Agree that master should become a lib that any regionserver can run.

That sounds possible.

[~nkeywal]

bq. At least, we should make this really testable, without needing to set up a zk, a set of rs and so on.

+1, see my comment above.

bq. I really really really ( ) think that we need to put performances as a requirement for any implementation. For example, something like: on a cluster with 5 racks of 20 regionserver each, with 200 regions per RS,, the assignment will be completed in 1s if we lose one rack. I saw a reference to async ZK in the doc, it's great, because the performances are 10 times better.

We can measure and improve, but I am not really sure about what exact numbers will be, at this stage (we don't even know what storage is).

[~devaraj]

bq. A regionserver could first update the meta table, and then just notify the master that a certain transition was done; the master could initiate the next transition (Elliott Clark comment about coprocessor can probably be made to apply in this context). Only when a state change is recorded in meta, the operation is considered successful.

Split, for example, requires several changes to meta. Will master be able to see them together from the hook? If master is collocated in the same RS with meta, it should be small overhead to have master RPC.

bq. Also, there is a chore (probably enhance catalog-janitor) in the master that periodically goes over the meta table and restarts (along with some diagnostics; probing regionservers in question etc.) failed/stuck state transitions.

+1 on that. Transition states can indicate the start ts, and master will know when they started.

bq. I think we should also save the operations that was initiated by the client on the master (either in WAL or in some system table) so that the master doesn't lose track of those and can execute them in the face of crashes restarts. For example, if the user had sent a 'split region' operation and the master crashed

Yeah, disable table or move region are a good example. Probably we'd need ZK/system table/WAL for ongoing logical operations.

[~jxiang]

bq. We should not have another janitor/chore. If an action is failed, it must be because of something unrecoverable by itself, not because of a bug in our code. It should stay failed until the issue is resolved.

I think the failures meant are things like RS went away, is slow or buggy, so OPENING got stuck - someone needs to pick it up over timeout.

bq. We need to have something like FATE in accumulo to queue/retry actions taking several steps like split/merge/move.

We basically need something that allows atomic state
[jira] [Commented] (HBASE-5487) Generic framework for Master-coordinated tasks
[ https://issues.apache.org/jira/browse/HBASE-5487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13791012#comment-13791012 ] Sergey Shelukhin commented on HBASE-5487:

btw, any input on actor model? Things queue up operations/notifications (ops) for the master. The AM runs on a timer or when the queue is non-empty, on a single thread, taking as inputs the cluster state (incl. ongoing internal actions it ordered before, e.g. OPENING state for a region) plus new ops from the queue. It generates new actions (without physically doing anything, e.g. talking to an RS); the ops state and cluster state are persisted; then the actions are executed on different threads (e.g. messages sent to RS-es, etc.), and the AM runs again, or sleeps for some time if the ops queue is empty. That is a different model, not sure if it scales for large clusters.
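One pass of the actor loop described above might be sketched as follows. The op kinds ("ASSIGN", "OPENED"), the state names and the class `ActorAssignmentManager` are assumptions for illustration only, and persistence is reduced to appending a snapshot to a list.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Queue;
import java.util.TreeMap;
import java.util.concurrent.ConcurrentLinkedQueue;

final class ActorAssignmentManager {
    // Any thread may enqueue; only the single AM thread drains.
    private final Queue<String[]> ops = new ConcurrentLinkedQueue<>();
    private final Map<String, String> clusterState = new TreeMap<>();
    private final List<String> persisted = new ArrayList<>(); // stand-in for a durable store

    /** Queues an operation or notification as a {kind, region} pair. */
    void enqueue(String kind, String region) {
        ops.add(new String[] {kind, region});
    }

    /**
     * One single-threaded pass: drain the queue, compute the new state and the
     * actions to take, persist, and return the actions. The caller would hand
     * the actions to other threads (e.g. to send RPCs to regionservers).
     */
    List<String> runOnce() {
        List<String> actions = new ArrayList<>();
        String[] op;
        while ((op = ops.poll()) != null) {
            String kind = op[0];
            String region = op[1];
            if (kind.equals("ASSIGN")) {
                clusterState.put(region, "OPENING");
                actions.add("openRegion:" + region);
            } else if (kind.equals("OPENED") && "OPENING".equals(clusterState.get(region))) {
                clusterState.put(region, "OPEN");
            }
        }
        persisted.add(clusterState.toString()); // persist before any action is executed
        return actions;
    }

    String stateOf(String region) { return clusterState.get(region); }
}
```

Because decisions are made on one thread against one copy of the state, there is no concurrent mutation to reason about; whether a single decision thread keeps up on a large cluster is the open question raised in the comment.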
[jira] [Commented] (HBASE-5487) Generic framework for Master-coordinated tasks
[ https://issues.apache.org/jira/browse/HBASE-5487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13789470#comment-13789470 ] Elliott Clark commented on HBASE-5487:

bq. we still need a reliable store

HBase is a reliable store. We should be using it as such for current state. If we co-locate the master process with meta, then the master noticing state changes is as simple as loading a co-processor that hooks mutations. It also means that when master wants to look up current state there's no rpc overhead. Simply target the hregion. This allows us to reduce the number of copies of state. No longer will we need a local hash map + what's in zk + what's in meta.

I think Jimmy's correct, we should use zk for ephemeral only. Everything else should be in our systems.
[jira] [Commented] (HBASE-5487) Generic framework for Master-coordinated tasks
[ https://issues.apache.org/jira/browse/HBASE-5487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13789493#comment-13789493 ] Sergey Shelukhin commented on HBASE-5487:

Please let's not use coprocessors for mainline functionality... also, if we store state in a system table that is hosted by the master, then we don't need ZK at all, we should get rid of it.

The only disadvantages of using ZK that I see are the absence of a getKeyBefore/After API (easy to fix by having an ephemeral META table for clients to query), and having an extra moving part. If we don't get rid of ZK we don't alleviate the latter, so I think we should either use it for everything or not at all... I would prefer to use it for everything.

As far as I see, ZK is more reliable than HBase RS or master, has built-in replication with faster recovery, is probably more scalable than reading from a single RS, and has a better model for atomic state changes. Probably has better tolerance for stuff like network partitioning too. We could do master WAL and all that stuff but I don't see a compelling reason to do this when we have a bunch of Apache code that is already written to solve all of these problems.

What is the reason to not use ZK? What is the advantage of a system table, or the disadvantage of ZK?
[jira] [Commented] (HBASE-5487) Generic framework for Master-coordinated tasks
[ https://issues.apache.org/jira/browse/HBASE-5487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13789576#comment-13789576 ] Elliott Clark commented on HBASE-5487:

bq. Please let's not use coprocessors for mainline functionality

We already do. I don't see anything wrong with making HBase more modular. If there are pain points with using co-processors that cause you to say no, then we should fix those. Not just ignore them.

bq. also, if we store state in system table that is hosted by master, then we don't need ZK at all, we should get rid of it.

We don't have ephemeral node capability at all. And we need it for the bootstrap problem. It allows clients to point at a relatively small number of nodes to discover the whole cluster.

bq. As far as I see, ZK is more reliable than HBase RS or master

Our master is only complex because of our use of zk to hold and mutate state.

bq. has built-in replication with faster recovery

With the meta/system wal I think we can be within an order of magnitude.
[jira] [Commented] (HBASE-5487) Generic framework for Master-coordinated tasks
[ https://issues.apache.org/jira/browse/HBASE-5487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13789598#comment-13789598 ]

Sergey Shelukhin commented on HBASE-5487:

Wrt coprocs - that is bad imho; that is not the kind of modular that we want. Core parts of the system should depend on well-defined interfaces, not on generic extension points. Imho, the litmus test for a coproc, as a plugin interface, is: can you run HBase without it? If yes, then it's ok for it to be a coproc (e.g. accesscontrol). Otherwise we should have proper interfaces that have some meaning to the caller.

bq. Our master is only complex because of our use of zk to hold and mutate state.

That is not due to ZK as such; it is due to the multi-state-machine reconciliation model and the truth in multiple places that it requires. A system table can have the exact same problem of state in the table + state in memory. The question is how you split and manage state between them; the storage substrate doesn't matter as much. If the truth was in ZK and nowhere else, that wouldn't be a problem, same as with a system table. Also, by reliable I meant that ZK is multiple nodes with built-in master recovery by design, whereas with master you need at least HA, and still it's probably worse than ZK in case of failure. There are also other things that I mentioned.

bq. With the meta/system wal I think we can be within an order of magnitude.

So, why would we write a bunch of new code to get within an order of magnitude? I don't see an advantage, or the ZK disadvantage that you mention, compared to the multiple advantages of ZK. Esp. if we cannot totally get rid of it, so we'll have an extra service regardless.
[jira] [Commented] (HBASE-5487) Generic framework for Master-coordinated tasks
[ https://issues.apache.org/jira/browse/HBASE-5487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13789612#comment-13789612 ]

Sergey Shelukhin commented on HBASE-5487:

Btw, I agree that the main point is to get rid of the complexity you mention (and in the doc I only mention the storage mechanism in ZK in one paragraph at the end), so the storage mechanism choice is almost orthogonal. But as far as it is concerned, using ZK seems an obvious choice to me ATM. I may not know something about ZK (or system tables?), but so far the pattern is that meta recovery is a big deal even without bugs, and with ZK we barely ever have any problems.
[jira] [Commented] (HBASE-5487) Generic framework for Master-coordinated tasks
[ https://issues.apache.org/jira/browse/HBASE-5487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13789642#comment-13789642 ]

Elliott Clark commented on HBASE-5487:

bq. Wrt coprocs - that is bad imho, that is not the kind of modular that we want.

I'm not tied to it being a coproc. But it does illustrate the idea that it can be done by watching mutations as they come into the normal hregion call stack.

bq. That is not due to ZK as such, that is due to multi-state-machine reconciliation model and truth in multiple places that it requires.

In part it's due to getting zk messages out of order, and getting them delayed. Those pains are due in no small part to zk's client being single threaded.

bq. System table can have exact same problem of state in the table + state in memory, question is how you split and manage state between them, storage substrate doesn't matter as much.

But you only have the one state if you have the master inside of the region server hosting meta. There's no need to have a map of assignments if meta is actually just a function call away. Also, the same is not true at all if you want to put state into zk. Then you need a local cache if you want to make this performant at all (that's how we got to the current state). Putting state into zk necessitates a split brain problem: there's what the master sees and what the outside world sees.

bq. So, why would we write a bunch of new code to get within an order of magnitude?

That code is already there, and in use. We fail over meta right now in 240ms. I was commenting on what you were saying, that zk fails over faster. That's true, but for meta we've narrowed that gap significantly. So I don't think that ZK has that much of an advantage.

bq. I don't see an advantage, or ZK disadvantage that you mention compared to multiple advantages of ZK

We've tried putting state into zk. That failed. I really don't want to put a whole bunch of new code into hbase that does almost exactly the same thing as we currently have. It's going to fail.

bq. so the storage mechanism choice is almost orthogonal.

For me it's not just about the storage. Co-locating storage with the master means that these split brain problems are much rarer.
[jira] [Commented] (HBASE-5487) Generic framework for Master-coordinated tasks
[ https://issues.apache.org/jira/browse/HBASE-5487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13789874#comment-13789874 ]

Sergey Shelukhin commented on HBASE-5487:

bq. We've tried putting state into zk. That failed. I really don't want to put a whole bunch of new code into hbase that does almost exactly the same thing as we currently have. It's going to fail.

As I said, I don't think that is true. Our problem is not state being in ZK; it is that the state is in multiple places in ZK itself for different parts of the same region's state, plus some state in master to reconcile these, plus some state that is not in ZK but only in master, plus also meta. I.e. the problem is not the split between ZK and master, but the split of logical state within both and between them.

bq. Then you need a local cache if you want to make this performant at all (That's how we got to the current state).

Our current state is not a local cache, it's a bunch of actual state... I am not yet sure how bad the ZK-master split brain problem will be if ZK has the entire truth; let me think about it.

When you say no split-brain inside master, do you mean master will host the meta and do all reads and writes to meta with no local intermediate state in memory?
[jira] [Commented] (HBASE-5487) Generic framework for Master-coordinated tasks
[ https://issues.apache.org/jira/browse/HBASE-5487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13789997#comment-13789997 ]

stack commented on HBASE-5487:

bq. Our problem is not state being in ZK; it is that the state is in multiple places in ZK itself for different parts of the same region's state, plus some state in master to reconcile these, plus some state that is not in ZK but only in master, plus also meta.

Master is the Actor. Having it go across a network to get/set the 'state' in a service that is non-transactional wasn't our smartest move.

Regionservers currently report state via ZK. Master reads it from ZK. It would be better if the RS just reported directly to the Master.
[jira] [Commented] (HBASE-5487) Generic framework for Master-coordinated tasks
[ https://issues.apache.org/jira/browse/HBASE-5487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13788303#comment-13788303 ]

Jimmy Xiang commented on HBASE-5487:

The doc is very good. I like the Problem and Design Principles sections. The region state machine can be enhanced. We should have a single source of truth which is scalable and performs well. I think it could be the master (in memory). All actions (including split/merge) should be started and managed by the master.
[jira] [Commented] (HBASE-5487) Generic framework for Master-coordinated tasks
[ https://issues.apache.org/jira/browse/HBASE-5487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13788317#comment-13788317 ]

Nicolas Liochon commented on HBASE-5487:

3 comments:
- I wonder if we should use something that would allow us to test all the possible states. At least, we should make this really testable, without needing to set up a zk, a set of rs and so on.
- We should question the master-based architecture. How does it work for the MapR implementation, for example? Why is the assignment manager not in the region server holding meta? This would save one distributed state, for example.
- I really really really ( :-) ) think that we need to put performance as a requirement for any implementation. For example, something like: on a cluster with 5 racks of 20 regionservers each, with 200 regions per RS, the assignment will be completed in 1s if we lose one rack. I saw a reference to async ZK in the doc; it's great, because the performance is 10 times better.

Thanks for writing the doc, Sergey.
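To make the size of that example requirement concrete, here is a small back-of-the-envelope sketch. The class and method names are hypothetical and only the cluster numbers come from the comment above; it simply works out how many regions a one-rack loss leaves to reassign and how they would spread over the surviving servers if distributed evenly (an assumption).

```java
// Back-of-the-envelope sizing for the example performance requirement:
// 5 racks of 20 regionservers, 200 regions per regionserver, one rack lost.
// All names here are illustrative, not HBase APIs.
final class RackLossSizing {

    /** Regions that become unassigned when one whole rack is lost. */
    static int regionsToReassign(int serversPerRack, int regionsPerServer) {
        return serversPerRack * regionsPerServer;
    }

    /** Regionservers still alive after losing one rack. */
    static int survivingServers(int racks, int serversPerRack) {
        return (racks - 1) * serversPerRack;
    }

    public static void main(String[] args) {
        int racks = 5;
        int serversPerRack = 20;
        int regionsPerServer = 200;

        int lost = regionsToReassign(serversPerRack, regionsPerServer);
        int survivors = survivingServers(racks, serversPerRack);

        // Assuming an even spread, each survivor takes lost / survivors regions.
        System.out.println(lost + " regions to reassign across " + survivors
                + " servers, i.e. " + (lost / survivors) + " extra regions each");
    }
}
```

Under these numbers the 1s target means the assignment path would have to sustain on the order of 4,000 region assignments per second.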
[jira] [Commented] (HBASE-5487) Generic framework for Master-coordinated tasks
[ https://issues.apache.org/jira/browse/HBASE-5487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13788324#comment-13788324 ]

stack commented on HBASE-5487:

+ Agree that master should become a lib that any regionserver can run.
+ Agree testable but would like to point out that it is possible to put up our current AM in a standalone mode -- it just takes mockery (smile)
+ Agree on perf. Helps MTTR.
[jira] [Commented] (HBASE-5487) Generic framework for Master-coordinated tasks
[ https://issues.apache.org/jira/browse/HBASE-5487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13788331#comment-13788331 ]

Jimmy Xiang commented on HBASE-5487:

Agree that master should hold the meta region. It should hold other system table regions as well.
[jira] [Commented] (HBASE-5487) Generic framework for Master-coordinated tasks
[ https://issues.apache.org/jira/browse/HBASE-5487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13788334#comment-13788334 ]

Jesse Yates commented on HBASE-5487:

-1 that it should hold other system tables. We know META isn't going to span more than a region, but it would be completely reasonable for other system tables to be larger (e.g. statistics). Maybe worth considering a single-region flag for certain tables to identify that they can never be split and can support single-region transactions.
[jira] [Commented] (HBASE-5487) Generic framework for Master-coordinated tasks
[ https://issues.apache.org/jira/browse/HBASE-5487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13788342#comment-13788342 ]

Jimmy Xiang commented on HBASE-5487:

That's right. Big system tables which are not required for the system to start/run can be assigned somewhere else. The master doesn't have to hold all system table regions.
[jira] [Commented] (HBASE-5487) Generic framework for Master-coordinated tasks
[ https://issues.apache.org/jira/browse/HBASE-5487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13788415#comment-13788415 ]

Devaraj Das commented on HBASE-5487:

bq. We should have a single source of truth which is scalable and performs well. I think it could be the master (in memory). All actions (including split/merge) should be started and managed by the master.

I agree with this. We have done this in other components: HDFS, MapReduce, Yarn, etc. Depending on ZK for the state management brings complexity. I think we should use ZK only for ephemeral stuff and not for storing state. In that regard, using ZK for discovering lost RSs is fine, but not for storing the region states. I like the idea of the Master WAL to take care of master crashes.
[jira] [Commented] (HBASE-5487) Generic framework for Master-coordinated tasks
[ https://issues.apache.org/jira/browse/HBASE-5487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13788518#comment-13788518 ]

Sergey Shelukhin commented on HBASE-5487:

We still need a reliable store (ZK, system table, or master WAL). It seems ZK is the most scalable and best suited for the task. In a perfect world we would have a ZK library that we could host, and a quorum of masters running Paxos/ZK. But we don't have that...
[jira] [Commented] (HBASE-5487) Generic framework for Master-coordinated tasks
[ https://issues.apache.org/jira/browse/HBASE-5487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13787446#comment-13787446 ]

stack commented on HBASE-5487:

bq. We have talked a little bit here, and agreed on 2 key points.

What about Enis's 'requirements'? Do we agree on his list? That makes 3 key points. I think we should add a fourth point where we rehearse what is wrong w/ the current system (Jon's suggestion), as it will help ensure we don't repeat the mistakes of the past.

Make a subtask 'New Assignment Manager'? Or make this a subtask of a new issue called 'New Assignment Manager'. An issue named so will be easier to find than this one. Also, others are interested in this effort (@honghua and [~xieliang007]) and it'll catch their attention.

I think it is too big a change to be done for the pending 0.98. Let's not rush it. It could even land post hbase 1.0 if 0.98 is to become 1.0.

On the design doc:

+ The doc needs an author and a date. I would expect a section situating the document (context) that at least referred to the current 'design'; see https://issues.apache.org/jira/browse/HBASE-2485 (it has a 'state' machine that looks like this one).
+ The problem section is too short (state kept in multiple places and all have to agree...); we need a fuller list so we can be sure the proposal addresses them all.
+ How is the proposal different from what we currently have? I see us tying regionstate to table state. That is new. But the rest, where we have a record and it is atomically changed, looks like our RegionState in Master memory? There is an increasing 'version', which should help ensure a 'direction' for change.
+ A single atomically mutable record of the regionstate is well and good, but how then do we get the cluster to align w/ what this record says? For example, the table record says it is disabled. It has 10k regions. How do we get the regions to agree w/ the table record which says it is disabled? We can send the closes, but how are we sure the close happened on all 10k regions?
+ I don't get this bit: "This record is the only source of truth about the region, and is never removed while the region is relevant. This simplifies current situation where ZK state, master state and META state can all conflict in various special ways..." It's fine having a source of truth, but ain't the hard part bringing the system along (meta edits, clients, etc.)? Experience has zk as messy to reason with. It is also an indirection having RS and M go to zk to do 'state'.

Thanks for writing this up, Sergey.
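The "atomically changed record with an increasing version" discussed above can be illustrated with a compare-and-set update. This is only a minimal in-memory sketch under stated assumptions: the class name, the string-valued state, and the `update` signature are hypothetical and are not taken from the design doc or from HBase. It shows how a writer holding a stale version is rejected, which is what gives changes a 'direction'.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical sketch: one record per region, replaced atomically,
// with a version that only ever increases.
final class VersionedRegionRecord {

    /** Immutable snapshot of the record at a given version. */
    static final class Snapshot {
        final String state;
        final long version;

        Snapshot(String state, long version) {
            this.state = state;
            this.version = version;
        }
    }

    private final AtomicReference<Snapshot> current =
            new AtomicReference<>(new Snapshot("OFFLINE", 0L));

    Snapshot read() {
        return current.get();
    }

    /**
     * Applies the change only if the caller saw the latest version.
     * Returns false for a stale caller, who must re-read and decide again.
     */
    boolean update(long expectedVersion, String newState) {
        Snapshot seen = current.get();
        if (seen.version != expectedVersion) {
            return false; // someone else changed the record first
        }
        return current.compareAndSet(seen, new Snapshot(newState, seen.version + 1));
    }
}
```

A sketch like this covers only the record itself; the question raised in the comment, how to bring 10k regions in line with what the record says, is a separate problem it does not address.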
[jira] [Commented] (HBASE-5487) Generic framework for Master-coordinated tasks
[ https://issues.apache.org/jira/browse/HBASE-5487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13787493#comment-13787493 ]

stack commented on HBASE-5487:

[~jxiang] What do you think of [~sershe]'s doc?
[jira] [Commented] (HBASE-5487) Generic framework for Master-coordinated tasks
[ https://issues.apache.org/jira/browse/HBASE-5487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13786870#comment-13786870 ]

Sergey Shelukhin commented on HBASE-5487:

I would like to resurrect this issue (and rename it). It seems that every fix to assignmentmanager introduces 1-3 more maps or lists around it, makes it even more impossible to comprehend, and may add more bugs.

We have talked a little bit here, and agreed on 2 key points.
1) Region state should be managed in one permanent place with one state machine; no separate and/or transient state machines, no operation-based state machines.
2) New assignment manager should be easily testable by simulating sequences in events.

I think my doc above is still a reasonably good approximation, but of course we might need to discuss and flesh out the details.

Ahem. I said "new assignment manager"; that is because I would like to rename this jira "rewrite assignment manager". Wdyt? [~enis] [~jxiang] [~saint@gmail.com] [~ndimiduk] [~jmhsieh]
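The two points agreed above (one state machine per region, testable by replaying event sequences) could look roughly like the following sketch. The states, events, and transitions are illustrative assumptions chosen for the example, not the state machine from the attached doc or from AssignmentManager. The point is that a pure transition function can be exercised with a plain list of events, with no ZK or regionservers.

```java
import java.util.List;

// Hypothetical sketch of a single region state machine with a pure
// transition function, so that tests can simply replay event sequences.
final class RegionStateMachineSketch {

    enum State { OFFLINE, OPENING, OPEN, CLOSING }

    enum Event { ASSIGN, OPENED, UNASSIGN, CLOSED, SERVER_DIED }

    /** Returns the next state, or throws if the event is not legal in this state. */
    static State next(State state, Event event) {
        switch (state) {
            case OFFLINE:
                if (event == Event.ASSIGN) return State.OPENING;
                break;
            case OPENING:
                if (event == Event.OPENED) return State.OPEN;
                if (event == Event.SERVER_DIED) return State.OFFLINE;
                break;
            case OPEN:
                if (event == Event.UNASSIGN) return State.CLOSING;
                if (event == Event.SERVER_DIED) return State.OFFLINE;
                break;
            case CLOSING:
                if (event == Event.CLOSED || event == Event.SERVER_DIED) return State.OFFLINE;
                break;
        }
        throw new IllegalStateException("No transition from " + state + " on " + event);
    }

    /** Replays a simulated sequence of events starting from the given state. */
    static State replay(State start, List<Event> events) {
        State state = start;
        for (Event event : events) {
            state = next(state, event);
        }
        return state;
    }
}
```

With this shape, a test for a scenario such as "server dies while the region is opening" is just a call to `replay` with the corresponding event list, and an illegal sequence fails loudly instead of leaving a region stuck in transition.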
[jira] [Commented] (HBASE-5487) Generic framework for Master-coordinated tasks
[ https://issues.apache.org/jira/browse/HBASE-5487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13786872#comment-13786872 ]

Sergey Shelukhin commented on HBASE-5487:

*of events
[jira] [Commented] (HBASE-5487) Generic framework for Master-coordinated tasks
[ https://issues.apache.org/jira/browse/HBASE-5487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13706491#comment-13706491 ]

stack commented on HBASE-5487:

Was just looking at a test failure issue. A move operation was failing because the region we were asking it to move had not fully opened yet, so the move just failed:

2013-07-11 09:35:00,489 DEBUG [RpcServer.handler=4,port=55346] master.AssignmentManager(2277): Attempting to unassign ephemeral,,1373535299969.ab63d8b7c5339b4a61fdd70e8cb8993a. but it is already in transition (OPEN, force=false)

The client needs a means of asking what happened to its move. The client should also be able to say "I want the move to succeed"; in the above case, the move op should then be retried.

Need a queue of outstanding ops. Need to be able to query it for "where is my op?" (like FedEx's "where is my package?").
[jira] [Commented] (HBASE-5487) Generic framework for Master-coordinated tasks
[ https://issues.apache.org/jira/browse/HBASE-5487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13706520#comment-13706520 ] Enis Soztutar commented on HBASE-5487: -- The requirements are pretty much clear. We need a persistent set of ops running in the cluster. Every op is defined as a stack of steps which are undo/redoable and idempotent. The client submits an op and gets a request_id, with which it can query the state of the operation later. The ops and steps are handled by state machines + stacked execution. The state is persisted via ZK, or a WAL for the master, or an HBase table (which is just a higher level log).
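As a rough illustration of "a stack of steps which are undo/redoable and idempotent", here is a minimal sketch. Everything in it is an assumption made for the example: `Step`, `OpRunner` and the state names are hypothetical, and a plain dict stands in for whatever would really persist the state (ZK, a master WAL, or a table).

```python
import uuid


class Step:
    """One idempotent step plus the action that undoes it (hypothetical)."""

    def __init__(self, name, do, undo):
        self.name, self.do, self.undo = name, do, undo


class OpRunner:
    def __init__(self):
        # request_id -> record; a dict stands in for the persistent store
        self.log = {}

    def submit(self, steps):
        request_id = str(uuid.uuid4())
        self.log[request_id] = {"steps": steps, "done": 0, "state": "RUNNING"}
        return request_id

    def state(self, request_id):
        return self.log[request_id]["state"]

    def run(self, request_id):
        rec = self.log[request_id]
        steps = rec["steps"]
        try:
            # Resume from the recorded position; steps are idempotent, so
            # re-running the step that was in flight at a crash is safe.
            while rec["done"] < len(steps):
                steps[rec["done"]].do()
                rec["done"] += 1
            rec["state"] = "DONE"
        except Exception:
            # Undo the steps that completed, in reverse (stack) order.
            while rec["done"] > 0:
                rec["done"] -= 1
                steps[rec["done"]].undo()
            rec["state"] = "ROLLED_BACK"
```

The client-facing part is just `submit`, which returns the request id, and `state`, which answers a later query with that id.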
[jira] [Commented] (HBASE-5487) Generic framework for Master-coordinated tasks
[ https://issues.apache.org/jira/browse/HBASE-5487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13706528#comment-13706528 ] stack commented on HBASE-5487: -- bq. The requirements are pretty much clear... Smile. Yeah. I wonder what is 'next'?
[jira] [Commented] (HBASE-5487) Generic framework for Master-coordinated tasks
[ https://issues.apache.org/jira/browse/HBASE-5487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13706532#comment-13706532 ] Enis Soztutar commented on HBASE-5487: -- Once the dust has settled for 0.96, I think this is a good candidate for 0.98 :)
[jira] [Commented] (HBASE-5487) Generic framework for Master-coordinated tasks
[ https://issues.apache.org/jira/browse/HBASE-5487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13706542#comment-13706542 ] Sergey Shelukhin commented on HBASE-5487: - Do these requirements cover the test failure just mentioned... or any others that come from collisions of multiple ops on the same region?
[jira] [Commented] (HBASE-5487) Generic framework for Master-coordinated tasks
[ https://issues.apache.org/jira/browse/HBASE-5487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13706616#comment-13706616 ] stack commented on HBASE-5487: --
[~enis] Agree
[~sershe] No. I think the move 'failing' is not too bad; it is something we can work on, perhaps by having two types of move... a move recommendation and then a required move. The offense in my scenario above is that the move failed silently.
[jira] [Commented] (HBASE-5487) Generic framework for Master-coordinated tasks
[ https://issues.apache.org/jira/browse/HBASE-5487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13617600#comment-13617600 ] Sergey Shelukhin commented on HBASE-5487: -

bq. Any major-overhaul solution should make sure that these operations, when issued concurrently, interact according to a sane set of semantics in the face of failures.
This is another (although not orthogonal) question. I am looking for a sane way to define and enforce arbitrary semantics first. Then sane semantics can be enforced on top of that :) For example, the actor-ish model below would make it easy to write simple code; persistent state would make sure there's a definite state at any time, and all crucial transitions are atomic, so semantics would be easy to enforce as long as the code can handle a failed transition/recovery. Locks also make this simple, although locks have other problems imho. That said, we can go both ways; if we define sane semantics, it would be easy to see how convenient they are to implement in a particular model.

bq. So I buy open/close as a region operation. split/merge are multi region operations – is there enough state to recover from a failure?
There should be. Can you elaborate?

bq. So alter table is a region operation? Why isn't it in the state machine?
Alter table is currently an operation that involves region operations, namely open/close. Open/close are in the state machine :) As for tables, I am not sure a state machine is the best model for table state; there isn't that much going on with a table that is properly an exclusive state.

bq. Implementing region locks is too far – I'm asking for some back of the napkin discussion.
If a server holds a lock for a region for time Tlock during each day, and the number of regions is N, the probability of some region lock (or table read-only lock) being held at any given time is (1-(1-(Tlock/Tday))^N), if I am writing this correctly. For 5 seconds of locking per day per region, for 10000 regions (not unreasonable for a large table/cluster), we will be holding some lock about 44% of the time for region operations. Calculating the probability of any lock being in recovery (server went down with a lock less than recovery time ago) can also be done, but numbers for some parameters (how often do servers go down?) will be very speculative...

bq. I think we need some measurements how much throughput we can get in ZK or with a ZK-lock implementation and compare this with # of RS watchers * # of regions * number of ops...
Will there be many watchers/ops? You only watch and do ops when you acquire the lock, so unless region operations are very frequent...

bq. The current regions-in-transition (RIT) code basically assumes that an absent znode is either closed or opened. RIT znodes are present when the region is in the inbetween states (opening, closing,
I don't think either closed or opened is good enough :) Also, RITs don't cover all scenarios and things like table ops don't use them at all.

bq. I know I've suggested something like this before. Currently the RS initiates a split, and does the region open/meta changes. If there are errors, at some point the master side detects a timeout. An alternative would have splits initiated on the RS but have the master do some kind of atomic changes to meta and region state for the 3 involved regions (parent, daughter a and daughter b).
Yeah, although in other models (locks, persistent state) that is not required. Also, if meta is a cache for clients and not the source of truth, meta changes can still be on the server; I assume by meta you mean global state, wherever that is?

bq. We need to be careful about ZK – since it is a network connection also, exceptions could be failures or timeouts (which succeed but weren't able to ack). If we can describe the properties (durable vs erasable) and assumptions (if the wipeable ZK is source of truth, how do we make sure the version state is recoverable without time travel?)
The former applies to any distributed state; as for the latter - I was thinking of ZK+WAL if we intend to keep ZK wipeable.
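The back-of-the-napkin lock estimate in the comment above can be checked numerically. The sketch below evaluates the same expression, 1-(1-(Tlock/Tday))^N, with Tlock = 5 seconds. N = 10000 is an assumption here: it is the region count for which the formula gives roughly the quoted 44%. The formula also treats region locks as independent of each other, which is a simplification.

```python
def prob_some_lock_held(lock_seconds_per_day, num_regions, seconds_per_day=86_400):
    """Probability that at least one region lock is held at a random instant.

    Assumes every region is locked for the same fraction of the day and that
    the locks are independent of each other.
    """
    p_region = lock_seconds_per_day / seconds_per_day
    return 1.0 - (1.0 - p_region) ** num_regions


p = prob_some_lock_held(lock_seconds_per_day=5, num_regions=10_000)
print(f"some lock held about {p:.0%} of the time")
```

Because the exponent is N, the result is sensitive to cluster size: the same 5 seconds per day gives a much smaller probability for a few hundred regions.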
[jira] [Commented] (HBASE-5487) Generic framework for Master-coordinated tasks
[ https://issues.apache.org/jira/browse/HBASE-5487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13616129#comment-13616129 ] Jonathan Hsieh commented on HBASE-5487: ---

To do a major overhaul, we need something stronger than "the code is hard to read". I agree that it is hard to follow (see: http://people.apache.org/~jmhsieh/hbase/120905-hbase-assignment.pdf) but it seems to be basically working, which is a pretty strong argument. Let's compare and point out what is wrong/broken in the current implementation and how the new design won't have those problems.

The spreadsheet link is my first step to enumerating semantics and distilling the set of possible problems and things that are being guarded from races. Any major-overhaul solution should make sure that these operations, when issued concurrently, interact according to a sane set of semantics in the face of failures.

bq. Only for the current document version... tables could be added
So I buy open/close as a region operation. split/merge are multi region operations -- is there enough state to recover from a failure? So alter table is a region operation? Why isn't it in the state machine?

bq. Hmm... that would require implementing region locks, and having a very large cluster. I am talking more about unacceptable blocking of user operations, and management of expiring locks in the presence of real-life failures.
Implementing region locks is too far -- I'm asking for some back of the napkin discussion. I think we need some measurements of how much throughput we can get in ZK or with a ZK-lock implementation and compare this with # of RS watchers * # of regions * number of ops.. The current regions-in-transition (RIT) code basically assumes that an absent znode is either closed or opened. RIT znodes are present when the region is in the inbetween states (opening, closing,

bq. You mean like WAL for operations?
Yeah, we could call it an intent log. It would have info so that a promoted backup master can look in one place and complete an operation started by the downed original master.

bq. ... Also usually that would mean RSes won't be able to initiate operations (like split) - they will have to go thru master (which I would argue is ok).
I know I've suggested something like this before. Currently the RS initiates a split, and does the region open/meta changes. If there are errors, at some point the master side detects a timeout. An alternative would have splits initiated on the RS but have the master do some kind of atomic changes to meta and region state for the 3 involved regions (parent, daughter a and daughter b).

bq. Depends on where we store it, but yeah these have to be transactional. Last section (very short :)) suggests using ZK, which already supports that.
We need to be careful about ZK -- since it is a network connection also, exceptions could be failures or timeouts (which succeed but weren't able to ack). If we can describe the properties (durable vs erasable) and assumptions (if the wipeable ZK is source of truth, how do we make sure the version state is recoverable without time travel?)
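To make the intent-log suggestion above concrete, here is a small sketch of a move (close, then open) that a promoted backup master could finish after the original master goes down between the two steps. It is an illustration under assumptions: `IntentLog`, `run_move`, the step names and the `crash_after` test hook are all hypothetical, and a Python list stands in for the durable store.

```python
MOVE_STEPS = ["close_on_source", "open_on_target"]


class IntentLog:
    """Hypothetical durable log; a list stands in for ZK, a WAL, or a table."""

    def __init__(self):
        self.entries = []  # (op_id, event) tuples, in append order

    def append(self, op_id, event):
        self.entries.append((op_id, event))

    def events(self, op_id):
        return [event for logged_id, event in self.entries if logged_id == op_id]

    def unfinished_ops(self):
        """Ops whose final step was never logged: what a new master must finish."""
        op_ids = {logged_id for logged_id, _ in self.entries}
        return sorted(i for i in op_ids if MOVE_STEPS[-1] not in self.events(i))


def run_move(log, op_id, apply_step, crash_after=None):
    """Run or resume a move, logging the intent first and then each step."""
    if "intent" not in log.events(op_id):
        log.append(op_id, "intent")
    for step in MOVE_STEPS:
        if step in log.events(op_id):
            continue  # already completed before the failover
        apply_step(step)
        log.append(op_id, step)
        if step == crash_after:
            raise RuntimeError("simulated master failure")
```

Since a step is logged only after it has been applied, a crash between the two would replay that step on recovery, so this scheme only works if the steps are idempotent.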
[jira] [Commented] (HBASE-5487) Generic framework for Master-coordinated tasks
[ https://issues.apache.org/jira/browse/HBASE-5487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13616130#comment-13616130 ] Jonathan Hsieh commented on HBASE-5487: --- I'll deal with the spreadsheet-related comments by putting it somewhere that comments can be easily dropped into.
[jira] [Commented] (HBASE-5487) Generic framework for Master-coordinated tasks
[ https://issues.apache.org/jira/browse/HBASE-5487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13616885#comment-13616885 ] Enis Soztutar commented on HBASE-5487: --
bq. Yeah, we could call it an intent log. It would have info so that a promoted backup master can look in one place and complete an operation started by the downed original master.
This was what I proposed in an earlier comment: https://issues.apache.org/jira/browse/HBASE-5487?focusedCommentId=13551519page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13551519
What we need is a transactional, authoritative, fault-tolerant, and durable source of ground truth about the cluster state and execution state. Whether we can do it using ZK, or a master WAL, or a system table (using an implicit WAL), we will have to figure out.
[jira] [Commented] (HBASE-5487) Generic framework for Master-coordinated tasks
[ https://issues.apache.org/jira/browse/HBASE-5487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13616971#comment-13616971 ] Sergey Shelukhin commented on HBASE-5487: -
[~jmhsieh] Will reply in detail later/tomorrow. One clarification: the justification was not just "code is hard to understand", but "code is hard to reason about AND we want to expand it to support a bunch of features". I think the whole idea of a general framework, whatever it is, is to have some unified model of things and a way of doing these manipulations and expanding their set that is not patchwork.
[~enis] I think finding one is the least of our problems :) How to use it to store/manage state is the question.
[jira] [Commented] (HBASE-5487) Generic framework for Master-coordinated tasks
[ https://issues.apache.org/jira/browse/HBASE-5487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13615901#comment-13615901 ] Sergey Shelukhin commented on HBASE-5487: -

bq. Is the assignment manager the only master coordinated task in scope?
Only for the current document version... tables could be added.

bq. Instead of asserting it is not clear if table (+region) locks scale, let's find out.
Hmm... that would require implementing region locks, and having a very large cluster. I am talking more about unacceptable blocking of user operations, and management of expiring locks in the presence of real-life failures.

bq. Master operations and processes can clash and we should understand where we need concurrency control. (I'm working on a table – here's a draft distilled version [1], there exists an overly detailed version that I'll share once i get it fixed)
Comments below.

bq. Should there be a notion of queuing operations? (locking, or an actual queue) Should these operations be generically logged so they can complete if a master goes down in the middle? (ex: master goes down during a move operation after the close but before the open on the new rs).
You mean like WAL for operations?

bq. The design principles is actually more of a proposed design.
Yeah, sorry, wanted to split it into two sections but never did. Will rename.

bq. how do we deal with operations where we need locks on multiple regions because we are reading or modifying multiple regions – e.g. splits, merges, snapshots? Matteo Bertozzi had suggested in another jira making a meta row per table, or maybe part of the solution is using the multi-row single meta region transaction.
Depends on where we store it, but yeah these have to be transactional. Last section (very short :)) suggests using ZK, which already supports that.

bq. What are alternatives? why this approach vs others?
I can expand the doc... the implicitly mentioned existing alternatives are locks, which I would argue scale less and are harder to manage; or the transaction approach that is currently used (although not unified), for example via transient transaction nodes. Actually, one alternative approach I saw used for such things is to simplify concurrency of operations/etc. with an actor-like model, where the master has the logical cluster state and a previously saved target state, and periodically (often) takes an epic lock, looks at them quickly, and based on what it is doing, outputs a new target cluster state and a list of physical things to do. Then it releases the epic lock, the new target state is saved, and the operations are performed. That way all state-management code becomes simple, because it runs in one place with no concurrency, and recovery just has to compare the real cluster state with the destination state. But this will require thinking about this differently. Also, usually that would mean RSes won't be able to initiate operations (like split) - they will have to go thru the master (which I would argue is ok). Also it's not clear whether this will become too much of a bottleneck.

bq. Where do you think the new information will be, META table?
It seems to me that ZK would be better (see last section), but META is also an option.

From the spreadsheet:

bq. Enabling and disabling table operations should be blocked when any of these simple region operations are in progress
Not clear why (logically).

bq. move
Move is close and open, doesn't require consistency, right?

bq. Regionserver Processes ... However, the individual operations must maintain the table integrity property.
Not clear what this means for snapshots.
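The actor-like model described in the comment above (take an epic lock, compare current and target cluster state, emit a new target plus a list of physical things to do) can be sketched as a plain state diff. This is only an illustration: `plan_actions`, `master_tick` and the `{region: server}` dict representation are assumptions made for the example, not anything taken from the attached design document.

```python
import threading

_epic_lock = threading.Lock()


def plan_actions(current, target):
    """Diff two {region: server-or-None} maps into physical actions."""
    actions = []
    for region in sorted(set(current) | set(target)):
        have, want = current.get(region), target.get(region)
        if have == want:
            continue
        if have is not None:
            actions.append(("close", region, have))
        if want is not None:
            actions.append(("open", region, want))
    return actions


def master_tick(current, requested_target):
    """One planning pass: decide under the lock, act after releasing it."""
    with _epic_lock:
        new_target = dict(requested_target)
        actions = plan_actions(current, new_target)
    # The caller saves new_target and then performs the actions.
    return new_target, actions
```

Recovery in this model is the same call with the real cluster state as `current` and the last saved target as `target`. A move falls out as a close followed by an open, matching the "move is close and open" remark.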
[jira] [Commented] (HBASE-5487) Generic framework for Master-coordinated tasks
[ https://issues.apache.org/jira/browse/HBASE-5487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13614410#comment-13614410 ] Sergey Shelukhin commented on HBASE-5487: - Any opinion? Thanks. I think the same model should be used for all tables too, but it's less necessary at this time...
[jira] [Commented] (HBASE-5487) Generic framework for Master-coordinated tasks
[ https://issues.apache.org/jira/browse/HBASE-5487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13614484#comment-13614484 ] Jonathan Hsieh commented on HBASE-5487: ---

Thanks for writing this up. I read the first two sections and haven't spent time reading the details of the others yet. (already have a bunch of questions).

So the only problem is that the assignment manager and ssh are difficult to reason about? Is the assignment manager the only master coordinated task in scope? I think there are more problems than that that we should enumerate:
* Instead of asserting it is not clear if table (+region) locks scale, let's find out.
* Master operations and processes can clash and we should understand where we need concurrency control. (I'm working on a table -- here's a draft distilled version [1], there exists an overly detailed version that I'll share once i get it fixed)
* Should there be a notion of queuing operations? (locking, or an actual queue) Should these operations be generically logged so they can complete if a master goes down in the middle? (ex: master goes down during a move operation after the close but before the open on the new rs).

The design principles is actually more of a proposed design.

Design principles::region record
* how do we deal with operations where we need locks on multiple regions because we are reading or modifying multiple regions -- e.g. splits, merges, snapshots? [~mbertozzi] had suggested in another jira making a meta row per table, or maybe part of the solution is using the multi-row single meta region transaction.

What are alternatives? why this approach vs others?

[1] https://docs.google.com/spreadsheet/ccc?key=0AiVCAt6zRttFdDRVaG56RnpQZlNNZUNsRVJsSXM3YlEusp=sharing
[jira] [Commented] (HBASE-5487) Generic framework for Master-coordinated tasks
[ https://issues.apache.org/jira/browse/HBASE-5487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13614612#comment-13614612 ] Jimmy Xiang commented on HBASE-5487: Where do you think the new information will be, META table?
[jira] [Commented] (HBASE-5487) Generic framework for Master-coordinated tasks
[ https://issues.apache.org/jira/browse/HBASE-5487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13606744#comment-13606744 ]

Jonathan Hsieh commented on HBASE-5487:

+1

Please do a write-up -- the standard stuff -- what problems exist (op1 and op2 clashes, etc.), the proposed fix, and how it would fix them.
[jira] [Commented] (HBASE-5487) Generic framework for Master-coordinated tasks
[ https://issues.apache.org/jira/browse/HBASE-5487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13601950#comment-13601950 ]

Sergey Shelukhin commented on HBASE-5487:

Just checking: is this issue still relevant? I am particularly interested in an all-encompassing solution for region management that would allow you to read AM (AssignmentManager) and not go insane. (I was reading and debugging the 0.94 AM for some time, which is why I remembered.) I have a sketch of an idea; should I write it up?
[jira] [Commented] (HBASE-5487) Generic framework for Master-coordinated tasks
[ https://issues.apache.org/jira/browse/HBASE-5487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13601991#comment-13601991 ]

Enis Soztutar commented on HBASE-5487:

Yes, please write it up, although AM-related things were not initially the main focus.
[jira] [Commented] (HBASE-5487) Generic framework for Master-coordinated tasks
[ https://issues.apache.org/jira/browse/HBASE-5487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13559338#comment-13559338 ]

Sergey Shelukhin commented on HBASE-5487:

I have some different priorities currently, but if I have time I will try to do a short write-up on the ZK + backup-table-based approach. Maybe towards the end of the week...
[jira] [Commented] (HBASE-5487) Generic framework for Master-coordinated tasks
[ https://issues.apache.org/jira/browse/HBASE-5487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13556116#comment-13556116 ]

Jonathan Hsieh commented on HBASE-5487:

bq. My main point is that currently two external region states in ZK and a table, plus two complex internal states in server and master, are a root of some non-trivial part of all evil. Especially the nature of ZK state, that comes and goes. Imho we should remove one of them from being actively managed.
bq. ZK has notifications and seems better suited for locking/atomic updates; w.r.t. availability it has no disadvantage since everything (e.g. locating the root) fails without ZK anyway, even if we do remove state machines from there.
bq. System tables are more native to HBase and have built-in WAL, plus have advantages for recovery.

ZK notifications are useful when there is two-way communication -- log splitting, region-server-initiated splits. I do agree that opens and closes seem more complicated than necessary.

bq. Maybe instead of WAL we can use ZK as universal source of region state (w/o assorted transient nodes e.g. one node per region that is always there, or maybe two if we want to use lock with lease to unassign) and mirror it to system table that is only used for recovery like you describe, or when ZK state disappears?
bq. Otherwise I think we should just use system table as universal source of region state and get rid of ZK region state.
bq. With one source of truth master and server logic can probably be dumber.

Simpler, not dumber. :) The 0.20/0.89 version of the master actually had most things in meta, and there are definitely some trade-offs with that approach. In the transition to the 0.90-style master we traded some pain points for new ones. If we change this again, we need to keep the previous pain points in mind so that we do not reintroduce the worst of them.

Is there any chance we could get a high-level design deck/doc that illustrates how these processes work currently and what they would look like after we move to this proposed FATE-like mechanism? Also, what operations would eventually get ported to this mechanism? I think discussion and an example at the design/RPC comms level would help a whole lot by grounding this conversion in reality, without requiring a dive into the code. Once we basically agree on the design, code reviews would be easier because they would focus on whether the implementation matches the design.

> Generic framework for Master-coordinated tasks
> ----------------------------------------------
>
>                 Key: HBASE-5487
>                 URL: https://issues.apache.org/jira/browse/HBASE-5487
>             Project: HBase
>          Issue Type: New Feature
>          Components: master, regionserver, Zookeeper
>   Affects Versions: 0.94.0
>            Reporter: Mubarak Seyed
>            Assignee: Nick Dimiduk
>
> Need a framework to execute master-coordinated tasks in a fault-tolerant manner.
> Master-coordinated tasks such as online-scheme change and delete-range (deleting region(s) based on start/end key) can make use of this framework.
> The advantages of framework are
> 1. Eliminate repeated code in Master, ZooKeeper tracker and Region-server for master-coordinated tasks
> 2. Ability to abstract the common functions across Master -> ZK and RS -> ZK
> 3. Easy to plugin new master-coordinated tasks without adding code to core components
[jira] [Commented] (HBASE-5487) Generic framework for Master-coordinated tasks
[ https://issues.apache.org/jira/browse/HBASE-5487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13556706#comment-13556706 ]

Nick Dimiduk commented on HBASE-5487:

I've pushed my FATE WIP to [github|https://github.com/ndimiduk/hbase/tree/5487-protobuf-repos] so you guys can see what a FATE repo might look like for us. I appear to have introduced a bug while porting over the logic, but the general idea is there.
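For readers following the FATE discussion above, here is a minimal sketch of the general idea of a fault-tolerant, resumable operation: progress is persisted after each step, so a restarted master can pick up where the previous one stopped. This is not code from the linked branch or from HBase. The names FateSketch, Step, InMemoryRepo and runOrResume, and the three step names, are hypothetical, and the in-memory map merely stands in for whatever durable store (ZK or a system table) the thread is debating.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class FateSketch {
    /** One step of a master-coordinated operation (hypothetical). Must be idempotent. */
    interface Step {
        void run() throws Exception;
    }

    /** Stand-in for durable storage of per-operation progress (assumption: in-memory only). */
    static class InMemoryRepo {
        private final Map<Long, Integer> progress = new HashMap<>();

        /** Index of the first step that has not been recorded as finished. */
        int nextStep(long txId) {
            return progress.getOrDefault(txId, 0);
        }

        /** Record that the step at stepIndex completed. */
        void markDone(long txId, int stepIndex) {
            progress.put(txId, stepIndex + 1);
        }
    }

    /** Runs the steps in order, starting from the last persisted position. */
    static void runOrResume(InMemoryRepo repo, long txId, List<Step> steps) throws Exception {
        for (int i = repo.nextStep(txId); i < steps.size(); i++) {
            steps.get(i).run();
            repo.markDone(txId, i); // persist progress only after the step succeeded
        }
    }

    /** Simulates a crash in the second step, then a resume by a new master. */
    static List<String> demo() {
        List<String> log = new ArrayList<>();
        InMemoryRepo repo = new InMemoryRepo();
        boolean[] crashOnce = {true};

        List<Step> steps = new ArrayList<>();
        steps.add(() -> log.add("close-region"));
        steps.add(() -> {
            if (crashOnce[0]) {
                crashOnce[0] = false;
                throw new IllegalStateException("simulated master crash");
            }
            log.add("update-schema");
        });
        steps.add(() -> log.add("reopen-region"));

        try {
            runOrResume(repo, 1L, steps); // first attempt dies in the second step
        } catch (Exception e) {
            // a real master would fail over here; the repo still holds the progress
        }
        try {
            runOrResume(repo, 1L, steps); // resume: the first step is not repeated
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
        return log;
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints [close-region, update-schema, reopen-region]
    }
}
```

Because progress is recorded only after a step returns, a step that was interrupted midway is run again on resume, which is why each step needs to be idempotent in a design like this.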