[jira] [Commented] (HBASE-5487) Generic framework for Master-coordinated tasks


[ 
https://issues.apache.org/jira/browse/HBASE-5487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13849396#comment-13849396
 ] 

Sergey Shelukhin commented on HBASE-5487:
-

That's an interesting one. Given that snapshots by default have no guarantees 
wrt consistent writes between regions (or do they?), it seems the snapshot should 
get the latest schema in case of a concurrent alter. Is there any consideration 
(other than the arguably implementation-level issue of not recovering from 
close-open) that would prevent that? For consistent snapshots, presumably the 
schema can be snapshotted first; I am assuming they don't stop the world and 
just take seqId/mvcc/ts or something, so the newer values with the new schema 
will just not exist.

 Generic framework for Master-coordinated tasks
 --

 Key: HBASE-5487
 URL: https://issues.apache.org/jira/browse/HBASE-5487
 Project: HBase
  Issue Type: New Feature
  Components: master, regionserver, Zookeeper
Affects Versions: 0.94.0
Reporter: Mubarak Seyed
Assignee: Sergey Shelukhin
Priority: Critical
 Attachments: Entity management in Master - part 1.pdf, Entity 
 management in Master - part 1.pdf, Is the FATE of Assignment Manager 
 FATE.pdf, Region management in Master.pdf, Region management in Master5.docx, 
 hbckMasterV2-long.pdf, hbckMasterV2b-long.pdf


 Need a framework to execute master-coordinated tasks in a fault-tolerant 
 manner. 
 Master-coordinated tasks such as online schema change and delete-range 
 (deleting region(s) based on start/end key) can make use of this framework.
 The advantages of the framework are:
 1. Eliminate repeated code in Master, ZooKeeper tracker and Region-server for 
 master-coordinated tasks
 2. Ability to abstract the common functions across Master - ZK and RS - ZK
 3. Easy to plug in new master-coordinated tasks without adding code to core 
 components



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)

[jira] [Commented] (HBASE-5487) Generic framework for Master-coordinated tasks

2013-12-16 Thread Jonathan Hsieh (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-5487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13849403#comment-13849403
 ] 

Jonathan Hsieh commented on HBASE-5487:
---

The problem isn't that you would get snapshots with inconsistent schemas if the 
two operations were issued concurrently. It is that open is async and outside 
the table write lock, which means the snapshot would fail because the region 
may not have been open. 

This is a particular case where we would want the open routines to act 
synchronously with table alters and split daughter region opens (both open 
before the table lock is released and the snapshot can happen).




[jira] [Commented] (HBASE-5487) Generic framework for Master-coordinated tasks

[
https://issues.apache.org/jira/browse/HBASE-5487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13849452#comment-13849452
]

Sergey Shelukhin commented on HBASE-5487:
-

IMHO this, in case of opens, promotes not being fault tolerant. In large
clusters you cannot get around servers failing and regions closing and
reopening. Snapshot should just be able to ride over that. Splits are more
interesting.
Esp. if snapshots are used more (MR over snapshots), it may be nonviable to
prevent splits and other operations for the duration of every snapshot, alter,
...


[jira] [Commented] (HBASE-5487) Generic framework for Master-coordinated tasks

2013-12-16 Thread Andrew Purtell (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-5487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13849682#comment-13849682
 ] 

Andrew Purtell commented on HBASE-5487:
---

MR over snapshots is already a terrible idea from a security perspective. 


[jira] [Commented] (HBASE-5487) Generic framework for Master-coordinated tasks


[ 
https://issues.apache.org/jira/browse/HBASE-5487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13849732#comment-13849732
 ] 

Sergey Shelukhin commented on HBASE-5487:
-

Yet, it's a very good idea from a perf perspective, and logical given that many 
large MR jobs don't need realtime data. Snapshots can still be secured, and 
table-level granularity is sufficient for most cases, I'd suspect.
Regardless, it was just an example here.
MR over snapshots can be discussed in HBASE-8369 :)


[jira] [Commented] (HBASE-5487) Generic framework for Master-coordinated tasks

2013-12-16 Thread Andrew Purtell (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-5487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13849738#comment-13849738
 ] 

Andrew Purtell commented on HBASE-5487:
---

bq. Snapshots can still be secured

This is debatable, and that is my point for bringing it up here. All of the 
enterprise customers I interact with universally want more than table-level 
granularity, which is why we spent so much time on cell granularity features 
recently - all of which are totally defeated by MR over snapshots. 

Bringing up MR snapshots as technical justification for other arguments needs 
qualification that MR over snapshots itself may have limited applicability.


[jira] [Commented] (HBASE-5487) Generic framework for Master-coordinated tasks

[
https://issues.apache.org/jira/browse/HBASE-5487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13849798#comment-13849798
]

Sergey Shelukhin commented on HBASE-5487:
-

There are other justifications... my point is that having a lock over
distributed, lengthy operations on tables, esp. with region-level component
blocking table-level ops also, is the king of all epic locks, and can cause
lots of problems, esp. in large clusters. Snapshot is just one example.


[jira] [Commented] (HBASE-5487) Generic framework for Master-coordinated tasks

2013-12-12 Thread Jonathan Hsieh (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-5487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13846861#comment-13846861
 ] 

Jonathan Hsieh commented on HBASE-5487:
---

Matteo and Aleks bring up an interesting case that any new master design should 
handle: HBASE-10136.


[jira] [Commented] (HBASE-5487) Generic framework for Master-coordinated tasks

2013-12-06 Thread Sergey Shelukhin (JIRA)

[
https://issues.apache.org/jira/browse/HBASE-5487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13841811#comment-13841811
]

Sergey Shelukhin commented on HBASE-5487:
-

Ah well, I never got to part 2. Did you guys make progress on this? I may have
time to resurrect this again soon.


[jira] [Commented] (HBASE-5487) Generic framework for Master-coordinated tasks

2013-10-19 Thread Jonathan Hsieh (JIRA)

[
https://issues.apache.org/jira/browse/HBASE-5487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13799905#comment-13799905
]

Jonathan Hsieh commented on HBASE-5487:
---

[~saint@gmail.com] acked.

Let's post design docs here, and move discussions comparing them to the mailing
list.

[~sershe] Let's name threads prefixed with [hbase-5487] in the subject, and maybe
rename subject lines if we get into a more focused discussion that warrants its
own thread (if one part gets long), and in general reply inline. (I found this
interesting: http://en.wikipedia.org/wiki/Posting_style).

I'll start by copying and pasting unresolved parts of the response-reply above
to the dev mailing list.


[jira] [Commented] (HBASE-5487) Generic framework for Master-coordinated tasks

2013-10-18 Thread Devaraj Das (JIRA)

[
https://issues.apache.org/jira/browse/HBASE-5487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13799465#comment-13799465
]

Devaraj Das commented on HBASE-5487:

Quick comments:
1. "Master knows of all external updates to the system store" - Are there such
updates happening without the master's knowledge?
2. I presume once the client is told an operation is accepted, it would be
saved/queued somewhere so even if a different node picks up the master's
duties, it can execute the operation. Related to that is that the master should
be able to get back with the correct return code for the operation even in the
case of fail-overs. Also, the master could have triggered some operations
like shutdown handling that should be completed.
3. I think we should support asynchronous operations (submit an operation and
check periodically or something). There is no guarantee when a certain
operation will complete especially when the operation requires co-ordination
with other nodes and/or the node is falling behind in executing operations. We
shouldn't force the model to be synchronous (we do not want to hold up precious
node resources which we will in synchronous mode).
4. Maybe, we should explicitly state handling cases where the master sends a
region operation to a regionserver and the regionserver doesn't get back within
some timeout, as one of the requirements. Fencing the regionserver etc are the
possible actions when this happens.
5. Should we fail-fast on the client side in case of conflicts? For example, a
client issued drop table and this operation is in progress; another client
comes in and says create table with the same name. We should allow clients to
read the store without going through the master.
6. Wondering whether we need to differentiate priorities/ordering/etc. for
operations like move region initiated by the master/balancer versus initiated
by the user. Who wins, etc. These operations are advanced and won't be
commonplace, but worth calling out?
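
Points 2 and 3 above (accepted operations are saved so another master can pick them up, and clients submit and then poll) can be illustrated with a minimal sketch. Everything here is hypothetical: `OperationStore`, `Master`, and `OpState` are illustration-only names, and the in-memory dict stands in for whatever durable store a real design would use.

```python
import itertools
from enum import Enum


class OpState(Enum):
    QUEUED = "queued"
    RUNNING = "running"
    DONE = "done"


class OperationStore:
    """Hypothetical stand-in for a durable store visible to any master."""

    def __init__(self):
        self._ids = itertools.count(1)
        self.ops = {}  # op_id -> {"name", "state", "result"}

    def submit(self, name):
        # The client gets an id back immediately; nothing has executed yet.
        op_id = next(self._ids)
        self.ops[op_id] = {"name": name, "state": OpState.QUEUED, "result": None}
        return op_id

    def status(self, op_id):
        # The client polls this instead of blocking on the operation.
        op = self.ops[op_id]
        return op["state"], op["result"]


class Master:
    """Any master instance can resume work recorded in the store."""

    def __init__(self, store):
        self.store = store

    def run_pending(self):
        for op in self.store.ops.values():
            if op["state"] in (OpState.QUEUED, OpState.RUNNING):
                op["state"] = OpState.RUNNING
                op["result"] = "ok: " + op["name"]  # placeholder for real work
                op["state"] = OpState.DONE
```

In this sketch, a client calls `submit`, the first master may die before running anything, and a second `Master` built over the same store completes the operation; the client's later `status` call still returns the correct result.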


[jira] [Commented] (HBASE-5487) Generic framework for Master-coordinated tasks

2013-10-18 Thread Jonathan Hsieh (JIRA)

[
https://issues.apache.org/jira/browse/HBASE-5487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13799576#comment-13799576
]

Jonathan Hsieh commented on HBASE-5487:
---

Yesterday, I shared with Sergey and some of the folks interested in this a draft
of the design I've been working on (I'll call it the hbck-master) and a list of
questions related to Sergey's design. Since Sergey's has got master5 in the
name of the doc, I'll refer to it as master5. He's answered some questions in
email, but we should do technical discussions out here. We'll be working
together to hash out holes in each other's designs and potentially merge
designs.

I have a lot of questions. I'll hit the big questions first. Also, would it be
possible to put a version of this up as a gdoc so we can point out nits and
places that need minor clarification? (I have a marked-up physical copy
of the doc; a gdoc would make it easier to provide feedback.)

Main Concerns:

What is a failure and how do you react to failures? I think the master5 design
needs to spend more effort on considering failure and recovery cases. I claim
there are 4 types of responses from a networked IO operation: the two we
normally deal with, ack successful and ack failed (nack), plus two unknowns,
a timeout where the operation succeeded (timeout success) and a timeout where
it failed (timeout failed). We have historically missed the last two timeout
cases or assumed that a timeout means failure (nack). It seems that master5
makes the same assumptions.
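
A minimal sketch of these four response types, assuming hypothetical names (`Outcome`, `caller_view`) that appear in neither design doc. The point it illustrates is that the two timeout cases look identical to the caller, so neither may be treated as a nack.

```python
from enum import Enum


class Outcome(Enum):
    ACK = "ack successful"
    NACK = "ack failed"
    TIMEOUT_SUCCESS = "timed out, but the operation took effect"
    TIMEOUT_FAILED = "timed out, and the operation did not take effect"


def caller_view(outcome):
    """What the caller can conclude from the reply alone."""
    if outcome is Outcome.ACK:
        return "succeeded"
    if outcome is Outcome.NACK:
        return "failed"
    # Both timeout cases are indistinguishable from the caller's side.
    return "unknown"
```

Because `caller_view` returns the same value for both timeouts, recovery logic has to either verify the remote state or make the operation safe to repeat.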

I'm very concerned about what we need to do to invalidate cached RS
information at clients in the case of a hang; stale caches will violate the
isolation guarantees that we claim to provide. I really want an in-depth slice
of failure-handling case analysis, including the master with cached RS
assignments, for move and for something more complicated such as split or alter.

I really want more invariants specified for the FSM states, e.g. if a region is
in state X, does it have a row in meta? Does it have data on the FS? Is it open
on another regionserver? Is it open on only one regionserver? I think the 8
pages of tables at the back of the master5 doc could be made more concise and
precise, which would help us attempt to prove correctness.

Clarification questions:

1) State update coordination. What is a state update from the outside? Do
RS's initiate splitting on their own? Maybe a picture would help so we can
figure out if it is similar to or different from hbck-master's?

2) Single point of truth. What is this truth? The user-specified
actions? What the RS's are reporting? The last state we were confirmed to be
at? hbck-master tries to define what single point of truth means by defining
intended, current, and actual state data with durability properties on each
kind. What do clients look at? Who modifies what?

3) Table record: if a region is out of date, it should be closed and reopened.
It is not clear in master5 how regionservers find out that they are out of
date. Moreover, how do clients talking to those RS's with stale versions know
they are going to the correct RS, especially in the face of RS failures due to
timeout?

4) region record: transition states. Shouldn't these be defined as part of the
region record? (This is really similar to hbck-master's current state and
intended state.)

5) Note on user operations: the forgetting thing is scary to me -- in your move
split example, what happens if an RS reads state that is forgotten?

6) table state machine. How do we guarantee clients are not writing against
out-of-date region versions? (In hang situations, regions could be open in
multiple places: the hung RS and the new RS the region was assigned to and
successfully opened on.)

7) region state machine. Earlier drafts had splitting and merge cases. Are
they elided in master5 or not present any more? How would this get extended to
handle Jeffrey's distributed log replay/fast write recovery feature?

8) logical interactions: sounds like master5 allows concurrent operations on
specific regions and on a specific table (e.g. it will allow moves and splits
and merges on the same region). hbck-master (though not fully documented) only
allows certain region transitions when the table is enabled or if the table is
disabled. Are we sure we don't get into race conditions? What happens if
disable gets issued -- is it possible for someone to reopen the region and for
old clients to continue writing to it even though it is closed?

9) nit: "in cursive" means "in italics". :)

10) The table operations section has tables which I believe are the actions
between FSM states in the table or region FSMs. Is this correct? Can the
edges be labeled to describe which steps these transitions correspond to?

Short doc:
nit: Design Constraints, code should: Have AM logic isolated from the
persistent storage of state.

[jira] [Commented] (HBASE-5487) Generic framework for Master-coordinated tasks

2013-10-18 Thread Sergey Shelukhin (JIRA)

[
https://issues.apache.org/jira/browse/HBASE-5487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13799619#comment-13799619
]

Sergey Shelukhin commented on HBASE-5487:
-

Answers lifted from email also (some fixes + one answer was modified due to
clarification here :)).

bq. What is a failure and how do you react to failures? I think the master5
design needs to spend more effort to considering failure and recovery cases. I
claim there are 4 types of responses from a networked IO operation - two
states we normally deal with ack successful, ack failed (nack) and unknown due
to timeout that succeeded (timeout success) and unknown due to timeout that
failed (timeout failed). We have historically missed the last two cases and
they aren't considered in the master5 design.

There are a few considerations. Let me examine if there are other cases than
these.
I am assuming the collocated table, which should reduce such cases for state
(probably, if collocated table cannot be written reliably, master must
stop-the-world and fail over).
When RS contacts master to do state update, it errs on the side of caution - no
state update, no open region (or split).
Thus, except for the case of multiple masters running, we can always assume RS
didn't online the region if we don't know about it.
Then, for messages to RS, see Note on messages; they are idempotent so they
can always be resent.
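
A minimal sketch of why idempotent messages can always be resent, assuming a hypothetical `RegionServerSketch` with an open-region handler and a hypothetical `send_with_retry` helper; neither name comes from the design doc. Lost replies are simulated by a counter.

```python
class RegionServerSketch:
    """Hypothetical RS that handles 'open region at version' idempotently."""

    def __init__(self):
        self.open_regions = {}  # region name -> schema version
        self.applied = 0        # how many times state actually changed

    def handle_open(self, region, version):
        if self.open_regions.get(region) == version:
            return "already-open"  # duplicate message: no state change
        self.open_regions[region] = version
        self.applied += 1
        return "opened"


def send_with_retry(rs, region, version, replies_lost):
    """Resend until a reply gets through; the first `replies_lost` replies vanish."""
    attempts = 0
    while True:
        attempts += 1
        reply = rs.handle_open(region, version)
        if attempts > replies_lost:
            return reply, attempts
```

With two lost replies the message is delivered three times, yet the region is opened exactly once; the master only needs to keep resending until it hears back.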

bq. 1) State update coordination. What is a state updates from the outside
Do RS's initiate splitting on their own? Maybe a picture would help so we can
figure out if it is similar or different from hbck-master's?

Yes, these are RS messages. They are mentioned in some operation descriptions
in part 2 - opening-opened, closing-closed; splitting, etc.

bq. 2) Single point of truth. hbck-master tries to define what single point
of truth means by defining intended, current, and actual state data with
durability properties on each kind. What do clients look at who modifies what?

Sorry, don't understand the question. I mean single source of truth mainly
about what is going on with the region; it is described in design
considerations.
I like the idea of intended state, however without more detailed reading I am
not sure how it works for multiple ops e.g. master recovering the region while
the user intends to split it, so the split should be executed after it's opened.

bq. 3) Table record: if regions is out of date, it should be closed and
reopened. It is not clear in master5 how regionservers find out that they are
out of date. Moreover, how do clients talking to those RS's with stale versions
know they are going to the correct RS especially in the face of RS failures due
to timeout?

On alter (and on startup, if it failed), master tries to reopen all regions that
are out of date.
Regions that are not opened will either pick up the new version when they are
opened, or (e.g. if they are now Opening with the old version) master discovers
they are out of date when they are transitioned to Opened by the RS, and reopens
them again.
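
The reopen rule described above can be sketched as follows. This is an illustration under assumptions: the state names, the integer schema versions, and the functions `regions_to_reopen` and `on_opened` are hypothetical and are not taken from the master5 doc.

```python
def regions_to_reopen(table_version, regions):
    """On alter: pick opened regions whose schema version is stale.

    `regions` maps region name -> (state, schema version).
    """
    return sorted(
        name
        for name, (state, version) in regions.items()
        if state == "OPENED" and version < table_version
    )


def on_opened(table_version, region_version):
    """When an RS reports Opening -> Opened: reopen if it opened with an old version."""
    return "reopen" if region_version < table_version else "keep"
```

Regions that are closed or still opening are not touched at alter time; they are caught either by opening with the new version or by the `on_opened` check.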

As for any case of alter on enabled table, there are no guarantees for clients.
To provide these w/o disable/enable (or logical equivalent of coordinating all
close-s and open-s), one would need some form of version-time-travel, or
waiting for versions, or both.

bq. 4) region record: transition states. This is really similar to
hbck-masters current state and intended state. Shouldn't be defined as part of
the region record?

I mention somewhere that could be done. One thing is that if several paths are
possible between states, it's useful to know which is taken.
But do note that I store user intent separately from what is currently going
on, so they are not exactly similar as far as I see.

bq. 5) Note on user operations: the forgetting thing is scary to me -- in your
move split example, what happens if an RS reads state that is forgotten?

I think my description of this might be too vague. State is not forgotten;
previous intent is forgotten. I.e. if user does several operations in order
that conflict (e.g. split and then merge), the first one will be canceled
(safely :)).
Also, RS does not read state as a guideline to what needs to be done.
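
A minimal sketch of "previous intent is forgotten, state is not", assuming a hypothetical `RegionIntent` record and the simplification that any two different operations conflict; the real conflict rules are not spelled out here.

```python
class RegionIntent:
    """Hypothetical region record: current state plus the latest user intent."""

    def __init__(self):
        self.state = "OPENED"  # what is actually going on; never dropped
        self.intent = None     # what the user last asked for
        self.cancelled = []    # earlier intents superseded by later ones

    def request(self, op):
        # Simplifying assumption: any different pending intent conflicts.
        if self.intent is not None and self.intent != op:
            self.cancelled.append(self.intent)
        self.intent = op
```

Requesting a split and then a merge leaves `intent` as the merge and records the split as cancelled, while `state` is unchanged throughout.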

bq. 6) table state machine. how do we guarantee clients are writing from the
correct version in the in failures?

The intent is to fence the WAL for the region server, the way we do now. One
could also use another mechanism.
Perhaps I could specify it more clearly; I think the problem of making sure RS
is dead is nearly orthogonal.
In my model, due to how opening region is committed to opened, we can only be
unsure when the region is in Opened state (or similar states such as Splitting
which are not present in my current version, but will be added).
In that case, in absence of normal transition, we cannot do literally anything
with the region unless

[jira] [Commented] (HBASE-5487) Generic framework for Master-coordinated tasks

2013-10-18 Thread stack (JIRA)

[
https://issues.apache.org/jira/browse/HBASE-5487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13799634#comment-13799634
]

stack commented on HBASE-5487:
--

Suggest moving this out to the dev mailing list as per the bible quoted below.
Start a thread there?

From Producing Open Source Software:

Make sure the bug tracker doesn't turn into a discussion forum.
Although it is important to maintain a human presence in the bug
tracker, it is not fundamentally suited to real-time discussion. Think
of it rather as an archiver, a way to organize facts and references
to other discussions, primarily those that take place on mailing lists.

There are two reasons to make this distinction. First, the bug
tracker is more cumbersome to use than the mailing lists (or than
real-time chat forums, for that matter). This is not because bug
trackers have bad user interface design, it's just that their interfaces
were designed for capturing and presenting discrete states, not
free-flowing discussions. Second, not everyone who should be
involved in discussing a given issue is necessarily watching the bug
tracker. Part of good issue management...is to make sure each issue
is brought to the right peoples' attention, rather than requiring every
developer to monitor all issues. In the section called “No
Conversations in the Bug Tracker” in
Chapter 6, Communications, we'll look at ways to make sure people
don't accidentally siphon discussions out of appropriate forums
and into the bug tracker.

Pg. 50 of http://producingoss.com/en/producingoss.pdf


[jira] [Commented] (HBASE-5487) Generic framework for Master-coordinated tasks

2013-10-18 Thread Sergey Shelukhin (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-5487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13799660#comment-13799660
 ] 

Sergey Shelukhin commented on HBASE-5487:
-

We need some convention for inline responses in the mailing list (or tell me if 
there's one) :)


[jira] [Commented] (HBASE-5487) Generic framework for Master-coordinated tasks

2013-10-17 Thread Sergey Shelukhin (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-5487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13798467#comment-13798467
 ] 

Sergey Shelukhin commented on HBASE-5487:
-

The doc hasn't been out for long; just clarifying - anyone interested in 
providing feedback for part 1? 
It'd be really nice to start working out implementation details in part 2 with 
some confidence, and/or writing code. Should I assume lack of interest or 
silent agreement to rewrite according to part 1?


[jira] [Commented] (HBASE-5487) Generic framework for Master-coordinated tasks

2013-10-17 Thread Jonathan Hsieh (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-5487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13798612#comment-13798612
 ] 

Jonathan Hsieh commented on HBASE-5487:
---

I'm doing a pass, will provide feedback tomorrow.

 Generic framework for Master-coordinated tasks
 --

 Key: HBASE-5487
 URL: https://issues.apache.org/jira/browse/HBASE-5487
 Project: HBase
  Issue Type: New Feature
  Components: master, regionserver, Zookeeper
Affects Versions: 0.94.0
Reporter: Mubarak Seyed
Assignee: Sergey Shelukhin
Priority: Critical
 Attachments: Entity management in Master - part 1.pdf, Region 
 management in Master5.docx, Region management in Master.pdf





--
This message was sent by Atlassian JIRA
(v6.1#6144)

[jira] [Commented] (HBASE-5487) Generic framework for Master-coordinated tasks

2013-10-16 Thread Nicolas Liochon (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-5487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13796530#comment-13796530
 ] 

Nicolas Liochon commented on HBASE-5487:


+1 for Enis' requirements list :-).
I tend to think that AM and meta should be collocated.

 Generic framework for Master-coordinated tasks
 --

 Key: HBASE-5487
 URL: https://issues.apache.org/jira/browse/HBASE-5487
 Project: HBase
  Issue Type: New Feature
  Components: master, regionserver, Zookeeper
Affects Versions: 0.94.0
Reporter: Mubarak Seyed
Priority: Critical
 Attachments: Region management in Master5.docx, Region management in 
 Master.pdf





--
This message was sent by Atlassian JIRA
(v6.1#6144)

[jira] [Commented] (HBASE-5487) Generic framework for Master-coordinated tasks

2013-10-16 Thread Nick Dimiduk (JIRA)

[
https://issues.apache.org/jira/browse/HBASE-5487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13796855#comment-13796855
]

Nick Dimiduk commented on HBASE-5487:
-

I'm also a fan of Enis's list, particularly AM should be understandable by
simple human beings like myself.

The observation I'll add here is that AM and meta don't necessarily need to be
collocated. What is necessary is that AM maintain a strongly consistent view of
the world, at least from what I understand about the current design. That
requirement can be relaxed iff there's an explicitly distributed state
management system. Such a system is probably composed out of idempotent
operations over CRDTs.
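
To make the CRDT remark concrete, here is a minimal sketch (Python, not HBase
code) of one of the simplest CRDTs, a last-writer-wins map from region to
server. The `State` type, the version numbers and `merge` are assumptions made
for illustration only; nothing in this thread specifies them.

```python
from typing import Dict, Tuple

# Hypothetical replicated state: region name -> (version, hosting server).
State = Dict[str, Tuple[int, str]]


def merge(a: State, b: State) -> State:
    """Merge two replicas, keeping the entry with the highest version.

    Ties on version are broken by comparing the server name, so the result
    does not depend on the order in which replicas are merged.
    """
    out = dict(a)
    for region, entry in b.items():
        if region not in out or entry > out[region]:
            out[region] = entry
    return out
```

Because `merge` is idempotent and commutative, replaying or reordering updates
does not change the result. Note that last-writer-wins alone would not give the
single-assignment guarantee AM needs; it only illustrates the style of
operation.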

I also question the wisdom of moving away from ZK for management of active
cluster state, primarily because in our current architecture, that component is
completely out of band of data operations. Meaning, the activities which put
stress on the configuration consensus bits are different from the operations
that put stress on a data provider. (Yes, data activity results in region
relocation, but that's a maintenance task, not direct involvement.) Moving to
dependency on collocation unnecessarily conflates those two aspects of the
system.

If the issues with Zookeeper originate from implementation details, why not fix
implementation rather than look to a new architecture? For instance, the CoreOS
folk have a little something called [etcd|https://github.com/coreos/etcd#etcd].
Raft specifically may not provide the correct kind of available consensus we
need; the idea is to examine both the baby and the bathwater.

Generic framework for Master-coordinated tasks
--

--
This message was sent by Atlassian JIRA
(v6.1#6144)

[jira] [Commented] (HBASE-5487) Generic framework for Master-coordinated tasks

2013-10-16 Thread Nicolas Liochon (JIRA)

[
https://issues.apache.org/jira/browse/HBASE-5487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13796905#comment-13796905
]

Nicolas Liochon commented on HBASE-5487:

bq. AM and meta don't necessarily need to be collocated
If they are separated, you double the failure probability, as you need both AM
and .META. to work. Moreover, speaking to .META. becomes a distributed problem,
while it's less the case when they are collocated (only less because of HDFS).
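
As a rough worked example of the "double" claim: assuming the AM host and the
.META. host fail independently (an assumption, as are the function name and
the 1% figure below), the chance that at least one of them is down is close to
twice the single-host figure when that figure is small.

```python
def p_any_down(p_am: float, p_meta: float) -> float:
    """Probability that at least one of two independent components is down."""
    return 1.0 - (1.0 - p_am) * (1.0 - p_meta)


# With a hypothetical 1% unavailability for each component:
separate = p_any_down(0.01, 0.01)   # two hosts, either can break the pair
collocated = 0.01                   # one host carries both
```

Here `separate` comes out at 0.0199, just under 2x the collocated value.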

bq. moving away from ZK for management
I believe we will need it to determine who is the AM lead. I don't really know
about storing in ZooKeeper vs. meta. As Jimmy said, however, using ZooKeeper to
do RPC calls seems wrong.

I guess this can be decided later. For the requirements, I don't have anything
to add to Enis' list.

Generic framework for Master-coordinated tasks
--

--
This message was sent by Atlassian JIRA
(v6.1#6144)

[jira] [Commented] (HBASE-5487) Generic framework for Master-coordinated tasks

2013-10-16 Thread ramkrishna.s.vasudevan (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-5487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13796982#comment-13796982
 ] 

ramkrishna.s.vasudevan commented on HBASE-5487:
---

Started going through this document.  In my experience with AM, the number of 
states we have and the dependency on ZK callbacks definitely make things a bit 
difficult to understand and track, and the source of truth is spread across 
several places.
In the doc, for the create table scenario, there are cases where a create 
table failure on master abort will result in a table that has fewer regions 
than the client actually specified in the splits.
The master failover part is another critical area: how we collect the list of 
alive and dead RSs and the list of regions that were partway through 
opening, closing or splitting. It is in this failure condition that we end up 
in a lot of hidden areas.  
Will read the document and share ideas, if any.  

 Generic framework for Master-coordinated tasks
 --

 Key: HBASE-5487
 URL: https://issues.apache.org/jira/browse/HBASE-5487
 Project: HBase
  Issue Type: New Feature
  Components: master, regionserver, Zookeeper
Affects Versions: 0.94.0
Reporter: Mubarak Seyed
Priority: Critical
 Attachments: Region management in Master5.docx, Region management in 
 Master.pdf





--
This message was sent by Atlassian JIRA
(v6.1#6144)

[jira] [Commented] (HBASE-5487) Generic framework for Master-coordinated tasks

2013-10-16 Thread ramkrishna.s.vasudevan (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-5487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13796985#comment-13796985
 ] 

ramkrishna.s.vasudevan commented on HBASE-5487:
---

HBASE-5583 is one such JIRA that handles create table failure cases.  

 Generic framework for Master-coordinated tasks
 --

 Key: HBASE-5487
 URL: https://issues.apache.org/jira/browse/HBASE-5487
 Project: HBase
  Issue Type: New Feature
  Components: master, regionserver, Zookeeper
Affects Versions: 0.94.0
Reporter: Mubarak Seyed
Priority: Critical
 Attachments: Region management in Master5.docx, Region management in 
 Master.pdf





--
This message was sent by Atlassian JIRA
(v6.1#6144)

[jira] [Commented] (HBASE-5487) Generic framework for Master-coordinated tasks

[
https://issues.apache.org/jira/browse/HBASE-5487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13797043#comment-13797043
]

Sergey Shelukhin commented on HBASE-5487:
-

I don't think it can happen on create. Until all regions are moved to Closed
state after being created (atomically via multi-row tx), table won't leave
Creating state. If there's failover all regions are erased and created from
scratch. Create table is rare enough for that to work.

[~enis] Wrt req list, mostly agree, however:
bq. Bulk region operations
Can you please elaborate? Is it the same as modifying several regions' state
under multi-row lock?

bq. Region operations should be isolated from [snip] table operations
(disabling / disabled table, schema changes, etc) and cluster shutdown. AM
[snip] should NEVER know about table state (disable/disabling).
Strongly disagree with this. If we are doing a bunch of balancing and the user
disables a table at the same time, we have to handle it.
If the user tries to force-assign regions of a table that is halfway through
create, we have to handle this.
For alter, we need to reopen regions, which will have to work w/splits and
merges (it's covered in my doc).
For what purpose do you want to isolate them?
AM should not know about details e.g. schema logic, but it should know about
logistics.

bq. No master abort when a region’s state cannot be determined. This results in
support cases where master cannot start, and without master things become even
worse. We should “quarantine” the regions if needed absolutely.
That is dangerous. IIRC in my spec I only put master abort if somebody changes
table state under master; but in general, if a region is in an unknown state
it's better to make the admin act than to silently make part of the data
disappear - that can lead to wrong results.
Perhaps the table needs to be quarantined then.

Generic framework for Master-coordinated tasks
--

--
This message was sent by Atlassian JIRA
(v6.1#6144)

[jira] [Commented] (HBASE-5487) Generic framework for Master-coordinated tasks


[ 
https://issues.apache.org/jira/browse/HBASE-5487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13797368#comment-13797368
 ] 

Sergey Shelukhin commented on HBASE-5487:
-

One more update from discussion here:
- we currently have many operations that cannot be monitored other than by side 
effects (create table), or at all. We need a good way for the user to wait for 
operations. Given that we send request to master, and many operations can 
recover from master failure, we cannot use simple async API with request and 
async response (at least not on the lowest level - client library can hide 
master failover and provide that API). The lowest-level master API should 
involve some sort of persistent operation cookie, so that you could still wait 
for operation after failover.
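
A minimal sketch of the cookie idea, in Python rather than HBase code. The
class and method names (`OperationStore`, `Master.submit`, `Master.poll`) are
hypothetical, and the in-memory dict only stands in for whatever durable
storage the master would really use.

```python
import uuid
from typing import Dict


class OperationStore:
    """Stand-in for durable storage that outlives any single master."""

    def __init__(self) -> None:
        self.ops: Dict[str, Dict[str, str]] = {}


class Master:
    """Hypothetical master front end; any instance can answer for a cookie."""

    def __init__(self, store: OperationStore) -> None:
        self.store = store

    def submit(self, description: str) -> str:
        # Persist the operation before returning the cookie to the client.
        cookie = uuid.uuid4().hex
        self.store.ops[cookie] = {"op": description, "state": "RUNNING"}
        return cookie

    def complete(self, cookie: str) -> None:
        self.store.ops[cookie]["state"] = "DONE"

    def poll(self, cookie: str) -> str:
        op = self.store.ops.get(cookie)
        return op["state"] if op else "UNKNOWN"
```

A client that got its cookie from one master can keep polling a new master
created over the same store after failover; a client library could wrap that
polling loop to offer the simpler async API.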

 Generic framework for Master-coordinated tasks
 --

 Key: HBASE-5487
 URL: https://issues.apache.org/jira/browse/HBASE-5487
 Project: HBase
  Issue Type: New Feature
  Components: master, regionserver, Zookeeper
Affects Versions: 0.94.0
Reporter: Mubarak Seyed
Assignee: Sergey Shelukhin
Priority: Critical
 Attachments: Region management in Master5.docx, Region management in 
 Master.pdf





--
This message was sent by Atlassian JIRA
(v6.1#6144)

[jira] [Commented] (HBASE-5487) Generic framework for Master-coordinated tasks

2013-10-16 Thread stack (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-5487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13797380#comment-13797380
 ] 

stack commented on HBASE-5487:
--

Are we conflating functionality here (going by last comment above by Sergey)?  
There is AM and then there is another facility that uses AM to run sequences of 
steps to achieve an end (e.g. enable table)?   Or is the notion that a revamped 
AM would do all?  The long-running (enable a table w/ 1M regions) and 
short-term (assign region)?  If it is to do both, I suggest we call the new 
facility GOD.

 Generic framework for Master-coordinated tasks
 --

 Key: HBASE-5487
 URL: https://issues.apache.org/jira/browse/HBASE-5487
 Project: HBase
  Issue Type: New Feature
  Components: master, regionserver, Zookeeper
Affects Versions: 0.94.0
Reporter: Mubarak Seyed
Assignee: Sergey Shelukhin
Priority: Critical
 Attachments: Region management in Master5.docx, Region management in 
 Master.pdf





--
This message was sent by Atlassian JIRA
(v6.1#6144)

[jira] [Commented] (HBASE-5487) Generic framework for Master-coordinated tasks

[
https://issues.apache.org/jira/browse/HBASE-5487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13797395#comment-13797395
]

Sergey Shelukhin commented on HBASE-5487:
-

You'd need some way to connect these ends with what AM is doing, so AM will
have to support operations attached to its actions even if there's separate
operation management.
Moreover, you'd find out that these are not steps, they are state goals.
For example, if you are disabling table, you want to close regions. So in case
of separate operation manager, you might create tasks to close all regions. But
what if some server fails? Now some of your regions are already closed.
Separate operations to close region might fail now, but the goal is achieved.
If I start disabling table and then kill all RS-es, the table is now disabled
:) But all operations would fail.
State goals fit much more naturally in AM than steps. I want to avoid steps
as much as possible.
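
The difference can be sketched as follows (Python, with hypothetical names and
state strings): instead of tracking whether each "close region" step
succeeded, the disable operation only looks at whether the goal state has been
reached.

```python
from typing import Dict, List


def pending_closes(region_states: Dict[str, str]) -> List[str]:
    """Regions that still need closing to reach the 'all closed' goal."""
    return sorted(r for r, s in region_states.items() if s != "CLOSED")


def table_disabled(region_states: Dict[str, str]) -> bool:
    """The goal is met when nothing is left to close, however it got closed."""
    return not pending_closes(region_states)
```

If every RS dies while the table is being disabled, the individual close
requests fail, but `table_disabled` still becomes true once the regions are
recorded as closed.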

Stateful (as in, having separate steps) multi-step operations are also hard to
coordinate. In the above example, during recovery, you don't want to reopen
region if the table is disabling, but you don't know until it's actually
disabled if the table disable is an external operation.

Generic framework for Master-coordinated tasks
--

--
This message was sent by Atlassian JIRA
(v6.1#6144)

[jira] [Commented] (HBASE-5487) Generic framework for Master-coordinated tasks

2013-10-16 Thread Enis Soztutar (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-5487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13797409#comment-13797409
 ] 

Enis Soztutar commented on HBASE-5487:
--

I think as a mental exercise to validate the new design, we should think about 
the cases for the following issues opened recently so that we can ensure that 
these classes of problems are eliminated: 

- HBASE-9724 Failed region split is not handled correctly by AM
- HBASE-9721 meta assignment did not timeout
- HBASE-9696 Master recovery ignores online merge znode
- HBASE-9777 Two consecutive RS crashes could lead to their SSH stepping on 
each other's toes and cause master abort
- HBASE-9773 Master aborted when hbck asked the master to assign a region that 
was already online
- HBASE-9525 Move region right after a region split is dangerous
- HBASE-9514 Prevent region from assigning before log splitting is done
- HBASE-9480 Regions are unexpectedly made offline in certain failure conditions
- HBASE-9387 Region could get lost during assignment

bq. Can you please elaborate? Is it the same as modifying several regions' 
state under multi-row lock?
The bulk operations requirement is there so that we do multiple operations in 
parallel, sending openRegions RPCs for multiple regions at the same time, and 
not doing one-by-one assignment. That is all. 
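
A small sketch of what that batching could look like, in Python. The function
name and the plan format (region name mapped to target server) are assumptions
for illustration; the point is only that one request per server replaces one
request per region.

```python
from collections import defaultdict
from typing import Dict, List


def group_by_server(plan: Dict[str, str]) -> Dict[str, List[str]]:
    """Turn a region -> server plan into one batch of regions per server."""
    batches: Dict[str, List[str]] = defaultdict(list)
    for region, server in sorted(plan.items()):
        batches[server].append(region)
    return dict(batches)
```

Each resulting batch would then go out as a single openRegions call, and the
batches for different servers can be sent in parallel.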

bq. That is dangerous. IIRC in my spec I only put master abort if somebody 
changes table state under master; but in general, if region is in unknown state 
it's better to make admin act, than to just silently disappear part of data - 
that can lead to wrong results.
Quarantining the table or region is fine, but master should not be down because 
of this. For example, a region can fail to open, and you would want to track 
how many times the region failed to open so that you can decide at some point 
that the region should be put in a quarantined state (or failed-open state). I 
think there was some issue with a region bouncing from server to server 
indefinitely. 
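
A sketch of the counting idea, in Python. `OpenFailureTracker`, the
`max_failures` threshold and the returned strings are hypothetical; the thread
does not fix a threshold or a state name.

```python
from collections import Counter
from typing import Set


class OpenFailureTracker:
    """Counts failed opens per region and quarantines repeat offenders."""

    def __init__(self, max_failures: int = 3) -> None:
        self.max_failures = max_failures
        self.failures: Counter = Counter()
        self.quarantined: Set[str] = set()

    def record_failure(self, region: str) -> str:
        """Return 'RETRY' until the threshold is hit, then 'QUARANTINED'."""
        self.failures[region] += 1
        if self.failures[region] >= self.max_failures:
            self.quarantined.add(region)
            return "QUARANTINED"
        return "RETRY"

    def is_quarantined(self, region: str) -> bool:
        return region in self.quarantined
```

With this shape the master keeps running and stops bouncing the region between
servers, while the admin can still see which regions need attention.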

For table operations intermixing with region operations, I'll have to read your 
updated doc. 


 Generic framework for Master-coordinated tasks
 --

 Key: HBASE-5487
 URL: https://issues.apache.org/jira/browse/HBASE-5487
 Project: HBase
  Issue Type: New Feature
  Components: master, regionserver, Zookeeper
Affects Versions: 0.94.0
Reporter: Mubarak Seyed
Assignee: Sergey Shelukhin
Priority: Critical
 Attachments: Region management in Master5.docx, Region management in 
 Master.pdf





--
This message was sent by Atlassian JIRA
(v6.1#6144)

[jira] [Commented] (HBASE-5487) Generic framework for Master-coordinated tasks

[
https://issues.apache.org/jira/browse/HBASE-5487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13797472#comment-13797472
]

Sergey Shelukhin commented on HBASE-5487:
-

Ok, I split the doc in half. That way it will be easier to read and manage.
Part 1 is ready (as a current version), and describes high-level design,
operation semantics and interaction (I think the latter might be interesting
for [~jmhsieh]).
It also tries to capture the requirement lists above and high-level
implementation (whatever is agreed upon to some degree).
Please tell me if something is missing or wrong.

For part 2, I will keep attaching updates. It covers the design of operations -
state machines, exact steps, how client tracks it, how recovery works, etc. It
will follow part 1.

Generic framework for Master-coordinated tasks
--

Key: HBASE-5487
URL: https://issues.apache.org/jira/browse/HBASE-5487
Project: HBase
Issue Type: New Feature
Components: master, regionserver, Zookeeper
Affects Versions: 0.94.0
Reporter: Mubarak Seyed
Assignee: Sergey Shelukhin
Priority: Critical
Attachments: Entity management in Master - part 1.pdf, Region
management in Master5.docx, Region management in Master.pdf

--
This message was sent by Atlassian JIRA
(v6.1#6144)

[jira] [Commented] (HBASE-5487) Generic framework for Master-coordinated tasks

2013-10-16 Thread Eric Newton (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-5487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13797507#comment-13797507
 ] 

Eric Newton commented on HBASE-5487:


I'm sorry for asking such a basic question... could someone please comment: 
what does AM stand for?

I did a quick search through the ticket and the attachments and it didn't pop 
out at me.

 Generic framework for Master-coordinated tasks
 --

 Key: HBASE-5487
 URL: https://issues.apache.org/jira/browse/HBASE-5487
 Project: HBase
  Issue Type: New Feature
  Components: master, regionserver, Zookeeper
Affects Versions: 0.94.0
Reporter: Mubarak Seyed
Assignee: Sergey Shelukhin
Priority: Critical
 Attachments: Entity management in Master - part 1.pdf, Region 
 management in Master5.docx, Region management in Master.pdf





--
This message was sent by Atlassian JIRA
(v6.1#6144)

[jira] [Commented] (HBASE-5487) Generic framework for Master-coordinated tasks


[ 
https://issues.apache.org/jira/browse/HBASE-5487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13797511#comment-13797511
 ] 

Sergey Shelukhin commented on HBASE-5487:
-

AssignmentManager, a class in HBase master. Often but not always, when talking 
about it people also imply a bunch of auxiliary classes around it like 
ServerShutdownHandler, RegionClosed/OpenedHandler, ZKTable, etc., which 
together implement region assignment in HBase.

 Generic framework for Master-coordinated tasks
 --

 Key: HBASE-5487
 URL: https://issues.apache.org/jira/browse/HBASE-5487
 Project: HBase
  Issue Type: New Feature
  Components: master, regionserver, Zookeeper
Affects Versions: 0.94.0
Reporter: Mubarak Seyed
Assignee: Sergey Shelukhin
Priority: Critical
 Attachments: Entity management in Master - part 1.pdf, Region 
 management in Master5.docx, Region management in Master.pdf





--
This message was sent by Atlassian JIRA
(v6.1#6144)

[jira] [Commented] (HBASE-5487) Generic framework for Master-coordinated tasks

2013-10-16 Thread Feng Honghua (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-5487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13797538#comment-13797538
 ] 

Feng Honghua commented on HBASE-5487:
-

bq.I also question the wisdom of moving away from ZK for management of active 
cluster state...If the issues with Zookeeper originate from implementation 
details, why not fix implementation rather than look to a new architecture?
Using a system table rather than ZK to store state info is for better (cluster 
restart) performance for a big cluster with, say, 250K regions. Certainly, if 
we change the way we use ZK (let master be the single point to read/write ZK, 
not using ZK's watch/notify mechanism), there is no correctness/logic 
difference between using a system table and using ZK.

 Generic framework for Master-coordinated tasks
 --

 Key: HBASE-5487
 URL: https://issues.apache.org/jira/browse/HBASE-5487
 Project: HBase
  Issue Type: New Feature
  Components: master, regionserver, Zookeeper
Affects Versions: 0.94.0
Reporter: Mubarak Seyed
Assignee: Sergey Shelukhin
Priority: Critical
 Attachments: Entity management in Master - part 1.pdf, Region 
 management in Master5.docx, Region management in Master.pdf





--
This message was sent by Atlassian JIRA
(v6.1#6144)

[jira] [Commented] (HBASE-5487) Generic framework for Master-coordinated tasks

2013-10-15 Thread Sergey Shelukhin (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-5487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13795680#comment-13795680
 ] 

Sergey Shelukhin commented on HBASE-5487:
-

[~jmhsieh] I am writing out very detailed operation and failover descriptions 
right now :)

 Generic framework for Master-coordinated tasks
 --

 Key: HBASE-5487
 URL: https://issues.apache.org/jira/browse/HBASE-5487
 Project: HBase
  Issue Type: New Feature
  Components: master, regionserver, Zookeeper
Affects Versions: 0.94.0
Reporter: Mubarak Seyed
Priority: Critical
 Attachments: Region management in Master.pdf





--
This message was sent by Atlassian JIRA
(v6.1#6144)

[jira] [Commented] (HBASE-5487) Generic framework for Master-coordinated tasks

2013-10-15 Thread Jonathan Hsieh (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-5487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13795737#comment-13795737
 ] 

Jonathan Hsieh commented on HBASE-5487:
---

[~sershe] Looking forward to it!  

 Generic framework for Master-coordinated tasks
 --

 Key: HBASE-5487
 URL: https://issues.apache.org/jira/browse/HBASE-5487
 Project: HBase
  Issue Type: New Feature
  Components: master, regionserver, Zookeeper
Affects Versions: 0.94.0
Reporter: Mubarak Seyed
Priority: Critical
 Attachments: Region management in Master.pdf





--
This message was sent by Atlassian JIRA
(v6.1#6144)

[jira] [Commented] (HBASE-5487) Generic framework for Master-coordinated tasks

2013-10-15 Thread Enis Soztutar (JIRA)

[
https://issues.apache.org/jira/browse/HBASE-5487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13796362#comment-13796362
]

Enis Soztutar commented on HBASE-5487:
--

I also started a document some time ago, but never got to finish it to the
level of detail I would like. However, I think we can agree on the design
goals section, which I augmented from the discussion so far:

- Robust implementation
- Comprehensive test coverage by mocking server and region assignment states
(unit testable without MiniCluster and CM stuff)
- Bulk region operations
- Region operations should be isolated from server operations (AM vs SSH, log
splitting), and table operations (disabling / disabled table, schema changes,
etc) and cluster shutdown. AM and SSH should NEVER know about table state
(disable/disabling). Server liveness checks can only be done as an optimization
(servers can fail after the check is done)
- There should be one source of truth
- Should be compatible with master failover, and concurrent region
operations(split, RS failover, balancer, etc)
- AM should guarantee that a region can be hosted by a single region server at
any given time
- AM should be understandable by simple human beings like myself
- Actions for AM should be logged (possibly separately). We would like to be
able to construct the history for the regions from logs or some persisted
state.
- Assignment should be performant and parallelizable. We should target handling
millions of regions and thousands of servers. A single region assignment should
complete under 1 sec. (1PB data with 1 GB regions = 1M regions)
- No master abort when a region’s state cannot be determined. This results in
support cases where master cannot start, and without master things become even
worse. We should “quarantine” the regions if needed absolutely.
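
The scale target in the list can be checked with a little arithmetic. The
sketch below is Python with hypothetical function names, and the parallelism
of 1000 is an assumed figure chosen only to match "thousands of servers".

```python
def region_count(data_bytes: int, region_bytes: int) -> int:
    """Number of regions needed to hold the data at a given region size."""
    return data_bytes // region_bytes


def total_assign_seconds(regions: int, per_assign_s: float,
                         parallelism: int) -> float:
    """Wall-clock time to assign all regions with a given parallelism."""
    return regions * per_assign_s / parallelism


regions = region_count(10**15, 10**9)              # 1 PB / 1 GB = 1M regions
serial = total_assign_seconds(regions, 1.0, 1)     # one at a time
parallel = total_assign_seconds(regions, 1.0, 1000)
```

At 1 sec per assignment, doing 1M regions serially would take more than 11
days, while 1000 assignments in flight bring that down to about 17 minutes,
which is why the per-region target and the parallelism requirement go
together.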

Generic framework for Master-coordinated tasks
--

--
This message was sent by Atlassian JIRA
(v6.1#6144)

[jira] [Commented] (HBASE-5487) Generic framework for Master-coordinated tasks

2013-10-15 Thread stack (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-5487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13796384#comment-13796384
 ] 

stack commented on HBASE-5487:
--

@enis List makes for a pretty good set of requirements.  We used to talk 100k 
regions but folks are long past that now so we are behind the curve (Flurry are 
250k IIRC) and we may want to tend away from a few large regions and more 
toward many small regions if we can get AM to perform (advantages: smaller 
compaction runs, easier to free up WALs, etc)

 Generic framework for Master-coordinated tasks
 --

 Key: HBASE-5487
 URL: https://issues.apache.org/jira/browse/HBASE-5487
 Project: HBase
  Issue Type: New Feature
  Components: master, regionserver, Zookeeper
Affects Versions: 0.94.0
Reporter: Mubarak Seyed
Priority: Critical
 Attachments: Region management in Master5.docx, Region management in 
 Master.pdf


 Need a framework to execute master-coordinated tasks in a fault-tolerant 
 manner. 
 Master-coordinated tasks such as online-scheme change and delete-range 
 (deleting region(s) based on start/end key) can make use of this framework.
 The advantages of framework are
 1. Eliminate repeated code in Master, ZooKeeper tracker and Region-server for 
 master-coordinated tasks
 2. Ability to abstract the common functions across Master - ZK and RS - ZK
 3. Easy to plugin new master-coordinated tasks without adding code to core 
 components




[jira] [Commented] (HBASE-5487) Generic framework for Master-coordinated tasks

2013-10-14 Thread Jimmy Xiang (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-5487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13794232#comment-13794232
 ] 

Jimmy Xiang commented on HBASE-5487:


Good. I think we are on the same page. 

bq.  just using it as a reliable storage.
We probably won't use ZK as a pure storage. Meta table + cache is a good 
alternative.


[jira] [Commented] (HBASE-5487) Generic framework for Master-coordinated tasks

2013-10-14 Thread Sergey Shelukhin (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-5487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13794304#comment-13794304
 ] 

Sergey Shelukhin commented on HBASE-5487:
-

[~jxiang] by janitor, I mean not timeout monitor, but something picking up 
timeouts of non-master ops like open.
It's a rare case and probably never happens in int tests, but there can be a 
case where RS is taking too long to open.


[jira] [Commented] (HBASE-5487) Generic framework for Master-coordinated tasks

2013-10-14 Thread Jimmy Xiang (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-5487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13794318#comment-13794318
 ] 

Jimmy Xiang commented on HBASE-5487:


I see.


[jira] [Commented] (HBASE-5487) Generic framework for Master-coordinated tasks

2013-10-14 Thread Sergey Shelukhin (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-5487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13794792#comment-13794792
 ] 

Sergey Shelukhin commented on HBASE-5487:
-

Ok, it's harder than I thought, I don't think I will be done today... but I 
think I have a clear picture now that covers the above feedback, so I am trying 
to cover all the failover scenarios and state conflicts.


[jira] [Commented] (HBASE-5487) Generic framework for Master-coordinated tasks

2013-10-14 Thread Eric Newton (JIRA)

[
https://issues.apache.org/jira/browse/HBASE-5487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13794862#comment-13794862
]

Eric Newton commented on HBASE-5487:

Accumulo does manage tablet (region) assignment tracking through the metadata
table, and further, uses a distributed state machine to scale up a little
beyond a single master node. I have been meaning to write it up, but I've not
had a chance.

I've not kept up with every HBase improvement, so I don't know if it is
pertinent... the accumulo metadata table is typically spread out over 50 - 100%
of the available tablet (region) servers.

Still, the metadata table, and especially the root table(t), is subject to
hot-spotting on large map/reduce jobs where hundreds (or thousands) of clients
are learning tablet locations at the same time. Block caching is important,
but at some point massive numbers of simultaneous RPC requests to a single node
cause delays, or even timeouts and failures.

But using accumulo to store accumulo state has scaled well.

Accumulo has 2 frameworks for master tasks:

* master general state processing: a table should be online, assignments are
recorded and servers repeatedly informed
* FATE processing, where multi-stage operations are saved, executed and
progress is re-recorded

The first is general maintenance: keeping the system running. Tablets are
assigned, unassigned and, in general, balanced.

The second allows for temporal deviance: tablets are taken offline for a merge,
for example. The allocation of resources and the state changes are walked
step by step, with each step recording progress in zookeeper.
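
A minimal sketch of that second, step-by-step style of processing, assuming a
plain dict as a stand-in for the durable progress store. The function name
run_steps, the fail_at parameter and the step descriptions are hypothetical
and are not Accumulo's FATE API.

```python
def run_steps(op_id, steps, progress, fail_at=None):
    """Run `steps` in order, resuming after the last step recorded in `progress`."""
    start = progress.get(op_id, 0)
    for i in range(start, len(steps)):
        if i == fail_at:
            # Simulated crash before step i is executed.
            raise RuntimeError(f"crashed before step {i}")
        steps[i]()
        progress[op_id] = i + 1      # record progress after each completed step


log = []
steps = [
    lambda: log.append("take tablets offline"),
    lambda: log.append("merge"),
    lambda: log.append("bring merged tablet online"),
]
progress = {}   # stands in for the durable store (zookeeper in the text above)

try:
    run_steps("merge-1", steps, progress, fail_at=1)
except RuntimeError:
    pass

run_steps("merge-1", steps, progress)   # a restarted master resumes at step 1
print(log)
```

Because progress is written only after a step completes, a crash in the middle
of a step means that step is run again on restart, so each step has to be safe
to repeat.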


[jira] [Commented] (HBASE-5487) Generic framework for Master-coordinated tasks

2013-10-14 Thread stack (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-5487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13794892#comment-13794892
 ] 

stack commented on HBASE-5487:
--

Thanks for the helpful input [~ecn]


[jira] [Commented] (HBASE-5487) Generic framework for Master-coordinated tasks

2013-10-14 Thread Jonathan Hsieh (JIRA)

[
https://issues.apache.org/jira/browse/HBASE-5487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13794915#comment-13794915
]

Jonathan Hsieh commented on HBASE-5487:
---

FYI, I've been looking at our support cases, and have been thinking and writing
up a clean slate design for a master redesign with the problems we've faced in
the field in mind. I focus a bit more on invariants necessary in the different
states, state transitions with master interactions, extensibility of the model,
and on the recovery strategy. It basically takes a pessimistic view of the
world and if I had to summarize its spirit, I'd call it the
hbck-all-the-time.master.

It is currently durable-storage agnostic but requires atomic CAS operations
(single row or single znode should be sufficient). Having re-read this thread,
it could use either of the implementation options described here (zk vs meta,
etc). It sounds like being based in hbase is preferred, so a little more
thought is going in that direction. I'm currently working on examples of how
to extend it for new features (like fast write recovery, aka distributed log
replay) and proving to myself that it would be immune from problems we've
encountered before, like double assignments, conflicting concurrent operations
(especially during recovery), and regions stuck in transition in the face of
failures, hangs or juliet pauses.
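
To illustrate why a single-row CAS is enough to rule out double assignment,
here is a rough sketch. CasStore is an in-memory stand-in invented for this
example (it is neither the HBase checkAndPut API nor the ZooKeeper versioned
setData call), and the state tuples are assumptions.

```python
import threading


class CasStore:
    """In-memory stand-in for a store with single-row compare-and-set."""

    def __init__(self):
        self._rows = {}
        self._lock = threading.Lock()

    def get(self, key):
        with self._lock:
            return self._rows.get(key)

    def compare_and_set(self, key, expected, new):
        # Atomically write `new` only if the row still holds `expected`.
        with self._lock:
            if self._rows.get(key) != expected:
                return False
            self._rows[key] = new
            return True


store = CasStore()
store.compare_and_set("region-1", None, ("OFFLINE", None))

# Two actors race to assign the same region to different servers.
first = store.compare_and_set("region-1", ("OFFLINE", None), ("OPENING", "rs-a"))
second = store.compare_and_set("region-1", ("OFFLINE", None), ("OPENING", "rs-b"))
print(first, second, store.get("region-1"))
```

Only the first writer wins; the loser sees the CAS fail and has to re-read the
row before deciding what to do next.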

I read Sergey's doc after my first cut, and while there are some similarities,
it deviates in other places. (I definitely want more on the error recovery and
error prevention mechanics.) My hope is to share it sometime later this week so
that folks can read, discuss and compare the different designs presented at the
upcoming dev meeting. Before any jiras are filed for implementation, we should
also consider things like upgrades, compatibility and performance.

I'm also hoping I'll have time to take a look at the accumulo master's design
as well for the discussion.


[jira] [Commented] (HBASE-5487) Generic framework for Master-coordinated tasks

2013-10-13 Thread Feng Honghua (JIRA)

[
https://issues.apache.org/jira/browse/HBASE-5487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13793881#comment-13793881
]

Feng Honghua commented on HBASE-5487:
-

[~jxiang]
bq. to the uncertainty due to ZK, I don't think it is because the way how we
use it. It is more because ZK doesn't support continuous events. You have to
set the watch again after each event callback. The problem is that after an
event is triggered, when we try to get the data, the data could be changed
again so an event is missed that will cause state jump.
Agree. 'One-time watch' and 'asynchronous event notification' are the root
cause of the current AM problem (I mentioned this in an above comment, you can
find it :-) ). And when I said 'because of the way we use it', I meant that we
use ZK's watch/event mechanism: process A (RS) updates ZK, and process B
(master) gets notified of the update via a watch event. If we use ZK just as
reliable storage, the same way we would use the meta table, it makes no
difference whether we use the meta table or ZK (except for the performance
difference).
In the scheme using the meta table, we adopt another communication pattern for
tasks (assign/split/merge): master requests RS to do something (and master
stores the task progress/state in the meta table), RS reports its progress to
master periodically, and master updates the task progress in both memory and
the meta table. Under this scheme we can use ZK to replace the meta table and
still avoid the earlier missed-state-transition problem, since we don't use
ZK's watch/event mechanism and just use it as reliable storage. Right?
Just to clarify: I think we share the same understanding of this problem; you
can check my comments above :-)


[jira] [Commented] (HBASE-5487) Generic framework for Master-coordinated tasks

2013-10-12 Thread Jimmy Xiang (JIRA)

[
https://issues.apache.org/jira/browse/HBASE-5487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13793436#comment-13793436
]

Jimmy Xiang commented on HBASE-5487:

[~fenghh], as to the uncertainty due to ZK, I don't think it is because of the
way we use it. It is more because ZK doesn't support continuous events. You
have to set the watch again after each event callback. The problem is that
after an event is triggered, by the time we try to get the data, the data could
have changed again, so an event is missed, which causes a state jump.
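
The missed event can be reproduced with a toy model of a one-shot watch. This
is a sketch only: the Node class below is invented for illustration and is not
the ZooKeeper client API, and the state names are assumptions.

```python
# Toy model of a one-shot watch; this is NOT the ZooKeeper client API.
class Node:
    def __init__(self, data):
        self.data = data
        self._watchers = []

    def get(self, watch=None):
        # Reading optionally registers a watch that fires at most once.
        if watch is not None:
            self._watchers.append(watch)
        return self.data

    def set(self, data):
        self.data = data
        fired, self._watchers = self._watchers, []  # watches are one-shot
        return fired  # notifications to deliver asynchronously, later


node = Node("OFFLINE")
seen = [node.get(watch="w1")]      # master reads and sets a watch

pending = node.set("OPENING")      # RS update 1: the watch fires
pending += node.set("OPENED")      # RS update 2: no watch is set, nothing fires

# The notification is delivered only now; the master re-reads and re-watches.
for _ in pending:
    seen.append(node.get(watch="w2"))

print(seen)  # ['OFFLINE', 'OPENED']
```

The master never observes OPENING, which is the state jump described above.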

Currently, we do have a region state machine. However, the machine is not
strict due to the ZK thing. We can jump over some states, which means the
state machine can't be strictly enforced. If we go without ZK, we can have a
strict state machine to follow. That will make things much more predictable.
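
For what "strict" could look like, here is a small sketch of a transition
table that rejects any jump over a state. The state names and allowed edges
are assumptions made for the example and are not the actual HBase region state
machine.

```python
# Illustrative transition table; the state names and edges are assumptions,
# not the actual HBase RegionState machine.
ALLOWED = {
    "OFFLINE": {"OPENING"},
    "OPENING": {"OPENED", "OFFLINE"},
    "OPENED": {"CLOSING"},
    "CLOSING": {"CLOSED"},
    "CLOSED": {"OFFLINE"},
}


class IllegalTransition(Exception):
    pass


def transition(current, target):
    """Return the new state, or raise if the edge is not in ALLOWED."""
    if target not in ALLOWED.get(current, set()):
        raise IllegalTransition(f"{current} -> {target}")
    return target


state = transition("OFFLINE", "OPENING")
state = transition(state, "OPENED")

try:
    transition("OFFLINE", "OPENED")   # skipping OPENING is rejected
    jumped = True
except IllegalTransition:
    jumped = False

print(state, jumped)
```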

[~sershe], to the janitor, I think we don't need it. Currently, we have a
timeout monitor. But it is disabled and will be removed soon I think. Without
the monitor, ITBLL with CM runs very well. With 0.96 tip, I tried to run ITBLL
with CM with aggressive region moving, and it is perfectly fine. If a RS is
gone, SSH should handle it properly and assign regions. If there is a janitor,
it will compete with SSH in this case, which probably does more harm than good.

If we make some RS serve the role of master, then besides having meta on it, we
can have some (not all, of course, to make [~jesse_yates] happy :) ) system
tables on it too. This way, we can support level region assignments, i.e. we
can open some regions before the rest if these regions can be assigned to the
master RS, or we can open them on this master RS at first, then move them away
later after the system is fully started. This applies to some special regions
only, for sure.

Now, we bundle two important modules (master + meta) in one RS. It is critical
to make sure it has a light load and does not die too often (even better, does
not die at all). So I think we should move other regions out of the RS once
it's promoted to be the master one.

I think we should allow only a list of RS with good hardware to be master, if
not all RS nodes have decent/same hardware.


[jira] [Commented] (HBASE-5487) Generic framework for Master-coordinated tasks

2013-10-11 Thread Sergey Shelukhin (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-5487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13793086#comment-13793086
 ] 

Sergey Shelukhin commented on HBASE-5487:
-

I think it's the approach discussed above.

I will update the doc on monday, I think I'm sold on collocated system table. 
Initially we can just run an RS that runs master library and only hosts 
hardcoded system regions as master.
Then probably any RS (with caveats) can host the master regions and act as 
master, so recovery can become much easier.


[jira] [Commented] (HBASE-5487) Generic framework for Master-coordinated tasks

2013-10-11 Thread Aaron T. Myers (JIRA)

[
https://issues.apache.org/jira/browse/HBASE-5487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13793208#comment-13793208
]

Aaron T. Myers commented on HBASE-5487:
---

bq. ZK is used for the selection of the primary nn (via the failure
controllers) but I believe the journal nodes (that do the durable consensus
logging) does not use ZK at all. Todd Lipcon or Aaron T. Myers can confirm.

I can confirm this. The QJM in the NN uses its own (heavily ZK-inspired)
consensus protocol, but does not rely on ZK itself. The only thing HDFS
currently uses ZK for is for the leader election of the active NN, as Jon says
here.


[jira] [Commented] (HBASE-5487) Generic framework for Master-coordinated tasks

2013-10-10 Thread Feng Honghua (JIRA)

[
https://issues.apache.org/jira/browse/HBASE-5487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13791516#comment-13791516
]

Feng Honghua commented on HBASE-5487:
-

bq.Master is the Actor. Having it go across a network to get/set the 'state' in
a service that is non-transactional wasn't our smartest move.
Regionservers currently report state via ZK. Master reads it from ZK. Would be
better if RS just reported directly to master.
[~stack] Yes, this is exactly what I proposed in HBASE-9726 :-)

bq.I am wondering whether it makes sense to update the meta table from the
various regionservers on the region state changes or go via the master.. But
maybe the master doesn't need to be a bottleneck if possible. A regionserver
could first update the meta table, and then just notify the master that a
certain transition was done; the master could initiate the next transition
[~devaraj] It would be better to let the master update the meta table rather
than let the various regionservers do it. Having the master as the single actor
and truth-maintainer avoids many tricky bugs/problems. And for frequent state
changes to the meta table, the regionserver serving the (state) meta table
would become the bottleneck sooner than the master that issues the update
requests, so it doesn't matter whether the update requests come from the master
or from the various regionservers.

bq.I prefer not to use ZK since it's kind of the root cause of uncertainty: has
the master/region server got/processed the event? has the znode hijacked since
master/region server changes its mind?
We should store the state in meta table which is cached in the memory.
Whether to use coprocessor it is not a big concern to me. If we don't use
coprocessor, I prefer to use the master as the proxy to do all meta table
updates. Otherwise, we need to listen to something for updates.
[~jxiang] Agree. IMO ZK alone is not the root cause of uncertainty; the current
usage pattern of ZK is. The pattern where a regionserver updates state in ZK
and the master listens to ZK and updates states in its local memory accordingly
exhibits too many tricky scenarios/bugs, because the ZK watch is one-time (which
can result in missed state transitions) and the notification/processing is
asynchronous (which can lead to delayed/out-of-date state in master memory).
And when replacing ZK with the meta table, we also need to discard this 'RS
updates, master listens' pattern, since the meta table inherently lacks a
listen-notify mechanism :-).

bq.I think ZK got a bad reputation not on its own merit, but on how we use it.
I can see that problems exist but IMHO advantages outweigh the disadvantages
compared to system table.
Co-located system table, I am not so sure, but so far there's no even
high-level design for this (for example - do all splits have to go thru
master/system table now? how does it recover? etc.).
Perhaps we should abstract an async persistence mechanism sufficiently and then
decide. Whether it would be ZK+notifications, or system table, or memory + wal,
or colocated system table, or what.
The problem is that the usage inside master of that interface would depend on
perf characteristics.
Anyway, we can work out the state transitions/concurrency/recovery without
tying 100% to particular store.
[~sershe] Agree that ZK got a bad reputation not on its own merit but because
of how we use it, especially if you mean that the master currently relies on ZK
watches/notifications to maintain/update its in-memory region state. IMO this
is almost the biggest root cause of the problems in the current assignment
design. If we just use ZK the same way as the meta table, to store states, it
makes no big difference whether we store the states in ZK or the meta table,
right (except that the meta table can give much better performance for restart
of a big cluster with a large number of regions)? But using ZK's update/listen
pattern does make a difference.

bq.btw, any input on actor model?
Things queue up operations/notifications (ops) for master; AM runs on timer
or when queue is non-empty, having as inputs, cluster state (incl. ongoing
internal actions it ordered before e.g. OPENING state for a region) plus new
ops from queue, on a single thread; generates new actions (not physically doing
anything, e.g. talking to RS); the ops state and cluster state is persisted;
then actions are executed on different threads (e.g. messages sent to RS-es,
etc.), and AM runs again, or sleeps for some time if ops queue is empty.
That is a different model, not sure if it scales for large clusters.
[~sershe] Do operations/notifications mean the RS reporting action progress to
the master? The master is the single point that updates the state truth (in the
meta table), and the RS doesn't know where the states are stored and doesn't
access them directly, right? I think a communication/storage diagram would help
a lot for a clear overall understanding here :-)
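
In the absence of a diagram, here is a rough sketch of how the quoted actor
model might be read: ops are queued, a single-threaded decision step turns
(cluster state + ops) into a new state plus actions, the state is persisted,
and only then are actions handed off. The function names, op tuples and state
values are all hypothetical.

```python
from collections import deque


def decide(cluster_state, ops):
    """Pure decision step: returns (new_state, actions) without doing any I/O."""
    new_state = dict(cluster_state)
    actions = []
    for op, region, server in ops:
        if op == "assign" and new_state.get(region) is None:
            new_state[region] = ("OPENING", server)
            actions.append(("send_open", region, server))
        elif op == "opened" and new_state.get(region) == ("OPENING", server):
            new_state[region] = ("OPENED", server)
    return new_state, actions


def run_once(queue, cluster_state, persisted, outbox):
    ops = list(queue)
    queue.clear()
    new_state, actions = decide(cluster_state, ops)
    persisted.append(new_state)      # persist before executing anything
    outbox.extend(actions)           # a real system would hand these to workers
    return new_state


queue = deque([("assign", "region-1", "rs-a")])
persisted, outbox = [], []
state = run_once(queue, {}, persisted, outbox)

queue.append(("opened", "region-1", "rs-a"))   # RS reports progress
state = run_once(queue, state, persisted, outbox)
print(state, outbox)
```

In this reading only the master-side loop touches the persisted state; the RS
only ever appears as a source of queued ops and a target of actions.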


[jira] [Commented] (HBASE-5487) Generic framework for Master-coordinated tasks

2013-10-10 Thread Feng Honghua (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-5487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13791520#comment-13791520
 ] 

Feng Honghua commented on HBASE-5487:
-

Since HBASE-9726 is closed as a duplicate of this one, I copied the proposal
from HBASE-9726 here for discussion/reference:

The current assignment process (also the split process) relies on ZK for
communication between master and regionserver. This pattern has two drawbacks:
  1. For a cluster with a big number of regions (say, 10K-100K regions), ZK
becomes the bottleneck for cluster restart, since the assignment/split
status/progress is stored in ZK and ZK has limited write throughput.
  2. Since ZK's watch is one-time and event notification/processing is
asynchronous, there is no guarantee that the master (the watcher) is notified
of the up-to-date status/progress in time, so the master relies on idempotence
for its correctness, which makes the logic/code very hard to
understand/maintain.

A new assignment design proposal is as below:
  1. Assignment/split status/progress is stored in a system table (say
'assignTable'), like the meta table, rather than in ZK, to improve write
throughput and hence the performance of restart for a cluster with a large
number of regions.
  2. The communication pattern for assignment/split is changed this way: the
master talks directly with the regionserver (the master issues an assign
request to the regionserver, the regionserver reports the assign progress back
to the master) and records the status/progress of each assignment/split in the
'assignTable'. In case of master failure, the new active master reads the
'assignTable' to rebuild its knowledge of the ongoing assignment/split tasks
and continues from there. (The regionserver doesn't write to the
'assignTable'.)
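
A minimal sketch of point 2 of the proposal, under heavy assumptions: a plain
dict stands in for the 'assignTable', FakeRegionServer.open_region stands in
for the real open RPC, and the Master methods are invented for illustration.

```python
# Hypothetical sketch: `assign_table` is a plain dict standing in for the
# proposed 'assignTable'; the FakeRegionServer methods are not HBase RPCs.
class FakeRegionServer:
    def __init__(self, name):
        self.name = name
        self.open_regions = set()

    def open_region(self, region):
        self.open_regions.add(region)   # idempotent: re-opening is harmless
        return "OPENED"


class Master:
    def __init__(self, assign_table, servers):
        self.table = assign_table       # only the master writes here
        self.servers = servers

    def start_assign(self, region, server_name):
        # Persist intent before contacting the region server.
        self.table[region] = {"state": "OPENING", "server": server_name}

    def finish_assign(self, region):
        row = self.table[region]
        result = self.servers[row["server"]].open_region(region)
        self.table[region] = {"state": result, "server": row["server"]}

    def recover(self):
        # A new active master resumes every task still marked OPENING.
        for region, row in list(self.table.items()):
            if row["state"] == "OPENING":
                self.finish_assign(region)


assign_table = {}
servers = {"rs-a": FakeRegionServer("rs-a")}

old_master = Master(assign_table, servers)
old_master.start_assign("region-1", "rs-a")   # old master dies after this write

new_master = Master(assign_table, servers)
new_master.recover()
print(assign_table)
```

Note that recovery here simply re-sends the open request, so it relies on the
regionserver side being idempotent for that call.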


[jira] [Commented] (HBASE-5487) Generic framework for Master-coordinated tasks

2013-10-09 Thread Devaraj Das (JIRA)

[
https://issues.apache.org/jira/browse/HBASE-5487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13790109#comment-13790109
]

Devaraj Das commented on HBASE-5487:

bq. we still need a reliable store (ZK, system table, or master WAL). It seems
ZK is the most scalable and best suited for the task

[~sershe], not ZK, IMHO. Let's use one of our internal storages rather than
external system for storing the region state. I am all for removing ZK
altogether from HBase. One less distributed system to worry about. One less
component to manage. We already have heartbeats from RSs to master, and region
open/close RPCs from master to the RSs. I think we have enough communication
already in place between the master and RSs to deal with region states. We
also have chores in the master that try to take some actions based on
assignment timeouts...

Would this model work (conceptually). It's late night here; please pardon me if
there are glaring issues :-) Please bear with me :-)

All region state manipulation operations are initiated by the master and they
act upon the meta region. We have extra columns to store the state of the
region etc. in the meta table. The initial rows are created by the master and
the state of the regions is UNASSIGNED. This is not new - we already do this,
but IIRC we don't store the state of the region. Some state transitions happen
through method executions, and some of those method executions are RPCs from
the master to some regionserver. I think that the states would be more granular
here (to prevent potential replay/repetition of large operations). I am
wondering whether it makes sense to update the meta table from the various
regionservers on region state changes or to go via the master. Maybe the master
doesn't need to be a bottleneck, if possible: a regionserver could first update
the meta table, and then just notify the master that a certain transition was
done; the master could initiate the next transition ([~eclark]'s comment about
coprocessors can probably be made to apply in this context). An operation is
considered successful only when the state change is recorded in meta.

Also, there is a chore (probably an enhanced catalog-janitor) in the master
that periodically goes over the meta table and restarts (along with some
diagnostics, probing the regionservers in question, etc.) failed/stuck state
transitions. This chore runs once as soon as the master is started and the meta
region is assigned, to take care of transitions that were started in the
previous life of the master and which are now waiting for some action from the
master. For example, if the state was OPENING for a certain region and the
master crashed, the master would send an openRegion RPC to the region assignee
upon restart. The region assignee would have been recorded as a column in the
row for the region by the previous master.
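
Conceptually, the scan part of such a chore might look like the sketch below.
The row layout (state, assignee, since), the set of transitional states and
the use of a zero timeout for the startup run are all assumptions made for
illustration; nothing here is an existing HBase chore.

```python
def find_stuck(meta_rows, now, timeout_s):
    """Return (region, assignee) for rows stuck in a transitional state."""
    transitional = {"OPENING", "CLOSING"}
    return [
        (region, row["assignee"])
        for region, row in sorted(meta_rows.items())
        if row["state"] in transitional and now - row["since"] >= timeout_s
    ]


meta_rows = {
    "region-1": {"state": "OPENING", "assignee": "rs-a", "since": 100},
    "region-2": {"state": "OPENED", "assignee": "rs-b", "since": 100},
    "region-3": {"state": "OPENING", "assignee": "rs-c", "since": 190},
}

# Assumption: on master start every in-flight transition is treated as due
# (timeout 0); later periodic runs use a real timeout.
on_startup = find_stuck(meta_rows, now=200, timeout_s=0)
periodic = find_stuck(meta_rows, now=200, timeout_s=60)
print(on_startup, periodic)
```

Each returned pair is a candidate for re-sending the open request to the
recorded assignee, after whatever diagnostics the chore runs.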

I think we should also save the operations that were initiated by the client on
the master (either in the WAL or in some system table) so that the master
doesn't lose track of those and can execute them in the face of crashes and
restarts. An example is when the user had sent a 'split region' operation and
the master crashed.


[jira] [Commented] (HBASE-5487) Generic framework for Master-coordinated tasks

2013-10-09 Thread Nicolas Liochon (JIRA)

[
https://issues.apache.org/jira/browse/HBASE-5487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13790151#comment-13790151
]

Nicolas Liochon commented on HBASE-5487:

bq. zk vs. non zk.
ZK is used in HDFS HA, no? So we have it in our architecture anyway. Using it
for permanent data is another discussion (stuff like ZOOKEEPER-1147 makes it
interesting).
I would personally prefer to remove the master rather than adding functions to
it. Saying that there are some specific threads in the region servers holding
.meta. is acceptable imho.


[jira] [Commented] (HBASE-5487) Generic framework for Master-coordinated tasks

2013-10-09 Thread Jonathan Hsieh (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-5487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13790172#comment-13790172
 ] 

Jonathan Hsieh commented on HBASE-5487:
---

ZK is used for the selection of the primary nn (via the failure controllers)
but I believe the journal nodes (that do the durable consensus logging) do
not use ZK at all. [~tlipcon] or [~atm] can confirm.


 Generic framework for Master-coordinated tasks
 --

 Key: HBASE-5487
 URL: https://issues.apache.org/jira/browse/HBASE-5487
 Project: HBase
  Issue Type: New Feature
  Components: master, regionserver, Zookeeper
Affects Versions: 0.94.0
Reporter: Mubarak Seyed
Priority: Critical
 Attachments: Region management in Master.pdf


 Need a framework to execute master-coordinated tasks in a fault-tolerant 
 manner. 
 Master-coordinated tasks such as online-scheme change and delete-range 
 (deleting region(s) based on start/end key) can make use of this framework.
 The advantages of framework are
 1. Eliminate repeated code in Master, ZooKeeper tracker and Region-server for 
 master-coordinated tasks
 2. Ability to abstract the common functions across Master - ZK and RS - ZK
 3. Easy to plugin new master-coordinated tasks without adding code to core 
 components




[jira] [Commented] (HBASE-5487) Generic framework for Master-coordinated tasks

2013-10-09 Thread Devaraj Das (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-5487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13790433#comment-13790433
 ] 

Devaraj Das commented on HBASE-5487:


Removing the separate master daemon is fine by me, Nicolas. However, we still 
need someone to do various operations (servicing user requests and other 
janitorial tasks). Long back we were discussing that a random region server 
(elected via zk) could perform the master role.


[jira] [Commented] (HBASE-5487) Generic framework for Master-coordinated tasks

2013-10-09 Thread Jimmy Xiang (JIRA)

[
https://issues.apache.org/jira/browse/HBASE-5487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13790442#comment-13790442
]

Jimmy Xiang commented on HBASE-5487:

I prefer not to use ZK since it's kind of the root cause of uncertainty: has
the master/region server got/processed the event? Has the znode been hijacked
since the master/region server changed its mind?

We should store the state in meta table which is cached in the memory.

Whether to use a coprocessor is not a big concern to me. If we don't use a
coprocessor, I prefer to use the master as the proxy to do all meta table
updates. Otherwise, we need to listen to something for updates.

We should not have another janitor/chore. If an action has failed, it must be
because of something unrecoverable by itself, not because of a bug in our code.
It should stay failed until the issue is resolved.

We need to have something like FATE in accumulo to queue/retry actions taking
several steps like split/merge/move.
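
To make the queue/retry idea concrete, here is a minimal sketch of a multi-step
operation that remembers which step it reached and can be resumed. It is not
Accumulo's FATE API; StepwiseOperation and its methods are hypothetical names,
and the step index lives in memory here where a real system would persist it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BooleanSupplier;

/** Hypothetical multi-step operation runner; not Accumulo's FATE API. */
class StepwiseOperation {
    private final List<String> names = new ArrayList<>();
    private final List<BooleanSupplier> steps = new ArrayList<>();
    // Index of the next step to run. A real system would store this durably
    // so that a restarted master can resume from the same step.
    private int nextStep = 0;

    /** Appends a step; the body returns true when the step succeeded. */
    StepwiseOperation step(String name, BooleanSupplier body) {
        names.add(name);
        steps.add(body);
        return this;
    }

    /** Runs the remaining steps in order, trying each at most maxAttempts times. */
    boolean run(int maxAttempts) {
        while (nextStep < steps.size()) {
            boolean done = false;
            for (int attempt = 0; attempt < maxAttempts && !done; attempt++) {
                done = steps.get(nextStep).getAsBoolean();
            }
            if (!done) {
                return false; // stays at this step until run() is called again
            }
            nextStep++;
        }
        return true;
    }

    int nextStep() { return nextStep; }

    String nextStepName() {
        return nextStep < names.size() ? names.get(nextStep) : "DONE";
    }
}
```

A split could then be declared as three steps (offline parent, create
daughters, update meta); a failure in the second step leaves nextStep at 1, and
a later run() continues from there instead of starting over.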

It is a nice-to-have to keep a history of region state transition.


[jira] [Commented] (HBASE-5487) Generic framework for Master-coordinated tasks

2013-10-09 Thread Nicolas Liochon (JIRA)

[
https://issues.apache.org/jira/browse/HBASE-5487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13790451#comment-13790451
]

Nicolas Liochon commented on HBASE-5487:

bq. However, we still need someone to do various operations (servicing user
requests and other janitorial tasks).
Yeah, I agree. Balance is a good example. Let's say that I'm more comfortable w/
something that lowers the role of the master than the opposite.

bq. I prefer not to use ZK
When you say this, Jimmy, do you mean no ZK in HBase at all, or No ZK for
permanent data, or No ZK at all for assignment?

bq. We should store the state in meta table which is cached in the memory.
I'm fine with that (if we can make it work :-) )


[jira] [Commented] (HBASE-5487) Generic framework for Master-coordinated tasks

2013-10-09 Thread Jimmy Xiang (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-5487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13790460#comment-13790460
 ] 

Jimmy Xiang commented on HBASE-5487:


Nicolas, I mean no ZK for assignment.


[jira] [Commented] (HBASE-5487) Generic framework for Master-coordinated tasks

2013-10-09 Thread Sergey Shelukhin (JIRA)

[
https://issues.apache.org/jira/browse/HBASE-5487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13791007#comment-13791007
]

Sergey Shelukhin commented on HBASE-5487:
-

Big response to not-responded-to recent comments.
Let me update the doc, EOW-ish probably depending on the number of bugs
surfacing ;)

[~stack]
Let's keep discussion and doc here and branch tasks out for rewrites.
bq. + The problem section is too short (state kept in multiple places and all
have to agree...); need more full list so can be sure proposal addresses them
all
What level of detail do you have in mind? It's not a bug fix, so I cannot
really say merge races with snapshot, or something like that; that could also
be arguably resolved by another 100k patch to existing AM :)
bq. + How is the proposal different from what we currently have? I see us tying
regionstate to table state. That is new. But the rest, where we have a record
and it is atomically changed looks like our RegionState in Master memory? There
is an increasing 'version' which should help ensure a 'direction' for change
which should help.
See the design principles (and below discussion :)). We are trying to avoid
multiple flavors of split-brain state.
bq. Its fine having a source of truth but ain't the hard part bring the system
along? (meta edits, clients, etc.).
Yes :)
bq. Experience has zk as messy to reason with. It is also an indirection having
RS and M go to zk to do 'state'.
I think ZK got a bad reputation not on its own merit, but on how we use it.
I can see that problems exist but IMHO advantages outweigh the disadvantages
compared to system table.
Co-located system table, I am not so sure, but so far there's not even a
high-level design for this (for example - do all splits have to go thru
master/system table now? how does it recover? etc.).
Perhaps we should abstract an async persistence mechanism sufficiently and then
decide. Whether it would be ZK+notifications, or system table, or memory + wal,
or colocated system table, or what.
The problem is that the usage inside master of that interface would depend on
perf characteristics.
Anyway, we can work out the state transitions/concurrency/recovery without
tying 100% to particular store.
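
A rough sketch of what such an abstraction might look like, combined with the
increasing 'version' mentioned above. Every name here (AsyncStateStore,
VersionedState, InMemoryStateStore) is hypothetical; the in-memory
implementation only stands in for ZK, a system table, or memory + WAL, and says
nothing about their perf characteristics.

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical storage-agnostic interface; all names are illustrative. */
interface AsyncStateStore {
    /** Completes with true if the record was at expectedVersion and was replaced. */
    CompletableFuture<Boolean> compareAndSet(String region, long expectedVersion, String newState);

    /** Completes with the current record, or null if the region is unknown. */
    CompletableFuture<VersionedState> read(String region);
}

final class VersionedState {
    final String state;
    final long version;

    VersionedState(String state, long version) {
        this.state = state;
        this.version = version;
    }
}

/** In-memory stand-in, usable in tests without ZK or region servers. */
class InMemoryStateStore implements AsyncStateStore {
    private final Map<String, VersionedState> records = new ConcurrentHashMap<>();

    @Override
    public CompletableFuture<Boolean> compareAndSet(String region, long expectedVersion, String newState) {
        boolean[] applied = {false};
        // compute() runs atomically per key, so two writers cannot both win.
        records.compute(region, (key, cur) -> {
            long curVersion = (cur == null) ? 0L : cur.version;
            if (curVersion != expectedVersion) {
                return cur; // stale writer: leave the record untouched
            }
            applied[0] = true;
            return new VersionedState(newState, curVersion + 1);
        });
        return CompletableFuture.completedFuture(applied[0]);
    }

    @Override
    public CompletableFuture<VersionedState> read(String region) {
        return CompletableFuture.completedFuture(records.get(region));
    }
}
```

The master-side logic would depend only on AsyncStateStore, so the state
transitions and recovery rules can be worked out and tested before the storage
question is settled.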

bq. + Agree that master should become a lib that any regionserver can run.
That sounds possible.

[~nkeywal]
bq. At least, we should make this really testable, without needing to set up a
zk, a set of rs and so on.
+1, see my comment above.
bq. I really really really think that we need to put performances as a
requirement for any implementation. For example, something like: on a cluster
with 5 racks of 20 regionserver each, with 200 regions per RS, the assignment
will be completed in 1s if we lose one rack. I saw a reference to async ZK in
the doc, it's great, because the performances are 10 times better.
We can measure and improve, but I am not really sure about what exact numbers
will be, at this stage (we don't even know what storage is).
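
For scale, the quoted scenario works out as follows. This is only
back-of-the-envelope arithmetic on the numbers in the quote, assuming regions
are spread evenly over the surviving servers; RackLossEstimate is a made-up
helper, not HBase code.

```java
/** Back-of-the-envelope numbers for the rack-loss scenario in the quote. */
class RackLossEstimate {
    /** Regions hosted on one rack, i.e. the number that must be reassigned. */
    static int regionsToReassign(int serversPerRack, int regionsPerServer) {
        return serversPerRack * regionsPerServer;
    }

    /** Region servers left after losing one rack. */
    static int survivingServers(int racks, int serversPerRack) {
        return (racks - 1) * serversPerRack;
    }

    public static void main(String[] args) {
        int lost = regionsToReassign(20, 200);   // regions on the lost rack
        int survivors = survivingServers(5, 20); // servers that can take them
        System.out.println(lost + " regions over " + survivors + " servers = "
            + (lost / survivors) + " extra regions per server");
        System.out.println("a 1 s target implies about " + lost + " assignments per second");
    }
}
```

So the example requirement amounts to 4000 reassignments within one second,
about 50 extra regions on each of the 80 remaining servers.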

[~devaraj]
bq. A regionserver could first update the meta table, and then just notify the
master that a certain transition was done; the master could initiate the next
transition (Elliott Clark comment about coprocessor can probably be made to
apply in this context). Only when a state change is recorded in meta, the
operation is considered successful.
Split, for example, requires several changes to meta. Will master be able to
see them together from the hook? If master is collocated in the same RS with
meta, it should be small overhead to have master RPC.

bq. Also, there is a chore (probably enhance catalog-janitor) in the master
that periodically goes over the meta table and restarts (along with some
diagnostics; probing regionservers in question etc.) failed/stuck state
transitions.
+1 on that. Transition states can indicate the start ts, and master will know
when they started.
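
A minimal sketch of the start-ts idea, assuming each in-flight transition
records the time it began. StuckTransitionScanner and Transition are
hypothetical names, and the map stands in for whatever the chore would read
from meta.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical scan for transitions that have been in flight too long. */
class StuckTransitionScanner {
    static final class Transition {
        final String state;   // e.g. "OPENING"
        final long startTs;   // when the transition began, in ms

        Transition(String state, long startTs) {
            this.state = state;
            this.startTs = startTs;
        }
    }

    /** Returns the regions whose transition started more than timeoutMs before nowMs. */
    static List<String> findStuck(Map<String, Transition> inTransition, long nowMs, long timeoutMs) {
        List<String> stuck = new ArrayList<>();
        for (Map.Entry<String, Transition> e : inTransition.entrySet()) {
            if (nowMs - e.getValue().startTs > timeoutMs) {
                stuck.add(e.getKey());
            }
        }
        return stuck;
    }
}
```

What to do with the result (probe the RS, retry, or leave it failed) is the
open policy question; the scan itself needs nothing beyond the start ts.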

bq. I think we should also save the operations that were initiated by the client
on the master (either in WAL or in some system table) so that the master
doesn't lose track of those and can execute them in the face of
crashes/restarts. For example, if the user had sent a 'split region' operation
and the master crashed
Yeah, disable table or move region are a good example. Probably we'd need
ZK/system table/WAL for ongoing logical operations.
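
One way to picture this is a small intent log: record the op before acting on
it, clear it when done, and replay whatever is left after a restart.
OperationLog is a hypothetical name and the map below is an in-memory stand-in
for ZK, a system table, or a WAL.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical intent log for client-initiated operations. */
class OperationLog {
    // Stand-in for a durable store; insertion order is the order ops arrived.
    private final Map<Long, String> pending = new LinkedHashMap<>();
    private long nextId = 1;

    /** Records the op before any work is done and returns its id. */
    synchronized long begin(String op) {
        long id = nextId++;
        pending.put(id, op);
        return id;
    }

    /** Marks the op as finished so it is not replayed. */
    synchronized void complete(long id) {
        pending.remove(id);
    }

    /** Ops that were begun but never completed, oldest first. */
    synchronized List<String> pendingOps() {
        return new ArrayList<>(pending.values());
    }
}
```

A new master would call pendingOps() on startup and resume, for example, a
disable table that the previous master had accepted but not finished.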

[~jxiang]
bq. We should not have another janitor/chore. If an action has failed, it must
be because of something unrecoverable by itself, not because of a bug in our
code. It should stay failed until the issue is resolved.
I think the failures meant are things like RS went away, is slow or buggy, so
OPENING got stuck - someone needs to pick it up after a timeout.

bq. We need to have something like FATE in accumulo to queue/retry actions
taking several steps like split/merge/move.
We basically need something that allows atomic state

[jira] [Commented] (HBASE-5487) Generic framework for Master-coordinated tasks

2013-10-09 Thread Sergey Shelukhin (JIRA)

[
https://issues.apache.org/jira/browse/HBASE-5487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13791012#comment-13791012
]

Sergey Shelukhin commented on HBASE-5487:
-

btw, any input on actor model?
Things queue up operations/notifications (ops) for master. AM runs on a timer
or when the queue is non-empty, on a single thread. Its inputs are the cluster
state (incl. ongoing internal actions it ordered before, e.g. OPENING state for
a region) plus new ops from the queue. It generates new actions (without
physically doing anything, e.g. talking to RS); the ops state and cluster state
are persisted; then the actions are executed on different threads (e.g.
messages sent to RS-es, etc.), and AM runs again, or sleeps for some time if
the ops queue is empty.

That is a different model, not sure if it scales for large clusters.
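
A minimal sketch of that loop, to show the split between planning and doing.
AssignmentActor, the "assign:<region>" op format and the "open:<region>" action
format are all invented for illustration; only assignment is modelled and the
snapshot list stands in for real persistence.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Queue;
import java.util.concurrent.Executor;
import java.util.function.Consumer;

/** Hypothetical single-threaded planner; op and action strings are made up. */
class AssignmentActor {
    private final Queue<String> ops = new ArrayDeque<>();            // pending ops, e.g. "assign:r1"
    private final Map<String, String> regionState = new HashMap<>(); // region -> state
    private final List<Map<String, String>> persistedSnapshots = new ArrayList<>(); // stand-in for a durable store

    synchronized void submit(String op) {
        ops.add(op);
    }

    /** One planning pass: consumes queued ops and current state, decides actions, performs none. */
    synchronized List<String> planOnce() {
        List<String> actions = new ArrayList<>();
        String op;
        while ((op = ops.poll()) != null) {
            String[] parts = op.split(":", 2);
            if (parts.length != 2 || !parts[0].equals("assign")) {
                continue; // only "assign" is modelled in this sketch
            }
            String region = parts[1];
            if (regionState.containsKey(region)) {
                continue; // already tracked, e.g. OPENING ordered by an earlier pass
            }
            regionState.put(region, "OPENING");
            actions.add("open:" + region);
        }
        // Persist the planned state before anything is sent to a region server.
        persistedSnapshots.add(new HashMap<>(regionState));
        return actions;
    }

    /** Hands each planned action to the pool, so execution happens off the planning thread. */
    void dispatch(List<String> actions, Executor pool, Consumer<String> sender) {
        for (String action : actions) {
            pool.execute(() -> sender.accept(action));
        }
    }

    synchronized String stateOf(String region) {
        return regionState.get(region);
    }

    synchronized int snapshotCount() {
        return persistedSnapshots.size();
    }
}
```

Because planOnce() is the only place state changes, duplicate or conflicting
ops are resolved in one spot; whether one planning thread keeps up on a large
cluster is exactly the scaling concern raised above.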


[jira] [Commented] (HBASE-5487) Generic framework for Master-coordinated tasks

2013-10-08 Thread Elliott Clark (JIRA)

[
https://issues.apache.org/jira/browse/HBASE-5487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13789470#comment-13789470
]

Elliott Clark commented on HBASE-5487:
--

bq.we still need a reliable store
HBase is a reliable store. We should be using it as such for current state.

If we co-locate the master process with meta, then the master noticing state
changes is as simple as loading a co-processor that hooks mutations. It also
means that when master wants to look up current state there's no rpc overhead.
Simply target the hregion. This allows us to reduce the number of copies of
state. No longer will we need a local hash map + what's in zk, + what's in
meta.
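
A toy illustration of the co-location argument, not the HBase coprocessor API:
LocalMetaRegion and its methods are hypothetical. The point is only that when
the hook and the reader live in the same process as the rows, there is one copy
of the state and no RPC.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiConsumer;

/** Toy stand-in for a meta region living in the same process as the master. */
class LocalMetaRegion {
    private final Map<String, String> rows = new HashMap<>(); // region name -> state
    private final List<BiConsumer<String, String>> hooks = new ArrayList<>();

    /** Registers a callback that is invoked for every mutation. */
    synchronized void addMutationHook(BiConsumer<String, String> hook) {
        hooks.add(hook);
    }

    /** Applies the mutation, then notifies every hook with exactly what was stored. */
    synchronized void put(String region, String state) {
        rows.put(region, state);
        for (BiConsumer<String, String> hook : hooks) {
            hook.accept(region, state);
        }
    }

    /** A direct in-process read: no RPC and no second copy of the state. */
    synchronized String get(String region) {
        return rows.get(region);
    }
}
```

The master would register a hook to learn of transitions and call get() when
it needs current state, instead of keeping its own map in sync with zk and
meta.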

I think Jimmy's correct we should use zk for ephemeral only. Everything else
should be in our systems.


[jira] [Commented] (HBASE-5487) Generic framework for Master-coordinated tasks

[
https://issues.apache.org/jira/browse/HBASE-5487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13789493#comment-13789493
]

Sergey Shelukhin commented on HBASE-5487:
-

Please let's not use coprocessors for mainline functionality... also, if we
store state in system table that is hosted by master, then we don't need ZK at
all, we should get rid of it.
The only disadvantages from using ZK that I see are the absence of a
getKeyBefore/After API (easy to fix by having ephemeral META table for clients
to query), and having an extra moving part. If we don't get rid of ZK we don't
alleviate the latter so I think we should either use it for everything or not
at all... I would prefer to use it for everything.
As far as I see, ZK is more reliable than HBase RS or master, has built-in
replication with faster recovery, is probably more scalable than reading from
single RS, and has better model for atomic state changes. Probably has better
tolerance for stuff like network partitioning too. We could do master WAL and
all that stuff but I don't see a compelling reason to do this when we have a
bunch of Apache code that is already written to solve all of these problems.
What is the reason to not use ZK? What is the advantage of system table, or
disadvantage of ZK?


[jira] [Commented] (HBASE-5487) Generic framework for Master-coordinated tasks

2013-10-08 Thread Elliott Clark (JIRA)

[
https://issues.apache.org/jira/browse/HBASE-5487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13789576#comment-13789576
]

Elliott Clark commented on HBASE-5487:
--

bq.Please let's not use coprocessors for mainline functionality
We already do. I don't see anything wrong with making HBase more modular. If
there are pain points with using co-processors that cause you to say no, then
we should fix those. Not just ignore them.

bq.also, if we store state in system table that is hosted by master, then we
don't need ZK at all, we should get rid of it.
We don't have ephemeral node capability at all. And we need it for the
bootstrap problem. It allows clients to point at a relatively small number of
nodes to discover the whole cluster.

bq.As far as I see, ZK is more reliable than HBase RS or master
Our master is only complex because of our use of zk to hold and mutate state.

bq.has built-in replication with faster recovery
With the meta/system wal I think we can be within an order of magnitude.


[jira] [Commented] (HBASE-5487) Generic framework for Master-coordinated tasks

[
https://issues.apache.org/jira/browse/HBASE-5487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13789598#comment-13789598
]

Sergey Shelukhin commented on HBASE-5487:
-

Wrt coprocs - that is bad imho, that is not the kind of modular that we want.
Core parts of the system should depend on well-defined interfaces, not
generic extension points. Imho, the litmus test for coproc, as a plugin
interface, is - can you run HBase without it? If yes, then it's ok to be a
coproc (e.g. accesscontrol). Otherwise we should have proper interfaces that
have some meaning to the caller.

bq. Our master is only complex because of our use of zk to hold and mutate
state.
That is not due to ZK as such, that is due to multi-state-machine
reconciliation model and truth in multiple places that it requires.
System table can have exact same problem of state in the table + state in
memory, question is how you split and manage state between them, storage
substrate doesn't matter as much. If truth was in ZK and nowhere else that
wouldn't be a problem, same way as with system table.
Also, by reliable I meant that ZK is multiple nodes with built-in master
recovery by design, whereas with master you need at least HA, and still it's
probably worse than ZK in case of failure.
There are also other things that I mentioned.

bq. With the meta/system wal I think we can be within an order of magnitude.
So, why would we write a bunch of new code to get within an order of
magnitude? I don't see an advantage, or ZK disadvantage that you mention
compared to multiple advantages of ZK.
Esp. if we cannot totally get rid of it, so we'll have an extra service
regardless.


[jira] [Commented] (HBASE-5487) Generic framework for Master-coordinated tasks


[ 
https://issues.apache.org/jira/browse/HBASE-5487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13789612#comment-13789612
 ] 

Sergey Shelukhin commented on HBASE-5487:
-

Btw I agree that the main point is to get rid of the complexity you mention 
(and in the doc I only mention storage mechanism in ZK in one paragraph in the 
end), so the storage mechanism choice is almost orthogonal.
But as far as it is concerned, it seems an obvious choice to use ZK for me ATM. 
I may not know something about ZK (or system tables?), but so far the pattern 
is that meta recovery is a big deal even without bugs, and with ZK we barely 
ever have any problems. 


[jira] [Commented] (HBASE-5487) Generic framework for Master-coordinated tasks

2013-10-08 Thread Elliott Clark (JIRA)

[
https://issues.apache.org/jira/browse/HBASE-5487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13789642#comment-13789642
]

Elliott Clark commented on HBASE-5487:
--

bq.Wrt coprocs - that is bad imho, that is not the kind of modular that we want.
I'm not tied to it being a co-proc. But it does illustrate the idea that it
can be done by watching mutations as they come into the normal hregion call
stack.

bq.That is not due to ZK as such, that is due to multi-state-machine
reconciliation model and truth in multiple places that it requires.
In part it's due to getting zk messages out of order, and getting them delayed.
Those pains are due in no small part to zk's client being single threaded.

bq.System table can have exact same problem of state in the table + state in
memory, question is how you split and manage state between them, storage
substrate doesn't matter as much.
But you only have the one state if you have master inside of the region server
hosting meta. There's no need to have a map of assignment if meta is actually
just a function call away. Also, the same is not true at all if you want to
put state into zk. Then you need a local cache if you want to make this
performant at all (that's how we got to the current state). Putting state into
zk necessitates a split brain problem. There's what the master sees and what
the outside world sees.

bq.So, why would we write a bunch of new code to get within an order of
magnitude?
That code is already there, and in use. We fail over meta right now in 240ms.
I was commenting on what you were saying that zk fails over faster. And that's
true but for meta we've narrowed that gap significantly. So I don't think that
ZK has that much of an advantage.

bq. I don't see an advantage, or ZK disadvantage that you mention compared to
multiple advantages of ZK
We've tried putting state into zk. That failed. I really don't want to put a
whole bunch of new code into hbase that does almost exactly the same thing as
we currently have. It's going to fail.

bq.so the storage mechanism choice is almost orthogonal.
For me it's not just about the storage. Co-locating storage with the master
means that these split brain problems are much rarer.


[jira] [Commented] (HBASE-5487) Generic framework for Master-coordinated tasks