[
https://issues.apache.org/jira/browse/HBASE-10569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13933578#comment-13933578
]
Jimmy Xiang edited comment on HBASE-10569 at 3/13/14 5:56 PM:
--
Attached a patch that passed unit tests, integration tests (including ITBLL),
and some live cluster tests. Will put it on RB once RB is back up.
Here is what I have done in this patch:
# Moved RPC-related code out of HRegionServer and HMaster so that they are
smaller and easier to change and maintain.
# Made HMaster extend HRegionServer so that HMaster is also an HRegionServer,
and removed duplicate code/parameters.
# Due to 2, HMaster#getMetrics is renamed to getMasterMetrics to avoid a naming
conflict with HRegionServer#getMetrics. The same has been done for
HMaster#getCoprocessors and #getCoprocessorHost.
# Added HRegionServer#getRpcServices and HMaster#getMasterRpcServices to expose
the RPC functionality.
# Changed references related to 3 and 4 (a lot, especially in tests).
# HMaster and HRegionServer share one RPC server and one InfoServer.
# RpcServiceInterface is changed a little. Methods #startThreads and #openServer
are removed since the backup master doesn’t hold the RPC server any more. A
parameter HMaster#serviceStarted is introduced to indicate whether a master is
active, so that ServerNotRunningYetException can be thrown before a master is
active.
# Master recovery in case of ZK connection loss is removed since it doesn’t
recover listeners added in HRegionServer. We can get this feature back if
needed. The other reason I didn’t try to get it back is that we are going to
use Raft to choose the active master instead of relying on ZK.
# HRegionServer on the active HMaster communicates with the active HMaster
directly instead of going through the RPC. Shortcut helps.
# Master (active/backup) web UI contains info about the corresponding region
server.
# Backup master moves user regions away (and moves the meta/namespace regions to
the master if they are already assigned somewhere else) after becoming active.
# Integration testing doesn’t restart the master as a region server, or restart
the region server that holds the meta. One reason is that the startup script
can’t tell if a region server should be master.
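To make items 2, 3 and 7 a bit more concrete, here is a minimal Java sketch of the shape of the change. It is not code from the patch: RegionServerSketch, MasterSketch, NotRunningYetSketchException, becomeActive, checkServiceStarted and listTables are all made-up stand-ins, and the string return values are placeholders. The rename in item 3 is modelled on the general Java rule that a subclass cannot declare a method with the same name and parameters as an inherited one but an incompatible return type; whether that is the exact conflict in the patch is an assumption.

```java
import java.io.IOException;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical stand-in for ServerNotRunningYetException.
class NotRunningYetSketchException extends IOException {
    NotRunningYetSketchException(String message) {
        super(message);
    }
}

// Hypothetical, heavily simplified stand-in for HRegionServer.
class RegionServerSketch {
    String getMetrics() {
        return "regionserver-metrics"; // placeholder value
    }
}

// Hypothetical stand-in for HMaster: after item 2 the master is also a region server.
class MasterSketch extends RegionServerSketch {
    // Mirrors the idea of HMaster#serviceStarted: false until this master is active.
    private final AtomicBoolean serviceStarted = new AtomicBoolean(false);

    // Item 3: a separate name, so the inherited getMetrics() stays untouched.
    String getMasterMetrics() {
        return "master-metrics"; // placeholder value
    }

    // Called once this (formerly backup) master has become the active one.
    void becomeActive() {
        serviceStarted.set(true);
    }

    // Item 7: master-only calls are rejected until the master is active.
    void checkServiceStarted() throws NotRunningYetSketchException {
        if (!serviceStarted.get()) {
            throw new NotRunningYetSketchException("Master is not active yet");
        }
    }

    // Example of a master-only operation guarded by the check above.
    String listTables() throws NotRunningYetSketchException {
        checkServiceStarted();
        return "tables"; // placeholder value
    }
}

class MasterSketchDemo {
    public static void main(String[] args) throws IOException {
        MasterSketch master = new MasterSketch();
        System.out.println(master.getMetrics());       // inherited region server side
        System.out.println(master.getMasterMetrics()); // master side
        try {
            master.listTables();
        } catch (NotRunningYetSketchException e) {
            System.out.println("rejected: " + e.getMessage());
        }
        master.becomeActive();
        System.out.println(master.listTables());
    }
}
```

In this sketch a backup master can still answer region server style calls through the inherited methods, while master-only calls fail fast until becomeActive() has run.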
Here is a list of things to be done (in separate issues):
# Need to make sure the master listens to the old ports (RPC + web UI) too, so
as to support rolling upgrade from old versions (0.96+) and be backward
compatible.
# Need to consolidate(?) chores/threads/handlers in master/regionserver, so
that the active master manager in the backup master has a high priority and can
grab the ZK node faster, until we move to Raft.
# Clean up MetaServerShutdownHandler and HMaster#assignMeta in the next major
release, when rolling upgrade is not an issue any more. This should be done much
later.