[GitHub] [hbase] Apache9 commented on a change in pull request #237: HBASE-22408 add dead and unknown server open regions metric to AM 01
Apache9 commented on a change in pull request #237: HBASE-22408 add dead and unknown server open regions metric to AM 01 URL: https://github.com/apache/hbase/pull/237#discussion_r285327808 ## File path: hbase-server/src/main/java/org/apache/hadoop/hbase/master/ServerManager.java ## @@ -907,6 +906,20 @@ public boolean isServerOnline(ServerName serverName) { return serverName != null && onlineServers.containsKey(serverName); } + public enum ServerLiveState { +LIVE, +DEAD, +UNKNOWN + } + + /** + * @return whether the server is online, dead, or unknown. + */ + public synchronized ServerLiveState isServerKnownAndOnline(ServerName serverName) { Review comment: Oh, we also checked the deadServers. That's fine. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [hbase] Apache9 commented on a change in pull request #237: HBASE-22408 add dead and unknown server open regions metric to AM 01
Apache9 commented on a change in pull request #237: HBASE-22408 add dead and unknown server open regions metric to AM 01 URL: https://github.com/apache/hbase/pull/237#discussion_r284516470 ## File path: hbase-server/src/main/java/org/apache/hadoop/hbase/master/ServerManager.java ## @@ -620,8 +620,7 @@ public synchronized boolean expireServer(final ServerName serverName) { } } - @VisibleForTesting - public void moveFromOnlineToDeadServers(final ServerName sn) { + public synchronized void moveFromOnlineToDeadServers(final ServerName sn) { Review comment: Checked the code, it is used for HBCK2 so I do not think synchronized is necessary there as it is a human operation. Anyway it is fine to synchronized for now, can review it later. But please add the VisibleForTesting back, that's why we make it public, otherwise we can make it package private. This can also be polished later. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [hbase] Apache9 commented on a change in pull request #237: HBASE-22408 add dead and unknown server open regions metric to AM 01
Apache9 commented on a change in pull request #237: HBASE-22408 add dead and unknown server open regions metric to AM 01 URL: https://github.com/apache/hbase/pull/237#discussion_r284515790 ## File path: hbase-server/src/main/java/org/apache/hadoop/hbase/master/ServerManager.java ## @@ -907,6 +906,20 @@ public boolean isServerOnline(ServerName serverName) { return serverName != null && onlineServers.containsKey(serverName); } + public enum ServerLiveState { +LIVE, +DEAD, +UNKNOWN + } + + /** + * @return whether the server is online, dead, or unknown. + */ + public synchronized ServerLiveState isServerKnownAndOnline(ServerName serverName) { Review comment: Do we really need synchronized here? The onlineServers is already a ConcurrentMap. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [hbase] Apache9 commented on a change in pull request #237: HBASE-22408 add dead and unknown server open regions metric to AM 01
Apache9 commented on a change in pull request #237: HBASE-22408 add dead and unknown server open regions metric to AM 01 URL: https://github.com/apache/hbase/pull/237#discussion_r284516124 ## File path: hbase-server/src/main/java/org/apache/hadoop/hbase/master/ServerManager.java ## @@ -620,8 +620,7 @@ public synchronized boolean expireServer(final ServerName serverName) { } } - @VisibleForTesting - public void moveFromOnlineToDeadServers(final ServerName sn) { + public synchronized void moveFromOnlineToDeadServers(final ServerName sn) { synchronized (onlineServers) { Review comment: The locking here is a bit strange, onlineServers is already a ConcurrentMap, and seems this is the only place where we synchronized on it. Not your fault, can review it later. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [hbase] Apache9 commented on a change in pull request #237: HBASE-22408 add dead and unknown server open regions metric to AM 01
Apache9 commented on a change in pull request #237: HBASE-22408 add dead and unknown server open regions metric to AM 01 URL: https://github.com/apache/hbase/pull/237#discussion_r283597275 ## File path: hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/AssignmentManager.java ## @@ -1130,6 +1148,65 @@ protected void periodicExecute(final MasterProcedureEnv env) { } } + private static class NonLiveServerMetricRegionChore extends ProcedureInMemoryChore { +public NonLiveServerMetricRegionChore(final int timeoutMsec) { + super(timeoutMsec); +} + +@Override +protected void periodicExecute(final MasterProcedureEnv env) { + final ServerManager sm = env.getMasterServices().getServerManager(); + final AssignmentManager am = env.getAssignmentManager(); + // To minimize inconsistencies we are not going to snapshot live servers in advance in case + // new servers are added; OTOH we don't want to add heavy sync for a consistent view since + // this is for metrics. Instead, we're going to check each regions as we go; to avoid making + // too many checks, we maintain a local lists of server, limiting us to false negatives. If + // we miss some recently-dead server, we'll just see it next time. + Set recentlyLiveServers = new HashSet<>(); + int deadRegions = 0, unknownRegions = 0; + for (RegionStateNode rsn : am.getRegionStates().getRegionStateNodes()) { +if (rsn.getState() != State.OPEN) { + continue; // Opportunistic check, should quickly skip RITs, offline tables, etc. +} +ServerName sn; +State state; +synchronized (rsn.getState()) { Review comment: Why this synchronized? On a State? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [hbase] Apache9 commented on a change in pull request #237: HBASE-22408 add dead and unknown server open regions metric to AM 01
Apache9 commented on a change in pull request #237: HBASE-22408 add dead and unknown server open regions metric to AM 01 URL: https://github.com/apache/hbase/pull/237#discussion_r283598113 ## File path: hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/RegionStates.java ## @@ -178,12 +179,14 @@ public void deleteRegions(final List regionInfos) { return regions; } - Collection getRegionStateNodes() { -return regionsMap.values(); + /** @return A view of region state nodes for all the regions. */ + public Collection getRegionStateNodes() { +return UnmodifiableCollection.unmodifiableCollection(regionsMap.values()); Review comment: Just use the method in java.util.Collections? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [hbase] Apache9 commented on a change in pull request #237: HBASE-22408 add dead and unknown server open regions metric to AM 01
Apache9 commented on a change in pull request #237: HBASE-22408 add dead and unknown server open regions metric to AM 01 URL: https://github.com/apache/hbase/pull/237#discussion_r283596569 ## File path: hbase-server/src/main/java/org/apache/hadoop/hbase/master/ServerManager.java ## @@ -620,8 +620,9 @@ public synchronized boolean expireServer(final ServerName serverName) { } } + /** This is already called from a synchronized method, so synch is for visibility sake. */ @VisibleForTesting - public void moveFromOnlineToDeadServers(final ServerName sn) { + public synchronized void moveFromOnlineToDeadServers(final ServerName sn) { Review comment: Please remove the duplicated synchronized modifier, as there is an VisibleForTesting annotation which means this method will only be called inside the class, and we have synchronized in side the method, which will make others confusing? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services