[GitHub] storm issue #2918: STORM-3295 allow blacklist scheduling to function properl...
Github user danny0405 commented on the issue: https://github.com/apache/storm/pull/2918 @agresch If this strategy only takes effect when the cluster's resources are already full, I'm OK with it. At least it's a better approach. ---
[GitHub] storm issue #2915: [STORM-3291]Worker can't run as the user who submitted th...
Github user liu-zhaokun commented on the issue: https://github.com/apache/storm/pull/2915 @revans2 Could you help me review this PR? Thanks. ---
[GitHub] storm issue #2918: STORM-3295 allow blacklist scheduling to function properl...
Github user agresch commented on the issue: https://github.com/apache/storm/pull/2918 @revans2 @danny0405 - Made the changes requested by @revans2. Let me know if we should file a separate JIRA for switching blacklisting to work strictly by supervisor instead of keeping this mismatch between hosts and supervisors. I think the trade-off of releasing all supervisors on a host is acceptable. ---
[GitHub] storm issue #2918: STORM-3295 allow blacklist scheduling to function properl...
Github user agresch commented on the issue: https://github.com/apache/storm/pull/2918 @revans2 - I will look at addressing DefaultBlacklistStrategy. ---
[GitHub] storm pull request #2918: STORM-3295 allow blacklist scheduling to function ...
Github user agresch commented on a diff in the pull request: https://github.com/apache/storm/pull/2918#discussion_r239522894

--- Diff: storm-server/src/main/java/org/apache/storm/scheduler/blacklist/strategies/RasBlacklistStrategy.java ---
@@ -79,25 +81,46 @@
         if (shortage.areAnyOverZero() || shortageSlots > 0) {
             LOG.info("Need {} and {} slots more. Releasing some blacklisted nodes to cover it.", shortage, shortageSlots);
-            //release earliest blacklist
-            for (String supervisor : blacklistedNodeIds) {
-                SupervisorDetails sd = availableSupervisors.get(supervisor);
-                if (sd != null) {
-                    NormalizedResourcesWithMemory sdAvailable = cluster.getAvailableResources(sd);
-                    int sdAvailableSlots = cluster.getAvailablePorts(sd).size();
-                    readyToRemove.add(supervisor);
-                    shortage.remove(sdAvailable, cluster.getResourceMetrics());
-                    shortageSlots -= sdAvailableSlots;
-                    LOG.debug("Releasing {} with {} and {} slots leaving {} and {} slots to go", supervisor,
-                        sdAvailable, sdAvailableSlots, shortage, shortageSlots);
-                    if (!shortage.areAnyOverZero() && shortageSlots <= 0) {
-                        // we have enough resources now...
-                        break;
+
+            //release earliest blacklist - but release all supervisors on a given blacklisted host.
+            Map<String, Set<String>> hostToSupervisorIds = createHostToSupervisorMap(blacklistedNodeIds, cluster);
+            for (Set<String> supervisorIds : hostToSupervisorIds.values()) {
+                for (String supervisorId : supervisorIds) {
+                    SupervisorDetails sd = availableSupervisors.get(supervisorId);
+                    if (sd != null) {
+                        NormalizedResourcesWithMemory sdAvailable = cluster.getAvailableResources(sd);
+                        int sdAvailableSlots = cluster.getAvailablePorts(sd).size();
+                        readyToRemove.add(supervisorId);
+                        shortage.remove(sdAvailable, cluster.getResourceMetrics());
+                        shortageSlots -= sdAvailableSlots;
+                        LOG.info("Releasing {} with {} and {} slots leaving {} and {} slots to go", supervisorId,
+                            sdAvailable, sdAvailableSlots, shortage, shortageSlots);
+                    }
+                }
+                // make sure we've handled all supervisors on the host before we break
+                if (!shortage.areAnyOverZero() && shortageSlots <= 0) {
+                    // we have enough resources now...
+                    break;
--- End diff --

I am doing what you are indicating: I create a list of the supervisors on the host and release all of them before breaking. ---
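The grouping step the diff relies on, createHostToSupervisorMap, is not shown in the quoted hunk. As a rough sketch of what such a method might do (the hostOf map below is a hypothetical stand-in for the Cluster host lookup used in the real strategy), grouping blacklisted supervisor ids by host could look like:

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class HostGrouping {
    // Hypothetical reconstruction: bucket blacklisted supervisor ids by the
    // host they run on, so all supervisors on a host can be released together.
    static Map<String, Set<String>> createHostToSupervisorMap(
            List<String> blacklistedSupervisorIds, Map<String, String> hostOf) {
        Map<String, Set<String>> hostToSupervisorIds = new HashMap<>();
        for (String supervisorId : blacklistedSupervisorIds) {
            String host = hostOf.get(supervisorId);
            if (host != null) {
                hostToSupervisorIds
                    .computeIfAbsent(host, h -> new HashSet<>())
                    .add(supervisorId);
            }
        }
        return hostToSupervisorIds;
    }

    public static void main(String[] args) {
        // Two supervisors share hostA; one sits alone on hostB.
        Map<String, String> hostOf = new HashMap<>();
        hostOf.put("sup-1", "hostA");
        hostOf.put("sup-2", "hostA");
        hostOf.put("sup-3", "hostB");
        Map<String, Set<String>> grouped =
            createHostToSupervisorMap(List.of("sup-1", "sup-2", "sup-3"), hostOf);
        System.out.println(grouped.get("hostA").size()); // both hostA supervisors grouped
        System.out.println(grouped.size());              // two distinct hosts
    }
}
```

This is only an illustration of the host-to-supervisor bucketing; the actual implementation in RasBlacklistStrategy queries the Cluster object rather than a plain map.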
[GitHub] storm pull request #2918: STORM-3295 allow blacklist scheduling to function ...
Github user revans2 commented on a diff in the pull request: https://github.com/apache/storm/pull/2918#discussion_r239517824

--- Diff: storm-server/src/main/java/org/apache/storm/scheduler/blacklist/strategies/RasBlacklistStrategy.java ---
@@ -79,25 +81,46 @@
         if (shortage.areAnyOverZero() || shortageSlots > 0) {
             LOG.info("Need {} and {} slots more. Releasing some blacklisted nodes to cover it.", shortage, shortageSlots);
-            //release earliest blacklist
-            for (String supervisor : blacklistedNodeIds) {
-                SupervisorDetails sd = availableSupervisors.get(supervisor);
-                if (sd != null) {
-                    NormalizedResourcesWithMemory sdAvailable = cluster.getAvailableResources(sd);
-                    int sdAvailableSlots = cluster.getAvailablePorts(sd).size();
-                    readyToRemove.add(supervisor);
-                    shortage.remove(sdAvailable, cluster.getResourceMetrics());
-                    shortageSlots -= sdAvailableSlots;
-                    LOG.debug("Releasing {} with {} and {} slots leaving {} and {} slots to go", supervisor,
-                        sdAvailable, sdAvailableSlots, shortage, shortageSlots);
-                    if (!shortage.areAnyOverZero() && shortageSlots <= 0) {
-                        // we have enough resources now...
-                        break;
+
+            //release earliest blacklist - but release all supervisors on a given blacklisted host.
+            Map<String, Set<String>> hostToSupervisorIds = createHostToSupervisorMap(blacklistedNodeIds, cluster);
+            for (Set<String> supervisorIds : hostToSupervisorIds.values()) {
+                for (String supervisorId : supervisorIds) {
+                    SupervisorDetails sd = availableSupervisors.get(supervisorId);
+                    if (sd != null) {
+                        NormalizedResourcesWithMemory sdAvailable = cluster.getAvailableResources(sd);
+                        int sdAvailableSlots = cluster.getAvailablePorts(sd).size();
+                        readyToRemove.add(supervisorId);
+                        shortage.remove(sdAvailable, cluster.getResourceMetrics());
+                        shortageSlots -= sdAvailableSlots;
+                        LOG.info("Releasing {} with {} and {} slots leaving {} and {} slots to go", supervisorId,
+                            sdAvailable, sdAvailableSlots, shortage, shortageSlots);
--- End diff --

@danny0405 This code only runs when the cluster is full, meaning a topology is not scheduled and there are not enough free resources on the cluster to run it. The idea is that if the cluster is full and there are blacklisted supervisors, it is better to try to run things on possibly bad hosts than to run nothing at all. The issue here is that the existing code is broken when there are multiple supervisors on a single node, and this change fixes the existing code to operate properly in that case. If you have a problem with releasing blacklisted supervisors/nodes at all, that is a separate JIRA. ---
[GitHub] storm pull request #2918: STORM-3295 allow blacklist scheduling to function ...
Github user revans2 commented on a diff in the pull request: https://github.com/apache/storm/pull/2918#discussion_r239519351

--- Diff: storm-server/src/main/java/org/apache/storm/scheduler/blacklist/strategies/RasBlacklistStrategy.java ---
@@ -79,25 +81,46 @@
         if (shortage.areAnyOverZero() || shortageSlots > 0) {
             LOG.info("Need {} and {} slots more. Releasing some blacklisted nodes to cover it.", shortage, shortageSlots);
-            //release earliest blacklist
-            for (String supervisor : blacklistedNodeIds) {
-                SupervisorDetails sd = availableSupervisors.get(supervisor);
-                if (sd != null) {
-                    NormalizedResourcesWithMemory sdAvailable = cluster.getAvailableResources(sd);
-                    int sdAvailableSlots = cluster.getAvailablePorts(sd).size();
-                    readyToRemove.add(supervisor);
-                    shortage.remove(sdAvailable, cluster.getResourceMetrics());
-                    shortageSlots -= sdAvailableSlots;
-                    LOG.debug("Releasing {} with {} and {} slots leaving {} and {} slots to go", supervisor,
-                        sdAvailable, sdAvailableSlots, shortage, shortageSlots);
-                    if (!shortage.areAnyOverZero() && shortageSlots <= 0) {
-                        // we have enough resources now...
-                        break;
+
+            //release earliest blacklist - but release all supervisors on a given blacklisted host.
+            Map<String, Set<String>> hostToSupervisorIds = createHostToSupervisorMap(blacklistedNodeIds, cluster);
+            for (Set<String> supervisorIds : hostToSupervisorIds.values()) {
+                for (String supervisorId : supervisorIds) {
+                    SupervisorDetails sd = availableSupervisors.get(supervisorId);
+                    if (sd != null) {
+                        NormalizedResourcesWithMemory sdAvailable = cluster.getAvailableResources(sd);
+                        int sdAvailableSlots = cluster.getAvailablePorts(sd).size();
+                        readyToRemove.add(supervisorId);
+                        shortage.remove(sdAvailable, cluster.getResourceMetrics());
+                        shortageSlots -= sdAvailableSlots;
+                        LOG.info("Releasing {} with {} and {} slots leaving {} and {} slots to go", supervisorId,
+                            sdAvailable, sdAvailableSlots, shortage, shortageSlots);
+                    }
+                }
+                // make sure we've handled all supervisors on the host before we break
+                if (!shortage.areAnyOverZero() && shortageSlots <= 0) {
+                    // we have enough resources now...
+                    break;
--- End diff --

I think we need to break only after we have released all of the supervisors on the node. Otherwise we can release a single supervisor on a node and get enough resources, but because at least one supervisor on that node is still blacklisted, the whole node remains blacklisted. ---
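The point about where the break belongs can be shown with a small standalone model of the release loop (a simplified sketch, not the real RasBlacklistStrategy code; the slotsOf map and ids are made up for illustration). Checking the shortage only after the inner loop finishes guarantees a host is never left half-released:

```java
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class ReleaseByHost {
    // Simplified model of the fix under discussion: walk blacklisted hosts,
    // release *every* supervisor on a host, and only then check whether the
    // shortage is covered. Breaking inside the inner loop could cover the
    // shortage while leaving another supervisor on the same host blacklisted,
    // keeping the whole host unusable.
    static Set<String> releaseUntilCovered(Map<String, Set<String>> hostToSupervisorIds,
                                           Map<String, Integer> slotsOf,
                                           int shortageSlots) {
        Set<String> readyToRemove = new HashSet<>();
        for (Set<String> supervisorIds : hostToSupervisorIds.values()) {
            for (String supervisorId : supervisorIds) {
                readyToRemove.add(supervisorId);
                shortageSlots -= slotsOf.getOrDefault(supervisorId, 0);
            }
            // check only after the whole host has been handled
            if (shortageSlots <= 0) {
                break;
            }
        }
        return readyToRemove;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> hosts = new LinkedHashMap<>();
        hosts.put("hostA", new TreeSet<>(List.of("sup-1", "sup-2")));
        hosts.put("hostB", new TreeSet<>(List.of("sup-3")));
        Map<String, Integer> slots = Map.of("sup-1", 4, "sup-2", 4, "sup-3", 4);
        // A shortage of 2 slots is covered by sup-1 alone, but both
        // supervisors on hostA are released, so the host as a whole is freed.
        Set<String> released = releaseUntilCovered(hosts, slots, 2);
        System.out.println(released.size());            // both hostA supervisors
        System.out.println(released.contains("sup-2")); // sup-2 released too
    }
}
```

With the old per-supervisor break, the same shortage would have released only sup-1, and hostA would have stayed blacklisted via sup-2.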
[GitHub] storm pull request #2918: STORM-3295 allow blacklist scheduling to function ...
Github user agresch commented on a diff in the pull request: https://github.com/apache/storm/pull/2918#discussion_r239461731

--- Diff: storm-server/src/main/java/org/apache/storm/scheduler/blacklist/strategies/RasBlacklistStrategy.java ---
@@ -79,25 +81,46 @@
         if (shortage.areAnyOverZero() || shortageSlots > 0) {
             LOG.info("Need {} and {} slots more. Releasing some blacklisted nodes to cover it.", shortage, shortageSlots);
-            //release earliest blacklist
-            for (String supervisor : blacklistedNodeIds) {
-                SupervisorDetails sd = availableSupervisors.get(supervisor);
-                if (sd != null) {
-                    NormalizedResourcesWithMemory sdAvailable = cluster.getAvailableResources(sd);
-                    int sdAvailableSlots = cluster.getAvailablePorts(sd).size();
-                    readyToRemove.add(supervisor);
-                    shortage.remove(sdAvailable, cluster.getResourceMetrics());
-                    shortageSlots -= sdAvailableSlots;
-                    LOG.debug("Releasing {} with {} and {} slots leaving {} and {} slots to go", supervisor,
-                        sdAvailable, sdAvailableSlots, shortage, shortageSlots);
-                    if (!shortage.areAnyOverZero() && shortageSlots <= 0) {
-                        // we have enough resources now...
-                        break;
+
+            //release earliest blacklist - but release all supervisors on a given blacklisted host.
+            Map<String, Set<String>> hostToSupervisorIds = createHostToSupervisorMap(blacklistedNodeIds, cluster);
+            for (Set<String> supervisorIds : hostToSupervisorIds.values()) {
+                for (String supervisorId : supervisorIds) {
+                    SupervisorDetails sd = availableSupervisors.get(supervisorId);
+                    if (sd != null) {
+                        NormalizedResourcesWithMemory sdAvailable = cluster.getAvailableResources(sd);
+                        int sdAvailableSlots = cluster.getAvailablePorts(sd).size();
+                        readyToRemove.add(supervisorId);
+                        shortage.remove(sdAvailable, cluster.getResourceMetrics());
+                        shortageSlots -= sdAvailableSlots;
+                        LOG.info("Releasing {} with {} and {} slots leaving {} and {} slots to go", supervisorId,
+                            sdAvailable, sdAvailableSlots, shortage, shortageSlots);
--- End diff --

It sounds like you don't agree even with the existing code that releases blacklisted supervisors when needed for scheduling? Should I look at converting all the existing code to blacklist supervisors instead of hosts, to get things functional when there is more than one supervisor on a host? Thanks. ---