[GitHub] storm issue #2918: STORM-3295 allow blacklist scheduling to function properl...

2018-12-06 Thread danny0405
Github user danny0405 commented on the issue:

https://github.com/apache/storm/pull/2918
  
@agresch If this strategy only kicks in when the cluster's resources are already full, I'm OK with it. At least it's a better approach.


---


[GitHub] storm issue #2915: [STORM-3291]Worker can't run as the user who submitted th...

2018-12-06 Thread liu-zhaokun
Github user liu-zhaokun commented on the issue:

https://github.com/apache/storm/pull/2915
  
@revans2 Could you help review this PR? Thanks.


---


[GitHub] storm issue #2918: STORM-3295 allow blacklist scheduling to function properl...

2018-12-06 Thread agresch
Github user agresch commented on the issue:

https://github.com/apache/storm/pull/2918
  
@revans2 @danny0405 - Made the changes requested by @revans2. Let me know if we should open a separate JIRA for switching blacklisting to work strictly by supervisor instead of keeping this mismatch between hosts and supervisors. I think the trade-off of releasing all supervisors on a host is acceptable.



---


[GitHub] storm issue #2918: STORM-3295 allow blacklist scheduling to function properl...

2018-12-06 Thread agresch
Github user agresch commented on the issue:

https://github.com/apache/storm/pull/2918
  
@revans2 - I will look at addressing DefaultBlacklistStrategy.


---


[GitHub] storm pull request #2918: STORM-3295 allow blacklist scheduling to function ...

2018-12-06 Thread agresch
Github user agresch commented on a diff in the pull request:

https://github.com/apache/storm/pull/2918#discussion_r239522894
  
--- Diff: storm-server/src/main/java/org/apache/storm/scheduler/blacklist/strategies/RasBlacklistStrategy.java ---
@@ -79,25 +81,46 @@
 
         if (shortage.areAnyOverZero() || shortageSlots > 0) {
             LOG.info("Need {} and {} slots more. Releasing some blacklisted nodes to cover it.", shortage, shortageSlots);
-            //release earliest blacklist
-            for (String supervisor : blacklistedNodeIds) {
-                SupervisorDetails sd = availableSupervisors.get(supervisor);
-                if (sd != null) {
-                    NormalizedResourcesWithMemory sdAvailable = cluster.getAvailableResources(sd);
-                    int sdAvailableSlots = cluster.getAvailablePorts(sd).size();
-                    readyToRemove.add(supervisor);
-                    shortage.remove(sdAvailable, cluster.getResourceMetrics());
-                    shortageSlots -= sdAvailableSlots;
-                    LOG.debug("Releasing {} with {} and {} slots leaving {} and {} slots to go", supervisor,
-                            sdAvailable, sdAvailableSlots, shortage, shortageSlots);
-                    if (!shortage.areAnyOverZero() && shortageSlots <= 0) {
-                        // we have enough resources now...
-                        break;
+
+            //release earliest blacklist - but release all supervisors on a given blacklisted host.
+            Map<String, Set<String>> hostToSupervisorIds = createHostToSupervisorMap(blacklistedNodeIds, cluster);
+            for (Set<String> supervisorIds : hostToSupervisorIds.values()) {
+                for (String supervisorId : supervisorIds) {
+                    SupervisorDetails sd = availableSupervisors.get(supervisorId);
+                    if (sd != null) {
+                        NormalizedResourcesWithMemory sdAvailable = cluster.getAvailableResources(sd);
+                        int sdAvailableSlots = cluster.getAvailablePorts(sd).size();
+                        readyToRemove.add(supervisorId);
+                        shortage.remove(sdAvailable, cluster.getResourceMetrics());
+                        shortageSlots -= sdAvailableSlots;
+                        LOG.info("Releasing {} with {} and {} slots leaving {} and {} slots to go", supervisorId,
+                                sdAvailable, sdAvailableSlots, shortage, shortageSlots);
                     }
                 }
+                // make sure we've handled all supervisors on the host before we break
+                if (!shortage.areAnyOverZero() && shortageSlots <= 0) {
+                    // we have enough resources now...
+                    break;
--- End diff --

I am doing what you are indicating: I create a list of the supervisors on the host and release all of them before breaking.
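
For context, the new createHostToSupervisorMap helper referenced in the diff groups blacklisted supervisor ids by host. A minimal sketch of what such a helper could look like, assuming Cluster exposes a supervisor-id-to-host lookup like getHost (the PR's actual implementation is not shown in this thread):

    import java.util.Collection;
    import java.util.HashMap;
    import java.util.HashSet;
    import java.util.Map;
    import java.util.Set;

    import org.apache.storm.scheduler.Cluster;

    // Minimal sketch, not the PR's actual code: group blacklisted supervisor ids
    // by the host they run on, so all supervisors on a host are released together.
    private static Map<String, Set<String>> createHostToSupervisorMap(
            Collection<String> blacklistedNodeIds, Cluster cluster) {
        Map<String, Set<String>> hostToSupervisorIds = new HashMap<>();
        for (String supervisorId : blacklistedNodeIds) {
            String host = cluster.getHost(supervisorId); // assumed id-to-host lookup
            if (host != null) {
                hostToSupervisorIds.computeIfAbsent(host, h -> new HashSet<>()).add(supervisorId);
            }
        }
        return hostToSupervisorIds;
    }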


---


[GitHub] storm pull request #2918: STORM-3295 allow blacklist scheduling to function ...

2018-12-06 Thread revans2
Github user revans2 commented on a diff in the pull request:

https://github.com/apache/storm/pull/2918#discussion_r239517824
  
--- Diff: storm-server/src/main/java/org/apache/storm/scheduler/blacklist/strategies/RasBlacklistStrategy.java ---
@@ -79,25 +81,46 @@
 
         if (shortage.areAnyOverZero() || shortageSlots > 0) {
             LOG.info("Need {} and {} slots more. Releasing some blacklisted nodes to cover it.", shortage, shortageSlots);
-            //release earliest blacklist
-            for (String supervisor : blacklistedNodeIds) {
-                SupervisorDetails sd = availableSupervisors.get(supervisor);
-                if (sd != null) {
-                    NormalizedResourcesWithMemory sdAvailable = cluster.getAvailableResources(sd);
-                    int sdAvailableSlots = cluster.getAvailablePorts(sd).size();
-                    readyToRemove.add(supervisor);
-                    shortage.remove(sdAvailable, cluster.getResourceMetrics());
-                    shortageSlots -= sdAvailableSlots;
-                    LOG.debug("Releasing {} with {} and {} slots leaving {} and {} slots to go", supervisor,
-                            sdAvailable, sdAvailableSlots, shortage, shortageSlots);
-                    if (!shortage.areAnyOverZero() && shortageSlots <= 0) {
-                        // we have enough resources now...
-                        break;
+
+            //release earliest blacklist - but release all supervisors on a given blacklisted host.
+            Map<String, Set<String>> hostToSupervisorIds = createHostToSupervisorMap(blacklistedNodeIds, cluster);
+            for (Set<String> supervisorIds : hostToSupervisorIds.values()) {
+                for (String supervisorId : supervisorIds) {
+                    SupervisorDetails sd = availableSupervisors.get(supervisorId);
+                    if (sd != null) {
+                        NormalizedResourcesWithMemory sdAvailable = cluster.getAvailableResources(sd);
+                        int sdAvailableSlots = cluster.getAvailablePorts(sd).size();
+                        readyToRemove.add(supervisorId);
+                        shortage.remove(sdAvailable, cluster.getResourceMetrics());
+                        shortageSlots -= sdAvailableSlots;
+                        LOG.info("Releasing {} with {} and {} slots leaving {} and {} slots to go", supervisorId,
+                                sdAvailable, sdAvailableSlots, shortage, shortageSlots);
--- End diff --

@danny0405 This code only runs when the cluster is full, meaning a topology is not scheduled and there are not enough free resources on the cluster to run it. The idea is that if a cluster is full and there are blacklisted supervisors, it is better to try to run things on possibly bad hosts than to not run anything at all. The issue here is that the existing code is broken when there are multiple supervisors on a single node, and this change fixes the existing code to operate properly in that case.

If you have a problem with releasing blacklisted supervisors/nodes at all, then that is a separate JIRA.
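
To make the failure mode concrete: when blacklisting is tracked per host, releasing a single supervisor frees its resources but cannot unblacklist the host. A small self-contained illustration using hypothetical ids and plain Java collections rather than Storm's actual scheduler types:

    import java.util.HashSet;
    import java.util.Map;
    import java.util.Set;

    public class BlacklistReleaseExample {
        public static void main(String[] args) {
            // Two supervisors run on the same physical host.
            Map<String, String> supervisorToHost = Map.of(
                    "sup-1", "hostA",
                    "sup-2", "hostA");

            // The scheduler derives blacklisted hosts from blacklisted supervisors.
            Set<String> blacklistedSupervisors = new HashSet<>(Set.of("sup-1", "sup-2"));

            // Releasing only sup-1 leaves sup-2 blacklisted, so hostA as a whole
            // stays blacklisted and the freed resources are still unusable.
            blacklistedSupervisors.remove("sup-1");

            Set<String> blacklistedHosts = new HashSet<>();
            for (String sup : blacklistedSupervisors) {
                blacklistedHosts.add(supervisorToHost.get(sup));
            }
            System.out.println(blacklistedHosts); // prints [hostA]
        }
    }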


---


[GitHub] storm pull request #2918: STORM-3295 allow blacklist scheduling to function ...

2018-12-06 Thread revans2
Github user revans2 commented on a diff in the pull request:

https://github.com/apache/storm/pull/2918#discussion_r239519351
  
--- Diff: storm-server/src/main/java/org/apache/storm/scheduler/blacklist/strategies/RasBlacklistStrategy.java ---
@@ -79,25 +81,46 @@
 
         if (shortage.areAnyOverZero() || shortageSlots > 0) {
             LOG.info("Need {} and {} slots more. Releasing some blacklisted nodes to cover it.", shortage, shortageSlots);
-            //release earliest blacklist
-            for (String supervisor : blacklistedNodeIds) {
-                SupervisorDetails sd = availableSupervisors.get(supervisor);
-                if (sd != null) {
-                    NormalizedResourcesWithMemory sdAvailable = cluster.getAvailableResources(sd);
-                    int sdAvailableSlots = cluster.getAvailablePorts(sd).size();
-                    readyToRemove.add(supervisor);
-                    shortage.remove(sdAvailable, cluster.getResourceMetrics());
-                    shortageSlots -= sdAvailableSlots;
-                    LOG.debug("Releasing {} with {} and {} slots leaving {} and {} slots to go", supervisor,
-                            sdAvailable, sdAvailableSlots, shortage, shortageSlots);
-                    if (!shortage.areAnyOverZero() && shortageSlots <= 0) {
-                        // we have enough resources now...
-                        break;
+
+            //release earliest blacklist - but release all supervisors on a given blacklisted host.
+            Map<String, Set<String>> hostToSupervisorIds = createHostToSupervisorMap(blacklistedNodeIds, cluster);
+            for (Set<String> supervisorIds : hostToSupervisorIds.values()) {
+                for (String supervisorId : supervisorIds) {
+                    SupervisorDetails sd = availableSupervisors.get(supervisorId);
+                    if (sd != null) {
+                        NormalizedResourcesWithMemory sdAvailable = cluster.getAvailableResources(sd);
+                        int sdAvailableSlots = cluster.getAvailablePorts(sd).size();
+                        readyToRemove.add(supervisorId);
+                        shortage.remove(sdAvailable, cluster.getResourceMetrics());
+                        shortageSlots -= sdAvailableSlots;
+                        LOG.info("Releasing {} with {} and {} slots leaving {} and {} slots to go", supervisorId,
+                                sdAvailable, sdAvailableSlots, shortage, shortageSlots);
                     }
                 }
+                // make sure we've handled all supervisors on the host before we break
+                if (!shortage.areAnyOverZero() && shortageSlots <= 0) {
+                    // we have enough resources now...
+                    break;
--- End diff --

I think we need to break only after we have released all of the supervisors on the node. Otherwise we may have released a single supervisor on a node and gained enough resources, but because at least one supervisor on that node is still blacklisted, the whole node remains blacklisted.
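
In outline, the control flow being requested checks the shortage only after the inner loop, so the break can never strand a partially released host (a simplified skeleton; releaseSupervisor is a hypothetical stand-in for the release bookkeeping in the diff above):

    for (Set<String> supervisorIds : hostToSupervisorIds.values()) {
        // Release every supervisor on this host first...
        for (String supervisorId : supervisorIds) {
            releaseSupervisor(supervisorId);
        }
        // ...and only then consider stopping: breaking mid-host would leave a
        // still-blacklisted supervisor that keeps the whole host blacklisted.
        if (!shortage.areAnyOverZero() && shortageSlots <= 0) {
            break;
        }
    }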


---


[GitHub] storm pull request #2918: STORM-3295 allow blacklist scheduling to function ...

2018-12-06 Thread agresch
Github user agresch commented on a diff in the pull request:

https://github.com/apache/storm/pull/2918#discussion_r239461731
  
--- Diff: storm-server/src/main/java/org/apache/storm/scheduler/blacklist/strategies/RasBlacklistStrategy.java ---
@@ -79,25 +81,46 @@
 
         if (shortage.areAnyOverZero() || shortageSlots > 0) {
             LOG.info("Need {} and {} slots more. Releasing some blacklisted nodes to cover it.", shortage, shortageSlots);
-            //release earliest blacklist
-            for (String supervisor : blacklistedNodeIds) {
-                SupervisorDetails sd = availableSupervisors.get(supervisor);
-                if (sd != null) {
-                    NormalizedResourcesWithMemory sdAvailable = cluster.getAvailableResources(sd);
-                    int sdAvailableSlots = cluster.getAvailablePorts(sd).size();
-                    readyToRemove.add(supervisor);
-                    shortage.remove(sdAvailable, cluster.getResourceMetrics());
-                    shortageSlots -= sdAvailableSlots;
-                    LOG.debug("Releasing {} with {} and {} slots leaving {} and {} slots to go", supervisor,
-                            sdAvailable, sdAvailableSlots, shortage, shortageSlots);
-                    if (!shortage.areAnyOverZero() && shortageSlots <= 0) {
-                        // we have enough resources now...
-                        break;
+
+            //release earliest blacklist - but release all supervisors on a given blacklisted host.
+            Map<String, Set<String>> hostToSupervisorIds = createHostToSupervisorMap(blacklistedNodeIds, cluster);
+            for (Set<String> supervisorIds : hostToSupervisorIds.values()) {
+                for (String supervisorId : supervisorIds) {
+                    SupervisorDetails sd = availableSupervisors.get(supervisorId);
+                    if (sd != null) {
+                        NormalizedResourcesWithMemory sdAvailable = cluster.getAvailableResources(sd);
+                        int sdAvailableSlots = cluster.getAvailablePorts(sd).size();
+                        readyToRemove.add(supervisorId);
+                        shortage.remove(sdAvailable, cluster.getResourceMetrics());
+                        shortageSlots -= sdAvailableSlots;
+                        LOG.info("Releasing {} with {} and {} slots leaving {} and {} slots to go", supervisorId,
+                                sdAvailable, sdAvailableSlots, shortage, shortageSlots);
--- End diff --

It sounds like you don't agree even with the existing behavior of releasing blacklisted supervisors when they are needed for scheduling?

Should I be looking at converting all the existing code to blacklist supervisors instead of hosts, to get things functional when there is more than one supervisor on a host?

Thanks.



---