[GitHub] storm pull request #2938: STORM-3317 fix upload credentials when using a dif...

2019-01-17 Thread agresch
GitHub user agresch opened a pull request:

https://github.com/apache/storm/pull/2938

STORM-3317 fix upload credentials when using a differing path for jav…

…a.security.auth.login.config

If the launcher has a java.security.auth.login.config file locally 
specified that differs from the system property in the topology conf, we need 
to honor that setting for upload credentials to work properly.





You can merge this pull request into a Git repository by running:

$ git pull https://github.com/agresch/storm agresch_uploadcreds

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/storm/pull/2938.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2938


commit 75f9f5c25dfece734c4e9bed1df2a4fbe4842c79
Author: Aaron Gresch 
Date:   2019-01-17T20:22:23Z

STORM-3317 fix upload credentials when using a differing path for 
java.security.auth.login.config




---


[GitHub] storm pull request #2926: STORM-3303 adjust some logging priorities, log top...

2018-12-11 Thread agresch
GitHub user agresch opened a pull request:

https://github.com/apache/storm/pull/2926

STORM-3303 adjust some logging priorities, log topology info



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/agresch/storm agresch_logging

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/storm/pull/2926.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2926


commit abca1540be9e6875f93dc0b9a276710717bc0ff6
Author: Aaron Gresch 
Date:   2018-12-11T20:18:23Z

STORM-3303 adjust some logging priorities, log topology info




---


[GitHub] storm issue #2920: STORM-3297 prevent supervisor restart when no nimbus lead...

2018-12-11 Thread agresch
Github user agresch commented on the issue:

https://github.com/apache/storm/pull/2920
  
@srdo - done


---


[GitHub] storm pull request #2925: STORM-3302: Ensures we close streams to HDFS

2018-12-10 Thread agresch
Github user agresch commented on a diff in the pull request:

https://github.com/apache/storm/pull/2925#discussion_r240368505
  
--- Diff: storm-client/src/jvm/org/apache/storm/blobstore/BlobStore.java ---
@@ -288,13 +288,15 @@ public void createBlob(String key, InputStream in, 
SettableBlobMeta meta, Subjec
 while ((len = in.read(buffer)) > 0) {
 out.write(buffer, 0, len);
 }
-out.close();
-} catch (AuthorizationException | IOException | RuntimeException 
e) {
-if (out != null) {
-out.cancel();
-}
 } finally {
-in.close();
+try {
+if (out != null) {
--- End diff --

Previously closed was called in the normal case.  Now we're always calling 
cancel().  These seem to have slightly different functionality.


---


[GitHub] storm pull request #2925: STORM-3302: Ensures we close streams to HDFS

2018-12-10 Thread agresch
Github user agresch commented on a diff in the pull request:

https://github.com/apache/storm/pull/2925#discussion_r240360762
  
--- Diff: 
storm-client/src/jvm/org/apache/storm/dependency/DependencyUploader.java ---
@@ -154,11 +154,22 @@ private boolean uploadDependencyToBlobStore(String 
key, File dependency)
 acls.add(new AccessControl(AccessControlType.OTHER,
BlobStoreAclHandler.READ));
 
-AtomicOutputStream blob = getBlobStore().createBlob(key, new 
SettableBlobMeta(acls));
-Files.copy(dependency.toPath(), blob);
-blob.close();
+AtomicOutputStream blob = null;
+try {
+blob = getBlobStore().createBlob(key, new 
SettableBlobMeta(acls));
+Files.copy(dependency.toPath(), blob);
+blob.close();
 
-uploadNew = true;
+uploadNew = true;
--- End diff --

shouldn't we call blob = null here to prevent a dupe cancel after close?


---


[GitHub] storm pull request #2925: STORM-3302: Ensures we close streams to HDFS

2018-12-10 Thread agresch
Github user agresch commented on a diff in the pull request:

https://github.com/apache/storm/pull/2925#discussion_r240360980
  
--- Diff: 
storm-server/src/main/java/org/apache/storm/blobstore/BlobStoreUtils.java ---
@@ -204,6 +204,14 @@ public static boolean downloadUpdatedBlob(Map conf, BlobStore bl
 } catch (Exception exp) {
 // Logging an exception while client is connecting
 LOG.error("Exception", exp);
+} finally {
+if (null != out) {
--- End diff --

same here, shouldn't we have nulled out out on close?


---


[GitHub] storm pull request #2925: STORM-3302: Ensures we close streams to HDFS

2018-12-10 Thread agresch
Github user agresch commented on a diff in the pull request:

https://github.com/apache/storm/pull/2925#discussion_r240360428
  
--- Diff: storm-client/src/jvm/org/apache/storm/blobstore/BlobStore.java ---
@@ -288,13 +288,15 @@ public void createBlob(String key, InputStream in, 
SettableBlobMeta meta, Subjec
 while ((len = in.read(buffer)) > 0) {
 out.write(buffer, 0, len);
 }
-out.close();
-} catch (AuthorizationException | IOException | RuntimeException 
e) {
-if (out != null) {
-out.cancel();
-}
 } finally {
-in.close();
+try {
+if (out != null) {
--- End diff --

we're always calling cancel here instead of close() for the normal case.


---


[GitHub] storm issue #2918: STORM-3295 allow blacklist scheduling to function properl...

2018-12-06 Thread agresch
Github user agresch commented on the issue:

https://github.com/apache/storm/pull/2918
  
@revans2 @danny0405 - Made changes requested by @revans2.  Let me know if 
we should have a separate JIRA for switching blacklisting to work strictly by 
supervisor instead of doing this mismatch between host and supervisors.  I 
think this trade off of releasing all supervisors on a host is acceptable.



---


[GitHub] storm issue #2918: STORM-3295 allow blacklist scheduling to function properl...

2018-12-06 Thread agresch
Github user agresch commented on the issue:

https://github.com/apache/storm/pull/2918
  
@revans2 - I will look at addressing DefaultBlacklistStrategy.


---


[GitHub] storm pull request #2918: STORM-3295 allow blacklist scheduling to function ...

2018-12-06 Thread agresch
Github user agresch commented on a diff in the pull request:

https://github.com/apache/storm/pull/2918#discussion_r239522894
  
--- Diff: 
storm-server/src/main/java/org/apache/storm/scheduler/blacklist/strategies/RasBlacklistStrategy.java
 ---
@@ -79,25 +81,46 @@
 
 if (shortage.areAnyOverZero() || shortageSlots > 0) {
 LOG.info("Need {} and {} slots more. Releasing some 
blacklisted nodes to cover it.", shortage, shortageSlots);
-//release earliest blacklist
-for (String supervisor : blacklistedNodeIds) {
-SupervisorDetails sd = 
availableSupervisors.get(supervisor);
-if (sd != null) {
-NormalizedResourcesWithMemory sdAvailable = 
cluster.getAvailableResources(sd);
-int sdAvailableSlots = 
cluster.getAvailablePorts(sd).size();
-readyToRemove.add(supervisor);
-shortage.remove(sdAvailable, 
cluster.getResourceMetrics());
-shortageSlots -= sdAvailableSlots;
-LOG.debug("Releasing {} with {} and {} slots 
leaving {} and {} slots to go", supervisor,
-sdAvailable, sdAvailableSlots, shortage, 
shortageSlots);
-if (!shortage.areAnyOverZero() && shortageSlots <= 
0) {
-// we have enough resources now...
-break;
+
+//release earliest blacklist - but release all supervisors 
on a given blacklisted host.
+Map> hostToSupervisorIds = 
createHostToSupervisorMap(blacklistedNodeIds, cluster);
+for (Set supervisorIds : 
hostToSupervisorIds.values()) {
+for (String supervisorId : supervisorIds) {
+SupervisorDetails sd = 
availableSupervisors.get(supervisorId);
+if (sd != null) {
+NormalizedResourcesWithMemory sdAvailable = 
cluster.getAvailableResources(sd);
+int sdAvailableSlots = 
cluster.getAvailablePorts(sd).size();
+readyToRemove.add(supervisorId);
+shortage.remove(sdAvailable, 
cluster.getResourceMetrics());
+shortageSlots -= sdAvailableSlots;
+LOG.info("Releasing {} with {} and {} slots 
leaving {} and {} slots to go", supervisorId,
+sdAvailable, sdAvailableSlots, 
shortage, shortageSlots);
 }
 }
+// make sure we've handled all supervisors on the host 
before we break
+if (!shortage.areAnyOverZero() && shortageSlots <= 0) {
+// we have enough resources now...
+break;
--- End diff --

I am doing what you are indicating  I create a list of supervisors on 
the host and release all the supervisors before breaking.


---


[GitHub] storm pull request #2918: STORM-3295 allow blacklist scheduling to function ...

2018-12-06 Thread agresch
Github user agresch commented on a diff in the pull request:

https://github.com/apache/storm/pull/2918#discussion_r239461731
  
--- Diff: 
storm-server/src/main/java/org/apache/storm/scheduler/blacklist/strategies/RasBlacklistStrategy.java
 ---
@@ -79,25 +81,46 @@
 
 if (shortage.areAnyOverZero() || shortageSlots > 0) {
 LOG.info("Need {} and {} slots more. Releasing some 
blacklisted nodes to cover it.", shortage, shortageSlots);
-//release earliest blacklist
-for (String supervisor : blacklistedNodeIds) {
-SupervisorDetails sd = 
availableSupervisors.get(supervisor);
-if (sd != null) {
-NormalizedResourcesWithMemory sdAvailable = 
cluster.getAvailableResources(sd);
-int sdAvailableSlots = 
cluster.getAvailablePorts(sd).size();
-readyToRemove.add(supervisor);
-shortage.remove(sdAvailable, 
cluster.getResourceMetrics());
-shortageSlots -= sdAvailableSlots;
-LOG.debug("Releasing {} with {} and {} slots 
leaving {} and {} slots to go", supervisor,
-sdAvailable, sdAvailableSlots, shortage, 
shortageSlots);
-if (!shortage.areAnyOverZero() && shortageSlots <= 
0) {
-// we have enough resources now...
-break;
+
+//release earliest blacklist - but release all supervisors 
on a given blacklisted host.
+Map> hostToSupervisorIds = 
createHostToSupervisorMap(blacklistedNodeIds, cluster);
+for (Set supervisorIds : 
hostToSupervisorIds.values()) {
+for (String supervisorId : supervisorIds) {
+SupervisorDetails sd = 
availableSupervisors.get(supervisorId);
+if (sd != null) {
+NormalizedResourcesWithMemory sdAvailable = 
cluster.getAvailableResources(sd);
+int sdAvailableSlots = 
cluster.getAvailablePorts(sd).size();
+readyToRemove.add(supervisorId);
+shortage.remove(sdAvailable, 
cluster.getResourceMetrics());
+shortageSlots -= sdAvailableSlots;
+LOG.info("Releasing {} with {} and {} slots 
leaving {} and {} slots to go", supervisorId,
+sdAvailable, sdAvailableSlots, 
shortage, shortageSlots);
--- End diff --

It sounds like you don't agree even with the existing code that releases 
blacklisted supervisors when needed for scheduling? 

Should I be looking at converting all the existing code to blacklist 
supervisors instead of hosts to get things functional when there are more than 
one supervisor on a host?

Thanks.



---


[GitHub] storm pull request #2920: STORM-3297 prevent supervisor restart when no nimb...

2018-12-05 Thread agresch
Github user agresch commented on a diff in the pull request:

https://github.com/apache/storm/pull/2920#discussion_r239235259
  
--- Diff: 
storm-server/src/main/java/org/apache/storm/metricstore/NimbusMetricProcessor.java
 ---
@@ -24,7 +24,7 @@
 public void processWorkerMetrics(Map conf, 
WorkerMetrics metrics) throws MetricException {
 try (NimbusClient client = NimbusClient.getConfiguredClient(conf)) 
{
 client.getClient().processWorkerMetrics(metrics);
-} catch (TException e) {
--- End diff --

done


---


[GitHub] storm pull request #2920: STORM-3297 prevent supervisor restart when no nimb...

2018-12-05 Thread agresch
GitHub user agresch opened a pull request:

https://github.com/apache/storm/pull/2920

STORM-3297 prevent supervisor restart when no nimbus leader exists



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/agresch/storm agresch_processWorkerMetrics

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/storm/pull/2920.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2920


commit d8e649d2139a7fb7e71c022a28eefd40f684308e
Author: Aaron Gresch 
Date:   2018-12-05T17:27:43Z

STORM-3297 prevent supervisor restart when no nimbus leader exists




---


[GitHub] storm pull request #2918: STORM-3295 allow blacklist scheduling to function ...

2018-12-05 Thread agresch
Github user agresch commented on a diff in the pull request:

https://github.com/apache/storm/pull/2918#discussion_r239066404
  
--- Diff: 
storm-server/src/main/java/org/apache/storm/scheduler/blacklist/strategies/RasBlacklistStrategy.java
 ---
@@ -79,25 +81,46 @@
 
 if (shortage.areAnyOverZero() || shortageSlots > 0) {
 LOG.info("Need {} and {} slots more. Releasing some 
blacklisted nodes to cover it.", shortage, shortageSlots);
-//release earliest blacklist
-for (String supervisor : blacklistedNodeIds) {
-SupervisorDetails sd = 
availableSupervisors.get(supervisor);
-if (sd != null) {
-NormalizedResourcesWithMemory sdAvailable = 
cluster.getAvailableResources(sd);
-int sdAvailableSlots = 
cluster.getAvailablePorts(sd).size();
-readyToRemove.add(supervisor);
-shortage.remove(sdAvailable, 
cluster.getResourceMetrics());
-shortageSlots -= sdAvailableSlots;
-LOG.debug("Releasing {} with {} and {} slots 
leaving {} and {} slots to go", supervisor,
-sdAvailable, sdAvailableSlots, shortage, 
shortageSlots);
-if (!shortage.areAnyOverZero() && shortageSlots <= 
0) {
-// we have enough resources now...
-break;
+
+//release earliest blacklist - but release all supervisors 
on a given blacklisted host.
+Map> hostToSupervisorIds = 
createHostToSupervisorMap(blacklistedNodeIds, cluster);
+for (Set supervisorIds : 
hostToSupervisorIds.values()) {
+for (String supervisorId : supervisorIds) {
+SupervisorDetails sd = 
availableSupervisors.get(supervisorId);
+if (sd != null) {
+NormalizedResourcesWithMemory sdAvailable = 
cluster.getAvailableResources(sd);
+int sdAvailableSlots = 
cluster.getAvailablePorts(sd).size();
+readyToRemove.add(supervisorId);
+shortage.remove(sdAvailable, 
cluster.getResourceMetrics());
+shortageSlots -= sdAvailableSlots;
+LOG.info("Releasing {} with {} and {} slots 
leaving {} and {} slots to go", supervisorId,
+sdAvailable, sdAvailableSlots, 
shortage, shortageSlots);
--- End diff --

If there are two supervisors on a host that are blacklisted and we release 
one, one supervisor will remain blacklisted.  This single blacklisted 
supervisor will be returned here:


https://github.com/apache/storm/blob/master/storm-server/src/main/java/org/apache/storm/scheduler/blacklist/BlacklistScheduler.java#L167

Then this code will see that the same host that both supervisors are on is 
blacklisted, preventing any scheduling on that node:


https://github.com/apache/storm/blob/master/storm-server/src/main/java/org/apache/storm/scheduler/blacklist/BlacklistScheduler.java#L167

I think a better overall long term solution is to blacklist supervisors 
instead of hosts, but that touches a lot more code.  In the short term I think 
this is a relatively small tradeoff to allow blacklisting to work with multiple 
supervisors per host.




---


[GitHub] storm pull request #2918: STORM-3295 allow blacklist scheduling to function ...

2018-12-04 Thread agresch
GitHub user agresch opened a pull request:

https://github.com/apache/storm/pull/2918

STORM-3295 allow blacklist scheduling to function properly with multi…

…ple supervisors on a host

If any supervisor on a host remains blacklisted, 
BlacklistScheduler.getBlacklistHosts() still considers the host blacklisted.  
This change forces all supervisors on a supervisor to be released, which will 
free the host.

It may be nicer to consider blacklisting strictly based on supervisors, but 
that is open to discussion, and a much larger change than this.



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/agresch/storm agresch_blacklist

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/storm/pull/2918.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2918


commit b4d3df2167b2962bdf7039ba064ad7a646f32644
Author: Aaron Gresch 
Date:   2018-12-04T22:23:35Z

STORM-3295 allow blacklist scheduling to function properly with multiple 
supervisors on a host




---


[GitHub] storm pull request #2881: STORM-3259: NUMA Support for Storm

2018-10-25 Thread agresch
Github user agresch commented on a diff in the pull request:

https://github.com/apache/storm/pull/2881#discussion_r228287958
  
--- Diff: 
storm-server/src/main/java/org/apache/storm/daemon/supervisor/BasicContainer.java
 ---
@@ -607,6 +608,27 @@ protected String javaCmd(String cmd) {
 return ret;
 }
 
+/**
+ * Extracting out to mock it for tests.
+ * @return true if on Linux.
+ */
+protected boolean isOnLinux() {
+return SystemUtils.IS_OS_LINUX;
+}
+
+private void addNumaPinningIfApplicable(String numaId, List 
commandList) {
+if (numaId != null) {
+if (isOnLinux()) {
+commandList.add("numactl");
+commandList.add("--i=" + numaId);
+return;
+} else {
+// TODO : Add support for pinning on Windows host
--- End diff --

throw an exception?


---


[GitHub] storm pull request #2881: STORM-3259: NUMA Support for Storm

2018-10-25 Thread agresch
Github user agresch commented on a diff in the pull request:

https://github.com/apache/storm/pull/2881#discussion_r228287568
  
--- Diff: 
storm-client/src/jvm/org/apache/storm/daemon/worker/WorkerState.java ---
@@ -622,22 +624,26 @@ public boolean areAllConnectionsReady() {
 return this.autoCredentials;
 }
 
-private List> readWorkerExecutors(IStormClusterState 
stormClusterState, String topologyId, String assignmentId,
+private List> readWorkerExecutors(IStormClusterState 
stormClusterState,
+ String topologyId, String 
assignmentId,
  int port) {
 LOG.info("Reading assignments");
 List> executorsAssignedToThisWorker = new ArrayList<>();
 executorsAssignedToThisWorker.add(Constants.SYSTEM_EXECUTOR_ID);
-Map, NodeInfo> executorToNodePort = 
getLocalAssignment(conf, stormClusterState, 
topologyId).get_executor_node_port();
+Map, NodeInfo> executorToNodePort =
+getLocalAssignment(conf, stormClusterState, 
topologyId).get_executor_node_port();
 for (Map.Entry, NodeInfo> entry : 
executorToNodePort.entrySet()) {
 NodeInfo nodeInfo = entry.getValue();
-if (nodeInfo.get_node().equals(assignmentId) && 
nodeInfo.get_port().iterator().next() == port) {
+if (nodeInfo.get_node().startsWith(assignmentId) && 
nodeInfo.get_port().iterator().next() == port) {
--- End diff --

Started a new / non-resolved conversation.

Can we have assignment id = supervisor1 and another assignment id = 
supervisor11 that would cause aliasing?

Instead of startsWith, can we use some known delimiter to grab up to and 
make sure that matches exactly?


---


[GitHub] storm pull request #2878: [STORM-3257] 'storm kill' command line should be a...

2018-10-25 Thread agresch
Github user agresch commented on a diff in the pull request:

https://github.com/apache/storm/pull/2878#discussion_r228222037
  
--- Diff: storm-core/src/jvm/org/apache/storm/command/KillTopology.java ---
@@ -25,21 +25,49 @@
 
 public static void main(String[] args) throws Exception {
 Map cl = CLI.opt("w", "wait", null, CLI.AS_INT)
+.boolOpt("i", "ignore-errors")
--- End diff --

do we even want/need this option?  Any reason not to have this be the 
default behavior?


---


[GitHub] storm pull request #2893: STORM-3272 allow worker-launcher to delete dead sy...

2018-10-23 Thread agresch
Github user agresch commented on a diff in the pull request:

https://github.com/apache/storm/pull/2893#discussion_r227434073
  
--- Diff: storm-core/src/native/worker-launcher/impl/worker-launcher.c ---
@@ -593,14 +593,17 @@ int recursive_delete(const char *path, int 
supervisor_owns_dir) {
 return UNABLE_TO_BUILD_PATH;
   }
 
+  struct stat file_stat;
+
   if(access(path, F_OK) != 0) {
 if(errno == ENOENT) {
-  return 0;
-}
-// Can probably return here, but we'll try to lstat anyway.
-  }
+   // we need to handle symlinks that target missing files.
+   if((lstat(path, &file_stat) != 0) || ((file_stat.st_mode & S_IFMT) 
!= S_IFLNK)) {
+ return 0;
+   }
--- End diff --

@d2r @revans2 - added logging


---


[GitHub] storm pull request #2893: STORM-3272 allow worker-launcher to delete dead sy...

2018-10-23 Thread agresch
GitHub user agresch opened a pull request:

https://github.com/apache/storm/pull/2893

STORM-3272 allow worker-launcher to delete dead symlinks

When a topology folder contains a symlink to a missing file, the 
worker-launcher would not remove the symlink.  This prevents force delete from 
deleting the folder when cleaning up a topology.  

When a cluster is short on resources, this can cause all sorts of issues on 
the supervisor when rescheduling the same topology when this folder still 
exists.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/agresch/storm agresch_storm-3272

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/storm/pull/2893.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2893


commit 131344777ae4cbc4ca47952b3a95f35b11713f23
Author: Aaron Gresch 
Date:   2018-10-23T13:57:52Z

STORM-3272 allow worker-launcher to delete dead symlinks




---


[GitHub] storm pull request #2887: STORM-3262 prevent falsely reporting leadership

2018-10-18 Thread agresch
GitHub user agresch opened a pull request:

https://github.com/apache/storm/pull/2887

STORM-3262 prevent falsely reporting leadership



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/agresch/storm agresch_storm-3262

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/storm/pull/2887.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2887


commit 02404865f241d2e737eb5dabe9d745d2393f3d9c
Author: Aaron Gresch 
Date:   2018-10-18T16:23:28Z

STORM-3262 prevent falsely reporting leadership




---


[GitHub] storm pull request #2881: STORM-3259: NUMA Support for Storm

2018-10-17 Thread agresch
Github user agresch commented on a diff in the pull request:

https://github.com/apache/storm/pull/2881#discussion_r226060529
  
--- Diff: 
storm-client/src/jvm/org/apache/storm/daemon/worker/WorkerState.java ---
@@ -630,7 +632,7 @@ public boolean areAllConnectionsReady() {
 Map, NodeInfo> executorToNodePort = 
getLocalAssignment(conf, stormClusterState, 
topologyId).get_executor_node_port();
 for (Map.Entry, NodeInfo> entry : 
executorToNodePort.entrySet()) {
 NodeInfo nodeInfo = entry.getValue();
-if (nodeInfo.get_node().equals(assignmentId) && 
nodeInfo.get_port().iterator().next() == port) {
+if (nodeInfo.get_node().startsWith(assignmentId) && 
nodeInfo.get_port().iterator().next() == port) {
--- End diff --

what will assignmentId be?  Why does this change?


---


[GitHub] storm pull request #2881: STORM-3259: NUMA Support for Storm

2018-10-17 Thread agresch
Github user agresch commented on a diff in the pull request:

https://github.com/apache/storm/pull/2881#discussion_r226065664
  
--- Diff: storm-client/src/jvm/org/apache/storm/utils/Utils.java ---
@@ -121,11 +121,68 @@
 private static String memoizedLocalHostnameString = null;
 public static final Pattern TOPOLOGY_KEY_PATTERN = 
Pattern.compile("^[\\w \\t\\._-]+$", Pattern.UNICODE_CHARACTER_CLASS);
 
+public static final String NUMA_MEMORY_IN_MB = "MemoryInMB";
+public static final String NUMA_CORES = "Cores";
+public static final String NUMAS_PORTS = "Ports";
+public static final String NUMA_ID = "Id";
+public static final String NUMAS_BASE = "Numas";
+
 static {
 localConf = readStormConfig();
 serializationDelegate = getSerializationDelegate(localConf);
 }
 
+/**
+ * Validate supervisor numa configuration.
+ * @param stormConf stormConf
+ * @return getValidatedNumaMap
+ * @throws KeyNotFoundException
+ */
+public static Map getValidatedNumaMap(Map stormConf) throws KeyNotFoundException {
+Map validatedNumaMap = new HashMap();
+Map supervisorNumaMap = (Map) 
stormConf.get(Config.SUPERVISOR_NUMA_META);
+if (supervisorNumaMap == null) return validatedNumaMap;
+if (!supervisorNumaMap.containsKey(NUMAS_BASE)) {
+throw new KeyNotFoundException("The numa configurations [" + 
NUMAS_BASE + "] is missing!");
+}
+List numaEntries = (List) 
supervisorNumaMap.get(NUMAS_BASE);
+if (numaEntries == null) return validatedNumaMap;
+for (Map numa : numaEntries) {
+for (String key : new String[]{NUMA_ID, NUMA_CORES, 
NUMA_MEMORY_IN_MB, NUMAS_PORTS}) {
+if (!numa.containsKey(key)) {
+throw new KeyNotFoundException("The numa configuration 
key [" + key + "] is missing!");
+}
+}
+validatedNumaMap.put((String) numa.get(NUMA_ID), numa);
+}
+return validatedNumaMap;
+}
+
+/**
+ * getNumaIdForPort.
+ * @param port port
+ * @param stormConf stormConf
+ * @return getNumaIdForPort
+ * @throws KeyNotFoundException
+ */
+public static String getNumaIdForPort(Number port, Map 
stormConf) {
+Map validatedNumaMap = null;
+try {
+validatedNumaMap = getValidatedNumaMap(stormConf);
+} catch (KeyNotFoundException e) {
+LOG.error("Exception while getting NUMA config", e);
+return null;
+}
+for (Object numaEntry : validatedNumaMap.values()) {
+Map numaMap  = (Map) numaEntry;
+List portList = (List) 
numaMap.get(NUMAS_PORTS);
+if (portList.contains(port)) {
--- End diff --

does it make sense to have a NUMA map and not be able to find a port?  
Should this be an error?


---


[GitHub] storm pull request #2881: STORM-3259: NUMA Support for Storm

2018-10-17 Thread agresch
Github user agresch commented on a diff in the pull request:

https://github.com/apache/storm/pull/2881#discussion_r226065179
  
--- Diff: storm-client/src/jvm/org/apache/storm/utils/Utils.java ---
@@ -121,11 +121,68 @@
 private static String memoizedLocalHostnameString = null;
 public static final Pattern TOPOLOGY_KEY_PATTERN = 
Pattern.compile("^[\\w \\t\\._-]+$", Pattern.UNICODE_CHARACTER_CLASS);
 
+public static final String NUMA_MEMORY_IN_MB = "MemoryInMB";
+public static final String NUMA_CORES = "Cores";
+public static final String NUMAS_PORTS = "Ports";
+public static final String NUMA_ID = "Id";
+public static final String NUMAS_BASE = "Numas";
+
 static {
 localConf = readStormConfig();
 serializationDelegate = getSerializationDelegate(localConf);
 }
 
+/**
+ * Validate supervisor numa configuration.
+ * @param stormConf stormConf
+ * @return getValidatedNumaMap
+ * @throws KeyNotFoundException
+ */
+public static Map getValidatedNumaMap(Map stormConf) throws KeyNotFoundException {
+Map validatedNumaMap = new HashMap();
+Map supervisorNumaMap = (Map) 
stormConf.get(Config.SUPERVISOR_NUMA_META);
+if (supervisorNumaMap == null) return validatedNumaMap;
+if (!supervisorNumaMap.containsKey(NUMAS_BASE)) {
+throw new KeyNotFoundException("The numa configurations [" + 
NUMAS_BASE + "] is missing!");
+}
+List numaEntries = (List) 
supervisorNumaMap.get(NUMAS_BASE);
+if (numaEntries == null) return validatedNumaMap;
+for (Map numa : numaEntries) {
+for (String key : new String[]{NUMA_ID, NUMA_CORES, 
NUMA_MEMORY_IN_MB, NUMAS_PORTS}) {
+if (!numa.containsKey(key)) {
+throw new KeyNotFoundException("The numa configuration 
key [" + key + "] is missing!");
+}
+}
+validatedNumaMap.put((String) numa.get(NUMA_ID), numa);
+}
+return validatedNumaMap;
+}
+
+/**
+ * getNumaIdForPort.
+ * @param port port
+ * @param stormConf stormConf
+ * @return getNumaIdForPort
+ * @throws KeyNotFoundException
+ */
+public static String getNumaIdForPort(Number port, Map 
stormConf) {
+Map validatedNumaMap = null;
+try {
+validatedNumaMap = getValidatedNumaMap(stormConf);
+} catch (KeyNotFoundException e) {
+LOG.error("Exception while getting NUMA config", e);
+return null;
--- End diff --

This sounds like it should be a serious misconfig.  I would think we should 
throw an exception?


---


[GitHub] storm pull request #2879: STORM-3258 reduce blobstore logging

2018-10-16 Thread agresch
GitHub user agresch opened a pull request:

https://github.com/apache/storm/pull/2879

STORM-3258 reduce blobstore logging



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/agresch/storm agresch_storm-3258

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/storm/pull/2879.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2879


commit 56d3752c001f021ebb723c904d134fd01024cad8
Author: Aaron Gresch 
Date:   2018-10-16T13:40:00Z

STORM-3258 reduce blobstore logging




---


[GitHub] storm issue #2869: Revert "[STORM-3233] Updated zookeeper client to version ...

2018-10-11 Thread agresch
Github user agresch commented on the issue:

https://github.com/apache/storm/pull/2869
  
What was the reason for the upgrade?  Do we need a JIRA to upgrade later?  
At what point?


---


[GitHub] storm pull request #2872: STORM-3251 properly determine if logviewer filter ...

2018-10-11 Thread agresch
GitHub user agresch opened a pull request:

https://github.com/apache/storm/pull/2872

STORM-3251 properly determine if logviewer filter is enabled



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/agresch/storm agresch_log_filter_fail

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/storm/pull/2872.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2872


commit 1fa76994608198dd50d0abcd0793ec4cf42e7bbf
Author: Aaron Gresch 
Date:   2018-10-11T18:20:59Z

STORM-3251 properly determine if logviewer filter is enabled




---


[GitHub] storm pull request #2865: STORM-3247 remove BLOBSTORE_SUPERUSER

2018-10-05 Thread agresch
GitHub user agresch opened a pull request:

https://github.com/apache/storm/pull/2865

STORM-3247 remove BLOBSTORE_SUPERUSER



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/agresch/storm agresch_blobstore_superuser

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/storm/pull/2865.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2865


commit e5e0ddec875534af34d8499523530f34ede086c8
Author: Aaron Gresch 
Date:   2018-10-05T19:23:44Z

STORM-3247 remove BLOBSTORE_SUPERUSER




---


[GitHub] storm issue #2861: STORM-3244 allow logviewer to use independent filter sett...

2018-10-05 Thread agresch
Github user agresch commented on the issue:

https://github.com/apache/storm/pull/2861
  
@revans2 - added fallback to ui filter.


---


[GitHub] storm pull request #2861: STORM-3244 allow logviewer to use independent filt...

2018-10-05 Thread agresch
Github user agresch commented on a diff in the pull request:

https://github.com/apache/storm/pull/2861#discussion_r223054311
  
--- Diff: 
storm-webapp/src/main/java/org/apache/storm/daemon/logviewer/LogviewerServer.java
 ---
@@ -66,10 +64,9 @@ private static Server mkHttpServer(StormMetricsRegistry 
metricsRegistry, Map= 0) {
 LOG.info("Starting Logviewer HTTP servers...");
-Integer headerBufferSize = 
ObjectReader.getInt(conf.get(UI_HEADER_BUFFER_BYTES));
-String filterClass = (String) 
(conf.get(DaemonConfig.UI_FILTER));
+String filterClass = (String) 
(conf.get(DaemonConfig.LOGVIEWER_FILTER));
 @SuppressWarnings("unchecked")
-Map filterParams = (Map) 
(conf.get(DaemonConfig.UI_FILTER_PARAMS));
+Map filterParams = (Map) 
(conf.get(DaemonConfig.LOGVIEWER_FILTER_PARAMS));
--- End diff --

sure.  


---


[GitHub] storm pull request #2861: STORM-3244 allow logviewer to use independent filt...

2018-10-03 Thread agresch
GitHub user agresch opened a pull request:

https://github.com/apache/storm/pull/2861

STORM-3244 allow logviewer to use independent filter settings from ui



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/agresch/storm agresch_logviewer_filter

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/storm/pull/2861.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2861


commit 9cad3415e597bd25534d64271109653676d9be46
Author: Aaron Gresch 
Date:   2018-10-03T20:25:02Z

STORM-3244 allow logviewer to use independent filter settings from ui




---


[GitHub] storm issue #2855: STORM-3240 health checks should succeed on exit code 0

2018-10-02 Thread agresch
Github user agresch commented on the issue:

https://github.com/apache/storm/pull/2855
  
Missed @revans2 last comment.  But the String is used here: 
https://github.com/apache/storm/blob/master/storm-server/src/main/java/org/apache/storm/healthcheck/HealthChecker.java#L60




---


[GitHub] storm issue #2855: STORM-3240 health checks should succeed on exit code 0

2018-10-02 Thread agresch
Github user agresch commented on the issue:

https://github.com/apache/storm/pull/2855
  
@kishorvpatil - thanks for the explanation.  I do like this behavior better.


---


[GitHub] storm pull request #2855: STORM-3240 health checks should succeed on exit co...

2018-10-02 Thread agresch
Github user agresch commented on a diff in the pull request:

https://github.com/apache/storm/pull/2855#discussion_r222100581
  
--- Diff: 
storm-server/src/main/java/org/apache/storm/healthcheck/HealthChecker.java ---
@@ -106,13 +106,13 @@ public void run() {
 BufferedReader reader = new BufferedReader(new 
InputStreamReader(stdin));
 while ((str = reader.readLine()) != null) {
 if (str.startsWith("ERROR")) {
+LOG.warn("The healthcheck process {} exited with 
code {}", script, process.exitValue());
 return FAILED;
 }
 }
 return SUCCESS;
--- End diff --

I can code this if desired, but the documentation had indicated that an 
ERROR was necessary to fail.  I'm fine adding this change if people agree.  
Just let me know.  


---


[GitHub] storm pull request #2855: STORM-3240 health checks should succeed on exit co...

2018-10-02 Thread agresch
Github user agresch commented on a diff in the pull request:

https://github.com/apache/storm/pull/2855#discussion_r222096924
  
--- Diff: 
storm-server/src/main/java/org/apache/storm/healthcheck/HealthChecker.java ---
@@ -106,13 +106,13 @@ public void run() {
 BufferedReader reader = new BufferedReader(new 
InputStreamReader(stdin));
 while ((str = reader.readLine()) != null) {
 if (str.startsWith("ERROR")) {
+LOG.warn("The healthcheck process {} exited with 
code {}", script, process.exitValue());
 return FAILED;
 }
 }
 return SUCCESS;
--- End diff --

@kishorvpatil - swapped the error codes


---


[GitHub] storm issue #2855: STORM-3240 health checks should succeed on exit code 0

2018-10-02 Thread agresch
Github user agresch commented on the issue:

https://github.com/apache/storm/pull/2855
  
@revans2  - took a guess at what you wanted documented.  If this isn't what 
you want, just let me know.


---


[GitHub] storm issue #2855: STORM-3240 health checks should succeed on exit code 0

2018-10-02 Thread agresch
Github user agresch commented on the issue:

https://github.com/apache/storm/pull/2855
  
@revans2 - I am not familiar with previous releases.  Can you explain what 
you mean by something changing in this release?  Are you saying this bug also 
exists in the other lines?


---


[GitHub] storm pull request #2855: STORM-3240 health checks should succeed on exit co...

2018-10-01 Thread agresch
GitHub user agresch opened a pull request:

https://github.com/apache/storm/pull/2855

STORM-3240 health checks should succeed on exit code 0



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/agresch/storm agresch_healthcheck

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/storm/pull/2855.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2855


commit 39d28387296a57835947da45f4c5c1f92eb676e6
Author: Aaron Gresch 
Date:   2018-10-01T16:12:28Z

STORM-3240 health checks should succeed on exit code 0




---


[GitHub] storm pull request #2854: STORM-3239: Adding dumpjstack action and removing ...

2018-10-01 Thread agresch
Github user agresch commented on a diff in the pull request:

https://github.com/apache/storm/pull/2854#discussion_r221626099
  
--- Diff: 
storm-webapp/src/main/java/org/apache/storm/daemon/ui/UIHelpers.java ---
@@ -2165,21 +2165,20 @@ public static String getWorkerDumpLink(String host, 
long port, String topologyId
  * @param client client
  * @param id id
  * @param hostPort hostPort
- * @param timeout timeout
+ * @param timestamp timestamp
  * @param config config
  * @param profileAction profileAction
  * @throws TException TException
  */
 public static void getTopologyProfilingAction(
 Nimbus.Iface client, String id,
-String hostPort, String timeout, Map

[GitHub] storm issue #2852: STORM-3237 track Nimbus mkAssignment failures

2018-09-28 Thread agresch
Github user agresch commented on the issue:

https://github.com/apache/storm/pull/2852
  
@revans2 - added documentation


---


[GitHub] storm issue #2852: STORM-3237 track Nimbus mkAssignment failures

2018-09-28 Thread agresch
Github user agresch commented on the issue:

https://github.com/apache/storm/pull/2852
  
@revans2 @kishorvpatil - made the change requested by Bobby.


---


[GitHub] storm pull request #2852: STORM-3237 track Nimbus mkAssignment failures

2018-09-28 Thread agresch
Github user agresch commented on a diff in the pull request:

https://github.com/apache/storm/pull/2852#discussion_r221349119
  
--- Diff: 
storm-server/src/main/java/org/apache/storm/daemon/nimbus/Nimbus.java ---
@@ -2162,7 +2165,12 @@ private boolean isReadyForMKAssignments() throws 
Exception {
 }
 
 private void mkAssignments() throws Exception {
-mkAssignments(null);
+try {
+mkAssignments(null);
--- End diff --

I can change that.


---


[GitHub] storm issue #2852: STORM-3237 track Nimbus mkAssignment failures

2018-09-28 Thread agresch
Github user agresch commented on the issue:

https://github.com/apache/storm/pull/2852
  
@kishorvpatil - made the change you requested.


---


[GitHub] storm pull request #2852: STORM-3237 track Nimbus mkAssignment failures

2018-09-28 Thread agresch
Github user agresch commented on a diff in the pull request:

https://github.com/apache/storm/pull/2852#discussion_r221264813
  
--- Diff: 
storm-server/src/main/java/org/apache/storm/daemon/nimbus/Nimbus.java ---
@@ -2850,6 +2853,7 @@ public void launchServer() throws Exception {
 }
 doCleanup();
 } catch (Exception e) {
+
this.mkAssignmentsErrors.mark();
--- End diff --

My thought was any exception would be worth investigating in any case.  I 
would hope that these issues would be rare.  Let me know if you want it more 
narrowly applied just to mkAssignment.


---


[GitHub] storm pull request #2852: STORM-3237 track Nimbus mkAssignment failures

2018-09-27 Thread agresch
GitHub user agresch opened a pull request:

https://github.com/apache/storm/pull/2852

STORM-3237 track Nimbus mkAssignment failures



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/agresch/storm agresch_mkAssignments_fail

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/storm/pull/2852.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2852


commit eb7886e1c53682e4c42b3951bf99cf35050f3cfa
Author: Aaron Gresch 
Date:   2018-09-27T18:30:59Z

STORM-3237 track Nimbus mkAssignment failures




---


[GitHub] storm pull request #2851: STORM-3236 mark shutdown meters before stopping me...

2018-09-26 Thread agresch
GitHub user agresch opened a pull request:

https://github.com/apache/storm/pull/2851

STORM-3236 mark shutdown meters before stopping metric reporting



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/agresch/storm agresch_storm-3236

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/storm/pull/2851.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2851


commit 2b9f404cc29f1b46008cdf7ae0d2337dc905a47b
Author: Aaron Gresch 
Date:   2018-09-26T21:37:03Z

STORM-3236 mark shutdown meters before stopping metric reporting




---


[GitHub] storm pull request #2843: STORM-3230: Add in sync if key not found

2018-09-19 Thread agresch
Github user agresch commented on a diff in the pull request:

https://github.com/apache/storm/pull/2843#discussion_r218902320
  
--- Diff: 
storm-client/src/jvm/org/apache/storm/cluster/StormClusterStateImpl.java ---
@@ -903,6 +911,7 @@ public void removeAllPrivateWorkerKeys(String 
topologyId) {
 for (WorkerTokenServiceType type : 
WorkerTokenServiceType.values()) {
 String path = ClusterUtils.secretKeysPath(type, topologyId);
 try {
+LOG.debug("Removing worker keys under {}", path);
--- End diff --

If this does not hit often, I would vote for making this info.


---


[GitHub] storm pull request #2839: STORM-3228 allow refernce counting of differing Po...

2018-09-17 Thread agresch
GitHub user agresch opened a pull request:

https://github.com/apache/storm/pull/2839

STORM-3228 allow refernce counting of differing PortAndAssignment obj…

…ects to work properly

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/agresch/storm agresch_storm-3228

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/storm/pull/2839.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2839


commit 5c47b2d7c1f3628c892e16b4d0f2dc78beddbb28
Author: Aaron Gresch 
Date:   2018-09-17T17:58:31Z

STORM-3228 allow refernce counting of differing PortAndAssignment objects 
to work properly




---


[GitHub] storm issue #2789: STORM-3173: flush metrics to ScheduledReporter on shutdow...

2018-09-12 Thread agresch
Github user agresch commented on the issue:

https://github.com/apache/storm/pull/2789
  
what was the status of this? I think work @srdo was doing addressed 
STORM-3173 possibly?  Was that change committed?




---


[GitHub] storm pull request #2818: STORM-3210 fix user is not authorized error code

2018-09-04 Thread agresch
GitHub user agresch opened a pull request:

https://github.com/apache/storm/pull/2818

STORM-3210 fix user is not authorized error code



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/agresch/storm agresch_403

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/storm/pull/2818.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2818


commit c179ea7ea1d7619d2a377f9961a9d45d47c3fbfe
Author: Aaron Gresch 
Date:   2018-09-04T14:12:49Z

STORM-3210 fix user is not authorized error code




---


[GitHub] storm issue #2816: STORM-3208 fix worker kill NPE

2018-08-30 Thread agresch
Github user agresch commented on the issue:

https://github.com/apache/storm/pull/2816
  
@revans2 - added a warning.  


---


[GitHub] storm issue #2816: STORM-3208 fix worker kill NPE

2018-08-30 Thread agresch
Github user agresch commented on the issue:

https://github.com/apache/storm/pull/2816
  
@revans2 - this is a normal run of our integration test pipeline.  I don't 
have much context, just noticed this error in the logs and filed the JIRA.


---


[GitHub] storm issue #2816: STORM-3208 fix worker kill NPE

2018-08-30 Thread agresch
Github user agresch commented on the issue:

https://github.com/apache/storm/pull/2816
  
@danny0405 - I'm not really knowledgable about any of this particular code, 
but the LocalState API clearly allows returning null, rather than throwing any 
exception.  I'm just trying to cover the callstack I provided in the JIRA.  If 
we feel this shouldn't happen, maybe LocalState getApprovedWorkers() should 
throw an exception if there are no approved workers?


---


[GitHub] storm pull request #2816: STORM-3208 fix worker kill NPE

2018-08-29 Thread agresch
GitHub user agresch opened a pull request:

https://github.com/apache/storm/pull/2816

STORM-3208 fix worker kill NPE



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/agresch/storm agresch_storm-3208

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/storm/pull/2816.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2816


commit 91458a32cded87fb64dbb1b22db8059edd6af99d
Author: Aaron Gresch 
Date:   2018-08-29T17:42:55Z

STORM-3208 fix worker kill NPE




---


[GitHub] storm pull request #2810: STORM-3200 improve keytab error message

2018-08-21 Thread agresch
GitHub user agresch opened a pull request:

https://github.com/apache/storm/pull/2810

STORM-3200 improve keytab error message



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/agresch/storm agresch_blobstore_debug

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/storm/pull/2810.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2810


commit 04f75bfad131ac2be4450433231814f3eba25717
Author: Aaron Gresch 
Date:   2018-08-21T19:32:16Z

STORM-3200 improve keytab error message




---


[GitHub] storm pull request #2805: STORM-3197: Make StormMetricsRegistry non-static

2018-08-16 Thread agresch
Github user agresch commented on a diff in the pull request:

https://github.com/apache/storm/pull/2805#discussion_r210690260
  
--- Diff: pom.xml ---
@@ -280,7 +280,7 @@
 1.6.6
 2.8.2
 1.7.21
-3.1.0
+3.2.6
--- End diff --

Thanks.  Just like to have a trail for version changes and the requirements.


---


[GitHub] storm pull request #2805: STORM-3197: Make StormMetricsRegistry non-static

2018-08-16 Thread agresch
Github user agresch commented on a diff in the pull request:

https://github.com/apache/storm/pull/2805#discussion_r210680050
  
--- Diff: pom.xml ---
@@ -280,7 +280,7 @@
 1.6.6
 2.8.2
 1.7.21
-3.1.0
+3.2.6
--- End diff --

can you comment on the version change?


---


[GitHub] storm issue #2802: STORM-3194 reduce FIFOSchedulingPriorityStrategy logging ...

2018-08-15 Thread agresch
Github user agresch commented on the issue:

https://github.com/apache/storm/pull/2802
  
The JIRA is STORM-3194.  It's in the title.


---


[GitHub] storm pull request #2802: STORM-3194 reduce FIFOSchedulingPriorityStrategy l...

2018-08-14 Thread agresch
GitHub user agresch opened a pull request:

https://github.com/apache/storm/pull/2802

STORM-3194 reduce FIFOSchedulingPriorityStrategy logging level



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/agresch/storm agresch_fifologging

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/storm/pull/2802.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2802


commit cac56053b40c888cc7e448f2b525c7baea419205
Author: Aaron Gresch 
Date:   2018-08-14T18:06:47Z

STORM-3194 reduce FIFOSchedulingPriorityStrategy logging level




---


[GitHub] storm issue #2786: STORM-3168 prevent AsyncLocalizer cleanup from crashing

2018-08-02 Thread agresch
Github user agresch commented on the issue:

https://github.com/apache/storm/pull/2786
  
@srdo - updated to shutdown on errors.  Filed 
https://issues.apache.org/jira/browse/STORM-3174 to standardize exit codes.  
I'm not clear on why we're selecting the values we are.


---


[GitHub] storm issue #2786: STORM-3168 prevent AsyncLocalizer cleanup from crashing

2018-08-02 Thread agresch
Github user agresch commented on the issue:

https://github.com/apache/storm/pull/2786
  
If an error occurred, what should be the expected behavior?  Shutdown?  


---


[GitHub] storm pull request #2786: STORM-3168 prevent AsyncLocalizer cleanup from cra...

2018-08-01 Thread agresch
GitHub user agresch opened a pull request:

https://github.com/apache/storm/pull/2786

STORM-3168 prevent AsyncLocalizer cleanup from crashing

Just catching and logging any error that may be occurring.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/agresch/storm agresch_localizercleanup

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/storm/pull/2786.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2786


commit 15b7238368827562b9b9102836c5745c6a62bbb7
Author: Aaron Gresch 
Date:   2018-08-01T19:53:53Z

STORM-3168 prevent AsyncLocalizer cleanup from crashing




---


[GitHub] storm issue #2783: [WIP] Make StormMetricsRegistry a regular class rather th...

2018-07-31 Thread agresch
Github user agresch commented on the issue:

https://github.com/apache/storm/pull/2783
  
I like this.


---


[GitHub] storm pull request #2769: Fixed a potential file resource leak.

2018-07-24 Thread agresch
Github user agresch commented on a diff in the pull request:

https://github.com/apache/storm/pull/2769#discussion_r204755463
  
--- Diff: 
storm-server/src/main/java/org/apache/storm/utils/ServerUtils.java ---
@@ -635,16 +635,16 @@ public static void unZip(File inFile, File toDir) 
throws IOException {
  *
  * @throws IOException
  */
+//Used only by logviewer, considering moving to web-app?
--- End diff --

even if used only in one place, if it is a general purpose function, I 
think it makes sense to have in a global place.  I'd remove this comment.


---


[GitHub] storm pull request #2772: STORM-3158 improve logging on login failure when k...

2018-07-20 Thread agresch
GitHub user agresch opened a pull request:

https://github.com/apache/storm/pull/2772

STORM-3158 improve logging on login failure when kerberos is misconfi…

…gured

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/agresch/storm agresch_kerblog

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/storm/pull/2772.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2772


commit de59923e37911885975fc097d4313fae14ef5f37
Author: Aaron Gresch 
Date:   2018-07-20T21:02:43Z

STORM-3158 improve logging on login failure when kerberos is misconfigured




---


[GitHub] storm issue #2618: [STORM-2905] Fix KeyNotFoundException when kill a storm a...

2018-07-18 Thread agresch
Github user agresch commented on the issue:

https://github.com/apache/storm/pull/2618
  
Couple of comments back to @revans2 from Apr5.

1) We don't delete the blobs on the nimbus side for a while after we kill 
the topology. - Would we also prevent the user from doing so in this case?

2) We don't output the stack trace until it has failed some number of times 
in a row.   - This may be fine for the supervisor, but I see this exception 
trace polluting the nimbus log non-stop.  Would you prevent this as well?

2018-06-25 21:49:03.928 o.a.s.d.n.Nimbus pool-37-thread-483 [WARN] get blob 
meta exception.
org.apache.storm.utils.WrappedKeyNotFoundException: 
run-1-1529962766-stormjar.jar
at 
org.apache.storm.blobstore.LocalFsBlobStore.getStoredBlobMeta(LocalFsBlobStore.java:256)
 ~[storm-server-2.0.0.y.jar:2.0.0.y]
at 
org.apache.storm.blobstore.LocalFsBlobStore.getBlobMeta(LocalFsBlobStore.java:286)
 ~[storm-server-2.0.0.y.jar:2.0.0.y]
at 
org.apache.storm.daemon.nimbus.Nimbus.getBlobMeta(Nimbus.java:3498) 
[storm-server-2.0.0.y.jar:2.0.0.y]
at 
org.apache.storm.generated.Nimbus$Processor$getBlobMeta.getResult(Nimbus.java:4014)
 [storm-client-2.0.0.y.jar:2.0.0.y]
at 
org.apache.storm.generated.Nimbus$Processor$getBlobMeta.getResult(Nimbus.java:3993)
 [storm-client-2.0.0.y.jar:2.0.0.y]
at 
org.apache.storm.thrift.ProcessFunction.process(ProcessFunction.java:38) 
[shaded-deps-2.0.0.y.jar:2.0.0.y]
at 
org.apache.storm.thrift.TBaseProcessor.process(TBaseProcessor.java:39) 
[shaded-deps-2.0.0.y.jar:2.0.0.y]
at 
org.apache.storm.security.auth.sasl.SaslTransportPlugin$TUGIWrapProcessor.process(SaslTransportPlugin.java:147)
 [storm-client-2.0.0.y.jar:2.0.0.y]
at 
org.apache.storm.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:291)
 [shaded-deps-2.0.0.y.jar:2.0.0.y]
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) 
[?:1.8.0_131]
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) 
[?:1.8.0_131]
at java.lang.Thread.run(Thread.java:748) [?:1.8.0_131]



---


[GitHub] storm issue #2618: [STORM-2905] Fix KeyNotFoundException when kill a storm a...

2018-07-18 Thread agresch
Github user agresch commented on the issue:

https://github.com/apache/storm/pull/2618
  
Just curious what the plan is fo this PR? 


---


[GitHub] storm pull request #2747: STORM-3134: Improve upload-creds user experience

2018-07-05 Thread agresch
Github user agresch commented on a diff in the pull request:

https://github.com/apache/storm/pull/2747#discussion_r200417650
  
--- Diff: storm-core/src/jvm/org/apache/storm/command/AdminCommands.java ---
@@ -109,6 +133,7 @@ public void printCliHelp(String command, PrintStream 
out) {
 static {
 COMMANDS.put("remove_corrupt_topologies", new 
RemoveCorruptTopologies());
 COMMANDS.put("zk_cli", new ZkCli());
+COMMANDS.put("creds", new CredentialsDebug());
--- End diff --

is there documentation that should be updated for these commands?


---


[GitHub] storm pull request #2747: STORM-3134: Improve upload-creds user experience

2018-07-02 Thread agresch
Github user agresch commented on a diff in the pull request:

https://github.com/apache/storm/pull/2747#discussion_r199493660
  
--- Diff: 
storm-core/src/jvm/org/apache/storm/command/UploadCredentials.java ---
@@ -52,7 +61,37 @@ public static void main(String[] args) throws Exception {
 credentialsMap.put(rawCredentials.get(i), 
rawCredentials.get(i + 1));
 }
 }
-StormSubmitter.pushCredentials(topologyName, new HashMap<>(), 
credentialsMap);
+
+Map topologyConf = new HashMap<>();
+//Try to get the topology conf from nimbus, so we can reuse it.
+try (NimbusClient nc = NimbusClient.getConfiguredClient(new 
HashMap<>())) {
+Nimbus.Iface client = nc.getClient();
+ClusterSummary summary = client.getClusterInfo();
+for (TopologySummary topo : summary.get_topologies()) {
+if (topologyName.equals(topo.get_name())) {
+//We found the topology, lets get the conf
+String topologyId = topo.get_id();
+topologyConf = (Map) 
JSONValue.parse(client.getTopologyConf(topologyId));
+LOG.info("Using topology conf from {} as basis for 
getting new creds", topologyId);
+
+Map commandLine = 
Utils.readCommandLineOpts();
+List clCreds = 
(List)commandLine.get(Config.TOPOLOGY_AUTO_CREDENTIALS);
+List topoCreds = 
(List)topologyConf.get(Config.TOPOLOGY_AUTO_CREDENTIALS);
+
+if (clCreds != null) {
+Set extra = new HashSet<>(clCreds);
+if (topoCreds != null) {
+extra.removeAll(topoCreds);
+}
+if (!extra.isEmpty()) {
--- End diff --

This is great.  But shouldn't we also check for missing credentials?


---


[GitHub] storm pull request #2741: STORM-3124 reconnect to pacemaker on failure

2018-06-29 Thread agresch
Github user agresch commented on a diff in the pull request:

https://github.com/apache/storm/pull/2741#discussion_r199145547
  
--- Diff: 
storm-client/src/jvm/org/apache/storm/messaging/netty/KerberosSaslClientHandler.java
 ---
@@ -96,53 +96,69 @@ private void handleControlMessage(ChannelHandlerContext 
ctx, ControlMessage cont
 Channel channel = ctx.channel();
 KerberosSaslNettyClient saslNettyClient = 
getChannelSaslClient(channel);
 if (controlMessage == ControlMessage.SASL_COMPLETE_REQUEST) {
-LOG.debug("Server has sent us the SaslComplete message. 
Allowing normal work to proceed.");
+LOG.debug("Server has sent us the SaslComplete message. 
Allowing normal work to proceed.");
 
-if (!saslNettyClient.isComplete()) {
+if (!saslNettyClient.isComplete()) {
 String errorMessage =
 "Server returned a Sasl-complete message, but as far 
as we can tell, we are not authenticated yet.";
 LOG.error(errorMessage);
 throw new Exception(errorMessage);
-}
+}
 ctx.pipeline().remove(this);
 this.client.channelReady(channel);
 
 // We call fireChannelRead since the client is allowed to
-// perform this request. The client's request will now 
proceed
-// to the next pipeline component namely 
StormClientHandler.
+// perform this request. The client's request will now proceed
+// to the next pipeline component namely StormClientHandler.
 ctx.fireChannelRead(controlMessage);
-} else {
+} else {
 LOG.warn("Unexpected control message: {}", controlMessage);
-}
+}
 }
 
 private void handleSaslMessageToken(ChannelHandlerContext ctx, 
SaslMessageToken saslMessageToken) throws Exception {
 Channel channel = ctx.channel();
 KerberosSaslNettyClient saslNettyClient = 
getChannelSaslClient(channel);
-LOG.debug("Responding to server's token of length: {}",
-saslMessageToken.getSaslToken().length);
-
-// Generate SASL response (but we only actually send the 
response if
-// it's non-null.
-byte[] responseToServer = saslNettyClient
-.saslResponse(saslMessageToken);
-if (responseToServer == null) {
-// If we generate a null response, then authentication has 
completed
-// (if not, warn), and return without sending a response 
back to the
-// server.
-LOG.debug("Response to server is null: authentication 
should now be complete.");
-if (!saslNettyClient.isComplete()) {
-LOG.warn("Generated a null response, but 
authentication is not complete.");
-throw new Exception("Our reponse to the server is 
null, but as far as we can tell, we are not authenticated yet.");
-}
-this.client.channelReady(channel);
-} else {
-LOG.debug("Response to server token has length: {}",
-  responseToServer.length);
-// Construct a message containing the SASL response and send 
it to the
+LOG.debug("Responding to server's token of length: {}", 
saslMessageToken.getSaslToken().length);
+
+// Generate SASL response (but we only actually send the response 
if
+// it's non-null.
+byte[] responseToServer = 
saslNettyClient.saslResponse(saslMessageToken);
+if (responseToServer == null) {
+// If we generate a null response, then authentication has 
completed
+// (if not, warn), and return without sending a response back 
to the
 // server.
+LOG.debug("Response to server is null: authentication should 
now be complete.");
+if (!saslNettyClient.isComplete()) {
+LOG.warn("Generated a null response, but authentication is 
not complete.");
+throw new Exception("Our reponse to the server is null, 
but as far as we can tell, we are not authenticated yet.");
+}
+this.client.channelReady(channel);
+} else {
+LOG.debug("Response to server token has length: {}",
+  responseToServer.length);
+// Construct a message containing the SASL response and send 
it to the 

[GitHub] storm issue #2741: STORM-3124 reconnect to pacemaker on failure

2018-06-27 Thread agresch
Github user agresch commented on the issue:

https://github.com/apache/storm/pull/2741
  
@revans2 - made the changes you wanted.


---


[GitHub] storm pull request #2741: STORM-3124 reconnect to pacemaker on failure

2018-06-27 Thread agresch
GitHub user agresch opened a pull request:

https://github.com/apache/storm/pull/2741

STORM-3124 reconnect to pacemaker on failure



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/agresch/storm agresch_pacemaker_connect

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/storm/pull/2741.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2741


commit 8cd3aa25fd8463a161b684e2674b653aa344efbe
Author: Aaron Gresch 
Date:   2018-06-27T19:57:28Z

STORM-3124 reconnect to pacemaker on failure




---


[GitHub] storm issue #2732: STORM-3117 prevent deleting blobs while topologies still ...

2018-06-25 Thread agresch
Github user agresch commented on the issue:

https://github.com/apache/storm/pull/2732
  
@revans2 - just added it


---


[GitHub] storm issue #2732: STORM-3117 prevent deleting blobs while topologies still ...

2018-06-25 Thread agresch
Github user agresch commented on the issue:

https://github.com/apache/storm/pull/2732
  
@HeartSaVioR @revans2 - switched thrift exception to new 
IllegalStateException.



---


[GitHub] storm pull request #2732: STORM-3117 prevent deleting blobs while topologies...

2018-06-25 Thread agresch
Github user agresch commented on a diff in the pull request:

https://github.com/apache/storm/pull/2732#discussion_r197832970
  
--- Diff: 
storm-server/src/main/java/org/apache/storm/daemon/nimbus/Nimbus.java ---
@@ -3557,6 +3570,14 @@ public ByteBuffer downloadBlobChunk(String session) 
throws AuthorizationExceptio
 @Override
 public void deleteBlob(String key) throws AuthorizationException, 
KeyNotFoundException, TException {
 try {
+String topoName = ConfigUtils.getIdFromBlobKey(key);
+if (topoName != null) {
+if (isTopologyActiveOrActivating(stormClusterState, 
topoName)) {
+String message = "Attempting to delete blob " + key + 
" from under active topology " + topoName;
+LOG.warn(message);
+throw new TException(message);
--- End diff --

Makes sense to me.  I prefer this to an AuthorizationException.


---


[GitHub] storm pull request #2732: STORM-3117 prevent deleting blobs while topologies...

2018-06-25 Thread agresch
Github user agresch commented on a diff in the pull request:

https://github.com/apache/storm/pull/2732#discussion_r197790334
  
--- Diff: 
storm-server/src/main/java/org/apache/storm/daemon/nimbus/Nimbus.java ---
@@ -2672,7 +2680,12 @@ private ClusterSummary getClusterInfoImpl() throws 
Exception {
 
summary.set_assigned_memoffheap(resources.getAssignedMemOffHeap());
 summary.set_assigned_cpu(resources.getAssignedCpu());
 }
-
summary.set_replication_count(getBlobReplicationCount(ConfigUtils.masterStormCodeKey(topoId)));
+try {
+
summary.set_replication_count(getBlobReplicationCount(ConfigUtils.masterStormCodeKey(topoId)));
+} catch (KeyNotFoundException e) {
+// This could fail if a blob gets deleted by mistake.  
Don't crash nimbus.
+LOG.error("Unable to find blob entry", e);
--- End diff --

@HeartSaVioR  - This change I made first in isolation (when still allowing 
the blob to be deleted).  This prevents nimbus from crashing, but a nimbus 
restart will still have issues.  With the other change, this line may no longer 
be necessary, but I would rather be defensive.  It should not hit unless we 
have further race conditions the other code check is missing.


---


[GitHub] storm pull request #2732: STORM-3117 prevent deleting blobs while topologies...

2018-06-21 Thread agresch
Github user agresch commented on a diff in the pull request:

https://github.com/apache/storm/pull/2732#discussion_r197264602
  
--- Diff: 
storm-server/src/main/java/org/apache/storm/daemon/nimbus/Nimbus.java ---
@@ -3557,6 +3570,14 @@ public ByteBuffer downloadBlobChunk(String session) 
throws AuthorizationExceptio
 @Override
 public void deleteBlob(String key) throws AuthorizationException, 
KeyNotFoundException, TException {
 try {
+String topoName = ConfigUtils.getIdFromBlobKey(key);
+if (topoName != null) {
+if (isTopologyActiveOrActivating(stormClusterState, 
topoName)) {
+String message = "Attempting to delete blob " + key + 
" from under active topology " + topoName;
+LOG.warn(message);
+throw new TException(message);
--- End diff --

I thought of that and wondered if that would be confusing to the users 
since it is a temporary exception.  Willing to change it though.


---


[GitHub] storm pull request #2732: STORM-3117 prevent deleting blobs while topologies...

2018-06-21 Thread agresch
GitHub user agresch opened a pull request:

https://github.com/apache/storm/pull/2732

STORM-3117 prevent deleting blobs while topologies still active

Prevented crashing nimbus if blob is deleted during a getClusterInfo call.  
However, once nimbus is restarted, if there was an active topology with a 
missing blob, nimbus will be stuck validatingAcls and not be able to schedule.

To get around this, I prevented deleting blobs unless a topology is not 
active.  As a bonus, this should also fix race conditions we have seen between 
the scheduling loop and a kill topology occurring with the cleanup thread 
deleting the blobs.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/agresch/storm agresch_blob_delete

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/storm/pull/2732.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2732


commit ab25cd38fe9746c1fbf7b6533aaebfec0ab9816d
Author: Aaron Gresch 
Date:   2018-06-21T19:56:19Z

STORM-3117 prevent deleting blobs while topologies still active




---


[GitHub] storm pull request #2731: STORM-3118 fix netty buffer usage

2018-06-21 Thread agresch
GitHub user agresch opened a pull request:

https://github.com/apache/storm/pull/2731

STORM-3118 fix netty buffer usage



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/agresch/storm agresch_pacemaker_netty

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/storm/pull/2731.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2731


commit 79ce734e4d74393d1811149158782702eaab957e
Author: Aaron Gresch 
Date:   2018-06-21T18:45:16Z

STORM-3118 fix netty buffer usage




---


[GitHub] storm issue #2718: STORM-3103 allow nimbus to shutdown properly

2018-06-15 Thread agresch
Github user agresch commented on the issue:

https://github.com/apache/storm/pull/2718
  
If it helps, here is the callstack from our original bug report:

2018-05-24 09:27:05.636 o.a.s.d.n.Nimbus main [INFO] Starting nimbus server 
for storm version '2.0.0.y'
2018-05-24 09:27:06.012 o.a.s.d.n.Nimbus timer [ERROR] Error while 
processing event
java.lang.RuntimeException: java.lang.NullPointerException
at 
org.apache.storm.daemon.nimbus.Nimbus.lambda$launchServer$37(Nimbus.java:2685) 
~[storm-server-2.0.0.y.jar:2.0.0.y]
at org.apache.storm.StormTimer$1.run(StormTimer.java:111) 
~[storm-client-2.0.0.y.jar:2.0.0.y]
at 
org.apache.storm.StormTimer$StormTimerTask.run(StormTimer.java:227) 
~[storm-client-2.0.0.y.jar:2.0.0.y]
Caused by: java.lang.NullPointerException
at 
org.apache.storm.daemon.nimbus.Nimbus.readAllSupervisorDetails(Nimbus.java:1814)
 ~[storm-server-2.0.0.y.jar:2.0.0.y]
at 
org.apache.storm.daemon.nimbus.Nimbus.computeNewSchedulerAssignments(Nimbus.java:1906)
 ~[storm-server-2.0.0.y.jar:2.0.0.y]
at 
org.apache.storm.daemon.nimbus.Nimbus.mkAssignments(Nimbus.java:2057) 
~[storm-server-2.0.0.y.jar:2.0.0.y]
at 
org.apache.storm.daemon.nimbus.Nimbus.mkAssignments(Nimbus.java:2003) 
~[storm-server-2.0.0.y.jar:2.0.0.y]
at 
org.apache.storm.daemon.nimbus.Nimbus.lambda$launchServer$37(Nimbus.java:2681) 
~[storm-server-2.0.0.y.jar:2.0.0.y]
... 2 more
2018-05-24 09:27:06.023 o.a.s.u.Utils timer [ERROR] Halting process: Error 
while processing event
java.lang.RuntimeException: Halting process: Error while processing event
at org.apache.storm.utils.Utils.exitProcess(Utils.java:469) 
~[storm-client-2.0.0.y.jar:2.0.0.y]
at 
org.apache.storm.daemon.nimbus.Nimbus.lambda$new$17(Nimbus.java:484) 
~[storm-server-2.0.0.y.jar:2.0.0.y]
at 
org.apache.storm.StormTimer$StormTimerTask.run(StormTimer.java:252) 
~[storm-client-2.0.0.y.jar:2.0.0.y]
2018-05-24 09:27:06.032 o.a.s.d.n.Nimbus Thread-12 [INFO] Shutting down 
master
2018-05-24 09:27:06.032 o.a.s.u.Utils Thread-13 [INFO] Halting after 10 
seconds

No further shutdown was processed.  It's entirely reproducible by forcing 
an NPE from the timer.



---


[GitHub] storm issue #2718: STORM-3103 allow nimbus to shutdown properly

2018-06-15 Thread agresch
Github user agresch commented on the issue:

https://github.com/apache/storm/pull/2718
  
@danny0405 - The case I saw was the Timer throwing a null pointer exception 
from mkAssignments() - fixed in https://github.com/apache/storm/pull/2693.  
This causes the onKill callback to be called and locks up the timer thread.


---


[GitHub] storm issue #2718: STORM-3103 allow nimbus to shutdown properly

2018-06-14 Thread agresch
Github user agresch commented on the issue:

https://github.com/apache/storm/pull/2718
  
It seems not to, as Nimbus was able to continue the shutdown process 
cleanly.


---


[GitHub] storm pull request #2714: STORM-3101: Add filter to StormMetricsRegistry.

2018-06-13 Thread agresch
Github user agresch commented on a diff in the pull request:

https://github.com/apache/storm/pull/2714#discussion_r195243886
  
--- Diff: 
storm-server/src/main/java/org/apache/storm/metric/StormMetricsRegistry.java ---
@@ -73,6 +130,12 @@ private static void 
startMetricsReporter(PreparableReporter reporter, Map T register(String name, T metric) {
+if (source != null && !name.startsWith(source)) {
--- End diff --

I think it would be better to register with a Daemon type enum and metric 
name.  This would allows metrics that are daemon specific and general to work 
for a given daemon.


---


[GitHub] storm pull request #2718: STORM-3103 allow nimbus to shutdown properly

2018-06-13 Thread agresch
GitHub user agresch opened a pull request:

https://github.com/apache/storm/pull/2718

STORM-3103 allow nimbus to shutdown properly

Nimbus registers exitProcess() for the StormTimer.onKill routine.  This 
does not return once called.  This basically locks up the timer with active 
being set.

Then when Nimbus shuts down, it calls timer.close().  Since the timer is 
stuck in the onKill routine forever with active set on, timer close is 
deadlocked waiting for the join() to occur here:


https://github.com/apache/storm/blob/master/storm-client/src/jvm/org/apache/storm/StormTimer.java#L173

The proposed fix is to just set active to false before calling the onKill() 
for the timer.



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/agresch/storm agresch_timer

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/storm/pull/2718.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2718


commit 5c5d071305785fb322413b0a39d3fb6d19a83a9d
Author: Aaron Gresch 
Date:   2018-06-13T20:31:15Z

STORM-3103 allow nimbus to shutdown properly




---


[GitHub] storm pull request #2705: STORM-3096 prevent race condition during topology ...

2018-06-05 Thread agresch
GitHub user agresch opened a pull request:

https://github.com/apache/storm/pull/2705

STORM-3096 prevent race condition during topology submission

My previous attempt at fixing this addressed the race in 
store.storedTopoIds().  But state.idsOfTopologiesWithPrivateWorkerKeys() will 
also return a topology as ready to clean up while the topology is being 
submitted.

The fix is to delay deletion on all topologies found for all the state and 
store checks.

I was able to reproduce the issue and validate the fix properly this time 
by forcing a Nimbus cleanup to be called as part of topology submission.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/agresch/storm agresch_blob2

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/storm/pull/2705.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2705


commit 8f60cefe4bd57f4d143f47e1b8862765abaad445
Author: Aaron Gresch 
Date:   2018-06-05T20:58:29Z

STORM-3096 prevent race condition during topology submission




---


[GitHub] storm issue #2701: STORM-3091 don't allow workers to create heartbeat direct...

2018-06-05 Thread agresch
Github user agresch commented on the issue:

https://github.com/apache/storm/pull/2701
  
@revans2 - fixed the javadoc


---


[GitHub] storm pull request #2701: STORM-3091 don't allow workers to create heartbeat...

2018-06-04 Thread agresch
GitHub user agresch opened a pull request:

https://github.com/apache/storm/pull/2701

STORM-3091 don't allow workers to create heartbeat directory



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/agresch/storm agresch_storm-3091

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/storm/pull/2701.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2701


commit dbae31d8507aea5992d2e669da1de43d66442075
Author: Aaron Gresch 
Date:   2018-06-04T14:16:10Z

STORM-3091 don't allow workers to create heartbeat directory




---


[GitHub] storm issue #2693: STORM-3084 fix Nimbus NPE

2018-05-30 Thread agresch
Github user agresch commented on the issue:

https://github.com/apache/storm/pull/2693
  
Can we get this merged?  Thanks.


---


[GitHub] storm issue #2686: STORM-3053 prevent race deleting blobs before topologies ...

2018-05-30 Thread agresch
Github user agresch commented on the issue:

https://github.com/apache/storm/pull/2686
  
@srdo - made the changes you requested.


---


[GitHub] storm pull request #2693: STORM-3084 fix Nimbus NPE

2018-05-24 Thread agresch
GitHub user agresch opened a pull request:

https://github.com/apache/storm/pull/2693

STORM-3084 fix Nimbus NPE



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/agresch/storm agresch_storm3084

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/storm/pull/2693.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2693


commit fe388df9b3e9649fbdf9f2e57206ee655212bcc6
Author: Aaron Gresch 
Date:   2018-05-24T21:39:07Z

STORM-3084 fix Nimbus NPE




---


[GitHub] storm pull request #2686: STORM-3053 prevent race deleting blobs before topo...

2018-05-21 Thread agresch
GitHub user agresch opened a pull request:

https://github.com/apache/storm/pull/2686

STORM-3053 prevent race deleting blobs before topologies can be submitted

When submitting a toplogy, the jar first gets uploaded, and then the 
topology is submitted.  Between these two events, a nimbus timer can go off and 
it finds that the jar belongs to a topology that does not exist, and removes 
it, causing topology submission to fail.

This change logs when a blob is first detected for deletion.  It then waits 
a period of time (defaulting to 5 minutes) before considering it idle.




You can merge this pull request into a Git repository by running:

$ git pull https://github.com/agresch/storm agresch_storm3053

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/storm/pull/2686.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2686


commit 99f47570750266c12d31eaa11ef912e92575b095
Author: Aaron Gresch 
Date:   2018-05-21T19:52:58Z

STORM-3053 prevent race deleting blobstores before topologies can be 
submitted




---


[GitHub] storm issue #2659: STORM-3042 restore topology.acker.cpu.pcore.percent

2018-05-21 Thread agresch
Github user agresch commented on the issue:

https://github.com/apache/storm/pull/2659
  
@HeartSaVioR - rebased.


---


[GitHub] storm pull request #2684: STORM-3079 add getMessage() support for thrift exc...

2018-05-17 Thread agresch
Github user agresch commented on a diff in the pull request:

https://github.com/apache/storm/pull/2684#discussion_r189079432
  
--- Diff: 
storm-client/src/jvm/org/apache/storm/utils/WrappedAuthorizationException.java 
---
@@ -0,0 +1,17 @@
+package org.apache.storm.utils;
--- End diff --

Fixed


---


[GitHub] storm pull request #2684: STORM-3079 add getMessage() support for thrift exc...

2018-05-17 Thread agresch
GitHub user agresch opened a pull request:

https://github.com/apache/storm/pull/2684

STORM-3079 add getMessage() support for thrift exceptions



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/agresch/storm agresch_thriftexception

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/storm/pull/2684.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2684


commit 48451e5d6855635fa5237140f0c7b7c8a79ebd67
Author: Aaron Gresch 
Date:   2018-05-17T19:31:20Z

STORM-3079 add getMessage() support for thrift exceptions




---


[GitHub] storm issue #2677: STORM-3075 fix NPE

2018-05-17 Thread agresch
Github user agresch commented on the issue:

https://github.com/apache/storm/pull/2677
  
@Ethanlm - done.  Added a spelling fix.


---


[GitHub] storm issue #2677: STORM-3075 fix NPE

2018-05-15 Thread agresch
Github user agresch commented on the issue:

https://github.com/apache/storm/pull/2677
  
@Ethanlm - please take a look.  Thanks.


---


[GitHub] storm issue #2677: STORM-3075 fix NPE

2018-05-15 Thread agresch
Github user agresch commented on the issue:

https://github.com/apache/storm/pull/2677
  
@Ethanlm - nice catch.  Will fix the source of the problem instead


---


[GitHub] storm issue #2677: STORM-3075 fix NPE

2018-05-15 Thread agresch
Github user agresch commented on the issue:

https://github.com/apache/storm/pull/2677
  
@Ethanlm - I see tc being null whenever I start nimbus (at least with no 
topologies).


---


[GitHub] storm pull request #2677: STORM-3075 fix NPE

2018-05-15 Thread agresch
GitHub user agresch opened a pull request:

https://github.com/apache/storm/pull/2677

STORM-3075 fix NPE



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/agresch/storm agresch_storm-3075

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/storm/pull/2677.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2677


commit 6a3150edef5ea58ddd8218df91e2b93ad9f90cd3
Author: Aaron Gresch 
Date:   2018-05-15T20:06:53Z

STORM-3075 fix NPE




---


[GitHub] storm issue #2659: STORM-3042 restore topology.acker.cpu.pcore.percent

2018-05-04 Thread agresch
Github user agresch commented on the issue:

https://github.com/apache/storm/pull/2659
  
@Ethanlm - updated the issues you mentioned.


---


  1   2   >