[jira] [Resolved] (STORM-3875) ThroughputVsLatency does not run on JDK11 due to specified TOPOLOGY_WORKER_GC_CHILDOPTS
[ https://issues.apache.org/jira/browse/STORM-3875?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aaron Gresch resolved STORM-3875. - Fix Version/s: 2.5.0 Resolution: Fixed > ThroughputVsLatency does not run on JDK11 due to specified > TOPOLOGY_WORKER_GC_CHILDOPTS > --- > > Key: STORM-3875 > URL: https://issues.apache.org/jira/browse/STORM-3875 > Project: Apache Storm > Issue Type: Bug >Reporter: Aaron Gresch >Assignee: Aaron Gresch >Priority: Minor > Fix For: 2.5.0 > > Time Spent: 20m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (STORM-3875) ThroughputVsLatency does not run on JDK11 due to specified TOPOLOGY_WORKER_GC_CHILDOPTS
Aaron Gresch created STORM-3875: --- Summary: ThroughputVsLatency does not run on JDK11 due to specified TOPOLOGY_WORKER_GC_CHILDOPTS Key: STORM-3875 URL: https://issues.apache.org/jira/browse/STORM-3875 Project: Apache Storm Issue Type: Bug Reporter: Aaron Gresch Assignee: Aaron Gresch -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Closed] (STORM-3863) tirupathi trip from chennai
[ https://issues.apache.org/jira/browse/STORM-3863?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aaron Gresch closed STORM-3863. --- Resolution: Invalid > tirupathi trip from chennai > --- > > Key: STORM-3863 > URL: https://issues.apache.org/jira/browse/STORM-3863 > Project: Apache Storm > Issue Type: Bug > Components: trident >Affects Versions: 1.2.1 >Reporter: Padmavathi Travels >Priority: Trivial > Fix For: 2.2.0 -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Resolved] (STORM-3862) HdfsBlobStoreImpl should check permission after mkdirs
[ https://issues.apache.org/jira/browse/STORM-3862?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aaron Gresch resolved STORM-3862. - Fix Version/s: 2.5.0 Resolution: Fixed > HdfsBlobStoreImpl should check permission after mkdirs > -- > > Key: STORM-3862 > URL: https://issues.apache.org/jira/browse/STORM-3862 > Project: Apache Storm > Issue Type: Bug > Components: blobstore >Affects Versions: 2.4.0 >Reporter: Zhang Dongsheng >Priority: Major > Fix For: 2.5.0 > > Time Spent: 50m > Remaining Estimate: 0h > > HdfsBlobStoreImpl and HdfsBlobStoreFile create directories with 700 permission. Because settings such as umask can change the resulting mode, we need to check after mkdirs whether the permissions are set as expected. If not, we should set the correct permissions to ensure subsequent normal operation. -- This message was sent by Atlassian Jira (v8.20.7#820007)
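The check described in STORM-3862 can be illustrated with a small sketch. This is not the HdfsBlobStoreImpl code: it assumes a local POSIX filesystem rather than HDFS, and `ensure_dir_with_mode` is a hypothetical helper name. It only shows the general pattern of creating a directory, reading back its mode, and repairing it when umask has altered it.

```python
import os
import stat
import tempfile

EXPECTED_MODE = 0o700  # owner-only access, the mode named in the issue


def ensure_dir_with_mode(path, mode=EXPECTED_MODE):
    """Create `path`, then verify its permission bits and repair them if needed.

    Returns True if the mode had to be corrected after creation.
    (Hypothetical helper; a local-filesystem analogy, not the HDFS API.)
    """
    os.makedirs(path, mode=mode, exist_ok=True)
    # The requested mode is masked by the process umask, so read it back.
    actual = stat.S_IMODE(os.stat(path).st_mode)
    if actual == mode:
        return False
    os.chmod(path, mode)  # chmod is not affected by umask
    return True


if __name__ == "__main__":
    base = tempfile.mkdtemp()
    previous = os.umask(0o277)  # a restrictive umask that strips owner write/execute
    try:
        print(ensure_dir_with_mode(os.path.join(base, "blobs")))
    finally:
        os.umask(previous)
```

The explicit read-back is the point of the sketch: the mode passed to the create call is only a request, so the caller cannot assume it was honoured.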
[jira] [Resolved] (STORM-3861) Upgrade clojure-maven-plugin
[ https://issues.apache.org/jira/browse/STORM-3861?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aaron Gresch resolved STORM-3861. - Fix Version/s: 2.5.0 Resolution: Fixed > Upgrade clojure-maven-plugin > > > Key: STORM-3861 > URL: https://issues.apache.org/jira/browse/STORM-3861 > Project: Apache Storm > Issue Type: Improvement >Reporter: Aaron Gresch >Assignee: Aaron Gresch >Priority: Minor > Fix For: 2.5.0 > > Time Spent: 20m > Remaining Estimate: 0h > > I wasted a lot of time trying to figure out a build failure on a new > environment (on two separate occasions) due to the clojure plugin swallowing > an exception. I had submitted this improvement, which makes the errors > debuggable. It should be available now. > [https://github.com/talios/clojure-maven-plugin/pull/112] -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Created] (STORM-3861) Upgrade clojure-maven-plugin
Aaron Gresch created STORM-3861: --- Summary: Upgrade clojure-maven-plugin Key: STORM-3861 URL: https://issues.apache.org/jira/browse/STORM-3861 Project: Apache Storm Issue Type: Improvement Reporter: Aaron Gresch Assignee: Aaron Gresch I wasted a lot of time trying to figure out a build failure on a new environment (on two separate occasions) due to the clojure plugin swallowing an exception. I had submitted this improvement, which makes the errors debuggable. It should be available now. [https://github.com/talios/clojure-maven-plugin/pull/112] -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Created] (STORM-3838) prevent topology from overriding STORM_WORKERS_ARTIFACTS_DIR
Aaron Gresch created STORM-3838: --- Summary: prevent topology from overriding STORM_WORKERS_ARTIFACTS_DIR Key: STORM-3838 URL: https://issues.apache.org/jira/browse/STORM-3838 Project: Apache Storm Issue Type: Bug Reporter: Aaron Gresch Assignee: Aaron Gresch A user overrode this and EventLoggerBolt throws an exception, preventing workers from coming up. There should be no reason for a user to set this value. -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Closed] (STORM-3830) exclude all old log4j
[ https://issues.apache.org/jira/browse/STORM-3830?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aaron Gresch closed STORM-3830. --- Resolution: Duplicate > exclude all old log4j > - > > Key: STORM-3830 > URL: https://issues.apache.org/jira/browse/STORM-3830 > Project: Apache Storm > Issue Type: Improvement >Reporter: Aaron Gresch >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Created] (STORM-3835) Log when shell command exceptions occur
Aaron Gresch created STORM-3835: --- Summary: Log when shell command exceptions occur Key: STORM-3835 URL: https://issues.apache.org/jira/browse/STORM-3835 Project: Apache Storm Issue Type: Improvement Reporter: Aaron Gresch Assignee: Aaron Gresch When the numShellExceptions meter increments, it would be nice to see what command failed and what exception caused the problem. We saw this internally trigger when LDAP servers were having issues. Knowing the command would help narrow down the problem faster. -- This message was sent by Atlassian Jira (v8.20.1#820001)
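A minimal sketch of the logging requested in STORM-3835, under stated assumptions: `run_command` and the `METERS` counter are hypothetical stand-ins, not Storm's shell utilities or its numShellExceptions meter. The idea is simply that the failure path records both the command and the exception before re-raising.

```python
import logging
import subprocess
from collections import Counter
from typing import Sequence

LOG = logging.getLogger("shell_sketch")
# Stand-in for the numShellExceptions meter named in the issue (hypothetical).
METERS = Counter()


def run_command(cmd: Sequence[str]) -> str:
    """Run `cmd`; on failure, count it and log which command failed and why."""
    try:
        result = subprocess.run(list(cmd), check=True, capture_output=True, text=True)
        return result.stdout
    except (OSError, subprocess.SubprocessError) as exc:
        METERS["numShellExceptions"] += 1
        # Log the command and the exception together so the cause is visible.
        LOG.warning("Shell command %s failed: %r", list(cmd), exc)
        raise
```

Re-raising after logging keeps the caller's error handling unchanged, so the log line is purely additive.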
[jira] [Resolved] (STORM-3831) exclude all old log4j
[ https://issues.apache.org/jira/browse/STORM-3831?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aaron Gresch resolved STORM-3831. - Fix Version/s: 2.4.0 Resolution: Fixed > exclude all old log4j > - > > Key: STORM-3831 > URL: https://issues.apache.org/jira/browse/STORM-3831 > Project: Apache Storm > Issue Type: Improvement >Reporter: Aaron Gresch >Assignee: Aaron Gresch >Priority: Major > Fix For: 2.4.0 > > Time Spent: 1h 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Resolved] (STORM-3828) upgrade org/glassfish/javax.el due to build problems
[ https://issues.apache.org/jira/browse/STORM-3828?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aaron Gresch resolved STORM-3828. - Fix Version/s: 2.4.0 Resolution: Fixed > upgrade org/glassfish/javax.el due to build problems > > > Key: STORM-3828 > URL: https://issues.apache.org/jira/browse/STORM-3828 > Project: Apache Storm > Issue Type: Improvement >Reporter: PJ Fanning >Priority: Major > Fix For: 2.4.0 > > Time Spent: 50m > Remaining Estimate: 0h > > {code:java} > [ERROR] Failed to execute goal on project storm-autocreds: Could not resolve > dependencies for project org.apache.storm:storm-autocreds:jar:2.4.1-SNAPSHOT: > Failed to collect dependencies at org.apache.hbase:hbase-server:jar:2.1.3 -> > org.glassfish.web:javax.servlet.jsp:jar:2.3.2 -> > org.glassfish:javax.el:jar:3.0.1-b06-SNAPSHOT: Failed to read artifact > descriptor for org.glassfish:javax.el:jar:3.0.1-b06-SNAPSHOT: Failure to > transfer org.glassfish:javax.el:pom:3.0.1-b06-SNAPSHOT from > https://maven.java.net/content/repositories/snapshots was cached in the local > repository, resolution will not be reattempted until the update interval of > jvnet-nexus-snapshots has elapsed or updates are forced. Original error: > Could not transfer artifact org.glassfish:javax.el:pom:3.0.1-b06-SNAPSHOT > from/to jvnet-nexus-snapshots > (https://maven.java.net/content/repositories/snapshots): Transfer failed for > https://maven.java.net/content/repositories/snapshots/org/glassfish/javax.el/3.0.1-b06-SNAPSHOT/javax.el-3.0.1-b06-SNAPSHOT.pom > -> [Help 1] {code} > [https://app.travis-ci.com/github/apache/storm/jobs/561567903] > Also seems like a bad idea to be relying on 3.0.1-b06-SNAPSHOT > Similar issue - https://issues.apache.org/jira/browse/JCR-4626 > My workaround is based on this -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Created] (STORM-3831) exclude all old log4j
Aaron Gresch created STORM-3831: --- Summary: exclude all old log4j Key: STORM-3831 URL: https://issues.apache.org/jira/browse/STORM-3831 Project: Apache Storm Issue Type: Improvement Reporter: Aaron Gresch Assignee: Aaron Gresch -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Created] (STORM-3830) exclude all old log4j
Aaron Gresch created STORM-3830: --- Summary: exclude all old log4j Key: STORM-3830 URL: https://issues.apache.org/jira/browse/STORM-3830 Project: Apache Storm Issue Type: Improvement Reporter: Aaron Gresch -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Resolved] (STORM-3821) use commons-compress 1.21 due to security issues
[ https://issues.apache.org/jira/browse/STORM-3821?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aaron Gresch resolved STORM-3821. - Fix Version/s: 2.4.0 Resolution: Fixed > use commons-compress 1.21 due to security issues > > > Key: STORM-3821 > URL: https://issues.apache.org/jira/browse/STORM-3821 > Project: Apache Storm > Issue Type: Dependency upgrade >Reporter: PJ Fanning >Priority: Major > Fix For: 2.4.0 > > Time Spent: 20m > Remaining Estimate: 0h > > Part of https://issues.apache.org/jira/browse/STORM-3592 > See vulnerabilities in > https://mvnrepository.com/artifact/org.apache.commons/commons-compress/1.18 -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Resolved] (STORM-3824) upgrade httpclient due to security issues
[ https://issues.apache.org/jira/browse/STORM-3824?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aaron Gresch resolved STORM-3824. - Fix Version/s: 2.4.0 Resolution: Fixed > upgrade httpclient due to security issues > - > > Key: STORM-3824 > URL: https://issues.apache.org/jira/browse/STORM-3824 > Project: Apache Storm > Issue Type: Improvement >Reporter: PJ Fanning >Priority: Major > Fix For: 2.4.0 > > Time Spent: 20m > Remaining Estimate: 0h > > Relates to https://issues.apache.org/jira/browse/STORM-3592 -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Resolved] (STORM-3817) Upgrading to Zookeeper 3.5.x, 3.6.x or 3.7.x
[ https://issues.apache.org/jira/browse/STORM-3817?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aaron Gresch resolved STORM-3817. - Fix Version/s: 2.4.0 Resolution: Fixed > Upgrading to Zookeeper 3.5.x, 3.6.x or 3.7.x > > > Key: STORM-3817 > URL: https://issues.apache.org/jira/browse/STORM-3817 > Project: Apache Storm > Issue Type: Dependency upgrade >Affects Versions: 2.3.0, 2.2.1 >Reporter: Richard Zowalla >Priority: Major > Fix For: 2.4.0 > > Time Spent: 1h 50m > Remaining Estimate: 0h > > Is there any possibility to upgrade the [shaded zookeeper version > |https://github.com/apache/storm/blob/master/storm-shaded-deps/pom.xml#L64] > from 3.4.14 to a newer version? Or are there any reasons for not doing an > upgrade right now? > I am doing some testing with Storm in a Java 17 environment and it looks like > I am suffering from this Zookeeper specific issue present in 3.4.14: > https://issues.apache.org/jira/browse/ZOOKEEPER-3779 > If necessary I can also provide a PR for an upgrade to 3.5.x, 3.6.x or 3.7.x > UPDATE: Looks like curator depends on 3.5.x - so probably 3.5.x should be an > option. -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Resolved] (STORM-3815) allow option to disable sending of __send-iconnection metrics
[ https://issues.apache.org/jira/browse/STORM-3815?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aaron Gresch resolved STORM-3815. - Fix Version/s: 2.4.0 Resolution: Fixed > allow option to disable sending of __send-iconnection metrics > - > > Key: STORM-3815 > URL: https://issues.apache.org/jira/browse/STORM-3815 > Project: Apache Storm > Issue Type: Improvement >Reporter: Aaron Gresch >Assignee: Aaron Gresch >Priority: Minor > Fix For: 2.4.0 > > Time Spent: 0.5h > Remaining Estimate: 0h > > The __send-iconnection metrics can be substantial for large topologies, and users may not care about them. Add an option to disable their reporting. -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Created] (STORM-3815) allow option to disable sending of __send-iconnection metrics
Aaron Gresch created STORM-3815: --- Summary: allow option to disable sending of __send-iconnection metrics Key: STORM-3815 URL: https://issues.apache.org/jira/browse/STORM-3815 Project: Apache Storm Issue Type: Improvement Reporter: Aaron Gresch Assignee: Aaron Gresch The __send-iconnection metrics can be substantial for large topologies, and users may not care about them. Add an option to disable their reporting. -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Created] (STORM-3811) update log4j
Aaron Gresch created STORM-3811: --- Summary: update log4j Key: STORM-3811 URL: https://issues.apache.org/jira/browse/STORM-3811 Project: Apache Storm Issue Type: Improvement Reporter: Aaron Gresch -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Resolved] (STORM-3804) Don't allow deleting blobs if they are required for an active topology
[ https://issues.apache.org/jira/browse/STORM-3804?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aaron Gresch resolved STORM-3804. - Fix Version/s: 2.4.0 Resolution: Fixed > Don't allow deleting blobs if they are required for an active topology > -- > > Key: STORM-3804 > URL: https://issues.apache.org/jira/browse/STORM-3804 > Project: Apache Storm > Issue Type: Improvement > Components: blobstore >Reporter: Nikhil Singh >Assignee: Nikhil Singh >Priority: Minor > Fix For: 2.4.0 > > Time Spent: 1h > Remaining Estimate: 0h > > Don't allow deleting blobs if they are required for an active topology. Throw > an exception. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (STORM-3802) Allow adding metrics reporters to all topologies
[ https://issues.apache.org/jira/browse/STORM-3802?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aaron Gresch resolved STORM-3802. - Fix Version/s: 2.4.0 Resolution: Fixed > Allow adding metrics reporters to all topologies > > > Key: STORM-3802 > URL: https://issues.apache.org/jira/browse/STORM-3802 > Project: Apache Storm > Issue Type: Improvement >Reporter: Aaron Gresch >Assignee: Aaron Gresch >Priority: Minor > Fix For: 2.4.0 > > Time Spent: 40m > Remaining Estimate: 0h > > We would like to be able to track some topology-specific metrics for all > topologies, regardless of how a topology configures their metrics reporters. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (STORM-3802) Allow adding metrics reporters to all topologies
Aaron Gresch created STORM-3802: --- Summary: Allow adding metrics reporters to all topologies Key: STORM-3802 URL: https://issues.apache.org/jira/browse/STORM-3802 Project: Apache Storm Issue Type: Improvement Reporter: Aaron Gresch Assignee: Aaron Gresch We would like to be able to track some topology-specific metrics for all topologies, regardless of how a topology configures their metrics reporters. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (STORM-3801) newWorkerEvent doesn't report properly for multiple reporters
[ https://issues.apache.org/jira/browse/STORM-3801?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aaron Gresch resolved STORM-3801. - Fix Version/s: 2.4.0 Resolution: Fixed > newWorkerEvent doesn't report properly for multiple reporters > - > > Key: STORM-3801 > URL: https://issues.apache.org/jira/browse/STORM-3801 > Project: Apache Storm > Issue Type: Improvement >Reporter: Aaron Gresch >Assignee: Aaron Gresch >Priority: Minor > Fix For: 2.4.0 > > Time Spent: 40m > Remaining Estimate: 0h > > Add a get and reset functionality that works for multiple reporters. -- This message was sent by Atlassian Jira (v8.3.4#803005)
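One way to make get-and-reset safe for several reporters is sketched below. This is an assumption about the general technique, not the change made for STORM-3801: `PerReporterCounter` and its method names are hypothetical. Each reporter keeps its own offset into a running total, so one reporter reading the value does not clear it for the others.

```python
import threading


class PerReporterCounter:
    """Counter whose get-and-reset is tracked separately per reporter (hypothetical)."""

    def __init__(self):
        self._total = 0
        self._last_seen = {}  # reporter id -> total at that reporter's last read
        self._lock = threading.Lock()

    def inc(self, n=1):
        with self._lock:
            self._total += n

    def get_and_reset(self, reporter_id):
        """Return the events counted since this reporter last asked."""
        with self._lock:
            delta = self._total - self._last_seen.get(reporter_id, 0)
            self._last_seen[reporter_id] = self._total
            return delta
```

With a single shared reset, whichever reporter runs first would consume the event and the rest would see zero; per-reporter offsets avoid that.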
[jira] [Created] (STORM-3801) newWorkerEvent doesn't report properly for multiple reporters
Aaron Gresch created STORM-3801: --- Summary: newWorkerEvent doesn't report properly for multiple reporters Key: STORM-3801 URL: https://issues.apache.org/jira/browse/STORM-3801 Project: Apache Storm Issue Type: Improvement Reporter: Aaron Gresch Assignee: Aaron Gresch Add a get and reset functionality that works for multiple reporters. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (STORM-3793) Add metric to track backpressure status for a task
Aaron Gresch created STORM-3793: --- Summary: Add metric to track backpressure status for a task Key: STORM-3793 URL: https://issues.apache.org/jira/browse/STORM-3793 Project: Apache Storm Issue Type: Improvement Reporter: Aaron Gresch Assignee: Aaron Gresch -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (STORM-3791) update metric documentation
[ https://issues.apache.org/jira/browse/STORM-3791?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aaron Gresch resolved STORM-3791. - Fix Version/s: 2.3.0 Resolution: Fixed > update metric documentation > --- > > Key: STORM-3791 > URL: https://issues.apache.org/jira/browse/STORM-3791 > Project: Apache Storm > Issue Type: Improvement >Reporter: Aaron Gresch >Assignee: Aaron Gresch >Priority: Minor > Fix For: 2.3.0 > > Time Spent: 50m > Remaining Estimate: 0h > > Various changes have been made while moving metrics to V2. Make a sweep and update the documentation. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (STORM-3791) update metric documentation
Aaron Gresch created STORM-3791: --- Summary: update metric documentation Key: STORM-3791 URL: https://issues.apache.org/jira/browse/STORM-3791 Project: Apache Storm Issue Type: Improvement Reporter: Aaron Gresch Assignee: Aaron Gresch Various changes have been made while moving metrics to V2. Make a sweep and update the documentation. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (STORM-3790) Add meter to track failures WorkerTokenAuthorizer getPassword
[ https://issues.apache.org/jira/browse/STORM-3790?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aaron Gresch resolved STORM-3790. - Fix Version/s: 2.3.0 Resolution: Fixed > Add meter to track failures WorkerTokenAuthorizer getPassword > - > > Key: STORM-3790 > URL: https://issues.apache.org/jira/browse/STORM-3790 > Project: Apache Storm > Issue Type: Improvement >Reporter: Aaron Gresch >Assignee: Aaron Gresch >Priority: Minor > Fix For: 2.3.0 > > Time Spent: 0.5h > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (STORM-3790) Add meter to track failures WorkerTokenAuthorizer getPassword
Aaron Gresch created STORM-3790: --- Summary: Add meter to track failures WorkerTokenAuthorizer getPassword Key: STORM-3790 URL: https://issues.apache.org/jira/browse/STORM-3790 Project: Apache Storm Issue Type: Improvement Reporter: Aaron Gresch Assignee: Aaron Gresch -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (STORM-3786) V2 metrics tick may overreport or not report at all
[ https://issues.apache.org/jira/browse/STORM-3786?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aaron Gresch resolved STORM-3786. - Resolution: Fixed > V2 metrics tick may overreport or not report at all > --- > > Key: STORM-3786 > URL: https://issues.apache.org/jira/browse/STORM-3786 > Project: Apache Storm > Issue Type: Improvement >Reporter: Aaron Gresch >Assignee: Aaron Gresch >Priority: Minor > Time Spent: 50m > Remaining Estimate: 0h > > V2 metrics tick should report only at a specific interval. It also may not > be triggered if no v1 metrics exist. -- This message was sent by Atlassian Jira (v8.3.4#803005)
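The "report only at a specific interval" requirement in STORM-3786 can be sketched as a small gate that a tick handler consults. This is an illustration under assumptions, not Storm's implementation: `IntervalGate` is a hypothetical name, and time is passed in explicitly so the behaviour is easy to follow.

```python
class IntervalGate:
    """Allow a report at most once per interval, however often ticks arrive (hypothetical)."""

    def __init__(self, interval_secs, now):
        self._interval = interval_secs
        self._next_due = now + interval_secs

    def should_report(self, now):
        if now < self._next_due:
            return False  # tick arrived early: do not overreport
        # Skip past any missed intervals so a late tick triggers one report, not a burst.
        missed = (now - self._next_due) // self._interval + 1
        self._next_due += missed * self._interval
        return True


if __name__ == "__main__":
    gate = IntervalGate(60, now=0)
    for t in (30, 60, 61, 200, 239, 240):
        print(t, gate.should_report(t))
```

Because the gate depends only on elapsed time, it does not matter which metric source drives the tick, which also addresses the case where the tick is tied to v1 metrics existing.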
[jira] [Updated] (STORM-3786) V2 metrics tick may overreport or not report at all
[ https://issues.apache.org/jira/browse/STORM-3786?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aaron Gresch updated STORM-3786: Fix Version/s: 2.3.0 > V2 metrics tick may overreport or not report at all > --- > > Key: STORM-3786 > URL: https://issues.apache.org/jira/browse/STORM-3786 > Project: Apache Storm > Issue Type: Improvement >Reporter: Aaron Gresch >Assignee: Aaron Gresch >Priority: Minor > Fix For: 2.3.0 > > Time Spent: 50m > Remaining Estimate: 0h > > V2 metrics tick should report only at a specific interval. It also may not > be triggered if no v1 metrics exist. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (STORM-3784) my supervisor will shut down on 2:00 am everyday
[ https://issues.apache.org/jira/browse/STORM-3784?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17390102#comment-17390102 ] Aaron Gresch commented on STORM-3784: - Something is deleting the file /data/apache-storm-2.1.0/status/supervisor/stormdist/PradarLinkTopology-4-1626751925/stormconf.ser > my supervisor will shut down on 2:00 am everyday > > > Key: STORM-3784 > URL: https://issues.apache.org/jira/browse/STORM-3784 > Project: Apache Storm > Issue Type: Bug > Components: storm-server >Affects Versions: 2.1.0 > Environment: centos 7 x64 >Reporter: Sunsy Sun >Priority: Major > Attachments: supervisor(1).log > > > The cluster has one nimbus and two supervisors. One of the supervisors is co-located with nimbus. > I deployed two topologies, PradarLinkTopology and PradarLogTopology. > PradarLogTopology runs with 4 workers. PradarLinkTopology runs with 1 worker. > At 2:00 am every day, all supervisors shut down; I haven't found the reason. > I tried cleaning up the status directory, but the problem still exists.
> this is my supervisor.log
> {code:java}
> // code placeholder
> 2021-07-21 02:03:42.070 o.a.s.u.Utils Thread-17 [INFO] Worker Process dcae9231-4be4-4842-9ed0-988e1b8a2b28:Error occurred during initialization of VM
> 2021-07-21 02:03:42.070 o.a.s.u.Utils Thread-17 [INFO] Worker Process dcae9231-4be4-4842-9ed0-988e1b8a2b28:Error occurred during initialization of VM
> 2021-07-21 02:03:42.071 o.a.s.u.Utils Thread-17 [INFO] Worker Process dcae9231-4be4-4842-9ed0-988e1b8a2b28:java.lang.Error: Properties init: Could not determine current working directory.
> 2021-07-21 02:03:42.071 o.a.s.u.Utils Thread-17 [INFO] Worker Process dcae9231-4be4-4842-9ed0-988e1b8a2b28: at java.lang.System.initProperties(Native Method)
> 2021-07-21 02:03:42.071 o.a.s.u.Utils Thread-17 [INFO] Worker Process dcae9231-4be4-4842-9ed0-988e1b8a2b28: at java.lang.System.initializeSystemClass(System.java:1166)
> 2021-07-21 02:03:42.071 o.a.s.u.Utils Thread-17 [INFO] Worker Process dcae9231-4be4-4842-9ed0-988e1b8a2b28:
> 2021-07-21 02:03:42.323 o.a.s.d.s.BasicContainer SLOT_6702 [INFO] Removed Worker ID dcae9231-4be4-4842-9ed0-988e1b8a2b28
> 2021-07-21 02:03:42.329 o.a.s.d.s.Slot SLOT_6702 [INFO] STATE kill msInState: 68588 topo:PradarLogTopology-3-1626751922 worker:null -> empty msInState: 3
> 2021-07-21 02:03:42.329 o.a.s.d.s.Slot SLOT_6702 [INFO] SLOT 6702: Changing current assignment from LocalAssignment(topology_id:PradarLogTopology-3-1626751922, executors:[ExecutorInfo(task_start:4, task_end:4), ExecutorInfo(task_start:1, task_end:1)], resources:WorkerResources(mem_on_heap:256.0, mem_off_heap:0.0, cpu:20.0, shared_mem_on_heap:0.0, shared_mem_off_heap:0.0, resources:{offheap.memory.mb=0.0, onheap.memory.mb=256.0, cpu.pcore.percent=20.0}, shared_resources:{}), owner:root) to null
> 2021-07-21 02:03:42.353 o.a.s.d.s.Supervisor pool-10-thread-1 [WARN] Topology config is not localized yet...
> 2021-07-21 02:03:42.449 o.a.s.d.s.Slot SLOT_6700 [INFO] SLOT 6700 all processes are dead...
> 2021-07-21 02:03:42.449 o.a.s.d.s.Container SLOT_6700 [INFO] Cleaning up 8cbbfd6c-961b-482d-9175-cf9b79473808-172.26.137.86:b7963273-452a-43af-bc00-d814e0629f96
> 2021-07-21 02:03:42.450 o.a.s.d.s.Container SLOT_6700 [INFO] GET worker-user for b7963273-452a-43af-bc00-d814e0629f96
> 2021-07-21 02:03:42.450 o.a.s.d.s.AdvancedFSOps SLOT_6700 [INFO] Deleting path /data/apache-storm-2.1.0/status/workers/b7963273-452a-43af-bc00-d814e0629f96/pids/16326
> 2021-07-21 02:03:43.322 o.a.s.d.s.AdvancedFSOps SLOT_6701 [INFO] Deleting path /data/apache-storm-2.1.0/status/workers/26b5ffbd-08b6-46df-aa04-6b86f78b8ad8/pids
> 2021-07-21 02:03:43.322 o.a.s.d.s.AdvancedFSOps SLOT_6701 [INFO] Deleting path /data/apache-storm-2.1.0/status/workers/26b5ffbd-08b6-46df-aa04-6b86f78b8ad8/tmp
> 2021-07-21 02:03:45.209 o.a.s.d.s.BasicContainer Thread-17 [INFO] Worker Process dcae9231-4be4-4842-9ed0-988e1b8a2b28 exited with code: 1
> 2021-07-21 02:03:45.224 o.a.s.d.s.AdvancedFSOps SLOT_6701 [INFO] Deleting path /data/apache-storm-2.1.0/status/workers/26b5ffbd-08b6-46df-aa04-6b86f78b8ad8
> 2021-07-21 02:03:45.224 o.a.s.d.s.Supervisor pool-10-thread-7 [WARN] Topology config is not localized yet...
> 2021-07-21 02:03:45.224 o.a.s.d.s.Container SLOT_6701 [INFO] REMOVE worker-user 26b5ffbd-08b6-46df-aa04-6b86f78b8ad8
> 2021-07-21 02:03:45.224 o.a.s.d.s.AdvancedFSOps SLOT_6700 [INFO] Deleting path /data/apache-storm-2.1.0/status/workers/b7963273-452a-43af-bc00-d814e0629f96/heartbeats
> 2021-07-21 02:03:45.224 o.a.s.d.s.AdvancedFSOps SLOT_6701 [INFO] Deleting path /data/apache-storm-2.1.0/status/workers-users/26b5ffbd-08b6-46df-aa04-6b86f78b8ad8
> 2021-07-21 02:03:45.224 o.a.s.t.ProcessFunction
[jira] [Created] (STORM-3786) V2 metrics tick may overreport or not report at all
Aaron Gresch created STORM-3786: --- Summary: V2 metrics tick may overreport or not report at all Key: STORM-3786 URL: https://issues.apache.org/jira/browse/STORM-3786 Project: Apache Storm Issue Type: Improvement Reporter: Aaron Gresch Assignee: Aaron Gresch V2 metrics tick should report only at a specific interval. It also may not be triggered if no v1 metrics exist. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (STORM-3737) Share Worker Metric Registry For Guice AOP Based Metrics Integeration
[ https://issues.apache.org/jira/browse/STORM-3737?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aaron Gresch resolved STORM-3737. - Resolution: Fixed Thanks for the PR. > Share Worker Metric Registry For Guice AOP Based Metrics Integeration > - > > Key: STORM-3737 > URL: https://issues.apache.org/jira/browse/STORM-3737 > Project: Apache Storm > Issue Type: Improvement > Components: storm-client >Affects Versions: 2.1.0 >Reporter: Lakshman Sai >Priority: Minor > Fix For: 2.3.0 > > Original Estimate: 1h > Time Spent: 0.5h > Remaining Estimate: 0.5h > > The Metric Registry has been made private, which makes it harder to integrate with Guice based AOP metrics. > The proposed solution is to add the metric registry created in the worker to SharedMetricRegistries, so that Guice based AOP metrics can be initialized in a worker hook > [https://github.com/palominolabs/metrics-guice] > > PR: > https://github.com/apache/storm/pull/3373 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (STORM-3780) switch ErrorReportingMetrics to V2 API
[ https://issues.apache.org/jira/browse/STORM-3780?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aaron Gresch resolved STORM-3780. - Fix Version/s: 2.3.0 Resolution: Fixed > switch ErrorReportingMetrics to V2 API > -- > > Key: STORM-3780 > URL: https://issues.apache.org/jira/browse/STORM-3780 > Project: Apache Storm > Issue Type: Improvement >Reporter: Aaron Gresch >Assignee: Aaron Gresch >Priority: Minor > Fix For: 2.3.0 > > Time Spent: 40m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (STORM-3781) switch recv-iconnection to v2 metric api
Aaron Gresch created STORM-3781: --- Summary: switch recv-iconnection to v2 metric api Key: STORM-3781 URL: https://issues.apache.org/jira/browse/STORM-3781 Project: Apache Storm Issue Type: Improvement Reporter: Aaron Gresch Assignee: Aaron Gresch -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (STORM-3780) switch ErrorReportingMetrics to V2 API
Aaron Gresch created STORM-3780: --- Summary: switch ErrorReportingMetrics to V2 API Key: STORM-3780 URL: https://issues.apache.org/jira/browse/STORM-3780 Project: Apache Storm Issue Type: Improvement Reporter: Aaron Gresch Assignee: Aaron Gresch -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (STORM-3778) convert SpoutThrottlingMetrics to V2 API
[ https://issues.apache.org/jira/browse/STORM-3778?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aaron Gresch resolved STORM-3778. - Fix Version/s: 2.3.0 Resolution: Fixed > convert SpoutThrottlingMetrics to V2 API > > > Key: STORM-3778 > URL: https://issues.apache.org/jira/browse/STORM-3778 > Project: Apache Storm > Issue Type: Improvement >Reporter: Aaron Gresch >Assignee: Aaron Gresch >Priority: Minor > Fix For: 2.3.0 > > Time Spent: 40m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (STORM-3778) convert SpoutThrottlingMetrics to V2 API
Aaron Gresch created STORM-3778: --- Summary: convert SpoutThrottlingMetrics to V2 API Key: STORM-3778 URL: https://issues.apache.org/jira/browse/STORM-3778 Project: Apache Storm Issue Type: Improvement Reporter: Aaron Gresch Assignee: Aaron Gresch -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (STORM-3775) topology.blobstore.map can cause supervisor restarts
[ https://issues.apache.org/jira/browse/STORM-3775?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aaron Gresch resolved STORM-3775. - Fix Version/s: 2.3.0 Resolution: Fixed > topology.blobstore.map can cause supervisor restarts > > > Key: STORM-3775 > URL: https://issues.apache.org/jira/browse/STORM-3775 > Project: Apache Storm > Issue Type: Bug >Reporter: Aaron Gresch >Assignee: Aaron Gresch >Priority: Minor > Fix For: 2.3.0 > > Time Spent: 1h 50m > Remaining Estimate: 0h > > I noticed that a blobstore map config with booleans formatted as strings is accepted at submission, then causes the AsyncLocalizer to throw an exception and the supervisor to restart. Such a config should be treated as invalid and rejected at submission. > > topology.blobstore.map > { "blob1": {"localname": "test.tgz", "uncompress": "false" } > } -- This message was sent by Atlassian Jira (v8.3.4#803005)
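The kind of submit-time check STORM-3775 asks for can be sketched as follows. This is an illustration, not Storm's config validator: `validate_blobstore_map` is a hypothetical function, and it only checks the two keys that appear in the example from the issue (`localname` and `uncompress`).

```python
import json


def validate_blobstore_map(blob_map):
    """Return a list of problems found in a topology.blobstore.map value (hypothetical)."""
    if not isinstance(blob_map, dict):
        return ["topology.blobstore.map must be a map"]
    errors = []
    for key, settings in blob_map.items():
        if not isinstance(settings, dict):
            errors.append("%s: settings must be a map" % key)
            continue
        localname = settings.get("localname")
        if localname is not None and not isinstance(localname, str):
            errors.append("%s: localname must be a string" % key)
        uncompress = settings.get("uncompress")
        # A string such as "false" is rejected here instead of failing later.
        if uncompress is not None and not isinstance(uncompress, bool):
            errors.append("%s: uncompress must be a boolean, got %r" % (key, uncompress))
    return errors


if __name__ == "__main__":
    bad = json.loads('{"blob1": {"localname": "test.tgz", "uncompress": "false"}}')
    print(validate_blobstore_map(bad))
```

Rejecting the value when the topology is submitted keeps the failure with the user who made the mistake, rather than surfacing it later as a supervisor restart.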
[jira] [Resolved] (STORM-3774) Migrate Cgroup metrics to V2
[ https://issues.apache.org/jira/browse/STORM-3774?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aaron Gresch resolved STORM-3774. - Fix Version/s: 2.3.0 Resolution: Fixed > Migrate Cgroup metrics to V2 > - > > Key: STORM-3774 > URL: https://issues.apache.org/jira/browse/STORM-3774 > Project: Apache Storm > Issue Type: Improvement >Reporter: Aaron Gresch >Assignee: Aaron Gresch >Priority: Minor > Fix For: 2.3.0 > > Time Spent: 20m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (STORM-3775) topology.blobstore.map can cause supervisor restarts
Aaron Gresch created STORM-3775: --- Summary: topology.blobstore.map can cause supervisor restarts Key: STORM-3775 URL: https://issues.apache.org/jira/browse/STORM-3775 Project: Apache Storm Issue Type: Bug Reporter: Aaron Gresch Assignee: Aaron Gresch I noticed that a blobstore map config with booleans formatted as strings is accepted at submission, after which the AsyncLocalizer throws an exception and the supervisor restarts. Such a config should be rejected as invalid so that the topology cannot be submitted. topology.blobstore.map { "blob1": {"localname": "test.tgz", "uncompress": "false" } } -- This message was sent by Atlassian Jira (v8.3.4#803005)
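The kind of check the report asks for can be sketched as follows. This is a minimal illustration, not Storm's actual validation code: the class name BlobstoreMapCheck and the method validateEntry are hypothetical, and only the "localname" and "uncompress" keys from the example above are considered.

```java
import java.util.Map;

/** Hypothetical validator for one topology.blobstore.map entry; not part of Storm. */
final class BlobstoreMapCheck {

    /** Returns null when the entry is acceptable, otherwise a human-readable error. */
    static String validateEntry(String blobKey, Map<String, Object> settings) {
        Object localName = settings.get("localname");
        if (localName != null && !(localName instanceof String)) {
            return blobKey + ": localname must be a string";
        }
        Object uncompress = settings.get("uncompress");
        // A string such as "false" is rejected here instead of failing later in the supervisor.
        if (uncompress != null && !(uncompress instanceof Boolean)) {
            return blobKey + ": uncompress must be a boolean, got "
                    + uncompress.getClass().getSimpleName();
        }
        return null;
    }
}
```

Running such a check at submission time would reject the example entry, because "uncompress" arrives as the string "false" rather than a boolean.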
[jira] [Commented] (STORM-3773) Worker Reassignment - Difference between Storm 2.x and Storm 1.x
[ https://issues.apache.org/jira/browse/STORM-3773?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17356518#comment-17356518 ] Aaron Gresch commented on STORM-3773: - This sounds like it could be a dupe of STORM-3677. > Worker Reassignment - Difference between Storm 2.x and Storm 1.x > - > > Key: STORM-3773 > URL: https://issues.apache.org/jira/browse/STORM-3773 > Project: Apache Storm > Issue Type: Bug >Affects Versions: 2.2.0 >Reporter: Surajeet >Priority: Major > > We are currently on Storm 1.2.1 and are in the process of upgrading to > Storm 2.2.0. > We observed the following while upgrading to 2.2.0: > 1) In a Storm cluster (4 nodes) with 8 topologies running (with a 1-1 mapping > between workers and topologies), when I bring down nimbus and the supervisor on > one of the nodes (say Node 1, which is not the nimbus leader), the workers > running on that node get reassigned to the other 3 nodes, even though they are still running on > Node 1. So I have 2 worker processes for the same topology running > at the same time (I saw this behaviour with and without Pacemaker). The > worker process does get killed when nimbus and the supervisor are brought up on > Node 1. > 2) The worker logs show that the worker sends heartbeats to the local supervisor and > the nimbus leader, which in 1.2.1 happened through ZooKeeper (I saw this > behaviour in 2.2.0 with and without Pacemaker). > If I bring down nimbus and the supervisor on the node where nimbus is the leader, > worker processes are reassigned, and in some cases this leads to zombie worker > processes (they are not killed when storm kill is executed). > The behaviour above (reassignment of workers) does not happen with Storm 1.2.1. > Since this is a fundamental design change between 1.x and 2.x, is there any > documentation that describes it in detail? (I couldn't find it in the Release > Notes.) > (I am raising this as a bug because it is preventing us from moving to 2.2.0 > due to the issue mentioned in 2).) > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (STORM-3760) Storm 2.2.0 not reporting newWorkerEvents metric
[ https://issues.apache.org/jira/browse/STORM-3760?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17355260#comment-17355260 ] Aaron Gresch commented on STORM-3760: - newWorkerEvent was converted to the V2 metrics API. You should be able to see it by using the V2 reporters or setting topology.enable.v2.metrics.tick to true. > Storm 2.2.0 not reporting newWorkerEvents metric > > > Key: STORM-3760 > URL: https://issues.apache.org/jira/browse/STORM-3760 > Project: Apache Storm > Issue Type: Bug > Components: storm-metrics >Affects Versions: 2.2.0 >Reporter: Cristian Rojas >Priority: Major > > Hi everyone, > > We have recently migrated from Storm 0.10.0 to Storm 2.2.0. We have a custom > _StatsdMetricConsumer_ which implements the _IMetricsConsumer_ interface. > > Storm is still reporting some metrics (__transfer-count,_ _ack-count,_ > _metrics, etc)_ but after the migration it seems to have stopped reporting the > _*newWorkerEvents*_ metric. I made sure this is not a problem in our > implementation by logging all the metrics received in the _handleDataPoints_ method. > > Is this a known issue? Is there any way to get that metric fixed? > Best regards, thanks. -- This message was sent by Atlassian Jira (v8.3.4#803005)
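The second option in the comment amounts to one entry in the topology configuration. The sketch below assumes the topology conf can be treated as an ordinary Map<String, Object>; the class and method names are hypothetical, and only the key topology.enable.v2.metrics.tick comes from the comment.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper; shows only the config key named in the comment. */
final class V2MetricsTickExample {

    /** Returns a copy of the given topology conf with the V2 metrics tick enabled. */
    static Map<String, Object> withV2MetricsTick(Map<String, Object> topoConf) {
        Map<String, Object> copy = new HashMap<>(topoConf);
        copy.put("topology.enable.v2.metrics.tick", true);
        return copy;
    }
}
```

The resulting map would then be passed as the topology configuration at submission time.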
[jira] [Created] (STORM-3774) Migrate Cgroup metrics to V2
Aaron Gresch created STORM-3774: --- Summary: Migrate Cgroup metrics to V2 Key: STORM-3774 URL: https://issues.apache.org/jira/browse/STORM-3774 Project: Apache Storm Issue Type: Improvement Reporter: Aaron Gresch Assignee: Aaron Gresch -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (STORM-3769) Failed adding references to blobs: FileNotFoundException
[ https://issues.apache.org/jira/browse/STORM-3769?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aaron Gresch resolved STORM-3769. - Fix Version/s: 2.3.0 Resolution: Fixed > Failed adding references to blobs: FileNotFoundException > > > Key: STORM-3769 > URL: https://issues.apache.org/jira/browse/STORM-3769 > Project: Apache Storm > Issue Type: Bug >Reporter: Aaron Gresch >Assignee: Aaron Gresch >Priority: Major > Fix For: 2.3.0 > > Time Spent: 20m > Remaining Estimate: 0h > > We hit a file not found exception with AsyncLocalizer: > {code:java} > 2021-04-23 17:39:13.380 o.a.s.l.AsyncLocalizer > ForkJoinPool.commonPool-worker-23 [ERROR] Failed adding references to blobs > for TimePortAndAssignment{xxx-1-15-1616201755 on 6708} > java.io.FileNotFoundException: File > '/home/y/var/storm/supervisor/stormdist/xxx-1-15-1616201755/stormconf.ser' > does not exist > at > org.apache.storm.shade.org.apache.commons.io.FileUtils.openInputStream(FileUtils.java:297) > ~[storm-shaded-deps-2.3.0.y.jar:2.3.0.y] > at > org.apache.storm.shade.org.apache.commons.io.FileUtils.readFileToByteArray(FileUtils.java:1851) > ~[storm-shaded-deps-2.3.0.y.jar:2.3.0.y] > at > org.apache.storm.utils.ConfigUtils.readSupervisorStormConfGivenPath(ConfigUtils.java:311) > ~[storm-client-2.3.0.y.jar:2.3.0.y] > at > org.apache.storm.utils.ConfigUtils.readSupervisorStormConfImpl(ConfigUtils.java:472) > ~[storm-client-2.3.0.y.jar:2.3.0.y] > at > org.apache.storm.utils.ConfigUtils.readSupervisorStormConf(ConfigUtils.java:306) > ~[storm-client-2.3.0.y.jar:2.3.0.y] > at > org.apache.storm.localizer.AsyncLocalizer.getLocalResources(AsyncLocalizer.java:368) > ~[storm-server-2.3.0.y.jar:2.3.0.y] > at > org.apache.storm.localizer.AsyncLocalizer.addReferencesToBlobs(AsyncLocalizer.java:398) > ~[storm-server-2.3.0.y.jar:2.3.0.y] > at > org.apache.storm.localizer.AsyncLocalizer.lambda$null$7(AsyncLocalizer.java:235) > ~[storm-server-2.3.0.y.jar:2.3.0.y] > at > 
java.util.concurrent.ConcurrentHashMap.compute(ConcurrentHashMap.java:1877) > ~[?:1.8.0_262] > at > org.apache.storm.localizer.AsyncLocalizer.lambda$requestDownloadTopologyBlobs$8(AsyncLocalizer.java:229) > ~[storm-server-2.3.0.y.jar:2.3.0.y] > at > java.util.concurrent.CompletableFuture.uniCompose(CompletableFuture.java:966) > [?:1.8.0_262] > at > java.util.concurrent.CompletableFuture$UniCompose.tryFire(CompletableFuture.java:940) > [?:1.8.0_262] > at > java.util.concurrent.CompletableFuture$Completion.exec(CompletableFuture.java:457) > [?:1.8.0_262] > at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289) > [?:1.8.0_262] > at > java.util.concurrent.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1056) > [?:1.8.0_262] > at > java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1692) > [?:1.8.0_262] > at > java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:163) > [?:1.8.0_262] > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (STORM-3769) Failed adding references to blobs: FileNotFoundException
[ https://issues.apache.org/jira/browse/STORM-3769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17335791#comment-17335791 ] Aaron Gresch commented on STORM-3769: - In this case, a worker for topology A was no longer assigned on a supervisor. At some point the blob cache exceeded the size limit. When cleanup was called, one or two of topology A's blobs were deleted, and this was enough to get under the cache size limit. Because some of the topology's blobs remain, the code currently assumes that all of them are still downloaded. Then, when a worker for topology A gets reassigned back to the node, this exception occurs and workers are unable to start. > Failed adding references to blobs: FileNotFoundException > > > Key: STORM-3769 > URL: https://issues.apache.org/jira/browse/STORM-3769 > Project: Apache Storm > Issue Type: Bug >Reporter: Aaron Gresch >Assignee: Aaron Gresch >Priority: Major > > We hit a file not found exception with AsyncLocalizer: > {code:java} > 2021-04-23 17:39:13.380 o.a.s.l.AsyncLocalizer > ForkJoinPool.commonPool-worker-23 [ERROR] Failed adding references to blobs > for TimePortAndAssignment{xxx-1-15-1616201755 on 6708} > java.io.FileNotFoundException: File > '/home/y/var/storm/supervisor/stormdist/xxx-1-15-1616201755/stormconf.ser' > does not exist > at > org.apache.storm.shade.org.apache.commons.io.FileUtils.openInputStream(FileUtils.java:297) > ~[storm-shaded-deps-2.3.0.y.jar:2.3.0.y] > at > org.apache.storm.shade.org.apache.commons.io.FileUtils.readFileToByteArray(FileUtils.java:1851) > ~[storm-shaded-deps-2.3.0.y.jar:2.3.0.y] > at > org.apache.storm.utils.ConfigUtils.readSupervisorStormConfGivenPath(ConfigUtils.java:311) > ~[storm-client-2.3.0.y.jar:2.3.0.y] > at > org.apache.storm.utils.ConfigUtils.readSupervisorStormConfImpl(ConfigUtils.java:472) > ~[storm-client-2.3.0.y.jar:2.3.0.y] > at > org.apache.storm.utils.ConfigUtils.readSupervisorStormConf(ConfigUtils.java:306) > ~[storm-client-2.3.0.y.jar:2.3.0.y] > at > 
org.apache.storm.localizer.AsyncLocalizer.getLocalResources(AsyncLocalizer.java:368) > ~[storm-server-2.3.0.y.jar:2.3.0.y] > at > org.apache.storm.localizer.AsyncLocalizer.addReferencesToBlobs(AsyncLocalizer.java:398) > ~[storm-server-2.3.0.y.jar:2.3.0.y] > at > org.apache.storm.localizer.AsyncLocalizer.lambda$null$7(AsyncLocalizer.java:235) > ~[storm-server-2.3.0.y.jar:2.3.0.y] > at > java.util.concurrent.ConcurrentHashMap.compute(ConcurrentHashMap.java:1877) > ~[?:1.8.0_262] > at > org.apache.storm.localizer.AsyncLocalizer.lambda$requestDownloadTopologyBlobs$8(AsyncLocalizer.java:229) > ~[storm-server-2.3.0.y.jar:2.3.0.y] > at > java.util.concurrent.CompletableFuture.uniCompose(CompletableFuture.java:966) > [?:1.8.0_262] > at > java.util.concurrent.CompletableFuture$UniCompose.tryFire(CompletableFuture.java:940) > [?:1.8.0_262] > at > java.util.concurrent.CompletableFuture$Completion.exec(CompletableFuture.java:457) > [?:1.8.0_262] > at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289) > [?:1.8.0_262] > at > java.util.concurrent.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1056) > [?:1.8.0_262] > at > java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1692) > [?:1.8.0_262] > at > java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:163) > [?:1.8.0_262] > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
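The failure mode described in the comment (some blobs deleted, the rest taken as proof that everything is present) can be illustrated with a presence check that requires every file. This is a sketch only, not the fix that went into Storm: the class TopologyBlobCheck and its method are hypothetical, and the list of expected file names is supplied by the caller.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

/** Hypothetical check: a topology counts as downloaded only if all expected files exist. */
final class TopologyBlobCheck {

    static boolean allBlobsPresent(Path topologyDir, List<String> expectedFiles) {
        for (String name : expectedFiles) {
            if (!Files.isRegularFile(topologyDir.resolve(name))) {
                // One missing blob is enough: the topology must be downloaded again.
                return false;
            }
        }
        return true;
    }
}
```

With a check of this shape, a directory that still holds some blobs but has lost stormconf.ser would be treated as not downloaded, so the missing file would be fetched before the worker starts.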
[jira] [Created] (STORM-3769) Failed adding references to blobs: FileNotFoundException
Aaron Gresch created STORM-3769: --- Summary: Failed adding references to blobs: FileNotFoundException Key: STORM-3769 URL: https://issues.apache.org/jira/browse/STORM-3769 Project: Apache Storm Issue Type: Bug Reporter: Aaron Gresch Assignee: Aaron Gresch We hit a file not found exception with AsyncLocalizer: {code:java} 2021-04-23 17:39:13.380 o.a.s.l.AsyncLocalizer ForkJoinPool.commonPool-worker-23 [ERROR] Failed adding references to blobs for TimePortAndAssignment{xxx-1-15-1616201755 on 6708} java.io.FileNotFoundException: File '/home/y/var/storm/supervisor/stormdist/xxx-1-15-1616201755/stormconf.ser' does not exist at org.apache.storm.shade.org.apache.commons.io.FileUtils.openInputStream(FileUtils.java:297) ~[storm-shaded-deps-2.3.0.y.jar:2.3.0.y] at org.apache.storm.shade.org.apache.commons.io.FileUtils.readFileToByteArray(FileUtils.java:1851) ~[storm-shaded-deps-2.3.0.y.jar:2.3.0.y] at org.apache.storm.utils.ConfigUtils.readSupervisorStormConfGivenPath(ConfigUtils.java:311) ~[storm-client-2.3.0.y.jar:2.3.0.y] at org.apache.storm.utils.ConfigUtils.readSupervisorStormConfImpl(ConfigUtils.java:472) ~[storm-client-2.3.0.y.jar:2.3.0.y] at org.apache.storm.utils.ConfigUtils.readSupervisorStormConf(ConfigUtils.java:306) ~[storm-client-2.3.0.y.jar:2.3.0.y] at org.apache.storm.localizer.AsyncLocalizer.getLocalResources(AsyncLocalizer.java:368) ~[storm-server-2.3.0.y.jar:2.3.0.y] at org.apache.storm.localizer.AsyncLocalizer.addReferencesToBlobs(AsyncLocalizer.java:398) ~[storm-server-2.3.0.y.jar:2.3.0.y] at org.apache.storm.localizer.AsyncLocalizer.lambda$null$7(AsyncLocalizer.java:235) ~[storm-server-2.3.0.y.jar:2.3.0.y] at java.util.concurrent.ConcurrentHashMap.compute(ConcurrentHashMap.java:1877) ~[?:1.8.0_262] at org.apache.storm.localizer.AsyncLocalizer.lambda$requestDownloadTopologyBlobs$8(AsyncLocalizer.java:229) ~[storm-server-2.3.0.y.jar:2.3.0.y] at java.util.concurrent.CompletableFuture.uniCompose(CompletableFuture.java:966) [?:1.8.0_262] at 
java.util.concurrent.CompletableFuture$UniCompose.tryFire(CompletableFuture.java:940) [?:1.8.0_262] at java.util.concurrent.CompletableFuture$Completion.exec(CompletableFuture.java:457) [?:1.8.0_262] at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289) [?:1.8.0_262] at java.util.concurrent.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1056) [?:1.8.0_262] at java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1692) [?:1.8.0_262] at java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:163) [?:1.8.0_262] {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (STORM-3749) improve logging on server error in StormServerHandler
[ https://issues.apache.org/jira/browse/STORM-3749?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aaron Gresch resolved STORM-3749. - Fix Version/s: 2.3.0 Resolution: Fixed > improve logging on server error in StormServerHandler > - > > Key: STORM-3749 > URL: https://issues.apache.org/jira/browse/STORM-3749 > Project: Apache Storm > Issue Type: Improvement >Reporter: Aaron Gresch >Assignee: Aaron Gresch >Priority: Minor > Fix For: 2.3.0 > > Time Spent: 1h 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (STORM-3748) prevent concurrent modification when fetching v2 metrics
[ https://issues.apache.org/jira/browse/STORM-3748?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aaron Gresch resolved STORM-3748. - Fix Version/s: 2.3.0 Resolution: Fixed > prevent concurrent modification when fetching v2 metrics > > > Key: STORM-3748 > URL: https://issues.apache.org/jira/browse/STORM-3748 > Project: Apache Storm > Issue Type: Bug >Reporter: Aaron Gresch >Assignee: Aaron Gresch >Priority: Major > Fix For: 2.3.0 > > Time Spent: 0.5h > Remaining Estimate: 0h > > A user reported a ConcurrentModificationException when retrieving metric > names in > StormMetricRegistry.getMetricNameMap(). -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (STORM-3749) improve logging on server error in StormServerHandler
Aaron Gresch created STORM-3749: --- Summary: improve logging on server error in StormServerHandler Key: STORM-3749 URL: https://issues.apache.org/jira/browse/STORM-3749 Project: Apache Storm Issue Type: Improvement Reporter: Aaron Gresch Assignee: Aaron Gresch -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (STORM-3748) prevent concurrent modification when fetching v2 metrics
Aaron Gresch created STORM-3748: --- Summary: prevent concurrent modification when fetching v2 metrics Key: STORM-3748 URL: https://issues.apache.org/jira/browse/STORM-3748 Project: Apache Storm Issue Type: Bug Reporter: Aaron Gresch Assignee: Aaron Gresch A user reported a ConcurrentModificationException when retrieving metric names in StormMetricRegistry.getMetricNameMap(). -- This message was sent by Atlassian Jira (v8.3.4#803005)
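One common way to avoid this class of exception is to keep the registry in a concurrent map and hand callers an independent copy. The sketch below is an assumption about the general technique, not the change made in Storm; the class MetricNameSnapshot and its methods are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical registry fragment showing a snapshot-on-read pattern. */
final class MetricNameSnapshot {
    // ConcurrentHashMap iterators are weakly consistent and never throw
    // ConcurrentModificationException, even if another thread registers a metric.
    private final Map<String, Integer> nameToId = new ConcurrentHashMap<>();

    void register(String name, int id) {
        nameToId.put(name, id);
    }

    /** Returns a copy that later registrations cannot affect. */
    Map<String, Integer> snapshot() {
        return new HashMap<>(nameToId);
    }
}
```

Callers can iterate the returned copy freely while other threads keep registering metrics.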
[jira] [Resolved] (STORM-3740) Asynchronous background blob download can cause orphaned blob references
[ https://issues.apache.org/jira/browse/STORM-3740?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aaron Gresch resolved STORM-3740. - Fix Version/s: 2.3.0 Resolution: Fixed > Asynchronous background blob download can cause orphaned blob references > > > Key: STORM-3740 > URL: https://issues.apache.org/jira/browse/STORM-3740 > Project: Apache Storm > Issue Type: Bug >Reporter: Aaron Gresch >Assignee: Aaron Gresch >Priority: Major > Fix For: 2.3.0 > > Time Spent: 2h > Remaining Estimate: 0h > > > We hit a path in the AsyncLocalizer where we found blob references being > added after a worker slot was removed. Asynchronous blob downloads were not > canceled before removing the blob references. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (STORM-3740) Asynchronous background blob download can cause orphaned blob references
Aaron Gresch created STORM-3740: --- Summary: Asynchronous background blob download can cause orphaned blob references Key: STORM-3740 URL: https://issues.apache.org/jira/browse/STORM-3740 Project: Apache Storm Issue Type: Bug Reporter: Aaron Gresch Assignee: Aaron Gresch We hit a path in the AsyncLocalizer where we found blob references being added after a worker slot was removed. Asynchronous blob downloads were not canceled before removing the blob references. -- This message was sent by Atlassian Jira (v8.3.4#803005)
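The ordering problem described here (a download finishing and adding references after the slot is gone) can be sketched with CompletableFuture. This is an illustration under stated assumptions, not the AsyncLocalizer code: SlotBlobTracker, its methods, and the use of a worker port as the slot key are all hypothetical.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.CompletableFuture;

/** Hypothetical tracker: cancel the pending download before dropping a slot's references. */
final class SlotBlobTracker {
    private final Set<Integer> activeSlots = new HashSet<>();
    private final Map<Integer, CompletableFuture<Void>> pending = new HashMap<>();
    private final Map<Integer, Integer> refCounts = new HashMap<>();

    synchronized void track(int port, CompletableFuture<Void> download) {
        activeSlots.add(port);
        // The reference is added only when the download completes.
        pending.put(port, download.thenRun(() -> addReference(port)));
    }

    private synchronized void addReference(int port) {
        // Guard: ignore completions that arrive after the slot was released.
        if (activeSlots.contains(port)) {
            refCounts.merge(port, 1, Integer::sum);
        }
    }

    synchronized void releaseSlot(int port) {
        activeSlots.remove(port);
        CompletableFuture<Void> f = pending.remove(port);
        if (f != null) {
            f.cancel(false); // cancel first, so the callback above does not fire later
        }
        refCounts.remove(port);
    }

    synchronized int referenceCount(int port) {
        return refCounts.getOrDefault(port, 0);
    }
}
```

The cancel call and the activeSlots guard overlap on purpose: the cancel stops a callback that has not started, and the guard covers one that is already scheduled.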
[jira] [Resolved] (STORM-3682) Upgrade netty client metrics to use V2 API
[ https://issues.apache.org/jira/browse/STORM-3682?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aaron Gresch resolved STORM-3682. - Fix Version/s: 2.3.0 Resolution: Fixed > Upgrade netty client metrics to use V2 API > -- > > Key: STORM-3682 > URL: https://issues.apache.org/jira/browse/STORM-3682 > Project: Apache Storm > Issue Type: Improvement >Reporter: Aaron Gresch >Assignee: Aaron Gresch >Priority: Major > Fix For: 2.3.0 > > Time Spent: 3h 50m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (STORM-3736) remove topologyId and worker port from V2 metrics API
Aaron Gresch created STORM-3736: --- Summary: remove topologyId and worker port from V2 metrics API Key: STORM-3736 URL: https://issues.apache.org/jira/browse/STORM-3736 Project: Apache Storm Issue Type: Improvement Affects Versions: 2.3.0 Reporter: Aaron Gresch The topologyId and port are now available to the StormMetricsRegistry and should be removed from the existing metric API. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (STORM-3724) Use blobstore dir modtime to avoid update lookups by HDFSBlobstore
[ https://issues.apache.org/jira/browse/STORM-3724?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aaron Gresch resolved STORM-3724. - Fix Version/s: 2.3.0 Resolution: Fixed > Use blobstore dir modtime to avoid update lookups by HDFSBlobstore > -- > > Key: STORM-3724 > URL: https://issues.apache.org/jira/browse/STORM-3724 > Project: Apache Storm > Issue Type: Improvement >Reporter: Aaron Gresch >Assignee: Aaron Gresch >Priority: Major > Fix For: 2.3.0 > > Time Spent: 4.5h > Remaining Estimate: 0h > > We have multiple Storm clusters with hundreds of supervisors polling for blob > updates. This causes high load on our Hadoop namenodes, which are also used by > multiple other clusters. > > An improvement would be for the AsyncLocalizer to check the remote blobstore's > last modification time once and then skip checking each individual blob if they were > already checked for the same modification time. > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (STORM-3733) AsyncLocalizer stuck looking for missing topology
[ https://issues.apache.org/jira/browse/STORM-3733?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aaron Gresch resolved STORM-3733. - Fix Version/s: 2.3.0 Resolution: Fixed > AsyncLocalizer stuck looking for missing topology > - > > Key: STORM-3733 > URL: https://issues.apache.org/jira/browse/STORM-3733 > Project: Apache Storm > Issue Type: Bug >Reporter: Aaron Gresch >Assignee: Aaron Gresch >Priority: Major > Fix For: 2.3.0 > > Time Spent: 50m > Remaining Estimate: 0h > > {code:java} > 2020-11-09 20:18:12.325 o.a.s.l.AsyncLocalizer AsyncLocalizer Task Executor - > 2 [ERROR] Could not update blob, will retry again later > 2020-11-09 20:18:43.744 o.a.s.l.AsyncLocalizer AsyncLocalizer Task Executor - > 2 [ERROR] Could not update blob, will retry again later > 2020-11-09 20:19:14.726 o.a.s.l.AsyncLocalizer AsyncLocalizer Task Executor - > 2 [ERROR] Could not update blob, will retry again later > 2020-11-09 20:19:46.148 o.a.s.l.AsyncLocalizer AsyncLocalizer Task Executor - > 2 [ERROR] Could not update blob, will retry again later > 2020-11-09 20:20:16.560 o.a.s.l.AsyncLocalizer AsyncLocalizer Task Executor - > 0 [ERROR] Could not update blob, will retry again later > 2020-11-09 20:20:47.990 o.a.s.l.AsyncLocalizer AsyncLocalizer Task Executor - > 0 [ERROR] Could not update blob, will retry again later > 2020-11-09 20:21:19.403 o.a.s.l.AsyncLocalizer AsyncLocalizer Task Executor - > 0 [ERROR] Could not update blob, will retry again later > 2020-11-09 20:21:50.818 o.a.s.l.AsyncLocalizer AsyncLocalizer Task Executor - > 0 [ERROR] Could not update blob, will retry again later > 2020-11-09 20:22:21.257 o.a.s.l.AsyncLocalizer AsyncLocalizer Task Executor - > 1 [ERROR] Could not update blob, will retry again later > 2020-11-09 20:22:52.668 o.a.s.l.AsyncLocalizer AsyncLocalizer Task Executor - > 1 [ERROR] Could not update blob, will retry again later > 2020-11-09 20:23:24.082 o.a.s.l.AsyncLocalizer AsyncLocalizer Task Executor - > 1 [ERROR] Could not update 
blob, will retry again later > 2020-11-09 20:23:55.512 o.a.s.l.AsyncLocalizer AsyncLocalizer Task Executor - > 1 [ERROR] Could not update blob, will retry again later > 2020-11-09 20:24:25.919 o.a.s.l.AsyncLocalizer AsyncLocalizer Task Executor - > 2 [ERROR] Could not update blob, will retry again later > 2020-11-09 20:24:57.343 o.a.s.l.AsyncLocalizer AsyncLocalizer Task Executor - > 2 [ERROR] Could not update blob, will retry again later > 2020-11-09 20:25:28.773 o.a.s.l.AsyncLocalizer AsyncLocalizer Task Executor - > 2 [ERROR] Could not update blob, will retry again later > 2020-11-09 20:16:09.659 o.a.s.l.AsyncLocalizer AsyncLocalizer Task Executor - > 1 [ERROR] Could not update blob, will retry again later > java.util.concurrent.ExecutionException: java.lang.RuntimeException: Could > not download... > at > java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:357) > ~[?:1.8.0_262] > at > java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1908) > ~[?:1.8.0_262] > at > org.apache.storm.localizer.AsyncLocalizer.updateBlobs(AsyncLocalizer.java:333) > ~[storm-server-2.3.0.y.jar:2.3.0.y] > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > [?:1.8.0_262] > at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308) > [?:1.8.0_262] > at > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180) > [?:1.8.0_262] > at > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294) > [?:1.8.0_262] > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > [?:1.8.0_262] > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > [?:1.8.0_262] > at java.lang.Thread.run(Thread.java:748) [?:1.8.0_262] > Caused by: java.lang.RuntimeException: Could not download... 
> at > org.apache.storm.localizer.AsyncLocalizer.lambda$downloadOrUpdate$10(AsyncLocalizer.java:297) > ~[storm-server-2.3.0.y.jar:2.3.0.y] > at > java.util.concurrent.CompletableFuture$AsyncRun.run(CompletableFuture.java:1640) > ~[?:1.8.0_262] > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > ~[?:1.8.0_262] > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > ~[?:1.8.0_262] > at > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180) > ~[?:1.8.0_262] > at > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) > ~[?:1.8.0_262]
[jira] [Resolved] (STORM-3714) Add rate information for TaskMetrics
[ https://issues.apache.org/jira/browse/STORM-3714?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aaron Gresch resolved STORM-3714. - Fix Version/s: 2.3.0 Resolution: Fixed > Add rate information for TaskMetrics > > > Key: STORM-3714 > URL: https://issues.apache.org/jira/browse/STORM-3714 > Project: Apache Storm > Issue Type: Improvement >Reporter: Aaron Gresch >Assignee: Aaron Gresch >Priority: Minor > Fix For: 2.3.0 > > Time Spent: 3h > Remaining Estimate: 0h > > While converting TaskMetrics to use V2 API, we used Counters over Meters due > to performance implications. We have found we would like to add rate > information as well. > > Ideally we would add some kind of metric that supports a count and rate > without the full performance overhead of the Meter. -- This message was sent by Atlassian Jira (v8.3.4#803005)
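One way to get a count plus a rate without per-event rate bookkeeping is to keep the hot path to a single add and compute the rate only when a reporter reads it. The sketch below is an assumption about a possible design, not the metric that was added to Storm; RateCounterSketch and its methods are hypothetical, and the caller supplies timestamps in nanoseconds.

```java
import java.util.concurrent.atomic.LongAdder;

/** Hypothetical counter whose rate is derived on read rather than on every update. */
final class RateCounterSketch {
    private final LongAdder count = new LongAdder();
    private long lastCount;
    private long lastNanos;

    RateCounterSketch(long startNanos) {
        this.lastNanos = startNanos;
    }

    /** Hot path: one striped add, no clock read. */
    void inc(long n) {
        count.add(n);
    }

    long getCount() {
        return count.sum();
    }

    /** Events per second since the previous call; intended for a periodic reporter thread. */
    synchronized double rateSince(long nowNanos) {
        long current = count.sum();
        long elapsed = nowNanos - lastNanos;
        double rate = elapsed <= 0 ? 0.0 : (current - lastCount) * 1e9 / elapsed;
        lastCount = current;
        lastNanos = nowNanos;
        return rate;
    }
}
```

The trade-off is that the rate is a simple average over the reporting interval rather than the exponentially weighted averages a Meter provides.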
[jira] [Created] (STORM-3733) AsyncLocalizer stuck looking for missing topology
Aaron Gresch created STORM-3733: --- Summary: AsyncLocalizer stuck looking for missing topology Key: STORM-3733 URL: https://issues.apache.org/jira/browse/STORM-3733 Project: Apache Storm Issue Type: Bug Reporter: Aaron Gresch Assignee: Aaron Gresch {code:java} 2020-11-09 20:18:12.325 o.a.s.l.AsyncLocalizer AsyncLocalizer Task Executor - 2 [ERROR] Could not update blob, will retry again later 2020-11-09 20:18:43.744 o.a.s.l.AsyncLocalizer AsyncLocalizer Task Executor - 2 [ERROR] Could not update blob, will retry again later 2020-11-09 20:19:14.726 o.a.s.l.AsyncLocalizer AsyncLocalizer Task Executor - 2 [ERROR] Could not update blob, will retry again later 2020-11-09 20:19:46.148 o.a.s.l.AsyncLocalizer AsyncLocalizer Task Executor - 2 [ERROR] Could not update blob, will retry again later 2020-11-09 20:20:16.560 o.a.s.l.AsyncLocalizer AsyncLocalizer Task Executor - 0 [ERROR] Could not update blob, will retry again later 2020-11-09 20:20:47.990 o.a.s.l.AsyncLocalizer AsyncLocalizer Task Executor - 0 [ERROR] Could not update blob, will retry again later 2020-11-09 20:21:19.403 o.a.s.l.AsyncLocalizer AsyncLocalizer Task Executor - 0 [ERROR] Could not update blob, will retry again later 2020-11-09 20:21:50.818 o.a.s.l.AsyncLocalizer AsyncLocalizer Task Executor - 0 [ERROR] Could not update blob, will retry again later 2020-11-09 20:22:21.257 o.a.s.l.AsyncLocalizer AsyncLocalizer Task Executor - 1 [ERROR] Could not update blob, will retry again later 2020-11-09 20:22:52.668 o.a.s.l.AsyncLocalizer AsyncLocalizer Task Executor - 1 [ERROR] Could not update blob, will retry again later 2020-11-09 20:23:24.082 o.a.s.l.AsyncLocalizer AsyncLocalizer Task Executor - 1 [ERROR] Could not update blob, will retry again later 2020-11-09 20:23:55.512 o.a.s.l.AsyncLocalizer AsyncLocalizer Task Executor - 1 [ERROR] Could not update blob, will retry again later 2020-11-09 20:24:25.919 o.a.s.l.AsyncLocalizer AsyncLocalizer Task Executor - 2 [ERROR] Could not update blob, will retry again 
later 2020-11-09 20:24:57.343 o.a.s.l.AsyncLocalizer AsyncLocalizer Task Executor - 2 [ERROR] Could not update blob, will retry again later 2020-11-09 20:25:28.773 o.a.s.l.AsyncLocalizer AsyncLocalizer Task Executor - 2 [ERROR] Could not update blob, will retry again later 2020-11-09 20:16:09.659 o.a.s.l.AsyncLocalizer AsyncLocalizer Task Executor - 1 [ERROR] Could not update blob, will retry again later java.util.concurrent.ExecutionException: java.lang.RuntimeException: Could not download... at java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:357) ~[?:1.8.0_262] at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1908) ~[?:1.8.0_262] at org.apache.storm.localizer.AsyncLocalizer.updateBlobs(AsyncLocalizer.java:333) ~[storm-server-2.3.0.y.jar:2.3.0.y] at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [?:1.8.0_262] at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308) [?:1.8.0_262] at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180) [?:1.8.0_262] at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294) [?:1.8.0_262] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_262] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_262] at java.lang.Thread.run(Thread.java:748) [?:1.8.0_262] Caused by: java.lang.RuntimeException: Could not download... 
at org.apache.storm.localizer.AsyncLocalizer.lambda$downloadOrUpdate$10(AsyncLocalizer.java:297) ~[storm-server-2.3.0.y.jar:2.3.0.y] at java.util.concurrent.CompletableFuture$AsyncRun.run(CompletableFuture.java:1640) ~[?:1.8.0_262] at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) ~[?:1.8.0_262] at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[?:1.8.0_262] at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180) ~[?:1.8.0_262] at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) ~[?:1.8.0_262] ... 3 more Caused by: org.apache.storm.utils.WrappedKeyNotFoundException: testTopology-743-1603821880-stormconf.ser at org.apache.storm.hdfs.blobstore.HdfsBlobStore.getStoredBlobMeta(HdfsBlobStore.java:192) ~[storm-hdfs-blobstore-2.3.0.y.jar:2.3.0.y] at org.apache.storm.hdfs.blobstore.HdfsBlobStore.getBlobMeta(HdfsBlobStore.java:221) ~[storm-hdfs-blobstore-2.3.0.y.jar:2.3.0.y] at
[jira] [Created] (STORM-3727) SUPERVISOR_SLOTS_PORTS could be list of Longs
Aaron Gresch created STORM-3727: --- Summary: SUPERVISOR_SLOTS_PORTS could be list of Longs Key: STORM-3727 URL: https://issues.apache.org/jira/browse/STORM-3727 Project: Apache Storm Issue Type: Bug Affects Versions: 2.2.0 Reporter: Aaron Gresch Assignee: Aaron Gresch A user reported: There's no guarantee that the {{supervisorConf.getOrDefault}} will be a List of Integers. Additionally, in ReadClusterState.java, {{.intValue()}} conversion is removed. Overall result {{java.lang.ClassCastException: java.lang.Long cannot be cast to java.lang.Integer at org.apache.storm.daemon.supervisor.ReadClusterState.(ReadClusterState.java:101) ~[storm-server-2.2.0.jar:2.2.0] at org.apache.storm.daemon.supervisor.Supervisor.launch(Supervisor.java:310) ~[storm-server-2.2.0.jar:2.2.0]}} -- This message was sent by Atlassian Jira (v8.3.4#803005)
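A defensive conversion for this case treats each port as a Number and narrows it explicitly. The sketch below is illustrative only and is not the patch applied to ReadClusterState; the class SlotPorts and the method toIntPorts are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: accept ports parsed as Integer or Long and return Integers. */
final class SlotPorts {

    static List<Integer> toIntPorts(Object rawPorts) {
        List<Integer> ports = new ArrayList<>();
        if (rawPorts instanceof Iterable) {
            for (Object p : (Iterable<?>) rawPorts) {
                // Casting to Number avoids the Long-to-Integer ClassCastException.
                ports.add(((Number) p).intValue());
            }
        }
        return ports;
    }
}
```

This works whether the config parser produced Integer or Long values for the port list.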
[jira] [Created] (STORM-3724) Use blobstore dir modtime to avoid update lookups by HDFSBlobstore
Aaron Gresch created STORM-3724: --- Summary: Use blobstore dir modtime to avoid update lookups by HDFSBlobstore Key: STORM-3724 URL: https://issues.apache.org/jira/browse/STORM-3724 Project: Apache Storm Issue Type: Improvement Reporter: Aaron Gresch Assignee: Aaron Gresch We have multiple Storm clusters with hundreds of supervisors polling for blob updates. This causes high load on our Hadoop namenodes, which are also used by multiple other clusters. An improvement would be for the AsyncLocalizer to check the remote blobstore's last modification time once and then skip checking each individual blob if they were already checked for the same modification time. -- This message was sent by Atlassian Jira (v8.3.4#803005)
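The proposed optimisation can be sketched as a gate in front of the per-blob checks. This is a minimal illustration, not the HdfsBlobStore or AsyncLocalizer code: BlobstoreModTimeGate is hypothetical, the LongSupplier stands in for a single remote call that returns the blobstore directory's modification time, and the sketch assumes that this time changes whenever any blob changes.

```java
import java.util.function.LongSupplier;

/** Hypothetical gate: run the expensive per-blob checks only when the directory changed. */
final class BlobstoreModTimeGate {
    private final LongSupplier remoteDirModTime; // stand-in for one cheap remote lookup
    private long lastCheckedModTime = -1L;

    BlobstoreModTimeGate(LongSupplier remoteDirModTime) {
        this.remoteDirModTime = remoteDirModTime;
    }

    /** Returns true if perBlobCheck ran, false if it was skipped as unchanged. */
    synchronized boolean updateIfChanged(Runnable perBlobCheck) {
        long current = remoteDirModTime.getAsLong();
        if (current == lastCheckedModTime) {
            return false; // nothing changed since the last full check
        }
        perBlobCheck.run();
        // Record the time only after a successful check, so a failed run is retried.
        lastCheckedModTime = current;
        return true;
    }
}
```

With hundreds of supervisors polling, each poll then costs one remote lookup in the common unchanged case instead of one lookup per blob.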
[jira] [Resolved] (STORM-3720) BlobStoreFile getModTime() never updates after first call
[ https://issues.apache.org/jira/browse/STORM-3720?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aaron Gresch resolved STORM-3720. - Fix Version/s: 2.3.0 Resolution: Fixed > BlobStoreFile getModTime() never updates after first call > - > > Key: STORM-3720 > URL: https://issues.apache.org/jira/browse/STORM-3720 > Project: Apache Storm > Issue Type: Bug >Reporter: Aaron Gresch >Assignee: Aaron Gresch >Priority: Minor > Fix For: 2.3.0 > > Time Spent: 0.5h > Remaining Estimate: 0h > > If a blobstore file is updated after a call to getModTime(), later calls to > getModTime() return a stale result. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (STORM-3719) Add configuration for AsyncLocalizer updateBlobs frequency
[ https://issues.apache.org/jira/browse/STORM-3719?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aaron Gresch resolved STORM-3719. - Fix Version/s: 2.3.0 Resolution: Fixed > Add configuration for AsyncLocalizer updateBlobs frequency > -- > > Key: STORM-3719 > URL: https://issues.apache.org/jira/browse/STORM-3719 > Project: Apache Storm > Issue Type: Improvement >Reporter: Aaron Gresch >Assignee: Aaron Gresch >Priority: Minor > Fix For: 2.3.0 > > Time Spent: 20m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (STORM-3718) Updating the dropwizard dependency
[ https://issues.apache.org/jira/browse/STORM-3718?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aaron Gresch resolved STORM-3718. - Fix Version/s: 2.3.0 Resolution: Fixed > Updating the dropwizard dependency > -- > > Key: STORM-3718 > URL: https://issues.apache.org/jira/browse/STORM-3718 > Project: Apache Storm > Issue Type: Dependency upgrade >Reporter: Fannyu Chien >Priority: Major > Fix For: 2.3.0 > > Time Spent: 20m > Remaining Estimate: 0h > > I currently work at JPMC and we are using Storm. When we used Aqua to scan > the image, it detected critical vulnerabilities in dropwizard 1.3.5 and > recommended patching it to 1.3.19. I have already submitted a pull request, > but I opened this issue on Jira just in case. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (STORM-3720) BlobStoreFile getModTime() never updates after first call
Aaron Gresch created STORM-3720: --- Summary: BlobStoreFile getModTime() never updates after first call Key: STORM-3720 URL: https://issues.apache.org/jira/browse/STORM-3720 Project: Apache Storm Issue Type: Bug Reporter: Aaron Gresch Assignee: Aaron Gresch If a blobstore file gets updated after a call to getModTime(), later calls to getModTime() return an incorrect (stale) result. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (STORM-3719) Add configuration for AsyncLocalizer updateBlobs frequency
Aaron Gresch created STORM-3719: --- Summary: Add configuration for AsyncLocalizer updateBlobs frequency Key: STORM-3719 URL: https://issues.apache.org/jira/browse/STORM-3719 Project: Apache Storm Issue Type: Improvement Reporter: Aaron Gresch Assignee: Aaron Gresch -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (STORM-3714) Add rate information for TaskMetrics
Aaron Gresch created STORM-3714: --- Summary: Add rate information for TaskMetrics Key: STORM-3714 URL: https://issues.apache.org/jira/browse/STORM-3714 Project: Apache Storm Issue Type: Improvement Reporter: Aaron Gresch Assignee: Aaron Gresch While converting TaskMetrics to use V2 API, we used Counters over Meters due to performance implications. We have found we would like to add rate information as well. Ideally we would add some kind of metric that supports a count and rate without the full performance overhead of the Meter. -- This message was sent by Atlassian Jira (v8.3.4#803005)
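One possible shape for such a metric is sketched below: a counter backed by LongAdder whose rate is computed only when a reporter asks for it, so the hot path is a single add. This is an assumption about how it could be done, not the implementation chosen for this issue; RateCounter and its methods are hypothetical names.

```java
import java.util.concurrent.atomic.LongAdder;

// Hypothetical sketch of a count-plus-rate metric; not a Storm or Dropwizard class.
final class RateCounter {
    private final LongAdder count = new LongAdder();
    private long lastCount = 0L;
    private long lastNanos;

    RateCounter(long startNanos) {
        this.lastNanos = startNanos;
    }

    /** Hot path: just an add, no clock reads or decay calculations. */
    void inc(long n) {
        count.add(n);
    }

    long getCount() {
        return count.sum();
    }

    /**
     * Returns events per second since the previous call (or since construction).
     * Intended to be called by the reporting thread at each reporting interval.
     */
    synchronized double rateSince(long nowNanos) {
        long current = count.sum();
        long elapsed = nowNanos - lastNanos;
        double rate = elapsed <= 0 ? 0.0 : (current - lastCount) * 1e9 / elapsed;
        lastCount = current;
        lastNanos = nowNanos;
        return rate;
    }
}
```

In production the timestamps would come from System.nanoTime(); they are passed in here so the behavior is easy to verify. Unlike a Meter, this gives only a per-interval average rate, not exponentially weighted moving averages.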
[jira] [Resolved] (STORM-3707) Add meter to track update blob failures
[ https://issues.apache.org/jira/browse/STORM-3707?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aaron Gresch resolved STORM-3707. - Fix Version/s: 2.3.0 Resolution: Fixed > Add meter to track update blob failures > --- > > Key: STORM-3707 > URL: https://issues.apache.org/jira/browse/STORM-3707 > Project: Apache Storm > Issue Type: Improvement >Reporter: Aaron Gresch >Assignee: Aaron Gresch >Priority: Minor > Fix For: 2.3.0 > > Time Spent: 20m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (STORM-3707) Add meter to track update blob failures
Aaron Gresch created STORM-3707: --- Summary: Add meter to track update blob failures Key: STORM-3707 URL: https://issues.apache.org/jira/browse/STORM-3707 Project: Apache Storm Issue Type: Improvement Reporter: Aaron Gresch Assignee: Aaron Gresch -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (STORM-3696) ClientSupervisorUtils.processLauncherAndWait ignores InterruptedException
[ https://issues.apache.org/jira/browse/STORM-3696?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aaron Gresch resolved STORM-3696. - Fix Version/s: 2.3.0 Resolution: Fixed > ClientSupervisorUtils.processLauncherAndWait ignores InterruptedException > - > > Key: STORM-3696 > URL: https://issues.apache.org/jira/browse/STORM-3696 > Project: Apache Storm > Issue Type: Bug >Reporter: Aaron Gresch >Assignee: Aaron Gresch >Priority: Minor > Fix For: 2.3.0 > > Time Spent: 50m > Remaining Estimate: 0h > > This code ignores the interrupted exception, going on and causing > IllegalThreadStateException. We should not ignore the exception. > > {code:java} > 2020-09-04 17:36:02.766 o.a.s.d.s.ClientSupervisorUtils SLOT_6700 [INFO] > Worker Process 2e82e6e8-6a97-45eb-950f-2ca68ff793f4 interrupted. > 2020-09-04 17:36:02.767 o.a.s.d.s.Slot SLOT_6700 [ERROR] Failed launching > container > java.lang.IllegalThreadStateException: process hasn't exited > at java.lang.UNIXProcess.exitValue(UNIXProcess.java:422) > ~[?:1.8.0_242] > at > org.apache.storm.daemon.supervisor.ClientSupervisorUtils.processLauncherAndWait(ClientSupervisorUtils.java:82) > ~[storm-client-2.3.0.y.jar:2.3.0.y] > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (STORM-3697) Add metric for capacity
Aaron Gresch created STORM-3697: --- Summary: Add metric for capacity Key: STORM-3697 URL: https://issues.apache.org/jira/browse/STORM-3697 Project: Apache Storm Issue Type: Improvement Reporter: Aaron Gresch Assignee: Aaron Gresch We don't report metrics for capacity except on the UI. It would be nice for users to have a reported metric for this. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (STORM-3696) ClientSupervisorUtils.processLauncherAndWait ignores InterruptedException
Aaron Gresch created STORM-3696: --- Summary: ClientSupervisorUtils.processLauncherAndWait ignores InterruptedException Key: STORM-3696 URL: https://issues.apache.org/jira/browse/STORM-3696 Project: Apache Storm Issue Type: Bug Reporter: Aaron Gresch Assignee: Aaron Gresch This code ignores the interrupted exception, going on and causing IllegalThreadStateException. We should not ignore the exception. {code:java} 2020-09-04 17:36:02.766 o.a.s.d.s.ClientSupervisorUtils SLOT_6700 [INFO] Worker Process 2e82e6e8-6a97-45eb-950f-2ca68ff793f4 interrupted. 2020-09-04 17:36:02.767 o.a.s.d.s.Slot SLOT_6700 [ERROR] Failed launching container java.lang.IllegalThreadStateException: process hasn't exited at java.lang.UNIXProcess.exitValue(UNIXProcess.java:422) ~[?:1.8.0_242] at org.apache.storm.daemon.supervisor.ClientSupervisorUtils.processLauncherAndWait(ClientSupervisorUtils.java:82) ~[storm-client-2.3.0.y.jar:2.3.0.y] {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
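The failure mode in the log above comes from calling exitValue() on a process that is still running after the wait was interrupted. The sketch below shows one way to avoid that; it is not the actual change to ClientSupervisorUtils, and ProcessWaiter and waitForExit are hypothetical names.

```java
import java.util.OptionalInt;

// Hypothetical sketch; not the code in ClientSupervisorUtils.
final class ProcessWaiter {
    private ProcessWaiter() {}

    /**
     * Waits for the process and returns its exit code, or an empty result
     * if the wait was interrupted before the process exited.
     */
    static OptionalInt waitForExit(Process process) {
        try {
            return OptionalInt.of(process.waitFor());
        } catch (InterruptedException e) {
            // Do not fall through to exitValue(): the process may still be
            // running, and exitValue() would throw IllegalThreadStateException.
            Thread.currentThread().interrupt(); // restore the flag for the caller
            return OptionalInt.empty();
        }
    }
}
```

Whether the caller should also destroy the process on interrupt is a separate decision that this sketch leaves open.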
[jira] [Created] (STORM-3695) Timer rates not added to datapoints for V2 metrics tick
Aaron Gresch created STORM-3695: --- Summary: Timer rates not added to datapoints for V2 metrics tick Key: STORM-3695 URL: https://issues.apache.org/jira/browse/STORM-3695 Project: Apache Storm Issue Type: Bug Reporter: Aaron Gresch Assignee: Aaron Gresch -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (STORM-3694) all V2 metric reporters to report metric short names with dimensions
Aaron Gresch created STORM-3694: --- Summary: all V2 metric reporters to report metric short names with dimensions Key: STORM-3694 URL: https://issues.apache.org/jira/browse/STORM-3694 Project: Apache Storm Issue Type: Improvement Reporter: Aaron Gresch Assignee: Aaron Gresch Given a metric such as: storm.topology.mytopologyname-17-1595349167.hostname.__system.-1.6700-memory.pools.Code-Cache.max It would be nice to instead be able to report it as: memory.pools.Code-Cache.max with dimensions task Id of -1 and component Id of __system -- This message was sent by Atlassian Jira (v8.3.4#803005)
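As an illustration of the requested mapping, the sketch below splits the example name into a short name plus dimensions. The layout it assumes (storm.topology.<topologyId>.<hostname>.<componentId>.<taskId>.<port>-<shortName>, with no dots inside the hostname) is inferred from the single example above and is not a documented format; MetricNameSplitter is a hypothetical name.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch; the name layout is an assumption based on one example.
final class MetricNameSplitter {
    private static final Pattern FULL_NAME = Pattern.compile(
        "^storm\\.topology\\.([^.]+)\\.([^.]+)\\.([^.]+)\\.(-?\\d+)\\.(\\d+)-(.+)$");

    /** Short metric name plus the dimensions pulled out of the long name. */
    static final class Split {
        final String shortName;
        final Map<String, String> dimensions;

        Split(String shortName, Map<String, String> dimensions) {
            this.shortName = shortName;
            this.dimensions = dimensions;
        }
    }

    private MetricNameSplitter() {}

    static Split split(String fullName) {
        Matcher m = FULL_NAME.matcher(fullName);
        if (!m.matches()) {
            throw new IllegalArgumentException("Unrecognized metric name: " + fullName);
        }
        Map<String, String> dims = new LinkedHashMap<>();
        dims.put("topologyId", m.group(1));
        dims.put("hostname", m.group(2));
        dims.put("componentId", m.group(3));
        dims.put("taskId", m.group(4));
        dims.put("port", m.group(5));
        return new Split(m.group(6), dims);
    }
}
```

A real reporter would more likely receive the dimensions directly from the metric registry rather than parse them back out of a string; the parsing here only makes the intended before and after shapes concrete.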
[jira] [Created] (STORM-3682) Upgrade netty client metrics to use V2 API
Aaron Gresch created STORM-3682: --- Summary: Upgrade netty client metrics to use V2 API Key: STORM-3682 URL: https://issues.apache.org/jira/browse/STORM-3682 Project: Apache Storm Issue Type: Improvement Reporter: Aaron Gresch Assignee: Aaron Gresch -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (STORM-3676) Reduce debug spew to scheduler log
[ https://issues.apache.org/jira/browse/STORM-3676?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aaron Gresch resolved STORM-3676. - Fix Version/s: 2.3.0 Resolution: Fixed > Reduce debug spew to scheduler log > -- > > Key: STORM-3676 > URL: https://issues.apache.org/jira/browse/STORM-3676 > Project: Apache Storm > Issue Type: Improvement >Reporter: Aaron Gresch >Assignee: Aaron Gresch >Priority: Minor > Fix For: 2.3.0 > > Time Spent: 20m > Remaining Estimate: 0h > > This line accounted for 30% of the lines in our scheduler log. Seems > unnecessary. We already log if there's an error parsing. > 2020-07-11 21:21:46.730 o.a.s.s.r.n.NormalizedResourceRequest timer [DEBUG] > Input to parseResources > {"topology.tasks":1,"topology.tick.tuple.freq.secs":5} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (STORM-3676) Reduce debug spew to scheduler log
Aaron Gresch created STORM-3676: --- Summary: Reduce debug spew to scheduler log Key: STORM-3676 URL: https://issues.apache.org/jira/browse/STORM-3676 Project: Apache Storm Issue Type: Improvement Reporter: Aaron Gresch Assignee: Aaron Gresch This line accounted for 30% of the lines in our scheduler log. Seems unnecessary. We already log if there's an error parsing. 2020-07-11 21:21:46.730 o.a.s.s.r.n.NormalizedResourceRequest timer [DEBUG] Input to parseResources {"topology.tasks":1,"topology.tick.tuple.freq.secs":5} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (STORM-3673) update BuiltinMetrics to use v2 Metrics API
Aaron Gresch created STORM-3673: --- Summary: update BuiltinMetrics to use v2 Metrics API Key: STORM-3673 URL: https://issues.apache.org/jira/browse/STORM-3673 Project: Apache Storm Issue Type: Improvement Reporter: Aaron Gresch Assignee: Aaron Gresch -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (STORM-3656) Change handling of Hadoop TGT renewal exception
[ https://issues.apache.org/jira/browse/STORM-3656?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aaron Gresch resolved STORM-3656. - Fix Version/s: 2.3.0 Resolution: Fixed > Change handling of Hadoop TGT renewal exception > --- > > Key: STORM-3656 > URL: https://issues.apache.org/jira/browse/STORM-3656 > Project: Apache Storm > Issue Type: Improvement >Affects Versions: 2.2.0 >Reporter: Aaron Gresch >Assignee: Aaron Gresch >Priority: Major > Fix For: 2.3.0 > > Time Spent: 1h > Remaining Estimate: 0h > > STORM-3606 identified an issue where Hadoop's TGT auto renewal thread causes > an exception and worker restart. The fix involved a lot of reflection calls > to emulate the Hadoop code while avoiding launching this thread. > > It's possible Hadoop could change their code, causing this reflection to > fail. To handle this case, we could instead allow Hadoop to launch this > autorenewal thread, and have the worker catch the specific NPE from Hadoop in > the exception handler. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (STORM-3653) Document treatment of common nodes in favored/unfavored nodes in scheduling
[ https://issues.apache.org/jira/browse/STORM-3653?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aaron Gresch resolved STORM-3653. - Fix Version/s: 2.3.0 Resolution: Fixed > Document treatment of common nodes in favored/unfavored nodes in scheduling > --- > > Key: STORM-3653 > URL: https://issues.apache.org/jira/browse/STORM-3653 > Project: Apache Storm > Issue Type: Improvement > Components: storm-server >Affects Versions: 2.1.0 >Reporter: Bipin Prasad >Assignee: Bipin Prasad >Priority: Minor > Fix For: 2.3.0 > > Time Spent: 20m > Remaining Estimate: 0h > > If the same node is specified in favored as well as unfavored nodes, the > current undocumented behavior is to remove the node from the unfavored list. > Update javadoc to reflect this behavior. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (STORM-3656) Change handling of Hadoop TGT renewal exception
[ https://issues.apache.org/jira/browse/STORM-3656?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aaron Gresch updated STORM-3656: Affects Version/s: 2.2.0 > Change handling of Hadoop TGT renewal exception > --- > > Key: STORM-3656 > URL: https://issues.apache.org/jira/browse/STORM-3656 > Project: Apache Storm > Issue Type: Improvement >Affects Versions: 2.2.0 >Reporter: Aaron Gresch >Assignee: Aaron Gresch >Priority: Major > > STORM-3606 identified an issue where Hadoop's TGT auto renewal thread causes > an exception and worker restart. The fix involved a lot of reflection calls > to emulate the Hadoop code while avoiding launching this thread. > > It's possible Hadoop could change their code, causing this reflection to > fail. To handle this case, we could instead allow Hadoop to launch this > autorenewal thread, and have the worker catch the specific NPE from Hadoop in > the exception handler. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (STORM-3656) Change handling of Hadoop TGT renewal exception
Aaron Gresch created STORM-3656: --- Summary: Change handling of Hadoop TGT renewal exception Key: STORM-3656 URL: https://issues.apache.org/jira/browse/STORM-3656 Project: Apache Storm Issue Type: Improvement Reporter: Aaron Gresch Assignee: Aaron Gresch STORM-3606 identified an issue where Hadoop's TGT auto renewal thread causes an exception and worker restart. The fix involved a lot of reflection calls to emulate the Hadoop code while avoiding launching this thread. It's possible Hadoop could change their code, causing this reflection to fail. To handle this case, we could instead allow Hadoop to launch this autorenewal thread, and have the worker catch the specific NPE from Hadoop in the exception handler. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (STORM-3648) Add meter to track worker heartbeat rate
Aaron Gresch created STORM-3648: --- Summary: Add meter to track worker heartbeat rate Key: STORM-3648 URL: https://issues.apache.org/jira/browse/STORM-3648 Project: Apache Storm Issue Type: Improvement Reporter: Aaron Gresch Assignee: Aaron Gresch Users could track and alert if heartbeat rate starts dropping due to GC, etc. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (STORM-3645) worker launcher should consistently use ERRORFILE for error messages
Aaron Gresch created STORM-3645: --- Summary: worker launcher should consistently use ERRORFILE for error messages Key: STORM-3645 URL: https://issues.apache.org/jira/browse/STORM-3645 Project: Apache Storm Issue Type: Improvement Reporter: Aaron Gresch -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (STORM-3641) switch JCQueue metrics to new metrics API
[ https://issues.apache.org/jira/browse/STORM-3641?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aaron Gresch resolved STORM-3641. - Fix Version/s: 2.3.0 Resolution: Fixed > switch JCQueue metrics to new metrics API > - > > Key: STORM-3641 > URL: https://issues.apache.org/jira/browse/STORM-3641 > Project: Apache Storm > Issue Type: Improvement >Reporter: Aaron Gresch >Assignee: Aaron Gresch >Priority: Major > Fix For: 2.3.0 > > Time Spent: 1h > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (STORM-3642) update AutoTGT metric to new API
[ https://issues.apache.org/jira/browse/STORM-3642?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aaron Gresch resolved STORM-3642. - Fix Version/s: 2.3.0 Resolution: Fixed > update AutoTGT metric to new API > > > Key: STORM-3642 > URL: https://issues.apache.org/jira/browse/STORM-3642 > Project: Apache Storm > Issue Type: Improvement >Reporter: Aaron Gresch >Assignee: Aaron Gresch >Priority: Minor > Fix For: 2.3.0 > > Time Spent: 20m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (STORM-3644) improve PacemakerClient error messaging
Aaron Gresch created STORM-3644: --- Summary: improve PacemakerClient error messaging Key: STORM-3644 URL: https://issues.apache.org/jira/browse/STORM-3644 Project: Apache Storm Issue Type: Improvement Reporter: Aaron Gresch Assignee: Aaron Gresch LOG.error("error attempting to write to a channel {}.", e.getMessage()); This line could add the hostname and properly log the message. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (STORM-3643) Bring queue metrics documentation up to date
Aaron Gresch created STORM-3643: --- Summary: Bring queue metrics documentation up to date Key: STORM-3643 URL: https://issues.apache.org/jira/browse/STORM-3643 Project: Apache Storm Issue Type: Improvement Affects Versions: 2.2.0 Reporter: Aaron Gresch [https://github.com/apache/storm/blob/master/docs/Metrics.md#queue-metrics] should be updated -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (STORM-3642) update AutoTGT metric to new API
Aaron Gresch created STORM-3642: --- Summary: update AutoTGT metric to new API Key: STORM-3642 URL: https://issues.apache.org/jira/browse/STORM-3642 Project: Apache Storm Issue Type: Improvement Reporter: Aaron Gresch Assignee: Aaron Gresch -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (STORM-3640) timed out health check processes should be killed
[ https://issues.apache.org/jira/browse/STORM-3640?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aaron Gresch resolved STORM-3640. - Resolution: Fixed > timed out health check processes should be killed > - > > Key: STORM-3640 > URL: https://issues.apache.org/jira/browse/STORM-3640 > Project: Apache Storm > Issue Type: Improvement >Reporter: Aaron Gresch >Assignee: Aaron Gresch >Priority: Major > Time Spent: 20m > Remaining Estimate: 0h > > We noticed some hung health check scripts that had timed out but were still > eating CPU. We should make sure they are killed on timeout. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (STORM-3641) switch JCQueue metrics to new metrics API
Aaron Gresch created STORM-3641: --- Summary: switch JCQueue metrics to new metrics API Key: STORM-3641 URL: https://issues.apache.org/jira/browse/STORM-3641 Project: Apache Storm Issue Type: Improvement Reporter: Aaron Gresch Assignee: Aaron Gresch -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (STORM-3640) timed out health check processes should be killed
Aaron Gresch created STORM-3640: --- Summary: timed out health check processes should be killed Key: STORM-3640 URL: https://issues.apache.org/jira/browse/STORM-3640 Project: Apache Storm Issue Type: Improvement Reporter: Aaron Gresch Assignee: Aaron Gresch We noticed some hung health check scripts that had timed out but were still eating CPU. We should make sure they are killed on timeout. -- This message was sent by Atlassian Jira (v8.3.4#803005)
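Killing on timeout can be sketched with the standard Process API as follows. This illustrates the idea only and is not the actual health check code; HealthCheckRunner and waitOrKill are hypothetical names.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical sketch; not the Storm health check implementation.
final class HealthCheckRunner {
    private HealthCheckRunner() {}

    /**
     * Waits up to the given timeout for the process to finish.
     * Returns true if it finished in time; otherwise kills it and returns false.
     */
    static boolean waitOrKill(Process process, long timeout, TimeUnit unit) {
        try {
            if (process.waitFor(timeout, unit)) {
                return true; // finished within the timeout
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for the caller
        }
        // Timed out (or interrupted): make sure the child does not linger.
        process.destroyForcibly();
        return false;
    }
}
```

Note that destroyForcibly() targets the launched process itself; a script that has spawned its own children may need additional cleanup.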
[jira] [Commented] (STORM-3634) validate numa ports are contained in supervisor.slots.ports
[ https://issues.apache.org/jira/browse/STORM-3634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17107531#comment-17107531 ] Aaron Gresch commented on STORM-3634: - Decision was made to instead use the superset of numa ports and slot ports. > validate numa ports are contained in supervisor.slots.ports > --- > > Key: STORM-3634 > URL: https://issues.apache.org/jira/browse/STORM-3634 > Project: Apache Storm > Issue Type: Improvement >Reporter: Aaron Gresch >Assignee: Aaron Gresch >Priority: Major > Fix For: 2.2.0 > > Time Spent: 10m > Remaining Estimate: 0h > > It's currently possible to have a numa port configured that is not in > supervisor.slots.ports. When a supervisor restarts, any worker running on > numa ports not in supervisor.slots.ports will be killed. We should consider > this an invalid configuration. -- This message was sent by Atlassian Jira (v8.3.4#803005)
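The decision noted in the comment, using the superset of numa ports and slot ports, amounts to taking the union of the two port collections. A minimal sketch, assuming both are already available as collections of Integer; SupervisorPorts and allPorts are hypothetical names, not the code that was merged.

```java
import java.util.Collection;
import java.util.SortedSet;
import java.util.TreeSet;

// Hypothetical sketch; not the merged Storm change.
final class SupervisorPorts {
    private SupervisorPorts() {}

    /** Returns every port from either collection, de-duplicated and sorted. */
    static SortedSet<Integer> allPorts(Collection<Integer> slotPorts,
                                       Collection<Integer> numaPorts) {
        SortedSet<Integer> all = new TreeSet<>(slotPorts);
        all.addAll(numaPorts);
        return all;
    }
}
```

With the union in place, a worker on a numa port that is missing from supervisor.slots.ports is still treated as running on a known port after a supervisor restart.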
[jira] [Resolved] (STORM-3634) validate numa ports are contained in supervisor.slots.ports
[ https://issues.apache.org/jira/browse/STORM-3634?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aaron Gresch resolved STORM-3634. - Fix Version/s: 2.2.0 Resolution: Fixed > validate numa ports are contained in supervisor.slots.ports > --- > > Key: STORM-3634 > URL: https://issues.apache.org/jira/browse/STORM-3634 > Project: Apache Storm > Issue Type: Improvement >Reporter: Aaron Gresch >Assignee: Aaron Gresch >Priority: Major > Fix For: 2.2.0 > > Time Spent: 10m > Remaining Estimate: 0h > > It's currently possible to have a numa port configured that is not in > supervisor.slots.ports. When a supervisor restarts, any worker running on > numa ports not in supervisor.slots.ports will be killed. We should consider > this an invalid configuration. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (STORM-3633) Add message that supervisor is killing detached workers
[ https://issues.apache.org/jira/browse/STORM-3633?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aaron Gresch resolved STORM-3633. - Fix Version/s: 2.2.0 Resolution: Fixed > Add message that supervisor is killing detached workers > --- > > Key: STORM-3633 > URL: https://issues.apache.org/jira/browse/STORM-3633 > Project: Apache Storm > Issue Type: Improvement >Reporter: Aaron Gresch >Assignee: Aaron Gresch >Priority: Minor > Fix For: 2.2.0 > > > [https://github.com/apache/storm/blob/master/storm-server/src/main/java/org/apache/storm/daemon/supervisor/ReadClusterState.java#L116] > From this code we will see messages that workers are killed, but not the > reason why. We should add a message. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (STORM-3635) update LocalityAwareness documentation due to STORM-3602
Aaron Gresch created STORM-3635: --- Summary: update LocalityAwareness documentation due to STORM-3602 Key: STORM-3635 URL: https://issues.apache.org/jira/browse/STORM-3635 Project: Apache Storm Issue Type: Improvement Reporter: Aaron Gresch Assignee: Aaron Gresch The logic for switching to the lower bound changed, so the doc should change as well. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (STORM-3634) validate numa ports are contained in supervisor.slots.ports
Aaron Gresch created STORM-3634: --- Summary: validate numa ports are contained in supervisor.slots.ports Key: STORM-3634 URL: https://issues.apache.org/jira/browse/STORM-3634 Project: Apache Storm Issue Type: Improvement Reporter: Aaron Gresch Assignee: Aaron Gresch It's currently possible to have a numa port configured that is not in supervisor.slots.ports. When a supervisor restarts, any worker running on numa ports not in supervisor.slots.ports will be killed. We should consider this an invalid configuration. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (STORM-3633) Add message that supervisor is killing detached workers
Aaron Gresch created STORM-3633: --- Summary: Add message that supervisor is killing detached workers Key: STORM-3633 URL: https://issues.apache.org/jira/browse/STORM-3633 Project: Apache Storm Issue Type: Improvement Reporter: Aaron Gresch Assignee: Aaron Gresch [https://github.com/apache/storm/blob/master/storm-server/src/main/java/org/apache/storm/daemon/supervisor/ReadClusterState.java#L116] From this code we will see messages that workers are killed, but not the reason why. We should add a message. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (STORM-3632) Reduce SimpleSaslServerCallbackHandler supervisor logging
[ https://issues.apache.org/jira/browse/STORM-3632?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aaron Gresch resolved STORM-3632. - Fix Version/s: 2.2.0 Resolution: Fixed > Reduce SimpleSaslServerCallbackHandler supervisor logging > - > > Key: STORM-3632 > URL: https://issues.apache.org/jira/browse/STORM-3632 > Project: Apache Storm > Issue Type: Improvement >Reporter: Aaron Gresch >Assignee: Aaron Gresch >Priority: Minor > Fix For: 2.2.0 > > Time Spent: 20m > Remaining Estimate: 0h > > This message floods our logs and seems to provide little use: > LOG.info("Successfully authenticated client: authenticationID = {} > authorizationID = {}", > nid, zid); -- This message was sent by Atlassian Jira (v8.3.4#803005)