[jira] [Resolved] (STORM-3811) Upgrade log4j version to 2.17.1
[ https://issues.apache.org/jira/browse/STORM-3811?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ethan Li resolved STORM-3811.
-----------------------------
    Resolution: Fixed

> Upgrade log4j version to 2.17.1
> -------------------------------
>
>              Key: STORM-3811
>              URL: https://issues.apache.org/jira/browse/STORM-3811
>          Project: Apache Storm
>       Issue Type: Dependency upgrade
>         Reporter: Aaron Gresch
>         Priority: Critical
>          Fix For: 2.4.0, 2.3.1, 1.2.5, 2.2.2, 2.1.2

--
This message was sent by Atlassian Jira
(v8.20.1#820001)
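For downstream users who manage the dependency themselves, pinning the patched version in a Maven build can be sketched as below. This is a hypothetical POM fragment for illustration; the property and module names in Storm's actual build may differ.

```xml
<!-- Hypothetical sketch: force the patched Log4j 2 version via
     dependencyManagement so transitive versions are overridden. -->
<properties>
  <log4j.version>2.17.1</log4j.version>
</properties>
<dependencyManagement>
  <dependencies>
    <dependency>
      <groupId>org.apache.logging.log4j</groupId>
      <artifactId>log4j-api</artifactId>
      <version>${log4j.version}</version>
    </dependency>
    <dependency>
      <groupId>org.apache.logging.log4j</groupId>
      <artifactId>log4j-core</artifactId>
      <version>${log4j.version}</version>
    </dependency>
  </dependencies>
</dependencyManagement>
```

`mvn dependency:tree -Dincludes=org.apache.logging.log4j` can then confirm no older log4j artifacts remain on the classpath.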
[jira] [Updated] (STORM-3811) Upgrade log4j version to 2.17.1
[ https://issues.apache.org/jira/browse/STORM-3811?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ethan Li updated STORM-3811:
----------------------------
    Fix Version/s: 2.2.2
                   2.1.2
[jira] [Updated] (STORM-3811) Upgrade log4j version to 2.17.1
[ https://issues.apache.org/jira/browse/STORM-3811?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ethan Li updated STORM-3811:
----------------------------
    Priority: Critical  (was: Major)
[jira] [Updated] (STORM-3811) Upgrade log4j version to 2.17.1
[ https://issues.apache.org/jira/browse/STORM-3811?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ethan Li updated STORM-3811:
----------------------------
    Issue Type: Dependency upgrade  (was: Improvement)
[jira] [Updated] (STORM-3811) Upgrade log4j version to 2.17.1
[ https://issues.apache.org/jira/browse/STORM-3811?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ethan Li updated STORM-3811:
----------------------------
    Fix Version/s: 1.2.5
[jira] [Closed] (STORM-3810) CVE-2021-44228 Log4J vulnerability
[ https://issues.apache.org/jira/browse/STORM-3810?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ethan Li closed STORM-3810.
---------------------------
    Resolution: Duplicate

> CVE-2021-44228 Log4J vulnerability
> ----------------------------------
>
>                Key: STORM-3810
>                URL: https://issues.apache.org/jira/browse/STORM-3810
>            Project: Apache Storm
>         Issue Type: Bug
>         Components: storm-core
>   Affects Versions: 1.2.2, 1.2.3, 1.2.4
>           Reporter: Dario Bonino
>           Priority: Critical
>         Time Spent: 1h 10m
> Remaining Estimate: 0h
>
> The recent critical Log4j CVE
> [https://www.cvedetails.com/cve/CVE-2021-44228/] affects Storm.
> Please upgrade to the latest Log4j 2 (>= 2.16.0, see
> [https://search.maven.org/artifact/org.apache.logging.log4j/log4j/2.16.0/pom])
> in the 1.2.x Storm branch and also in the 2.x.x Storm branches.
> Thank you!
[jira] [Closed] (STORM-3808) Bump log4j version to 2.16.0 (original ticket was 2.15.0)
[ https://issues.apache.org/jira/browse/STORM-3808?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ethan Li closed STORM-3808.
---------------------------

> Bump log4j version to 2.16.0 (original ticket was 2.15.0)
> ---------------------------------------------------------
>
>         Key: STORM-3808
>         URL: https://issues.apache.org/jira/browse/STORM-3808
>     Project: Apache Storm
>  Issue Type: Improvement
>    Reporter: Luke Sun
>    Priority: Major
>
> For CVE-2021-44228, bump log4j to 2.15.0. From the Log4j news:
> {code}
> CVE-2021-44228
> The Log4j team has been made aware of a security vulnerability, CVE-2021-44228, that has been addressed in Log4j 2.15.0.
> Log4j's JNDI support has not restricted what names could be resolved. Some protocols are unsafe or can allow remote code execution. Log4j now limits the protocols by default to only java, ldap, and ldaps and limits the ldap protocols to only accessing Java primitive objects by default served on the local host.
> One vector that allowed exposure to this vulnerability was Log4j's allowance of Lookups to appear in log messages. As of Log4j 2.15.0 this feature is now disabled by default. While an option has been provided to enable Lookups in this fashion, users are strongly discouraged from enabling it.
> For those who cannot upgrade to 2.15.0, in releases >=2.10, this behavior can be mitigated by setting either the system property log4j2.formatMsgNoLookups or the environment variable LOG4J_FORMAT_MSG_NO_LOOKUPS to true. For releases >=2.7 and <=2.14.1, all PatternLayout patterns can be modified to specify the message converter as %m{nolookups} instead of just %m. For releases >=2.0-beta9 and <=2.10.0, the mitigation is to remove the JndiLookup class from the classpath: zip -q -d log4j-core-*.jar org/apache/logging/log4j/core/lookup/JndiLookup.class
> {code}
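For deployments on 2.10 through 2.14.1 that cannot upgrade right away, the system-property mitigation quoted in the advisory above can be sketched in Java. The class name is illustrative; the property must be set before any Log4j 2 code initializes, which is why passing `-Dlog4j2.formatMsgNoLookups=true` on the JVM command line (or the `LOG4J_FORMAT_MSG_NO_LOOKUPS` environment variable) is generally safer than setting it in code.

```java
// Illustrative only: set the mitigation flag before Log4j 2 initializes.
public class FormatMsgNoLookups {
    public static void main(String[] args) {
        // Must run before the first logger is created anywhere in the JVM.
        System.setProperty("log4j2.formatMsgNoLookups", "true");
        System.out.println(System.getProperty("log4j2.formatMsgNoLookups")); // prints "true"
    }
}
```

Note the advisory's stronger remedy for very old releases (removing `JndiLookup.class` from the jar) does not depend on initialization order at all.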
[jira] [Closed] (STORM-3809) CVE-2021-44228 Log4Shell: upgrade log4j2
[ https://issues.apache.org/jira/browse/STORM-3809?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ethan Li closed STORM-3809.
---------------------------

> CVE-2021-44228 Log4Shell: upgrade log4j2
> ----------------------------------------
>
>              Key: STORM-3809
>              URL: https://issues.apache.org/jira/browse/STORM-3809
>          Project: Apache Storm
>       Issue Type: Bug
>       Components: storm-core
> Affects Versions: 2.3.0, 2.2.1
>         Reporter: Antoine Tran
>         Priority: Critical
>
> The recent critical Log4Shell CVE
> [https://www.cvedetails.com/cve/CVE-2021-44228/] affects Storm (e.g. Storm
> 2.2.0 ships log4j-api-2.11.2.jar). Any log4j2 between 2.0 and 2.14 is
> affected.
> I did not find any existing issue or news about a fix for Apache Storm, so I
> am creating this ticket to track it.
> Please upgrade to the latest Log4j2 (>= 2.16.0, see
> [https://search.maven.org/artifact/org.apache.logging.log4j/log4j/2.16.0/pom])
> in both the 2.2.x and 2.3.x Storm branches. Thank you!
[jira] [Resolved] (STORM-3808) Bump log4j version to 2.16.0 (original ticket was 2.15.0)
[ https://issues.apache.org/jira/browse/STORM-3808?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ethan Li resolved STORM-3808.
-----------------------------
    Resolution: Duplicate
[jira] [Resolved] (STORM-3809) CVE-2021-44228 Log4Shell: upgrade log4j2
[ https://issues.apache.org/jira/browse/STORM-3809?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ethan Li resolved STORM-3809.
-----------------------------
    Resolution: Duplicate
[jira] [Updated] (STORM-3811) Upgrade log4j version to 2.17.1
[ https://issues.apache.org/jira/browse/STORM-3811?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ethan Li updated STORM-3811:
----------------------------
    Fix Version/s: 2.3.1
                   2.4.0
[jira] [Updated] (STORM-3811) Upgrade log4j version to 2.17.1
[ https://issues.apache.org/jira/browse/STORM-3811?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ethan Li updated STORM-3811:
----------------------------
    Summary: Upgrade log4j version to 2.17.1  (was: update log4j)
[jira] [Updated] (STORM-3803) Format large integers in Storm UI with commas for readability
[ https://issues.apache.org/jira/browse/STORM-3803?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ethan Li updated STORM-3803:
----------------------------
    Fix Version/s: 2.4.0

> Format large integers in Storm UI with commas for readability
> -------------------------------------------------------------
>
>                Key: STORM-3803
>                URL: https://issues.apache.org/jira/browse/STORM-3803
>            Project: Apache Storm
>         Issue Type: Improvement
>         Components: storm-webapp
>           Reporter: Bipin Prasad
>           Assignee: Bipin Prasad
>           Priority: Major
>            Fix For: 2.4.0
>         Time Spent: 2.5h
> Remaining Estimate: 0h
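As an aside, the digit grouping requested here is a one-liner with Java's locale-aware formatting. This is just an illustration of the technique, not the actual storm-webapp change:

```java
import java.util.Locale;

// The "," flag in a format string inserts the locale's grouping separator.
public class CommaFormat {
    public static void main(String[] args) {
        System.out.println(String.format(Locale.US, "%,d", 1234567)); // 1,234,567
    }
}
```

Using an explicit `Locale` matters: the grouping separator is a comma in `Locale.US` but a period or thin space in many European locales.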
[jira] [Resolved] (STORM-3799) Logging user information for blob delete req
[ https://issues.apache.org/jira/browse/STORM-3799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ethan Li resolved STORM-3799.
-----------------------------
    Fix Version/s: 2.4.0
       Resolution: Fixed

Thanks [~snikhil5]. Merged to the master branch (d9b12720d149c78b15a7eb0e114c6b5b5824421d).

> Logging user information for blob delete req
> --------------------------------------------
>
>                Key: STORM-3799
>                URL: https://issues.apache.org/jira/browse/STORM-3799
>            Project: Apache Storm
>         Issue Type: Improvement
>         Components: blobstore, storm-server
>           Reporter: Nikhil Singh
>           Assignee: Nikhil Singh
>           Priority: Minor
>            Fix For: 2.4.0
>         Time Spent: 0.5h
> Remaining Estimate: 0h
>
> The existing log message does not identify the user making the blob delete
> request; this change adds the requesting user to the message.
>
> Current log message with no user context:
> 2021-01-22 23:14:52.147 o.a.s.d.n.Nimbus pool-34-thread-215 [INFO] Deleted blob for key maw_conf.tgz

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
[jira] [Updated] (STORM-3693) TimeOut ticks should be addressed to Executor instead of being addressed to a task or broadcasted.
[ https://issues.apache.org/jira/browse/STORM-3693?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ethan Li updated STORM-3693:
----------------------------
    Fix Version/s:     (was: 2.2.1)

> TimeOut ticks should be addressed to Executor instead of being addressed to a task or broadcasted.
> --------------------------------------------------------------------------------------------------
>
>              Key: STORM-3693
>              URL: https://issues.apache.org/jira/browse/STORM-3693
>          Project: Apache Storm
>       Issue Type: Bug
>       Components: storm-client
> Affects Versions: 2.1.0
>         Reporter: Chandan Kumar Singh
>         Priority: Major
>       Time Spent: 20m
> Remaining Estimate: 0h
>
> For message timeouts, a spout executor uses a single rotating map,
> irrespective of the number of spout tasks it is dealing with. When a timeout
> tick tuple is received, it is broadcast to all the tasks, which means the map
> is rotated as many times as there are assigned tasks and tuples expire
> prematurely. The tick tuple should be neither broadcast nor addressed to any
> single task: the executor should act on it exactly once.
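The premature-expiry mechanism described above can be sketched with a toy rotating map (names are illustrative, not Storm's actual RotatingMap API): an entry survives as many rotations as there are buckets, so rotating once per assigned task on a single timeout tick shortens its effective lifetime by the task count.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

// Toy rotating map: new entries go into the newest bucket; each rotate()
// discards the oldest bucket, expiring everything still in it.
class RotatingMap<K, V> {
    private final Deque<Map<K, V>> buckets = new ArrayDeque<>();

    RotatingMap(int numBuckets) {
        for (int i = 0; i < numBuckets; i++) {
            buckets.addFirst(new HashMap<>());
        }
    }

    void put(K key, V value) {
        buckets.peekFirst().put(key, value);
    }

    boolean contains(K key) {
        for (Map<K, V> bucket : buckets) {
            if (bucket.containsKey(key)) {
                return true;
            }
        }
        return false;
    }

    // Expires (returns) the oldest bucket and opens a fresh one.
    Map<K, V> rotate() {
        Map<K, V> expired = buckets.removeLast();
        buckets.addFirst(new HashMap<>());
        return expired;
    }
}
```

With, say, 3 buckets and 4 assigned tasks, a broadcast tick triggers 4 rotations at once, so a just-inserted pending tuple is expired within a single tick interval instead of after 3 intervals.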
[jira] [Updated] (STORM-3767) NPE on getComponentPendingProfileActions
[ https://issues.apache.org/jira/browse/STORM-3767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ethan Li updated STORM-3767:
----------------------------
    Fix Version/s: 2.1.1

> NPE on getComponentPendingProfileActions
> ----------------------------------------
>
>              Key: STORM-3767
>              URL: https://issues.apache.org/jira/browse/STORM-3767
>          Project: Apache Storm
>       Issue Type: Bug
> Affects Versions: 2.0.0, 2.1.0, 2.2.0
>         Reporter: Ethan Li
>         Assignee: Ethan Li
>         Priority: Major
>          Fix For: 2.1.1, 2.3.0, 2.2.1
>      Attachments: Screen Shot 2021-04-27 at 11.09.33 AM.png
>       Time Spent: 0.5h
> Remaining Estimate: 0h
>
> When a topology is newly submitted and the scheduling loop takes too long,
> the component UI can return error 500. This is due to an NPE in nimbus code.
> An example:
> 1. When a scheduling loop finishes, nimbus eventually updates the
> assignmentsBackend. If a topology is newly submitted, its entry is added to
> the idToAssignment map; otherwise the entry is updated with the new
> assignments. The key point is that a new topology id does not exist in
> idToAssignment before reaching here:
> https://github.com/apache/storm/blob/master/storm-server/src/main/java/org/apache/storm/daemon/nimbus/Nimbus.java#L2548-L2549
> https://github.com/apache/storm/blob/master/storm-client/src/jvm/org/apache/storm/cluster/StormClusterStateImpl.java#L696
> https://github.com/apache/storm/blob/master/storm-client/src/jvm/org/apache/storm/assignments/InMemoryAssignmentBackend.java#L63-L64
> 2. However, this assignmentsBackend update only started to happen at
> 2021-04-23 15:30:14.299:
> {code}
> 2021-04-23 15:30:14.299 o.a.s.d.n.Nimbus timer [INFO] Setting new assignment for topology
> {code}
> while the topology topo1-52-1619191499 had already been scheduled at
> 2021-04-23 15:25:13.887. The scheduling loop took longer than 5 minutes:
> {code}
> 2021-04-23 15:25:13.887 o.a.s.s.Cluster timer [INFO] STATUS - topo1-52-1619191499 Running - Fully Scheduled by DefaultResourceAwareStrategy (1297 states traversed in 1275 ms, backtracked 0 times)
> (other topologies were taking a long time)
> 2021-04-23 15:25:14.378 o.a.s.s.Cluster timer [INFO] STATUS - topo2-76-1612842912 Running - Fully Scheduled by DefaultResourceAwareStrategy (111 states traversed in 34 ms, backtracked 0 times)
> ...
> 2021-04-23 15:30:14.192 o.a.s.s.Cluster timer [INFO] STATUS - TrendingNowLES-11-1611713968 Not enough resources to schedule after evicting lower priority topologies. Additional Memory Required: 20128.0 MB (Available: 5411178.0 MB). Additional CPU Required: 1010.0% CPU (Available: 3100.0 % CPU). Cannot schedule by DefaultResourceAwareStrategy (65644 states traversed in 299804 ms, backtracked 6 times, 89 of 150 executors scheduled)
> ...
> 2021-04-23 15:30:14.216 o.a.s.s.Cluster timer [INFO] STATUS - evaluateplus-dev-47-1605825401 Running - Fully Scheduled by GenericResourceAwareStrategy (41 states traversed in 10 ms, backtracked 0 times)
> {code}
> 3. During this window the idToAssignment map in assignmentsBackend had no
> entry for topo1-52-1619191499, so when the component UI was visited,
> https://github.com/apache/storm/blob/master/storm-server/src/main/java/org/apache/storm/daemon/nimbus/Nimbus.java#L3613-L3614
> https://github.com/apache/storm/blob/master/storm-server/src/main/java/org/apache/storm/daemon/nimbus/Nimbus.java#L3100
> https://github.com/apache/storm/blob/master/storm-client/src/jvm/org/apache/storm/cluster/StormClusterStateImpl.java#L194
> https://github.com/apache/storm/blob/master/storm-client/src/jvm/org/apache/storm/assignments/InMemoryAssignmentBackend.java#L69
> it got null as the assignment, and hence the NPE.
> This can be reproduced easily by adding a sleep anywhere between
> {code:title=Nimbus.java}
> Map newSchedulerAssignments =
>     computeNewSchedulerAssignments(existingAssignments, topologies, bases, scratchTopoId);
> {code}
> and
> {code:title=Nimbus.java}
> state.setAssignment(topoId, assignment, td.getConf());
> {code}
> then submitting a new topology and visiting its component UI.
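The failure mode described above reduces to a plain map miss followed by an unchecked dereference. A minimal illustration (the names and the reduction are hypothetical, not the actual Nimbus code):

```java
import java.util.HashMap;
import java.util.Map;

// The UI read path runs before the scheduling loop's setAssignment(...)
// write, so the backend lookup returns null and any dereference throws NPE.
public class AssignmentRace {
    public static void main(String[] args) {
        // Backend state before nimbus writes the new topology's assignment.
        Map<String, Object> idToAssignment = new HashMap<>();
        Object assignment = idToAssignment.get("topo1-52-1619191499");
        System.out.println(assignment == null
                ? "no assignment yet: any field access on it would throw NPE"
                : assignment.toString());
    }
}
```

The fix direction implied by the ticket is for the read path to tolerate a missing entry (null check or an error response) rather than assume every scheduled topology is already in the backend.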
[jira] [Updated] (STORM-3767) NPE on getComponentPendingProfileActions
[ https://issues.apache.org/jira/browse/STORM-3767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ethan Li updated STORM-3767:
----------------------------
    Fix Version/s: 2.2.1
[jira] [Assigned] (STORM-3737) Share Worker Metric Registry For Guice AOP Based Metrics Integration
[ https://issues.apache.org/jira/browse/STORM-3737?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ethan Li reassigned STORM-3737:
-------------------------------
    Assignee: Lakshman Sai

> Share Worker Metric Registry For Guice AOP Based Metrics Integration
> --------------------------------------------------------------------
>
>               Key: STORM-3737
>               URL: https://issues.apache.org/jira/browse/STORM-3737
>           Project: Apache Storm
>        Issue Type: Improvement
>        Components: storm-client
>  Affects Versions: 2.1.0
>          Reporter: Lakshman Sai
>          Assignee: Lakshman Sai
>          Priority: Minor
>           Fix For: 2.3.0
> Original Estimate: 1h
>        Time Spent: 0.5h
> Remaining Estimate: 0.5h
>
> The metric registry has been made private, which makes it harder to
> integrate with Guice-based AOP metrics.
> The proposed solution is to add the metric registry created in the worker to
> SharedMetricRegistries, so that Guice-based AOP metrics can be initialized
> in a worker hook.
> [https://github.com/palominolabs/metrics-guice]
>
> PR: https://github.com/apache/storm/pull/3373
[jira] [Resolved] (STORM-3793) Add metric to track backpressure status for a task
[ https://issues.apache.org/jira/browse/STORM-3793?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ethan Li resolved STORM-3793.
-----------------------------
    Resolution: Fixed

master: 185107f478c75ec9a78b90d036c9bcd639152494

> Add metric to track backpressure status for a task
> --------------------------------------------------
>
>                Key: STORM-3793
>                URL: https://issues.apache.org/jira/browse/STORM-3793
>            Project: Apache Storm
>         Issue Type: Improvement
>           Reporter: Aaron Gresch
>           Assignee: Aaron Gresch
>           Priority: Major
>            Fix For: 2.3.0
>         Time Spent: 1h
> Remaining Estimate: 0h
[jira] [Updated] (STORM-3793) Add metric to track backpressure status for a task
[ https://issues.apache.org/jira/browse/STORM-3793?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ethan Li updated STORM-3793:
----------------------------
    Fix Version/s: 2.3.0
[jira] [Resolved] (STORM-3767) NPE on getComponentPendingProfileActions
[ https://issues.apache.org/jira/browse/STORM-3767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ethan Li resolved STORM-3767.
-----------------------------
    Resolution: Fixed
[jira] [Updated] (STORM-3767) NPE on getComponentPendingProfileActions
[ https://issues.apache.org/jira/browse/STORM-3767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Li updated STORM-3767: Affects Version/s: 2.2.0 2.0.0 2.1.0 > NPE on getComponentPendingProfileActions > - > > Key: STORM-3767 > URL: https://issues.apache.org/jira/browse/STORM-3767 > Project: Apache Storm > Issue Type: Bug >Affects Versions: 2.0.0, 2.1.0, 2.2.0 >Reporter: Ethan Li >Assignee: Ethan Li >Priority: Major > Attachments: Screen Shot 2021-04-27 at 11.09.33 AM.png > > Time Spent: 0.5h > Remaining Estimate: 0h > > When a topology is newly submitted, if the scheduling loop takes too long, > the component UI might have error 500. > This is due to the NPE in nimbus code. An example: > 1. When a scheduling loop finishes, nimbus will eventually update the > assignmentsBackend. if a topology is newly submitted, its entry will be added > to the idToAssignment map, otherwise, the entry will be updated with new > assignments. The key point is the new topology Id doesn't exist in > idToAssignment before it reaching here. > https://github.com/apache/storm/blob/master/storm-server/src/main/java/org/apache/storm/daemon/nimbus/Nimbus.java#L2548-L2549 > https://github.com/apache/storm/blob/master/storm-client/src/jvm/org/apache/storm/cluster/StormClusterStateImpl.java#L696 > https://github.com/apache/storm/blob/master/storm-client/src/jvm/org/apache/storm/assignments/InMemoryAssignmentBackend.java#L63-L64 > 2. However, this assignmentsBackend update only started to happen at > 2021-04-23 15:30:14.299 > {code:java} > 2021-04-23 15:30:14.299 o.a.s.d.n.Nimbus timer [INFO] Setting new assignment > for topology > {code} > while this topology topo1-52-1619191499 has been scheduled at 2021-04-23 > 15:25:13.887. The scheduling loop took longer than 5mins. 
> {code:java} > 2021-04-23 15:25:13.887 o.a.s.s.Cluster timer [INFO] STATUS - > topo1-52-1619191499 Running - Fully Scheduled by DefaultResourceAwareStrategy > (1297 states traversed in 1275 ms, backtracked 0 times) > other topologies were taking long time > 2021-04-23 15:25:14.378 o.a.s.s.Cluster timer [INFO] STATUS - > topo2-76-1612842912 Running - Fully Scheduled by DefaultResourceAwareStrategy > (111 states traversed in 34 ms, backtracked 0 times) > ... > 2021-04-23 15:30:14.192 o.a.s.s.Cluster timer [INFO] STATUS - > TrendingNowLES-11-1611713968 Not enough resources to schedule after evicting > lower priority topologies. Additional Memory Required: 20128.0 MB (Available: > 5411178.0 MB). Additional CPU Required: 1010.0% CPU (Available: 3100.0 % > CPU).Cannot schedule by DefaultResourceAwareStrategy (65644 states traversed > in 299804 ms, backtracked 6 times, 89 of 150 executors scheduled) > ... > 2021-04-23 15:30:14.216 o.a.s.s.Cluster timer [INFO] STATUS - > evaluateplus-dev-47-1605825401 Running - Fully Scheduled by > GenericResourceAwareStrategy (41 states traversed in 10 ms, backtracked 0 > times) > {code} > 3. During this period, the idToAssignment map in assignmentsBackend wouldn't > have the entry for topo1-52-1619191499, so when a component UI was visited, > https://github.com/apache/storm/blob/master/storm-server/src/main/java/org/apache/storm/daemon/nimbus/Nimbus.java#L3613-L3614 > https://github.com/apache/storm/blob/master/storm-server/src/main/java/org/apache/storm/daemon/nimbus/Nimbus.java#L3100 > https://github.com/apache/storm/blob/master/storm-client/src/jvm/org/apache/storm/cluster/StormClusterStateImpl.java#L194 > https://github.com/apache/storm/blob/master/storm-client/src/jvm/org/apache/storm/assignments/InMemoryAssignmentBackend.java#L69 > it got a null value as the assignment, and hence NPE. 
> This can be reproduced easily by adding a sleep anywhere between > {code:title=Nimbus.java} > Map newSchedulerAssignments = > computeNewSchedulerAssignments(existingAssignments, > topologies, bases, scratchTopoId); > {code} > and > {code:title=Nimbus.java} > state.setAssignment(topoId, assignment, td.getConf()); > {code} > then submitting a new topology and visiting its component UI. -- This message was sent by Atlassian Jira (v8.3.4#803005)
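The race described in points 1-3 can be modeled in a few lines. The sketch below is a hypothetical stand-in (class and method names are simplified, not the actual Nimbus or InMemoryAssignmentBackend sources): until the scheduling loop finishes and writes the assignment, a lookup for a newly submitted topology returns null, and any caller that dereferences the result without a check throws the NPE.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Simplified model of the in-memory assignments backend described above.
class InMemoryBackend {
    private final Map<String, byte[]> idToAssignment = new ConcurrentHashMap<>();

    // Called at the END of a scheduling loop; until then a newly
    // submitted topology has no entry at all.
    void keepOrUpdateAssignment(String topoId, byte[] assignment) {
        idToAssignment.put(topoId, assignment);
    }

    // Called from the UI path; returns null for a topology that was
    // submitted but whose scheduling loop has not finished yet.
    byte[] getAssignment(String topoId) {
        return idToAssignment.get(topoId);
    }
}

public class Repro {
    public static void main(String[] args) {
        InMemoryBackend backend = new InMemoryBackend();
        // Topology submitted; scheduling loop still running, so no entry yet.
        byte[] assignment = backend.getAssignment("topo1-52-1619191499");
        // Dereferencing without a check is the NPE path:
        // int len = assignment.length;  // would throw NullPointerException
        // A null-safe caller must handle the "not scheduled yet" window:
        if (assignment == null) {
            System.out.println("assignment not ready yet");
        }
    }
}
```

A sleep injected between scheduling and `keepOrUpdateAssignment`, as the reproduction steps suggest, simply widens this window.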
[jira] [Updated] (STORM-3767) NPE on getComponentPendingProfileActions
[ https://issues.apache.org/jira/browse/STORM-3767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Li updated STORM-3767: Fix Version/s: 2.3.0 > NPE on getComponentPendingProfileActions > - > > Key: STORM-3767 > URL: https://issues.apache.org/jira/browse/STORM-3767 > Project: Apache Storm > Issue Type: Bug >Affects Versions: 2.0.0, 2.1.0, 2.2.0 >Reporter: Ethan Li >Assignee: Ethan Li >Priority: Major > Fix For: 2.3.0 > > Attachments: Screen Shot 2021-04-27 at 11.09.33 AM.png > > Time Spent: 0.5h > Remaining Estimate: 0h > > When a topology is newly submitted, if the scheduling loop takes too long, > the component UI might return a 500 error. > This is due to an NPE in the nimbus code. An example: > 1. When a scheduling loop finishes, nimbus will eventually update the > assignmentsBackend. If a topology is newly submitted, its entry will be added > to the idToAssignment map; otherwise, the entry will be updated with new > assignments. The key point is that the new topology id doesn't exist in > idToAssignment before it reaches this point. > https://github.com/apache/storm/blob/master/storm-server/src/main/java/org/apache/storm/daemon/nimbus/Nimbus.java#L2548-L2549 > https://github.com/apache/storm/blob/master/storm-client/src/jvm/org/apache/storm/cluster/StormClusterStateImpl.java#L696 > https://github.com/apache/storm/blob/master/storm-client/src/jvm/org/apache/storm/assignments/InMemoryAssignmentBackend.java#L63-L64 > 2. However, this assignmentsBackend update only started to happen at > 2021-04-23 15:30:14.299 > {code:java} > 2021-04-23 15:30:14.299 o.a.s.d.n.Nimbus timer [INFO] Setting new assignment > for topology > {code} > while this topology topo1-52-1619191499 was scheduled at 2021-04-23 > 15:25:13.887. The scheduling loop took longer than 5 minutes. 
> {code:java} > 2021-04-23 15:25:13.887 o.a.s.s.Cluster timer [INFO] STATUS - > topo1-52-1619191499 Running - Fully Scheduled by DefaultResourceAwareStrategy > (1297 states traversed in 1275 ms, backtracked 0 times) > other topologies were taking a long time > 2021-04-23 15:25:14.378 o.a.s.s.Cluster timer [INFO] STATUS - > topo2-76-1612842912 Running - Fully Scheduled by DefaultResourceAwareStrategy > (111 states traversed in 34 ms, backtracked 0 times) > ... > 2021-04-23 15:30:14.192 o.a.s.s.Cluster timer [INFO] STATUS - > TrendingNowLES-11-1611713968 Not enough resources to schedule after evicting > lower priority topologies. Additional Memory Required: 20128.0 MB (Available: > 5411178.0 MB). Additional CPU Required: 1010.0% CPU (Available: 3100.0 % > CPU).Cannot schedule by DefaultResourceAwareStrategy (65644 states traversed > in 299804 ms, backtracked 6 times, 89 of 150 executors scheduled) > ... > 2021-04-23 15:30:14.216 o.a.s.s.Cluster timer [INFO] STATUS - > evaluateplus-dev-47-1605825401 Running - Fully Scheduled by > GenericResourceAwareStrategy (41 states traversed in 10 ms, backtracked 0 > times) > {code} > 3. During this period, the idToAssignment map in assignmentsBackend wouldn't > have the entry for topo1-52-1619191499, so when a component UI was visited, > https://github.com/apache/storm/blob/master/storm-server/src/main/java/org/apache/storm/daemon/nimbus/Nimbus.java#L3613-L3614 > https://github.com/apache/storm/blob/master/storm-server/src/main/java/org/apache/storm/daemon/nimbus/Nimbus.java#L3100 > https://github.com/apache/storm/blob/master/storm-client/src/jvm/org/apache/storm/cluster/StormClusterStateImpl.java#L194 > https://github.com/apache/storm/blob/master/storm-client/src/jvm/org/apache/storm/assignments/InMemoryAssignmentBackend.java#L69 > it got a null value as the assignment, and hence the NPE. 
> This can be reproduced easily by adding a sleep anywhere between > {code:title=Nimbus.java} > Map newSchedulerAssignments = > computeNewSchedulerAssignments(existingAssignments, > topologies, bases, scratchTopoId); > {code} > and > {code:title=Nimbus.java} > state.setAssignment(topoId, assignment, td.getConf()); > {code} > then submitting a new topology and visiting its component UI. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (STORM-3785) Rate metrics are wrongly divided by 1000000
[ https://issues.apache.org/jira/browse/STORM-3785?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Li updated STORM-3785: Component/s: storm-client > Rate metrics are wrongly divided by 1000000 > --- > > Key: STORM-3785 > URL: https://issues.apache.org/jira/browse/STORM-3785 > Project: Apache Storm > Issue Type: Bug > Components: storm-client >Affects Versions: 2.2.0 >Reporter: Rui Li >Assignee: Rui Li >Priority: Critical > Fix For: 2.3.0, 2.2.1 > > Time Spent: 40m > Remaining Estimate: 0h > > Customers complained about odd behavior of Storm rate metrics (m1_rate, etc.). > It turns out to be a Storm bug: > https://github.com/apache/storm/blob/master/storm-client/src/jvm/org/apache/storm/executor/Executor.java#L416 > https://github.com/apache/storm/blob/master/storm-client/src/jvm/org/apache/storm/executor/Executor.java#L437 -- This message was sent by Atlassian Jira (v8.3.4#803005)
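The linked Executor.java lines are where the reported values are converted before being exposed. The sketch below only illustrates the symptom named in the subject line (it is not the actual Storm code, and the method names are hypothetical): a per-second rate that picks up an extra division by 1,000,000 looks like roughly zero to a dashboard, which matches the "odd behavior of m1_rate" complaint.

```java
// Hypothetical illustration of the STORM-3785 symptom: an extra
// division by 1_000_000 applied to an already-correct per-second rate.
public class RateScaleBug {
    // Correct conversion: events observed over a window, in events/second.
    static double correctRate(long events, double seconds) {
        return events / seconds;
    }

    // The reported bug pattern: the same rate erroneously scaled down.
    static double buggyRate(long events, double seconds) {
        return events / seconds / 1_000_000.0; // the erroneous extra division
    }

    public static void main(String[] args) {
        long events = 60_000;  // events observed
        double window = 60.0;  // over one minute
        System.out.println(correctRate(events, window)); // 1000.0 events/s
        System.out.println(buggyRate(events, window));   // 0.001 events/s
    }
}
```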
[jira] [Resolved] (STORM-3785) Rate metrics are wrongly divided by 1000000
[ https://issues.apache.org/jira/browse/STORM-3785?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Li resolved STORM-3785. - Resolution: Fixed > Rate metrics are wrongly divided by 1000000 > --- > > Key: STORM-3785 > URL: https://issues.apache.org/jira/browse/STORM-3785 > Project: Apache Storm > Issue Type: Bug >Affects Versions: 2.2.0 >Reporter: Rui Li >Assignee: Rui Li >Priority: Critical > Fix For: 2.3.0, 2.2.1 > > Time Spent: 40m > Remaining Estimate: 0h > > Customers complained about odd behavior of Storm rate metrics (m1_rate, etc.). > It turns out to be a Storm bug: > https://github.com/apache/storm/blob/master/storm-client/src/jvm/org/apache/storm/executor/Executor.java#L416 > https://github.com/apache/storm/blob/master/storm-client/src/jvm/org/apache/storm/executor/Executor.java#L437 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (STORM-3785) Rate metrics are wrongly divided by 1000000
[ https://issues.apache.org/jira/browse/STORM-3785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17418798#comment-17418798 ] Ethan Li commented on STORM-3785: - Merged into master: 0c7bbca67db593b82030a7b406a44d54122ce792, 775d6098a9a7bc8d2242e925457a960ff89faaeb cherry-picked to 2.2.x-branch: ee6cd75abb4704bc97039f626c1106361518, e95af5a967c75ac8c102752d9d6100853d3ace3b > Rate metrics are wrongly divided by 1000000 > --- > > Key: STORM-3785 > URL: https://issues.apache.org/jira/browse/STORM-3785 > Project: Apache Storm > Issue Type: Bug >Affects Versions: 2.2.0 >Reporter: Rui Li >Assignee: Rui Li >Priority: Critical > Time Spent: 40m > Remaining Estimate: 0h > > Customers complained about odd behavior of Storm rate metrics (m1_rate, etc.). > It turns out to be a Storm bug: > https://github.com/apache/storm/blob/master/storm-client/src/jvm/org/apache/storm/executor/Executor.java#L416 > https://github.com/apache/storm/blob/master/storm-client/src/jvm/org/apache/storm/executor/Executor.java#L437 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (STORM-3785) Rate metrics are wrongly divided by 1000000
[ https://issues.apache.org/jira/browse/STORM-3785?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Li updated STORM-3785: Fix Version/s: 2.2.1 2.3.0 > Rate metrics are wrongly divided by 1000000 > --- > > Key: STORM-3785 > URL: https://issues.apache.org/jira/browse/STORM-3785 > Project: Apache Storm > Issue Type: Bug >Affects Versions: 2.2.0 >Reporter: Rui Li >Assignee: Rui Li >Priority: Critical > Fix For: 2.3.0, 2.2.1 > > Time Spent: 40m > Remaining Estimate: 0h > > Customers complained about odd behavior of Storm rate metrics (m1_rate, etc.). > It turns out to be a Storm bug: > https://github.com/apache/storm/blob/master/storm-client/src/jvm/org/apache/storm/executor/Executor.java#L416 > https://github.com/apache/storm/blob/master/storm-client/src/jvm/org/apache/storm/executor/Executor.java#L437 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (STORM-3785) Rate metrics are wrongly divided by 1000000
[ https://issues.apache.org/jira/browse/STORM-3785?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Li updated STORM-3785: Affects Version/s: 2.2.0 > Rate metrics are wrongly divided by 1000000 > --- > > Key: STORM-3785 > URL: https://issues.apache.org/jira/browse/STORM-3785 > Project: Apache Storm > Issue Type: Bug >Affects Versions: 2.2.0 >Reporter: Rui Li >Assignee: Rui Li >Priority: Critical > Time Spent: 20m > Remaining Estimate: 0h > > Customers complained about odd behavior of Storm rate metrics (m1_rate, etc.). > It turns out to be a Storm bug: > https://github.com/apache/storm/blob/master/storm-client/src/jvm/org/apache/storm/executor/Executor.java#L416 > https://github.com/apache/storm/blob/master/storm-client/src/jvm/org/apache/storm/executor/Executor.java#L437 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (STORM-3785) Rate metrics are wrongly divided by 1000000
[ https://issues.apache.org/jira/browse/STORM-3785?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Li reassigned STORM-3785: --- Assignee: Rui Li > Rate metrics are wrongly divided by 1000000 > --- > > Key: STORM-3785 > URL: https://issues.apache.org/jira/browse/STORM-3785 > Project: Apache Storm > Issue Type: Bug >Reporter: Rui Li >Assignee: Rui Li >Priority: Critical > Time Spent: 20m > Remaining Estimate: 0h > > Customers complained about odd behavior of Storm rate metrics (m1_rate, etc.). > It turns out to be a Storm bug: > https://github.com/apache/storm/blob/master/storm-client/src/jvm/org/apache/storm/executor/Executor.java#L416 > https://github.com/apache/storm/blob/master/storm-client/src/jvm/org/apache/storm/executor/Executor.java#L437 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (STORM-3767) NPE on getComponentPendingProfileActions
[ https://issues.apache.org/jira/browse/STORM-3767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Li reassigned STORM-3767: --- Assignee: Ethan Li > NPE on getComponentPendingProfileActions > - > > Key: STORM-3767 > URL: https://issues.apache.org/jira/browse/STORM-3767 > Project: Apache Storm > Issue Type: Bug >Reporter: Ethan Li >Assignee: Ethan Li >Priority: Major > Attachments: Screen Shot 2021-04-27 at 11.09.33 AM.png > > Time Spent: 20m > Remaining Estimate: 0h > > When a topology is newly submitted, if the scheduling loop takes too long, > the component UI might return a 500 error. > This is due to an NPE in the nimbus code. An example: > 1. When a scheduling loop finishes, nimbus will eventually update the > assignmentsBackend. If a topology is newly submitted, its entry will be added > to the idToAssignment map; otherwise, the entry will be updated with new > assignments. The key point is that the new topology id doesn't exist in > idToAssignment before it reaches this point. > https://github.com/apache/storm/blob/master/storm-server/src/main/java/org/apache/storm/daemon/nimbus/Nimbus.java#L2548-L2549 > https://github.com/apache/storm/blob/master/storm-client/src/jvm/org/apache/storm/cluster/StormClusterStateImpl.java#L696 > https://github.com/apache/storm/blob/master/storm-client/src/jvm/org/apache/storm/assignments/InMemoryAssignmentBackend.java#L63-L64 > 2. However, this assignmentsBackend update only started to happen at > 2021-04-23 15:30:14.299 > {code:java} > 2021-04-23 15:30:14.299 o.a.s.d.n.Nimbus timer [INFO] Setting new assignment > for topology > {code} > while this topology topo1-52-1619191499 was scheduled at 2021-04-23 > 15:25:13.887. The scheduling loop took longer than 5 minutes. 
> {code:java} > 2021-04-23 15:25:13.887 o.a.s.s.Cluster timer [INFO] STATUS - > topo1-52-1619191499 Running - Fully Scheduled by DefaultResourceAwareStrategy > (1297 states traversed in 1275 ms, backtracked 0 times) > other topologies were taking a long time > 2021-04-23 15:25:14.378 o.a.s.s.Cluster timer [INFO] STATUS - > topo2-76-1612842912 Running - Fully Scheduled by DefaultResourceAwareStrategy > (111 states traversed in 34 ms, backtracked 0 times) > ... > 2021-04-23 15:30:14.192 o.a.s.s.Cluster timer [INFO] STATUS - > TrendingNowLES-11-1611713968 Not enough resources to schedule after evicting > lower priority topologies. Additional Memory Required: 20128.0 MB (Available: > 5411178.0 MB). Additional CPU Required: 1010.0% CPU (Available: 3100.0 % > CPU).Cannot schedule by DefaultResourceAwareStrategy (65644 states traversed > in 299804 ms, backtracked 6 times, 89 of 150 executors scheduled) > ... > 2021-04-23 15:30:14.216 o.a.s.s.Cluster timer [INFO] STATUS - > evaluateplus-dev-47-1605825401 Running - Fully Scheduled by > GenericResourceAwareStrategy (41 states traversed in 10 ms, backtracked 0 > times) > {code} > 3. During this period, the idToAssignment map in assignmentsBackend wouldn't > have the entry for topo1-52-1619191499, so when a component UI was visited, > https://github.com/apache/storm/blob/master/storm-server/src/main/java/org/apache/storm/daemon/nimbus/Nimbus.java#L3613-L3614 > https://github.com/apache/storm/blob/master/storm-server/src/main/java/org/apache/storm/daemon/nimbus/Nimbus.java#L3100 > https://github.com/apache/storm/blob/master/storm-client/src/jvm/org/apache/storm/cluster/StormClusterStateImpl.java#L194 > https://github.com/apache/storm/blob/master/storm-client/src/jvm/org/apache/storm/assignments/InMemoryAssignmentBackend.java#L69 > it got a null value as the assignment, and hence the NPE. 
> This can be reproduced easily by adding a sleep anywhere between > {code:title=Nimbus.java} > Map newSchedulerAssignments = > computeNewSchedulerAssignments(existingAssignments, > topologies, bases, scratchTopoId); > {code} > and > {code:title=Nimbus.java} > state.setAssignment(topoId, assignment, td.getConf()); > {code} > then submitting a new topology and visiting its component UI. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (STORM-3388) Launch workers inside container using runc runtime
[ https://issues.apache.org/jira/browse/STORM-3388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17399964#comment-17399964 ] Ethan Li commented on STORM-3388: - A follow-up bug fix was merged into the master branch: 16ca56eae2c88db42293e926f764abfc2139f26b > Launch workers inside container using runc runtime > -- > > Key: STORM-3388 > URL: https://issues.apache.org/jira/browse/STORM-3388 > Project: Apache Storm > Issue Type: New Feature >Reporter: Ethan Li >Assignee: Ethan Li >Priority: Major > Fix For: 2.3.0 > > Time Spent: 16h 10m > Remaining Estimate: 0h > > Have been working on this. Will push it back after fully tested -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (STORM-3388) Launch workers inside container using runc runtime
[ https://issues.apache.org/jira/browse/STORM-3388?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Li resolved STORM-3388. - Fix Version/s: 2.3.0 Resolution: Fixed Filed follow up jiras: STORM-3787 STORM-3788 STORM-3789 > Launch workers inside container using runc runtime > -- > > Key: STORM-3388 > URL: https://issues.apache.org/jira/browse/STORM-3388 > Project: Apache Storm > Issue Type: New Feature >Reporter: Ethan Li >Assignee: Ethan Li >Priority: Major > Fix For: 2.3.0 > > Time Spent: 15h 40m > Remaining Estimate: 0h > > Have been working on this. Will push it back after fully tested -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (STORM-3789) STORM-3388 follow up: answer unresolved questions in comments
Ethan Li created STORM-3789: --- Summary: STORM-3388 follow up: answer unresolved questions in comments Key: STORM-3789 URL: https://issues.apache.org/jira/browse/STORM-3789 Project: Apache Storm Issue Type: Story Reporter: Ethan Li see https://github.com/apache/storm/pull/3366#issuecomment-882596986 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (STORM-3788) Add more comments in docker-to-squash.py
Ethan Li created STORM-3788: --- Summary: Add more comments in docker-to-squash.py Key: STORM-3788 URL: https://issues.apache.org/jira/browse/STORM-3788 Project: Apache Storm Issue Type: Story Reporter: Ethan Li docker-to-squash.py is a long script and needs more comments to aid understanding. See https://github.com/apache/storm/pull/3366#issuecomment-882596986 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (STORM-3787) Add error messages in worker-launcher c code where it is missing
Ethan Li created STORM-3787: --- Summary: Add error messages in worker-launcher c code where it is missing Key: STORM-3787 URL: https://issues.apache.org/jira/browse/STORM-3787 Project: Apache Storm Issue Type: Story Reporter: Ethan Li The worker-launcher C code needs more error messages to help with debugging when issues occur. See comments in https://github.com/apache/storm/pull/3366#issuecomment-882596986 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (STORM-3388) Launch workers inside container using runc runtime
[ https://issues.apache.org/jira/browse/STORM-3388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17394927#comment-17394927 ] Ethan Li commented on STORM-3388: - Merged to master branch (ce078fcb78dd881f70df0be34e04940a53c38394) > Launch workers inside container using runc runtime > -- > > Key: STORM-3388 > URL: https://issues.apache.org/jira/browse/STORM-3388 > Project: Apache Storm > Issue Type: New Feature >Reporter: Ethan Li >Assignee: Ethan Li >Priority: Major > Time Spent: 15h 40m > Remaining Estimate: 0h > > Have been working on this. Will push it back after fully tested -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (STORM-3765) NPE in DRPCSimpleACLAuthorizer.readAclFromConfig when drpc.authorizer.acl has no values
[ https://issues.apache.org/jira/browse/STORM-3765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17338554#comment-17338554 ] Ethan Li commented on STORM-3765: - Merged into master : d5244b8cb06a0f840ad1bf5d3faf74948702fdb9 > NPE in DRPCSimpleACLAuthorizer.readAclFromConfig when drpc.authorizer.acl has > no values > --- > > Key: STORM-3765 > URL: https://issues.apache.org/jira/browse/STORM-3765 > Project: Apache Storm > Issue Type: Bug >Affects Versions: 2.0.0, 2.1.0, 2.2.0 >Reporter: Ethan Li >Assignee: Ethan Li >Priority: Major > Fix For: 2.3.0 > > Time Spent: 20m > Remaining Estimate: 0h > > When drpc.authorizer.acl has no values, for example: > {code:java} > -bash-4.2$ cat drpc-auth-acl.yaml > drpc.authorizer.acl: > {code} > DRPCSimpleACLAuthorizer will have NPE > {code:java} > 2021-04-22 15:22:48.795 o.a.s.t.ProcessFunction pool-9-thread-1 [ERROR] > Internal error processing fetchRequest > java.lang.NullPointerException: null > at > org.apache.storm.security.auth.authorizer.DRPCSimpleACLAuthorizer.readAclFromConfig(DRPCSimpleACLAuthorizer.java:59) > ~[storm-client-2.3.0.y.jar:2.3.0.y] > at > org.apache.storm.security.auth.authorizer.DRPCSimpleACLAuthorizer.permitClientOrInvocationRequest(DRPCSimpleACLAuthorizer.java:108) > ~[storm-client-2.3.0.y.jar:2.3.0.y] > at > org.apache.storm.security.auth.authorizer.DRPCSimpleACLAuthorizer.permitInvocationRequest(DRPCSimpleACLAuthorizer.java:150) > ~[storm-client-2.3.0.y.jar:2.3.0.y] > at > org.apache.storm.security.auth.authorizer.DRPCAuthorizerBase.permit(DRPCAuthorizerBase.java:51) > ~[storm-client-2.3.0.y.jar:2.3.0.y] > at > org.apache.storm.daemon.drpc.DRPC.checkAuthorization(DRPC.java:130) > ~[storm-server-2.3.0.y.jar:2.3.0.y] > at > org.apache.storm.daemon.drpc.DRPC.checkAuthorizationNoLog(DRPC.java:143) > ~[storm-server-2.3.0.y.jar:2.3.0.y] > at org.apache.storm.daemon.drpc.DRPC.fetchRequest(DRPC.java:192) > ~[storm-server-2.3.0.y.jar:2.3.0.y] > at > 
org.apache.storm.daemon.drpc.DRPCThrift.fetchRequest(DRPCThrift.java:42) > ~[storm-server-2.3.0.y.jar:2.3.0.y] > at > org.apache.storm.generated.DistributedRPCInvocations$Processor$fetchRequest.getResult(DistributedRPCInvocations.java:393) > ~[storm-client-2.3.0.y.jar:2.3.0.y] > at > org.apache.storm.generated.DistributedRPCInvocations$Processor$fetchRequest.getResult(DistributedRPCInvocations.java:372) > ~[storm-client-2.3.0.y.jar:2.3.0.y] > at > org.apache.storm.thrift.ProcessFunction.process(ProcessFunction.java:38) > [storm-shaded-deps-2.3.0.y.jar:2.3.0.y] > at > org.apache.storm.thrift.TBaseProcessor.process(TBaseProcessor.java:39) > [storm-shaded-deps-2.3.0.y.jar:2.3.0.y] > at > org.apache.storm.security.auth.sasl.SaslTransportPlugin$TUGIWrapProcessor.process(SaslTransportPlugin.java:152) > [storm-client-2.3.0.y.jar:2.3.0.y] > at > org.apache.storm.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:291) > [storm-shaded-deps-2.3.0.y.jar:2.3.0.y] > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > [?:1.8.0_262] > at java.util.concurrent.ThreadPoolExecutor$Worker.run(Th > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
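The root cause described here is that a YAML mapping key with no value (as in the `drpc-auth-acl.yaml` excerpt) parses to a null entry, so the conf lookup in readAclFromConfig returns null. The sketch below models that failure mode with a plain map rather than the actual Storm config machinery; the class and method names are hypothetical, and the null-safe branch only illustrates one possible fix direction (treating an empty key as an empty ACL).

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of the STORM-3765 failure mode: "drpc.authorizer.acl:"
// with no value yields a null conf entry, and code that casts/iterates it
// without a null check throws the NullPointerException shown above.
public class AclNullRepro {
    @SuppressWarnings("unchecked")
    static Map<String, Object> readAcl(Map<String, Object> conf) {
        Object acl = conf.get("drpc.authorizer.acl");
        if (acl == null) {
            // Null-safe variant: a missing or empty key means "no ACL entries"
            // instead of a dereference of null.
            return new HashMap<>();
        }
        return (Map<String, Object>) acl;
    }

    public static void main(String[] args) {
        Map<String, Object> conf = new HashMap<>();
        conf.put("drpc.authorizer.acl", null); // what the empty YAML key yields
        System.out.println(readAcl(conf).isEmpty()); // prints true, no NPE
    }
}
```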
[jira] [Resolved] (STORM-3765) NPE in DRPCSimpleACLAuthorizer.readAclFromConfig when drpc.authorizer.acl has no values
[ https://issues.apache.org/jira/browse/STORM-3765?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Li resolved STORM-3765. - Resolution: Fixed > NPE in DRPCSimpleACLAuthorizer.readAclFromConfig when drpc.authorizer.acl has > no values > --- > > Key: STORM-3765 > URL: https://issues.apache.org/jira/browse/STORM-3765 > Project: Apache Storm > Issue Type: Bug >Affects Versions: 2.0.0, 2.1.0, 2.2.0 >Reporter: Ethan Li >Assignee: Ethan Li >Priority: Major > Fix For: 2.3.0 > > Time Spent: 20m > Remaining Estimate: 0h > > When drpc.authorizer.acl has no values, for example: > {code:java} > -bash-4.2$ cat drpc-auth-acl.yaml > drpc.authorizer.acl: > {code} > DRPCSimpleACLAuthorizer will have NPE > {code:java} > 2021-04-22 15:22:48.795 o.a.s.t.ProcessFunction pool-9-thread-1 [ERROR] > Internal error processing fetchRequest > java.lang.NullPointerException: null > at > org.apache.storm.security.auth.authorizer.DRPCSimpleACLAuthorizer.readAclFromConfig(DRPCSimpleACLAuthorizer.java:59) > ~[storm-client-2.3.0.y.jar:2.3.0.y] > at > org.apache.storm.security.auth.authorizer.DRPCSimpleACLAuthorizer.permitClientOrInvocationRequest(DRPCSimpleACLAuthorizer.java:108) > ~[storm-client-2.3.0.y.jar:2.3.0.y] > at > org.apache.storm.security.auth.authorizer.DRPCSimpleACLAuthorizer.permitInvocationRequest(DRPCSimpleACLAuthorizer.java:150) > ~[storm-client-2.3.0.y.jar:2.3.0.y] > at > org.apache.storm.security.auth.authorizer.DRPCAuthorizerBase.permit(DRPCAuthorizerBase.java:51) > ~[storm-client-2.3.0.y.jar:2.3.0.y] > at > org.apache.storm.daemon.drpc.DRPC.checkAuthorization(DRPC.java:130) > ~[storm-server-2.3.0.y.jar:2.3.0.y] > at > org.apache.storm.daemon.drpc.DRPC.checkAuthorizationNoLog(DRPC.java:143) > ~[storm-server-2.3.0.y.jar:2.3.0.y] > at org.apache.storm.daemon.drpc.DRPC.fetchRequest(DRPC.java:192) > ~[storm-server-2.3.0.y.jar:2.3.0.y] > at > org.apache.storm.daemon.drpc.DRPCThrift.fetchRequest(DRPCThrift.java:42) > ~[storm-server-2.3.0.y.jar:2.3.0.y] > at > 
org.apache.storm.generated.DistributedRPCInvocations$Processor$fetchRequest.getResult(DistributedRPCInvocations.java:393) > ~[storm-client-2.3.0.y.jar:2.3.0.y] > at > org.apache.storm.generated.DistributedRPCInvocations$Processor$fetchRequest.getResult(DistributedRPCInvocations.java:372) > ~[storm-client-2.3.0.y.jar:2.3.0.y] > at > org.apache.storm.thrift.ProcessFunction.process(ProcessFunction.java:38) > [storm-shaded-deps-2.3.0.y.jar:2.3.0.y] > at > org.apache.storm.thrift.TBaseProcessor.process(TBaseProcessor.java:39) > [storm-shaded-deps-2.3.0.y.jar:2.3.0.y] > at > org.apache.storm.security.auth.sasl.SaslTransportPlugin$TUGIWrapProcessor.process(SaslTransportPlugin.java:152) > [storm-client-2.3.0.y.jar:2.3.0.y] > at > org.apache.storm.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:291) > [storm-shaded-deps-2.3.0.y.jar:2.3.0.y] > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > [?:1.8.0_262] > at java.util.concurrent.ThreadPoolExecutor$Worker.run(Th > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (STORM-3765) NPE in DRPCSimpleACLAuthorizer.readAclFromConfig when drpc.authorizer.acl has no values
[ https://issues.apache.org/jira/browse/STORM-3765?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Li updated STORM-3765: Affects Version/s: 2.2.0 2.0.0 2.1.0 > NPE in DRPCSimpleACLAuthorizer.readAclFromConfig when drpc.authorizer.acl has > no values > --- > > Key: STORM-3765 > URL: https://issues.apache.org/jira/browse/STORM-3765 > Project: Apache Storm > Issue Type: Bug >Affects Versions: 2.0.0, 2.1.0, 2.2.0 >Reporter: Ethan Li >Assignee: Ethan Li >Priority: Major > Fix For: 2.3.0 > > Time Spent: 20m > Remaining Estimate: 0h > > When drpc.authorizer.acl has no values, for example: > {code:java} > -bash-4.2$ cat drpc-auth-acl.yaml > drpc.authorizer.acl: > {code} > DRPCSimpleACLAuthorizer will have NPE > {code:java} > 2021-04-22 15:22:48.795 o.a.s.t.ProcessFunction pool-9-thread-1 [ERROR] > Internal error processing fetchRequest > java.lang.NullPointerException: null > at > org.apache.storm.security.auth.authorizer.DRPCSimpleACLAuthorizer.readAclFromConfig(DRPCSimpleACLAuthorizer.java:59) > ~[storm-client-2.3.0.y.jar:2.3.0.y] > at > org.apache.storm.security.auth.authorizer.DRPCSimpleACLAuthorizer.permitClientOrInvocationRequest(DRPCSimpleACLAuthorizer.java:108) > ~[storm-client-2.3.0.y.jar:2.3.0.y] > at > org.apache.storm.security.auth.authorizer.DRPCSimpleACLAuthorizer.permitInvocationRequest(DRPCSimpleACLAuthorizer.java:150) > ~[storm-client-2.3.0.y.jar:2.3.0.y] > at > org.apache.storm.security.auth.authorizer.DRPCAuthorizerBase.permit(DRPCAuthorizerBase.java:51) > ~[storm-client-2.3.0.y.jar:2.3.0.y] > at > org.apache.storm.daemon.drpc.DRPC.checkAuthorization(DRPC.java:130) > ~[storm-server-2.3.0.y.jar:2.3.0.y] > at > org.apache.storm.daemon.drpc.DRPC.checkAuthorizationNoLog(DRPC.java:143) > ~[storm-server-2.3.0.y.jar:2.3.0.y] > at org.apache.storm.daemon.drpc.DRPC.fetchRequest(DRPC.java:192) > ~[storm-server-2.3.0.y.jar:2.3.0.y] > at > org.apache.storm.daemon.drpc.DRPCThrift.fetchRequest(DRPCThrift.java:42) > 
~[storm-server-2.3.0.y.jar:2.3.0.y] > at > org.apache.storm.generated.DistributedRPCInvocations$Processor$fetchRequest.getResult(DistributedRPCInvocations.java:393) > ~[storm-client-2.3.0.y.jar:2.3.0.y] > at > org.apache.storm.generated.DistributedRPCInvocations$Processor$fetchRequest.getResult(DistributedRPCInvocations.java:372) > ~[storm-client-2.3.0.y.jar:2.3.0.y] > at > org.apache.storm.thrift.ProcessFunction.process(ProcessFunction.java:38) > [storm-shaded-deps-2.3.0.y.jar:2.3.0.y] > at > org.apache.storm.thrift.TBaseProcessor.process(TBaseProcessor.java:39) > [storm-shaded-deps-2.3.0.y.jar:2.3.0.y] > at > org.apache.storm.security.auth.sasl.SaslTransportPlugin$TUGIWrapProcessor.process(SaslTransportPlugin.java:152) > [storm-client-2.3.0.y.jar:2.3.0.y] > at > org.apache.storm.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:291) > [storm-shaded-deps-2.3.0.y.jar:2.3.0.y] > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > [?:1.8.0_262] > at java.util.concurrent.ThreadPoolExecutor$Worker.run(Th > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (STORM-3765) NPE in DRPCSimpleACLAuthorizer.readAclFromConfig when drpc.authorizer.acl has no values
[ https://issues.apache.org/jira/browse/STORM-3765?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Li updated STORM-3765: Fix Version/s: 2.3.0 > NPE in DRPCSimpleACLAuthorizer.readAclFromConfig when drpc.authorizer.acl has > no values > --- > > Key: STORM-3765 > URL: https://issues.apache.org/jira/browse/STORM-3765 > Project: Apache Storm > Issue Type: Bug >Reporter: Ethan Li >Assignee: Ethan Li >Priority: Major > Fix For: 2.3.0 > > Time Spent: 20m > Remaining Estimate: 0h > > When drpc.authorizer.acl has no values, for example: > {code:java} > -bash-4.2$ cat drpc-auth-acl.yaml > drpc.authorizer.acl: > {code} > DRPCSimpleACLAuthorizer will have NPE > {code:java} > 2021-04-22 15:22:48.795 o.a.s.t.ProcessFunction pool-9-thread-1 [ERROR] > Internal error processing fetchRequest > java.lang.NullPointerException: null > at > org.apache.storm.security.auth.authorizer.DRPCSimpleACLAuthorizer.readAclFromConfig(DRPCSimpleACLAuthorizer.java:59) > ~[storm-client-2.3.0.y.jar:2.3.0.y] > at > org.apache.storm.security.auth.authorizer.DRPCSimpleACLAuthorizer.permitClientOrInvocationRequest(DRPCSimpleACLAuthorizer.java:108) > ~[storm-client-2.3.0.y.jar:2.3.0.y] > at > org.apache.storm.security.auth.authorizer.DRPCSimpleACLAuthorizer.permitInvocationRequest(DRPCSimpleACLAuthorizer.java:150) > ~[storm-client-2.3.0.y.jar:2.3.0.y] > at > org.apache.storm.security.auth.authorizer.DRPCAuthorizerBase.permit(DRPCAuthorizerBase.java:51) > ~[storm-client-2.3.0.y.jar:2.3.0.y] > at > org.apache.storm.daemon.drpc.DRPC.checkAuthorization(DRPC.java:130) > ~[storm-server-2.3.0.y.jar:2.3.0.y] > at > org.apache.storm.daemon.drpc.DRPC.checkAuthorizationNoLog(DRPC.java:143) > ~[storm-server-2.3.0.y.jar:2.3.0.y] > at org.apache.storm.daemon.drpc.DRPC.fetchRequest(DRPC.java:192) > ~[storm-server-2.3.0.y.jar:2.3.0.y] > at > org.apache.storm.daemon.drpc.DRPCThrift.fetchRequest(DRPCThrift.java:42) > ~[storm-server-2.3.0.y.jar:2.3.0.y] > at > 
org.apache.storm.generated.DistributedRPCInvocations$Processor$fetchRequest.getResult(DistributedRPCInvocations.java:393) > ~[storm-client-2.3.0.y.jar:2.3.0.y] > at > org.apache.storm.generated.DistributedRPCInvocations$Processor$fetchRequest.getResult(DistributedRPCInvocations.java:372) > ~[storm-client-2.3.0.y.jar:2.3.0.y] > at > org.apache.storm.thrift.ProcessFunction.process(ProcessFunction.java:38) > [storm-shaded-deps-2.3.0.y.jar:2.3.0.y] > at > org.apache.storm.thrift.TBaseProcessor.process(TBaseProcessor.java:39) > [storm-shaded-deps-2.3.0.y.jar:2.3.0.y] > at > org.apache.storm.security.auth.sasl.SaslTransportPlugin$TUGIWrapProcessor.process(SaslTransportPlugin.java:152) > [storm-client-2.3.0.y.jar:2.3.0.y] > at > org.apache.storm.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:291) > [storm-shaded-deps-2.3.0.y.jar:2.3.0.y] > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > [?:1.8.0_262] > at java.util.concurrent.ThreadPoolExecutor$Worker.run(Th > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (STORM-3767) NPE on getComponentPendingProfileActions
Ethan Li created STORM-3767: --- Summary: NPE on getComponentPendingProfileActions Key: STORM-3767 URL: https://issues.apache.org/jira/browse/STORM-3767 Project: Apache Storm Issue Type: Bug Reporter: Ethan Li Attachments: Screen Shot 2021-04-27 at 11.09.33 AM.png When a topology is newly submitted, if the scheduling loop takes too long, the component UI might return a 500 error. This is due to an NPE in the nimbus code. An example: 1. When a scheduling loop finishes, nimbus will eventually update the assignmentsBackend. If a topology is newly submitted, its entry will be added to the idToAssignment map; otherwise, the entry will be updated with new assignments. The key point is that the new topology id doesn't exist in idToAssignment before it reaches this point. https://github.com/apache/storm/blob/master/storm-server/src/main/java/org/apache/storm/daemon/nimbus/Nimbus.java#L2548-L2549 https://github.com/apache/storm/blob/master/storm-client/src/jvm/org/apache/storm/cluster/StormClusterStateImpl.java#L696 https://github.com/apache/storm/blob/master/storm-client/src/jvm/org/apache/storm/assignments/InMemoryAssignmentBackend.java#L63-L64 2. However, this assignmentsBackend update only started to happen at 2021-04-23 15:30:14.299 {code:java} 2021-04-23 15:30:14.299 o.a.s.d.n.Nimbus timer [INFO] Setting new assignment for topology {code} while this topology topo1-52-1619191499 was scheduled at 2021-04-23 15:25:13.887. The scheduling loop took longer than 5 minutes. {code:java} 2021-04-23 15:25:13.887 o.a.s.s.Cluster timer [INFO] STATUS - topo1-52-1619191499 Running - Fully Scheduled by DefaultResourceAwareStrategy (1297 states traversed in 1275 ms, backtracked 0 times) other topologies were taking a long time 2021-04-23 15:25:14.378 o.a.s.s.Cluster timer [INFO] STATUS - topo2-76-1612842912 Running - Fully Scheduled by DefaultResourceAwareStrategy (111 states traversed in 34 ms, backtracked 0 times) ... 
2021-04-23 15:30:14.192 o.a.s.s.Cluster timer [INFO] STATUS - TrendingNowLES-11-1611713968 Not enough resources to schedule after evicting lower priority topologies. Additional Memory Required: 20128.0 MB (Available: 5411178.0 MB). Additional CPU Required: 1010.0% CPU (Available: 3100.0 % CPU).Cannot schedule by DefaultResourceAwareStrategy (65644 states traversed in 299804 ms, backtracked 6 times, 89 of 150 executors scheduled) ... 2021-04-23 15:30:14.216 o.a.s.s.Cluster timer [INFO] STATUS - evaluateplus-dev-47-1605825401 Running - Fully Scheduled by GenericResourceAwareStrategy (41 states traversed in 10 ms, backtracked 0 times) {code} 3. During this period, the idToAssignment map in assignmentsBackend didn't have an entry for topo1-52-1619191499, so when the component UI was visited, https://github.com/apache/storm/blob/master/storm-server/src/main/java/org/apache/storm/daemon/nimbus/Nimbus.java#L3613-L3614 https://github.com/apache/storm/blob/master/storm-server/src/main/java/org/apache/storm/daemon/nimbus/Nimbus.java#L3100 https://github.com/apache/storm/blob/master/storm-client/src/jvm/org/apache/storm/cluster/StormClusterStateImpl.java#L194 https://github.com/apache/storm/blob/master/storm-client/src/jvm/org/apache/storm/assignments/InMemoryAssignmentBackend.java#L69 it got a null value as the assignment, hence the NPE. This can be reproduced easily by adding a sleep anywhere between {code:title=Nimbus.java} Map newSchedulerAssignments = computeNewSchedulerAssignments(existingAssignments, topologies, bases, scratchTopoId); {code} and {code:title=Nimbus.java} state.setAssignment(topoId, assignment, td.getConf()); {code} then submitting a new topology and visiting its component UI page. -- This message was sent by Atlassian Jira (v8.3.4#803005)
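The lookup failure in step 3 can be sketched in isolation. The class below is a hypothetical, simplified stand-in for the in-memory idToAssignment lookup (the class name, the string-valued assignment, and the guard are assumptions for illustration, not Storm's actual API): a topology that has been scheduled but not yet written to the backend yields null, and an explicit check turns that into a clear error instead of a downstream NPE.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the in-memory assignment backend (not Storm's
// actual classes). A newly submitted topology has no entry until the
// scheduling loop finishes and the backend is updated.
public class AssignmentLookupSketch {
    private final Map<String, String> idToAssignment = new HashMap<>();

    // Called at the end of the scheduling loop.
    public void setAssignment(String topoId, String assignment) {
        idToAssignment.put(topoId, assignment);
    }

    // UI-facing lookup: fail loudly instead of returning null into code
    // that dereferences the assignment (the NPE described above).
    public String getAssignmentOrThrow(String topoId) {
        String assignment = idToAssignment.get(topoId);
        if (assignment == null) {
            throw new IllegalStateException("No assignment yet for topology "
                + topoId + " (scheduling may still be in progress)");
        }
        return assignment;
    }

    public static void main(String[] args) {
        AssignmentLookupSketch backend = new AssignmentLookupSketch();
        try {
            backend.getAssignmentOrThrow("topo1-52-1619191499");
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Whether the real fix should surface a clear error or have the UI wait for the assignment is a design question for the actual patch; the sketch only shows the null path that a long scheduling loop exposes.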
[jira] [Created] (STORM-3765) NPE in DRPCSimpleACLAuthorizer.readAclFromConfig when drpc.authorizer.acl has no values
Ethan Li created STORM-3765: --- Summary: NPE in DRPCSimpleACLAuthorizer.readAclFromConfig when drpc.authorizer.acl has no values Key: STORM-3765 URL: https://issues.apache.org/jira/browse/STORM-3765 Project: Apache Storm Issue Type: Bug Reporter: Ethan Li When drpc.authorizer.acl has no values, for example: {code:java} -bash-4.2$ cat drpc-auth-acl.yaml drpc.authorizer.acl: {code} DRPCSimpleACLAuthorizer will throw an NPE: {code:java} 2021-04-22 15:22:48.795 o.a.s.t.ProcessFunction pool-9-thread-1 [ERROR] Internal error processing fetchRequest java.lang.NullPointerException: null at org.apache.storm.security.auth.authorizer.DRPCSimpleACLAuthorizer.readAclFromConfig(DRPCSimpleACLAuthorizer.java:59) ~[storm-client-2.3.0.y.jar:2.3.0.y] at org.apache.storm.security.auth.authorizer.DRPCSimpleACLAuthorizer.permitClientOrInvocationRequest(DRPCSimpleACLAuthorizer.java:108) ~[storm-client-2.3.0.y.jar:2.3.0.y] at org.apache.storm.security.auth.authorizer.DRPCSimpleACLAuthorizer.permitInvocationRequest(DRPCSimpleACLAuthorizer.java:150) ~[storm-client-2.3.0.y.jar:2.3.0.y] at org.apache.storm.security.auth.authorizer.DRPCAuthorizerBase.permit(DRPCAuthorizerBase.java:51) ~[storm-client-2.3.0.y.jar:2.3.0.y] at org.apache.storm.daemon.drpc.DRPC.checkAuthorization(DRPC.java:130) ~[storm-server-2.3.0.y.jar:2.3.0.y] at org.apache.storm.daemon.drpc.DRPC.checkAuthorizationNoLog(DRPC.java:143) ~[storm-server-2.3.0.y.jar:2.3.0.y] at org.apache.storm.daemon.drpc.DRPC.fetchRequest(DRPC.java:192) ~[storm-server-2.3.0.y.jar:2.3.0.y] at org.apache.storm.daemon.drpc.DRPCThrift.fetchRequest(DRPCThrift.java:42) ~[storm-server-2.3.0.y.jar:2.3.0.y] at org.apache.storm.generated.DistributedRPCInvocations$Processor$fetchRequest.getResult(DistributedRPCInvocations.java:393) ~[storm-client-2.3.0.y.jar:2.3.0.y] at org.apache.storm.generated.DistributedRPCInvocations$Processor$fetchRequest.getResult(DistributedRPCInvocations.java:372) ~[storm-client-2.3.0.y.jar:2.3.0.y] at 
org.apache.storm.thrift.ProcessFunction.process(ProcessFunction.java:38) [storm-shaded-deps-2.3.0.y.jar:2.3.0.y] at org.apache.storm.thrift.TBaseProcessor.process(TBaseProcessor.java:39) [storm-shaded-deps-2.3.0.y.jar:2.3.0.y] at org.apache.storm.security.auth.sasl.SaslTransportPlugin$TUGIWrapProcessor.process(SaslTransportPlugin.java:152) [storm-client-2.3.0.y.jar:2.3.0.y] at org.apache.storm.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:291) [storm-shaded-deps-2.3.0.y.jar:2.3.0.y] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_262] at java.util.concurrent.ThreadPoolExecutor$Worker.run(Th {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
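The failure mode is easy to model outside Storm. The sketch below is hypothetical (the class name, config handling, and fallback are assumptions for illustration, not the actual readAclFromConfig code): a YAML key written with no value parses to a null entry, and a null check that substitutes an empty ACL avoids the NPE.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper (not Storm's actual code): when the YAML line
// "drpc.authorizer.acl:" has no value, the parsed config maps the key to
// null. Reading that without a null check is the kind of path that NPEs;
// the guard below falls back to an empty ACL instead.
public class DrpcAclSketch {
    static final String ACL_KEY = "drpc.authorizer.acl";

    @SuppressWarnings("unchecked")
    static Map<String, Object> readAcl(Map<String, Object> conf) {
        Object acl = conf.get(ACL_KEY);
        if (acl == null) {
            // An empty ACL: no function is explicitly permitted.
            return Collections.emptyMap();
        }
        return (Map<String, Object>) acl;
    }

    public static void main(String[] args) {
        Map<String, Object> conf = new HashMap<>();
        conf.put(ACL_KEY, null); // what an empty YAML value parses to
        System.out.println(readAcl(conf).isEmpty());
    }
}
```

Whether an empty ACL should mean "deny everything" or "permit everything" is a policy decision for the real fix; the sketch only removes the null dereference.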
[jira] [Assigned] (STORM-3765) NPE in DRPCSimpleACLAuthorizer.readAclFromConfig when drpc.authorizer.acl has no values
[ https://issues.apache.org/jira/browse/STORM-3765?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Li reassigned STORM-3765: --- Assignee: Ethan Li > NPE in DRPCSimpleACLAuthorizer.readAclFromConfig when drpc.authorizer.acl has > no values > --- > > Key: STORM-3765 > URL: https://issues.apache.org/jira/browse/STORM-3765 > Project: Apache Storm > Issue Type: Bug >Reporter: Ethan Li >Assignee: Ethan Li >Priority: Major > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (STORM-3763) Backpressure message ignored by the receiver caused the topology to not progress
[ https://issues.apache.org/jira/browse/STORM-3763?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Li resolved STORM-3763. - Fix Version/s: 2.2.1 2.3.0 Resolution: Fixed Merged to master: 4a1eea700766da2f175ac7eaba6064f0d7f0ff03 Cherry-picked to 2.2.x-branch: 7543a13e021570b6b4d7e583e00823c0b3106a3b > Backpressure message ignored by the receiver caused the topology to not > progress > > > Key: STORM-3763 > URL: https://issues.apache.org/jira/browse/STORM-3763 > Project: Apache Storm > Issue Type: Bug >Affects Versions: 2.0.0, 2.1.0, 2.2.0 >Reporter: Ethan Li >Assignee: Ethan Li >Priority: Major > Fix For: 2.3.0, 2.2.1 > > Time Spent: 20m > Remaining Estimate: 0h > > We have noticed a case where a topology is stuck due to the misinterpretation > of a backpressure message: > At the beginning, the topology ran fine, but a downstream component had > backpressure, so it sent a backpressure signal to its upstream component, and > the upstream component paused sending data to the downstream bolt. > Then the downstream component restarted (for any reason, for example, > killed by the supervisor due to heartbeat timeout). When it came back up, it > sent a backpressure message to the upstream bolt. However, the upstream > component didn't know how to interpret the backpressure message, so it logged > the below error and ignored the message. > {code:java} > 2021-01-28 19:41:37.175 o.a.s.m.n.SaslStormClientHandler client-worker-1 > [ERROR] Unexpected message from server: > {worker=4c38160a-3c66-4eff-8572-2d0c493bd6c1, bpStatusId=254, bpTasks=[], > nonBpTasks=[546, 790, 863]} > {code} > Then the downstream component will not receive any data from the upstream > component, so it won't have any backpressure (since no data is sent to it), > hence it won't send any backpressure update message to the upstream > component. 
This leads to a deadlocked situation where the upstream component thinks > the downstream has backpressure, so it pauses sending data to it, while the > downstream doesn't have backpressure but can't receive any data from > upstream. The topology is stuck because of it. > Let's look at the code: > When the connection between the downstream (server) and upstream (client) is > established, > the server invokes > https://github.com/apache/storm/blob/2.2.x-branch/storm-client/src/jvm/org/apache/storm/messaging/netty/StormServerHandler.java#L39-L41 > https://github.com/apache/storm/blob/2.2.x-branch/storm-client/src/jvm/org/apache/storm/daemon/worker/WorkerState.java#L237 > which sends backpressure messages to the client. > This is because "StormServerHandler" is the only handler in this pipeline > that implements the "channelActive()" method. > https://github.com/apache/storm/blob/2.2.x-branch/storm-client/src/jvm/org/apache/storm/messaging/netty/StormServerPipelineFactory.java#L56 > However, the client side expects authentication messages. > https://github.com/apache/storm/blob/2.2.x-branch/storm-client/src/jvm/org/apache/storm/messaging/netty/SaslStormClientHandler.java#L70-L75 > So the client can't interpret the backpressure message at the beginning, > hence the "unexpected message". > This can be demonstrated with an example. I have a wordcount topology running. > At startup, the client tries to connect to the server. Once connected, it > sends a "SASL_TOKEN_MESSAGE_REQUEST". 
> client log > {code:java} > 021-01-29 19:03:21.355 o.a.s.m.n.SaslStormClientHandler client-worker-1 > [DEBUG] SASL credentials for storm topology wc is > -8603731884381183101:-9091319821854384981 > 2021-01-29 19:03:21.359 o.a.s.m.n.Client client-worker-1 [DEBUG] successfully > connected to openstorm14blue-n3.blue.ygrid.yahoo.com/10.215.73.209:6702, [id: > 0x29da2e9c, L:/10.215.73.209:45870 - > R:openstorm14blue-n3.blue.ygrid.yahoo.com/10.215.73.209:6702] [attempt 12] > 2021-01-29 19:03:21.359 o.a.s.m.n.SaslStormClientHandler client-worker-1 > [INFO] Connection established from /10.215.73.209:45870 to > openstorm14blue-n3.blue.ygrid.yahoo.com/10.215.73.209:6702 > ... > 2021-01-29 19:03:21.362 o.a.s.m.n.SaslStormClientHandler client-worker-1 > [DEBUG] Creating saslNettyClient now for channel: [id: 0x29da2e9c, > L:/10.215.73.209:45870 - > R:openstorm14blue-n3.blue.ygrid.yahoo.com/10.215.73.209:6702] > 2021-01-29 19:03:21.363 o.a.s.m.n.SaslNettyClient client-worker-1 [DEBUG] > SaslNettyClient: Creating SASL DIGEST-MD5 client to authenticate to server > 2021-01-29 19:03:21.368 o.a.s.m.n.SaslStormClientHandler client-worker-1 > [DEBUG] Sending SASL_TOKEN_MESSAGE_REQUEST > ... > 2021-01-29 19:03:21.632 o.a.s.m.n.SaslStormClientHandler client-worker-1 > [DEBUG] send/recv time (ms): 277 > 2021-01-29 19:03:21.633 o.a.s.m.n.SaslStormClientHandler
[jira] [Updated] (STORM-3763) Backpressure message ignored by the receiver caused the topology to not progress
[ https://issues.apache.org/jira/browse/STORM-3763?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Li updated STORM-3763: Description: We have noticed a case where a topology is stuck due to the misinterpretation of a backpressure message: At the beginning, the topology ran fine, but a downstream component had backpressure, so it sent a backpressure signal to its upstream component, and the upstream component paused sending data to the downstream bolt. Then the downstream component restarted (for any reason, for example, killed by the supervisor due to heartbeat timeout). When it came back up, it sent a backpressure message to the upstream bolt. However, the upstream component didn't know how to interpret the backpressure message, so it logged the below error and ignored the message. {code:java} 2021-01-28 19:41:37.175 o.a.s.m.n.SaslStormClientHandler client-worker-1 [ERROR] Unexpected message from server: {worker=4c38160a-3c66-4eff-8572-2d0c493bd6c1, bpStatusId=254, bpTasks=[], nonBpTasks=[546, 790, 863]} {code} Then the downstream component will not receive any data from the upstream component, so it won't have any backpressure (since no data is sent to it), hence it won't send any backpressure update message to the upstream component. This leads to a deadlocked situation where the upstream component thinks the downstream has backpressure, so it pauses sending data to it, while the downstream doesn't have backpressure but can't receive any data from upstream. The topology is stuck because of it. Let's look at the code: When the connection between the downstream (server) and upstream (client) is established, the server invokes https://github.com/apache/storm/blob/2.2.x-branch/storm-client/src/jvm/org/apache/storm/messaging/netty/StormServerHandler.java#L39-L41 https://github.com/apache/storm/blob/2.2.x-branch/storm-client/src/jvm/org/apache/storm/daemon/worker/WorkerState.java#L237 which sends backpressure messages to the client. 
This is because "StormServerHandler" is the only handler in this pipeline that implements the "channelActive()" method. https://github.com/apache/storm/blob/2.2.x-branch/storm-client/src/jvm/org/apache/storm/messaging/netty/StormServerPipelineFactory.java#L56 However, the client side expects authentication messages. https://github.com/apache/storm/blob/2.2.x-branch/storm-client/src/jvm/org/apache/storm/messaging/netty/SaslStormClientHandler.java#L70-L75 So the client can't interpret the backpressure message at the beginning, hence the "unexpected message". This can be demonstrated with an example. I have a wordcount topology running. At startup, the client tries to connect to the server. Once connected, it sends a "SASL_TOKEN_MESSAGE_REQUEST". Client log: {code:java} 2021-01-29 19:03:21.355 o.a.s.m.n.SaslStormClientHandler client-worker-1 [DEBUG] SASL credentials for storm topology wc is -8603731884381183101:-9091319821854384981 2021-01-29 19:03:21.359 o.a.s.m.n.Client client-worker-1 [DEBUG] successfully connected to openstorm14blue-n3.blue.ygrid.yahoo.com/10.215.73.209:6702, [id: 0x29da2e9c, L:/10.215.73.209:45870 - R:openstorm14blue-n3.blue.ygrid.yahoo.com/10.215.73.209:6702] [attempt 12] 2021-01-29 19:03:21.359 o.a.s.m.n.SaslStormClientHandler client-worker-1 [INFO] Connection established from /10.215.73.209:45870 to openstorm14blue-n3.blue.ygrid.yahoo.com/10.215.73.209:6702 ... 2021-01-29 19:03:21.362 o.a.s.m.n.SaslStormClientHandler client-worker-1 [DEBUG] Creating saslNettyClient now for channel: [id: 0x29da2e9c, L:/10.215.73.209:45870 - R:openstorm14blue-n3.blue.ygrid.yahoo.com/10.215.73.209:6702] 2021-01-29 19:03:21.363 o.a.s.m.n.SaslNettyClient client-worker-1 [DEBUG] SaslNettyClient: Creating SASL DIGEST-MD5 client to authenticate to server 2021-01-29 19:03:21.368 o.a.s.m.n.SaslStormClientHandler client-worker-1 [DEBUG] Sending SASL_TOKEN_MESSAGE_REQUEST ... 
2021-01-29 19:03:21.632 o.a.s.m.n.SaslStormClientHandler client-worker-1 [DEBUG] send/recv time (ms): 277 2021-01-29 19:03:21.633 o.a.s.m.n.SaslStormClientHandler client-worker-1 [ERROR] Unexpected message from server: {worker=cdf6f963-678c-45a4-91d2-e1067a9a8516, bpStatusId=1, bpTasks=[], nonBpTasks=[17, 1, 18, 3, 4, 22, 7, 8, 9, 12, 13]} {code} But the server sends the backpressure message first, before it deals with the SASL_TOKEN_MESSAGE_REQUEST message. Server log: {code:java} 2021-01-29 19:03:21.473 o.a.s.m.n.SaslStormServerHandler Netty-server-localhost-6702-worker-1 [DEBUG] SASL credentials for storm topology wc is -8603731884381183101:-9091319821854384981 2021-01-29 19:03:21.482 o.a.s.u.Utils main [DEBUG] Using storm.yaml from resources 2021-01-29 19:03:21.490 o.a.s.d.w.WorkerState Netty-server-localhost-6702-worker-1 [INFO] Sending BackPressure status to new client. BPStatus: {worker=cdf6f963-678c-45a4-91d2-e1067a9a8516, bpStatusId=1, bpTasks=[], nonBpTasks=[17, 1, 18, 3, 4, 22, 7, 8, 9, 12, 13]}
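One way to model a remedy for this race, purely as a sketch (the class name, message strings, and buffering strategy are invented for illustration, not the actual Netty handler code or the fix that was merged): instead of logging a pre-authentication backpressure status as unexpected and dropping it, the client could buffer it and replay it once the SASL handshake completes, so the initial status is not lost.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

// Hypothetical client-side handler state machine (not Storm's actual
// SaslStormClientHandler): messages that arrive before authentication
// completes are buffered instead of dropped, then replayed on completion.
public class SaslAwareClientSketch {
    private boolean authenticated = false;
    private final Queue<String> buffered = new ArrayDeque<>();
    private final List<String> handled = new ArrayList<>();

    public void onServerMessage(String msg) {
        if (msg.equals("SASL_COMPLETE")) {
            authenticated = true;
            // Replay any control messages that raced ahead of the handshake.
            while (!buffered.isEmpty()) {
                handled.add(buffered.poll());
            }
        } else if (!authenticated) {
            buffered.add(msg); // previously: logged as "unexpected" and dropped
        } else {
            handled.add(msg);
        }
    }

    public List<String> handled() {
        return handled;
    }
}
```

An alternative, equally valid shape for a fix is on the server side: defer sending the backpressure status until the authentication exchange has finished. The sketch only illustrates why dropping the message deadlocks the topology, since the server never resends it while no data flows.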
[jira] [Assigned] (STORM-3763) Backpressure message ignored by the receiver caused the topology to not progress
[ https://issues.apache.org/jira/browse/STORM-3763?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Li reassigned STORM-3763: --- Assignee: Ethan Li > Backpressure message ignored by the receiver caused the topology to not > progress > > > Key: STORM-3763 > URL: https://issues.apache.org/jira/browse/STORM-3763 > Project: Apache Storm > Issue Type: Bug >Affects Versions: 2.0.0, 2.1.0, 2.2.0 >Reporter: Ethan Li >Assignee: Ethan Li >Priority: Major
[jira] [Created] (STORM-3763) Backpressure message ignored by the receiver caused the topology to not progress
Ethan Li created STORM-3763: --- Summary: Backpressure message ignored by the receiver caused the topology to not progress Key: STORM-3763 URL: https://issues.apache.org/jira/browse/STORM-3763 Project: Apache Storm Issue Type: Bug Affects Versions: 2.1.0, 2.0.0, 2.2.0 Reporter: Ethan Li We have noticed a case where a topology is stuck due to the misinterpretation of a backpressure message: At the beginning, the topology ran fine, but a downstream component had backpressure, so it sent a backpressure signal to its upstream component, and the upstream component paused sending data to the downstream bolt. Then the downstream component restarted (for any reason, for example, killed by the supervisor due to heartbeat timeout). When it came back up, it sent a backpressure message to the upstream bolt. However, the upstream component didn't know how to interpret the backpressure message, so it logged the below error and ignored the message. Then the downstream component will not receive any data from the upstream bolt, so it won't have any backpressure (since no data is sent to it), hence it won't send any backpressure update message to the upstream bolt. This leads to a deadlocked situation where the upstream component thinks the downstream has backpressure, so it pauses sending data to it, while the downstream doesn't have backpressure but can't receive any data from upstream. 
{code:java} 2021-01-28 19:41:37.175 o.a.s.m.n.SaslStormClientHandler client-worker-1 [ERROR] Unexpected message from server: {worker=4c38160a-3c66-4eff-8572-2d0c493bd6c1, bpStatusId=254, bpTasks=[], nonBpTasks=[546, 790, 863]} {code} Let's look at the code: When the connection between the downstream (server) and upstream (client) is established, the server invokes https://github.com/apache/storm/blob/2.2.x-branch/storm-client/src/jvm/org/apache/storm/messaging/netty/StormServerHandler.java#L39-L41 https://github.com/apache/storm/blob/2.2.x-branch/storm-client/src/jvm/org/apache/storm/daemon/worker/WorkerState.java#L237 which sends backpressure messages to the client. This is because "StormServerHandler" is the only handler in this pipeline that implements the "channelActive()" method. https://github.com/apache/storm/blob/2.2.x-branch/storm-client/src/jvm/org/apache/storm/messaging/netty/StormServerPipelineFactory.java#L56 However, the client side expects authentication messages. https://github.com/apache/storm/blob/2.2.x-branch/storm-client/src/jvm/org/apache/storm/messaging/netty/SaslStormClientHandler.java#L70-L75 So the client can't interpret the backpressure message at the beginning, hence the "unexpected message". This can be demonstrated with an example. I have a wordcount topology running. At startup, the client tries to connect to the server. Once connected, it sends a "SASL_TOKEN_MESSAGE_REQUEST". 
Client log: {code:java} 2021-01-29 19:03:21.355 o.a.s.m.n.SaslStormClientHandler client-worker-1 [DEBUG] SASL credentials for storm topology wc is -8603731884381183101:-9091319821854384981 2021-01-29 19:03:21.359 o.a.s.m.n.Client client-worker-1 [DEBUG] successfully connected to openstorm14blue-n3.blue.ygrid.yahoo.com/10.215.73.209:6702, [id: 0x29da2e9c, L:/10.215.73.209:45870 - R:openstorm14blue-n3.blue.ygrid.yahoo.com/10.215.73.209:6702] [attempt 12] 2021-01-29 19:03:21.359 o.a.s.m.n.SaslStormClientHandler client-worker-1 [INFO] Connection established from /10.215.73.209:45870 to openstorm14blue-n3.blue.ygrid.yahoo.com/10.215.73.209:6702 ... 2021-01-29 19:03:21.362 o.a.s.m.n.SaslStormClientHandler client-worker-1 [DEBUG] Creating saslNettyClient now for channel: [id: 0x29da2e9c, L:/10.215.73.209:45870 - R:openstorm14blue-n3.blue.ygrid.yahoo.com/10.215.73.209:6702] 2021-01-29 19:03:21.363 o.a.s.m.n.SaslNettyClient client-worker-1 [DEBUG] SaslNettyClient: Creating SASL DIGEST-MD5 client to authenticate to server 2021-01-29 19:03:21.368 o.a.s.m.n.SaslStormClientHandler client-worker-1 [DEBUG] Sending SASL_TOKEN_MESSAGE_REQUEST ... 2021-01-29 19:03:21.632 o.a.s.m.n.SaslStormClientHandler client-worker-1 [DEBUG] send/recv time (ms): 277 2021-01-29 19:03:21.633 o.a.s.m.n.SaslStormClientHandler client-worker-1 [ERROR] Unexpected message from server: {worker=cdf6f963-678c-45a4-91d2-e1067a9a8516, bpStatusId=1, bpTasks=[], nonBpTasks=[17, 1, 18, 3, 4, 22, 7, 8, 9, 12, 13]} {code} But the server sends the backpressure message first, before it deals with the SASL_TOKEN_MESSAGE_REQUEST message. Server log: {code:java} 2021-01-29 19:03:21.473 o.a.s.m.n.SaslStormServerHandler Netty-server-localhost-6702-worker-1 [DEBUG] SASL credentials for storm topology wc is -8603731884381183101:-9091319821854384981 2021-01-29 19:03:21.482 o.a.s.u.Utils main [DEBUG] Using storm.yaml from resources 2021-01-29 19:03:21.490 o.a.s.d.w.WorkerState Netty-server-localhost-6702-worker-1
[jira] [Resolved] (STORM-3744) IntelliJ does not find shaded classes
[ https://issues.apache.org/jira/browse/STORM-3744?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Li resolved STORM-3744. - Fix Version/s: 2.3.0 Resolution: Fixed Thanks [~bipinprasad] I merged this to master: 3b48efc15aff240c2dc883d503362d3f20db2692 > IntelliJ does not find shaded classes > - > > Key: STORM-3744 > URL: https://issues.apache.org/jira/browse/STORM-3744 > Project: Apache Storm > Issue Type: Improvement > Components: build >Reporter: Bipin Prasad >Assignee: Bipin Prasad >Priority: Minor > Fix For: 2.3.0 > > Time Spent: 2h > Remaining Estimate: 0h > > When storm project is opened in IntelliJ, it does not find the shaded classes > packaged under storm-shaded-deps package. This precludes compiling and > debugging the project inside the IDE. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (STORM-3755) While scheduling multiple ackers with executor use best effort basis
[ https://issues.apache.org/jira/browse/STORM-3755?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Li resolved STORM-3755. - Resolution: Fixed Thanks [~bipinprasad]. I merged this to master: 4de0622c6e25e891419f8ccd002e243470dfe969 > While scheduling multiple ackers with executor use best effort basis > > > Key: STORM-3755 > URL: https://issues.apache.org/jira/browse/STORM-3755 > Project: Apache Storm > Issue Type: Improvement > Components: storm-server >Reporter: Bipin Prasad >Assignee: Bipin Prasad >Priority: Minor > Fix For: 2.3.0 > > Time Spent: 20m > Remaining Estimate: 0h > > In the scheduling loop, if the number of ackers to be scheduled along with > the executor is greater than zero, and they cannot all fit, then try to fit > as many ackers as possible. > To do this, fit the executor first. If this succeeds, then attempt to fit as > many ackers as possible, up to and including the max required per > calculation (and as low as zero). This second step should not fail. -- This message was sent by Atlassian Jira (v8.3.4#803005)
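The two-step, best-effort placement described in STORM-3755 can be sketched as follows. This is a hypothetical simplification (Slot, schedule, and the memory-only capacity model are stand-ins; Storm's real scheduler works on WorkerSlot/ExecutorDetails with multiple resource dimensions):

```java
// Hypothetical sketch of best-effort acker placement: place the executor
// first; if that succeeds, add as many ackers as the slot can still hold,
// anywhere from zero up to the requested maximum. The second step cannot fail.
public class BestEffortAckerPlacement {
    static class Slot {
        int freeMemMb;
        Slot(int freeMemMb) { this.freeMemMb = freeMemMb; }

        boolean fits(int memMb) { return freeMemMb >= memMb; }
        void assign(int memMb) { freeMemMb -= memMb; }
    }

    /** Returns the number of ackers placed, or -1 if the executor itself did not fit. */
    static int schedule(Slot slot, int executorMemMb, int ackerMemMb, int maxAckers) {
        if (!slot.fits(executorMemMb)) {
            return -1;                    // the executor must fit, or scheduling fails
        }
        slot.assign(executorMemMb);
        int placed = 0;
        while (placed < maxAckers && slot.fits(ackerMemMb)) {
            slot.assign(ackerMemMb);      // best effort: zero..maxAckers ackers
            placed++;
        }
        return placed;
    }

    public static void main(String[] args) {
        Slot slot = new Slot(1000);
        // Executor needs 600 MB, each acker 150 MB, up to 4 ackers requested:
        // only 2 ackers fit after the executor, and that partial fit is accepted.
        System.out.println(schedule(slot, 600, 150, 4)); // prints 2
    }
}
```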
[jira] [Updated] (STORM-3757) Update jackson version to 2.10.0
[ https://issues.apache.org/jira/browse/STORM-3757?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Li updated STORM-3757: Affects Version/s: 2.2.0 2.0.0 2.1.0 > Update jackson version to 2.10.0 > > > Key: STORM-3757 > URL: https://issues.apache.org/jira/browse/STORM-3757 > Project: Apache Storm > Issue Type: Dependency upgrade >Affects Versions: 2.0.0, 2.1.0, 2.2.0 >Reporter: Ethan Li >Assignee: Kishor Patil >Priority: Major > Fix For: 2.3.0 > > Time Spent: 10m > Remaining Estimate: 0h > > Update jackson version to 2.10.0 to avoid CVE-2019-14892 and CVE-2019-14893 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (STORM-3757) Update jackson version to 2.10.0
[ https://issues.apache.org/jira/browse/STORM-3757?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Li resolved STORM-3757. - Fix Version/s: 2.3.0 Resolution: Fixed [~kishorvpatil] I filed a JIRA for this since I feel we should include this in the release note. And I merged this to master: 874b87cd7b71107558f734cdb5cdb7f36f5f3d42 Thanks! > Update jackson version to 2.10.0 > > > Key: STORM-3757 > URL: https://issues.apache.org/jira/browse/STORM-3757 > Project: Apache Storm > Issue Type: Dependency upgrade >Reporter: Ethan Li >Assignee: Kishor Patil >Priority: Major > Fix For: 2.3.0 > > Time Spent: 10m > Remaining Estimate: 0h > > Update jackson version to 2.10.0 to avoid CVE-2019-14892 and CVE-2019-14893 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (STORM-3757) Update jackson version to 2.10.0
Ethan Li created STORM-3757: --- Summary: Update jackson version to 2.10.0 Key: STORM-3757 URL: https://issues.apache.org/jira/browse/STORM-3757 Project: Apache Storm Issue Type: Dependency upgrade Reporter: Ethan Li Assignee: Kishor Patil Update jackson version to 2.10.0 to avoid CVE-2019-14892 and CVE-2019-14893 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (STORM-3704) Cosmetic: columns shifted in "Topology summary" table
[ https://issues.apache.org/jira/browse/STORM-3704?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Li updated STORM-3704: Fix Version/s: 2.3.0 > Cosmetic: columns shifted in "Topology summary" table > - > > Key: STORM-3704 > URL: https://issues.apache.org/jira/browse/STORM-3704 > Project: Apache Storm > Issue Type: Bug > Components: storm-ui >Affects Versions: 2.2.0 >Reporter: Vitaliy Fuks >Assignee: Vitaliy Fuks >Priority: Trivial > Fix For: 2.3.0, 2.2.1 > > Time Spent: 50m > Remaining Estimate: 0h > > In STORM-3534 generic resources were added to be visible in Storm UI. There > was a typo and table headers are shifted if generic resources aren't > displayed. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (STORM-3704) Cosmetic: columns shifted in "Topology summary" table
[ https://issues.apache.org/jira/browse/STORM-3704?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Li resolved STORM-3704. - Fix Version/s: 2.2.1 2.3.0 Resolution: Fixed Merged this to master : ab7e003ca0e811f95b092561870d69b0591d6d46 Cherry-picked the commit to 2.2.x-branch: 91927f653cd1e078e403be5cf38500f8ea6c841a > Cosmetic: columns shifted in "Topology summary" table > - > > Key: STORM-3704 > URL: https://issues.apache.org/jira/browse/STORM-3704 > Project: Apache Storm > Issue Type: Bug > Components: storm-ui >Affects Versions: 2.2.0 >Reporter: Vitaliy Fuks >Assignee: Vitaliy Fuks >Priority: Trivial > Fix For: 2.3.0, 2.2.1 > > Time Spent: 50m > Remaining Estimate: 0h > > In STORM-3534 generic resources were added to be visible in Storm UI. There > was a typo and table headers are shifted if generic resources aren't > displayed. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (STORM-3704) Cosmetic: columns shifted in "Topology summary" table
[ https://issues.apache.org/jira/browse/STORM-3704?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Li reassigned STORM-3704: --- Assignee: Vitaliy Fuks > Cosmetic: columns shifted in "Topology summary" table > - > > Key: STORM-3704 > URL: https://issues.apache.org/jira/browse/STORM-3704 > Project: Apache Storm > Issue Type: Bug > Components: storm-ui >Affects Versions: 2.2.0 >Reporter: Vitaliy Fuks >Assignee: Vitaliy Fuks >Priority: Trivial > Time Spent: 0.5h > Remaining Estimate: 0h > > In STORM-3534 generic resources were added to be visible in Storm UI. There > was a typo and table headers are shifted if generic resources aren't > displayed. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (STORM-3704) Cosmetic: columns shifted in "Topology summary" table
[ https://issues.apache.org/jira/browse/STORM-3704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17303560#comment-17303560 ] Ethan Li commented on STORM-3704: - [~vitaliy.fuks] Thanks for the contribution. I added you as a contributor and assigned this JIRA to you. > Cosmetic: columns shifted in "Topology summary" table > - > > Key: STORM-3704 > URL: https://issues.apache.org/jira/browse/STORM-3704 > Project: Apache Storm > Issue Type: Bug > Components: storm-ui >Affects Versions: 2.2.0 >Reporter: Vitaliy Fuks >Assignee: Vitaliy Fuks >Priority: Trivial > Time Spent: 0.5h > Remaining Estimate: 0h > > In STORM-3534 generic resources were added to be visible in Storm UI. There > was a typo and table headers are shifted if generic resources aren't > displayed. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Comment Edited] (STORM-3735) Kyro serialization fails on some metric tuples when topology.fall.back.on.java.serialization is false
[ https://issues.apache.org/jira/browse/STORM-3735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17268047#comment-17268047 ] Ethan Li edited comment on STORM-3735 at 1/19/21, 5:13 PM: --- With STORM-3682 (code: https://github.com/apache/storm/pull/3371/), I believe NodeInfo is no longer needed in kryo registration since the related code is removed with this code change #3371. We can evaluate it and maybe remove the NodeInfo from kryo registration in the future. was (Author: ethanli): With STORM-3682 (code: https://github.com/apache/storm/pull/3371/), I believe NodeInfo is not longer needed in kryo registration since the related code is removed with this code change #3371. We can evaluate it and maybe remove the NodeInfo from kryo registration in the future. > Kyro serialization fails on some metric tuples when > topology.fall.back.on.java.serialization is false > - > > Key: STORM-3735 > URL: https://issues.apache.org/jira/browse/STORM-3735 > Project: Apache Storm > Issue Type: Bug >Affects Versions: 2.0.0, 2.1.0, 2.2.0 >Reporter: Ethan Li >Assignee: Ethan Li >Priority: Major > Fix For: 2.3.0 > > Time Spent: 20m > Remaining Estimate: 0h > > When a metric consumer is used, metrics will be sent from all executors to > the consumer. In some of the metrics, it includes NodeInfo object, and kryo > serialization will fail if topology.fall.back.on.java.serialization is false. 
> {code:title=worker logs} > 2021-01-13 20:16:37.017 o.a.s.e.ExecutorTransfer > Thread-16-__system-executor[-1, -1] [INFO] TRANSFERRING tuple [dest: 5 tuple: > source: __system:-1, stream: __metrics, id: {}, [TASK_INFO: { host: > openstorm14blue-n4.blue.ygrid.yahoo.com:6703 comp: __system[-1]}, [ > [CGroupCpuStat = {nr.throttled-percentage=46.544980443285525, > nr.period-count=767, nr.throttled-count=357, throttled.time-ms=27208}], > [CGroupMemoryLimit = 1342177280], [__recv-iconnection = {dequeuedMessages=0, > enqueued={/10.215.73.210:47038=3169}}], [__send-ico > nnection = {NodeInfo(node:149a917b-bc75-49c8-b351-f74b8ae0fbed-10.215.73.210, > port:[6701])={reconnects=1, src=/10.215.73.210:34938, pending=0, > dest=openstorm14blue-n4.blue.ygrid.yahoo.com/10.215.73.210:6701, sent=1896, > lostOnSend=0}, NodeInfo(node:149a917b-bc75- > 49c8-b351-f74b8ae0fbed-10.215.73.210, port:[6702])={reconnects=8, > src=/10.215.73.210:39476, pending=0, > dest=openstorm14blue-n4.blue.ygrid.yahoo.com/10.215.73.210:6702, sent=2115, > lostOnSend=0}, > NodeInfo(node:b77b5ec6-15ee-4bd2-a9b8-12fcadde7744-10.215.73.211, po > rt:[6700])={reconnects=125, pending=0, > dest=openstorm14blue-n5.blue.ygrid.yahoo.com/10.215.73.211:6700, sent=108, > lostOnSend=1331}}], [CGroupMemory = 316485632], [CGroupCpu = {user-ms=36960, > sys-ms=25860}], [memory.pools.Metaspace.usage = 0.9695890907929322], [m > emory.heap.max = 1073741824], [receive-queue-overflow = 0], > [memory.pools.Compressed-Class-Space.used = 6237424], > [memory.pools.Compressed-Class-Space.max = 1073741824], [memory.non-heap.init > = 2555904], [worker-transfer-queue-overflow = 0], [memory.pools.Metasp > ace.committed = 42074112], [receive-queue-sojourn_time_ms = 0.0], > [threads.waiting.count = 5], [memory.pools.G1-Eden-Space.usage = > 0.2778], [memory.pools.Metaspace.used = 40798320], > [memory.total.used = 101783888], [memory.pools.Code-Cache.init = 255 > 5904], [memory.non-heap.committed = 63832064], [GC.G1-Young-Generation.time = > 
677], [receive-queue-insert_failures = 0.0], [memory.total.init = 130482176], > [GC.G1-Old-Generation.count = 0], [memory.pools.Metaspace.init = 0], > [memory.pools.G1-Survivor-Space.commi > tted = 5242880], [worker-transfer-queue-population = 0], > [memory.pools.Compressed-Class-Space.committed = 6684672], > [threads.timed_waiting.count = 31], [memory.pools.G1-Eden-Space.init = > 7340032], [memory.pools.Metaspace.max = -1], [memory.pools.G1-Survivor-Spac > e.used = 5242880], [memory.heap.init = 127926272], > [memory.pools.G1-Old-Gen.used-after-gc = 0], [worker-transfer-queue-capacity > = 1024], [memory.pools.G1-Survivor-Space.used-after-gc = 5242880], > [memory.pools.G1-Old-Gen.committed = 47185920], [memory.pools.G1-Ed > en-Space.committed = 75497472], [receive-queue-arrival_rate_secs = > 0.109421162052741], [memory.pools.Compressed-Class-Space.usage = > 0.0058090537786483765], [TGT-TimeToExpiryMsecs = 71282993], > [threads.runnable.count = 15], [worker-transfer-queue-insert_failures > = 0.0], [worker-transfer-queue-sojourn_time_ms = 0.0], [memory.heap.committed > = 127926272], [memory.non-heap.max = -1], [threads.daemon.count = 29], > [memory.pools.Code-Cache.max = 251658240], > [worker-transfer-queue-arrival_rate_secs =
[jira] [Commented] (STORM-3735) Kyro serialization fails on some metric tuples when topology.fall.back.on.java.serialization is false
[ https://issues.apache.org/jira/browse/STORM-3735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17268047#comment-17268047 ] Ethan Li commented on STORM-3735: - With STORM-3682 (code: https://github.com/apache/storm/pull/3371/), I believe NodeInfo is no longer needed in kryo registration since the related code is removed with this code change #3371. We can evaluate it and maybe remove the NodeInfo from kryo registration in the future. > Kyro serialization fails on some metric tuples when > topology.fall.back.on.java.serialization is false > - > > Key: STORM-3735 > URL: https://issues.apache.org/jira/browse/STORM-3735 > Project: Apache Storm > Issue Type: Bug >Affects Versions: 2.0.0, 2.1.0, 2.2.0 >Reporter: Ethan Li >Assignee: Ethan Li >Priority: Major > Fix For: 2.3.0 > > Time Spent: 20m > Remaining Estimate: 0h > > When a metric consumer is used, metrics will be sent from all executors to > the consumer. In some of the metrics, it includes NodeInfo object, and kryo > serialization will fail if topology.fall.back.on.java.serialization is false. 
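The failure mode in STORM-3735 can be modeled without Kryo itself. In the sketch below, SerializerRegistry, the stand-in NodeInfo class, and the string-based serialize method are all hypothetical; the point is only that with the fallback disabled, an unregistered class fails to serialize instead of silently degrading to Java serialization.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical model (not Kryo's real API) of registration-required
// serialization: with the Java-serialization fallback disabled, an
// unregistered class raises an exception instead of being serialized
// via the slower fallback path.
public class RegistrationSketch {
    static class SerializerRegistry {
        private final Set<Class<?>> registered = new HashSet<>();
        private final boolean fallBackOnJavaSerialization;

        SerializerRegistry(boolean fallBackOnJavaSerialization) {
            this.fallBackOnJavaSerialization = fallBackOnJavaSerialization;
        }

        void register(Class<?> cls) {
            registered.add(cls);
        }

        String serialize(Object obj) {
            if (registered.contains(obj.getClass())) {
                return "kryo:" + obj;     // fast, registered path
            }
            if (fallBackOnJavaSerialization) {
                return "java:" + obj;     // slow fallback path
            }
            throw new IllegalStateException(
                "Class is not registered: " + obj.getClass().getName());
        }
    }

    // Stand-in for Storm's NodeInfo carried inside __send-iconnection metrics.
    static class NodeInfo {
        final String node;
        final int port;
        NodeInfo(String node, int port) { this.node = node; this.port = port; }
        @Override public String toString() { return "NodeInfo(" + node + ":" + port + ")"; }
    }

    public static void main(String[] args) {
        SerializerRegistry strict = new SerializerRegistry(false);
        try {
            strict.serialize(new NodeInfo("node-1", 6700)); // unregistered -> throws
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
        strict.register(NodeInfo.class);                    // the fix: register the class
        System.out.println(strict.serialize(new NodeInfo("node-1", 6700)));
    }
}
```

This mirrors the two possible fixes discussed on the issue: register NodeInfo with kryo, or stop sending NodeInfo in metric tuples (as STORM-3682 later made possible).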
[jira] [Resolved] (STORM-3735) Kyro serialization fails on some metric tuples when topology.fall.back.on.java.serialization is false
[ https://issues.apache.org/jira/browse/STORM-3735?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Li resolved STORM-3735. - Fix Version/s: 2.3.0 Resolution: Fixed Merged to master (b23e69170f94f8f1265172bd0609c4f3471cf490) > Kyro serialization fails on some metric tuples when > topology.fall.back.on.java.serialization is false > - > > Key: STORM-3735 > URL: https://issues.apache.org/jira/browse/STORM-3735 > Project: Apache Storm > Issue Type: Bug >Affects Versions: 2.0.0, 2.1.0, 2.2.0 >Reporter: Ethan Li >Assignee: Ethan Li >Priority: Major > Fix For: 2.3.0 > > Time Spent: 20m > Remaining Estimate: 0h > > When a metric consumer is used, metrics will be sent from all executors to > the consumer. In some of the metrics, it includes NodeInfo object, and kryo > serialization will fail if topology.fall.back.on.java.serialization is false. > {code:title=worker logs} > 2021-01-13 20:16:37.017 o.a.s.e.ExecutorTransfer > Thread-16-__system-executor[-1, -1] [INFO] TRANSFERRING tuple [dest: 5 tuple: > source: __system:-1, stream: __metrics, id: {}, [TASK_INFO: { host: > openstorm14blue-n4.blue.ygrid.yahoo.com:6703 comp: __system[-1]}, [ > [CGroupCpuStat = {nr.throttled-percentage=46.544980443285525, > nr.period-count=767, nr.throttled-count=357, throttled.time-ms=27208}], > [CGroupMemoryLimit = 1342177280], [__recv-iconnection = {dequeuedMessages=0, > enqueued={/10.215.73.210:47038=3169}}], [__send-ico > nnection = {NodeInfo(node:149a917b-bc75-49c8-b351-f74b8ae0fbed-10.215.73.210, > port:[6701])={reconnects=1, src=/10.215.73.210:34938, pending=0, > dest=openstorm14blue-n4.blue.ygrid.yahoo.com/10.215.73.210:6701, sent=1896, > lostOnSend=0}, NodeInfo(node:149a917b-bc75- > 49c8-b351-f74b8ae0fbed-10.215.73.210, port:[6702])={reconnects=8, > src=/10.215.73.210:39476, pending=0, > dest=openstorm14blue-n4.blue.ygrid.yahoo.com/10.215.73.210:6702, sent=2115, > lostOnSend=0}, > NodeInfo(node:b77b5ec6-15ee-4bd2-a9b8-12fcadde7744-10.215.73.211, po > 
[jira] [Assigned] (STORM-3735) Kyro serialization fails on some metric tuples when topology.fall.back.on.java.serialization is false
[ https://issues.apache.org/jira/browse/STORM-3735?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Li reassigned STORM-3735: --- Assignee: Ethan Li > Kyro serialization fails on some metric tuples when > topology.fall.back.on.java.serialization is false > - > > Key: STORM-3735 > URL: https://issues.apache.org/jira/browse/STORM-3735 > Project: Apache Storm > Issue Type: Bug >Affects Versions: 2.0.0, 2.1.0, 2.2.0 >Reporter: Ethan Li >Assignee: Ethan Li >Priority: Major > > When a metric consumer is used, metrics will be sent from all executors to > the consumer. In some of the metrics, it includes NodeInfo object, and kryo > serialization will fail if topology.fall.back.on.java.serialization is false. > {code:title=worker logs} > 2021-01-13 20:16:37.017 o.a.s.e.ExecutorTransfer > Thread-16-__system-executor[-1, -1] [INFO] TRANSFERRING tuple [dest: 5 tuple: > source: __system:-1, stream: __metrics, id: {}, [TASK_INFO: { host: > openstorm14blue-n4.blue.ygrid.yahoo.com:6703 comp: __system[-1]}, [ > [CGroupCpuStat = {nr.throttled-percentage=46.544980443285525, > nr.period-count=767, nr.throttled-count=357, throttled.time-ms=27208}], > [CGroupMemoryLimit = 1342177280], [__recv-iconnection = {dequeuedMessages=0, > enqueued={/10.215.73.210:47038=3169}}], [__send-ico > nnection = {NodeInfo(node:149a917b-bc75-49c8-b351-f74b8ae0fbed-10.215.73.210, > port:[6701])={reconnects=1, src=/10.215.73.210:34938, pending=0, > dest=openstorm14blue-n4.blue.ygrid.yahoo.com/10.215.73.210:6701, sent=1896, > lostOnSend=0}, NodeInfo(node:149a917b-bc75- > 49c8-b351-f74b8ae0fbed-10.215.73.210, port:[6702])={reconnects=8, > src=/10.215.73.210:39476, pending=0, > dest=openstorm14blue-n4.blue.ygrid.yahoo.com/10.215.73.210:6702, sent=2115, > lostOnSend=0}, > NodeInfo(node:b77b5ec6-15ee-4bd2-a9b8-12fcadde7744-10.215.73.211, po > rt:[6700])={reconnects=125, pending=0, > dest=openstorm14blue-n5.blue.ygrid.yahoo.com/10.215.73.211:6700, sent=108, > lostOnSend=1331}}], 
[jira] [Updated] (STORM-3735) Kyro serialization fails on some metric tuples when topology.fall.back.on.java.serialization is false
[ https://issues.apache.org/jira/browse/STORM-3735?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Li updated STORM-3735: Affects Version/s: 2.2.0 2.0.0 2.1.0 > Kyro serialization fails on some metric tuples when > topology.fall.back.on.java.serialization is false > - > > Key: STORM-3735 > URL: https://issues.apache.org/jira/browse/STORM-3735 > Project: Apache Storm > Issue Type: Bug >Affects Versions: 2.0.0, 2.1.0, 2.2.0 >Reporter: Ethan Li >Priority: Major > > When a metric consumer is used, metrics will be sent from all executors to > the consumer. In some of the metrics, it includes NodeInfo object, and kryo > serialization will fail if topology.fall.back.on.java.serialization is false. > {code:title=worker logs} > 2021-01-13 20:16:37.017 o.a.s.e.ExecutorTransfer > Thread-16-__system-executor[-1, -1] [INFO] TRANSFERRING tuple [dest: 5 tuple: > source: __system:-1, stream: __metrics, id: {}, [TASK_INFO: { host: > openstorm14blue-n4.blue.ygrid.yahoo.com:6703 comp: __system[-1]}, [ > [CGroupCpuStat = {nr.throttled-percentage=46.544980443285525, > nr.period-count=767, nr.throttled-count=357, throttled.time-ms=27208}], > [CGroupMemoryLimit = 1342177280], [__recv-iconnection = {dequeuedMessages=0, > enqueued={/10.215.73.210:47038=3169}}], [__send-ico > nnection = {NodeInfo(node:149a917b-bc75-49c8-b351-f74b8ae0fbed-10.215.73.210, > port:[6701])={reconnects=1, src=/10.215.73.210:34938, pending=0, > dest=openstorm14blue-n4.blue.ygrid.yahoo.com/10.215.73.210:6701, sent=1896, > lostOnSend=0}, NodeInfo(node:149a917b-bc75- > 49c8-b351-f74b8ae0fbed-10.215.73.210, port:[6702])={reconnects=8, > src=/10.215.73.210:39476, pending=0, > dest=openstorm14blue-n4.blue.ygrid.yahoo.com/10.215.73.210:6702, sent=2115, > lostOnSend=0}, > NodeInfo(node:b77b5ec6-15ee-4bd2-a9b8-12fcadde7744-10.215.73.211, po > rt:[6700])={reconnects=125, pending=0, > dest=openstorm14blue-n5.blue.ygrid.yahoo.com/10.215.73.211:6700, sent=108, > lostOnSend=1331}}], [CGroupMemory = 
[jira] [Created] (STORM-3735) Kyro serialization fails on some metric tuples when topology.fall.back.on.java.serialization is false
Ethan Li created STORM-3735: --- Summary: Kryo serialization fails on some metric tuples when topology.fall.back.on.java.serialization is false Key: STORM-3735 URL: https://issues.apache.org/jira/browse/STORM-3735 Project: Apache Storm Issue Type: Bug Reporter: Ethan Li When a metric consumer is used, metrics will be sent from all executors to the consumer. Some of the metrics include a NodeInfo object, and Kryo serialization will fail if topology.fall.back.on.java.serialization is false. {code:title=worker logs} 2021-01-13 20:16:37.017 o.a.s.e.ExecutorTransfer Thread-16-__system-executor[-1, -1] [INFO] TRANSFERRING tuple [dest: 5 tuple: source: __system:-1, stream: __metrics, id: {}, [TASK_INFO: { host: openstorm14blue-n4.blue.ygrid.yahoo.com:6703 comp: __system[-1]}, [ [CGroupCpuStat = {nr.throttled-percentage=46.544980443285525, nr.period-count=767, nr.throttled-count=357, throttled.time-ms=27208}], [CGroupMemoryLimit = 1342177280], [__recv-iconnection = {dequeuedMessages=0, enqueued={/10.215.73.210:47038=3169}}], [__send-iconnection = {NodeInfo(node:149a917b-bc75-49c8-b351-f74b8ae0fbed-10.215.73.210, port:[6701])={reconnects=1, src=/10.215.73.210:34938, pending=0, dest=openstorm14blue-n4.blue.ygrid.yahoo.com/10.215.73.210:6701, sent=1896, lostOnSend=0}, NodeInfo(node:149a917b-bc75-49c8-b351-f74b8ae0fbed-10.215.73.210, port:[6702])={reconnects=8, src=/10.215.73.210:39476, pending=0, dest=openstorm14blue-n4.blue.ygrid.yahoo.com/10.215.73.210:6702, sent=2115, lostOnSend=0}, NodeInfo(node:b77b5ec6-15ee-4bd2-a9b8-12fcadde7744-10.215.73.211, port:[6700])={reconnects=125, pending=0, dest=openstorm14blue-n5.blue.ygrid.yahoo.com/10.215.73.211:6700, sent=108, lostOnSend=1331}}], [CGroupMemory = 316485632], [CGroupCpu = {user-ms=36960, sys-ms=25860}], [memory.pools.Metaspace.usage = 0.9695890907929322], [memory.heap.max = 1073741824], [receive-queue-overflow = 0], [memory.pools.Compressed-Class-Space.used = 6237424], [memory.pools.Compressed-Class-Space.max =
1073741824], [memory.non-heap.init = 2555904], [worker-transfer-queue-overflow = 0], [memory.pools.Metaspace.committed = 42074112], [receive-queue-sojourn_time_ms = 0.0], [threads.waiting.count = 5], [memory.pools.G1-Eden-Space.usage = 0.2778], [memory.pools.Metaspace.used = 40798320], [memory.total.used = 101783888], [memory.pools.Code-Cache.init = 2555904], [memory.non-heap.committed = 63832064], [GC.G1-Young-Generation.time = 677], [receive-queue-insert_failures = 0.0], [memory.total.init = 130482176], [GC.G1-Old-Generation.count = 0], [memory.pools.Metaspace.init = 0], [memory.pools.G1-Survivor-Space.committed = 5242880], [worker-transfer-queue-population = 0], [memory.pools.Compressed-Class-Space.committed = 6684672], [threads.timed_waiting.count = 31], [memory.pools.G1-Eden-Space.init = 7340032], [memory.pools.Metaspace.max = -1], [memory.pools.G1-Survivor-Space.used = 5242880], [memory.heap.init = 127926272], [memory.pools.G1-Old-Gen.used-after-gc = 0], [worker-transfer-queue-capacity = 1024], [memory.pools.G1-Survivor-Space.used-after-gc = 5242880], [memory.pools.G1-Old-Gen.committed = 47185920], [memory.pools.G1-Eden-Space.committed = 75497472], [receive-queue-arrival_rate_secs = 0.109421162052741], [memory.pools.Compressed-Class-Space.usage = 0.0058090537786483765], [TGT-TimeToExpiryMsecs = 71282993], [threads.runnable.count = 15], [worker-transfer-queue-insert_failures = 0.0], [worker-transfer-queue-sojourn_time_ms = 0.0], [memory.heap.committed = 127926272], [memory.non-heap.max = -1], [threads.daemon.count = 29], [memory.pools.Code-Cache.max = 251658240], [worker-transfer-queue-arrival_rate_secs = 90.47776674390379], [memory.heap.usage = 0.037109360098838806], [memory.pools.G1-Old-Gen.init = 120586240], [memory.pools.Code-Cache.committed = 15138816], [receive-queue-pct_full = 0.0], [worker-transfer-queue-pct_full = 0.0], [receive-queue-population = 0], [memory.pools.Compressed-Class-Space.init = 0], [memory.pools.Code-Cache.usage =
0.059299468994140625], [worker-transfer-queue-dropped_messages = 0], [GC.G1-Young-Generation.count = 18], [memory.pools.Code-Cache.used = 14923200], [memory.pools.G1-Old-Gen.usage = 0.012695297598838806], [memory.non-heap.usage = -6.196368E7], [memory.total.max = 1073741823], [threads.count = 51], [memory.heap.used = 39845872], [memory.pools.G1-Survivor-Space.init = 0], [memory.pools.G1-Old-Gen.used = 13631472], [receive-queue-dropped_messages = 0], [threads.terminated.count = 0], [memory.pools.G1-Eden-Space.max = -1], [uptimeSecs = 76], [threads.deadlock.count = 0], [threads.blocked.count = 0], [newWorkerEvent = 1], [receive-queue-capacity = 32768], [threads.new.count = 0], [startTimeSecs = 1610568920], [memory.pools.G1-Eden-Space.used-after-gc = 0], [memory.pools.G1-Eden-Space.used = 20971520],
[jira] [Resolved] (STORM-3727) SUPERVISOR_SLOTS_PORTS could be list of Longs
[ https://issues.apache.org/jira/browse/STORM-3727?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Li resolved STORM-3727. - Fix Version/s: 2.3.0 Resolution: Fixed Thanks [~agresch]. I merged this to master (ba7f969a3a046c5a312a394026d5d9dd755e04a0) > SUPERVISOR_SLOTS_PORTS could be list of Longs > - > > Key: STORM-3727 > URL: https://issues.apache.org/jira/browse/STORM-3727 > Project: Apache Storm > Issue Type: Bug >Affects Versions: 2.2.0 >Reporter: Aaron Gresch >Assignee: Aaron Gresch >Priority: Minor > Fix For: 2.3.0 > > Time Spent: 1h 40m > Remaining Estimate: 0h > > > A user reported: > There's no guarantee that the {{supervisorConf.getOrDefault}} will be a List > of Integers. > Additionally, in ReadClusterState.java, {{.intValue()}} conversion is > removed. Overall result > > {{java.lang.ClassCastException: java.lang.Long cannot be cast to > java.lang.Integer > at > org.apache.storm.daemon.supervisor.ReadClusterState.<init>(ReadClusterState.java:101) > ~[storm-server-2.2.0.jar:2.2.0] > at > org.apache.storm.daemon.supervisor.Supervisor.launch(Supervisor.java:310) > ~[storm-server-2.2.0.jar:2.2.0]}} -- This message was sent by Atlassian Jira (v8.3.4#803005)
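The ClassCastException above is typical when a YAML parser deserializes config integers as Long. The following standalone sketch (hypothetical names, not Storm code) shows why a direct cast to Integer fails for such a list while converting through Number is safe:

```java
import java.util.Arrays;
import java.util.List;

public class SlotPortCast {
    public static void main(String[] args) {
        // YAML parsers commonly deserialize integers as Long, so a configured
        // list of slot ports may arrive as Longs rather than Integers.
        List<Object> ports = Arrays.<Object>asList(6700L, 6701L);

        // Converting through Number works regardless of the boxed type.
        int safe = ((Number) ports.get(0)).intValue();
        System.out.println(safe); // 6700

        try {
            // A direct cast assumes Integer and throws for a Long element.
            int unsafe = (Integer) ports.get(1);
            System.out.println(unsafe);
        } catch (ClassCastException e) {
            System.out.println("ClassCastException: " + e.getMessage());
        }
    }
}
```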
[jira] [Updated] (STORM-3734) IntegerValidator doesn't force the object type to be Integer
[ https://issues.apache.org/jira/browse/STORM-3734?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Li updated STORM-3734: Description: The IntegerValidator allows the non-integer object, like Double(1.0). https://github.com/apache/storm/blob/7bef73a6faa14558ef254efe74cbe4bfef81c2e2/storm-client/src/jvm/org/apache/storm/validation/ConfigValidation.java#L404-L415 It can be reproduced by {code:java} IntegerValidator validator = new IntegerValidator(); validator.validateInteger("test", 1.0); {code} More details at https://github.com/apache/storm/pull/3365#issuecomment-754775896 was: The IntegerValidator allows the non-integer object, like Double(1.0). https://github.com/apache/storm/blob/7bef73a6faa14558ef254efe74cbe4bfef81c2e2/storm-client/src/jvm/org/apache/storm/validation/ConfigValidation.java#L404-L415 It can be reproduced by {code:java} IntegerValidator validator = new IntegerValidator(); validator.validateInteger("test", 1.0); {code} > IntegerValidator doesn't force the object type to be Integer > > > Key: STORM-3734 > URL: https://issues.apache.org/jira/browse/STORM-3734 > Project: Apache Storm > Issue Type: Bug >Reporter: Ethan Li >Priority: Major > > The IntegerValidator allows the non-integer object, like Double(1.0). > https://github.com/apache/storm/blob/7bef73a6faa14558ef254efe74cbe4bfef81c2e2/storm-client/src/jvm/org/apache/storm/validation/ConfigValidation.java#L404-L415 > It can be reproduced by > {code:java} > IntegerValidator validator = new IntegerValidator(); > validator.validateInteger("test", 1.0); > {code} > More details at > https://github.com/apache/storm/pull/3365#issuecomment-754775896 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (STORM-3734) IntegerValidator doesn't force the object type to be Integer
Ethan Li created STORM-3734: --- Summary: IntegerValidator doesn't force the object type to be Integer Key: STORM-3734 URL: https://issues.apache.org/jira/browse/STORM-3734 Project: Apache Storm Issue Type: Bug Reporter: Ethan Li The IntegerValidator allows non-Integer objects, such as Double(1.0). https://github.com/apache/storm/blob/7bef73a6faa14558ef254efe74cbe4bfef81c2e2/storm-client/src/jvm/org/apache/storm/validation/ConfigValidation.java#L404-L415 It can be reproduced by {code:java} IntegerValidator validator = new IntegerValidator(); validator.validateInteger("test", 1.0); {code}
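A stricter check would reject any object that is not actually an Integer. The sketch below is a hypothetical implementation of such type-based validation, not Storm's actual fix; only the validateInteger(String, Object) shape is taken from the reproduction above:

```java
public class StrictIntegerCheck {
    // Hypothetical strict validator: instead of checking the numeric value,
    // it rejects any object whose type is not Integer, so Double(1.0) fails.
    static void validateInteger(String name, Object o) {
        if (o == null) {
            return;
        }
        if (!(o instanceof Integer)) {
            throw new IllegalArgumentException(
                name + " must be an Integer, got " + o.getClass().getName());
        }
    }

    public static void main(String[] args) {
        validateInteger("test", 1);       // Integer passes
        try {
            validateInteger("test", 1.0); // Double is rejected
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```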
[jira] [Resolved] (STORM-3731) Remove unused nashorn import in storm-loadgen:OutputStream.java
[ https://issues.apache.org/jira/browse/STORM-3731?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Li resolved STORM-3731. - Fix Version/s: 2.3.0 Resolution: Fixed Thanks [~bipinprasad]. I merged this to master (5360a15861db8edf5728b9f3eaca92c5470811b0) > Remove unused nashorn import in storm-loadgen:OutputStream.java > --- > > Key: STORM-3731 > URL: https://issues.apache.org/jira/browse/STORM-3731 > Project: Apache Storm > Issue Type: Bug > Components: storm-loadgen >Reporter: Bipin Prasad >Assignee: Bipin Prasad >Priority: Minor > Fix For: 2.3.0 > > Time Spent: 20m > Remaining Estimate: 0h > > Removing unused nashorn import. > This allows mvn build to succeed in JDK16 EA as well. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (STORM-3729) Assigning memory greater than or equal to 2048m will make assigned memory for slot values 1m
[ https://issues.apache.org/jira/browse/STORM-3729?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Li resolved STORM-3729. - Fix Version/s: 2.3.0 Resolution: Fixed Thanks [~Zeahoo]. I merged this to master (f451be2ef81c0821f65a1a4d671b90c221ef99db) > Assigning memory greater than or equal to 2048m will make assigned memory for > slot values 1m > --- > > Key: STORM-3729 > URL: https://issues.apache.org/jira/browse/STORM-3729 > Project: Apache Storm > Issue Type: Bug > Components: storm-client >Affects Versions: 2.0.0, 2.1.0, 2.2.0 >Reporter: ZhihaoZheng >Assignee: ZhihaoZheng >Priority: Minor > Fix For: 2.3.0 > > Time Spent: 3h 40m > Remaining Estimate: 0h > > Hi, everyone. > I set my topology memory over 2048m, but the Storm UI shows only 65m. I found the > error in > [https://github.com/apache/storm/blob/master/storm-client/src/jvm/org/apache/storm/utils/Utils.java] > line 1089, where the value is cast to int instead of long. It goes wrong if I pass > 2048m and results in 1m. > Simply changing this line to cast to Long solves this problem. :)
[jira] [Assigned] (STORM-3729) Assigning memory greater than or equal to 2048m will make assigned memory for slot values 1m
[ https://issues.apache.org/jira/browse/STORM-3729?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Li reassigned STORM-3729: --- Assignee: ZhihaoZheng > Assigning memory greater than or equal to 2048m will make assigned memory for > slot values 1m > --- > > Key: STORM-3729 > URL: https://issues.apache.org/jira/browse/STORM-3729 > Project: Apache Storm > Issue Type: Bug > Components: storm-client >Affects Versions: 2.0.0, 2.1.0, 2.2.0 >Reporter: ZhihaoZheng >Assignee: ZhihaoZheng >Priority: Minor > Time Spent: 3h 40m > Remaining Estimate: 0h > > Hi, everyone. > I set my topology memory over 2048m, but the Storm UI shows only 65m. I found the > error in > [https://github.com/apache/storm/blob/master/storm-client/src/jvm/org/apache/storm/utils/Utils.java] > line 1089, where the value is cast to int instead of long. It goes wrong if I pass > 2048m and results in 1m. > Simply changing this line to cast to Long solves this problem. :)
[jira] [Updated] (STORM-3729) Assigning memory greater than or equal to 2048m will make assigned memory for slot values 1m
[ https://issues.apache.org/jira/browse/STORM-3729?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Li updated STORM-3729: Affects Version/s: 2.0.0 > Assigning memory greater than or equal to 2048m will make assigned memory for > slot values 1m > --- > > Key: STORM-3729 > URL: https://issues.apache.org/jira/browse/STORM-3729 > Project: Apache Storm > Issue Type: Bug > Components: storm-client >Affects Versions: 2.0.0, 2.1.0, 2.2.0 >Reporter: ZhihaoZheng >Priority: Minor > Time Spent: 3h 40m > Remaining Estimate: 0h > > Hi, everyone. > I set my topology memory over 2048m, but the Storm UI shows only 65m. I found the > error in > [https://github.com/apache/storm/blob/master/storm-client/src/jvm/org/apache/storm/utils/Utils.java] > line 1089, where the value is cast to int instead of long. It goes wrong if I pass > 2048m and results in 1m. > Simply changing this line to cast to Long solves this problem. :)
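The int-cast bug in STORM-3729 above can be sketched in isolation. The following minimal example (not Storm's actual Utils.java code, and the UI's exact 65m/1m arithmetic may differ) shows why a megabytes-to-bytes conversion cast to int overflows exactly at 2048m:

```java
public class MemoryCastOverflow {
    public static void main(String[] args) {
        long mb = 2048L;
        long bytes = mb * 1024 * 1024;  // 2147483648 = 2^31, one past Integer.MAX_VALUE
        int truncated = (int) bytes;    // narrowing conversion wraps to Integer.MIN_VALUE
        System.out.println(bytes);      // 2147483648
        System.out.println(truncated);  // -2147483648
    }
}
```

Keeping the value as long (or casting to Long, as the reporter suggests) avoids the wraparound for any realistic memory size.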
[jira] [Resolved] (STORM-3650) Ackers and metricComponents are not distributed evenly
[ https://issues.apache.org/jira/browse/STORM-3650?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Li resolved STORM-3650. - Fix Version/s: 2.3.0 Resolution: Fixed Thanks [~li530]. I merged this to master (254a98db0966914bc07094e168667e1a851affcf) > Ackers and metricComponents are not distributed evenly > -- > > Key: STORM-3650 > URL: https://issues.apache.org/jira/browse/STORM-3650 > Project: Apache Storm > Issue Type: Improvement >Reporter: Rui Li >Assignee: Rui Li >Priority: Major > Fix For: 2.3.0 > > Time Spent: 3h 40m > Remaining Estimate: 0h > > (When resource aware scheduler is used,) System components like ackers or > metricComponents are scheduled all together after finishing scheduling > topology components. We might want to add config to allow distributing them > evenly among the workers in order to help the overall performance. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (STORM-3650) Ackers and metricComponents are not distributed evenly
[ https://issues.apache.org/jira/browse/STORM-3650?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Li updated STORM-3650: Description: (When resource aware scheudler is used,) System components like ackers or metricComponents are scheduled all together after finishing scheduling topology components. We might want to add config to allow distributing them evenly among the workers in order to help the overall performance. (was: System components like ackers or metricComponents are scheduled all together after finishing scheduling topology components. We might want to add config to allow distributing them evenly among the workers in order to help the overall performance.) > Ackers and metricComponents are not distributed evenly > -- > > Key: STORM-3650 > URL: https://issues.apache.org/jira/browse/STORM-3650 > Project: Apache Storm > Issue Type: Improvement >Reporter: Rui Li >Assignee: Rui Li >Priority: Major > Time Spent: 3h 40m > Remaining Estimate: 0h > > (When resource aware scheudler is used,) System components like ackers or > metricComponents are scheduled all together after finishing scheduling > topology components. We might want to add config to allow distributing them > evenly among the workers in order to help the overall performance. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (STORM-3650) Ackers and metricComponents are not distributed evenly
[ https://issues.apache.org/jira/browse/STORM-3650?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Li updated STORM-3650: Description: (When resource aware scheduler is used,) System components like ackers or metricComponents are scheduled all together after finishing scheduling topology components. We might want to add config to allow distributing them evenly among the workers in order to help the overall performance. (was: (When resource aware scheudler is used,) System components like ackers or metricComponents are scheduled all together after finishing scheduling topology components. We might want to add config to allow distributing them evenly among the workers in order to help the overall performance.) > Ackers and metricComponents are not distributed evenly > -- > > Key: STORM-3650 > URL: https://issues.apache.org/jira/browse/STORM-3650 > Project: Apache Storm > Issue Type: Improvement >Reporter: Rui Li >Assignee: Rui Li >Priority: Major > Time Spent: 3h 40m > Remaining Estimate: 0h > > (When resource aware scheduler is used,) System components like ackers or > metricComponents are scheduled all together after finishing scheduling > topology components. We might want to add config to allow distributing them > evenly among the workers in order to help the overall performance. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (STORM-3650) Ackers and metricComponents are not distributed evenly
[ https://issues.apache.org/jira/browse/STORM-3650?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Li updated STORM-3650: Priority: Major (was: Minor) > Ackers and metricComponents are not distributed evenly > -- > > Key: STORM-3650 > URL: https://issues.apache.org/jira/browse/STORM-3650 > Project: Apache Storm > Issue Type: Improvement >Reporter: Rui Li >Assignee: Rui Li >Priority: Major > Time Spent: 3h 40m > Remaining Estimate: 0h > > System components like ackers or metricComponents are scheduled all together > after finishing scheduling topology components. We might want to add config > to allow distributing them evenly among the workers in order to help the > overall performance. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (STORM-3728) Workers are not able to connect to Pacemaker if pacemaker.auth.method is KERBEROS
[ https://issues.apache.org/jira/browse/STORM-3728?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Li updated STORM-3728: Fix Version/s: 2.3.0 > Workers are not able to connect to Pacemaker if pacemaker.auth.method is > KERBEROS > - > > Key: STORM-3728 > URL: https://issues.apache.org/jira/browse/STORM-3728 > Project: Apache Storm > Issue Type: Bug >Affects Versions: 2.0.0, 2.1.0, 2.2.0 >Reporter: Ethan Li >Assignee: Ethan Li >Priority: Major > Fix For: 2.3.0 > > Time Spent: 1h 10m > Remaining Estimate: 0h > > When pacemaker.auth.method is KERBEROS, workers will fail to connect to > Pacemaker because of exceptions like the following: > > {code:java} > 2020-12-21 20:07:00.786 o.a.s.c.PaceMakerStateStorage > executor-heartbeat-timer [ERROR] Timed out waiting for channel ready. Failed > to set_worker_hb. Will make 2 more attempts. > 2020-12-21 20:07:00.902 o.a.s.m.n.KerberosSaslClientHandler > openstorm3blue-n10.blue.ygrid.yahoo.com-pm-1 [INFO] Connection established > from /10.215.73.209:45548 to > openstorm3blue-n10.blue.ygrid.yahoo.com/10.215.79.152:6699 > 2020-12-21 20:07:00.903 o.a.s.m.n.KerberosSaslNettyClient > openstorm3blue-n10.blue.ygrid.yahoo.com-pm-1 [INFO] Creating Kerberos Client.
> 2020-12-21 20:07:00.906 o.a.s.m.n.KerberosSaslNettyClient > openstorm3blue-n10.blue.ygrid.yahoo.com-pm-1 [INFO] Kerberos Client Callback > Handler got callback: class javax.security.auth.callback.PasswordCallback > 2020-12-21 20:07:00.906 o.a.s.m.n.Login > openstorm3blue-n10.blue.ygrid.yahoo.com-pm-1 [ERROR] Login using jaas conf > /home/y/lib/storm/current/conf/storm_jaas.conf failed > 2020-12-21 20:07:00.906 o.a.s.m.n.KerberosSaslNettyClient > openstorm3blue-n10.blue.ygrid.yahoo.com-pm-1 [ERROR] Client failed to login > in principal:javax.security.auth.login.LoginException: No password provided > javax.security.auth.login.LoginException: No password provided > at > com.sun.security.auth.module.Krb5LoginModule.promptForPass(Krb5LoginModule.java:923) > ~[?:1.8.0_262] > at > com.sun.security.auth.module.Krb5LoginModule.attemptAuthentication(Krb5LoginModule.java:764) > ~[?:1.8.0_262] > at > com.sun.security.auth.module.Krb5LoginModule.login(Krb5LoginModule.java:618) > ~[?:1.8.0_262] > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > ~[?:1.8.0_262] > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > ~[?:1.8.0_262] > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > ~[?:1.8.0_262] > at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_262] > at > javax.security.auth.login.LoginContext.invoke(LoginContext.java:755) > ~[?:1.8.0_262] > at > javax.security.auth.login.LoginContext.access$000(LoginContext.java:195) > ~[?:1.8.0_262] > at > javax.security.auth.login.LoginContext$4.run(LoginContext.java:682) > ~[?:1.8.0_262] > at > javax.security.auth.login.LoginContext$4.run(LoginContext.java:680) > ~[?:1.8.0_262] > at java.security.AccessController.doPrivileged(Native Method) > ~[?:1.8.0_262] > at > javax.security.auth.login.LoginContext.invokePriv(LoginContext.java:680) > ~[?:1.8.0_262] > at > javax.security.auth.login.LoginContext.login(LoginContext.java:587) > 
~[?:1.8.0_262] > at org.apache.storm.messaging.netty.Login.login(Login.java:301) > ~[storm-client-2.3.0.y.jar:2.3.0-SNAPSHOT] > at org.apache.storm.messaging.netty.Login.<init>(Login.java:83) > ~[storm-client-2.3.0.y.jar:2.3.0-SNAPSHOT] > at > org.apache.storm.messaging.netty.KerberosSaslNettyClient.<init>(KerberosSaslNettyClient.java:66) > [storm-client-2.3.0.y.jar:2.3.0-SNAPSHOT] > at > org.apache.storm.messaging.netty.KerberosSaslClientHandler.channelActive(KerberosSaslClientHandler.java:59) > [storm-client-2.3.0.y.jar:2.3.0-SNAPSHOT] > at > org.apache.storm.shade.io.netty.channel.AbstractChannelHandlerContext.invokeChannelActive(AbstractChannelHandlerContext.java:213) > [storm-shaded-deps-2.3.0.y.jar:2.3.0-SNAPSHOT] > at > org.apache.storm.shade.io.netty.channel.AbstractChannelHandlerContext.invokeChannelActive(AbstractChannelHandlerContext.java:199) > [storm-shaded-deps-2.3.0.y.jar:2.3.0-SNAPSHOT] > at > org.apache.storm.shade.io.netty.channel.AbstractChannelHandlerContext.fireChannelActive(AbstractChannelHandlerContext.java:192) > [storm-shaded-deps-2.3.0.y.jar:2.3.0-SNAPSHOT] > at > org.apache.storm.shade.io.netty.channel.ChannelInboundHandlerAdapter.channelActive(ChannelInboundHandlerAdapter.java:64) > [storm-shaded-deps-2.3.0.y.jar:2.3.0-SNAPSHOT] > at >
[jira] [Resolved] (STORM-3728) Workers are not able to connect to Pacemaker if pacemaker.auth.method is KERBEROS
[ https://issues.apache.org/jira/browse/STORM-3728?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Li resolved STORM-3728. - Resolution: Fixed Merged into master (ab6b6f69a10bf57c1b46a6519d57ca423d99558c) > Workers are not able to connect to Pacemaker if pacemaker.auth.method is > KERBEROS > - > > Key: STORM-3728 > URL: https://issues.apache.org/jira/browse/STORM-3728 > Project: Apache Storm > Issue Type: Bug >Affects Versions: 2.0.0, 2.1.0, 2.2.0 >Reporter: Ethan Li >Assignee: Ethan Li >Priority: Major > Fix For: 2.3.0 > > Time Spent: 1h 10m > Remaining Estimate: 0h > > When pacemaker.auth.method is KERBEROS, workers will fail to connect to > Pacemaker because of exceptions like the following: > > {code:java} > 2020-12-21 20:07:00.786 o.a.s.c.PaceMakerStateStorage > executor-heartbeat-timer [ERROR] Timed out waiting for channel ready. Failed > to set_worker_hb. Will make 2 more attempts. > 2020-12-21 20:07:00.902 o.a.s.m.n.KerberosSaslClientHandler > openstorm3blue-n10.blue.ygrid.yahoo.com-pm-1 [INFO] Connection established > from /10.215.73.209:45548 to > openstorm3blue-n10.blue.ygrid.yahoo.com/10.215.79.152:6699 > 2020-12-21 20:07:00.903 o.a.s.m.n.KerberosSaslNettyClient > openstorm3blue-n10.blue.ygrid.yahoo.com-pm-1 [INFO] Creating Kerberos Client.
> 2020-12-21 20:07:00.906 o.a.s.m.n.KerberosSaslNettyClient > openstorm3blue-n10.blue.ygrid.yahoo.com-pm-1 [INFO] Kerberos Client Callback > Handler got callback: class javax.security.auth.callback.PasswordCallback > 2020-12-21 20:07:00.906 o.a.s.m.n.Login > openstorm3blue-n10.blue.ygrid.yahoo.com-pm-1 [ERROR] Login using jaas conf > /home/y/lib/storm/current/conf/storm_jaas.conf failed > 2020-12-21 20:07:00.906 o.a.s.m.n.KerberosSaslNettyClient > openstorm3blue-n10.blue.ygrid.yahoo.com-pm-1 [ERROR] Client failed to login > in principal:javax.security.auth.login.LoginException: No password provided > javax.security.auth.login.LoginException: No password provided > at > com.sun.security.auth.module.Krb5LoginModule.promptForPass(Krb5LoginModule.java:923) > ~[?:1.8.0_262] > at > com.sun.security.auth.module.Krb5LoginModule.attemptAuthentication(Krb5LoginModule.java:764) > ~[?:1.8.0_262] > at > com.sun.security.auth.module.Krb5LoginModule.login(Krb5LoginModule.java:618) > ~[?:1.8.0_262] > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > ~[?:1.8.0_262] > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > ~[?:1.8.0_262] > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > ~[?:1.8.0_262] > at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_262] > at > javax.security.auth.login.LoginContext.invoke(LoginContext.java:755) > ~[?:1.8.0_262] > at > javax.security.auth.login.LoginContext.access$000(LoginContext.java:195) > ~[?:1.8.0_262] > at > javax.security.auth.login.LoginContext$4.run(LoginContext.java:682) > ~[?:1.8.0_262] > at > javax.security.auth.login.LoginContext$4.run(LoginContext.java:680) > ~[?:1.8.0_262] > at java.security.AccessController.doPrivileged(Native Method) > ~[?:1.8.0_262] > at > javax.security.auth.login.LoginContext.invokePriv(LoginContext.java:680) > ~[?:1.8.0_262] > at > javax.security.auth.login.LoginContext.login(LoginContext.java:587) > 
~[?:1.8.0_262] > at org.apache.storm.messaging.netty.Login.login(Login.java:301) > ~[storm-client-2.3.0.y.jar:2.3.0-SNAPSHOT] > at org.apache.storm.messaging.netty.Login.<init>(Login.java:83) > ~[storm-client-2.3.0.y.jar:2.3.0-SNAPSHOT] > at > org.apache.storm.messaging.netty.KerberosSaslNettyClient.<init>(KerberosSaslNettyClient.java:66) > [storm-client-2.3.0.y.jar:2.3.0-SNAPSHOT] > at > org.apache.storm.messaging.netty.KerberosSaslClientHandler.channelActive(KerberosSaslClientHandler.java:59) > [storm-client-2.3.0.y.jar:2.3.0-SNAPSHOT] > at > org.apache.storm.shade.io.netty.channel.AbstractChannelHandlerContext.invokeChannelActive(AbstractChannelHandlerContext.java:213) > [storm-shaded-deps-2.3.0.y.jar:2.3.0-SNAPSHOT] > at > org.apache.storm.shade.io.netty.channel.AbstractChannelHandlerContext.invokeChannelActive(AbstractChannelHandlerContext.java:199) > [storm-shaded-deps-2.3.0.y.jar:2.3.0-SNAPSHOT] > at > org.apache.storm.shade.io.netty.channel.AbstractChannelHandlerContext.fireChannelActive(AbstractChannelHandlerContext.java:192) > [storm-shaded-deps-2.3.0.y.jar:2.3.0-SNAPSHOT] > at > org.apache.storm.shade.io.netty.channel.ChannelInboundHandlerAdapter.channelActive(ChannelInboundHandlerAdapter.java:64) >
[jira] [Updated] (STORM-3728) Workers are not able to connect to Pacemaker if pacemaker.auth.method is KERBEROS
[ https://issues.apache.org/jira/browse/STORM-3728?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Li updated STORM-3728: Affects Version/s: 2.2.0 2.0.0 2.1.0 > Workers are not able to connect to Pacemaker if pacemaker.auth.method is > KERBEROS > - > > Key: STORM-3728 > URL: https://issues.apache.org/jira/browse/STORM-3728 > Project: Apache Storm > Issue Type: Bug >Affects Versions: 2.0.0, 2.1.0, 2.2.0 >Reporter: Ethan Li >Assignee: Ethan Li >Priority: Major > Time Spent: 1h > Remaining Estimate: 0h > > When pacemaker.auth.method is KERBEROS, workers will fail to connect to > Pacemaker because of exceptions like the following: > > {code:java} > 2020-12-21 20:07:00.786 o.a.s.c.PaceMakerStateStorage > executor-heartbeat-timer [ERROR] Timed out waiting for channel ready. Failed > to set_worker_hb. Will make 2 more attempts. > 2020-12-21 20:07:00.902 o.a.s.m.n.KerberosSaslClientHandler > openstorm3blue-n10.blue.ygrid.yahoo.com-pm-1 [INFO] Connection established > from /10.215.73.209:45548 to > openstorm3blue-n10.blue.ygrid.yahoo.com/10.215.79.152:6699 > 2020-12-21 20:07:00.903 o.a.s.m.n.KerberosSaslNettyClient > openstorm3blue-n10.blue.ygrid.yahoo.com-pm-1 [INFO] Creating Kerberos Client.
> 2020-12-21 20:07:00.906 o.a.s.m.n.KerberosSaslNettyClient > openstorm3blue-n10.blue.ygrid.yahoo.com-pm-1 [INFO] Kerberos Client Callback > Handler got callback: class javax.security.auth.callback.PasswordCallback > 2020-12-21 20:07:00.906 o.a.s.m.n.Login > openstorm3blue-n10.blue.ygrid.yahoo.com-pm-1 [ERROR] Login using jaas conf > /home/y/lib/storm/current/conf/storm_jaas.conf failed > 2020-12-21 20:07:00.906 o.a.s.m.n.KerberosSaslNettyClient > openstorm3blue-n10.blue.ygrid.yahoo.com-pm-1 [ERROR] Client failed to login > in principal:javax.security.auth.login.LoginException: No password provided > javax.security.auth.login.LoginException: No password provided > at > com.sun.security.auth.module.Krb5LoginModule.promptForPass(Krb5LoginModule.java:923) > ~[?:1.8.0_262] > at > com.sun.security.auth.module.Krb5LoginModule.attemptAuthentication(Krb5LoginModule.java:764) > ~[?:1.8.0_262] > at > com.sun.security.auth.module.Krb5LoginModule.login(Krb5LoginModule.java:618) > ~[?:1.8.0_262] > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > ~[?:1.8.0_262] > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > ~[?:1.8.0_262] > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > ~[?:1.8.0_262] > at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_262] > at > javax.security.auth.login.LoginContext.invoke(LoginContext.java:755) > ~[?:1.8.0_262] > at > javax.security.auth.login.LoginContext.access$000(LoginContext.java:195) > ~[?:1.8.0_262] > at > javax.security.auth.login.LoginContext$4.run(LoginContext.java:682) > ~[?:1.8.0_262] > at > javax.security.auth.login.LoginContext$4.run(LoginContext.java:680) > ~[?:1.8.0_262] > at java.security.AccessController.doPrivileged(Native Method) > ~[?:1.8.0_262] > at > javax.security.auth.login.LoginContext.invokePriv(LoginContext.java:680) > ~[?:1.8.0_262] > at > javax.security.auth.login.LoginContext.login(LoginContext.java:587) > 
~[?:1.8.0_262] > at org.apache.storm.messaging.netty.Login.login(Login.java:301) > ~[storm-client-2.3.0.y.jar:2.3.0-SNAPSHOT] > at org.apache.storm.messaging.netty.Login.<init>(Login.java:83) > ~[storm-client-2.3.0.y.jar:2.3.0-SNAPSHOT] > at > org.apache.storm.messaging.netty.KerberosSaslNettyClient.<init>(KerberosSaslNettyClient.java:66) > [storm-client-2.3.0.y.jar:2.3.0-SNAPSHOT] > at > org.apache.storm.messaging.netty.KerberosSaslClientHandler.channelActive(KerberosSaslClientHandler.java:59) > [storm-client-2.3.0.y.jar:2.3.0-SNAPSHOT] > at > org.apache.storm.shade.io.netty.channel.AbstractChannelHandlerContext.invokeChannelActive(AbstractChannelHandlerContext.java:213) > [storm-shaded-deps-2.3.0.y.jar:2.3.0-SNAPSHOT] > at > org.apache.storm.shade.io.netty.channel.AbstractChannelHandlerContext.invokeChannelActive(AbstractChannelHandlerContext.java:199) > [storm-shaded-deps-2.3.0.y.jar:2.3.0-SNAPSHOT] > at > org.apache.storm.shade.io.netty.channel.AbstractChannelHandlerContext.fireChannelActive(AbstractChannelHandlerContext.java:192) > [storm-shaded-deps-2.3.0.y.jar:2.3.0-SNAPSHOT] > at > org.apache.storm.shade.io.netty.channel.ChannelInboundHandlerAdapter.channelActive(ChannelInboundHandlerAdapter.java:64) > [storm-shaded-deps-2.3.0.y.jar:2.3.0-SNAPSHOT] >
[jira] [Commented] (STORM-3725) DRPC spout will crash when any one of the DRPC servers is down
[ https://issues.apache.org/jira/browse/STORM-3725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17253567#comment-17253567 ] Ethan Li commented on STORM-3725: - Thanks [~li530]. I merged this to master (d0027d97323375120f8c17540f698a3f66489aa7) > DRPC spout will crash when any one of the DRPC servers is down > - > > Key: STORM-3725 > URL: https://issues.apache.org/jira/browse/STORM-3725 > Project: Apache Storm > Issue Type: Bug >Affects Versions: 2.0.0, 2.1.0, 2.2.0 >Reporter: Rui Li >Assignee: Rui Li >Priority: Critical > Fix For: 2.3.0 > > Time Spent: 1h > Remaining Estimate: 0h > > The root cause is that the DRPC spout does not handle DRPC connections truly > asynchronously. The spout worker will not work unless all DRPC servers are up and > running, which leads to a SPOF.
[jira] [Resolved] (STORM-3725) DRPC spout will crash when any one of the DRPC servers is down
[ https://issues.apache.org/jira/browse/STORM-3725?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Li resolved STORM-3725. - Resolution: Fixed > DRPC spout will crash when any one of the DRPC servers is down > - > > Key: STORM-3725 > URL: https://issues.apache.org/jira/browse/STORM-3725 > Project: Apache Storm > Issue Type: Bug >Affects Versions: 2.0.0, 2.1.0, 2.2.0 >Reporter: Rui Li >Assignee: Rui Li >Priority: Critical > Fix For: 2.3.0 > > Time Spent: 1h > Remaining Estimate: 0h > > The root cause is that the DRPC spout does not handle DRPC connections truly > asynchronously. The spout worker will not work unless all DRPC servers are up and > running, which leads to a SPOF.
[jira] [Updated] (STORM-3725) DRPC spout will crash when any one of the DRPC servers is down
[ https://issues.apache.org/jira/browse/STORM-3725?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Li updated STORM-3725: Fix Version/s: 2.3.0 > DRPC spout will crash when any one of the DRPC servers is down > - > > Key: STORM-3725 > URL: https://issues.apache.org/jira/browse/STORM-3725 > Project: Apache Storm > Issue Type: Bug >Affects Versions: 2.0.0, 2.1.0, 2.2.0 >Reporter: Rui Li >Assignee: Rui Li >Priority: Critical > Fix For: 2.3.0 > > Time Spent: 1h > Remaining Estimate: 0h > > The root cause is that the DRPC spout does not handle DRPC connections truly > asynchronously. The spout worker will not work unless all DRPC servers are up and > running, which leads to a SPOF.
[jira] [Updated] (STORM-3725) DRPC spout will crash when any one of DRPC server is down
[ https://issues.apache.org/jira/browse/STORM-3725?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Li updated STORM-3725: Affects Version/s: 2.2.0 2.0.0 2.1.0 > DRPC spout will crash when any one of DRPC server is down > - > > Key: STORM-3725 > URL: https://issues.apache.org/jira/browse/STORM-3725 > Project: Apache Storm > Issue Type: Bug >Affects Versions: 2.0.0, 2.1.0, 2.2.0 >Reporter: Rui Li >Assignee: Rui Li >Priority: Critical > Time Spent: 1h > Remaining Estimate: 0h > > The root cause is that the DRPC Spout does not handle DRPC connections really > asynchronously. The spout worker will not work unless all DRPC servers are up and > running, which leads to a SPOF. -- This message was sent by Atlassian Jira (v8.3.4#803005)
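The SPOF described in STORM-3725 comes from treating the DRPC server pool as all-or-nothing. A hedged sketch of the general idea — tracking each endpoint's health independently and skipping dead ones — follows; the class and field names are invented for illustration and are not Storm's actual patch:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

// Hypothetical sketch (not Storm's real fix): each DRPC endpoint carries its
// own connection state, updated by a background reconnect task, so one down
// server never blocks the spout from polling the others.
class DrpcEndpoint {
    final String host;
    volatile boolean connected;  // flipped by an async reconnect loop

    DrpcEndpoint(String host, boolean connected) {
        this.host = host;
        this.connected = connected;
    }
}

class RoundRobinFetcher {
    private final List<DrpcEndpoint> endpoints;
    private int next = 0;

    RoundRobinFetcher(List<DrpcEndpoint> endpoints) {
        this.endpoints = endpoints;
    }

    /** Return the next live endpoint, or empty if every server is down. */
    Optional<DrpcEndpoint> nextLive() {
        for (int i = 0; i < endpoints.size(); i++) {
            DrpcEndpoint e = endpoints.get(next);
            next = (next + 1) % endpoints.size();
            if (e.connected) {
                return Optional.of(e);
            }
        }
        return Optional.empty();
    }
}

public class DrpcSketch {
    public static void main(String[] args) {
        List<DrpcEndpoint> eps = new ArrayList<>();
        eps.add(new DrpcEndpoint("drpc-a", false)); // down: must not block others
        eps.add(new DrpcEndpoint("drpc-b", true));
        RoundRobinFetcher fetcher = new RoundRobinFetcher(eps);
        System.out.println(fetcher.nextLive().get().host); // drpc-b
    }
}
```

The key property, per the issue description, is that the spout keeps working as long as at least one server is reachable.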
[jira] [Updated] (STORM-3728) Workers are not able to connect to Pacemaker if pacemaker.auth.method is KERBEROS
[ https://issues.apache.org/jira/browse/STORM-3728?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Li updated STORM-3728: Description: When pacemaker.auth.method is KERBEROS, worker will fail to connect to KERBEROS because of exceptions like the following: {code:java} 2020-12-21 20:07:00.786 o.a.s.c.PaceMakerStateStorage executor-heartbeat-timer [ERROR] Timed out waiting for channel ready. Failed to set_worker_hb. Will make 2 more attempts. 2020-12-21 20:07:00.902 o.a.s.m.n.KerberosSaslClientHandler openstorm3blue-n10.blue.ygrid.yahoo.com-pm-1 [INFO] Connection established from /10.215.73.209:45548 to openstorm3blue-n10.blue.ygrid.yahoo.com/10.215.79.152:6699 2020-12-21 20:07:00.903 o.a.s.m.n.KerberosSaslNettyClient openstorm3blue-n10.blue.ygrid.yahoo.com-pm-1 [INFO] Creating Kerberos Client. 2020-12-21 20:07:00.906 o.a.s.m.n.KerberosSaslNettyClient openstorm3blue-n10.blue.ygrid.yahoo.com-pm-1 [INFO] Kerberos Client Callback Handler got callback: class javax.security.auth.callback.PasswordCallback 2020-12-21 20:07:00.906 o.a.s.m.n.Login openstorm3blue-n10.blue.ygrid.yahoo.com-pm-1 [ERROR] Login using jaas conf /home/y/lib/storm/current/conf/storm_jaas.conf failed 2020-12-21 20:07:00.906 o.a.s.m.n.KerberosSaslNettyClient openstorm3blue-n10.blue.ygrid.yahoo.com-pm-1 [ERROR] Client failed to login in principal:javax.security.auth.login.LoginException: No password provided javax.security.auth.login.LoginException: No password provided at com.sun.security.auth.module.Krb5LoginModule.promptForPass(Krb5LoginModule.java:923) ~[?:1.8.0_262] at com.sun.security.auth.module.Krb5LoginModule.attemptAuthentication(Krb5LoginModule.java:764) ~[?:1.8.0_262] at com.sun.security.auth.module.Krb5LoginModule.login(Krb5LoginModule.java:618) ~[?:1.8.0_262] at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_262] at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:1.8.0_262] at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_262] at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_262] at javax.security.auth.login.LoginContext.invoke(LoginContext.java:755) ~[?:1.8.0_262] at javax.security.auth.login.LoginContext.access$000(LoginContext.java:195) ~[?:1.8.0_262] at javax.security.auth.login.LoginContext$4.run(LoginContext.java:682) ~[?:1.8.0_262] at javax.security.auth.login.LoginContext$4.run(LoginContext.java:680) ~[?:1.8.0_262] at java.security.AccessController.doPrivileged(Native Method) ~[?:1.8.0_262] at javax.security.auth.login.LoginContext.invokePriv(LoginContext.java:680) ~[?:1.8.0_262] at javax.security.auth.login.LoginContext.login(LoginContext.java:587) ~[?:1.8.0_262] at org.apache.storm.messaging.netty.Login.login(Login.java:301) ~[storm-client-2.3.0.y.jar:2.3.0-SNAPSHOT] at org.apache.storm.messaging.netty.Login.(Login.java:83) ~[storm-client-2.3.0.y.jar:2.3.0-SNAPSHOT] at org.apache.storm.messaging.netty.KerberosSaslNettyClient.(KerberosSaslNettyClient.java:66) [storm-client-2.3.0.y.jar:2.3.0-SNAPSHOT] at org.apache.storm.messaging.netty.KerberosSaslClientHandler.channelActive(KerberosSaslClientHandler.java:59) [storm-client-2.3.0.y.jar:2.3.0-SNAPSHOT] at org.apache.storm.shade.io.netty.channel.AbstractChannelHandlerContext.invokeChannelActive(AbstractChannelHandlerContext.java:213) [storm-shaded-deps-2.3.0.y.jar:2.3.0-SNAPSHOT] at org.apache.storm.shade.io.netty.channel.AbstractChannelHandlerContext.invokeChannelActive(AbstractChannelHandlerContext.java:199) [storm-shaded-deps-2.3.0.y.jar:2.3.0-SNAPSHOT] at org.apache.storm.shade.io.netty.channel.AbstractChannelHandlerContext.fireChannelActive(AbstractChannelHandlerContext.java:192) [storm-shaded-deps-2.3.0.y.jar:2.3.0-SNAPSHOT] at org.apache.storm.shade.io.netty.channel.ChannelInboundHandlerAdapter.channelActive(ChannelInboundHandlerAdapter.java:64) [storm-shaded-deps-2.3.0.y.jar:2.3.0-SNAPSHOT] at 
org.apache.storm.shade.io.netty.channel.AbstractChannelHandlerContext.invokeChannelActive(AbstractChannelHandlerContext.java:213) [storm-shaded-deps-2.3.0.y.jar:2.3.0-SNAPSHOT] at org.apache.storm.shade.io.netty.channel.AbstractChannelHandlerContext.invokeChannelActive(AbstractChannelHandlerContext.java:199) [storm-shaded-deps-2.3.0.y.jar:2.3.0-SNAPSHOT] at org.apache.storm.shade.io.netty.channel.AbstractChannelHandlerContext.fireChannelActive(AbstractChannelHandlerContext.java:192) [storm-shaded-deps-2.3.0.y.jar:2.3.0-SNAPSHOT] at org.apache.storm.shade.io.netty.channel.DefaultChannelPipeline$HeadContext.channelActive(DefaultChannelPipeline.java:1422) [storm-shaded-deps-2.3.0.y.jar:2.3.0-SNAPSHOT] at
[jira] [Assigned] (STORM-3728) Workers are not able to connect to Pacemaker if pacemaker.auth.method is KERBEROS
[ https://issues.apache.org/jira/browse/STORM-3728?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Li reassigned STORM-3728: --- Assignee: Ethan Li > Workers are not able to connect to Pacemaker if pacemaker.auth.method is > KERBEROS > - > > Key: STORM-3728 > URL: https://issues.apache.org/jira/browse/STORM-3728 > Project: Apache Storm > Issue Type: Bug >Reporter: Ethan Li >Assignee: Ethan Li >Priority: Major > > When pacemaker.auth.method is KERBEROS, worker will fail to connect to > KERBEROS because of exceptions like the following: > > {code:java} > 2020-12-21 20:07:00.786 o.a.s.c.PaceMakerStateStorage > executor-heartbeat-timer [ERROR] Timed out waiting for channel ready. Failed > to set_worker_hb. Will make 2 more attempts. > 2020-12-21 20:07:00.902 o.a.s.m.n.KerberosSaslClientHandler > openstorm3blue-n10.blue.ygrid.yahoo.com-pm-1 [INFO] Connection established > from /10.215.73.209:45548 to > openstorm3blue-n10.blue.ygrid.yahoo.com/10.215.79.152:6699 > 2020-12-21 20:07:00.903 o.a.s.m.n.KerberosSaslNettyClient > openstorm3blue-n10.blue.ygrid.yahoo.com-pm-1 [INFO] Creating Kerberos Client. 
> 2020-12-21 20:07:00.906 o.a.s.m.n.KerberosSaslNettyClient > openstorm3blue-n10.blue.ygrid.yahoo.com-pm-1 [INFO] Kerberos Client Callback > Handler got callback: class javax.security.auth.callback.PasswordCallback > 2020-12-21 20:07:00.906 o.a.s.m.n.Login > openstorm3blue-n10.blue.ygrid.yahoo.com-pm-1 [ERROR] Login using jaas conf > /home/y/lib/storm/current/conf/storm_jaas.conf failed > 2020-12-21 20:07:00.906 o.a.s.m.n.KerberosSaslNettyClient > openstorm3blue-n10.blue.ygrid.yahoo.com-pm-1 [ERROR] Client failed to login > in principal:javax.security.auth.login.LoginException: No password provided > javax.security.auth.login.LoginException: No password provided > at > com.sun.security.auth.module.Krb5LoginModule.promptForPass(Krb5LoginModule.java:923) > ~[?:1.8.0_262] > at > com.sun.security.auth.module.Krb5LoginModule.attemptAuthentication(Krb5LoginModule.java:764) > ~[?:1.8.0_262] > at > com.sun.security.auth.module.Krb5LoginModule.login(Krb5LoginModule.java:618) > ~[?:1.8.0_262] > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > ~[?:1.8.0_262] > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > ~[?:1.8.0_262] > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > ~[?:1.8.0_262] > at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_262] > at > javax.security.auth.login.LoginContext.invoke(LoginContext.java:755) > ~[?:1.8.0_262] > at > javax.security.auth.login.LoginContext.access$000(LoginContext.java:195) > ~[?:1.8.0_262] > at > javax.security.auth.login.LoginContext$4.run(LoginContext.java:682) > ~[?:1.8.0_262] > at > javax.security.auth.login.LoginContext$4.run(LoginContext.java:680) > ~[?:1.8.0_262] > at java.security.AccessController.doPrivileged(Native Method) > ~[?:1.8.0_262] > at > javax.security.auth.login.LoginContext.invokePriv(LoginContext.java:680) > ~[?:1.8.0_262] > at > javax.security.auth.login.LoginContext.login(LoginContext.java:587) > 
~[?:1.8.0_262] > at org.apache.storm.messaging.netty.Login.login(Login.java:301) > ~[storm-client-2.3.0.y.jar:2.3.0-SNAPSHOT] > at org.apache.storm.messaging.netty.Login.(Login.java:83) > ~[storm-client-2.3.0.y.jar:2.3.0-SNAPSHOT] > at > org.apache.storm.messaging.netty.KerberosSaslNettyClient.(KerberosSaslNettyClient.java:66) > [storm-client-2.3.0.y.jar:2.3.0-SNAPSHOT] > at > org.apache.storm.messaging.netty.KerberosSaslClientHandler.channelActive(KerberosSaslClientHandler.java:59) > [storm-client-2.3.0.y.jar:2.3.0-SNAPSHOT] > at > org.apache.storm.shade.io.netty.channel.AbstractChannelHandlerContext.invokeChannelActive(AbstractChannelHandlerContext.java:213) > [storm-shaded-deps-2.3.0.y.jar:2.3.0-SNAPSHOT] > at > org.apache.storm.shade.io.netty.channel.AbstractChannelHandlerContext.invokeChannelActive(AbstractChannelHandlerContext.java:199) > [storm-shaded-deps-2.3.0.y.jar:2.3.0-SNAPSHOT] > at > org.apache.storm.shade.io.netty.channel.AbstractChannelHandlerContext.fireChannelActive(AbstractChannelHandlerContext.java:192) > [storm-shaded-deps-2.3.0.y.jar:2.3.0-SNAPSHOT] > at > org.apache.storm.shade.io.netty.channel.ChannelInboundHandlerAdapter.channelActive(ChannelInboundHandlerAdapter.java:64) > [storm-shaded-deps-2.3.0.y.jar:2.3.0-SNAPSHOT] > at > org.apache.storm.shade.io.netty.channel.AbstractChannelHandlerContext.invokeChannelActive(AbstractChannelHandlerContext.java:213) >
[jira] [Created] (STORM-3728) Workers are not able to connect to Pacemaker if pacemaker.auth.method is KERBEROS
Ethan Li created STORM-3728: --- Summary: Workers are not able to connect to Pacemaker if pacemaker.auth.method is KERBEROS Key: STORM-3728 URL: https://issues.apache.org/jira/browse/STORM-3728 Project: Apache Storm Issue Type: Bug Reporter: Ethan Li When pacemaker.auth.method is KERBEROS, worker will fail to connect to KERBEROS because of exceptions like the following: {code:java} 2020-12-21 20:07:00.786 o.a.s.c.PaceMakerStateStorage executor-heartbeat-timer [ERROR] Timed out waiting for channel ready. Failed to set_worker_hb. Will make 2 more attempts. 2020-12-21 20:07:00.902 o.a.s.m.n.KerberosSaslClientHandler openstorm3blue-n10.blue.ygrid.yahoo.com-pm-1 [INFO] Connection established from /10.215.73.209:45548 to openstorm3blue-n10.blue.ygrid.yahoo.com/10.215.79.152:6699 2020-12-21 20:07:00.903 o.a.s.m.n.KerberosSaslNettyClient openstorm3blue-n10.blue.ygrid.yahoo.com-pm-1 [INFO] Creating Kerberos Client. 2020-12-21 20:07:00.906 o.a.s.m.n.KerberosSaslNettyClient openstorm3blue-n10.blue.ygrid.yahoo.com-pm-1 [INFO] Kerberos Client Callback Handler got callback: class javax.security.auth.callback.PasswordCallback 2020-12-21 20:07:00.906 o.a.s.m.n.Login openstorm3blue-n10.blue.ygrid.yahoo.com-pm-1 [ERROR] Login using jaas conf /home/y/lib/storm/current/conf/storm_jaas.conf failed 2020-12-21 20:07:00.906 o.a.s.m.n.KerberosSaslNettyClient openstorm3blue-n10.blue.ygrid.yahoo.com-pm-1 [ERROR] Client failed to login in principal:javax.security.auth.login.LoginException: No password provided javax.security.auth.login.LoginException: No password provided at com.sun.security.auth.module.Krb5LoginModule.promptForPass(Krb5LoginModule.java:923) ~[?:1.8.0_262] at com.sun.security.auth.module.Krb5LoginModule.attemptAuthentication(Krb5LoginModule.java:764) ~[?:1.8.0_262] at com.sun.security.auth.module.Krb5LoginModule.login(Krb5LoginModule.java:618) ~[?:1.8.0_262] at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_262] at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:1.8.0_262] at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_262] at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_262] at javax.security.auth.login.LoginContext.invoke(LoginContext.java:755) ~[?:1.8.0_262] at javax.security.auth.login.LoginContext.access$000(LoginContext.java:195) ~[?:1.8.0_262] at javax.security.auth.login.LoginContext$4.run(LoginContext.java:682) ~[?:1.8.0_262] at javax.security.auth.login.LoginContext$4.run(LoginContext.java:680) ~[?:1.8.0_262] at java.security.AccessController.doPrivileged(Native Method) ~[?:1.8.0_262] at javax.security.auth.login.LoginContext.invokePriv(LoginContext.java:680) ~[?:1.8.0_262] at javax.security.auth.login.LoginContext.login(LoginContext.java:587) ~[?:1.8.0_262] at org.apache.storm.messaging.netty.Login.login(Login.java:301) ~[storm-client-2.3.0.y.jar:2.3.0-SNAPSHOT] at org.apache.storm.messaging.netty.Login.(Login.java:83) ~[storm-client-2.3.0.y.jar:2.3.0-SNAPSHOT] at org.apache.storm.messaging.netty.KerberosSaslNettyClient.(KerberosSaslNettyClient.java:66) [storm-client-2.3.0.y.jar:2.3.0-SNAPSHOT] at org.apache.storm.messaging.netty.KerberosSaslClientHandler.channelActive(KerberosSaslClientHandler.java:59) [storm-client-2.3.0.y.jar:2.3.0-SNAPSHOT] at org.apache.storm.shade.io.netty.channel.AbstractChannelHandlerContext.invokeChannelActive(AbstractChannelHandlerContext.java:213) [storm-shaded-deps-2.3.0.y.jar:2.3.0-SNAPSHOT] at org.apache.storm.shade.io.netty.channel.AbstractChannelHandlerContext.invokeChannelActive(AbstractChannelHandlerContext.java:199) [storm-shaded-deps-2.3.0.y.jar:2.3.0-SNAPSHOT] at org.apache.storm.shade.io.netty.channel.AbstractChannelHandlerContext.fireChannelActive(AbstractChannelHandlerContext.java:192) [storm-shaded-deps-2.3.0.y.jar:2.3.0-SNAPSHOT] at 
org.apache.storm.shade.io.netty.channel.ChannelInboundHandlerAdapter.channelActive(ChannelInboundHandlerAdapter.java:64) [storm-shaded-deps-2.3.0.y.jar:2.3.0-SNAPSHOT] at org.apache.storm.shade.io.netty.channel.AbstractChannelHandlerContext.invokeChannelActive(AbstractChannelHandlerContext.java:213) [storm-shaded-deps-2.3.0.y.jar:2.3.0-SNAPSHOT] at org.apache.storm.shade.io.netty.channel.AbstractChannelHandlerContext.invokeChannelActive(AbstractChannelHandlerContext.java:199) [storm-shaded-deps-2.3.0.y.jar:2.3.0-SNAPSHOT] at org.apache.storm.shade.io.netty.channel.AbstractChannelHandlerContext.fireChannelActive(AbstractChannelHandlerContext.java:192) [storm-shaded-deps-2.3.0.y.jar:2.3.0-SNAPSHOT] at
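The "No password provided" LoginException in the STORM-3728 traces above is Krb5LoginModule falling back to interactive password prompting because the JAAS section the worker resolves has no keytab configured. For illustration only, a keytab-based section in storm_jaas.conf might look like the following — the section name, principal, and paths are placeholders, and this is not the actual fix committed for the ticket:

```
StormClient {
    com.sun.security.auth.module.Krb5LoginModule required
    useKeyTab=true
    keyTab="/path/to/worker.keytab"
    storeKey=true
    useTicketCache=false
    principal="worker@EXAMPLE.COM";
};
```

With useKeyTab=true and a valid keytab, Krb5LoginModule never reaches the PasswordCallback path shown in the log.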
[jira] [Resolved] (STORM-3723) ServerUtils.isAnyPosixProcessPidDirAlive might return wrong result
[ https://issues.apache.org/jira/browse/STORM-3723?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Li resolved STORM-3723. - Fix Version/s: 2.3.0 Resolution: Fixed Thanks [~bipinprasad]. Merged this into master (9a9596685348e310078fb6b0970a629703465454) > ServerUtils.isAnyPosixProcessPidDirAlive might return wrong result > -- > > Key: STORM-3723 > URL: https://issues.apache.org/jira/browse/STORM-3723 > Project: Apache Storm > Issue Type: Bug > Components: storm-server >Reporter: Bipin Prasad >Assignee: Bipin Prasad >Priority: Minor > Fix For: 2.3.0 > > Time Spent: 1h 10m > Remaining Estimate: 0h > > There is a bug in ServerUtils.isAnyPosixProcessPidDirAlive() and it returns > an incorrect value when all of the process IDs have been reassigned/reused. > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (STORM-3401) There is no ARM CI for Storm
[ https://issues.apache.org/jira/browse/STORM-3401?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Li resolved STORM-3401. - Resolution: Fixed Closing this as all subtasks are done > There is no ARM CI for Storm > > > Key: STORM-3401 > URL: https://issues.apache.org/jira/browse/STORM-3401 > Project: Apache Storm > Issue Type: Improvement > Components: build >Reporter: Yikun Jiang >Priority: Major > > Now the CI of Storm (on GitHub) is handled by Travis CI. While the tests > run under the x86 arch, the ARM arch is missing. This leads to a problem: > we have no way to test whether each pull request will break the Storm > deployment on ARM. > We should add a CI system that supports the ARM arch. Using it, Storm can > officially support ARM releases in the future. Here I'd like to introduce > OpenLab to the community. [OpenLab|https://openlabtesting.org/] is an open-source > CI system that can test any open-source software on either the x86 or ARM > arch; it's mainly used by GitHub projects. Some > [projects|https://github.com/theopenlab/openlab-zuul-jobs/blob/master/zuul.d/jobs.yaml] > have already integrated it, such as containerd (a graduated CNCF project whose > ARM build is triggered on every PR, > [https://github.com/containerd/containerd/pulls]), Terraform, and so on. > OpenLab uses the open-source CI software [Zuul > |https://github.com/openstack-infra/zuul] as its CI system. Zuul is used by > the OpenStack community as well. Integrating with OpenLab is quite easy using its > GitHub app. All config info is open source as well. > If the Apache Storm community is interested, I can help with the > integration. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (STORM-3711) Enable all the modules in ARM CI
[ https://issues.apache.org/jira/browse/STORM-3711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17250756#comment-17250756 ] Ethan Li commented on STORM-3711: - Thanks [~seanlau]. Merged your fix into master (6b3922f8a2ee41d24538213626555d460a3d5415) > Enable all the modules in ARM CI > > > Key: STORM-3711 > URL: https://issues.apache.org/jira/browse/STORM-3711 > Project: Apache Storm > Issue Type: Sub-task >Reporter: liusheng >Assignee: liusheng >Priority: Major > Time Spent: 3h 10m > Remaining Estimate: 0h > > In https://issues.apache.org/jira/browse/STORM-3681 we enabled the > ARM64 CI with only the "Client" module configured, because of some issues > with the performance of Travis CI's ARM > resources. Travis CI now has new ARM resources, > "arm64-graviton2", provided by AWS; please see [1]. Now we can switch to > the "arm64-graviton2" resources and enable all the modules in the ARM CI. > [https://docs.travis-ci.com/user/multi-cpu-architectures] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (STORM-3711) Enable all the modules in ARM CI
[ https://issues.apache.org/jira/browse/STORM-3711?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Li resolved STORM-3711. - Fix Version/s: 2.3.0 Resolution: Fixed > Enable all the modules in ARM CI > > > Key: STORM-3711 > URL: https://issues.apache.org/jira/browse/STORM-3711 > Project: Apache Storm > Issue Type: Sub-task >Reporter: liusheng >Assignee: liusheng >Priority: Major > Fix For: 2.3.0 > > Time Spent: 3h 10m > Remaining Estimate: 0h > > In https://issues.apache.org/jira/browse/STORM-3681 we enabled the > ARM64 CI with only the "Client" module configured, because of some issues > with the performance of Travis CI's ARM > resources. Travis CI now has new ARM resources, > "arm64-graviton2", provided by AWS; please see [1]. Now we can switch to > the "arm64-graviton2" resources and enable all the modules in the ARM CI. > [https://docs.travis-ci.com/user/multi-cpu-architectures] -- This message was sent by Atlassian Jira (v8.3.4#803005)
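Per the Travis multi-CPU-architecture documentation linked in STORM-3711, requesting the AWS Graviton2 ARM64 resources is a .travis.yml setting. A hedged sketch of such a job follows — the build steps are illustrative, not Storm's actual CI configuration:

```yaml
# Illustrative .travis.yml fragment (not Storm's real CI config).
# Per the Travis docs, arm64-graviton2 runs in full VMs on the edge group.
arch: arm64-graviton2
virt: vm
group: edge
os: linux
language: java
jdk: openjdk8
script: mvn -B test
```

This is the switch the ticket describes: moving from the original `arch: arm64` LXD containers to the Graviton2 VMs so all modules can build within the resource limits.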
[jira] [Resolved] (STORM-3722) Update committer list
[ https://issues.apache.org/jira/browse/STORM-3722?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Li resolved STORM-3722. - Fix Version/s: 2.3.0 Resolution: Fixed Thanks [~bipinprasad]. I merged this to master (e4ce51736dd9dd3cbf86657aa8b2ca67b9bd40a7) > Update committer list > - > > Key: STORM-3722 > URL: https://issues.apache.org/jira/browse/STORM-3722 > Project: Apache Storm > Issue Type: Documentation >Reporter: Bipin Prasad >Assignee: Bipin Prasad >Priority: Minor > Fix For: 2.3.0 > > Time Spent: 20m > Remaining Estimate: 0h > > Update committer list -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (STORM-3708) ConstraintSolverConfig LOG messages should include topology id
[ https://issues.apache.org/jira/browse/STORM-3708?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Li updated STORM-3708: Priority: Minor (was: Major) > ConstraintSolverConfig LOG messages should include topology id > -- > > Key: STORM-3708 > URL: https://issues.apache.org/jira/browse/STORM-3708 > Project: Apache Storm > Issue Type: Improvement >Reporter: Bipin Prasad >Assignee: Bipin Prasad >Priority: Minor > Fix For: 2.3.0 > > Time Spent: 1h > Remaining Estimate: 0h > > LOG messages in ConstraintSolverConfig class should include topology id. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (STORM-3708) ConstraintSolverConfig LOG messages should include topology id
[ https://issues.apache.org/jira/browse/STORM-3708?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Li resolved STORM-3708. - Fix Version/s: 2.3.0 Resolution: Fixed Thanks [~bipinprasad]. I merged this to master (58c7ad671e97bc5f8de6f6d85eb8129489ca6939) > ConstraintSolverConfig LOG messages should include topology id > -- > > Key: STORM-3708 > URL: https://issues.apache.org/jira/browse/STORM-3708 > Project: Apache Storm > Issue Type: Improvement >Reporter: Bipin Prasad >Assignee: Bipin Prasad >Priority: Major > Fix For: 2.3.0 > > Time Spent: 1h > Remaining Estimate: 0h > > LOG messages in ConstraintSolverConfig class should include topology id. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (STORM-3721) Change child pom.xml reference to parent pom.xml
[ https://issues.apache.org/jira/browse/STORM-3721?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Li resolved STORM-3721. - Fix Version/s: 2.3.0 Resolution: Fixed Thanks [~bipinprasad]. I merged this to master (10a1e1ec434742ccb1d28d22411a915c75ed74d2) > Change child pom.xml reference to parent pom.xml > > > Key: STORM-3721 > URL: https://issues.apache.org/jira/browse/STORM-3721 > Project: Apache Storm > Issue Type: Improvement > Components: build >Reporter: Bipin Prasad >Assignee: Bipin Prasad >Priority: Minor > Fix For: 2.3.0 > > Time Spent: 20m > Remaining Estimate: 0h > > Several (but not all) module pom.xml files refer to the parent specifying > only the directory name. The correct syntax (as per Example 2 on > [https://maven.apache.org/guides/introduction/introduction-to-the-pom.html]) > is to include the relative path to the parent pom.xml: > {code:xml} > <parent> > <groupId>com.mycompany.app</groupId> > <artifactId>my-app</artifactId> > <version>1</version> > <relativePath>../parent/pom.xml</relativePath> > </parent> > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (STORM-3709) Reject topology submission if missing spout
[ https://issues.apache.org/jira/browse/STORM-3709?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Li resolved STORM-3709. - Fix Version/s: 2.3.0 Resolution: Fixed Thanks [~bipinprasad]. I merged this to master (51f5464fcdda3b416053ff67bba4a40cb257b3a5) > Reject topology submission if missing spout > --- > > Key: STORM-3709 > URL: https://issues.apache.org/jira/browse/STORM-3709 > Project: Apache Storm > Issue Type: Improvement > Components: storm-server >Reporter: Bipin Prasad >Assignee: Bipin Prasad >Priority: Major > Fix For: 2.3.0 > > Time Spent: 10m > Remaining Estimate: 0h > > Topologies without a spout cannot be scheduled. Such topologies should be > rejected at submission time. > > 2020-10-28 19:40:26.608 o.a.s.s.r.s.s.BaseResourceAwareStrategy > pool-21-thread-1 [ERROR] Topology test_topo_t04_01:Cannot find a Spout! -- This message was sent by Atlassian Jira (v8.3.4#803005)
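The submission-time check STORM-3709 asks for amounts to a simple validation over the topology's component sets. The sketch below is a hypothetical helper — the exception class and map shapes stand in for Storm's real Thrift types and Nimbus code:

```java
import java.util.Set;

// Hedged sketch of the STORM-3709 idea: reject a topology at submission time
// when it declares no spouts, instead of failing later inside the scheduler
// ("Cannot find a Spout!" in the log above).
public class TopologyValidation {
    static class InvalidTopologyException extends RuntimeException {
        InvalidTopologyException(String msg) { super(msg); }
    }

    /** Throw if the topology declares no spout components. */
    static void validateHasSpout(String name, Set<String> spoutIds) {
        if (spoutIds == null || spoutIds.isEmpty()) {
            throw new InvalidTopologyException(
                "Topology " + name + " has no spouts and can never emit tuples");
        }
    }

    public static void main(String[] args) {
        validateHasSpout("ok-topo", Set.of("word-spout")); // passes silently
        try {
            validateHasSpout("test_topo_t04_01", Set.of());
        } catch (InvalidTopologyException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Failing fast at submission gives the user an immediate error instead of a topology that sits unschedulable.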
[jira] [Resolved] (STORM-3706) Cluster.needsSchedulingRas always succeeds
[ https://issues.apache.org/jira/browse/STORM-3706?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Li resolved STORM-3706. - Fix Version/s: 2.3.0 Resolution: Fixed Thanks [~bipinprasad]. I merged this to master (19eb699061a1e2ffb7e05576985905d1da1ca1e5) > Cluster.needsSchedulingRas always succeeds > -- > > Key: STORM-3706 > URL: https://issues.apache.org/jira/browse/STORM-3706 > Project: Apache Storm > Issue Type: Improvement >Reporter: Bipin Prasad >Assignee: Bipin Prasad >Priority: Major > Fix For: 2.3.0 > > Time Spent: 50m > Remaining Estimate: 0h > > Prior to refactoring of Resource Aware scheduling, Cluster.needsSchedulingRas > always succeeded. Add tests to ensure this method returns correct value. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (STORM-3715) Add Caching to HDFS BlobStore
[ https://issues.apache.org/jira/browse/STORM-3715?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17232525#comment-17232525 ] Ethan Li commented on STORM-3715: - Thanks [~kishorvpatil]. I merged this to master (873bb8820605eb7e3be7c9c9033536edebf7ea11) > Add Caching to HDFS BlobStore > - > > Key: STORM-3715 > URL: https://issues.apache.org/jira/browse/STORM-3715 > Project: Apache Storm > Issue Type: Improvement > Components: blobstore >Reporter: Kishor Patil >Assignee: Kishor Patil >Priority: Major > Fix For: 2.3.0 > > Time Spent: 20m > Remaining Estimate: 0h > > Adding some cache might help lower the frequency of hdfs calls by the > HdfsBlobStore while reporting replicationCount and meta for permission checks. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (STORM-3715) Add Caching to HDFS BlobStore
[ https://issues.apache.org/jira/browse/STORM-3715?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Li resolved STORM-3715. - Fix Version/s: 2.3.0 Resolution: Fixed > Add Caching to HDFS BlobStore > - > > Key: STORM-3715 > URL: https://issues.apache.org/jira/browse/STORM-3715 > Project: Apache Storm > Issue Type: Improvement > Components: blobstore >Reporter: Kishor Patil >Assignee: Kishor Patil >Priority: Major > Fix For: 2.3.0 > > Time Spent: 20m > Remaining Estimate: 0h > > Adding some cache might help lower the frequency of hdfs calls by the > HdfsBlobStore while reporting replicationCount and meta for permission checks. -- This message was sent by Atlassian Jira (v8.3.4#803005)
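The caching idea in STORM-3715 — avoiding an HDFS round trip for every replication-count and metadata lookup — can be sketched with a minimal time-bounded cache. This is an illustration of the concept only; the class names are invented and Storm's actual HdfsBlobStore code is not shown here:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hedged sketch: memoize per-blob metadata for a short TTL so repeated
// permission checks do not each trigger an HDFS call.
class TtlCache<K, V> {
    private static final class Entry<V> {
        final V value;
        final long expiresAtMillis;
        Entry(V value, long expiresAtMillis) {
            this.value = value;
            this.expiresAtMillis = expiresAtMillis;
        }
    }

    private final Map<K, Entry<V>> cache = new ConcurrentHashMap<>();
    private final long ttlMillis;

    TtlCache(long ttlMillis) {
        this.ttlMillis = ttlMillis;
    }

    /** Return the cached value, or load and cache it if missing or expired. */
    V get(K key, Function<K, V> loader) {
        long now = System.currentTimeMillis();
        Entry<V> e = cache.get(key);
        if (e == null || now >= e.expiresAtMillis) {
            V v = loader.apply(key);  // the expensive remote call
            cache.put(key, new Entry<>(v, now + ttlMillis));
            return v;
        }
        return e.value;
    }
}

public class CacheSketch {
    public static void main(String[] args) {
        int[] remoteCalls = {0};
        TtlCache<String, Integer> replication = new TtlCache<>(60_000);
        for (int i = 0; i < 3; i++) {
            // Only the first lookup within the TTL invokes the loader.
            replication.get("blob-key", k -> { remoteCalls[0]++; return 3; });
        }
        System.out.println(remoteCalls[0]);
    }
}
```

The trade-off, as with any such cache, is that callers may observe metadata up to one TTL stale — acceptable for replication reporting, which the ticket names as the hot path.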
[jira] [Resolved] (STORM-3681) Enable basic Travis ARM CI job
[ https://issues.apache.org/jira/browse/STORM-3681?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Li resolved STORM-3681. - Fix Version/s: 2.3.0 Resolution: Fixed > Enable basic Travis ARM CI job > -- > > Key: STORM-3681 > URL: https://issues.apache.org/jira/browse/STORM-3681 > Project: Apache Storm > Issue Type: Sub-task >Reporter: liusheng >Assignee: liusheng >Priority: Major > Fix For: 2.3.0 > > Time Spent: 1h 20m > Remaining Estimate: 0h > > Until now, we have made many attempts at adding Travis ARM CI jobs for > Storm, and have made some small fixes. For Storm itself, we have tested > building and running the tests, and there doesn't appear to be any blocking issue on the ARM64 > platform. But in my testing, the Travis CI system seems to have some problems > supporting ARM; there are some existing issues in the Travis community about > this, for example: > # ARM CI job hangs for no reason: > [https://travis-ci.community/t/output-is-truncated-heavily-in-arm64-when-a-command-hangs/7630] > # Disk quota exceeded in ARM CI job: > [https://travis-ci.community/t/disk-quota-exceeded-on-ppc64le/8006/4] > But it is OK to enable a basic ARM CI, just like the *s390x* support does; see: > https://issues.apache.org/jira/browse/STORM-3401 > We have verified it and it runs OK, see: [https://github.com/liusheng/storm/pulls] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Issue Comment Deleted] (STORM-3681) Enable basic Travis ARM CI job
[ https://issues.apache.org/jira/browse/STORM-3681?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Li updated STORM-3681: Comment: was deleted (was: [~seanlau] found that arm64-graviton2 is not supported on travis-ci.org. Looking for solutions) > Enable basic Travis ARM CI job > -- > > Key: STORM-3681 > URL: https://issues.apache.org/jira/browse/STORM-3681 > Project: Apache Storm > Issue Type: Sub-task >Reporter: liusheng >Assignee: liusheng >Priority: Major > Fix For: 2.3.0 > > Time Spent: 1h 20m > Remaining Estimate: 0h > > Until now, we have made many attempts at adding Travis ARM CI jobs for > Storm, and have made some small fixes. For Storm itself, we have tested > building and running the tests, and there doesn't appear to be any blocking issue on the ARM64 > platform. But in my testing, the Travis CI system seems to have some problems > supporting ARM; there are some existing issues in the Travis community about > this, for example: > # ARM CI job hangs for no reason: > [https://travis-ci.community/t/output-is-truncated-heavily-in-arm64-when-a-command-hangs/7630] > # Disk quota exceeded in ARM CI job: > [https://travis-ci.community/t/disk-quota-exceeded-on-ppc64le/8006/4] > But it is OK to enable a basic ARM CI, just like the *s390x* support does; see: > https://issues.apache.org/jira/browse/STORM-3401 > We have verified it and it runs OK, see: [https://github.com/liusheng/storm/pulls] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (STORM-3401) There is no ARM CI for Storm
[ https://issues.apache.org/jira/browse/STORM-3401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17232435#comment-17232435 ] Ethan Li commented on STORM-3401: - reopened due to unresolved issues in STORM-3711 > There is no ARM CI for Storm > > > Key: STORM-3401 > URL: https://issues.apache.org/jira/browse/STORM-3401 > Project: Apache Storm > Issue Type: Improvement > Components: build >Reporter: Yikun Jiang >Priority: Major > > Now the CI of Storm (on GitHub) is handled by Travis CI. While the tests > run under the x86 arch, the ARM arch is missing. This leads to a problem: > we have no way to test whether each pull request will break the Storm > deployment on ARM. > We should add a CI system that supports the ARM arch. Using it, Storm can > officially support ARM releases in the future. Here I'd like to introduce > OpenLab to the community. [OpenLab|https://openlabtesting.org/] is an open-source > CI system that can test any open-source software on either the x86 or ARM > arch; it's mainly used by GitHub projects. Some > [projects|https://github.com/theopenlab/openlab-zuul-jobs/blob/master/zuul.d/jobs.yaml] > have already integrated it, such as containerd (a graduated CNCF project whose > ARM build is triggered on every PR, > [https://github.com/containerd/containerd/pulls]), Terraform, and so on. > OpenLab uses the open-source CI software [Zuul > |https://github.com/openstack-infra/zuul] as its CI system. Zuul is used by > the OpenStack community as well. Integrating with OpenLab is quite easy using its > GitHub app. All config info is open source as well. > If the Apache Storm community is interested, I can help with the > integration. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Reopened] (STORM-3711) Enable all the modules in ARM CI
[ https://issues.apache.org/jira/browse/STORM-3711?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ethan Li reopened STORM-3711: - [~seanlau] found that arm64-graviton2 is not supported on travis-ci.org. Looking for solutions > Enable all the modules in ARM CI > > > Key: STORM-3711 > URL: https://issues.apache.org/jira/browse/STORM-3711 > Project: Apache Storm > Issue Type: Sub-task >Reporter: liusheng >Assignee: liusheng >Priority: Major > Fix For: 2.3.0 > > Time Spent: 40m > Remaining Estimate: 0h > > In https://issues.apache.org/jira/browse/STORM-3681 we enabled the > ARM64 CI with only the "Client" module configured, because of some issues > with the performance of Travis CI's ARM > resources. Travis CI now has new ARM resources, > "arm64-graviton2", provided by AWS; please see [1]. Now we can switch to > the "arm64-graviton2" resources and enable all the modules in the ARM CI. > [https://docs.travis-ci.com/user/multi-cpu-architectures] -- This message was sent by Atlassian Jira (v8.3.4#803005)