[jira] [Comment Edited] (STORM-2983) Some topologies not working properly
[ https://issues.apache.org/jira/browse/STORM-2983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16424819#comment-16424819 ]

Jungtaek Lim edited comment on STORM-2983 at 4/4/18 12:50 AM:
--

If you're not concerned about retaining the optimization, then let's remove it. As explained already (a single-worker topology should be a minor case in production use), I don't think we need to put it back. The other concerns look like things to discuss further, since we don't seem to have consensus. That's why [~ethanli] would like to file new issues for them, and IMHO it may be better to raise them on the dev mailing list so we can share the concerns and discuss them.

Since you seem to think I'm missing your point, let me explain my thoughts below:

1. Even if we want to retain the optimization, I think I proposed "the right way" to do it in my comment. Checking the topology worker count is the "simplest" way to activate/deactivate the transfer thread, but the "right" way is to check whether an outgoing stream exists in that worker. Multiple workers can run independently (not communicating with each other) depending on how the scheduler plans, can't they? We have never used the topology worker count inside a specific worker, and I think a worker still has enough information to implement this optimization without knowing the topology worker count.

2. Regarding modifying the topology configuration: I think I already put in my two cents in my last comment. Whenever I need to check which value I set in the configuration, I check it in the UI. If the value differs from what the user originally set, we have to explain why, to avoid confusion. That's not ideal. I think the topology configuration should be immutable, even though the code doesn't currently guarantee it. Maybe we need a context object, something like a StormContext?

3. We already have StormTopology as well as TopologyContext, which provide the necessary information. Dynamic information like the assignment is handled by the worker. The number of workers in a topology is not exposed directly, but it can be calculated from the Assignment. (Please refer to [https://github.com/apache/storm/blob/master/storm-client/src/jvm/org/apache/storm/daemon/worker/WorkerState.java#L357] and [https://github.com/apache/storm/blob/master/storm-client/src/jvm/org/apache/storm/generated/Assignment.java].) Based on point 2, I would calculate it where it is really needed.

4. We construct the configuration from a combination of the command line, code, conf file, etc. We assume it does not change at runtime, which makes it safe to read the configuration multiple times in different places. That assumption can be broken if we modify values at runtime.

5. I admit this is too idealistic for the current state of Storm, but the worker count of a topology could increase or decrease at runtime if we address elasticity. If we handle that by relaunching all the workers, then the value can be considered static; if not, the value will be dynamic, which the topology configuration cannot represent.
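Point 1 above — gating the transfer thread on whether any task in the worker targets a task outside it, rather than on the topology worker count — can be sketched with plain collections. This is a minimal illustration only; the class and method names here are hypothetical, not Storm APIs:

```java
import java.util.*;

public class TransferThreadCheck {
    // Hypothetical sketch: a worker needs its transfer (outbound) thread only
    // if some task it hosts sends to a task hosted on another worker.
    static boolean needsTransferThread(Set<Integer> localTasks,
                                       Map<Integer, Set<Integer>> taskToTargetTasks) {
        for (int task : localTasks) {
            for (int target : taskToTargetTasks.getOrDefault(task, Collections.emptySet())) {
                if (!localTasks.contains(target)) {
                    return true;  // a tuple would have to leave this worker
                }
            }
        }
        return false;  // all downstream tasks are local; worker count is irrelevant
    }
}
```

Note that this check stays correct even when the topology has many workers but the scheduler happens to place a self-contained subgraph on one of them — which is exactly the case a raw worker-count check gets wrong.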
[jira] [Commented] (STORM-2983) Some topologies not working properly
[ https://issues.apache.org/jira/browse/STORM-2983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16424819#comment-16424819 ]

Jungtaek Lim commented on STORM-2983:
--

If you're not concerned about retaining the optimization, then let's remove it. As explained already (a single-worker topology should be a minor case in production use), I don't think we need to put it back. The other concerns look like things to discuss further, since we don't seem to have consensus. That's why [~ethanli] would like to file new issues for them, and IMHO it may be better to raise them on the dev mailing list so we can share the concerns and discuss them.

Since you seem to think I'm missing your point, let me explain my thoughts below:

1. Even if we want to retain the optimization, I think I proposed "the right way" to do it in my comment. Checking the topology worker count is the "simplest" way to activate/deactivate the transfer thread, but the "right" way is to check whether an outgoing stream exists in that worker. Multiple workers can run independently (not communicating with each other) depending on how the scheduler plans, can't they? We have never used the topology worker count inside a specific worker, and I think a worker still has enough information to implement this optimization without knowing the topology worker count.

2. Regarding modifying the topology configuration: I think I already put in my two cents in my last comment. Whenever I need to check which value I set in the configuration, I check it in the UI. If the value differs from what the user originally set, we have to explain why, to avoid confusion. That's not ideal. I think the topology configuration should be immutable, even though the code doesn't currently guarantee it. Maybe we need a context object, something like a StormContext?

3. We already have StormTopology as well as TopologyContext, which provide the necessary information. Dynamic information like the assignment is handled by the worker. The number of workers in a topology is not exposed directly, but it can be calculated from the Assignment. (Please refer to [https://github.com/apache/storm/blob/master/storm-client/src/jvm/org/apache/storm/daemon/worker/WorkerState.java#L357] and [https://github.com/apache/storm/blob/master/storm-client/src/jvm/org/apache/storm/generated/Assignment.java].) Based on point 2, I would calculate it where it is really needed.

4. We construct the configuration from a combination of the command line, code, conf file, etc. We assume it does not change at runtime, which makes it safe to read the configuration multiple times in different places. That assumption can be broken if we modify values at runtime.

5. I admit this is too idealistic for the current state of Storm, but the worker count of a topology could increase or decrease at runtime. If we handle that by relaunching all the workers, then the value can be considered static; if not, the value will be dynamic, which the topology configuration cannot represent.

> Some topologies not working properly
> ------------------------------------
>
>            Key: STORM-2983
>            URL: https://issues.apache.org/jira/browse/STORM-2983
>        Project: Apache Storm
>     Issue Type: Bug
>       Reporter: Ethan Li
>       Assignee: Ethan Li
>       Priority: Major
>         Labels: pull-request-available
>     Time Spent: 20m
> Remaining Estimate: 0h
>
> For example,
> {code:java}
> bin/storm jar storm-loadgen-*.jar org.apache.storm.loadgen.ThroughputVsLatency --spouts 1 --splitters 2 --counters 1 -c topology.debug=true
> {code}
> does not work properly on the ResourceAwareScheduler. With default cluster settings, there will be only one __acker-executor, and it will be on a separate worker. It looks like the __acker-executor was not able to receive messages from the spouts and bolts, and the spouts and bolts continued retrying to send messages to the acker. It then led to another problem: STORM-2970. I tried to run on Storm right before [https://github.com/apache/storm/pull/2502] and right after, and confirmed that this bug should be related to it.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
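The worker-count calculation described in point 3 — counting distinct worker slots in the Assignment rather than reading a config value — can be sketched like this. The map shape below is a simplified stand-in, not Storm's generated Assignment class:

```java
import java.util.*;

public class WorkerCountFromAssignment {
    // Simplified stand-in for an executor-to-slot assignment: each executor is
    // mapped to the worker slot ("node:port") it runs in. The topology's actual
    // worker count is the number of distinct slots, which may be smaller than
    // topology.workers under the ResourceAwareScheduler.
    static int workerCount(Map<String, String> executorToSlot) {
        return new HashSet<>(executorToSlot.values()).size();
    }

    public static void main(String[] args) {
        Map<String, String> assignment = new HashMap<>();
        assignment.put("spout-1", "node1:6700");
        assignment.put("__acker-2", "node1:6700");
        assignment.put("bolt-3", "node2:6700");
        System.out.println(workerCount(assignment));  // 2 distinct slots
    }
}
```

This derives the count from the same dynamic state the worker already holds, so it stays correct whenever the scheduler assigns fewer (or, with elasticity, a changing number of) workers.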
[jira] [Comment Edited] (STORM-2983) Some topologies not working properly
[ https://issues.apache.org/jira/browse/STORM-2983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16424776#comment-16424776 ]

Roshan Naik edited comment on STORM-2983 at 4/3/18 11:48 PM:
--

[~kabhwan] I think you are again missing what I am stressing. We need a way, in code, to check the worker count (for internal and user code), not to be removing code that does such checks. I am not concerned about retaining this one optimization. But there is no point removing reasonable code only to put it back again. I would like to see why we cannot either fix topology.workers or provide something else as a substitute. So I ask again... why can't we fix this setting?
[jira] [Commented] (STORM-2983) Some topologies not working properly
[ https://issues.apache.org/jira/browse/STORM-2983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16424755#comment-16424755 ]

Jungtaek Lim commented on STORM-2983:
--

I'm still +1 on the proposed fix. Even better if we can avoid removing the optimization by determining whether an outgoing connection exists, but it's OK to skip that if it's not trivial. The topology configuration holds the values users set, and users will get confused if some of those values are dynamically updated. We already know that not every configuration is honored by every scheduler. "topology.workers" is a popular one, so ignoring its value doesn't seem trivial, but I think we can explain in the RAS documentation why it is ignored. If we really want to maintain runtime status, it's better to keep it in a separate place rather than modifying the topology configuration, so that users' input is retained.
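The "retain users' input" idea — expose the merged topology configuration read-only and keep runtime status somewhere separate — can be sketched with a plain unmodifiable view. This is only an illustration of the principle under discussion, not anything Storm currently does; the class name is hypothetical:

```java
import java.util.*;

public class TopologyConfView {
    // Hypothetical sketch: copy and wrap the merged conf so daemons can read
    // it anywhere without risking a runtime mutation that silently diverges
    // from what the user submitted.
    static Map<String, Object> freeze(Map<String, Object> merged) {
        return Collections.unmodifiableMap(new HashMap<>(merged));
    }

    public static void main(String[] args) {
        Map<String, Object> conf = new HashMap<>();
        conf.put("topology.workers", 4);
        Map<String, Object> frozen = freeze(conf);
        try {
            frozen.put("topology.workers", 1);  // e.g. a scheduler "fixing" the value
        } catch (UnsupportedOperationException e) {
            System.out.println("conf is read-only");  // user input preserved
        }
    }
}
```

With a frozen conf, anything like the RAS-assigned worker count would have to live in a separate runtime-status object, which is the separation this comment argues for.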
[jira] [Commented] (STORM-2983) Some topologies not working properly
[ https://issues.apache.org/jira/browse/STORM-2983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16424680#comment-16424680 ]

Roshan Naik commented on STORM-2983:
--

My question was not about finding and fixing all the things that RAS breaks. It is limited to fixing the issue with the worker count that is causing the breakage. Any code we delete now to unblock you would be worth reviving once the worker count issue is fixed.
- Instead of the proposed fix, can you update the worker count to the right value?
- Or else, could you consider unblocking your work by commenting out the optimization in your local build?
[jira] [Updated] (STORM-3017) Refactor pacemaker code
[ https://issues.apache.org/jira/browse/STORM-3017?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated STORM-3017:
--
Labels: pull-request-available (was: )

> Refactor pacemaker code
> -----------------------
>
>            Key: STORM-3017
>            URL: https://issues.apache.org/jira/browse/STORM-3017
>        Project: Apache Storm
>     Issue Type: Improvement
>       Reporter: Ethan Li
>       Assignee: Ethan Li
>       Priority: Major
>         Labels: pull-request-available
>
> As [~Srdo] pointed out in [https://github.com/apache/storm/pull/2587] and [~revans2] pointed out in [https://github.com/apache/storm/pull/2608], there is some pacemaker code we need to revisit and refactor, especially around exception handling and retrying.
[jira] [Comment Edited] (STORM-2983) Some topologies not working properly
[ https://issues.apache.org/jira/browse/STORM-2983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16424644#comment-16424644 ]

Ethan Li edited comment on STORM-2983 at 4/3/18 9:38 PM:
--

We definitely need to read through the code, see what RAS breaks, and fix it. That's why I suggested filing a separate Jira to track it and having somebody do it. I want to get this change in because it has blocked my current work for quite a long time, and I'd like to get that done sooner rather than later, if that's OK with you.
[jira] [Commented] (STORM-2983) Some topologies not working properly
[ https://issues.apache.org/jira/browse/STORM-2983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16424616#comment-16424616 ]

Roshan Naik commented on STORM-2983:
--

As stated before, the core issue is not this specific optimization. Along with removing this optimization we would have to remove all other code that performs the same check. It is important to get RAS working, but it needs to be done correctly. My concern is that (independent of the existence or absence of this optimization) the mechanism for Storm internal code or end-user code to check the worker count is broken. Fixing that would address RAS as well, without requiring the removal of similar code. So I would like to ask my previous question again:
- Is there a good reason why topology.workers cannot be dynamically updated to reflect the actual worker count?
[jira] [Resolved] (STORM-2994) KafkaSpout consumes messages but doesn't commit offsets
[ https://issues.apache.org/jira/browse/STORM-2994?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Stig Rohde Døssing resolved STORM-2994.
---
Resolution: Fixed

> KafkaSpout consumes messages but doesn't commit offsets
> -------------------------------------------------------
>
>              Key: STORM-2994
>              URL: https://issues.apache.org/jira/browse/STORM-2994
>          Project: Apache Storm
>       Issue Type: Bug
>       Components: storm-kafka-client
> Affects Versions: 2.0.0, 1.1.2, 1.0.6, 1.2.1
>         Reporter: Rui Abreu
>         Assignee: Rui Abreu
>         Priority: Major
>           Labels: pull-request-available
>          Fix For: 2.0.0, 1.1.3, 1.0.7, 1.2.2
>       Time Spent: 3.5h
> Remaining Estimate: 0h
>
> A topology consumes from two different Kafka clusters: 0.10.1.1 and 0.10.2.1. The spouts consuming from 0.10.2.1 have low lag (and regularly commit offsets). The spout that consumes from 0.10.1.1 exhibits either:
> 1. Unknown lag
> 2. Lag that increases as the spout reads messages from Kafka
>
> At DEBUG level, the offset manager logs "topic-partition has NO offsets ready to be committed", despite the spout continuing to consume messages. Several configuration tweaks were tried, including setting maxRetries to 1 in case messages with a lower offset were being retried (the logs didn't show it, though). offsetCommitPeriodMs was also lowered, to no avail. The only configuration that works is ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG=true, but this is undesired since we lose processing guarantees.
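The "NO offsets ready to be committed" log makes sense once you see how an offset manager of this kind decides what to commit: only the longest unbroken run of acked offsets after the last committed one is eligible, so a single un-acked (e.g. stuck-in-retry) offset blocks everything behind it. A minimal sketch of that rule — an illustration of the principle, not the actual storm-kafka-client OffsetManager code:

```java
import java.util.*;

public class CommitReadiness {
    // Sketch: given the last committed offset and the set of acked offsets,
    // the next committable offset is the end of the contiguous run starting
    // right after `committed`.
    static long nextCommitOffset(long committed, Set<Long> acked) {
        long next = committed;
        while (acked.contains(next + 1)) {
            next++;
        }
        return next;  // == committed means "NO offsets ready to be committed"
    }

    public static void main(String[] args) {
        Set<Long> acked = new HashSet<>(Arrays.asList(6L, 7L, 9L));
        System.out.println(nextCommitOffset(5L, acked));  // 7: the gap at 8 blocks 9
        System.out.println(nextCommitOffset(6L, new HashSet<>(Arrays.asList(8L, 9L))));  // 6: nothing ready
    }
}
```

Enabling ENABLE_AUTO_COMMIT_CONFIG sidesteps this logic entirely (the consumer commits whatever it has polled), which is why it "works" while sacrificing the at-least-once guarantee.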
[jira] [Commented] (STORM-2994) KafkaSpout consumes messages but doesn't commit offsets
[ https://issues.apache.org/jira/browse/STORM-2994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16424583#comment-16424583 ]

Stig Rohde Døssing commented on STORM-2994:
---

Thanks [~RAbreu], merged to the master, 1.x, 1.1.x, and 1.0.x branches.
[jira] [Comment Edited] (STORM-2994) KafkaSpout consumes messages but doesn't commit offsets
[ https://issues.apache.org/jira/browse/STORM-2994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16424583#comment-16424583 ]

Stig Rohde Døssing edited comment on STORM-2994 at 4/3/18 8:53 PM:
---

Thanks [~RAbreu], merged to the master, 1.x, 1.1.x, and 1.0.x branches. Keep up the good work.
[jira] [Updated] (STORM-2994) KafkaSpout consumes messages but doesn't commit offsets
[ https://issues.apache.org/jira/browse/STORM-2994?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Stig Rohde Døssing updated STORM-2994:
--
Fix Version/s: 1.0.7
[jira] [Updated] (STORM-2994) KafkaSpout consumes messages but doesn't commit offsets
[ https://issues.apache.org/jira/browse/STORM-2994?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stig Rohde Døssing updated STORM-2994: -- Affects Version/s: (was: 1.1.0) 1.0.6
[jira] [Updated] (STORM-2994) KafkaSpout consumes messages but doesn't commit offsets
[ https://issues.apache.org/jira/browse/STORM-2994?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stig Rohde Døssing updated STORM-2994: -- Fix Version/s: 1.1.3
[jira] [Updated] (STORM-2994) KafkaSpout consumes messages but doesn't commit offsets
[ https://issues.apache.org/jira/browse/STORM-2994?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stig Rohde Døssing updated STORM-2994: -- Fix Version/s: 1.2.2
[jira] [Closed] (STORM-3003) Nimbus should cache assignments to avoid excess state polling
[ https://issues.apache.org/jira/browse/STORM-3003?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kishor Patil closed STORM-3003.
---
Resolution: Duplicate

> Nimbus should cache assignments to avoid excess state polling
> ---
>
> Key: STORM-3003
> URL: https://issues.apache.org/jira/browse/STORM-3003
> Project: Apache Storm
> Issue Type: Improvement
> Components: storm-server
> Reporter: Kishor Patil
> Assignee: Kishor Patil
> Priority: Major
> Labels: pull-request-available
> Time Spent: 1h 50m
> Remaining Estimate: 0h
>
> Since Nimbus (the scheduler) generates the assignments, it can cache them instead of polling ZK or another state manager for them.
> This would improve scheduling iteration time, as well as all UI pages that require assignment information.
[jira] [Commented] (STORM-3003) Nimbus should cache assignments to avoid excess state polling
[ https://issues.apache.org/jira/browse/STORM-3003?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16424262#comment-16424262 ] Kishor Patil commented on STORM-3003: - Since STORM-2693 already takes care of caching assignments, I will mark this as a duplicate and close it.
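The caching idea behind STORM-3003/STORM-2693 can be sketched as follows. `AssignmentCache` and its string-valued assignments are hypothetical stand-ins for illustration, not Storm's actual classes:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Sketch: since Nimbus itself computes the assignments, it can keep them in
// an in-memory cache that scheduler iterations and UI requests read from,
// instead of each reader polling ZooKeeper (or another state backend).
public class AssignmentCache {
    private final Map<String, String> assignmentsByTopology = new ConcurrentHashMap<>();

    // Called by the scheduler after it computes a new assignment.
    public void put(String topologyId, String assignment) {
        assignmentsByTopology.put(topologyId, assignment);
    }

    // Called when a topology is removed.
    public void remove(String topologyId) {
        assignmentsByTopology.remove(topologyId);
    }

    // Readers get the cached value; null means "unknown", and only then
    // would a caller need to fall back to the state store.
    public String get(String topologyId) {
        return assignmentsByTopology.get(topologyId);
    }
}
```

The key property is that only the writer (Nimbus) talks to the state store; every read is a cheap in-memory lookup, which is what improves scheduling iteration time and UI page loads.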
[jira] [Updated] (STORM-2994) KafkaSpout consumes messages but doesn't commit offsets
[ https://issues.apache.org/jira/browse/STORM-2994?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stig Rohde Døssing updated STORM-2994: -- Fix Version/s: 2.0.0
[jira] [Commented] (STORM-2983) Some topologies not working properly
[ https://issues.apache.org/jira/browse/STORM-2983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16424191#comment-16424191 ] Ethan Li commented on STORM-2983: - I don't think it's worth adding complexity for this trivial optimization. How about filing a separate Jira for the RAS bug and using [https://github.com/apache/storm/pull/2614] for this particular bug?

> Some topologies not working properly
> ---
>
> Key: STORM-2983
> URL: https://issues.apache.org/jira/browse/STORM-2983
> Project: Apache Storm
> Issue Type: Bug
> Reporter: Ethan Li
> Assignee: Ethan Li
> Priority: Major
> Labels: pull-request-available
> Time Spent: 20m
> Remaining Estimate: 0h
>
> For example,
> {code:java}
> bin/storm jar storm-loadgen-*.jar org.apache.storm.loadgen.ThroughputVsLatency --spouts 1 --splitters 2 --counters 1 -c topology.debug=true
> {code}
> does not work properly on the ResourceAwareScheduler.
> With default cluster settings there will be only one __acker-executor, and it will be placed on a separate worker. It looks like the __acker-executor was not able to receive messages from the spouts and bolts, so the spouts and bolts kept retrying sending messages to the acker. That in turn led to another problem: STORM-2970.
> I ran Storm from right before [https://github.com/apache/storm/pull/2502] and from right after it, and confirmed that this bug is related to that change.
[jira] [Created] (STORM-3020) Fix race condition in AsyncLocalizer
Robert Joseph Evans created STORM-3020:
---
Summary: Fix race condition in AsyncLocalizer
Key: STORM-3020
URL: https://issues.apache.org/jira/browse/STORM-3020
Project: Apache Storm
Issue Type: Bug
Components: storm-server
Affects Versions: 2.0.0
Reporter: Robert Joseph Evans
Assignee: Robert Joseph Evans

I think this impacts all of the code that uses AsyncLocalizer, but I need to check to be sure. As part of a review of a different pull request against AsyncLocalizer, I noticed that requestDownloadTopologyBlobs is synchronized, but everything it does is async, and there is a race in one of the async pieces where we read from a map and then try to update it later, all outside of a lock.
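The "read from a map, then update it later, outside a lock" pattern described above is the classic check-then-act race. A minimal illustration and the usual fix with an atomic map operation; `BlobDownloads` and its method names are hypothetical, not AsyncLocalizer's actual code:

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of the bug class: async tasks that each check the map
// and then insert into it as separate steps can both miss the other's entry
// and start a duplicate download.
public class BlobDownloads {
    private final Map<String, CompletableFuture<String>> pending = new ConcurrentHashMap<>();

    // RACY version (check-then-act): between containsKey and put, another
    // thread can insert its own future, so two downloads may start and one
    // future is silently overwritten.
    public CompletableFuture<String> requestRacy(String blobKey) {
        if (!pending.containsKey(blobKey)) {
            pending.put(blobKey, download(blobKey));
        }
        return pending.get(blobKey);
    }

    // SAFE version: computeIfAbsent makes the check and the insert a single
    // atomic step, so exactly one download future exists per blob key.
    public CompletableFuture<String> requestSafe(String blobKey) {
        return pending.computeIfAbsent(blobKey, this::download);
    }

    private CompletableFuture<String> download(String blobKey) {
        // Stand-in for the real async download work.
        return CompletableFuture.completedFuture("downloaded:" + blobKey);
    }
}
```

Note that marking the entry method `synchronized` does not help here, because the later update happens on an async continuation that runs outside that monitor; the atomicity has to live in the map operation itself.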
[jira] [Created] (STORM-3019) StormReporter doesn't have information on where it's running
Simon Cooper created STORM-3019:
---
Summary: StormReporter doesn't have information on where it's running
Key: STORM-3019
URL: https://issues.apache.org/jira/browse/STORM-3019
Project: Apache Storm
Issue Type: Improvement
Affects Versions: 1.2.0
Reporter: Simon Cooper

Metrics2 StormReporter implementations don't have much information about where they're running. In particular, they are missing:
* Whether they are running for Nimbus, a supervisor, or a worker, and which worker it is.
* The full deployed config - they are only provided with the basic topology configuration, not the full effective configuration as specified at topology deployment.
* A TopologyContext object.