[jira] [Comment Edited] (STORM-2983) Some topologies not working properly

2018-04-03 Thread Jungtaek Lim (JIRA)

[ 
https://issues.apache.org/jira/browse/STORM-2983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16424819#comment-16424819
 ] 

Jungtaek Lim edited comment on STORM-2983 at 4/4/18 12:50 AM:
--

If you're not concerned about retaining the optimization, then let's remove it. 
As explained already (the single-worker case should be a minor one in production 
use), I don't think we need to put it back again. The other concerns look like 
things to discuss, since we don't seem to have consensus. That's why 
[~ethanli] would like to file new issues for them, and IMHO it may be better 
to raise them on the dev mailing list so that we can share the concerns and 
discuss them.

Since you seem to think I'm missing your point, let me explain my thinking below:

1. Even if we want to retain the optimization, I think I proposed "the right 
way" to do it in my comment. Checking the topology worker count is the 
"simplest" way to activate/deactivate the transfer thread, but the "right" one 
is checking whether an outgoing stream exists in that worker. Multiple workers 
can run independently (not communicating with each other) depending on how the 
scheduler plans, can't they? We have never used the topology worker count 
within a specific worker, and I think the worker still has enough information 
to apply such an optimization without knowing the topology worker count.
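
A minimal sketch of the check point 1 argues for, assuming the worker already 
knows its own task ids and the topology's task-level wiring (the helper class 
and both inputs are hypothetical, not actual WorkerState API):

{code:java}
import java.util.Collections;
import java.util.Map;
import java.util.Set;

public final class TransferCheck {
    /**
     * True if any task in this worker can emit to a task outside it, i.e.
     * the worker has at least one outgoing (inter-worker) stream.
     *
     * @param localTasks      task ids assigned to this worker
     * @param downstreamTasks for each task id, the set of task ids it may emit to
     */
    public static boolean hasRemoteOutgoingStream(Set<Integer> localTasks,
            Map<Integer, Set<Integer>> downstreamTasks) {
        for (Integer task : localTasks) {
            for (Integer target : downstreamTasks.getOrDefault(task, Collections.emptySet())) {
                if (!localTasks.contains(target)) {
                    return true; // at least one tuple can leave this worker
                }
            }
        }
        return false; // all traffic stays local; the transfer thread can stay off
    }
}
{code}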

2. Regarding modifying the topology configuration, I think I already put in my 
2 cents in my last comment. Whenever I need to check which value I set in the 
configuration, I check the value in the UI. If the value is different from what 
the user set, we still need to explain why that happens, to avoid confusion. 
That's not ideal. I think the topology configuration should be immutable, 
though we don't guarantee that in the code. Maybe we would need a context 
object like StormContext?

3. We already have StormTopology as well as TopologyContext, which provide the 
necessary information. Dynamic information such as assignments is handled by 
the worker. The number of workers in a topology is not exposed directly, but it 
can be calculated from the Assignment. (Please refer to 
[https://github.com/apache/storm/blob/master/storm-client/src/jvm/org/apache/storm/daemon/worker/WorkerState.java#L357]
 and 
[https://github.com/apache/storm/blob/master/storm-client/src/jvm/org/apache/storm/generated/Assignment.java].)
 Based on point 2, I would try to calculate it there if it is really needed.
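
For illustration, a sketch of that calculation against the Thrift-generated 
Assignment and NodeInfo classes linked above (the helper class is hypothetical; 
the accessors follow Storm's generated get_xxx naming convention):

{code:java}
import java.util.HashSet;
import java.util.Set;

import org.apache.storm.generated.Assignment;
import org.apache.storm.generated.NodeInfo;

public final class AssignmentUtils {
    /**
     * Derives the current worker count from an Assignment: each distinct
     * (node, port) slot in the executor-to-slot mapping is one worker.
     */
    public static int workerCount(Assignment assignment) {
        Set<NodeInfo> slots = new HashSet<>(assignment.get_executor_node_port().values());
        return slots.size();
    }
}
{code}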

4. We construct the configuration from a combination of the command line, code, 
conf file, etc. We assume it does not change during runtime, which makes it 
safe to read the configuration multiple times in different places. That 
assumption can be broken if we modify a value at runtime.
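
As an example of such a read, the sketch below pulls the user-requested worker 
count out of the topology configuration (Config.TOPOLOGY_WORKERS and 
ObjectReader are real Storm names; the wrapper method is illustrative only). It 
returns whatever the user submitted, regardless of what the scheduler actually 
did:

{code:java}
import java.util.Map;

import org.apache.storm.Config;
import org.apache.storm.utils.ObjectReader;

public final class ConfExample {
    /** Returns the user-requested worker count, which RAS may not honor. */
    public static int configuredWorkers(Map<String, Object> topoConf) {
        return ObjectReader.getInt(topoConf.get(Config.TOPOLOGY_WORKERS), 1);
    }
}
{code}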

5. I admit this is too idealistic for the current state of Storm, but the 
worker count of a topology could be increased or decreased at runtime if we 
address elasticity. If we deal with that by relaunching all the workers, then 
the value can be considered static; but if not, the value will be dynamic, 
which the topology configuration is not able to contain.

 


was (Author: kabhwan):
If you're not concerned about retaining the optimization, then let's remove it. 
As explained already (the single-worker case should be a minor one in production 
use), I don't think we need to put it back again. The other concerns look like 
things to discuss, since we don't seem to have consensus. That's why 
[~ethanli] would like to file new issues for them, and IMHO it may be better 
to raise them on the dev mailing list so that we can share the concerns and 
discuss them.

Since you seem to think I'm missing your point, let me explain my thinking below:

1. Even if we want to retain the optimization, I think I proposed "the right 
way" to do it in my comment. Checking the topology worker count is the 
"simplest" way to activate/deactivate the transfer thread, but the "right" one 
is checking whether an outgoing stream exists in that worker. Multiple workers 
can run independently (not communicating with each other) depending on how the 
scheduler plans, can't they? We have never used the topology worker count 
within a specific worker, and I think the worker still has enough information 
to apply such an optimization without knowing the topology worker count.

2. Regarding modifying the topology configuration, I think I already put in my 
2 cents in my last comment. Whenever I need to check which value I set in the 
configuration, I check the value in the UI. If the value is different from what 
the user set, we still need to explain why that happens, to avoid confusion. 
That's not ideal. I think the topology configuration should be immutable, 
though we don't guarantee that in the code. Maybe we would need a context 
object like StormContext?

3. We already have StormTopology as well as TopologyContext, which provide the 
necessary information. Dynamic information such as assignments is handled by 
the worker.

[jira] [Commented] (STORM-2983) Some topologies not working properly

2018-04-03 Thread Jungtaek Lim (JIRA)

[ 
https://issues.apache.org/jira/browse/STORM-2983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16424819#comment-16424819
 ] 

Jungtaek Lim commented on STORM-2983:
-

If you're not concerned about retaining the optimization, then let's remove it. 
As explained already (the single-worker case should be a minor one in production 
use), I don't think we need to put it back again. The other concerns look like 
things to discuss, since we don't seem to have consensus. That's why 
[~ethanli] would like to file new issues for them, and IMHO it may be better 
to raise them on the dev mailing list so that we can share the concerns and 
discuss them.

Since you seem to think I'm missing your point, let me explain my thinking below:

1. Even if we want to retain the optimization, I think I proposed "the right 
way" to do it in my comment. Checking the topology worker count is the 
"simplest" way to activate/deactivate the transfer thread, but the "right" one 
is checking whether an outgoing stream exists in that worker. Multiple workers 
can run independently (not communicating with each other) depending on how the 
scheduler plans, can't they? We have never used the topology worker count 
within a specific worker, and I think the worker still has enough information 
to apply such an optimization without knowing the topology worker count.

2. Regarding modifying the topology configuration, I think I already put in my 
2 cents in my last comment. Whenever I need to check which value I set in the 
configuration, I check the value in the UI. If the value is different from what 
the user set, we still need to explain why that happens, to avoid confusion. 
That's not ideal. I think the topology configuration should be immutable, 
though we don't guarantee that in the code. Maybe we would need a context 
object like StormContext?

3. We already have StormTopology as well as TopologyContext, which provide the 
necessary information. Dynamic information such as assignments is handled by 
the worker. The number of workers in a topology is not exposed directly, but it 
can be calculated from the Assignment. (Please refer to 
[https://github.com/apache/storm/blob/master/storm-client/src/jvm/org/apache/storm/daemon/worker/WorkerState.java#L357]
 and 
[https://github.com/apache/storm/blob/master/storm-client/src/jvm/org/apache/storm/generated/Assignment.java].)
 Based on point 2, I would try to calculate it there if it is really needed.

4. We construct the configuration from a combination of the command line, code, 
conf file, etc. We assume it does not change during runtime, which makes it 
safe to read the configuration multiple times in different places. That 
assumption can be broken if we modify a value at runtime.

5. I admit this is too idealistic for the current state of Storm, but the 
worker count of a topology could be increased or decreased at runtime. If we 
deal with that by relaunching all the workers, then the value can be considered 
static; but if not, the value will be dynamic, which the topology configuration 
is not able to contain.

 

> Some topologies not working properly 
> -
>
> Key: STORM-2983
> URL: https://issues.apache.org/jira/browse/STORM-2983
> Project: Apache Storm
>  Issue Type: Bug
>Reporter: Ethan Li
>Assignee: Ethan Li
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> For example,
> {code:java}
> bin/storm jar storm-loadgen-*.jar 
> org.apache.storm.loadgen.ThroughputVsLatency --spouts 1 --splitters 2 
> --counters 1 -c topology.debug=true
> {code}
> is not working properly on the ResourceAwareScheduler.
> With default cluster settings, there will be only one __acker-executor, and it 
> will be on a separate worker. It looks like the __acker-executor was not able 
> to receive messages from the spouts and bolts, and the spouts and bolts 
> continued to retry sending messages to the acker. That then led to another 
> problem: STORM-2970
> I ran Storm from right before [https://github.com/apache/storm/pull/2502] and 
> right after it, and confirmed that this bug is related to that change.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (STORM-2983) Some topologies not working properly

2018-04-03 Thread Roshan Naik (JIRA)

[ 
https://issues.apache.org/jira/browse/STORM-2983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16424776#comment-16424776
 ] 

Roshan Naik edited comment on STORM-2983 at 4/3/18 11:48 PM:
-

[~kabhwan] I think you are again missing what I am stressing. 

We need a way, in code, to check the worker count (for internal and user 
code), not to be removing code that does such checks. I am not concerned about 
retaining this one optimization. 

But there is no point in removing reasonable code and then putting it back 
again.

I would like to see why we cannot either fix topology.workers or provide 
something else as a substitute.

So I ask again... why can't we fix this setting? 


was (Author: roshan_naik):
[~kabhwan] I think you are again missing what I am stressing. 

We need a way, in code, to do this check of the worker count (for internal and 
user code), not to be removing code that does such checks. I am not concerned 
about retaining this one optimization. 

There is no point in removing reasonable code and then putting it back again.

I would like to see why we cannot either fix topology.workers or provide 
something else as a substitute.

 

> Some topologies not working properly 
> -
>
> Key: STORM-2983
> URL: https://issues.apache.org/jira/browse/STORM-2983
> Project: Apache Storm
>  Issue Type: Bug
>Reporter: Ethan Li
>Assignee: Ethan Li
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> For example,
> {code:java}
> bin/storm jar storm-loadgen-*.jar 
> org.apache.storm.loadgen.ThroughputVsLatency --spouts 1 --splitters 2 
> --counters 1 -c topology.debug=true
> {code}
> is not working properly on the ResourceAwareScheduler.
> With default cluster settings, there will be only one __acker-executor, and it 
> will be on a separate worker. It looks like the __acker-executor was not able 
> to receive messages from the spouts and bolts, and the spouts and bolts 
> continued to retry sending messages to the acker. That then led to another 
> problem: STORM-2970
> I ran Storm from right before [https://github.com/apache/storm/pull/2502] and 
> right after it, and confirmed that this bug is related to that change.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (STORM-2983) Some topologies not working properly

2018-04-03 Thread Roshan Naik (JIRA)

[ 
https://issues.apache.org/jira/browse/STORM-2983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16424776#comment-16424776
 ] 

Roshan Naik commented on STORM-2983:


[~kabhwan] I think you are again missing what I am stressing. 

We need a way, in code, to do this check of the worker count (for internal and 
user code), not to be removing code that does such checks. I am not concerned 
about retaining this one optimization. 

There is no point in removing reasonable code and then putting it back again.

I would like to see why we cannot either fix topology.workers or provide 
something else as a substitute.

 

> Some topologies not working properly 
> -
>
> Key: STORM-2983
> URL: https://issues.apache.org/jira/browse/STORM-2983
> Project: Apache Storm
>  Issue Type: Bug
>Reporter: Ethan Li
>Assignee: Ethan Li
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> For example,
> {code:java}
> bin/storm jar storm-loadgen-*.jar 
> org.apache.storm.loadgen.ThroughputVsLatency --spouts 1 --splitters 2 
> --counters 1 -c topology.debug=true
> {code}
> is not working properly on the ResourceAwareScheduler.
> With default cluster settings, there will be only one __acker-executor, and it 
> will be on a separate worker. It looks like the __acker-executor was not able 
> to receive messages from the spouts and bolts, and the spouts and bolts 
> continued to retry sending messages to the acker. That then led to another 
> problem: STORM-2970
> I ran Storm from right before [https://github.com/apache/storm/pull/2502] and 
> right after it, and confirmed that this bug is related to that change.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (STORM-2983) Some topologies not working properly

2018-04-03 Thread Jungtaek Lim (JIRA)

[ 
https://issues.apache.org/jira/browse/STORM-2983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16424755#comment-16424755
 ] 

Jungtaek Lim commented on STORM-2983:
-

I'm still +1 on the proposed fix. It would be even better if we could avoid 
removing the optimization code by determining whether an outgoing connection 
exists, but it's OK to skip that if it is not trivial.

The topology configuration holds the values the users set, and users will get 
confused if some of them are dynamically updated. And we already know that not 
all of the configured values may be honored, depending on the scheduler. 
"topology.workers" is a popular one, so it doesn't look trivial to ignore such 
a value, but I think we can explain in the RAS documentation why it is ignored.

If we really want to maintain runtime status, it is better to keep it somewhere 
separate instead of modifying the topology configuration, so that the users' 
input is retained.

> Some topologies not working properly 
> -
>
> Key: STORM-2983
> URL: https://issues.apache.org/jira/browse/STORM-2983
> Project: Apache Storm
>  Issue Type: Bug
>Reporter: Ethan Li
>Assignee: Ethan Li
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> For example,
> {code:java}
> bin/storm jar storm-loadgen-*.jar 
> org.apache.storm.loadgen.ThroughputVsLatency --spouts 1 --splitters 2 
> --counters 1 -c topology.debug=true
> {code}
> is not working properly on the ResourceAwareScheduler.
> With default cluster settings, there will be only one __acker-executor, and it 
> will be on a separate worker. It looks like the __acker-executor was not able 
> to receive messages from the spouts and bolts, and the spouts and bolts 
> continued to retry sending messages to the acker. That then led to another 
> problem: STORM-2970
> I ran Storm from right before [https://github.com/apache/storm/pull/2502] and 
> right after it, and confirmed that this bug is related to that change.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (STORM-2983) Some topologies not working properly

2018-04-03 Thread Roshan Naik (JIRA)

[ 
https://issues.apache.org/jira/browse/STORM-2983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16424680#comment-16424680
 ] 

Roshan Naik commented on STORM-2983:


My question was not about finding and fixing all the things that RAS breaks. It 
is limited to fixing this issue with the worker count that is causing the 
breakage. Any code we delete now to unblock you would be useful to revive once 
the worker count issue is fixed. 

- Instead of the proposed fix, can you update the worker count to the right 
value?

- Else, could you consider unblocking your work by commenting out the 
optimization in your local build?

> Some topologies not working properly 
> -
>
> Key: STORM-2983
> URL: https://issues.apache.org/jira/browse/STORM-2983
> Project: Apache Storm
>  Issue Type: Bug
>Reporter: Ethan Li
>Assignee: Ethan Li
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> For example,
> {code:java}
> bin/storm jar storm-loadgen-*.jar 
> org.apache.storm.loadgen.ThroughputVsLatency --spouts 1 --splitters 2 
> --counters 1 -c topology.debug=true
> {code}
> is not working properly on the ResourceAwareScheduler.
> With default cluster settings, there will be only one __acker-executor, and it 
> will be on a separate worker. It looks like the __acker-executor was not able 
> to receive messages from the spouts and bolts, and the spouts and bolts 
> continued to retry sending messages to the acker. That then led to another 
> problem: STORM-2970
> I ran Storm from right before [https://github.com/apache/storm/pull/2502] and 
> right after it, and confirmed that this bug is related to that change.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (STORM-3017) Refactor pacemaker code

2018-04-03 Thread ASF GitHub Bot (JIRA)

 [ 
https://issues.apache.org/jira/browse/STORM-3017?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated STORM-3017:
--
Labels: pull-request-available  (was: )

> Refactor pacemaker code
> ---
>
> Key: STORM-3017
> URL: https://issues.apache.org/jira/browse/STORM-3017
> Project: Apache Storm
>  Issue Type: Improvement
>Reporter: Ethan Li
>Assignee: Ethan Li
>Priority: Major
>  Labels: pull-request-available
>
> As [~Srdo] pointed out in [https://github.com/apache/storm/pull/2587] and 
> [~revans2] pointed out in [https://github.com/apache/storm/pull/2608], there 
> is some pacemaker code we need to revisit and refactor, especially the 
> exception handling and retrying.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (STORM-2983) Some topologies not working properly

2018-04-03 Thread Ethan Li (JIRA)

[ 
https://issues.apache.org/jira/browse/STORM-2983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16424644#comment-16424644
 ] 

Ethan Li edited comment on STORM-2983 at 4/3/18 9:38 PM:
-

We definitely need to read through the code, see what RAS breaks, and fix it. 
That's why I suggested filing a separate Jira to track that and having somebody 
do it. I want to get this in because it has been blocking my current work for 
quite a long time, and I want to get it done sooner rather than later, if this 
is OK with you.


was (Author: ethanli):
As stated before, we definitely need to read through the code, see what RAS 
breaks, and fix it. That's why I suggested filing a separate Jira to track that 
and having somebody do it. I want to get this in because it has been blocking 
my current work for quite a long time, and I want to get it done sooner rather 
than later, if this is OK with you.

> Some topologies not working properly 
> -
>
> Key: STORM-2983
> URL: https://issues.apache.org/jira/browse/STORM-2983
> Project: Apache Storm
>  Issue Type: Bug
>Reporter: Ethan Li
>Assignee: Ethan Li
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> For example,
> {code:java}
> bin/storm jar storm-loadgen-*.jar 
> org.apache.storm.loadgen.ThroughputVsLatency --spouts 1 --splitters 2 
> --counters 1 -c topology.debug=true
> {code}
> is not working properly on the ResourceAwareScheduler.
> With default cluster settings, there will be only one __acker-executor, and it 
> will be on a separate worker. It looks like the __acker-executor was not able 
> to receive messages from the spouts and bolts, and the spouts and bolts 
> continued to retry sending messages to the acker. That then led to another 
> problem: STORM-2970
> I ran Storm from right before [https://github.com/apache/storm/pull/2502] and 
> right after it, and confirmed that this bug is related to that change.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (STORM-2983) Some topologies not working properly

2018-04-03 Thread Ethan Li (JIRA)

[ 
https://issues.apache.org/jira/browse/STORM-2983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16424644#comment-16424644
 ] 

Ethan Li commented on STORM-2983:
-

As stated before, we definitely need to read through the code, see what RAS 
breaks, and fix it. That's why I suggested filing a separate Jira to track that 
and having somebody do it. I want to get this in because it has been blocking 
my current work for quite a long time, and I want to get it done sooner rather 
than later, if this is OK with you.

> Some topologies not working properly 
> -
>
> Key: STORM-2983
> URL: https://issues.apache.org/jira/browse/STORM-2983
> Project: Apache Storm
>  Issue Type: Bug
>Reporter: Ethan Li
>Assignee: Ethan Li
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> For example,
> {code:java}
> bin/storm jar storm-loadgen-*.jar 
> org.apache.storm.loadgen.ThroughputVsLatency --spouts 1 --splitters 2 
> --counters 1 -c topology.debug=true
> {code}
> is not working properly on the ResourceAwareScheduler.
> With default cluster settings, there will be only one __acker-executor, and it 
> will be on a separate worker. It looks like the __acker-executor was not able 
> to receive messages from the spouts and bolts, and the spouts and bolts 
> continued to retry sending messages to the acker. That then led to another 
> problem: STORM-2970
> I ran Storm from right before [https://github.com/apache/storm/pull/2502] and 
> right after it, and confirmed that this bug is related to that change.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (STORM-2983) Some topologies not working properly

2018-04-03 Thread Ethan Li (JIRA)

[ 
https://issues.apache.org/jira/browse/STORM-2983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16424644#comment-16424644
 ] 

Ethan Li edited comment on STORM-2983 at 4/3/18 9:36 PM:
-

As stated before, we definitely need to read through the code, see what RAS 
breaks, and fix it. That's why I suggested filing a separate Jira to track that 
and having somebody do it. I want to get this in because it has been blocking 
my current work for quite a long time, and I want to get it done sooner rather 
than later, if this is OK with you.


was (Author: ethanli):
As stated before, we definitely need to read through the code, see what RAS 
breaks, and fix it. That's why I suggested filing a separate Jira to track that 
and having somebody do it. I want to get this in because it has been blocking 
my current work for quite a long time, and I want to get it done sooner rather 
than later, if this is OK with you.

> Some topologies not working properly 
> -
>
> Key: STORM-2983
> URL: https://issues.apache.org/jira/browse/STORM-2983
> Project: Apache Storm
>  Issue Type: Bug
>Reporter: Ethan Li
>Assignee: Ethan Li
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> For example,
> {code:java}
> bin/storm jar storm-loadgen-*.jar 
> org.apache.storm.loadgen.ThroughputVsLatency --spouts 1 --splitters 2 
> --counters 1 -c topology.debug=true
> {code}
> is not working properly on the ResourceAwareScheduler.
> With default cluster settings, there will be only one __acker-executor, and it 
> will be on a separate worker. It looks like the __acker-executor was not able 
> to receive messages from the spouts and bolts, and the spouts and bolts 
> continued to retry sending messages to the acker. That then led to another 
> problem: STORM-2970
> I ran Storm from right before [https://github.com/apache/storm/pull/2502] and 
> right after it, and confirmed that this bug is related to that change.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (STORM-2983) Some topologies not working properly

2018-04-03 Thread Roshan Naik (JIRA)

[ 
https://issues.apache.org/jira/browse/STORM-2983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16424616#comment-16424616
 ] 

Roshan Naik commented on STORM-2983:


As stated before, the core issue is not the specific optimization. Along with 
removing this optimization, we would have to remove all other code that checks 
the same thing. It is important to get RAS working, but it needs to be done 
correctly. 

My concern is that (independent of the existence/absence of this 
optimization)... the mechanism for storm internal code or end user code to 
check the worker count is broken. Fixing that will address RAS and will not 
require removing similar code.

So I would like to ask my previous question again:

 
 - Is there a good reason why topology.workers cannot be dynamically updated to 
reflect the actual worker count? 

 

> Some topologies not working properly 
> -
>
> Key: STORM-2983
> URL: https://issues.apache.org/jira/browse/STORM-2983
> Project: Apache Storm
>  Issue Type: Bug
>Reporter: Ethan Li
>Assignee: Ethan Li
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> For example,
> {code:java}
> bin/storm jar storm-loadgen-*.jar 
> org.apache.storm.loadgen.ThroughputVsLatency --spouts 1 --splitters 2 
> --counters 1 -c topology.debug=true
> {code}
> is not working properly on the ResourceAwareScheduler.
> With default cluster settings, there will be only one __acker-executor, and it 
> will be on a separate worker. It looks like the __acker-executor was not able 
> to receive messages from the spouts and bolts, and the spouts and bolts 
> continued to retry sending messages to the acker. That then led to another 
> problem: STORM-2970
> I ran Storm from right before [https://github.com/apache/storm/pull/2502] and 
> right after it, and confirmed that this bug is related to that change.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (STORM-2994) KafkaSpout consumes messages but doesn't commit offsets

2018-04-03 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/STORM-2994?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stig Rohde Døssing resolved STORM-2994.
---
Resolution: Fixed

> KafkaSpout consumes messages but doesn't commit offsets
> ---
>
> Key: STORM-2994
> URL: https://issues.apache.org/jira/browse/STORM-2994
> Project: Apache Storm
>  Issue Type: Bug
>  Components: storm-kafka-client
>Affects Versions: 2.0.0, 1.1.2, 1.0.6, 1.2.1
>Reporter: Rui Abreu
>Assignee: Rui Abreu
>Priority: Major
>  Labels: pull-request-available
> Fix For: 2.0.0, 1.1.3, 1.0.7, 1.2.2
>
>  Time Spent: 3.5h
>  Remaining Estimate: 0h
>
> The topology consumes from two different Kafka clusters: 0.10.1.1 and 
> 0.10.2.1.
> Spouts consuming from 0.10.2.1 have low lag (and regularly commit offsets). 
> The spout that consumes from 0.10.1.1 exhibits either:
> 1 - unknown lag, or
> 2 - lag that increases as the spout reads messages from Kafka.
>  
> With DEBUG logging, the offset manager logs "topic-partition has NO offsets 
> ready to be committed", despite the spout continuing to consume messages.
> Several configuration tweaks were tried, including setting maxRetries to 1 in 
> case messages with a lower offset were being retried (the logs didn't show 
> that, though). offsetCommitPeriodMs was also lowered, to no avail.
> The only configuration that works is setting 
> ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG=true, but this is undesired since 
> we lose processing guarantees.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (STORM-2994) KafkaSpout consumes messages but doesn't commit offsets

2018-04-03 Thread JIRA

[ 
https://issues.apache.org/jira/browse/STORM-2994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16424583#comment-16424583
 ] 

Stig Rohde Døssing commented on STORM-2994:
---

Thanks [~RAbreu], merged to master, 1.x, 1.1.x and 1.0.x branches.

> KafkaSpout consumes messages but doesn't commit offsets
> ---
>
> Key: STORM-2994
> URL: https://issues.apache.org/jira/browse/STORM-2994
> Project: Apache Storm
>  Issue Type: Bug
>  Components: storm-kafka-client
>Affects Versions: 2.0.0, 1.1.2, 1.0.6, 1.2.1
>Reporter: Rui Abreu
>Assignee: Rui Abreu
>Priority: Major
>  Labels: pull-request-available
> Fix For: 2.0.0, 1.1.3, 1.0.7, 1.2.2
>
>  Time Spent: 3h 40m
>  Remaining Estimate: 0h
>
> The topology consumes from two different Kafka clusters: 0.10.1.1 and 
> 0.10.2.1.
> Spouts consuming from 0.10.2.1 have low lag (and regularly commit offsets). 
> The spout that consumes from 0.10.1.1 exhibits either:
> 1 - unknown lag, or
> 2 - lag that increases as the spout reads messages from Kafka.
>  
> With DEBUG logging, the offset manager logs "topic-partition has NO offsets 
> ready to be committed", despite the spout continuing to consume messages.
> Several configuration tweaks were tried, including setting maxRetries to 1 in 
> case messages with a lower offset were being retried (the logs didn't show 
> that, though). offsetCommitPeriodMs was also lowered, to no avail.
> The only configuration that works is setting 
> ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG=true, but this is undesired since 
> we lose processing guarantees.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (STORM-2994) KafkaSpout consumes messages but doesn't commit offsets

2018-04-03 Thread JIRA

[ 
https://issues.apache.org/jira/browse/STORM-2994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16424583#comment-16424583
 ] 

Stig Rohde Døssing edited comment on STORM-2994 at 4/3/18 8:53 PM:
---

Thanks [~RAbreu], merged to master, 1.x, 1.1.x and 1.0.x branches. Keep up the 
good work.


was (Author: srdo):
Thanks [~RAbreu], merged to master, 1.x, 1.1.x and 1.0.x branches.

> KafkaSpout consumes messages but doesn't commit offsets
> ---
>
> Key: STORM-2994
> URL: https://issues.apache.org/jira/browse/STORM-2994
> Project: Apache Storm
>  Issue Type: Bug
>  Components: storm-kafka-client
>Affects Versions: 2.0.0, 1.1.2, 1.0.6, 1.2.1
>Reporter: Rui Abreu
>Assignee: Rui Abreu
>Priority: Major
>  Labels: pull-request-available
> Fix For: 2.0.0, 1.1.3, 1.0.7, 1.2.2
>
>  Time Spent: 3h 40m
>  Remaining Estimate: 0h
>
> The topology consumes from two different Kafka clusters: 0.10.1.1 and 
> 0.10.2.1.
> Spouts consuming from 0.10.2.1 have low lag (and regularly commit offsets). 
> The spout that consumes from 0.10.1.1 exhibits either:
> 1 - unknown lag, or
> 2 - lag that increases as the spout reads messages from Kafka.
>  
> With DEBUG logging, the offset manager logs "topic-partition has NO offsets 
> ready to be committed", despite the spout continuing to consume messages.
> Several configuration tweaks were tried, including setting maxRetries to 1 in 
> case messages with a lower offset were being retried (the logs didn't show 
> that, though). offsetCommitPeriodMs was also lowered, to no avail.
> The only configuration that works is setting 
> ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG=true, but this is undesired since 
> we lose processing guarantees.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (STORM-2994) KafkaSpout consumes messages but doesn't commit offsets

2018-04-03 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/STORM-2994?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stig Rohde Døssing updated STORM-2994:
--
Fix Version/s: 1.0.7

> KafkaSpout consumes messages but doesn't commit offsets
> ---
>
> Key: STORM-2994
> URL: https://issues.apache.org/jira/browse/STORM-2994
> Project: Apache Storm
>  Issue Type: Bug
>  Components: storm-kafka-client
>Affects Versions: 2.0.0, 1.1.2, 1.0.6, 1.2.1
>Reporter: Rui Abreu
>Assignee: Rui Abreu
>Priority: Major
>  Labels: pull-request-available
> Fix For: 2.0.0, 1.1.3, 1.0.7, 1.2.2
>
>  Time Spent: 3.5h
>  Remaining Estimate: 0h
>
> The topology consumes from two different Kafka clusters: 0.10.1.1 and 
> 0.10.2.1.
> Spouts consuming from 0.10.2.1 have low lag (and regularly commit offsets). 
> The spout that consumes from 0.10.1.1 exhibits either:
> 1 - unknown lag, or
> 2 - lag that increases as the spout reads messages from Kafka.
>  
> With DEBUG logging, the offset manager logs "topic-partition has NO offsets 
> ready to be committed", despite the spout continuing to consume messages.
> Several configuration tweaks were tried, including setting maxRetries to 1 in 
> case messages with a lower offset were being retried (the logs didn't show 
> that, though). offsetCommitPeriodMs was also lowered, to no avail.
> The only configuration that works is setting 
> ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG=true, but this is undesired since 
> we lose processing guarantees.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (STORM-2994) KafkaSpout consumes messages but doesn't commit offsets

2018-04-03 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/STORM-2994?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stig Rohde Døssing updated STORM-2994:
--
Affects Version/s: (was: 1.1.0)
   1.0.6

> KafkaSpout consumes messages but doesn't commit offsets
> ---
>
> Key: STORM-2994
> URL: https://issues.apache.org/jira/browse/STORM-2994
> Project: Apache Storm
>  Issue Type: Bug
>  Components: storm-kafka-client
>Affects Versions: 2.0.0, 1.1.2, 1.0.6, 1.2.1
>Reporter: Rui Abreu
>Assignee: Rui Abreu
>Priority: Major
>  Labels: pull-request-available
> Fix For: 2.0.0, 1.1.3, 1.0.7, 1.2.2
>
>  Time Spent: 3.5h
>  Remaining Estimate: 0h
>
> The topology consumes from two different Kafka clusters: 0.10.1.1 and 
> 0.10.2.1.
> Spouts consuming from 0.10.2.1 have low lag (and regularly commit offsets). 
> The spout that consumes from 0.10.1.1 exhibits either:
> 1 - unknown lag, or
> 2 - lag that increases as the spout reads messages from Kafka.
>  
> With DEBUG logging, the offset manager logs "topic-partition has NO offsets 
> ready to be committed", despite the spout continuing to consume messages.
> Several configuration tweaks were tried, including setting maxRetries to 1 in 
> case messages with a lower offset were being retried (the logs didn't show 
> that, though). offsetCommitPeriodMs was also lowered, to no avail.
> The only configuration that works is setting 
> ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG=true, but this is undesired since 
> we lose processing guarantees.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (STORM-2994) KafkaSpout consumes messages but doesn't commit offsets

2018-04-03 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/STORM-2994?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stig Rohde Døssing updated STORM-2994:
--
Fix Version/s: 1.1.3

> KafkaSpout consumes messages but doesn't commit offsets
> ---
>
> Key: STORM-2994
> URL: https://issues.apache.org/jira/browse/STORM-2994
> Project: Apache Storm
>  Issue Type: Bug
>  Components: storm-kafka-client
>Affects Versions: 2.0.0, 1.1.0, 1.1.2, 1.2.1
>Reporter: Rui Abreu
>Assignee: Rui Abreu
>Priority: Major
>  Labels: pull-request-available
> Fix For: 2.0.0, 1.1.3, 1.2.2
>
>  Time Spent: 3.5h
>  Remaining Estimate: 0h
>
> The topology consumes from two different Kafka clusters: 0.10.1.1 and 
> 0.10.2.1.
> Spouts consuming from 0.10.2.1 have low lag (and regularly commit offsets). 
> The spout that consumes from 0.10.1.1 exhibits either:
> 1 - unknown lag, or
> 2 - lag that increases as the spout reads messages from Kafka.
>  
> With DEBUG logging, the offset manager logs "topic-partition has NO offsets 
> ready to be committed", despite the spout continuing to consume messages.
> Several configuration tweaks were tried, including setting maxRetries to 1 in 
> case messages with a lower offset were being retried (the logs didn't show 
> that, though). offsetCommitPeriodMs was also lowered, to no avail.
> The only configuration that works is setting 
> ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG=true, but this is undesired since 
> we lose processing guarantees.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (STORM-2994) KafkaSpout consumes messages but doesn't commit offsets

2018-04-03 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/STORM-2994?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stig Rohde Døssing updated STORM-2994:
--
Fix Version/s: 1.2.2

> KafkaSpout consumes messages but doesn't commit offsets
> ---
>
> Key: STORM-2994
> URL: https://issues.apache.org/jira/browse/STORM-2994
> Project: Apache Storm
>  Issue Type: Bug
>  Components: storm-kafka-client
>Affects Versions: 2.0.0, 1.1.0, 1.1.2, 1.2.1
>Reporter: Rui Abreu
>Assignee: Rui Abreu
>Priority: Major
>  Labels: pull-request-available
> Fix For: 2.0.0, 1.2.2
>
>  Time Spent: 3.5h
>  Remaining Estimate: 0h
>
> The topology consumes from two different Kafka clusters: 0.10.1.1 and 
> 0.10.2.1.
> Spouts consuming from 0.10.2.1 have low lag (and regularly commit offsets). 
> The spout that consumes from 0.10.1.1 exhibits either:
> 1 - unknown lag, or
> 2 - lag that increases as the spout reads messages from Kafka.
>  
> With DEBUG logging, the offset manager logs "topic-partition has NO offsets 
> ready to be committed", despite the spout continuing to consume messages.
> Several configuration tweaks were tried, including setting maxRetries to 1 in 
> case messages with a lower offset were being retried (the logs didn't show 
> that, though). offsetCommitPeriodMs was also lowered, to no avail.
> The only configuration that works is setting 
> ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG=true, but this is undesired since 
> we lose processing guarantees.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Closed] (STORM-3003) Nimbus should cache assignments to avoid excess state polling

2018-04-03 Thread Kishor Patil (JIRA)

 [ 
https://issues.apache.org/jira/browse/STORM-3003?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kishor Patil closed STORM-3003.
---
Resolution: Duplicate

> Nimbus should cache assignments to avoid excess state polling
> -
>
> Key: STORM-3003
> URL: https://issues.apache.org/jira/browse/STORM-3003
> Project: Apache Storm
>  Issue Type: Improvement
>  Components: storm-server
>Reporter: Kishor Patil
>Assignee: Kishor Patil
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> Since nimbus (the scheduler) generates assignments, it can cache them instead 
> of polling for them from ZK or another state manager.
> This would improve scheduling iteration time, as well as all the UI pages that 
> require assignment information.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (STORM-3003) Nimbus should cache assignments to avoid excess state polling

2018-04-03 Thread Kishor Patil (JIRA)

[ 
https://issues.apache.org/jira/browse/STORM-3003?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16424262#comment-16424262
 ] 

Kishor Patil commented on STORM-3003:
-

Since STORM-2693 already takes care of caching assignments, I will mark this as 
a duplicate and close it.
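
For reference, a loose sketch of the caching idea this ticket describes (the 
class and method names are hypothetical, not the STORM-2693 implementation):

{code:java}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

import org.apache.storm.generated.Assignment;

// Hypothetical sketch: nimbus keeps the assignments it generated in memory and
// serves reads from the cache, falling back to the state store (e.g. ZooKeeper)
// only on a miss.
class AssignmentCache {
    private final Map<String, Assignment> byStormId = new ConcurrentHashMap<>();

    /** Populate the cache when the scheduler produces a new assignment. */
    void onScheduled(String stormId, Assignment assignment) {
        byStormId.put(stormId, assignment);
    }

    /** Read through the cache, hitting the state store only when needed. */
    Assignment get(String stormId, Function<String, Assignment> readFromStateStore) {
        return byStormId.computeIfAbsent(stormId, readFromStateStore);
    }
}
{code}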

> Nimbus should cache assignments to avoid excess state polling
> -
>
> Key: STORM-3003
> URL: https://issues.apache.org/jira/browse/STORM-3003
> Project: Apache Storm
>  Issue Type: Improvement
>  Components: storm-server
>Reporter: Kishor Patil
>Assignee: Kishor Patil
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> Since nimbus (the scheduler) generates assignments, it can cache them instead 
> of polling for them from ZK or another state manager.
> This would improve scheduling iteration time, as well as all the UI pages that 
> require assignment information.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (STORM-2994) KafkaSpout consumes messages but doesn't commit offsets

2018-04-03 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/STORM-2994?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stig Rohde Døssing updated STORM-2994:
--
Fix Version/s: 2.0.0

> KafkaSpout consumes messages but doesn't commit offsets
> ---
>
> Key: STORM-2994
> URL: https://issues.apache.org/jira/browse/STORM-2994
> Project: Apache Storm
>  Issue Type: Bug
>  Components: storm-kafka-client
>Affects Versions: 2.0.0, 1.1.0, 1.1.2, 1.2.1
>Reporter: Rui Abreu
>Assignee: Rui Abreu
>Priority: Major
>  Labels: pull-request-available
> Fix For: 2.0.0
>
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> The topology consumes from two different Kafka clusters: 0.10.1.1 and 
> 0.10.2.1.
> Spouts consuming from 0.10.2.1 have low lag (and regularly commit offsets). 
> The spout that consumes from 0.10.1.1 exhibits either:
> 1 - unknown lag, or
> 2 - lag that increases as the spout reads messages from Kafka.
>  
> With DEBUG logging, the offset manager logs "topic-partition has NO offsets 
> ready to be committed", despite the spout continuing to consume messages.
> Several configuration tweaks were tried, including setting maxRetries to 1 in 
> case messages with a lower offset were being retried (the logs didn't show 
> that, though). offsetCommitPeriodMs was also lowered, to no avail.
> The only configuration that works is setting 
> ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG=true, but this is undesired since 
> we lose processing guarantees.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (STORM-2983) Some topologies not working properly

2018-04-03 Thread Ethan Li (JIRA)

[ 
https://issues.apache.org/jira/browse/STORM-2983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16424191#comment-16424191
 ] 

Ethan Li commented on STORM-2983:
-

I think it's not worth adding complexity for this trivial optimization. How 
about filing a separate Jira for the RAS bug and using 
[https://github.com/apache/storm/pull/2614] for this particular bug?

> Some topologies not working properly 
> -
>
> Key: STORM-2983
> URL: https://issues.apache.org/jira/browse/STORM-2983
> Project: Apache Storm
>  Issue Type: Bug
>Reporter: Ethan Li
>Assignee: Ethan Li
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> For example,
> {code:java}
> bin/storm jar storm-loadgen-*.jar 
> org.apache.storm.loadgen.ThroughputVsLatency --spouts 1 --splitters 2 
> --counters 1 -c topology.debug=true
> {code}
> is not working properly on the ResourceAwareScheduler.
> With default cluster settings, there will be only one __acker-executor, and it 
> will be on a separate worker. It looks like the __acker-executor was not able 
> to receive messages from the spouts and bolts, and the spouts and bolts 
> continued to retry sending messages to the acker. That then led to another 
> problem: STORM-2970
> I ran Storm from right before [https://github.com/apache/storm/pull/2502] and 
> right after it, and confirmed that this bug is related to that change.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (STORM-3020) Fix race condition in async localizer

2018-04-03 Thread Robert Joseph Evans (JIRA)
Robert Joseph Evans created STORM-3020:
--

 Summary: Fix race condition in async localizer
 Key: STORM-3020
 URL: https://issues.apache.org/jira/browse/STORM-3020
 Project: Apache Storm
  Issue Type: Bug
  Components: storm-server
Affects Versions: 2.0.0
Reporter: Robert Joseph Evans
Assignee: Robert Joseph Evans


I think this impacts all of the code that uses AsyncLocalizer, but I need to 
check to be sure.

As part of a review of a different pull request against AsyncLocalizer, I 
noticed that requestDownloadTopologyBlobs is synchronized, but everything it 
does is async, and there is a race in one of the async pieces where we read 
from a map and then try to update the map later, all outside of a lock.
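
A minimal sketch of that read-then-update race and the usual atomic fix (the 
class and field names are hypothetical, not the actual AsyncLocalizer code):

{code:java}
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;

class DownloadCache {
    private final ConcurrentHashMap<String, CompletableFuture<Void>> pending =
            new ConcurrentHashMap<>();

    // Racy: two threads can both see null and both start the download, and
    // the second put() silently discards the first future.
    CompletableFuture<Void> racyGetOrStart(String key) {
        CompletableFuture<Void> f = pending.get(key);  // read from the map...
        if (f == null) {
            f = startDownload(key);
            pending.put(key, f);                       // ...update it later, unlocked
        }
        return f;
    }

    // Safe: the check and the insert happen atomically.
    CompletableFuture<Void> safeGetOrStart(String key) {
        return pending.computeIfAbsent(key, this::startDownload);
    }

    private CompletableFuture<Void> startDownload(String key) {
        return CompletableFuture.runAsync(() -> { /* fetch the blob for key */ });
    }
}
{code}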



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (STORM-3019) StormReporter doesn't have information on where it's running

2018-04-03 Thread Simon Cooper (JIRA)
Simon Cooper created STORM-3019:
---

 Summary: StormReporter doesn't have information on where it's 
running
 Key: STORM-3019
 URL: https://issues.apache.org/jira/browse/STORM-3019
 Project: Apache Storm
  Issue Type: Improvement
Affects Versions: 1.2.0
Reporter: Simon Cooper


Metrics2 StormReporter implementations don't have much information on where 
they're running. In particular, they are missing:
 * Whether they are running for nimbus, a supervisor, or a worker, and which 
worker that is.
 * The full deployed config - they are only given the basic topology 
configuration, not the full effective configuration as specified at topology 
deployment.
 * A TopologyContext object.
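
Purely as an illustration of what is being requested, a hypothetical reporter 
hook carrying that information (none of this is existing Storm API; the 
interface and DaemonType enum are invented for the sketch):

{code:java}
import java.util.Map;

import com.codahale.metrics.MetricRegistry;
import org.apache.storm.task.TopologyContext;

// Hypothetical: sketches the information the ticket says reporters lack.
interface ContextAwareReporter {
    enum DaemonType { NIMBUS, SUPERVISOR, WORKER }

    void prepare(MetricRegistry registry,
                 Map<String, Object> effectiveTopoConf, // full effective config
                 DaemonType daemon,                     // nimbus / supervisor / worker
                 String workerId,                       // which worker, when applicable
                 TopologyContext context);              // null outside workers
}
{code}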



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)