[jira] [Commented] (STORM-3685) Detect Loops in Topology at Submit

2021-10-06 Thread Roshan Naik (Jira)


[ 
https://issues.apache.org/jira/browse/STORM-3685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17425334#comment-17425334
 ] 

Roshan Naik commented on STORM-3685:


[~bipinprasad] & [~ethanli]

A very, very late +1 on this approach of warning about, rather than banning, cycles.

Sometimes there is a need for cycles if downstream operators need to report 
progress or send metadata to upstream operators. This can be used for flow 
control where sources can be suspended/resumed based on progress notifications 
from downstream operators. For example:
 * For (event-time based) windowed joins, it can be useful to synchronize the streams 
involved around event-time boundaries. Allowing both streams to progress 
without any flow control can cause one stream to move far ahead of the other, 
which leads to windows staying open for too long... and thus to excessive 
memory consumption.
 * The [Kappa+ Architecture|https://www.youtube.com/watch?v=4qSlsYogALo] uses 
flow control on sources to enable use cases like efficient backfills in 
native streaming-mode execution.
 * As you know, technically, any topology with ACKing enabled has a cycle. And 
{{topology.max.spout.pending}} is a form of flow control.
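
For the record, the submit-time check itself is simple graph theory. Below is a minimal sketch of a DFS-based detector that warns rather than rejects, written over hypothetical component names and plain adjacency lists (not Storm's actual StormTopology API, which would be walked via each component's declared inputs):

```python
# Sketch: warn (not error) when a topology graph contains a cycle.
# Input is a plain adjacency list: component -> list of downstream components.
def find_cycle(graph):
    """Iterative DFS with white/grey/black coloring; returns one cycle
    as a list of components, or None if the graph is a DAG."""
    WHITE, GREY, BLACK = 0, 1, 2
    color = {node: WHITE for node in graph}
    parent = {}

    for start in graph:
        if color[start] != WHITE:
            continue
        color[start] = GREY
        stack = [(start, iter(graph.get(start, ())))]
        while stack:
            node, children = stack[-1]
            child = next(children, None)
            if child is None:
                color[node] = BLACK          # fully explored
                stack.pop()
            elif color.get(child, WHITE) == WHITE:
                color[child] = GREY
                parent[child] = node
                stack.append((child, iter(graph.get(child, ()))))
            elif color[child] == GREY:       # back edge => cycle found
                cycle, cur = [child], node
                while cur != child:
                    cycle.append(cur)
                    cur = parent[cur]
                cycle.append(child)
                return cycle[::-1]
    return None

# Hypothetical topology with a bolt2 -> bolt1 feedback (progress-report) edge:
topo = {"spout": ["bolt1"], "bolt1": ["bolt2"], "bolt2": ["bolt1"]}
cycle = find_cycle(topo)
if cycle:
    print("WARN: topology contains a cycle: " + " -> ".join(cycle))
```

Running this on the feedback-edge topology prints a warning naming the cycle, which is roughly what warn-but-allow submission validation amounts to.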

> Detect Loops in Topology at Submit
> --
>
> Key: STORM-3685
> URL: https://issues.apache.org/jira/browse/STORM-3685
> Project: Apache Storm
>  Issue Type: Improvement
>  Components: storm-client
>Reporter: Bipin Prasad
>Assignee: Bipin Prasad
>Priority: Major
> Fix For: 2.3.0
>
>  Time Spent: 7h 40m
>  Remaining Estimate: 0h
>
> Topology graph is expected to be a Directed Acyclic Graph (DAG). Detect 
> cycles in DAG when Topology is submitted and prevent cycles in Topology.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (STORM-3314) Acker redesign

2019-01-20 Thread Roshan Naik (JIRA)


[ 
https://issues.apache.org/jira/browse/STORM-3314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16747458#comment-16747458
 ] 

Roshan Naik commented on STORM-3314:


 

*Summary of the above two approaches:*

+First approach+ tries to eliminate the ACKer bolt by employing the existing spouts 
and bolts. It doesn't try to mitigate interworker ACK msgs. Also, ACK msgs can 
get stuck behind regular msgs under backpressure (BP).

+Second approach+ is centered around mitigating interworker ACK msging. It retains 
the need to tweak the ACKer count separately to handle more or fewer spouts and 
bolts... most likely a fixed number of them in each worker.

Both eliminate the need for fields grouping of ACKs. Both need more thought on 
timeout resets.


*Some data points:*
 * *Multiworker Speed:* In Storm 2, two components on +different workers+ can 
sustain a max throughput of ~*4 mill/sec* over the interworker msg-ing (w/o 
ACKing).
 * *ACKer bolt* speed (single worker mode):
 ** An ACKer bolt can sustain a max topo throughput of only ~*1.7 mill/sec* in 
*single worker* mode... when handling an ultra bare minimal topo of 1 spout and 
1 bolt.
 ** ACKer slows down to ~*950k/sec* when one more bolt is added to the 
mix (i.e. ConstSpoutIdBoltNullBoltTopo)
 * *ACKing + Multi worker:* For 2 workers (1 spout & 1 bolt), ACK mode 
throughput is at ~*900k/s.*

Ideally, the ACKing path within a single worker should be faster than interworker 
msging, given the absence of de/serialization and the network stack. Clearly both 
the ACKer bolt itself and the interworker msging are big factors... with the 
ACKer bolt being the more significant one. Mitigating only interworker msging 
will likely leave a lot of potential performance on the table.

Perhaps there is a hybrid approach that combines the benefits of both approaches... 
reduce interworker ACK msging and eliminate the ACKer bolt.


*Qs about [~kabhwan]'s approach:*
 # If I understand correctly, in single worker mode there will be no 
difference. And in multiworker mode, when the entire tuple tree falls within 
the same worker, the perf will be the same as single worker mode?
 # What is the method to compute "fully processed" after the tuple tree is split 
into per-worker fractions? Sounds like ACKers will msg each other about 
completion of their partial trees... and this will boil up to the root ACKer.

 

*Additional Thoughts:*

There appear to be two properties (which I think are true) that remain 
unexploited in the two approaches and a potential hybrid approach:

- Within the worker process, the msging subsystem guarantees *delivery* and 
there can be no msg loss.
- Message loss happens only when a worker goes down (or gets disconnected from 
the rest of the topo)

 

 

> Acker redesign
> --
>
> Key: STORM-3314
> URL: https://issues.apache.org/jira/browse/STORM-3314
> Project: Apache Storm
>  Issue Type: Improvement
>  Components: storm-client
>Reporter: Roshan Naik
>Priority: Major
>
> *Context:* The ACKing mechanism has come focus as one of the next major 
> bottlenecks to address. The strategy to timeout and replay tuples has issues 
> discussed in STORM-2359
> *Basic idea:* Every bolt will send an ACK msg to its upstream spout/bolt once 
> the tuples it emitted have been *fully processed* by downstream bolts.
> *Determining "fully processed”* : For every incoming (parent) tuple, a bolt 
> can emit 0 or more “child” tuples. the Parent tuple is considered fully 
> processed once a bolt receives ACKs for all the child emits (if any). This 
> basic idea cascades all the way back up to the spout that emitted the root of 
> the tuple tree.
> This means that, when a bolt is finished with all the child emits and it 
> calls ack() no ACK message will be generated (unless there were 0 child 
> emits). The ack() marks the completion of all child emits for a parent tuple. 
> The bolt will emit an ACK to its upstream component once all the ACKs from 
> downstream components have been received.
> *Operational changes:* The existing spouts and bolts don’t need any change. 
> The bolt executor will need to process incoming acks from downstream bolts 
> and send an ACK to its upstream component as needed. In the case of 0 child 
> emits, ack() itself could immediately send the ACK to the upstream component. 
> Field grouping is not applied to ACK messages.
> Total ACK messages: The spout output collector will no longer send an 
> ACK-init message to the ACKer bolt. Other than this, the total number of 
> emitted ACK messages does not change. Instead of the ACKs going to an ACKer 
> bolt, they get spread out among the existing bolts. It appears that this mode 
> may reduce some of the inter-worker traffic of ACK messages.
> *Memory use:* If we use the existing XOR logic from ACKer bolt, we need about 
> 20 bytes per outstanding tuple-tree at each bolt. Assuming an 


[jira] [Commented] (STORM-2359) Revising Message Timeouts

2019-01-17 Thread Roshan Naik (JIRA)


[ 
https://issues.apache.org/jira/browse/STORM-2359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16745845#comment-16745845
 ] 

Roshan Naik commented on STORM-2359:


Yes indeed. It would be quite nice if JCTools queues could support peek()ing 
while still being lock-free.

> Revising Message Timeouts
> -
>
> Key: STORM-2359
> URL: https://issues.apache.org/jira/browse/STORM-2359
> Project: Apache Storm
>  Issue Type: Sub-task
>  Components: storm-core
>Affects Versions: 2.0.0
>Reporter: Roshan Naik
>Assignee: Stig Rohde Døssing
>Priority: Major
> Attachments: STORM-2359-with-auto-reset.ods, STORM-2359.ods
>
>
> A revised strategy for message timeouts is proposed here.
> Design Doc:
>  
> https://docs.google.com/document/d/1am1kO7Wmf17U_Vz5_uyBB2OuSsc4TZQWRvbRhX52n5w/edit?usp=sharing



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (STORM-3314) Acker redesign

2019-01-17 Thread Roshan Naik (JIRA)


 [ 
https://issues.apache.org/jira/browse/STORM-3314?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Roshan Naik updated STORM-3314:
---
Description: 
*Context:* The ACKing mechanism has come into focus as one of the next major 
bottlenecks to address. The strategy to timeout and replay tuples has issues, 
discussed in STORM-2359.

*Basic idea:* Every bolt will send an ACK msg to its upstream spout/bolt once 
the tuples it emitted have been *fully processed* by downstream bolts.

*Determining "fully processed":* For every incoming (parent) tuple, a bolt can 
emit 0 or more "child" tuples. The parent tuple is considered fully processed 
once the bolt receives ACKs for all the child emits (if any). This basic idea 
cascades all the way back up to the spout that emitted the root of the tuple 
tree.

This means that when a bolt is finished with all the child emits and calls 
ack(), no ACK message will be generated (unless there were 0 child emits). The 
ack() marks the completion of all child emits for a parent tuple. The bolt will 
emit an ACK to its upstream component once all the ACKs from downstream 
components have been received.
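
Sketched as code, the per-bolt bookkeeping this implies might look like the following. All names here are illustrative, not Storm APIs; in the proposal this logic would live inside the bolt executor, invisible to user code:

```python
# Sketch of the proposed per-bolt "fully processed" tracking: for each
# incoming parent tuple, count outstanding child emits; once ack() has
# been called AND the count drops to 0, send one ACK upstream.
class PendingTree:
    def __init__(self, send_upstream_ack):
        self.send_upstream_ack = send_upstream_ack  # callback: parent_id -> None
        self.pending = {}   # parent_id -> outstanding child-emit count
        self.acked = set()  # parents for which user code already called ack()

    def on_emit(self, parent_id):
        """Bolt emitted a child tuple anchored on parent_id."""
        self.pending[parent_id] = self.pending.get(parent_id, 0) + 1

    def on_ack_called(self, parent_id):
        """User code called ack(parent): it is done emitting children."""
        self.acked.add(parent_id)
        self._maybe_complete(parent_id)

    def on_downstream_ack(self, parent_id):
        """A downstream bolt reported one child of parent_id fully processed."""
        self.pending[parent_id] -= 1
        self._maybe_complete(parent_id)

    def _maybe_complete(self, parent_id):
        if parent_id in self.acked and self.pending.get(parent_id, 0) == 0:
            self.acked.discard(parent_id)
            self.pending.pop(parent_id, None)
            self.send_upstream_ack(parent_id)  # cascades toward the spout

upstream = []
tree = PendingTree(upstream.append)
tree.on_emit("t1"); tree.on_emit("t1")   # two child emits for tuple t1
tree.on_ack_called("t1")                 # no upstream ACK yet: children outstanding
tree.on_downstream_ack("t1")
tree.on_downstream_ack("t1")             # both children ACKed -> one upstream ACK
```

The zero-child case falls out naturally: calling ack() with no outstanding emits sends the upstream ACK immediately, matching the behavior described above.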

*Operational changes:* The existing spouts and bolts don’t need any change. The 
bolt executor will need to process incoming acks from downstream bolts and send 
an ACK to its upstream component as needed. In the case of 0 child emits, ack() 
itself could immediately send the ACK to the upstream component. Field grouping 
is not applied to ACK messages.

Total ACK messages: The spout output collector will no longer send an ACK-init 
message to the ACKer bolt. Other than this, the total number of emitted ACK 
messages does not change. Instead of the ACKs going to an ACKer bolt, they get 
spread out among the existing bolts. It appears that this mode may reduce some 
of the inter-worker traffic of ACK messages.

*Memory use:* If we use the existing XOR logic from ACKer bolt, we need about 
20 bytes per outstanding tuple-tree at each bolt. Assuming an average of say 
50k outstanding tuples at each level, we have 50k*20bytes = 1MB per bolt 
instance. There may be room to do something better than XOR, since we only need 
to track one level of outstanding emits at each bolt.
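
The XOR logic referenced here can be sketched as follows (illustrative Python, not Storm's actual ACKer code): each outstanding parent tuple keeps a single 64-bit accumulator; every child-tuple id is XORed in once at emit time and once on ack, so the accumulator returns to 0 exactly when all children have been ACKed, since the ids cancel pairwise.

```python
import secrets

# Sketch of ACKer-style XOR tracking, applied per bolt rather than in a
# central ACKer. One 64-bit accumulator per outstanding parent tuple.
class XorTracker:
    def __init__(self):
        self.outstanding = {}  # parent_id -> 64-bit XOR accumulator

    def on_emit(self, parent_id):
        """Bolt emits a child anchored on parent_id; returns the child's id."""
        child_id = secrets.randbits(64)
        self.outstanding[parent_id] = self.outstanding.get(parent_id, 0) ^ child_id
        return child_id  # the id travels with the child tuple

    def on_ack(self, parent_id, child_id):
        """Downstream ACKed one child; True once this level is complete."""
        self.outstanding[parent_id] ^= child_id
        if self.outstanding[parent_id] == 0:  # all ids cancelled pairwise
            del self.outstanding[parent_id]   # frees the ~20 bytes
            return True
        return False

tracker = XorTracker()
c1 = tracker.on_emit("p1")
c2 = tracker.on_emit("p1")
done_after_first = tracker.on_ack("p1", c1)   # False: one child outstanding
done_after_second = tracker.on_ack("p1", c2)  # True: accumulator back to 0
```

At roughly 8 bytes of key plus 8 bytes of value plus map overhead per entry, this lines up with the ~20 bytes per outstanding tuple-tree estimate; 50k entries is on the order of 1 MB, as above. (Like Storm's ACKer, this accepts a negligible 2^-64 chance of a spurious zero.)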

*Replay:* [needs more thinking] One option is to send REPLAY or TIMEOUT msgs 
upstream. Policy of when to emit them needs more thought. Good to avoid 
Timeouts/replays of inflight tuples under backpressure since this will lead to 
"event tsunami" at the worst possible time. Ideally, if possible, replay should 
be avoided unless tuples have been dropped. Would be nice to avoid sending 
TIMEOUT_RESET msgs upstream when under backpressure ... since they are likely 
to face backpressure as well.

On receiving ACKs or REPLAYs from downstream components, a bolt clears 
the corresponding 20 bytes of tracking info.

 

*Concerns:* ACK tuples traversing upstream means it takes longer to get back to 
Spout.

 

 

*Related Note:* +Why is the ACKer slow?+

Although lightweight internally, the ACKer bolt has a huge impact on throughput. 
The latency hit does not appear to be as significant.

I have some thoughts on why the ACKer slows down throughput.

Consider the following simple topo:

{{Spout ==> Bolt1 ==> Bolt2}}

If we add an ACKer to the above topo, the ACKer bolt receives 3x as many incoming 
messages as Bolt1 & Bolt2. That's instantly a 3x hit on its throughput.
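
The 3x figure follows from a quick per-root-tuple count (a rough model that ignores batching and system tuples): the ACKer sees one ACK-init from the spout plus one ack from each bolt, while each bolt sees only one data tuple.

```python
# Rough per-root-tuple message count for Spout ==> Bolt1 ==> Bolt2 with
# an ACKer bolt: ACK-init from the spout plus one ack per bolt, vs. one
# data tuple per bolt.
def messages_per_root_tuple(num_bolts_in_chain):
    acker_in = 1 + num_bolts_in_chain   # ACK-init + one ack per bolt
    per_bolt_in = 1                     # one data tuple each
    return acker_in, per_bolt_in

acker, bolt = messages_per_root_tuple(2)
print(acker, bolt)  # the ACKer handles 3 msgs for every 1 a bolt handles
```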

[jira] [Updated] (STORM-3314) Acker redesign

2019-01-17 Thread Roshan Naik (JIRA)


 [ 
https://issues.apache.org/jira/browse/STORM-3314?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Roshan Naik updated STORM-3314:
---
Description: 
*Context:* The ACKing mechanism has come into focus as one of the next major 
bottlenecks to address. The strategy to timeout and replay tuples has issues, 
discussed in STORM-2359.

*Basic idea:* Every bolt will send an ACK msg to its upstream spout/bolt once 
the tuples it emitted have been *fully processed* by downstream bolts.

*Determining "fully processed":* For every incoming (parent) tuple, a bolt can 
emit 0 or more "child" tuples. The parent tuple is considered fully processed 
once the bolt receives ACKs for all the child emits (if any). This basic idea 
cascades all the way back up to the spout that emitted the root of the tuple 
tree.

This means that when a bolt is finished with all the child emits and calls 
ack(), no ACK message will be generated (unless there were 0 child emits). The 
ack() marks the completion of all child emits for a parent tuple. The bolt will 
emit an ACK to its upstream component once all the ACKs from downstream 
components have been received.

*Operational changes:* The existing spouts and bolts don’t need any change. The 
bolt executor will need to process incoming acks from downstream bolts and send 
an ACK to its upstream component as needed. In the case of 0 child emits, ack() 
itself could immediately send the ACK to the upstream component. Field grouping 
is not applied to ACK messages.

Total ACK messages: The spout output collector will no longer send an ACK-init 
message to the ACKer bolt. Other than this, the total number of emitted ACK 
messages does not change. Instead of the ACKs going to an ACKer bolt, they get 
spread out among the existing bolts. It appears that this mode may reduce some 
of the inter-worker traffic of ACK messages.

*Memory use:* If we use the existing XOR logic from ACKer bolt, we need about 
20 bytes per outstanding tuple-tree at each bolt. Assuming an average of say 
50k outstanding tuples at each level, we have 50k*20bytes = 1MB per bolt 
instance. There may be room to do something better than XOR, since we only need 
to track one level of outstanding emits at each bolt.

*Replay:* [needs more thinking] One option is to send REPLAY or TIMEOUT msgs 
upstream. Policy of when to emit them needs more thought. Good to avoid 
Timeouts/replays of inflight tuples under backpressure since this will lead to 
"event tsunami" at the worst possible time. Ideally, if possible, replay should 
be avoided unless tuples have been dropped. Would be nice to avoid sending 
TIMEOUT_RESET msgs upstream when under backpressure ... since they are likely 
to face backpressure as well.

On receiving ACKs or REPLAYs from downstream components, a bolt clears 
the corresponding 20 bytes of tracking info.

 

*Concerns:* ACK tuples traversing upstream means it takes longer to get back to 
Spout.

 

 

*Related Note:* +Why is the ACKer slow?+

Although lightweight internally, the ACKer bolt has a huge impact on throughput. 
The latency hit does not appear to be as significant.

I have some thoughts on why the ACKer slows down throughput.

Consider the following simple topo:

{{Spout ==> Bolt1 ==> Bolt2}}

If we add an ACKer to the above topo, the ACKer bolt receives 3x as many incoming 
messages as Bolt1 & Bolt2. That's instantly a 3x hit on the ACKer's throughput. 
Additionally, each spout and bolt now emits 2 msgs instead of 1 (compared to when 
the ACKer is absent). This slows down the spouts and bolts.


[jira] [Updated] (STORM-3314) Acker redesign

2019-01-17 Thread Roshan Naik (JIRA)


 [ 
https://issues.apache.org/jira/browse/STORM-3314?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Roshan Naik updated STORM-3314:
---
Description: 
*Context:* The ACKing mechanism has come focus as one of the next major 
bottlenecks to address. The strategy to timeout and replay tuples has issues 
discussed in STORM-2359

*Basic idea:* Every bolt will send an ACK msg to its upstream spout/bolt once 
the tuples it emitted have been *fully processed* by downstream bolts.

*Determining "fully processed”* : For every incoming (parent) tuple, a bolt can 
emit 0 or more “child” tuples. the Parent tuple is considered fully processed 
once a bolt receives ACKs for all the child emits (if any). This basic idea 
cascades all the way back up to the spout that emitted the root of the tuple 
tree.

This means that, when a bolt is finished with all the child emits and it calls 
ack() no ACK message will be generated (unless there were 0 child emits). The 
ack() marks the completion of all child emits for a parent tuple. The bolt will 
emit an ACK to its upstream component once all the ACKs from downstream 
components have been received.

*Operational changes:* The existing spouts and bolts don’t need any change. The 
bolt executor will need to process incoming acks from downstream bolts and send 
an ACK to its upstream component as needed. In the case of 0 child emits, ack() 
itself could immediately send the ACK to the upstream component. Field grouping 
is not applied to ACK messages.

Total ACK messages: The spout output collector will no longer send an ACK-init 
message to the ACKer bolt. Other than this, the total number of emitted ACK 
messages does not change. Instead of the ACKs going to an ACKer bolt, they get 
spread out among the existing bolts. It appears that this mode may reduce some 
of the inter-worker traffic of ACK messages.

*Memory use:* If we use the existing XOR logic from ACKer bolt, we need about 
20 bytes per outstanding tuple-tree at each bolt. Assuming an average of say 
50k outstanding tuples at each level, we have 50k*20bytes = 1MB per bolt 
instance. There may be room to do something better than XOR, since we only need 
to track one level of outstanding emits at each bolt.

*Replay:* [needs more thinking] One option is to send REPLAY or TIMEOUT msgs 
upstream. Policy of when to emit them needs more thought. Good to avoid 
Timeouts/replays of inflight tuples under backpressure since this will lead to 
"event tsunami" at the worst possible time. Ideally, if possible, replay should 
be avoided unless tuples have been dropped. Would be nice to avoid sending 
TIMEOUT_RESET msgs upstream when under backpressure ... since they are likely 
to face backpressure as well.

On receiving an ACKs or REPLAYs from downstream components, a bolt needs to 
clears the corresponding 20 bytes tracking info.

 

*Concerns:* ACK tuples traversing upstream means it takes longer to get back to 
Spout.

 

 

*Related Note:* Why ACKer is slow ?:

Although lightweigh internally, the ACKer bolt has a huge impact on throughput. 
The latency hit does not appear to be as significant.

I have some thoughts around why ACKer slows down throughput. 

Consider the foll simple topo: 

{{Spout ==> Bolt1 ==> Bolt2}}

If we add an ACKer to the above topo, the Acker bolt receives 3x more incoming 
messages than the Bolt1 & Bolt2. Thats instantly a 3x hit on its throughput.

  was:
*Context:* The ACKing mechanism has come focus as one of the next major 
bottlenecks to address. The strategy to timeout and replay tuples has issues 
discussed in STORM-2359

*Basic idea:* Every bolt will send an ACK msg to its upstream spout/bolt once 
the tuples it emitted have been *fully processed* by downstream bolts.

*Determining "fully processed”* : For every incoming (parent) tuple, a bolt can 
emit 0 or more “child” tuples. the Parent tuple is considered fully processed 
once a bolt receives ACKs for all the child emits (if any). This basic idea 
cascades all the way back up to the spout that emitted the root of the tuple 
tree.

This means that, when a bolt is finished with all the child emits and it calls 
ack() no ACK message will be generated (unless there were 0 child emits). The 
ack() marks the completion of all child emits for a parent tuple. The bolt will 
emit an ACK to its upstream component once all the ACKs from downstream 
components have been received.

*Operational changes:* The existing spouts and bolts don’t need any change. The 
bolt executor will need to process incoming acks from downstream bolts and send 
an ACK to its upstream component as needed. In the case of 0 child emits, ack() 
itself could immediately send the ACK to the upstream component. Field grouping 
is not applied to ACK messages.

Total ACK messages: The spout output collector will no longer send an ACK-init 
message to the ACKer bolt. Other than this, the total number of emitted ACK 
messages 

[jira] [Updated] (STORM-3314) Acker redesign

2019-01-17 Thread Roshan Naik (JIRA)


 [ 
https://issues.apache.org/jira/browse/STORM-3314?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Roshan Naik updated STORM-3314:
---
Description: 
*Context:* The ACKing mechanism has come focus as one of the next major 
bottlenecks to address. The strategy to timeout and replay tuples has issues 
discussed in STORM-2359

*Basic idea:* Every bolt will send an ACK msg to its upstream spout/bolt once 
the tuples it emitted have been *fully processed* by downstream bolts.

*Determining "fully processed”* : For every incoming (parent) tuple, a bolt can 
emit 0 or more “child” tuples. the Parent tuple is considered fully processed 
once a bolt receives ACKs for all the child emits (if any). This basic idea 
cascades all the way back up to the spout that emitted the root of the tuple 
tree.

This means that, when a bolt is finished with all the child emits and it calls 
ack() no ACK message will be generated (unless there were 0 child emits). The 
ack() marks the completion of all child emits for a parent tuple. The bolt will 
emit an ACK to its upstream component once all the ACKs from downstream 
components have been received.

*Operational changes:* The existing spouts and bolts don’t need any change. The 
bolt executor will need to process incoming acks from downstream bolts and send 
an ACK to its upstream component as needed. In the case of 0 child emits, ack() 
itself could immediately send the ACK to the upstream component. Field grouping 
is not applied to ACK messages.

Total ACK messages: The spout output collector will no longer send an ACK-init 
message to the ACKer bolt. Other than this, the total number of emitted ACK 
messages does not change. Instead of the ACKs going to an ACKer bolt, they get 
spread out among the existing bolts. It appears that this mode may reduce some 
of the inter-worker traffic of ACK messages.

*Memory use:* If we use the existing XOR logic from ACKer bolt, we need about 
20 bytes per outstanding tuple-tree at each bolt. Assuming an average of say 
50k outstanding tuples at each level, we have 50k*20bytes = 1MB per bolt 
instance. There may be room to do something better than XOR, since we only need 
to track one level of outstanding emits at each bolt.

*Replay:* [needs more thinking] One option is to send REPLAY or TIMEOUT msgs 
upstream. Policy of when to emit them needs more thought. Good to avoid 
Timeouts/replays of inflight tuples under backpressure since this will lead to 
"event tsunami" at the worst possible time. Ideally, if possible, replay should 
be avoided unless tuples have been dropped. Would be nice to avoid sending 
TIMEOUT_RESET msgs upstream when under backpressure ... since they are likely 
to face backpressure as well.

On receiving an ACKs or REPLAYs from downstream components, a bolt needs to 
clears the corresponding 20 bytes tracking info.

 

*Concerns:* ACK tuples traversing upstream means it takes longer to get back to 
Spout.

 

 

*Related Note:* Why ACKer is slow ?:

Although lightweigh internally, the ACKer bolt has a huge impact on throughput. 
The latency hit does not appear to be as significant.

I have some thoughts around why ACKer slows down throughput. 

Consider the foll simple topo: 

{{Spout > Bolt1 > Bolt2}}

If we add an ACKer to the above topo, the Acker bolt receives 3x more incoming 
messages than the Bolt1 & Bolt2. Thats instantly a 3x hit on its throughput.

  was:
*Context:* The ACKing mechanism has come focus as one of the next major 
bottlenecks to address. The strategy to timeout and replay tuples has issues 
discussed in STORM-2359


*Basic idea:* Every bolt will send an ACK msg to its upstream spout/bolt once 
the tuples it emitted have been *fully processed* by downstream bolts.


*Determining "fully processed"*: For every incoming (parent) tuple, a bolt can 
emit 0 or more "child" tuples. The parent tuple is considered fully processed 
once a bolt receives ACKs for all the child emits (if any). This basic idea 
cascades all the way back up to the spout that emitted the root of the tuple 
tree.


This means that, when a bolt is finished with all the child emits and calls 
ack(), no ACK message will be generated (unless there were 0 child emits). The 
ack() call marks the completion of all child emits for a parent tuple. The bolt 
will emit an ACK to its upstream component once all the ACKs from downstream 
components have been received.


*Operational changes:* The existing spouts and bolts don’t need any change. The 
bolt executor will need to process incoming acks from downstream bolts and send 
an ACK to its upstream component as needed. In the case of 0 child emits, ack() 
itself could immediately send the ACK to the upstream component. Field grouping 
is not applied to ACK messages.

Total ACK messages: The spout output collector will no longer send an ACK-init 
message to the ACKer bolt. Other than this, the total number of emitted ACK 

[jira] [Commented] (STORM-2359) Revising Message Timeouts

2019-01-17 Thread Roshan Naik (JIRA)


[ 
https://issues.apache.org/jira/browse/STORM-2359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16744817#comment-16744817
 ] 

Roshan Naik commented on STORM-2359:


[~Srdo] Fyi: Just posted the initial thoughts around Acker redesign in 
STORM-3314.

> Revising Message Timeouts
> -
>
> Key: STORM-2359
> URL: https://issues.apache.org/jira/browse/STORM-2359
> Project: Apache Storm
>  Issue Type: Sub-task
>  Components: storm-core
>Affects Versions: 2.0.0
>Reporter: Roshan Naik
>Assignee: Stig Rohde Døssing
>Priority: Major
> Attachments: STORM-2359-with-auto-reset.ods, STORM-2359.ods
>
>
> A revised strategy for message timeouts is proposed here.
> Design Doc:
>  
> https://docs.google.com/document/d/1am1kO7Wmf17U_Vz5_uyBB2OuSsc4TZQWRvbRhX52n5w/edit?usp=sharing



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (STORM-3314) Acker redesign

2019-01-17 Thread Roshan Naik (JIRA)


[ 
https://issues.apache.org/jira/browse/STORM-3314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16744813#comment-16744813
 ] 

Roshan Naik commented on STORM-3314:


_Quoting thoughts expressed by_ [~kabhwan] _in response to the above initial idea:_

 

*Essential idea:* Try to avoid sending ACK tuples to other workers, given that 
this incurs serde and network cost. If we could get rid of field grouping, that 
may be better.

Roshan brought up a great idea, which is in line with my essential idea, so there 
may be a trade-off between Roshan's and mine. My idea is inspired by Roshan's and 
modified a bit to adopt the nice aspects of his idea.

 

*Summary:* Build a partial tuple tree into the Acker in each worker, and let the 
Acker in each worker handle its own partial tuple tree. Note that we don't change 
the basic idea of the acking mechanism.

 

*Detail:*
 * When a tuple is emitted from the spout, it sends an ACK_INIT message to the 
Acker within the worker.
 ** Same as current.

 * The logic for emitting tuples in a Bolt doesn't change.
 ** We may want to include the acker ID in the tuple, to know where to send the 
acknowledgement.
 ** Otherwise, we could also compute (and cache) it on the bolt side (based on 
the source task ID) so as not to grow the size of each tuple.
 * When a worker receives tuples from outside the worker, it also sends an 
ACK_INIT message to the Acker within the worker.
 ** The Acker stores additional information (acker task ID of the source task's 
worker, tuple ID of the source tuple, anchors, etc.) to "send back the ACK to the 
source" when the partial tuple tree is completed.
 ** The information may differ a bit between referring to a spout and referring 
to another acker.
 * Whenever a tuple is acknowledged in a Bolt, an ACK message is sent to the acker 
within the worker.
 * When the acker finds that a tuple tree is completed, it sends back an ACK to 
the source acker based on the stored information.
 ** If the source is a Spout, just do the same as we do today.
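The per-worker acker state described above might be sketched as follows. This is a hypothetical illustration (names like `WorkerLocalAcker` and `Entry` are invented, not Storm code): alongside the usual XOR value for the worker-local subtree, each entry remembers where to send the ACK when that subtree completes, which is either the spout or the acker of the upstream worker.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of a worker-local acker holding partial tuple trees.
public class WorkerLocalAcker {
    static final class Entry {
        long xorVal;          // XOR of outstanding tuple ids inside this worker
        int upstreamTaskId;   // spout task, or the upstream worker's acker task
        long upstreamRootId;  // the tuple id the upstream side tracks
        Entry(int upstreamTaskId, long upstreamRootId) {
            this.upstreamTaskId = upstreamTaskId;
            this.upstreamRootId = upstreamRootId;
        }
    }

    private final Map<Long, Entry> trees = new HashMap<>();

    // ACK_INIT: from the local spout, or synthesized when a tuple enters the worker.
    public void ackInit(long localRootId, int upstreamTaskId, long upstreamRootId) {
        trees.put(localRootId, new Entry(upstreamTaskId, upstreamRootId));
    }

    // Local ACK arrived; returns the "send back" info once the local subtree
    // completes, else null.
    public Entry ack(long localRootId, long tupleXor) {
        Entry e = trees.get(localRootId);
        if (e == null) return null;
        e.xorVal ^= tupleXor;
        if (e.xorVal == 0) {        // local subtree fully acked
            trees.remove(localRootId);
            return e;               // caller sends ACK to e.upstreamTaskId
        }
        return null;
    }
}
```

The only difference between a spout source and another acker as the source is what `upstreamTaskId` points at, which matches the bullet about the stored information differing between the two cases.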

 

*Expected benefits:*
 * ACK tuples between hops within a worker will not be sent to other workers, 
hence avoiding the cost of serde and network transfer.
 * No field grouping.
 * It follows the path along which tuples flow, which could be optimized by the 
grouper implementation.

 

*Considerations:*
 * Each worker has to have its own acker task, which is a new requirement even 
though it is the default for now.
 ** We could (and should) still be able to turn off the acker when needed, but 
the acker configuration needs to become an on/off switch, not an acker task count.
 * The total count of ack tuples is not changed.
 ** Maybe we could avoid sending ACK tuples within the worker and just calculate 
and update what the acker is doing on the fly (in the executor thread) if that 
gives more benefit.
 * A bit more memory is needed, given that we maintain a tuple tree in each 
worker the tuple tree flows through (linear in the worker count), and we need 
additional information for sending the ACK back to another acker.
 * Need to consider how the FAIL tuple will work, and how the RESET_TIMEOUT tuple 
will work.
 ** These will be headaches to consider for both my idea and Roshan's.
 ** I'm still not sure how tuple timeout works with backpressure, even though we 
still need it to get rid of old entries in the rotating tree.

 

*Differences from Roshan's idea:*
 * We don't add ACK tuples to the source task's queue, hence keeping the bolt's 
task and the acker's task separate.
 ** Two perspectives: queue, (executor) thread.
 ** It will be less coupled with backpressure. IMHO, ACK tuples and metrics 
tuples should flow regardless of ongoing backpressure.
 ** I think keeping them separate could leave open possibilities to apply more 
optimizations to the acker, like having a separate channel for ACK tuples and 
metrics tuples.
 * It uses less memory when there are several tasks in a worker: this is exactly 
what the acker excels at.
 * Fewer ACK tuple hops when ACK tuples are flowing back, given that it only 
deals with ackers. (Yes, more hops when acking within a worker, though.)
 * We need to handle the rotating tree in only one place (the acker) per worker, 
compared to each task.
 * It is more complicated than Roshan's idea: Roshan's idea is clearly intuitive, 
and I can't imagine an easier solution.

 

 

> Acker redesign
> --
>
> Key: STORM-3314
> URL: https://issues.apache.org/jira/browse/STORM-3314
> Project: Apache Storm
>  Issue Type: Improvement
>  Components: storm-client
>Reporter: Roshan Naik
>Priority: Major
>
> *Context:* The ACKing mechanism has come into focus as one of the next major 
> bottlenecks to address. The strategy to timeout and replay tuples has issues 
> discussed in STORM-2359
> *Basic idea:* Every bolt will send an ACK msg to its upstream spout/bolt once 
> the tuples it emitted have been *fully processed* by downstream bolts.
> *Determining "fully processed"*: For every incoming (parent) tuple, a bolt 
> can emit 0 or more "child" tuples. The parent tuple is 

[jira] [Created] (STORM-3314) Acker redesign

2019-01-17 Thread Roshan Naik (JIRA)
Roshan Naik created STORM-3314:
--

 Summary: Acker redesign
 Key: STORM-3314
 URL: https://issues.apache.org/jira/browse/STORM-3314
 Project: Apache Storm
  Issue Type: Improvement
  Components: storm-client
Reporter: Roshan Naik


*Context:* The ACKing mechanism has come into focus as one of the next major 
bottlenecks to address. The strategy to timeout and replay tuples has issues, 
discussed in STORM-2359.


*Basic idea:* Every bolt will send an ACK msg to its upstream spout/bolt once 
the tuples it emitted have been *fully processed* by downstream bolts.


*Determining "fully processed"*: For every incoming (parent) tuple, a bolt can 
emit 0 or more "child" tuples. The parent tuple is considered fully processed 
once a bolt receives ACKs for all the child emits (if any). This basic idea 
cascades all the way back up to the spout that emitted the root of the tuple 
tree.


This means that, when a bolt is finished with all the child emits and calls 
ack(), no ACK message will be generated (unless there were 0 child emits). The 
ack() call marks the completion of all child emits for a parent tuple. The bolt 
will emit an ACK to its upstream component once all the ACKs from downstream 
components have been received.


*Operational changes:* The existing spouts and bolts don’t need any change. The 
bolt executor will need to process incoming acks from downstream bolts and send 
an ACK to its upstream component as needed. In the case of 0 child emits, ack() 
itself could immediately send the ACK to the upstream component. Field grouping 
is not applied to ACK messages.

Total ACK messages: The spout output collector will no longer send an ACK-init 
message to the ACKer bolt. Other than this, the total number of emitted ACK 
messages does not change. Instead of the ACKs going to an ACKer bolt, they get 
spread out among the existing bolts. It appears that this mode may reduce some 
of the inter-worker traffic of ACK messages.
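The bolt-side bookkeeping this proposal requires could look roughly like the sketch below. It is a hypothetical illustration (names are invented, and a plain counter is used for clarity where the proposal suggests reusing the XOR logic): each parent tuple tracks its outstanding child emits; calling ack() on a parent with zero children triggers the upstream ACK immediately, otherwise the upstream ACK goes out when the last downstream ACK arrives.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: a bolt ACKs upstream only once all child emits of a
// parent tuple have been ACKed by downstream components.
public class BoltAckTracker {
    // parentTupleId -> child emits still awaiting downstream ACKs
    private final Map<Long, Integer> outstanding = new HashMap<>();
    // parentTupleId -> has the bolt called ack() on this parent yet?
    private final Map<Long, Boolean> parentDone = new HashMap<>();

    public void onChildEmit(long parentId) {
        outstanding.merge(parentId, 1, Integer::sum);
    }

    // Bolt calls ack(parent); with zero child emits this ACKs upstream at once.
    public boolean onParentAck(long parentId) {
        parentDone.put(parentId, true);
        return maybeComplete(parentId);
    }

    // A downstream ACK arrived for one child emit of this parent.
    public boolean onChildAck(long parentId) {
        outstanding.merge(parentId, -1, Integer::sum);
        return maybeComplete(parentId);
    }

    // Upstream ACK is due once ack() was called and no child emits remain.
    private boolean maybeComplete(long parentId) {
        if (Boolean.TRUE.equals(parentDone.get(parentId))
                && outstanding.getOrDefault(parentId, 0) == 0) {
            parentDone.remove(parentId);
            outstanding.remove(parentId);
            return true; // send ACK to the upstream spout/bolt now
        }
        return false;
    }
}
```

This per-bolt state is what the memory-use estimate below refers to: the executor, not a central ACKer bolt, holds the tracking entry for each outstanding parent tuple.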


*Memory use:* If we use the existing XOR logic from the ACKer bolt, we need about 
20 bytes per outstanding tuple tree at each bolt. Assuming an average of, say, 
50k outstanding tuples at each level, we have 50k × 20 bytes = 1 MB per bolt 
instance. There may be room to do something better than XOR, since we only need 
to track one level of outstanding emits at each bolt.


*Replay:* [needs more thinking] One option is to send REPLAY or TIMEOUT msgs 
upstream. The policy of when to emit them needs more thought. It is good to avoid 
timeouts/replays of in-flight tuples under backpressure, since this will lead to 
an "event tsunami" at the worst possible time. Ideally, if possible, replay should 
be avoided unless tuples have actually been dropped. It would also be nice to avoid 
sending TIMEOUT_RESET msgs upstream when under backpressure, since those messages 
are likely to face backpressure as well.

On receiving an ACK or REPLAY from a downstream component, a bolt needs to 
clear the corresponding 20 bytes of tracking info.

 

*Concerns:* ACK tuples traversing upstream means it takes longer to get back to 
Spout.





[jira] [Commented] (STORM-2359) Revising Message Timeouts

2019-01-11 Thread Roshan Naik (JIRA)


[ 
https://issues.apache.org/jira/browse/STORM-2359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16740143#comment-16740143
 ] 

Roshan Naik commented on STORM-2359:


Got a chance to read through your posts more carefully again. Your scheme to 
keep the timeout computation in the ACKer, by having the acker send back (A-B) 
to the spout, seems sound.

I am unable to grasp the timeout reset strategy though. These two statements in 
particular: 

"For the inbound queue, the anchor is no longer in progress when the associated 
tuple is acked or failed. For the outbound queue (pendingEmits in the 
Executor), the anchor is no longer in progress when the associated tuple gets 
flushed from pendingEmits."

I am thinking: if a msg is in the inputQ or pendingEmitsQ, then it is *in 
progress* and it cannot also be ACKed. Overall it sounds like you want to track 
inflight tuples at each executor, which, if true, is a big memory overhead. 

On the implementation side, one key concern is that in the 2.0 perf 
re-architecture (STORM-2306) a good amount of effort was spent on tuning the 
critical path by cutting back on hashmaps and eliminating locks, both of 
which are perf killers. On a very quick scan of the code, I noticed the timeout 
strategy is introducing a ConcurrentHashMap into the critical path, which 
introduces both locks and hashmaps in one go. 

 

Btw, since you have reactivated this topic on ACKing, it's probably worth 
bringing up that [~kabhwan] and I were rethinking the ACKing design some time 
back. Since the ACKer bolt has a severe perf impact in Storm core, we were 
looking at a way to do acking without the ACKer bolt. As you can imagine, it 
would impact the timeouts as well. I can post the general idea in another Jira 
and see what you think: either continue down this path for now and later 
replace it with a newer model, or investigate a newer model directly.

> Revising Message Timeouts
> -
>
> Key: STORM-2359
> URL: https://issues.apache.org/jira/browse/STORM-2359
> Project: Apache Storm
>  Issue Type: Sub-task
>  Components: storm-core
>Affects Versions: 2.0.0
>Reporter: Roshan Naik
>Assignee: Stig Rohde Døssing
>Priority: Major
> Attachments: STORM-2359-with-auto-reset.ods, STORM-2359.ods
>
>
> A revised strategy for message timeouts is proposed here.
> Design Doc:
>  
> https://docs.google.com/document/d/1am1kO7Wmf17U_Vz5_uyBB2OuSsc4TZQWRvbRhX52n5w/edit?usp=sharing





[jira] [Commented] (STORM-2359) Revising Message Timeouts

2018-12-29 Thread Roshan Naik (JIRA)


[ 
https://issues.apache.org/jira/browse/STORM-2359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16730881#comment-16730881
 ] 

Roshan Naik commented on STORM-2359:


I see your point about having the timeout logic in the acker. It's a good one!

Sorry for the delay in my response. I am traveling, so I am unable to look into 
the code.

Could you elaborate on the contents of the messages being exchanged for computing 
A-B?

> Revising Message Timeouts
> -
>
> Key: STORM-2359
> URL: https://issues.apache.org/jira/browse/STORM-2359
> Project: Apache Storm
>  Issue Type: Sub-task
>  Components: storm-core
>Affects Versions: 2.0.0
>Reporter: Roshan Naik
>Assignee: Stig Rohde Døssing
>Priority: Major
> Attachments: STORM-2359.ods
>
>
> A revised strategy for message timeouts is proposed here.
> Design Doc:
>  
> https://docs.google.com/document/d/1am1kO7Wmf17U_Vz5_uyBB2OuSsc4TZQWRvbRhX52n5w/edit?usp=sharing





[jira] [Commented] (STORM-2359) Revising Message Timeouts

2018-12-20 Thread Roshan Naik (JIRA)


[ 
https://issues.apache.org/jira/browse/STORM-2359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16726414#comment-16726414
 ] 

Roshan Naik commented on STORM-2359:


{quote}If we were to move timeouts entirely to the acker, there would be a 
potential for "lost" tuples (ones that end up not being reemitted) if messages 
between the acker and spout get lost.
 * The spout may emit a new tuple, and try to notify the acker about it. If 
this message is lost, the acker doesn't know about the tuple, and thus can't 
time it out.
{quote}
Acker will be totally clueless if all acks from downstream bolts are also lost 
for the same tuple tree.

 
{quote}We can move the timeout logic to the acker,
{quote}
Alternatives:
1) Handle timeouts in the SpoutExecutor, which sends a timeout msg to the ACKer. 
*+Benefit+*: May not be any better than doing it in the ACKer. Do you have any 
thoughts about this?

2) Eliminate timeout communication between Spout & ACKer. Let each do its own 
timeout. *+Benefit+*: Eliminates the need for:
 * Periodic sync-up msgs (also avoids the possibility of latency spikes in 
multi-worker mode when these msgs get large).
 * The A - B calculation, as well as
 * timeout msg exchanges.

TimeoutReset msgs can still be supported.

 -roshan

 

 

> Revising Message Timeouts
> -
>
> Key: STORM-2359
> URL: https://issues.apache.org/jira/browse/STORM-2359
> Project: Apache Storm
>  Issue Type: Sub-task
>  Components: storm-core
>Affects Versions: 2.0.0
>Reporter: Roshan Naik
>Assignee: Stig Rohde Døssing
>Priority: Major
> Attachments: STORM-2359.ods
>
>
> A revised strategy for message timeouts is proposed here.
> Design Doc:
>  
> https://docs.google.com/document/d/1am1kO7Wmf17U_Vz5_uyBB2OuSsc4TZQWRvbRhX52n5w/edit?usp=sharing





[jira] [Comment Edited] (STORM-2359) Revising Message Timeouts

2018-12-20 Thread Roshan Naik (JIRA)


[ 
https://issues.apache.org/jira/browse/STORM-2359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16726414#comment-16726414
 ] 

Roshan Naik edited comment on STORM-2359 at 12/21/18 3:49 AM:
--

{quote}If we were to move timeouts entirely to the acker, there would be a 
potential for "lost" tuples (ones that end up not being reemitted) if messages 
between the acker and spout get lost.
 * The spout may emit a new tuple, and try to notify the acker about it. If 
this message is lost, the acker doesn't know about the tuple, and thus can't 
time it out.{quote}
Acker will be totally clueless if all acks from downstream bolts are also lost 
for the same tuple tree.

 
{quote}We can move the timeout logic to the acker,
{quote}
 

*Alternatives:*
 1) Handle timeouts in the SpoutExecutor, which sends a timeout msg to the ACKer. 
*+Benefit+*: May not be any better than doing it in the ACKer. Do you have any 
thoughts about this?

2) Eliminate timeout communication between Spout & ACKer. Let each do its own 
timeout. *+Benefit+*: Eliminates the need for:
 * Periodic sync-up msgs (also avoids the possibility of latency spikes in 
multi-worker mode when these msgs get large).
 * The A - B calculation, as well as
 * timeout msg exchanges.

TimeoutReset msgs can still be supported.

 


was (Author: roshan_naik):
{quote}If we were to move timeouts entirely to the acker, there would be a 
potential for "lost" tuples (ones that end up not being reemitted) if messages 
between the acker and spout get lost.
 * The spout may emit a new tuple, and try to notify the acker about it. If 
this message is lost, the acker doesn't know about the tuple, and thus can't 
time it out.
{quote}
Acker will be totally clueless if all acks from downstream bolts are also lost 
for the same tuple tree.

 
{quote}We can move the timeout logic to the acker,
{quote}
Alternatives :
1) Handle timeouts in SpoutExecutor. It sends a timeout msg to ACKer. 
*+Benefit+*: May not be any better than doing in ACKer. Do you have any 
thoughts about this ?

2)  Eliminate timeout communication between Spout & ACK. Let each does its own 
timeout. *+Benefit+*: Eliminates need for:
 * Periodic sync up msgs (also avoid possibility of latency spikes in multi 
worker mode when these msgs get large).
 * A - B calculation, as well as
 * timeout msg exchanges.

TimeoutReset msgs can still be supported.

 -roshan

 

 

> Revising Message Timeouts
> -
>
> Key: STORM-2359
> URL: https://issues.apache.org/jira/browse/STORM-2359
> Project: Apache Storm
>  Issue Type: Sub-task
>  Components: storm-core
>Affects Versions: 2.0.0
>Reporter: Roshan Naik
>Assignee: Stig Rohde Døssing
>Priority: Major
> Attachments: STORM-2359.ods
>
>
> A revised strategy for message timeouts is proposed here.
> Design Doc:
>  
> https://docs.google.com/document/d/1am1kO7Wmf17U_Vz5_uyBB2OuSsc4TZQWRvbRhX52n5w/edit?usp=sharing





[jira] [Updated] (STORM-2306) Redesign Messaging Subsystem, switch to JCTools Queues and introduce new Backpressure model

2018-09-25 Thread Roshan Naik (JIRA)


 [ 
https://issues.apache.org/jira/browse/STORM-2306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Roshan Naik updated STORM-2306:
---
Summary: Redesign Messaging Subsystem, switch to JCTools Queues and 
introduce new Backpressure model  (was: Redesign Messaging Subsystem and switch 
to JCTools Queues)

> Redesign Messaging Subsystem, switch to JCTools Queues and introduce new 
> Backpressure model
> ---
>
> Key: STORM-2306
> URL: https://issues.apache.org/jira/browse/STORM-2306
> Project: Apache Storm
>  Issue Type: Sub-task
>  Components: storm-core
>Affects Versions: 2.0.0
>Reporter: Roshan Naik
>Assignee: Roshan Naik
>Priority: Major
>  Labels: pull-request-available
> Fix For: 2.0.0
>
>  Time Spent: 61h 50m
>  Remaining Estimate: 0h
>
> Details in these documents:
> 1) *Redesign of the messaging subsystem*
> https://docs.google.com/document/d/1NK1DJ3aAkta-Im0m-2FObQ4cSRp8xSa301y6zoqcBeE/edit?usp=sharing
> This doc discusses the new design for the messaging system. Plus some of the 
> optimizations being made.
> 2) *Choosing a high performance messaging queue:*
> https://docs.google.com/document/d/1PpQaWVHg06-OqxTzYxQlzg1yEhzA4Y46_NC7HMO6tsI/edit?usp=sharing
> This doc looks into how fast hardware can do inter-thread messaging and why 
> we chose the JCTools queues.
> 3) *Backpressure Model*
> https://docs.google.com/document/d/1Z9pRdI5wtnK-hVwE3Spe6VGCTsz9g8TkgxbTFcbL3jM/edit?usp=sharing
> Describes the Backpressure model integrated into the new messaging subsystem.





[jira] [Updated] (STORM-3205) Optimization in TuplImpl

2018-09-14 Thread Roshan Naik (JIRA)


 [ 
https://issues.apache.org/jira/browse/STORM-3205?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Roshan Naik updated STORM-3205:
---
Fix Version/s: 2.0.0

> Optimization in TuplImpl
> 
>
> Key: STORM-3205
> URL: https://issues.apache.org/jira/browse/STORM-3205
> Project: Apache Storm
>  Issue Type: Improvement
>  Components: storm-client
>Affects Versions: 2.0.0
>Reporter: Roshan Naik
>Assignee: Roshan Naik
>Priority: Major
>  Labels: pull-request-available
> Fix For: 2.0.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Wrapping {{TuplImpl.values}} with Collections.unmodifiableList() turns out to be 
> very expensive. Its intention is obviously to check and prevent accidental 
> tweaking of TuplImpl once created. Given the high cost, if needed, we can limit 
> this extra checking mechanism to debug/dev mode. Being in the critical path, 
> it means several thousand/million additional allocations per second of the 
> List wrapper object, proportional to the number of bolt/spout instances.
> *TVL :*
> | |throughput (k/sec)|cores|mem (mb)|
> |*master (#f5a410ba3)*|412 |4.26|103|
> |*storm-3205*|547  (+33%)|4.09|132|
> {{+cmd:+ bin/storm jar topos/storm-loadgen-2.0.0-SNAPSHOT.jar 
> org.apache.storm.loadgen.ThroughputVsLatency *--rate 55* --spouts 1 
> --splitters 3 --counters 2 -c topology.acker.executors=0}}
> *+ConstSpoutIdentityBoltNullBolt :+*
> | |throughput|
> |*master*|4.25 mill/sec|
> |*storm-3205*|5.4 mill/sec (+27%)|
> +cmd+: {{bin/storm jar topos/storm-perf-2.0.0-SNAPSHOT.jar 
> org.apache.storm.perf.ConstSpoutIdBoltNullBoltTopo -c 
> topology.acker.executors=0 -c topology.producer.batch.size=1000 400}}
> *Note:* The perf gains are more evident when operating at high throughputs w/o 
> backpressure occurring (i.e. some bolts have not yet become a bottleneck)





[jira] [Commented] (STORM-3205) Optimization in TuplImpl

2018-09-14 Thread Roshan Naik (JIRA)


[ 
https://issues.apache.org/jira/browse/STORM-3205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16614579#comment-16614579
 ] 

Roshan Naik commented on STORM-3205:


*Raw TVL readings:*

+MASTER: reading 1+

{code}
 start_time(s)  end_time(s)  rate(tuple/s)    mean(ms)  99%ile(ms)  99.9%ile(ms)  cores  mem(MB)
             0           30    363,105.633     552.793     844.104       853.017  3.804  174.941
            30           60    447,122.200   2,142.178   5,695.865     5,804.917  4.550   51.488
            60           90    416,901.400   9,418.863  13,019.120    13,086.228  4.414  155.132
            90          120    420,867.467  16,801.866  20,099.105    20,149.436  4.352  100.409
           120          150    429,340.700  23,451.448  26,575.110    26,742.882  4.412  105.838
           150          180    427,491.233  30,118.689  33,386.660    33,436.991  4.417  145.061
           180          210    410,376.700  37,208.048  41,003.516    41,070.625  4.390  139.888
           210          240    420,193.767  44,735.209  48,083.501    48,184.164  4.392   97.299
           240          270    424,774.967  51,567.712  54,962.160    55,029.268  4.412  127.620
{code}

+MASTER: reading 2+

{code}
 start_time(s)  end_time(s)  rate(tuple/s)    mean(ms)  99%ile(ms)  99.9%ile(ms)  cores  mem(MB)
             0           30    352,283.467     666.673     988.807       996.147  3.833  131.411
            30           60    452,429.567   2,457.883   5,981.078     6,106.907  4.825  128.944
            60           90    389,890.233  10,536.345  14,738.784    14,856.225  4.246  145.399
            90          120    399,513.033  19,105.363  22,917.677    23,068.672  4.138  135.547
           120          150    412,216.733  26,820.408  30,450.647    30,601.642  4.195   55.722
           150          180    415,783.600  34,411.969  37,882.954    37,950.063  4.235   95.921
           180          210    428,794.100  41,234.477  44,493.177    44,560.286  4.383   71.357
           210          240    427,405.133  47,840.604  51,204.063    51,271.172  4.349  112.849
{code}

+STORM-3205: TVL reading 1+

{code}
 start_time(s)  end_time(s)  rate(tuple/s)    mean(ms)  99%ile(ms)  99.9%ile(ms)  cores  mem(MB)
             0           30    365,140.233     389.399     851.444       865.599  3.733  112.723
            30           60    551,828.200       2.096      58.917       109.838  4.593   68.210
            60           90    550,123.200       0.658       4.268         6.423  4.433  107.547
            90          120    550,244.433       0.567       1.696         6.160  4.146  113.571
           120          150    550,271.567       0.547       1.245         3.652  4.170  144.445
           150          180    550,279.300       1.077      18.792        25.756  4.229  112.253
           180          210    550,234.733       0.529       1.136         1.579  4.069  136.361
           210          240    550,241.433       0.530       1.150         1.587  4.087  160.657
{code}

+STORM-3205: TVL reading 2+

{code}
 start_time(s)  end_time(s)  rate(tuple/s)    mean(ms)  99%ile(ms)  99.9%ile(ms)  cores  mem(MB)
             0           30    366,992.967     280.818     569.377       575.144  3.693  155.890
            30           60    550,076.200       0.556       1.385         5.751  4.023  163.471
            60           91    532,387.032       0.555       1.333         5.321  3.958  142.208
            91          121    550,137.600       0.545       1.209         4.522  4.024  120.298
           121          151    550,156.633       0.569       1.647         5.382  4.210   87.505
           151          181    550,222.900       0.574       1.808         5.730  4.176  169.807
           181          211    550,226.400       0.535       1.202         1.917  4.133  154.749
           211          241    550,156.300       0.530       1.150         1.644  4.052  117.847
{code}


> Optimization in TuplImpl
> 

[jira] [Updated] (STORM-3205) Optimization in TuplImpl

2018-09-14 Thread Roshan Naik (JIRA)


 [ 
https://issues.apache.org/jira/browse/STORM-3205?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Roshan Naik updated STORM-3205:
---
Description: 
Wrapping {{TuplImpl.values}} with Collections.unmodifiableList() turns out to be 
very expensive. Its intention is obviously to check and prevent accidental 
tweaking of TuplImpl once created. Given the high cost, we can limit this extra 
checking mechanism to debug/dev mode. Because this allocation is in the critical 
path, it means several thousand/million additional allocations per second of the 
List wrapper object, proportional to the number of bolt/spout instances.

*TVL :*
| |throughput (k/sec)|cores|mem (mb)|
|*master (#f5a410ba3)*|412 |4.26|103|
|*storm-3205*|547  (+33%)|4.09|132|

{{+cmd:+ bin/storm jar topos/storm-loadgen-2.0.0-SNAPSHOT.jar 
org.apache.storm.loadgen.ThroughputVsLatency *--rate 55* --spouts 1 
--splitters 3 --counters 2 -c topology.acker.executors=0}}
{{  }}

*+ConstSpoutIdentityBoltNullBolt :+*
| |throughput|
|*master*|4.25 mill/sec|
|*storm-3205*|5.4 mill/sec (+27%)|

{{ }}
{{ +cmd+: bin/storm jar topos/storm-perf-2.0.0-SNAPSHOT.jar 
org.apache.storm.perf.ConstSpoutIdBoltNullBoltTopo -c 
topology.acker.executors=0 -c topology.producer.batch.size=1000 400}}
{{  }}

*Note:* The perf gains are more evident when operating at high throughputs w/o 
backpressure occurring (i.e. some bolts have not yet become a bottleneck)
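The fix described in the paragraph above could be sketched roughly as follows. This is a hypothetical illustration (the class and the system-property flag name are invented, not the actual STORM-3205 patch): the unmodifiable wrapper is allocated only when a debug flag is set, and the hot path returns the backing list directly.

```java
import java.util.Collections;
import java.util.List;

// Sketch: pay for the unmodifiable-list wrapper allocation only in debug/dev
// mode; on the hot path return the backing list directly.
public class TupleValues {
    // Hypothetical flag; the real patch may gate this differently.
    private static final boolean DEBUG = Boolean.getBoolean("storm.debug.immutable.tuples");

    private final List<Object> values;

    public TupleValues(List<Object> values) {
        this.values = values;
    }

    public List<Object> getValues() {
        // One wrapper object per call adds up to millions of allocations per
        // second across all spout/bolt instances; skip it unless enabled.
        return DEBUG ? Collections.unmodifiableList(values) : values;
    }
}
```

The trade-off is the one the description names: callers lose the accidental-mutation check in production, in exchange for removing a per-tuple allocation from the critical path.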

  was:
Wrapping {{TuplImpl.values}} with Collections.unmodifiableList() turns out be 
very expensive. Its intention is obviously to check and prevent accidental 
tweaking TuplImpl once created. Given the high cost we can limit this extra 
checking mechanism in debug/dev mode. Due to this allocation being in the 
critical path it means several thousand/million additional allocations per 
second of the List wrapper object  proportional to the number of bolt/spout 
instances.

*TVL :*
| |throughput (k/sec)|cores|mem (mb)|
|*master (#f5a410ba3)*|412 |4.26|103|
|*storm-3205*|547  (+33%)|4.09|132|

+cmd:+ bin/storm jar topos/storm-loadgen-2.0.0-SNAPSHOT.jar 
org.apache.storm.loadgen.ThroughputVsLatency *--rate 55* --spouts 1 
--splitters 3 --counters 2 -c topology.acker.executors=0
  

*+ConstSpoutIdentityBoltNullBolt :+*
| |throughput|
|*master*|4.25 mill/sec|
|*storm-3205*|5.4 mill/sec (+27%)|

 
 +cmd+: bin/storm jar topos/storm-perf-2.0.0-SNAPSHOT.jar 
org.apache.storm.perf.ConstSpoutIdBoltNullBoltTopo -c 
topology.acker.executors=0 -c topology.producer.batch.size=1000 400
  

Note: The perf gains are more evident when operating at high thoughputs w/o 
backpressure occurring (i.e. some bolts have not yet become a bottleneck)


> Optimization in TuplImpl
> 
>
> Key: STORM-3205
> URL: https://issues.apache.org/jira/browse/STORM-3205
> Project: Apache Storm
>  Issue Type: Improvement
>  Components: storm-client
>Affects Versions: 2.0.0
>Reporter: Roshan Naik
>Assignee: Roshan Naik
>Priority: Major
>
> Wrapping {{TuplImpl.values}} with Collections.unmodifiableList() turns out be 
> very expensive. Its intention is obviously to check and prevent accidental 
> tweaking TuplImpl once created. Given the high cost we can limit this extra 
> checking mechanism in debug/dev mode. Due to this allocation being in the 
> critical path it means several thousand/million additional allocations per 
> second of the List wrapper object  proportional to the number of 
> bolt/spout instances.
> *TVL :*
> | |throughput (k/sec)|cores|mem (mb)|
> |*master (#f5a410ba3)*|412 |4.26|103|
> |*storm-3205*|547  (+33%)|4.09|132|
> {{+cmd:+ bin/storm jar topos/storm-loadgen-2.0.0-SNAPSHOT.jar 
> org.apache.storm.loadgen.ThroughputVsLatency *--rate 55* --spouts 1 
> --splitters 3 --counters 2 -c topology.acker.executors=0}}
> {{  }}
> *+ConstSpoutIdentityBoltNullBolt :+*
> | |throughput|
> |*master*|4.25 mill/sec|
> |*storm-3205*|5.4 mill/sec (+27%)|
> {{ }}
> {{ +cmd+: bin/storm jar topos/storm-perf-2.0.0-SNAPSHOT.jar 
> org.apache.storm.perf.ConstSpoutIdBoltNullBoltTopo -c 
> topology.acker.executors=0 -c topology.producer.batch.size=1000 400}}
> {{  }}
> *Note:* The perf gains are more evident when operating at high thoughputs w/o 
> backpressure occurring (i.e. some bolts have not yet become a bottleneck)





[jira] [Updated] (STORM-3205) Optimization in TuplImpl

2018-09-14 Thread Roshan Naik (JIRA)


 [ 
https://issues.apache.org/jira/browse/STORM-3205?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Roshan Naik updated STORM-3205:
---
Description: 
Wrapping {{TuplImpl.values}} with Collections.unmodifiableList() turns out be 
very expensive. Its intention is obviously to check and prevent accidental 
tweaking TuplImpl once created. Given the high cost we can limit this extra 
checking mechanism in debug/dev mode. Due to this allocation being in the 
critical path it means several thousand/million additional allocations per 
second of the List wrapper object  proportional to the number of bolt/spout 
instances.

*TVL :*
| |throughput (k/sec)|cores|mem (mb)|
|*master (#f5a410ba3)*|412 |4.26|103|
|*storm-3205*|547  (+33%)|4.09|132|

+cmd:+ bin/storm jar topos/storm-loadgen-2.0.0-SNAPSHOT.jar 
org.apache.storm.loadgen.ThroughputVsLatency *--rate 55* --spouts 1 
--splitters 3 --counters 2 -c topology.acker.executors=0
  

*+ConstSpoutIdentityBoltNullBolt :+*
| |throughput|
|*master*|4.25 mill/sec|
|*storm-3205*|5.4 mill/sec (+27%)|

 
 +cmd+: bin/storm jar topos/storm-perf-2.0.0-SNAPSHOT.jar 
org.apache.storm.perf.ConstSpoutIdBoltNullBoltTopo -c 
topology.acker.executors=0 -c topology.producer.batch.size=1000 400
  

Note: The perf gains are more evident when operating at high thoughputs w/o 
backpressure occurring (i.e. some bolts have not yet become a bottleneck)

  was:
Wrapping {{TuplImpl.values}} with Collections.unmodifiableList() turns out be 
very expensive. Its intention is obviously to check and prevent accidental 
tweaking TuplImpl once created. Given the potentially high cost we can limit 
this extra checking mechanism in debug/dev mode. Due to this allocation being 
in the critical path it means several thousand/million additional allocations 
per second of the List wrapper object  proportional to the number of 
bolt/spout instances.

*TVL :*
| |throughput (k/sec)|cores|mem (mb)|
|*master (#f5a410ba3)*|412 |4.26|103|
|*storm-3205*|547  (+33%)|4.09|132|

+cmd:+ bin/storm jar topos/storm-loadgen-2.0.0-SNAPSHOT.jar 
org.apache.storm.loadgen.ThroughputVsLatency *--rate 55* --spouts 1 
--splitters 3 --counters 2 -c topology.acker.executors=0
  

*+ConstSpoutIdentityBoltNullBolt :+*
| |throughput|
|*master*|4.25 mill/sec|
|*storm-3205*|5.4 mill/sec (+27%)|

 
 +cmd+: bin/storm jar topos/storm-perf-2.0.0-SNAPSHOT.jar 
org.apache.storm.perf.ConstSpoutIdBoltNullBoltTopo -c 
topology.acker.executors=0 -c topology.producer.batch.size=1000 400
  

Note: The perf gains are more evident when operating at high throughputs without 
backpressure occurring (i.e. some bolts have not yet become a bottleneck)


> Optimization in TuplImpl
> 
>
> Key: STORM-3205
> URL: https://issues.apache.org/jira/browse/STORM-3205
> Project: Apache Storm
>  Issue Type: Improvement
>  Components: storm-client
>Affects Versions: 2.0.0
>Reporter: Roshan Naik
>Assignee: Roshan Naik
>Priority: Major
>
> Wrapping {{TuplImpl.values}} with Collections.unmodifiableList() turns out to be 
> very expensive. Its intention is obviously to check for and prevent accidental 
> tweaking of TuplImpl once created. Given the high cost, we can limit this extra 
> checking mechanism to debug/dev mode. Because this allocation sits in the 
> critical path, it adds several thousand (to million) additional allocations of 
> the List wrapper object per second, proportional to the number of 
> bolt/spout instances.
> *TVL :*
> | |throughput (k/sec)|cores|mem (mb)|
> |*master (#f5a410ba3)*|412 |4.26|103|
> |*storm-3205*|547  (+33%)|4.09|132|
> +cmd:+ bin/storm jar topos/storm-loadgen-2.0.0-SNAPSHOT.jar 
> org.apache.storm.loadgen.ThroughputVsLatency *--rate 55* --spouts 1 
> --splitters 3 --counters 2 -c topology.acker.executors=0
>   
> *+ConstSpoutIdentityBoltNullBolt :+*
> | |throughput|
> |*master*|4.25 mill/sec|
> |*storm-3205*|5.4 mill/sec (+27%)|
>  
>  +cmd+: bin/storm jar topos/storm-perf-2.0.0-SNAPSHOT.jar 
> org.apache.storm.perf.ConstSpoutIdBoltNullBoltTopo -c 
> topology.acker.executors=0 -c topology.producer.batch.size=1000 400
>   
> Note: The perf gains are more evident when operating at high throughputs without 
> backpressure occurring (i.e. some bolts have not yet become a bottleneck)



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (STORM-3205) Optimization in TuplImpl

2018-09-14 Thread Roshan Naik (JIRA)


 [ 
https://issues.apache.org/jira/browse/STORM-3205?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Roshan Naik updated STORM-3205:
---
Description: 
Wrapping {{TuplImpl.values}} with Collections.unmodifiableList() turns out to be 
very expensive. Its intention is obviously to check for and prevent accidental 
tweaking of TuplImpl once created. Given the potentially high cost, we can limit 
this extra checking mechanism to debug/dev mode. Because this allocation sits in 
the critical path, it adds several thousand (to million) additional allocations 
of the List wrapper object per second, proportional to the number of bolt/spout 
instances.

*TVL :*
| |throughput (k/sec)|cores|mem (mb)|
|*master (#f5a410ba3)*|412 |4.26|103|
|*storm-3205*|547  (+33%)|4.09|132|

+cmd:+ bin/storm jar topos/storm-loadgen-2.0.0-SNAPSHOT.jar 
org.apache.storm.loadgen.ThroughputVsLatency *--rate 55* --spouts 1 
--splitters 3 --counters 2 -c topology.acker.executors=0
  

*+ConstSpoutIdentityBoltNullBolt :+*
| |throughput|
|*master*|4.25 mill/sec|
|*storm-3205*|5.4 mill/sec (+27%)|

 
 +cmd+: bin/storm jar topos/storm-perf-2.0.0-SNAPSHOT.jar 
org.apache.storm.perf.ConstSpoutIdBoltNullBoltTopo -c 
topology.acker.executors=0 -c topology.producer.batch.size=1000 400
  

Note: The perf gains are more evident when operating at high throughputs without 
backpressure occurring (i.e. some bolts have not yet become a bottleneck)

> Optimization in TuplImpl
> 
>
> Key: STORM-3205
> URL: https://issues.apache.org/jira/browse/STORM-3205
> Project: Apache Storm
>  Issue Type: Improvement
>  Components: storm-client
>Affects Versions: 2.0.0
>Reporter: Roshan Naik
>Assignee: Roshan Naik
>Priority: Major
>
> Wrapping {{TuplImpl.values}} with Collections.unmodifiableList() turns out to be 
> very expensive. Its intention is obviously to check for and prevent accidental 
> tweaking of TuplImpl once created. Given the potentially high cost, we can limit 
> this extra checking mechanism to debug/dev mode. Because this allocation sits in 
> the critical path, it adds several thousand (to million) additional allocations 
> of the List wrapper object per second, proportional to the number of 
> bolt/spout instances.
> *TVL :*
> | |throughput (k/sec)|cores|mem (mb)|
> |*master (#f5a410ba3)*|412 |4.26|103|
> |*storm-3205*|547  (+33%)|4.09|132|
> +cmd:+ bin/storm jar topos/storm-loadgen-2.0.0-SNAPSHOT.jar 
> org.apache.storm.loadgen.ThroughputVsLatency *--rate 55* --spouts 1 
> --splitters 3 --counters 2 -c topology.acker.executors=0
>   
> *+ConstSpoutIdentityBoltNullBolt :+*
> | |throughput|
> |*master*|4.25 mill/sec|
> |*storm-3205*|5.4 mill/sec (+27%)|
>  
>  +cmd+: bin/storm jar topos/storm-perf-2.0.0-SNAPSHOT.jar 
> org.apache.storm.perf.ConstSpoutIdBoltNullBoltTopo -c 
> topology.acker.executors=0 -c topology.producer.batch.size=1000 400
>   
> Note: The perf gains are more evident when operating at high throughputs without 
> backpressure occurring (i.e. some bolts have not yet become a bottleneck)



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (STORM-3205) Optimization in TuplImpl

2018-09-14 Thread Roshan Naik (JIRA)


 [ 
https://issues.apache.org/jira/browse/STORM-3205?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Roshan Naik updated STORM-3205:
---
Summary: Optimization in TuplImpl  (was: Optimization in critical path)

> Optimization in TuplImpl
> 
>
> Key: STORM-3205
> URL: https://issues.apache.org/jira/browse/STORM-3205
> Project: Apache Storm
>  Issue Type: Improvement
>  Components: storm-client
>Affects Versions: 2.0.0
>Reporter: Roshan Naik
>Assignee: Roshan Naik
>Priority: Major
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (STORM-3205) Optimization in critical path

2018-09-13 Thread Roshan Naik (JIRA)


 [ 
https://issues.apache.org/jira/browse/STORM-3205?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Roshan Naik reassigned STORM-3205:
--

Assignee: Roshan Naik

> Optimization in critical path
> -
>
> Key: STORM-3205
> URL: https://issues.apache.org/jira/browse/STORM-3205
> Project: Apache Storm
>  Issue Type: Improvement
>  Components: storm-client
>Affects Versions: 2.0.0
>Reporter: Roshan Naik
>Assignee: Roshan Naik
>Priority: Major
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (STORM-3205) Optimization in critical path

2018-08-28 Thread Roshan Naik (JIRA)


 [ 
https://issues.apache.org/jira/browse/STORM-3205?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Roshan Naik updated STORM-3205:
---
Component/s: storm-client

> Optimization in critical path
> -
>
> Key: STORM-3205
> URL: https://issues.apache.org/jira/browse/STORM-3205
> Project: Apache Storm
>  Issue Type: Improvement
>  Components: storm-client
>Affects Versions: 2.0.0
>Reporter: Roshan Naik
>Priority: Major
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (STORM-3205) Optimization in critical path

2018-08-28 Thread Roshan Naik (JIRA)


 [ 
https://issues.apache.org/jira/browse/STORM-3205?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Roshan Naik updated STORM-3205:
---
Affects Version/s: 2.0.0

> Optimization in critical path
> -
>
> Key: STORM-3205
> URL: https://issues.apache.org/jira/browse/STORM-3205
> Project: Apache Storm
>  Issue Type: Improvement
>  Components: storm-client
>Affects Versions: 2.0.0
>Reporter: Roshan Naik
>Priority: Major
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (STORM-3205) Optimization in critical path

2018-08-28 Thread Roshan Naik (JIRA)
Roshan Naik created STORM-3205:
--

 Summary: Optimization in critical path
 Key: STORM-3205
 URL: https://issues.apache.org/jira/browse/STORM-3205
 Project: Apache Storm
  Issue Type: Improvement
Reporter: Roshan Naik






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (STORM-2947) Review and fix/remove deprecated things in Storm 2.0.0

2018-07-14 Thread Roshan Naik (JIRA)


[ 
https://issues.apache.org/jira/browse/STORM-2947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16544325#comment-16544325
 ] 

Roshan Naik commented on STORM-2947:


At a bare minimum I think we should document the settings noted in the earlier 
comment that are no longer applicable in the 2.0 release. 

If that's ok .. let me know... I can go ahead and pick this up. 

> Review and fix/remove deprecated things in Storm 2.0.0
> --
>
> Key: STORM-2947
> URL: https://issues.apache.org/jira/browse/STORM-2947
> Project: Apache Storm
>  Issue Type: Task
>  Components: storm-client, storm-hdfs, storm-kafka, storm-server, 
> storm-solr
>Affects Versions: 2.0.0
>Reporter: Jungtaek Lim
>Assignee: Jungtaek Lim
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 3.5h
>  Remaining Estimate: 0h
>
> We've been deprecating the things but haven't have time to replace/get rid of 
> them. It should be better if we have time to review and address them.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (STORM-3100) Minor optimization: Replace HashMap with an array backed data structure for faster lookups

2018-07-09 Thread Roshan Naik (JIRA)


[ 
https://issues.apache.org/jira/browse/STORM-3100?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16538051#comment-16538051
 ] 

Roshan Naik commented on STORM-3100:


fine with me.

> Minor optimization: Replace HashMap with an array backed data 
> structure for faster lookups
> --
>
> Key: STORM-3100
> URL: https://issues.apache.org/jira/browse/STORM-3100
> Project: Apache Storm
>  Issue Type: Improvement
>Reporter: Roshan Naik
>Assignee: Roshan Naik
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2h 50m
>  Remaining Estimate: 0h
>
> * Introduce _CustomIndexArray_: an array-backed data structure to speed up 
> HashMap use cases *in the critical path*. It needs to support negative 
> indexing and a user-defined (at construction) upper and lower index range. 
> It does not need to be dynamically resizable given the nature of the use cases 
> we have.
>  * Use this data structure for _GeneralTopologyContext._taskToComponent_ 
> mapping which is looked up in the critical path _Task.getOutgoingTasks._ This 
> lookup happens at least once for every emit and consequently can happen 
> millions of times per second.
>  * Also use this for _JCQueue.localReceiveQueues_ where the basic idea is 
> already in use but not in a reusable manner.
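A minimal sketch of the _CustomIndexArray_ idea described above (the API shown 
here is assumed for illustration; the real class may differ): the valid index 
range, which may include negative task ids, is fixed at construction, and lookup 
is a plain O(1) array access with no hashing or Integer boxing.

```java
// Sketch of an array-backed lookup with a fixed, possibly negative index range.
class CustomIndexArray<T> {
    private final Object[] elements;
    private final int lo; // lowest valid index, may be negative

    // Range [lo, hi] is fixed at construction; no dynamic resizing needed.
    CustomIndexArray(int lo, int hi) {
        this.lo = lo;
        this.elements = new Object[hi - lo + 1];
    }

    void set(int index, T value) {
        elements[index - lo] = value;
    }

    @SuppressWarnings("unchecked")
    T get(int index) {
        // O(1) array access: no hashCode(), no bucket probing, no boxing.
        return (T) elements[index - lo];
    }

    public static void main(String[] args) {
        // E.g. a taskId -> componentId mapping where system tasks use id -1.
        CustomIndexArray<String> taskToComponent = new CustomIndexArray<>(-1, 3);
        taskToComponent.set(-1, "__system");
        taskToComponent.set(2, "splitter");
        System.out.println(taskToComponent.get(2)); // prints splitter
    }
}
```

Since taskToComponent is looked up on every emit, replacing a HashMap lookup 
with a direct array index is exactly the kind of critical-path win targeted here.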



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (STORM-3100) Minor optimization: Replace HashMap with an array backed data structure for faster lookups

2018-06-10 Thread Roshan Naik (JIRA)


 [ 
https://issues.apache.org/jira/browse/STORM-3100?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Roshan Naik reassigned STORM-3100:
--

Assignee: Roshan Naik

> Minor optimization: Replace HashMap with an array backed data 
> structure for faster lookups
> --
>
> Key: STORM-3100
> URL: https://issues.apache.org/jira/browse/STORM-3100
> Project: Apache Storm
>  Issue Type: Improvement
>Reporter: Roshan Naik
>Assignee: Roshan Naik
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> * Introduce _CustomIndexArray_: an array-backed data structure to speed up 
> HashMap use cases *in the critical path*. It needs to support negative 
> indexing and a user-defined (at construction) upper and lower index range. 
> It does not need to be dynamically resizable given the nature of the use cases 
> we have.
>  * Use this data structure for _GeneralTopologyContext._taskToComponent_ 
> mapping which is looked up in the critical path _Task.getOutgoingTasks._ This 
> lookup happens at least once for every emit and consequently can happen 
> millions of times per second.
>  * Also use this for _JCQueue.localReceiveQueues_ where the basic idea is 
> already in use but not in a reusable manner.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (STORM-3100) Minor optimization: Replace HashMap with an array backed data structure for faster lookups

2018-06-10 Thread Roshan Naik (JIRA)


 [ 
https://issues.apache.org/jira/browse/STORM-3100?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Roshan Naik updated STORM-3100:
---
Description: 
* Introduce _CustomIndexArray_: an array-backed data structure to speed up 
HashMap use cases *in the critical path*. It needs to support negative 
indexing and a user-defined (at construction) upper and lower index range. It 
does not need to be dynamically resizable given the nature of the use cases we have.
 * Use this data structure for _GeneralTopologyContext._taskToComponent_ 
mapping which is looked up in the critical path _Task.getOutgoingTasks._ This 
lookup happens at least once for every emit and consequently can happen 
millions of times per second.
 * Also use this for _JCQueue.localReceiveQueues_ where the basic idea is 
already in use but not in a reusable manner.

  was:
* Introduce _CustomIndexArray_: an array-backed data structure to replace 
HashMap use cases, so it needs to support negative indexing. It does not 
need to be dynamically resizable given the nature of the use cases we have. The 
upper and lower index range needs to be specified at construction time.
 * Use this data structure for _GeneralTopologyContext._taskToComponent_ 
mapping which is looked up in the critical path _Task.getOutgoingTasks._ This 
lookup happens at least once for every emit and consequently can happen 
millions of times per second.
 * Also use this for _JCQueue.localReceiveQueues_ where the basic idea is 
already in use but not in a reusable manner.


> Minor optimization: Replace HashMap with an array backed data 
> structure for faster lookups
> --
>
> Key: STORM-3100
> URL: https://issues.apache.org/jira/browse/STORM-3100
> Project: Apache Storm
>  Issue Type: Improvement
>Reporter: Roshan Naik
>Priority: Major
>
> * Introduce _CustomIndexArray_: an array-backed data structure to speed up 
> HashMap use cases *in the critical path*. It needs to support negative 
> indexing and a user-defined (at construction) upper and lower index range. 
> It does not need to be dynamically resizable given the nature of the use cases 
> we have.
>  * Use this data structure for _GeneralTopologyContext._taskToComponent_ 
> mapping which is looked up in the critical path _Task.getOutgoingTasks._ This 
> lookup happens at least once for every emit and consequently can happen 
> millions of times per second.
>  * Also use this for _JCQueue.localReceiveQueues_ where the basic idea is 
> already in use but not in a reusable manner.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (STORM-3100) Minor optimization: Replace HashMap with an array backed data structure for faster lookups

2018-06-10 Thread Roshan Naik (JIRA)


 [ 
https://issues.apache.org/jira/browse/STORM-3100?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Roshan Naik updated STORM-3100:
---
Summary: Minor optimization: Replace HashMap with an array 
backed data structure for faster lookups  (was: Minor optimization: Replace 
HashMap in critical path with an array backed data structure for 
faster lookups)

> Minor optimization: Replace HashMap with an array backed data 
> structure for faster lookups
> --
>
> Key: STORM-3100
> URL: https://issues.apache.org/jira/browse/STORM-3100
> Project: Apache Storm
>  Issue Type: Improvement
>Reporter: Roshan Naik
>Priority: Major
>
> * Introduce _CustomIndexArray_: an array-backed data structure to replace 
> HashMap use cases, so it needs to support negative indexing. It does 
> not need to be dynamically resizable given the nature of the use cases we have. 
> The upper and lower index range needs to be specified at construction time.
>  * Use this data structure for _GeneralTopologyContext._taskToComponent_ 
> mapping which is looked up in the critical path _Task.getOutgoingTasks._ This 
> lookup happens at least once for every emit and consequently can happen 
> millions of times per second.
>  * Also use this for _JCQueue.localReceiveQueues_ where the basic idea is 
> already in use but not in a reusable manner.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (STORM-3100) Minor optimization: Replace HashMap in critical path with an array backed data structure for faster lookups

2018-06-10 Thread Roshan Naik (JIRA)
Roshan Naik created STORM-3100:
--

 Summary: Minor optimization: Replace HashMap in 
critical path with an array backed data structure for faster lookups
 Key: STORM-3100
 URL: https://issues.apache.org/jira/browse/STORM-3100
 Project: Apache Storm
  Issue Type: Improvement
Reporter: Roshan Naik


* Introduce _CustomIndexArray_: an array-backed data structure to replace 
HashMap use cases, so it needs to support negative indexing. It does not 
need to be dynamically resizable given the nature of the use cases we have. The 
upper and lower index range needs to be specified at construction time.
 * Use this data structure for _GeneralTopologyContext._taskToComponent_ 
mapping which is looked up in the critical path _Task.getOutgoingTasks._ This 
lookup happens at least once for every emit and consequently can happen 
millions of times per second.
 * Also use this for _JCQueue.localReceiveQueues_ where the basic idea is 
already in use but not in a reusable manner.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (STORM-2983) Some topologies not working properly

2018-04-05 Thread Roshan Naik (JIRA)

[ 
https://issues.apache.org/jira/browse/STORM-2983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16427559#comment-16427559
 ] 

Roshan Naik edited comment on STORM-2983 at 4/5/18 8:44 PM:


Thanks [~revans2] for those clarifications.

Thanks [~ethanli] for the revised fix.

Similar fixes are needed elsewhere as well, since they are all probably broken:
 - 
[https://github.com/apache/storm/blob/master/storm-client/src/jvm/org/apache/storm/daemon/StormCommon.java#L276]

 - 
[https://github.com/apache/storm/blob/master/storm-client/src/jvm/org/apache/storm/daemon/StormCommon.java#L363-L364]
 
 - 
[https://github.com/apache/storm/blob/master/examples/storm-loadgen/src/main/java/org/apache/storm/loadgen/CaptureLoad.java#L109-L112]

 - 
[https://github.com/apache/storm/blob/master/storm-client/src/jvm/org/apache/storm/daemon/worker/Worker.java#L306]
  

 

 


was (Author: roshan_naik):
Thanks [~revans2] for those clarifications.

Thanks [~ethanli] for the revised fix.

Similar fixes are needed elsewhere as well, since they are all probably broken:

- 
[https://github.com/apache/storm/blob/master/storm-client/src/jvm/org/apache/storm/daemon/StormCommon.java#L276]

- 
[https://github.com/apache/storm/blob/master/storm-client/src/jvm/org/apache/storm/daemon/StormCommon.java#L363-L364]
 

- 
[https://github.com/apache/storm/blob/master/examples/storm-loadgen/src/main/java/org/apache/storm/loadgen/CaptureLoad.java#L109-L112]

- 
[https://github.com/apache/storm/blob/master/storm-client/src/jvm/org/apache/storm/daemon/worker/Worker.java#L306]
  

 

 

> Some topologies not working properly 
> -
>
> Key: STORM-2983
> URL: https://issues.apache.org/jira/browse/STORM-2983
> Project: Apache Storm
>  Issue Type: Bug
>Reporter: Ethan Li
>Assignee: Ethan Li
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 2.0.0
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> For example,
> {code:java}
> bin/storm jar storm-loadgen-*.jar 
> org.apache.storm.loadgen.ThroughputVsLatency --spouts 1 --splitters 2 
> --counters 1 -c topology.debug=true
> {code}
> on ResourceAwareScheduler not working properly.
> With default cluster settings, there will be only one __acker-executor and it 
> will be on a separate worker. And it looks like the __acker-executor was not 
> able to receive messages from spouts and bolts. And spouts and bolts 
> continued to retry sending messages to acker. It then led to another problem:
> STORM-2970
> I tried to run on storm right before 
> [https://github.com/apache/storm/pull/2502] and right after and confirmed 
> that this bug should be related to it



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (STORM-2983) Some topologies not working properly

2018-04-05 Thread Roshan Naik (JIRA)

[ 
https://issues.apache.org/jira/browse/STORM-2983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16427559#comment-16427559
 ] 

Roshan Naik commented on STORM-2983:


Thanks [~revans2] for those clarifications.

Thanks [~ethanli] for the revised fix.

Similar fixes are needed elsewhere as well, since they are all probably broken:

- 
[https://github.com/apache/storm/blob/master/storm-client/src/jvm/org/apache/storm/daemon/StormCommon.java#L276]

- 
[https://github.com/apache/storm/blob/master/storm-client/src/jvm/org/apache/storm/daemon/StormCommon.java#L363-L364]
 

- 
[https://github.com/apache/storm/blob/master/examples/storm-loadgen/src/main/java/org/apache/storm/loadgen/CaptureLoad.java#L109-L112]

- 
[https://github.com/apache/storm/blob/master/storm-client/src/jvm/org/apache/storm/daemon/worker/Worker.java#L306]
  

 

 

> Some topologies not working properly 
> -
>
> Key: STORM-2983
> URL: https://issues.apache.org/jira/browse/STORM-2983
> Project: Apache Storm
>  Issue Type: Bug
>Reporter: Ethan Li
>Assignee: Ethan Li
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 2.0.0
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> For example,
> {code:java}
> bin/storm jar storm-loadgen-*.jar 
> org.apache.storm.loadgen.ThroughputVsLatency --spouts 1 --splitters 2 
> --counters 1 -c topology.debug=true
> {code}
> on ResourceAwareScheduler not working properly.
> With default cluster settings, there will be only one __acker-executor and it 
> will be on a separate worker. And it looks like the __acker-executor was not 
> able to receive messages from spouts and bolts. And spouts and bolts 
> continued to retry sending messages to acker. It then led to another problem:
> STORM-2970
> I tried to run on storm right before 
> [https://github.com/apache/storm/pull/2502] and right after and confirmed 
> that this bug should be related to it



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (STORM-2983) Some topologies not working properly

2018-04-04 Thread Roshan Naik (JIRA)

[ 
https://issues.apache.org/jira/browse/STORM-2983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16425223#comment-16425223
 ] 

Roshan Naik edited comment on STORM-2983 at 4/4/18 9:15 AM:


Mutability of topology configuration is explicitly allowed in the documentation 
(previously referenced).

 

Usefulness of single worker use case should not be under-estimated. For perf 
critical topos it is a reasonable option to consider deploying multiple single 
worker instances of the topo as opposed to a single multiworker instance to 
avoid interworker communication. I have recommended users on mailing list to 
consider it (for things like video processing) and I have seen others recommend 
users to consider 1worker mode as well.

 

Given that RAS creates this divergence between specification and reality, 
confusion is bound to exist no matter how we resolve the value of this 
setting. Although IMO that is not a strong enough case against reflecting the 
actual worker count ... if it is contentious ... I am ok with resolving it 
either way, as long as we provide an API so that code can check it. Currently I 
am aware of a few additional use cases for knowing the worker count:
 * enabling/disabling the interworker backpressure subsystem (already in place)
 * auto-disabling Locality Awareness for a single worker (I have measured this 
impact to be a bit under 10% for one worker)
 * user bolts inspecting it to make decisions.. like potentially allocating a 
certain number of external resources based on the worker count... or who knows what.

 

My concern is limited to the fact that we must have official support to be able 
to detect the worker count. It is simply too heavy handed to simply say NO to 
all current and potential use cases.
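For illustration only (this is not an official API): user code today typically 
reads the configured worker count straight out of the topology conf map under 
the "topology.workers" key, and that configured value is exactly what can 
diverge from the scheduler's actual worker count under RAS.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch: read the *configured* worker count from a topology conf map.
// Under RAS this may not match the number of workers actually scheduled,
// which is the gap discussed in this thread.
class WorkerCount {
    static int configuredWorkers(Map<String, Object> topoConf) {
        // Storm confs carry numbers as Number subtypes; default to 1 worker.
        Object v = topoConf.getOrDefault("topology.workers", 1);
        return ((Number) v).intValue();
    }

    public static void main(String[] args) {
        Map<String, Object> conf = new HashMap<>();
        conf.put("topology.workers", 4);
        System.out.println(configuredWorkers(conf)); // prints 4
    }
}
```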


was (Author: roshan_naik):
Mutability of topology configuration is explicitly allowed in the documentation 
(previously referenced).

 

Usefulness of single worker use case should not be under-estimated. For perf 
critical topos it is a reasonable option to consider deploying multiple single 
worker instances of the topo as opposed to a single multiworker instance to 
avoid interworker communication. I have recommended users on mailing list to 
consider it (for things like video processing) and I have seen others recommend 
considering 1worker mode as well.

 

Given that RAS creates this divergence between specification and reality, 
confusion is bound to exist no matter how we resolve the value of this 
setting. Although IMO that is not a strong enough case against reflecting the 
actual worker count ... if it is contentious ... I am ok with resolving it 
either way, as long as we provide an API so that code can check it. Currently I 
am aware of a few additional use cases for knowing the worker count:
 * enabling/disabling the interworker backpressure subsystem (already in place)
 * auto-disabling Locality Awareness for a single worker (I have measured this 
impact to be a bit under 10% for one worker)
 * user bolts inspecting it to make decisions.. like potentially allocating a 
certain number of external resources based on the worker count... or who knows what.

 

My concern is limited to the fact that we must have official support to be able 
to detect the worker count. It is simply too heavy handed to simply say NO to 
all current and potential use cases.

> Some topologies not working properly 
> -
>
> Key: STORM-2983
> URL: https://issues.apache.org/jira/browse/STORM-2983
> Project: Apache Storm
>  Issue Type: Bug
>Reporter: Ethan Li
>Assignee: Ethan Li
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> For example,
> {code:java}
> bin/storm jar storm-loadgen-*.jar 
> org.apache.storm.loadgen.ThroughputVsLatency --spouts 1 --splitters 2 
> --counters 1 -c topology.debug=true
> {code}
> on ResourceAwareScheduler not working properly.
> With default cluster settings, there will be only one __acker-executor and it 
> will be on a separate worker. And it looks like the __acker-executor was not 
> able to receive messages from spouts and bolts. And spouts and bolts 
> continued to retry sending messages to acker. It then led to another problem:
> STORM-2970
> I tried to run on storm right before 
> [https://github.com/apache/storm/pull/2502] and right after and confirmed 
> that this bug should be related to it



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (STORM-2983) Some topologies not working properly

2018-04-04 Thread Roshan Naik (JIRA)

[ 
https://issues.apache.org/jira/browse/STORM-2983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16425223#comment-16425223
 ] 

Roshan Naik commented on STORM-2983:


Mutability of topology configuration is explicitly allowed in the documentation 
(previously referenced).

 

Usefulness of single worker use case should not be under-estimated. For perf 
critical topos it is a reasonable option to consider deploying multiple single 
worker instances of the topo as opposed to a single multiworker instance to 
avoid interworker communication. I have recommended users on mailing list to 
consider it (for things like video processing) and I have seen others recommend 
considering 1worker mode as well.

 

Given that RAS creates this divergence between specification and reality, 
confusion is bound to exist no matter how we resolve the value of this 
setting. Although IMO that is not a strong enough case against reflecting the 
actual worker count ... if it is contentious ... I am ok with resolving it 
either way, as long as we provide an API so that code can check it. Currently I 
am aware of a few additional use cases for knowing the worker count:
 * enabling/disabling the interworker backpressure subsystem (already in place)
 * auto-disabling Locality Awareness for a single worker (I have measured this 
impact to be a bit under 10% for one worker)
 * user bolts inspecting it to make decisions.. like potentially allocating a 
certain number of external resources based on the worker count... or who knows what.

 

My concern is limited to the fact that we must have official support to be able 
to detect the worker count. It is simply too heavy handed to simply say NO to 
all current and potential use cases.

> Some topologies not working properly 
> -
>
> Key: STORM-2983
> URL: https://issues.apache.org/jira/browse/STORM-2983
> Project: Apache Storm
>  Issue Type: Bug
>Reporter: Ethan Li
>Assignee: Ethan Li
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> For example,
> {code:java}
> bin/storm jar storm-loadgen-*.jar 
> org.apache.storm.loadgen.ThroughputVsLatency --spouts 1 --splitters 2 
> --counters 1 -c topology.debug=true
> {code}
> on ResourceAwareScheduler not working properly.
> With default cluster settings, there will be only one __acker-executor and it 
> will be on a separate worker. And it looks like the __acker-executor was not 
> able to receive messages from spouts and bolts. And spouts and bolts 
> continued to retry sending messages to acker. It then led to another problem:
> STORM-2970
> I tried to run on storm right before 
> [https://github.com/apache/storm/pull/2502] and right after and confirmed 
> that this bug should be related to it



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (STORM-2983) Some topologies not working properly

2018-04-03 Thread Roshan Naik (JIRA)

[ 
https://issues.apache.org/jira/browse/STORM-2983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16424776#comment-16424776
 ] 

Roshan Naik commented on STORM-2983:


[~kabhwan] I think you are again missing what I am stressing. 

We need a way, in code, to check the worker count (for internal and user code), 
not to remove code that does such checks. I am not concerned about retaining 
this one optimization. 

There is no point removing reasonable code only to put it back again.

I would like to see why we cannot either fix topology.workers or provide 
something else as a substitute.

 

> Some topologies not working properly 
> -
>
> Key: STORM-2983
> URL: https://issues.apache.org/jira/browse/STORM-2983
> Project: Apache Storm
>  Issue Type: Bug
>Reporter: Ethan Li
>Assignee: Ethan Li
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> For example,
> {code:java}
> bin/storm jar storm-loadgen-*.jar 
> org.apache.storm.loadgen.ThroughputVsLatency --spouts 1 --splitters 2 
> --counters 1 -c topology.debug=true
> {code}
> on ResourceAwareScheduler not working properly.
> With default cluster settings, there will be only one __acker-executor and it 
> will be on a separate worker. And it looks like the __acker-executor was not 
> able to receive messages from spouts and bolts. And spouts and bolts 
> continued to retry sending messages to acker. It then led to another problem:
> STORM-2970
> I tried to run on storm right before 
> [https://github.com/apache/storm/pull/2502] and right after and confirmed 
> that this bug should be related to it





[jira] [Commented] (STORM-2983) Some topologies not working properly

2018-04-03 Thread Roshan Naik (JIRA)

[ 
https://issues.apache.org/jira/browse/STORM-2983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16424680#comment-16424680
 ] 

Roshan Naik commented on STORM-2983:


My question was not about finding and fixing everything that RAS breaks. It is 
limited to fixing the worker-count issue that is causing the breakage. Any 
code we delete now to unblock you would be worth reviving once the worker 
count issue is fixed.

- Instead of the proposed fix, can you update the worker count to the right 
value?

- Else, could you consider unblocking your work by commenting out the 
optimization in your local build?






[jira] [Commented] (STORM-2983) Some topologies not working properly

2018-04-03 Thread Roshan Naik (JIRA)

[ 
https://issues.apache.org/jira/browse/STORM-2983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16424616#comment-16424616
 ] 

Roshan Naik commented on STORM-2983:


As stated before, the core issue is not this specific optimization. Along with 
removing this optimization, we would have to remove all other code that 
performs the same check. It is important to get RAS working, but it needs to 
be done correctly.

My concern is that, independent of the existence or absence of this 
optimization, the mechanism for Storm internal code or end-user code to check 
the worker count is broken. Fixing that will address RAS without requiring 
the removal of similar code.

So I would like to ask my previous question again:

 - Is there a good reason why topology.workers cannot be dynamically updated 
to reflect the actual worker count?

 






[jira] [Comment Edited] (STORM-2983) Some topologies not working properly

2018-03-30 Thread Roshan Naik (JIRA)

[ 
https://issues.apache.org/jira/browse/STORM-2983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16421019#comment-16421019
 ] 

Roshan Naik edited comment on STORM-2983 at 3/30/18 10:26 PM:
--

Removing or preserving the optimization will not address the core problem, 
which is that topology.workers is not usable, and therefore any code 
(present or future) that relies on it would be considered broken, with no 
workaround.

There are other use cases in code for this setting, plus more that I can think 
of, but to limit the scope I will stay with the topic of some settings in 
topoConf being unreliable. I don't think it is a good idea to say such code 
should not be allowed in Storm.

 

Since topoConf is part of the API, it is surprising that a setting stored in 
it cannot be relied upon. The 
[documentation|http://storm.apache.org/releases/1.2.1/Configuration.html] 
specifically states that the system supports overriding settings. *The actual 
bug here is that we are not dynamically overriding the setting to what it 
should be.*

I think we need a good justification for making any setting unreliable, 
rather than opening the door for more of the same.

I would like to know:
 - Is there a good reason why this setting cannot be made reliable?

 - Are you aware of other such settings that do not reflect the correct 
values?

 


was (Author: roshan_naik):
Removing / preserving the optimization will not address the core problem ... 
which is .. topology.worker.count is not usable and therefore any code 
(present/future) would be considered broken with no workaround.

There are other use cases in code for this setting plus more that I can think 
of, but to limit the scope I will stay with the topic of some settings in 
topoConf being unreliable. I dont think its a good idea to say any such code 
should not allowed in storm.

 

Since topoConf is part of the API  ...  it is surprising to know that any 
setting stored in it cannot be relied upon. The  
[documentation|http://storm.apache.org/releases/1.2.1/Configuration.html] 
specifically states that the system supports overriding settings.  *The actual 
bug here is that we are not dynamically overriding the setting to what it 
should be.*

I think we need to have a good justification for making any setting unreliable 
and not open the door for more of the same.

Would like to know:
 - Is there good reason why this setting cannot be made reliable ?

 - Are you aware of other such settings that are not reflecting the correct 
values ?

 






[jira] [Comment Edited] (STORM-2983) Some topologies not working properly

2018-03-30 Thread Roshan Naik (JIRA)

[ 
https://issues.apache.org/jira/browse/STORM-2983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16421019#comment-16421019
 ] 

Roshan Naik edited comment on STORM-2983 at 3/30/18 10:25 PM:
--

Removing or preserving the optimization will not address the core problem, 
which is that topology.worker.count is not usable, and therefore any code 
(present or future) that relies on it would be considered broken, with no 
workaround.

There are other use cases in code for this setting, plus more that I can think 
of, but to limit the scope I will stay with the topic of some settings in 
topoConf being unreliable. I don't think it is a good idea to say such code 
should not be allowed in Storm.

 

Since topoConf is part of the API, it is surprising that a setting stored in 
it cannot be relied upon. The 
[documentation|http://storm.apache.org/releases/1.2.1/Configuration.html] 
specifically states that the system supports overriding settings. *The actual 
bug here is that we are not dynamically overriding the setting to what it 
should be.*

I think we need a good justification for making any setting unreliable, 
rather than opening the door for more of the same.

I would like to know:
 - Is there a good reason why this setting cannot be made reliable?

 - Are you aware of other such settings that do not reflect the correct 
values?

 


was (Author: roshan_naik):
Removing / preserving the optimization will not address the core problem ... 
which is .. topology.worker.count is not usable and therefore any code 
(present/future) would be considered broken with no workaround. There are other 
use cases in code for this setting plus more that I can think of, but to limit 
the scope I will stay with the topic of some settings in topoConf being 
unreliable. I dont think its a good idea to say any such code should not 
allowed in storm.

 

Since topoConf is part of the API  ...  it is surprising to know that any 
setting stored in it cannot be relied upon. The  
[documentation|http://storm.apache.org/releases/1.2.1/Configuration.html] 
specifically states that the system supports overriding settings.  *The actual 
bug here is that we are not dynamically overriding the setting to what it 
should be.*

I think we need to have a good justification for making any setting unreliable 
and not open the door for more of the same.

Would like to know:
 - Is there good reason why this setting cannot be made reliable ?

 - Are you aware of other such settings that are not reflecting the correct 
values ?

 






[jira] [Commented] (STORM-2983) Some topologies not working properly

2018-03-30 Thread Roshan Naik (JIRA)

[ 
https://issues.apache.org/jira/browse/STORM-2983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16421019#comment-16421019
 ] 

Roshan Naik commented on STORM-2983:


Removing or preserving the optimization will not address the core problem, 
which is that topology.worker.count is not usable, and therefore any code 
(present or future) that relies on it would be considered broken, with no 
workaround. There are other use cases in code for this setting, plus more 
that I can think of, but to limit the scope I will stay with the topic of 
some settings in topoConf being unreliable. I don't think it is a good idea 
to say such code should not be allowed in Storm.

 

Since topoConf is part of the API, it is surprising that a setting stored in 
it cannot be relied upon. The 
[documentation|http://storm.apache.org/releases/1.2.1/Configuration.html] 
specifically states that the system supports overriding settings. *The actual 
bug here is that we are not dynamically overriding the setting to what it 
should be.*

I think we need a good justification for making any setting unreliable, 
rather than opening the door for more of the same.

I would like to know:
 - Is there a good reason why this setting cannot be made reliable?

 - Are you aware of other such settings that do not reflect the correct 
values?

 






[jira] [Comment Edited] (STORM-2983) Some topologies not working properly

2018-03-30 Thread Roshan Naik (JIRA)

[ 
https://issues.apache.org/jira/browse/STORM-2983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16420273#comment-16420273
 ] 

Roshan Naik edited comment on STORM-2983 at 3/30/18 8:20 AM:
-

[~kabhwan] 

IMO this should not be viewed as sidestepping a minor optimization; seen that 
way, it does look like a minor issue. The optimization only exposed a bug, so 
it is only the "messenger".

The key issue here is supporting internal or user code in checking the state 
correctly, so it can do whatever it needs to do ... an optimization or a 
behavior change.

By not fixing the core bug, we will sidestep the core issue. IMO this is a 
critical bug and needs to be fixed. Otherwise, we need to provide a different 
mechanism for code to make such checks correctly.

 


was (Author: roshan_naik):
[~kabhwan] 

IMO this should not be viewed as a sidestepping an optimization. In which case 
it does look like a minor issue. The optimization only exposed a bug. So it is 
only the "messenger".

The key issue here is supporting (internal or user code) to check the state 
correctly .. to do whatever it needs to do ... optimization or behavior 
change. 

By not fixing the core bug, we wlll sidestep the core issue. IMO this a 
critical bug and needs to fixed. Or we need to provide a diff mechanism for 
code to make such checks correctly.

 






[jira] [Commented] (STORM-2983) Some topologies not working properly

2018-03-30 Thread Roshan Naik (JIRA)

[ 
https://issues.apache.org/jira/browse/STORM-2983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16420273#comment-16420273
 ] 

Roshan Naik commented on STORM-2983:


[~kabhwan] 

IMO this should not be viewed as sidestepping an optimization; seen that way, 
it does look like a minor issue. The optimization only exposed a bug, so it 
is only the "messenger".

The key issue here is supporting internal or user code in checking the state 
correctly, so it can do whatever it needs to do ... an optimization or a 
behavior change.

By not fixing the core bug, we will sidestep the core issue. IMO this is a 
critical bug and needs to be fixed. Otherwise, we need to provide a different 
mechanism for code to make such checks correctly.

 






[jira] [Commented] (STORM-2983) Some topologies not working properly

2018-03-29 Thread Roshan Naik (JIRA)

[ 
https://issues.apache.org/jira/browse/STORM-2983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16420162#comment-16420162
 ] 

Roshan Naik commented on STORM-2983:


[~ethanli]

We have a problem in Storm with an excessive number of threads. See details in 
STORM-2647 for how bad the issue is. In general, it is best to avoid having 
useless threads lingering around: they are not free in terms of energy 
consumption in the long run, and they periodically bump useful threads off 
their cores.

IMO, the right approach is to fix this by addressing the fundamental issue of 
the worker count reflecting the true state of the topology. Having the 
settings reflect the right values is also useful for topologies, as they can 
query them to find out the state of the topology. This would have to be fixed 
sooner or later.

 






[jira] [Commented] (STORM-2983) Some topologies not working properly

2018-03-29 Thread Roshan Naik (JIRA)

[ 
https://issues.apache.org/jira/browse/STORM-2983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16419547#comment-16419547
 ] 

Roshan Naik commented on STORM-2983:


Go ahead. I didn’t take it up since the changes are in the RAS code path. 







[jira] [Commented] (STORM-2947) Review and fix/remove deprecated things in Storm 2.0.0

2018-03-21 Thread Roshan Naik (JIRA)

[ 
https://issues.apache.org/jira/browse/STORM-2947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16407593#comment-16407593
 ] 

Roshan Naik commented on STORM-2947:


[~kabhwan] .. I realized we cannot deprecate the above-noted settings in the 
1.x code base, as they are still perfectly valid there without alternatives. 
We will just need to make a note of the old ones that are no longer used in 
the master branch.

> Review and fix/remove deprecated things in Storm 2.0.0
> --
>
> Key: STORM-2947
> URL: https://issues.apache.org/jira/browse/STORM-2947
> Project: Apache Storm
>  Issue Type: Task
>  Components: storm-client, storm-hdfs, storm-kafka, storm-server, 
> storm-solr
>Affects Versions: 2.0.0
>Reporter: Jungtaek Lim
>Assignee: Jungtaek Lim
>Priority: Major
>
> We've been deprecating things but haven't had time to replace or get rid of 
> them. It would be better if we took the time to review and address them.





[jira] [Updated] (STORM-2964) Deprecate configs/interfaces/classes in 1.x line

2018-03-19 Thread Roshan Naik (JIRA)

 [ 
https://issues.apache.org/jira/browse/STORM-2964?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Roshan Naik updated STORM-2964:
---
Summary: Deprecate configs/interfaces/classes in 1.x line  (was: Deprecate 
removed configs/interfaces/classes from Storm 2.0.0 to 1.x version line)

> Deprecate configs/interfaces/classes in 1.x line
> 
>
> Key: STORM-2964
> URL: https://issues.apache.org/jira/browse/STORM-2964
> Project: Apache Storm
>  Issue Type: Task
>  Components: storm-client
>Reporter: Jungtaek Lim
>Priority: Major
>
> This is to track the effort of deprecating, in the 1.x version line, the 
> configs/interfaces/classes removed in Storm 2.0.0. We may be late to the 
> party, but it is still better to annotate than to introduce the change suddenly.





[jira] [Commented] (STORM-2970) JCQueue can be full and will throw IllegalStateException

2018-03-19 Thread Roshan Naik (JIRA)

[ 
https://issues.apache.org/jira/browse/STORM-2970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16405773#comment-16405773
 ] 

Roshan Naik commented on STORM-2970:


[~ethanli] I think this issue was a side effect of the worker thread not being 
spun up in multi-worker mode with RAS enabled. Can we close this if it is no 
longer relevant?

> JCQueue can be full and will throw IllegalStateException
> 
>
> Key: STORM-2970
> URL: https://issues.apache.org/jira/browse/STORM-2970
> Project: Apache Storm
>  Issue Type: Bug
>Reporter: Ethan Li
>Priority: Major
>
> I ran 
> {code:java}
> storm jar /tmp/storm-loadgen-2.0.0-SNAPSHOT.jar  
> org.apache.storm.loadgen.ThroughputVsLatency --spouts 1 --splitters 2  
> --counters 1 --rate 1000 -c topology.debug=true
> {code}
> and the topology was not running properly. And some exception showed up in 
> the worker.log:
> {code:java}
> 2018-02-23 15:13:29.939 o.a.s.u.Utils Thread-15-spout-executor[7, 7] [ERROR] 
> Async loop died!
> java.lang.RuntimeException: java.lang.IllegalStateException: Queue full
> at org.apache.storm.executor.Executor.accept(Executor.java:288) 
> ~[storm-client-2.0.0-SNAPSHOT.jar:2.0.0-SNAPSHOT]
> at org.apache.storm.utils.JCQueue.consumeImpl(JCQueue.java:309) 
> ~[storm-client-2.0.0-SNAPSHOT.jar:2.0.0-SNAPSHOT]
> at org.apache.storm.utils.JCQueue.consume(JCQueue.java:290) 
> ~[storm-client-2.0.0-SNAPSHOT.jar:2.0.0-SNAPSHOT]
> at org.apache.storm.utils.JCQueue.consume(JCQueue.java:281) 
> ~[storm-client-2.0.0-SNAPSHOT.jar:2.0.0-SNAPSHOT]
> at 
> org.apache.storm.executor.spout.SpoutExecutor$2.call(SpoutExecutor.java:173) 
> ~[storm-client-2.0.0-SNAPSHOT.jar:2.0.0-SNAPSHOT]
> at 
> org.apache.storm.executor.spout.SpoutExecutor$2.call(SpoutExecutor.java:163) 
> ~[storm-client-2.0.0-SNAPSHOT.jar:2.0.0-SNAPSHOT]
> at org.apache.storm.utils.Utils$2.run(Utils.java:352) 
> [storm-client-2.0.0-SNAPSHOT.jar:2.0.0-SNAPSHOT]
> at java.lang.Thread.run(Thread.java:748) [?:1.8.0_131]
> Caused by: java.lang.IllegalStateException: Queue full
> at java.util.AbstractQueue.add(AbstractQueue.java:98) ~[?:1.8.0_131]
> at 
> org.apache.storm.daemon.worker.WorkerTransfer.tryTransferRemote(WorkerTransfer.java:112)
>  ~[storm-client-2.0.0-SNAPSHOT.jar:2.0.0-SNAPSHOT]
> at 
> org.apache.storm.daemon.worker.WorkerState.tryTransferRemote(WorkerState.java:526)
>  ~[storm-client-2.0.0-SNAPSHOT.jar:2.0.0-SNAPSHOT]
> at 
> org.apache.storm.executor.ExecutorTransfer.tryTransfer(ExecutorTransfer.java:71)
>  ~[storm-client-2.0.0-SNAPSHOT.jar:2.0.0-SNAPSHOT]
> at org.apache.storm.daemon.Task.sendUnanchored(Task.java:196) 
> ~[storm-client-2.0.0-SNAPSHOT.jar:2.0.0-SNAPSHOT]
> at 
> org.apache.storm.executor.spout.SpoutOutputCollectorImpl.sendSpoutMsg(SpoutOutputCollectorImpl.java:164)
>  ~[storm-client-2.0.0-SNAPSHOT.jar:2.0.0-SNAPSHOT]
> at 
> org.apache.storm.executor.spout.SpoutOutputCollectorImpl.emit(SpoutOutputCollectorImpl.java:75)
>  ~[storm-client-2.0.0-SNAPSHOT.jar:2.0.0-SNAPSHOT]
> at 
> org.apache.storm.spout.SpoutOutputCollector.emit(SpoutOutputCollector.java:51)
>  ~[storm-client-2.0.0-SNAPSHOT.jar:2.0.0-SNAPSHOT]
> at org.apache.storm.loadgen.LoadSpout.fail(LoadSpout.java:135) 
> ~[stormjar.jar:2.0.0-SNAPSHOT]
> at 
> org.apache.storm.executor.spout.SpoutExecutor.failSpoutMsg(SpoutExecutor.java:362)
>  ~[storm-client-2.0.0-SNAPSHOT.jar:2.0.0-SNAPSHOT]
> at 
> org.apache.storm.executor.spout.SpoutExecutor$1.expire(SpoutExecutor.java:126)
>  ~[storm-client-2.0.0-SNAPSHOT.jar:2.0.0-SNAPSHOT]
> at 
> org.apache.storm.executor.spout.SpoutExecutor$1.expire(SpoutExecutor.java:119)
>  ~[storm-client-2.0.0-SNAPSHOT.jar:2.0.0-SNAPSHOT]
> at org.apache.storm.utils.RotatingMap.rotate(RotatingMap.java:77) 
> ~[storm-client-2.0.0-SNAPSHOT.jar:2.0.0-SNAPSHOT]
> at 
> org.apache.storm.executor.spout.SpoutExecutor.tupleActionFn(SpoutExecutor.java:298)
>  ~[storm-client-2.0.0-SNAPSHOT.jar:2.0.0-SNAPSHOT]
> at org.apache.storm.executor.Executor.accept(Executor.java:284) 
> ~[storm-client-2.0.0-SNAPSHOT.jar:2.0.0-SNAPSHOT]
> ... 7 more
> {code}
> More log from the worker.log:
> https://gist.github.com/Ethanlm/bcab1289a3813af6d7135aa95b1aaa14
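The `IllegalStateException: Queue full` in the trace above is the standard JDK contract for bounded queues, which the trace shows being hit through `java.util.AbstractQueue.add`: `add` throws when the queue is at capacity, while `offer` just returns false. A minimal, Storm-independent illustration:

```java
import java.util.concurrent.ArrayBlockingQueue;

public class QueueFullDemo {
    public static void main(String[] args) {
        // Bounded queue with capacity 2
        ArrayBlockingQueue<Integer> q = new ArrayBlockingQueue<>(2);
        System.out.println(q.offer(1)); // prints "true"
        System.out.println(q.offer(2)); // prints "true"
        System.out.println(q.offer(3)); // prints "false" -- full, no exception
        try {
            q.add(3); // add() throws on a full bounded queue
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage()); // prints "Queue full"
        }
    }
}
```

This is why the non-throwing `offer`/`tryTransfer` style is the usual choice on a hot path: the caller gets a boolean it can react to (retry, back off) instead of an exception that kills the executor loop.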





[jira] [Commented] (STORM-2963) Updates to Performance.md

2018-02-25 Thread Roshan Naik (JIRA)

[ 
https://issues.apache.org/jira/browse/STORM-2963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16376431#comment-16376431
 ] 

Roshan Naik commented on STORM-2963:


This is a catch-all JIRA. As things come up I will keep accumulating them, and 
finally create a PR close to the 2.0 release.

> Updates to Performance.md 
> --
>
> Key: STORM-2963
> URL: https://issues.apache.org/jira/browse/STORM-2963
> Project: Apache Storm
>  Issue Type: Improvement
>Affects Versions: 2.0.0
>Reporter: Roshan Naik
>Assignee: Roshan Naik
>Priority: Major
> Fix For: 2.0.0
>
>






[jira] [Commented] (STORM-2964) Deprecate old Spout wait strategy model in 1.x version line

2018-02-20 Thread Roshan Naik (JIRA)

[ 
https://issues.apache.org/jira/browse/STORM-2964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16370880#comment-16370880
 ] 

Roshan Naik commented on STORM-2964:


[~kabhwan]

After STORM-2959 was marked as a duplicate of STORM-2947, I added my notes 
about all such deprecations (including this one) to STORM-2947 itself, so all 
deprecations are handled and documented uniformly.

> Deprecate old Spout wait strategy model in 1.x version line
> ---
>
> Key: STORM-2964
> URL: https://issues.apache.org/jira/browse/STORM-2964
> Project: Apache Storm
>  Issue Type: Task
>  Components: storm-client
>Reporter: Jungtaek Lim
>Priority: Major
>
> This is a follow-up issue for STORM-2958, deprecating the old Spout wait 
> strategy in the 1.x version line, while properly noting that it will be 
> removed in Storm 2.0.0.





[jira] [Created] (STORM-2963) Updates to Performance.md

2018-02-19 Thread Roshan Naik (JIRA)
Roshan Naik created STORM-2963:
--

 Summary: Updates to Performance.md 
 Key: STORM-2963
 URL: https://issues.apache.org/jira/browse/STORM-2963
 Project: Apache Storm
  Issue Type: Improvement
Affects Versions: 2.0.0
Reporter: Roshan Naik
Assignee: Roshan Naik
 Fix For: 2.0.0








[jira] [Commented] (STORM-2947) Review and fix/remove deprecated things in Storm 2.0.0

2018-02-19 Thread Roshan Naik (JIRA)

[ 
https://issues.apache.org/jira/browse/STORM-2947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16369615#comment-16369615
 ] 

Roshan Naik commented on STORM-2947:


*Notes:* 

*Settings to deprecate in 1.x :* As they have been removed in 2.x:
 - topology.sleep.spout.wait.strategy.time.ms

 - topology.backpressure.enable
 - backpressure.disruptor.high.watermark
 - backpressure.disruptor.low.watermark
 - backpressure.znode.timeout.secs
 - backpressure.znode.update.freq.secs
 - task.backpressure.poll.secs

 - topology.executor.receive.buffer.size: 1024 #batched
 - topology.executor.send.buffer.size: 1024 #individual messages
 - topology.transfer.buffer.size: 1024 # batched

 - topology.bolts.outgoing.overflow.buffer.enable: false
 - topology.disruptor.wait.timeout.millis: 1000
 - topology.disruptor.batch.size: 100
 - topology.disruptor.batch.timeout.millis:

*Classes to deprecate in 1.x:*
 - org.apache.storm.spout.SleepSpoutWaitStrategy
 - org.apache.storm.spout.ISpoutWaitStrategy
 - org.apache.storm.spout.NothingEmptyEmitStrategy

*Settings to remove in 2.x:*
 - nimbus.host
 - storm.messaging.netty.max_retries
 - storm.local.mode.zmq
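
A deprecation pass like the one listed above can be surfaced to users mechanically. The sketch below is hypothetical (not Storm's actual validation code) and seeds its key set from a few entries in the 1.x list above:

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

public class DeprecatedConfCheck {
    // Subset of the keys listed above; the checker itself is a hypothetical
    // sketch, not Storm's actual config-validation mechanism.
    static final Set<String> DEPRECATED_IN_1X = new HashSet<>(Arrays.asList(
            "topology.backpressure.enable",
            "topology.disruptor.batch.size",
            "topology.sleep.spout.wait.strategy.time.ms"));

    // Return the deprecated keys present in a topology conf, sorted for
    // stable warning output.
    static List<String> findDeprecated(Map<String, Object> conf) {
        return conf.keySet().stream()
                .filter(DEPRECATED_IN_1X::contains)
                .sorted()
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        Map<String, Object> conf = new HashMap<>();
        conf.put("topology.workers", 2);
        conf.put("topology.backpressure.enable", true);
        for (String k : findDeprecated(conf)) {
            System.out.println("WARN: conf key '" + k
                    + "' is deprecated in 1.x and removed in 2.x");
        }
    }
}
```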

> Review and fix/remove deprecated things in Storm 2.0.0
> --
>
> Key: STORM-2947
> URL: https://issues.apache.org/jira/browse/STORM-2947
> Project: Apache Storm
>  Issue Type: Task
>  Components: storm-client, storm-hdfs, storm-kafka, storm-server, 
> storm-solr
>Affects Versions: 2.0.0
>Reporter: Jungtaek Lim
>Assignee: Jungtaek Lim
>Priority: Major
>
> We've been deprecating things but haven't had time to replace or get rid of 
> them. It would be better if we took the time to review and address them.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (STORM-2947) Review and fix/remove deprecated things in Storm 2.0.0

2018-02-19 Thread Roshan Naik (JIRA)

[ 
https://issues.apache.org/jira/browse/STORM-2947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16369615#comment-16369615
 ] 

Roshan Naik edited comment on STORM-2947 at 2/20/18 12:39 AM:
--

A few things that I am aware of:

 

*Settings to deprecate in 1.x* (since they have been removed in 2.x):
 - topology.sleep.spout.wait.strategy.time.ms

 - topology.backpressure.enable
 - backpressure.disruptor.high.watermark
 - backpressure.disruptor.low.watermark
 - backpressure.znode.timeout.secs
 - backpressure.znode.update.freq.secs
 - task.backpressure.poll.secs

 - topology.executor.receive.buffer.size: 1024 #batched
 - topology.executor.send.buffer.size: 1024 #individual messages
 - topology.transfer.buffer.size: 1024 # batched

 - topology.bolts.outgoing.overflow.buffer.enable: false
 - topology.disruptor.wait.timeout.millis: 1000
 - topology.disruptor.batch.size: 100
 - topology.disruptor.batch.timeout.millis:

*Classes to deprecate in 1.x:*
 - org.apache.storm.spout.SleepSpoutWaitStrategy
 - org.apache.storm.spout.ISpoutWaitStrategy
 - org.apache.storm.spout.NothingEmptyEmitStrategy

*Settings to remove in 2.x:*
 - nimbus.host
 - storm.messaging.netty.max_retries
 - storm.local.mode.zmq


was (Author: roshan_naik):
*Notes:* 

*Settings to deprecate in 1.x :* As they have been removed in 2.x:
 - topology.sleep.spout.wait.strategy.time.ms

 - topology.backpressure.enable
 - backpressure.disruptor.high.watermark
 - backpressure.disruptor.low.watermark
 - backpressure.znode.timeout.secs
 - backpressure.znode.update.freq.secs
 - task.backpressure.poll.secs

 - topology.executor.receive.buffer.size: 1024 #batched
 - topology.executor.send.buffer.size: 1024 #individual messages
 - topology.transfer.buffer.size: 1024 # batched

 - topology.bolts.outgoing.overflow.buffer.enable: false
 - topology.disruptor.wait.timeout.millis: 1000
 - topology.disruptor.batch.size: 100
 - topology.disruptor.batch.timeout.millis:

*Classes to deprecate in 1.x:*
 - org.apache.storm.spout.SleepSpoutWaitStrategy
 - org.apache.storm.spout.ISpoutWaitStrategy
 - org.apache.storm.spout.NothingEmptyEmitStrategy

*Settings to remove in 2.x:*
 - nimbus.host
 - storm.messaging.netty.max_retries
 - storm.local.mode.zmq

> Review and fix/remove deprecated things in Storm 2.0.0
> --
>
> Key: STORM-2947
> URL: https://issues.apache.org/jira/browse/STORM-2947
> Project: Apache Storm
>  Issue Type: Task
>  Components: storm-client, storm-hdfs, storm-kafka, storm-server, 
> storm-solr
>Affects Versions: 2.0.0
>Reporter: Jungtaek Lim
>Assignee: Jungtaek Lim
>Priority: Major
>
> We've been deprecating things but haven't had time to replace or get rid of 
> them. It would be better if we took the time to review and address them.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (STORM-2959) Remove deprecated, old/unused config settings for 2.0

2018-02-18 Thread Roshan Naik (JIRA)
Roshan Naik created STORM-2959:
--

 Summary: Remove deprecated, old/unused config settings for 2.0
 Key: STORM-2959
 URL: https://issues.apache.org/jira/browse/STORM-2959
 Project: Apache Storm
  Issue Type: Improvement
Reporter: Roshan Naik


Three settings that can be removed:
 - nimbus.host
 - storm.messaging.netty.max_retries
 - storm.local.mode.zmq



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (STORM-2958) Use standard wait strategies for Spout as well

2018-02-18 Thread Roshan Naik (JIRA)

 [ 
https://issues.apache.org/jira/browse/STORM-2958?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Roshan Naik updated STORM-2958:
---
Description: 
STORM-2306 introduced a new configurable wait strategy system for these 
situations
 * BackPressure Wait (used by spout & bolt)
 * No incoming data (used by bolt)

There is another wait situation in the spout when there are no emits generated 
in a nextTuple() or if max.spout.pending has been reached. This Jira is to 
transition the spout wait strategy from the old model to the new model. Thereby 
we have a uniform model for dealing with wait strategies.

  was:
STORM-2306 introduced a new configurable wait strategy system for these 
situations
 * BackPressure Wait (used by spout & bolt)
 * No incoming data (used by bolt)

There is another wait situation in the spout when there are no emits generated 
in a nextTuple() or if max.spout.pending has been reached. This Jira is to 
transition the spout wait strategy from the old model to the new model. Thereby 
we have the uniform model for dealing with wait strategies.


> Use standard wait strategies for Spout as well
> --
>
> Key: STORM-2958
> URL: https://issues.apache.org/jira/browse/STORM-2958
> Project: Apache Storm
>  Issue Type: Improvement
>  Components: storm-client
>Affects Versions: 2.0.0
>Reporter: Roshan Naik
>Assignee: Roshan Naik
>Priority: Major
> Fix For: 2.0.0
>
>
> STORM-2306 introduced a new configurable wait strategy system for these 
> situations
>  * BackPressure Wait (used by spout & bolt)
>  * No incoming data (used by bolt)
> There is another wait situation in the spout when there are no emits 
> generated in a nextTuple() or if max.spout.pending has been reached. This 
> Jira is to transition the spout wait strategy from the old model to the new 
> model. Thereby we have a uniform model for dealing with wait strategies.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (STORM-2958) Use standard wait strategies for Spout as well

2018-02-18 Thread Roshan Naik (JIRA)

 [ 
https://issues.apache.org/jira/browse/STORM-2958?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Roshan Naik updated STORM-2958:
---
Description: 
STORM-2306 introduced a new configurable wait strategy system for these 
situations
 * BackPressure Wait (used by spout & bolt)
 * No incoming data (used by bolt)

There is another wait situation in the spout when there are no emits generated 
in a nextTuple() or if max.spout.pending has been reached. This Jira is to 
transition the spout wait strategy from the old model to the new model. Thereby 
we have the uniform model for dealing with wait strategies.

  was:
STORM-2306 introduced a new configurable wait strategy system for these 
situations
 * BackPressure Wait (used by spout & bolt)
 * No incoming data (used by bolt)

There is another wait situation in the spout when there are no emits generated 
in a nextTuple() or if max.spout.pending has been reached. This Jira is to 
transition the spout wait strategy from the old model to the new model.


> Use standard wait strategies for Spout as well
> --
>
> Key: STORM-2958
> URL: https://issues.apache.org/jira/browse/STORM-2958
> Project: Apache Storm
>  Issue Type: Improvement
>  Components: storm-client
>Affects Versions: 2.0.0
>Reporter: Roshan Naik
>Assignee: Roshan Naik
>Priority: Major
> Fix For: 2.0.0
>
>
> STORM-2306 introduced a new configurable wait strategy system for these 
> situations
>  * BackPressure Wait (used by spout & bolt)
>  * No incoming data (used by bolt)
> There is another wait situation in the spout when there are no emits 
> generated in a nextTuple() or if max.spout.pending has been reached. This 
> Jira is to transition the spout wait strategy from the old model to the new 
> model. Thereby we have the uniform model for dealing with wait strategies.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (STORM-2958) Use standard wait strategies for Spout as well

2018-02-18 Thread Roshan Naik (JIRA)

 [ 
https://issues.apache.org/jira/browse/STORM-2958?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Roshan Naik updated STORM-2958:
---
Description: 
STORM-2306 introduced a new configurable wait strategy system for these 
situations
 * BackPressure Wait (used by spout & bolt)
 * No incoming data (used by bolt)

There is another wait situation in the spout when there are no emits generated 
in a nextTuple() or if max.spout.pending has been reached. This Jira is to 
transition the spout wait strategy from the old model to the new model.

  was:
STORM-2306 introduced a new configurable wait strategy system for these 
situations
 * BackPressure Wait (used by spout & bolt)
 * No incoming data (used by bolt)

There is another wait situation in the spout when there are no emits generated 
in a nextTuple() or if max.spout.pending has been reached. This Jira is 
transition the spout wait strategy from the old model to the new model.


> Use standard wait strategies for Spout as well
> --
>
> Key: STORM-2958
> URL: https://issues.apache.org/jira/browse/STORM-2958
> Project: Apache Storm
>  Issue Type: Improvement
>  Components: storm-client
>Affects Versions: 2.0.0
>Reporter: Roshan Naik
>Assignee: Roshan Naik
>Priority: Major
> Fix For: 2.0.0
>
>
> STORM-2306 introduced a new configurable wait strategy system for these 
> situations
>  * BackPressure Wait (used by spout & bolt)
>  * No incoming data (used by bolt)
> There is another wait situation in the spout when there are no emits 
> generated in a nextTuple() or if max.spout.pending has been reached. This 
> Jira is to transition the spout wait strategy from the old model to the new 
> model.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (STORM-2958) Use standard wait strategies for Spout as well

2018-02-18 Thread Roshan Naik (JIRA)
Roshan Naik created STORM-2958:
--

 Summary: Use standard wait strategies for Spout as well
 Key: STORM-2958
 URL: https://issues.apache.org/jira/browse/STORM-2958
 Project: Apache Storm
  Issue Type: Improvement
  Components: storm-client
Affects Versions: 2.0.0
Reporter: Roshan Naik
Assignee: Roshan Naik
 Fix For: 2.0.0


STORM-2306 introduced a new configurable wait strategy system for these 
situations
 * BackPressure Wait (used by spout & bolt)
 * No incoming data (used by bolt)

There is another wait situation in the spout when there are no emits generated 
in a nextTuple() or if max.spout.pending has been reached. This Jira is to 
transition the spout wait strategy from the old model to the new model.
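A wait strategy of the kind discussed here is typically a progressive backoff: spin briefly, then yield, then park. A generic sketch of that idea (names and signatures are illustrative, not Storm's actual wait-strategy interface):

```java
import java.util.concurrent.locks.LockSupport;

// Illustrative progressive-backoff wait strategy: the caller invokes
// idle() repeatedly while there is nothing to do, passing an
// incrementing counter; the strategy escalates from spinning to
// yielding to parking. Not Storm's actual API.
public class ProgressiveWaitStrategy {
    private final int spinLimit;   // iterations of busy-spin
    private final int yieldLimit;  // iterations of Thread.yield()
    private final long parkNanos;  // park duration once past both limits

    public ProgressiveWaitStrategy(int spinLimit, int yieldLimit, long parkNanos) {
        this.spinLimit = spinLimit;
        this.yieldLimit = yieldLimit;
        this.parkNanos = parkNanos;
    }

    /** Waits a little and returns the next idle counter value. */
    public int idle(int idleCounter) {
        if (idleCounter < spinLimit) {
            Thread.onSpinWait();                   // cheapest: stay hot on the CPU
        } else if (idleCounter < spinLimit + yieldLimit) {
            Thread.yield();                        // give up the time slice
        } else {
            LockSupport.parkNanos(parkNanos);      // back off fully
        }
        return idleCounter + 1;
    }
}
```

The counter resets to zero whenever work arrives, so a busy spout pays almost nothing while an idle one gradually releases the CPU.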



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Closed] (STORM-2310) Handling Back Pressure

2018-02-17 Thread Roshan Naik (JIRA)

 [ 
https://issues.apache.org/jira/browse/STORM-2310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Roshan Naik closed STORM-2310.
--
Resolution: Won't Fix

> Handling Back Pressure
> --
>
> Key: STORM-2310
> URL: https://issues.apache.org/jira/browse/STORM-2310
> Project: Apache Storm
>  Issue Type: Sub-task
>  Components: storm-core
>Affects Versions: 2.0.0
>Reporter: Roshan Naik
>Assignee: Roshan Naik
>Priority: Major
>
> Design Doc:
> https://docs.google.com/document/d/1btmpBpFeEl-bh1uhQ_W4Ao-cQK7muNBYbXXMwuc4YaU/edit?usp=sharing



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Reopened] (STORM-2310) Handling Back Pressure

2018-02-17 Thread Roshan Naik (JIRA)

 [ 
https://issues.apache.org/jira/browse/STORM-2310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Roshan Naik reopened STORM-2310:


> Handling Back Pressure
> --
>
> Key: STORM-2310
> URL: https://issues.apache.org/jira/browse/STORM-2310
> Project: Apache Storm
>  Issue Type: Sub-task
>  Components: storm-core
>Affects Versions: 2.0.0
>Reporter: Roshan Naik
>Assignee: Roshan Naik
>Priority: Major
>
> Design Doc:
> https://docs.google.com/document/d/1btmpBpFeEl-bh1uhQ_W4Ao-cQK7muNBYbXXMwuc4YaU/edit?usp=sharing



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Closed] (STORM-2310) Handling Back Pressure

2018-02-17 Thread Roshan Naik (JIRA)

 [ 
https://issues.apache.org/jira/browse/STORM-2310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Roshan Naik closed STORM-2310.
--
Resolution: Fixed
  Assignee: Roshan Naik

The basic backpressure model is implemented as part of STORM-2306. 

The design doc here proposes an extension to the basic model that allows for 
buffering a much larger backlog of messages during BP situations. My experience 
with STORM-2306 suggests the basic BP model with shorter queues is preferable 
to this extended model, for the following reasons:
 * Having more messages waiting in deeper queues only worsens the amount of 
time it takes for the messages to make it through all the bolts, and thus 
increases latency (spout ->...-> terminal bolt) with no added throughput 
benefit.
 * When a worker crashes: 
 ** If ACKing is disabled, it leads to a bigger loss of messages.
 ** If ACKing is enabled, the amount of reprocessing required is quite high.

Consequently, I am changing my mind and closing this JIRA.
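The latency point follows from Little's law: at a fixed throughput, end-to-end latency grows linearly with the number of tuples queued in the topology. A back-of-the-envelope check (the numbers below are hypothetical):

```java
// Little's law: L = lambda * W, hence W = L / lambda.
// With throughput (lambda) capped by the slowest bolt, deeper queues
// (larger L) only add wait time W; they do not raise throughput.
public class LittlesLaw {
    /** Average time a tuple spends queued, in seconds. */
    static double latencySecs(long tuplesInFlight, double throughputPerSec) {
        return tuplesInFlight / throughputPerSec;
    }

    public static void main(String[] args) {
        double tput = 100_000;                             // tuples/sec (hypothetical)
        System.out.println(latencySecs(1_024, tput));      // short queues: ~0.01 s
        System.out.println(latencySecs(1_000_000, tput));  // deep queues:  10 s
    }
}
```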

> Handling Back Pressure
> --
>
> Key: STORM-2310
> URL: https://issues.apache.org/jira/browse/STORM-2310
> Project: Apache Storm
>  Issue Type: Sub-task
>  Components: storm-core
>Affects Versions: 2.0.0
>Reporter: Roshan Naik
>Assignee: Roshan Naik
>Priority: Major
>
> Design Doc:
> https://docs.google.com/document/d/1btmpBpFeEl-bh1uhQ_W4Ao-cQK7muNBYbXXMwuc4YaU/edit?usp=sharing



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (STORM-2945) Nail down and document how to support background emits in Spouts and Bolts

2018-02-13 Thread Roshan Naik (JIRA)
Roshan Naik created STORM-2945:
--

 Summary: Nail down and document how to support background emits in 
Spouts and Bolts
 Key: STORM-2945
 URL: https://issues.apache.org/jira/browse/STORM-2945
 Project: Apache Storm
  Issue Type: Bug
Reporter: Roshan Naik
 Fix For: 2.0.0






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (STORM-2944) Eliminate these deprecated methods in Storm 3

2018-02-09 Thread Roshan Naik (JIRA)

[ 
https://issues.apache.org/jira/browse/STORM-2944?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16359164#comment-16359164
 ] 

Roshan Naik commented on STORM-2944:


[~revans2] could you add a 3.0 version in Jira so that this JIRA can be 
associated with it?

> Eliminate these deprecated methods in Storm 3
> -
>
> Key: STORM-2944
> URL: https://issues.apache.org/jira/browse/STORM-2944
> Project: Apache Storm
>  Issue Type: Bug
>  Components: storm-client
>Reporter: Roshan Naik
>Priority: Major
>
> In Storm 2 we deprecated these methods. These methods were retained to allow 
> storm 2.x nimbus & supervisor to manage older Storm 1.x workers.  
> These are the methods in IStormClusterState.java 
> {code}
> /** @deprecated: In Storm 2.0. Retained for enabling transition from 1.x. 
> Will be removed soon. */
> @Deprecated
>  boolean topologyBackpressure(String stormId, long timeoutMs, Runnable 
> callback);
> /** @deprecated: In Storm 2.0. Retained for enabling transition from 1.x. 
> Will be removed soon. */
>  @Deprecated
>  void setupBackpressure(String stormId);
> /** @deprecated: In Storm 2.0. Retained for enabling transition from 1.x. 
> Will be removed soon. */
>  @Deprecated
>  void removeBackpressure(String stormId);
> /** @deprecated: In Storm 2.0. Retained for enabling transition from 1.x. 
> Will be removed soon. */
>  @Deprecated
>  void removeWorkerBackpressure(String stormId, String node, Long port);
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (STORM-2306) Redesign Messaging Subsystem and switch to JCTools Queues

2018-01-30 Thread Roshan Naik (JIRA)

[ 
https://issues.apache.org/jira/browse/STORM-2306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16346283#comment-16346283
 ] 

Roshan Naik commented on STORM-2306:


Is it possible to have it in both places?

> Redesign Messaging Subsystem and switch to JCTools Queues
> -
>
> Key: STORM-2306
> URL: https://issues.apache.org/jira/browse/STORM-2306
> Project: Apache Storm
>  Issue Type: Sub-task
>  Components: storm-core
>Affects Versions: 2.0.0
>Reporter: Roshan Naik
>Assignee: Roshan Naik
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 51h
>  Remaining Estimate: 0h
>
> Details in these documents:
> 1) *Redesign of the messaging subsystem*
> https://docs.google.com/document/d/1NK1DJ3aAkta-Im0m-2FObQ4cSRp8xSa301y6zoqcBeE/edit?usp=sharing
> This doc discusses the new design for the messaging system. Plus some of the 
> optimizations being made.
> 2) *Choosing a high performance messaging queue:*
> https://docs.google.com/document/d/1PpQaWVHg06-OqxTzYxQlzg1yEhzA4Y46_NC7HMO6tsI/edit?usp=sharing
> This doc looks into how fast hardware can do inter-thread messaging and why 
> we chose the JCTools queues.
> 3) *Backpressure Model*
> https://docs.google.com/document/d/1Z9pRdI5wtnK-hVwE3Spe6VGCTsz9g8TkgxbTFcbL3jM/edit?usp=sharing
> Describes the Backpressure model integrated into the new messaging subsystem.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (STORM-2871) Performance optimizations for getOutgoingTasks

2017-12-28 Thread Roshan Naik (JIRA)

 [ 
https://issues.apache.org/jira/browse/STORM-2871?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Roshan Naik updated STORM-2871:
---
Component/s: storm-client

> Performance optimizations for getOutgoingTasks 
> ---
>
> Key: STORM-2871
> URL: https://issues.apache.org/jira/browse/STORM-2871
> Project: Apache Storm
>  Issue Type: Improvement
>  Components: storm-client
>Affects Versions: 2.0.0
>Reporter: Roshan Naik
>
> Task.getOutgoingTasks() is in critical messaging path. Two observed 
> bottlenecks in it :
> - Looking up HashMap 'streamToGroupers'. Need to look into converting the 
> HashMap into an array lookup.
> - 
> [outTasks.addAll(compTasks)|https://github.com/apache/storm/blob/master/storm-client/src/jvm/org/apache/storm/daemon/Task.java#L139]
>   seems to be impacting throughput as well. Identified by running 
> ConstSpoutNullBoltTopo with spout & bolt parallelism of 1 (no ACK) and 
> replacing this line with hard-coded logic to add the single known bolt's 
> taskID. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (STORM-2871) Performance optimizations for getOutgoingTasks

2017-12-28 Thread Roshan Naik (JIRA)

 [ 
https://issues.apache.org/jira/browse/STORM-2871?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Roshan Naik updated STORM-2871:
---
Affects Version/s: 2.0.0

> Performance optimizations for getOutgoingTasks 
> ---
>
> Key: STORM-2871
> URL: https://issues.apache.org/jira/browse/STORM-2871
> Project: Apache Storm
>  Issue Type: Improvement
>  Components: storm-client
>Affects Versions: 2.0.0
>Reporter: Roshan Naik
>
> Task.getOutgoingTasks() is in critical messaging path. Two observed 
> bottlenecks in it :
> - Looking up HashMap 'streamToGroupers'. Need to look into converting the 
> HashMap into an array lookup.
> - 
> [outTasks.addAll(compTasks)|https://github.com/apache/storm/blob/master/storm-client/src/jvm/org/apache/storm/daemon/Task.java#L139]
>   seems to be impacting throughput as well. Identified by running 
> ConstSpoutNullBoltTopo with spout & bolt parallelism of 1 (no ACK) and 
> replacing this line with hard-coded logic to add the single known bolt's 
> taskID. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (STORM-2871) Performance optimizations for getOutgoingTasks

2017-12-28 Thread Roshan Naik (JIRA)

 [ 
https://issues.apache.org/jira/browse/STORM-2871?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Roshan Naik updated STORM-2871:
---
Description: 
Task.getOutgoingTasks() is in critical messaging path. Two observed bottlenecks 
in it :

- [Looking up HashMap|] 'streamToGroupers'. Need to look into converting 
HashMap into Array lookup ?
- 
[outTasks.addAll(compTasks)|https://github.com/apache/storm/blob/master/storm-client/src/jvm/org/apache/storm/daemon/Task.java#L139]
  seems to be impacting throughput as well. Identified by running 
ConstSpoutNullBoltTopo with spout & bolt parallelism of 1 (no ACK) and replacing 
this line with hard-coded logic to add the single known bolt's taskID. 

  was:
Task.getOutgoingTasks() is in critical messaging path. Two observed bottlenecks 
in it :

-[Looking up HashMap|] 'streamToGroupers'. Need to look into converting HashMap 
into Array lookup ?
- 
[outTasks.addAll(compTasks)|https://github.com/apache/storm/blob/master/storm-client/src/jvm/org/apache/storm/daemon/Task.java#L139]
  seems to be impacting throughput as well. Identified by .. running 
ConstSpoutNullBoltTopo with 1 spout & bolt paralllelism (no Ack) and replacing 
this line with hard coded logic to add the single known bolt's taskID. 


> Performance optimizations for getOutgoingTasks 
> ---
>
> Key: STORM-2871
> URL: https://issues.apache.org/jira/browse/STORM-2871
> Project: Apache Storm
>  Issue Type: Improvement
>Reporter: Roshan Naik
>
> Task.getOutgoingTasks() is in critical messaging path. Two observed 
> bottlenecks in it :
> - [Looking up HashMap|] 'streamToGroupers'. Need to look into converting 
> HashMap into Array lookup ?
> - 
> [outTasks.addAll(compTasks)|https://github.com/apache/storm/blob/master/storm-client/src/jvm/org/apache/storm/daemon/Task.java#L139]
>   seems to be impacting throughput as well. Identified by running 
> ConstSpoutNullBoltTopo with spout & bolt parallelism of 1 (no ACK) and 
> replacing this line with hard-coded logic to add the single known bolt's 
> taskID. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (STORM-2871) Performance optimizations for getOutgoingTasks

2017-12-28 Thread Roshan Naik (JIRA)

 [ 
https://issues.apache.org/jira/browse/STORM-2871?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Roshan Naik updated STORM-2871:
---
Description: 
Task.getOutgoingTasks() is in critical messaging path. Two observed bottlenecks 
in it :

- Looking up HashMap 'streamToGroupers'. Need to look into converting the 
HashMap into an array lookup.
- 
[outTasks.addAll(compTasks)|https://github.com/apache/storm/blob/master/storm-client/src/jvm/org/apache/storm/daemon/Task.java#L139]
  seems to be impacting throughput as well. Identified by running 
ConstSpoutNullBoltTopo with spout & bolt parallelism of 1 (no ACK) and replacing 
this line with hard-coded logic to add the single known bolt's taskID. 

  was:
Task.getOutgoingTasks() is in critical messaging path. Two observed bottlenecks 
in it :

- [Looking up HashMap|] 'streamToGroupers'. Need to look into converting 
HashMap into Array lookup ?
- 
[outTasks.addAll(compTasks)|https://github.com/apache/storm/blob/master/storm-client/src/jvm/org/apache/storm/daemon/Task.java#L139]
  seems to be impacting throughput as well. Identified by .. running 
ConstSpoutNullBoltTopo with 1 spout & bolt paralllelism (no Ack) and replacing 
this line with hard coded logic to add the single known bolt's taskID. 


> Performance optimizations for getOutgoingTasks 
> ---
>
> Key: STORM-2871
> URL: https://issues.apache.org/jira/browse/STORM-2871
> Project: Apache Storm
>  Issue Type: Improvement
>Reporter: Roshan Naik
>
> Task.getOutgoingTasks() is in critical messaging path. Two observed 
> bottlenecks in it :
> - Looking up HashMap 'streamToGroupers'. Need to look into converting the 
> HashMap into an array lookup.
> - 
> [outTasks.addAll(compTasks)|https://github.com/apache/storm/blob/master/storm-client/src/jvm/org/apache/storm/daemon/Task.java#L139]
>   seems to be impacting throughput as well. Identified by running 
> ConstSpoutNullBoltTopo with spout & bolt parallelism of 1 (no ACK) and 
> replacing this line with hard-coded logic to add the single known bolt's 
> taskID. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (STORM-2871) Performance optimizations for getOutgoingTasks

2017-12-28 Thread Roshan Naik (JIRA)
Roshan Naik created STORM-2871:
--

 Summary: Performance optimizations for getOutgoingTasks 
 Key: STORM-2871
 URL: https://issues.apache.org/jira/browse/STORM-2871
 Project: Apache Storm
  Issue Type: Improvement
Reporter: Roshan Naik


Task.getOutgoingTasks() is in critical messaging path. Two observed bottlenecks 
in it :

- Looking up HashMap 'streamToGroupers'. Need to look into converting the 
HashMap into an array lookup.
- 
[outTasks.addAll(compTasks)|https://github.com/apache/storm/blob/master/storm-client/src/jvm/org/apache/storm/daemon/Task.java#L139]
  seems to be impacting throughput as well. Identified by running 
ConstSpoutNullBoltTopo with spout & bolt parallelism of 1 (no ACK) and replacing 
this line with hard-coded logic to add the single known bolt's taskID. 
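The first bottleneck can be sketched concretely: if the set of stream ids is known when the task is prepared, the per-emit HashMap lookup can be replaced by a dense array indexed by a precomputed stream index. The classes below are illustrative, not Storm's actual implementation:

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of replacing a per-emit HashMap lookup with an array lookup.
// At prepare time, assign each known stream id a dense int index; on
// the hot path, look up groupers by index instead of by string key.
public class StreamIndexer {
    private final Map<String, Integer> streamToIndex = new HashMap<>();
    private final Object[] groupersByIndex;

    public StreamIndexer(String[] streamIds, Object[] groupers) {
        for (int i = 0; i < streamIds.length; i++) {
            streamToIndex.put(streamIds[i], i);  // one-time setup cost
        }
        this.groupersByIndex = groupers;
    }

    /** Slow path: resolve a stream id to its dense index once, up front. */
    public int indexOf(String streamId) {
        return streamToIndex.get(streamId);
    }

    /** Hot path: a plain array load, no hashing or Integer boxing. */
    public Object grouperFor(int streamIndex) {
        return groupersByIndex[streamIndex];
    }
}
```

The win comes from moving the string hash and equality check out of the per-tuple path into prepare().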



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (STORM-2306) Redesign Messaging Subsystem and switch to JCTools Queues

2017-12-06 Thread Roshan Naik (JIRA)

[ 
https://issues.apache.org/jira/browse/STORM-2306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16280106#comment-16280106
 ] 

Roshan Naik commented on STORM-2306:


Since the PR webpage has become terribly slow due to the large number of 
comments, moving the non-code-review type conversation to the JIRA. 

The PR is ready to move forward with additional reviews and testing by anyone 
interested in test driving their topologies on this (much appreciated!).

- I believe all the key issues raised so far have been addressed; please take a 
look. The PR also includes fixes for issues discovered during testing and perf runs.
- Added a new design doc detailing the backpressure model. That is the 
part that has undergone the most change of late.
- Based on my observations from perf runs and prior feedback on the PR, the new 
defaults have been adjusted so that existing workloads can transition 
with minimal or no tuning and still get good perf.
- My colleague is in the process of running some perf numbers comparing master 
vs the latest 2306. Will share them soon.

cc: [~revans2] 

> Redesign Messaging Subsystem and switch to JCTools Queues
> -
>
> Key: STORM-2306
> URL: https://issues.apache.org/jira/browse/STORM-2306
> Project: Apache Storm
>  Issue Type: Sub-task
>  Components: storm-core
>Affects Versions: 2.0.0
>Reporter: Roshan Naik
>Assignee: Roshan Naik
>  Labels: pull-request-available
>  Time Spent: 33h
>  Remaining Estimate: 0h
>
> Details in these documents:
> 1) *Redesign of the messaging subsystem*
> https://docs.google.com/document/d/1NK1DJ3aAkta-Im0m-2FObQ4cSRp8xSa301y6zoqcBeE/edit?usp=sharing
> This doc discusses the new design for the messaging system. Plus some of the 
> optimizations being made.
> 2) *Choosing a high performance messaging queue:*
> https://docs.google.com/document/d/1PpQaWVHg06-OqxTzYxQlzg1yEhzA4Y46_NC7HMO6tsI/edit?usp=sharing
> This doc looks into how fast hardware can do inter-thread messaging and why 
> we chose the JCTools queues.
> 3) *Backpressure Model*
> https://docs.google.com/document/d/1Z9pRdI5wtnK-hVwE3Spe6VGCTsz9g8TkgxbTFcbL3jM/edit?usp=sharing
> Describes the Backpressure model integrated into the new messaging subsystem.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (STORM-2306) Redesign Messaging Subsystem and switch to JCTools Queues

2017-12-06 Thread Roshan Naik (JIRA)

 [ 
https://issues.apache.org/jira/browse/STORM-2306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Roshan Naik updated STORM-2306:
---
Description: 
Details in these documents:

1) *Redesign of the messaging subsystem*
https://docs.google.com/document/d/1NK1DJ3aAkta-Im0m-2FObQ4cSRp8xSa301y6zoqcBeE/edit?usp=sharing
This doc discusses the new design for the messaging system. Plus some of the 
optimizations being made.

2) *Choosing a high performance messaging queue:*
https://docs.google.com/document/d/1PpQaWVHg06-OqxTzYxQlzg1yEhzA4Y46_NC7HMO6tsI/edit?usp=sharing
This doc looks into how fast hardware can do inter-thread messaging and why we 
chose the JCTools queues.

3) *Backpressure Model*
https://docs.google.com/document/d/1Z9pRdI5wtnK-hVwE3Spe6VGCTsz9g8TkgxbTFcbL3jM/edit?usp=sharing
Describes the Backpressure model integrated into the new messaging subsystem.

  was:
Details in these documents:

1) *Redesign of the messaging subsystem*
https://docs.google.com/document/d/1NK1DJ3aAkta-Im0m-2FObQ4cSRp8xSa301y6zoqcBeE/edit?usp=sharing
This doc discusses the new design for the messaging system. Plus some of the 
optimizations being made.

2) *Choosing a high performance messaging queue:*
https://docs.google.com/document/d/1PpQaWVHg06-OqxTzYxQlzg1yEhzA4Y46_NC7HMO6tsI/edit?usp=sharing
This doc looks into how fast hardware can do inter-thread messaging and why we 
chose the JCTools queues.

3) *BackPressure Model*
https://docs.google.com/document/d/1Z9pRdI5wtnK-hVwE3Spe6VGCTsz9g8TkgxbTFcbL3jM/edit?usp=sharing


> Redesign Messaging Subsystem and switch to JCTools Queues
> -
>
> Key: STORM-2306
> URL: https://issues.apache.org/jira/browse/STORM-2306
> Project: Apache Storm
>  Issue Type: Sub-task
>  Components: storm-core
>Affects Versions: 2.0.0
>Reporter: Roshan Naik
>Assignee: Roshan Naik
>  Labels: pull-request-available
>  Time Spent: 33h
>  Remaining Estimate: 0h
>
> Details in these documents:
> 1) *Redesign of the messaging subsystem*
> https://docs.google.com/document/d/1NK1DJ3aAkta-Im0m-2FObQ4cSRp8xSa301y6zoqcBeE/edit?usp=sharing
> This doc discusses the new design for the messaging system. Plus some of the 
> optimizations being made.
> 2) *Choosing a high performance messaging queue:*
> https://docs.google.com/document/d/1PpQaWVHg06-OqxTzYxQlzg1yEhzA4Y46_NC7HMO6tsI/edit?usp=sharing
> This doc looks into how fast hardware can do inter-thread messaging and why 
> we chose the JCTools queues.
> 3) *Backpressure Model*
> https://docs.google.com/document/d/1Z9pRdI5wtnK-hVwE3Spe6VGCTsz9g8TkgxbTFcbL3jM/edit?usp=sharing
> Describes the Backpressure model integrated into the new messaging subsystem.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (STORM-2306) Redesign Messaging Subsystem and switch to JCTools Queues

2017-12-06 Thread Roshan Naik (JIRA)

 [ 
https://issues.apache.org/jira/browse/STORM-2306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Roshan Naik updated STORM-2306:
---
Description: 
Details in these documents:

1) *Redesign of the messaging subsystem*
https://docs.google.com/document/d/1NK1DJ3aAkta-Im0m-2FObQ4cSRp8xSa301y6zoqcBeE/edit?usp=sharing
This doc discusses the new design for the messaging system. Plus some of the 
optimizations being made.

2) *Choosing a high performance messaging queue:*
https://docs.google.com/document/d/1PpQaWVHg06-OqxTzYxQlzg1yEhzA4Y46_NC7HMO6tsI/edit?usp=sharing
This doc looks into how fast hardware can do inter-thread messaging and why we 
chose the JCTools queues.

3) *BackPressure Model*
https://docs.google.com/document/d/1Z9pRdI5wtnK-hVwE3Spe6VGCTsz9g8TkgxbTFcbL3jM/edit?usp=sharing

  was:
Details in these documents:

1) *Redesign of the messaging subsystem*
https://docs.google.com/document/d/1NK1DJ3aAkta-Im0m-2FObQ4cSRp8xSa301y6zoqcBeE/edit?usp=sharing
This doc discusses the new design for the messaging system. Plus some of the 
optimizations being made.

2) *Choosing a high performance messaging queue:*
https://docs.google.com/document/d/1PpQaWVHg06-OqxTzYxQlzg1yEhzA4Y46_NC7HMO6tsI/edit?usp=sharing
This doc looks into how fast hardware can do inter-thread messaging and why we 
chose the JCTools queues.



> Redesign Messaging Subsystem and switch to JCTools Queues
> -
>
> Key: STORM-2306
> URL: https://issues.apache.org/jira/browse/STORM-2306
> Project: Apache Storm
>  Issue Type: Sub-task
>  Components: storm-core
>Affects Versions: 2.0.0
>Reporter: Roshan Naik
>Assignee: Roshan Naik
>  Labels: pull-request-available
>  Time Spent: 33h
>  Remaining Estimate: 0h
>
> Details in these documents:
> 1) *Redesign of the messaging subsystem*
> https://docs.google.com/document/d/1NK1DJ3aAkta-Im0m-2FObQ4cSRp8xSa301y6zoqcBeE/edit?usp=sharing
> This doc discusses the new design for the messaging system. Plus some of the 
> optimizations being made.
> 2) *Choosing a high performance messaging queue:*
> https://docs.google.com/document/d/1PpQaWVHg06-OqxTzYxQlzg1yEhzA4Y46_NC7HMO6tsI/edit?usp=sharing
> This doc looks into how fast hardware can do inter-thread messaging and why 
> we chose the JCTools queues.
> 3) *BackPressure Model*
> https://docs.google.com/document/d/1Z9pRdI5wtnK-hVwE3Spe6VGCTsz9g8TkgxbTFcbL3jM/edit?usp=sharing



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Comment Edited] (STORM-2796) Flux: Provide means for invoking static factory methods

2017-11-06 Thread Roshan Naik (JIRA)

[ 
https://issues.apache.org/jira/browse/STORM-2796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16238504#comment-16238504
 ] 

Roshan Naik edited comment on STORM-2796 at 11/7/17 12:09 AM:
--

Yaml looks fine. Would be nice to make sure that  its works with this type of 
method signatures as well:

{code}
public static MyComponent newInstance(Integer x, SomeObject... variadic)
{code}

Basically be able to pass a mix of objects and arrays/variadics as args. I 
think the ability to pass array args is done via an undocumented feature called 
`reflist`. Just want to make sure it is supported for factory methods as well.

Which reminds me that it would be good to update the docs on 'reflist'  while 
updating the docs about factory methods.



was (Author: roshan_naik):
Yaml looks fine. Would be nice to make sure that  its works with this type of 
method signatures as well:

{code}
public static MyComponent newInstance(Integer x, SomeObject... variadic)
{code}

Basically be able to pass a mix of objects and arrays/variadics as args. I 
think the ability to pass array args is done via an undocumented feature called 
`reflist`. Just want to make sure it it supported for factory methods as well.

Which reminds me that it would be good to update the docs on 'reflist'  while 
updating the docs about factor methods.


> Flux: Provide means for invoking static factory methods
> ---
>
> Key: STORM-2796
> URL: https://issues.apache.org/jira/browse/STORM-2796
> Project: Apache Storm
>  Issue Type: Improvement
>  Components: Flux
>Affects Versions: 2.0.0, 1.1.1, 1.2.0, 1.0.6
>Reporter: P. Taylor Goetz
>Assignee: P. Taylor Goetz
>
> Provide a means to invoke static factory methods for flux components. E.g:
> Java signature:
> {code}
> public static MyComponent newInstance(String... params)
> {code}
> Yaml:
> {code}
> className: "org.apache.storm.flux.test.MyComponent"
> factory: "newInstance"
> factoryArgs: ["a", "b", "c"]
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (STORM-2796) Flux: Provide means for invoking static factory methods

2017-11-03 Thread Roshan Naik (JIRA)

[ 
https://issues.apache.org/jira/browse/STORM-2796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16238504#comment-16238504
 ] 

Roshan Naik commented on STORM-2796:


Yaml looks fine. Would be nice to make sure that  its works with this type of 
method signatures as well:

{code}
public static MyComponent newInstance(Integer x, SomeObject... variadic)
{code}

Basically be able to pass a mix of objects and arrays/variadics as args. I 
think the ability to pass array args is done via an undocumented feature called 
`reflist`. Just want to make sure it it supported for factory methods as well.

Which reminds me that it would be good to update the docs on 'reflist'  while 
updating the docs about factor methods.


> Flux: Provide means for invoking static factory methods
> ---
>
> Key: STORM-2796
> URL: https://issues.apache.org/jira/browse/STORM-2796
> Project: Apache Storm
>  Issue Type: Improvement
>  Components: Flux
>Affects Versions: 2.0.0, 1.1.1, 1.2.0, 1.0.6
>Reporter: P. Taylor Goetz
>Assignee: P. Taylor Goetz
>Priority: Normal
>
> Provide a means to invoke static factory methods for flux components. E.g:
> Java signature:
> {code}
> public static MyComponent newInstance(String... params)
> {code}
> Yaml:
> {code}
> className: "org.apache.storm.flux.test.MyComponent"
> factory: "newInstance"
> factoryArgs: ["a", "b", "c"]
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (STORM-2306) Redeisgn Messaging Subsystem and switch to JCTools Queues

2017-07-24 Thread Roshan Naik (JIRA)

 [ 
https://issues.apache.org/jira/browse/STORM-2306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Roshan Naik updated STORM-2306:
---
Description: 
Details in these documents:

1) *Redesign of the messaging subsystem*
https://docs.google.com/document/d/1NK1DJ3aAkta-Im0m-2FObQ4cSRp8xSa301y6zoqcBeE/edit?usp=sharing
This doc discusses the new design for the messaging system. Plus some of the 
optimizations being made.

2) *Choosing a high performance messaging queue:*
https://docs.google.com/document/d/1PpQaWVHg06-OqxTzYxQlzg1yEhzA4Y46_NC7HMO6tsI/edit?usp=sharing
This doc looks into how fast hardware can do inter-thread messaging and why we 
chose the JCTools queues.


  was:
Details in these documents:

1) *Redesign of the messaging subsystem*
https://docs.google.com/document/d/1NK1DJ3aAkta-Im0m-2FObQ4cSRp8xSa301y6zoqcBeE/edit#heading=h.59lnwp28s0q6
This doc discusses the new design for the messaging system. Plus some of the 
optimizations being made.

2) *Choosing a high performance messaging queue:*
https://docs.google.com/document/d/1PpQaWVHg06-OqxTzYxQlzg1yEhzA4Y46_NC7HMO6tsI/edit?usp=sharing
This doc looks into how fast hardware can do inter-thread messaging and why we 
chose the JCTools queues.



> Redeisgn Messaging Subsystem and switch to JCTools Queues
> -
>
> Key: STORM-2306
> URL: https://issues.apache.org/jira/browse/STORM-2306
> Project: Apache Storm
>  Issue Type: Sub-task
>  Components: storm-core
>Affects Versions: 2.0.0
>Reporter: Roshan Naik
>Assignee: Roshan Naik
>
> Details in these documents:
> 1) *Redesign of the messaging subsystem*
> https://docs.google.com/document/d/1NK1DJ3aAkta-Im0m-2FObQ4cSRp8xSa301y6zoqcBeE/edit?usp=sharing
> This doc discusses the new design for the messaging system. Plus some of the 
> optimizations being made.
> 2) *Choosing a high performance messaging queue:*
> https://docs.google.com/document/d/1PpQaWVHg06-OqxTzYxQlzg1yEhzA4Y46_NC7HMO6tsI/edit?usp=sharing
> This doc looks into how fast hardware can do inter-thread messaging and why 
> we chose the JCTools queues.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (STORM-2306) Redeisgn Messaging Subsystem and switch to JCTools Queues

2017-07-24 Thread Roshan Naik (JIRA)

 [ 
https://issues.apache.org/jira/browse/STORM-2306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Roshan Naik updated STORM-2306:
---
Description: 
Details in these documents:

1) *Redesign of the messaging subsystem*
https://docs.google.com/document/d/1NK1DJ3aAkta-Im0m-2FObQ4cSRp8xSa301y6zoqcBeE/edit#heading=h.59lnwp28s0q6
This doc discusses the new design for the messaging system. Plus some of the 
optimizations being made.

2) *Choosing a high performance messaging queue:*
https://docs.google.com/document/d/1PpQaWVHg06-OqxTzYxQlzg1yEhzA4Y46_NC7HMO6tsI/edit?usp=sharing
This doc looks into how fast hardware can do inter-thread messaging and why we 
chose the JCTools queues.


  was:
Details in these documents:

1) *Redesign of the messaging subsystem*
https://docs.google.com/document/d/1NK1DJ3aAkta-Im0m-2FObQ4cSRp8xSa301y6zoqcBeE/edit#heading=h.59lnwp28s0q6
This doc discusses the new design for the messaging system. Plus some of the 
optimizations being made.

2) *Choosing a high performance messaging queue:*
https://docs.google.com/document/d/1PpQaWVHg06-OqxTzYxQlzg1yEhzA4Y46_NC7HMO6tsI/edit?usp=sharing
This doc looks into how fast hardware can we can inter-thread messaging and why 
we chose the JCTools queues.



> Redeisgn Messaging Subsystem and switch to JCTools Queues
> -
>
> Key: STORM-2306
> URL: https://issues.apache.org/jira/browse/STORM-2306
> Project: Apache Storm
>  Issue Type: Sub-task
>  Components: storm-core
>Affects Versions: 2.0.0
>Reporter: Roshan Naik
>Assignee: Roshan Naik
>
> Details in these documents:
> 1) *Redesign of the messaging subsystem*
> https://docs.google.com/document/d/1NK1DJ3aAkta-Im0m-2FObQ4cSRp8xSa301y6zoqcBeE/edit#heading=h.59lnwp28s0q6
> This doc discusses the new design for the messaging system. Plus some of the 
> optimizations being made.
> 2) *Choosing a high performance messaging queue:*
> https://docs.google.com/document/d/1PpQaWVHg06-OqxTzYxQlzg1yEhzA4Y46_NC7HMO6tsI/edit?usp=sharing
> This doc looks into how fast hardware can do inter-thread messaging and why 
> we chose the JCTools queues.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (STORM-2647) Reduce Number of Threads running in the Worker

2017-07-19 Thread Roshan Naik (JIRA)

 [ 
https://issues.apache.org/jira/browse/STORM-2647?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Roshan Naik updated STORM-2647:
---
Description: 
Below is an account of all the threads in a single worker running a topology 
with 1 spout instance, 1 bolt instance, and 1 acker bolt instance. Topology 
debugging feature was disabled as it brings in additional threads.

Total 34 threads:  
10 timer threads
2 Curator threads
2 ZooKeeper threads
4 Netty threads
7 Disruptor thread
1 Spout executor thread
2 worker thread
1 Back Pressure thread
4 thread - unclear what these are
1 Finalizer thread



- Would be good to collapse the Timer threads into 
SingleThreadScheduledExecutor.
- We have 4 threads to communicate with ZK (2curator + 2ZK). If necessary can 
we eliminate curator to accomplish that ?
- Similarly see if we can reduce the number of Netty threads.
- How many threads does topology debugging need, can we reduce that ?
- Same with event logging (topology.eventlogger.executors).
- The Back pressure and Disruptor threads will be reduced by STORM-2306 

  was:
Below is an account of all the threads in a single worker running a topology 
with 1 spout instance, 1 bolt instance, and 1 acker bolt instance. Topology 
debugging feature was disabled as it brings in additional threads.

Total 34 threads:  
10 timer threads
2 Curator threads
2 ZooKeeper threads
4 Netty threads
7 Disruptor thread
1 Spout executor thread
2 worker thread
1 Back Pressure thread
4 thread - unclear what these are
1 Finalizer thread



- Would be good to collapse the Timer threads into 
SingleThreadScheduledExecutor.
- We have 4 threads to communicate with ZK (2curator + 2ZK). If necessary can 
we eliminate curator to accomplish that ?
- Similarly see if we can reduce the number of Netty threads.
- How many threads does topology debugging need, can we reduce that ?
- The Back pressure and Disruptor threads will be reduced by STORM-2306 


> Reduce Number of Threads running in the Worker
> --
>
> Key: STORM-2647
> URL: https://issues.apache.org/jira/browse/STORM-2647
> Project: Apache Storm
>  Issue Type: Sub-task
>  Components: storm-core
>Reporter: Roshan Naik
>Assignee: Priyank Shah
>
> Below is an account of all the threads in a single worker running a topology 
> with 1 spout instance, 1 bolt instance, and 1 acker bolt instance. Topology 
> debugging feature was disabled as it brings in additional threads.
> Total 34 threads:  
> 10 timer threads
> 2 Curator threads
> 2 ZooKeeper threads
> 4 Netty threads
> 7 Disruptor thread
> 1 Spout executor thread
> 2 worker thread
> 1 Back Pressure thread
> 4 thread - unclear what these are
> 1 Finalizer thread
> - Would be good to collapse the Timer threads into 
> SingleThreadScheduledExecutor.
> - We have 4 threads to communicate with ZK (2curator + 2ZK). If necessary can 
> we eliminate curator to accomplish that ?
> - Similarly see if we can reduce the number of Netty threads.
> - How many threads does topology debugging need, can we reduce that ?
> - Same with event logging (topology.eventlogger.executors).
> - The Back pressure and Disruptor threads will be reduced by STORM-2306 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (STORM-2647) Reduce Number of Threads running in the Worker

2017-07-19 Thread Roshan Naik (JIRA)

[ 
https://issues.apache.org/jira/browse/STORM-2647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16093917#comment-16093917
 ] 

Roshan Naik commented on STORM-2647:


all yours ! thanks for volunteering.

> Reduce Number of Threads running in the Worker
> --
>
> Key: STORM-2647
> URL: https://issues.apache.org/jira/browse/STORM-2647
> Project: Apache Storm
>  Issue Type: Sub-task
>  Components: storm-core
>Reporter: Roshan Naik
>Assignee: Priyank Shah
>
> Below is an account of all the threads in a single worker running a topology 
> with 1 spout instance, 1 bolt instance, and 1 acker bolt instance. Topology 
> debugging feature was disabled as it brings in additional threads.
> Total 34 threads:  
> 10 timer threads
> 2 Curator threads
> 2 ZooKeeper threads
> 4 Netty threads
> 7 Disruptor thread
> 1 Spout executor thread
> 2 worker thread
> 1 Back Pressure thread
> 4 thread - unclear what these are
> 1 Finalizer thread
> - Would be good to collapse the Timer threads into 
> SingleThreadScheduledExecutor.
> - We have 4 threads to communicate with ZK (2curator + 2ZK). If necessary can 
> we eliminate curator to accomplish that ?
> - Similarly see if we can reduce the number of Netty threads.
> - How many threads does topology debugging need, can we reduce that ?
> - The Back pressure and Disruptor threads will be reduced by STORM-2306 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (STORM-2647) Reduce Number of Threads running in the Worker

2017-07-19 Thread Roshan Naik (JIRA)
Roshan Naik created STORM-2647:
--

 Summary: Reduce Number of Threads running in the Worker
 Key: STORM-2647
 URL: https://issues.apache.org/jira/browse/STORM-2647
 Project: Apache Storm
  Issue Type: Sub-task
Reporter: Roshan Naik


Below is an account of all the threads in a single worker running a topology 
with 1 spout instance, 1 bolt instance, and 1 acker bolt instance. Topology 
debugging feature was disabled as it brings in additional threads.

Total 34 threads:  
10 timer threads
2 Curator threads
2 ZooKeeper threads
4 Netty threads
7 Disruptor thread
1 Spout executor thread
2 worker thread
1 Back Pressure thread
4 thread - unclear what these are
1 Finalizer thread



- Would be good to collapse the Timer threads into 
SingleThreadScheduledExecutor.
- We have 4 threads to communicate with ZK (2curator + 2ZK). If necessary can 
we eliminate curator to accomplish that ?
- Similarly see if we can reduce the number of Netty threads.
- How many threads does topology debugging need, can we reduce that ?
- The Back pressure and Disruptor threads will be reduced by STORM-2306 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (STORM-2359) Revising Message Timeouts

2017-06-09 Thread Roshan Naik (JIRA)

[ 
https://issues.apache.org/jira/browse/STORM-2359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16045053#comment-16045053
 ] 

Roshan Naik commented on STORM-2359:


{quote} The current implementation has a pending map in both the acker and the 
spout, which rotate every topology.message.timeout.secs.{quote}
Need to see if we can eliminate the timeout logic from the spout and have it 
only the ACKer (i can think of some issues). If we must retain that logic in 
the spouts, the timeout value that it operates on (full tuple tree processing) 
would have to be separated from the timeout value that the ACKER uses to track 
progress between stages. 

{quote}The spout then reemitted the expired tuples, and they got into the queue 
behind their own expired instances. {quote}

Perfect example indeed. The motivation of this jira is to try to 
eliminate/mitigate triggering of timeouts for queued/inflight tuples that are 
not lost. The only time we need timeouts/remits to be triggered is when 
one/more tuples in the tuple tree are truly lost. *I think* that can only 
happen if a worker/bolt/spout died. So the case your describing should not 
happen if we solve this problem correctly.
 
IMO, the ideal solution would have the spouts remit only the specific tuples 
whose tuple trees had some loss due to a worker going down. I am not yet 
certain whether/not this initial idea described in the doc is the optimal 
solution. Perhaps a better way is to trigger such re-emits only if a 
worker/bolt/spout went down.

> Revising Message Timeouts
> -
>
> Key: STORM-2359
> URL: https://issues.apache.org/jira/browse/STORM-2359
> Project: Apache Storm
>  Issue Type: Sub-task
>  Components: storm-core
>Affects Versions: 2.0.0
>Reporter: Roshan Naik
>
> A revised strategy for message timeouts is proposed here.
> Design Doc:
>  
> https://docs.google.com/document/d/1am1kO7Wmf17U_Vz5_uyBB2OuSsc4TZQWRvbRhX52n5w/edit?usp=sharing



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (STORM-2359) Revising Message Timeouts

2017-06-07 Thread Roshan Naik (JIRA)

[ 
https://issues.apache.org/jira/browse/STORM-2359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16041851#comment-16041851
 ] 

Roshan Naik commented on STORM-2359:


{quote} In order to reduce load on the spout, we should try to "bundle up" 
these resets in the acker bolts before sending them to the spout {quote}
The resets are managed internally by the ACKer bolt. The spout only gets 
notified if the timeout expires or if tuple-tree is fully processed. 

{quote}When tuples are stuck in queues behind other tuples{quote}
That case would be more accurately classified as "progress is being made" ... 
but slower than expected.
The case of 'progress is not being made' is when a worker that is processing 
part of the tuple tree dies.

{quote}If a bolt is taking longer to process a tuple than expected, it can be 
solved in the concrete bolt implementation by using 
OutputCollector.resetTimeout at an appropriate interval (e.g. the tuple timeout 
minus a few seconds).{quote}
I think you are trying to mitigate the number of resets being sent to Spout... 
but like i mentioned before reset are never sent to spout.



> Revising Message Timeouts
> -
>
> Key: STORM-2359
> URL: https://issues.apache.org/jira/browse/STORM-2359
> Project: Apache Storm
>  Issue Type: Sub-task
>  Components: storm-core
>Affects Versions: 2.0.0
>Reporter: Roshan Naik
>
> A revised strategy for message timeouts is proposed here.
> Design Doc:
>  
> https://docs.google.com/document/d/1am1kO7Wmf17U_Vz5_uyBB2OuSsc4TZQWRvbRhX52n5w/edit?usp=sharing



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Comment Edited] (STORM-1949) Backpressure can cause spout to stop emitting and stall topology

2017-05-02 Thread Roshan Naik (JIRA)

[ 
https://issues.apache.org/jira/browse/STORM-1949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15993879#comment-15993879
 ] 

Roshan Naik edited comment on STORM-1949 at 5/2/17 10:05 PM:
-

Sorry I have not been able to put time on this. Looks like some users on the 
mailing list are also running into this issue in production.  I am unable to 
give this time for at least a couple weeks, but I dont want to be a blocker to 
getting this fix into 1.x. I have shared the topology here in the jira, would 
be great if we got some help on this ? [~abellina] or [~zhuoliu] would you be 
able to give it a shot ?


was (Author: roshan_naik):
Sorry I have not been able to put time on this. Looks like some users on the 
mailing list are also running into this issue in production.  I am unable to 
give this sometime for at least a couple weeks, but I dont want to be a blocker 
to getting this fix into 1.x. I have shared the topology here in the jira, 
would be great if we got some help on this ? [~abellina] or [~zhuoliu] would 
you be able to give it a shot ?

> Backpressure can cause spout to stop emitting and stall topology
> 
>
> Key: STORM-1949
> URL: https://issues.apache.org/jira/browse/STORM-1949
> Project: Apache Storm
>  Issue Type: Bug
>Reporter: Roshan Naik
>Assignee: Alessandro Bellina
> Attachments: 1.x-branch-works-perfect.png, wordcounttopo.zip
>
>
> Problem can be reproduced by this [Word count 
> topology|https://github.com/hortonworks/storm/blob/perftopos1.x/examples/storm-starter/src/jvm/org/apache/storm/starter/perf/FileReadWordCountTopo.java]
> within a IDE.
> I ran it with 1 spout instance, 2 splitter bolt instances, 2 counter bolt 
> instances.
> The problem is more easily reproduced with WC topology as it causes an 
> explosion of tuples due to splitting a sentence tuple into word tuples. As 
> the bolts have to process more tuples than the  spout is producing, spout 
> needs to operate slower.
> The amount of time it takes for the topology to stall can vary.. but 
> typically under 10 mins. 
> *My theory:*  I suspect there is a race condition in the way ZK is being 
> utilized to enable/disable back pressure. When congested (i.e pressure 
> exceeds high water mark), the bolt's worker records this congested situation 
> in ZK by creating a node. Once the congestion is reduced below the low water 
> mark, it deletes this node. 
> The spout's worker has setup a watch on the parent node, expecting a callback 
> whenever there is change in the child nodes. On receiving the callback the 
> spout's worker lists the parent node to check if there are 0 or more child 
> nodes it is essentially trying to figure out the nature of state change 
> in ZK to determine whether to throttle or not. Subsequently  it setsup 
> another watch in ZK to keep an eye on future changes.
> When there are multiple bolts, there can be rapid creation/deletion of these 
> ZK nodes. Between the time the worker receives a callback and sets up the 
> next watch.. many changes may have undergone in ZK which will go unnoticed by 
> the spout. 
> The condition that the bolts are no longer congested may not get noticed as a 
> result of this.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (STORM-1949) Backpressure can cause spout to stop emitting and stall topology

2017-05-02 Thread Roshan Naik (JIRA)

[ 
https://issues.apache.org/jira/browse/STORM-1949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15993879#comment-15993879
 ] 

Roshan Naik commented on STORM-1949:


Sorry I have not been able to put time on this. Looks like some users on the 
mailing list are also running into this issue in production.  I am unable to 
give this sometime for at least a couple weeks, but I dont want to be a blocker 
to getting this fix into 1.x. I have shared the topology here in the jira, 
would be great if we got some help on this ? [~abellina] or [~zhuoliu] would 
you be able to give it a shot ?

> Backpressure can cause spout to stop emitting and stall topology
> 
>
> Key: STORM-1949
> URL: https://issues.apache.org/jira/browse/STORM-1949
> Project: Apache Storm
>  Issue Type: Bug
>Reporter: Roshan Naik
>Assignee: Alessandro Bellina
> Attachments: 1.x-branch-works-perfect.png, wordcounttopo.zip
>
>
> Problem can be reproduced by this [Word count 
> topology|https://github.com/hortonworks/storm/blob/perftopos1.x/examples/storm-starter/src/jvm/org/apache/storm/starter/perf/FileReadWordCountTopo.java]
> within a IDE.
> I ran it with 1 spout instance, 2 splitter bolt instances, 2 counter bolt 
> instances.
> The problem is more easily reproduced with WC topology as it causes an 
> explosion of tuples due to splitting a sentence tuple into word tuples. As 
> the bolts have to process more tuples than the  spout is producing, spout 
> needs to operate slower.
> The amount of time it takes for the topology to stall can vary.. but 
> typically under 10 mins. 
> *My theory:*  I suspect there is a race condition in the way ZK is being 
> utilized to enable/disable back pressure. When congested (i.e pressure 
> exceeds high water mark), the bolt's worker records this congested situation 
> in ZK by creating a node. Once the congestion is reduced below the low water 
> mark, it deletes this node. 
> The spout's worker has setup a watch on the parent node, expecting a callback 
> whenever there is change in the child nodes. On receiving the callback the 
> spout's worker lists the parent node to check if there are 0 or more child 
> nodes it is essentially trying to figure out the nature of state change 
> in ZK to determine whether to throttle or not. Subsequently  it setsup 
> another watch in ZK to keep an eye on future changes.
> When there are multiple bolts, there can be rapid creation/deletion of these 
> ZK nodes. Between the time the worker receives a callback and sets up the 
> next watch.. many changes may have undergone in ZK which will go unnoticed by 
> the spout. 
> The condition that the bolts are no longer congested may not get noticed as a 
> result of this.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (STORM-2306) Redeisgn Messaging Subsystem and switch to JCTools Queues

2017-04-20 Thread Roshan Naik (JIRA)

 [ 
https://issues.apache.org/jira/browse/STORM-2306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Roshan Naik updated STORM-2306:
---
Summary: Redeisgn Messaging Subsystem and switch to JCTools Queues  (was: 
Switch to JCTools for internal queueing)

> Redeisgn Messaging Subsystem and switch to JCTools Queues
> -
>
> Key: STORM-2306
> URL: https://issues.apache.org/jira/browse/STORM-2306
> Project: Apache Storm
>  Issue Type: Sub-task
>  Components: storm-core
>Affects Versions: 2.0.0
>Reporter: Roshan Naik
>Assignee: Roshan Naik
>
> Details in this document:
> https://docs.google.com/document/d/1PpQaWVHg06-OqxTzYxQlzg1yEhzA4Y46_NC7HMO6tsI/edit?usp=sharing



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (STORM-2284) Storm Worker Redesign

2017-03-28 Thread Roshan Naik (JIRA)

[ 
https://issues.apache.org/jira/browse/STORM-2284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15946420#comment-15946420
 ] 

Roshan Naik commented on STORM-2284:


Hmm... i think this will need some more thought into how that changes would be 
needed to the scheduler / elsewhere.

I am thinking .. each supervisor would need to know how many cores it is 
managing.. and each new worker gets assigned unused cores. At submit time, need 
to figure out where each spout and bolt instance goes... based on the groupings 
used.

Ya I/O threads over allocation is reasonable... we have two types of IO : 
outbound & inbound. Hard to say without experimenting and would depend on 
throughput being sustainedbut perhaps the outbound threads could be pinned 
to same core.. and inbound threads to another core.

If we decide to do memory mapped Queues for intra host messaging, then we have  
inter-host I/O and intra-host I/O... each with in/bound variations.

> Storm Worker Redesign
> -
>
> Key: STORM-2284
> URL: https://issues.apache.org/jira/browse/STORM-2284
> Project: Apache Storm
>  Issue Type: Umbrella
>  Components: storm-core
>Affects Versions: 2.0.0
>Reporter: Roshan Naik
>Assignee: Roshan Naik
>
> Much has been learnt from evolving the 1.x line. We can now use the benefit 
> of hindsight and apply these learnings into the future work on 2.x line. 
> The goal is to rethink the Worker to improve performance, enhance its 
> abilities and also retain compatibility.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (STORM-2416) Release Packaging Improvements

2017-03-21 Thread Roshan Naik (JIRA)

[ 
https://issues.apache.org/jira/browse/STORM-2416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15935673#comment-15935673
 ] 

Roshan Naik commented on STORM-2416:


Didnt notice this move of the perf module from top level to examples dir
Is there a reason for this ? 
I felt the perf belongs outside of the examples since it isn't really intended 
to be examples for users.
Perf module is intended to evolve for purposes of perf measurements and 
benchmarking... for now its mostly for Storm devs, but eventually it could be 
run by end users as well.

> Release Packaging Improvements
> --
>
> Key: STORM-2416
> URL: https://issues.apache.org/jira/browse/STORM-2416
> Project: Apache Storm
>  Issue Type: Improvement
>  Components: build
>Reporter: P. Taylor Goetz
>Assignee: P. Taylor Goetz
> Fix For: 1.1.0
>
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> This issue is to address distribution packaging improvements discussed on the 
> dev@ list:
> 1. Move remaining examples to "examples" directory.
> 2. Package examples as source-only, to be compiled by users
> 3. Remove connector jars from binary distribution (since they are available 
> in Maven, and we want to discourage users from hand-crafting topology jars)



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (STORM-2423) Join Bolt : Use explicit instead of default window anchoring of emitted tuples

2017-03-17 Thread Roshan Naik (JIRA)

 [ 
https://issues.apache.org/jira/browse/STORM-2423?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Roshan Naik updated STORM-2423:
---
Priority: Critical  (was: Major)

> Join Bolt : Use explicit instead of default window anchoring of emitted tuples
> --
>
> Key: STORM-2423
> URL: https://issues.apache.org/jira/browse/STORM-2423
> Project: Apache Storm
>  Issue Type: Bug
>Reporter: Roshan Naik
>Assignee: Roshan Naik
>Priority: Critical
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Default anchoring will anchor each emitted tuple to every tuple in current 
> window. This requires a very large numbers of ACKs from any downstream bolt.  
> If topology.debug is enabled, it also worsens the load on the system 
> significantly. 
> Letting the topo run in this mode (in particular with max.spout.pending 
> disabled), could lead to the worker running out of memory and crashing.
> Fix: Join Bolt should avoid using default window anchoring, and explicitly 
> anchor each emitted tuple with the exact matching tuples form each inputs 
> streams. This reduces the complexity of the tuple trees and consequently the 
> reduces burden on the ACKing & messaging subsystems. 



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Comment Edited] (STORM-2355) Storm-HDFS: inotify support

2017-03-17 Thread Roshan Naik (JIRA)

[ 
https://issues.apache.org/jira/browse/STORM-2355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15929183#comment-15929183
 ] 

Roshan Naik edited comment on STORM-2355 at 3/17/17 9:23 PM:
-

First of all... thanks for your work on this.

I took a look at the HDFS inotify side of things and also spoke to an HDFS 
committer. These over-arching concerns came up (partly alluded to earlier by 
you):

1. Currently inotify is restricted to HDFS admins because it doesn't scale 
(with respect to the namenode). So in its current state it seems unsuitable 
for an HDFS Spout kind of use case... even if we (unrealistically) asked 
users to run the Storm worker as an HDFS admin user.
2. The proposal for scaling inotify and opening it up to end users appears to 
have stalled for some time now.

Although we are seeing some improvements (that you noted) in resource 
utilization on the Storm side, it seems inadvisable from the HDFS namenode 
perspective. I think this feature in the HDFS Spout would be useful once a 
scalable inotify solution is made publicly available by HDFS.

The other option is to get this into Storm now and not use it till HDFS 
implements its scalable inotify. My concern with that is we can't bet with 
certainty that the final inotify will still work as we now expect it to 
(although the intent is there)... it may even change in an incompatible way.

Either way, it appears to be a feature that can't be used until the scalable 
inotify happens (if it happens).



> Storm-HDFS: inotify support
> ---
>
> Key: STORM-2355
> URL: https://issues.apache.org/jira/browse/STORM-2355
> Project: Apache Storm
>  Issue Type: New Feature
>  Components: storm-hdfs
>Reporter: Tibor Kiss
>Assignee: Tibor Kiss
> Fix For: 2.0.0, 1.1.0
>
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> This is a proposal to implement inotify based watch dir monitoring in 
> Storm-HDFS Spout.
> *Motivation*
> Storm-HDFS's HdfsSpout currently polls the Spout’s input directory using 
> Hadoop's {{FileSystem.listFiles()}}. This operation is expensive since it 
> returns the block locations and all stat information of the files inside the 
> watch directory. Moreover, HdfsSpout currently uses only one element of the 
> returned Path list, which is inefficient as the rest of the entries are 
> thrown away without processing.
> The proposed design provides greater efficiency through the inotify 
> interface and also enables easier extension of the original ({{listFiles()}} 
> based) monitoring with buffering (see Further work section below).
> *High level design*
> Goal is to leverage [HDFS inotify 
> API|http://hadoop.apache.org/docs/current/api//org/apache/hadoop/hdfs/DFSInotifyEventInputStream.html]
>  to monitor new file arrival to HdfsSpout's input directory.
> The inotify based monitoring is an addition to the original 
> {{FileSystem.listFiles()}} based implementation; the default behavior of the 
> spout will be unchanged by this modification.
> To unify the two monitoring methods and enable buffering, an iterator-based 
> class ({{HdfsDirectoryMonitor}}) is created.
> To retain backward compatibility, the HdfsSpout's default monitoring 
> behavior is unchanged; inotify based monitoring can be enabled through a 
> parameter.
> As inotify requires administrative privileges (see Caveat section below), a 
> fallback mechanism is implemented in HdfsSpout to use the original 
> {{listFiles()}} based monitoring if initialization fails for inotify based 
> monitoring.
> *Implementation 

[jira] [Assigned] (STORM-2307) Revised Threading and Execution Model

2017-03-16 Thread Roshan Naik (JIRA)

 [ 
https://issues.apache.org/jira/browse/STORM-2307?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Roshan Naik reassigned STORM-2307:
--

Assignee: Roshan Naik

> Revised Threading and Execution Model
> -
>
> Key: STORM-2307
> URL: https://issues.apache.org/jira/browse/STORM-2307
> Project: Apache Storm
>  Issue Type: Sub-task
>  Components: storm-core
>Affects Versions: 2.0.0
>Reporter: Roshan Naik
>Assignee: Roshan Naik
>
> Design Doc:
> https://docs.google.com/document/d/1PBGQomJQ67gsLR0CNZlYfVWjGyzAEJMsQjpKumuyHuQ/edit?usp=sharing





[jira] [Commented] (STORM-2355) Storm-HDFS: inotify support

2017-03-16 Thread Roshan Naik (JIRA)

[ 
https://issues.apache.org/jira/browse/STORM-2355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15929183#comment-15929183
 ] 

Roshan Naik commented on STORM-2355:


First of all... thanks for your work on this.

I took a look at the HDFS inotify side of things and also spoke to an HDFS 
committer. These over-arching concerns came up (partly alluded to earlier by 
you):

1. Currently inotify is restricted to HDFS admins because it doesn't scale 
(with respect to the namenode). So in its current state it seems unsuitable 
for an HDFS Spout kind of use case... even if we (unrealistically) asked 
users to run the Storm worker as an HDFS admin user.
2. The proposal for scaling inotify and opening it up to end users appears to 
have stalled for some time now.

Although we are seeing some improvements (that you noted) in resource 
utilization on the Storm side, it seems inadvisable from the HDFS namenode 
perspective. I think this feature in HDFS would be useful once a scalable 
inotify solution is made publicly available by HDFS.

The other option is to get this into Storm now and not use it till HDFS 
implements its scalable inotify. My concern with that is we can't bet with 
certainty that the final inotify will still work as we now expect it to 
(although the intent is there)... it may even change in an incompatible way.

Either way, it appears to be a feature that can't be used until the scalable 
inotify happens (if it happens).

> Storm-HDFS: inotify support
> ---
>
> Key: STORM-2355
> URL: https://issues.apache.org/jira/browse/STORM-2355
> Project: Apache Storm
>  Issue Type: New Feature
>  Components: storm-hdfs
>Reporter: Tibor Kiss
>Assignee: Tibor Kiss
> Fix For: 2.0.0, 1.1.0
>
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> This is a proposal to implement inotify based watch dir monitoring in 
> Storm-HDFS Spout.
> *Motivation*
> Storm-HDFS's HdfsSpout currently polls the Spout’s input directory using 
> Hadoop's {{FileSystem.listFiles()}}. This operation is expensive since it 
> returns the block locations and all stat information of the files inside the 
> watch directory. Moreover, HdfsSpout currently uses only one element of the 
> returned Path list, which is inefficient as the rest of the entries are 
> thrown away without processing.
> The proposed design provides greater efficiency through the inotify 
> interface and also enables easier extension of the original ({{listFiles()}} 
> based) monitoring with buffering (see Further work section below).
> *High level design*
> The goal is to leverage the [HDFS inotify 
> API|http://hadoop.apache.org/docs/current/api//org/apache/hadoop/hdfs/DFSInotifyEventInputStream.html]
>  to monitor new file arrivals in HdfsSpout's input directory.
> The inotify based monitoring is an addition to the original 
> {{FileSystem.listFiles()}} based implementation; the default behavior of the 
> spout will be unchanged by this modification.
> To unify the two monitoring methods and enable buffering, an iterator-based 
> class ({{HdfsDirectoryMonitor}}) is created.
> To retain backward compatibility, the HdfsSpout's default monitoring 
> behavior is unchanged; inotify based monitoring can be enabled through a 
> parameter.
> As inotify requires administrative privileges (see Caveat section below), a 
> fallback mechanism is implemented in HdfsSpout to use the original 
> {{listFiles()}} based monitoring if initialization fails for inotify based 
> monitoring.
> *Implementation details*
> As inotify provides only a delta of the filesystem events from a given Tx Id 
> (of the HDFS edit log), it is required to do a {{FileSystem.listFiles()}} 
> based collection during the Spout's initialization to ensure that any 
> left-over files are processed.
> The inotify based implementation uses HdfsAdmin's 
> [{{DFSInotifyEventInputStream.poll()}}|http://hadoop.apache.org/docs/current/api//org/apache/hadoop/hdfs/DFSInotifyEventInputStream.html#poll--]
>  method to fetch the list of new files created since the provided Tx Id and 
> buffer it in the {{newFileList}} buffer.
> During each {{HdfsSpout.nextTuple()}} call, one element is taken from the 
> {{newFileList}} buffer and processed by the spout.
> The {{newFileList}} buffer is extended with the result of a 
> {{DFSInotifyEventInputStream.poll(lastTxId)}} call in every nextTuple() call.
> Since HdfsSpout is able to create its own {{HdfsAdmin()}} instance, there 
> will be no need for the user to do additional initialization for the spout 
> even if inotify is enabled.
> *Caveat*
> HDFS inotify is currently available to the HDFS administrator user only, but 
> there is ongoing discussion in the Hadoop community to extend its support to 
> users. See: HDFS-8940
> *Further work*
> 1) The number of calls to {{DFSInotifyEventInputStream.poll(lastTxId)}} could 
> be further reduced if 
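The fallback behavior described in the proposal (use the inotify-style source if it initializes, otherwise fall back to the {{listFiles()}}-style source) can be sketched as a self-contained toy; both sources, the probe step, and the file names here are made up for illustration and are not the actual spout code.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Supplier;

public class MonitorFallbackDemo {
    /** Returns the preferred (inotify-style) file source if it initializes,
     *  otherwise the listing-based source -- mirroring the proposed fallback. */
    static Supplier<List<String>> choose(Supplier<List<String>> inotify,
                                         Supplier<List<String>> listing) {
        try {
            inotify.get();              // probe once; throws if e.g. not an HDFS admin
            return inotify;
        } catch (RuntimeException e) {
            return listing;             // original listFiles()-style monitoring
        }
    }

    public static void main(String[] args) {
        // Simulated inotify source that fails to initialize (admin-only restriction).
        Supplier<List<String>> inotify = () -> { throw new RuntimeException("admin only"); };
        // Simulated directory-listing source.
        Supplier<List<String>> listing = () -> Arrays.asList("/in/a.txt", "/in/b.txt");
        List<String> files = choose(inotify, listing).get();
        System.out.println(files);
    }
}
```

Because the inotify probe throws, the demo transparently falls back to the listing source, which is the backward-compatibility property the proposal aims for.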

[jira] [Assigned] (STORM-2391) HdfsSpoutTopology example needs to be moved into storm-hdfs-examples from storm-starter

2017-03-01 Thread Roshan Naik (JIRA)

 [ 
https://issues.apache.org/jira/browse/STORM-2391?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Roshan Naik reassigned STORM-2391:
--

Assignee: Roshan Naik

> HdfsSpoutTopology example needs to be moved into storm-hdfs-examples from 
> storm-starter
> ---
>
> Key: STORM-2391
> URL: https://issues.apache.org/jira/browse/STORM-2391
> Project: Apache Storm
>  Issue Type: Bug
>Affects Versions: 1.1.0
>Reporter: Roshan Naik
>Assignee: Roshan Naik
>






[jira] [Created] (STORM-2391) HdfsSpoutTopology example needs to be moved into storm-hdfs-examples from storm-starter

2017-03-01 Thread Roshan Naik (JIRA)
Roshan Naik created STORM-2391:
--

 Summary: HdfsSpoutTopology example needs to be moved into 
storm-hdfs-examples from storm-starter
 Key: STORM-2391
 URL: https://issues.apache.org/jira/browse/STORM-2391
 Project: Apache Storm
  Issue Type: Bug
Affects Versions: 1.1.0
Reporter: Roshan Naik








[jira] [Created] (STORM-2390) The storm-*-examples jars are missing in the binary distro

2017-03-01 Thread Roshan Naik (JIRA)
Roshan Naik created STORM-2390:
--

 Summary: The storm-*-examples jars are missing in the binary distro
 Key: STORM-2390
 URL: https://issues.apache.org/jira/browse/STORM-2390
 Project: Apache Storm
  Issue Type: Bug
Affects Versions: 1.1.0
Reporter: Roshan Naik
Priority: Critical
 Fix For: 1.1.0








[jira] [Created] (STORM-2389) Event Logger bolt is instantiated even if topology.eventlogger.executors=0

2017-03-01 Thread Roshan Naik (JIRA)
Roshan Naik created STORM-2389:
--

 Summary: Event Logger bolt is instantiated even if 
topology.eventlogger.executors=0
 Key: STORM-2389
 URL: https://issues.apache.org/jira/browse/STORM-2389
 Project: Apache Storm
  Issue Type: Bug
Affects Versions: 1.1.0
Reporter: Roshan Naik
Priority: Blocker
 Fix For: 1.1.0








[jira] [Commented] (STORM-2364) Ensure kafka-monitor jar is included in classpath for UI process

2017-02-17 Thread Roshan Naik (JIRA)

[ 
https://issues.apache.org/jira/browse/STORM-2364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15872761#comment-15872761
 ] 

Roshan Naik commented on STORM-2364:


[~kabhwan] Here you go...

{code}
2017-02-14 08:51:13.037 o.a.s.u.TopologySpoutLag [WARN] Exception thrown while 
getting lag for spout id: kafka-spout and spout class: 
org.apache.storm.kafka.KafkaSpout
2017-02-14 08:51:13.037 o.a.s.u.TopologySpoutLag [WARN] Exception 
message:Error: Could not find or load main class 
org.apache.storm.kafka.monitor.KafkaOffsetLagUtil
org.apache.storm.utils.ShellUtils$ExitCodeException: Error: Could not find or 
load main class org.apache.storm.kafka.monitor.KafkaOffsetLagUtil
at org.apache.storm.utils.ShellUtils.runCommand(ShellUtils.java:230)
at org.apache.storm.utils.ShellUtils.run(ShellUtils.java:160)
at 
org.apache.storm.utils.ShellUtils$ShellCommandExecutor.execute(ShellUtils.java:370)
at org.apache.storm.utils.ShellUtils.execCommand(ShellUtils.java:460)
at org.apache.storm.utils.ShellUtils.execCommand(ShellUtils.java:443)
at 
org.apache.storm.utils.TopologySpoutLag.getLagResultForKafka(TopologySpoutLag.java:153)
at 
org.apache.storm.utils.TopologySpoutLag.getLagResultForOldKafkaSpout(TopologySpoutLag.java:183)
at org.apache.storm.utils.TopologySpoutLag.lag(TopologySpoutLag.java:59)
at org.apache.storm.ui.core$topology_lag.invoke(core.clj:727)
at org.apache.storm.ui.core$fn__10090.invoke(core.clj:1073)
at 
org.apache.storm.shade.compojure.core$make_route$fn__1792.invoke(core.clj:93)
at 
org.apache.storm.shade.compojure.core$if_route$fn__1780.invoke(core.clj:39)
at 
org.apache.storm.shade.compojure.core$if_method$fn__1773.invoke(core.clj:24)
at 
org.apache.storm.shade.compojure.core$routing$fn__1798.invoke(core.clj:106)
at clojure.core$some.invoke(core.clj:2570)
at org.apache.storm.shade.compojure.core$routing.doInvoke(core.clj:106)
at clojure.lang.RestFn.applyTo(RestFn.java:139)
at clojure.core$apply.invoke(core.clj:632)
at 
org.apache.storm.shade.compojure.core$routes$fn__1802.invoke(core.clj:111)
at 
org.apache.storm.shade.ring.middleware.cors$wrap_cors$fn__9443.invoke(cors.clj:149)
at 
org.apache.storm.shade.ring.middleware.json$wrap_json_params$fn__9390.invoke(json.clj:56)
at 
org.apache.storm.shade.ring.middleware.multipart_params$wrap_multipart_params$fn__2954.invoke(multipart_params.clj:103)
at 
org.apache.storm.shade.ring.middleware.reload$wrap_reload$fn__8948.invoke(reload.clj:22)
at 
org.apache.storm.ui.helpers$requests_middleware$fn__3197.invoke(helpers.clj:46)
at org.apache.storm.ui.core$catch_errors$fn__10265.invoke(core.clj:1336)
at 
org.apache.storm.shade.ring.middleware.keyword_params$wrap_keyword_params$fn__2885.invoke(keyword_params.clj:27)
at 
org.apache.storm.shade.ring.middleware.nested_params$wrap_nested_params$fn__2925.invoke(nested_params.clj:65)
at 
org.apache.storm.shade.ring.middleware.params$wrap_params$fn__2856.invoke(params.clj:55)
at 
org.apache.storm.shade.ring.middleware.multipart_params$wrap_multipart_params$fn__2954.invoke(multipart_params.clj:103)
at 
org.apache.storm.shade.ring.middleware.flash$wrap_flash$fn__3140.invoke(flash.clj:14)
at 
org.apache.storm.shade.ring.middleware.session$wrap_session$fn__3128.invoke(session.clj:43)
at 
org.apache.storm.shade.ring.middleware.cookies$wrap_cookies$fn__3056.invoke(cookies.clj:160)
{code}

> Ensure kafka-monitor jar is included in classpath for UI process
> 
>
> Key: STORM-2364
> URL: https://issues.apache.org/jira/browse/STORM-2364
> Project: Apache Storm
>  Issue Type: Bug
>Affects Versions: 1.x
>Reporter: Roshan Naik
>Assignee: Roshan Naik
>
> The kafka-monitor jar has been moved into toollib/ and no longer features in 
> the UI process's classpath, leading to a ClassNotFoundException for the 
> KafkaOffsetLagUtil class... seen in ui.log.





[jira] [Created] (STORM-2364) Ensure kafka-monitor jar is included in classpath for UI process

2017-02-15 Thread Roshan Naik (JIRA)
Roshan Naik created STORM-2364:
--

 Summary: Ensure kafka-monitor jar is included in classpath for UI 
process
 Key: STORM-2364
 URL: https://issues.apache.org/jira/browse/STORM-2364
 Project: Apache Storm
  Issue Type: Bug
Affects Versions: 1.x
Reporter: Roshan Naik
Assignee: Roshan Naik


The kafka-monitor jar has been moved into toollib/ and no longer features in 
the UI process's classpath, leading to a ClassNotFoundException for the 
KafkaOffsetLagUtil class... seen in ui.log.





[jira] [Commented] (STORM-2358) Update storm hdfs spout to remove specific implementation handlings

2017-02-15 Thread Roshan Naik (JIRA)

[ 
https://issues.apache.org/jira/browse/STORM-2358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15868672#comment-15868672
 ] 

Roshan Naik commented on STORM-2358:


*Pt 2*
If you have a proposal to support aliasing without those additional entries, 
then that's worth discussing. But if you are lobbying to remove features over 
minor beautification / a little extra work, that sounds like the wrong 
attitude.

*Pt 3*
Really? Perhaps you can answer this question... how many fields are in a CSV 
file?

I feel you are going topsy-turvy over non-issues.

> Update storm hdfs spout to remove specific implementation handlings
> ---
>
> Key: STORM-2358
> URL: https://issues.apache.org/jira/browse/STORM-2358
> Project: Apache Storm
>  Issue Type: Improvement
>  Components: storm-hdfs
>Affects Versions: 1.x
>Reporter: Sachin Pasalkar
>  Labels: newbie
> Attachments: AbstractFileReader.java, FileReader.java, 
> HDFSSpout.java, SequenceFileReader.java, TextFileReader.java
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> I was looking at the storm-hdfs spout code in the 1.x branch, and I found 
> the below improvements can be made:
>   1.  Make org.apache.storm.hdfs.spout.AbstractFileReader public so
> that it can be used in generics.
>   2.  org.apache.storm.hdfs.spout.HdfsSpout requires readerType as a
> String. It would be great to have a Class readerType, so we will not use
> Class.forName in multiple places; it will also help with the point below.
>   3.  HdfsSpout also needs to provide outFields, which are declared as
> constants in each reader (e.g. SequenceFileReader). We can have an abstract
> API in AbstractFileReader which returns them to the user to make it generic.





[jira] [Commented] (STORM-2358) Update storm hdfs spout to remove specific implementation handlings

2017-02-14 Thread Roshan Naik (JIRA)

[ 
https://issues.apache.org/jira/browse/STORM-2358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15867316#comment-15867316
 ] 

Roshan Naik commented on STORM-2358:


*Pt 2.*
Switching to Class won't make things any more generic than what we already 
have. It will make things less user-friendly, as we can't have aliases for 
built-ins anymore, which are desirable IMO. It would be nice if we could have 
aliases for custom types as well... but we can't, as that requires a 
registration step. This isn't a good reason to change a public API and lose 
the aliasing ability at the same time. `ZippedTextFileReader` should add an 
alias if it is going to be a built-in.
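The alias-or-FQCN resolution being defended here can be sketched as follows; the alias names and the alias table itself are assumptions for illustration, not the spout's actual configuration.

```java
import java.util.HashMap;
import java.util.Map;

public class ReaderTypeResolution {
    // Hypothetical alias table for built-in readers; real entries may differ.
    private static final Map<String, String> BUILT_IN_ALIASES = new HashMap<>();
    static {
        BUILT_IN_ALIASES.put("text", "org.apache.storm.hdfs.spout.TextFileReader");
        BUILT_IN_ALIASES.put("seq",  "org.apache.storm.hdfs.spout.SequenceFileReader");
    }

    /** Accepts a short alias for a built-in reader, or a fully qualified
     *  class name (FQCN) for a custom reader. */
    static String resolve(String readerType) {
        return BUILT_IN_ALIASES.getOrDefault(readerType, readerType);
    }

    public static void main(String[] args) {
        System.out.println(resolve("text"));                  // built-in via alias
        System.out.println(resolve("com.example.CsvReader")); // custom reader via FQCN
    }
}
```

A Class-typed parameter would keep only the second path; the String keeps both, which is the point being argued.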

*Pt 3*
We don't want to do that. We can't expect all the custom FileReaders to 
dictate these output field names (even optionally) to the spout. The reason 
is that in many cases it's not even possible for a custom FileReader to know 
what the field names will be, or even how many fields there will be... for 
example, any format with multiple columns... say a CSV reader or a RegEx 
reader. We want the user to set the output field names depending on the data 
being fed in. That's why those constants are for convenience only and not 
part of the FileReader interface.
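The point can be illustrated with a toy example where the user, not the reader, supplies the field names for a CSV whose column count the reader cannot know; the field names and data here are made up for illustration.

```java
import java.util.Arrays;
import java.util.List;

public class UserSuppliedFieldsDemo {
    public static void main(String[] args) {
        // The user configures output field names to match the data being fed in;
        // a generic CSV reader has no way to know these on its own.
        List<String> outputFields = Arrays.asList("id", "name", "price");
        String csvLine = "42,widget,9.99";
        String[] values = csvLine.split(",");
        // Pair each user-supplied field name with the corresponding column value.
        for (int i = 0; i < values.length; i++) {
            System.out.println(outputFields.get(i) + " = " + values[i]);
        }
    }
}
```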


> Update storm hdfs spout to remove specific implementation handlings
> ---
>
> Key: STORM-2358
> URL: https://issues.apache.org/jira/browse/STORM-2358
> Project: Apache Storm
>  Issue Type: Improvement
>  Components: storm-hdfs
>Affects Versions: 1.x
>Reporter: Sachin Pasalkar
>  Labels: newbie
> Attachments: AbstractFileReader.java, FileReader.java, 
> HDFSSpout.java, SequenceFileReader.java, TextFileReader.java
>
>
> I was looking at the storm-hdfs spout code in the 1.x branch, and I found 
> the below improvements can be made:
>   1.  Make org.apache.storm.hdfs.spout.AbstractFileReader public so
> that it can be used in generics.
>   2.  org.apache.storm.hdfs.spout.HdfsSpout requires readerType as a
> String. It would be great to have a Class readerType, so we will not use
> Class.forName in multiple places; it will also help with the point below.
>   3.  HdfsSpout also needs to provide outFields, which are declared as
> constants in each reader (e.g. SequenceFileReader). We can have an abstract
> API in AbstractFileReader which returns them to the user to make it generic.





[jira] [Commented] (STORM-2358) Update storm hdfs spout to remove specific implementation handlings

2017-02-14 Thread Roshan Naik (JIRA)

[ 
https://issues.apache.org/jira/browse/STORM-2358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15867169#comment-15867169
 ] 

Roshan Naik commented on STORM-2358:


*Pt 1:*
It is not clear that there is a real need to make it public... but, as I 
said, I am not opposed to it. The Java generics reasoning is invalid though.

*Pt 2:*
Again... the current design *does* support custom readers and is not limited 
to built-in readers. Just provide a fully qualified class name (FQCN) as the 
reader type for a custom reader. For built-ins you can use either an alias or 
the FQCN. Putting in a class type takes away the ability to have short 
aliases for built-in readers.

*Pt 3:*
Sounds like you agree.


> Update storm hdfs spout to remove specific implementation handlings
> ---
>
> Key: STORM-2358
> URL: https://issues.apache.org/jira/browse/STORM-2358
> Project: Apache Storm
>  Issue Type: Improvement
>  Components: storm-hdfs
>Affects Versions: 1.x
>Reporter: Sachin Pasalkar
>  Labels: newbie
> Attachments: AbstractFileReader.java, FileReader.java, 
> HDFSSpout.java, SequenceFileReader.java, TextFileReader.java
>
>
> I was looking at the storm-hdfs spout code in the 1.x branch, and I found 
> the below improvements can be made:
>   1.  Make org.apache.storm.hdfs.spout.AbstractFileReader public so
> that it can be used in generics.
>   2.  org.apache.storm.hdfs.spout.HdfsSpout requires readerType as a
> String. It would be great to have a Class readerType, so we will not use
> Class.forName in multiple places; it will also help with the point below.
>   3.  HdfsSpout also needs to provide outFields, which are declared as
> constants in each reader (e.g. SequenceFileReader). We can have an abstract
> API in AbstractFileReader which returns them to the user to make it generic.




