[jira] [Commented] (STORM-3685) Detect Loops in Topology at Submit
[ https://issues.apache.org/jira/browse/STORM-3685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17425334#comment-17425334 ] Roshan Naik commented on STORM-3685:

[~bipinprasad] & [~ethanli] a very, very late +1 on this approach of warning about, but not banning, cycles. Sometimes there is a need for cycles, e.g. if downstream operators need to report progress or send metadata to upstream operators. This can be used for flow control, where sources are suspended/resumed based on progress notifications from downstream operators. For example:
* For (event-time based) windowed joins, it can be useful to synchronize the streams involved around event-time boundaries. Allowing both streams to progress without any flow control can cause one stream to move too far ahead of the other, which leads to windows staying open for too long... and thus to excessive memory consumption.
* The [Kappa+ Architecture|https://www.youtube.com/watch?v=4qSlsYogALo] uses flow control on sources to enable use cases like efficient backfills using native streaming-mode execution.
* As you know, technically any topology with ACKing enabled has a cycle, and {{topology.max.spout.pending}} is a form of flow control.

> Detect Loops in Topology at Submit
> --
>
> Key: STORM-3685
> URL: https://issues.apache.org/jira/browse/STORM-3685
> Project: Apache Storm
> Issue Type: Improvement
> Components: storm-client
> Reporter: Bipin Prasad
> Assignee: Bipin Prasad
> Priority: Major
> Fix For: 2.3.0
>
> Time Spent: 7h 40m
> Remaining Estimate: 0h
>
> Topology graph is expected to be a Directed Acyclic Graph (DAG). Detect
> cycles in DAG when Topology is submitted and prevent cycles in Topology.

-- This message was sent by Atlassian Jira (v8.3.4#803005)
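The submit-time check discussed in this issue amounts to a depth-first search for a back edge in the component graph. A minimal standalone sketch (hypothetical class and method names, not Storm's actual implementation) that would warn on the feedback edges described above rather than reject them:

```java
import java.util.*;

/** Sketch: detect cycles in a topology's component graph via DFS.
 *  Hypothetical illustration; Storm's real check lives in the submit path. */
public class TopologyCycleCheck {

    /** True if the directed graph (component -> downstream components) contains a cycle. */
    static boolean hasCycle(Map<String, List<String>> adj) {
        Set<String> done = new HashSet<>();     // fully explored components
        Set<String> inStack = new HashSet<>();  // components on the current DFS path
        for (String start : adj.keySet()) {
            if (!done.contains(start) && dfs(start, adj, done, inStack)) {
                return true;
            }
        }
        return false;
    }

    static boolean dfs(String node, Map<String, List<String>> adj,
                       Set<String> done, Set<String> inStack) {
        inStack.add(node);
        for (String next : adj.getOrDefault(node, List.of())) {
            if (inStack.contains(next)) return true;  // back edge => cycle
            if (!done.contains(next) && dfs(next, adj, done, inStack)) return true;
        }
        inStack.remove(node);
        done.add(node);
        return false;
    }

    public static void main(String[] args) {
        Map<String, List<String>> topo = new HashMap<>();
        topo.put("spout", List.of("bolt1"));
        topo.put("bolt1", List.of("bolt2"));
        topo.put("bolt2", List.of("bolt1")); // feedback edge, e.g. progress reports
        System.out.println(hasCycle(topo) ? "WARN: cycle detected" : "topology is a DAG");
    }
}
```

A warn-not-ban policy, per the comment above, would log the offending edge rather than failing topology submission.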
[jira] [Commented] (STORM-3314) Acker redesign
[ https://issues.apache.org/jira/browse/STORM-3314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16747458#comment-16747458 ] Roshan Naik commented on STORM-3314:

*Summary of the above two approaches:*
+First approach+ tries to eliminate the ACKer bolt by employing the existing spouts and bolts. It doesn't try to mitigate inter-worker ACK msgs. Also, ACK msgs can get stuck behind regular msgs under backpressure (BP).
+Second approach+ is centered around mitigating inter-worker ACK msging. It retains the need to tweak the ACKer count separately to handle more/fewer spouts and bolts... most likely a fixed number of them in each worker.
Both eliminate the need for fields grouping of ACKs. Both need more thought on timeout resets.

*Some data points:*
* *Multiworker speed:* In Storm 2, two components on +different workers+ can sustain a max throughput of ~*4 mill/sec* over the inter-worker msging (w/o ACKing).
* *ACKer bolt speed (single worker mode):*
** An ACKer bolt can sustain a max topo throughput of only ~*1.7 mill/sec* in *single worker* mode... when handling an ultra-bare-minimal topo of 1 spout and 1 bolt.
** The ACKer slows down to *950k/sec* when one more bolt is added to the mix (i.e. ConstSpoutIdBoltNullBoltTopo).
* *ACKing + multi worker:* For 2 workers (1 spout & 1 bolt), ACK-mode throughput is at ~*900k/s*.

Ideally, the ACKing path within a single worker should be faster than inter-worker msging, given the absence of de/serialization and the network stack. Clearly both the ACKer bolt itself and the inter-worker msging are big factors... with the ACKer bolt being the more significant one. Mitigating only inter-worker msging will likely leave a lot of potential performance on the table. Perhaps there is a hybrid approach that combines the benefits of both approaches... reduce inter-worker msging and eliminate the ACKer bolt.

*Qs about [~kabhwan]'s approach:*
# If I understand correctly, in single worker mode there will be no difference. And in multiworker mode, when the entire tuple tree falls within the same worker, will the perf be the same as in single worker mode?
# What is the method to compute "fully processed" after the tuple tree is split into per-worker fractions? Sounds like ACKers will msg each other about completion of their partial trees... and this will bubble up to the root ACKer.

*Additional Thoughts:* There appear to be two properties (which I think are true) that remain unexploited in the two approaches and a potential hybrid approach:
- Within the worker process, the msging subsystem guarantees *delivery* and there can be no msg loss.
- Message loss happens only when a worker goes down (or gets disconnected from the rest of the topo).

> Acker redesign
> --
>
> Key: STORM-3314
> URL: https://issues.apache.org/jira/browse/STORM-3314
> Project: Apache Storm
> Issue Type: Improvement
> Components: storm-client
> Reporter: Roshan Naik
> Priority: Major
>
> *Context:* The ACKing mechanism has come into focus as one of the next major
> bottlenecks to address. The strategy to timeout and replay tuples has issues
> discussed in STORM-2359.
> *Basic idea:* Every bolt will send an ACK msg to its upstream spout/bolt once
> the tuples it emitted have been *fully processed* by downstream bolts.
> *Determining "fully processed":* For every incoming (parent) tuple, a bolt
> can emit 0 or more "child" tuples. The parent tuple is considered fully
> processed once the bolt receives ACKs for all the child emits (if any). This
> basic idea cascades all the way back up to the spout that emitted the root of
> the tuple tree.
> This means that when a bolt is finished with all the child emits and it
> calls ack(), no ACK message will be generated (unless there were 0 child
> emits). The ack() marks the completion of all child emits for a parent tuple.
> The bolt will emit an ACK to its upstream component once all the ACKs from
> downstream components have been received.
> *Operational changes:* The existing spouts and bolts don't need any change.
> The bolt executor will need to process incoming ACKs from downstream bolts
> and send an ACK to its upstream component as needed. In the case of 0 child
> emits, ack() itself could immediately send the ACK to the upstream component.
> Field grouping is not applied to ACK messages.
> Total ACK messages: The spout output collector will no longer send an
> ACK-init message to the ACKer bolt. Other than this, the total number of
> emitted ACK messages does not change. Instead of the ACKs going to an ACKer
> bolt, they get spread out among the existing bolts. It appears that this mode
> may reduce some of the inter-worker traffic of ACK messages.
> *Memory use:* If we use the existing XOR logic from the ACKer bolt, we need
> about 20 bytes per outstanding tuple-tree at each bolt. Assuming an
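The "existing XOR logic" the quoted description proposes to reuse is the classic ACKer trick: XOR every child tuple id into a per-root accumulator on emit and again on ack; the accumulator returns to 0 exactly when every emitted id has been acked once. A minimal sketch under that assumption (hypothetical class and method names; roughly one long key plus one long value per outstanding root, in line with the ~20 bytes cited):

```java
import java.util.HashMap;
import java.util.Map;

/** Sketch of XOR-based ack tracking for one level of child emits at a bolt.
 *  Hypothetical names; illustrates the accumulator idea, not Storm's code. */
public class XorAckTracker {
    // rootId -> XOR of all outstanding (emitted but un-acked) child tuple ids
    private final Map<Long, Long> pending = new HashMap<>();

    /** Record a child emit for the given root tuple. */
    public void onEmit(long rootId, long childId) {
        pending.merge(rootId, childId, (a, b) -> a ^ b);
    }

    /** Record an ack; returns true when the tuple tree at this level is fully processed. */
    public boolean onAck(long rootId, long childId) {
        long v = pending.merge(rootId, childId, (a, b) -> a ^ b);
        if (v == 0) {                 // every emitted id was acked exactly once
            pending.remove(rootId);
            return true;
        }
        return false;
    }
}
```

With random 64-bit tuple ids, a premature zero from distinct ids is astronomically unlikely, which is what makes the constant-space XOR safe in practice.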
[jira] [Commented] (STORM-2359) Revising Message Timeouts
[ https://issues.apache.org/jira/browse/STORM-2359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16745845#comment-16745845 ] Roshan Naik commented on STORM-2359:

Yes indeed. It would be quite nice if JCTools queues could support peek()ing and still be lock-free.

> Revising Message Timeouts
> -
>
> Key: STORM-2359
> URL: https://issues.apache.org/jira/browse/STORM-2359
> Project: Apache Storm
> Issue Type: Sub-task
> Components: storm-core
> Affects Versions: 2.0.0
> Reporter: Roshan Naik
> Assignee: Stig Rohde Døssing
> Priority: Major
> Attachments: STORM-2359-with-auto-reset.ods, STORM-2359.ods
>
> A revised strategy for message timeouts is proposed here.
> Design Doc:
> https://docs.google.com/document/d/1am1kO7Wmf17U_Vz5_uyBB2OuSsc4TZQWRvbRhX52n5w/edit?usp=sharing

-- This message was sent by Atlassian JIRA (v7.6.3#76005)
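Why peek() matters for the timeout-revision work: a consumer can inspect the oldest in-flight tuple's age without removing it from the queue. A small sketch of that pattern, using the JDK's ConcurrentLinkedQueue (whose peek() is non-blocking) as an illustrative stand-in for the JCTools queues discussed above; all names here are hypothetical:

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;

/** Sketch: non-destructive peek() lets a consumer check the oldest pending
 *  tuple for timeout without dequeuing it. Illustrative only. */
public class PendingTimeoutCheck {
    record Pending(long tupleId, long enqueuedAtMillis) {}

    private final Queue<Pending> pending = new ConcurrentLinkedQueue<>();

    /** Register a newly in-flight tuple. */
    public void track(long tupleId, long nowMillis) {
        pending.add(new Pending(tupleId, nowMillis));
    }

    /** True if the oldest in-flight tuple has exceeded the timeout.
     *  The tuple stays queued; no removal happens here. */
    public boolean oldestTimedOut(long nowMillis, long timeoutMillis) {
        Pending head = pending.peek();  // non-destructive inspection of the head
        return head != null && nowMillis - head.enqueuedAtMillis > timeoutMillis;
    }
}
```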
[jira] [Updated] (STORM-3314) Acker redesign
[ https://issues.apache.org/jira/browse/STORM-3314?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Roshan Naik updated STORM-3314:
---
Description:

*Context:* The ACKing mechanism has come into focus as one of the next major bottlenecks to address. The strategy to timeout and replay tuples has issues discussed in STORM-2359.

*Basic idea:* Every bolt will send an ACK msg to its upstream spout/bolt once the tuples it emitted have been *fully processed* by downstream bolts.

*Determining "fully processed":* For every incoming (parent) tuple, a bolt can emit 0 or more "child" tuples. The parent tuple is considered fully processed once the bolt receives ACKs for all the child emits (if any). This basic idea cascades all the way back up to the spout that emitted the root of the tuple tree. This means that when a bolt is finished with all the child emits and it calls ack(), no ACK message will be generated (unless there were 0 child emits). The ack() marks the completion of all child emits for a parent tuple. The bolt will emit an ACK to its upstream component once all the ACKs from downstream components have been received.

*Operational changes:* The existing spouts and bolts don't need any change. The bolt executor will need to process incoming ACKs from downstream bolts and send an ACK to its upstream component as needed. In the case of 0 child emits, ack() itself could immediately send the ACK to the upstream component. Field grouping is not applied to ACK messages. Total ACK messages: The spout output collector will no longer send an ACK-init message to the ACKer bolt. Other than this, the total number of emitted ACK messages does not change. Instead of the ACKs going to an ACKer bolt, they get spread out among the existing bolts. It appears that this mode may reduce some of the inter-worker traffic of ACK messages.

*Memory use:* If we use the existing XOR logic from the ACKer bolt, we need about 20 bytes per outstanding tuple-tree at each bolt. Assuming an average of, say, 50k outstanding tuples at each level, we have 50k * 20 bytes = 1 MB per bolt instance. There may be room to do something better than XOR, since we only need to track one level of outstanding emits at each bolt.

*Replay:* [needs more thinking] One option is to send REPLAY or TIMEOUT msgs upstream. The policy of when to emit them needs more thought. It is good to avoid timeouts/replays of in-flight tuples under backpressure, since this will lead to an "event tsunami" at the worst possible time. Ideally, if possible, replay should be avoided unless tuples have been dropped. It would be nice to avoid sending TIMEOUT_RESET msgs upstream when under backpressure... since they are likely to face backpressure as well. On receiving an ACK or REPLAY from downstream components, a bolt needs to clear the corresponding 20 bytes of tracking info.

*Concerns:* ACK tuples traversing upstream means it takes longer to get back to the Spout.

*Related Note:* +Why is the ACKer slow?:+ Although lightweight internally, the ACKer bolt has a huge impact on throughput. The latency hit does not appear to be as significant. I have some thoughts around why the ACKer slows down throughput. Consider the following simple topo: {{Spout ==> Bolt1 ==> Bolt2}}. If we add an ACKer to the above topo, the ACKer bolt receives 3x more incoming messages than Bolt1 & Bolt2. That's instantly a 3x hit on the ACKer's throughput. Additionally, each spout and bolt now emits 2 msgs instead of 1 (compared to when the ACKer is absent). This slows down the spouts and bolts.
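The "3x" observation in the description above follows from simple message counting: per root tuple, the single ACKer receives one ACK_INIT from the spout plus one ACK from each bolt in the chain, while each bolt processes just one data tuple. A back-of-envelope helper (hypothetical, not Storm code):

```java
/** Back-of-envelope for the ACKer load in Spout ==> Bolt1 ==> Bolt2:
 *  per root tuple the ACKer sees 1 ACK_INIT + one ACK per bolt, while each
 *  bolt sees one data tuple. Hypothetical helper for illustration. */
public class AckerLoad {
    /** Messages the ACKer receives for 'tuples' root tuples across 'bolts' bolts. */
    static long ackerMessages(long tuples, int bolts) {
        return tuples * (1L + bolts); // 1 ACK_INIT + one ACK per bolt
    }

    public static void main(String[] args) {
        long tuples = 1_000_000L;
        long perBolt = tuples;                  // each bolt handles each tuple once
        long acker = ackerMessages(tuples, 2);  // Bolt1 + Bolt2
        System.out.println("per-bolt: " + perBolt + ", ACKer: " + acker
                + " => " + (acker / perBolt) + "x the per-bolt load");
    }
}
```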
[jira] [Commented] (STORM-2359) Revising Message Timeouts
[ https://issues.apache.org/jira/browse/STORM-2359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16744817#comment-16744817 ] Roshan Naik commented on STORM-2359: [~Srdo] Fyi: Just posted the initial thoughts around Acker redesign in STORM-3314. > Revising Message Timeouts > - > > Key: STORM-2359 > URL: https://issues.apache.org/jira/browse/STORM-2359 > Project: Apache Storm > Issue Type: Sub-task > Components: storm-core >Affects Versions: 2.0.0 >Reporter: Roshan Naik >Assignee: Stig Rohde Døssing >Priority: Major > Attachments: STORM-2359-with-auto-reset.ods, STORM-2359.ods > > > A revised strategy for message timeouts is proposed here. > Design Doc: > > https://docs.google.com/document/d/1am1kO7Wmf17U_Vz5_uyBB2OuSsc4TZQWRvbRhX52n5w/edit?usp=sharing -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (STORM-3314) Acker redesign
[ https://issues.apache.org/jira/browse/STORM-3314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16744813#comment-16744813 ] Roshan Naik commented on STORM-3314: _Quoting thoughts expressed by_ [~kabhwan] _in response to the above initial idea:_ *Essential idea:* Try to avoid sending ack tuples to other workers, given that they incur serde and network cost. If we could get rid of field grouping, it may be better. Roshan brought up a great idea about this, which is in line with my essential idea, so there may be a trade-off between Roshan’s and mine. My idea is inspired by Roshan’s idea and modified a bit to adopt the nice parts of his idea. *Summary:* Build a partial tuple tree into the Acker in each worker, and let the Acker in each worker handle its own partial tuple tree. Note that we don’t change the basic idea of the acking mechanism. *Detail:* * When a tuple is emitted from a spout, it sends an ACK_INIT message to the Acker within the worker. ** Same as current. * The logic for emitting tuples in a Bolt doesn’t change. ** We may want to include the acker ID in the tuple, to indicate where to send the acknowledgement. ** Otherwise, we could also compute (and cache) it on the bolt side (based on the source task ID) so as not to grow the size of each tuple. * When a worker receives tuples from outside the worker, it also sends an ACK_INIT message to the Acker within the worker. ** The Acker stores additional information (acker task ID for the source task’s worker, tuple ID of the source tuple, anchors, etc.) to “send back the ACK to the source” when the partial tuple tree is completed. ** The information may be a bit different between referring to a spout and referring to another acker. * Whenever a tuple is acknowledged in a Bolt, an ACK message is sent to the acker within the worker. * When the acker finds that the tuple tree is completed, it sends back an ACK to the source acker based on the stored information. ** If the source is a Spout, just do the same as what we are doing now. 
*Expected benefits:* * ACK tuples between hops within a worker will not be sent to other workers, hence avoiding the cost of serde and network transfer. * No field grouping. * It follows the path along which tuples are flowing, which could be optimized by the grouper implementation. *Considerations:* * Each worker has to have its own acker task, which is a new requirement even if it is on by default for now. ** We could (and should) still be able to turn off acking when needed, but the acker configuration needs to be changed to an on/off toggle, not an acker task count. * The total count of ack tuples is not changed. ** Maybe we could avoid sending ACK tuples within a worker and just calculate and update what the acker is doing on the fly (in the executor thread), if that gives more benefit. * A bit more memory is needed, given that we maintain a tuple tree in each worker (linear in the number of workers the tuple tree has to flow through), and we need additional information for sending back ACKs to another acker. * Need to consider how the FAIL tuple will work, and how the RESET_TIMEOUT tuple will work. ** They are going to be headaches to consider, for both mine and Roshan’s. ** I’m still not sure how tuple timeout works with backpressure, even though we still need it to get rid of old entries in the rotating tree. *Differences from Roshan’s idea:* * We don’t add ACK tuples to the source task’s queue, hence keeping the bolt’s task and the acker’s task separate. ** Two perspectives: queue, (executor) thread. ** It will be less coupled with backpressure. IMHO, ACK tuples and metrics tuples should flow regardless of ongoing backpressure. ** I think keeping them separate could keep open the possibility to apply more optimizations to the acker, like having a separate channel for ACK tuples and metrics tuples. * It uses less memory when there are several tasks in a worker: this is just what the acker excels at. * Fewer ACK tuple hops when ACK tuples are flowing back, given that it only deals with ackers. (Yes, more hops when acking within a worker though.) 
* We just need to handle the rotating tree in only one place (the acker) for each worker, compared to each task. * More complicated than Roshan’s idea: Roshan’s idea is just clearly intuitive, and I can’t imagine an easier solution. > Acker redesign > -- > > Key: STORM-3314 > URL: https://issues.apache.org/jira/browse/STORM-3314 > Project: Apache Storm > Issue Type: Improvement > Components: storm-client >Reporter: Roshan Naik >Priority: Major > > *Context:* The ACKing mechanism has come into focus as one of the next major > bottlenecks to address. The strategy to timeout and replay tuples has issues > discussed in STORM-2359 > *Basic idea:* Every bolt will send an ACK msg to its upstream spout/bolt once > the tuples it emitted have been *fully processed* by downstream bolts. > *Determining "fully processed”* : For every incoming (parent) tuple, a bolt > can emit 0 or more “child” tuples. The parent tuple is
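The per-worker partial-tree scheme quoted above can be sketched roughly as follows. This is an illustrative Python sketch, not actual Storm code: the class and message names (WorkerAcker, ack_init, and the source-tuple shapes) are assumptions made for the example. Each worker's acker XOR-tracks only the portion of the tuple tree rooted at tuples that entered this worker, and sends a single ACK back to the source (spout or upstream worker's acker) when its portion completes:

```python
class WorkerAcker:
    """Per-worker acker holding only this worker's partial tuple tree.

    ACK_INIT registers where to send the completion notice: either the
    local spout task, or the acker of the worker the tuple came from.
    """

    def __init__(self):
        self.partial = {}  # tree_id -> [xor_value, source]

    def ack_init(self, tree_id, source, first_anchor):
        # source: ("spout", task_id) or ("acker", worker_id, src_tuple_id)
        self.partial[tree_id] = [first_anchor, source]

    def on_emit(self, tree_id, anchor_id):
        # XOR in each new locally-emitted child anchor.
        self.partial[tree_id][0] ^= anchor_id

    def on_ack(self, tree_id, anchor_id):
        # XOR the acked anchor back out; zero means the local subtree is done.
        entry = self.partial[tree_id]
        entry[0] ^= anchor_id
        if entry[0] == 0:
            # Partial tree done: notify the source (spout or upstream acker)
            # with ONE inter-worker message instead of one per ack.
            del self.partial[tree_id]
            return entry[1]
        return None

# A tuple enters worker B from worker A's acker and fans out into a second
# local emit; only the final completion crosses the worker boundary.
acker_b = WorkerAcker()
acker_b.ack_init(tree_id=7, source=("acker", "worker-A", 0xABC), first_anchor=0b01)
acker_b.on_emit(7, 0b10)
assert acker_b.on_ack(7, 0b01) is None                       # one anchor left
assert acker_b.on_ack(7, 0b10) == ("acker", "worker-A", 0xABC)  # subtree done
```

This is where the "less serde and network transfer" benefit comes from: intra-worker acks stay in-process, and only the completion notice hops between ackers.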
[jira] [Created] (STORM-3314) Acker redesign
Roshan Naik created STORM-3314: -- Summary: Acker redesign Key: STORM-3314 URL: https://issues.apache.org/jira/browse/STORM-3314 Project: Apache Storm Issue Type: Improvement Components: storm-client Reporter: Roshan Naik *Context:* The ACKing mechanism has come into focus as one of the next major bottlenecks to address. The strategy to timeout and replay tuples has issues discussed in STORM-2359. *Basic idea:* Every bolt will send an ACK msg to its upstream spout/bolt once the tuples it emitted have been *fully processed* by downstream bolts. *Determining "fully processed”*: For every incoming (parent) tuple, a bolt can emit 0 or more “child” tuples. The parent tuple is considered fully processed once a bolt receives ACKs for all the child emits (if any). This basic idea cascades all the way back up to the spout that emitted the root of the tuple tree. This means that, when a bolt has finished with all the child emits and calls ack(), no ACK message will be generated (unless there were 0 child emits). The ack() marks the completion of all child emits for a parent tuple. The bolt will emit an ACK to its upstream component once all the ACKs from downstream components have been received. *Operational changes:* The existing spouts and bolts don’t need any change. The bolt executor will need to process incoming ACKs from downstream bolts and send an ACK to its upstream component as needed. In the case of 0 child emits, ack() itself could immediately send the ACK to the upstream component. Field grouping is not applied to ACK messages. Total ACK messages: The spout output collector will no longer send an ACK-init message to the ACKer bolt. Other than this, the total number of emitted ACK messages does not change. Instead of the ACKs going to an ACKer bolt, they get spread out among the existing bolts. It appears that this mode may reduce some of the inter-worker traffic of ACK messages. 
*Memory use:* If we use the existing XOR logic from the ACKer bolt, we need about 20 bytes per outstanding tuple-tree at each bolt. Assuming an average of, say, 50k outstanding tuples at each level, we have 50k * 20 bytes = 1MB per bolt instance. There may be room to do something better than XOR, since we only need to track one level of outstanding emits at each bolt. *Replay:* [needs more thinking] One option is to send REPLAY or TIMEOUT msgs upstream. The policy of when to emit them needs more thought. It is good to avoid timeouts/replays of inflight tuples under backpressure, since this will lead to an "event tsunami" at the worst possible time. Ideally, if possible, replay should be avoided unless tuples have been dropped. It would be nice to avoid sending TIMEOUT_RESET msgs upstream when under backpressure ... since they are likely to face backpressure as well. On receiving ACKs or REPLAYs from downstream components, a bolt needs to clear the corresponding 20 bytes of tracking info. *Concerns:* ACK tuples traversing upstream means it takes longer to get back to the Spout. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
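The "existing XOR logic" referenced above is the standard Storm acker trick: for each root tuple, keep one running value that is XORed with every anchor ID on emit and again on ack; because x ^ x == 0, the value returns to zero exactly when every emitted ID has been acked (with overwhelming probability for random 64-bit IDs). A minimal Python sketch of that idea, with illustrative names rather than Storm's actual classes:

```python
import os

class XorAckTracker:
    """Tracks outstanding tuple trees with a single XOR value per root."""

    def __init__(self):
        self.pending = {}  # root_id -> running XOR of outstanding anchor IDs

    def emit(self, root_id, anchor_id):
        # XOR in the ID of each newly emitted child tuple.
        self.pending[root_id] = self.pending.get(root_id, 0) ^ anchor_id

    def ack(self, root_id, anchor_id):
        # XOR the same ID out on ack; zero means the tree is complete.
        self.pending[root_id] = self.pending.get(root_id, 0) ^ anchor_id
        if self.pending[root_id] == 0:
            del self.pending[root_id]
            return True   # tuple tree fully processed
        return False

def rand_id():
    # Random 64-bit anchor ID, as Storm uses for message IDs.
    return int.from_bytes(os.urandom(8), "big")

# A root tuple fans out into two children; the tree completes on the last ack.
tracker = XorAckTracker()
root, c1, c2 = rand_id(), rand_id(), rand_id()
tracker.emit(root, c1)
tracker.emit(root, c2)
assert tracker.ack(root, c1) is False  # one child still outstanding
assert tracker.ack(root, c2) is True   # XOR back to 0: tree complete
```

Per outstanding tree, the state is roughly one map key plus one 64-bit accumulator, which is consistent with the ~20 bytes per tuple-tree estimate above.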
[jira] [Commented] (STORM-2359) Revising Message Timeouts
[ https://issues.apache.org/jira/browse/STORM-2359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16740143#comment-16740143 ] Roshan Naik commented on STORM-2359: Got a chance to read through your posts more carefully again. Your scheme to keep the timeout computation in the ACKer by having the acker send back (A-B) to the spout seems sound. I am unable to grasp the timeout reset strategy though. These two statements in particular: "For the inbound queue, the anchor is no longer in progress when the associated tuple is acked or failed. For the outbound queue (pendingEmits in the Executor), the anchor is no longer in progress when the associated tuple gets flushed from pendingEmits." I am thinking ... if a msg is in the inputQ or pendingEmitsQ, then it is *in progress* and it cannot also be ACKed. Overall it sounds like you want to track inflight tuples at each executor... which, if true, is a big memory overhead. On the implementation side, one key concern is that in the 2.0 perf re-architecture (STORM-2306) a good amount of effort was spent tuning the critical path by cutting back on hashmaps and eliminating locks, both of which are perf killers. On a very quick scan of the code, I noticed the timeout strategy is introducing ConcurrentHashMap into the critical path... which introduces both locks and hashmaps in one go. Btw... since you have reactivated this topic on ACKing... it's probably worth bringing up that [~kabhwan] and I were rethinking the ACKing design sometime back. Since the ACKer bolt has a severe perf impact in Storm core... we were looking at a way to do acking without the ACKer bolt. As you can imagine, it would impact the timeouts as well. I can post the general idea in another Jira and see what you think ... either continue down this path for now ... and later replace it with a newer model ... or investigate a newer model directly. 
> Revising Message Timeouts > - > > Key: STORM-2359 > URL: https://issues.apache.org/jira/browse/STORM-2359 > Project: Apache Storm > Issue Type: Sub-task > Components: storm-core >Affects Versions: 2.0.0 >Reporter: Roshan Naik >Assignee: Stig Rohde Døssing >Priority: Major > Attachments: STORM-2359-with-auto-reset.ods, STORM-2359.ods > > > A revised strategy for message timeouts is proposed here. > Design Doc: > > https://docs.google.com/document/d/1am1kO7Wmf17U_Vz5_uyBB2OuSsc4TZQWRvbRhX52n5w/edit?usp=sharing -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (STORM-2359) Revising Message Timeouts
[ https://issues.apache.org/jira/browse/STORM-2359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16730881#comment-16730881 ] Roshan Naik commented on STORM-2359: I see your point about having the timeout logic in the acker. It’s a good one! Sorry for the delay in my response. I am traveling, so unable to look into the code. Could you elaborate on the contents of the messages being exchanged for computing A-B? > Revising Message Timeouts > - > > Key: STORM-2359 > URL: https://issues.apache.org/jira/browse/STORM-2359 > Project: Apache Storm > Issue Type: Sub-task > Components: storm-core >Affects Versions: 2.0.0 >Reporter: Roshan Naik >Assignee: Stig Rohde Døssing >Priority: Major > Attachments: STORM-2359.ods > > > A revised strategy for message timeouts is proposed here. > Design Doc: > > https://docs.google.com/document/d/1am1kO7Wmf17U_Vz5_uyBB2OuSsc4TZQWRvbRhX52n5w/edit?usp=sharing -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (STORM-2359) Revising Message Timeouts
[ https://issues.apache.org/jira/browse/STORM-2359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16726414#comment-16726414 ] Roshan Naik commented on STORM-2359: {quote}If we were to move timeouts entirely to the acker, there would be a potential for "lost" tuples (ones that end up not being reemitted) if messages between the acker and spout get lost. * The spout may emit a new tuple, and try to notify the acker about it. If this message is lost, the acker doesn't know about the tuple, and thus can't time it out. {quote} The Acker will be totally clueless if all the acks from downstream bolts are also lost for the same tuple tree. {quote}We can move the timeout logic to the acker, {quote} Alternatives: 1) Handle timeouts in the SpoutExecutor. It sends a timeout msg to the ACKer. *+Benefit+*: May not be any better than doing it in the ACKer. Do you have any thoughts about this? 2) Eliminate the timeout communication between Spout & ACKer. Let each do its own timeout. *+Benefit+*: Eliminates the need for: * Periodic sync-up msgs (also avoids the possibility of latency spikes in multi-worker mode when these msgs get large). * The A - B calculation, as well as * timeout msg exchanges. TimeoutReset msgs can still be supported. -roshan > Revising Message Timeouts > - > > Key: STORM-2359 > URL: https://issues.apache.org/jira/browse/STORM-2359 > Project: Apache Storm > Issue Type: Sub-task > Components: storm-core >Affects Versions: 2.0.0 >Reporter: Roshan Naik >Assignee: Stig Rohde Døssing >Priority: Major > Attachments: STORM-2359.ods > > > A revised strategy for message timeouts is proposed here. > Design Doc: > > https://docs.google.com/document/d/1am1kO7Wmf17U_Vz5_uyBB2OuSsc4TZQWRvbRhX52n5w/edit?usp=sharing -- This message was sent by Atlassian JIRA (v7.6.3#76005)
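Alternative (2) above, where each side expires its own pending entries locally, is in the spirit of the bucket-rotation pattern Storm already uses (e.g. its RotatingMap utility) for the spout's pending tuples: a handful of buckets rotated on a timer, where anything still sitting in the oldest bucket when it rotates out is declared timed out. A generic Python sketch of the pattern (illustrative, not Storm's actual class):

```python
class RotatingMap:
    """Coarse-grained expiry: N buckets, rotated on a fixed timer.

    With N buckets, an entry inserted now survives N - 1 full rotations
    and is expired on the N-th, so expiry needs no per-entry timers.
    """

    def __init__(self, num_buckets=3):
        self.buckets = [{} for _ in range(num_buckets)]  # buckets[0] = newest

    def put(self, key, value):
        # Re-inserting (e.g. on a timeout reset) moves the key to the newest bucket.
        self.remove(key)
        self.buckets[0][key] = value

    def remove(self, key):
        # Called when the tuple is acked/failed before it times out.
        for b in self.buckets:
            if key in b:
                return b.pop(key)
        return None

    def rotate(self):
        """Drop the oldest bucket; its leftovers are the timed-out entries."""
        expired = self.buckets.pop()
        self.buckets.insert(0, {})
        return expired

pending = RotatingMap(num_buckets=3)
pending.put("msg-1", "tuple payload")
pending.rotate()
pending.rotate()
# msg-1 was never acked (removed), so the third rotation expires it:
assert pending.rotate() == {"msg-1": "tuple payload"}
```

Since the spout and the acker would each rotate on their own clocks, this scheme removes the periodic sync-up messages and the A - B exchange at the cost of coarser timeout granularity.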
[jira] [Updated] (STORM-2306) Redesign Messaging Subsystem, switch to JCTools Queues and introduce new Backpressure model
[ https://issues.apache.org/jira/browse/STORM-2306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Roshan Naik updated STORM-2306: --- Summary: Redesign Messaging Subsystem, switch to JCTools Queues and introduce new Backpressure model (was: Redesign Messaging Subsystem and switch to JCTools Queues) > Redesign Messaging Subsystem, switch to JCTools Queues and introduce new > Backpressure model > --- > > Key: STORM-2306 > URL: https://issues.apache.org/jira/browse/STORM-2306 > Project: Apache Storm > Issue Type: Sub-task > Components: storm-core >Affects Versions: 2.0.0 >Reporter: Roshan Naik >Assignee: Roshan Naik >Priority: Major > Labels: pull-request-available > Fix For: 2.0.0 > > Time Spent: 61h 50m > Remaining Estimate: 0h > > Details in these documents: > 1) *Redesign of the messaging subsystem* > https://docs.google.com/document/d/1NK1DJ3aAkta-Im0m-2FObQ4cSRp8xSa301y6zoqcBeE/edit?usp=sharing > This doc discusses the new design for the messaging system. Plus some of the > optimizations being made. > 2) *Choosing a high performance messaging queue:* > https://docs.google.com/document/d/1PpQaWVHg06-OqxTzYxQlzg1yEhzA4Y46_NC7HMO6tsI/edit?usp=sharing > This doc looks into how fast hardware can do inter-thread messaging and why > we chose the JCTools queues. > 3) *Backpressure Model* > https://docs.google.com/document/d/1Z9pRdI5wtnK-hVwE3Spe6VGCTsz9g8TkgxbTFcbL3jM/edit?usp=sharing > Describes the Backpressure model integrated into the new messaging subsystem. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (STORM-3205) Optimization in TuplImpl
[ https://issues.apache.org/jira/browse/STORM-3205?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Roshan Naik updated STORM-3205: --- Fix Version/s: 2.0.0 > Optimization in TuplImpl > > > Key: STORM-3205 > URL: https://issues.apache.org/jira/browse/STORM-3205 > Project: Apache Storm > Issue Type: Improvement > Components: storm-client >Affects Versions: 2.0.0 >Reporter: Roshan Naik >Assignee: Roshan Naik >Priority: Major > Labels: pull-request-available > Fix For: 2.0.0 > > Time Spent: 10m > Remaining Estimate: 0h > > Wrapping {{TuplImpl.values}} with Collections.unmodifiableList() turns out to be > very expensive. Its intention is obviously to check for and prevent accidental > tweaking of TuplImpl once created. Given the high cost, if needed, we can limit > this extra checking mechanism to debug/dev mode. Being in the critical path, > it means several thousand/million additional allocations per second of the > List wrapper object, proportional to the number of bolt/spout instances. > *TVL :* > | |throughput (k/sec)|cores|mem (mb)| > |*master (#f5a410ba3)*|412 |4.26|103| > |*storm-3205*|547 (+33%)|4.09|132| > {{+cmd:+ bin/storm jar topos/storm-loadgen-2.0.0-SNAPSHOT.jar > org.apache.storm.loadgen.ThroughputVsLatency *--rate 55* --spouts 1 > --splitters 3 --counters 2 -c topology.acker.executors=0}} > *+ConstSpoutIdentityBoltNullBolt :+* > | |throughput| > |*master*|4.25 mill/sec| > |*storm-3205*|5.4 mill/sec (+27%)| > +cmd+: {{bin/storm jar topos/storm-perf-2.0.0-SNAPSHOT.jar > org.apache.storm.perf.ConstSpoutIdBoltNullBoltTopo -c > topology.acker.executors=0 -c topology.producer.batch.size=1000 400}} > *Note:* The perf gains are more evident when operating at high throughputs w/o > backpressure occurring (i.e. some bolts have not yet become a bottleneck) -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (STORM-3205) Optimization in TuplImpl
[ https://issues.apache.org/jira/browse/STORM-3205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16614579#comment-16614579 ] Roshan Naik commented on STORM-3205: *Raw TVL readings:* +MASTER: reading 1+ {code}
start_time(s)  end_time(s)  rate(tuple/s)  mean(ms)    99%ile(ms)  99.9%ile(ms)  cores  mem(MB)
0              30           363,105.633    552.793     844.104     853.017       3.804  174.941
30             60           447,122.200    2,142.178   5,695.865   5,804.917     4.550  51.488
60             90           416,901.400    9,418.863   13,019.120  13,086.228    4.414  155.132
90             120          420,867.467    16,801.866  20,099.105  20,149.436    4.352  100.409
120            150          429,340.700    23,451.448  26,575.110  26,742.882    4.412  105.838
150            180          427,491.233    30,118.689  33,386.660  33,436.991    4.417  145.061
180            210          410,376.700    37,208.048  41,003.516  41,070.625    4.390  139.888
210            240          420,193.767    44,735.209  48,083.501  48,184.164    4.392  97.299
240            270          424,774.967    51,567.712  54,962.160  55,029.268    4.412  127.620
{code} +MASTER: reading 2+ {code}
start_time(s)  end_time(s)  rate(tuple/s)  mean(ms)    99%ile(ms)  99.9%ile(ms)  cores  mem(MB)
0              30           352,283.467    666.673     988.807     996.147       3.833  131.411
30             60           452,429.567    2,457.883   5,981.078   6,106.907     4.825  128.944
60             90           389,890.233    10,536.345  14,738.784  14,856.225    4.246  145.399
90             120          399,513.033    19,105.363  22,917.677  23,068.672    4.138  135.547
120            150          412,216.733    26,820.408  30,450.647  30,601.642    4.195  55.722
150            180          415,783.600    34,411.969  37,882.954  37,950.063    4.235  95.921
180            210          428,794.100    41,234.477  44,493.177  44,560.286    4.383  71.357
210            240          427,405.133    47,840.604  51,204.063  51,271.172    4.349  112.849
{code} +STORM-3205 : TVL reading 1+ {code}
start_time(s)  end_time(s)  rate(tuple/s)  mean(ms)  99%ile(ms)  99.9%ile(ms)  cores  mem(MB)
0              30           365,140.233    389.399   851.444     865.599       3.733  112.723
30             60           551,828.200    2.096     58.917      109.838       4.593  68.210
60             90           550,123.200    0.658     4.268       6.423         4.433  107.547
90             120          550,244.433    0.567     1.696       6.160         4.146  113.571
120            150          550,271.567    0.547     1.245       3.652         4.170  144.445
150            180          550,279.300    1.077     18.792      25.756        4.229  112.253
180            210          550,234.733    0.529     1.136       1.579         4.069  136.361
210            240          550,241.433    0.530     1.150       1.587         4.087  160.657
{code} +STORM-3205 : TVL reading 2+ {code}
start_time(s)  end_time(s)  rate(tuple/s)  mean(ms)  99%ile(ms)  99.9%ile(ms)  cores  mem(MB)
0              30           366,992.967    280.818   569.377     575.144       3.693  155.890
30             60           550,076.200    0.556     1.385       5.751         4.023  163.471
60             91           532,387.032    0.555     1.333       5.321         3.958  142.208
91             121          550,137.600    0.545     1.209       4.522         4.024  120.298
121            151          550,156.633    0.569     1.647       5.382         4.210  87.505
151            181          550,222.900    0.574     1.808       5.730         4.176  169.807
181            211          550,226.400    0.535     1.202       1.917         4.133  154.749
211            241          550,156.300    0.530     1.150       1.644         4.052  117.847
{code} > Optimization in TuplImpl >
[jira] [Updated] (STORM-3205) Optimization in TuplImpl
[ https://issues.apache.org/jira/browse/STORM-3205?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Roshan Naik updated STORM-3205: --- Description: Wrapping {{TuplImpl.values}} with Collections.unmodifiableList() turns out to be very expensive. Its intention is obviously to check for and prevent accidental tweaking of TuplImpl once created. Given the high cost, we can limit this extra checking mechanism to debug/dev mode. Because this allocation is in the critical path, it means several thousand/million additional allocations per second of the List wrapper object, proportional to the number of bolt/spout instances. *TVL :* | |throughput (k/sec)|cores|mem (mb)| |*master (#f5a410ba3)*|412 |4.26|103| |*storm-3205*|547 (+33%)|4.09|132| {{+cmd:+ bin/storm jar topos/storm-loadgen-2.0.0-SNAPSHOT.jar org.apache.storm.loadgen.ThroughputVsLatency *--rate 55* --spouts 1 --splitters 3 --counters 2 -c topology.acker.executors=0}} *+ConstSpoutIdentityBoltNullBolt :+* | |throughput| |*master*|4.25 mill/sec| |*storm-3205*|5.4 mill/sec (+27%)| {{+cmd+: bin/storm jar topos/storm-perf-2.0.0-SNAPSHOT.jar org.apache.storm.perf.ConstSpoutIdBoltNullBoltTopo -c topology.acker.executors=0 -c topology.producer.batch.size=1000 400}} *Note:* The perf gains are more evident when operating at high throughputs w/o backpressure occurring (i.e. some bolts have not yet become a bottleneck) was: Wrapping {{TuplImpl.values}} with Collections.unmodifiableList() turns out to be very expensive. Its intention is obviously to check for and prevent accidental tweaking of TuplImpl once created. Given the high cost, we can limit this extra checking mechanism to debug/dev mode. Because this allocation is in the critical path, it means several thousand/million additional allocations per second of the List wrapper object, proportional to the number of bolt/spout instances. 
*TVL :* | |throughput (k/sec)|cores|mem (mb)| |*master (#f5a410ba3)*|412 |4.26|103| |*storm-3205*|547 (+33%)|4.09|132| +cmd:+ bin/storm jar topos/storm-loadgen-2.0.0-SNAPSHOT.jar org.apache.storm.loadgen.ThroughputVsLatency *--rate 55* --spouts 1 --splitters 3 --counters 2 -c topology.acker.executors=0 *+ConstSpoutIdentityBoltNullBolt :+* | |throughput| |*master*|4.25 mill/sec| |*storm-3205*|5.4 mill/sec (+27%)| +cmd+: bin/storm jar topos/storm-perf-2.0.0-SNAPSHOT.jar org.apache.storm.perf.ConstSpoutIdBoltNullBoltTopo -c topology.acker.executors=0 -c topology.producer.batch.size=1000 400 Note: The perf gains are more evident when operating at high throughputs w/o backpressure occurring (i.e. some bolts have not yet become a bottleneck) > Optimization in TuplImpl > > > Key: STORM-3205 > URL: https://issues.apache.org/jira/browse/STORM-3205 > Project: Apache Storm > Issue Type: Improvement > Components: storm-client >Affects Versions: 2.0.0 >Reporter: Roshan Naik >Assignee: Roshan Naik >Priority: Major > > Wrapping {{TuplImpl.values}} with Collections.unmodifiableList() turns out to be > very expensive. Its intention is obviously to check for and prevent accidental > tweaking of TuplImpl once created. Given the high cost, we can limit this extra > checking mechanism to debug/dev mode. Because this allocation is in the > critical path, it means several thousand/million additional allocations per > second of the List wrapper object, proportional to the number of > bolt/spout instances. > *TVL :* > | |throughput (k/sec)|cores|mem (mb)| > |*master (#f5a410ba3)*|412 |4.26|103| > |*storm-3205*|547 (+33%)|4.09|132| > {{+cmd:+ bin/storm jar topos/storm-loadgen-2.0.0-SNAPSHOT.jar > org.apache.storm.loadgen.ThroughputVsLatency *--rate 55* --spouts 1 > --splitters 3 --counters 2 -c topology.acker.executors=0}} > *+ConstSpoutIdentityBoltNullBolt :+* > | |throughput| > |*master*|4.25 mill/sec| > |*storm-3205*|5.4 mill/sec (+27%)| > {{+cmd+: bin/storm jar topos/storm-perf-2.0.0-SNAPSHOT.jar > org.apache.storm.perf.ConstSpoutIdBoltNullBoltTopo -c > topology.acker.executors=0 -c topology.producer.batch.size=1000 400}} > *Note:* The perf gains are more evident when operating at high throughputs w/o > backpressure occurring (i.e. some bolts have not yet become a bottleneck) -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (STORM-3205) Optimization in TuplImpl
[ https://issues.apache.org/jira/browse/STORM-3205?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Roshan Naik updated STORM-3205: --- Description: Wrapping {{TuplImpl.values}} with Collections.unmodifiableList() turns out to be very expensive. Its intention is obviously to prevent accidental tweaking of TuplImpl once created. Given the potentially high cost, we can limit this extra checking mechanism to debug/dev mode. Because this allocation is in the critical path, it means several thousand/million additional allocations per second of the List wrapper object, proportional to the number of bolt/spout instances. *TVL :* | |throughput (k/sec)|cores|mem (mb)| |*master (#f5a410ba3)*|412 |4.26|103| |*storm-3205*|547 (+33%)|4.09|132| +cmd:+ bin/storm jar topos/storm-loadgen-2.0.0-SNAPSHOT.jar org.apache.storm.loadgen.ThroughputVsLatency *--rate 55* --spouts 1 --splitters 3 --counters 2 -c topology.acker.executors=0 *+ConstSpoutIdentityBoltNullBolt :+* | |throughput| |*master*|4.25 mill/sec| |*storm-3205*|5.4 mill/sec (+27%)| +cmd+: bin/storm jar topos/storm-perf-2.0.0-SNAPSHOT.jar org.apache.storm.perf.ConstSpoutIdBoltNullBoltTopo -c topology.acker.executors=0 -c topology.producer.batch.size=1000 400 Note: The perf gains are more evident when operating at high throughputs w/o backpressure occurring (i.e. some bolts have not yet become a bottleneck) > Optimization in TuplImpl > > > Key: STORM-3205 > URL: https://issues.apache.org/jira/browse/STORM-3205 > Project: Apache Storm > Issue Type: Improvement > Components: storm-client >Affects Versions: 2.0.0 >Reporter: Roshan Naik >Assignee: Roshan Naik >Priority: Major > > Wrapping {{TuplImpl.values}} with Collections.unmodifiableList() turns out to be > very expensive. Its intention is obviously to prevent accidental > tweaking of TuplImpl once created. Given the potentially high cost, we can limit > this extra checking mechanism to debug/dev mode.
Because this allocation is > in the critical path, it means several thousand/million additional allocations > per second of the List wrapper object, proportional to the number of > bolt/spout instances. > *TVL :* > | |throughput (k/sec)|cores|mem (mb)| > |*master (#f5a410ba3)*|412 |4.26|103| > |*storm-3205*|547 (+33%)|4.09|132| > +cmd:+ bin/storm jar topos/storm-loadgen-2.0.0-SNAPSHOT.jar > org.apache.storm.loadgen.ThroughputVsLatency *--rate 55* --spouts 1 > --splitters 3 --counters 2 -c topology.acker.executors=0 > > *+ConstSpoutIdentityBoltNullBolt :+* > | |throughput| > |*master*|4.25 mill/sec| > |*storm-3205*|5.4 mill/sec (+27%)| > > +cmd+: bin/storm jar topos/storm-perf-2.0.0-SNAPSHOT.jar > org.apache.storm.perf.ConstSpoutIdBoltNullBoltTopo -c > topology.acker.executors=0 -c topology.producer.batch.size=1000 400 > > Note: The perf gains are more evident when operating at high throughputs w/o > backpressure occurring (i.e. some bolts have not yet become a bottleneck) -- This message was sent by Atlassian JIRA (v7.6.3#76005)
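For illustration, the gist of the proposed change can be sketched as follows. This is a hedged sketch, not the actual patch: the class, the {{wrapValues}} helper, and the {{DEBUG_MODE}} flag are all illustrative names; the real change may gate on a Storm config key instead.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Sketch: only wrap the tuple's backing list with
// Collections.unmodifiableList() when a debug/dev flag is set,
// so the production emit path skips one allocation per tuple.
public class TupleValuesSketch {
    // Assumed flag; the actual patch may read this from topology config.
    static final boolean DEBUG_MODE = false;

    static List<Object> wrapValues(List<Object> values) {
        // Debug mode: catch accidental mutation of a tuple after creation.
        // Production: return the list as-is and avoid the wrapper object.
        return DEBUG_MODE ? Collections.unmodifiableList(values) : values;
    }

    public static void main(String[] args) {
        List<Object> values = new ArrayList<>();
        values.add("word");
        values.add(42);
        List<Object> wrapped = wrapValues(values);
        // With DEBUG_MODE == false, no wrapper object is allocated.
        System.out.println(wrapped == values); // prints "true"
    }
}
```

Since emits can happen millions of times per second per worker, removing this one small allocation per tuple is what produces the throughput gains shown in the tables above.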
[jira] [Updated] (STORM-3205) Optimization in TuplImpl
[ https://issues.apache.org/jira/browse/STORM-3205?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Roshan Naik updated STORM-3205: --- Summary: Optimization in TuplImpl (was: Optimization in critical path) > Optimization in TuplImpl > > > Key: STORM-3205 > URL: https://issues.apache.org/jira/browse/STORM-3205 > Project: Apache Storm > Issue Type: Improvement > Components: storm-client >Affects Versions: 2.0.0 >Reporter: Roshan Naik >Assignee: Roshan Naik >Priority: Major > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (STORM-3205) Optimization in critical path
[ https://issues.apache.org/jira/browse/STORM-3205?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Roshan Naik reassigned STORM-3205: -- Assignee: Roshan Naik > Optimization in critical path > - > > Key: STORM-3205 > URL: https://issues.apache.org/jira/browse/STORM-3205 > Project: Apache Storm > Issue Type: Improvement > Components: storm-client >Affects Versions: 2.0.0 >Reporter: Roshan Naik >Assignee: Roshan Naik >Priority: Major > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (STORM-3205) Optimization in critical path
[ https://issues.apache.org/jira/browse/STORM-3205?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Roshan Naik updated STORM-3205: --- Component/s: storm-client > Optimization in critical path > - > > Key: STORM-3205 > URL: https://issues.apache.org/jira/browse/STORM-3205 > Project: Apache Storm > Issue Type: Improvement > Components: storm-client >Affects Versions: 2.0.0 >Reporter: Roshan Naik >Priority: Major > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (STORM-3205) Optimization in critical path
[ https://issues.apache.org/jira/browse/STORM-3205?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Roshan Naik updated STORM-3205: --- Affects Version/s: 2.0.0 > Optimization in critical path > - > > Key: STORM-3205 > URL: https://issues.apache.org/jira/browse/STORM-3205 > Project: Apache Storm > Issue Type: Improvement > Components: storm-client >Affects Versions: 2.0.0 >Reporter: Roshan Naik >Priority: Major > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (STORM-3205) Optimization in critical path
Roshan Naik created STORM-3205: -- Summary: Optimization in critical path Key: STORM-3205 URL: https://issues.apache.org/jira/browse/STORM-3205 Project: Apache Storm Issue Type: Improvement Reporter: Roshan Naik -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (STORM-2947) Review and fix/remove deprecated things in Storm 2.0.0
[ https://issues.apache.org/jira/browse/STORM-2947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16544325#comment-16544325 ] Roshan Naik commented on STORM-2947: At a bare minimum I think we should document the settings noted in the earlier comment that are no longer applicable in the 2.0 release. If that's ok, let me know... I can go ahead and pick this up. > Review and fix/remove deprecated things in Storm 2.0.0 > -- > > Key: STORM-2947 > URL: https://issues.apache.org/jira/browse/STORM-2947 > Project: Apache Storm > Issue Type: Task > Components: storm-client, storm-hdfs, storm-kafka, storm-server, > storm-solr >Affects Versions: 2.0.0 >Reporter: Jungtaek Lim >Assignee: Jungtaek Lim >Priority: Major > Labels: pull-request-available > Time Spent: 3.5h > Remaining Estimate: 0h > > We've been deprecating these things but haven't had time to replace/get rid of > them. It would be better if we have time to review and address them. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (STORM-3100) Minor optimization: Replace HashMap with an array backed data structure for faster lookups
[ https://issues.apache.org/jira/browse/STORM-3100?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16538051#comment-16538051 ] Roshan Naik commented on STORM-3100: fine with me. > Minor optimization: Replace HashMap with an array backed data > structure for faster lookups > -- > > Key: STORM-3100 > URL: https://issues.apache.org/jira/browse/STORM-3100 > Project: Apache Storm > Issue Type: Improvement >Reporter: Roshan Naik >Assignee: Roshan Naik >Priority: Major > Labels: pull-request-available > Time Spent: 2h 50m > Remaining Estimate: 0h > > * Introduce _CustomIndexArray_: An array backed data structure to speed up > HashMap use cases *in the critical path*. It needs to support -ve > indexing and a user-defined (on construction) upper and lower index range. > Does not need to be dynamically resizable given the nature of the use cases we > have. > * Use this data structure for the _GeneralTopologyContext.taskToComponent_ > mapping which is looked up in the critical path _Task.getOutgoingTasks._ This > lookup happens at least once for every emit and consequently can happen > millions of times per second. > * Also use this for _JCQueue.localReceiveQueues_ where the basic idea is > already in use but not in a reusable manner. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (STORM-3100) Minor optimization: Replace HashMap with an array backed data structure for faster lookups
[ https://issues.apache.org/jira/browse/STORM-3100?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Roshan Naik reassigned STORM-3100: -- Assignee: Roshan Naik > Minor optimization: Replace HashMap with an array backed data > structure for faster lookups > -- > > Key: STORM-3100 > URL: https://issues.apache.org/jira/browse/STORM-3100 > Project: Apache Storm > Issue Type: Improvement >Reporter: Roshan Naik >Assignee: Roshan Naik >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > * Introduce _CustomIndexArray_: An array backed data structure to speed up > HashMap use cases *in the critical path*. It needs to support -ve > indexing and a user-defined (on construction) upper and lower index range. > Does not need to be dynamically resizable given the nature of the use cases we > have. > * Use this data structure for the _GeneralTopologyContext.taskToComponent_ > mapping which is looked up in the critical path _Task.getOutgoingTasks._ This > lookup happens at least once for every emit and consequently can happen > millions of times per second. > * Also use this for _JCQueue.localReceiveQueues_ where the basic idea is > already in use but not in a reusable manner. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (STORM-3100) Minor optimization: Replace HashMap with an array backed data structure for faster lookups
[ https://issues.apache.org/jira/browse/STORM-3100?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Roshan Naik updated STORM-3100: --- Description: * Introduce _CustomIndexArray_: An array backed data structure to speed up HashMap use cases *in the critical path*. It needs to support -ve indexing and a user-defined (on construction) upper and lower index range. Does not need to be dynamically resizable given the nature of the use cases we have. * Use this data structure for the _GeneralTopologyContext.taskToComponent_ mapping which is looked up in the critical path _Task.getOutgoingTasks._ This lookup happens at least once for every emit and consequently can happen millions of times per second. * Also use this for _JCQueue.localReceiveQueues_ where the basic idea is already in use but not in a reusable manner. was: * Introduce _CustomIndexArray_: An array backed data structure to replace HashMap use cases. So it needs to support -ve indexing. Does not need to be dynamically resizable given the nature of the use cases we have. The upper and lower index range needs to be specified at construction time. * Use this data structure for the _GeneralTopologyContext.taskToComponent_ mapping which is looked up in the critical path _Task.getOutgoingTasks._ This lookup happens at least once for every emit and consequently can happen millions of times per second. * Also use this for _JCQueue.localReceiveQueues_ where the basic idea is already in use but not in a reusable manner. > Minor optimization: Replace HashMap with an array backed data > structure for faster lookups > -- > > Key: STORM-3100 > URL: https://issues.apache.org/jira/browse/STORM-3100 > Project: Apache Storm > Issue Type: Improvement >Reporter: Roshan Naik >Priority: Major > > * Introduce _CustomIndexArray_: An array backed data structure to speed up > HashMap use cases *in the critical path*. It needs to support -ve > indexing and a user-defined (on construction) upper and lower index range.
> Does not need to be dynamically resizable given the nature of the use cases we > have. > * Use this data structure for the _GeneralTopologyContext.taskToComponent_ > mapping which is looked up in the critical path _Task.getOutgoingTasks._ This > lookup happens at least once for every emit and consequently can happen > millions of times per second. > * Also use this for _JCQueue.localReceiveQueues_ where the basic idea is > already in use but not in a reusable manner. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
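The core of the proposed data structure might look like the following sketch (the class name, method signatures, and the example task-id range are illustrative assumptions, not the actual Storm code): a fixed-size array keyed by a bounded int range that may include negative indices, so a lookup is a single array load with no hashing or Integer boxing.

```java
// Sketch of the CustomIndexArray idea: array-backed lookup over a
// bounded integer key range fixed at construction time.
public class IndexArraySketch<T> {
    private final Object[] slots;
    private final int lower;

    // Lower and upper index bounds are supplied at construction;
    // the structure is deliberately not resizable.
    public IndexArraySketch(int lower, int upper) {
        this.lower = lower;
        this.slots = new Object[upper - lower + 1];
    }

    public void put(int index, T value) {
        slots[index - lower] = value; // shift key into array position
    }

    @SuppressWarnings("unchecked")
    public T get(int index) {
        return (T) slots[index - lower]; // plain array load, no hashing
    }

    public static void main(String[] args) {
        // e.g. a taskId -> componentId mapping where system tasks may use -1
        IndexArraySketch<String> taskToComponent = new IndexArraySketch<>(-1, 10);
        taskToComponent.put(-1, "__acker");
        taskToComponent.put(3, "splitter");
        System.out.println(taskToComponent.get(3)); // prints "splitter"
    }
}
```

Because task ids form a small, dense, known-at-construction range, trading a HashMap for a plain array avoids per-emit hashing and boxing in the hot path described above.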
[jira] [Updated] (STORM-3100) Minor optimization: Replace HashMap with an array backed data structure for faster lookups
[ https://issues.apache.org/jira/browse/STORM-3100?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Roshan Naik updated STORM-3100: --- Summary: Minor optimization: Replace HashMap with an array backed data structure for faster lookups (was: Minor optimization: Replace HashMap in critical path with an array backed data structure for faster lookups) > Minor optimization: Replace HashMap with an array backed data > structure for faster lookups > -- > > Key: STORM-3100 > URL: https://issues.apache.org/jira/browse/STORM-3100 > Project: Apache Storm > Issue Type: Improvement >Reporter: Roshan Naik >Priority: Major > > * Introduce _CustomIndexArray_: An array backed data structure to replace > HashMap use cases. So it needs to support -ve indexing. Does > not need to be dynamically resizable given the nature of the use cases we have. > The upper and lower index range needs to be specified at construction time. > * Use this data structure for the _GeneralTopologyContext.taskToComponent_ > mapping which is looked up in the critical path _Task.getOutgoingTasks._ This > lookup happens at least once for every emit and consequently can happen > millions of times per second. > * Also use this for _JCQueue.localReceiveQueues_ where the basic idea is > already in use but not in a reusable manner. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (STORM-3100) Minor optimization: Replace HashMap in critical path with an array backed data structure for faster lookups
Roshan Naik created STORM-3100: -- Summary: Minor optimization: Replace HashMap in critical path with an array backed data structure for faster lookups Key: STORM-3100 URL: https://issues.apache.org/jira/browse/STORM-3100 Project: Apache Storm Issue Type: Improvement Reporter: Roshan Naik * Introduce _CustomIndexArray_: An array backed data structure to replace HashMap use cases. So it needs to support -ve indexing. Does not need to be dynamically resizable given the nature of the use cases we have. The upper and lower index range needs to be specified at construction time. * Use this data structure for the _GeneralTopologyContext.taskToComponent_ mapping which is looked up in the critical path _Task.getOutgoingTasks._ This lookup happens at least once for every emit and consequently can happen millions of times per second. * Also use this for _JCQueue.localReceiveQueues_ where the basic idea is already in use but not in a reusable manner. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Comment Edited] (STORM-2983) Some topologies not working properly
[ https://issues.apache.org/jira/browse/STORM-2983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16427559#comment-16427559 ] Roshan Naik edited comment on STORM-2983 at 4/5/18 8:44 PM: Thanks [~revans2] for those clarifications. Thanks [~ethanli] for the revised fix. Similar fixes are needed elsewhere as well, since they are all probably broken: - [https://github.com/apache/storm/blob/master/storm-client/src/jvm/org/apache/storm/daemon/StormCommon.java#L276] - [https://github.com/apache/storm/blob/master/storm-client/src/jvm/org/apache/storm/daemon/StormCommon.java#L363-L364] - [https://github.com/apache/storm/blob/master/examples/storm-loadgen/src/main/java/org/apache/storm/loadgen/CaptureLoad.java#L109-L112] - [https://github.com/apache/storm/blob/master/storm-client/src/jvm/org/apache/storm/daemon/worker/Worker.java#L306] was (Author: roshan_naik): Thanks [~revans2] for those clarifications. Thanks [~ethanli] for the revised fix. Similar fixes are needed elsewhere as well, since they are all probably broken: - [https://github.com/apache/storm/blob/master/storm-client/src/jvm/org/apache/storm/daemon/StormCommon.java#L276] - [https://github.com/apache/storm/blob/master/storm-client/src/jvm/org/apache/storm/daemon/StormCommon.java#L363-L364] - [https://github.com/apache/storm/blob/master/examples/storm-loadgen/src/main/java/org/apache/storm/loadgen/CaptureLoad.java#L109-L112] - [https://github.com/apache/storm/blob/master/storm-client/src/jvm/org/apache/storm/daemon/worker/Worker.java#L306] > Some topologies not working properly > - > > Key: STORM-2983 > URL: https://issues.apache.org/jira/browse/STORM-2983 > Project: Apache Storm > Issue Type: Bug >Reporter: Ethan Li >Assignee: Ethan Li >Priority: Critical > Labels: pull-request-available > Fix For: 2.0.0 > > Time Spent: 0.5h > Remaining Estimate: 0h > > For example, > {code:java} > bin/storm jar storm-loadgen-*.jar > org.apache.storm.loadgen.ThroughputVsLatency --spouts 1 --splitters 2 > 
--counters 1 -c topology.debug=true > {code} > on ResourceAwareScheduler not working properly. > With default cluster settings, there will be only one __acker-executor and it > will be on a separate worker. And it looks like the __acker-executor was not > able to receive messages from spouts and bolts. And spouts and bolts > continued to retry sending messages to acker. It then led to another problem: > STORM-2970 > I tried to run on storm right before > [https://github.com/apache/storm/pull/2502] and right after and confirmed > that this bug should be related to it -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (STORM-2983) Some topologies not working properly
[ https://issues.apache.org/jira/browse/STORM-2983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16427559#comment-16427559 ] Roshan Naik commented on STORM-2983: Thanks [~revans2] for those clarifications. Thanks [~ethanli] for the revised fix. Similar fixes are needed elsewhere as well, since they are all probably broken: - [https://github.com/apache/storm/blob/master/storm-client/src/jvm/org/apache/storm/daemon/StormCommon.java#L276] - [https://github.com/apache/storm/blob/master/storm-client/src/jvm/org/apache/storm/daemon/StormCommon.java#L363-L364] - [https://github.com/apache/storm/blob/master/examples/storm-loadgen/src/main/java/org/apache/storm/loadgen/CaptureLoad.java#L109-L112] - [https://github.com/apache/storm/blob/master/storm-client/src/jvm/org/apache/storm/daemon/worker/Worker.java#L306] > Some topologies not working properly > - > > Key: STORM-2983 > URL: https://issues.apache.org/jira/browse/STORM-2983 > Project: Apache Storm > Issue Type: Bug >Reporter: Ethan Li >Assignee: Ethan Li >Priority: Critical > Labels: pull-request-available > Fix For: 2.0.0 > > Time Spent: 0.5h > Remaining Estimate: 0h > > For example, > {code:java} > bin/storm jar storm-loadgen-*.jar > org.apache.storm.loadgen.ThroughputVsLatency --spouts 1 --splitters 2 > --counters 1 -c topology.debug=true > {code} > on ResourceAwareScheduler not working properly. > With default cluster settings, there will be only one __acker-executor and it > will be on a separate worker. And it looks like the __acker-executor was not > able to receive messages from spouts and bolts. And spouts and bolts > continued to retry sending messages to acker. It then led to another problem: > STORM-2970 > I tried to run on storm right before > [https://github.com/apache/storm/pull/2502] and right after and confirmed > that this bug should be related to it -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Comment Edited] (STORM-2983) Some topologies not working properly
[ https://issues.apache.org/jira/browse/STORM-2983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16425223#comment-16425223 ] Roshan Naik edited comment on STORM-2983 at 4/4/18 9:15 AM: Mutability of topology configuration is explicitly allowed in the documentation (previously referenced). The usefulness of the single-worker use case should not be under-estimated. For perf-critical topos it is a reasonable option to deploy multiple single-worker instances of the topo, as opposed to a single multiworker instance, to avoid interworker communication. I have recommended on the mailing list that users consider it (for things like video processing), and I have seen others recommend 1-worker mode as well. Given that RAS creates this divergence between specification and reality, confusion is bound to exist no matter how we resolve the value of this setting. Although IMO that is not a strong enough case against reflecting the actual worker count ... if it is contentious ... I am ok resolving it either way as long as we provide an API so that code can check it. Currently I am aware of a few additional use cases for knowing the worker count: * enabling/disabling the interworker BP subsystem (already in place) * auto-disabling Locality Awareness for 1 worker (I have measured this impact to be a bit under 10% for 1 worker) * user bolts inspecting it to make decisions... like potentially allocating a certain number of external resources based on worker count... or who knows what. My concern is limited to the fact that we must have official support for detecting the worker count. It is simply too heavy-handed to say NO to all current and potential use cases. was (Author: roshan_naik): Mutability of topology configuration is explicitly allowed in the documentation (previously referenced). The usefulness of the single-worker use case should not be under-estimated.
For perf-critical topos it is a reasonable option to deploy multiple single-worker instances of the topo, as opposed to a single multiworker instance, to avoid interworker communication. I have recommended on the mailing list that users consider it (for things like video processing), and I have seen others recommend considering 1-worker mode as well. Given that RAS creates this divergence between specification and reality, confusion is bound to exist no matter how we resolve the value of this setting. Although IMO that is not a strong enough case against reflecting the actual worker count ... if it is contentious ... I am ok resolving it either way as long as we provide an API so that code can check it. Currently I am aware of a few additional use cases for knowing the worker count: * enabling/disabling the interworker BP subsystem (already in place) * auto-disabling Locality Awareness for 1 worker (I have measured this impact to be a bit under 10% for 1 worker) * user bolts inspecting it to make decisions... like potentially allocating a certain number of external resources based on worker count... or who knows what. My concern is limited to the fact that we must have official support for detecting the worker count. It is simply too heavy-handed to say NO to all current and potential use cases. > Some topologies not working properly > - > > Key: STORM-2983 > URL: https://issues.apache.org/jira/browse/STORM-2983 > Project: Apache Storm > Issue Type: Bug >Reporter: Ethan Li >Assignee: Ethan Li >Priority: Major > Labels: pull-request-available > Time Spent: 20m > Remaining Estimate: 0h > > For example, > {code:java} > bin/storm jar storm-loadgen-*.jar > org.apache.storm.loadgen.ThroughputVsLatency --spouts 1 --splitters 2 > --counters 1 -c topology.debug=true > {code} > on ResourceAwareScheduler not working properly. > With default cluster settings, there will be only one __acker-executor and it > will be on a separate worker.
And it looks like the __acker-executor was not > able to receive messages from spouts and bolts. And spouts and bolts > continued to retry sending messages to acker. It then led to another problem: > STORM-2970 > I tried to run on storm right before > [https://github.com/apache/storm/pull/2502] and right after and confirmed > that this bug should be related to it -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (STORM-2983) Some topologies not working properly
[ https://issues.apache.org/jira/browse/STORM-2983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16425223#comment-16425223 ] Roshan Naik commented on STORM-2983: Mutability of topology configuration is explicitly allowed in the documentation (previously referenced). The usefulness of the single-worker use case should not be under-estimated. For perf-critical topos it is a reasonable option to deploy multiple single-worker instances of the topo, as opposed to a single multiworker instance, to avoid interworker communication. I have recommended on the mailing list that users consider it (for things like video processing), and I have seen others recommend considering 1-worker mode as well. Given that RAS creates this divergence between specification and reality, confusion is bound to exist no matter how we resolve the value of this setting. Although IMO that is not a strong enough case against reflecting the actual worker count ... if it is contentious ... I am ok resolving it either way as long as we provide an API so that code can check it. Currently I am aware of a few additional use cases for knowing the worker count: * enabling/disabling the interworker BP subsystem (already in place) * auto-disabling Locality Awareness for 1 worker (I have measured this impact to be a bit under 10% for 1 worker) * user bolts inspecting it to make decisions... like potentially allocating a certain number of external resources based on worker count... or who knows what. My concern is limited to the fact that we must have official support for detecting the worker count. It is simply too heavy-handed to say NO to all current and potential use cases.
> Some topologies not working properly > - > > Key: STORM-2983 > URL: https://issues.apache.org/jira/browse/STORM-2983 > Project: Apache Storm > Issue Type: Bug >Reporter: Ethan Li >Assignee: Ethan Li >Priority: Major > Labels: pull-request-available > Time Spent: 20m > Remaining Estimate: 0h > > For example, > {code:java} > bin/storm jar storm-loadgen-*.jar > org.apache.storm.loadgen.ThroughputVsLatency --spouts 1 --splitters 2 > --counters 1 -c topology.debug=true > {code} > on ResourceAwareScheduler not working properly. > With default cluster settings, there will be only one __acker-executor and it > will be on a separate worker. And it looks like the __acker-executor was not > able to receive messages from spouts and bolts. And spouts and bolts > continued to retry sending messages to acker. It then led to another problem: > STORM-2970 > I tried to run on storm right before > [https://github.com/apache/storm/pull/2502] and right after and confirmed > that this bug should be related to it -- This message was sent by Atlassian JIRA (v7.6.3#76005)
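The kind of check being debated can be sketched as follows. This is hypothetical user code for illustration: {{topology.workers}} is the real config key, but the helper class is an assumption, and the whole dispute above is that under RAS the stored value may not reflect the actual worker count.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch: code reading topology.workers from topoConf to branch on
// single-worker vs multi-worker behavior (e.g. skipping the interworker
// backpressure subsystem, or disabling Locality Awareness).
public class WorkerCountCheck {
    public static boolean isSingleWorker(Map<String, Object> topoConf) {
        Object v = topoConf.get("topology.workers");
        // Default to 1 worker if the setting is absent or non-numeric.
        int workers = (v instanceof Number) ? ((Number) v).intValue() : 1;
        return workers == 1;
    }

    public static void main(String[] args) {
        Map<String, Object> conf = new HashMap<>();
        conf.put("topology.workers", 1);
        System.out.println(isSingleWorker(conf)); // prints "true"
    }
}
```

Code like this is only trustworthy if the scheduler dynamically overrides {{topology.workers}} to the actual worker count, which is the fix being requested in these comments.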
[jira] [Commented] (STORM-2983) Some topologies not working properly
[ https://issues.apache.org/jira/browse/STORM-2983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16424776#comment-16424776 ] Roshan Naik commented on STORM-2983: [~kabhwan] I think you are again missing what I am stressing. We need a way, in code, to check the worker count (for internal and user code), not to remove code that does such checks. I am not concerned about retaining this one optimization. There is no point removing reasonable code and then putting it back again. I would like to see why we cannot either fix topology.workers or provide something else as a substitute. > Some topologies not working properly > - > > Key: STORM-2983 > URL: https://issues.apache.org/jira/browse/STORM-2983 > Project: Apache Storm > Issue Type: Bug >Reporter: Ethan Li >Assignee: Ethan Li >Priority: Major > Labels: pull-request-available > Time Spent: 20m > Remaining Estimate: 0h > > For example, > {code:java} > bin/storm jar storm-loadgen-*.jar > org.apache.storm.loadgen.ThroughputVsLatency --spouts 1 --splitters 2 > --counters 1 -c topology.debug=true > {code} > on ResourceAwareScheduler not working properly. > With default cluster settings, there will be only one __acker-executor and it > will be on a separate worker. And it looks like the __acker-executor was not > able to receive messages from spouts and bolts. And spouts and bolts > continued to retry sending messages to acker. It then led to another problem: > STORM-2970 > I tried to run on storm right before > [https://github.com/apache/storm/pull/2502] and right after and confirmed > that this bug should be related to it -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (STORM-2983) Some topologies not working properly
[ https://issues.apache.org/jira/browse/STORM-2983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16424680#comment-16424680 ] Roshan Naik commented on STORM-2983: My Q was not about finding and fixing all the things that RAS breaks. It is limited to fixing this issue with the worker count that is causing the breakage. Any code we try to delete now to unblock you would be useful to revive once the worker count issue is fixed. - Instead of the proposed fix, can you update the worker count to the right value? - Else, could you consider unblocking your work by commenting out the optimization in your local build? > Some topologies not working properly > - > > Key: STORM-2983 > URL: https://issues.apache.org/jira/browse/STORM-2983 > Project: Apache Storm > Issue Type: Bug >Reporter: Ethan Li >Assignee: Ethan Li >Priority: Major > Labels: pull-request-available > Time Spent: 20m > Remaining Estimate: 0h > > For example, > {code:java} > bin/storm jar storm-loadgen-*.jar > org.apache.storm.loadgen.ThroughputVsLatency --spouts 1 --splitters 2 > --counters 1 -c topology.debug=true > {code} > on ResourceAwareScheduler not working properly. > With default cluster settings, there will be only one __acker-executor and it > will be on a separate worker. And it looks like the __acker-executor was not > able to receive messages from spouts and bolts. And spouts and bolts > continued to retry sending messages to acker. It then led to another problem: > STORM-2970 > I tried to run on storm right before > [https://github.com/apache/storm/pull/2502] and right after and confirmed > that this bug should be related to it -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (STORM-2983) Some topologies not working properly
[ https://issues.apache.org/jira/browse/STORM-2983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16424616#comment-16424616 ] Roshan Naik commented on STORM-2983: As stated before, the core issue is not the specific optimization. Along with removing this optimization we would have to remove all other code that checks the same thing. It is important to get RAS working, but it needs to be done correctly. My concern is that (independent of the existence/absence of this optimization)... the mechanism for storm internal code or end user code to check the worker count is broken. Fixing that will address RAS as well and does not require removing similar code. So I would like to ask my prev question again: - Is there a good reason why topology.workers cannot be dynamically updated to reflect the actual worker count? > Some topologies not working properly > - > > Key: STORM-2983 > URL: https://issues.apache.org/jira/browse/STORM-2983 > Project: Apache Storm > Issue Type: Bug >Reporter: Ethan Li >Assignee: Ethan Li >Priority: Major > Labels: pull-request-available > Time Spent: 20m > Remaining Estimate: 0h > > For example, > {code:java} > bin/storm jar storm-loadgen-*.jar > org.apache.storm.loadgen.ThroughputVsLatency --spouts 1 --splitters 2 > --counters 1 -c topology.debug=true > {code} > on ResourceAwareScheduler not working properly. > With default cluster settings, there will be only one __acker-executor and it > will be on a separate worker. And it looks like the __acker-executor was not > able to receive messages from spouts and bolts. And spouts and bolts > continued to retry sending messages to acker. It then led to another problem: > STORM-2970 > I tried to run on storm right before > [https://github.com/apache/storm/pull/2502] and right after and confirmed > that this bug should be related to it -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Comment Edited] (STORM-2983) Some topologies not working properly
[ https://issues.apache.org/jira/browse/STORM-2983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16421019#comment-16421019 ] Roshan Naik edited comment on STORM-2983 at 3/30/18 10:26 PM: -- Removing / preserving the optimization will not address the core problem ... which is .. topology.workers is not usable and therefore any code (present/future) would be considered broken with no workaround. There are other use cases in code for this setting plus more that I can think of, but to limit the scope I will stay with the topic of some settings in topoConf being unreliable. I dont think its a good idea to say any such code should not allowed in storm. Since topoConf is part of the API ... it is surprising to know that any setting stored in it cannot be relied upon. The [documentation|http://storm.apache.org/releases/1.2.1/Configuration.html] specifically states that the system supports overriding settings. *The actual bug here is that we are not dynamically overriding the setting to what it should be.* I think we need to have a good justification for making any setting unreliable and not open the door for more of the same. Would like to know: - Is there good reason why this setting cannot be made reliable ? - Are you aware of other such settings that are not reflecting the correct values ? was (Author: roshan_naik): Removing / preserving the optimization will not address the core problem ... which is .. topology.worker.count is not usable and therefore any code (present/future) would be considered broken with no workaround. There are other use cases in code for this setting plus more that I can think of, but to limit the scope I will stay with the topic of some settings in topoConf being unreliable. I dont think its a good idea to say any such code should not allowed in storm. Since topoConf is part of the API ... it is surprising to know that any setting stored in it cannot be relied upon. 
The [documentation|http://storm.apache.org/releases/1.2.1/Configuration.html] specifically states that the system supports overriding settings. *The actual bug here is that we are not dynamically overriding the setting to what it should be.* I think we need to have a good justification for making any setting unreliable and not open the door for more of the same. Would like to know: - Is there a good reason why this setting cannot be made reliable? - Are you aware of other such settings that are not reflecting the correct values? > Some topologies not working properly > - > > Key: STORM-2983 > URL: https://issues.apache.org/jira/browse/STORM-2983 > Project: Apache Storm > Issue Type: Bug >Reporter: Ethan Li >Assignee: Ethan Li >Priority: Major > Labels: pull-request-available > Time Spent: 20m > Remaining Estimate: 0h > > For example, > {code:java} > bin/storm jar storm-loadgen-*.jar > org.apache.storm.loadgen.ThroughputVsLatency --spouts 1 --splitters 2 > --counters 1 -c topology.debug=true > {code} > on ResourceAwareScheduler not working properly. > With default cluster settings, there will be only one __acker-executor and it > will be on a separate worker. And it looks like the __acker-executor was not > able to receive messages from spouts and bolts. And spouts and bolts > continued to retry sending messages to acker. It then led to another problem: > STORM-2970 > I tried to run on storm right before > [https://github.com/apache/storm/pull/2502] and right after and confirmed > that this bug should be related to it -- This message was sent by Atlassian JIRA (v7.6.3#76005)
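The reliability concern can be made concrete with a minimal sketch. This is plain Java, not actual Storm code; the helper name {{isSingleWorker}} is hypothetical, though the key string matches Storm's {{Config.TOPOLOGY_WORKERS}}. It shows the kind of check that silently misbehaves when the value in topoConf no longer reflects the worker count the scheduler actually used:

```java
import java.util.HashMap;
import java.util.Map;

public class TopoConfCheck {
    // Same key string as Storm's Config.TOPOLOGY_WORKERS.
    static final String TOPOLOGY_WORKERS = "topology.workers";

    // Hypothetical helper: code like this treats topoConf as authoritative.
    static boolean isSingleWorker(Map<String, Object> topoConf) {
        Number workers = (Number) topoConf.getOrDefault(TOPOLOGY_WORKERS, 1);
        return workers.intValue() == 1;
    }

    public static void main(String[] args) {
        Map<String, Object> topoConf = new HashMap<>();
        topoConf.put(TOPOLOGY_WORKERS, 1); // value as submitted

        // If the scheduler later places executors on more workers without
        // rewriting topoConf, this check silently gives the wrong answer.
        System.out.println(isSingleWorker(topoConf)); // prints true
    }
}
```

Dynamically rewriting the setting at schedule time, as the comment argues, would keep checks like this correct without banning them.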
[jira] [Comment Edited] (STORM-2983) Some topologies not working properly
[ https://issues.apache.org/jira/browse/STORM-2983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16420273#comment-16420273 ] Roshan Naik edited comment on STORM-2983 at 3/30/18 8:20 AM: - [~kabhwan] IMO this should not be viewed as sidestepping a minor optimization. Viewed that way, it does look like a minor issue. The optimization only exposed a bug, so it is only the "messenger". The key issue here is supporting code (internal or user code) in checking the state correctly, so it can do whatever optimization or behavior change it needs. By not fixing the core bug, we will sidestep the core issue. IMO this is a critical bug and needs to be fixed. Or we need to provide a different mechanism for code to make such checks correctly. was (Author: roshan_naik): [~kabhwan] IMO this should not be viewed as sidestepping an optimization. Viewed that way, it does look like a minor issue. The optimization only exposed a bug, so it is only the "messenger". The key issue here is supporting code (internal or user code) in checking the state correctly, so it can do whatever optimization or behavior change it needs. By not fixing the core bug, we will sidestep the core issue. IMO this is a critical bug and needs to be fixed. Or we need to provide a different mechanism for code to make such checks correctly. > Some topologies not working properly > - > > Key: STORM-2983 > URL: https://issues.apache.org/jira/browse/STORM-2983 > Project: Apache Storm > Issue Type: Bug >Reporter: Ethan Li >Assignee: Ethan Li >Priority: Major > Labels: pull-request-available > Time Spent: 20m > Remaining Estimate: 0h > > For example, > {code:java} > bin/storm jar storm-loadgen-*.jar > org.apache.storm.loadgen.ThroughputVsLatency --spouts 1 --splitters 2 > --counters 1 -c topology.debug=true > {code} > on ResourceAwareScheduler not working properly. > With default cluster settings, there will be only one __acker-executor and it > will be on a separate worker. 
And it looks like the __acker-executor was not > able to receive messages from spouts and bolts. And spouts and bolts > continued to retry sending messages to acker. It then led to another problem: > STORM-2970 > I tried to run on storm right before > [https://github.com/apache/storm/pull/2502] and right after and confirmed > that this bug should be related to it -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (STORM-2983) Some topologies not working properly
[ https://issues.apache.org/jira/browse/STORM-2983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16420162#comment-16420162 ] Roshan Naik commented on STORM-2983: [~ethanli] We have a problem in Storm with an excessive number of threads. See details in STORM-2647 for how bad the issue is. In general, it is best to avoid having useless threads lingering around, as they are not free in terms of energy consumption in the long run and they periodically bump useful threads off their cores. IMO, the right approach is to fix this by addressing the fundamental issue of the worker count reflecting the true state of the topology. Having the settings reflect the right values is also useful for topologies (as they can query them to find out the state of the topology). This would have to be fixed sooner or later. > Some topologies not working properly > - > > Key: STORM-2983 > URL: https://issues.apache.org/jira/browse/STORM-2983 > Project: Apache Storm > Issue Type: Bug >Reporter: Ethan Li >Assignee: Ethan Li >Priority: Major > Labels: pull-request-available > Time Spent: 20m > Remaining Estimate: 0h > > For example, > {code:java} > bin/storm jar storm-loadgen-*.jar > org.apache.storm.loadgen.ThroughputVsLatency --spouts 1 --splitters 2 > --counters 1 -c topology.debug=true > {code} > on ResourceAwareScheduler not working properly. > With default cluster settings, there will be only one __acker-executor and it > will be on a separate worker. And it looks like the __acker-executor was not > able to receive messages from spouts and bolts. And spouts and bolts > continued to retry sending messages to acker. It then led to another problem: > STORM-2970 > I tried to run on storm right before > [https://github.com/apache/storm/pull/2502] and right after and confirmed > that this bug should be related to it -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (STORM-2983) Some topologies not working properly
[ https://issues.apache.org/jira/browse/STORM-2983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16419547#comment-16419547 ] Roshan Naik commented on STORM-2983: Go ahead. I didn’t take it up since the changes are in the RAS code path. > Some topologies not working properly > - > > Key: STORM-2983 > URL: https://issues.apache.org/jira/browse/STORM-2983 > Project: Apache Storm > Issue Type: Bug >Reporter: Ethan Li >Priority: Major > > For example, > {code:java} > bin/storm jar storm-loadgen-*.jar > org.apache.storm.loadgen.ThroughputVsLatency --spouts 1 --splitters 2 > --counters 1 -c topology.debug=true > {code} > on ResourceAwareScheduler not working properly. > With default cluster settings, there will be only one __acker-executor and it > will be on a separate worker. And it looks like the __acker-executor was not > able to receive messages from spouts and bolts. And spouts and bolts > continued to retry sending messages to acker. It then led to another problem: > STORM-2970 > I tried to run on storm right before > [https://github.com/apache/storm/pull/2502] and right after and confirmed > that this bug should be related to it -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (STORM-2947) Review and fix/remove deprecated things in Storm 2.0.0
[ https://issues.apache.org/jira/browse/STORM-2947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16407593#comment-16407593 ] Roshan Naik commented on STORM-2947: [~kabhwan] .. I realized we cannot deprecate the above noted settings in the 1.x code base, as they are still perfectly valid there w/o alternatives. We will just need to make a note of the old ones that are no longer used in the master branch. > Review and fix/remove deprecated things in Storm 2.0.0 > -- > > Key: STORM-2947 > URL: https://issues.apache.org/jira/browse/STORM-2947 > Project: Apache Storm > Issue Type: Task > Components: storm-client, storm-hdfs, storm-kafka, storm-server, > storm-solr >Affects Versions: 2.0.0 >Reporter: Jungtaek Lim >Assignee: Jungtaek Lim >Priority: Major > > We've been deprecating the things but haven't had time to replace/get rid of > them. It should be better if we have time to review and address them. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (STORM-2964) Deprecate configs/interfaces/classes in 1.x line
[ https://issues.apache.org/jira/browse/STORM-2964?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Roshan Naik updated STORM-2964: --- Summary: Deprecate configs/interfaces/classes in 1.x line (was: Deprecate removed configs/interfaces/classes from Storm 2.0.0 to 1.x version line) > Deprecate configs/interfaces/classes in 1.x line > > > Key: STORM-2964 > URL: https://issues.apache.org/jira/browse/STORM-2964 > Project: Apache Storm > Issue Type: Task > Components: storm-client >Reporter: Jungtaek Lim >Priority: Major > > This is to track the effort on deprecating removed configs/interfaces/classes > from Storm 2.0.0 to 1.x version line. We may be late to the party but it is > still better to annotate rather than suddenly introduce the change. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (STORM-2970) JCQueue can be full and will throw IllegalStateException
[ https://issues.apache.org/jira/browse/STORM-2970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16405773#comment-16405773 ] Roshan Naik commented on STORM-2970: [~ethanli] I think this issue was a side effect of the worker thread not being spun up in multi-worker mode with RAS enabled. Can we close this if it is not relevant anymore? > JCQueue can be full and will throw IllegalStateException > > > Key: STORM-2970 > URL: https://issues.apache.org/jira/browse/STORM-2970 > Project: Apache Storm > Issue Type: Bug >Reporter: Ethan Li >Priority: Major > > I ran > {code:java} > storm jar /tmp/storm-loadgen-2.0.0-SNAPSHOT.jar > org.apache.storm.loadgen.ThroughputVsLatency --spouts 1 --splitters 2 > --counters 1 --rate 1000 -c topology.debug=true > {code} > and the topology was not running properly. And some exception showed up in > the worker.log: > {code:java} > 2018-02-23 15:13:29.939 o.a.s.u.Utils Thread-15-spout-executor[7, 7] [ERROR] > Async loop died! > java.lang.RuntimeException: java.lang.IllegalStateException: Queue full > at org.apache.storm.executor.Executor.accept(Executor.java:288) > ~[storm-client-2.0.0-SNAPSHOT.jar:2.0.0-SNAPSHOT] > at org.apache.storm.utils.JCQueue.consumeImpl(JCQueue.java:309) > ~[storm-client-2.0.0-SNAPSHOT.jar:2.0.0-SNAPSHOT] > at org.apache.storm.utils.JCQueue.consume(JCQueue.java:290) > ~[storm-client-2.0.0-SNAPSHOT.jar:2.0.0-SNAPSHOT] > at org.apache.storm.utils.JCQueue.consume(JCQueue.java:281) > ~[storm-client-2.0.0-SNAPSHOT.jar:2.0.0-SNAPSHOT] > at > org.apache.storm.executor.spout.SpoutExecutor$2.call(SpoutExecutor.java:173) > ~[storm-client-2.0.0-SNAPSHOT.jar:2.0.0-SNAPSHOT] > at > org.apache.storm.executor.spout.SpoutExecutor$2.call(SpoutExecutor.java:163) > ~[storm-client-2.0.0-SNAPSHOT.jar:2.0.0-SNAPSHOT] > at org.apache.storm.utils.Utils$2.run(Utils.java:352) > [storm-client-2.0.0-SNAPSHOT.jar:2.0.0-SNAPSHOT] > at java.lang.Thread.run(Thread.java:748) [?:1.8.0_131] > Caused by: java.lang.IllegalStateException: 
Queue full > at java.util.AbstractQueue.add(AbstractQueue.java:98) ~[?:1.8.0_131] > at > org.apache.storm.daemon.worker.WorkerTransfer.tryTransferRemote(WorkerTransfer.java:112) > ~[storm-client-2.0.0-SNAPSHOT.jar:2.0.0-SNAPSHOT] > at > org.apache.storm.daemon.worker.WorkerState.tryTransferRemote(WorkerState.java:526) > ~[storm-client-2.0.0-SNAPSHOT.jar:2.0.0-SNAPSHOT] > at > org.apache.storm.executor.ExecutorTransfer.tryTransfer(ExecutorTransfer.java:71) > ~[storm-client-2.0.0-SNAPSHOT.jar:2.0.0-SNAPSHOT] > at org.apache.storm.daemon.Task.sendUnanchored(Task.java:196) > ~[storm-client-2.0.0-SNAPSHOT.jar:2.0.0-SNAPSHOT] > at > org.apache.storm.executor.spout.SpoutOutputCollectorImpl.sendSpoutMsg(SpoutOutputCollectorImpl.java:164) > ~[storm-client-2.0.0-SNAPSHOT.jar:2.0.0-SNAPSHOT] > at > org.apache.storm.executor.spout.SpoutOutputCollectorImpl.emit(SpoutOutputCollectorImpl.java:75) > ~[storm-client-2.0.0-SNAPSHOT.jar:2.0.0-SNAPSHOT] > at > org.apache.storm.spout.SpoutOutputCollector.emit(SpoutOutputCollector.java:51) > ~[storm-client-2.0.0-SNAPSHOT.jar:2.0.0-SNAPSHOT] > at org.apache.storm.loadgen.LoadSpout.fail(LoadSpout.java:135) > ~[stormjar.jar:2.0.0-SNAPSHOT] > at > org.apache.storm.executor.spout.SpoutExecutor.failSpoutMsg(SpoutExecutor.java:362) > ~[storm-client-2.0.0-SNAPSHOT.jar:2.0.0-SNAPSHOT] > at > org.apache.storm.executor.spout.SpoutExecutor$1.expire(SpoutExecutor.java:126) > ~[storm-client-2.0.0-SNAPSHOT.jar:2.0.0-SNAPSHOT] > at > org.apache.storm.executor.spout.SpoutExecutor$1.expire(SpoutExecutor.java:119) > ~[storm-client-2.0.0-SNAPSHOT.jar:2.0.0-SNAPSHOT] > at org.apache.storm.utils.RotatingMap.rotate(RotatingMap.java:77) > ~[storm-client-2.0.0-SNAPSHOT.jar:2.0.0-SNAPSHOT] > at > org.apache.storm.executor.spout.SpoutExecutor.tupleActionFn(SpoutExecutor.java:298) > ~[storm-client-2.0.0-SNAPSHOT.jar:2.0.0-SNAPSHOT] > at org.apache.storm.executor.Executor.accept(Executor.java:284) > ~[storm-client-2.0.0-SNAPSHOT.jar:2.0.0-SNAPSHOT] > ... 
7 more > {code} > More log from the worker.log: > https://gist.github.com/Ethanlm/bcab1289a3813af6d7135aa95b1aaa14 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
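The {{Caused by}} frame at {{AbstractQueue.add}} is the key detail in the trace above: in the JDK, {{add()}} throws {{IllegalStateException("Queue full")}} on a capacity-bounded queue, whereas {{offer()}} reports fullness via its return value. A minimal, self-contained demonstration of that JDK behavior (plain Java, independent of Storm's JCQueue internals):

```java
import java.util.Queue;
import java.util.concurrent.ArrayBlockingQueue;

public class QueueFullDemo {
    public static void main(String[] args) {
        // Capacity-bounded queue, like the overflow queue in the trace.
        Queue<Integer> q = new ArrayBlockingQueue<>(1);

        // offer() signals a full queue with its return value...
        System.out.println(q.offer(1)); // true: queue had room
        System.out.println(q.offer(2)); // false: queue is full

        // ...while add() (inherited from AbstractQueue) throws instead,
        // producing the IllegalStateException("Queue full") seen above.
        try {
            q.add(2);
        } catch (IllegalStateException e) {
            System.out.println("add threw: " + e.getMessage()); // "Queue full"
        }
    }
}
```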
[jira] [Commented] (STORM-2963) Updates to Performance.md
[ https://issues.apache.org/jira/browse/STORM-2963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16376431#comment-16376431 ] Roshan Naik commented on STORM-2963: This is a catch-all jira. As things come up I will keep accumulating them, and finally create a PR close to the 2.0 release. > Updates to Performance.md > -- > > Key: STORM-2963 > URL: https://issues.apache.org/jira/browse/STORM-2963 > Project: Apache Storm > Issue Type: Improvement >Affects Versions: 2.0.0 >Reporter: Roshan Naik >Assignee: Roshan Naik >Priority: Major > Fix For: 2.0.0 > > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (STORM-2964) Deprecate old Spout wait strategy model in 1.x version line
[ https://issues.apache.org/jira/browse/STORM-2964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16370880#comment-16370880 ] Roshan Naik commented on STORM-2964: [~kabhwan] After STORM-2959 was marked as duplicate of STORM-2947, I added my notes to 2947 itself about all such deprecations (including this one) .. so all deprecations are handled/documented uniformly. > Deprecate old Spout wait strategy model in 1.x version line > --- > > Key: STORM-2964 > URL: https://issues.apache.org/jira/browse/STORM-2964 > Project: Apache Storm > Issue Type: Task > Components: storm-client >Reporter: Jungtaek Lim >Priority: Major > > This is follow-up issue for STORM-2958, deprecating old Spout wait strategy > in 1.x version line with properly noting that it will be removed at Storm > 2.0.0. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (STORM-2963) Updates to Performance.md
Roshan Naik created STORM-2963: -- Summary: Updates to Performance.md Key: STORM-2963 URL: https://issues.apache.org/jira/browse/STORM-2963 Project: Apache Storm Issue Type: Improvement Affects Versions: 2.0.0 Reporter: Roshan Naik Assignee: Roshan Naik Fix For: 2.0.0 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (STORM-2947) Review and fix/remove deprecated things in Storm 2.0.0
[ https://issues.apache.org/jira/browse/STORM-2947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16369615#comment-16369615 ] Roshan Naik commented on STORM-2947: *Notes:* *Settings to deprecate in 1.x :* As they have been removed in 2.x: - topology.sleep.spout.wait.strategy.time.ms - topology.backpressure.enable - backpressure.disruptor.high.watermark - backpressure.disruptor.low.watermark - backpressure.znode.timeout.secs - backpressure.znode.update.freq.secs - task.backpressure.poll.secs - topology.executor.receive.buffer.size: 1024 #batched - topology.executor.send.buffer.size: 1024 #individual messages - topology.transfer.buffer.size: 1024 # batched - topology.bolts.outgoing.overflow.buffer.enable: false - topology.disruptor.wait.timeout.millis: 1000 - topology.disruptor.batch.size: 100 - topology.disruptor.batch.timeout.millis: *Classes to deprecate in 1.x:* - org.apache.storm.spout.SleepSpoutWaitStrategy - org.apache.storm.spout.ISpoutWaitStrategy - org.apache.storm.spout.NothingEmptyEmitStrategy *Settings to remove in 2.x:* - nimbus.host - storm.messaging.netty.max_retries - storm.local.mode.zmq > Review and fix/remove deprecated things in Storm 2.0.0 > -- > > Key: STORM-2947 > URL: https://issues.apache.org/jira/browse/STORM-2947 > Project: Apache Storm > Issue Type: Task > Components: storm-client, storm-hdfs, storm-kafka, storm-server, > storm-solr >Affects Versions: 2.0.0 >Reporter: Jungtaek Lim >Assignee: Jungtaek Lim >Priority: Major > > We've been deprecating the things but haven't have time to replace/get rid of > them. It should be better if we have time to review and address them. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
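Mechanically, deprecating one of the settings listed above in the 1.x line would amount to annotating its Config constant. A hedged sketch (constant chosen from the list; class name and Javadoc wording are mine, not the actual 1.x source):

```java
public class ConfigSketch {
    /**
     * @deprecated No longer used in the 2.x line; the disruptor-based
     * backpressure settings were removed there (see the list in STORM-2947).
     */
    @Deprecated
    public static final String TOPOLOGY_BACKPRESSURE_ENABLE =
            "topology.backpressure.enable";

    public static void main(String[] args) {
        // Reading the constant still works; compilers warn at use sites.
        System.out.println(ConfigSketch.TOPOLOGY_BACKPRESSURE_ENABLE);
    }
}
```

The annotation keeps 1.x code compiling while steering users away before the key disappears in 2.x.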
[jira] [Comment Edited] (STORM-2947) Review and fix/remove deprecated things in Storm 2.0.0
[ https://issues.apache.org/jira/browse/STORM-2947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16369615#comment-16369615 ] Roshan Naik edited comment on STORM-2947 at 2/20/18 12:39 AM: -- Few things that i am aware of: *Settings to deprecate in 1.x :* As they have been removed in 2.x: - topology.sleep.spout.wait.strategy.time.ms - topology.backpressure.enable - backpressure.disruptor.high.watermark - backpressure.disruptor.low.watermark - backpressure.znode.timeout.secs - backpressure.znode.update.freq.secs - task.backpressure.poll.secs - topology.executor.receive.buffer.size: 1024 #batched - topology.executor.send.buffer.size: 1024 #individual messages - topology.transfer.buffer.size: 1024 # batched - topology.bolts.outgoing.overflow.buffer.enable: false - topology.disruptor.wait.timeout.millis: 1000 - topology.disruptor.batch.size: 100 - topology.disruptor.batch.timeout.millis: *Classes to deprecate in 1.x:* - org.apache.storm.spout.SleepSpoutWaitStrategy - org.apache.storm.spout.ISpoutWaitStrategy - org.apache.storm.spout.NothingEmptyEmitStrategy *Settings to remove in 2.x:* - nimbus.host - storm.messaging.netty.max_retries - storm.local.mode.zmq was (Author: roshan_naik): *Notes:* *Settings to deprecate in 1.x :* As they have been removed in 2.x: - topology.sleep.spout.wait.strategy.time.ms - topology.backpressure.enable - backpressure.disruptor.high.watermark - backpressure.disruptor.low.watermark - backpressure.znode.timeout.secs - backpressure.znode.update.freq.secs - task.backpressure.poll.secs - topology.executor.receive.buffer.size: 1024 #batched - topology.executor.send.buffer.size: 1024 #individual messages - topology.transfer.buffer.size: 1024 # batched - topology.bolts.outgoing.overflow.buffer.enable: false - topology.disruptor.wait.timeout.millis: 1000 - topology.disruptor.batch.size: 100 - topology.disruptor.batch.timeout.millis: *Classes to deprecate in 1.x:* - org.apache.storm.spout.SleepSpoutWaitStrategy - 
org.apache.storm.spout.ISpoutWaitStrategy - org.apache.storm.spout.NothingEmptyEmitStrategy *Settings to remove in 2.x:* - nimbus.host - storm.messaging.netty.max_retries - storm.local.mode.zmq > Review and fix/remove deprecated things in Storm 2.0.0 > -- > > Key: STORM-2947 > URL: https://issues.apache.org/jira/browse/STORM-2947 > Project: Apache Storm > Issue Type: Task > Components: storm-client, storm-hdfs, storm-kafka, storm-server, > storm-solr >Affects Versions: 2.0.0 >Reporter: Jungtaek Lim >Assignee: Jungtaek Lim >Priority: Major > > We've been deprecating the things but haven't have time to replace/get rid of > them. It should be better if we have time to review and address them. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (STORM-2959) Remove deprecated, old/unused config settings for 2.0
Roshan Naik created STORM-2959: -- Summary: Remove deprecated, old/unused config settings for 2.0 Key: STORM-2959 URL: https://issues.apache.org/jira/browse/STORM-2959 Project: Apache Storm Issue Type: Improvement Reporter: Roshan Naik See 3 settings that can be removed : nimbus.host storm.messaging.netty.max_retries storm.local.mode.zmq -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (STORM-2958) Use standard wait strategies for Spout as well
[ https://issues.apache.org/jira/browse/STORM-2958?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Roshan Naik updated STORM-2958: --- Description: STORM-2306 introduced a new configurable wait strategy system for these situations * BackPressure Wait (used by spout & bolt) * No incoming data (used by bolt) There is another wait situation in the spout when there are no emits generated in a nextTuple() or if max.spout.pending has been reached. This Jira is to transition the spout wait strategy from the old model to the new model. Thereby we have a uniform model for dealing with wait strategies. was: STORM-2306 introduced a new configurable wait strategy system for these situations * BackPressure Wait (used by spout & bolt) * No incoming data (used by bolt) There is another wait situation in the spout when there are no emits generated in a nextTuple() or if max.spout.pending has been reached. This Jira is to transition the spout wait strategy from the old model to the new model. Thereby we have the uniform model for dealing with wait strategies. > Use standard wait strategies for Spout as well > -- > > Key: STORM-2958 > URL: https://issues.apache.org/jira/browse/STORM-2958 > Project: Apache Storm > Issue Type: Improvement > Components: storm-client >Affects Versions: 2.0.0 >Reporter: Roshan Naik >Assignee: Roshan Naik >Priority: Major > Fix For: 2.0.0 > > > STORM-2306 introduced a new configurable wait strategy system for these > situations > * BackPressure Wait (used by spout & bolt) > * No incoming data (used by bolt) > There is another wait situation in the spout when there are no emits > generated in a nextTuple() or if max.spout.pending has been reached. This > Jira is to transition the spout wait strategy from the old model to the new > model. Thereby we have a uniform model for dealing with wait strategies. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (STORM-2958) Use standard wait strategies for Spout as well
[ https://issues.apache.org/jira/browse/STORM-2958?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Roshan Naik updated STORM-2958: --- Description: STORM-2306 introduced a new configurable wait strategy system for these situations * BackPressure Wait (used by spout & bolt) * No incoming data (used by bolt) There is another wait situation in the spout when there are no emits generated in a nextTuple() or if max.spout.pending has been reached. This Jira is to transition the spout wait strategy from the old model to the new model. Thereby we have the uniform model for dealing with wait strategies. was: STORM-2306 introduced a new configurable wait strategy system for these situations * BackPressure Wait (used by spout & bolt) * No incoming data (used by bolt) There is another wait situation in the spout when there are no emits generated in a nextTuple() or if max.spout.pending has been reached. This Jira is to transition the spout wait strategy from the old model to the new model. > Use standard wait strategies for Spout as well > -- > > Key: STORM-2958 > URL: https://issues.apache.org/jira/browse/STORM-2958 > Project: Apache Storm > Issue Type: Improvement > Components: storm-client >Affects Versions: 2.0.0 >Reporter: Roshan Naik >Assignee: Roshan Naik >Priority: Major > Fix For: 2.0.0 > > > STORM-2306 introduced a new configurable wait strategy system for these > situations > * BackPressure Wait (used by spout & bolt) > * No incoming data (used by bolt) > There is another wait situation in the spout when there are no emits > generated in a nextTuple() or if max.spout.pending has been reached. This > Jira is to transition the spout wait strategy from the old model to the new > model. Thereby we have the uniform model for dealing with wait strategies. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (STORM-2958) Use standard wait strategies for Spout as well
[ https://issues.apache.org/jira/browse/STORM-2958?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Roshan Naik updated STORM-2958: --- Description: STORM-2306 introduced a new configurable wait strategy system for these situations * BackPressure Wait (used by spout & bolt) * No incoming data (used by bolt) There is another wait situation in the spout when there are no emits generated in a nextTuple() or if max.spout.pending has been reached. This Jira is to transition the spout wait strategy from the old model to the new model. was: STORM-2306 introduced a new configurable wait strategy system for these situations * BackPressure Wait (used by spout & bolt) * No incoming data (used by bolt) There is another wait situation in the spout when there are no emits generated in a nextTuple() or if max.spout.pending has been reached. This Jira is transition the spout wait strategy from the old model to the new model. > Use standard wait strategies for Spout as well > -- > > Key: STORM-2958 > URL: https://issues.apache.org/jira/browse/STORM-2958 > Project: Apache Storm > Issue Type: Improvement > Components: storm-client >Affects Versions: 2.0.0 >Reporter: Roshan Naik >Assignee: Roshan Naik >Priority: Major > Fix For: 2.0.0 > > > STORM-2306 introduced a new configurable wait strategy system for these > situations > * BackPressure Wait (used by spout & bolt) > * No incoming data (used by bolt) > There is another wait situation in the spout when there are no emits > generated in a nextTuple() or if max.spout.pending has been reached. This > Jira is to transition the spout wait strategy from the old model to the new > model. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (STORM-2958) Use standard wait strategies for Spout as well
Roshan Naik created STORM-2958: -- Summary: Use standard wait strategies for Spout as well Key: STORM-2958 URL: https://issues.apache.org/jira/browse/STORM-2958 Project: Apache Storm Issue Type: Improvement Components: storm-client Affects Versions: 2.0.0 Reporter: Roshan Naik Assignee: Roshan Naik Fix For: 2.0.0 STORM-2306 introduced a new configurable wait strategy system for these situations * BackPressure Wait (used by spout & bolt) * No incoming data (used by bolt) There is another wait situation in the spout when there are no emits generated in a nextTuple() or if max.spout.pending has been reached. This Jira is to transition the spout wait strategy from the old model to the new model. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
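The uniform wait-strategy model discussed in STORM-2958 above can be sketched as a small interface with a progressive spin/yield/park policy. This is an illustrative sketch only; the interface and class names are hypothetical, not Storm's actual API.

```java
import java.util.concurrent.locks.LockSupport;

// Hypothetical sketch of a configurable wait strategy (not Storm's exact API).
interface WaitStrategy {
    /** Waits according to how long the caller has been idle; returns the updated counter. */
    int idle(int idleCounter);
}

// Escalates from busy-spinning to yielding to parking as idleness persists.
class ProgressiveWaitStrategy implements WaitStrategy {
    private final int spinLimit;   // iterations of busy-spin
    private final int yieldLimit;  // iterations of Thread.yield()
    private final long parkNanos;  // park time once past both limits

    ProgressiveWaitStrategy(int spinLimit, int yieldLimit, long parkNanos) {
        this.spinLimit = spinLimit;
        this.yieldLimit = yieldLimit;
        this.parkNanos = parkNanos;
    }

    @Override
    public int idle(int idleCounter) {
        if (idleCounter < spinLimit) {
            // busy-spin: cheapest wake-up latency, highest CPU cost
        } else if (idleCounter < spinLimit + yieldLimit) {
            Thread.yield();
        } else {
            LockSupport.parkNanos(parkNanos);
        }
        return idleCounter + 1;
    }
}

public class SpoutWaitDemo {
    public static void main(String[] args) {
        WaitStrategy strategy = new ProgressiveWaitStrategy(10, 10, 1_000L);
        int idleCounter = 0;
        for (int i = 0; i < 25; i++) {
            // A spout loop would call this when nextTuple() emitted nothing or
            // max.spout.pending was reached, resetting the counter on progress.
            idleCounter = strategy.idle(idleCounter);
        }
        System.out.println("idle counter: " + idleCounter);
    }
}
```

The point of the Jira is that one such strategy object can serve all three wait situations (backpressure, no incoming data, no spout emits) instead of a spout-specific mechanism.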
[jira] [Closed] (STORM-2310) Handling Back Pressure
[ https://issues.apache.org/jira/browse/STORM-2310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Roshan Naik closed STORM-2310. -- Resolution: Won't Fix > Handling Back Pressure > -- > > Key: STORM-2310 > URL: https://issues.apache.org/jira/browse/STORM-2310 > Project: Apache Storm > Issue Type: Sub-task > Components: storm-core >Affects Versions: 2.0.0 >Reporter: Roshan Naik >Assignee: Roshan Naik >Priority: Major > > Design Doc: > https://docs.google.com/document/d/1btmpBpFeEl-bh1uhQ_W4Ao-cQK7muNBYbXXMwuc4YaU/edit?usp=sharing -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Reopened] (STORM-2310) Handling Back Pressure
[ https://issues.apache.org/jira/browse/STORM-2310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Roshan Naik reopened STORM-2310: > Handling Back Pressure > -- > > Key: STORM-2310 > URL: https://issues.apache.org/jira/browse/STORM-2310 > Project: Apache Storm > Issue Type: Sub-task > Components: storm-core >Affects Versions: 2.0.0 >Reporter: Roshan Naik >Assignee: Roshan Naik >Priority: Major > > Design Doc: > https://docs.google.com/document/d/1btmpBpFeEl-bh1uhQ_W4Ao-cQK7muNBYbXXMwuc4YaU/edit?usp=sharing -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Closed] (STORM-2310) Handling Back Pressure
[ https://issues.apache.org/jira/browse/STORM-2310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Roshan Naik closed STORM-2310. -- Resolution: Fixed Assignee: Roshan Naik The basic backpressure model is implemented as part of STORM-2306. The design doc here proposes an extension to the basic model which allows for buffering a much larger backlog of messages during BP situations. My experience with STORM-2306 suggests the basic BP model with shorter queues is preferable to this extended model for the following reasons: * Having more messages waiting in deeper queues only worsens the amount of time it takes for the messages to make it through all the bolts and thus increases latency (spout ->...-> terminal bolt) with no added throughput benefit. * When a worker crashes: ** If ACKing is disabled, it leads to a bigger loss of messages. ** If ACKing is enabled, the amount of reprocessing required is quite high. Consequently, changing my mind and closing this JIRA. > Handling Back Pressure > -- > > Key: STORM-2310 > URL: https://issues.apache.org/jira/browse/STORM-2310 > Project: Apache Storm > Issue Type: Sub-task > Components: storm-core >Affects Versions: 2.0.0 >Reporter: Roshan Naik >Assignee: Roshan Naik >Priority: Major > > Design Doc: > https://docs.google.com/document/d/1btmpBpFeEl-bh1uhQ_W4Ao-cQK7muNBYbXXMwuc4YaU/edit?usp=sharing -- This message was sent by Atlassian JIRA (v7.6.3#76005)
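The latency argument in the closing comment above follows from Little's law: at a fixed service rate, time in system grows with queue depth, so deeper queues add latency without adding throughput. A minimal stand-alone Java sketch (not Storm code) of the basic short-bounded-queue backpressure model, where a full queue throttles the producer instead of buffering a deep backlog:

```java
import java.util.concurrent.ArrayBlockingQueue;

// Illustration only: a short bounded queue gives basic backpressure.
// When the consumer lags, put() blocks the producer, so the in-flight
// backlog (and thus latency and crash-time loss) stays bounded.
public class BoundedQueueDemo {
    public static void main(String[] args) throws InterruptedException {
        ArrayBlockingQueue<Integer> queue = new ArrayBlockingQueue<>(4);
        Thread consumer = new Thread(() -> {
            try {
                for (int i = 0; i < 100; i++) {
                    queue.take();      // deliberately slow consumer
                    Thread.sleep(1);
                }
            } catch (InterruptedException ignored) { }
        });
        consumer.start();
        for (int i = 0; i < 100; i++) {
            queue.put(i);  // blocks when full => producer is throttled
        }
        consumer.join();
        System.out.println("in-flight messages never exceeded capacity 4");
    }
}
```

With a deeper queue the same producer would run far ahead, and every buffered message would be lost (no ACKing) or replayed (ACKing) on a worker crash, which is exactly the trade-off cited for closing the extension.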
[jira] [Created] (STORM-2945) Nail down and document how to support background emits in Spouts and Bolts
Roshan Naik created STORM-2945: -- Summary: Nail down and document how to support background emits in Spouts and Bolts Key: STORM-2945 URL: https://issues.apache.org/jira/browse/STORM-2945 Project: Apache Storm Issue Type: Bug Reporter: Roshan Naik Fix For: 2.0.0 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (STORM-2944) Eliminate these deprecated methods in Storm 3
[ https://issues.apache.org/jira/browse/STORM-2944?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16359164#comment-16359164 ] Roshan Naik commented on STORM-2944: [~revans2] could you add a 3.0 version in Jira so that this JIRA can be associated with it? > Eliminate these deprecated methods in Storm 3 > - > > Key: STORM-2944 > URL: https://issues.apache.org/jira/browse/STORM-2944 > Project: Apache Storm > Issue Type: Bug > Components: storm-client >Reporter: Roshan Naik >Priority: Major > > In Storm 2 we deprecated these methods. These methods were retained to allow > storm 2.x nimbus & supervisor to manage older Storm 1.x workers. > These are the methods in IStormClusterState.java > {code} > /** @deprecated: In Storm 2.0. Retained for enabling transition from 1.x. > Will be removed soon. */ > @Deprecated > boolean topologyBackpressure(String stormId, long timeoutMs, Runnable > callback); > /** @deprecated: In Storm 2.0. Retained for enabling transition from 1.x. > Will be removed soon. */ > @Deprecated > void setupBackpressure(String stormId); > /** @deprecated: In Storm 2.0. Retained for enabling transition from 1.x. > Will be removed soon. */ > @Deprecated > void removeBackpressure(String stormId); > /** @deprecated: In Storm 2.0. Retained for enabling transition from 1.x. > Will be removed soon. */ > @Deprecated > void removeWorkerBackpressure(String stormId, String node, Long port); > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (STORM-2306) Redesign Messaging Subsystem and switch to JCTools Queues
[ https://issues.apache.org/jira/browse/STORM-2306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16346283#comment-16346283 ] Roshan Naik commented on STORM-2306: is it possible to have it in both places ? > Redesign Messaging Subsystem and switch to JCTools Queues > - > > Key: STORM-2306 > URL: https://issues.apache.org/jira/browse/STORM-2306 > Project: Apache Storm > Issue Type: Sub-task > Components: storm-core >Affects Versions: 2.0.0 >Reporter: Roshan Naik >Assignee: Roshan Naik >Priority: Major > Labels: pull-request-available > Time Spent: 51h > Remaining Estimate: 0h > > Details in these documents: > 1) *Redesign of the messaging subsystem* > https://docs.google.com/document/d/1NK1DJ3aAkta-Im0m-2FObQ4cSRp8xSa301y6zoqcBeE/edit?usp=sharing > This doc discusses the new design for the messaging system. Plus some of the > optimizations being made. > 2) *Choosing a high performance messaging queue:* > https://docs.google.com/document/d/1PpQaWVHg06-OqxTzYxQlzg1yEhzA4Y46_NC7HMO6tsI/edit?usp=sharing > This doc looks into how fast hardware can do inter-thread messaging and why > we chose the JCTools queues. > 3) *Backpressure Model* > https://docs.google.com/document/d/1Z9pRdI5wtnK-hVwE3Spe6VGCTsz9g8TkgxbTFcbL3jM/edit?usp=sharing > Describes the Backpressure model integrated into the new messaging subsystem. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
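For background on the STORM-2306 queue choice above: the redesigned messaging hot path is a many-producer/single-consumer queue that the executor drains in batches to amortize per-message overhead. A rough stand-alone sketch of that access pattern, using ConcurrentLinkedQueue purely as a stand-in for JCTools' lock-free MPSC queues (the JCTools types are assumed, not shown):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;

// Sketch of the executor receive pattern: many producers publish,
// one consumer drains in batches. ConcurrentLinkedQueue stands in
// here for a JCTools MPSC queue with a similar offer/poll surface.
public class BatchDrain {
    static <T> int drain(Queue<T> queue, List<T> batch, int maxBatch) {
        T item;
        int drained = 0;
        while (drained < maxBatch && (item = queue.poll()) != null) {
            batch.add(item);   // process the whole batch after draining
            drained++;
        }
        return drained;        // number of messages taken this pass
    }

    public static void main(String[] args) {
        Queue<Integer> queue = new ConcurrentLinkedQueue<>();
        for (int i = 0; i < 10; i++) {
            queue.offer(i);    // producers: executor/worker transfer threads
        }
        List<Integer> batch = new ArrayList<>();
        int drained = drain(queue, batch, 8);
        System.out.println(drained + " drained, " + queue.size() + " left");
    }
}
```

The single-consumer restriction is what lets the JCTools queues avoid consumer-side contention entirely, which is the core of the speedups discussed in the linked design docs.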
[jira] [Updated] (STORM-2871) Performance optimizations for getOutgoingTasks
[ https://issues.apache.org/jira/browse/STORM-2871?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Roshan Naik updated STORM-2871: --- Component/s: storm-client > Performance optimizations for getOutgoingTasks > --- > > Key: STORM-2871 > URL: https://issues.apache.org/jira/browse/STORM-2871 > Project: Apache Storm > Issue Type: Improvement > Components: storm-client >Affects Versions: 2.0.0 >Reporter: Roshan Naik > > Task.getOutgoingTasks() is in the critical messaging path. Two observed > bottlenecks in it: > - Looking up HashMap 'streamToGroupers'. Need to look into converting HashMap > into an array lookup? > - > [outTasks.addAll(compTasks)|https://github.com/apache/storm/blob/master/storm-client/src/jvm/org/apache/storm/daemon/Task.java#L139] > seems to be impacting throughput as well. Identified by running > ConstSpoutNullBoltTopo with 1 spout & bolt parallelism (no Ack) and > replacing this line with hard-coded logic to add the single known bolt's > taskID. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (STORM-2871) Performance optimizations for getOutgoingTasks
[ https://issues.apache.org/jira/browse/STORM-2871?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Roshan Naik updated STORM-2871: --- Affects Version/s: 2.0.0 > Performance optimizations for getOutgoingTasks > --- > > Key: STORM-2871 > URL: https://issues.apache.org/jira/browse/STORM-2871 > Project: Apache Storm > Issue Type: Improvement > Components: storm-client >Affects Versions: 2.0.0 >Reporter: Roshan Naik > > Task.getOutgoingTasks() is in the critical messaging path. Two observed > bottlenecks in it: > - Looking up HashMap 'streamToGroupers'. Need to look into converting HashMap > into an array lookup? > - > [outTasks.addAll(compTasks)|https://github.com/apache/storm/blob/master/storm-client/src/jvm/org/apache/storm/daemon/Task.java#L139] > seems to be impacting throughput as well. Identified by running > ConstSpoutNullBoltTopo with 1 spout & bolt parallelism (no Ack) and > replacing this line with hard-coded logic to add the single known bolt's > taskID. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (STORM-2871) Performance optimizations for getOutgoingTasks
[ https://issues.apache.org/jira/browse/STORM-2871?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Roshan Naik updated STORM-2871: --- Description: Task.getOutgoingTasks() is in the critical messaging path. Two observed bottlenecks in it: - [Looking up HashMap|] 'streamToGroupers'. Need to look into converting HashMap into an array lookup? - [outTasks.addAll(compTasks)|https://github.com/apache/storm/blob/master/storm-client/src/jvm/org/apache/storm/daemon/Task.java#L139] seems to be impacting throughput as well. Identified by running ConstSpoutNullBoltTopo with 1 spout & bolt parallelism (no Ack) and replacing this line with hard-coded logic to add the single known bolt's taskID. was: Task.getOutgoingTasks() is in the critical messaging path. Two observed bottlenecks in it: -[Looking up HashMap|] 'streamToGroupers'. Need to look into converting HashMap into an array lookup? - [outTasks.addAll(compTasks)|https://github.com/apache/storm/blob/master/storm-client/src/jvm/org/apache/storm/daemon/Task.java#L139] seems to be impacting throughput as well. Identified by running ConstSpoutNullBoltTopo with 1 spout & bolt parallelism (no Ack) and replacing this line with hard-coded logic to add the single known bolt's taskID. > Performance optimizations for getOutgoingTasks > --- > > Key: STORM-2871 > URL: https://issues.apache.org/jira/browse/STORM-2871 > Project: Apache Storm > Issue Type: Improvement >Reporter: Roshan Naik > > Task.getOutgoingTasks() is in the critical messaging path. Two observed > bottlenecks in it: > - [Looking up HashMap|] 'streamToGroupers'. Need to look into converting > HashMap into an array lookup? > - > [outTasks.addAll(compTasks)|https://github.com/apache/storm/blob/master/storm-client/src/jvm/org/apache/storm/daemon/Task.java#L139] > seems to be impacting throughput as well. Identified by
running > ConstSpoutNullBoltTopo with 1 spout & bolt parallelism (no Ack) and > replacing this line with hard-coded logic to add the single known bolt's > taskID. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (STORM-2871) Performance optimizations for getOutgoingTasks
[ https://issues.apache.org/jira/browse/STORM-2871?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Roshan Naik updated STORM-2871: --- Description: Task.getOutgoingTasks() is in the critical messaging path. Two observed bottlenecks in it: - Looking up HashMap 'streamToGroupers'. Need to look into converting HashMap into an array lookup? - [outTasks.addAll(compTasks)|https://github.com/apache/storm/blob/master/storm-client/src/jvm/org/apache/storm/daemon/Task.java#L139] seems to be impacting throughput as well. Identified by running ConstSpoutNullBoltTopo with 1 spout & bolt parallelism (no Ack) and replacing this line with hard-coded logic to add the single known bolt's taskID. was: Task.getOutgoingTasks() is in the critical messaging path. Two observed bottlenecks in it: - [Looking up HashMap|] 'streamToGroupers'. Need to look into converting HashMap into an array lookup? - [outTasks.addAll(compTasks)|https://github.com/apache/storm/blob/master/storm-client/src/jvm/org/apache/storm/daemon/Task.java#L139] seems to be impacting throughput as well. Identified by running ConstSpoutNullBoltTopo with 1 spout & bolt parallelism (no Ack) and replacing this line with hard-coded logic to add the single known bolt's taskID. > Performance optimizations for getOutgoingTasks > --- > > Key: STORM-2871 > URL: https://issues.apache.org/jira/browse/STORM-2871 > Project: Apache Storm > Issue Type: Improvement >Reporter: Roshan Naik > > Task.getOutgoingTasks() is in the critical messaging path. Two observed > bottlenecks in it: > - Looking up HashMap 'streamToGroupers'. Need to look into converting HashMap > into an array lookup? > - > [outTasks.addAll(compTasks)|https://github.com/apache/storm/blob/master/storm-client/src/jvm/org/apache/storm/daemon/Task.java#L139] > seems to be impacting throughput as well. Identified by
running > ConstSpoutNullBoltTopo with 1 spout & bolt parallelism (no Ack) and > replacing this line with hard-coded logic to add the single known bolt's > taskID. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (STORM-2871) Performance optimizations for getOutgoingTasks
Roshan Naik created STORM-2871: -- Summary: Performance optimizations for getOutgoingTasks Key: STORM-2871 URL: https://issues.apache.org/jira/browse/STORM-2871 Project: Apache Storm Issue Type: Improvement Reporter: Roshan Naik Task.getOutgoingTasks() is in the critical messaging path. Two observed bottlenecks in it: - Looking up HashMap 'streamToGroupers'. Need to look into converting HashMap into an array lookup? - [outTasks.addAll(compTasks)|https://github.com/apache/storm/blob/master/storm-client/src/jvm/org/apache/storm/daemon/Task.java#L139] seems to be impacting throughput as well. Identified by running ConstSpoutNullBoltTopo with 1 spout & bolt parallelism (no Ack) and replacing this line with hard-coded logic to add the single known bolt's taskID. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
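The HashMap-to-array idea proposed in STORM-2871 above can be sketched as follows. All names here are hypothetical, not the actual Task.java code: the point is only that the per-stream hashing is done once at prepare time, leaving a plain array load on the per-emit hot path.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: assign each stream a dense integer index once,
// then index an array on the hot path instead of hashing the stream
// name (and calling equals()) on every emit.
public class StreamGrouperTable {
    private final Map<String, Integer> streamIndex = new HashMap<>();
    private final Object[] groupersByIndex;  // grouper list per stream

    public StreamGrouperTable(String[] streams, Object[] groupers) {
        this.groupersByIndex = groupers;
        for (int i = 0; i < streams.length; i++) {
            streamIndex.put(streams[i], i);  // one-time hashing, off the hot path
        }
    }

    /** Resolve once per stream (e.g. at prepare time); caller caches the index. */
    public int indexOf(String stream) {
        return streamIndex.get(stream);
    }

    /** Hot path: a plain array load, no hashing. */
    public Object groupersFor(int index) {
        return groupersByIndex[index];
    }

    public static void main(String[] args) {
        StreamGrouperTable table = new StreamGrouperTable(
            new String[] {"default", "metrics"},
            new Object[] {"groupers-0", "groupers-1"});
        int idx = table.indexOf("metrics");  // resolved once, outside the loop
        System.out.println(table.groupersFor(idx));
    }
}
```

The second bottleneck (outTasks.addAll) is a separate allocation/copy issue and is not addressed by this sketch.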
[jira] [Commented] (STORM-2306) Redesign Messaging Subsystem and switch to JCTools Queues
[ https://issues.apache.org/jira/browse/STORM-2306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16280106#comment-16280106 ] Roshan Naik commented on STORM-2306: Since the PR webpage has become terribly slow due to the large number of comments, moving the non-code-review type conversation to the JIRA. The PR is ready to move forward with additional reviews and testing by anyone interested in test driving their topologies on this (much appreciated!). - I believe all the key issues raised so far should be addressed, please take a look. Also includes fixes to issues discovered during testing and perf runs. - Have added a new design doc detailing the BackPressure model. That is the part that has undergone the most change of late. - Based on my observations from perf runs and prior feedback on the PR, the new defaults have been adjusted to make it easy for existing workloads to transition to this with minimal or no tweaking but still get good perf. - My colleague is in the process of running some perf numbers comparing master vs the latest 2306. Will share them soon. cc: [~revans2] > Redesign Messaging Subsystem and switch to JCTools Queues > - > > Key: STORM-2306 > URL: https://issues.apache.org/jira/browse/STORM-2306 > Project: Apache Storm > Issue Type: Sub-task > Components: storm-core >Affects Versions: 2.0.0 >Reporter: Roshan Naik >Assignee: Roshan Naik > Labels: pull-request-available > Time Spent: 33h > Remaining Estimate: 0h > > Details in these documents: > 1) *Redesign of the messaging subsystem* > https://docs.google.com/document/d/1NK1DJ3aAkta-Im0m-2FObQ4cSRp8xSa301y6zoqcBeE/edit?usp=sharing > This doc discusses the new design for the messaging system. Plus some of the > optimizations being made.
> 2) *Choosing a high performance messaging queue:* > https://docs.google.com/document/d/1PpQaWVHg06-OqxTzYxQlzg1yEhzA4Y46_NC7HMO6tsI/edit?usp=sharing > This doc looks into how fast hardware can do inter-thread messaging and why > we chose the JCTools queues. > 3) *Backpressure Model* > https://docs.google.com/document/d/1Z9pRdI5wtnK-hVwE3Spe6VGCTsz9g8TkgxbTFcbL3jM/edit?usp=sharing > Describes the Backpressure model integrated into the new messaging subsystem. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (STORM-2306) Redesign Messaging Subsystem and switch to JCTools Queues
[ https://issues.apache.org/jira/browse/STORM-2306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Roshan Naik updated STORM-2306: --- Description: Details in these documents: 1) *Redesign of the messaging subsystem* https://docs.google.com/document/d/1NK1DJ3aAkta-Im0m-2FObQ4cSRp8xSa301y6zoqcBeE/edit?usp=sharing This doc discusses the new design for the messaging system. Plus some of the optimizations being made. 2) *Choosing a high performance messaging queue:* https://docs.google.com/document/d/1PpQaWVHg06-OqxTzYxQlzg1yEhzA4Y46_NC7HMO6tsI/edit?usp=sharing This doc looks into how fast hardware can do inter-thread messaging and why we chose the JCTools queues. 3) *Backpressure Model* https://docs.google.com/document/d/1Z9pRdI5wtnK-hVwE3Spe6VGCTsz9g8TkgxbTFcbL3jM/edit?usp=sharing Describes the Backpressure model integrated into the new messaging subsystem. was: Details in these documents: 1) *Redesign of the messaging subsystem* https://docs.google.com/document/d/1NK1DJ3aAkta-Im0m-2FObQ4cSRp8xSa301y6zoqcBeE/edit?usp=sharing This doc discusses the new design for the messaging system. Plus some of the optimizations being made. 2) *Choosing a high performance messaging queue:* https://docs.google.com/document/d/1PpQaWVHg06-OqxTzYxQlzg1yEhzA4Y46_NC7HMO6tsI/edit?usp=sharing This doc looks into how fast hardware can do inter-thread messaging and why we chose the JCTools queues. 
3) *BackPressure Model* https://docs.google.com/document/d/1Z9pRdI5wtnK-hVwE3Spe6VGCTsz9g8TkgxbTFcbL3jM/edit?usp=sharing > Redesign Messaging Subsystem and switch to JCTools Queues > - > > Key: STORM-2306 > URL: https://issues.apache.org/jira/browse/STORM-2306 > Project: Apache Storm > Issue Type: Sub-task > Components: storm-core >Affects Versions: 2.0.0 >Reporter: Roshan Naik >Assignee: Roshan Naik > Labels: pull-request-available > Time Spent: 33h > Remaining Estimate: 0h > > Details in these documents: > 1) *Redesign of the messaging subsystem* > https://docs.google.com/document/d/1NK1DJ3aAkta-Im0m-2FObQ4cSRp8xSa301y6zoqcBeE/edit?usp=sharing > This doc discusses the new design for the messaging system. Plus some of the > optimizations being made. > 2) *Choosing a high performance messaging queue:* > https://docs.google.com/document/d/1PpQaWVHg06-OqxTzYxQlzg1yEhzA4Y46_NC7HMO6tsI/edit?usp=sharing > This doc looks into how fast hardware can do inter-thread messaging and why > we chose the JCTools queues. > 3) *Backpressure Model* > https://docs.google.com/document/d/1Z9pRdI5wtnK-hVwE3Spe6VGCTsz9g8TkgxbTFcbL3jM/edit?usp=sharing > Describes the Backpressure model integrated into the new messaging subsystem. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (STORM-2306) Redesign Messaging Subsystem and switch to JCTools Queues
[ https://issues.apache.org/jira/browse/STORM-2306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Roshan Naik updated STORM-2306: --- Description: Details in these documents: 1) *Redesign of the messaging subsystem* https://docs.google.com/document/d/1NK1DJ3aAkta-Im0m-2FObQ4cSRp8xSa301y6zoqcBeE/edit?usp=sharing This doc discusses the new design for the messaging system. Plus some of the optimizations being made. 2) *Choosing a high performance messaging queue:* https://docs.google.com/document/d/1PpQaWVHg06-OqxTzYxQlzg1yEhzA4Y46_NC7HMO6tsI/edit?usp=sharing This doc looks into how fast hardware can do inter-thread messaging and why we chose the JCTools queues. 3) *BackPressure Model* https://docs.google.com/document/d/1Z9pRdI5wtnK-hVwE3Spe6VGCTsz9g8TkgxbTFcbL3jM/edit?usp=sharing was: Details in these documents: 1) *Redesign of the messaging subsystem* https://docs.google.com/document/d/1NK1DJ3aAkta-Im0m-2FObQ4cSRp8xSa301y6zoqcBeE/edit?usp=sharing This doc discusses the new design for the messaging system. Plus some of the optimizations being made. 2) *Choosing a high performance messaging queue:* https://docs.google.com/document/d/1PpQaWVHg06-OqxTzYxQlzg1yEhzA4Y46_NC7HMO6tsI/edit?usp=sharing This doc looks into how fast hardware can do inter-thread messaging and why we chose the JCTools queues. > Redesign Messaging Subsystem and switch to JCTools Queues > - > > Key: STORM-2306 > URL: https://issues.apache.org/jira/browse/STORM-2306 > Project: Apache Storm > Issue Type: Sub-task > Components: storm-core >Affects Versions: 2.0.0 >Reporter: Roshan Naik >Assignee: Roshan Naik > Labels: pull-request-available > Time Spent: 33h > Remaining Estimate: 0h > > Details in these documents: > 1) *Redesign of the messaging subsystem* > https://docs.google.com/document/d/1NK1DJ3aAkta-Im0m-2FObQ4cSRp8xSa301y6zoqcBeE/edit?usp=sharing > This doc discusses the new design for the messaging system. Plus some of the > optimizations being made.
> 2) *Choosing a high performance messaging queue:* > https://docs.google.com/document/d/1PpQaWVHg06-OqxTzYxQlzg1yEhzA4Y46_NC7HMO6tsI/edit?usp=sharing > This doc looks into how fast hardware can do inter-thread messaging and why > we chose the JCTools queues. > 3) *BackPressure Model* > https://docs.google.com/document/d/1Z9pRdI5wtnK-hVwE3Spe6VGCTsz9g8TkgxbTFcbL3jM/edit?usp=sharing -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Comment Edited] (STORM-2796) Flux: Provide means for invoking static factory methods
[ https://issues.apache.org/jira/browse/STORM-2796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16238504#comment-16238504 ] Roshan Naik edited comment on STORM-2796 at 11/7/17 12:09 AM: -- Yaml looks fine. Would be nice to make sure that it works with this type of method signatures as well: {code} public static MyComponent newInstance(Integer x, SomeObject... variadic) {code} Basically be able to pass a mix of objects and arrays/variadics as args. I think the ability to pass array args is done via an undocumented feature called `reflist`. Just want to make sure it is supported for factory methods as well. Which reminds me that it would be good to update the docs on 'reflist' while updating the docs about factory methods. was (Author: roshan_naik): Yaml looks fine. Would be nice to make sure that its works with this type of method signatures as well: {code} public static MyComponent newInstance(Integer x, SomeObject... variadic) {code} Basically be able to pass a mix of objects and arrays/variadics as args. I think the ability to pass array args is done via an undocumented feature called `reflist`. Just want to make sure it it supported for factory methods as well. Which reminds me that it would be good to update the docs on 'reflist' while updating the docs about factor methods. > Flux: Provide means for invoking static factory methods > --- > > Key: STORM-2796 > URL: https://issues.apache.org/jira/browse/STORM-2796 > Project: Apache Storm > Issue Type: Improvement > Components: Flux >Affects Versions: 2.0.0, 1.1.1, 1.2.0, 1.0.6 >Reporter: P. Taylor Goetz >Assignee: P. Taylor Goetz > > Provide a means to invoke static factory methods for flux components. E.g: > Java signature: > {code} > public static MyComponent newInstance(String...
params) > {code} > Yaml: > {code} > className: "org.apache.storm.flux.test.MyComponent" > factory: "newInstance" > factoryArgs: ["a", "b", "c"] > {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (STORM-2796) Flux: Provide means for invoking static factory methods
[ https://issues.apache.org/jira/browse/STORM-2796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16238504#comment-16238504 ] Roshan Naik commented on STORM-2796: Yaml looks fine. Would be nice to make sure that it works with this type of method signatures as well: {code} public static MyComponent newInstance(Integer x, SomeObject... variadic) {code} Basically be able to pass a mix of objects and arrays/variadics as args. I think the ability to pass array args is done via an undocumented feature called `reflist`. Just want to make sure it is supported for factory methods as well. Which reminds me that it would be good to update the docs on 'reflist' while updating the docs about factory methods. > Flux: Provide means for invoking static factory methods > --- > > Key: STORM-2796 > URL: https://issues.apache.org/jira/browse/STORM-2796 > Project: Apache Storm > Issue Type: Improvement > Components: Flux >Affects Versions: 2.0.0, 1.1.1, 1.2.0, 1.0.6 >Reporter: P. Taylor Goetz >Assignee: P. Taylor Goetz >Priority: Normal > > Provide a means to invoke static factory methods for flux components. E.g: > Java signature: > {code} > public static MyComponent newInstance(Integer x, SomeObject... params) > {code} > Yaml: > {code} > className: "org.apache.storm.flux.test.MyComponent" > factory: "newInstance" > factoryArgs: ["a", "b", "c"] > {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
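For reference on the variadic-factory question above: invoking such a factory reflectively requires packing the trailing arguments into a single array of the varargs type, which is essentially what Flux would have to do when mapping factoryArgs onto a variadic signature. A small self-contained sketch; MyComponent here is a stand-in defined locally, not the actual org.apache.storm.flux.test.MyComponent:

```java
import java.lang.reflect.Method;

// Illustration of reflective variadic-factory invocation: a varargs
// method compiles to an array-typed last parameter, so the caller must
// supply the variadic tail as one array element in the args list.
public class FactoryDemo {
    public static class MyComponent {
        public final int x;
        public final Object[] rest;
        MyComponent(int x, Object[] rest) { this.x = x; this.rest = rest; }

        // Mirrors the mixed scalar + varargs signature discussed above.
        public static MyComponent newInstance(Integer x, Object... variadic) {
            return new MyComponent(x, variadic);
        }
    }

    public static void main(String[] args) throws Exception {
        // Varargs erase to an array parameter: (Integer, Object[]).
        Method factory = MyComponent.class.getMethod(
            "newInstance", Integer.class, Object[].class);
        // Explicit Object[] avoids varargs ambiguity on invoke():
        // args[0] is the scalar, args[1] is the whole variadic tail.
        MyComponent c = (MyComponent) factory.invoke(
            null, new Object[] { 7, new Object[] { "a", "b" } });
        System.out.println(c.x + ":" + c.rest.length);  // prints 7:2
    }
}
```

This is also why a `reflist`-style array argument in the YAML maps naturally onto the varargs tail: the reflection layer needs it as one array either way.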
[jira] [Updated] (STORM-2306) Redesign Messaging Subsystem and switch to JCTools Queues
[ https://issues.apache.org/jira/browse/STORM-2306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Roshan Naik updated STORM-2306: --- Description: Details in these documents: 1) *Redesign of the messaging subsystem* https://docs.google.com/document/d/1NK1DJ3aAkta-Im0m-2FObQ4cSRp8xSa301y6zoqcBeE/edit?usp=sharing This doc discusses the new design for the messaging system. Plus some of the optimizations being made. 2) *Choosing a high performance messaging queue:* https://docs.google.com/document/d/1PpQaWVHg06-OqxTzYxQlzg1yEhzA4Y46_NC7HMO6tsI/edit?usp=sharing This doc looks into how fast hardware can do inter-thread messaging and why we chose the JCTools queues. was: Details in these documents: 1) *Redesign of the messaging subsystem* https://docs.google.com/document/d/1NK1DJ3aAkta-Im0m-2FObQ4cSRp8xSa301y6zoqcBeE/edit#heading=h.59lnwp28s0q6 This doc discusses the new design for the messaging system. Plus some of the optimizations being made. 2) *Choosing a high performance messaging queue:* https://docs.google.com/document/d/1PpQaWVHg06-OqxTzYxQlzg1yEhzA4Y46_NC7HMO6tsI/edit?usp=sharing This doc looks into how fast hardware can do inter-thread messaging and why we chose the JCTools queues. > Redesign Messaging Subsystem and switch to JCTools Queues > - > > Key: STORM-2306 > URL: https://issues.apache.org/jira/browse/STORM-2306 > Project: Apache Storm > Issue Type: Sub-task > Components: storm-core >Affects Versions: 2.0.0 >Reporter: Roshan Naik >Assignee: Roshan Naik > > Details in these documents: > 1) *Redesign of the messaging subsystem* > https://docs.google.com/document/d/1NK1DJ3aAkta-Im0m-2FObQ4cSRp8xSa301y6zoqcBeE/edit?usp=sharing > This doc discusses the new design for the messaging system. Plus some of the > optimizations being made.
> 2) *Choosing a high performance messaging queue:* > https://docs.google.com/document/d/1PpQaWVHg06-OqxTzYxQlzg1yEhzA4Y46_NC7HMO6tsI/edit?usp=sharing > This doc looks into how fast hardware can do inter-thread messaging and why > we chose the JCTools queues. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (STORM-2306) Redesign Messaging Subsystem and switch to JCTools Queues
[ https://issues.apache.org/jira/browse/STORM-2306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Roshan Naik updated STORM-2306: --- Description: Details in these documents: 1) *Redesign of the messaging subsystem* https://docs.google.com/document/d/1NK1DJ3aAkta-Im0m-2FObQ4cSRp8xSa301y6zoqcBeE/edit#heading=h.59lnwp28s0q6 This doc discusses the new design for the messaging system. Plus some of the optimizations being made. 2) *Choosing a high performance messaging queue:* https://docs.google.com/document/d/1PpQaWVHg06-OqxTzYxQlzg1yEhzA4Y46_NC7HMO6tsI/edit?usp=sharing This doc looks into how fast hardware can do inter-thread messaging and why we chose the JCTools queues. was: Details in these documents: 1) *Redesign of the messaging subsystem* https://docs.google.com/document/d/1NK1DJ3aAkta-Im0m-2FObQ4cSRp8xSa301y6zoqcBeE/edit#heading=h.59lnwp28s0q6 This doc discusses the new design for the messaging system. Plus some of the optimizations being made. 2) *Choosing a high performance messaging queue:* https://docs.google.com/document/d/1PpQaWVHg06-OqxTzYxQlzg1yEhzA4Y46_NC7HMO6tsI/edit?usp=sharing This doc looks into how fast hardware can we can inter-thread messaging and why we chose the JCTools queues. > Redesign Messaging Subsystem and switch to JCTools Queues > - > > Key: STORM-2306 > URL: https://issues.apache.org/jira/browse/STORM-2306 > Project: Apache Storm > Issue Type: Sub-task > Components: storm-core >Affects Versions: 2.0.0 >Reporter: Roshan Naik >Assignee: Roshan Naik > > Details in these documents: > 1) *Redesign of the messaging subsystem* > https://docs.google.com/document/d/1NK1DJ3aAkta-Im0m-2FObQ4cSRp8xSa301y6zoqcBeE/edit#heading=h.59lnwp28s0q6 > This doc discusses the new design for the messaging system. Plus some of the > optimizations being made.
> 2) *Choosing a high performance messaging queue:* > https://docs.google.com/document/d/1PpQaWVHg06-OqxTzYxQlzg1yEhzA4Y46_NC7HMO6tsI/edit?usp=sharing > This doc looks into how fast hardware can do inter-thread messaging and why > we chose the JCTools queues. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (STORM-2647) Reduce Number of Threads running in the Worker
[ https://issues.apache.org/jira/browse/STORM-2647?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Roshan Naik updated STORM-2647: --- Description: Below is an account of all the threads in a single worker running a topology with 1 spout instance, 1 bolt instance, and 1 acker bolt instance. Topology debugging feature was disabled as it brings in additional threads. Total 34 threads: 10 timer threads 2 Curator threads 2 ZooKeeper threads 4 Netty threads 7 Disruptor thread 1 Spout executor thread 2 worker thread 1 Back Pressure thread 4 thread - unclear what these are 1 Finalizer thread - Would be good to collapse the Timer threads into SingleThreadScheduledExecutor. - We have 4 threads to communicate with ZK (2curator + 2ZK). If necessary can we eliminate curator to accomplish that ? - Similarly see if we can reduce the number of Netty threads. - How many threads does topology debugging need, can we reduce that ? - Same with event logging (topology.eventlogger.executors). - The Back pressure and Disruptor threads will be reduced by STORM-2306 was: Below is an account of all the threads in a single worker running a topology with 1 spout instance, 1 bolt instance, and 1 acker bolt instance. Topology debugging feature was disabled as it brings in additional threads. Total 34 threads: 10 timer threads 2 Curator threads 2 ZooKeeper threads 4 Netty threads 7 Disruptor thread 1 Spout executor thread 2 worker thread 1 Back Pressure thread 4 thread - unclear what these are 1 Finalizer thread - Would be good to collapse the Timer threads into SingleThreadScheduledExecutor. - We have 4 threads to communicate with ZK (2curator + 2ZK). If necessary can we eliminate curator to accomplish that ? - Similarly see if we can reduce the number of Netty threads. - How many threads does topology debugging need, can we reduce that ? 
- The Back pressure and Disruptor threads will be reduced by STORM-2306 > Reduce Number of Threads running in the Worker > -- > > Key: STORM-2647 > URL: https://issues.apache.org/jira/browse/STORM-2647 > Project: Apache Storm > Issue Type: Sub-task > Components: storm-core >Reporter: Roshan Naik >Assignee: Priyank Shah > > Below is an account of all the threads in a single worker running a topology > with 1 spout instance, 1 bolt instance, and 1 acker bolt instance. Topology > debugging feature was disabled as it brings in additional threads. > Total 34 threads: > 10 timer threads > 2 Curator threads > 2 ZooKeeper threads > 4 Netty threads > 7 Disruptor thread > 1 Spout executor thread > 2 worker thread > 1 Back Pressure thread > 4 thread - unclear what these are > 1 Finalizer thread > - Would be good to collapse the Timer threads into > SingleThreadScheduledExecutor. > - We have 4 threads to communicate with ZK (2curator + 2ZK). If necessary can > we eliminate curator to accomplish that ? > - Similarly see if we can reduce the number of Netty threads. > - How many threads does topology debugging need, can we reduce that ? > - Same with event logging (topology.eventlogger.executors). > - The Back pressure and Disruptor threads will be reduced by STORM-2306 -- This message was sent by Atlassian JIRA (v6.4.14#64029)
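The first suggestion above, collapsing the ten timer threads into a single scheduled executor, could look roughly like the sketch below. This is a hedged illustration rather than Storm's actual implementation; the `SharedTimer` class and the task names are hypothetical.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch: instead of one dedicated thread per periodic task
// (heartbeat, metrics, rotation, ...), register all of them on a single
// ScheduledExecutorService so the worker runs one shared timer thread.
class SharedTimer {
    private final ScheduledExecutorService exec =
        Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "worker-shared-timer");
            t.setDaemon(true);
            return t;
        });

    ScheduledFuture<?> schedule(String name, Runnable task, long periodMs) {
        // Wrap the task so one misbehaving timer cannot kill the shared thread.
        Runnable safe = () -> {
            try { task.run(); }
            catch (Exception e) { System.err.println(name + " failed: " + e); }
        };
        return exec.scheduleAtFixedRate(safe, periodMs, periodMs, TimeUnit.MILLISECONDS);
    }

    void shutdown() { exec.shutdownNow(); }

    static void sleepQuietly(long ms) {
        try { Thread.sleep(ms); } catch (InterruptedException ignored) { }
    }
}
```

The trade-off is that a long-running task now delays every other timer on the shared thread, so anything non-trivial should still hand work off to a separate pool.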
[jira] [Commented] (STORM-2647) Reduce Number of Threads running in the Worker
[ https://issues.apache.org/jira/browse/STORM-2647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16093917#comment-16093917 ] Roshan Naik commented on STORM-2647: All yours! Thanks for volunteering. > Reduce Number of Threads running in the Worker > -- > > Key: STORM-2647 > URL: https://issues.apache.org/jira/browse/STORM-2647 > Project: Apache Storm > Issue Type: Sub-task > Components: storm-core >Reporter: Roshan Naik >Assignee: Priyank Shah -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (STORM-2647) Reduce Number of Threads running in the Worker
Roshan Naik created STORM-2647: -- Summary: Reduce Number of Threads running in the Worker Key: STORM-2647 URL: https://issues.apache.org/jira/browse/STORM-2647 Project: Apache Storm Issue Type: Sub-task Reporter: Roshan Naik

Below is an account of all the threads in a single worker running a topology with 1 spout instance, 1 bolt instance, and 1 acker bolt instance. The topology debugging feature was disabled as it brings in additional threads.

Total 34 threads:
10 timer threads
2 Curator threads
2 ZooKeeper threads
4 Netty threads
7 Disruptor threads
1 Spout executor thread
2 worker threads
1 Back Pressure thread
4 threads - unclear what these are
1 Finalizer thread

- Would be good to collapse the Timer threads into a SingleThreadScheduledExecutor.
- We have 4 threads to communicate with ZK (2 Curator + 2 ZK). If necessary, can we eliminate Curator to accomplish that?
- Similarly, see if we can reduce the number of Netty threads.
- How many threads does topology debugging need, and can we reduce that?
- The Back Pressure and Disruptor threads will be reduced by STORM-2306
-- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (STORM-2359) Revising Message Timeouts
[ https://issues.apache.org/jira/browse/STORM-2359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16045053#comment-16045053 ] Roshan Naik commented on STORM-2359: {quote} The current implementation has a pending map in both the acker and the spout, which rotate every topology.message.timeout.secs.{quote} Need to see if we can eliminate the timeout logic from the spout and have it only in the ACKer (I can think of some issues). If we must retain that logic in the spouts, the timeout value it operates on (full tuple tree processing) would have to be separated from the timeout value that the ACKer uses to track progress between stages. {quote}The spout then reemitted the expired tuples, and they got into the queue behind their own expired instances. {quote} Perfect example indeed. The motivation of this jira is to try to eliminate/mitigate triggering of timeouts for queued/in-flight tuples that are not lost. The only time we need timeouts/re-emits to be triggered is when one or more tuples in the tuple tree are truly lost. *I think* that can only happen if a worker/bolt/spout died. So the case you're describing should not happen if we solve this problem correctly. IMO, the ideal solution would have the spouts re-emit only the specific tuples whose tuple trees had some loss due to a worker going down. I am not yet certain whether or not the initial idea described in the doc is the optimal solution. Perhaps a better way is to trigger such re-emits only if a worker/bolt/spout went down. > Revising Message Timeouts > - > > Key: STORM-2359 > URL: https://issues.apache.org/jira/browse/STORM-2359 > Project: Apache Storm > Issue Type: Sub-task > Components: storm-core >Affects Versions: 2.0.0 >Reporter: Roshan Naik > > A revised strategy for message timeouts is proposed here. > Design Doc: > > https://docs.google.com/document/d/1am1kO7Wmf17U_Vz5_uyBB2OuSsc4TZQWRvbRhX52n5w/edit?usp=sharing -- This message was sent by Atlassian JIRA (v6.3.15#6346)
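For context on the pending-map rotation being discussed, here is a minimal, self-contained model of a rotating timeout map. Storm's real `RotatingMap` is more elaborate; the class name and bucket count below are purely illustrative. The key mechanism: entries live in coarse time buckets, and each `rotate()` (called once per timeout tick) expires the oldest bucket wholesale, so timeout precision is one bucket width.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

// Simplified model of a rotating pending map: new entries go into the
// newest bucket; rotate() drops the oldest bucket, expiring everything
// in it at once (no per-entry timers needed).
class RotatingPending<K, V> {
    private final Deque<Map<K, V>> buckets = new ArrayDeque<>();

    RotatingPending(int numBuckets) {
        for (int i = 0; i < numBuckets; i++) buckets.addFirst(new HashMap<>());
    }

    void put(K key, V value) {             // track a newly emitted tuple
        buckets.peekFirst().put(key, value);
    }

    V remove(K key) {                      // ack path: completed before timing out
        for (Map<K, V> b : buckets) {
            V v = b.remove(key);
            if (v != null) return v;
        }
        return null;
    }

    Map<K, V> rotate() {                   // timeout tick: oldest bucket expires
        Map<K, V> expired = buckets.removeLast();
        buckets.addFirst(new HashMap<>());
        return expired;
    }
}
```

With this structure, "resetting" a timeout is just removing the entry and re-inserting it into the newest bucket.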
[jira] [Commented] (STORM-2359) Revising Message Timeouts
[ https://issues.apache.org/jira/browse/STORM-2359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16041851#comment-16041851 ] Roshan Naik commented on STORM-2359: {quote} In order to reduce load on the spout, we should try to "bundle up" these resets in the acker bolts before sending them to the spout {quote} The resets are managed internally by the ACKer bolt. The spout only gets notified if the timeout expires or if the tuple tree is fully processed. {quote}When tuples are stuck in queues behind other tuples{quote} That case would be more accurately classified as "progress is being made"... but slower than expected. The case of 'progress is not being made' is when a worker that is processing part of the tuple tree dies. {quote}If a bolt is taking longer to process a tuple than expected, it can be solved in the concrete bolt implementation by using OutputCollector.resetTimeout at an appropriate interval (e.g. the tuple timeout minus a few seconds).{quote} I think you are trying to mitigate the number of resets being sent to the spout... but as I mentioned before, resets are never sent to the spout. > Revising Message Timeouts > - > > Key: STORM-2359 > URL: https://issues.apache.org/jira/browse/STORM-2359 > Project: Apache Storm > Issue Type: Sub-task > Components: storm-core >Affects Versions: 2.0.0 >Reporter: Roshan Naik > > A revised strategy for message timeouts is proposed here. > Design Doc: > > https://docs.google.com/document/d/1am1kO7Wmf17U_Vz5_uyBB2OuSsc4TZQWRvbRhX52n5w/edit?usp=sharing -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Comment Edited] (STORM-1949) Backpressure can cause spout to stop emitting and stall topology
[ https://issues.apache.org/jira/browse/STORM-1949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15993879#comment-15993879 ] Roshan Naik edited comment on STORM-1949 at 5/2/17 10:05 PM: - Sorry I have not been able to put time on this. Looks like some users on the mailing list are also running into this issue in production. I am unable to give this time for at least a couple of weeks, but I don't want to be a blocker to getting this fix into 1.x. I have shared the topology here in the jira; it would be great if we got some help on this. [~abellina] or [~zhuoliu], would you be able to give it a shot? was (Author: roshan_naik): Sorry I have not been able to put time on this. Looks like some users on the mailing list are also running into this issue in production. I am unable to give this sometime for at least a couple weeks, but I dont want to be a blocker to getting this fix into 1.x. I have shared the topology here in the jira, would be great if we got some help on this ? [~abellina] or [~zhuoliu] would you be able to give it a shot ? > Backpressure can cause spout to stop emitting and stall topology > > > Key: STORM-1949 > URL: https://issues.apache.org/jira/browse/STORM-1949 > Project: Apache Storm > Issue Type: Bug >Reporter: Roshan Naik >Assignee: Alessandro Bellina > Attachments: 1.x-branch-works-perfect.png, wordcounttopo.zip > > > Problem can be reproduced by this [Word count > topology|https://github.com/hortonworks/storm/blob/perftopos1.x/examples/storm-starter/src/jvm/org/apache/storm/starter/perf/FileReadWordCountTopo.java] > within an IDE. > I ran it with 1 spout instance, 2 splitter bolt instances, 2 counter bolt > instances. > The problem is more easily reproduced with the WC topology as it causes an > explosion of tuples due to splitting a sentence tuple into word tuples. As > the bolts have to process more tuples than the spout is producing, the spout > needs to operate slower.
> The amount of time it takes for the topology to stall can vary... but > typically under 10 mins. > *My theory:* I suspect there is a race condition in the way ZK is being > utilized to enable/disable back pressure. When congested (i.e. pressure > exceeds the high water mark), the bolt's worker records this congested situation > in ZK by creating a node. Once the congestion is reduced below the low water > mark, it deletes this node. > The spout's worker has set up a watch on the parent node, expecting a callback > whenever there is a change in the child nodes. On receiving the callback the > spout's worker lists the parent node to check if there are 0 or more child > nodes; it is essentially trying to figure out the nature of the state change > in ZK to determine whether to throttle or not. Subsequently it sets up > another watch in ZK to keep an eye on future changes. > When there are multiple bolts, there can be rapid creation/deletion of these > ZK nodes. Between the time the worker receives a callback and sets up the > next watch, many changes may have occurred in ZK which will go unnoticed by > the spout. > The condition that the bolts are no longer congested may not get noticed as a > result of this. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (STORM-1949) Backpressure can cause spout to stop emitting and stall topology
[ https://issues.apache.org/jira/browse/STORM-1949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15993879#comment-15993879 ] Roshan Naik commented on STORM-1949: Sorry I have not been able to put time on this. Looks like some users on the mailing list are also running into this issue in production. I am unable to give this sometime for at least a couple weeks, but I dont want to be a blocker to getting this fix into 1.x. I have shared the topology here in the jira, would be great if we got some help on this ? [~abellina] or [~zhuoliu] would you be able to give it a shot ? > Backpressure can cause spout to stop emitting and stall topology > > > Key: STORM-1949 > URL: https://issues.apache.org/jira/browse/STORM-1949 > Project: Apache Storm > Issue Type: Bug >Reporter: Roshan Naik >Assignee: Alessandro Bellina > Attachments: 1.x-branch-works-perfect.png, wordcounttopo.zip -- This message was sent by Atlassian JIRA (v6.3.15#6346)
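The watch-gap race theorized above can be modeled deterministically. ZooKeeper watches are one-shot: events that occur between a watch firing and the client re-arming it are simply not delivered. The toy classes below are illustrative stand-ins, not Storm or ZooKeeper APIs.

```java
import java.util.ArrayList;
import java.util.List;

// Deterministic toy model of a one-shot watch, mimicking ZooKeeper watch
// semantics: after the watch fires it is disarmed, and any change that
// happens before the client re-arms it produces no callback.
class OneShotWatchNode {
    private int children = 0;     // # of "congested" child nodes
    private Runnable watch;       // at most one armed watch

    void setWatch(Runnable w) { watch = w; }

    void changeChildren(int delta) {
        children += delta;
        if (watch != null) {      // fire once, then disarm
            Runnable w = watch;
            watch = null;
            w.run();
        }
    }

    int childCount() { return children; }
}
```

Running the scenario from the theory: the bolt's worker creates a congestion node (watch fires, spout observes "throttle"), then deletes it before the spout has re-armed its watch, so the clear-congestion event is lost and the spout keeps throttling.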
[jira] [Updated] (STORM-2306) Redeisgn Messaging Subsystem and switch to JCTools Queues
[ https://issues.apache.org/jira/browse/STORM-2306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Roshan Naik updated STORM-2306: --- Summary: Redeisgn Messaging Subsystem and switch to JCTools Queues (was: Switch to JCTools for internal queueing) > Redeisgn Messaging Subsystem and switch to JCTools Queues > - > > Key: STORM-2306 > URL: https://issues.apache.org/jira/browse/STORM-2306 > Project: Apache Storm > Issue Type: Sub-task > Components: storm-core >Affects Versions: 2.0.0 >Reporter: Roshan Naik >Assignee: Roshan Naik > > Details in this document: > https://docs.google.com/document/d/1PpQaWVHg06-OqxTzYxQlzg1yEhzA4Y46_NC7HMO6tsI/edit?usp=sharing -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (STORM-2284) Storm Worker Redesign
[ https://issues.apache.org/jira/browse/STORM-2284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15946420#comment-15946420 ] Roshan Naik commented on STORM-2284: Hmm... I think this will need some more thought on what changes would be needed in the scheduler / elsewhere. I am thinking... each supervisor would need to know how many cores it is managing, and each new worker gets assigned unused cores. At submit time, we need to figure out where each spout and bolt instance goes... based on the groupings used. Yes, over-allocation of I/O threads is reasonable... we have two types of I/O: outbound & inbound. Hard to say without experimenting, and it would depend on the throughput being sustained, but perhaps the outbound threads could be pinned to one core... and the inbound threads to another core. If we decide to do memory-mapped queues for intra-host messaging, then we have inter-host I/O and intra-host I/O... each with in/outbound variations. > Storm Worker Redesign > - > > Key: STORM-2284 > URL: https://issues.apache.org/jira/browse/STORM-2284 > Project: Apache Storm > Issue Type: Umbrella > Components: storm-core >Affects Versions: 2.0.0 >Reporter: Roshan Naik >Assignee: Roshan Naik > > Much has been learnt from evolving the 1.x line. We can now use the benefit > of hindsight and apply these learnings to future work on the 2.x line. > The goal is to rethink the Worker to improve performance, enhance its > abilities and also retain compatibility. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (STORM-2416) Release Packaging Improvements
[ https://issues.apache.org/jira/browse/STORM-2416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15935673#comment-15935673 ] Roshan Naik commented on STORM-2416: Didn't notice this move of the perf module from top level to the examples dir. Is there a reason for this? I felt perf belongs outside of the examples since it isn't really intended to be an example for users. The perf module is intended to evolve for purposes of perf measurement and benchmarking... for now it's mostly for Storm devs, but eventually it could be run by end users as well. > Release Packaging Improvements > -- > > Key: STORM-2416 > URL: https://issues.apache.org/jira/browse/STORM-2416 > Project: Apache Storm > Issue Type: Improvement > Components: build >Reporter: P. Taylor Goetz >Assignee: P. Taylor Goetz > Fix For: 1.1.0 > > Time Spent: 2h 10m > Remaining Estimate: 0h > > This issue is to address distribution packaging improvements discussed on the > dev@ list: > 1. Move remaining examples to "examples" directory. > 2. Package examples as source-only, to be compiled by users > 3. Remove connector jars from binary distribution (since they are available > in Maven, and we want to discourage users from hand-crafting topology jars) -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Updated] (STORM-2423) Join Bolt : Use explicit instead of default window anchoring of emitted tuples
[ https://issues.apache.org/jira/browse/STORM-2423?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Roshan Naik updated STORM-2423: --- Priority: Critical (was: Major) > Join Bolt : Use explicit instead of default window anchoring of emitted tuples > -- > > Key: STORM-2423 > URL: https://issues.apache.org/jira/browse/STORM-2423 > Project: Apache Storm > Issue Type: Bug >Reporter: Roshan Naik >Assignee: Roshan Naik >Priority: Critical > Time Spent: 10m > Remaining Estimate: 0h > > Default anchoring will anchor each emitted tuple to every tuple in the current > window. This requires a very large number of ACKs from any downstream bolt. > If topology.debug is enabled, it also worsens the load on the system > significantly. > Letting the topo run in this mode (in particular with max.spout.pending > disabled) could lead to the worker running out of memory and crashing. > Fix: The Join Bolt should avoid using default window anchoring, and explicitly > anchor each emitted tuple to the exact matching tuples from each input > stream. This reduces the complexity of the tuple trees and consequently > reduces the burden on the ACKing & messaging subsystems. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
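To make the scale of the difference concrete, here is a back-of-envelope model of the ack load. The numbers in the usage below are purely illustrative, not taken from the JIRA; the point is that default anchoring grows with the window size while explicit anchoring grows only with the match count.

```java
// Illustrative model of tuple-tree edges created by one window flush.
// Default window anchoring ties every emitted join result to ALL tuples
// in the current window; explicit anchoring ties it only to the tuples
// that actually matched.
class JoinAnchoringCost {
    // edges contributed when each result anchors to the whole window
    static long defaultAnchoring(int windowSize, int emittedResults) {
        return (long) windowSize * emittedResults;
    }

    // edges contributed when each result anchors only to its matches
    static long explicitAnchoring(int emittedResults, int matchedPerResult) {
        return (long) emittedResults * matchedPerResult;
    }
}
```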
[jira] [Comment Edited] (STORM-2355) Storm-HDFS: inotify support
[ https://issues.apache.org/jira/browse/STORM-2355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15929183#comment-15929183 ] Roshan Naik edited comment on STORM-2355 at 3/17/17 9:23 PM: - First of all... thanks for your work on this. I took a look at HDFS's inotify side of things and also spoke to an HDFS committer. These over-arching concerns came up (partly alluded to earlier by you): 1. Currently INotify is restricted to HDFS admins because it doesn't scale (wrt the namenode). So in its current state it seems unsuitable for an Hdfs Spout kind of use case... even if we (unrealistically) asked users to run the Storm worker as an HDFS admin user. 2. The proposal for scaling Inotify and opening it up to end users appears to have stalled for some time now. Although we are seeing some improvements (that you noted) in resource utilization from the Storm side, it seems not advisable from the HDFS namenode perspective. I think this feature in HDFS Spout would be useful once a scalable inotify solution is made publicly available by HDFS. The other option is to get this into Storm now and not use it till HDFS implements their scalable inotify. My concern with that is we can't bet with certainty that the final inotify will still work as we now expect it to (although the intent is there)... it may even change in an incompatible way. Either way it appears to be a feature that can't be used until the scalable inotify happens (if it happens). was (Author: roshan_naik): Fist of all... thanks for your work on this. I took a look at the HDFS's inotify side of things and also spoke to a Hdfs committer. These over-arching concerns came up (partly alluded to earlier by you): 1. Currently INotify is restricted to HDFS admins because it doesn't scale (wrt namenode). So in its current state it seems unsuitable for Hdfs Spout kind of use case... even if we (unrealistically) asked users to run Storm worker as a HDFS admin user. 2.
The proposal for scaling Inotify and opening it up to end users appears to have stalled for some time now. Although we are seeing some improvements (that you noted) in resource utilization from the Storm side, it seems not advisable from the HDFS namenode perspective. I think this feature in HDFS would be useful once a scaleable inotify solution is made publicly available by HDFS. The other option is to get this into Storm now and not use it till HDFS implements their scaleable inotify. My concern with that is we cant bet with certainty that final inotify will still work as we now expect it to (although the intent is there) ... it may even change in a incompatible way. Either way it appears like a feature that cant be used until the scaleable inotify happens (if it happens). > Storm-HDFS: inotify support > --- > > Key: STORM-2355 > URL: https://issues.apache.org/jira/browse/STORM-2355 > Project: Apache Storm > Issue Type: New Feature > Components: storm-hdfs >Reporter: Tibor Kiss >Assignee: Tibor Kiss > Fix For: 2.0.0, 1.1.0 > > Time Spent: 2h > Remaining Estimate: 0h > > This is a proposal to implement inotify based watch dir monitoring in > Storm-HDFS Spout. > *Motivation* > Storm-HDFS's HdfsSpout currently polls the Spout’s input directory using > Hadoop's {{FileSystem.listFiles()}}. This operation is expensive since it > returns the block locations and all stat information of the files inside the > watch directory. Moreover HdfsSpout currently uses only one element of the > returned Path list, which is inefficient as the rest of the entries are thrown > away without processing. > The proposed design provides greater efficiency through the inotify interface > and also enables easier extension of the original ({{listFiles()}} based) > monitoring with buffering (see Further work section below).
> *High level design* > The goal is to leverage the [HDFS inotify > API|http://hadoop.apache.org/docs/current/api//org/apache/hadoop/hdfs/DFSInotifyEventInputStream.html] > to monitor new file arrival in HdfsSpout's input directory. > The inotify based monitoring is an addition to the original > {{FileSystem.listFiles()}} based implementation; the default behavior of the > spout will be unchanged by this modification. > To unify the two monitoring methods and enable buffering, an iterator based > ({{HdfsDirectoryMonitor}}) class is created. > To retain backward compatibility the HdfsSpout's default monitoring behavior > is unchanged; inotify based monitoring can be enabled through a parameter. > As inotify requires administrative privileges (see Caveat section below) a > fallback mechanism is implemented in HdfsSpout to use the original > {{listFiles()}} based monitoring if initialization fails for inotify based > monitoring. > *Implementation
[jira] [Assigned] (STORM-2307) Revised Threading and Execution Model
[ https://issues.apache.org/jira/browse/STORM-2307?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Roshan Naik reassigned STORM-2307: -- Assignee: Roshan Naik > Revised Threading and Execution Model > - > > Key: STORM-2307 > URL: https://issues.apache.org/jira/browse/STORM-2307 > Project: Apache Storm > Issue Type: Sub-task > Components: storm-core >Affects Versions: 2.0.0 >Reporter: Roshan Naik >Assignee: Roshan Naik > > Design Doc: > https://docs.google.com/document/d/1PBGQomJQ67gsLR0CNZlYfVWjGyzAEJMsQjpKumuyHuQ/edit?usp=sharing -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (STORM-2355) Storm-HDFS: inotify support
[ https://issues.apache.org/jira/browse/STORM-2355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15929183#comment-15929183 ] Roshan Naik commented on STORM-2355: First of all... thanks for your work on this. I took a look at HDFS's inotify side of things and also spoke to an HDFS committer. These over-arching concerns came up (partly alluded to earlier by you): 1. Currently INotify is restricted to HDFS admins because it doesn't scale (wrt the namenode). So in its current state it seems unsuitable for an Hdfs Spout kind of use case... even if we (unrealistically) asked users to run the Storm worker as an HDFS admin user. 2. The proposal for scaling Inotify and opening it up to end users appears to have stalled for some time now. Although we are seeing some improvements (that you noted) in resource utilization from the Storm side, it seems not advisable from the HDFS namenode perspective. I think this feature in HDFS would be useful once a scalable inotify solution is made publicly available by HDFS. The other option is to get this into Storm now and not use it till HDFS implements their scalable inotify. My concern with that is we can't bet with certainty that the final inotify will still work as we now expect it to (although the intent is there)... it may even change in an incompatible way. Either way it appears to be a feature that can't be used until the scalable inotify happens (if it happens). > Storm-HDFS: inotify support > --- > > Key: STORM-2355 > URL: https://issues.apache.org/jira/browse/STORM-2355 > Project: Apache Storm > Issue Type: New Feature > Components: storm-hdfs >Reporter: Tibor Kiss >Assignee: Tibor Kiss > Fix For: 2.0.0, 1.1.0 > > Time Spent: 2h > Remaining Estimate: 0h > > This is a proposal to implement inotify based watch dir monitoring in > Storm-HDFS Spout. > *Motivation* > Storm-HDFS's HdfsSpout currently polls the Spout’s input directory using > Hadoop's {{FileSystem.listFiles()}}.
This operation is expensive since it > returns the block locations and all stat information of the files inside the > watch directory. Moreover HdfsSpout currently uses only one element of the > returned Path list, which is inefficient as the rest of the entries are thrown > away without processing. > The proposed design provides greater efficiency through the inotify interface > and also enables easier extension of the original ({{listFiles()}} based) > monitoring with buffering (see Further work section below). > *High level design* > The goal is to leverage the [HDFS inotify > API|http://hadoop.apache.org/docs/current/api//org/apache/hadoop/hdfs/DFSInotifyEventInputStream.html] > to monitor new file arrival in HdfsSpout's input directory. > The inotify based monitoring is an addition to the original > {{FileSystem.listFiles()}} based implementation; the default behavior of the > spout will be unchanged by this modification. > To unify the two monitoring methods and enable buffering, an iterator based > ({{HdfsDirectoryMonitor}}) class is created. > To retain backward compatibility the HdfsSpout's default monitoring behavior > is unchanged; inotify based monitoring can be enabled through a parameter. > As inotify requires administrative privileges (see Caveat section below) a > fallback mechanism is implemented in HdfsSpout to use the original > {{listFiles()}} based monitoring if initialization fails for inotify based > monitoring. > *Implementation details* > As inotify provides only a delta of the filesystem events from a given Tx Id > (of the Hdfs Edit Log) it is required to do a {{FileSystem.listFiles()}} based > collection during the Spout's initialization to ensure that any leftover > files are processed.
> The inotify based implementation uses HdfsAdmin's > [{{DFSInotifyEventInputStream.poll()}}|http://hadoop.apache.org/docs/current/api//org/apache/hadoop/hdfs/DFSInotifyEventInputStream.html#poll--] > method to fetch and buffer the list of new files created since the provided > Tx Id to {{newFileList}} buffer. > During {{HdfsSpout.nextTuple()}} call one element is taken from the > {{newFileList}} buffer and processed by the spout. > The {{newFileList}} buffer is extended with the result of the > {{DFSInotifyEventInputStream.poll(lastTxId)}} call in every nextTuple() call. > Since HdfsSpout is able to create it's own {{HdfsAdmin()}} instance there > will be no need for the user to do additional initialization for the spout > even if inotify is enabled. > *Caveat* > HDFS inotify is currently available through hdfs administrator user only, but > there is ongoing discussion in Hadoop community to extend its support to > users. See: HDFS-8940 > *Further work* > 1) The number of calls to {{DFSInotifyEventInputStream.poll(lastTxId)}} could > be further reduced if
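The unification idea described in the proposal (one {{HdfsDirectoryMonitor}} iterator that drains an initial {{listFiles()}} pass and then keeps pulling incremental inotify events, deduplicating the overlap between the two) could be sketched as follows. This is a simplified, HDFS-free stand-in; the real class and its API may differ, and the event source is abstracted here as a plain `Supplier`.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.Iterator;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Supplier;

// Sketch of a unified directory monitor: first serves the initial directory
// listing (the listFiles() pass done at spout start), then keeps pulling
// incremental "file created" events (standing in for the inotify poll).
// The 'seen' set deduplicates files reported by both sources.
class DirectoryMonitor implements Iterator<String> {
    private final Deque<String> pending = new ArrayDeque<>();
    private final Set<String> seen = new LinkedHashSet<>();
    private final Supplier<List<String>> eventSource;   // e.g. an inotify poll

    DirectoryMonitor(List<String> initialListing, Supplier<List<String>> eventSource) {
        this.eventSource = eventSource;
        enqueue(initialListing);
    }

    private void enqueue(List<String> paths) {
        for (String p : paths) {
            if (seen.add(p)) pending.addLast(p);   // skip files already seen
        }
    }

    @Override public boolean hasNext() {
        if (pending.isEmpty()) enqueue(eventSource.get());  // lazy incremental pull
        return !pending.isEmpty();
    }

    @Override public String next() {
        return pending.removeFirst();
    }
}
```

The spout's `nextTuple()` would just call `hasNext()`/`next()`, so whether a file came from the startup listing or a later event is invisible to the rest of the spout.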
[jira] [Assigned] (STORM-2391) HdfsSpoutTopology example needs to be moved into storm-hdfs-examples from storm-starter
[ https://issues.apache.org/jira/browse/STORM-2391?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Roshan Naik reassigned STORM-2391: -- Assignee: Roshan Naik > HdfsSpoutTopology example needs to be moved into storm-hdfs-examples from > storm-starter > --- > > Key: STORM-2391 > URL: https://issues.apache.org/jira/browse/STORM-2391 > Project: Apache Storm > Issue Type: Bug >Affects Versions: 1.1.0 >Reporter: Roshan Naik >Assignee: Roshan Naik > -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Created] (STORM-2391) HdfsSpoutTopology example needs to be moved into storm-hdfs-examples from storm-starter
Roshan Naik created STORM-2391: -- Summary: HdfsSpoutTopology example needs to be moved into storm-hdfs-examples from storm-starter Key: STORM-2391 URL: https://issues.apache.org/jira/browse/STORM-2391 Project: Apache Storm Issue Type: Bug Affects Versions: 1.1.0 Reporter: Roshan Naik -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Created] (STORM-2390) The storm-*-examples jars are missing in the binary distro
Roshan Naik created STORM-2390: -- Summary: The storm-*-examples jars are missing in the binary distro Key: STORM-2390 URL: https://issues.apache.org/jira/browse/STORM-2390 Project: Apache Storm Issue Type: Bug Affects Versions: 1.1.0 Reporter: Roshan Naik Priority: Critical Fix For: 1.1.0 -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Created] (STORM-2389) Event Logger bolt is instantiated even if topology.eventlogger.executors=0
Roshan Naik created STORM-2389: -- Summary: Event Logger bolt is instantiated even if topology.eventlogger.executors=0 Key: STORM-2389 URL: https://issues.apache.org/jira/browse/STORM-2389 Project: Apache Storm Issue Type: Bug Affects Versions: 1.1.0 Reporter: Roshan Naik Priority: Blocker Fix For: 1.1.0 -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (STORM-2364) Ensure kafka-monitor jar is included in classpath for UI process
[ https://issues.apache.org/jira/browse/STORM-2364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15872761#comment-15872761 ] Roshan Naik commented on STORM-2364: [~kabhwan] Here you go... {code} 2017-02-14 08:51:13.037 o.a.s.u.TopologySpoutLag [WARN] Exception thrown while getting lag for spout id: kafka-spout and spout class: org.apache.storm.kafka.KafkaSpout 2017-02-14 08:51:13.037 o.a.s.u.TopologySpoutLag [WARN] Exception message:Error: Could not find or load main class org.apache.storm.kafka.monitor.KafkaOffsetLagUtil org.apache.storm.utils.ShellUtils$ExitCodeException: Error: Could not find or load main class org.apache.storm.kafka.monitor.KafkaOffsetLagUtil at org.apache.storm.utils.ShellUtils.runCommand(ShellUtils.java:230) at org.apache.storm.utils.ShellUtils.run(ShellUtils.java:160) at org.apache.storm.utils.ShellUtils$ShellCommandExecutor.execute(ShellUtils.java:370) at org.apache.storm.utils.ShellUtils.execCommand(ShellUtils.java:460) at org.apache.storm.utils.ShellUtils.execCommand(ShellUtils.java:443) at org.apache.storm.utils.TopologySpoutLag.getLagResultForKafka(TopologySpoutLag.java:153) at org.apache.storm.utils.TopologySpoutLag.getLagResultForOldKafkaSpout(TopologySpoutLag.java:183) at org.apache.storm.utils.TopologySpoutLag.lag(TopologySpoutLag.java:59) at org.apache.storm.ui.core$topology_lag.invoke(core.clj:727) at org.apache.storm.ui.core$fn__10090.invoke(core.clj:1073) at org.apache.storm.shade.compojure.core$make_route$fn__1792.invoke(core.clj:93) at org.apache.storm.shade.compojure.core$if_route$fn__1780.invoke(core.clj:39) at org.apache.storm.shade.compojure.core$if_method$fn__1773.invoke(core.clj:24) at org.apache.storm.shade.compojure.core$routing$fn__1798.invoke(core.clj:106) at clojure.core$some.invoke(core.clj:2570) at org.apache.storm.shade.compojure.core$routing.doInvoke(core.clj:106) at clojure.lang.RestFn.applyTo(RestFn.java:139) at clojure.core$apply.invoke(core.clj:632) at 
org.apache.storm.shade.compojure.core$routes$fn__1802.invoke(core.clj:111) at org.apache.storm.shade.ring.middleware.cors$wrap_cors$fn__9443.invoke(cors.clj:149) at org.apache.storm.shade.ring.middleware.json$wrap_json_params$fn__9390.invoke(json.clj:56) at org.apache.storm.shade.ring.middleware.multipart_params$wrap_multipart_params$fn__2954.invoke(multipart_params.clj:103) at org.apache.storm.shade.ring.middleware.reload$wrap_reload$fn__8948.invoke(reload.clj:22) at org.apache.storm.ui.helpers$requests_middleware$fn__3197.invoke(helpers.clj:46) at org.apache.storm.ui.core$catch_errors$fn__10265.invoke(core.clj:1336) at org.apache.storm.shade.ring.middleware.keyword_params$wrap_keyword_params$fn__2885.invoke(keyword_params.clj:27) at org.apache.storm.shade.ring.middleware.nested_params$wrap_nested_params$fn__2925.invoke(nested_params.clj:65) at org.apache.storm.shade.ring.middleware.params$wrap_params$fn__2856.invoke(params.clj:55) at org.apache.storm.shade.ring.middleware.multipart_params$wrap_multipart_params$fn__2954.invoke(multipart_params.clj:103) at org.apache.storm.shade.ring.middleware.flash$wrap_flash$fn__3140.invoke(flash.clj:14) at org.apache.storm.shade.ring.middleware.session$wrap_session$fn__3128.invoke(session.clj:43) at org.apache.storm.shade.ring.middleware.cookies$wrap_cookies$fn__3056.invoke(cookies.clj:160) {code} > Ensure kafka-monitor jar is included in classpath for UI process > > > Key: STORM-2364 > URL: https://issues.apache.org/jira/browse/STORM-2364 > Project: Apache Storm > Issue Type: Bug >Affects Versions: 1.x >Reporter: Roshan Naik >Assignee: Roshan Naik > > The kafka-monitor jar has been moved into toollib/ and not featuring in UI > process's classpath. Leading to ClassNotFoundException for KafkaOffsetLagUtil > class ... seen in the ui.log -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Created] (STORM-2364) Ensure kafka-monitor jar is included in classpath for UI process
Roshan Naik created STORM-2364: -- Summary: Ensure kafka-monitor jar is included in classpath for UI process Key: STORM-2364 URL: https://issues.apache.org/jira/browse/STORM-2364 Project: Apache Storm Issue Type: Bug Affects Versions: 1.x Reporter: Roshan Naik Assignee: Roshan Naik The kafka-monitor jar has been moved into toollib/ and is no longer in the UI process's classpath, leading to a ClassNotFoundException for the KafkaOffsetLagUtil class ... seen in the ui.log -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (STORM-2358) Update storm hdfs spout to remove specific implementation handlings
[ https://issues.apache.org/jira/browse/STORM-2358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15868672#comment-15868672 ] Roshan Naik commented on STORM-2358: *Pt 2* If you have a proposal to support aliasing without those additional entries, then that's worth discussing. But if you are lobbying to remove features for minor beautification / a little extra work, it sounds like the wrong attitude. *Pt 3* Really? Perhaps you can answer this question... How many fields are in a CSV file? I feel you are going topsy-turvy over non-issues. > Update storm hdfs spout to remove specific implementation handlings > --- > > Key: STORM-2358 > URL: https://issues.apache.org/jira/browse/STORM-2358 > Project: Apache Storm > Issue Type: Improvement > Components: storm-hdfs >Affects Versions: 1.x >Reporter: Sachin Pasalkar > Labels: newbie > Attachments: AbstractFileReader.java, FileReader.java, > HDFSSpout.java, SequenceFileReader.java, TextFileReader.java > > Time Spent: 10m > Remaining Estimate: 0h > > I was looking at storm hdfs spout code in 1.x branch, I found below > improvements can be made in below code. > 1. Make org.apache.storm.hdfs.spout.AbstractFileReader as public so > that it can be used in generics. > 2. org.apache.storm.hdfs.spout.HdfsSpout requires readerType as > String. It will be great to have class > readerType; So we will not use Class.forName at multiple places also it > will help in below point. > 3. HdfsSpout also needs to provide outFields which are declared as > constants in each reader(e.g.SequenceFileReader). We can have abstract > API AbstractFileReader in which return them to user to make it generic. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (STORM-2358) Update storm hdfs spout to remove specific implementation handlings
[ https://issues.apache.org/jira/browse/STORM-2358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15867316#comment-15867316 ] Roshan Naik commented on STORM-2358: *Pt 2.* Switching to Class won't make things any more generic than what we already have. It will make things less user-friendly, as we can't have aliases for built-ins anymore, which is desirable IMO. It would be nice if we could have aliases for custom types as well, but we can't, as it requires a registration step. This isn't a good reason to change a public API and lose the aliasing ability at the same time. `ZippedTextFileReader` should add an alias if it is going to be a built-in. *Pt 3* We don't want to do that. We can't expect all the custom FileReaders to dictate these output field names (even optionally) to the spout. The reason being, in many cases it is not even possible for a custom FileReader to know what the field names will be, or even how many fields there will be ... for example, any format with multiple columns, say a CSV reader or a RegEx reader. We want the user to set the output field names depending on the data he is feeding it with. That's why those constants are for convenience only and not part of the FileReader interface. > Update storm hdfs spout to remove specific implementation handlings > --- > > Key: STORM-2358 > URL: https://issues.apache.org/jira/browse/STORM-2358 > Project: Apache Storm > Issue Type: Improvement > Components: storm-hdfs >Affects Versions: 1.x >Reporter: Sachin Pasalkar > Labels: newbie > Attachments: AbstractFileReader.java, FileReader.java, > HDFSSpout.java, SequenceFileReader.java, TextFileReader.java > > > I was looking at storm hdfs spout code in 1.x branch, I found below > improvements can be made in below code. > 1. Make org.apache.storm.hdfs.spout.AbstractFileReader as public so > that it can be used in generics. > 2. org.apache.storm.hdfs.spout.HdfsSpout requires readerType as > String. 
It will be great to have class > readerType; So we will not use Class.forName at multiple places also it > will help in below point. > 3. HdfsSpout also needs to provide outFields which are declared as > constants in each reader(e.g.SequenceFileReader). We can have abstract > API AbstractFileReader in which return them to user to make it generic. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
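The Pt 3 argument above, that a reader cannot know its own output field names, can be made concrete with a hypothetical CSV reader. The class and method names below are illustrative only, not Storm APIs: the reader can split a record into values, but the data carries no column names, so the names (and their count) have to come from the user configuring the spout.

```java
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical CSV reader illustrating why output field names cannot be
 * constants on the reader: the record format determines how many values
 * there are, and only the user knows what those values mean.
 */
public class CsvFieldsExample {
    /** The reader only sees values; column names are not in the data. */
    static List<String> readRecord(String line) {
        return Arrays.asList(line.split(","));
    }
}
```

For a three-column file the user would declare three output fields on the spout; for a four-column file, four. No constant on the reader class could cover both.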
[jira] [Commented] (STORM-2358) Update storm hdfs spout to remove specific implementation handlings
[ https://issues.apache.org/jira/browse/STORM-2358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15867169#comment-15867169 ] Roshan Naik commented on STORM-2358: *Pt 1:* It is not clear that there is a real need to make it public... but, as I said, I am not opposed to it. The Java generics reasoning is invalid, though. *Pt 2:* Again, the current design *does* support custom readers and is not limited to built-in readers. Just provide a fully qualified class name (FQCN) as the reader type for a custom reader. For built-ins you can use either an alias or the FQCN. Putting in a class type takes away the ability to have short aliases for built-in readers. *Pt 3:* Sounds like you agree. > Update storm hdfs spout to remove specific implementation handlings > --- > > Key: STORM-2358 > URL: https://issues.apache.org/jira/browse/STORM-2358 > Project: Apache Storm > Issue Type: Improvement > Components: storm-hdfs >Affects Versions: 1.x >Reporter: Sachin Pasalkar > Labels: newbie > Attachments: AbstractFileReader.java, FileReader.java, > HDFSSpout.java, SequenceFileReader.java, TextFileReader.java > > > I was looking at storm hdfs spout code in 1.x branch, I found below > improvements can be made in below code. > 1. Make org.apache.storm.hdfs.spout.AbstractFileReader as public so > that it can be used in generics. > 2. org.apache.storm.hdfs.spout.HdfsSpout requires readerType as > String. It will be great to have class > readerType; So we will not use Class.forName at multiple places also it > will help in below point. > 3. HdfsSpout also needs to provide outFields which are declared as > constants in each reader(e.g.SequenceFileReader). We can have abstract > API AbstractFileReader in which return them to user to make it generic. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
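The alias-or-FQCN resolution argued for in Pt 2 above can be sketched as follows. This is a minimal illustration of the idea, not Storm's actual implementation; the alias table entries are assumed examples (the FQCNs mirror the built-in readers named in the issue, but the alias strings themselves are hypothetical).

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Sketch of how a String readerType can support both short aliases for
 * built-in readers and fully qualified class names (FQCN) for custom ones.
 * A Class-typed parameter would lose the alias path entirely.
 */
public class ReaderTypeResolver {
    private static final Map<String, String> BUILT_IN_ALIASES = new HashMap<>();
    static {
        // Assumed alias -> FQCN entries for built-in readers (illustrative).
        BUILT_IN_ALIASES.put("text", "org.apache.storm.hdfs.spout.TextFileReader");
        BUILT_IN_ALIASES.put("seq", "org.apache.storm.hdfs.spout.SequenceFileReader");
    }

    /** Resolve a known alias; otherwise treat the string as an FQCN. */
    public static String resolveClassName(String readerType) {
        return BUILT_IN_ALIASES.getOrDefault(readerType, readerType);
    }

    /** Load the reader class, whether given as an alias or an FQCN. */
    public static Class<?> resolve(String readerType) throws ClassNotFoundException {
        return Class.forName(resolveClassName(readerType));
    }
}
```

A custom reader needs no registration step: its FQCN simply falls through the alias lookup unchanged.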