[jira] [Created] (STORM-1892) class org.apache.storm.hdfs.spout.TextFileReader should be public
Roshan Naik created STORM-1892: -- Summary: class org.apache.storm.hdfs.spout.TextFileReader should be public Key: STORM-1892 URL: https://issues.apache.org/jira/browse/STORM-1892 Project: Apache Storm Issue Type: Bug Affects Versions: 1.0.1 Reporter: Roshan Naik Assignee: Roshan Naik -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (STORM-1474) Address remaining minor review comments for STORM-1199
[ https://issues.apache.org/jira/browse/STORM-1474?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Roshan Naik updated STORM-1474: --- Issue Type: Sub-task (was: Bug) Parent: STORM-1199 > Address remaining minor review comments for STORM-1199 > -- > > Key: STORM-1474 > URL: https://issues.apache.org/jira/browse/STORM-1474 > Project: Apache Storm > Issue Type: Sub-task > Components: storm-hdfs > Reporter: Roshan Naik > Assignee: Roshan Naik >Priority: Minor > > Address the last few pending review comments from > https://github.com/apache/storm/pull/936 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (STORM-1474) Address remaining minor review comments for STORM-1199
Roshan Naik created STORM-1474: -- Summary: Address remaining minor review comments for STORM-1199 Key: STORM-1474 URL: https://issues.apache.org/jira/browse/STORM-1474 Project: Apache Storm Issue Type: Bug Components: storm-hdfs Reporter: Roshan Naik Assignee: Roshan Naik Priority: Minor Address the last few pending review comments from https://github.com/apache/storm/pull/936 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (STORM-1199) Create HDFS Spout
[ https://issues.apache.org/jira/browse/STORM-1199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15098948#comment-15098948 ] Roshan Naik commented on STORM-1199: Thanks all for your input/feedback/reviews; they were very useful. > Create HDFS Spout > - > > Key: STORM-1199 > URL: https://issues.apache.org/jira/browse/STORM-1199 > Project: Apache Storm > Issue Type: New Feature > Reporter: Roshan Naik > Assignee: Roshan Naik > Fix For: 1.0.0 > > Attachments: HDFSSpoutforStorm v2.pdf, HDFSSpoutforStorm.pdf, > hdfs-spout.1.patch > > > Create an HDFS spout so that Storm can ingest data from files in an HDFS > directory -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (STORM-1526) Improve Storm core performance
[ https://issues.apache.org/jira/browse/STORM-1526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15139711#comment-15139711 ] Roshan Naik commented on STORM-1526: Thanks [~kabhwan] for merging to 1.x also. > Improve Storm core performance > -- > > Key: STORM-1526 > URL: https://issues.apache.org/jira/browse/STORM-1526 > Project: Apache Storm > Issue Type: Bug > Reporter: Roshan Naik > Assignee: Roshan Naik > Fix For: 1.0.0 > > > Profiling a Speed of Light topology running on Storm core without ackers shows: > - Call tree info: a big part of the nextTuple() invocation is > consumed in the SpoutOutputCollector.emit() call; 20% of it goes to > reflection in the Clojure code > - Method Stats view: a lot of time is spent blocking on the > disruptor queue > The performance issue is narrowed down to this Clojure code in executor.clj: > {code} > (defn mk-custom-grouper > [^CustomStreamGrouping grouping ^WorkerTopologyContext context ^String > component-id ^String stream-id target-tasks] > (.prepare grouping context (GlobalStreamId. component-id stream-id) > target-tasks) > (if (instance? LoadAwareCustomStreamGrouping grouping) > (fn [task-id ^List values load] > (.chooseTasks grouping task-id values load)) ; <-- problematic > invocation > (fn [task-id ^List values load] > (.chooseTasks grouping task-id values)))) > {code} > *grouping* is statically typed to the base type CustomStreamGrouping. In > this run, its actual type is the derived type > LoadAwareCustomStreamGrouping. > The base type does not have a chooseTasks() method with 3 args; only the > derived type has that method. Consequently Clojure falls back to > dynamically iterating over the methods of the *grouping* object to locate > the right method and then invoke it. This falls in the > critical path of SpoutOutputCollector.emit(), where it spends about 20% of > its time just locating the right method. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (STORM-1526) Improve Storm core performance
[ https://issues.apache.org/jira/browse/STORM-1526?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Roshan Naik reassigned STORM-1526: -- Assignee: Roshan Naik > Improve Storm core performance > -- > > Key: STORM-1526 > URL: https://issues.apache.org/jira/browse/STORM-1526 > Project: Apache Storm > Issue Type: Bug > Reporter: Roshan Naik > Assignee: Roshan Naik -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (STORM-1526) Improve Storm core performance
[ https://issues.apache.org/jira/browse/STORM-1526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15134640#comment-15134640 ] Roshan Naik commented on STORM-1526: [~dossett] I updated the PR title. > Improve Storm core performance > -- > > Key: STORM-1526 > URL: https://issues.apache.org/jira/browse/STORM-1526 > Project: Apache Storm > Issue Type: Bug > Reporter: Roshan Naik > Assignee: Roshan Naik -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (STORM-1526) Improve Storm core performance
Roshan Naik created STORM-1526: -- Summary: Improve Storm core performance Key: STORM-1526 URL: https://issues.apache.org/jira/browse/STORM-1526 Project: Apache Storm Issue Type: Bug Reporter: Roshan Naik Profiling a Speed of Light topology running on Storm core without ackers shows: - Call tree info: a big part of the nextTuple() invocation is consumed in the SpoutOutputCollector.emit() call; 20% of it goes to reflection in the Clojure code - Method Stats view: a lot of time is spent blocking on the disruptor queue The performance issue is narrowed down to this Clojure code in executor.clj: {code} (defn mk-custom-grouper [^CustomStreamGrouping grouping ^WorkerTopologyContext context ^String component-id ^String stream-id target-tasks] (.prepare grouping context (GlobalStreamId. component-id stream-id) target-tasks) (if (instance? LoadAwareCustomStreamGrouping grouping) (fn [task-id ^List values load] (.chooseTasks grouping task-id values load)) ; <-- problematic invocation (fn [task-id ^List values load] (.chooseTasks grouping task-id values)))) {code} *grouping* is statically typed to the base type CustomStreamGrouping. In this run, its actual type is the derived type LoadAwareCustomStreamGrouping. The base type does not have a chooseTasks() method with 3 args; only the derived type has that method. Consequently Clojure falls back to dynamically iterating over the methods of the *grouping* object to locate the right method and then invoke it. This falls in the critical path of SpoutOutputCollector.emit(), where it spends about 20% of its time just locating the right method. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
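The dispatch problem described in STORM-1526 above can be sketched in Java. This is only an illustration of the idea, not Storm's actual fix or API: the interfaces below are simplified stand-ins for CustomStreamGrouping / LoadAwareCustomStreamGrouping, and the point is that resolving the runtime type once, when the grouper is built, turns every subsequent emit into a direct interface call instead of a per-call reflective method search.

```java
import java.util.Arrays;
import java.util.List;

public class GrouperDispatch {
    // Simplified stand-ins for Storm's grouping interfaces -- NOT the real
    // org.apache.storm API, just enough to show the dispatch issue.
    interface CustomStreamGrouping {
        List<Integer> chooseTasks(int taskId, List<Object> values);
    }
    interface LoadAwareCustomStreamGrouping extends CustomStreamGrouping {
        List<Integer> chooseTasks(int taskId, List<Object> values, double load);
    }
    // What mk-custom-grouper returns: a function invoked on every emit.
    interface Grouper {
        List<Integer> choose(int taskId, List<Object> values, double load);
    }

    // Resolve the runtime type ONCE when the grouper is built; each emit is
    // then a direct, statically bound call -- no per-call method search.
    static Grouper mkCustomGrouper(CustomStreamGrouping g) {
        if (g instanceof LoadAwareCustomStreamGrouping) {
            LoadAwareCustomStreamGrouping la = (LoadAwareCustomStreamGrouping) g;
            return (taskId, values, load) -> la.chooseTasks(taskId, values, load);
        }
        return (taskId, values, load) -> g.chooseTasks(taskId, values);
    }

    static List<Integer> demo() {
        // A load-aware grouping whose 3-arg overload tags the task id,
        // so we can see which overload was dispatched.
        LoadAwareCustomStreamGrouping loadAware = new LoadAwareCustomStreamGrouping() {
            public List<Integer> chooseTasks(int t, List<Object> v) { return Arrays.asList(t); }
            public List<Integer> chooseTasks(int t, List<Object> v, double load) { return Arrays.asList(t + 100); }
        };
        return mkCustomGrouper(loadAware).choose(1, Arrays.asList((Object) "x"), 0.5);
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints [101]: the 3-arg overload was chosen
    }
}
```

Clojure reaches the same behavior via reflection when the static type hint does not expose the 3-arg overload; doing the `instanceof` check once up front is what keeps it off the emit path.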
[jira] [Commented] (STORM-1539) Improve Storm ACK-ing performance
[ https://issues.apache.org/jira/browse/STORM-1539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15143663#comment-15143663 ] Roshan Naik commented on STORM-1539: The attached profiler info, covering the time taken by ~45k invocations each of Spout.nextTuple() and Bolt.execute(), suggests a perf boost of: *Spout.nextTuple():* 6953ms -> 5396ms = *~30%* improvement *Bolt.execute():* 5313ms -> 3687ms = *~44%* improvement > Improve Storm ACK-ing performance > - > > Key: STORM-1539 > URL: https://issues.apache.org/jira/browse/STORM-1539 > Project: Apache Storm > Issue Type: Bug > Reporter: Roshan Naik > Assignee: Roshan Naik > Attachments: after.png, before.png > > > Profiling a simple speed-of-light topology shows that a good chunk of > the SpoutOutputCollector.emit() time is spent in the Clojure reduce() > function, which is part of the ACK-ing logic. > Re-implementing this reduce() logic in Java gives a big performance boost in > both Spout.nextTuple() and Bolt.execute() -- This message was sent by Atlassian JIRA (v6.3.4#6332)
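As a sanity check on the percentages quoted above: they are consistent with the improvement measured relative to the improved (after) time, i.e. (before - after) / after. A quick arithmetic sketch:

```java
public class SpeedupCheck {
    // Improvement relative to the after time: (before - after) / after.
    static double speedupPct(double beforeMs, double afterMs) {
        return (beforeMs - afterMs) / afterMs * 100.0;
    }

    public static void main(String[] args) {
        // Spout.nextTuple(): 6953ms -> 5396ms
        System.out.printf("nextTuple: %.1f%%%n", speedupPct(6953, 5396)); // ~28.9, i.e. ~30%
        // Bolt.execute(): 5313ms -> 3687ms
        System.out.printf("execute:   %.1f%%%n", speedupPct(5313, 3687)); // ~44.1, i.e. ~44%
    }
}
```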
[jira] [Updated] (STORM-1539) Improve Storm ACK-ing performance
[ https://issues.apache.org/jira/browse/STORM-1539?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Roshan Naik updated STORM-1539: --- Attachment: after.png before.png Attaching before/after profiler screenshots > Improve Storm ACK-ing performance > - > > Key: STORM-1539 > URL: https://issues.apache.org/jira/browse/STORM-1539 > Project: Apache Storm > Issue Type: Bug > Reporter: Roshan Naik > Assignee: Roshan Naik > Attachments: after.png, before.png > > > Profiling a simple speed-of-light topology shows that a good chunk of > the SpoutOutputCollector.emit() time is spent in the Clojure reduce() > function, which is part of the ACK-ing logic. > Re-implementing this reduce() logic in Java gives a big performance boost in > both Spout.nextTuple() and Bolt.execute() -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (STORM-1632) Disable event logging by default
[ https://issues.apache.org/jira/browse/STORM-1632?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Roshan Naik updated STORM-1632: --- Priority: Blocker (was: Major) > Disable event logging by default > > > Key: STORM-1632 > URL: https://issues.apache.org/jira/browse/STORM-1632 > Project: Apache Storm > Issue Type: Bug > Components: storm-core > Reporter: Roshan Naik > Assignee: Roshan Naik > Priority: Blocker > Fix For: 1.0.0 > > > Event logging has a performance penalty. For a simple speed-of-light topology > with a single instance of a spout and a bolt, disabling event logging > shows a 7% to 9% perf improvement (with acker count = 1). > Event logging can be enabled when there is a need to debug, but should be turned off > by default. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (STORM-1580) Secure hdfs spout failed
[ https://issues.apache.org/jira/browse/STORM-1580?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Roshan Naik updated STORM-1580: --- Attachment: HdfsSpoutTopology.java Sorry for the delayed update... kept getting pulled into other urgent things, and it took some time to set up a kerberized cluster. *Update:* I modified HdfsSpoutTopology.java (from examples/storm-starter) for Kerberos and tried it on a secure cluster. It worked fine. I am attaching the modified Java file. Your error likely indicates an issue on the Kerberos setup side. Try these: - kinit with the same keytab and principal on that host and verify it's OK by running some hadoop fs -ls commands - Ensure hdfs-site.xml and core-site.xml from the kerberized cluster are packaged as resources in your topology. A quick way to do this is to copy them into storm/lib and restart the supervisor. > Secure hdfs spout failed > > > Key: STORM-1580 > URL: https://issues.apache.org/jira/browse/STORM-1580 > Project: Apache Storm > Issue Type: Bug > Components: storm-hdfs > Reporter: guoht > Labels: security > Attachments: HdfsSpoutTopology.java > > > Some error occurred when using the secure hdfs spout: > "Login successful for user t...@example.com using keytab file > /home/test/test.keytab > 2016-02-26 10:33:14 o.a.h.i.Client [WARN] Exception encountered while > connecting to the server : javax.security.sasl.SaslException: GSS initiate > failed [Caused by GSSException: No valid credentials provided (Mechanism > level: Failed to find any Kerberos tgt)] > 2016-02-26 10:33:14 o.a.h.i.Client [WARN] Exception encountered while > connecting to the server : javax.security.sasl.SaslException: GSS initiate > failed [Caused by GSSException: No valid credentials provided (Mechanism > level: Failed to find any Kerberos tgt)] > 2016-02-26 10:33:14 o.a.h.i.r.RetryInvocationHandler [INFO] Exception while > invoking getFileInfo of class ClientNamenodeProtocolTranslatorPB over > hnn025/192.168.137.2:8020 after 1 fail over > attempts. Trying to fail over > immediately. > java.io.IOException: Failed on local exception: java.io.IOException: > javax.security.sasl.SaslException: GSS initiate failed [Caused by > GSSException: No valid credentials provided (Mechanism level: Failed to find > any Kerberos tgt)]; Host Details : local host is: "HDD021/192.168.137.6"; > destination host is: "hnn025":8020;" -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (STORM-1643) Performance Fix: Optimize clojure lookups related to throttling and stats
[ https://issues.apache.org/jira/browse/STORM-1643?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Roshan Naik updated STORM-1643: --- Summary: Performance Fix: Optimize clojure lookups related to throttling and stats (was: Performance Fix: Optimize clojure lookups related throttling and stats) > Performance Fix: Optimize clojure lookups related to throttling and stats > - > > Key: STORM-1643 > URL: https://issues.apache.org/jira/browse/STORM-1643 > Project: Apache Storm > Issue Type: Bug > Reporter: Roshan Naik > Assignee: Roshan Naik > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (STORM-1643) Performance Fix: Optimize clojure lookups related throttling and stats
Roshan Naik created STORM-1643: -- Summary: Performance Fix: Optimize clojure lookups related throttling and stats Key: STORM-1643 URL: https://issues.apache.org/jira/browse/STORM-1643 Project: Apache Storm Issue Type: Bug Reporter: Roshan Naik Assignee: Roshan Naik -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (STORM-1580) Secure hdfs spout failed
[ https://issues.apache.org/jira/browse/STORM-1580?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15209389#comment-15209389 ] Roshan Naik commented on STORM-1580: [~ght] FYI, I am beginning to take a look at this. > Secure hdfs spout failed > > > Key: STORM-1580 > URL: https://issues.apache.org/jira/browse/STORM-1580 > Project: Apache Storm > Issue Type: Bug > Components: storm-hdfs > Reporter: guoht > Labels: security > > Some error occurred when using the secure hdfs spout: > "Login successful for user t...@example.com using keytab file > /home/test/test.keytab > 2016-02-26 10:33:14 o.a.h.i.Client [WARN] Exception encountered while > connecting to the server : javax.security.sasl.SaslException: GSS initiate > failed [Caused by GSSException: No valid credentials provided (Mechanism > level: Failed to find any Kerberos tgt)] > 2016-02-26 10:33:14 o.a.h.i.Client [WARN] Exception encountered while > connecting to the server : javax.security.sasl.SaslException: GSS initiate > failed [Caused by GSSException: No valid credentials provided (Mechanism > level: Failed to find any Kerberos tgt)] > 2016-02-26 10:33:14 o.a.h.i.r.RetryInvocationHandler [INFO] Exception while > invoking getFileInfo of class ClientNamenodeProtocolTranslatorPB over > hnn025/192.168.137.2:8020 after 1 fail over attempts. Trying to fail over > immediately. > java.io.IOException: Failed on local exception: java.io.IOException: > javax.security.sasl.SaslException: GSS initiate failed [Caused by > GSSException: No valid credentials provided (Mechanism level: Failed to find > any Kerberos tgt)]; Host Details : local host is: "HDD021/192.168.137.6"; > destination host is: "hnn025":8020;" -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (STORM-1643) Performance Fix: Optimize clojure lookups related to throttling and stats
[ https://issues.apache.org/jira/browse/STORM-1643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15204568#comment-15204568 ] Roshan Naik commented on STORM-1643: :key lookups in Clojure are expensive, and some keys like :storm-conf are looked up multiple times. Looking them up once and reusing the result improves performance. Will attach profiler screenshots showing the before and after. > Performance Fix: Optimize clojure lookups related to throttling and stats > - > > Key: STORM-1643 > URL: https://issues.apache.org/jira/browse/STORM-1643 > Project: Apache Storm > Issue Type: Bug > Reporter: Roshan Naik > Assignee: Roshan Naik > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
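The pattern behind the STORM-1643 fix, translated into Java purely for illustration (the map layout and the "topology.workers" key here are hypothetical stand-ins for the executor data and :storm-conf): hoist a repeated lookup out of the hot path and reuse the result.

```java
import java.util.HashMap;
import java.util.Map;

public class HoistLookup {
    // The pattern being optimized away: the same key fetched on every iteration.
    static long before(Map<String, Integer> conf, int iterations) {
        long acc = 0;
        for (int i = 0; i < iterations; i++) {
            acc += conf.get("topology.workers"); // repeated lookup in the hot path
        }
        return acc;
    }

    // The fix: look the value up once, then reuse the local.
    static long after(Map<String, Integer> conf, int iterations) {
        int workers = conf.get("topology.workers"); // single lookup
        long acc = 0;
        for (int i = 0; i < iterations; i++) {
            acc += workers;
        }
        return acc;
    }

    public static void main(String[] args) {
        Map<String, Integer> conf = new HashMap<>();
        conf.put("topology.workers", 4);
        // Same result either way; the hoisted version just does less work per call.
        System.out.println(before(conf, 1000) == after(conf, 1000)); // true
    }
}
```

In Clojure the cost per lookup is higher than a plain HashMap get (keyword lookup plus the surrounding dynamic call machinery), which is why this shows up in the profiler at all.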
[jira] [Commented] (STORM-1643) Performance Fix: Optimize clojure lookups related to throttling and stats
[ https://issues.apache.org/jira/browse/STORM-1643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15204576#comment-15204576 ] Roshan Naik commented on STORM-1643: [~revans2] This is for 1.x. W.r.t. 2.x, the fixes are small enough that we could optionally choose not to wait for the Java rewrite. > Performance Fix: Optimize clojure lookups related to throttling and stats > - > > Key: STORM-1643 > URL: https://issues.apache.org/jira/browse/STORM-1643 > Project: Apache Storm > Issue Type: Bug > Reporter: Roshan Naik > Assignee: Roshan Naik > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Issue Comment Deleted] (STORM-1643) Performance Fix: Optimize clojure lookups related to throttling and stats
[ https://issues.apache.org/jira/browse/STORM-1643?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Roshan Naik updated STORM-1643: --- Comment: was deleted (was: :key lookups in Clojure are expensive, and some keys like :storm-conf are looked up multiple times. Looking them up once and reusing the result improves performance. Will attach profiler screenshots showing the before and after.) > Performance Fix: Optimize clojure lookups related to throttling and stats > - > > Key: STORM-1643 > URL: https://issues.apache.org/jira/browse/STORM-1643 > Project: Apache Storm > Issue Type: Bug > Reporter: Roshan Naik > Assignee: Roshan Naik > Attachments: after.png, before.png -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (STORM-1643) Performance Fix: Optimize clojure lookups related to throttling and stats
[ https://issues.apache.org/jira/browse/STORM-1643?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Roshan Naik updated STORM-1643: --- Attachment: after.png before.png > Performance Fix: Optimize clojure lookups related to throttling and stats > - > > Key: STORM-1643 > URL: https://issues.apache.org/jira/browse/STORM-1643 > Project: Apache Storm > Issue Type: Bug > Reporter: Roshan Naik > Assignee: Roshan Naik > Attachments: after.png, before.png > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (STORM-1643) Performance Fix: Optimize clojure lookups related to throttling and stats
[ https://issues.apache.org/jira/browse/STORM-1643?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Roshan Naik updated STORM-1643: --- Description: :key lookups in Clojure are expensive, and some keys like :storm-conf are looked up multiple times. Looking them up once and reusing the result improves performance. Will attach profiler screenshots showing the before and after. > Performance Fix: Optimize clojure lookups related to throttling and stats > - > > Key: STORM-1643 > URL: https://issues.apache.org/jira/browse/STORM-1643 > Project: Apache Storm > Issue Type: Bug > Reporter: Roshan Naik > Assignee: Roshan Naik > Attachments: after.png, before.png -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (STORM-1643) Performance Fix: Optimize clojure lookups related to throttling and stats
[ https://issues.apache.org/jira/browse/STORM-1643?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Roshan Naik updated STORM-1643: --- Attachment: before2.png after2.png Attaching another pair of before2/after2 profiler screenshots highlighting another node in the call tree where the perf difference is observed > Performance Fix: Optimize clojure lookups related to throttling and stats > - > > Key: STORM-1643 > URL: https://issues.apache.org/jira/browse/STORM-1643 > Project: Apache Storm > Issue Type: Bug > Reporter: Roshan Naik > Assignee: Roshan Naik > Attachments: after.png, after2.png, before.png, before2.png -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (STORM-1632) Disable event logging by default
[ https://issues.apache.org/jira/browse/STORM-1632?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Roshan Naik updated STORM-1632: --- Description: Event logging has a performance penalty. For a simple speed-of-light topology with a single instance of a spout and a bolt, disabling event logging delivers a 7% to 9% perf improvement (with acker count = 1). Event logging can be enabled when there is a need to debug, but should be turned off by default. **Update:** with acker=0 the observed impact was much higher... **30%** faster when event loggers = 0 was: Event logging has a performance penalty. For a simple speed-of-light topology with a single instance of a spout and a bolt, disabling event logging delivers a 7% to 9% perf improvement (with acker count = 1). Event logging can be enabled when there is a need to debug, but should be turned off by default. **Update:** with acker=0 the observed impact was much higher... **30%** faster with event loggers = 0 > Disable event logging by default > > > Key: STORM-1632 > URL: https://issues.apache.org/jira/browse/STORM-1632 > Project: Apache Storm > Issue Type: Bug > Components: storm-core > Reporter: Roshan Naik > Assignee: Roshan Naik > Priority: Blocker > Fix For: 1.0.0 > > > Event logging has a performance penalty. For a simple speed-of-light topology > with a single instance of a spout and a bolt, disabling event logging > delivers a 7% to 9% perf improvement (with acker count = 1). > Event logging can be enabled when there is a need to debug, but should be turned off > by default. > **Update:** with acker=0 the observed impact was much higher... **30%** > faster when event loggers = 0 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (STORM-1632) Disable event logging by default
[ https://issues.apache.org/jira/browse/STORM-1632?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Roshan Naik updated STORM-1632: --- Description: Event logging has a performance penalty. For a simple speed-of-light topology with a single instance of a spout and a bolt, disabling event logging delivers a 7% to 9% perf improvement (with acker count = 1). Event logging can be enabled when there is a need to debug, but should be turned off by default. **Update:** with acker=0 the observed impact was much higher... **30%** faster with event loggers = 0 was: Event logging has a performance penalty. For a simple speed-of-light topology with a single instance of a spout and a bolt, disabling event logging delivers a 7% to 9% perf improvement (with acker count = 1). Event logging can be enabled when there is a need to debug, but should be turned off by default. > Disable event logging by default > > > Key: STORM-1632 > URL: https://issues.apache.org/jira/browse/STORM-1632 > Project: Apache Storm > Issue Type: Bug > Components: storm-core > Reporter: Roshan Naik > Assignee: Roshan Naik > Priority: Blocker > Fix For: 1.0.0 > > > Event logging has a performance penalty. For a simple speed-of-light topology > with a single instance of a spout and a bolt, disabling event logging > delivers a 7% to 9% perf improvement (with acker count = 1). > Event logging can be enabled when there is a need to debug, but should be turned off > by default. > **Update:** with acker=0 the observed impact was much higher... **30%** > faster with event loggers = 0 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (STORM-1632) Disable event logging by default
[ https://issues.apache.org/jira/browse/STORM-1632?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Roshan Naik updated STORM-1632: --- Description: Event logging has a performance penalty. For a simple speed-of-light topology with a single instance of a spout and a bolt, disabling event logging delivers a 7% to 9% perf improvement (with acker count = 1). Event logging can be enabled when there is a need to debug, but should be turned off by default. **Update:** with acker=0 the observed impact was much higher... **25%** faster when event loggers = 0 was: Event logging has a performance penalty. For a simple speed-of-light topology with a single instance of a spout and a bolt, disabling event logging delivers a 7% to 9% perf improvement (with acker count = 1). Event logging can be enabled when there is a need to debug, but should be turned off by default. **Update:** with acker=0 the observed impact was much higher... **30%** faster when event loggers = 0 > Disable event logging by default > > > Key: STORM-1632 > URL: https://issues.apache.org/jira/browse/STORM-1632 > Project: Apache Storm > Issue Type: Bug > Components: storm-core > Reporter: Roshan Naik > Assignee: Roshan Naik > Priority: Blocker > Fix For: 1.0.0 > > > Event logging has a performance penalty. For a simple speed-of-light topology > with a single instance of a spout and a bolt, disabling event logging > delivers a 7% to 9% perf improvement (with acker count = 1). > Event logging can be enabled when there is a need to debug, but should be turned off > by default. > **Update:** with acker=0 the observed impact was much higher... **25%** > faster when event loggers = 0 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (STORM-1632) Disable event logging by default
[ https://issues.apache.org/jira/browse/STORM-1632?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Roshan Naik updated STORM-1632: --- Attachment: BasicTopology.java Uploading topology code to validate the perf hit. To use it: 1) Copy the Java file into examples/storm-starter 2) Rebuild the storm-starter package using mvn. 3) Run the topology as follows: storm jar /Users/rnaik/Projects/idea/storm/examples/storm-starter/target/storm-starter-1.0.0-SNAPSHOT.jar -c topology.eventlogger.executors=0 -c topology.max.spout.pending=2000 -c topology.disruptor.batch.size=1 storm.starter.BasicTopology and then again with {{topology.eventlogger.executors=1}}. I set those two additional flags as they improved performance over the defaults for this topology. I normally let it run for about 11 min and then capture the 10-min window metrics from the UI page. > Disable event logging by default > > > Key: STORM-1632 > URL: https://issues.apache.org/jira/browse/STORM-1632 > Project: Apache Storm > Issue Type: Bug > Components: storm-core > Reporter: Roshan Naik > Assignee: Roshan Naik > Priority: Blocker > Fix For: 1.0.0 > > Attachments: BasicTopology.java > > > Event logging has a performance penalty. For a simple speed-of-light topology > with a single instance of a spout and a bolt, disabling event logging > delivers a 7% to 9% perf improvement (with acker count = 1). > Event logging can be enabled when there is a need to debug, but should be turned off > by default. > **Update:** with acker=0 the observed impact was much higher... **25%** > faster when event loggers = 0 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (STORM-1772) Create topologies for measuring performance
[ https://issues.apache.org/jira/browse/STORM-1772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15279190#comment-15279190 ] Roshan Naik commented on STORM-1772: Hi [~mauzhang], Yes thats it. I first observed that perf difference issue when working on STORM-1632, but was not able to get to the bottom of it. The storm native topology mentioned here : https://github.com/apache/storm/pull/1217#issuecomment-201074919 I can try to locate the benchmark-specific version of the topology but its a straightforward rewrite. The storm native showed a difference of ~12% when doing a A/B test (with and without the fix) The benchmark specific version of the topology .. it was 25% as noted in the description of STORM-1632. IMO.. briefly ignoring the perf diff issue, it would be good to go ahead and see what we can incorporate from that benchmark . In this jira my goal is to add a few topologies for perf testing... not to create a benchmarking tool/framework itself. In that sense its not conflicting with STORM-642. *side note:* If we are adding a benchmarking framework, it would be good if it can run standard Storm topologies directly and not require topologies to be written specifically for it. > Create topologies for measuring performance > --- > > Key: STORM-1772 > URL: https://issues.apache.org/jira/browse/STORM-1772 > Project: Apache Storm > Issue Type: Bug > Reporter: Roshan Naik > Assignee: Roshan Naik > > Would be very useful to have some simple reference topologies included with > Storm that can be used to measure performance both by devs during development > (to start with) and perhaps also on a real storm cluster (subsequently). > To start with, the goal is to put the focus on the performance > characteristics of individual building blocks such as specifics bolts, > spouts, grouping options, queues, etc. So, initially biased towards > micro-benchmarking but subsequently we could add higher level ones too. 
> Although there is a storm benchmarking tool (originally written by Intel?) > that can be used, and I have personally used it, it's better for this to be > integrated into Storm proper and also maintained by devs as Storm evolves. > On a side note, in some instances I have noticed (to my surprise) that the > perf numbers change when the topologies written for the Intel benchmark are > rewritten without the required wrappers so that they run directly under > Storm. > Have a few topologies in mind for measuring each of these: > # *Queuing and Spout Emit Performance:* A topology with a Generator Spout but > no bolts. > # *Queuing & Grouping performance:* Generator Spout -> A grouping method -> > DevNull Bolt > # *Hdfs Bolt:* Generator Spout -> Hdfs Bolt > # *Hdfs Spout:* Hdfs Spout -> DevNull Bolt > # *Kafka Spout:* Kafka Spout -> DevNull Bolt > # *Simple Data Movement:* Kafka Spout -> Hdfs Bolt > Shall add these for Storm core first. Then we can have the same for Trident > also. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
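The first two proposed topologies boil down to pushing generated tuples through a queue into a sink that discards them. A rough sketch of the shape of such a micro-benchmark, using plain-Java stand-ins (the producer/consumer classes here are toy analogues, not Storm's actual spout/bolt classes):

```java
import java.util.concurrent.ArrayBlockingQueue;

// Toy analogue of "Generator Spout -> DevNull Bolt": a producer thread
// emits tuples into a bounded queue and the consumer drains and discards
// them, so only the emit/queue/drain path is exercised.
public class QueueThroughputDemo {

    // Pushes n generated tuples through the queue; returns the count drained.
    static int run(int n) throws InterruptedException {
        ArrayBlockingQueue<Integer> queue = new ArrayBlockingQueue<>(1024);
        Thread spout = new Thread(() -> {
            try {
                for (int i = 0; i < n; i++) queue.put(i); // "emit" a tuple
            } catch (InterruptedException ignored) { }
        });
        spout.start();
        int drained = 0;
        while (drained < n) { queue.take(); drained++; } // "DevNull": discard
        spout.join();
        return drained;
    }

    public static void main(String[] args) throws InterruptedException {
        int n = 1_000_000;
        long t0 = System.nanoTime();
        int drained = run(n);
        double secs = (System.nanoTime() - t0) / 1e9;
        System.out.printf("%d tuples, %.0f tuples/sec%n", drained, drained / secs);
    }
}
```

Storm's real disruptor-based queues, ackers and serialization would change the absolute numbers considerably; the point is only that topologies #1/#2 isolate exactly this producer/queue/consumer path.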
[jira] [Assigned] (STORM-1772) Create topologies for measuring performance
[ https://issues.apache.org/jira/browse/STORM-1772?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Roshan Naik reassigned STORM-1772: -- Assignee: Roshan Naik > Create topologies for measuring performance > --- > > Key: STORM-1772 > URL: https://issues.apache.org/jira/browse/STORM-1772 > Project: Apache Storm > Issue Type: Bug > Reporter: Roshan Naik > Assignee: Roshan Naik > > Would be very useful to have some simple reference topologies included with > Storm that can be used to measure performance both by devs during development > (to start with) and perhaps also on a real storm cluster (subsequently). > To start with, the goal is to put the focus on the performance > characteristics of individual building blocks such as specific bolts, > spouts, grouping options, queues, etc. So, initially biased towards > micro-benchmarking but subsequently we could add higher-level ones too. > Although there is a storm benchmarking tool (originally written by Intel?) > that can be used, and I have personally used it, it's better for this to be > integrated into Storm proper and also maintained by devs as Storm evolves. > On a side note, in some instances I have noticed (to my surprise) that the > perf numbers change when the topologies written for the Intel benchmark are > rewritten without the required wrappers so that they run directly under > Storm. > Have a few topologies in mind for measuring each of these: > # *Queuing and Spout Emit Performance:* A topology with a Generator Spout but > no bolts. > # *Queuing & Grouping performance:* Generator Spout -> A grouping method -> > DevNull Bolt > # *Hdfs Bolt:* Generator Spout -> Hdfs Bolt > # *Hdfs Spout:* Hdfs Spout -> DevNull Bolt > # *Kafka Spout:* Kafka Spout -> DevNull Bolt > # *Simple Data Movement:* Kafka Spout -> Hdfs Bolt > Shall add these for Storm core first. Then we can have the same for Trident > also. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (STORM-1772) Create topologies for measuring performance
[ https://issues.apache.org/jira/browse/STORM-1772?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Roshan Naik updated STORM-1772: --- Description: Would be very useful to have some simple reference topologies included with Storm that can be used to measure performance both by devs during development (to start with) and perhaps also on a real storm cluster (subsequently). To start with, the goal is to put the focus on the performance characteristics of individual building blocks such as specific bolts, spouts, grouping options, queues, etc. So, initially biased towards micro-benchmarking but subsequently we could add higher-level ones too. Although there is a storm benchmarking tool (originally written by Intel?) that can be used, and I have personally used it, it's better for this to be integrated into Storm proper and also maintained by devs as Storm evolves. On a side note, in some instances I have noticed (to my surprise) that the perf numbers change when the topologies written for the Intel benchmark are rewritten without the required wrappers so that they run directly under Storm. Have a few topologies in mind for measuring each of these: # *Queuing and Spout Emit Performance:* A topology with a Generator Spout but no bolts. # *Queuing & Grouping performance:* Generator Spout -> A grouping method -> DevNull Bolt # *Hdfs Bolt:* Generator Spout -> Hdfs Bolt # *Hdfs Spout:* Hdfs Spout -> DevNull Bolt # *Kafka Spout:* Kafka Spout -> DevNull Bolt # *Simple Data Movement:* Kafka Spout -> Hdfs Bolt Shall add these for Storm core first. Then we can have the same for Trident also. was: Would be very useful to have some simple reference topologies included with Storm that can be used to measure performance that can be used both by devs during development (to start with) and perhaps also on a real storm cluster (subsequently).
To start with, the goal is to put the focus on the performance characteristics of individual building blocks such as specific bolts, spouts, grouping options, queues, etc. So, initially biased towards micro-benchmarking but subsequently we could add higher-level ones too. Although there is a storm benchmarking tool (originally written by Intel?) that can be used, and I have personally used it, it's better for this to be integrated into Storm proper and also maintained by devs as Storm evolves. On a side note, in some instances I have noticed (to my surprise) that the perf numbers change when the topologies written for the Intel benchmark are rewritten without the required wrappers so that they run directly under Storm. Have a few topologies in mind for measuring each of these: # *Queuing and Spout Emit Performance:* A topology with a Generator Spout but no bolts. # *Queuing & Grouping performance:* Generator Spout -> A grouping method -> DevNull Bolt # *Hdfs Bolt:* Generator Spout -> Hdfs Bolt # *Hdfs Spout:* Hdfs Spout -> DevNull Bolt # *Kafka Spout:* Kafka Spout -> DevNull Bolt # *Simple Data Movement:* Kafka Spout -> Hdfs Bolt Shall add these for Storm core first. Then we can have the same for Trident also. > Create topologies for measuring performance > --- > > Key: STORM-1772 > URL: https://issues.apache.org/jira/browse/STORM-1772 > Project: Apache Storm > Issue Type: Bug > Reporter: Roshan Naik > > Would be very useful to have some simple reference topologies included with > Storm that can be used to measure performance both by devs during development > (to start with) and perhaps also on a real storm cluster (subsequently). > To start with, the goal is to put the focus on the performance > characteristics of individual building blocks such as specific bolts, > spouts, grouping options, queues, etc. So, initially biased towards > micro-benchmarking but subsequently we could add higher-level ones too.
> Although there is a storm benchmarking tool (originally written by Intel?) > that can be used, and I have personally used it, it's better for this to be > integrated into Storm proper and also maintained by devs as Storm evolves. > On a side note, in some instances I have noticed (to my surprise) that the > perf numbers change when the topologies written for the Intel benchmark are > rewritten without the required wrappers so that they run directly under > Storm. > Have a few topologies in mind for measuring each of these: > # *Queuing and Spout Emit Performance:* A topology with a Generator Spout but > no bolts. > # *Queuing & Grouping performance:* Generator Spout -> A grouping method -> > DevNull Bolt > # *Hdfs Bolt:* Generator Spout -> Hdfs Bolt > # *Hdfs Spout:* Hdfs Spout -> DevNu
[jira] [Created] (STORM-1772) Create topologies for measuring performance
Roshan Naik created STORM-1772: -- Summary: Create topologies for measuring performance Key: STORM-1772 URL: https://issues.apache.org/jira/browse/STORM-1772 Project: Apache Storm Issue Type: Bug Reporter: Roshan Naik Would be very useful to have some simple reference topologies included with Storm that can be used to measure performance, both by devs during development (to start with) and perhaps also on a real storm cluster (subsequently). To start with, the goal is to put the focus on the performance characteristics of individual building blocks such as specific bolts, spouts, grouping options, queues, etc. So, initially biased towards micro-benchmarking but subsequently we could add higher-level ones too. Although there is a storm benchmarking tool (originally written by Intel?) that can be used, and I have personally used it, it's better for this to be integrated into Storm proper and also maintained by devs as Storm evolves. On a side note, in some instances I have noticed (to my surprise) that the perf numbers change when the topologies written for the Intel benchmark are rewritten without the required wrappers so that they run directly under Storm. Have a few topologies in mind for measuring each of these: # *Queuing and Spout Emit Performance:* A topology with a Generator Spout but no bolts. # *Queuing & Grouping performance:* Generator Spout -> A grouping method -> DevNull Bolt # *Hdfs Bolt:* Generator Spout -> Hdfs Bolt # *Hdfs Spout:* Hdfs Spout -> DevNull Bolt # *Kafka Spout:* Kafka Spout -> DevNull Bolt # *Simple Data Movement:* Kafka Spout -> Hdfs Bolt Shall add these for Storm core first. Then we can have the same for Trident also. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (STORM-1910) One topology can't use hdfs spout to read from two locations
[ https://issues.apache.org/jira/browse/STORM-1910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15378435#comment-15378435 ] Roshan Naik commented on STORM-1910: [~ptgoetz] should this be marked for 1.0.2 as well? > One topology can't use hdfs spout to read from two locations > > > Key: STORM-1910 > URL: https://issues.apache.org/jira/browse/STORM-1910 > Project: Apache Storm > Issue Type: Bug > Components: storm-hdfs > Affects Versions: 1.0.1 > Reporter: Raghav Kumar Gautam > Assignee: Roshan Naik > Fix For: 2.0.0, 1.1.0 > > > The hdfs uri is passed using config: > {code} > conf.put(Configs.HDFS_URI, hdfsUri); > {code} > I see two problems with this approach: > 1. If someone wants to use two hdfsUri in the same or different spouts - then > that does not seem feasible. > https://github.com/apache/storm/blob/d17b3b9c3cbc89d854bfb436d213d11cfd4545ec/examples/storm-starter/src/jvm/storm/starter/HdfsSpoutTopology.java#L117-L117 > https://github.com/apache/storm/blob/d17b3b9c3cbc89d854bfb436d213d11cfd4545ec/external/storm-hdfs/src/main/java/org/apache/storm/hdfs/spout/HdfsSpout.java#L331-L331 > {code} > if ( !conf.containsKey(Configs.SOURCE_DIR) ) { > LOG.error(Configs.SOURCE_DIR + " setting is required"); > throw new RuntimeException(Configs.SOURCE_DIR + " setting is required"); > } > this.sourceDirPath = new Path( conf.get(Configs.SOURCE_DIR).toString() ); > {code} > 2. It does not fail fast, i.e. at the time of topology submission. We can fail > fast if the hdfs path is invalid or credentials/permissions are not ok. Such > errors at this time can only be detected at runtime by looking at the worker > logs. > https://github.com/apache/storm/blob/d17b3b9c3cbc89d854bfb436d213d11cfd4545ec/external/storm-hdfs/src/main/java/org/apache/storm/hdfs/spout/HdfsSpout.java#L297-L297 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
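The root of problem #1 above is that HDFS_URI lives in the topology-level Config map, which every spout instance shares. One conventional direction for a fix, sketched here with purely illustrative names (this is NOT the actual storm-hdfs API), is to move such settings onto the spout instance itself so each spout carries its own URI and source directory:

```java
// Hypothetical sketch only: per-instance builder-style setters instead of a
// shared topology-level config map, so two spout instances in one topology
// can point at two different HDFS locations.
public class HdfsSpoutSketch {
    private String hdfsUri;
    private String sourceDir;

    public HdfsSpoutSketch withUri(String uri) { this.hdfsUri = uri; return this; }
    public HdfsSpoutSketch withSourceDir(String dir) { this.sourceDir = dir; return this; }

    public String getUri() { return hdfsUri; }
    public String getSourceDir() { return sourceDir; }

    public static void main(String[] args) {
        // Two spouts for one topology, each with its own URI and directory;
        // impossible when both read the same Configs.HDFS_URI key.
        HdfsSpoutSketch spoutA = new HdfsSpoutSketch()
                .withUri("hdfs://nn1:8020").withSourceDir("/data/in1");
        HdfsSpoutSketch spoutB = new HdfsSpoutSketch()
                .withUri("hdfs://nn2:8020").withSourceDir("/data/in2");
        System.out.println(spoutA.getUri() + " / " + spoutB.getUri());
    }
}
```

Per-instance setters also allow validating the settings at construction time on the client, which partially addresses the fail-fast concern in problem #2 (though, as noted in a later comment, the HDFS path itself may not be checkable from the submitting host).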
[jira] [Created] (STORM-1949) Storm backpressure can cause spout to stop emitting and stall topology
Roshan Naik created STORM-1949: -- Summary: Storm backpressure can cause spout to stop emitting and stall topology Key: STORM-1949 URL: https://issues.apache.org/jira/browse/STORM-1949 Project: Apache Storm Issue Type: Bug Reporter: Roshan Naik Problem can be reproduced by this [Word count topology|https://github.com/hortonworks/storm/blob/perftopos1.x/examples/storm-starter/src/jvm/org/apache/storm/starter/perf/FileReadWordCountTopo.java] within an IDE. I ran it with 1 spout instance, 2 splitter bolt instances, 2 counter bolt instances. The problem is more easily reproduced with the WC topology as it causes an explosion of tuples due to splitting a sentence tuple into word tuples. As the bolts have to process more tuples than the spout is producing, the spout needs to operate slower. The amount of time it takes for the topology to stall can vary, but is typically under 10 mins. *My theory:* I suspect there is a race condition in the way ZK is being utilized to enable/disable back pressure. When congested (i.e. pressure exceeds the high water mark), the bolt's worker records this congested situation in ZK by creating a node. Once the congestion is reduced below the low water mark, it deletes this node. The spout's worker has set up a watch on the parent node, expecting a callback whenever there is a change in the child nodes. On receiving the callback, the spout's worker lists the parent node to check if there are 0 or more child nodes; it is essentially trying to figure out the nature of the state change in ZK to determine whether to throttle or not. Subsequently it sets up another watch in ZK to keep an eye on future changes. When there are multiple bolts, there can be rapid creation/deletion of these ZK nodes. Between the time the worker receives a callback and sets up the next watch, many changes may have occurred in ZK, which will go unnoticed by the spout. The condition that the bolts are no longer congested may not get noticed as a result.
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
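The theory above hinges on ZK watches being one-shot: a watch fires once and must be re-registered, so node changes that land between the callback and the re-registration are never delivered. A minimal single-threaded model of that gap (no ZooKeeper involved, just the notification semantics):

```java
// Models ZooKeeper's one-shot watch semantics: each change notifies the
// observer only if a watch is currently armed, and firing consumes the watch.
// Changes made before the observer re-arms are silently lost.
public class OneShotWatchDemo {
    private boolean watchArmed = true;
    private int observedChanges = 0;

    // A bolt worker creating/deleting its throttle node.
    void change() {
        if (watchArmed) {
            watchArmed = false;   // one-shot: the watch is consumed
            observedChanges++;    // the spout worker's callback runs
        }
        // no armed watch -> the change goes unnoticed
    }

    // The spout worker setting up its next watch after handling a callback.
    void rearm() { watchArmed = true; }

    // Four changes with one late re-arm: only two are ever observed.
    static int simulate() {
        OneShotWatchDemo zk = new OneShotWatchDemo();
        zk.change();  // observed (watch armed)
        zk.change();  // lost: happens before the spout re-arms
        zk.change();  // lost
        zk.rearm();
        zk.change();  // observed
        return zk.observedChanges;
    }

    public static void main(String[] args) {
        System.out.println(simulate() + " of 4 changes observed");
    }
}
```

If one of the lost changes is the deletion of the last remaining throttle node, the spout never learns that congestion has cleared, which matches the permanent stall described in the report.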
[jira] [Updated] (STORM-1949) Backpressure can cause spout to stop emitting and stall topology
[ https://issues.apache.org/jira/browse/STORM-1949?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Roshan Naik updated STORM-1949: --- Summary: Backpressure can cause spout to stop emitting and stall topology (was: Storm backpressure can cause spout to stop emitting and stall topology) > Backpressure can cause spout to stop emitting and stall topology > > > Key: STORM-1949 > URL: https://issues.apache.org/jira/browse/STORM-1949 > Project: Apache Storm > Issue Type: Bug > Reporter: Roshan Naik > > Problem can be reproduced by this [Word count > topology|https://github.com/hortonworks/storm/blob/perftopos1.x/examples/storm-starter/src/jvm/org/apache/storm/starter/perf/FileReadWordCountTopo.java] > within an IDE. > I ran it with 1 spout instance, 2 splitter bolt instances, 2 counter bolt > instances. > The problem is more easily reproduced with the WC topology as it causes an > explosion of tuples due to splitting a sentence tuple into word tuples. As > the bolts have to process more tuples than the spout is producing, the spout > needs to operate slower. > The amount of time it takes for the topology to stall can vary, but is > typically under 10 mins. > *My theory:* I suspect there is a race condition in the way ZK is being > utilized to enable/disable back pressure. When congested (i.e. pressure > exceeds the high water mark), the bolt's worker records this congested situation > in ZK by creating a node. Once the congestion is reduced below the low water > mark, it deletes this node. > The spout's worker has set up a watch on the parent node, expecting a callback > whenever there is a change in the child nodes. On receiving the callback, the > spout's worker lists the parent node to check if there are 0 or more child > nodes; it is essentially trying to figure out the nature of the state change > in ZK to determine whether to throttle or not. Subsequently it sets up > another watch in ZK to keep an eye on future changes.
> When there are multiple bolts, there can be rapid creation/deletion of these > ZK nodes. Between the time the worker receives a callback and sets up the > next watch, many changes may have occurred in ZK, which will go unnoticed by > the spout. > The condition that the bolts are no longer congested may not get noticed as a > result. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (STORM-1956) Disable Backpressure by default
Roshan Naik created STORM-1956: -- Summary: Disable Backpressure by default Key: STORM-1956 URL: https://issues.apache.org/jira/browse/STORM-1956 Project: Apache Storm Issue Type: Bug Components: storm-core Affects Versions: 1.0.0, 1.0.1 Reporter: Roshan Naik Assignee: Roshan Naik Some of the context on this is captured in STORM-1949. In short: wait for the BP mechanism to mature some more and be production-ready before we enable it by default. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (STORM-1949) Backpressure can cause spout to stop emitting and stall topology
[ https://issues.apache.org/jira/browse/STORM-1949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15368451#comment-15368451 ] Roshan Naik edited comment on STORM-1949 at 7/8/16 8:53 PM: [~revans2] Not sure what you mean by "back on the write" .. are you saying we should have a background thread that simply polls ZK every so often? That might fix this issue. However, there is one basic issue with this BP mechanism in general: it can put too much load on ZK. For each enable/disable throttle signal raised by any worker we have all this interaction going on with ZK: - Some worker adds/deletes a ZK node - ZK issues callbacks to all workers with watches set up - All those workers list the parent node in ZK to count the number of children (expensive?) - All those workers set up another watch in ZK Given that PaceMaker was introduced to take load off of ZK, this approach feels like a regression in terms of ability to scale. There are some other issues as well, but that's for later. After reviewing BP, I don't feel it is sufficiently mature to be considered stable and ready for production. IMO, until we have a more solid BP mechanism, we should disable it by default as soon as possible. I can open another jira for that. was (Author: roshan_naik): [~revans2] Not sure what you mean by "back on the write" .. are you saying we should have a background thread that simply polls ZK every so often? That might fix this issue. However, there is one basic issue with this BP mechanism in general: it can put too much load on ZK. For each enable/disable throttle signal raised by any worker we have all this interaction going on with ZK: - Some worker adds/deletes a ZK node - ZK issues callbacks to all workers with watches set up - All those workers list the parent node in ZK to count the number of children (expensive?) - All those workers set up another watch in ZK Given that PaceMaker was introduced to take load off of ZK, this approach feels like a regression.
There are some other issues as well, but that's for later. After reviewing BP, I feel it is not mature enough to be considered stable and ready for production. IMO, until we have a more solid BP mechanism, we should disable it by default as soon as possible. I can open another jira for that. > Backpressure can cause spout to stop emitting and stall topology > > > Key: STORM-1949 > URL: https://issues.apache.org/jira/browse/STORM-1949 > Project: Apache Storm > Issue Type: Bug > Reporter: Roshan Naik > > Problem can be reproduced by this [Word count > topology|https://github.com/hortonworks/storm/blob/perftopos1.x/examples/storm-starter/src/jvm/org/apache/storm/starter/perf/FileReadWordCountTopo.java] > within an IDE. > I ran it with 1 spout instance, 2 splitter bolt instances, 2 counter bolt > instances. > The problem is more easily reproduced with the WC topology as it causes an > explosion of tuples due to splitting a sentence tuple into word tuples. As > the bolts have to process more tuples than the spout is producing, the spout > needs to operate slower. > The amount of time it takes for the topology to stall can vary, but is > typically under 10 mins. > *My theory:* I suspect there is a race condition in the way ZK is being > utilized to enable/disable back pressure. When congested (i.e. pressure > exceeds the high water mark), the bolt's worker records this congested situation > in ZK by creating a node. Once the congestion is reduced below the low water > mark, it deletes this node. > The spout's worker has set up a watch on the parent node, expecting a callback > whenever there is a change in the child nodes. On receiving the callback, the > spout's worker lists the parent node to check if there are 0 or more child > nodes; it is essentially trying to figure out the nature of the state change > in ZK to determine whether to throttle or not. Subsequently it sets up > another watch in ZK to keep an eye on future changes.
> When there are multiple bolts, there can be rapid creation/deletion of these > ZK nodes. Between the time the worker receives a callback and sets up the > next watch, many changes may have occurred in ZK, which will go unnoticed by > the spout. > The condition that the bolts are no longer congested may not get noticed as a > result. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (STORM-1949) Backpressure can cause spout to stop emitting and stall topology
[ https://issues.apache.org/jira/browse/STORM-1949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15368451#comment-15368451 ] Roshan Naik edited comment on STORM-1949 at 7/8/16 8:50 PM: [~revans2] Not sure what you mean by "back on the write" .. are you saying we should have a background thread that simply polls ZK every so often? That might fix this issue. However, there is one basic issue with this BP mechanism in general: it can put too much load on ZK. For each enable/disable throttle signal raised by any worker we have all this interaction going on with ZK: - Some worker adds/deletes a ZK node - ZK issues callbacks to all workers with watches set up - All those workers list the parent node in ZK to count the number of children (expensive?) - All those workers set up another watch in ZK Given that PaceMaker was introduced to take load off of ZK, this approach feels like a regression. There are some other issues as well, but that's for later. After reviewing BP, I feel it is not mature enough to be considered stable and ready for production. IMO, until we have a more solid BP mechanism, we should disable it by default as soon as possible. I can open another jira for that. was (Author: roshan_naik): [~revans2] Not sure what you mean by "back on the write" .. are you saying we should have a background thread that simply polls ZK every so often? That might fix this issue. However, there is one basic issue with this BP mechanism in general: it can put too much load on ZK. For each enable/disable throttle signal raised by any worker we have all this interaction going on with ZK: - Some worker adds/deletes a ZK node - ZK issues callbacks to all workers with watches set up - All those workers list the parent node in ZK to count the number of children (expensive?) - All those workers set up another watch in ZK Given that PaceMaker was introduced to take load off of ZK, this approach feels like a regression.
There are some other issues as well, but that's for a different JIRA. After reviewing BP, I feel it is not mature enough to be considered stable and ready for production. IMO, until we have a more solid BP mechanism, we should disable it by default as soon as possible. I can open another jira for that. > Backpressure can cause spout to stop emitting and stall topology > > > Key: STORM-1949 > URL: https://issues.apache.org/jira/browse/STORM-1949 > Project: Apache Storm > Issue Type: Bug > Reporter: Roshan Naik > > Problem can be reproduced by this [Word count > topology|https://github.com/hortonworks/storm/blob/perftopos1.x/examples/storm-starter/src/jvm/org/apache/storm/starter/perf/FileReadWordCountTopo.java] > within an IDE. > I ran it with 1 spout instance, 2 splitter bolt instances, 2 counter bolt > instances. > The problem is more easily reproduced with the WC topology as it causes an > explosion of tuples due to splitting a sentence tuple into word tuples. As > the bolts have to process more tuples than the spout is producing, the spout > needs to operate slower. > The amount of time it takes for the topology to stall can vary, but is > typically under 10 mins. > *My theory:* I suspect there is a race condition in the way ZK is being > utilized to enable/disable back pressure. When congested (i.e. pressure > exceeds the high water mark), the bolt's worker records this congested situation > in ZK by creating a node. Once the congestion is reduced below the low water > mark, it deletes this node. > The spout's worker has set up a watch on the parent node, expecting a callback > whenever there is a change in the child nodes. On receiving the callback, the > spout's worker lists the parent node to check if there are 0 or more child > nodes; it is essentially trying to figure out the nature of the state change > in ZK to determine whether to throttle or not. Subsequently it sets up > another watch in ZK to keep an eye on future changes.
> When there are multiple bolts, there can be rapid creation/deletion of these > ZK nodes. Between the time the worker receives a callback and sets up the > next watch, many changes may have occurred in ZK, which will go unnoticed by > the spout. > The condition that the bolts are no longer congested may not get noticed as a > result. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (STORM-1949) Backpressure can cause spout to stop emitting and stall topology
[ https://issues.apache.org/jira/browse/STORM-1949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15368451#comment-15368451 ] Roshan Naik commented on STORM-1949: [~revans2] Not sure what you mean by "back on the write" .. are you saying we should have a background thread that simply polls ZK every so often? That might fix this issue. However, there is one basic issue with this BP mechanism in general: it can put too much load on ZK. For each enable/disable throttle signal raised by any worker we have all this interaction going on with ZK: - Some worker adds/deletes a ZK node - ZK issues callbacks to all workers with watches set up - All those workers list the parent node in ZK to count the number of children (expensive?) - All those workers set up another watch in ZK Given that PaceMaker was introduced to take load off of ZK, this approach feels like a regression. There are some other issues as well, but that's for a different JIRA. After reviewing BP, I feel it is not mature enough to be considered stable and ready for production. IMO, until we have a more solid BP mechanism, we should disable it by default as soon as possible. I can open another jira for that. > Backpressure can cause spout to stop emitting and stall topology > > > Key: STORM-1949 > URL: https://issues.apache.org/jira/browse/STORM-1949 > Project: Apache Storm > Issue Type: Bug > Reporter: Roshan Naik > > Problem can be reproduced by this [Word count > topology|https://github.com/hortonworks/storm/blob/perftopos1.x/examples/storm-starter/src/jvm/org/apache/storm/starter/perf/FileReadWordCountTopo.java] > within an IDE. > I ran it with 1 spout instance, 2 splitter bolt instances, 2 counter bolt > instances. > The problem is more easily reproduced with the WC topology as it causes an > explosion of tuples due to splitting a sentence tuple into word tuples. As > the bolts have to process more tuples than the spout is producing, the spout > needs to operate slower.
> The amount of time it takes for the topology to stall can vary, but is > typically under 10 mins. > *My theory:* I suspect there is a race condition in the way ZK is being > utilized to enable/disable back pressure. When congested (i.e. pressure > exceeds the high water mark), the bolt's worker records this congested situation > in ZK by creating a node. Once the congestion is reduced below the low water > mark, it deletes this node. > The spout's worker has set up a watch on the parent node, expecting a callback > whenever there is a change in the child nodes. On receiving the callback, the > spout's worker lists the parent node to check if there are 0 or more child > nodes; it is essentially trying to figure out the nature of the state change > in ZK to determine whether to throttle or not. Subsequently it sets up > another watch in ZK to keep an eye on future changes. > When there are multiple bolts, there can be rapid creation/deletion of these > ZK nodes. Between the time the worker receives a callback and sets up the > next watch, many changes may have occurred in ZK, which will go unnoticed by > the spout. > The condition that the bolts are no longer congested may not get noticed as a > result. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (STORM-1956) Disable Backpressure by default
[ https://issues.apache.org/jira/browse/STORM-1956?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Roshan Naik updated STORM-1956: --- Priority: Blocker (was: Major) > Disable Backpressure by default > --- > > Key: STORM-1956 > URL: https://issues.apache.org/jira/browse/STORM-1956 > Project: Apache Storm > Issue Type: Bug > Components: storm-core > Affects Versions: 1.0.0, 1.0.1 > Reporter: Roshan Naik > Assignee: Roshan Naik > Priority: Blocker > Fix For: 1.0.2 > > > Some of the context on this is captured in STORM-1949. > In short: wait for the BP mechanism to mature some more and be production-ready > before we enable it by default. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (STORM-1956) Disable Backpressure by default
[ https://issues.apache.org/jira/browse/STORM-1956?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Roshan Naik updated STORM-1956: --- Fix Version/s: 1.0.2 > Disable Backpressure by default > --- > > Key: STORM-1956 > URL: https://issues.apache.org/jira/browse/STORM-1956 > Project: Apache Storm > Issue Type: Bug > Components: storm-core > Affects Versions: 1.0.0, 1.0.1 > Reporter: Roshan Naik > Assignee: Roshan Naik > Priority: Blocker > Fix For: 1.0.2 > > > Some of the context on this is captured in STORM-1949. > In short: wait for the BP mechanism to mature some more and be production-ready > before we enable it by default. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (STORM-1949) Backpressure can cause spout to stop emitting and stall topology
[ https://issues.apache.org/jira/browse/STORM-1949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15368640#comment-15368640 ] Roshan Naik commented on STORM-1949:
Have not worked out a concrete solution for avoiding ZK as yet. But [~sriharsha]'s line of thinking is interesting ... basically, see if we can use the internal messaging system instead of messaging over ZK. Opened STORM-1956 for disabling BP by default.

> Backpressure can cause spout to stop emitting and stall topology
> ----------------------------------------------------------------
>
>          Key: STORM-1949
>          URL: https://issues.apache.org/jira/browse/STORM-1949
>      Project: Apache Storm
>   Issue Type: Bug
>     Reporter: Roshan Naik
>
>
> The problem can be reproduced with this [Word count topology|https://github.com/hortonworks/storm/blob/perftopos1.x/examples/storm-starter/src/jvm/org/apache/storm/starter/perf/FileReadWordCountTopo.java] within an IDE.
> I ran it with 1 spout instance, 2 splitter bolt instances, and 2 counter bolt instances.
> The problem is more easily reproduced with the WC topology because splitting each sentence tuple into word tuples causes an explosion of tuples. Since the bolts have to process more tuples than the spout is producing, the spout needs to operate slower.
> The amount of time it takes for the topology to stall can vary, but it is typically under 10 mins.
> *My theory:* I suspect there is a race condition in the way ZK is utilized to enable/disable back pressure. When congested (i.e. pressure exceeds the high water mark), the bolt's worker records this congested situation in ZK by creating a node. Once the congestion drops below the low water mark, it deletes this node.
> The spout's worker has set up a watch on the parent node, expecting a callback whenever the child nodes change. On receiving the callback, the spout's worker lists the parent node to check whether there are 0 or more child nodes; it is essentially trying to figure out the nature of the state change in ZK to determine whether to throttle or not. It then sets up another watch in ZK to keep an eye on future changes.
> When there are multiple bolts, these ZK nodes can be created and deleted rapidly. Between the time the worker receives a callback and sets up the next watch, many changes may have occurred in ZK that go unnoticed by the spout.
> As a result, the condition that the bolts are no longer congested may never get noticed.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
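The race described in the theory above hinges on ZK watches being one-shot. The following is a toy in-memory model (not real ZooKeeper, and not Storm's actual backpressure code) that demonstrates how a "congestion cleared" transition can fall into the gap between a watch firing and the next watch being registered, leaving the spout throttled forever:

```java
import java.util.ArrayList;
import java.util.List;

public class WatchGapSketch {
    interface Watcher { void onChange(); }

    // Stand-in for a ZK parent node whose children mark congested bolts.
    static class FakeZk {
        private final List<String> children = new ArrayList<>();
        private Watcher watch; // one-shot, like a real ZK watch

        void setWatch(Watcher w) { watch = w; }
        void create(String node) { children.add(node); fire(); }
        void delete(String node) { children.remove(node); fire(); }
        int childCount() { return children.size(); }

        private void fire() {
            if (watch != null) {
                Watcher w = watch;
                watch = null;      // a ZK watch is consumed once triggered
                w.onChange();
            }
        }
    }

    /** Returns the spout's final throttled flag; true means it is stuck. */
    static boolean runScenario() {
        FakeZk zk = new FakeZk();
        final boolean[] throttled = {false};

        // Spout side: the first watch notices "something changed" and throttles.
        zk.setWatch(() -> throttled[0] = true);

        zk.create("/backpressure/bolt-1"); // fires the watch -> spout throttles
        // Gap: the watch was consumed and the next one is not yet set.
        // These transitions, including "all congestion cleared", are missed:
        zk.create("/backpressure/bolt-2");
        zk.delete("/backpressure/bolt-1");
        zk.delete("/backpressure/bolt-2");

        // The spout re-registers its watch only now; with no further ZK
        // events it never re-reads the (empty) child list, so the stale
        // throttled flag is never cleared.
        zk.setWatch(() -> throttled[0] = zk.childCount() > 0);
        return throttled[0];
    }

    public static void main(String[] args) {
        System.out.println("spout stuck throttled: " + runScenario());
    }
}
```

With zero children remaining, `runScenario()` still returns `true`, which is the stall: no bolt is congested, yet the spout never resumes emitting.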
[jira] [Commented] (STORM-1910) One topology can't use hdfs spout to read from two locations
[ https://issues.apache.org/jira/browse/STORM-1910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15367096#comment-15367096 ] Roshan Naik commented on STORM-1910:
WRT Pt# 2 in the description, we cannot check for a valid HDFS path on the client side, as it cannot be assumed that HDFS is configured and available on the host from where the topology is being submitted.

> One topology can't use hdfs spout to read from two locations
> ------------------------------------------------------------
>
>              Key: STORM-1910
>              URL: https://issues.apache.org/jira/browse/STORM-1910
>          Project: Apache Storm
>       Issue Type: Bug
>       Components: storm-hdfs
> Affects Versions: 1.0.1
>         Reporter: Raghav Kumar Gautam
>         Assignee: Roshan Naik
>          Fix For: 1.1.0
>
>
> The hdfs uri is passed using config:
> {code}
> conf.put(Configs.HDFS_URI, hdfsUri);
> {code}
> I see two problems with this approach:
> 1. If someone wants to use two hdfsUris in the same or different spouts, that does not seem feasible.
> https://github.com/apache/storm/blob/d17b3b9c3cbc89d854bfb436d213d11cfd4545ec/examples/storm-starter/src/jvm/storm/starter/HdfsSpoutTopology.java#L117-L117
> https://github.com/apache/storm/blob/d17b3b9c3cbc89d854bfb436d213d11cfd4545ec/external/storm-hdfs/src/main/java/org/apache/storm/hdfs/spout/HdfsSpout.java#L331-L331
> {code}
> if ( !conf.containsKey(Configs.SOURCE_DIR) ) {
>   LOG.error(Configs.SOURCE_DIR + " setting is required");
>   throw new RuntimeException(Configs.SOURCE_DIR + " setting is required");
> }
> this.sourceDirPath = new Path( conf.get(Configs.SOURCE_DIR).toString() );
> {code}
> 2. It does not fail fast, i.e. at the time of topology submission. We can fail fast if the hdfs path is invalid or credentials/permissions are not ok. Such errors can currently only be detected at runtime by looking at the worker logs.
> https://github.com/apache/storm/blob/d17b3b9c3cbc89d854bfb436d213d11cfd4545ec/external/storm-hdfs/src/main/java/org/apache/storm/hdfs/spout/HdfsSpout.java#L297-L297

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
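The clash in point 1 comes from every `HdfsSpout` resolving `Configs.HDFS_URI` out of the single shared topology conf, so the last `put` wins. A standalone sketch of that collision, plus a hypothetical per-instance setter as one possible shape of a fix (the `withUri(...)` method and `SpoutStub` class here are illustrative, not the storm-hdfs API of this release):

```java
import java.util.HashMap;
import java.util.Map;

public class PerSpoutUriSketch {
    static final String HDFS_URI = "hdfs.uri"; // stands in for Configs.HDFS_URI

    // Current style: both spouts read one shared topology conf -> last put wins.
    static String sharedConfUri(String uriForSpoutA, String uriForSpoutB) {
        Map<String, Object> topoConf = new HashMap<>();
        topoConf.put(HDFS_URI, uriForSpoutA);
        topoConf.put(HDFS_URI, uriForSpoutB); // clobbers spout A's setting
        return (String) topoConf.get(HDFS_URI);
    }

    // Hypothetical per-spout style: each spout instance carries its own URI,
    // so two spouts in one topology can point at different clusters.
    static class SpoutStub {
        private String uri;
        SpoutStub withUri(String uri) { this.uri = uri; return this; }
        String uri() { return uri; }
    }

    public static void main(String[] args) {
        System.out.println("shared conf resolves to: "
                + sharedConfUri("hdfs://clusterA:8020", "hdfs://clusterB:8020"));
        SpoutStub a = new SpoutStub().withUri("hdfs://clusterA:8020");
        SpoutStub b = new SpoutStub().withUri("hdfs://clusterB:8020");
        System.out.println("per-spout: " + a.uri() + " and " + b.uri());
    }
}
```

With the shared conf, only `hdfs://clusterB:8020` survives; with per-instance state, both URIs coexist, which is the behavior the reporter is asking for.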
[jira] [Commented] (STORM-1949) Backpressure can cause spout to stop emitting and stall topology
[ https://issues.apache.org/jira/browse/STORM-1949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15423916#comment-15423916 ] Roshan Naik commented on STORM-1949:
With BP disabled the topo ran fine. Don't think I saw any NPE during my runs.

> Backpressure can cause spout to stop emitting and stall topology
> ----------------------------------------------------------------
>
>          Key: STORM-1949
>          URL: https://issues.apache.org/jira/browse/STORM-1949
>      Project: Apache Storm
>   Issue Type: Bug
>     Reporter: Roshan Naik
>  Attachments: 1.x-branch-works-perfect.png
>

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Updated] (STORM-1949) Backpressure can cause spout to stop emitting and stall topology
[ https://issues.apache.org/jira/browse/STORM-1949?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Roshan Naik updated STORM-1949:
---
    Attachment: wordcounttopo.zip

Attaching the wordcount topo that I used.

> Backpressure can cause spout to stop emitting and stall topology
> ----------------------------------------------------------------
>
>          Key: STORM-1949
>          URL: https://issues.apache.org/jira/browse/STORM-1949
>      Project: Apache Storm
>   Issue Type: Bug
>     Reporter: Roshan Naik
>     Assignee: Alessandro Bellina
>  Attachments: 1.x-branch-works-perfect.png, wordcounttopo.zip
>

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Commented] (STORM-1949) Backpressure can cause spout to stop emitting and stall topology
[ https://issues.apache.org/jira/browse/STORM-1949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15433795#comment-15433795 ] Roshan Naik commented on STORM-1949:
The amount of *additional* pressure this BP mechanism adds to ZK in its current state really should be sufficient reason to leave it disabled by default. If we fix the problem I noted in the description, as per Bobby's suggestion, that would put even more pressure on ZK. Putting such pressure on ZK (or Nimbus) from any subsystem in Storm is essentially a regression in terms of scaling ability, which then begets future fixes (PaceMaker, for instance).

> Backpressure can cause spout to stop emitting and stall topology
> ----------------------------------------------------------------
>
>          Key: STORM-1949
>          URL: https://issues.apache.org/jira/browse/STORM-1949
>      Project: Apache Storm
>   Issue Type: Bug
>     Reporter: Roshan Naik
>     Assignee: Alessandro Bellina
>  Attachments: 1.x-branch-works-perfect.png, wordcounttopo.zip
>

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)