[jira] [Updated] (STORM-1351) Storm spouts and bolts need a way to communicate problems back to topology runner

Roshan Naik (JIRA) Tue, 24 Nov 2015 15:11:54 -0800

[ https://issues.apache.org/jira/browse/STORM-1351?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Roshan Naik updated STORM-1351:
-------------------------------
    Description: 
A spout may be unable to generate a tuple in nextTuple() because 
 - a) there is currently no data to emit 
 - b) it is experiencing I/O issues

If the spout returns immediately from the nextTuple() call, nextTuple() is 
invoked again right away, leading to a CPU spike. The spike lasts until new 
data arrives or the I/O issues are resolved.

Currently, to work around this, spouts have to implement an exponential 
backoff mechanism internally. There are two problems with this:
 - each spout needs to implement this backoff logic
 - since nextTuple() then sleeps internally and takes longer to return, the 
latency metrics computation gets thrown off
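
For illustration, the workaround described above can be sketched roughly as 
follows. This is a minimal sketch, not Storm code: BackoffSpoutSketch, its 
in-memory queue, the emitted list and the delay parameters are all 
hypothetical stand-ins. It shows the sleep happening inside nextTuple(), which 
is why any time measured around the call grows.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

/** Hypothetical stand-in for a spout; this is NOT Storm's spout interface. */
class BackoffSpoutSketch {
    private final Queue<String> source;   // assumed stand-in for an external data source
    private final long initialDelayMs;
    private final long maxDelayMs;
    long currentDelayMs;                  // delay used for the next empty call
    long totalSleptMs = 0;                // time spent sleeping inside nextTuple()
    final List<String> emitted = new ArrayList<>();  // stand-in for collector.emit()

    BackoffSpoutSketch(Queue<String> source, long initialDelayMs, long maxDelayMs) {
        this.source = source;
        this.initialDelayMs = initialDelayMs;
        this.maxDelayMs = maxDelayMs;
        this.currentDelayMs = initialDelayMs;
    }

    /** Returns nothing, like the void nextTuple() described in this issue. */
    void nextTuple() {
        String item = source.poll();
        if (item != null) {
            emitted.add(item);
            currentDelayMs = initialDelayMs;  // data arrived: reset the backoff
            return;
        }
        try {
            // No data: the spout itself sleeps, so the call takes longer to return.
            Thread.sleep(currentDelayMs);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return;
        }
        totalSleptMs += currentDelayMs;
        // Double the delay for the next empty call, up to the cap.
        currentDelayMs = Math.min(currentDelayMs * 2, maxDelayMs);
    }
}
```

Every spout that wants this behaviour would have to carry the same delay 
bookkeeping, which is the first problem listed above.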


*Thoughts for Solution:*
The spout should be able to indicate a 'no data', 'experiencing error' or 'all 
good' status back to the caller of nextTuple() so that the right backoff logic 
can kick in.

- The most natural way to do this is using the return type of the nextTuple() 
method. Currently nextTuple() returns void. However, changing it will break 
source and binary compatibility, since the new Storm will not be able to 
invoke the method on unmodified spouts. This breaking change can be considered 
as an option only prior to v1.0. 

- Alternatively, this can be done by providing an additional method on the 
collector to indicate the condition to the topology runner. The spout can 
invoke this explicitly. The metrics can then also account for 'no data' and 
'error attempts'.

- Alternatively, the topology runner may just examine the collector to see 
whether new data was generated by the nextTuple() call. In this case it cannot 
distinguish between errors and no incoming data. 
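
As a rough sketch of the second option, the runner could own the backoff and 
read a status that the spout reports through the collector. Everything named 
here (SpoutStatus, StatusCollector.reportStatus(), StatusSpout, RunnerSketch) 
is hypothetical and not part of Storm's API. step() returns the wait time 
instead of sleeping so that the example stays easy to test.

```java
import java.util.ArrayList;
import java.util.List;

/** The three conditions named in this issue. */
enum SpoutStatus { ALL_GOOD, NO_DATA, ERROR }

/** Hypothetical collector; reportStatus() is an assumed addition, not an existing Storm method. */
class StatusCollector {
    final List<String> emitted = new ArrayList<>();
    private SpoutStatus status = SpoutStatus.ALL_GOOD;

    void emit(String tuple) { emitted.add(tuple); }

    void reportStatus(SpoutStatus s) { status = s; }

    /** Returns the status reported during the last call and resets it to ALL_GOOD. */
    SpoutStatus takeStatus() {
        SpoutStatus s = status;
        status = SpoutStatus.ALL_GOOD;
        return s;
    }
}

/** Hypothetical spout shape: nextTuple() stays void, so its signature need not change. */
interface StatusSpout {
    void nextTuple(StatusCollector collector);
}

/** Hypothetical topology-runner loop body that applies the backoff centrally. */
class RunnerSketch {
    private final long initialDelayMs;
    private final long maxDelayMs;
    private long delayMs = 0;
    int noDataCalls = 0;   // counters the metrics could report
    int errorCalls = 0;

    RunnerSketch(long initialDelayMs, long maxDelayMs) {
        this.initialDelayMs = initialDelayMs;
        this.maxDelayMs = maxDelayMs;
    }

    /** Calls the spout once and returns how long the runner should wait before the next call. */
    long step(StatusSpout spout, StatusCollector collector) {
        spout.nextTuple(collector);
        SpoutStatus s = collector.takeStatus();
        if (s == SpoutStatus.ALL_GOOD) {
            delayMs = 0;                 // healthy call: no wait, backoff reset
            return 0;
        }
        if (s == SpoutStatus.NO_DATA) {
            noDataCalls++;
        } else {
            errorCalls++;
        }
        // Start at the initial delay, then double up to the cap.
        delayMs = (delayMs == 0) ? initialDelayMs : Math.min(delayMs * 2, maxDelayMs);
        return delayMs;
    }
}
```

Because the wait happens in the runner, outside nextTuple(), it need not be 
counted in the spout's latency, and the noDataCalls and errorCalls counters 
give the metrics something to report for each condition.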

  was:
A spout may be unable to generate a tuple in nextTuple() because 
 - a) there is currently no data to emit 
 - b) it is experiencing I/O issues

If the spout returns immediately from the nextTuple() call, nextTuple() is 
invoked again right away, leading to a CPU spike. The spike lasts until new 
data arrives or the I/O issues are resolved.

Currently, to work around this, spouts have to implement an exponential 
backoff mechanism internally. There are two problems with this:
 - each spout needs to implement this backoff logic
 - since nextTuple() then sleeps internally and takes longer to return, the 
latency metrics computation gets thrown off


*Thoughts for Solution:*
The spout should be able to indicate a 'backoff', 'experiencing error' or 'all 
good' status back to the caller of nextTuple() so that the right backoff logic 
can kick in.

- The most natural way to do this is using the return type of the nextTuple() 
method. Currently nextTuple() returns void. However, changing it will break 
source and binary compatibility, since the new Storm will not be able to 
invoke the method on unmodified spouts. This breaking change can be considered 
as an option only prior to v1.0. 

- Alternatively, this can be done by providing an additional method on the 
collector to indicate the condition to the topology runner. The spout can 
invoke this explicitly. The metrics can then also account for 'no data' and 
'error attempts'.

- Alternatively, the topology runner may just examine the collector to see 
whether new data was generated by the nextTuple() call. In this case it cannot 
distinguish between errors and no incoming data. 


> Storm spouts and bolts need a way to communicate problems back to topology 
> runner
> --------------------------------------------------------------------------------
>
>                 Key: STORM-1351
>                 URL: https://issues.apache.org/jira/browse/STORM-1351
>             Project: Apache Storm
>          Issue Type: Bug
>          Components: storm-core
>            Reporter: Roshan Naik



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
