[jira] [Commented] (STORM-1560) Topology stops processing after Netty catches/swallows Throwable

2017-10-11 Thread Robert Joseph Evans (JIRA)

[ 
https://issues.apache.org/jira/browse/STORM-1560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16200404#comment-16200404
 ] 

Robert Joseph Evans commented on STORM-1560:


[~saurav689],

In 1.1.1, the only time "Giving up to schedule" is thrown is when the
Netty client is closing. The only time it is closing is when the
scheduling has changed and we no longer need that client, which is
what you have in your logs with the refresh-connections-timer.
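A minimal Java sketch of the behaviour described above, assuming a simplified client whose `closing` flag is set once by `close()` and never reset. `GiveUpSketch`, `ReconnectingClient`, `connectTask()` and the `attempts` counter are hypothetical names, not Storm's actual `Client` API; only the exception message mirrors the logs in this thread.

{code:java}
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical, simplified model of a reconnecting client (NOT Storm's Client class).
public class GiveUpSketch {
    static class ReconnectingClient {
        private final String name;
        private final AtomicBoolean closing = new AtomicBoolean(false);
        private final AtomicInteger attempts = new AtomicInteger(0);
        private final AtomicInteger messagesLost = new AtomicInteger(0); // never incremented here

        ReconnectingClient(String name) {
            this.name = name;
        }

        boolean reconnectingAllowed() {
            return !closing.get();
        }

        // Called when the assignment changes and this connection is no longer needed.
        void close() {
            closing.set(true);
        }

        // One timer-driven connection attempt. While the client is open it only counts
        // the (failed) attempt; once the client is closing it gives up by throwing.
        void connectTask() {
            if (reconnectingAllowed()) {
                attempts.incrementAndGet(); // a real client would try to open a socket here
            } else {
                throw new RuntimeException("Giving up to scheduleConnect to " + name
                        + " after " + attempts.get() + " failed attempts. "
                        + messagesLost.get() + " messages were lost");
            }
        }

        int attempts() {
            return attempts.get();
        }
    }

    public static void main(String[] args) {
        ReconnectingClient client = new ReconnectingClient("Netty-Client-/192.168.2.195:6702");
        for (int i = 0; i < 3; i++) {
            client.connectTask(); // open: attempts are counted, nothing is thrown
        }
        client.close(); // scheduling changed, the client is being closed
        try {
            client.connectTask();
        } catch (RuntimeException e) {
            System.out.println(e.getMessage());
        }
    }
}
{code}

In this sketch the exception is only ever thrown after `close()`, so seeing it in a log says the client was being torn down, not that the retries themselves broke the client.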

From the logs it looks like your topology had a worker scheduled to be on
192.168.2.195:6702, but that worker never came up for some reason. You didn't
include the logs for that worker, so I cannot tell why. After some time Nimbus
rescheduled the worker to a different host/port. At that point you got an
exception while we were closing the client.
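The rescheduling step described here can be sketched as a diff between the connections a worker currently holds and the endpoints the new assignment needs: anything no longer needed is closed. This is a hypothetical illustration only; `RefreshSketch`, `Connection` and `refresh` are not Storm APIs, the replacement address `other-host:6700` is made up, and opening connections to newly needed endpoints is omitted.

{code:java}
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of a connection-refresh step; names are illustrative only.
public class RefreshSketch {
    interface Connection {
        void close();
    }

    // Closes and drops every connection whose endpoint is no longer in the assignment,
    // and returns the set of endpoints that were closed.
    static Set<String> refresh(Map<String, Connection> open, Set<String> needed) {
        Set<String> toClose = new HashSet<>(open.keySet());
        toClose.removeAll(needed);
        for (String endpoint : toClose) {
            open.remove(endpoint).close(); // non-null: endpoint came from open's key set
        }
        return toClose;
    }

    public static void main(String[] args) {
        Map<String, Connection> open = new HashMap<>();
        open.put("192.168.2.195:6702", () -> System.out.println("closing 192.168.2.195:6702"));
        Set<String> needed = new HashSet<>();
        needed.add("other-host:6700"); // hypothetical new location of the rescheduled worker
        refresh(open, needed);
    }
}
{code}

Any connect attempt still pending for a closed entry then runs against a client that is shutting down, which matches the sequence in the logs.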

The later logs for Netty-server-localhost-6702-worker-1 indicate that another
worker connected to this one dropped the connection. They are most likely
unrelated to the first exception.

Did your topology eventually recover? Did you ever look at the logs for
192.168.2.195:6702 to see why it didn't come up? In 2.x we have added
better logging, so hopefully we will be able to see which worker
disconnected from the server.


> Topology stops processing after Netty catches/swallows Throwable
> 
>
> Key: STORM-1560
> URL: https://issues.apache.org/jira/browse/STORM-1560
> Project: Apache Storm
>  Issue Type: Bug
>  Components: storm-core
>Affects Versions: 1.0.0
>Reporter: P. Taylor Goetz
> Attachments: fix-lockup.patch
>
>
> In some scenarios, Netty connection problems can leave a topology in an 
> unrecoverable state. The likely culprit is the Netty {{HashedWheelTimer}} 
> class that contains the following code:
> {code}
> public void expire() {
>     if (this.compareAndSetState(0, 2)) {
>         try {
>             this.task.run(this);
>         } catch (Throwable var2) {
>             if (HashedWheelTimer.logger.isWarnEnabled()) {
>                 HashedWheelTimer.logger.warn("An exception was thrown by "
>                         + TimerTask.class.getSimpleName() + '.', var2);
>             }
>         }
>     }
> }
> {code}
> The exception being swallowed can be seen below:
> {code}
> 2016-02-18 08:46:59.116 o.a.s.m.n.Client [INFO] closing Netty Client Netty-Client-/192.168.202.6:6701
> 2016-02-18 08:46:59.173 o.a.s.m.n.Client [INFO] waiting up to 60 ms to send 0 pending messages to Netty-Client-/192.168.202.6:6701
> 2016-02-18 08:46:59.271 STDIO [ERROR] Feb 18, 2016 8:46:59 AM org.apache.storm.shade.org.jboss.netty.util.HashedWheelTimer
> WARNING: An exception was thrown by TimerTask.
> java.lang.RuntimeException: Giving up to scheduleConnect to Netty-Client-/192.168.202.6:6701 after 44 failed attempts. 3 messages were lost
>   at org.apache.storm.messaging.netty.Client$Connect.run(Client.java:573)
>   at org.apache.storm.shade.org.jboss.netty.util.HashedWheelTimer$HashedWheelTimeout.expire(HashedWheelTimer.java:546)
>   at org.apache.storm.shade.org.jboss.netty.util.HashedWheelTimer$Worker.notifyExpiredTimeouts(HashedWheelTimer.java:446)
>   at org.apache.storm.shade.org.jboss.netty.util.HashedWheelTimer$Worker.run(HashedWheelTimer.java:395)
>   at org.apache.storm.shade.org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
>   at java.lang.Thread.run(Thread.java:745)
> {code}
> The Netty client then never recovers, and the following messages repeat forever:
> {code}
> 2016-02-18 09:42:56.251 o.a.s.m.n.Client [ERROR] discarding 1 messages because the Netty client to Netty-Client-/192.168.202.6:6701 is being closed
> 2016-02-18 09:43:25.248 o.a.s.m.n.Client [ERROR] discarding 1 messages because the Netty client to Netty-Client-/192.168.202.6:6701 is being closed
> 2016-02-18 09:43:55.248 o.a.s.m.n.Client [ERROR] discarding 1 messages because the Netty client to Netty-Client-/192.168.202.6:6701 is being closed
> 2016-02-18 09:43:55.752 o.a.s.m.n.Client [ERROR] discarding 2 messages because the Netty client to Netty-Client-/192.168.202.6:6701 is being closed
> 2016-02-18 09:43:56.252 o.a.s.m.n.Client [ERROR] discarding 1 messages because the Netty client to Netty-Client-/192.168.202.6:6701 is being closed
> 2016-02-18 09:44:25.249 o.a.s.m.n.Client [ERROR] discarding 1 messages because the Netty client to Netty-Client-/192.168.202.6:6701 is being closed
> {code}
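The failure mode in the description can be reproduced in miniature. The sketch below is a hypothetical model, not Netty's {{HashedWheelTimer}}: {{expireSwallowing}} mimics the quoted {{expire()}} body by logging any {{Throwable}} and returning normally, so whatever scheduled the task never learns that it failed. {{runAndSurface}} shows one standard-library alternative, a {{ScheduledExecutorService}} whose future rethrows the task's exception from {{get()}} as an {{ExecutionException}}; it is only an illustration, not the attached fix-lockup.patch.

{code:java}
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class SwallowSketch {
    // Mimics the quoted expire(): run the task, log any Throwable, return normally.
    static void expireSwallowing(Runnable task) {
        try {
            task.run();
        } catch (Throwable t) {
            System.err.println("WARNING: An exception was thrown by TimerTask. " + t);
        }
    }

    // Alternative: the executor stores the failure in the returned future, so a caller
    // that checks the future can react to it (e.g. rebuild the connection).
    static Throwable runAndSurface(Runnable task) {
        ScheduledExecutorService exec = Executors.newSingleThreadScheduledExecutor();
        try {
            exec.schedule(task, 10, TimeUnit.MILLISECONDS).get();
            return null; // task completed normally
        } catch (ExecutionException e) {
            return e.getCause(); // the task's own exception
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return e;
        } finally {
            exec.shutdownNow();
        }
    }

    public static void main(String[] args) {
        Runnable failing = () -> {
            throw new RuntimeException("Giving up to scheduleConnect");
        };
        expireSwallowing(failing); // caller sees nothing except a WARNING line on stderr
        Throwable cause = runAndSurface(failing);
        System.out.println("surfaced: " + cause.getMessage());
    }
}
{code}

With the swallowing variant, the only trace of the failure is a log line, which is consistent with the client sitting in the "being closed" state indefinitely.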



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (STORM-1560) Topology stops processing after Netty catches/swallows Throwable

2017-10-11 Thread Saurav Suman (JIRA)

[ 
https://issues.apache.org/jira/browse/STORM-1560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1612#comment-1612
 ] 

Saurav Suman commented on STORM-1560:
-

Hi,

I also got a similar error. I am using Storm 1.1.1. Please advise.

{code}
2017-10-10 13:17:47.674 o.a.s.m.n.Client client-boss-1 [ERROR] connection attempt 199 to Netty-Client-/192.168.2.195:6702 failed: java.net.ConnectException: Connection refused: /192.168.2.195:6702
2017-10-10 13:17:47.674 o.a.s.u.StormBoundedExponentialBackoffRetry client-boss-1 [WARN] WILL SLEEP FOR 738ms (MAX)
2017-10-10 13:17:48.474 o.a.s.m.n.Client client-boss-1 [ERROR] connection attempt 200 to Netty-Client-/192.168.2.195:6702 failed: java.net.ConnectException: Connection refused: /192.168.2.195:6702
2017-10-10 13:17:48.475 o.a.s.u.StormBoundedExponentialBackoffRetry client-boss-1 [WARN] WILL SLEEP FOR 741ms (MAX)
2017-10-10 13:17:48.972 o.a.s.m.n.Client refresh-connections-timer [INFO] closing Netty Client Netty-Client-/192.168.2.195:6702
2017-10-10 13:17:48.977 o.a.s.m.n.Client refresh-connections-timer [INFO] waiting up to 60 ms to send 0 pending messages to Netty-Client-/192.168.2.195:6702
2017-10-10 13:17:49.271 STDIO client-schedule-service-1 [ERROR] Oct 10, 2017 1:17:49 PM org.apache.storm.shade.org.jboss.netty.util.HashedWheelTimer
WARNING: An exception was thrown by TimerTask.
java.lang.RuntimeException: Giving up to scheduleConnect to Netty-Client-/192.168.2.195:6702 after 200 failed attempts. 0 messages were lost
  at org.apache.storm.messaging.netty.Client$Connect.run(Client.java:606)
  at org.apache.storm.shade.org.jboss.netty.util.HashedWheelTimer$HashedWheelTimeout.expire(HashedWheelTimer.java:546)
  at org.apache.storm.shade.org.jboss.netty.util.HashedWheelTimer$Worker.notifyExpiredTimeouts(HashedWheelTimer.java:446)
  at org.apache.storm.shade.org.jboss.netty.util.HashedWheelTimer$Worker.run(HashedWheelTimer.java:395)
  at org.apache.storm.shade.org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
  at java.lang.Thread.run(Thread.java:748)
2017-10-10 13:19:55.636 o.a.s.m.n.StormServerHandler Netty-server-localhost-6702-worker-1 [ERROR] server errors in handling the request
java.io.IOException: Connection reset by peer
  at sun.nio.ch.FileDispatcherImpl.read0(Native Method) ~[?:1.8.0_131]
  at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39) ~[?:1.8.0_131]
  at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223) ~[?:1.8.0_131]
  at sun.nio.ch.IOUtil.read(IOUtil.java:192) ~[?:1.8.0_131]
  at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:380) ~[?:1.8.0_131]
  at org.apache.storm.shade.org.jboss.netty.channel.socket.nio.NioWorker.read(NioWorker.java:64) [storm-core-1.1.1.jar:1.1.1]
  at org.apache.storm.shade.org.jboss.netty.channel.socket.nio.AbstractNioWorker.process(AbstractNioWorker.java:108) [storm-core-1.1.1.jar:1.1.1]
  at org.apache.storm.shade.org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:318) [storm-core-1.1.1.jar:1.1.1]
  at org.apache.storm.shade.org.jboss.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:89) [storm-core-1.1.1.jar:1.1.1]
  at org.apache.storm.shade.org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178) [storm-core-1.1.1.jar:1.1.1]
  at org.apache.storm.shade.org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108) [storm-core-1.1.1.jar:1.1.1]
  at org.apache.storm.shade.org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42) [storm-core-1.1.1.jar:1.1.1]
  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [?:1.8.0_131]
  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [?:1.8.0_131]
  at java.lang.Thread.run(Thread.java:748) [?:1.8.0_131]
2017-10-10 13:28:14.369 o.a.s.m.n.StormServerHandler Netty-server-localhost-6702-worker-1 [ERROR] server errors in handling the request
java.io.IOException: Connection reset by peer
{code}


[jira] [Commented] (STORM-1560) Topology stops processing after Netty catches/swallows Throwable

2017-06-21 Thread ryan.jin (JIRA)

[ 
https://issues.apache.org/jira/browse/STORM-1560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16057297#comment-16057297
 ] 

ryan.jin commented on STORM-1560:
-

[~nico.meyer]

:) I will review the diff between 0.10.x and 1.0.x, thank you.



[jira] [Commented] (STORM-1560) Topology stops processing after Netty catches/swallows Throwable

2017-06-21 Thread Nico Meyer (JIRA)

[ 
https://issues.apache.org/jira/browse/STORM-1560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16057224#comment-16057224
 ] 

Nico Meyer commented on STORM-1560:
---

[~sunny.davy]
I would propose upgrading directly to 1.0.3. Fixing all the namespaces and a
few other minor things takes a little work, but in my opinion the many other
improvements in the 1.x line warrant the time spent.



[jira] [Commented] (STORM-1560) Topology stops processing after Netty catches/swallows Throwable

2017-06-21 Thread ryan.jin (JIRA)

[ 
https://issues.apache.org/jira/browse/STORM-1560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16057220#comment-16057220
 ] 

ryan.jin commented on STORM-1560:
-

[~nico.meyer]

Okay, maybe I will upgrade to 0.10.2 first. :)



[jira] [Commented] (STORM-1560) Topology stops processing after Netty catches/swallows Throwable

2017-06-21 Thread ryan.jin (JIRA)

[ 
https://issues.apache.org/jira/browse/STORM-1560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16057088#comment-16057088
 ] 

ryan.jin commented on STORM-1560:
-

[~nico.meyer]

Judging from this log line:

{code:java}
2016-02-18 08:46:59.116 o.a.s.m.n.Client [INFO] closing Netty Client Netty-Client-/192.168.202.6:6701
{code}

the method {color:red}close(){color} in backtype.storm.messaging.netty.Client
has already been invoked.

So your patch may not work, because reconnectingAllowed() will always
return {color:red}FALSE{color}:

{code:java}
public void run(Timeout timeout)
  throws Exception
{
  if (Client.this.reconnectingAllowed())
  {
{code}
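To illustrate the point above, here is a minimal sketch assuming that {{reconnectingAllowed()}} simply returns {{!closing}} and that {{close()}} sets that flag permanently; this is an assumption about {{Client}}, not a quote of it. {{GuardSketch}}, its {{reconnects}} counter and the retrying {{run()}} body are hypothetical. Once {{close()}} has run, a {{run()}} that retries only behind the guard does nothing, however often the timer fires.

{code:java}
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical model of the guard quoted above; only reconnectingAllowed() and close()
// are named after the real client, and their bodies here are assumptions.
public class GuardSketch {
    private final AtomicBoolean closing = new AtomicBoolean(false);
    private final AtomicInteger reconnects = new AtomicInteger(0);

    boolean reconnectingAllowed() {
        return !closing.get();
    }

    void close() {
        closing.set(true); // one-way: nothing in this sketch ever resets the flag
    }

    // A "patched" run() that retries instead of throwing, but only behind the guard.
    void run() {
        if (reconnectingAllowed()) {
            reconnects.incrementAndGet(); // stands in for scheduling another connect
        }
        // else: nothing happens, so a closed client stays closed
    }

    int reconnects() {
        return reconnects.get();
    }

    public static void main(String[] args) {
        GuardSketch client = new GuardSketch();
        client.close();
        for (int i = 0; i < 5; i++) {
            client.run();
        }
        System.out.println("reconnects after close(): " + client.reconnects()); // 0
    }
}
{code}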

[STORM-2561|https://issues.apache.org/jira/browse/STORM-2561]




[jira] [Commented] (STORM-1560) Topology stops processing after Netty catches/swallows Throwable

2017-03-20 Thread sun (JIRA)

[ 
https://issues.apache.org/jira/browse/STORM-1560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15932302#comment-15932302
 ] 

sun commented on STORM-1560:


Hi Nico Meyer, the same problem appears in version 0.10.0. The situation is
similar to what you describe. Could you tell me how you solved this problem? I am
looking forward to your reply. Thanks a lot.
