[jira] [Commented] (STORM-513) ShellBolt keeps sending heartbeats even when child process is hung

2015-11-15 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/STORM-513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15006119#comment-15006119
 ] 

ASF GitHub Bot commented on STORM-513:
--

Github user hustfxj commented on the pull request:

https://github.com/apache/storm/pull/286#issuecomment-156884789
  
+1 


> ShellBolt keeps sending heartbeats even when child process is hung
> --
>
> Key: STORM-513
> URL: https://issues.apache.org/jira/browse/STORM-513
> Project: Apache Storm
>  Issue Type: Bug
>  Components: storm-multilang
>Affects Versions: 0.9.2-incubating
> Environment: Linux: 2.6.32-431.11.2.el6.x86_64 (RHEL 6.5)
>Reporter: Dan Blanchard
>Assignee: Jungtaek Lim
>Priority: Blocker
> Fix For: 0.9.3-rc2
>
>
> If I'm understanding everything correctly with how ShellBolts work, the Java 
> ShellBolt executor is the part of the topology that sends heartbeats back to 
> Nimbus to let it know that a particular multilang bolt is still alive.  The 
> problem with this is that if the multilang subprocess/bolt severely hangs 
> (i.e., it will not even respond to {{SIGALRM}} and the like), the Java 
> ShellBolt does not seem to notice or care. Simply having the tuple get 
> replayed when it times out will not suffice either, because the subprocess 
> will still be stuck.
> The most obvious way to handle this seems to be to add heartbeating to the 
> multilang protocol itself, so that the ShellBolt expects a message of some 
> kind every {{timeout}} seconds.
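Here is a minimal sketch of the check proposed above. It assumes the ShellBolt timestamps every reply it reads from the subprocess, and that a timer periodically compares that timestamp against a timeout. `HeartbeatWatchdog` and its methods are hypothetical names chosen for illustration, not Storm's API.

```java
/**
 * Hypothetical watchdog (not Storm's API): remembers when the subprocess last
 * answered and reports whether it has been silent longer than the timeout.
 */
final class HeartbeatWatchdog {
    private final long timeoutMillis;
    // volatile: typically written by a reader thread and read by a timer thread
    private volatile long lastHeartbeatMillis;

    HeartbeatWatchdog(long timeoutMillis, long nowMillis) {
        this.timeoutMillis = timeoutMillis;
        this.lastHeartbeatMillis = nowMillis;
    }

    /** Call whenever any reply (e.g. a sync) arrives from the subprocess. */
    void recordHeartbeat(long nowMillis) {
        lastHeartbeatMillis = nowMillis;
    }

    /** True once the subprocess has been silent for longer than the timeout. */
    boolean isTimedOut(long nowMillis) {
        return nowMillis - lastHeartbeatMillis > timeoutMillis;
    }
}
```

A timer task would call `isTimedOut(System.currentTimeMillis())` and tear the worker down when it returns true. Passing the clock in as a parameter keeps the logic deterministic and easy to test.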



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (STORM-513) ShellBolt keeps sending heartbeats even when child process is hung

2014-11-14 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/STORM-513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14212476#comment-14212476
 ] 

ASF GitHub Bot commented on STORM-513:
--

Github user harshach commented on the pull request:

https://github.com/apache/storm/pull/286#issuecomment-63094779
  
@HeartSaVioR  I am running some tests on this PR. I built Storm with your 
changes and I am getting these failures. Can you please check whether you see 
any of these issues? Thanks.

java.lang.Exception: Shell Process Exception: Exception in bolt: undefined method `+' for nil:NilClass - tester_bolt.rb:29:in `process'\n/private/var/folders/yb/67h7c1sx2d95r5c_x5cjdwmhgp/T/ddda5ca6-8167-4ed1-bfef-a1a2001f65a2/supervisor/stormdist/test-1-1415984043/resources/storm.rb:186:in `run'\ntester_bolt.rb:37:in `main'
    at backtype.storm.task.ShellBolt.handleError(ShellBolt.java:188) [classes/:na]
    at backtype.storm.task.ShellBolt.access$1100(ShellBolt.java:69) [classes/:na]
    at backtype.storm.task.ShellBolt$BoltReaderRunnable.run(ShellBolt.java:331) [classes/:na]
    at java.lang.Thread.run(Thread.java:744) [na:1.7.0_51]
90960 [Thread-1055] ERROR backtype.storm.task.ShellBolt - Halting process: ShellBolt died.
java.lang.RuntimeException: backtype.storm.multilang.NoOutputException: Pipe to subprocess seems to be broken! No output read.
Serializer Exception:


    at backtype.storm.utils.ShellProcess.readShellMsg(ShellProcess.java:101) ~[classes/:na]
    at backtype.storm.task.ShellBolt$BoltReaderRunnable.run(ShellBolt.java:318) ~[classes/:na]
    at java.lang.Thread.run(Thread.java:744) [na:1.7.0_51]





[jira] [Commented] (STORM-513) ShellBolt keeps sending heartbeats even when child process is hung

2014-11-14 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/STORM-513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14212990#comment-14212990
 ] 

ASF GitHub Bot commented on STORM-513:
--

Github user asfgit closed the pull request at:

https://github.com/apache/storm/pull/286




[jira] [Commented] (STORM-513) ShellBolt keeps sending heartbeats even when child process is hung

2014-11-14 Thread Jungtaek Lim (JIRA)

[ 
https://issues.apache.org/jira/browse/STORM-513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14213075#comment-14213075
 ] 

Jungtaek Lim commented on STORM-513:


Please don't forget to document and announce that the multilang protocol has 
changed.
Thanks for merging!



[jira] [Commented] (STORM-513) ShellBolt keeps sending heartbeats even when child process is hung

2014-11-09 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/STORM-513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14203837#comment-14203837
 ] 

ASF GitHub Bot commented on STORM-513:
--

Github user HeartSaVioR commented on the pull request:

https://github.com/apache/storm/pull/286#issuecomment-62295398
  
@itaifrenkel OK, I've commented on STORM-528.




[jira] [Commented] (STORM-513) ShellBolt keeps sending heartbeats even when child process is hung

2014-11-09 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/STORM-513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14203860#comment-14203860
 ] 

ASF GitHub Bot commented on STORM-513:
--

Github user HeartSaVioR commented on the pull request:

https://github.com/apache/storm/pull/286#issuecomment-62296198
  
We should document this feature in the multilang protocol documentation:
http://storm.apache.org/documentation/Multilang-protocol.html
A multilang bolt should handle the heartbeat tuple and send a sync response as 
soon as possible.
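A subprocess-side handler might look roughly like the sketch below. It assumes the convention used by Storm's bundled multilang adapters after this change: a heartbeat tuple carries task id -1 on stream `__heartbeat` and is answered with `{"command":"sync"}`. Treat those values as assumptions to check against the multilang protocol documentation. JSON parsing is omitted, and `MultilangTupleDispatcher` is a hypothetical name.

```java
import java.util.function.Consumer;

/**
 * Hypothetical subprocess-side dispatcher. Assumes incoming tuples are already
 * parsed and that heartbeats are marked by task -1 on stream "__heartbeat".
 */
final class MultilangTupleDispatcher {
    // Assumed heartbeat markers; verify against the multilang protocol docs.
    static final String HEARTBEAT_STREAM = "__heartbeat";
    static final int HEARTBEAT_TASK = -1;
    static final String SYNC_RESPONSE = "{\"command\":\"sync\"}";

    private final Consumer<String> sendToShellBolt; // writes one JSON message
    private final Consumer<String> processTuple;    // user logic, given the tuple id

    MultilangTupleDispatcher(Consumer<String> sendToShellBolt, Consumer<String> processTuple) {
        this.sendToShellBolt = sendToShellBolt;
        this.processTuple = processTuple;
    }

    static boolean isHeartbeat(String stream, int task) {
        return task == HEARTBEAT_TASK && HEARTBEAT_STREAM.equals(stream);
    }

    /** Answers heartbeats immediately; hands every other tuple to user code. */
    void dispatch(String id, String stream, int task) {
        if (isHeartbeat(stream, task)) {
            sendToShellBolt.accept(SYNC_RESPONSE);
        } else {
            processTuple.accept(id);
        }
    }
}
```

Note that the heartbeat reply is only as prompt as the loop that reads tuples. If user code blocks for a long time, the sync still goes out late, which is the busy-subprocess limitation raised elsewhere in this thread.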




[jira] [Commented] (STORM-513) ShellBolt keeps sending heartbeats even when child process is hung

2014-11-09 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/STORM-513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14203866#comment-14203866
 ] 

ASF GitHub Bot commented on STORM-513:
--

Github user itaifrenkel commented on the pull request:

https://github.com/apache/storm/pull/286#issuecomment-62296939
  
@HeartSaVioR I wanted to fix the multilang docs some time ago for Node.js 
support, but couldn't find a way to submit a pull request. Is there one?




[jira] [Commented] (STORM-513) ShellBolt keeps sending heartbeats even when child process is hung

2014-11-09 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/STORM-513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14203874#comment-14203874
 ] 

ASF GitHub Bot commented on STORM-513:
--

Github user HeartSaVioR commented on the pull request:

https://github.com/apache/storm/pull/286#issuecomment-62298800
  
AFAIK, there was a discussion, started by @ptgoetz and +1'd by many 
committers, about moving the documentation (including the website) to git.
But it hasn't actually happened yet, so we can't fix the docs ourselves; we 
can only report problems to the dev mailing list.

Personally I'm heavily inspired by the Redis documentation, which is treated 
the same as code.
It's at http://github.com/antirez/redis-doc.




[jira] [Commented] (STORM-513) ShellBolt keeps sending heartbeats even when child process is hung

2014-11-08 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/STORM-513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14203821#comment-14203821
 ] 

ASF GitHub Bot commented on STORM-513:
--

Github user itaifrenkel commented on the pull request:

https://github.com/apache/storm/pull/286#issuecomment-62295127
  
Please comment on STORM-528 if you've resolved the .py file divergence problem.




[jira] [Commented] (STORM-513) ShellBolt keeps sending heartbeats even when child process is hung

2014-11-08 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/STORM-513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14203825#comment-14203825
 ] 

ASF GitHub Bot commented on STORM-513:
--

Github user itaifrenkel commented on the pull request:

https://github.com/apache/storm/pull/286#issuecomment-62295232
  
@HeartSaVioR @clockfly  I think we need to keep the multilang protocol 
implementation as simple as possible. A full roundtrip of heartbeat messages is 
not that bad, as long as it does not add too much latency. If you would like an 
optimization for the roundtrip messages, you could treat any emit as a 
heartbeat, and trigger the heartbeat roundtrip only if there are not enough 
emits from the bolt. It makes the Java code more complicated :(, but achieves 
similar goals and leaves the multilang implementation simpler :). All in all I 
think this commit is good, and we can discuss various optimizations later on.
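On the Java side, the emit-as-heartbeat optimization could look roughly like this. It is a sketch under assumptions: `ActivityTracker` is a hypothetical helper, and "activity" means any message read from the subprocess (emit, ack, sync, log, and so on).

```java
/**
 * Hypothetical tracker: any message from the subprocess counts as proof of
 * life, and a heartbeat round trip is requested only when the subprocess has
 * been quiet for a full interval and no heartbeat is already outstanding.
 */
final class ActivityTracker {
    private final long intervalMillis;
    private long lastActivityMillis;
    private boolean heartbeatPending;

    ActivityTracker(long intervalMillis, long nowMillis) {
        this.intervalMillis = intervalMillis;
        this.lastActivityMillis = nowMillis;
    }

    /** Call for every message read from the subprocess; it doubles as a heartbeat. */
    synchronized void onMessageFromSubprocess(long nowMillis) {
        lastActivityMillis = nowMillis;
        heartbeatPending = false;
    }

    /** Returns true if the caller should send an explicit heartbeat now. */
    synchronized boolean shouldSendHeartbeat(long nowMillis) {
        if (heartbeatPending || nowMillis - lastActivityMillis < intervalMillis) {
            return false;
        }
        heartbeatPending = true;
        return true;
    }
}
```

The pending flag keeps the timer from queueing a second heartbeat while the first is still unanswered. A separate timeout check would still decide when silence has lasted too long.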




[jira] [Commented] (STORM-513) ShellBolt keeps sending heartbeats even when child process is hung

2014-11-04 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/STORM-513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14196872#comment-14196872
 ] 

ASF GitHub Bot commented on STORM-513:
--

Github user HeartSaVioR commented on the pull request:

https://github.com/apache/storm/pull/286#issuecomment-61719213
  
I had a chance to discuss this PR with @clockfly, and he also pointed out that 
if the subprocess is too busy, it cannot send heartbeats in time, which I 
mentioned at the beginning of this PR.

Ideally the subprocess would have its own heartbeat thread that sends 
heartbeats periodically.
But there are two things to consider.
1. ShellSpout uses PING-PONG communication: it must wait for a sync after 
nextTuple(). So if we change ShellSpout to have a reader thread, nextTuple() 
has to wait for the reader thread to receive the sync, which is a bit more 
complex than the current implementation.
2. We must ensure that the main thread and the heartbeat thread don't write to 
stdout (probably a pipe) at the same time. A GIL might save us in some 
languages, but other languages support real (?) threads, so writes should be 
guarded by a lock.
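For the second point, a lock around each complete message is enough to keep two writer threads from interleaving. The sketch below assumes the multilang framing in which each JSON message is followed by a line containing `end`. `LockedMessageWriter` is a hypothetical name, and a real subprocess would wrap its stdout rather than an arbitrary `Writer`.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.io.Writer;

/**
 * Hypothetical writer shared by a subprocess's main thread and its heartbeat
 * thread. The JSON payload and its "end" terminator are written under one
 * lock, so messages from different threads never interleave.
 */
final class LockedMessageWriter {
    private final Writer out;

    LockedMessageWriter(Writer out) {
        this.out = out;
    }

    /** Writes one message plus terminator atomically, then flushes the pipe. */
    synchronized void send(String json) {
        try {
            out.write(json);
            out.write("\nend\n");
            out.flush();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The important part is locking per message rather than per `write` call: the payload and its `end` terminator must reach the pipe as one unit.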

Since I'm not a JavaScript (Node.js) person and I'm a beginner at Ruby, I 
can't cover these two things in the .js implementation.
So I'd like to implement this in a separate PR, once we decide we can't live 
with the current limitation or when I have more time.

Btw, Nimbus / Supervisor may take up to SUPERVISOR_WORKER_TIMEOUT_SECS * 2 + α 
(roughly) to find a process that died because its subprocess hung, because 
there are two heartbeat checks in series: ShellProcess checks the subprocess 
(and kills itself if the subprocess cannot respond), and Nimbus / Supervisor 
checks ShellProcess.
(Just for @clockfly)
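Written out as a rough bound, this assumes each of the two checks fires only after a full `SUPERVISOR_WORKER_TIMEOUT_SECS` ($T$) of silence. The extra term $\alpha$ covers timer granularity and restart overhead.

```latex
T_{\text{detect}} \lesssim \underbrace{T}_{\text{ShellProcess checks subprocess}}
  + \underbrace{T}_{\text{Nimbus/Supervisor check ShellProcess}} + \alpha = 2T + \alpha
```

For example, with `supervisor.worker.timeout.secs` at 30 seconds (Storm's default, as far as we know), this gives roughly 60 s plus $\alpha$.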




[jira] [Commented] (STORM-513) ShellBolt keeps sending heartbeats even when child process is hung

2014-11-04 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/STORM-513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14196893#comment-14196893
 ] 

ASF GitHub Bot commented on STORM-513:
--

Github user HeartSaVioR commented on the pull request:

https://github.com/apache/storm/pull/286#issuecomment-61720636
  
OK, I've upmerged.
Btw, I found that the .py files have diverged too, so I need to copy one file 
over the other.




[jira] [Commented] (STORM-513) ShellBolt keeps sending heartbeats even when child process is hung

2014-10-28 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/STORM-513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=1418#comment-1418
 ] 

ASF GitHub Bot commented on STORM-513:
--

Github user HeartSaVioR commented on the pull request:

https://github.com/apache/storm/pull/286#issuecomment-60854884
  
Can this PR be included in 0.9.3, or in the next version?




[jira] [Commented] (STORM-513) ShellBolt keeps sending heartbeats even when child process is hung

2014-10-24 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/STORM-513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14182577#comment-14182577
 ] 

ASF GitHub Bot commented on STORM-513:
--

Github user HeartSaVioR commented on a diff in the pull request:

https://github.com/apache/storm/pull/286#discussion_r19326747
  
--- Diff: storm-core/src/jvm/backtype/storm/task/ShellBolt.java ---
@@ -305,4 +283,95 @@ private void die(Throwable exception) {
             System.exit(11);
         }
     }
+
+    private class BoltHeartbeatTimerTask extends TimerTask {
+        private ShellBolt bolt;
+
+        public BoltHeartbeatTimerTask(ShellBolt bolt) {
+            this.bolt = bolt;
+        }
+
+        @Override
+        public void run() {
+            long currentTimeMillis = System.currentTimeMillis();
+            long lastHeartbeat = getLastHeartbeat();
+
+            LOG.debug("BOLT - current time : {}, last heartbeat : {}, worker timeout (ms) : {}",
+                    currentTimeMillis, lastHeartbeat, workerTimeoutMills);
+
+            if (currentTimeMillis - lastHeartbeat > workerTimeoutMills) {
+                bolt.die(new RuntimeException("subprocess heartbeat timeout"));
+            }
+
+            String genId = Long.toString(_rand.nextLong());
+            try {
+                _pendingWrites.put(createHeartbeatBoltMessage(genId));
--- End diff --

@itaifrenkel Oh, I see. I didn't know that option existed. Thanks for 
letting me know!
Then we can flip a heartbeat flag, meaning it's time to send a heartbeat, as 
you suggest, and let the BoltWriter.run() loop take care of it first.
What do you think?
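The flag-flipping idea could be sketched as follows. This is an illustration under assumptions, not the PR's code. `HeartbeatAwareWriteQueue`, the `HEARTBEAT` placeholder, and the polling loop are all hypothetical, and a real writer would serialize a proper heartbeat message.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

/**
 * Hypothetical writer-side queue: the timer thread only flips a flag, and the
 * writer loop checks that flag before taking regular messages, so a heartbeat
 * is never stuck behind a long backlog of pending writes.
 */
final class HeartbeatAwareWriteQueue {
    static final String HEARTBEAT = "HEARTBEAT"; // placeholder for a real heartbeat message

    private final AtomicBoolean heartbeatRequested = new AtomicBoolean(false);
    private final BlockingQueue<String> pendingWrites = new LinkedBlockingQueue<>();

    /** Called by the timer task instead of enqueueing a heartbeat message. */
    void requestHeartbeat() {
        heartbeatRequested.set(true);
    }

    /** Called by the bolt for ordinary messages (tuples, acks, ...). */
    void enqueue(String message) {
        pendingWrites.add(message);
    }

    /** Next message for the writer loop: heartbeat first, else FIFO, else null after the timeout. */
    String next(long pollMillis) {
        if (heartbeatRequested.getAndSet(false)) {
            return HEARTBEAT;
        }
        try {
            return pendingWrites.poll(pollMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve interrupt status for the caller
            return null;
        }
    }
}
```

One trade-off: if the writer is already blocked in `poll` when the flag flips, the heartbeat waits up to `pollMillis`. The poll timeout should therefore stay well below the heartbeat interval.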




[jira] [Commented] (STORM-513) ShellBolt keeps sending heartbeats even when child process is hung

2014-10-24 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/STORM-513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14183361#comment-14183361
 ] 

ASF GitHub Bot commented on STORM-513:
--

Github user ptgoetz commented on the pull request:

https://github.com/apache/storm/pull/286#issuecomment-60438795
  
@itaifrenkel No, not one that's available via the bolt API, but it's an 
interesting idea. You could effectively do the same by making the scheduler 
static (1 per worker/JVM), but that feels kind of hacky.
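Here is a rough sketch of the static, one-per-JVM scheduler mentioned above, using only `java.util.concurrent`. `SharedHeartbeatScheduler` is a hypothetical name. Each bolt or spout instance would register its own periodic check and cancel it on cleanup.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

/**
 * Hypothetical JVM-wide scheduler: one static single-thread executor shared by
 * every shell component in the worker, instead of one timer thread per instance.
 */
final class SharedHeartbeatScheduler {
    private static final ScheduledExecutorService SCHEDULER =
            Executors.newSingleThreadScheduledExecutor(r -> {
                Thread t = new Thread(r, "shell-heartbeat");
                t.setDaemon(true); // never keep the worker JVM alive on its own
                return t;
            });

    private SharedHeartbeatScheduler() {}

    /** Registers a periodic check; cancel the returned future when the component shuts down. */
    static ScheduledFuture<?> register(Runnable check, long periodMillis) {
        return SCHEDULER.scheduleAtFixedRate(check, periodMillis, periodMillis, TimeUnit.MILLISECONDS);
    }
}
```

Two caveats may explain why this feels hacky. With one shared thread, a slow check delays every other component's check. Also, if a task throws, `scheduleAtFixedRate` suppresses all of its later runs, so each check should catch its own exceptions.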




[jira] [Commented] (STORM-513) ShellBolt keeps sending heartbeats even when child process is hung

2014-10-24 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/STORM-513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14183363#comment-14183363
 ] 

ASF GitHub Bot commented on STORM-513:
--

Github user ptgoetz commented on the pull request:

https://github.com/apache/storm/pull/286#issuecomment-60438969
  
Since there were additional commits added to the pull request, we need to 
give it more time for others to review before merging, but I am still +1 for 
the patch.




[jira] [Commented] (STORM-513) ShellBolt keeps sending heartbeats even when child process is hung

2014-10-12 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/STORM-513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14168594#comment-14168594
 ] 

ASF GitHub Bot commented on STORM-513:
--

Github user HeartSaVioR commented on a diff in the pull request:

https://github.com/apache/storm/pull/286#discussion_r18746971
  
--- Diff: storm-core/src/jvm/backtype/storm/spout/ShellSpout.java ---
@@ -56,13 +67,18 @@ public void open(Map stormConf, TopologyContext context,
 _collector = collector;
 _context = context;
 
+workerTimeoutMills = 1000 * RT.intCast(stormConf.get(Config.SUPERVISOR_WORKER_TIMEOUT_SECS));
+
 _process = new ShellProcess(_command);
 
 Number subpid = _process.launch(stormConf, context);
 LOG.info("Launched subprocess with pid " + subpid);
+
+heartBeatExecutorService = MoreExecutors.getExitingScheduledExecutorService(new ScheduledThreadPoolExecutor(1));
--- End diff --

@itaifrenkel 
MoreExecutors.getExitingScheduledExecutorService() accepts a 
ScheduledThreadPoolExecutor, not a ScheduledExecutorService. I tried to change 
it, but the compiler complained.
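If adding Guava's MoreExecutors is not desired, roughly the same effect can be had with the JDK alone. The sketch below is an alternative under that assumption, not what the pull request does: it gives a ScheduledThreadPoolExecutor a daemon ThreadFactory so an idle heartbeat scheduler never keeps the worker JVM alive. Unlike Guava's getExitingScheduledExecutorService, it registers no shutdown hook. The class name and thread name are illustrative.

```java
import java.util.concurrent.ScheduledThreadPoolExecutor;
import java.util.concurrent.ThreadFactory;

final class DaemonHeartbeatExecutors {
    private DaemonHeartbeatExecutors() {}

    // Builds a single-threaded scheduler whose worker thread is a daemon,
    // so an idle heartbeat checker does not prevent the JVM from exiting.
    static ScheduledThreadPoolExecutor newDaemonScheduler(final String threadName) {
        ThreadFactory factory = new ThreadFactory() {
            @Override
            public Thread newThread(Runnable r) {
                Thread t = new Thread(r, threadName);
                t.setDaemon(true);
                return t;
            }
        };
        // Returning the concrete type means callers that need
        // ScheduledThreadPoolExecutor-specific methods do not have to cast.
        return new ScheduledThreadPoolExecutor(1, factory);
    }
}
```

Because the constructor is called directly, the concrete type is available without the cast discussed in this thread.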




[jira] [Commented] (STORM-513) ShellBolt keeps sending heartbeats even when child process is hung

2014-10-12 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/STORM-513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14168599#comment-14168599
 ] 

ASF GitHub Bot commented on STORM-513:
--

Github user itaifrenkel commented on a diff in the pull request:

https://github.com/apache/storm/pull/286#discussion_r18747035
  
--- Diff: storm-core/src/jvm/backtype/storm/spout/ShellSpout.java ---
@@ -56,13 +67,18 @@ public void open(Map stormConf, TopologyContext context,
 _collector = collector;
 _context = context;
 
+workerTimeoutMills = 1000 * RT.intCast(stormConf.get(Config.SUPERVISOR_WORKER_TIMEOUT_SECS));
+
 _process = new ShellProcess(_command);
 
 Number subpid = _process.launch(stormConf, context);
 LOG.info("Launched subprocess with pid " + subpid);
+
+heartBeatExecutorService = MoreExecutors.getExitingScheduledExecutorService(new ScheduledThreadPoolExecutor(1));
--- End diff --

You could cast, but it's OK as it is.




[jira] [Commented] (STORM-513) ShellBolt keeps sending heartbeats even when child process is hung

2014-10-11 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/STORM-513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14168095#comment-14168095
 ] 

ASF GitHub Bot commented on STORM-513:
--

Github user itaifrenkel commented on a diff in the pull request:

https://github.com/apache/storm/pull/286#discussion_r18741096
  
--- Diff: storm-core/src/jvm/backtype/storm/spout/ShellSpout.java ---
@@ -189,9 +205,52 @@ private void handleLog(ShellMsg shellMsg) {
 
 @Override
 public void activate() {
+LOG.info("Start checking heartbeat...");
+// prevent timer to check heartbeat based on last thing before activate
+setHeartbeat();
+heartBeatTimer.scheduleAtFixedRate(new SpoutHeartbeatTimerTask(this), 1000, 1 * 1000);
 }
 
 @Override
 public void deactivate() {
+heartBeatTimer.cancel();
+}
+
+private void setHeartbeat() {
+lastHeartbeatTimestamp.set(System.currentTimeMillis());
+}
+
+private long getLastHeartbeat() {
+return lastHeartbeatTimestamp.get();
+}
+
+private void die(Throwable exception) {
+heartBeatTimer.cancel();
+
+LOG.error("Halting process: ShellSpout died.", exception);
+_collector.reportError(exception);
+System.exit(11);
 }
+
+private class SpoutHeartbeatTimerTask extends TimerTask {
+private ShellSpout spout;
+
+public SpoutHeartbeatTimerTask(ShellSpout spout) {
+this.spout = spout;
+}
+
+@Override
+public void run() {
+long currentTimeMillis = System.currentTimeMillis();
+long lastHeartbeat = getLastHeartbeat();
+
+LOG.debug("current time : " + currentTimeMillis + ", last heartbeat : " + lastHeartbeat
--- End diff --

When doing debug logging, try to refrain from using + (string concatenation). 
Either surround the call with isDebugEnabled() or use a parameterized message 
such as "current time : {}, last heartbeat : {}".
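slf4j is not part of the JDK, so the sketch below uses java.util.logging as a stand-in to show the same three options side by side; JUL's placeholders are MessageFormat-style {0}, {1} rather than slf4j's {}. The class, the render() counter and the method names are illustrative, added only to make the cost of each option observable.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

final class HeartbeatDebugLogging {
    static final Logger LOG = Logger.getLogger("HeartbeatDebugLogging");

    // Counts how often an argument is turned into a String, to make the cost visible.
    static int renderCount = 0;

    static String render(long value) {
        renderCount++;
        return Long.toString(value);
    }

    // 1) Concatenation: the message string is built even when FINE is disabled.
    static void logEager(long now, long last) {
        LOG.fine("current time : " + render(now) + ", last heartbeat : " + render(last));
    }

    // 2) Guarded: the string is only built when FINE is actually enabled.
    static void logGuarded(long now, long last) {
        if (LOG.isLoggable(Level.FINE)) {
            LOG.fine("current time : " + render(now) + ", last heartbeat : " + render(last));
        }
    }

    // 3) Parameterized: arguments are passed through and only formatted if the
    //    record is published (slf4j: "current time : {}, last heartbeat : {}").
    static void logParameterized(long now, long last) {
        LOG.log(Level.FINE, "current time : {0}, last heartbeat : {1}", new Object[] {now, last});
    }
}
```

With the logger at INFO, only the first variant pays for building the string on every timer tick.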




[jira] [Commented] (STORM-513) ShellBolt keeps sending heartbeats even when child process is hung

2014-10-11 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/STORM-513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14168096#comment-14168096
 ] 

ASF GitHub Bot commented on STORM-513:
--

Github user itaifrenkel commented on a diff in the pull request:

https://github.com/apache/storm/pull/286#discussion_r18741104
  
--- Diff: storm-core/src/jvm/backtype/storm/spout/ShellSpout.java ---
@@ -189,9 +205,52 @@ private void handleLog(ShellMsg shellMsg) {
 
 @Override
 public void activate() {
+LOG.info("Start checking heartbeat...");
+// prevent timer to check heartbeat based on last thing before activate
+setHeartbeat();
+heartBeatTimer.scheduleAtFixedRate(new SpoutHeartbeatTimerTask(this), 1000, 1 * 1000);
--- End diff --

Could you please add a bolt configuration to disable it? Some of our Python 
bolts are experimental and unanchored. I would not want to crash the entire 
worker, since that would incur serious downtime.




[jira] [Commented] (STORM-513) ShellBolt keeps sending heartbeats even when child process is hung

2014-10-11 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/STORM-513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14168106#comment-14168106
 ] 

ASF GitHub Bot commented on STORM-513:
--

Github user itaifrenkel commented on a diff in the pull request:

https://github.com/apache/storm/pull/286#discussion_r18741180
  
--- Diff: storm-core/src/jvm/backtype/storm/spout/ShellSpout.java ---
@@ -189,9 +205,52 @@ private void handleLog(ShellMsg shellMsg) {
 
 @Override
 public void activate() {
+LOG.info("Start checking heartbeat...");
+// prevent timer to check heartbeat based on last thing before activate
+setHeartbeat();
+heartBeatTimer.scheduleAtFixedRate(new SpoutHeartbeatTimerTask(this), 1000, 1 * 1000);
 }
 
 @Override
 public void deactivate() {
+heartBeatTimer.cancel();
+}
+
+private void setHeartbeat() {
+lastHeartbeatTimestamp.set(System.currentTimeMillis());
+}
+
+private long getLastHeartbeat() {
+return lastHeartbeatTimestamp.get();
+}
+
+private void die(Throwable exception) {
+heartBeatTimer.cancel();
+
+LOG.error("Halting process: ShellSpout died.", exception);
+_collector.reportError(exception);
+System.exit(11);
--- End diff --

All of our Python and multilang bolts have special code that intercepts the 
SIGTERM signal and terminates when the parent process dies. This has not been 
contributed back since it is very Linux-specific and logger-specific. Without 
it you might end up with zombie worker processes. This does not relate to your 
commit, since you didn't invent the System.exit(11) thing; however, it would 
make things worse when a process is not responding. Ideally you would at least 
want to call process.destroy() first. Since process.destroy() is implemented 
without kill -9, it is not guaranteed to work (Sigar implements this per OS 
quite nicely).
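A minimal sketch of the order of operations suggested here: ask the subprocess to stop with Process.destroy(), wait briefly, and only then fall back to Process.destroyForcibly(). It assumes Java 8 or later (isAlive(), waitFor(long, TimeUnit) and destroyForcibly() were added in Java 8), and the helper name and grace period are illustrative. Which signal each call sends is platform-dependent; on Unix-like JDKs it is typically SIGTERM for destroy() and SIGKILL for destroyForcibly().

```java
import java.util.concurrent.TimeUnit;

final class SubprocessTerminator {
    private SubprocessTerminator() {}

    /**
     * Tries a graceful stop first, then a forcible one.
     * Returns true if the subprocess is no longer alive afterwards.
     */
    static boolean terminate(Process process, long graceMillis) throws InterruptedException {
        if (!process.isAlive()) {
            return true;
        }
        // Polite request; a badly hung child may not act on it.
        process.destroy();
        if (process.waitFor(graceMillis, TimeUnit.MILLISECONDS)) {
            return true;
        }
        // Last resort for a child that ignored the first request.
        process.destroyForcibly();
        return process.waitFor(graceMillis, TimeUnit.MILLISECONDS);
    }
}
```

In a die() path, a call like this would run before reportError(...) and System.exit(11), so a hung child is less likely to outlive the worker.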





[jira] [Commented] (STORM-513) ShellBolt keeps sending heartbeats even when child process is hung

2014-10-11 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/STORM-513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14168108#comment-14168108
 ] 

ASF GitHub Bot commented on STORM-513:
--

Github user itaifrenkel commented on a diff in the pull request:

https://github.com/apache/storm/pull/286#discussion_r18741189
  
--- Diff: storm-core/src/dev/resources/storm.js ---
@@ -243,6 +243,12 @@ BasicBolt.prototype.__emit = function(commandDetails) {
 BasicBolt.prototype.handleNewCommand = function(command) {
 var self = this;
 var tup = new Tuple(command["id"], command["comp"], command["stream"], command["task"], command["tuple"]);
+
+if (tup.task == -1 && tup.stream == "__heartbeat") {
+self.sync();
+return;
+}
+
 var callback = function(err) {
   if (err) {
--- End diff --

storm.py, storm.rb, and storm.js are each committed three times, so every 
change needs to be made in all three copies. (Yeah, I know.)
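In the storm.js diff above, the subprocess treats a tuple with task -1 on the __heartbeat stream as a heartbeat and answers it with sync instead of passing it to user code. A minimal Java sketch of that dispatch rule follows; the MultilangTuple type and the two callbacks are hypothetical stand-ins for the real multilang I/O, not part of any Storm API.

```java
import java.util.function.Consumer;

final class HeartbeatDispatch {
    // Hypothetical stand-in for a tuple received over the multilang protocol.
    static final class MultilangTuple {
        final int task;
        final String stream;

        MultilangTuple(int task, String stream) {
            this.task = task;
            this.stream = stream;
        }
    }

    private HeartbeatDispatch() {}

    // Mirrors the storm.js check: task == -1 && stream == "__heartbeat".
    static boolean isHeartbeat(MultilangTuple tup) {
        return tup.task == -1 && "__heartbeat".equals(tup.stream);
    }

    // Heartbeats are answered with a sync and never reach user code;
    // every other tuple is handed to the bolt's processing callback.
    static void handle(MultilangTuple tup, Runnable sendSync, Consumer<MultilangTuple> process) {
        if (isHeartbeat(tup)) {
            sendSync.run();
            return;
        }
        process.accept(tup);
    }
}
```

Both conditions are required: a regular tuple on some other stream with task -1, or a __heartbeat-named stream from a real task, is still processed normally.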




[jira] [Commented] (STORM-513) ShellBolt keeps sending heartbeats even when child process is hung

2014-10-11 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/STORM-513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14168114#comment-14168114
 ] 

ASF GitHub Bot commented on STORM-513:
--

Github user itaifrenkel commented on a diff in the pull request:

https://github.com/apache/storm/pull/286#discussion_r18741220
  
--- Diff: storm-core/src/jvm/backtype/storm/spout/ShellSpout.java ---
@@ -56,13 +64,18 @@ public void open(Map stormConf, TopologyContext context,
 _collector = collector;
 _context = context;
 
+workerTimeoutMills = 1000 * RT.intCast(stormConf.get(Config.SUPERVISOR_WORKER_TIMEOUT_SECS));
+
 _process = new ShellProcess(_command);
 
 Number subpid = _process.launch(stormConf, context);
 LOG.info("Launched subprocess with pid " + subpid);
+
+heartBeatTimer = new Timer(context.getThisTaskId() + "-heartbeatTimer", true);
--- End diff --

see also 
http://stackoverflow.com/questions/17419386/scheduledexecutorservice-how-to-stop-action-without-stopping-executor
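The linked question matters because the ShellSpout diffs in this thread call heartBeatTimer.cancel() in deactivate(), and a java.util.Timer that has been cancelled rejects any further scheduling with an IllegalStateException, so a later activate() could not restart the check. A minimal sketch of the alternative, assuming one long-lived ScheduledExecutorService per task: keep the ScheduledFuture returned by scheduleAtFixedRate and cancel only that. Class and method names are illustrative.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

final class RestartableHeartbeatCheck {
    private final ScheduledExecutorService executor = Executors.newSingleThreadScheduledExecutor();
    private final AtomicInteger checks = new AtomicInteger();
    private ScheduledFuture<?> checkFuture;

    // Starts (or restarts) the periodic check without replacing the executor.
    synchronized void activate(long periodMillis) {
        if (checkFuture == null) {
            checkFuture = executor.scheduleAtFixedRate(
                    checks::incrementAndGet, periodMillis, periodMillis, TimeUnit.MILLISECONDS);
        }
    }

    // Stops only the scheduled action; the executor stays usable for the next activate().
    synchronized void deactivate() {
        if (checkFuture != null) {
            checkFuture.cancel(false);
            checkFuture = null;
        }
    }

    // Releases the thread for good, e.g. when the spout is closed.
    void close() {
        executor.shutdownNow();
    }

    int checkCount() {
        return checks.get();
    }
}
```

An activate/deactivate/activate sequence works here, whereas the same sequence against a cancelled Timer would throw.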
 




[jira] [Commented] (STORM-513) ShellBolt keeps sending heartbeats even when child process is hung

2014-10-11 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/STORM-513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14168130#comment-14168130
 ] 

ASF GitHub Bot commented on STORM-513:
--

Github user HeartSaVioR commented on a diff in the pull request:

https://github.com/apache/storm/pull/286#discussion_r18741389
  
--- Diff: storm-core/src/jvm/backtype/storm/spout/ShellSpout.java ---
@@ -189,9 +205,52 @@ private void handleLog(ShellMsg shellMsg) {
 
 @Override
 public void activate() {
+LOG.info("Start checking heartbeat...");
+// prevent timer to check heartbeat based on last thing before activate
+setHeartbeat();
+heartBeatTimer.scheduleAtFixedRate(new SpoutHeartbeatTimerTask(this), 1000, 1 * 1000);
 }
 
 @Override
 public void deactivate() {
+heartBeatTimer.cancel();
+}
+
+private void setHeartbeat() {
+lastHeartbeatTimestamp.set(System.currentTimeMillis());
+}
+
+private long getLastHeartbeat() {
+return lastHeartbeatTimestamp.get();
+}
+
+private void die(Throwable exception) {
+heartBeatTimer.cancel();
+
+LOG.error("Halting process: ShellSpout died.", exception);
+_collector.reportError(exception);
+System.exit(11);
 }
+
+private class SpoutHeartbeatTimerTask extends TimerTask {
+private ShellSpout spout;
+
+public SpoutHeartbeatTimerTask(ShellSpout spout) {
+this.spout = spout;
+}
+
+@Override
+public void run() {
+long currentTimeMillis = System.currentTimeMillis();
+long lastHeartbeat = getLastHeartbeat();
+
+LOG.debug("current time : " + currentTimeMillis + ", last heartbeat : " + lastHeartbeat
--- End diff --

@itaifrenkel Oh, you're right! It's slf4j, so we should use {} placeholders to 
log efficiently.
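The SpoutHeartbeatTimerTask in the diff above is cut off right after it reads the two timestamps. A minimal sketch of the comparison such a task would presumably make, assuming it treats the subprocess as hung once the last heartbeat is older than the worker timeout (the diff derives workerTimeoutMills from SUPERVISOR_WORKER_TIMEOUT_SECS). The class name and the injected clock are illustrative, not taken from the pull request.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.LongSupplier;

final class HeartbeatMonitor {
    private final AtomicLong lastHeartbeatTimestamp = new AtomicLong();
    private final long workerTimeoutMillis;
    // Injectable clock so the check can be exercised without sleeping.
    private final LongSupplier clock;

    HeartbeatMonitor(long workerTimeoutMillis, LongSupplier clock) {
        this.workerTimeoutMillis = workerTimeoutMillis;
        this.clock = clock;
        // Start fresh, as activate() does with setHeartbeat().
        setHeartbeat();
    }

    // In this sketch, called whenever the subprocess sends any message back.
    void setHeartbeat() {
        lastHeartbeatTimestamp.set(clock.getAsLong());
    }

    // What the periodic timer task would evaluate on each tick.
    boolean isTimedOut() {
        long elapsed = clock.getAsLong() - lastHeartbeatTimestamp.get();
        return elapsed > workerTimeoutMillis;
    }
}
```

When isTimedOut() returns true, the timer task would call die(...) with an exception describing the stale heartbeat.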




[jira] [Commented] (STORM-513) ShellBolt keeps sending heartbeats even when child process is hung

2014-10-11 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/STORM-513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14168132#comment-14168132
 ] 

ASF GitHub Bot commented on STORM-513:
--

Github user HeartSaVioR commented on a diff in the pull request:

https://github.com/apache/storm/pull/286#discussion_r18741401
  
--- Diff: storm-core/src/jvm/backtype/storm/spout/ShellSpout.java ---
@@ -189,9 +205,52 @@ private void handleLog(ShellMsg shellMsg) {
 
 @Override
 public void activate() {
+LOG.info("Start checking heartbeat...");
+// prevent timer to check heartbeat based on last thing before activate
+setHeartbeat();
+heartBeatTimer.scheduleAtFixedRate(new SpoutHeartbeatTimerTask(this), 1000, 1 * 1000);
 }
 
 @Override
 public void deactivate() {
+heartBeatTimer.cancel();
+}
+
+private void setHeartbeat() {
+lastHeartbeatTimestamp.set(System.currentTimeMillis());
+}
+
+private long getLastHeartbeat() {
+return lastHeartbeatTimestamp.get();
+}
+
+private void die(Throwable exception) {
+heartBeatTimer.cancel();
+
+LOG.error("Halting process: ShellSpout died.", exception);
+_collector.reportError(exception);
+System.exit(11);
--- End diff --

@itaifrenkel I agree that we should call process.destroy() before terminating 
the worker itself. 
(It is maintained by the JDK and implemented with JNI, so its behavior is 
OS-specific.)
I also think that, since the Storm project tries to support Windows, handling 
SIGTERM may not be a solution.
I'll change it to call process.destroy() first.




[jira] [Commented] (STORM-513) ShellBolt keeps sending heartbeats even when child process is hung

2014-10-11 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/STORM-513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14168144#comment-14168144
 ] 

ASF GitHub Bot commented on STORM-513:
--

Github user HeartSaVioR commented on a diff in the pull request:

https://github.com/apache/storm/pull/286#discussion_r18741463
  
--- Diff: storm-core/src/dev/resources/storm.js ---
@@ -243,6 +243,12 @@ BasicBolt.prototype.__emit = function(commandDetails) {
 BasicBolt.prototype.handleNewCommand = function(command) {
 var self = this;
 var tup = new Tuple(command["id"], command["comp"], command["stream"], command["task"], command["tuple"]);
+
+if (tup.task == -1 && tup.stream == "__heartbeat") {
+self.sync();
+return;
+}
+
 var callback = function(err) {
   if (err) {
--- End diff --

@itaifrenkel Sorry, I don't understand what you mean. Could you please describe 
it more clearly? I committed all the changes in one commit.




[jira] [Commented] (STORM-513) ShellBolt keeps sending heartbeats even when child process is hung

2014-10-11 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/STORM-513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14168149#comment-14168149
 ] 

ASF GitHub Bot commented on STORM-513:
--

Github user HeartSaVioR commented on a diff in the pull request:

https://github.com/apache/storm/pull/286#discussion_r18741577
  
--- Diff: storm-core/src/jvm/backtype/storm/spout/ShellSpout.java ---
@@ -56,13 +64,18 @@ public void open(Map stormConf, TopologyContext context,
 _collector = collector;
 _context = context;
 
+workerTimeoutMills = 1000 * RT.intCast(stormConf.get(Config.SUPERVISOR_WORKER_TIMEOUT_SECS));
+
 _process = new ShellProcess(_command);
 
 Number subpid = _process.launch(stormConf, context);
 LOG.info("Launched subprocess with pid " + subpid);
+
+heartBeatTimer = new Timer(context.getThisTaskId() + "-heartbeatTimer", true);
--- End diff --

@itaifrenkel Users can enable tick tuples and use them for their own purposes. 
Also, a long tick interval (N minutes) cannot be used for the heartbeat. 
So we shouldn't use tick tuples.




[jira] [Commented] (STORM-513) ShellBolt keeps sending heartbeats even when child process is hung

2014-10-11 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/STORM-513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14168157#comment-14168157
 ] 

ASF GitHub Bot commented on STORM-513:
--

Github user HeartSaVioR commented on a diff in the pull request:

https://github.com/apache/storm/pull/286#discussion_r18741678
  
--- Diff: storm-core/src/dev/resources/storm.js ---
@@ -243,6 +243,12 @@ BasicBolt.prototype.__emit = function(commandDetails) {
 BasicBolt.prototype.handleNewCommand = function(command) {
 var self = this;
 var tup = new Tuple(command["id"], command["comp"], command["stream"], command["task"], command["tuple"]);
+
+if (tup.task == -1 && tup.stream == "__heartbeat") {
+self.sync();
+return;
+}
+
 var callback = function(err) {
   if (err) {
--- End diff --

@itaifrenkel Oh, I see... It exists 3 times. I think only one copy should 
exist, but I'm not sure that's possible.
I'll apply my changes to the other files.




[jira] [Commented] (STORM-513) ShellBolt keeps sending heartbeats even when child process is hung

2014-10-11 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/STORM-513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14168217#comment-14168217
 ] 

ASF GitHub Bot commented on STORM-513:
--

Github user itaifrenkel commented on a diff in the pull request:

https://github.com/apache/storm/pull/286#discussion_r18742262
  
--- Diff: storm-core/src/jvm/backtype/storm/spout/ShellSpout.java ---
@@ -189,9 +207,53 @@ private void handleLog(ShellMsg shellMsg) {
 
 @Override
 public void activate() {
+LOG.info("Start checking heartbeat...");
+// prevent timer to check heartbeat based on last thing before activate
+setHeartbeat();
+heartBeatExecutor.scheduleAtFixedRate(new SpoutHeartbeatTimerTask(this), 1, 1, TimeUnit.SECONDS);
--- End diff --

I would recommend scheduleWithFixedDelay, since it is more explicit about how 
it uses a single thread. Again, ask the committers.




[jira] [Commented] (STORM-513) ShellBolt keeps sending heartbeats even when child process is hung

2014-10-11 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/STORM-513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14168533#comment-14168533
 ] 

ASF GitHub Bot commented on STORM-513:
--

Github user itaifrenkel commented on a diff in the pull request:

https://github.com/apache/storm/pull/286#discussion_r18745857
  
--- Diff: storm-core/src/jvm/backtype/storm/spout/ShellSpout.java ---
@@ -189,9 +207,53 @@ private void handleLog(ShellMsg shellMsg) {
 
 @Override
 public void activate() {
+LOG.info("Start checking heartbeat...");
+// prevent timer to check heartbeat based on last thing before activate
+setHeartbeat();
+heartBeatExecutor.scheduleAtFixedRate(new SpoutHeartbeatTimerTask(this), 1, 1, TimeUnit.SECONDS);
--- End diff --
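The diff refreshes the heartbeat timestamp before it schedules the periodic check, so that the first check does not compare against a timestamp left over from before activate(). A minimal sketch of that ordering, assuming a 30-second timeout; HeartbeatWatchdog, its fields, and the injectable clock are hypothetical names for illustration, not the actual ShellSpout or SpoutHeartbeatTimerTask code:

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.LongSupplier;

// Hypothetical watchdog illustrating "refresh, then schedule"; not Storm's actual classes.
final class HeartbeatWatchdog {
    private final AtomicLong lastHeartbeatMs = new AtomicLong(0L);
    private final long timeoutMs;
    private final LongSupplier clockMs;   // injectable so checks can be tested without sleeping
    private final Runnable onTimeout;     // what to do on a hang (assumed: report and shut down)
    private ScheduledExecutorService executor;

    HeartbeatWatchdog(long timeoutMs, LongSupplier clockMs, Runnable onTimeout) {
        this.timeoutMs = timeoutMs;
        this.clockMs = clockMs;
        this.onTimeout = onTimeout;
    }

    // Record "the subprocess was alive at this moment".
    void setHeartbeat() {
        lastHeartbeatMs.set(clockMs.getAsLong());
    }

    // One periodic check; fires onTimeout and returns true if the timeout was exceeded.
    boolean checkOnce() {
        long elapsed = clockMs.getAsLong() - lastHeartbeatMs.get();
        if (elapsed > timeoutMs) {
            onTimeout.run();
            return true;
        }
        return false;
    }

    void activate() {
        // Refresh first: otherwise the first check could see a stale timestamp
        // (the initial 0, or the last message received before a deactivate).
        setHeartbeat();
        executor = Executors.newSingleThreadScheduledExecutor();
        executor.scheduleAtFixedRate(this::checkOnce, 1, 1, TimeUnit.SECONDS);
    }

    void deactivate() {
        if (executor != null) {
            executor.shutdownNow();
        }
    }
}
```

Injecting the clock keeps the timeout arithmetic separate from the scheduling, so the check itself can be exercised without waiting on the executor.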

When having a single thread there is not much difference between the two.
Moreover, your callback is non-blocking, so the difference is insignificant.
Nevertheless, even if the timing were entirely inaccurate, the code logic would
stay intact. There is nothing in that callback that requires exactly one second.
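To make the reviewer's point concrete, assuming "the two" refers to ScheduledExecutorService.scheduleAtFixedRate and scheduleWithFixedDelay: per the JDK documentation, fixed-rate runs are anchored to initialDelay + n * period (a late run starts late but never overlaps the previous one), while fixed-delay runs start a fixed delay after the previous run finishes. The toy model below (ScheduleModel is a hypothetical name; it computes start times rather than calling the JDK) shows how little the two diverge when the callback is fast:

```java
// Toy model of start times on a single thread; not the JDK's implementation.
final class ScheduleModel {
    // Fixed rate: run n is due at initialDelay + n * period, but cannot start
    // before run n-1 has finished because there is only one thread.
    static long[] fixedRateStarts(long initialDelay, long period, long taskDuration, int runs) {
        long[] starts = new long[runs];
        long prevEnd = 0L;
        for (int n = 0; n < runs; n++) {
            long nominal = initialDelay + n * period;
            starts[n] = (n == 0) ? nominal : Math.max(nominal, prevEnd);
            prevEnd = starts[n] + taskDuration;
        }
        return starts;
    }

    // Fixed delay: run n starts `delay` after run n-1 ends, so the task's
    // own duration accumulates as drift.
    static long[] fixedDelayStarts(long initialDelay, long delay, long taskDuration, int runs) {
        long[] starts = new long[runs];
        for (int n = 0; n < runs; n++) {
            starts[n] = (n == 0) ? initialDelay : starts[n - 1] + taskDuration + delay;
        }
        return starts;
    }
}
```

With a 1 ms callback and a 1 s period, fixedRateStarts(1000, 1000, 1, 4) gives 1000, 2000, 3000, 4000 while fixedDelayStarts(1000, 1000, 1, 4) gives 1000, 2001, 3002, 4003: about 1 ms of drift per run, which is negligible for a heartbeat check.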




[jira] [Commented] (STORM-513) ShellBolt keeps sending heartbeats even when child process is hung

2014-10-08 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/STORM-513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14163526#comment-14163526
 ] 

ASF GitHub Bot commented on STORM-513:
--

GitHub user HeartSaVioR opened a pull request:

https://github.com/apache/storm/pull/286

STORM-513 check heartbeat from multilang subprocess

Related issue link: https://issues.apache.org/jira/browse/STORM-513

It seems that ShellSpout and ShellBolt don't check the subprocess; they set the
heartbeat based only on their own state.
The subprocess could hang without affecting ShellSpout / ShellBolt; it just
stops working on tuples.
It's better to check for a heartbeat from the subprocess, and to have
ShellSpout / ShellBolt suicide if the subprocess stops working.

* Spout
  * ShellSpout sends next to the subprocess continuously
  * the subprocess sends sync to ShellSpout when next is received
  * so we can treat sync, or any other message, as a heartbeat
* Bolt
  * ShellBolt sends tuples to the subprocess only when tuples are available
  * so we need to send a heartbeat tuple
  * the subprocess sends sync to ShellBolt when the heartbeat tuple is received
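The exchange described above can be simulated in a few lines. This is a single-threaded toy, not the multilang implementation: FakeSubprocess, BoltSide, and heartbeatRound are hypothetical names, messages are plain strings instead of the protocol's JSON, the "__heartbeat" stream name is an assumption, and time is passed in explicitly so the timeout logic stays deterministic.

```java
// Single-threaded toy simulation of the proposed heartbeat exchange (hypothetical names).
final class HeartbeatProtocolSim {
    static final String HEARTBEAT_STREAM = "__heartbeat"; // assumed name, illustration only

    // Stand-in for the multilang subprocess: replies "sync" to every tuple unless hung.
    static final class FakeSubprocess {
        boolean hung = false;

        String onTuple(String stream) {
            return hung ? null : "sync"; // null models "no reply ever arrives"
        }
    }

    // Stand-in for the ShellBolt side's bookkeeping.
    static final class BoltSide {
        private final long timeoutMs;
        private long lastHeartbeatMs;

        BoltSide(long timeoutMs, long nowMs) {
            this.timeoutMs = timeoutMs;
            this.lastHeartbeatMs = nowMs;
        }

        // Any message read back from the subprocess counts as a heartbeat.
        void onMessageFromSubprocess(String msg, long nowMs) {
            if (msg != null) {
                lastHeartbeatMs = nowMs;
            }
        }

        boolean isHung(long nowMs) {
            return nowMs - lastHeartbeatMs > timeoutMs;
        }
    }

    // One round: the bolt sends a heartbeat tuple and reads whatever the subprocess replies.
    static void heartbeatRound(BoltSide bolt, FakeSubprocess proc, long nowMs) {
        String reply = proc.onTuple(HEARTBEAT_STREAM);
        bolt.onMessageFromSubprocess(reply, nowMs);
    }
}
```

A reply to a regular tuple would refresh the timestamp the same way; the heartbeat tuple only guarantees there is something to reply to when no real tuples are flowing.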

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/HeartSaVioR/storm STORM-513

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/storm/pull/286.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #286


commit ca5874cdf11af8d835335d228b643f28aeb3f9c3
Author: Jungtaek Lim kabh...@gmail.com
Date:   2014-10-08T13:48:01Z

STORM-513 check heartbeat from multilang subprocess

* Spout
** ShellSpout sends next to the subprocess continuously
** the subprocess sends sync to ShellSpout when next is received
** so we can treat sync, or any other message, as a heartbeat
* Bolt
** ShellBolt sends tuples to the subprocess only when tuples are available
** so we need to send a heartbeat tuple
** the subprocess sends sync to ShellBolt when the heartbeat tuple is received

commit 1a0d4bdd735ba0ade42f6777a4c47affec931557
Author: Jungtaek Lim kabh...@gmail.com
Date:   2014-10-08T14:06:24Z

Fix mixed tab / space, remove FIXME






[jira] [Commented] (STORM-513) ShellBolt keeps sending heartbeats even when child process is hung

2014-10-08 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/STORM-513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14163530#comment-14163530
 ] 

ASF GitHub Bot commented on STORM-513:
--

Github user dan-blanchard commented on the pull request:

https://github.com/apache/storm/pull/286#issuecomment-58363290
  
As the person who filed the issue: thanks for coming up with a solution so
quickly!

