Thanks Eric! I created https://issues.apache.org/jira/browse/ZEPPELIN-798
- Migrate to Jetty version 9 that has fix for websocket deadlock bug
causing Zeppelin server hangs. This is pretty important for us so please
let me know how I can help.

For now, I have made some changes to reduce websocket traffic and the
probability of hangs:

   - For the LIST_NOTES operation, I use broadcastNoteList(conn), which sends
   the note list only to the current connection, instead of broadcastAll.
   What is the reason for using broadcastAll?
   - I removed synchronized (noteSocketMap) from broadcast so that one bad
   socket does not hang the server. Do you think this can cause serious
   problems?
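
To make the second change concrete, here is a minimal sketch of one possible
middle ground: keep the lock, but hold it only long enough to copy the socket
list, and send outside it. This is not the actual NotebookServer code. The
class BroadcastSketch, the Socket interface, and addConnection are
hypothetical stand-ins; only the names noteSocketMap and broadcast come from
the description above. Note that a send that blocks still stalls the calling
thread; the sketch only avoids blocking other threads that need the map.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class BroadcastSketch {
    /** Hypothetical stand-in for a websocket connection. */
    public interface Socket {
        void send(String msg) throws Exception;
    }

    // Maps a note id to the sockets currently viewing that note.
    private final Map<String, List<Socket>> noteSocketMap = new HashMap<>();

    public void addConnection(String noteId, Socket socket) {
        synchronized (noteSocketMap) {
            noteSocketMap.computeIfAbsent(noteId, k -> new ArrayList<>()).add(socket);
        }
    }

    /** Sends msg to every socket of the note; returns how many sends succeeded. */
    public int broadcast(String noteId, String msg) {
        List<Socket> snapshot;
        // Hold the lock only while copying the list, never during network I/O.
        synchronized (noteSocketMap) {
            List<Socket> sockets = noteSocketMap.get(noteId);
            if (sockets == null) {
                return 0;
            }
            snapshot = new ArrayList<>(sockets);
        }
        int delivered = 0;
        for (Socket socket : snapshot) {
            try {
                socket.send(msg);
                delivered++;
            } catch (Exception e) {
                // A failing socket is skipped so the others still get the message.
            }
        }
        return delivered;
    }
}
```

Copying under the lock keeps iteration safe from concurrent modification,
which is the main risk I see in dropping synchronized entirely.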


On Thu, Apr 7, 2016 at 3:06 AM, Eric Charles <e...@apache.org> wrote:

> On 07/04/16 07:18, Prasad Wagle wrote:
>
>> Hi,
>>
>> We experienced three Zeppelin server hangs today. I have included one of
>> the stack traces below. It is similar to the stack trace in a websocket
>> deadlock bug in Jetty 8. From the bug report
>> <https://bugs.eclipse.org/bugs/show_bug.cgi?id=389645>:
>>
>>     However, Jetty 9 has already refactored the low level read/write on
>>     a socket heavily to compensate for websocket, spdy, and http/2
>>     Marking this as WONTFIX for Jetty 7/8
>>     Use Jetty 9
>>
>>
>> Is there a workaround? Has anyone tried using Jetty 9 in Zeppelin? What
>> is the effort involved?
>>
>
>
> I have upgraded the source code to Jetty 9 which implies a few different
> constructs.
>
> Could you open a JIRA? I will then submit a PR.
>
>
>> Thanks,
>> Prasad
>>
>>
>> *Stack trace*
>>
>>
>> "pool-1-thread-10" #141 prio=5 os_prio=0 tid=0x0000000001513000 nid=0x6749 in Object.wait() [0x00007fdab6ff4000]
>>     java.lang.Thread.State: TIMED_WAITING (on object monitor)
>>          at java.lang.Object.wait(Native Method)
>>          at org.eclipse.jetty.io.nio.SelectChannelEndPoint.blockWritable(SelectChannelEndPoint.java:494)
>>          - locked <0x00000006c50d9b48> (a org.eclipse.jetty.io.nio.SelectChannelEndPoint)
>>          at org.eclipse.jetty.io.nio.SslConnection$SslEndPoint.blockWritable(SslConnection.java:723)
>>          at org.eclipse.jetty.websocket.WebSocketGeneratorRFC6455.flush(WebSocketGeneratorRFC6455.java:248)
>>          at org.eclipse.jetty.websocket.WebSocketGeneratorRFC6455.addFrame(WebSocketGeneratorRFC6455.java:114)
>>          at org.eclipse.jetty.websocket.WebSocketConnectionRFC6455$WSFrameConnection.sendMessage(WebSocketConnectionRFC6455.java:439)
>>          at org.apache.zeppelin.socket.NotebookSocket.send(NotebookSocket.java:89)
>>          at org.apache.zeppelin.socket.NotebookServer.broadcast(NotebookServer.java:286)
>>          - locked <0x00000006c3a1cd08> (a java.util.HashMap)
>>          at org.apache.zeppelin.socket.NotebookServer.broadcastNote(NotebookServer.java:370)
>>          at org.apache.zeppelin.socket.NotebookServer$ParagraphJobListener.afterStatusChange(NotebookServer.java:945)
>>          at org.apache.zeppelin.scheduler.Job.setStatus(Job.java:143)
>>          at org.apache.zeppelin.scheduler.RemoteScheduler$JobRunner.afterStatusChange(RemoteScheduler.java:379)
>>          at org.apache.zeppelin.scheduler.RemoteScheduler$JobStatusPoller.getStatus(RemoteScheduler.java:261)
>>          - locked <0x00000006c5885178> (a org.apache.zeppelin.scheduler.RemoteScheduler$JobStatusPoller)
>>          at org.apache.zeppelin.scheduler.RemoteScheduler$JobRunner.run(RemoteScheduler.java:335)
>>          at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>>          at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>>          at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
>>          at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
>>          at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>>          at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>>          at java.lang.Thread.run(Thread.java:745)
>>
>
