Hi, thank you Eric, upgrading Jetty sounds like a great idea!
Prasad, I think broadcastAll and the synchronization of note<->client_connection are used by default to support real-time collaboration, where several people work on the same Note at once: every other client that has the Note open is notified about the changes you make in your browser tab (you can see this with 2 different tabs open).

I believe it might be possible to replace the map with a concurrent implementation to avoid excessive synchronization, though, as we did in [1] before.

If the same behaviour persists after upgrading to Jetty 9, could you please create a separate issue for that? I will be happy to help and look into it more.

Thanks!

1. https://issues.apache.org/jira/browse/ZEPPELIN-312

--
Alex

On Fri, Apr 8, 2016 at 1:28 AM, Prasad Wagle <prasadwa...@gmail.com> wrote:

> Thanks Eric! I created https://issues.apache.org/jira/browse/ZEPPELIN-798
> - Migrate to Jetty version 9 that has fix for websocket deadlock bug
> causing Zeppelin server hangs. This is pretty important for us so please
> let me know how I can help.
>
> For now, I have made some changes to reduce websocket communications and
> probability of hangs:
>
>    - For the LIST_NOTES operation, I use broadcastNoteList(conn) that
>    sends note list to the current connection instead of using broadcastAll.
>    What is the reason for using broadcastAll?
>    - I removed synchronized (noteSocketMap) from broadcast so that one
>    bad socket does not hang the server. Do you think this can cause serious
>    problems?
>
>
> On Thu, Apr 7, 2016 at 3:06 AM, Eric Charles <e...@apache.org> wrote:
>
>> On 07/04/16 07:18, Prasad Wagle wrote:
>>
>>> Hi,
>>>
>>> We experienced three Zeppelin server hangs today. I have included one of
>>> the stack traces below. It is similar to the stack trace in a websocket
>>> deadlock bug in Jetty 8.
>>> From the bug report
>>> <https://bugs.eclipse.org/bugs/show_bug.cgi?id=389645>:
>>>
>>>     However, Jetty 9 has already refactored the low level read/write on
>>>     a socket heavily to compensate for websocket, spdy, and http/2
>>>     Marking this as WONTFIX for Jetty 7/8
>>>     Use Jetty 9
>>>
>>>
>>> Is there a workaround? Has anyone tried using Jetty 9 in Zeppelin? What
>>> is the effort involved?
>>>
>>
>>
>> I have upgraded the source code to Jetty 9 which implies a few different
>> constructs.
>>
>> Could you open a JIRA? I will then submit a PR.
>>
>>
>>> Thanks,
>>> Prasad
>>>
>>>
>>> *Stack trace*
>>>
>>>
>>> "pool-1-thread-10" #141 prio=5 os_prio=0 tid=0x0000000001513000 nid=0x6749 in Object.wait() [0x00007fdab6ff4000]
>>>    java.lang.Thread.State: TIMED_WAITING (on object monitor)
>>>         at java.lang.Object.wait(Native Method)
>>>         at org.eclipse.jetty.io.nio.SelectChannelEndPoint.blockWritable(SelectChannelEndPoint.java:494)
>>>         - locked <0x00000006c50d9b48> (a org.eclipse.jetty.io.nio.SelectChannelEndPoint)
>>>         at org.eclipse.jetty.io.nio.SslConnection$SslEndPoint.blockWritable(SslConnection.java:723)
>>>         at org.eclipse.jetty.websocket.WebSocketGeneratorRFC6455.flush(WebSocketGeneratorRFC6455.java:248)
>>>         at org.eclipse.jetty.websocket.WebSocketGeneratorRFC6455.addFrame(WebSocketGeneratorRFC6455.java:114)
>>>         at org.eclipse.jetty.websocket.WebSocketConnectionRFC6455$WSFrameConnection.sendMessage(WebSocketConnectionRFC6455.java:439)
>>>         at org.apache.zeppelin.socket.NotebookSocket.send(NotebookSocket.java:89)
>>>         at org.apache.zeppelin.socket.NotebookServer.broadcast(NotebookServer.java:286)
>>>         - locked <0x00000006c3a1cd08> (a java.util.HashMap)
>>>         at org.apache.zeppelin.socket.NotebookServer.broadcastNote(NotebookServer.java:370)
>>>         at org.apache.zeppelin.socket.NotebookServer$ParagraphJobListener.afterStatusChange(NotebookServer.java:945)
>>>         at org.apache.zeppelin.scheduler.Job.setStatus(Job.java:143)
>>>         at org.apache.zeppelin.scheduler.RemoteScheduler$JobRunner.afterStatusChange(RemoteScheduler.java:379)
>>>         at org.apache.zeppelin.scheduler.RemoteScheduler$JobStatusPoller.getStatus(RemoteScheduler.java:261)
>>>         - locked <0x00000006c5885178> (a org.apache.zeppelin.scheduler.RemoteScheduler$JobStatusPoller)
>>>         at org.apache.zeppelin.scheduler.RemoteScheduler$JobRunner.run(RemoteScheduler.java:335)
>>>         at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>>>         at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>>>         at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
>>>         at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
>>>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>>>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>>>         at java.lang.Thread.run(Thread.java:745)
>>>
>>
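
For reference, the trace shows NotebookServer.broadcast blocked inside a socket write while it still holds the monitor on the java.util.HashMap, so any other thread that needs that map has to wait. Below is a minimal sketch of the "concurrent map" idea from this thread. It is not Zeppelin's actual code: the class BroadcastSketch, the Connection interface and all method signatures are hypothetical stand-ins, and it assumes the connections are kept in a ConcurrentHashMap of CopyOnWriteArrayList values so that no lock is held while sending.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;

public class BroadcastSketch {
    /** Hypothetical stand-in for a websocket connection; only send() matters here. */
    public interface Connection {
        void send(String message) throws Exception;
    }

    // noteId -> connections that currently have the note open.
    // Assumption: a concurrent map replaces the HashMap + synchronized block.
    private final Map<String, List<Connection>> noteSocketMap = new ConcurrentHashMap<>();

    public void addConnection(String noteId, Connection conn) {
        noteSocketMap.computeIfAbsent(noteId, id -> new CopyOnWriteArrayList<>()).add(conn);
    }

    public void removeConnection(String noteId, Connection conn) {
        List<Connection> conns = noteSocketMap.get(noteId);
        if (conns != null) {
            conns.remove(conn);
        }
    }

    /** Sends the message to every connection of the note; returns the number of successful sends. */
    public int broadcast(String noteId, String message) {
        List<Connection> conns = noteSocketMap.get(noteId);
        if (conns == null) {
            return 0;
        }
        int delivered = 0;
        // Iterating a CopyOnWriteArrayList walks a snapshot of the list,
        // so no monitor on the map is held while a send is in progress.
        for (Connection conn : conns) {
            try {
                conn.send(message);
                delivered++;
            } catch (Exception e) {
                // A failing socket is skipped; the remaining clients still get the message.
            }
        }
        return delivered;
    }

    public static void main(String[] args) {
        BroadcastSketch server = new BroadcastSketch();
        List<String> received = new ArrayList<>();
        server.addConnection("note1", received::add);
        server.addConnection("note1", msg -> { throw new IOException("broken socket"); });
        server.addConnection("note1", received::add);
        int delivered = server.broadcast("note1", "paragraph updated");
        System.out.println(delivered + " delivered, received=" + received);
    }
}
```

Note that this only removes the shared lock. A send that blocks on a slow client would still stall the thread doing that particular broadcast, so it does not replace the Jetty 9 upgrade.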