Hi,

Thank you, Eric. Upgrading Jetty sounds like a great idea!

Prasad, I think broadcastAll and the synchronization of
note<->client_connection are used by default so that multiple people can
collaborate on an analysis in the same Note in real time: all other clients
that have this Note open are notified about the changes you make in your
browser tab (as you can see with 2 different tabs open).

I believe it might be possible to replace the map with a concurrent
implementation to avoid excessive synchronization though, as we did in [1]
before. If the same behaviour persists after upgrading to Jetty 9, could you
please create a separate issue for that? I will be happy to help and look
more into it.
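
To make the idea concrete, here is a minimal sketch of what such a
replacement could look like. It is not the actual NotebookServer code and not
necessarily what was done in [1]: the class NoteBroadcaster, the Connection
interface and the method names are hypothetical stand-ins, and only the
noteSocketMap name is taken from this thread. It assumes a
ConcurrentHashMap holding CopyOnWriteArrayList values, so broadcast iterates
over a snapshot without holding a lock on the map.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical sketch, not Zeppelin's real NotebookServer.
class NoteBroadcaster {
    /** Stand-in for a websocket connection; send() may block or throw. */
    interface Connection {
        void send(String message) throws Exception;
    }

    // noteId -> connections that currently have this note open.
    private final Map<String, List<Connection>> noteSocketMap =
        new ConcurrentHashMap<>();

    void addConnection(String noteId, Connection conn) {
        noteSocketMap
            .computeIfAbsent(noteId, id -> new CopyOnWriteArrayList<>())
            .add(conn);
    }

    void removeConnection(String noteId, Connection conn) {
        List<Connection> conns = noteSocketMap.get(noteId);
        if (conns != null) {
            conns.remove(conn);
        }
    }

    /** Sends to every connection of the note; returns how many sends succeeded. */
    int broadcast(String noteId, String message) {
        List<Connection> conns = noteSocketMap.get(noteId);
        if (conns == null) {
            return 0;
        }
        int delivered = 0;
        // CopyOnWriteArrayList iterates over a snapshot, so no synchronized
        // block is needed and concurrent add/remove calls are not blocked.
        for (Connection conn : conns) {
            try {
                conn.send(message);
                delivered++;
            } catch (Exception e) {
                // One failing socket should not stop delivery to the others.
            }
        }
        return delivered;
    }
}
```

Note that a send that blocks would still stall the thread calling broadcast;
the difference is only that it no longer holds the map's monitor while doing
so, so other threads can keep adding, removing and broadcasting.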

Thanks!

 1. https://issues.apache.org/jira/browse/ZEPPELIN-312

--
Alex


On Fri, Apr 8, 2016 at 1:28 AM, Prasad Wagle <prasadwa...@gmail.com> wrote:

> Thanks Eric! I created https://issues.apache.org/jira/browse/ZEPPELIN-798
> - Migrate to Jetty version 9 that has fix for websocket deadlock bug
> causing Zeppelin server hangs. This is pretty important for us so please
> let me know how I can help.
>
> For now, I have made some changes to reduce websocket communications and
> probability of hangs:
>
>    - For the LIST_NOTES operation, I use broadcastNoteList(conn) that
>    sends note list to the current connection instead of using broadcastAll.
>    What is the reason for using broadcastAll?
>    - I removed synchronized (noteSocketMap) from broadcast so that one
>    bad socket does not hang the server. Do you think this can cause serious
>    problems?
>
>
> On Thu, Apr 7, 2016 at 3:06 AM, Eric Charles <e...@apache.org> wrote:
>
>> On 07/04/16 07:18, Prasad Wagle wrote:
>>
>>> Hi,
>>>
>>> We experienced three Zeppelin server hangs today. I have included one of
>>> the stack traces below. It is similar to the stack trace in a websocket
>>> deadlock bug in Jetty 8. From the bug report
>>> <https://bugs.eclipse.org/bugs/show_bug.cgi?id=389645>:
>>>
>>>     However, Jetty 9 has already refactored the low level read/write on
>>>     a socket heavily to compensate for websocket, spdy, and http/2
>>>     Marking this as WONTFIX for Jetty 7/8
>>>     Use Jetty 9
>>>
>>>
>>> Is there a workaround? Has anyone tried using Jetty 9 in Zeppelin? What
>>> is the effort involved?
>>>
>>
>>
>> I have upgraded the source code to Jetty 9 which implies a few different
>> constructs.
>>
>> Could you open a JIRA? I will then submit a PR.
>>
>>
>>> Thanks,
>>> Prasad
>>>
>>>
>>> *Stack trace*
>>>
>>>
>>> "pool-1-thread-10" #141 prio=5 os_prio=0 tid=0x0000000001513000 nid=0x6749 in Object.wait() [0x00007fdab6ff4000]
>>>     java.lang.Thread.State: TIMED_WAITING (on object monitor)
>>>          at java.lang.Object.wait(Native Method)
>>>          at org.eclipse.jetty.io.nio.SelectChannelEndPoint.blockWritable(SelectChannelEndPoint.java:494)
>>>          - locked <0x00000006c50d9b48> (a org.eclipse.jetty.io.nio.SelectChannelEndPoint)
>>>          at org.eclipse.jetty.io.nio.SslConnection$SslEndPoint.blockWritable(SslConnection.java:723)
>>>          at org.eclipse.jetty.websocket.WebSocketGeneratorRFC6455.flush(WebSocketGeneratorRFC6455.java:248)
>>>          at org.eclipse.jetty.websocket.WebSocketGeneratorRFC6455.addFrame(WebSocketGeneratorRFC6455.java:114)
>>>          at org.eclipse.jetty.websocket.WebSocketConnectionRFC6455$WSFrameConnection.sendMessage(WebSocketConnectionRFC6455.java:439)
>>>          at org.apache.zeppelin.socket.NotebookSocket.send(NotebookSocket.java:89)
>>>          at org.apache.zeppelin.socket.NotebookServer.broadcast(NotebookServer.java:286)
>>>          - locked <0x00000006c3a1cd08> (a java.util.HashMap)
>>>          at org.apache.zeppelin.socket.NotebookServer.broadcastNote(NotebookServer.java:370)
>>>          at org.apache.zeppelin.socket.NotebookServer$ParagraphJobListener.afterStatusChange(NotebookServer.java:945)
>>>          at org.apache.zeppelin.scheduler.Job.setStatus(Job.java:143)
>>>          at org.apache.zeppelin.scheduler.RemoteScheduler$JobRunner.afterStatusChange(RemoteScheduler.java:379)
>>>          at org.apache.zeppelin.scheduler.RemoteScheduler$JobStatusPoller.getStatus(RemoteScheduler.java:261)
>>>          - locked <0x00000006c5885178> (a org.apache.zeppelin.scheduler.RemoteScheduler$JobStatusPoller)
>>>          at org.apache.zeppelin.scheduler.RemoteScheduler$JobRunner.run(RemoteScheduler.java:335)
>>>          at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>>>          at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>>>          at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
>>>          at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
>>>          at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>>>          at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>>>          at java.lang.Thread.run(Thread.java:745)
>>>
>>
