I just opened https://github.com/apache/incubator-zeppelin/pull/831, which upgrades to Jetty 9.

On 11/04/16 14:47, Alexander Bezzubov wrote:
Thank you for pointing out that LIST_NOTES is broadcast to every client;
I'm not sure that is what was meant to happen in that case.

I have never seen the behavior you describe and it looks like a race
condition on a run note message. Did you have a chance to try applying
only the first part of the changes that you have described earlier,
keeping the synchronized noteSocketMap?

--
Alex

On Fri, Apr 8, 2016 at 12:47 PM, Prasad Wagle <prasadwa...@gmail.com
<mailto:prasadwa...@gmail.com>> wrote:

    Thanks Alex. I understand the reason for synchronization of
    note<->client_connection. However, I don't think I understand why if
    I request LIST_NOTES which does not involve any changes, the server
    sends the list of notes to all clients using broadcastNoteList()
    which uses broadcastAll.

    After deploying the changes I mentioned earlier, the server ran fine
    for 18 hours before running into a deadlock (jstack output below).
    We could download the top level page and notes but not run any
    paragraphs. Server restart fixed the problem. Do you think this is a
    result of my changes or a separate issue?

    Found one Java-level deadlock:
    =============================
    "qtp873175411-3443":
       waiting to lock monitor 0x00000000031e6158 (object 0x00000006c3b1fba8, a java.util.HashMap),
       which is held by "DefaultQuartzScheduler_Worker-4"
    "DefaultQuartzScheduler_Worker-4":
       waiting to lock monitor 0x000000000268ad58 (object 0x00000006c34a12c0, a java.util.ArrayList),
       which is held by "DefaultQuartzScheduler_Worker-2"
    "DefaultQuartzScheduler_Worker-2":
       waiting to lock monitor 0x00000000031e6158 (object 0x00000006c3b1fba8, a java.util.HashMap),
       which is held by "DefaultQuartzScheduler_Worker-4"

    Java stack information for the threads listed above:
    ===================================================
    "qtp873175411-3443":
        at org.apache.zeppelin.interpreter.InterpreterFactory.getNoteInterpreterSettingBinding(InterpreterFactory.java:502)
        - waiting to lock <0x00000006c3b1fba8> (a java.util.HashMap)
        at org.apache.zeppelin.notebook.NoteInterpreterLoader.getInterpreterSettings(NoteInterpreterLoader.java:60)
        at org.apache.zeppelin.socket.NotebookServer.sendAllAngularObjects(NotebookServer.java:951)
        at org.apache.zeppelin.socket.NotebookServer.sendNote(NotebookServer.java:437)
        at org.apache.zeppelin.socket.NotebookServer.onMessage(NotebookServer.java:123)
        at org.apache.zeppelin.socket.NotebookSocket.onMessage(NotebookSocket.java:70)
        at org.eclipse.jetty.websocket.WebSocketConnectionRFC6455$WSFrameHandler.onFrame(WebSocketConnectionRFC6455.java:835)
        at org.eclipse.jetty.websocket.WebSocketParserRFC6455.parseNext(WebSocketParserRFC6455.java:349)
        at org.eclipse.jetty.websocket.WebSocketConnectionRFC6455.handle(WebSocketConnectionRFC6455.java:225)
        at org.eclipse.jetty.io.nio.SslConnection.handle(SslConnection.java:196)
        at org.eclipse.jetty.io.nio.SelectChannelEndPoint.handle(SelectChannelEndPoint.java:667)
        at org.eclipse.jetty.io.nio.SelectChannelEndPoint$1.run(SelectChannelEndPoint.java:52)
        at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
        at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
        at java.lang.Thread.run(Thread.java:745)
    "DefaultQuartzScheduler_Worker-4":
        at org.apache.zeppelin.notebook.Note.getParagraphs(Note.java:441)
        - waiting to lock <0x00000006c34a12c0> (a java.util.ArrayList)
        at org.apache.zeppelin.search.LuceneSearch.updateIndexDoc(LuceneSearch.java:172)
        at org.apache.zeppelin.notebook.Note.persist(Note.java:463)
        at org.apache.zeppelin.socket.NotebookServer$ParagraphJobListener.afterStatusChange(NotebookServer.java:935)
        at org.apache.zeppelin.scheduler.Job.setStatus(Job.java:143)
        at org.apache.zeppelin.notebook.Paragraph.jobAbort(Paragraph.java:271)
        at org.apache.zeppelin.scheduler.Job.abort(Job.java:232)
        at org.apache.zeppelin.interpreter.InterpreterFactory.stopJobAllInterpreter(InterpreterFactory.java:593)
        at org.apache.zeppelin.interpreter.InterpreterFactory.restart(InterpreterFactory.java:547)
        - locked <0x00000006c3b1fba8> (a java.util.HashMap)
        at org.apache.zeppelin.notebook.Notebook$CronJob.execute(Notebook.java:440)
        at org.quartz.core.JobRunShell.run(JobRunShell.java:202)
        at org.quartz.simpl.SimpleThreadPool$WorkerThread.run(SimpleThreadPool.java:573)
        - locked <0x00000006c3ac3dc0> (a java.lang.Object)
    "DefaultQuartzScheduler_Worker-2":
        at org.apache.zeppelin.interpreter.InterpreterFactory.getNoteInterpreterSettingBinding(InterpreterFactory.java:502)
        - waiting to lock <0x00000006c3b1fba8> (a java.util.HashMap)
        at org.apache.zeppelin.notebook.NoteInterpreterLoader.getInterpreterSettings(NoteInterpreterLoader.java:60)
        at org.apache.zeppelin.notebook.NoteInterpreterLoader.get(NoteInterpreterLoader.java:77)
        at org.apache.zeppelin.notebook.Note.runAll(Note.java:409)
        - locked <0x00000006c34a12c0> (a java.util.ArrayList)
        at org.apache.zeppelin.notebook.Notebook$CronJob.execute(Notebook.java:419)
        at org.quartz.core.JobRunShell.run(JobRunShell.java:202)
        at org.quartz.simpl.SimpleThreadPool$WorkerThread.run(SimpleThreadPool.java:573)
        - locked <0x00000006c3abd630> (a java.lang.Object)

    Found 1 deadlock.
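
The jstack output above reads as a lock-order inversion: Worker-4 holds the HashMap monitor (taken in InterpreterFactory.restart) and waits for the ArrayList monitor, while Worker-2 holds the ArrayList monitor (taken in Note.runAll) and waits for the HashMap monitor. One common remedy is to make every code path take the two monitors in the same order. The following is only a minimal sketch of that idea; the class, fields, and methods are hypothetical stand-ins and not Zeppelin code.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in, not Zeppelin code: two monitors that mirror the
// HashMap and ArrayList locks reported by jstack.
class LockOrderSketch {
    private final Map<String, List<String>> bindings = new HashMap<>(); // lock A
    private final List<String> paragraphs = new ArrayList<>();          // lock B

    // Plays the role of the "restart" path: takes A, then B.
    void restartPath(String noteId) {
        synchronized (bindings) {
            synchronized (paragraphs) {
                bindings.remove(noteId);
                paragraphs.clear();
            }
        }
    }

    // Plays the role of the "run all" path: also takes A, then B,
    // so no cycle between the two monitors can form.
    void runAllPath(String noteId, String paragraph) {
        synchronized (bindings) {
            synchronized (paragraphs) {
                paragraphs.add(paragraph);
                bindings.computeIfAbsent(noteId, k -> new ArrayList<>()).add(paragraph);
            }
        }
    }

    int paragraphCount() {
        synchronized (paragraphs) {
            return paragraphs.size();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        LockOrderSketch s = new LockOrderSketch();
        Thread t1 = new Thread(() -> {
            for (int i = 0; i < 10_000; i++) s.runAllPath("note", "p" + i);
        });
        Thread t2 = new Thread(() -> {
            for (int i = 0; i < 10_000; i++) s.restartPath("note");
        });
        t1.start();
        t2.start();
        t1.join();
        t2.join();
        System.out.println("both threads finished, paragraphs left: " + s.paragraphCount());
    }
}
```

If the two synchronized blocks in either method were swapped, the two threads could block each other in the same way the dump shows.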

    On Thu, Apr 7, 2016 at 6:46 PM, Alexander Bezzubov <b...@apache.org
    <mailto:b...@apache.org>> wrote:

        Hi,

        thank you Eric, upgrading Jetty sounds like a great idea!

        Prasad, I think broadcastAll and synchronization of
        note<->client_connection are used by default to let multiple
        people collaborate on the same Note in real time: they notify
        all other clients that have this Note open about the changes
        you made in your browser tab (as you can see with 2 different
        tabs).

        I believe it might be possible to replace the map with a
        concurrent implementation to avoid excessive synchronization
        though, as we did in [1] before. If the same behaviour persists
        after upgrading to Jetty 9, could you please create a separate
        issue for that? I will be happy to help and look more into it.
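
As an illustration of the concurrent-map idea, here is a minimal sketch that assumes the map goes from a note id to the list of connections that have that note open. The class name, its methods, and the generic connection type are hypothetical; this is not the actual change made in [1] and not Zeppelin's NotebookServer.

```java
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical registry, not Zeppelin's NotebookServer.
// C stands for whatever connection type the server uses.
class NoteConnectionRegistry<C> {
    private final Map<String, List<C>> noteSocketMap = new ConcurrentHashMap<>();

    // computeIfAbsent creates the per-note list atomically.
    void add(String noteId, C conn) {
        noteSocketMap.computeIfAbsent(noteId, k -> new CopyOnWriteArrayList<>()).add(conn);
    }

    // Empty lists are left in the map to keep the sketch short.
    void remove(String noteId, C conn) {
        List<C> conns = noteSocketMap.get(noteId);
        if (conns != null) {
            conns.remove(conn);
        }
    }

    // CopyOnWriteArrayList iterators work on a snapshot of the list,
    // so callers can iterate and send without holding any lock.
    List<C> connectionsFor(String noteId) {
        List<C> conns = noteSocketMap.get(noteId);
        return conns == null ? Collections.<C>emptyList() : conns;
    }
}
```

CopyOnWriteArrayList copies the backing array on every write, so this layout fits lists that are read far more often than they change, which seems plausible for browser connections per note but is an assumption here.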

        Thanks!

          1. https://issues.apache.org/jira/browse/ZEPPELIN-312

        --
        Alex


        On Fri, Apr 8, 2016 at 1:28 AM, Prasad Wagle
        <prasadwa...@gmail.com <mailto:prasadwa...@gmail.com>> wrote:

            Thanks Eric! I created
            https://issues.apache.org/jira/browse/ZEPPELIN-798 - Migrate
            to Jetty version 9 that has fix for websocket deadlock bug
            causing Zeppelin server hangs. This is pretty important for
            us so please let me know how I can help.

            For now, I have made some changes to reduce websocket
            communications and probability of hangs:

              * For the LIST_NOTES operation, I use
                broadcastNoteList(conn) that sends note list to the
                current connection instead of using broadcastAll. What
                is the reason for using broadcastAll?
              * I removed synchronized (noteSocketMap) from broadcast so
                that one bad socket does not hang the server. Do you
                think this can cause serious problems?
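
The two changes in the list above can be sketched roughly as follows: answer a list request on the requesting connection only, and for a real broadcast copy the connection list while holding the lock, then send outside it, so a socket that blocks or fails does not keep the map monitor held. This is a sketch under assumptions, not the Zeppelin implementation: `Conn` is a hypothetical interface standing in for the server's socket class, and the method names are illustrative.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not Zeppelin's NotebookServer.
class BroadcastSketch {
    // Stand-in for a websocket connection.
    interface Conn {
        void send(String msg) throws IOException;
    }

    private final Map<String, List<Conn>> noteSocketMap = new HashMap<>();

    void add(String noteId, Conn conn) {
        synchronized (noteSocketMap) {
            noteSocketMap.computeIfAbsent(noteId, k -> new ArrayList<>()).add(conn);
        }
    }

    // Reply to the requesting connection only (the LIST_NOTES case).
    boolean unicast(Conn conn, String msg) {
        try {
            conn.send(msg);
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    // Copy the list under the lock, then send without holding it.
    // Returns how many connections accepted the message.
    int broadcast(String noteId, String msg) {
        List<Conn> snapshot;
        synchronized (noteSocketMap) {
            List<Conn> conns = noteSocketMap.get(noteId);
            snapshot = conns == null ? new ArrayList<>() : new ArrayList<>(conns);
        }
        int delivered = 0;
        for (Conn c : snapshot) {
            try {
                c.send(msg);
                delivered++;
            } catch (IOException e) {
                // One bad socket must not stop delivery to the others.
                System.err.println("send failed: " + e.getMessage());
            }
        }
        return delivered;
    }
}
```

A send that blocks still stalls the thread that called broadcast, but other threads that need the map are no longer stuck behind it. The trade-off is that a connection removed after the snapshot was taken may still receive one more message.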


            On Thu, Apr 7, 2016 at 3:06 AM, Eric Charles
            <e...@apache.org <mailto:e...@apache.org>> wrote:

                On 07/04/16 07:18, Prasad Wagle wrote:

                    Hi,

                    We experienced three Zeppelin server hangs today. I
                    have included one of
                    the stack traces below. It is similar to the stack
                    trace in a websocket
                    deadlock bug in Jetty 8. From the bug report
                    <https://bugs.eclipse.org/bugs/show_bug.cgi?id=389645>:

                         However, Jetty 9 has already refactored the low
                    level read/write on
                         a socket heavily to compensate for websocket,
                    spdy, and http/2
                         Marking this as WONTFIX for Jetty 7/8
                         Use Jetty 9


                    Is there a workaround? Has anyone tried using Jetty
                    9 in Zeppelin? What
                    is the effort involved?



                I have upgraded the source code to Jetty 9 which implies
                a few different constructs.

                Could you open a JIRA? I will then submit a PR.


                    Thanks,
                    Prasad


                    *Stack trace*


                    "pool-1-thread-10" #141 prio=5 os_prio=0
                    tid=0x0000000001513000
                    nid=0x6749 in Object.wait() [0x00007fdab6ff4000]
                       java.lang.Thread.State: TIMED_WAITING (on object monitor)
                            at java.lang.Object.wait(Native Method)
                            at org.eclipse.jetty.io.nio.SelectChannelEndPoint.blockWritable(SelectChannelEndPoint.java:494)
                            - locked <0x00000006c50d9b48> (a org.eclipse.jetty.io.nio.SelectChannelEndPoint)
                            at org.eclipse.jetty.io.nio.SslConnection$SslEndPoint.blockWritable(SslConnection.java:723)
                            at org.eclipse.jetty.websocket.WebSocketGeneratorRFC6455.flush(WebSocketGeneratorRFC6455.java:248)
                            at org.eclipse.jetty.websocket.WebSocketGeneratorRFC6455.addFrame(WebSocketGeneratorRFC6455.java:114)
                            at org.eclipse.jetty.websocket.WebSocketConnectionRFC6455$WSFrameConnection.sendMessage(WebSocketConnectionRFC6455.java:439)
                            at org.apache.zeppelin.socket.NotebookSocket.send(NotebookSocket.java:89)
                            at org.apache.zeppelin.socket.NotebookServer.broadcast(NotebookServer.java:286)
                            - locked <0x00000006c3a1cd08> (a java.util.HashMap)
                            at org.apache.zeppelin.socket.NotebookServer.broadcastNote(NotebookServer.java:370)
                            at org.apache.zeppelin.socket.NotebookServer$ParagraphJobListener.afterStatusChange(NotebookServer.java:945)
                            at org.apache.zeppelin.scheduler.Job.setStatus(Job.java:143)
                            at org.apache.zeppelin.scheduler.RemoteScheduler$JobRunner.afterStatusChange(RemoteScheduler.java:379)
                            at org.apache.zeppelin.scheduler.RemoteScheduler$JobStatusPoller.getStatus(RemoteScheduler.java:261)
                            - locked <0x00000006c5885178> (a org.apache.zeppelin.scheduler.RemoteScheduler$JobStatusPoller)
                            at org.apache.zeppelin.scheduler.RemoteScheduler$JobRunner.run(RemoteScheduler.java:335)
                            at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
                            at java.util.concurrent.FutureTask.run(FutureTask.java:266)
                            at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
                            at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
                            at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
                            at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
                            at java.lang.Thread.run(Thread.java:745)


