My environment is on Windows and I see this as well using SparkR. It appears the R session is somehow killed after a period of inactivity, and it can't be found in Task Manager. I have to restart the interpreter in order to get another R session.
> On Feb 17, 2017, at 9:11 PM, RUSHIKESH RAUT <rushikeshraut...@gmail.com> wrote:
>
> Yes, it will reproduce the problem. Try running all the tutorial notebooks
> at random and you will see it end up in a hanging state.
> In my experience it occurs more often with R and SparkR code, even when there
> is enough memory in the cluster. Just leave the Spark session running and
> after an hour or so it stops responding.
> It happens to me every time: I build some visualization on top of it and
> share it with someone, and by the time the other person looks at it, Zeppelin
> starts to act weirdly and doesn't respond.
>
> On Feb 18, 2017 9:09 AM, "moon soo Lee" <m...@apache.org> wrote:
> Hi,
>
> Download 0.7.0 -> Run R tutorial notebook repeatedly
>
> Will that reproduce the problem? Otherwise, can someone clarify the
> instructions to reproduce it?
>
> Thanks,
> moon
>
>> On Sat, Feb 18, 2017 at 5:45 AM xyun...@simuwell.com <xyun...@simuwell.com> wrote:
>> Within the Scala REPL everything works fine. Even if your application
>> session is down, when you run the same code again you will see a new
>> job get created. But Zeppelin has this problem. You can do a test: run
>> a notebook and let the job finish at the end of the notebook, then re-run
>> the same notebook and it will get stuck. Running Spark code in the
>> Scala REPL and in Zeppelin are different things.
>>
>> From: Paul Brenner
>> Date: 2017-02-17 12:37
>> To: users
>> Subject: Re: Re: Zeppelin unable to respond after some time
>>
>> I don't believe that this explains my issue. Running the Scala REPL also
>> keeps a session alive for as long as the REPL is running. I've had REPLs
>> open for days (shhhhh, don't tell anyone) that have correspondingly kept
>> sessions alive for the same period of time with no problem. I only see this
>> issue in Zeppelin.
>>
>> We run Zeppelin on a server and allow multiple users to connect, each with
>> their own interpreters.
>> We also find that Zeppelin memory usage on the
>> server steadily creeps up over time. Executing sys.exit in a Spark
>> paragraph, restarting the interpreter, or using yarn application -kill
>> will often cause Zeppelin to end the related interpreter process, but not
>> always. So over time we find that many zombie processes pile up and eat
>> up resources.
>>
>> The only way to keep on top of this is to regularly log in to the Zeppelin
>> server and kill zombie jobs. Here is a command that I've found helpful. When
>> you know that a specific user has no active Zeppelin interpreters running,
>> execute the following:
>>
>> ps aux | grep zeppelin | grep "2BSGYY7S8" | grep java | awk -F " " '{print $2}' | xargs sudo -u yarn kill -9
>>
>> where "2BSGYY7S8" is the interpreter id (found in interpreter.json) and
>> "yarn" is the name of the user that originally started Zeppelin with:
>> zeppelin-daemon.sh start
>>
>> To kill every interpreter except a specific user's, just flip it around with:
>> ps aux | grep zeppelin | grep -v "2BSGYY7S8" | grep -v "zeppelin.server.ZeppelinServer" | grep java | awk -F " " '{print $2}' | xargs sudo -u yarn kill -9
>>
>> If I do this every few days, Zeppelin keeps humming along pretty smoothly
>> most of the time.
>> Paul Brenner
>> DATA SCIENTIST
>> (217) 390-3033
>>
>> On Fri, Feb 17, 2017 at 3:23 PM "xyun...@simuwell.com" <xyun...@simuwell.com> wrote:
>> The problem could be not only the resources, but the session. If you run a
>> chunk of Spark code, you should see a running application in the Spark UI;
>> but if your code shuts the session down after the job is finished, then on
>> the Spark UI you will see the job as finished. Within Zeppelin, each
>> notebook starts the Spark session only once (different interpreter modes
>> can be set depending on whether you want notebooks to share the session),
>> and if you close it, Zeppelin will never restart it again.
>> The only way to get the same code to work
>> again is to restart the interpreter or restart Zeppelin. I'm not sure if I
>> explained that clearly, but I hope it helps.
>>
>> From: Paul Brenner
>> Date: 2017-02-17 12:14
>> To: users
>> Subject: Re: Re: Zeppelin unable to respond after some time
>>
>> I've definitely had this problem with jobs that don't take all the resources
>> on the cluster. Also, my experience matches what others have reported: just
>> restarting Zeppelin and re-running the stuck paragraph solves the issue.
>>
>> I've also experienced this problem with for loops. Some for loops that
>> write to disk, but absolutely don't have any variables growing in size,
>> will hang in Zeppelin. If I run the exact same code in the Scala REPL
>> it goes through without problems.
>>
>> Paul Brenner
>> DATA SCIENTIST
>> (217) 390-3033
>>
>> On Fri, Feb 17, 2017 at 2:12 PM "xyun...@simuwell.com" <xyun...@simuwell.com> wrote:
>> I have solved a similar issue before. You should check the Spark UI, and
>> probably you will see your single job taking all the resources. Therefore
>> any further job submitted to the same cluster will just hang there.
>> When you restart Zeppelin, the old job is killed and all the resources it
>> took are released.
>>
>> xyun...@simuwell.com
>>
>> From: RUSHIKESH RAUT
>> Date: 2017-02-17 02:29
>> To: users
>> Subject: Re: Zeppelin unable to respond after some time
>> Yes, it happens with R and Spark code frequently.
>>
>> On Feb 17, 2017 3:25 PM, "小野圭二" <onoke...@gmail.com> wrote:
>> Yes, almost every time.
>> There are no special operations involved; I just run the tutorial demos.
>> From what I have seen, it happens most frequently in the R demo.
>>
>> 2017-02-17 18:50 GMT+09:00 Jeff Zhang <zjf...@gmail.com>:
>>
>> Is it easy to reproduce?
>>
>> On Fri, Feb 17, 2017 at 5:47 PM, 小野圭二 <onoke...@gmail.com> wrote:
>> I am facing the same issue now.
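[Editor's note: the zombie-killing one-liners Paul quotes earlier in the thread can be wrapped in a small helper so the interpreter id isn't retyped each time. This is a sketch, not something from the thread: the function name find_interp_pids is hypothetical; the interpreter id "2BSGYY7S8" and the "yarn" user come from Paul's message. Reading ps output from stdin keeps the filter itself easy to test.]

```shell
#!/bin/sh
# Sketch of a reusable filter for the one-liners quoted above.
# find_interp_pids is a hypothetical helper, not part of Zeppelin.

find_interp_pids() {
  # $1: interpreter id as found in interpreter.json, e.g. 2BSGYY7S8
  # Keeps only Zeppelin java processes matching that id, prints their PIDs.
  grep zeppelin | grep "$1" | grep java | awk '{print $2}'
}

# Usage (assumes the "yarn" user started zeppelin-daemon.sh, as in the thread):
#   ps aux | find_interp_pids 2BSGYY7S8 | xargs -r sudo -u yarn kill -9
```

With GNU xargs, -r avoids running kill when no matching process is found.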
>>
>> 2017-02-17 18:25 GMT+09:00 RUSHIKESH RAUT <rushikeshraut...@gmail.com>:
>> Hi all,
>>
>> I am facing an issue while using Zeppelin. I am trying to load some data
>> (not that big) into Zeppelin and then build some visualizations on it. The
>> problem is that when I run the code the first time it works, but after
>> some time the same code doesn't. It remains in the running state in the
>> GUI, but no logs are generated in the Zeppelin logs, and all further tasks
>> hang in the pending state.
>> As soon as I restart Zeppelin it works, so I am guessing it's some memory
>> issue. I have read that Zeppelin stores data in memory, so it is possible
>> that it runs out of memory after some time.
>> How do I debug this issue? How much memory does Zeppelin take by default
>> at start? Also, is there any way to run Zeppelin with a specified amount
>> of memory, so that I can start the process with more memory? It doesn't
>> make sense to restart Zeppelin every half hour.
>>
>> Thanks,
>> Rushikesh Raut
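[Editor's note: on Rushikesh's last question, Zeppelin's JVM heap sizes can be set in conf/zeppelin-env.sh via ZEPPELIN_MEM (the Zeppelin server JVM) and ZEPPELIN_INTP_MEM (each interpreter JVM). The flag values below are illustrative examples, not recommendations, and exact defaults vary by Zeppelin version.]

```shell
# conf/zeppelin-env.sh -- a sketch; adjust the sizes to your cluster.
# Applies after restarting Zeppelin with zeppelin-daemon.sh restart.
export ZEPPELIN_MEM="-Xms1024m -Xmx4096m"        # Zeppelin server JVM options
export ZEPPELIN_INTP_MEM="-Xms1024m -Xmx4096m"   # options for each interpreter JVM
```

Note this only raises the JVM ceilings; it does not fix the hung-interpreter behavior discussed above, for which restarting the affected interpreter remains the workaround.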