Hi Carlos,

I meant profiling the Livy server with a JVM profiler, rather than a perf
record.
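If you don't have a full profiler handy on the master node, even repeated
jstack thread dumps taken while the CPU spike is happening would already
help. A rough sketch of what I mean (it assumes pgrep and the JDK's jstack
are available on the master node, and that the Livy server's command line
contains "LivyServer"; adjust as needed):

    # Poor man's profiler: repeated jstack dumps of the Livy server JVM.
    import subprocess
    import time

    # Find the Livy server PID (assumes a single Livy JVM on the node).
    pid = subprocess.check_output(
        ["pgrep", "-f", "LivyServer"]).split()[0].decode()

    for i in range(10):
        dump = subprocess.check_output(["jstack", pid])
        with open("livy-threads-%d.txt" % i, "wb") as f:
            f.write(dump)
        time.sleep(2)  # dumps every 2s show where threads spend their time

If the same stack traces keep showing up across the dumps while the CPUs
are saturated, that is most likely our hotspot.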
Thanks, best.
Marco

On Tue, 9 Jul 2019 at 00:18, Kadu Vido <carlos.v...@lendico.com.br> wrote:

> Hi, Marco,
>
> We're using Livy 0.6.0, which I'm afraid ships by default with EMR 5.24,
> so I cannot test an earlier version.
>
> It's a holiday in Brazil and our network is IP-gated (which is also why I
> don't have logs), so we won't be able to access it tomorrow. I'll run that
> profiling as soon as we're back on Wednesday. I'm assuming you want a
> *perf record*; let me know otherwise.
>
> *Carlos Vido*
>
> Data Engineer @ Lendico Brasil <https://www.lendico.com.br>
>
> On Mon, 8 Jul 2019 at 16:55, Marco Gaido <marcogaid...@gmail.com> wrote:
>
>> Hi all,
>>
>> This seems like a performance issue in the Livy server. I assume you are
>> using a recent version of Livy.
>>
>> If this is the case, could you profile the Livy server to understand
>> what the problem is?
>>
>> Thanks,
>> Marco
>>
>> On Mon, 8 Jul 2019, 21:03 Kadu Vido, <carlos.v...@lendico.com.br> wrote:
>>
>>> Hi, I'm working with Hugo on the same project.
>>>
>>> Shubham, we're using almost the same setup; the only difference is
>>> Airflow 1.10.1. I coded a workaround in our Livy hook: it has a parameter
>>> for retries, and whenever the session returns anything other than 'idle',
>>> we try again before failing the task. It's not ideal, but at least our
>>> pipelines aren't stuck anymore.
>>>
>>> Zhang, I don't have the YARN logs at hand, but I can dig them up if
>>> you'd like to take a look. However, our latest clues point a different
>>> way:
>>>
>>> 1. Running *top* on the master node, we observed that Livy rapidly
>>> takes all the available CPUs after we send just a few requests (3 or 4
>>> already cause this to happen; if we send upwards of 10, it crashes the
>>> service).
>>>
>>> 2. We can get around this by spacing them out a bit -- that is, if we
>>> use a loop to open the sessions and wait ~10s between them, Livy gets
>>> enough time to release the CPU resources before trying to open a new one.
>>> We've had help from some AWS engineers who tried several instance sizes
>>> and found that on larger instances they can open 10 or 12 simultaneously,
>>> but:
>>>
>>> 3. Regardless of the size of the cluster, we cannot hold more than 9
>>> simultaneous sessions open. It doesn't matter whether our cluster has
>>> enough vCPUs or RAM to handle more, and the size of the master node
>>> doesn't matter either: from the 10th session onwards, each one seems to
>>> either die or drop.
>>>
>>> *Carlos Vido*
>>>
>>> Data Engineer @ Lendico Brasil <https://www.lendico.com.br>
>>>
>>> On Sat, 6 Jul 2019 at 13:30, Shubham Gupta <y2k.shubhamgu...@gmail.com>
>>> wrote:
>>>
>>>> I'm facing precisely the same issue.
>>>>
>>>> I've written a LivySessionHook that's just a wrapper over the PyLivy
>>>> Session <https://pylivy.readthedocs.io/en/latest/api/session.html>
>>>> (sketched below).
>>>>
>>>> - I'm able to use this hook to send code snippets to the remote EMR via
>>>> a Python shell a few times, after which it starts throwing "caught
>>>> exception 500 Server Error: Internal Server Error for url" (and
>>>> continues to do so for the next hour or so).
>>>> - However, when the same hook is triggered via an Airflow operator, I
>>>> get absolutely no success (it always results in a 500 error).
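>>>>
>>>> For reference, stripped of the Airflow plumbing, the hook boils down
>>>> to roughly this (an illustrative sketch, not the exact class; it
>>>> assumes pylivy's context-manager usage and hard-codes the URL, which
>>>> the real hook reads from an Airflow connection):
>>>>
>>>>     from livy import LivySession
>>>>
>>>>     LIVY_URL = "http://emr-master:8998"  # placeholder host
>>>>
>>>>     class LivySessionHook:
>>>>         """Thin wrapper over a pylivy session."""
>>>>
>>>>         def run_code(self, code):
>>>>             # Opens a session, runs the snippet, and closes the
>>>>             # session on exit; the 500 errors surface on the
>>>>             # session-creation step.
>>>>             with LivySession(LIVY_URL) as session:
>>>>                 return session.run(code)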
>>>>
>>>> I'm using:
>>>>
>>>> - Airflow 1.10.3
>>>> - Python 3.7.3
>>>> - EMR 5.24.1
>>>> - Livy 0.6.0
>>>> - Spark 2.4.2
>>>>
>>>> *Shubham Gupta*
>>>> Software Engineer
>>>> zomato
>>>>
>>>> On Sat, Jul 6, 2019 at 6:56 PM Jeff Zhang <zjf...@gmail.com> wrote:
>>>>
>>>>> For the dead/killed sessions, could you check the YARN application
>>>>> logs?
>>>>>
>>>>> On Thu, 4 Jul 2019 at 21:41, Hugo Herlanin
>>>>> <hugo.herla...@lendico.com.br> wrote:
>>>>>
>>>>>> Hey, the user mail is not working out!
>>>>>>
>>>>>> I am having some problems with my Livy setup. My use case is as
>>>>>> follows: I use a DAG in Airflow (1.10) to create a cluster in EMR
>>>>>> (5.24.1; the master is an m4.large and the two nodes are m5a.xlarge),
>>>>>> and when it is ready, this DAG sends 5 to 7 simultaneous requests to
>>>>>> Livy. I don't think I'm messing with the Livy settings; I just set
>>>>>> livy.spark.deploy-mode = client and livy.repl.enable-hive-context =
>>>>>> true.
>>>>>>
>>>>>> The problem is that of these ~5 to 7 sessions, just one or two open
>>>>>> (go to 'idle') and all the others go straight to 'dead' or 'killed';
>>>>>> in the logs, YARN reports that the sessions were killed by the 'livy'
>>>>>> user. I tried to tinker with all the possible timeout settings, but
>>>>>> this still happens. If I send more than ~10 simultaneous requests,
>>>>>> Livy responds with 500, and if I keep sending requests, the server
>>>>>> freezes. This happens even if EMR has enough resources available.
>>>>>>
>>>>>> I know the cluster is able to handle that many sessions because it
>>>>>> works when I open them via a loop with an interval of 15 seconds or
>>>>>> more, but it feels like Livy should be able to deal with that many
>>>>>> requests simultaneously. It seems strange that I should need to
>>>>>> manage the queue in such a way for the API of a distributed system.
>>>>>>
>>>>>> Do you have any clue about what I might be doing wrong? Is there any
>>>>>> known limitation that I'm unaware of?
>>>>>>
>>>>>> Best,
>>>>>>
>>>>>> Hugo Herlanin
>>>>>
>>>>> --
>>>>> Best Regards
>>>>>
>>>>> Jeff Zhang
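
PS: until the root cause is understood, spacing out session creation, as
you are both already doing, is the sanest mitigation. A minimal sketch of
what that can look like with plain REST calls via the requests library
(the host, session kind, count, and timings below are placeholders):

    import time
    import requests

    LIVY_URL = "http://localhost:8998"  # assumption: Livy on the EMR master

    def open_session():
        # POST /sessions creates a new interactive session.
        resp = requests.post(LIVY_URL + "/sessions", json={"kind": "pyspark"})
        resp.raise_for_status()
        return resp.json()["id"]

    def wait_until_idle(session_id, timeout=300):
        # Poll GET /sessions/{id}/state until the session settles.
        deadline = time.time() + timeout
        while time.time() < deadline:
            state = requests.get(
                "%s/sessions/%d/state" % (LIVY_URL, session_id)).json()["state"]
            if state == "idle":
                return
            if state in ("dead", "killed", "error"):
                raise RuntimeError(
                    "session %d ended in state %s" % (session_id, state))
            time.sleep(5)
        raise TimeoutError("session %d not idle after %ds" % (session_id, timeout))

    # Open sessions one at a time, so Livy can release CPU in between.
    session_ids = []
    for _ in range(5):
        sid = open_session()
        wait_until_idle(sid)
        session_ids.append(sid)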