I'm facing precisely the same issue. I've written a LivySessionHook that's just a wrapper over the PyLivy Session <https://pylivy.readthedocs.io/en/latest/api/session.html>.
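For context, a hook like that might look roughly as follows. This is a minimal sketch, not the actual hook from this thread: the class name, parameters, and retry policy are all hypothetical, and it assumes pylivy's `LivySession.create` context manager and `run` method.

```python
import time


class LivySessionHook:
    """Hypothetical sketch of a hook wrapping a pylivy session.

    Retries statements on transient failures such as the HTTP 500s
    described in this thread.
    """

    def __init__(self, livy_url, retries=3, backoff_s=10):
        self.livy_url = livy_url
        self.retries = retries
        self.backoff_s = backoff_s

    def run_statement(self, code):
        # Imported lazily so the hook can be constructed without pylivy
        # installed (the PyPI package is named "livy").
        from livy import LivySession

        last_err = None
        for attempt in range(self.retries):
            try:
                with LivySession.create(self.livy_url) as session:
                    return session.run(code)
            except Exception as err:  # e.g. "500 Server Error" from Livy
                last_err = err
                # Back off a little longer on each retry.
                time.sleep(self.backoff_s * (attempt + 1))
        raise last_err
```

In an Airflow operator this would be invoked with the Livy endpoint of the EMR master, e.g. `LivySessionHook("http://emr-master:8998").run_statement("1 + 1")`.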
- I'm able to use this hook to send code snippets to a remote EMR cluster via the Python shell a few times, after which it starts throwing "caught exception 500 Server Error: Internal Server Error for url" (and continues to do so for the next hour or so).
- However, when the same hook is triggered via an Airflow operator, I get no success at all (it always results in a 500 error).

I'm using:
- Airflow 1.10.3
- Python 3.7.3
- EMR 5.24.1
- Livy 0.6.0
- Spark 2.4.2

*Shubham Gupta*
Software Engineer
zomato

On Sat, Jul 6, 2019 at 6:56 PM Jeff Zhang <zjf...@gmail.com> wrote:

> For the dead/killed sessions, could you check the yarn app logs?
>
> Hugo Herlanin <hugo.herla...@lendico.com.br> wrote on Thu, Jul 4, 2019 at 9:41 PM:
>
>> Hey, user mail is not working out!
>>
>> I am having some problems with my Livy setup. My use case is as follows: I
>> use a DAG in Airflow (1.10) to create a cluster in EMR (5.24.1, one
>> m4.large master and two m5a.xlarge nodes), and when it is ready, this DAG
>> sends 5 to 7 simultaneous requests to Livy. I don't think I'm messing with
>> the Livy settings much; I just set livy.spark.deploy-mode = client and
>> livy.repl.enable-hive-context = true.
>>
>> The problem is that of these ~5 to 7 sessions, only one or two open
>> (go to 'idle') and all the others go straight to 'dead' or 'killed'; in the
>> logs, YARN reports that the sessions were killed by the 'livy' user. I tried
>> to tinker with all the possible timeout settings, but this still happens. If
>> I send more than ~10 simultaneous requests, Livy responds with 500, and if
>> I keep sending requests, the server freezes. This happens even when EMR
>> has enough resources available.
>>
>> I know the cluster is able to handle that many sessions, because it works
>> when I open them via a loop with an interval of 15 seconds or more, but it
>> feels like Livy should be able to deal with that many requests
>> simultaneously. It seems strange that I should need to manage the queue in
>> such a way for an API of a distributed system.
>>
>> Do you have any clue about where I might be going wrong? Is there any
>> known limitation that I'm unaware of?
>>
>> Best,
>>
>> Hugo Herlanin
>
> --
> Best Regards
>
> Jeff Zhang