Hi Carlos,

I meant profiling the Livy server with a JVM profiler.
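
If a full profiler (Java Flight Recorder, async-profiler or similar) is not at hand, a few thread dumps taken while the CPU is saturated would already help. Below is a minimal sketch of that idea, assuming `jstack` from the JDK is on the PATH of the master node and that you look up the Livy server PID yourself. The helper names `top_runnable_frames` and `sample` are made up for illustration and are not part of Livy.

```python
import collections
import re
import subprocess
import time

# A stack frame line in jstack output looks like "\tat pkg.Class.method(File.java:12)".
FRAME_RE = re.compile(r"^\s+at\s+([^\s(]+)\(")


def top_runnable_frames(dump):
    """Return the topmost stack frame of every RUNNABLE thread in one jstack dump."""
    frames = []
    runnable = False
    for line in dump.splitlines():
        stripped = line.strip()
        if stripped.startswith("java.lang.Thread.State:"):
            runnable = "RUNNABLE" in stripped
        elif not stripped:
            # A blank line ends the current thread's section.
            runnable = False
        elif runnable:
            match = FRAME_RE.match(line)
            if match:
                frames.append(match.group(1))
                runnable = False  # keep only the top frame of each thread
    return frames


def sample(pid, samples=20, interval_s=0.5):
    """Take several jstack dumps of `pid` and count which frames are on top."""
    counts = collections.Counter()
    for _ in range(samples):
        dump = subprocess.run(
            ["jstack", str(pid)], capture_output=True, text=True, check=True
        ).stdout
        counts.update(top_runnable_frames(dump))
        time.sleep(interval_s)
    return counts
```

Calling `sample(pid).most_common(10)` while the requests are in flight should show where the busy threads spend their time. `jstack` usually has to run as the same user as the target JVM.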


On Tue, 9 Jul 2019 at 00:18, Kadu Vido <
carlos.v...@lendico.com.br> wrote:

> Hi, Marco,
> We're using Livy 0.6.0, which I'm afraid ships by default with EMR 5.24,
> so I cannot test an earlier version.
> It's a holiday in Brazil and our network is IP-gated (which is also why I
> don't have logs), so we won't be able to access it tomorrow. I'll run that
> profiling as soon as we're back on Wednesday. I'm assuming you want a *perf
> record*; let me know otherwise.
> *Carlos Vido *
> Data Engineer @ Lendico Brasil <https://www.lendico.com.br>
> On Mon, 8 Jul 2019 at 16:55, Marco Gaido <marcogaid...@gmail.com> wrote:
>> Hi all,
>> This seems like a performance issue in the Livy server. I assume you are
>> using a recent version of Livy.
>> If this is the case, could you profile the Livy server to find out
>> what the problem is?
>> Thanks,
>> Marco
>> On Mon, 8 Jul 2019, 21:03 Kadu Vido, <carlos.v...@lendico.com.br> wrote:
>>> Hi, I'm working with Hugo in the same project.
>>> Shubham, we're using almost the same setup; the only difference is Airflow
>>> 1.10.1. I coded a workaround in our Livy hook: it has a parameter for
>>> retries, and whenever the session returns anything other than 'idle', we
>>> try again before failing the task. It's not ideal but at least our
>>> pipelines aren't stuck anymore.
>>> Zhang, I don't have yarn logs in hand but I can search for them if you'd
>>> like to take a look. However, our latest clues point a different way:
>>> 1. Running *top* on the master node, we observed that Livy rapidly
>>> takes all the available CPUs after we send just a few requests (3 or 4
>>> already cause this to happen; if we send upwards of 10, it'll crash the
>>> service).
>>> 2. We can get around this by spacing them out a bit -- that is, if we use a
>>> loop to open the sessions and wait ~10s between them, it'll give Livy enough
>>> time to release the CPU resources before trying to open a new one. We've
>>> had help from some AWS engineers that tried on several instance sizes and
>>> found out that on larger instances they can try to open 10 or 12
>>> simultaneously, but:
>>> 3. Regardless of the size of the cluster, we cannot hold more than 9
>>> simultaneous sessions open. It doesn't matter if our cluster has enough
>>> vCPUs or RAM to handle more, and the size of the master node doesn't matter
>>> either: from the 10th session onwards, each one seems to either die or drop.
>>> *Carlos Vido *
>>> Data Engineer @ Lendico Brasil <https://www.lendico.com.br>
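
For reference, the workaround described in the message above (open the sessions in a loop with a pause between them, and retry whenever a session ends up in anything other than 'idle') could look roughly like this. It is only a sketch under stated assumptions, not the actual hook: `open_sessions_staggered`, `create_session` and `get_state` are hypothetical names, with `create_session` expected to wrap `POST /sessions` and `get_state` to wrap `GET /sessions/{id}/state` of the Livy REST API.

```python
import time


def open_sessions_staggered(create_session, get_state, count,
                            spacing_s=10.0, max_retries=3, poll_s=2.0,
                            sleep=time.sleep):
    """Open `count` sessions one at a time, retrying any that do not reach 'idle'.

    create_session() -> session id and get_state(session_id) -> state string
    are caller-supplied callables (hypothetical wrappers around the Livy REST API).
    """
    opened = []
    for i in range(count):
        for _attempt in range(max_retries + 1):
            session_id = create_session()
            state = get_state(session_id)
            # 'not_started' and 'starting' are the Livy states that precede 'idle'.
            while state in ("not_started", "starting"):
                sleep(poll_s)
                state = get_state(session_id)
            if state == "idle":
                opened.append(session_id)
                break
            # Session went to 'dead', 'killed', 'error', ...: pause, then retry.
            sleep(spacing_s)
        else:
            raise RuntimeError(f"session {i} did not reach 'idle' after retries")
        if i < count - 1:
            # Give the server time to settle before opening the next session.
            sleep(spacing_s)
    return opened
```

The `sleep` parameter is injectable only so the pacing can be exercised without real waiting. The ~10 s default for `spacing_s` comes from the figure quoted in the message, while the retry count and poll interval are arbitrary.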
>>> On Sat, 6 Jul 2019 at 13:30, Shubham Gupta <y2k.shubhamgu...@gmail.com>
>>> wrote:
>>>> I'm facing precisely the same issue.
>>>> .
>>>> I've written a LivySessionHook that's just a wrapper over PyLivy
>>>> Session <https://pylivy.readthedocs.io/en/latest/api/session.html>.
>>>>    - I'm able to use this hook to send code-snippets to remote EMR via
>>>>    Python shell a few times, after which it starts throwing "caught
>>>>    exception 500 Server Error: Internal Server Error for url" (and
>>>>    continues to do so for next hour or so).
>>>>    - However when the same hook is triggered via Airflow operator, I
>>>>    get absolutely no success (always results in 500 error).
>>>> .
>>>> I'm using
>>>>    - Airflow 1.10.3
>>>>    - Python 3.7.3
>>>>    - EMR 5.24.1
>>>>    - Livy 0.6.0
>>>>    - Spark 2.4.2
>>>> *Shubham Gupta*
>>>> Software Engineer
>>>>  zomato
>>>> On Sat, Jul 6, 2019 at 6:56 PM Jeff Zhang <zjf...@gmail.com> wrote:
>>>>> For the dead/killed sessions, could you check the YARN app logs?
>>>>> On Thu, 4 Jul 2019 at 9:41 PM, Hugo Herlanin <hugo.herla...@lendico.com.br> wrote:
>>>>>> Hey, user mail is not working out!
>>>>>> I am having some problems with livy setup. My use case is as follows:
>>>>>> I use a DAG in Airflow (1.10) to create a cluster in EMR (5.24.1, one
>>>>>> m4.large master and two m5a.xlarge nodes), and when it is ready,
>>>>>> this DAG sends 5 to 7 simultaneous requests to Livy. I don't think I'm
>>>>>> messing with the Livy settings; I just set
>>>>>> livy.spark.deploy-mode = client and
>>>>>> livy.repl.enable-hive-context = true.
>>>>>> The problem is that of these ~5 to 7 sessions, just one or two
>>>>>> open (go to 'idle') and all the others go straight to 'dead' or
>>>>>> 'killed'. In the logs, YARN reports that the sessions were killed by
>>>>>> the 'livy' user. I tried to tinker with all possible timeout settings,
>>>>>> but this is still happening. If I send more than ~10 simultaneous
>>>>>> requests, Livy responds with 500, and if I continue sending requests,
>>>>>> the server freezes. This happens even if EMR has enough resources
>>>>>> available.
>>>>>> I know the cluster is able to handle that many sessions because it
>>>>>> works when I open them via a loop with an interval of 15 seconds or more,
>>>>>> but it feels like Livy should be able to deal with that many requests
>>>>>> simultaneously. It seems strange that I should need to manage the queue
>>>>>> in such a way for an API of a distributed system.
>>>>>> Do you have any clue about what I might be doing wrong? Is there any
>>>>>> known limitation that I'm unaware of?
>>>>>> Best,
>>>>>> Hugo Herlanin
>>>>> --
>>>>> Best Regards
>>>>> Jeff Zhang
