Thank you Saisai for your response.

    I did have a chance to investigate further, and I should give a little 
background on why I feel network cost is not the issue:
    I added Livy to our application Kylo (http://kylo.io) as an optional Spark 
server that can be used as a replacement for our existing Spark server.  I 
noticed the performance issues when I use Livy instead of our pre-existing 
server.  Kylo's spark-shell would consistently execute queries quickly (e.g. 
<100ms), while the same queries would take longer (>1500ms) through Livy with a 
500ms polling interval (0ms initial query).  This led me to write Python code 
that polls Livy quickly (every 50ms), and to wrap the Scala code executed in 
Livy with a timer method that writes the time taken to the Livy logs.  My 
faster queries execute in Livy in <50ms, yet Livy does not return the results 
for at least 350ms (7 requests for results made, 6 of which were returned to 
the client as pending).  I feel fairly confident that Livy has some overhead 
other than network.
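
For illustration, here is a minimal sketch of the kind of 50ms polling loop 
described above.  It is not the code I actually ran.  It assumes the Livy REST 
endpoint GET /sessions/{sessionId}/statements/{statementId} and its "state" 
field; the base URL, the ids, and the helper names poll_statement and 
make_livy_fetcher are hypothetical.

```python
import json
import time
import urllib.request

# Statement states after which further polling is pointless.
TERMINAL_STATES = {"available", "error", "cancelled"}


def make_livy_fetcher(base_url, session_id, statement_id):
    """Build a function that fetches one statement from the Livy REST API.

    base_url is an assumption, e.g. "http://localhost:8998".
    """
    url = f"{base_url}/sessions/{session_id}/statements/{statement_id}"

    def fetch():
        with urllib.request.urlopen(url) as resp:
            return json.load(resp)

    return fetch


def poll_statement(fetch, interval_s=0.05, timeout_s=10.0,
                   clock=time.monotonic, sleep=time.sleep):
    """Poll until the statement reaches a terminal state.

    Returns (statement, number_of_polls, elapsed_seconds) so the client-side
    wait can be compared with the execution time logged inside Livy.
    """
    start = clock()
    polls = 0
    while True:
        statement = fetch()
        polls += 1
        if statement.get("state") in TERMINAL_STATES:
            return statement, polls, clock() - start
        if clock() - start >= timeout_s:
            raise TimeoutError(f"statement still pending after {polls} polls")
        sleep(interval_s)
```

With a fetcher from make_livy_fetcher, the returned poll count and elapsed time 
correspond to the "7 requests, 6 pending" observation above.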


   I've since discovered these settings in livy-client.conf.template

# Initial interval before polling for Job results
# livy.client.http.job.initial-poll-interval = 100ms
# Maximum interval between successive polls
# livy.client.http.job.max-poll-interval = 5s

and I looked at the Livy source and noticed that it seems to use a geometric 
interval for polling:
https://github.com/cloudera/livy/blob/5de6cf21c61db4093646a23c65c37c8b52202dc8/client-http/src/main/java/com/cloudera/livy/client/http/JobHandleImpl.java#L266
I'm thinking that could be the source of my issue but I need a chance to dive 
deeper.  Do you think tuning those parameters could improve the situation?
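
To make the question concrete, here is a rough sketch of how a geometric 
polling schedule would delay results.  It assumes the first poll happens after 
the initial interval and that the interval doubles after each poll up to the 
maximum; the doubling factor and the function names are my assumptions and 
should be checked against JobHandleImpl.  Request round-trip time is ignored.

```python
def poll_times_ms(initial_ms=100, max_ms=5000, factor=2, horizon_ms=10000):
    """Times (ms after submission) at which polls occur under the assumed schedule."""
    times = []
    t = 0
    interval = initial_ms
    while True:
        t += interval
        if t > horizon_ms:
            break
        times.append(t)
        # Grow the interval geometrically, capped at the configured maximum.
        interval = min(interval * factor, max_ms)
    return times


def observed_latency_ms(job_done_ms, **schedule):
    """Time of the first poll at or after the job finished, or None if past the horizon."""
    for t in poll_times_ms(**schedule):
        if t >= job_done_ms:
            return t
    return None


if __name__ == "__main__":
    print(poll_times_ms())
    for done in (50, 150, 350):
        print(done, "->", observed_latency_ms(done))
```

Under these assumptions a job that finishes just after one poll is not seen 
until the next one, so the wait grows with each missed poll.  Lowering 
initial-poll-interval and max-poll-interval would shrink that gap, at the cost 
of more requests.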


Thanks,

Tim



________________________________
From: Saisai Shao <sai.sai.s...@gmail.com>
Sent: Wednesday, August 1, 2018 7:23:55 PM
To: user@livy.incubator.apache.org
Subject: Re: How to tune Livy for fast queries

________________________________
Probably some network cost should also be counted in. There's no such 
configuration for tuning. If you find some performance issue, you can create a 
JIRA or even a patch to fix Livy.

Harsch, Tim <tim.har...@teradata.com> wrote on Wed, Aug 1, 2018 at 8:04 AM:

I have a Livy application that I'm trying to tune because I'm seeing a 
performance issue with fast queries.  I've wrapped my queries with a timer 
that logs the time taken.  The Spark code executed typically takes 50ms to 
150ms.  I'm querying Livy every 500ms for my response, and generally it 
doesn't succeed until the third check.  It seems Livy itself is spending up to 
an extra 1000ms.  Where is Livy spending this time?  Are there any tuning 
parameters I can adjust?


Also, I am having difficulty changing any of the settings in livy-client.conf.  
I placed the file in the /etc/hadoop/conf and livy/conf folders, but my 
settings seem to get ignored.


Thanks

Tim
