Thank you, Saisai, for your response.
I did have a chance to investigate further, and I should give a little background on why I feel network cost is not the issue.

I added Kylo (http://kylo.io) to our application as an optional Spark server that replaces our existing Spark server. I noticed the performance issues when I use Livy instead of our pre-existing server. Kylo's spark-shell would consistently execute queries quickly (e.g. <100ms), while the same queries would take longer (>1500ms) through Livy with a 500ms polling interval (first query at 0ms).

This led me to write Python code that queries Livy at a short interval (50ms), and to wrap the Scala code executed in Livy with a timer method that writes the time taken to the Livy logs. My faster queries execute in Livy in <50ms, yet Livy does not return the results for at least 350ms (7 requests for results made, 6 returned to the client as pending). I feel fairly confident that Livy has some overhead other than network.

I've since discovered these settings in livy-client.conf.template:

    # Initial interval before polling for Job results
    # livy.client.http.job.initial-poll-interval = 100ms
    # Maximum interval between successive polls
    # livy.client.http.job.max-poll-interval = 5s

I also looked at the Livy source and noticed that it seems to use a geometric interval for polling:
https://github.com/cloudera/livy/blob/5de6cf21c61db4093646a23c65c37c8b52202dc8/client-http/src/main/java/com/cloudera/livy/client/http/JobHandleImpl.java#L266

I'm thinking that could be the source of my issue, but I need a chance to dive deeper. Do you think tuning those parameters could improve the situation?
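To judge whether the backoff alone could explain the gap, the polling schedule can be modeled in a few lines of Python. This is a rough sketch, not Livy's code: it assumes (without having verified JobHandleImpl.java) that the client waits initial-poll-interval before the first poll and then doubles the wait after each poll, capped at max-poll-interval. The 100ms and 5s values are the template defaults quoted above; the function names are hypothetical.

```python
from bisect import bisect_left


def geometric_poll_times(initial_ms, max_ms, factor=2.0, horizon_ms=10_000):
    """Cumulative times (ms after submission) at which a client polls.

    Assumed model: wait `initial_ms` before the first poll, then multiply
    the wait by `factor` after every poll, never exceeding `max_ms`.
    """
    times = []
    t, wait = 0.0, float(initial_ms)
    while t + wait <= horizon_ms:
        t += wait
        times.append(t)
        wait = min(wait * factor, float(max_ms))
    return times


def first_poll_after(done_ms, times):
    """Time of the first poll at or after a job finished at `done_ms`."""
    i = bisect_left(times, done_ms)
    if i == len(times):
        raise ValueError("job finished after the last scheduled poll")
    return times[i]


if __name__ == "__main__":
    schedule = geometric_poll_times(initial_ms=100, max_ms=5_000)
    print(schedule[:5])  # [100.0, 300.0, 700.0, 1500.0, 3100.0]
    for done in (50, 150, 400):
        seen = first_poll_after(done, schedule)
        print(f"job done at {done} ms -> seen at {seen} ms (+{seen - done} ms)")
```

Under these assumptions a job that finishes at 150ms is not seen until the 300ms poll, so the backoff by itself could add a few hundred milliseconds for short jobs. It would presumably only affect clients that use this polling code; a separate Python loop polling every 50ms follows its own schedule.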
Thanks,
Tim

________________________________
From: Saisai Shao <sai.sai.s...@gmail.com>
Sent: Wednesday, August 1, 2018 7:23:55 PM
To: user@livy.incubator.apache.org
Subject: Re: How to tune Livy for fast queries

Network cost should probably also be counted. There's no such configuration for tuning. If you find a performance issue, you can create a JIRA or even a patch to fix Livy.

Harsch, Tim <tim.har...@teradata.com> wrote on Wednesday, August 1, 2018 at 8:04 AM:

I have a Livy application that I'm trying to tune, as I'm seeing some performance issues when the queries are fast. I've wrapped my queries with a timer that logs the time taken. The Spark code typically takes 50ms to 150ms to execute. I'm querying Livy every 500ms for my response, and generally it doesn't succeed until the third check. It seems Livy itself is spending up to an extra 1000ms. Where is Livy spending this time? Are there any tuning parameters I can adjust?

Also, I am having difficulty changing any of the settings in livy-client.conf. I placed the file in the /etc/hadoop/conf and livy/conf folders, but my settings seem to get ignored.

Thanks,
Tim
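The arithmetic behind "it doesn't succeed until the third check" can be made explicit with a small Python sketch. It assumes the first check happens at 0ms (as with the 0ms initial query described later) and that the timer around the Spark code captures all of the execution time, so anything else counts as overhead (Livy, network, polling). The helper names are hypothetical.

```python
def fixed_poll_times(interval_ms, n_polls, first_ms=0):
    """Times (ms after submission) of the first `n_polls` fixed-interval polls."""
    return [first_ms + k * interval_ms for k in range(n_polls)]


def overhead_bounds(first_success, interval_ms, exec_min_ms, exec_max_ms, first_ms=0):
    """Bound the time not spent in the timed Spark code.

    `first_success` is the 1-based index of the first poll that returned a
    result. The result appeared after the previous poll and no later than
    the successful one, so subtracting the measured execution time gives a
    range. Returns (low, high): the extra time lies between them, in ms.
    """
    if first_success < 2:
        raise ValueError("need at least one pending poll to bound the overhead")
    times = fixed_poll_times(interval_ms, first_success, first_ms)
    previous, successful = times[-2], times[-1]
    return max(0, previous - exec_max_ms), successful - exec_min_ms


if __name__ == "__main__":
    # Numbers from the original question: 500 ms polling, success on the
    # third check, Spark code timed at 50 to 150 ms.
    print(fixed_poll_times(500, 3))          # [0, 500, 1000]
    print(overhead_bounds(3, 500, 50, 150))  # (350, 950)
```

With 500ms polling and success on the third check, the time spent outside the Spark code falls somewhere between 350ms and 950ms, which is consistent with the estimate of up to an extra 1000ms.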