> not get the progress messages back until the query finishes which
> somewhat defeats the purpose of interactive usage.

That happens entirely on the client side btw.

So to avoid a hard sleep() + check loop causing pointless HTTP traffic,
HiveServer2 now does a long poll on the server side.

"hive.server2.long.polling.timeout", "5000ms"


This means that it is edge-triggered to return as soon as the query
finishes, instead of adding extra wait time when the results are ready
but beeline doesn't know about it yet.
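
To make the difference concrete, here is a rough Python simulation of the
two waiting styles. It is not Hive code: FakeQuery, status_short_poll,
status_long_poll and both wait_* helpers are names made up for this sketch,
and the "server" is just a timer in the same process.

```python
import threading
import time


class FakeQuery:
    """Hypothetical stand-in for a server-side query that takes `runtime` seconds."""

    def __init__(self, runtime):
        self.done = threading.Event()
        self.requests = 0  # number of status calls the "server" received
        threading.Timer(runtime, self.done.set).start()

    def status_short_poll(self):
        # Answers immediately, whether or not the query has finished.
        self.requests += 1
        return self.done.is_set()

    def status_long_poll(self, timeout):
        # Blocks until the query finishes or `timeout` seconds pass,
        # whichever comes first, then reports the state.
        self.requests += 1
        return self.done.wait(timeout)


def wait_with_sleep_loop(query, interval):
    """Client-side sleep() + check loop; returns the number of requests made."""
    while not query.status_short_poll():
        time.sleep(interval)
    return query.requests


def wait_with_long_poll(query, timeout):
    """Server-side long poll; returns the number of requests made."""
    while not query.status_long_poll(timeout):
        pass
    return query.requests
```

For a 0.3s query, wait_with_long_poll(FakeQuery(0.3), 5.0) makes a single
request and returns the moment the timer fires, while
wait_with_sleep_loop(FakeQuery(0.3), 0.02) makes many requests and can
still overshoot the finish by up to one sleep interval.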


However, get_logs() synchronizes on the same HiveStatement and is
mutexed out by the long poll for getting results, so the progress
messages can't be fetched while a poll is in flight.
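
The blocking can be sketched with a toy model in Python. This assumes only
what is described above (both calls need the same per-statement lock) and is
not the actual client code: FakeStatement, poll_for_results and
log_fetch_delay are hypothetical names, and a sleep stands in for a long
poll that runs to its timeout.

```python
import threading
import time


class FakeStatement:
    """Hypothetical stand-in for a client statement with one shared lock."""

    def __init__(self):
        self._lock = threading.Lock()
        self.polling = threading.Event()  # set once the long poll holds the lock

    def poll_for_results(self, timeout):
        # Holds the lock for the whole long poll, like a synchronized
        # method that blocks on a server call.
        with self._lock:
            self.polling.set()
            time.sleep(timeout)  # simulates a long poll running to its timeout

    def get_logs(self):
        # Needs the same lock, so it waits until the poll above returns.
        with self._lock:
            return ["progress line"]


def log_fetch_delay(poll_timeout):
    """Return how long get_logs() is blocked behind one in-flight long poll."""
    stmt = FakeStatement()
    poller = threading.Thread(target=stmt.poll_for_results, args=(poll_timeout,))
    poller.start()
    stmt.polling.wait()  # make sure the poll already owns the lock
    start = time.monotonic()
    stmt.get_logs()
    delay = time.monotonic() - start
    poller.join()
    return delay
```

In this model the delay seen by get_logs() tracks the poll timeout, so
log_fetch_delay(0.5) comes back far later than log_fetch_delay(0.05).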

You can work around this on a low-concurrency cluster by changing
hive.server2.long.polling.timeout to 0.5s instead of 5s & restarting HS2.

However, as the total # of concurrent queries goes up, the current setting
does very well due to the reduction in total # of "Nope, come back" HTTP
noise (largest parallel workload I've seen is about 3000 queries on 10
HS2 nodes load-balanced).

Cheers,
Gopal

