Hi Satish,
You did not say whether you are using HAProxy for the RESTful API or the native
Drill RPC (as used by the Drill client, JDBC, and ODBC).
To understand the use of proxies and load balancers, it is helpful to remember
that Drill is a stateful SQL engine. Drill encourages the use of many stateful
commands such as USE, CTTAS, and ALTER SESSION.
Session state is lost when connecting to a new Drillbit, or reconnecting to the
same Drillbit. Thus, a query that runs fine before the reconnect can fail
afterwards.
This issue is not unique to Drill; it is a common constraint of all old-school
SQL engines.
If state were not an issue, then the Drill client itself could handle HA. The
client is given a list of ZK nodes. The client, on encountering a disconnect,
could ask ZK for a new node and reconnect. Since ZK is HA, the client can also
recover from a ZK node failure by trying another.
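To make the trade-off concrete, here is a toy sketch of that client-side failover, and of what it loses. Everything in it is an assumption for illustration: FakeDrillbit and FailoverClient are made-up names, a plain list stands in for the node list ZK would return, and none of it is the real Drill client API.

```python
# Toy model only: FakeDrillbit and FailoverClient are hypothetical names,
# not part of any Drill client API.

class FakeDrillbit:
    """Stands in for one Drillbit; holds per-connection session state."""

    def __init__(self, name):
        self.name = name
        self.alive = True
        self.sessions = {}      # connection id -> session options
        self._next_conn = 0

    def connect(self):
        if not self.alive:
            raise ConnectionError(f"{self.name} is down")
        self._next_conn += 1
        self.sessions[self._next_conn] = {}   # every new connection starts empty
        return self._next_conn


class FailoverClient:
    """Reconnects to the first live node in a registry (stand-in for ZK)."""

    def __init__(self, registry):
        self.registry = registry   # list of FakeDrillbit, as ZK might report
        self.node = None
        self.conn = None
        self.reconnect()

    def reconnect(self):
        for node in self.registry:
            try:
                self.conn = node.connect()
                self.node = node
                return
            except ConnectionError:
                continue           # try the next registered node
        raise ConnectionError("no live Drillbit found")

    def _ensure_connected(self):
        if not self.node.alive:
            self.reconnect()

    def set_option(self, key, value):
        self._ensure_connected()
        self.node.sessions[self.conn][key] = value

    def get_option(self, key):
        self._ensure_connected()
        return self.node.sessions[self.conn].get(key)


bits = [FakeDrillbit("bit-1"), FakeDrillbit("bit-2")]
client = FailoverClient(bits)
client.set_option("schema", "dfs.tmp")         # analogous to USE dfs.tmp
bits[0].alive = False                          # simulate losing the Drillbit
after_failover = client.get_option("schema")   # state did not follow the client
```

The reconnect itself succeeds, but the option set before the failure is gone on the new node, which is the state problem described above.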
We have discussed this client-based HA approach multiple times, but each time
the SQL session state has been a show-stopper.
In short, the issue is not whether to use HAProxy to solve the problem; Drill
can do it internally in the client. The issue is how to handle session state.
A possible solution would be to store user session state in ZK so that we could
re-establish the same logical session after a physical reconnection. In
particular, a unique session ID could be used to key connections to session
state in ZK.
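A rough sketch of that idea, under loud assumptions: a plain dict stands in for ZK, SessionStore and LogicalSession are hypothetical names rather than Drill classes, and only simple key/value options are modeled (temporary tables from CTTAS would need more than this).

```python
import uuid

# Hypothetical sketch: a dict stands in for ZooKeeper; these classes are
# illustrative only and do not exist in Drill.

class SessionStore:
    """External store for session state, keyed by session ID."""

    def __init__(self):
        self._data = {}

    def save(self, session_id, state):
        self._data[session_id] = dict(state)   # copy so callers cannot alias it

    def load(self, session_id):
        return dict(self._data.get(session_id, {}))


class LogicalSession:
    """A session that survives reconnects by reloading state by ID."""

    def __init__(self, store, session_id=None):
        self.store = store
        # A new session gets a fresh ID; a reconnect passes the old one back.
        self.session_id = session_id or str(uuid.uuid4())
        self.state = store.load(self.session_id)

    def set_option(self, key, value):
        self.state[key] = value
        self.store.save(self.session_id, self.state)   # persist on every change


store = SessionStore()
first = LogicalSession(store)
first.set_option("schema", "dfs.tmp")

# Physical reconnect: a new object, possibly on another Drillbit, presents
# the same session ID and picks up the stored state.
resumed = LogicalSession(store, session_id=first.session_id)
```

The client would have to remember its session ID across the reconnect; how that ID is issued and secured is left open here.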
Making this change would be a good contributor project: it involves detailed
knowledge of how the Drill session and ZK state work, but is pretty isolated to
just those specific areas.
Thanks,
- Paul
On Monday, August 20, 2018, 8:26:09 AM PDT, drill
<[email protected]> wrote:
Hi Team,
Good evening. I am Satish, working as a big data developer. I need your help
regarding Drill high availability using the HAProxy load balancer.
Does Apache Drill support high availability? If yes, please let me know the
process.
-Thanks,
Satish