Hi Arun,

1. We don't have HA for statestored and catalogd yet, so each of them can only be deployed on a single master node. However, I don't think this is strictly a single point of failure, since Impala can still partially work without them.

When catalogd crashes, coordinators can still serve queries if the metadata of the required tables is already loaded (cached). Only DDL/DML statements and queries on unloaded tables are impacted.

The scenario where statestored crashes is more complex, so I might be wrong here. I think coordinators can still schedule work based on the cached cluster membership info. New nodes won't be discovered, but crashed nodes can be blacklisted by coordinators. Coordinators can still execute DDL/DML statements and fetch catalog metadata on demand via their direct connection to catalogd (this requires LocalCatalog mode to be enabled).
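
For reference, LocalCatalog mode is turned on with startup flags on both daemons. A minimal sketch, assuming the flag names from the Impala on-demand metadata documentation (please double-check them against the version you built):

```
# impalad flag file (coordinators)
--use_local_catalog=true

# catalogd flag file
--catalog_topic_mode=minimal
```

Both settings are meant to be used together, and the daemons need a restart for them to take effect.
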
Anyway, HA for catalogd and statestored would be a good addition. Contributions are welcome!

2. The hosts of catalogd and statestored are specified by startup flags, i.e. state_store_host and catalog_service_host. This is unrelated to Hive Metastore. There is ongoing work to ease multi-node deployment. You can refer to this file for the configuration:
https://gerrit.cloudera.org/c/18939/6/package/conf/impalad_flags
The whole patch is https://gerrit.cloudera.org/c/18939/

Best Regards,
Quanlong

On Wed, Apr 26, 2023 at 8:38 PM Arun J <mail....@gmail.com> wrote:
>
> Team,
>
> Upon building, have binaries of statestored, catalogd & impalad built for a
> single node and is working fine with apache hive, hdfs installed separately.
>
> I have a couple of questions about the Multi-Node cluster setup for Impala.
>
> 1. How to install/configure multi-master Impala setup? Planning to run
> statestored, catalogd in the master node(s) and impalad in the slave nodes -
> how multi-master setup should be? Will this be a single-point failure if
> that is not possible?
>
> 2. Where is the configuration to provide cluster URL for Impala? How do I
> tell the impala daemon that this is the node running statestored/catalogd &
> here are other daemons? Is this routed through hive metastore only or am I
> missing something?
>
> Thanks in advance,
> JAK
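
P.S. To make point 2 concrete, here is a rough sketch of the relevant flag-file lines. The hostname master-1.example.com is a placeholder I made up for illustration; the only flags shown are state_store_host and catalog_service_host, and the exact layout of your flag files may differ from this.

```
# impalad flag file, same on every impalad node
# (master-1.example.com is a hypothetical host running statestored and catalogd)
--state_store_host=master-1.example.com
--catalog_service_host=master-1.example.com

# catalogd flag file on the master node
--state_store_host=master-1.example.com
```

Each impalad registers itself with statestored at startup, so you don't need to list the other daemons anywhere; they learn about each other through the cluster membership that statestored broadcasts.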