Hi Arun,

1. We don't have HA for statestored and catalogd yet, so each of them
can only be deployed on one master node. However, I don't think this is
strictly a single point of failure, since Impala can still partially work:
When catalogd crashes, coordinators can still serve queries if the
metadata of the required tables is already loaded (cached). Only
DDL/DML statements and queries on unloaded tables are impacted.
The statestored crash scenario is more complex, so I might be
wrong here. I think coordinators can still schedule work based on
the cached cluster membership info. New nodes won't be discovered, but
crashed nodes can be blacklisted by coordinators. Coordinators can
still execute DDL/DML statements and fetch catalog metadata on demand
via the direct connection to catalogd (this requires LocalCatalog mode
to be enabled).
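
As a rough sketch of what enabling LocalCatalog mode looks like, to the
best of my knowledge it takes one startup flag on the coordinators and
one on catalogd. Please verify the flag names and values against the
documentation for your Impala version before relying on this:

```
# impalad (coordinator) startup flags
-use_local_catalog=true

# catalogd startup flags
-catalog_topic_mode=minimal
```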

Anyway, HA for catalogd and statestored would be a good addition.
Contributions are welcome!

2. The hosts of catalogd and statestored are specified by the startup
flags state_store_host and catalog_service_host. This is unrelated
to Hive Metastore.
There is ongoing work to ease multi-node deployment. You can
refer to this file for the configuration:
https://gerrit.cloudera.org/c/18939/6/package/conf/impalad_flags
The whole patch is https://gerrit.cloudera.org/c/18939/
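
To illustrate the two flags named in item 2, a minimal impalad flag
file for a worker node might look like the sketch below. The hostname
master-1.example.com is a placeholder for the node running statestored
and catalogd, not something taken from your setup:

```
# impalad flag file on each worker node (hostname is a placeholder)
-state_store_host=master-1.example.com
-catalog_service_host=master-1.example.com
```

Each impalad registers with statestored at startup, so the other
daemons are discovered through cluster membership updates rather than
being listed in the configuration.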

Best Regards,
Quanlong

On Wed, Apr 26, 2023 at 8:38 PM Arun J <mail....@gmail.com> wrote:
>
> Team,
>
> Upon building, have binaries of statestored, catalogd & impalad built for a 
> single node and is working fine with apache hive,hdfs installed separately.
>
> I have a couple of questions about the Multi-Node cluster setup for Impala.
>
> 1. How to install/configure multi-master Impala setup?  Planning to run 
> statestored, catalogd in the master node(s) and impalad in the slave nodes - 
> how multi-master setup should be?  Will this be a single-point failure if 
> that is not possible?
>
> 2. Where is the configuration to provide cluster URL for Impala?  How do I 
> tell the impala daemon that this is the node running statestored/catalogd & 
> here are other daemons?  Is this routed through hive metastore only or am I 
> missing something?
>
> Thanks in advance,
> JAK
