I have questions about this as well.

If the catalogd restart, its catalog version will start from 0. However, the 
catalog version in Impala daemon still not change. This cause the Impala 
daemons not accepting catalog updates until the catalog version match. The 
cluster is in an abnormal state during that time.


Thus, there's still a single point of failure in Impala. Please correct me if 
I'm wrong.


Thanks,
Quanlong

At 2017-12-05 18:50:34, "Aleksei Maželis" <[email protected]> wrote:

Hi,


Referring to the chapter of impala_faq about single point of failure at 
https://www.cloudera.com/documentation/enterprise/5-9-x/topics/impala_faq.html#faq_ha__faq_spof:


<quote>
There is not a single point of failure in Impala. All Impala daemons are fully 
able to handle incoming queries. If a machine fails however, all queries with 
fragments running on that machine will fail. Because queries are expected to 
return quickly, you can just rerun the query if there is a failure. See Impala 
Concepts and Architecture for details about the Impala architecture.
The longer answer: Impala must be able to connect to the Hive metastore. Impala 
aggressively caches metadata so the metastore host should have minimal load. 
Impala relies on the HDFS NameNode, and, in CDH4, you can configure HA for 
HDFS. Impala also has centralized services, known as the statestore and catalog 
services, that run on one host only. Impala continues to execute queries if the 
statestore host is down, but it will not get state updates. For example, if a 
host is added to the cluster while the statestore host is down, the existing 
instances of impalad running on the other hosts will not find out about this 
new host. Once the statestore process is restarted, all the information it 
serves is automatically reconstructed from all running Impala daemons.
</quote>


It appears that (despite the first sentence in the quote) the centralized 
services (statestore and catalog) do represent a single point of failure. Is it 
so, or am I missing something? If so, what is a workaround in case high 
availability is a requirement?


Regards,
Aleksei Maželis 

Reply via email to