Thanks, I hope you don't mind a few more questions:

> Node2 would also eventually consider these invalidated

- How exactly does that work? E.g., when I issue INVALIDATE METADATA, does it tell the catalogd to invalidate the metadata, or is this information broadcast through the statestored?
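For concreteness, this is the scenario I have in mind (hostnames and table name are made up; sketched with the impyla Python client):

    # Invalidate on one coordinator, then query through another.
    # Hostnames and table name are hypothetical, for illustration only.
    from impala.dbapi import connect

    cur1 = connect(host='node1.example.com', port=21050).cursor()
    cur1.execute('INVALIDATE METADATA')

    # Shortly afterwards, through a different coordinator:
    cur2 = connect(host='node2.example.com', port=21050).cursor()
    cur2.execute('SELECT COUNT(*) FROM some_db.some_table')
    print(cur2.fetchall())  # does this query see the invalidation yet?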
> stored in the catalog daemon centrally

- Oh, so metadata is stored in the catalogd. I thought it was stored only in the statestore (and cached in each impalad), and that the catalogd just facilitates fetching metadata from the Hive Metastore and block information from the HDFS NameNode. Where was I wrong?

- Does INVALIDATE METADATA have any impact on the Hive Metastore? I don't believe so, right? E.g., instead of running INVALIDATE METADATA (say, after an HDFS rebalance), I can restart Impala to clear the caches (including the statestore catalog topic) so that the metadata is loaded lazily again.

-Antoni

-----Original Message-----
From: Jeszy [mailto:[email protected]]
Sent: Wednesday, November 29, 2017 9:56 AM
To: [email protected]
Cc: [email protected]
Subject: Re: invalidate metadata behaviour

Hey Antoni,

On 29 November 2017 at 07:42, Antoni Ivanov <[email protected]> wrote:
> Hi,
>
> I am wondering: if I run INVALIDATE METADATA for the whole database on
> node1, then I run a query on node2, would the query on node2 use the
> cached metadata for the tables, or would it know it's invalidated?

Node2 would also eventually consider these invalidated.

> And second, how safe is it to run it for a database with many tables:
> say 30 tables with over 10,000 partitions, and 2,000 more with under
> 5,000 partitions (most of them under 100)?
>
> And each Impala Daemon node has a little (below the Cloudera
> recommended) memory (32G).

These numbers influence the size of the catalog cache, which is stored in the catalog daemon centrally and then replicated on each impalad, or on each coordinator in more recent versions. The metadata you mention (2,000 tables * 5,000 partitions each, plus the big tables) is in the 10-million-partition range. Each of those partitions will have at least one file with 3 blocks, probably more, so it all adds up to a sizeable amount of metadata. The cached version will require a large amount of memory (on the catalog as well as on the daemons/coordinators), which could easily lead to even small queries running out of memory with only 32 GB.

> Thanks,
>
> Antoni

HTH
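P.S. To put very rough numbers on the arithmetic above, here is a back-of-envelope sketch in Python. The bytes-per-object figures are illustrative assumptions, not measured Impala numbers:

    # Rough estimate of the cached metadata volume described above.
    big = 30 * 10_000            # 30 tables with over 10,000 partitions each
    small = 2_000 * 5_000        # 2,000 tables with up to 5,000 partitions each
    partitions = big + small     # ~10.3 million partitions

    files = partitions           # at least one file per partition
    blocks = files * 3           # about 3 blocks per file

    # Assumed average in-memory cost per cached object (illustration only):
    PARTITION_BYTES, FILE_BYTES, BLOCK_BYTES = 2_000, 500, 150

    total = (partitions * PARTITION_BYTES
             + files * FILE_BYTES
             + blocks * BLOCK_BYTES)
    print(f'{total / 2**30:.1f} GiB')  # ~28 GiB, uncomfortably close to 32 GB

Even if the real per-object costs are smaller, the cache is replicated on every coordinator, so there is little headroom left for query memory on a 32 GB node.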
