Hello Community,

I have 400 Impala tables partitioned by year, month, and day, and the
retention for these tables is 6 months.

I would like to add a partition key to these tables: the first 2 digits
of the account number. That means the number of partitions in each table
will grow by a factor of ~100.
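
For illustration, here is a hypothetical table with the layout I have in
mind (table, column, and host names are made up):

  # hypothetical table, for illustration only; acct_prefix holds the
  # first 2 digits of the account, so ~100 values under each day
  impala-shell -i coordinator-vip -q "
    CREATE TABLE accounts_daily (account_id STRING, amount DOUBLE)
    PARTITIONED BY (year INT, month INT, day INT, acct_prefix STRING)
    STORED AS PARQUET;"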

Of course, I will review these tables first and make this change only for
the large ones.

Is there a limit on the number of partitions per table? Theoretically no,
but I'm interested in the best practices, since I know this will impact
the metastore and the catalog server.

What I'm looking for is:

1- How can I check the size of the metadata that each Impala daemon
stores, and the size of the catalog server's metadata as a whole?
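
For example, is grepping the daemons' debug web UIs the right approach?
Something like the following, assuming the default web UI ports (25000
for an impalad, 25020 for the catalogd) and placeholder host names:

  # catalog-related metrics on one impalad and on the catalogd
  curl -s http://impalad-host:25000/metrics | grep -i catalog
  curl -s http://catalogd-host:25020/metrics | grep -i catalog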

2- Is there a linear relationship between the number of tables/partitions
and the memory needed for the metastore and the catalog server?
In other words, if I make the change described above, what changes should
I make to the memory of the metastore, the catalog server, and the Impala
daemons to minimize the impact?
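
To put rough numbers on it: with 6 months of daily partitions, each table
holds roughly 180 partitions, so the 400 tables come to about 72,000
partitions today; applying the x100 change to one table takes it from
~180 to ~18,000 partitions.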

3- Is there a relationship between the DDL statements I will run (mainly
DROP PARTITION) and the memory of the metastore, the catalog server, and
the Impala daemons?
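
This is the kind of DDL I mean, using the hypothetical table above to
drop one of the ~100 prefix partitions of an expired day (dates are
placeholders):

  # each such DDL triggers a metadata change that the catalog server
  # must broadcast to every impalad
  impala-shell -i coordinator-vip -q "
    ALTER TABLE accounts_daily DROP IF EXISTS
    PARTITION (year=2017, month=11, day=1, acct_prefix='00');"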


4- Is there any metric in Cloudera Manager that I can use to track the
number of partitions and their impact on the three roles mentioned above?

5- On a side note: on 200 of my Impala tables I have to run ALTER TABLE
xxxx RECOVER PARTITIONS every 20 minutes, and DROP/CREATE the tables
twice a day.
Which actions can I take to reduce the running time of these operations?
(One idea I'm considering is sketched at the end of this mail.)

 I'm interested to know which actions would help, in terms of:

 A) The number of Impala daemons in the cluster (adding more nodes).
 B) The number of nodes that can act as coordinators (I'm using a VIP for
the coordinators, and I can add and drop nodes behind this VIP).
 C) The Impala daemon memory limit.
 D) The catalog role memory and the Hive metastore memory.
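
On question 5, the idea I'm considering (assuming the loader knows which
partition it just wrote, and assuming Impala 2.7+ for REFRESH with a
partition spec) is to add and refresh only the changed partition instead
of running RECOVER PARTITIONS over the whole table:

  # touch only the partition that changed instead of scanning the
  # whole table directory every 20 minutes (names/dates are placeholders)
  impala-shell -i coordinator-vip -q "
    ALTER TABLE accounts_daily ADD IF NOT EXISTS
      PARTITION (year=2018, month=5, day=14, acct_prefix='07');
    REFRESH accounts_daily
      PARTITION (year=2018, month=5, day=14, acct_prefix='07');"

Would that be a reasonable direction?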



-- 
Take Care
Fawze Abujaber
