Hello Community,

I have 400 Impala tables that are partitioned by year, month, and day, and the retention for these tables is 6 months.

I would like to extend the partitioning of these tables by adding the first 2 digits of the account as a partition column, which means the number of partitions per table will grow by a factor of up to 100. I will of course review the tables first and make this change for the large tables only.
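For illustration only (the database, table, and column names here are made up), the change I have in mind is roughly:

    -- current layout
    CREATE TABLE mydb.events_v1 (
      account_id STRING,
      amount DOUBLE
    )
    PARTITIONED BY (year INT, month INT, day INT)
    STORED AS PARQUET;

    -- proposed layout: first 2 digits of the account as a 4th partition key
    CREATE TABLE mydb.events_v2 (
      account_id STRING,
      amount DOUBLE
    )
    PARTITIONED BY (year INT, month INT, day INT, acct_prefix STRING)
    STORED AS PARQUET;

    -- dynamic-partition insert: partition values go last in the select list
    INSERT INTO mydb.events_v2 PARTITION (year, month, day, acct_prefix)
    SELECT account_id, amount, year, month, day, substr(account_id, 1, 2)
    FROM mydb.events_v1;

So each existing (year, month, day) partition fans out into up to 100 acct_prefix sub-partitions.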
Is there a limit on the number of partitions per table? Theoretically no, but it would be interesting to know the best practices; I know this will impact the metastore and the catalog server. What I'm looking for is:

1- How can I check the size of the metadata that each Impala node stores, and that the catalog server stores as a whole? (I sketched what I can already measure in the P.S. below.)
2- Is there a linear relationship between the number of tables/partitions and the memory needed for the metastore and the catalog server? In other words, if I make the change described above, what changes should I make to the memory of the metastore, the catalog, and the Impala daemons to minimize the impact?
3- Is there a relationship between the DDL statements I will run (mainly DROP PARTITION) and the memory of the metastore, the catalog, and the Impala daemons?
4- Is there any metric in Cloudera Manager that I can use to see the number of partitions and their impact on the 3 roles mentioned above?
5- On a side note: on 200 of my Impala tables I have to run ALTER TABLE xxxx RECOVER PARTITIONS every 20 minutes, and DROP/CREATE the tables twice a day (sketched in the P.P.S. below). Which actions can I take to reduce the running time of these operations? I'm interested to know what I can do in terms of:
A) The number of Impala daemons in the cluster (adding more nodes).
B) The number of nodes that can act as coordinators (I'm using a VIP for the coordinators, and I can add and remove nodes behind this VIP).
C) The Impala daemon memory limit.
D) The catalog role memory and the Hive metastore memory.

--
Take Care
Fawze Abujaber
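P.S. To make question 1 concrete, the only numbers I know how to pull today are per-partition file counts and on-disk sizes (table name made up):

    -- lists every partition with its #Rows, #Files, and Size
    SHOW PARTITIONS mydb.events_v1;

    -- the same per-partition stats plus a total row for the table
    SHOW TABLE STATS mydb.events_v1;

That is the HDFS footprint, though, not the in-memory size of the metadata. I assume the catalogd debug web UI (port 25020 by default, the /catalog and /memz pages) and the /memz page of each Impala daemon are the places to look for the in-memory side; please correct me if there is a better metric.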
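P.P.S. To make questions 3 and 5 concrete, the periodic maintenance looks roughly like this (names and dates are made up):

    -- every 20 minutes: pick up partition directories written straight to HDFS
    ALTER TABLE mydb.events_v1 RECOVER PARTITIONS;

    -- retention cleanup: drop partitions older than 6 months, one day at a time
    ALTER TABLE mydb.events_v1 DROP IF EXISTS PARTITION (year=2018, month=1, day=14);

If I read the docs correctly, Impala 2.8 and higher also accepts comparison operators in the partition spec, so a whole range could be dropped in one statement:

    ALTER TABLE mydb.events_v1 DROP PARTITION (year=2018, month<2);

Is a single range drop cheaper for the catalog and the metastore than dropping the days one by one?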