> For your 2nd job, you don't need to run INVALIDATE METADATA statements since > DDL and DML in impala will refresh the metadata automatically. > -- For the invalidate i almost sure i don't need, but what about the SET > SYNC_DDL=1, should i use it in the 2nd case, and should i add it after the > ALTER statement?
You mentioned that you have load balancing for impalads. So you should add "SET SYNC_DDL=1" before the ALTER statement and in the same session. For example, in your 2nd job: impala-shell -i "$LPIMPALA" -q "SET SYNC_DDL=1; ALTER TABLE table_name2 DROP IF EXISTS PARTITION (dt<'$RetentionDate');" > For your 1st job, again, you don't need to run the INVALIDATE METADATA > statement after your ALTER TABLE statement in Impala. After the INSERT > OVERWRITE statement in hive, you just need to run a REFRESH statement instead > of the INVALIDATE METADATA statement. > -- So you mean i don't need after dropping some partitions to run refresh > statement even? after the inserti need the refresh. Yes. The REFRESH statement is for modifications outside Impala (e.g. in hive). If you run ALTER TABLE inside Impala, you don't need the REFRESH statement. > What about SET SYNC_DDL=1? where i should set it, is it something that i can > do this globally at impala and not bu query? If you're accessing Impala through a load-balancing proxy, you should SET SYNC_DDL=1 before any DDL or DML statements. There's a more detail doc explaining this: https://www.cloudera.com/documentation/enterprise/5-14-x/topics/impala_sync_ddl.html You can set it globally by adding --default_query_options=SYNC_DDL=1 in the startup options. If you're using Cloudera Manager, you can add "SYNC_DDL=1" in "Impala Daemon Query Options Advanced Configuration Snippet (Safety Valve)". > Do you recommend me to use REFRSH or ALTER table x Recover partitions? I'm not quite sure about this. But I'm sure that if you're just adding new partitions into a table, you can use ALTER TABLE x Recover partitions. Otherwise, I believe you should not use it. At 2018-07-04 21:39:29,"Fawze Abujaber" <[email protected]> wrote: Hi Quanlong, Thanks for your quick response. This is an ETLs that i inherent from other engineers, so trying to make sure the changes i'm planning to do will not impact the current work. Please see my responses inline. On Wed, Jul 4, 2018 at 4:23 PM, Quanlong Huang <[email protected]> wrote: Hi Fawze, As you need a quick consult, let me answer your questions first. The official Cloudera folks will correct me if something wrong. For your 2nd job, you don't need to run INVALIDATE METADATA statements since DDL and DML in impala will refresh the metadata automatically. -- For the invalidate i almost sure i don't need, but what about the SET SYNC_DDL=1, should i use it in the 2nd case, and should i add it after the ALTER statement? For your 1st job, again, you don't need to run the INVALIDATE METADATA statement after your ALTER TABLE statement in Impala. After the INSERT OVERWRITE statement in hive, you just need to run a REFRESH statement instead of the INVALIDATE METADATA statement. -- So you mean i don't need after dropping some partitions to run refresh statement even? after the insert i need the refresh. What about SET SYNC_DDL=1? where i should set it, is it something that i can do this globally at impala and not bu query? Do you recommend me to use REFRSH or ALTER table x Recover partitions? BTW, what do you mean by "set DDL=1"? If you're mentioning the SYNC_DDL option, it's recommended to set it to true if you're using load balancer above of the impala daemons. If you're interested in the difference between INVALIDATE METADATA and REFRESH, there's a discussion thread: https://lists.apache.org/thread.html/44b4f223b49e8d7cb07865d5dfba50378d6e41fc173369b56c36b652@%3Cdev.impala.apache.org%3E In a nutshell, there're only two cases you should use the INVALIDATE METADATA statement: - you create a table in hive and want to use it in Impala - the block locations of the underlying hive table files changed by HDFS balancer You can ignore the second case since it's just for performance, not for correctness. In other cases like adding/dropping files/partitions in hive, you just need the REFRESH statement. Regards, Quanlong At 2018-07-04 18:11:41, "Fawze Abujaber" <[email protected]> wrote: Hi Community, I need a quick consult for using the SET DDL while i'm using impala VIP for the load balancing. Below are 2 ETL jobs, the first one is using hive and the second is impala and both have DDL operations. I will start with the 2nd as it should stright forward since the data inserted into the table_name2 by impala, so i think i don't need the invalidate statement but still need the DDL statement, is it right? should i add the set DDL after ALTER table statement? do you really need it as the insert isn't based on the ALTER table statement since it's dropping old partitions. For the 1st one where i'm using hive, I know i have to use invalidate metadata, i was thinking to use the set DDL=1 after each invalidate metadata statement, should i use after both statements, or it's enough to be used only in the last one? I read the documntation about this but wasn't sure if get this point right, so this is why i'm asking for consultation here, note a side i still think that i should use refresh statement after the insert while using invalidate after the ALTER statement. Thanks for your help in advance. RetentionDate=$(date --date="183 days ago" +%Y%m%d) impala-shell -i "$LPIMPALA" -q "ALTER TABLE table_name DROP IF EXISTS PARTITION (concat(year,month,day)<'$RetentionDate');" impala-shell -i "$LPIMPALA" -q "INVALIDATE METADATA table_name;" hive -e" set parquet.compression=SNAPPY; set hive.merge.mapredfiles=true; set hive.mapred.mode=nonstrict; set hive.exec.dynamic.partition.mode=nonstrict; INSERT OVERWRITE TABLE table_name PARTITION (year, month, day) .... .... .... .... ;" impala-shell -i "$LPIMPALA" -q "INVALIDATE METADATA table_name;" impala-shell -i "$LPIMPALA" -q "compute INCREMENTAL STATS table_name;" ======================================================= 2nd one. impala-shell -i "$LPIMPALA" -q "ALTER TABLE table_name2 DROP IF EXISTS PARTITION (dt<'$RetentionDate');" impala-shell -i "$LPIMPALA" -q "INVALIDATE METADATA table_name2;" impala-shell -B -i "$LPIMPALA" -q" INSERT OVERWRITE TABLE table_name2 PARTITION (dt) .... .... .... .... " impala-shell -i "$LPIMPALA" -q "INVALIDATE METADATA table_name2;" -- Take Care Fawze Abujaber -- Take Care Fawze Abujaber
