Hi Fawze,

As you need a quick consult, let me answer your questions first. The official 
Cloudera folks will correct me if something wrong.


For your 2nd job, you don't need to run INVALIDATE METADATA statements since 
DDL and DML in impala will refresh the metadata automatically.


For your 1st job, again, you don't need to run the INVALIDATE METADATA 
statement after your ALTER TABLE statement in Impala. After the INSERT 
OVERWRITE statement in hive, you just need to run a REFRESH statement instead 
of the INVALIDATE METADATA statement.


BTW, what do you mean by "set DDL=1"? If you're mentioning the SYNC_DDL option, 
it's recommended to set it to true if you're using load balancer above of the 
impala daemons.


If you're interested in the difference between INVALIDATE METADATA and REFRESH, 
there's a discussion thread: 
https://lists.apache.org/thread.html/44b4f223b49e8d7cb07865d5dfba50378d6e41fc173369b56c36b652@%3Cdev.impala.apache.org%3E


In a nutshell, there're only two cases you should use the INVALIDATE METADATA 
statement:
- you create a table in hive and want to use it in Impala
- the block locations of the underlying hive table files changed by HDFS 
balancer
You can ignore the second case since it's just for performance, not for 
correctness. In other cases like adding/dropping files/partitions in hive, you 
just need the REFRESH statement.


Regards,
Quanlong



At 2018-07-04 18:11:41, "Fawze Abujaber" <fawz...@gmail.com> wrote:

Hi Community,


I need a quick consult for using the SET DDL while i'm using impala VIP for the 
load balancing.


Below are 2 ETL jobs, the first one is using hive and the second is impala and 
both have DDL operations.


I will start with the 2nd as it should stright forward since the data inserted 
into the table_name2 by impala, so i think i don't need the invalidate 
statement but still need the DDL statement, is it right?
should i add the set DDL after ALTER table statement? do you really need it as 
the insert isn't based on the ALTER table statement since it's dropping old 
partitions.


For the 1st one where i'm using hive, I know i have to use invalidate metadata, 
i was thinking to use the set DDL=1 after each invalidate metadata statement, 
should i use after both statements, or it's enough to be used only in the last 
one?
I read the documntation about this but wasn't sure if get this point right, so 
this is why i'm asking for consultation here, note a side i still think that i 
should use  refresh statement after the insert while using invalidate after the 
ALTER statement.


Thanks for your help in advance.






RetentionDate=$(date --date="183 days ago" +%Y%m%d)
impala-shell -i "$LPIMPALA" -q "ALTER TABLE table_name DROP IF EXISTS PARTITION 
(concat(year,month,day)<'$RetentionDate');"
impala-shell -i "$LPIMPALA" -q "INVALIDATE METADATA table_name;"


hive -e"
set parquet.compression=SNAPPY;
set hive.merge.mapredfiles=true;
set hive.mapred.mode=nonstrict;
set hive.exec.dynamic.partition.mode=nonstrict;




INSERT OVERWRITE TABLE table_name PARTITION (year, month, day)


....
....
....
....


;"
impala-shell -i "$LPIMPALA" -q "INVALIDATE METADATA table_name;"
impala-shell -i "$LPIMPALA" -q "compute INCREMENTAL STATS table_name;"








=======================================================


2nd one.


impala-shell -i "$LPIMPALA" -q "ALTER TABLE table_name2 DROP IF EXISTS 
PARTITION (dt<'$RetentionDate');"
impala-shell -i "$LPIMPALA" -q "INVALIDATE METADATA table_name2;"


impala-shell -B -i "$LPIMPALA" -q"




INSERT OVERWRITE TABLE table_name2 PARTITION (dt)
....
....
....
....
"


impala-shell -i "$LPIMPALA" -q "INVALIDATE METADATA table_name2;"


--

Take Care
Fawze Abujaber

Reply via email to