why distribute by partition column while creating flat hive table?

赵天烁 Tue, 23 Aug 2016 03:35:38 -0700

I have a fact table that grows by billions of rows every day. When I build a
cube on that table, the build gets stuck at the "create flat Hive table" step
and never finishes.
I checked the MR job and found that the SQL for this step ends with
"DISTRIBUTE BY  ${partition date column}"
When I run the same SQL manually with the DISTRIBUTE BY clause removed,
everything finishes within 10 minutes.
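
A toy sketch of what might be going on, under two assumptions: that DISTRIBUTE
BY sends every row with the same key value to the same reducer, and that one
daily build contains a single partition date. The hash function, reducer count,
and row count below are made up for illustration and are not taken from Hive or
Kylin.

```python
from collections import Counter


def reducer_for(key: str, num_reducers: int) -> int:
    """Toy stand-in for hash partitioning: equal keys always map to the same reducer."""
    # Deterministic byte-sum hash (hypothetical, not Hive's real hash function).
    return sum(key.encode("utf-8")) % num_reducers


def rows_per_reducer(keys, num_reducers: int) -> Counter:
    """Count how many rows each reducer would receive for the given distribute keys."""
    return Counter(reducer_for(k, num_reducers) for k in keys)


NUM_REDUCERS = 10   # hypothetical reducer count
ROWS = 100_000      # small stand-in for one day's billions of rows

# Case 1: distribute by the partition date; a daily build holds one date value.
by_date = rows_per_reducer(("2016-08-22" for _ in range(ROWS)), NUM_REDUCERS)

# Case 2: distribute by a high-cardinality key (a hypothetical row id).
by_row_id = rows_per_reducer((str(i) for i in range(ROWS)), NUM_REDUCERS)

print("by date  :", len(by_date), "busy reducer(s), max load", max(by_date.values()))
print("by row id:", len(by_row_id), "busy reducer(s), max load", max(by_row_id.values()))
```

If those assumptions hold, distributing by the date column would leave one
reducer doing all the work for a single-day build, which would match the
symptom of a step that never finishes.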
As far as I know, creating the flat table is useful when there is a star
schema, but all I have is the fact table. So why create a table with the same
structure and the same data, where the only difference is the table name?
When no lookup table is defined, would it be possible to just create a view
with the intermediate table name that Kylin needs? That would eliminate this
long-running step, which seems to achieve nothing.


________________________________
赵天烁
Kevin Zhao
[email protected]

珠海市魅族科技有限公司
MEIZU Technology Co., Ltd.
广东省珠海市科技创新海岸魅族科技楼
MEIZU Tech Bldg., Technology & Innovation Coast
Zhuhai, 519085, Guangdong, China
meizu.com
