liyunzhang_intel created HIVE-17287:
---------------------------------------

             Summary: HoS can not deal with skewed data group by
                 Key: HIVE-17287
                 URL: https://issues.apache.org/jira/browse/HIVE-17287
             Project: Hive
          Issue Type: Bug
            Reporter: liyunzhang_intel


In [tpcds/query67.sql|https://github.com/kellyzly/hive-testbench/blob/hive14/sample-queries-tpcds/query67.sql], the fact table {{store_sales}} joins with the small tables {{date_dim}}, {{item}}, and {{store}}. After the join, the intermediate data is grouped. The {{store_sales}} data in the 3TB TPC-DS dataset is skewed: there are 1824 partitions, the largest is 25.7G, and the others are about 715M each.
{code}
hadoop fs -du -h /user/hive/warehouse/tpcds_bin_partitioned_parquet_3000.db/store_sales
....
715.0 M  /user/hive/warehouse/tpcds_bin_partitioned_parquet_3000.db/store_sales/ss_sold_date_sk=2452639
713.9 M  /user/hive/warehouse/tpcds_bin_partitioned_parquet_3000.db/store_sales/ss_sold_date_sk=2452640
714.1 M  /user/hive/warehouse/tpcds_bin_partitioned_parquet_3000.db/store_sales/ss_sold_date_sk=2452641
712.9 M  /user/hive/warehouse/tpcds_bin_partitioned_parquet_3000.db/store_sales/ss_sold_date_sk=2452642
25.7 G  /user/hive/warehouse/tpcds_bin_partitioned_parquet_3000.db/store_sales/ss_sold_date_sk=__HIVE_DEFAULT_PARTITION__
{code}
The skewed table {{store_sales}} causes the job to fail. Is there any way to solve the group-by problem for a skewed table? I tried enabling {{hive.groupby.skewindata}} to distribute the data more evenly before the group by, but the job still hangs.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
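
To put a number on the skew described above, here is a minimal Python sketch that parses the five {{hadoop fs -du -h}} lines quoted in this report and compares each partition with the median size. It is not part of the original report. The helper names ({{parse_du_line}}, {{partition_sizes}}, {{skew_ratios}}) are hypothetical, and the parser assumes the "size unit path" layout shown in the listing, which may differ in other Hadoop versions. The median is taken over these five sample lines only, not over all 1824 partitions.

```python
import statistics

# Sample lines copied from the `hadoop fs -du -h` listing in this report.
DU_OUTPUT = """\
715.0 M  /user/hive/warehouse/tpcds_bin_partitioned_parquet_3000.db/store_sales/ss_sold_date_sk=2452639
713.9 M  /user/hive/warehouse/tpcds_bin_partitioned_parquet_3000.db/store_sales/ss_sold_date_sk=2452640
714.1 M  /user/hive/warehouse/tpcds_bin_partitioned_parquet_3000.db/store_sales/ss_sold_date_sk=2452641
712.9 M  /user/hive/warehouse/tpcds_bin_partitioned_parquet_3000.db/store_sales/ss_sold_date_sk=2452642
25.7 G  /user/hive/warehouse/tpcds_bin_partitioned_parquet_3000.db/store_sales/ss_sold_date_sk=__HIVE_DEFAULT_PARTITION__
"""

# Binary multipliers for the unit suffixes (assumed to be K, M, G, or T).
UNIT_BYTES = {"K": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4}


def parse_du_line(line):
    """Split one 'size unit path' line into (path, size_in_bytes)."""
    size, unit, path = line.split(maxsplit=2)
    return path.strip(), float(size) * UNIT_BYTES[unit]


def partition_sizes(du_output):
    """Map each partition path to its size in bytes, skipping blank lines."""
    return dict(parse_du_line(line) for line in du_output.splitlines() if line.strip())


def skew_ratios(sizes):
    """Return each partition's size divided by the median partition size."""
    median = statistics.median(sizes.values())
    return {path: size / median for path, size in sizes.items()}


if __name__ == "__main__":
    ratios = skew_ratios(partition_sizes(DU_OUTPUT))
    # Print the largest partitions first.
    for path, ratio in sorted(ratios.items(), key=lambda kv: kv[1], reverse=True):
        print(f"{ratio:6.1f}x  {path.rsplit('/', 1)[-1]}")
```

On these sample lines, the {{__HIVE_DEFAULT_PARTITION__}} directory comes out at roughly 37 times the median partition size, while the dated partitions all sit close to 1x.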