Ashish Sharma created SPARK-46477: ------------------------------------- Summary: Hive table (bucketdb.dept_part_buk) is bucketed but partition (location=nk) is not bucketed Key: SPARK-46477 URL: https://issues.apache.org/jira/browse/SPARK-46477 Project: Spark Issue Type: Task Components: SQL Affects Versions: 3.3.2 Reporter: Ashish Sharma
Presto fail to read partition of hive table update by spark sql with following error Hive table (bucketdb.dept_part_buk) is bucketed but partition (location=nk) is not bucketed Spark SQL which cause read failure in presto ALTER TABLE bucketdb.dept_part_buk PARTITION (location='nk') set LOCATION 'file:///tmp/location=nk1'; Root Cause ALTER TABLE table_name [PARTITION partition_spec] SET LOCATION "loc"; is dropping bucket columns information in HMS. Repo Script CREATE TABLE bucketdb.dept_part_buk ( deptno INT, dname STRING, location STRING) PARTITIONED BY (location) CLUSTERED BY (deptno) INTO 2 BUCKETS STORED AS textfile location 'file:///tmp'; ALTER TABLE bucketdb.dept_part_buk ADD IF NOT EXISTS PARTITION (location='nk') LOCATION 'file:///tmp/location=nk'; ALTER TABLE bucketdb.dept_part_buk PARTITION (location='nk') set LOCATION 'file:///tmp/location=nk1'; Investigation Before running query HMS state mysql> select * from SDS where SD_ID = 137; +-------+-------+------------------------------------------+------------------------------+------------------------------------------------------+--------------------------------------------+-------------+------------------------------------------------------------+----------+ | SD_ID | CD_ID | INPUT_FORMAT | IS_COMPRESSED | IS_STOREDASSUBDIRECTORIES | LOCATION | NUM_BUCKETS | OUTPUT_FORMAT | SERDE_ID | +-------+-------+------------------------------------------+------------------------------+------------------------------------------------------+--------------------------------------------+-------------+------------------------------------------------------------+----------+ | 137 | 106 | org.apache.hadoop.mapred.TextInputFormat | 0x00 | 0x00 | file:/tmp/location=nk | 2 | org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat | 137 | +-------+-------+------------------------------------------+------------------------------+------------------------------------------------------+--------------------------------------------+-------------+------------------------------------------------------------+----------+ mysql> select * from BUCKETING_COLS where SD_ID = 137; +-------+-----------------+-------------+ | SD_ID | BUCKET_COL_NAME | INTEGER_IDX | +-------+-----------------+-------------+ | 137 | deptno | 0 | +-------+-----------------+-------------+ Spark Sql Query ALTER TABLE bucketdb.dept_part_buk PARTITION (location='nk') set LOCATION 'file:///Users/someuser/sparkdata/location=nk1'; HMS state after Running the Spark SQL query HMS mysql> select * from SDS where SD_ID = 137; +-------+-------+------------------------------------------+------------------------------+------------------------------------------------------+---------------------------------------------+-------------+------------------------------------------------------------+----------+ | SD_ID | CD_ID | INPUT_FORMAT | IS_COMPRESSED | IS_STOREDASSUBDIRECTORIES | LOCATION | NUM_BUCKETS | OUTPUT_FORMAT | SERDE_ID | +-------+-------+------------------------------------------+------------------------------+------------------------------------------------------+---------------------------------------------+-------------+------------------------------------------------------------+----------+ | 137 | 106 | org.apache.hadoop.mapred.TextInputFormat | 0x00 | 0x00 | file:/tmp/location=nk1 | 0 | org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat | 137 | +-------+-------+------------------------------------------+------------------------------+------------------------------------------------------+---------------------------------------------+-------------+------------------------------------------------------------+----------+ mysql> select * from BUCKETING_COLS where SD_ID = 141; Empty set (0.00 sec) -- This message was sent by Atlassian Jira (v8.20.10#820010) --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org