Steven Cardella created SPARK-25774:
---------------------------------------

             Summary: Eliminate query anomalies with empty partitions - TRUNCATE, SELECT DISTINCT, etc.
                 Key: SPARK-25774
                 URL: https://issues.apache.org/jira/browse/SPARK-25774
             Project: Spark
          Issue Type: Improvement
          Components: SQL
    Affects Versions: 2.2.0
         Environment: Right now, I'm using Cloudera with Spark 2.2.0, but I understand the behavior is widespread.
            Reporter: Steven Cardella


If you run a Spark SQL TRUNCATE TABLE command on a managed table in Hive, it deletes the files in HDFS but leaves the partitions and the partition folder structure in place. If you then run SELECT DISTINCT on the partition columns, it returns the values of all the empty partitions. So a SELECT DISTINCT can return rows while SELECT * on the same table returns 0 rows.

In SQL Server and similar databases, SELECT DISTINCT always reflects the rows, and Impala works that way as well.

I'd like SELECT DISTINCT to reflect rows, not partitions; TRUNCATE TABLE to have the option to drop partitions; and MSCK REPAIR TABLE to have the option to drop empty partitions.
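
As a possible workaround for the empty partitions described above, the sketch below walks a Hive-style partition layout (directories named col=value), finds leaf partitions that contain no data files, and prints an ALTER TABLE ... DROP IF EXISTS PARTITION statement for each. It is only an illustration under several assumptions: the table location is reachable as a local or mounted path (for HDFS you would need an HDFS client instead of os.walk), files whose names start with "_" or "." (for example _SUCCESS) are treated as non-data, partition values are not URL-escaped, and the table name my_db.my_table and the helper names find_empty_partitions and drop_statements are hypothetical.

```python
import os
import tempfile


def find_empty_partitions(table_root):
    """Return partition specs, each a list of (column, value) pairs, for
    leaf partition directories under table_root that hold no data files."""
    empty = []
    for dirpath, dirnames, filenames in os.walk(table_root):
        rel = os.path.relpath(dirpath, table_root)
        if rel == ".":
            continue  # the table root itself is not a partition
        parts = rel.split(os.sep)
        # Skip anything that is not a pure col=value path (e.g. _temporary).
        if not all("=" in p for p in parts):
            continue
        # Only leaf partitions: no nested col=value subdirectories.
        if any("=" in d for d in dirnames):
            continue
        # Assumption: names starting with "_" or "." are markers, not data.
        data_files = [f for f in filenames if not f.startswith(("_", "."))]
        if not data_files:
            empty.append([tuple(p.split("=", 1)) for p in parts])
    return sorted(empty)


def drop_statements(table_name, specs):
    """Build one DROP PARTITION statement per empty partition spec."""
    stmts = []
    for spec in specs:
        cols = ", ".join("{}='{}'".format(c, v) for c, v in spec)
        stmts.append(
            "ALTER TABLE {} DROP IF EXISTS PARTITION ({})".format(table_name, cols)
        )
    return stmts


if __name__ == "__main__":
    # Build a throwaway layout: one empty partition, one with a data file.
    with tempfile.TemporaryDirectory() as root:
        os.makedirs(os.path.join(root, "year=2018", "month=09"))
        os.makedirs(os.path.join(root, "year=2018", "month=10"))
        with open(os.path.join(root, "year=2018", "month=10", "part-00000"), "w") as f:
            f.write("row\n")
        for stmt in drop_statements("my_db.my_table", find_empty_partitions(root)):
            print(stmt)
```

The generated statements would still need to be reviewed and then run through Spark SQL or Hive. Values are quoted as strings here, which may need adjusting for non-string partition columns.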