Select distinct on partitioned column requires reading all the files?

2015-02-23 Thread Stephen Boesch
When querying a Hive table by a partitioning column, it would be logical for a simple select count(distinct partitioned_column_name) from my_partitioned_table to complete almost instantaneously. But we are seeing that both Hive and Impala are unable to execute this query properly:
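As an aside that is not part of the original message: the expectation is reasonable because, in Hive's default layout, each partition is stored in a directory named column=value, so the distinct values of a partition column can in principle be read from directory names (or from the metastore) without opening any data file. The following is only a minimal sketch of that idea. The function name, the /warehouse paths, and the sample values are hypothetical and chosen for illustration; this is not how Hive or Impala plan the query.

```python
def distinct_partition_values(partition_paths, column):
    """Collect the distinct values of `column` from Hive-style partition paths.

    Only the directory names are inspected; no data file is opened.
    """
    values = set()
    for path in partition_paths:
        for segment in path.strip("/").split("/"):
            # A partition directory has the form "<column>=<value>".
            name, sep, value = segment.partition("=")
            if sep and name == column:
                values.add(value)
    return values


# Hypothetical partition directories for my_partitioned_table (assumed layout).
paths = [
    "/warehouse/my_partitioned_table/partitioned_column_name=2015-02-21",
    "/warehouse/my_partitioned_table/partitioned_column_name=2015-02-22",
    "/warehouse/my_partitioned_table/partitioned_column_name=2015-02-22/part=1",
]

# Equivalent in spirit to count(distinct partitioned_column_name).
print(len(distinct_partition_values(paths, "partitioned_column_name")))  # prints 2
```

One caveat with this shortcut: a partition directory may exist while holding no rows, so a count derived purely from metadata can differ from a count computed over the data.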

Re: Select distinct on partitioned column requires reading all the files?

2015-02-23 Thread Stephen Boesch
Subject: Select distinct on partitioned column requires reading all the files?

When querying a hive table according to a partitioning column, it would be logical that a simple select count(distinct partitioned_column_name) from my_partitioned_table would complete almost instantaneously. But we

Re: Select distinct on partitioned column requires reading all the files?

2015-02-23 Thread Gopal Vijayaraghavan
Reply-To: user@hive.apache.org user@hive.apache.org
Date: Monday, February 23, 2015 at 10:26 PM
To: user@hive.apache.org user@hive.apache.org
Subject: Select distinct on partitioned column requires reading all the files?

When querying a hive table according to a partitioning column, it would