Re: Question about partition pruning when there's a type mismatch
Thanks for the tips, Gopal. I stepped through the code in a debugger and found that in the case of String = String, the predicate was pushed down to the SQL query on the metastore side, whereas in the case of String = Int, the SQL filter pushdown failed, so GenericUDFOPEqual gets evaluated and returns null, in which case the PartitionPruner treats the value of the predicate as unknown and returns all partitions. On Mon, Nov 28, 2016 at 3:04 PM, Gopal Vijayaraghavanwrote: > > > I'm wondering why Hive tries to scan all partitions when the quotes are > omitted. Without the quotes, shouldn't 2016-11-28-00 get evaluated as an > arithmetic expression, then get cast to a string, and then partitioning > pruning still occur? > > The order of evaluation is different - String = Integer becomes > UDFToDouble(String) = UDFToDouble(Integer) (because that keeps the >= > behavior consistent with =). > > The version you're running is very relevant here. > > Not all versions of hive have a constant folding optimization & even with > that, only recent versions of hive perform partition pruning when the > partition column is wrapped in a UDF. > > Posting the output of an "explain " would also help. > > Cheers, > Gopal > > >
Re: Question about partition pruning when there's a type mismatch
> I'm wondering why Hive tries to scan all partitions when the quotes are > omitted. Without the quotes, shouldn't 2016-11-28-00 get evaluated as an > arithmetic expression, then get cast to a string, and then partitioning > pruning still occur? The order of evaluation is different - String = Integer becomes UDFToDouble(String) = UDFToDouble(Integer) (because that keeps the >= behavior consistent with =). The version you're running is very relevant here. Not all versions of hive have a constant folding optimization & even with that, only recent versions of hive perform partition pruning when the partition column is wrapped in a UDF. Posting the output of an "explain " would also help. Cheers, Gopal
Question about partition pruning when there's a type mismatch
Hi, Some of our Hive tables are partitioned on a STRING column "datepartition" whose format is "-mm-dd-hh". Usually, when users want to read only a specific day of data, they'll specify WHERE datepartition = '2016-11-28-00', for example. However, sometimes users accidentally forget the quotes around the datepartition value (e.g.: WHERE datepartition = 2016-11-28-00), and all partitions end up being scanned (though no results are returned). I'm wondering why Hive tries to scan all partitions when the quotes are omitted. Without the quotes, shouldn't 2016-11-28-00 get evaluated as an arithmetic expression, then get cast to a string, and then partitioning pruning still occur? Best, Anthony