Re: Question about partition pruning when there's a type mismatch

2016-11-29 Thread Anthony Hsu
Thanks for the tips, Gopal. I stepped through the code in a debugger and
found that in the case of String = String, the predicate was pushed down to
the SQL query on the metastore side, whereas in the case of String = Int,
the SQL filter pushdown failed, so GenericUDFOPEqual gets evaluated and
returns null, in which case the PartitionPruner treats the value of the
predicate as unknown and returns all partitions.

On Mon, Nov 28, 2016 at 3:04 PM, Gopal Vijayaraghavan 
wrote:

>
> > I'm wondering why Hive tries to scan all partitions when the quotes are
> omitted. Without the quotes, shouldn't 2016-11-28-00 get evaluated as an
> arithmetic expression, then get cast to a string, and then partitioning
> pruning still occur?
>
> The order of evaluation is different - String = Integer becomes
> UDFToDouble(String) = UDFToDouble(Integer) (because that keeps the >=
> behavior consistent with =).
>
> The version you're running is very relevant here.
>
> Not all versions of hive have a constant folding optimization & even with
> that, only recent versions of hive perform partition pruning when the
> partition column is wrapped in a UDF.
>
> Posting the output of an "explain " would also help.
>
> Cheers,
> Gopal
>
>
>


Re: Question about partition pruning when there's a type mismatch

2016-11-28 Thread Gopal Vijayaraghavan

> I'm wondering why Hive tries to scan all partitions when the quotes are 
> omitted. Without the quotes, shouldn't 2016-11-28-00 get evaluated as an 
> arithmetic expression, then get cast to a string, and then partitioning 
> pruning still occur?

The order of evaluation is different - String = Integer becomes 
UDFToDouble(String) = UDFToDouble(Integer) (because that keeps the >= behavior 
consistent with =).

The version you're running is very relevant here.

Not all versions of hive have a constant folding optimization & even with that, 
only recent versions of hive perform partition pruning when the partition 
column is wrapped in a UDF.

Posting the output of an "explain " would also help.

Cheers,
Gopal




Question about partition pruning when there's a type mismatch

2016-11-28 Thread Anthony Hsu
Hi,

Some of our Hive tables are partitioned on a STRING column "datepartition"
whose format is "-mm-dd-hh". Usually, when users want to read only a
specific day of data, they'll specify WHERE datepartition =
'2016-11-28-00', for example. However, sometimes users accidentally forget
the quotes around the datepartition value (e.g.: WHERE datepartition =
2016-11-28-00), and all partitions end up being scanned (though no results
are returned). I'm wondering why Hive tries to scan all partitions when the
quotes are omitted. Without the quotes, shouldn't 2016-11-28-00 get
evaluated as an arithmetic expression, then get cast to a string, and then
partitioning pruning still occur?

Best,
Anthony