In fact, it's just not a reasonable thing to do partition pruning on. Imagine a situation where you had:
WHERE partition_column = f(unix_timestamp()) AND ordinary_column = f(unix_timestamp). The right hand side of the predicate has to be evaluated at map-time, whereas you're assuming that left hand side should be evaluated at compile time, which means you have two different values of unix_timestamp() floating around, which can only end badly. Cheers, Phil. On 19 April 2012 16:35, Philip Tromans <philip.j.trom...@gmail.com> wrote: > I don't know what the state of Hive's partition pruning is, but I > would imagine that the problem is that the two example you're giving > are fundamentally different. > > 1) WHERE local_date = =date_add('2011-12-07',3) , > > the udf is a function of some constants, so the constant gets > evaluated at compile time. > > 2) WHERE local_date = date_sub(to_date(from_unixtime(unix_timestamp())),3), > > unix_timestamp() is not explicitly a constant, and so the expression > won't be simplified. > > I would imagine that the constant simplification code probably doesn't > know the difference between a partition column and a real column, and > so treats everything as a real column. If local_date wasn't a > partition column, then there's no reasonable way of simplifying that > predicate at compile time. > > Cheers, > > Phil. > > On 19 April 2012 11:50, Nitin Pawar <nitinpawar...@gmail.com> wrote: >> as per my understanding, >> >> In this case hive needs to look for all the partitions because it does not >> have the value before hand on the partition check and note the udfs are >> executed on the mapred and not on hive client side. >> >> I would suggest you write a hive query in a file and replace the partition >> value with a variable >> something like >> >> for partitionValue in values >> >> hive $HIVEPARAMS -hiveconf partition=$partition -e hivequery.hql >> >> and then in hivequery.sql you can refer the variable with >> >> where column_name = '${hiveconf:partition}' >> >> I may be wrong in interpreting the execution pattern of hivequery but this >> approach solved my problem >> Thanks, >> nitin >> >> >> On Thu, Apr 19, 2012 at 3:27 PM, Ramkumar <ramkumar.sugava...@yahoo.com> >> wrote: >>> >>> Hi, >>> >>> I have a table partitioned by local_date. When I write a query with >>> >>> WHERE local_date = =date_add('2011-12-07',3) , >>> >>> hive executes the UDF ahead and looks only into the specific partitions. >>> But when the udf becomes more complex like >>> >>> WHERE local_date = date_sub(to_date(from_unixtime(unix_timestamp())),3), >>> >>> hive looks through all the partitions even though the above function can >>> very well be computed ahead of time and optimize the query. Is this >>> behaviour intentional ? And is there a workaround other than hardcoding the >>> date or using a param? >>> >>> Thanks, >>> Ramkumar >> >> >> >> >> -- >> Nitin Pawar >>