Hi Jamal,

date is standard linux/unix tool, see the manual page:
http://linux.die.net/man/1/date.

The $(...) tells the shell to execute the command and insert it's output
into the string. So in this case it will execute command
date -d -1day +%Y%m%d
which returns yesterday date in the format you need.

Jan


On Tue, Aug 7, 2012 at 8:24 PM, Raihan Jamal <[email protected]> wrote:

> Yes it supports -e option, but in your query what is date?
>
> hive -e "CREATE TEMPORARY FUNCTION yesterdaydate
> AS 'com.example.hive.udf.YesterdayDate';
> SELECT * FROM REALTIME where dt=$(*date* -d -1day +%Y%m%d) LIMIT 10;"
>
>
>
> *Raihan Jamal*
>
>
>
> On Tue, Aug 7, 2012 at 11:18 AM, Jan Dolinár <[email protected]> wrote:
>
>> By the way, even without hiveconf, you can run hive from shell like this
>> to achieve what you want using shell capabilities:
>>
>> hive -e "CREATE TEMPORARY FUNCTION yesterdaydate
>> AS 'com.example.hive.udf.YesterdayDate';
>> SELECT * FROM REALTIME where dt=$(date -d -1day +%Y%m%d) LIMIT 10;"
>>
>> At least if hive 6.0 supports -e option, I don't have where to check that.
>>
>> Jan
>>
>>
>> On Tue, Aug 7, 2012 at 8:11 PM, Vijay <[email protected]> wrote:
>>
>>> Given the implementation of the UDF, I don't think hive would be able
>>> to use partition pruning. Especially the version you're using. I'd
>>> really recommend upgrading to a later version that has the hiveconf
>>> support. That can save a lot of trouble rather than trying to get
>>> things working on 0.6
>>>
>>> On Tue, Aug 7, 2012 at 11:07 AM, Raihan Jamal <[email protected]>
>>> wrote:
>>> > Hi Jan,
>>> >
>>> > I have date in different format also, so that is the reason I was
>>> thinking
>>> > to do by this approach. How can I make sure this will work on the
>>> selected
>>> > partition only and it will not scan the entire table. I will add your
>>> > suggestion in my UDF as deterministic thing.
>>> >
>>> > My simple question here is- How to get the Yesterdays date which I can
>>> use
>>> > on the Date Partition I cannot use hiveconf here as I am working with
>>> Hive
>>> > 0.6
>>> >
>>> >
>>> >
>>> >
>>> > Raihan Jamal
>>> >
>>> >
>>> >
>>> > On Tue, Aug 7, 2012 at 10:37 AM, Jan Dolinár <[email protected]>
>>> wrote:
>>> >>
>>> >> I'm afraid that  he query
>>> >>
>>> >> SELECT * FROM REALTIME where dt= yesterdaydate('yyyyMMdd') LIMIT 10;
>>> >>
>>> >> will scan entire table, because the functions is evaluated at
>>> runtime, so
>>> >> Hive doesn't know what the value is when it decides which files to
>>> scan. I
>>> >> am not 100% sure though, you should try it.
>>> >>
>>> >> Also, you might want to try to add annotation to your UDF saying that
>>> the
>>> >> function is deterministic:
>>> >> @UDFType(deterministic=false)
>>> >>
>>> >> I think Hive might be clever enough to evaluate it early enough to
>>> use the
>>> >> partition pruning correctly, since it operates on constant
>>> expression. But
>>> >> again, I'm not really sure, maybe someone with deeper knowledge of
>>> Hive
>>> >> optimizations will tell us more. It is actually quite interesting
>>> question.
>>> >>
>>> >> Another way to help Hive with the optimizations might be to skip
>>> passing
>>> >> the format string argument, if you have all dates in same format, you
>>> can
>>> >> call the function just like 'yesterdaydate()' and hardcode the format
>>> in the
>>> >> function.
>>> >>
>>> >> Jan
>>> >>
>>> >>
>>> >> On Tue, Aug 7, 2012 at 7:20 PM, Raihan Jamal <[email protected]>
>>> >> wrote:
>>> >>>
>>> >>> Hi Jan,
>>> >>>
>>> >>>
>>> >>>
>>> >>> I figured that out, it is working fine for me now. The only question
>>> I
>>> >>> have is, if I am doing like this-
>>> >>>
>>> >>>
>>> >>>
>>> >>> SELECT * FROM REALTIME where dt= yesterdaydate('yyyyMMdd') LIMIT 10;
>>> >>>
>>> >>>
>>> >>>
>>> >>> Then the above query will be evaluated as below right?
>>> >>>
>>> >>>
>>> >>>
>>> >>> SELECT * FROM REALTIME where dt= ‘20120806’ LIMIT 10;
>>> >>>
>>> >>>
>>> >>>
>>> >>> So that means it will look for data in the corresponding dt partition
>>> >>> (20120806) only right as above table is partitioned on dt column ?
>>> And it
>>> >>> will not scan the whole table right?
>>> >>>
>>> >>>
>>> >>>
>>> >>>
>>> >>> Raihan Jamal
>>> >>>
>>> >>>
>>> >>>
>>> >>> On Mon, Aug 6, 2012 at 10:56 PM, Jan Dolinár <[email protected]>
>>> wrote:
>>> >>>>
>>> >>>> Hi Jamal,
>>> >>>>
>>> >>>> Check if the function really returns what it should and that your
>>> data
>>> >>>> are really in yyyyMMdd format. You can do this by simple query like
>>> this:
>>> >>>>
>>> >>>> SELECT dt, yesterdaydate('yyyyMMdd') FROM REALTIME LIMIT 1;
>>> >>>>
>>> >>>> I don't see anything wrong with the function itself, it works well
>>> for
>>> >>>> me (although I tested it in hive 0.7.1). The only thing I would
>>> change about
>>> >>>> it would be to optimize it by calling 'new' only at the time of
>>> construction
>>> >>>> and reusing the object when the function is called, but that should
>>> not
>>> >>>> affect the functionality at all.
>>> >>>>
>>> >>>> Best regards,
>>> >>>> Jan
>>> >>>>
>>> >>>>
>>> >>>>
>>> >>>>
>>> >>>> On Tue, Aug 7, 2012 at 3:39 AM, Raihan Jamal <[email protected]
>>> >
>>> >>>> wrote:
>>> >>>>>
>>> >>>>> Problem
>>> >>>>>
>>> >>>>> I created the below UserDefinedFunction to get the yesterday's day
>>> in
>>> >>>>> the format I wanted as I will be passing the format into this
>>> below method
>>> >>>>> from the query.
>>> >>>>>
>>> >>>>>
>>> >>>>>
>>> >>>>> public final class YesterdayDate extends UDF {
>>> >>>>>
>>> >>>>>
>>> >>>>>
>>> >>>>>                 public String evaluate(final String format) {
>>> >>>>>
>>> >>>>>                                 DateFormat dateFormat = new
>>> >>>>> SimpleDateFormat(format);
>>> >>>>>
>>> >>>>>                                 Calendar cal =
>>> Calendar.getInstance();
>>> >>>>>
>>> >>>>>                                 cal.add(Calendar.DATE, -1);
>>> >>>>>
>>> >>>>>                                 return
>>> >>>>> dateFormat.format(cal.getTime()).toString();
>>> >>>>>
>>> >>>>>                 }
>>> >>>>>
>>> >>>>> }
>>> >>>>>
>>> >>>>>
>>> >>>>>
>>> >>>>>
>>> >>>>>
>>> >>>>> So whenever I try to run the query like below by adding the jar to
>>> >>>>> classpath and creating the temporary function yesterdaydate, I
>>> always get
>>> >>>>> zero result back-
>>> >>>>>
>>> >>>>>
>>> >>>>>
>>> >>>>> hive> create temporary function yesterdaydate as
>>> >>>>> 'com.example.hive.udf.YesterdayDate';
>>> >>>>>
>>> >>>>> OK
>>> >>>>>
>>> >>>>> Time taken: 0.512 seconds
>>> >>>>>
>>> >>>>>
>>> >>>>>
>>> >>>>> Below is the query I am running-
>>> >>>>>
>>> >>>>>
>>> >>>>>
>>> >>>>> hive> SELECT * FROM REALTIME where dt= yesterdaydate('yyyyMMdd')
>>> LIMIT
>>> >>>>> 10;
>>> >>>>>
>>> >>>>> OK
>>> >>>>>
>>> >>>>>
>>> >>>>>
>>> >>>>> And I always get zero result back but the data is there in that
>>> table
>>> >>>>> for Aug 5th.
>>> >>>>>
>>> >>>>>
>>> >>>>>
>>> >>>>> What wrong I am doing? Any suggestions will be appreciated.
>>> >>>>>
>>> >>>>>
>>> >>>>>
>>> >>>>>
>>> >>>>>
>>> >>>>> NOTE:- As I am working with Hive 0.6 so it doesn’t support variable
>>> >>>>> substitution thing, so I cannot use hiveconf here and the above
>>> table has
>>> >>>>> been partitioned on dt(date) column.
>>> >>>>
>>> >>>>
>>> >>>
>>> >>
>>> >
>>>
>>
>>
>

Reply via email to