Re: date format question

2012-09-17 Thread Ashish Thusoo
You could use the unix_timestamp function ... unix_timestamp(ts, pattern) = unix_timestamp('2012-09-10', 'yyyy-MM-dd') ... something along those lines. Also check out https://cwiki.apache.org/Hive/languagemanual-udf.html#LanguageManualUDF-DateFunctions for more datetime functions in Hive. Ashish On
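For anyone who wants to sanity-check the expected value outside Hive, here is a rough Python analogue of a pattern-based timestamp conversion. It is only a sketch: the helper name simply mirrors the Hive function, Python's `%Y-%m-%d` stands in for the Java-style pattern `yyyy-MM-dd`, and UTC is assumed, whereas Hive interprets the string in its own default time zone.

```python
from datetime import datetime, timezone


def unix_timestamp(ts: str, pattern: str = "%Y-%m-%d") -> int:
    """Parse `ts` with a strptime pattern and return seconds since the epoch.

    Assumption: the string is interpreted as UTC. Hive would use its
    default time zone, so results can differ by the zone offset.
    """
    parsed = datetime.strptime(ts, pattern).replace(tzinfo=timezone.utc)
    return int(parsed.timestamp())


print(unix_timestamp("2012-09-10"))  # seconds since 1970-01-01 00:00:00 UTC
```

A string that does not match the pattern raises ValueError here, so a mismatched pattern is easy to spot.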

Re: Hive client / thrift service query submission auditing

2012-09-12 Thread Ashish Thusoo
Hey Matt, We did something similar at Facebook to capture the information on who ran what on the clusters and dumped that out to an audit db. Specifically we were using Hive post execution hooks to achieve that

Re: SQL help

2012-05-24 Thread Ashish Thusoo
Hi Mohit, Hive does not support window functions afaik. The following link might be useful if you can bring that in... https://github.com/hbutani/SQLWindowing/wiki Not sure if this is being brought into trunk at some point... Ashish On Thu, May 24, 2012 at 1:02 PM, Mohit Anchlia

Re: how to select without Mapreduce after index build?

2012-05-11 Thread Ashish Thusoo
Indexing in Hive works through map/reduce. There are no active components in Hive (such as the region servers in HBase), so the index is basically used by running a map/reduce job on the table that holds the index data to get all the relevant offsets into the main table and then using
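The two-step idea (consult the index data for offsets, then touch only those positions in the main table) can be pictured with a toy in-memory model. This is not how Hive implements it: in Hive the index lookup runs as a map/reduce job and the offsets point into files, and since the message above is cut off, the second step shown here is an assumption. The function names are hypothetical.

```python
from collections import defaultdict


def build_index(rows, column):
    """Map each value of `column` to the list of row offsets where it occurs."""
    index = defaultdict(list)
    for offset, row in enumerate(rows):
        index[row[column]].append(offset)
    return dict(index)


def lookup(rows, index, value):
    """Step 1: get the relevant offsets from the index.
    Step 2 (assumed): read only those offsets from the main table."""
    offsets = index.get(value, [])
    return [rows[offset] for offset in offsets]


main_table = [
    {"user": "a", "n": 1},
    {"user": "b", "n": 2},
    {"user": "a", "n": 3},
]
user_index = build_index(main_table, "user")
print(lookup(main_table, user_index, "a"))
```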

Re: Dimensional Data Model on Hive

2012-05-10 Thread Ashish Thusoo
Also, if most of the things that you will be doing are full scans as opposed to needle-in-a-haystack queries, there is usually no point in paying the overhead of running HBase region servers. Only if your data is heavily accessed by a key is the overhead of HBase justified. Another case could be when

Re: Hive on Standalone Machine

2012-04-25 Thread Ashish Thusoo
Hive needs the Hadoop jars to talk to Hadoop, so the machine that it is installed on has to have those jars installed. However, it does not need to be a part of the Hadoop cluster in the sense that it does not need to have a TaskTracker or DataNode running. The machine can operate purely as a client

Re: Dynamic partitions in Hive

2012-04-25 Thread Ashish Thusoo
According to https://cwiki.apache.org/Hive/dynamicpartitions.html, for dynamic partitions the partition clause must look like PARTITION(year, month, edate); the actual expressions should be included in the select list. So in your example the select list should look something like SELECT
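To see why the expressions belong in the select list, it may help to model what a dynamic-partition insert does with each row: the trailing select columns, in the same order as the PARTITION clause, decide which partition directory the row lands in. The Python below is only a toy model of that routing with made-up sample rows and hypothetical function names, not Hive code.

```python
def partition_path(row, partition_cols):
    """Build a Hive-style partition directory from the trailing columns of a row."""
    values = row[-len(partition_cols):]
    return "/".join(f"{col}={val}" for col, val in zip(partition_cols, values))


def split_by_partition(rows, partition_cols):
    """Group rows by partition path, keeping only the non-partition columns."""
    n = len(partition_cols)
    partitions = {}
    for row in rows:
        path = partition_path(row, partition_cols)
        partitions.setdefault(path, []).append(row[:-n])
    return partitions


# Each tuple plays the role of one row of the select list:
# (data column, year, month, edate)
selected = [
    ("click", 2012, 4, "2012-04-25"),
    ("view", 2012, 4, "2012-04-25"),
    ("click", 2012, 5, "2012-05-01"),
]
for path, data in split_by_partition(selected, ["year", "month", "edate"]).items():
    print(path, data)
```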

Re: Can Hive 0.7 Rebuild partitions ?

2011-05-19 Thread Ashish Thusoo
afaik there is nothing like that currently. File a feature request for this on JIRA? Ashish On May 19, 2011, at 2:25 AM, Jasper Knulst wrote: Hi, I have a partitioned external table on Hive 0.7. New subfolders are regularly added to the base table HDFS folder. I now have to perform this

Re: Implementing conditional and control statements in Hive

2011-05-11 Thread Ashish Thusoo
With streaming, UDFs or UDTFs you would get almost any kind of control flow you want without having those features implemented in Hive proper. For a UDF, UDAF or UDTF you use Java for the implementation. With streaming you can use any language of your choice. Not sure if this addresses your question? Ashish
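As an illustration of the streaming route, here is a minimal sketch of the kind of script one might plug into a Hive TRANSFORM clause. Hive streaming hands rows to the script as tab-separated lines on stdin and reads tab-separated lines back from stdout; the two-column layout (user, latency in milliseconds) and the bucket thresholds are invented for the example.

```python
def classify(line: str) -> str:
    """Turn one tab-separated input row into one tab-separated output row.

    Assumed (hypothetical) input columns: user, latency_ms.
    """
    user, latency_ms = line.rstrip("\n").split("\t")
    latency = int(latency_ms)
    # Ordinary control flow that would be awkward to express in the query itself.
    if latency < 100:
        bucket = "fast"
    elif latency < 1000:
        bucket = "ok"
    else:
        bucket = "slow"
    return f"{user}\t{bucket}"


def transform(lines):
    """Yield one output row per non-blank input line."""
    for line in lines:
        if line.strip():
            yield classify(line)


for out_row in transform(["alice\t50\n", "bob\t2500\n"]):
    print(out_row)
```

In an actual streaming script the sample list would be replaced by `sys.stdin`, so that the rows come from Hive rather than from the hard-coded example.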

Re: Strategy for Loading Apache Logs

2011-05-11 Thread Ashish Thusoo
You could always have another sub partition under the daily partition. This sub partition could be the timestamp of when you did the load. So when you run the statement you would create a new sub partition within the date partition and in effect you end up doing an append to the Hive partition.
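A toy model of that layout, assuming one sub partition per load keyed by a load timestamp: every load adds a new sub partition and never rewrites an existing one, and reading a day means reading all of its sub partitions. The class and method names are hypothetical and nothing here talks to Hive.

```python
class DailyPartitionedTable:
    """In-memory stand-in for a table partitioned by (day, load timestamp)."""

    def __init__(self):
        self._parts = {}  # day -> {load_ts -> list of rows}

    def load(self, day, load_ts, rows):
        """Create a new sub partition for this load; never overwrite an old one."""
        day_parts = self._parts.setdefault(day, {})
        if load_ts in day_parts:
            raise ValueError(f"sub partition {day}/{load_ts} already exists")
        day_parts[load_ts] = list(rows)

    def read_day(self, day):
        """Read every sub partition of a day, oldest load first."""
        day_parts = self._parts.get(day, {})
        return [row for ts in sorted(day_parts) for row in day_parts[ts]]


logs = DailyPartitionedTable()
logs.load("2011-05-11", "0900", ["GET /a", "GET /b"])
logs.load("2011-05-11", "1500", ["GET /c"])
print(logs.read_day("2011-05-11"))  # rows from both loads, i.e. an append
```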

Re: Cross join in Hive.

2011-05-02 Thread Ashish Thusoo
You could probably just say (1 = 1) in the on clause for the join. set hive.mapred.mode=nonstrict; select ... from T1 join T2 on (1 = 1); Ashish On May 1, 2011, at 10:27 PM, Raghunath, Ranjith wrote: Forgot to mention the condition for the inner join should be the column set to 1 in the

Re: insert - Hadoop vs. Hive

2011-03-30 Thread Ashish Thusoo
If the data is already in the right format you should use the LOAD syntax in Hive. This basically moves files into HDFS (so it should be no less performant than HDFS). If the data is not in the correct format or it needs to be transformed then the insert statement needs to be used. Ashish On Mar

Re: Efficient mechanism to simulate the row level updates in Hive

2011-02-16 Thread Ashish Thusoo
This is quite difficult to do in Hive on Hadoop. Hive over Hadoop really does not support row level updates so basically you are reduced to periodically merging the raw stream of updates with the main table and generating a new snapshot of the table. Another possible approach could be to use
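The periodic merge can be sketched in a few lines of Python. This is only an in-memory illustration under stated assumptions: rows are keyed by a primary key, each update carries a timestamp, the latest update per key wins, and a row value of None marks a delete (that delete convention is invented for the example). In Hive the same idea would be a batch job that writes a fresh snapshot table.

```python
def merge_snapshot(snapshot, updates):
    """Return a new snapshot with the latest update per key applied.

    snapshot: dict mapping key -> row
    updates:  iterable of (key, timestamp, row); row None means delete (assumed)
    """
    latest = {}
    for key, ts, row in updates:
        # Keep only the most recent update for each key.
        if key not in latest or ts >= latest[key][0]:
            latest[key] = (ts, row)

    merged = dict(snapshot)  # the old snapshot is left untouched
    for key, (_, row) in latest.items():
        if row is None:
            merged.pop(key, None)
        else:
            merged[key] = row
    return merged


old = {1: "a", 2: "b"}
raw_updates = [(1, 10, "a2"), (1, 5, "a1"), (3, 7, "c"), (2, 8, None)]
print(merge_snapshot(old, raw_updates))
```

Building a new snapshot rather than modifying the old one in place matches the "generate a new snapshot" wording: the previous snapshot stays readable until the new one is complete.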

Re: [VOTE] Bylaws for Apache Hive Project

2010-10-22 Thread Ashish Thusoo
I knew I was going to miss a pig somewhere... :) Ashish Sent from my iPhone On Oct 22, 2010, at 2:55 PM, John Sichi jsi...@facebook.com wrote: Hive users etc are encouraged to vote too :) JVS (gotta love cut-and-paste) On Oct 22, 2010, at 2:51 PM, Ashish Thusoo wrote: Hi Folks, I