You could use the unix_timestamp function ...
unix_timestamp(ts, pattern) = unix_timestamp('2012-09-10', 'yyyy-MM-dd')
...
something along those lines.
Also check out
https://cwiki.apache.org/Hive/languagemanual-udf.html#LanguageManualUDF-DateFunctions
for more datetime functions in Hive.
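For example, converting a string date into epoch seconds might look like this (the table and column names here are just an illustration, not from the original thread):

SELECT unix_timestamp(ts, 'yyyy-MM-dd') AS epoch_seconds
FROM some_table;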
Ashish
On
Hey Matt,
We did something similar at Facebook to capture information on who ran
what on the clusters and dumped that out to an audit db. Specifically, we
were using Hive post-execution hooks to achieve that
Hi Mohit,
Hive does not support window functions, afaik.
The following link might be useful if you can bring that in...
https://github.com/hbutani/SQLWindowing/wiki
Not sure if this is being brought into trunk at some point...
Ashish
On Thu, May 24, 2012 at 1:02 PM, Mohit Anchlia
Indexing in Hive works through map/reduce. There are no active components
in Hive (such as the region servers in HBase), so the index is basically
used by running a map/reduce job on the table that holds the index data to
get all the relevant offsets into the main table and then using
Also, if most of what you will be doing is full scans, as opposed to
needle-in-a-haystack queries, there is usually no point in paying the
overhead of running HBase region servers. Only if your data is heavily
accessed by a key is the overhead of HBase justified. Another case could be
when
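On the indexing side, the index itself is created with plain DDL; a minimal sketch, assuming a compact index and made-up table/column names:

CREATE INDEX idx_key ON TABLE main_table (key_col)
AS 'org.apache.hadoop.hive.ql.index.compact.CompactIndexHandler'
WITH DEFERRED REBUILD;
-- the REBUILD is the map/reduce job that populates the index table
ALTER INDEX idx_key ON main_table REBUILD;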
Hive needs the hadoop jars to talk to hadoop. The machine that it is
installed on has to have those jars installed. However, it does not need to
be a part of the hadoop cluster in the sense that it does not need to
have a TaskTracker or DataNode running. The machine can operate purely as a
client
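In other words, a client-only box is just a matter of configuration; a hedged sketch (the path and the example query are made up):

# Hadoop jars available locally, config pointing at the remote cluster
export HADOOP_HOME=/opt/hadoop
# $HADOOP_HOME/conf/core-site.xml names the cluster's NameNode
hive -e 'SHOW TABLES;'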
According to
https://cwiki.apache.org/Hive/dynamicpartitions.html
for dynamic partitions the partition clause must look like
PARTITION(year, month, edate)
the actual expressions should be included in the select list. So in your
example the select list should look something like
SELECT
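Something along these lines, with the dynamic partition columns last in the select list (the table and column names are guesses at the shape, not the poster's actual schema):

set hive.exec.dynamic.partition.mode=nonstrict;
INSERT OVERWRITE TABLE target PARTITION (year, month, edate)
SELECT col1, col2,
       year(ts), month(ts), to_date(ts)
FROM source;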
afaik there is nothing like that currently. File a feature request for this on JIRA?
Ashish
On May 19, 2011, at 2:25 AM, Jasper Knulst wrote:
Hi,
I have a partitioned external table on Hive 0.7. New subfolders are regularly
added to the base table HDFS folder.
I now have to perform this
With streaming, UDFs, or UDTFs you can get almost any kind of control flow you
want without having those features implemented in Hive proper. For a UDF, UDAF, or
UDTF you use Java for the implementation. With streaming you can use any language of
your choice. Not sure if this addresses your question?
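As an illustration, a streaming step looks roughly like this (the script name is hypothetical):

ADD FILE my_script.py;
SELECT TRANSFORM (col1, col2)
       USING 'python my_script.py'
       AS (out1, out2)
FROM some_table;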
Ashish
You could always have another sub partition under the daily partition. This sub
partition could be the timestamp of when you did the load. So when you run the
statement you would create a new sub partition within the date partition and in
effect end up doing an append to the Hive partition.
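A hedged sketch of that layout, with invented names and a load timestamp as the sub partition value:

-- table partitioned by (dt STRING, load_ts STRING)
INSERT OVERWRITE TABLE daily_table
PARTITION (dt='2011-06-01', load_ts='20110601120000')
SELECT col1, col2 FROM staging;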
You could probably just say (1 = 1) in the ON clause for the join.
set hive.mapred.mode=nonstrict;
select ... from T1 join T2 on (1 = 1);
Ashish
On May 1, 2011, at 10:27 PM, Raghunath, Ranjith wrote:
Forgot to mention: the condition for the inner join should be the column set
to 1 in the
If the data is already in the right format you should use the LOAD syntax in Hive.
This basically moves files into HDFS (so it should be no less performant than a
plain HDFS copy). If the data is not in the correct format, or it needs to be
transformed, then the INSERT statement needs to be used.
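A sketch of both paths (the paths and table names are invented):

-- already in the right format: just move the files into the warehouse
LOAD DATA INPATH '/staging/events' INTO TABLE events;
-- needs transformation: go through a query instead
INSERT OVERWRITE TABLE events
SELECT split(line, '\t')[0], split(line, '\t')[1] FROM raw_events;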
Ashish
On Mar
This is quite difficult to do in Hive on Hadoop. Hive over Hadoop really does
not support row-level updates, so basically you are reduced to periodically
merging the raw stream of updates with the main table and generating a new
snapshot of the table. Another possible approach could be to use
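The periodic merge can be written as a full rewrite of the snapshot; one hedged sketch, assuming a single key column and invented table names, with updates taking precedence:

INSERT OVERWRITE TABLE snapshot_new
SELECT COALESCE(u.id, s.id), COALESCE(u.val, s.val)
FROM snapshot s
FULL OUTER JOIN updates u ON (s.id = u.id);
-- then swap snapshot_new in for snapshot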
I knew I was going to miss a pig somewhere... :)
Ashish
Sent from my iPhone
On Oct 22, 2010, at 2:55 PM, John Sichi jsi...@facebook.com wrote:
Hive users etc are encouraged to vote too :)
JVS (gotta love cut-and-paste)
On Oct 22, 2010, at 2:51 PM, Ashish Thusoo wrote:
Hi Folks,
I