as creating a Hive
table, inserting a single row of data into that table, and fetching results
from the table? I would be very grateful.
Thanks!
Best Regards,
Nishant Kelkar
Note: I've looked into the HiveRunner and hive_test projects as mentioned
in my SO post. However, neither of these supports a CDH Hadoop version,
which is what I need. Specifically, my CDH version is 2.0.0-cdh4.7.0.
Best Regards,
Nishant Kelkar
On Wed, Nov 5, 2014 at 12:29 AM, Nishant
the second SET command to allow for dynamic partitioning
over all partition fields. stack is an in-built UDTF that takes as its
first argument the number of rows to insert, N (in this case N=1), followed
by N*K arguments, where K is the number of columns you have.
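A minimal sketch of the idea (table, column, and helper-table names are hypothetical; `test_table` is assumed to be partitioned on a and b):

```sql
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;

-- stack(N, v1, ..., v_N*K) emits N rows of K columns each (here N=1, K=3).
-- Dynamic partition columns (a, b) must come last in the SELECT list.
FROM dummy  -- any one-row helper table; older Hive lacks FROM-less SELECT
INSERT INTO TABLE test_table PARTITION (a, b)
SELECT STACK(1, 'c_value', 'a_value', 'b_value') AS (c, a, b);
```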
Best Regards,
Nishant Kelkar
On Wed
*__HIVE_DEFAULT_PARTITION__*
What is this __HIVE_DEFAULT_PARTITION__? How can I change it to be NULL?
P.S. I'm new to partitioning in Hive.
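For reference, __HIVE_DEFAULT_PARTITION__ is the directory name Hive substitutes when a dynamic partition value is NULL or empty; since it must be a legal directory name it cannot literally be NULL, but the placeholder itself is configurable:

```sql
-- Change the placeholder used for NULL/empty dynamic partition values
-- (the value shown is a hypothetical example):
SET hive.exec.default.partition.name=null_partition;
```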
Best Regards,
Nishant Kelkar
' in constant
Best Regards,
Nishant Kelkar
On Sun, Nov 2, 2014 at 10:27 PM, Nishant Kelkar nishant@gmail.com
wrote:
Hi All,
So I have a table, say *test_table*. It has 3 columns, a, b, and c.
For simplicity, let's say all are of type STRING. Say the table, when
created, was partitioned on fields
Just found a related ticket: https://issues.apache.org/jira/browse/HIVE-1309
Best Regards,
Nishant Kelkar
On Sun, Nov 2, 2014 at 10:32 PM, Nishant Kelkar nishant@gmail.com
wrote:
FYI,
I tried the following query too:
*INSERT INTO TABLE* test_table
*PARTITION* (a=foo, b=*CAST*(NULL
it in the
format yyyy-MM-dd.
Hope this helps.
Best Regards,
Nishant Kelkar
On Wed, Sep 10, 2014 at 10:04 AM, Raj Hadoop hadoop...@yahoo.com wrote:
Hi,
I have a requirement in Hive to remove duplicate records ( they differ
only by one column i.e a date column) and keep the latest date record.
Sample
Hi Raj,
You'll have to change the format of your date to something like yyyy-MM-dd.
For example, for 2-oct-2013 it will be 2013-10-02.
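If the dates are stored as strings, a one-off conversion could be sketched with Hive's built-in date functions (pattern letters follow Java's SimpleDateFormat; note these functions are time-zone dependent, as mentioned elsewhere in this thread):

```sql
-- '2-oct-2013' -> '2013-10-02' (input pattern 'd-MMM-yyyy')
SELECT from_unixtime(unix_timestamp('2-oct-2013', 'd-MMM-yyyy'), 'yyyy-MM-dd');
```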
Best Regards,
Nishant Kelkar
On Wed, Sep 10, 2014 at 11:48 AM, Raj Hadoop hadoop...@yahoo.com wrote:
The
SORT_ARRAY(COLLECT_SET(date))[0] AS latest_date
and dirty :)
Best Regards,
Nishant Kelkar
On Wed, Sep 10, 2014 at 12:48 PM, Raj Hadoop hadoop...@yahoo.com wrote:
sort_array returns elements in ascending order, so the first element cannot be
the largest date; the last element is the largest date.
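Once the dates are in yyyy-MM-dd form (so string order matches date order), the deduplication itself can be sketched as a plain GROUP BY (table and column names hypothetical):

```sql
-- Keep the latest-date record per duplicate group, assuming all columns
-- except the date column are identical within a group:
SELECT id_col, other_col, MAX(date_col) AS latest_date
FROM my_table
GROUP BY id_col, other_col;
```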
On Wednesday, September 10, 2014 3:38 PM, Nishant Kelkar
SPLIT(a[0].product_details)[0] AS first_name FROM rollup a;
Hope that helps!
Best Regards,
Nishant Kelkar
On Wed, Sep 3, 2014 at 1:47 PM, anusha Mangina anusha.mang...@gmail.com
wrote:
I have a table defined as:
CREATE TABLE foo (
id INT,
start_time STRING,
name STRING,
value
creation query, since an INSERT OVERWRITE would
expect a struct and get a string instead, thus throwing errors, right?
On Wed, Sep 3, 2014 at 2:06 PM, Nishant Kelkar nishant@gmail.com
wrote:
Sorry, I meant the following in my example:
INSERT OVERWRITE TABLE rollup
SELECT id, start_time
-in-a-cluster
So to answer your question, no, Hive does not move all data to one location
and create a single table. The whole point of using MapReduce as a
framework is to take the compute to the data, not vice versa.
Hope that helps!
Thanks and Regards,
Nishant Kelkar
On Thu, Aug 14, 2014 at 7
Maybe try this at the Hive terminal:
SET mapreduce.input.fileinputformat.split.maxsize=your_split_size;
Where your_split_size = SUM(all small file sizes) / #mappers you'd like
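As a concrete sketch of that arithmetic (numbers are hypothetical):

```sql
-- e.g. ~1 GB of small files and 8 desired mappers:
--   1073741824 / 8 = 134217728 bytes (128 MB per split)
SET mapreduce.input.fileinputformat.split.maxsize=134217728;
```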
Thanks and Regards,
Nishant
On Wed, Aug 13, 2014 at 4:00 AM, Ana Gillan ana.gil...@gmail.com wrote:
Hi,
I am
Hi Sameer,
Try the following:
CREATE INDEX user_id_ordering_mode_index ON TABLE
products(user_id,ordering_mode)
AS 'COMPACT' ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' STORED AS
TEXTFILE;
Let me know what it says.
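If the index is created with the WITH DEFERRED REBUILD clause (commonly added after the AS clause), it also has to be populated explicitly before it holds any data; a sketch:

```sql
-- Build/refresh the index data after creating the index:
ALTER INDEX user_id_ordering_mode_index ON products REBUILD;
```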
On Mon, Jul 28, 2014 at 1:03 PM, Sameer Tilak ssti...@live.com wrote:
Hey Stuart,
As far as I know, files in HDFS are immutable. So I would think that your
query below would not have a direct Hive conversion.
What you can do though, is create a local text file and then create an
EXTERNAL TABLE on top of that. Then, instead of your INSERT query, just use
some linux
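A sketch of that approach (paths and schema are hypothetical):

```sql
-- 1) Copy the local file into HDFS, e.g.:
--      hadoop fs -put /tmp/data.tsv /user/hive/external/data/
-- 2) Map an external table onto that directory:
CREATE EXTERNAL TABLE my_ext (col1 STRING, col2 STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
LOCATION '/user/hive/external/data/';
```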
Hi Andrew,
Try extending the *GenericUDAF* class.
Thanks,
Regards,
Nishant Kelkar
On Mon, Jun 16, 2014 at 5:47 PM, Botelho, Andrew andrew.bote...@emc.com
wrote:
Hi all,
I brought up some old code that creates a UDAF (User-Defined Aggregate
Function) in order to find the range
For more complex GenericUDFs (which you won't need here, probably), here's
something I wrote:
http://blog.spryinc.com/2013/10/writing-hive-genericudfs.html#more
Internal to the UDF, your input format should be 'dd-MON-yyyy' and output
format should be 'dd-MM-yyyy'.
Good luck!
Best,
Nishant Kelkar
I'd try to avoid using the Hive built-in from_unixtime and unix_timestamp
functions. They are buggy, in that they depend on the cluster's time zone.
So if some of your cluster nodes have a different time zone than others,
these functions suffer.
Right now, for what you want, it probably doesn't matter.
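To illustrate the dependency (the output varies with node configuration, which is exactly the problem):

```sql
-- from_unixtime renders the epoch in the local JVM time zone, so the same
-- query can return different strings on differently configured nodes:
SELECT from_unixtime(0);  -- '1970-01-01 00:00:00' only on a UTC node
```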