Re: Integration Testing with Hive

2014-11-05 Thread Nishant Kelkar
as creating a Hive table, inserting a single row of data into that table, and fetching results from the table? I would be very grateful. Thanks! Best Regards, Nishant Kelkar

Re: Integration Testing with Hive

2014-11-05 Thread Nishant Kelkar
Note: I've looked into the HiveRunner and hive_test projects as mentioned in my SO post. However, neither of these supports a CDH Hadoop version, which is what I need to use. Specifically, my CDH version is 2.0.0-cdh4.7.0. Best Regards, Nishant Kelkar On Wed, Nov 5, 2014 at 12:29 AM, Nishant

Re: CREATE (PARTITIONED) TABLE AS error

2014-11-05 Thread Nishant Kelkar
the second SET command to allow for dynamic partitioning over all partition fields. stack is a built-in UDTF that takes as its first argument the number of rows to insert, N (in this case N=1), followed by N*K arguments, where K is the number of columns you have. Best Regards, Nishant Kelkar On Wed

Re: Passing NULL Value to Partitioned Field in Table

2014-11-02 Thread Nishant Kelkar
*__HIVE_DEFAULT_PARTITION__* What is this __HIVE_DEFAULT_PARTITION__? How can I change it to be NULL? P.S. I'm new to partitioning in Hive. Best Regards, Nishant Kelkar

Re: Passing NULL Value to Partitioned Field in Table

2014-11-02 Thread Nishant Kelkar
' in constant Best Regards, Nishant Kelkar On Sun, Nov 2, 2014 at 10:27 PM, Nishant Kelkar nishant@gmail.com wrote: Hi All, So I have a table, say *test_table*. It has 3 columns, a, b, and c. For simplicity, let's say all are of type STRING. Say the table, when created, was partitioned on fields

Re: Passing NULL Value to Partitioned Field in Table

2014-11-02 Thread Nishant Kelkar
Just found a related ticket: https://issues.apache.org/jira/browse/HIVE-1309 Best Regards, Nishant Kelkar On Sun, Nov 2, 2014 at 10:32 PM, Nishant Kelkar nishant@gmail.com wrote: FYI, I tried the following query too: *INSERT INTO TABLE* test_table *PARTITION* (a=foo, b=*CAST*(NULL

Re: Remove duplicate records in Hive

2014-09-10 Thread Nishant Kelkar
it in format: YYYY-MM-DD. Hope this helps. Best Regards, Nishant Kelkar On Wed, Sep 10, 2014 at 10:04 AM, Raj Hadoop hadoop...@yahoo.com wrote: Hi, I have a requirement in Hive to remove duplicate records (they differ only by one column, i.e. a date column) and keep the latest date record. Sample

Re: Remove duplicate records in Hive

2014-09-10 Thread Nishant Kelkar
Hi Raj, You'll have to change the format of your date to something like YYYY-MM-DD. For example, for 2-oct-2013 it will be 2013-10-02. Best Regards, Nishant Kelkar On Wed, Sep 10, 2014 at 11:48 AM, Raj Hadoop hadoop...@yahoo.com wrote: The SORT_ARRAY(COLLECT_SET(date))[0] AS latest_date
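The conversion described in this message (2-oct-2013 becoming 2013-10-02) can be sketched outside Hive in Python. The helper name `to_iso_date` and the month table are illustrative assumptions, not something from the thread; the sketch only shows why a zero-padded year-month-day string is useful: such strings sort in chronological order.

```python
# Illustrative sketch only: to_iso_date is a hypothetical helper, not a Hive function.
MONTHS = {
    "jan": 1, "feb": 2, "mar": 3, "apr": 4, "may": 5, "jun": 6,
    "jul": 7, "aug": 8, "sep": 9, "oct": 10, "nov": 11, "dec": 12,
}


def to_iso_date(text):
    """Convert a 'd-mon-yyyy' string such as '2-oct-2013' to 'yyyy-mm-dd'."""
    day, month, year = text.strip().split("-")
    # Zero-pad every field so that string order matches date order.
    return f"{int(year):04d}-{MONTHS[month.lower()]:02d}-{int(day):02d}"


print(to_iso_date("2-oct-2013"))  # 2013-10-02
```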

Re: Remove duplicate records in Hive

2014-09-10 Thread Nishant Kelkar
and dirty :) Best Regards, Nishant Kelkar On Wed, Sep 10, 2014 at 12:48 PM, Raj Hadoop hadoop...@yahoo.com wrote: sort_array returns in ascending order, so the first element cannot be the largest date. The last element is the largest date. On Wednesday, September 10, 2014 3:38 PM, Nishant Kelkar
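Raj's point about ordering can be checked with a small Python sketch. It assumes the dates are already in YYYY-MM-DD form, and `latest_date` is a hypothetical name used only for illustration; `sorted(set(...))` stands in loosely for sorting a de-duplicated collection in ascending order.

```python
# Illustrative sketch only: mimics "collect distinct values, sort ascending".
def latest_date(dates):
    """Return the most recent date from a list of 'yyyy-mm-dd' strings."""
    unique_sorted = sorted(set(dates))  # ascending, duplicates removed
    # Ascending order puts the earliest date at index 0 and the latest at the end.
    return unique_sorted[-1]


sample = ["2013-10-02", "2013-10-09", "2013-10-02"]
print(latest_date(sample))  # 2013-10-09
```

In Hive itself, the last element of an array would presumably be reached by indexing with `size(...) - 1` rather than `[0]`.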

Re: Collect_set() of non-primitive types

2014-09-03 Thread Nishant Kelkar
SPLIT(a[0].product_details)[0] AS first_name FROM rollup a; Hope that helps! Best Regards, Nishant Kelkar On Wed, Sep 3, 2014 at 1:47 PM, anusha Mangina anusha.mang...@gmail.com wrote: I have a table defined as: CREATE TABLE foo ( id INT, start_time STRING, name STRING, value

Re: Collect_set() of non-primitive types

2014-09-03 Thread Nishant Kelkar
creation query, since an INSERT OVERWRITE would expect a struct and get a string instead, thus throwing errors right? On Wed, Sep 3, 2014 at 2:06 PM, Nishant Kelkar nishant@gmail.com wrote: Sorry, I meant the following in my example: INSERT OVERWRITE TABLE rollup SELECT id, start_time

Re: Data in Hive

2014-08-14 Thread Nishant Kelkar
-in-a-cluster So to answer your question, no, Hive does not move all data to one location and create a single table. The whole point of using MapReduce as a framework is to take the compute to the data, not vice versa. Hope that helps! Thanks and Regards, Nishant Kelkar On Thu, Aug 14, 2014 at 7

Re: Optimising Map and Reduce numbers for Compressed Dataset

2014-08-13 Thread Nishant Kelkar
Maybe try this at the Hive terminal: SET mapreduce.input.fileinputformat.split.maxsize=your_split_size; Where your_split_size = SUM(all small file sizes) / #mappers you'd like Thanks and Regards, Nishant On Wed, Aug 13, 2014 at 4:00 AM, Ana Gillan ana.gil...@gmail.com wrote: Hi, I am
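The arithmetic behind `your_split_size` can be sketched in Python. The function name `split_size_bytes` and the sample file sizes are assumptions made for illustration; the formula is the one given in the message (total size of the small files divided by the number of mappers you would like).

```python
# Illustrative sketch only: computes a value to pass to
# mapreduce.input.fileinputformat.split.maxsize (in bytes).
def split_size_bytes(file_sizes_bytes, desired_mappers):
    """Total input size divided by the desired number of mappers."""
    if desired_mappers <= 0:
        raise ValueError("desired_mappers must be positive")
    return sum(file_sizes_bytes) // desired_mappers


# Hypothetical input: 100 files of 10 MiB each, aiming for 20 mappers.
sizes = [10 * 1024 * 1024] * 100
print(split_size_bytes(sizes, 20))  # 52428800
```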

Re: java.lang.NumberFormatException.forInputString

2014-07-28 Thread Nishant Kelkar
Hi Sameer, Try the following: CREATE INDEX user_id_ordering_mode_index ON TABLE products(user_id,ordering_mode) AS 'COMPACT' ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' STORED AS TEXTFILE; Let me know what it says. On Mon, Jul 28, 2014 at 1:03 PM, Sameer Tilak ssti...@live.com wrote:

Re: simple insert query question

2014-06-19 Thread Nishant Kelkar
Hey Stuart, As far as I know, files in HDFS are immutable. So I would think that your query below would not have a direct Hive conversion. What you can do, though, is create a local text file and then create an EXTERNAL TABLE on top of that. Then, instead of your INSERT query, just use some Linux

Re: UDAF Class Deprecated

2014-06-16 Thread Nishant Kelkar
Hi Andrew, Try extending the *GenericUDAF* class. Thanks, Regards, Nishant Kelkar On Mon, Jun 16, 2014 at 5:47 PM, Botelho, Andrew andrew.bote...@emc.com wrote: Hi all, I brought up some old code that creates a UDAF (User-Defined Aggregate Function) in order to find the range

Re: UDAF Class Deprecated

2014-06-16 Thread Nishant Kelkar
Kelkar [mailto:nishant@gmail.com] *Sent:* Monday, June 16, 2014 6:00 PM *To:* user@hive.apache.org *Subject:* Re: UDAF Class Deprecated Hi Andrew, Try extending the *GenericUDAF* class. Thanks, Regards, Nishant Kelkar On Mon, Jun 16, 2014 at 5:47 PM, Botelho, Andrew

Re: Need help in Date format

2014-06-12 Thread Nishant Kelkar
For more complex GenericUDFs (which you won't need here, probably), here's something I wrote: http://blog.spryinc.com/2013/10/writing-hive-genericudfs.html#more Internal to the UDF, your input format should be 'dd-MON-' and output format as 'dd-MM-'. Good luck! Best, Nishant Kelkar

Re: Need help in Date format

2014-06-12 Thread Nishant Kelkar
I'd try to avoid using Hive's built-in from_unixtime and unix_timestamp functions. They are buggy, in that they depend on the cluster's timezone. So if some of your cluster nodes have a different timezone than others, these functions suffer. Right now, for what you want, it probably doesn't matter.
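The timezone concern can be illustrated in Python, where the zone can be passed explicitly so that the result does not depend on the machine's local setting. This is a sketch of the general idea, not of Hive's implementation, and `format_epoch_utc` is a hypothetical name.

```python
from datetime import datetime, timezone


# Illustrative sketch only: formats epoch seconds with an explicit UTC zone,
# so every machine produces the same string for the same input.
def format_epoch_utc(epoch_seconds):
    """Render epoch seconds as 'yyyy-mm-dd HH:MM:SS' in UTC."""
    moment = datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)
    return moment.strftime("%Y-%m-%d %H:%M:%S")


print(format_epoch_utc(0))  # 1970-01-01 00:00:00
```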