Re: Collect_set() of non-primitive types

2014-09-03 Thread Nishant Kelkar
I don't know of anything like what you want atleast until Hive 0.11.

However, you could try something like this:

INSERT OVERWRITE TABLE rollup
SELECT id, start_time, collect_set(concat_ws(,, objects.name,
objects.value, objects.type)) AS product_details
FROM bar
GROUP BY id, start_time;


It's a bit hacky, but it does the trick. It basically concats all the
objects you care about in a string, and then collects a set of that. Then,
when you want to extract a field from the product_details array, just do
this (say you want to extract the very first product's name in each array):

SELECT SPLIT(a[0].product_details)[0] AS first_name FROM rollup a;


Hope that helps!

Best Regards,
Nishant Kelkar



On Wed, Sep 3, 2014 at 1:47 PM, anusha Mangina anusha.mang...@gmail.com
wrote:

 I have a table defined as:

 CREATE TABLE foo (
   id INT,
   start_time STRING,
   name STRING,
   value STRING,
   type STRING
 )

 The 'name', 'value' and 'type' fields actually describe another object, and
 I'd like to turn these into STRUCTs, something like:

 CREATE TABLE bar (
   id INT,
   start_time STRING,
   object STRUCTname: STRING,
 value: STRING,
 type: STRING
 )

 However, I'd also like to create a rollup table containing the most recent
 result for a given 'id' field. There can be multiple different 'object'
 structs for a given combination of 'id' and 'start_time', so I thought I'd
 turn this into an array of structs. My ideal rollup table would look like:

 CREATE TABLE rollup (
   id INT,
   start_time STRING,
   objects ARRAYSTRUCTname: STRING,
 value: STRING,
 type: STRING
 )

 However, I can't do this because the following query fails:

 INSERT OVERWRITE TABLE rollup
 SELECT id, start_time, collect_set(object)
 FROM bar
 GROUP BY id, start_time

 Here's the error I get:

 FAILED: UDFArgumentTypeException Only primitive type arguments are accepted
 but structname:string,value:string,type:string was passed as parameter 1.

 Is there any way I can do this?




Re: Collect_set() of non-primitive types

2014-09-03 Thread Nishant Kelkar
Geez, too many mistakes for the day :P

I meant the following above

*CREATE TABLE* rollup_new *AS*
SELECT id, start_time, collect_set(concat_ws(,, object.name
http://objects.name/, object.value, object.type)) AS product_details
FROM bar
GROUP BY id, start_time;

The change is in the table creation query, since an INSERT OVERWRITE would
expect a struct and get a string instead, thus throwing errors right?



On Wed, Sep 3, 2014 at 2:06 PM, Nishant Kelkar nishant@gmail.com
wrote:

 Sorry, I meant the following in my example:

 INSERT OVERWRITE TABLE rollup
 SELECT id, start_time, collect_set(concat_ws(,, object.name 
 http://objects.name/, object.value, object.type)) AS product_details
 FROM bar
 GROUP BY id, start_time;

 object instead of object*s*, as you have in your example :)

 Best Regards,
 Nishant Kelkar


 On Wed, Sep 3, 2014 at 2:03 PM, Nishant Kelkar nishant@gmail.com
 wrote:

 I don't know of anything like what you want atleast until Hive 0.11.

 However, you could try something like this:

 INSERT OVERWRITE TABLE rollup
 SELECT id, start_time, collect_set(concat_ws(,, objects.name, 
 objects.value, objects.type)) AS product_details
 FROM bar
 GROUP BY id, start_time;


 It's a bit hacky, but it does the trick. It basically concats all the
 objects you care about in a string, and then collects a set of that. Then,
 when you want to extract a field from the product_details array, just do
 this (say you want to extract the very first product's name in each array):

 SELECT SPLIT(a[0].product_details)[0] AS first_name FROM rollup a;


 Hope that helps!

 Best Regards,
 Nishant Kelkar



 On Wed, Sep 3, 2014 at 1:47 PM, anusha Mangina anusha.mang...@gmail.com
 wrote:

 I have a table defined as:

 CREATE TABLE foo (
   id INT,
   start_time STRING,
   name STRING,
   value STRING,
   type STRING
 )

 The 'name', 'value' and 'type' fields actually describe another object, and
 I'd like to turn these into STRUCTs, something like:

 CREATE TABLE bar (
   id INT,
   start_time STRING,
   object STRUCTname: STRING,
 value: STRING,
 type: STRING
 )

 However, I'd also like to create a rollup table containing the most recent
 result for a given 'id' field. There can be multiple different 'object'
 structs for a given combination of 'id' and 'start_time', so I thought I'd
 turn this into an array of structs. My ideal rollup table would look like:

 CREATE TABLE rollup (
   id INT,
   start_time STRING,
   objects ARRAYSTRUCTname: STRING,
 value: STRING,
 type: STRING
 )

 However, I can't do this because the following query fails:

 INSERT OVERWRITE TABLE rollup
 SELECT id, start_time, collect_set(object)
 FROM bar
 GROUP BY id, start_time

 Here's the error I get:

 FAILED: UDFArgumentTypeException Only primitive type arguments are accepted
 but structname:string,value:string,type:string was passed as parameter 1.

 Is there any way I can do this?






Re: collect_set() with non-primitive types

2012-11-28 Thread Edward Capriolo
On Wed, Nov 28, 2012 at 1:29 PM, Brad Cavanagh brad.cavan...@gmail.comwrote:

 I have a table defined as:

 CREATE TABLE foo (
   id INT,
   start_time STRING,
   name STRING,
   value STRING,
   type STRING
 )

 The 'name', 'value' and 'type' fields actually describe another object,
 and I'd like to turn these into STRUCTs, something like:

 CREATE TABLE bar (
   id INT,
   start_time STRING,
   object STRUCTname: STRING,
 value: STRING,
 type: STRING
 )

 However, I'd also like to create a rollup table containing the most recent
 result for a given 'id' field. There can be multiple different 'object'
 structs for a given combination of 'id' and 'start_time', so I thought I'd
 turn this into an array of structs. My ideal rollup table would look like:

 CREATE TABLE rollup (
   id INT,
   start_time STRING,
   objects ARRAYSTRUCTname: STRING,
 value: STRING,
 type: STRING
 )

 However, I can't do this because the following query fails:

 INSERT OVERWRITE TABLE rollup
 SELECT id, start_time, collect_set(object)
 FROM bar
 GROUP BY id, start_time

 Here's the error I get:

 FAILED: UDFArgumentTypeException Only primitive type arguments are
 accepted but structname:string,value:string,type:string was passed as
 parameter 1.

 Is there any way I can do this?

 I'm using Hive 0.9.0 as installed using CDH4.1.0:

  % yum info hive.noarch
 Version : 0.9.0+148
 Release : 1.cdh4.1.0.p0.27.el6

 Cheers,
  Brad.


You are going to have to write your own UDF. collect set could be extended
to support nested complex types but it does not do that right now.