Re: Collect_set() of non-primitive types
I don't know of anything like what you want atleast until Hive 0.11. However, you could try something like this: INSERT OVERWRITE TABLE rollup SELECT id, start_time, collect_set(concat_ws(,, objects.name, objects.value, objects.type)) AS product_details FROM bar GROUP BY id, start_time; It's a bit hacky, but it does the trick. It basically concats all the objects you care about in a string, and then collects a set of that. Then, when you want to extract a field from the product_details array, just do this (say you want to extract the very first product's name in each array): SELECT SPLIT(a[0].product_details)[0] AS first_name FROM rollup a; Hope that helps! Best Regards, Nishant Kelkar On Wed, Sep 3, 2014 at 1:47 PM, anusha Mangina anusha.mang...@gmail.com wrote: I have a table defined as: CREATE TABLE foo ( id INT, start_time STRING, name STRING, value STRING, type STRING ) The 'name', 'value' and 'type' fields actually describe another object, and I'd like to turn these into STRUCTs, something like: CREATE TABLE bar ( id INT, start_time STRING, object STRUCTname: STRING, value: STRING, type: STRING ) However, I'd also like to create a rollup table containing the most recent result for a given 'id' field. There can be multiple different 'object' structs for a given combination of 'id' and 'start_time', so I thought I'd turn this into an array of structs. My ideal rollup table would look like: CREATE TABLE rollup ( id INT, start_time STRING, objects ARRAYSTRUCTname: STRING, value: STRING, type: STRING ) However, I can't do this because the following query fails: INSERT OVERWRITE TABLE rollup SELECT id, start_time, collect_set(object) FROM bar GROUP BY id, start_time Here's the error I get: FAILED: UDFArgumentTypeException Only primitive type arguments are accepted but structname:string,value:string,type:string was passed as parameter 1. Is there any way I can do this?
Re: Collect_set() of non-primitive types
Geez, too many mistakes for the day :P I meant the following above *CREATE TABLE* rollup_new *AS* SELECT id, start_time, collect_set(concat_ws(,, object.name http://objects.name/, object.value, object.type)) AS product_details FROM bar GROUP BY id, start_time; The change is in the table creation query, since an INSERT OVERWRITE would expect a struct and get a string instead, thus throwing errors right? On Wed, Sep 3, 2014 at 2:06 PM, Nishant Kelkar nishant@gmail.com wrote: Sorry, I meant the following in my example: INSERT OVERWRITE TABLE rollup SELECT id, start_time, collect_set(concat_ws(,, object.name http://objects.name/, object.value, object.type)) AS product_details FROM bar GROUP BY id, start_time; object instead of object*s*, as you have in your example :) Best Regards, Nishant Kelkar On Wed, Sep 3, 2014 at 2:03 PM, Nishant Kelkar nishant@gmail.com wrote: I don't know of anything like what you want atleast until Hive 0.11. However, you could try something like this: INSERT OVERWRITE TABLE rollup SELECT id, start_time, collect_set(concat_ws(,, objects.name, objects.value, objects.type)) AS product_details FROM bar GROUP BY id, start_time; It's a bit hacky, but it does the trick. It basically concats all the objects you care about in a string, and then collects a set of that. Then, when you want to extract a field from the product_details array, just do this (say you want to extract the very first product's name in each array): SELECT SPLIT(a[0].product_details)[0] AS first_name FROM rollup a; Hope that helps! Best Regards, Nishant Kelkar On Wed, Sep 3, 2014 at 1:47 PM, anusha Mangina anusha.mang...@gmail.com wrote: I have a table defined as: CREATE TABLE foo ( id INT, start_time STRING, name STRING, value STRING, type STRING ) The 'name', 'value' and 'type' fields actually describe another object, and I'd like to turn these into STRUCTs, something like: CREATE TABLE bar ( id INT, start_time STRING, object STRUCTname: STRING, value: STRING, type: STRING ) However, I'd also like to create a rollup table containing the most recent result for a given 'id' field. There can be multiple different 'object' structs for a given combination of 'id' and 'start_time', so I thought I'd turn this into an array of structs. My ideal rollup table would look like: CREATE TABLE rollup ( id INT, start_time STRING, objects ARRAYSTRUCTname: STRING, value: STRING, type: STRING ) However, I can't do this because the following query fails: INSERT OVERWRITE TABLE rollup SELECT id, start_time, collect_set(object) FROM bar GROUP BY id, start_time Here's the error I get: FAILED: UDFArgumentTypeException Only primitive type arguments are accepted but structname:string,value:string,type:string was passed as parameter 1. Is there any way I can do this?
Re: collect_set() with non-primitive types
On Wed, Nov 28, 2012 at 1:29 PM, Brad Cavanagh brad.cavan...@gmail.comwrote: I have a table defined as: CREATE TABLE foo ( id INT, start_time STRING, name STRING, value STRING, type STRING ) The 'name', 'value' and 'type' fields actually describe another object, and I'd like to turn these into STRUCTs, something like: CREATE TABLE bar ( id INT, start_time STRING, object STRUCTname: STRING, value: STRING, type: STRING ) However, I'd also like to create a rollup table containing the most recent result for a given 'id' field. There can be multiple different 'object' structs for a given combination of 'id' and 'start_time', so I thought I'd turn this into an array of structs. My ideal rollup table would look like: CREATE TABLE rollup ( id INT, start_time STRING, objects ARRAYSTRUCTname: STRING, value: STRING, type: STRING ) However, I can't do this because the following query fails: INSERT OVERWRITE TABLE rollup SELECT id, start_time, collect_set(object) FROM bar GROUP BY id, start_time Here's the error I get: FAILED: UDFArgumentTypeException Only primitive type arguments are accepted but structname:string,value:string,type:string was passed as parameter 1. Is there any way I can do this? I'm using Hive 0.9.0 as installed using CDH4.1.0: % yum info hive.noarch Version : 0.9.0+148 Release : 1.cdh4.1.0.p0.27.el6 Cheers, Brad. You are going to have to write your own UDF. collect set could be extended to support nested complex types but it does not do that right now.